How do I read files with weird characters in them?
Using clisp 2.32 with XEmacs on Windows XP, I'm trying to read a file
into a string using this:
(with-open-file (stream "sample.file")
  (let ((result (make-array (file-length stream))))
    (read-sequence result stream)
    (setf *file-string* result)))
This works unless there are weird characters in the file such as
#\LATIN_CAPITAL_LETTER_N_WITH_TILDE. Then I get:
*** - invalid byte #x81 in CHARSET:CP1252 conversion
I gather this has something to do with Unicode character conversion.
Looking in the CLISP documentation, I find that I need to have Unicode
enabled. However, I think I do, since char-code-limit is 1114112.
Any help appreciated.
Niklas Kambeitz
> * Niklas Kambeitz <······@cb-obk.zptvyy.pn> [2005-02-16 18:16:06 +0000]:
>
> How do I read files with weird characters in them?
>
> Using clisp 2.32 with XEmacs on Windows XP, I'm trying to read a file
> into a string using this:
>
> (with-open-file (stream "sample.file")
>   (let ((result (make-array (file-length stream))))
>     (read-sequence result stream)
>     (setf *file-string* result)))
>
> This works unless there are weird characters in the file such as
> #\LATIN_CAPITAL_LETTER_N_WITH_TILDE. Then I get:
>
> *** - invalid byte #x81 in CHARSET:CP1252 conversion
clisp home -> FAQ -> trouble -> invalid byte ==>
<http://clisp.cons.org/faq.html#enc-err> ==>
<http://clisp.cons.org/clisp.html#opt-enc> ==>
<http://clisp.cons.org/impnotes.html#def-file-enc>
<http://clisp.cons.org/impnotes.html#extfmt>
In short, you should use either a 1:1 encoding,
e.g., charset:iso-8859-1,
or the specific encoding in which your file was written.
(with-open-file (stream "sample.file" :external-format charset:iso-8859-1)
  (let ((result (make-array (file-length stream))))
    (read-sequence result stream)
    (setf *file-string* result)))
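
A variant that may be worth considering, since the goal is a string:
make-array without an :element-type yields a general vector rather
than a string, and file-length usually counts bytes, which can exceed
the number of characters actually read (for example when CRLF line
ends are converted). The sketch below allocates with make-string and
trims to what read-sequence filled in. The helper name
read-file-into-string is hypothetical, not part of CLISP, and the
:latin-1 keyword for other implementations is an assumption.

```lisp
;; Sketch only: READ-FILE-INTO-STRING is a hypothetical helper name,
;; not something provided by CLISP or the standard.
(defun read-file-into-string (path &key (external-format :default))
  "Return the contents of PATH as a string, decoded with EXTERNAL-FORMAT."
  (with-open-file (stream path :external-format external-format)
    ;; FILE-LENGTH is treated as an upper bound on the character count.
    (let* ((buffer (make-string (file-length stream)))
           ;; READ-SEQUENCE returns the index of the first element
           ;; it did not update.
           (end (read-sequence buffer stream)))
      (subseq buffer 0 end))))

;; Possible use (needs sample.file to exist). The #-clisp branch is an
;; assumption about what other implementations accept:
;; (setf *file-string*
;;       (read-file-into-string "sample.file"
;;                              :external-format
;;                              #+clisp charset:iso-8859-1
;;                              #-clisp :latin-1))
```

With a 1:1 encoding such as ISO-8859-1, every byte maps to exactly one
character, so no byte can be rejected as invalid. The characters are
only the right ones if the file really was written in that encoding.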
--
Sam Steingold (http://www.podval.org/~sds) running w2k
Sam Steingold wrote:
>>*** - invalid byte #x81 in CHARSET:CP1252 conversion
> In short, you should use either a 1:1 encoding,
> e.g., charset:iso-8859-1,
> or the specific encoding in which your file was written.
>
> (with-open-file (stream "sample.file" :external-format charset:iso-8859-1)
>   (let ((result (make-array (file-length stream))))
>     (read-sequence result stream)
>     (setf *file-string* result)))
This worked like a charm. Thanks!
Niklas Kambeitz writes:
> How do I read files with weird characters in them?
>
> Using clisp 2.32 with XEmacs on Windows XP, I'm trying to read a file
> into a string using this:
>
> (with-open-file (stream "sample.file")
>   (let ((result (make-array (file-length stream))))
>     (read-sequence result stream)
>     (setf *file-string* result)))
>
> This works unless there are weird characters in the file such as
> #\LATIN_CAPITAL_LETTER_N_WITH_TILDE. Then I get:
>
> *** - invalid byte #x81 in CHARSET:CP1252 conversion
>
> I gather this has something to do with Unicode character conversion.
> Looking in the CLISP documentation, I find that I need to have Unicode
> enabled. However, I think I do, since char-code-limit is 1114112.
>
> Any help appreciated.
>
Maybe you can have a look at:
http://clisp.cons.org/faq.html#enc-err
--
Philippe Brochard <···········@SPAM_free.fr>
http://hocwp.free.fr