What should the DIRECTORY function do when it encounters a character
whose ASCII code is greater than 128?
It looks like the SBCL DIRECTORY function fails if the Unix directory
contains a file named "Würzburg". I assume it is the ü that is
confusing something.
Is this what DIRECTORY should do in this case?
Does anyone have a fix for Peter Seibel's Portable Pathname Library
which can handle this?
The value #\Ã is not of type BASE-CHAR.
[Condition of type TYPE-ERROR]
Restarts:
 0: [ABORT-REQUEST] Abort handling SLIME request.
 1: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" {10003BF451}>)

Backtrace:
 0: (SB-KERNEL:HAIRY-DATA-VECTOR-SET ··············@·@·@·@·@·@·@·@" 12 #\Ã)
 1: (SB-IMPL::CONCAT-TO-SIMPLE* BASE-STRING)
 2: (CONCATENATE BASE-STRING)
 3: (SB-IMPL::%ENUMERATE-FILES "/tmp/xyzzy/" #P"/tmp/xyzzy/*.*" T #<CLOSURE (LAMBDA (SB-IMPL::MATCH)) {1000DF4E79}>)
 4: ((LABELS SB-IMPL::DO-DIRECTORY) #P"/tmp/xyzzy/*.*")
 5: (DIRECTORY #P"/tmp/xyzzy/*.*")
From: Peter Seibel
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Message-ID: <m2fymxa1a4.fsf@gigamonkeys.com>
"Jimka" <·····@rdrop.com> writes:
> What should the DIRECTORY function do when it encounters a character
> whose ascii code is greater than 128?
>
> It looks like the sbcl DIRECTORY function fails if the UNIX
> directory contains a file with name "Würzburg". I am assuming it is
> the ü that is confusing something. Is this what DIRECTORY should do
> in this case?
Probably not. But the SBCL guys apparently haven't decided what it
should do instead. (The problem, as I recall, is that it's not
well-defined what character set/character encoding should be used when
interpreting the bytes that make up a file name.)
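Peter's point can be made concrete outside Lisp. The Python sketch below is only an illustration; it is not SBCL or the Portable Pathname Library. It assumes two hypothetical programs stored the same name, one under a UTF-8 locale and one under an ISO-8859-1 locale. The file system keeps only the bytes, so the same name appears in two forms. A byte-per-character reader turns one form into different text, and a UTF-8 reader rejects the other form outright.

```python
# The same name as two hypothetical programs might store it; the
# file system itself records no encoding alongside the bytes.
utf8_bytes = "Würzburg".encode("utf-8")      # b'W\xc3\xbcrzburg'
latin1_bytes = "Würzburg".encode("latin-1")  # b'W\xfcrzburg'

# Decoding the UTF-8 bytes one byte per character "succeeds" but gives
# a different string: C3 BC become the two characters Ã and ¼.
misread = utf8_bytes.decode("latin-1")       # 'WÃ¼rzburg'

# Decoding the Latin-1 bytes as UTF-8 fails: FC cannot start a UTF-8
# sequence.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)                  # False
```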
> Does anyone have a fix for Peter Seibel's Portable Pathname Library
> which can handle this?
Not that I'm aware of--if we can't get names out of DIRECTORY, there's
not a lot we can do.
-Peter
--
Peter Seibel * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp * http://www.gigamonkeys.com/book/
+ "Jimka" <·····@rdrop.com>:
| It looks like the sbcl DIRECTORY function fails if the UNIX directory
| contains a file with name "Würzburg". I am assuming it is the ü that is
| confusing something.
Indeed. The code in filesys.lisp that does this assumes that
filenames are BASE-STRINGs. Since sbcl started supporting Unicode,
BASE-CHARs just cover the ASCII range, so the pathname code doesn't
work right anymore. For example, just typing #p"Würzburg" produces an
error.
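Below is a rough model of the failure, again in Python rather than Lisp. It rests on two assumptions:

- the name on disk is UTF-8;
- SBCL turned each byte into one character before CONCATENATE tried to build a BASE-STRING. That step is modelled here simply as "every code must be below 128".

The helper `first_non_base_char` is hypothetical. Under these assumptions, the first rejected character is Ã, at index 12 of the joined path. That appears to match the `12 #\Ã` in frame 0 of the backtrace.

```python
def first_non_base_char(s):
    """Return (index, char) for the first character whose code is
    above 127, i.e. outside the ASCII range that BASE-CHAR covers in
    Unicode-enabled SBCL per the post above; None if there is none."""
    for i, ch in enumerate(s):
        if ord(ch) > 127:
            return i, ch
    return None


directory = "/tmp/xyzzy/"  # 11 characters, indices 0-10
# Assumption: the UTF-8 bytes of the name were turned into
# characters one byte at a time before CONCATENATE ran.
raw_name = "Würzburg".encode("utf-8").decode("latin-1")  # 'WÃ¼rzburg'
path = directory + raw_name

hit = first_non_base_char(path)   # (12, 'Ã')
print(hit[0])                     # 12
print(first_non_base_char("/tmp/xyzzy/Wurzburg"))  # None
```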
I think you should report this as a bug to the sbcl-devel mailing
list. No, I'll do it... Done. But a quick fix may not be
forthcoming.
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
when there is no ground whatsoever for supposing it is true.
-- Bertrand Russell
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Message-ID: <op.s4iyfcg6pqzri1@mjolner.upc.no>
On Sun, 05 Feb 2006 23:15:24 +0100, Jimka <·····@rdrop.com> wrote:
> What should the DIRECTORY function do when it encounters a character
> whose ascii code is greater than 128?
>
Can't you just use Unicode?
That way it would be easier to port, too.
After all, there are many extended character sets.
The one you would prefer, I suppose, is ISO 8859-1 (Latin-1).
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Message-ID: <op.s4iy2lyjpqzri1@mjolner.upc.no>
On Mon, 06 Feb 2006 01:03:02 +0100, John Thingstad
<··············@chello.no> wrote:
> On Sun, 05 Feb 2006 23:15:24 +0100, Jimka <·····@rdrop.com> wrote:
>
>> What should the DIRECTORY function do when it encounters a character
>> whose ascii code is greater than 128?
>>
>
> Can't you just use unicode?
On closer investigation no.
It needs to be standard ASCII.
John Thingstad <··············@chello.no> wrote:
+---------------
| > Jimka <·····@rdrop.com> wrote:
| > Can't you just use unicode?
|
| On closer investigation no.
| It needs to be standard ASCII.
+---------------
ASCII (ANSI X3.4-1986) contains only the character codes 0-127.
Do you perhaps mean ISO Latin 1 (a.k.a. ISO-8859-1)?
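Rob's distinction is easy to check with any language that has explicit codecs. The Python sketch below is only an illustration. It shows that ü (code point 252) has no ASCII encoding at all, while ISO-8859-1 stores it as the single byte FC. Codes 0-127 mean the same thing in both encodings.

```python
ch = "\u00fc"                 # 'ü', LATIN SMALL LETTER U WITH DIAERESIS
print(ord(ch))                # 252, outside ASCII's 0-127
print(ch.encode("latin-1"))   # b'\xfc', a single byte in ISO-8859-1

try:
    ch.encode("ascii")
except UnicodeEncodeError:
    print("code", ord(ch), "has no ASCII encoding")

# ASCII is a subset of Latin-1: codes 0-127 decode identically.
assert all(bytes([i]).decode("ascii") == bytes([i]).decode("latin-1")
           for i in range(128))
```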
-Rob
-----
Rob Warnock <····@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Message-ID: <op.s4j5wmgqpqzri1@mjolner.upc.no>
On Mon, 06 Feb 2006 11:52:18 +0100, Rob Warnock <····@rpw3.org> wrote:
> John Thingstad <··············@chello.no> wrote:
> +---------------
> | > Jimka <·····@rdrop.com> wrote:
> | > Can't you just use unicode?
> |
> | On closer investigation no.
> | It needs to be standard ASCII.
> +---------------
>
> ASCII (ANSI X3.4-1986) contains only the character codes 0-127.
> Do you perhaps mean ISO Latin 1 (a.k.a. ISO-8859-1)?
>
No. It seems that in SBCL it needs to be standard ASCII.
The Unix file system treats file names as byte sequences,
so it cannot report which encoding a name was written in.
For this reason SBCL has opted for standard ASCII.
I use LispWorks on Windows, where my default encoding
for BASE-CHAR is ISO-Latin-1, so this can vary from system to system.
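Because Unix names are only bytes, some systems avoid choosing an encoding that might fail. Instead they decode leniently and keep a lossless path back to the original bytes. Python's `surrogateescape` error handler (PEP 383) does this, and `os.fsdecode()` uses it on POSIX systems. The sketch below shows the round trip. This is Python's technique, named here for comparison; it is not something SBCL offered at the time of this thread. The Latin-1 byte string is a hypothetical example.

```python
# Hypothetical name written by a Latin-1 program: not valid UTF-8.
on_disk = b"W\xfcrzburg"

# Decode as UTF-8, but carry each undecodable byte through as a lone
# surrogate (U+DC80..U+DCFF) instead of raising an error.
name = on_disk.decode("utf-8", "surrogateescape")
print(repr(name))             # 'W\udcfcrzburg'

# Encoding with the same handler restores the exact original bytes,
# so the name can be handed back to the OS unchanged.
assert name.encode("utf-8", "surrogateescape") == on_disk

# A name that is valid UTF-8 decodes to the expected text.
good = "Würzburg".encode("utf-8").decode("utf-8", "surrogateescape")
assert good == "W\u00fcrzburg"
```

The design trade-off is that such strings are not always printable, as the `repr` above shows, but no name on disk becomes unreachable.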