From: Jimka
Subject: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <1139177724.723181.157630@g43g2000cwa.googlegroups.com>
What should the DIRECTORY function do when it encounters a character
whose ascii code is greater than 128?

It looks like the sbcl DIRECTORY function fails if the UNIX directory
contains a file with name "Würzburg".  I am assuming it is the ü that is
confusing something.  Is this what DIRECTORY should do in this case?

Does anyone have a fix for Peter Seibel's Portable Pathname Library
which can handle this?

The value #\Ã is not of type BASE-CHAR.
   [Condition of type TYPE-ERROR]

Restarts:
  0: [ABORT-REQUEST] Abort handling SLIME request.
  1: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" {10003BF451}>)

  0: (SB-KERNEL:HAIRY-DATA-VECTOR-SET ··············@·@·@·@·@·@·@·@" 12
#\Ã)
  1: (SB-IMPL::CONCAT-TO-SIMPLE* BASE-STRING)
  2: (CONCATENATE BASE-STRING)
  3: (SB-IMPL::%ENUMERATE-FILES "/tmp/xyzzy/" #P"/tmp/xyzzy/*.*" T #<CLOSURE (LAMBDA (SB-IMPL::MATCH)) {1000DF4E79}>)
  4: ((LABELS SB-IMPL::DO-DIRECTORY) #P"/tmp/xyzzy/*.*")
  5: (DIRECTORY #P"/tmp/xyzzy/*.*")

From: Jimka
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <1139178262.525167.277970@g44g2000cwa.googlegroups.com>
It seems to work in clisp.
From: Peter Seibel
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <m2fymxa1a4.fsf@gigamonkeys.com>
"Jimka" <·····@rdrop.com> writes:

> What should the DIRECTORY function do when it encounters a character
> whose ascii code is greater than 128?
>
> It looks like the sbcl DIRECTORY function fails if the UNIX
> directory contains a file with name "Würzburg". I am assuming it is
> the ü that is confusing something. Is this what DIRECTORY should do
> in this case?

Probably not. But the SBCL guys apparently haven't decided what it
should do instead. (The problem, as I recall, is that it's not
well-defined what character set/character encoding should be used when
interpreting the bytes that make up a file name.)

> Does anyone have a fix for Peter Seibel's Portable Pathname Library
> which can handle this?

Not that I'm aware of--if we can't get names out of DIRECTORY, there's
not a lot we can do.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/
From: Harald Hanche-Olsen
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <pco4q3d4cap.fsf@shuttle.math.ntnu.no>
+ "Jimka" <·····@rdrop.com>:

| It looks like the sbcl DIRECTORY function fails if the UNIX directory
| contains
| a file with name  "Würzburg".  I am assuming it is the ü that is
| confusing something.

Indeed.  The code in filesys.lisp that does this assumes that
filenames are BASE-STRINGs.  Since sbcl started supporting Unicode,
BASE-CHARs just cover the ASCII range, so the pathname code doesn't
work right anymore.  For example, just typing #p"Würzburg" produces an
error.
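
A minimal sketch of why the backtrace complains about #\Ã rather than
ü.  It is written in Python rather than Lisp and assumes, which the
thread does not confirm, that the file name is stored on disk as UTF-8
and that its bytes are being read one character per byte.  The path
prefix is taken from the backtrace; the variable names are
illustrative only.

```python
# Assumption: the file name on disk is the UTF-8 encoding of "Würzburg".
name = "W\u00fcrzburg"
utf8_bytes = name.encode("utf-8")      # ü becomes the two bytes C3 BC

# Reading those bytes one character per byte (as Latin-1 does) turns
# the single ü into two characters, the first of which is Ã (U+00C3).
misread = utf8_bytes.decode("latin-1")

# The first character outside the 0-127 range that such a reader meets.
first_non_ascii = next(ch for ch in misread if ord(ch) > 127)

# Position of that character in the full path from the backtrace.
path = "/tmp/xyzzy/" + misread
index = path.index(first_non_ascii)

print(utf8_bytes.hex(" "))
print(repr(misread), repr(first_non_ascii), index)
```

Under these assumptions the first offending character is Ã at index 12
of the path, which is consistent with the arguments shown in frame 0 of
the backtrace.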

I think you should report this as a bug to the sbcl-devel mailing
list.  No, I'll do it...  Done.  But a quick fix may not be
forthcoming.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <op.s4iyfcg6pqzri1@mjolner.upc.no>
On Sun, 05 Feb 2006 23:15:24 +0100, Jimka <·····@rdrop.com> wrote:

> What should the DIRECTORY function do when it encounters a character
> whose
> ascii code is greater than 128?
>

Can't you just use unicode?
That way it would be easier to port too.
After all, there are many extended character sets.
The one you would prefer I suppose is ISO 8859-1 (LATIN-1).

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <op.s4iy2lyjpqzri1@mjolner.upc.no>
On Mon, 06 Feb 2006 01:03:02 +0100, John Thingstad  
<··············@chello.no> wrote:

> On Sun, 05 Feb 2006 23:15:24 +0100, Jimka <·····@rdrop.com> wrote:
>
>> What should the DIRECTORY function do when it encounters a character
>> whose
>> ascii code is greater than 128?
>>
>
> Can't you just use unicode?

On closer investigation no.
It needs to be standard ASCII.

From: Rob Warnock
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <QKydnQx9PvP_t3renZ2dnUVZ_t-dnZ2d@speakeasy.net>
John Thingstad <··············@chello.no> wrote:
+---------------
| > Jimka <·····@rdrop.com> wrote:
| > Can't you just use unicode?
| 
| On closer investigation no.
| It needs to be standard ASCII.
+---------------

ASCII (ANSI X3.4-1986) contains only the character codes 0-127.
Do you perhaps mean ISO Latin 1 (a.k.a. ISO-8859-1)?


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: John Thingstad
Subject: Re: DIRECTORY cannot read directory that contains umlaut
Date: 
Message-ID: <op.s4j5wmgqpqzri1@mjolner.upc.no>
On Mon, 06 Feb 2006 11:52:18 +0100, Rob Warnock <····@rpw3.org> wrote:

> John Thingstad <··············@chello.no> wrote:
> +---------------
> | > Jimka <·····@rdrop.com> wrote:
> | > Can't you just use unicode?
> |
> | On closer investigation no.
> | It needs to be standard ASCII.
> +---------------
>
> ASCII (ANSI X3.4-1986) contains only the character codes 0-127.
> Do you perhaps mean ISO Latin 1 (a.k.a. ISO-8859-1)?
>

No. It seems in SBCL it needs to be standard ASCII.
The Unix file system treats file names as byte sequences.
Thus it cannot report their encoding.
For this reason SBCL has opted for standard ASCII.
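
The point about byte sequences can be sketched as follows.  This is a
Python illustration, not SBCL code, and it assumes a typical Linux file
system that stores names as uninterpreted bytes.  The helper names
list_names_as_bytes and try_decode are made up for this example.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """Return the directory entries as raw bytes, without decoding them."""
    return sorted(os.listdir(os.fsencode(directory)))


def try_decode(raw, encoding):
    """Decode raw with encoding, or return None if the bytes do not fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    # Assumption: the creator of the file used UTF-8 for its name.
    raw_name = "W\u00fcrzburg".encode("utf-8")
    open(os.path.join(os.fsencode(tmp), raw_name), "wb").close()
    names = list_names_as_bytes(tmp)

# The same bytes give a different answer under each candidate encoding;
# nothing in the directory entry says which one was intended.
decoded = {enc: try_decode(names[0], enc)
           for enc in ("ascii", "latin-1", "utf-8")}
for enc, text in decoded.items():
    print(enc, repr(text))
```

The directory listing only hands back bytes, so the reader has to pick
an encoding: ASCII rejects the name outright, while Latin-1 and UTF-8
both accept it but disagree about which characters it contains.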

I use LispWorks on Windows.
My default encoding for BASE-CHAR is ISO-LATIN-1 so
this can vary from system to system.

