From: Christopher Brown
Subject: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1156590153.032796.308980@75g2000cwc.googlegroups.com>
I'm a n00bie to lisp. I've been reading Peter Seibel's book.  I've read
this thread:
http://groups.google.co.za/group/comp.lang.lisp/browse_thread/thread/fb5641a8ee2791c9.
I've read the sbcl internals manual, read the code in filesys.lisp and
I still don't know what to do if I want to list directories and
manipulate files with non-ascii names using sbcl on OS X.  Am I
completely out of luck or do I need to resort to a non-standard
extension or library (osicat?)?

I'm running into exactly the problem mentioned in the thread above, and
the same code "just works" on Allegro 8.0, but I'd like to stick with
sbcl.

Thoughts?  Thanks for your patience,
Christopher

From: Pascal Bourguignon
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <878xlby2o1.fsf@informatimago.com>
"Christopher Brown" <··········@gmail.com> writes:

> I'm a n00bie to lisp. I've been reading Peter Seibel's book.  I've read
> this thread:
> http://groups.google.co.za/group/comp.lang.lisp/browse_thread/thread/fb5641a8ee2791c9.
> I've read the sbcl internals manual, read the code in filesys.lisp and
> I still don't know what to do if I want to list directories and
> manipulate files with non-ascii names using sbcl on OS X.  Am I
> completely out of luck or do I need to resort to a non-standard
> extension or library (osicat?)?

On MacOSX, the file system stores the file names in utf-8, IIRC.
Unfortunately, AFAIK, sbcl can only use iso-8859-1 for file names.

Therefore, if you want to convert between unicode strings and file
names, you'll have to do something like:

(defun namestring-to-fs-hack (string)
  (SB-EXT:OCTETS-TO-STRING
        (SB-EXT:STRING-TO-OCTETS string :external-format :utf-8)
        :external-format :iso-8859-1))

(defun namestring-from-fs-hack (fsstring)
  (SB-EXT:OCTETS-TO-STRING
        (SB-EXT:STRING-TO-OCTETS fsstring :external-format :iso-8859-1)
        :external-format :utf-8))


(namestring-to-fs-hack "éléments archvés")

--> "Ã©lÃ©ments archvÃ©s"


(namestring-from-fs-hack "Ã©lÃ©ments archvÃ©s")

--> "éléments archvés"
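
To see why the two hacks are inverses of each other, it may help to
look at the octets involved.  The following is only a sketch for a
Unicode-enabled SBCL; the names UTF-8-OCTETS, OCTETS-AS-LATIN-1 and
*E-ACUTE* are made up for this illustration, while the SB-EXT calls
are the same ones used in the hack above.

```lisp
;; SBCL-specific sketch.  Helper names are hypothetical.

(defun utf-8-octets (string)
  "Return the UTF-8 encoding of STRING as a list of octets."
  (coerce (sb-ext:string-to-octets string :external-format :utf-8) 'list))

(defun octets-as-latin-1 (octets)
  "Decode a list of OCTETS as ISO-8859-1, one character per octet."
  (sb-ext:octets-to-string
   (coerce octets '(vector (unsigned-byte 8)))
   :external-format :iso-8859-1))

;; U+00E9, LATIN SMALL LETTER E WITH ACUTE, built from its code point
;; so that the source file itself stays pure ASCII.
(defparameter *e-acute* (string (code-char #xE9)))

;; One character becomes two octets in UTF-8.
(utf-8-octets *e-acute*)
;; => (195 169)

;; Read back as ISO-8859-1, those two octets become two characters,
;; with char codes 195 and 169.
(map 'list #'char-code (octets-as-latin-1 (utf-8-octets *e-acute*)))
;; => (195 169)
```

Each accented character therefore shows up as two Latin-1 characters
after NAMESTRING-TO-FS-HACK, and NAMESTRING-FROM-FS-HACK reverses the
process.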


Unfortunately, this is not enough, because MacOSX tends to use
decomposed characters, that is, "é" is stored as "e" followed by a
combining acute accent (U+0301).

http://developer.apple.com/qa/qa2001/qa1235.html

Perhaps you can do some FFI using CFStringNormalize to do it (but the
same restriction of encodings applies on FFI in sbcl AFAIK).

So, something like:

  (open (CFStringNormalize (namestring-to-fs-hack "éléments archvés")))

and:

  (mapcar (lambda (path)
              (namestring-from-fs-hack
                 (ConvertUnicodeToCanonical
                    (namestring path)))) (directory "/Users/cgb/*"))

might work.
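
The precomposed/decomposed difference can be shown without any FFI.
This is a sketch under assumptions: it needs a Unicode-enabled Lisp,
and it uses SB-UNICODE:NORMALIZE-STRING only if that function is
present (it exists in later SBCL releases, not in the 0.9.x versions
discussed in this thread).  The names *PRECOMPOSED*, *DECOMPOSED* and
NFC are made up for the example.  Note also that HFS+ is documented to
use a variant of the decomposed form, so plain NFC/NFD conversion may
not match the file system exactly.

```lisp
;; Precomposed form: the single code point U+00E9.
(defparameter *precomposed* (string (code-char #xE9)))

;; Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
(defparameter *decomposed*
  (coerce (list #\e (code-char #x301)) 'string))

;; The two strings display alike but are different character sequences.
(length *precomposed*)                 ; => 1
(length *decomposed*)                  ; => 2
(string= *precomposed* *decomposed*)   ; => NIL

(defun nfc (string)
  "Return STRING in NFC when SB-UNICODE:NORMALIZE-STRING is available.
Otherwise return STRING unchanged (assumption: older SBCL or another Lisp)."
  (let* ((package (find-package "SB-UNICODE"))
         (fn (and package (find-symbol "NORMALIZE-STRING" package))))
    (if (and fn (fboundp fn))
        (funcall fn string :nfc)
        string)))

;; Where normalization is available, the decomposed name composes back
;; to the precomposed one, so the comparison succeeds.
(string= *precomposed* (nfc *decomposed*))
```

A file name coming back from DIRECTORY in decomposed form will thus
fail a STRING= test against a precomposed literal unless one side is
normalized first.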

-- 
__Pascal Bourguignon__
From: Christopher Brown
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1156604953.754565.13370@m79g2000cwm.googlegroups.com>
Thanks Pascal.  The precomposed vs. decomposed Unicode would never have
occurred to me, despite having seen its effects and scratching my head
about it.

Looks like my wishes may have been granted however.  Here's an
sbcl-devel thread with relevant patches:
http://groups.google.com/group/sbcl-devel/browse_thread/thread/9ed14bf7953de1d8/#
(I've looked at the patches to filesys.lisp and one change was to
broaden from simple-base-string to simple-string  and from base-string
to string across the board.  I haven't examined the rest closely).

I'm rebuilding sbcl from cvs + patches & I'll report back on the
outcome for various filename character combinations.

Again, thanks,
Christopher
From: Robert Uhl
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <m3lkpaophy.fsf@NOSPAMgmail.com>
Pascal Bourguignon <···@informatimago.com> writes:
>
> On MacOSX, the file system stores the file names in utf-8, IIRC.
> Unfortunately, AFAIK, sbcl can only use iso-8859-1 for file names.

SBCL can't even do that--DIRECTORY bombs on anything more than standard
ASCII.  It's ugly indeed.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
However low a man sinks he never reaches the level of the police.
                                                 --Quentin Crisp
From: kavenchuk
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1156841377.995076.43470@74g2000cwt.googlegroups.com>
Robert Uhl wrote:

> SBCL can't even do that--DIRECTORY bombs on anything more than standard
> ASCII.  It's ugly indeed.

colinux:~# locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
colinux:~# mkdir �����
colinux:~# ls
checkppp.sh  dead.letter  lib  src  vmlinux-modules.tar.gz  �����
colinux:~# sbcl --core /usr/lib/sbcl/sbcl.core --noinform
* (directory "*")

(#P"/root/.bash_history" #P"/root/.bashrc" #P"/root/.clisprc"
#P"/root/.emacs"
 #P"/root/.emacs~" #P"/root/.profile" #P"/root/.ssh/" #P"/root/lib/"
 #P"/root/src/" #P"/root/�����/")
From: Robert Uhl
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <m3k64rti0q.fsf@NOSPAMgmail.com>
Counterexample:

[····@latakia foo]$locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY=en_US
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER=en_US
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT=en_US
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
[····@latakia foo]$ mkdir æsc
[····@latakia foo]$ ls
æsc
[····@latakia foo]$ sbcl --core /usr/lib/sbcl/sbcl.core --noinform
; loading system definition from
; /usr/lib/sbcl/sb-bsd-sockets/sb-bsd-sockets.asd into #<PACKAGE "ASDF0">
; registering #<SYSTEM SB-BSD-SOCKETS {B061BA9}> as SB-BSD-SOCKETS
; registering #<SYSTEM SB-BSD-SOCKETS-TESTS {B2062F1}> as SB-BSD-SOCKETS-TESTS
; loading system definition from /usr/lib/sbcl/sb-posix/sb-posix.asd into
; #<PACKAGE "ASDF0">
; registering #<SYSTEM SB-POSIX {A751321}> as SB-POSIX
; registering #<SYSTEM SB-POSIX-TESTS {A872401}> as SB-POSIX-TESTS
* (directory "*")

debugger invoked on a TYPE-ERROR in thread #<THREAD "initial thread" {A68B559}>:
  The value #\LATIN_CAPITAL_LETTER_A_WITH_TILDE is not of type BASE-CHAR.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(SB-KERNEL:HAIRY-DATA-VECTOR-SET
 "/home/ruhl/tmp/foo/
From: Christopher Brown
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1156918558.918130.137780@h48g2000cwc.googlegroups.com>
Note: sbcl 0.9.16 with the patches I mentioned above solves this problem,
and the previous poster might be running that, or a recent CVS
snapshot.  The "...is not of type BASE-CHAR" error is exactly what those
patches address.
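
For anyone wondering where that error comes from: in the failing
session the first UTF-8 octet of "æ" (#xC3) was read as the character
LATIN CAPITAL LETTER A WITH TILDE and then stored into a base-string.
The sketch below reproduces that in isolation.  BASE-CHAR-P and
TRY-STORE-IN-BASE-STRING are names invented for the example, and the
results in the comments assume a Unicode-enabled SBCL, where BASE-CHAR
is a proper subtype of CHARACTER; other Lisps may differ.

```lisp
(defun base-char-p (char)
  "True if CHAR is of type BASE-CHAR in this implementation."
  (typep char 'base-char))

(defun try-store-in-base-string (char)
  "Try to store CHAR in a one-element base-string.
Return :STORED on success or :TYPE-ERROR if the store is rejected."
  (handler-case
      (let ((name (make-string 1 :element-type 'base-char)))
        (setf (char name 0) char)
        :stored)
    (type-error () :type-error)))

;; Plain ASCII is always fine.
(base-char-p #\a)                    ; => T
(try-store-in-base-string #\a)       ; => :STORED

;; Char code #xC3 is LATIN CAPITAL LETTER A WITH TILDE.  On a
;; Unicode-enabled SBCL it is expected not to be a BASE-CHAR, so the
;; store is expected to signal a TYPE-ERROR.
(base-char-p (code-char #xC3))
(try-store-in-base-string (code-char #xC3))
```

Widening the declared types from BASE-STRING to STRING, as the patches
to filesys.lisp do, removes this restriction.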

Check your (lisp-implementation-version) for comparison:

; SLIME 2006-04-20
CL-USER> (lisp-implementation-version)
"0.9.16"

* (directory "/Users/cbrown/Music/iTunes/iTunes Music/*")

 #P"/Users/cbrown/Music/iTunes/iTunes Music/Benny Anderson_Tim Rice_Björn Ulvaeus/"
 #P"/Users/cbrown/Music/iTunes/iTunes Music/Björk/"

Unfortunately, it appears to have broken SLIME... I pasted the example
above straight from the command line.  SLIME hangs when I do the same
:(

Cheers,
Chris

From: kavenchuk
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1156919116.491842.202610@i42g2000cwa.googlegroups.com>
Christopher Brown wrote:

> Note: sbcl 0.9.16 with the patches I mentioned above solves this problem
> and the previous poster might be running that

Yes

> Unfortunately, it appears to have broken SLIME... I pasted the example
> above straight from the command line.  SLIME hangs when I do the same
> :(

See SWANK::*CODING-SYSTEM* and 'slime-net-coding-system

-- 
WBR, Yaroslav Kavenchuk.
From: Robert Uhl
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <m3k64papm0.fsf@NOSPAMgmail.com>
"Christopher Brown" <··········@gmail.com> writes:

> Note: sbcl 0.9.16 with the patches I mentioned above solves this problem,
> and the previous poster might be running that, or a recent CVS
> snapshot.  The "...is not of type BASE-CHAR" error is exactly what those
> patches address.

Does 0.9.16 include the patches, or are the patches on top of it?

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
Security-wise, NT is a server with a 'kick me' sign taped to it.
                                                --Peter Gutmann
From: kavenchuk
Subject: Re: sbcl and non-ascii filenames (again)
Date: 
Message-ID: <1157005956.447965.60640@i3g2000cwc.googlegroups.com>
Robert Uhl wrote:

> "Christopher Brown" <··········@gmail.com> writes:
>
> > Note: sbcl 0.9.16 with the patches I mentioned above solves this problem,
> > and the previous poster might be running that, or a recent CVS
> > snapshot.  The "...is not of type BASE-CHAR" error is exactly what those
> > patches address.
>
> Does 0.9.16 include the patches, or are the patches on top of it?

No, they are not included. The patches at
http://groups.google.com/group/sbcl-devel/browse_thread/thread/9ed14bf7953de1d8/#
are for 0.9.16 (not for 0.9.16.2 or later).

-- 
WBR, Yaroslav Kavenchuk.