Chinese Characters

From: ············@gmail.com
Subject: Chinese Characters
Date: Thu, 14 Dec 2006 02:03:28 +0000
Message-ID: <1166061808.661880.116060@f1g2000cwa.googlegroups.com>

Hello:

   (write-line "成功") failed in both SBCL and Corman Lisp,
because the character "功" is a5 5c, while 5c is backslash \
   These Chinese characters work in most modern languages, such as Java,
C#, and Ruby, and in recent C++ compilers too.
   I am not sure if these Chinese characters are currently unsupported,
   or if some configuration needs to be set.

From: Stefan Nobis
Subject: Re: Chinese Characters
Date: Thu, 14 Dec 2006 09:05:11 +0000
Message-ID: <87r6v2zw3c.fsf@snobis.de>

·············@gmail.com" <············@gmail.com> writes:

>    (write-line "成功") failed in both SBCL and Corman Lisp,

I tried via Emacs/Slime with this configuration (I used the first one
for sbcl):

(setq slime-lisp-implementations        ; choose alternative with M-- M-x slime
      '((sbcl ("/usr/bin/sbcl" "--noinform") :coding-system utf-8-unix)
        (sbcl-l1 ("/usr/bin/sbcl" "--noinform"))
        (clisp ("/usr/bin/clisp" "-K full"))
        (cmucl ("/usr/bin/cmucl" "-quiet"))))

CL-USER> (write-line "成功")
成功
"成功"

Just works. :)

If I use sbcl-l1 I get a warning from slime about coding system mismatch.

-- 
Stefan.

From: Pascal Bourguignon
Subject: Re: Chinese Characters
Date: Thu, 14 Dec 2006 12:48:12 +0000
Message-ID: <8764ceod83.fsf@thalassa.informatimago.com>

·············@gmail.com" <············@gmail.com> writes:

> Hello:
>
>    (write-line "成功") failed in both SBCL and Corman Lisp,
> because the character "功" is a5 5c, while 5c is backslash \

You are wrong. 功 is 529F.

C/USER[2]> (princ #\ua55c)
ꕜ
#\UA55C
C/USER[3]> (format t "~X" (char-code #\功))
529F
NIL
C/USER[4]> (princ #\u529f)
功
#\U529F
C/USER[5]> 

http://www.cliki.net/CloserLookAtCharacters

>    These Chinese characters work in most modern languages, such as Java,
> C#, and Ruby, and in recent C++ compilers too.
>    I am not sure if these Chinese characters are currently unsupported,
>    or if some configuration needs to be set.

The problem is that you need to specify the character encoding to be
used for files and terminals.

With sbcl and clisp (I don't know about Corman Lisp), you can specify
the encoding in LC_CTYPE:

[···@thalassa tmp]$ export LC_CTYPE=en_US.UTF-8 ; sbcl --userinit /dev/null
* (write-line "成功")
成功
"成功"
* 
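
To check which encoding a session actually picked up from the locale, here
is a minimal sketch using the standard function STREAM-EXTERNAL-FORMAT. The
helper name report-output-encoding is hypothetical. The standard only
requires this function to accept file streams, so treating the terminal
stream as acceptable is an assumption, and the value returned is
implementation-dependent; no particular output is assumed here.

```lisp
;; STREAM-EXTERNAL-FORMAT is standard Common Lisp; whether it accepts a
;; terminal stream is an assumption about the implementation in use.
;; REPORT-OUTPUT-ENCODING is a hypothetical helper name used for illustration.
(defun report-output-encoding (&optional (stream *standard-output*))
  "Print and return the external format STREAM uses for character output."
  (let ((external-format (stream-external-format stream)))
    (format t "~&External format: ~S~%" external-format)
    external-format))

(report-output-encoding)
```

If the reported format is not the one the terminal uses, non-ASCII output
is likely to come out garbled.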


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
        A stray cat
relieves itself
        in the winter garden
                                        Shiki

From: ············@gmail.com
Subject: Re: Chinese Characters
Date: Thu, 14 Dec 2006 23:43:56 +0000
Message-ID: <1166139835.936011.126560@l12g2000cwl.googlegroups.com>

Pascal Bourguignon wrote:

> ·············@gmail.com" <············@gmail.com> writes:
>
> > Hello:
> >
> >    (write-line "成功") failed in both SBCL and Corman Lisp,
> > because the character "功" is a5 5c, while 5c is backslash \
>
> You are wrong. 功 is 529F.
   The code I mentioned is not Unicode; it is the Big5 (code page 950)
encoding, which is why the second byte of the character is a backslash.

From: Pascal Bourguignon
Subject: Re: Chinese Characters
Date: Fri, 15 Dec 2006 00:05:03 +0000
Message-ID: <87slfikor4.fsf@thalassa.informatimago.com>

·············@gmail.com" <············@gmail.com> writes:

> Pascal Bourguignon wrote:
>
>> ·············@gmail.com" <············@gmail.com> writes:
>>
>> > Hello:
>> >
>> >    (write-line "成功") failed in both SBCL and Corman Lisp,
>> > because the character "功" is a5 5c, while 5c is backslash \
>>
>> You are wrong. 功 is 529F.
>    The code I mentioned is not Unicode; it is the Big5 (code page 950)
> encoding, which is why the second byte of the character is a backslash.

Ah ah!  But sbcl (0.9.18) doesn't support this encoding.

S/CL-USER[23]> (mapcar 'first SB-IMPL::*EXTERNAL-FORMATS*)

((:UCS-2BE :UCS2BE) (:UCS-2LE :UCS2LE) (:EUC-JP :EUCJP :|eucJP|)
(:CP1258 :|cp1258| :WINDOWS-1258 :|windows-1258|) (:CP1257 :|cp1257|
:WINDOWS-1257 :|windows-1257|) (:CP1256 :|cp1256|) (:CP1255 :|cp1255|
:WINDOWS-1255 :|windows-1255|) (:CP1254 :|cp1254|) (:CP1253 :|cp1253|
:WINDOWS-1253 :|windows-1253|) (:CP1252 :|cp1252| :WINDOWS-1252
:|windows-1252|) (:CP1251 :|cp1251| :WINDOWS-1251 :|windows-1251|)
(:CP1250 :|cp1250| :WINDOWS-1250 :|windows-1250|) (:ISO-8859-14
:|iso-8859-14| :LATIN-8 :|latin-8|) (:ISO-8859-13 :|iso-8859-13|
:LATIN-7 :|latin-7|) (:ISO-8859-11 :|iso-8859-11|) (:ISO-8859-10
:|iso-8859-10| :LATIN-6 :|latin-6|) (:ISO-8859-9 :|iso-8859-9|
:LATIN-5 :|latin-5|) (:ISO-8859-8 :|iso-8859-8|) (:ISO-8859-7
:|iso-8859-7|) (:ISO-8859-6 :|iso-8859-6|) (:ISO-8859-5 :|iso-8859-5|)
(:ISO-8859-4 :|iso-8859-4| :LATIN-4 :|latin-4|) (:ISO-8859-3
:|iso-8859-3| :LATIN-3 :|latin-3|) (:ISO-8859-2 :|iso-8859-2| :LATIN-2
:|latin-2|) (:CP874 :|cp874|) (:CP869 :|cp869|) (:CP866 :|cp866|)
(:CP865 :|cp865|) (:CP864 :|cp864|) (:CP863 :|cp863|) (:CP862
:|cp862|) (:CP861 :|cp861|) (:CP860 :|cp860|) (:CP857 :|cp857|)
(:CP855 :|cp855|) (:CP852 :|cp852|) (:CP850 :|cp850|) (:CP437
:|cp437|) (:X-MAC-CYRILLIC :|x-mac-cyrillic|) (:KOI8-U :|koi8-u|)
(:KOI8-R :|koi8-r|) (:UTF-8 :UTF8) (:LATIN-9 :LATIN9 :ISO-8859-15
:ISO8859-15) (:EBCDIC-US :IBM-037 :IBM037) (:ASCII :US-ASCII
:ANSI_X3.4-1968 :ISO-646 :ISO-646-US :|646|) (:LATIN-1 :LATIN1
:ISO-8859-1 :ISO8859-1))


You should try clisp instead, or use UTF-8.

C/USER[11]> (find-symbol "BIG5" "CHARSET")
CHARSET:BIG5 ;
:EXTERNAL
C/USER[12]> (find-symbol "CP950" "CHARSET")
CHARSET:CP950 ;
:EXTERNAL
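
The byte values under discussion can be inspected directly in CLISP. The
sketch below assumes a CLISP build in which CHARSET:BIG5 is available (as
the transcript above suggests) and uses the CLISP extension
EXT:CONVERT-STRING-TO-BYTES; the helper name big5-hex is hypothetical. The
character is written by code point so that the result does not depend on
how the source text itself is decoded.

```lisp
;; CLISP only: EXT:CONVERT-STRING-TO-BYTES and CHARSET:BIG5 are CLISP
;; extensions, not standard Common Lisp.
;; BIG5-HEX is a hypothetical helper name used for illustration.
(defun big5-hex (string)
  "Return the Big5 encoding of STRING as a list of two-digit hex strings."
  (map 'list
       (lambda (byte) (format nil "~2,'0X" byte))
       (ext:convert-string-to-bytes string charset:big5)))

;; U+529F is the second character of the string in the original post.
(big5-hex (string (code-char #x529F)))
```

If the original poster's figures are right, this returns ("A5" "5C"). 5C is
also the ASCII code of the backslash, so a reader that decodes the source
one byte at a time sees an escape character just before the closing quote.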

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.

From: Sidney Markowitz
Subject: Re: Chinese Characters
Date: Thu, 14 Dec 2006 11:51:53 +0000
Message-ID: <45813ae1$0$69006$742ec2ed@news.sonic.net>

············@gmail.com wrote, On 14/12/06 3:03 PM:
>    (write-line "成功") failed in both SBCL and Corman Lisp,

Are you trying this in an emacs buffer using SLIME? If you are, try
putting this in your ~/.emacs file

(setf slime-net-coding-system 'utf-8-unix)

That worked for me using the latest sbcl.

When I tried the write-line in sbcl called from the command line of an
ordinary terminal shell, it worked with no problem.

-- 
    Sidney Markowitz
    http://www.sidney.com

From: Chris Barts
Subject: Re: Chinese Characters
Date: Tue, 13 Feb 2007 23:58:59 +0000
Message-ID: <pan.2006.12.14.05.25.12.730856@tznvy.pbz>

On Wed, 13 Dec 2006 18:03:28 -0800, ············@gmail.com wrote:

> Hello:
> 
>    (write-line "成功") failed in both SBCL and Corman Lisp,

It works in CLISP. I know CMUCL has problems with UTF-8 (at least under
SLIME), and it seems SBCL and Corman Lisp might as well.

>    These Chinese characters work in most modern languages, such as Java,
> C#, and Ruby, and in recent C++ compilers too.

Good for them. They can join CLISP in the 'good UTF-8 handling' corner.

>    I am not sure if these Chinese characters are currently unsupported,
>    or if some configuration needs to be set.

Support varies by the implementation.

-- 
My address happens to be com (dot) gmail (at) usenet (plus) chbarts,
wardsback and translated.
It's in my header if you need a spoiler.


From: Pascal Bourguignon
Subject: Re: Chinese Characters
Date: Wed, 14 Feb 2007 01:30:50 +0000
Message-ID: <87sld9o791.fsf@thalassa.informatimago.com>

Chris Barts <··············@tznvy.pbz> writes:

> On Wed, 13 Dec 2006 18:03:28 -0800, ············@gmail.com wrote:
>
>> Hello:
>> 
>>    (write-line "成功") failed in both SBCL and Corman Lisp,
>
> It works in CLISP. I know CMUCL has problems with UTF-8 (at least under
> SLIME), and it seems SBCL and Corman Lisp might as well.

Well, even if it works, it may not work.

Assuming an 8-bit-clean implementation end to end, it could be that
(length "成功") /= 2!  For example, it could be 6 if you send UTF-8 bytes
that your Lisp interprets as ISO-8859-1 (with control codes mapped to
some character) and the terminal then interprets back as UTF-8 bytes.

So what you need is to ensure that the same encoding is configured from
end to end: from the keyboard or the input file, in the Lisp program,
and on the terminal or the output file.
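
The length mismatch described above can be reproduced without a terminal.
This is a minimal sketch for SBCL only, using its SB-EXT:STRING-TO-OCTETS
and SB-EXT:OCTETS-TO-STRING extensions; the function name
encoding-round-trip is hypothetical, and the string is built from code
points so that the encoding of the source text does not matter.

```lisp
;; SBCL only: SB-EXT:STRING-TO-OCTETS and SB-EXT:OCTETS-TO-STRING are SBCL
;; extensions. ENCODING-ROUND-TRIP is a hypothetical name for illustration.
(defun encoding-round-trip ()
  "Encode a two-character string as UTF-8, then decode it two ways."
  (let* ((text (coerce (list (code-char #x6210) (code-char #x529F)) 'string))
         ;; Two characters become six octets in UTF-8.
         (octets (sb-ext:string-to-octets text :external-format :utf-8))
         ;; Wrong decoding: every octet becomes one Latin-1 character.
         (misread (sb-ext:octets-to-string octets :external-format :latin-1))
         ;; Right decoding: the original two characters come back.
         (reread (sb-ext:octets-to-string octets :external-format :utf-8)))
    (list (length text) (length octets) (length misread)
          (string= text reread))))

(encoding-round-trip) ; => (2 6 6 T)
```

The third element is the 6 mentioned above: the text still displays
correctly on a UTF-8 terminal, yet inside Lisp it is six characters long.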

In sbcl, you can do it with the LC_CTYPE environment variable:

   LC_CTYPE=en_US.${encoding} ; export LC_CTYPE ; sbcl 

where encoding is something like: UTF-8, ISO-8859-1, etc.

Or, with current versions, use the unsupported, undocumented Lisp variable:

   (SETF SB-IMPL::*DEFAULT-EXTERNAL-FORMAT* encoding)

where encoding is a Lisp keyword such as :UTF-8, :ISO-8859-1, etc.

You can find the list of encodings supported by sbcl with this
unsupported, undocumented expression:

   (mapcar 'first SB-IMPL::*EXTERNAL-FORMATS*)

In any case, before using (write-line "成功"), 
be sure to test (assert (= 2 (length "成功"))).
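
A slightly stricter variant of the length test compares the code points
themselves, since a wrong decoding can in principle still yield two
characters. This is a minimal sketch, assuming an implementation whose
CHAR-CODE returns Unicode code points (as SBCL and CLISP do); the function
name decoded-as-expected-p is hypothetical, and U+6210 and U+529F are the
code points of the two characters in the test string.

```lisp
;; Assumes CHAR-CODE returns Unicode code points; the standard leaves
;; character codes implementation-dependent.
;; DECODED-AS-EXPECTED-P is a hypothetical name used for illustration.
(defun decoded-as-expected-p (string)
  "True if STRING consists of exactly the characters U+6210 and U+529F."
  (equal (map 'list #'char-code string) '(#x6210 #x529F)))

;; True only when the reader decoded this literal with the right encoding.
(decoded-as-expected-p "成功")
```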

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay