I'm sure this is simple, but believe it or not, I can't find the word
'unicode' anywhere in the manual.
I'm having issues outputting char-codes for certain symbols, and I
think unicode is what I need. Is this on by default, or is there
something I have to do to 'activate' unicode support?
"Jonathon McKitrick" <···········@bigfoot.com> writes:
> I'm sure this is simple, but believe it or not, I can't find the word
> 'unicode' anywhere in the manual.
>
> I'm having issues outputting char-codes for certain symbols, and I
> think unicode is what I need. Is this on by default, or is there
> something I have to do to 'activate' unicode support?
It's on by default. You can double-check by looking for :SB-UNICODE
on *FEATURES*. Without a more specific question, that's all I can say.
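A minimal sketch of that check at the REPL, assuming an SBCL image. The helper name unicode-build-p is made up here for illustration; only *FEATURES* and CHAR-CODE-LIMIT are standard.

```lisp
;; Hypothetical helper: true when this image was built with :SB-UNICODE.
(defun unicode-build-p ()
  (and (member :sb-unicode *features*) t))

(unicode-build-p)

;; CHAR-CODE-LIMIT is standard Common Lisp. On a Unicode-enabled build it
;; should be far larger than 256, since characters are not limited to
;; single octets.
char-code-limit
```

If the first form returns NIL, the build has no Unicode support and nothing else in this thread applies.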
"Jonathon McKitrick" <···········@bigfoot.com> writes:
> I'm sure this is simple, but believe it or not, I can't find the word
> 'unicode' anywhere in the manual.
>
> I'm having issues outputting char-codes for certain symbols, and I
> think unicode is what I need. Is this on by default, or is there
> something I have to do to 'activate' unicode support?
In the shell, what does "locale" print?
Zach
On Wed, 17 May 2006 22:07:04 -0700, Jonathon McKitrick wrote:
> I'm sure this is simple, but believe it or not, I can't find the word
> 'unicode' anywhere in the manual.
>
> I'm having issues outputting char-codes for certain symbols,
What symbols?
> and I
> think unicode is what I need.
You probably need a character _encoding_ (like utf-8 or similar
encodings).
> Is this on by default, or is there
> something I have to do to 'activate' unicode support?
What version of SBCL? What's your SBCL's feature list? Mine says:
CL-USER> *features*
(:ASDF :CLC-OS-DEBIAN
:COMMON-LISP-CONTROLLER
:ANSI-CL
:COMMON-LISP
:SBCL
:UNIX
:SB-DOC
:SB-TEST
:SB-PACKAGE-LOCKS
:SB-UNICODE ; <-------- Look for this
:SB-SOURCE-LOCATIONS
:IEEE-FLOATING-POINT
:PPC
:ELF
:LINUX
:STACK-ALLOCATABLE-CLOSURES
:OS-PROVIDES-DLOPEN
:OS-PROVIDES-DLADDR
:OS-PROVIDES-PUTWC)
Can you elaborate on your problem? What exactly isn't working?
HTH Ralf Mattes
> Can you elaborate on your problem? What exactly isn't working?
I found sb-unicode on *features*, of course. And 'locale' on my
machine says 'en_US.UTF-8' for each item.
I have discovered 2 problem areas.
First, I need to explain some data flow.
A CSV file of data is read into the database by an SQL script. This
file has international characters, mostly just Spanish or Portuguese
names. A CL-SQL function parses the table, and re-inserts the data
where necessary into another table. This is the problem area. SBCL
chokes on the characters coming from the DB queries with the
international characters. I have to go through the CSV file first,
changing them to simple vowels before the import will work, even though
the SQL has no problem.
Later, when I want to generate something as simple as a copyright or
trademark symbol with cl-pdf, each of the characters appears with an
'A' with a caret before it. So
(format nil "~c" #\COPYRIGHT_SIGN)
produces, roughly,
··@
where the caret should be over the A and the @ is a copyright. I
really need to learn utf-8 input on Firefox. ;-)
Interestingly,
(code-char 169)
gives #\COPYRIGHT_SIGN on sbcl, but ··@ (the actual copyright sign, not
@) on ACL.
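A small sketch, assuming SBCL, of why the two REPL results need not mean different characters: the name form is just the printer's readable syntax for the same object. The variable name *copyright* is made up for illustration, and nothing here has been checked against ACL.

```lisp
;; Illustrative variable holding the character with code 169.
(defparameter *copyright* (code-char 169))

;; The named character syntax and CODE-CHAR denote the same character.
(char= *copyright* #\COPYRIGHT_SIGN)

;; Its code is 169 however the printer chooses to display it.
(char-code *copyright*)

;; PRIN1 produces readable syntax (what the REPL echoes); PRINC produces
;; the bare character, a string of length 1.
(prin1-to-string *copyright*)
(length (princ-to-string *copyright*))
```

So the REPL echo is a printer convention, while what reaches a file or a PDF depends on how the character is encoded on output.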
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Can't get Unicode on SBCL to work
Date:
Message-ID: <87wtcjcufa.fsf@qrnik.zagroda>
"Jonathon McKitrick" <···········@bigfoot.com> writes:
> Later, when I want to generate something as simple as a copyright or
> trademark symbol with cl-pdf, each of the characters appears with an
> 'A' with a caret before it. So
>
> (format nil "~c" #\COPYRIGHT_SIGN)
>
> produces, roughly,
>
> ··@
This means that UTF-8 encoded text is reinterpreted as if it were
ISO-8859-1.
--
__("< Marcin Kowalczyk
\__/ ······@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
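The diagnosis above can be reproduced in a few lines. This is a sketch assuming SBCL's sb-ext:string-to-octets and sb-ext:octets-to-string; the function name utf-8-read-as-latin-1 is made up for illustration.

```lisp
;; Encode a string as UTF-8, then decode the resulting octets as
;; ISO-8859-1 (Latin-1): the mix-up described above.
(defun utf-8-read-as-latin-1 (string)
  (sb-ext:octets-to-string
   (sb-ext:string-to-octets string :external-format :utf-8)
   :external-format :latin-1))

;; The copyright sign (code 169) becomes two octets in UTF-8: 194 169.
(sb-ext:string-to-octets (string (code-char 169)) :external-format :utf-8)

;; Read back as Latin-1, each octet turns into its own character:
;; code 194 is A with circumflex, code 169 is the copyright sign.
(map 'list #'char-code (utf-8-read-as-latin-1 (string (code-char 169))))
;; => (194 169)
```

That is exactly the "A with a caret, then the symbol" pattern reported for the cl-pdf output, which suggests some stage is treating UTF-8 octets as single-byte characters. Which stage does so is not established in the thread.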
Marcin 'Qrczak' Kowalczyk wrote:
> > (format nil "~c" #\COPYRIGHT_SIGN)
> >
> > produces, roughly,
> >
> > ··@
>
> This means that UTF-8 encoded text is reinterpreted as if it were
> ISO-8859-1.
Is this a simple fix?
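One general approach, sketched here under assumptions, is to name the external format explicitly wherever characters cross an I/O boundary rather than relying on a default. The example uses only standard WITH-OPEN-FILE with SBCL's :utf-8 external format; the path and function names are hypothetical, and whether cl-pdf or CLSQL expose an equivalent setting is not covered by this thread.

```lisp
;; Hypothetical scratch file; any writable path would do.
(defparameter *demo-path* #p"/tmp/sbcl-utf8-demo.txt")

;; Write one character with an explicit UTF-8 external format.
(defun write-copyright-sign ()
  (with-open-file (out *demo-path* :direction :output
                                   :if-exists :supersede
                                   :external-format :utf-8)
    (write-char (code-char 169) out)))

;; Read the file back as raw octets to see what was actually stored.
(defun read-demo-octets ()
  (with-open-file (in *demo-path* :element-type '(unsigned-byte 8))
    (loop for octet = (read-byte in nil)
          while octet
          collect octet)))

;; Read it back as characters, again naming the encoding explicitly.
(defun read-demo-char-code ()
  (with-open-file (in *demo-path* :external-format :utf-8)
    (char-code (read-char in))))

(write-copyright-sign)
(read-demo-octets)      ; => (194 169)
(read-demo-char-code)   ; => 169
```

When writer and reader agree on the encoding, the character round-trips; the mojibake appears only when one side assumes a different format.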
"Jonathon McKitrick" <···········@bigfoot.com> writes:
> I'm sure this is simple, but believe it or not, I can't find the word
> 'unicode' anywhere in the manual.
>
> I'm having issues outputting char-codes for certain symbols, and I
> think unicode is what I need. Is this on by default, or is there
> something I have to do to 'activate' unicode support?
It should be activated by default. Do you have a recent version of SBCL?
However, it has some difficulties with Korean, for example:
% /usr/local/bin/sbcl
This is SBCL 0.9.12, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
;; Reading ASDF packages from /home/pjb/asdf-central-registry.data...
; loading system definition from
; /usr/local/languages/sbcl/lib/sbcl/sb-bsd-sockets/sb-bsd-sockets.asd into
; #<PACKAGE "ASDF0">
; registering #<SYSTEM SB-BSD-SOCKETS {AB17591}> as SB-BSD-SOCKETS
; registering #<SYSTEM SB-BSD-SOCKETS-TESTS {AE89119}> as SB-BSD-SOCKETS-TESTS
; loading system definition from
; /usr/local/languages/sbcl/lib/sbcl/sb-posix/sb-posix.asd into
; #<PACKAGE "ASDF0">
; registering #<SYSTEM SB-POSIX {A6ED601}> as SB-POSIX
; registering #<SYSTEM SB-POSIX-TESTS {A873E81}> as SB-POSIX-TESTS
~/.sbclrc loaded
* (defun ιοτα (&key (номер 10) (단계 1) (בכוכ 0))
(loop :for i :from בכוכ :to номер :by 단계 :collect i))
|ιοτα|
* (ιοτα :номер 10 :단계 2 :בכוכ 2)
(2 4 6 8 10)
* (print '(ιοτα :номер 10 :단계 2 :בכוכ 2))
(|ιοτα| :|номер| 10 :|ˋ�ʳ�| 2 :|בכוכ| 2)
(|ιοτα| :|номер| 10 :|ˋ�ʳ�| 2 :|בכוכ| 2)
* *features*
(:COM.INFORMATIMAGO.PJB :ASDF :ANSI-CL :COMMON-LISP :SBCL :UNIX :SB-DOC :SB-TEST :SB-PACKAGE-LOCKS :SB-UNICODE :SB-SOURCE-LOCATIONS :IEEE-FLOATING-POINT :X86 :ELF :LINUX :GENCGC :STACK-GROWS-DOWNWARD-NOT-UPWARD :C-STACK-IS-CONTROL-STACK :STACK-ALLOCATABLE-CLOSURES :ALIEN-CALLBACKS :LINKAGE-TABLE :OS-PROVIDES-DLOPEN :OS-PROVIDES-DLADDR :OS-PROVIDES-PUTWC)
* (sb-ext:string-to-octets "단계" :external-format :utf-8)
#(195 171 194 139 194 168 195 170 194 179 194 132)
Here is what clisp gives:
[73]> (ext:convert-string-to-bytes "단계" charset:utf-8)
#(235 139 168 234 179 132)
--
__Pascal Bourguignon__ http://www.informatimago.com/
"You can tell the Lisp programmers. They have pockets full of punch
cards with close parentheses on them." --> http://tinyurl.com/8ubpf
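Comparing the two octet vectors in the post above: the SBCL result is what you get by decoding the CLISP octets as Latin-1 and encoding that string as UTF-8 again. A sketch assuming SBCL's sb-ext octet functions; the names *clisp-octets*, *sbcl-octets* and double-encode are made up for illustration, and the characters are built with CODE-CHAR so the snippet does not depend on how its own source is read.

```lisp
;; Octets copied from the two transcripts above.
(defparameter *clisp-octets*
  (coerce '(235 139 168 234 179 132) '(vector (unsigned-byte 8))))
(defparameter *sbcl-octets*
  (coerce '(195 171 194 139 194 168 195 170 194 179 194 132)
          '(vector (unsigned-byte 8))))

;; Decode octets as Latin-1 (one character per octet), then encode the
;; resulting string as UTF-8.
(defun double-encode (octets)
  (sb-ext:string-to-octets
   (sb-ext:octets-to-string octets :external-format :latin-1)
   :external-format :utf-8))

;; The CLISP octets are the plain UTF-8 encoding of U+B2E8 U+ACC4.
(equalp (sb-ext:string-to-octets
         (coerce (list (code-char #xB2E8) (code-char #xACC4)) 'string)
         :external-format :utf-8)
        *clisp-octets*)
;; => T

;; The SBCL octets match the double-encoded form.
(equalp (double-encode *clisp-octets*) *sbcl-octets*)
;; => T
```

One plausible reading is that the string reached SBCL already split into one character per input octet, as if the terminal input had been decoded as Latin-1, so the encoder itself behaved consistently. The transcript alone does not settle why the other non-ASCII keywords printed correctly.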