Dear Lispers,
I'm trying to connect to a MySQL and retrieve some data that are
encoded in utf-8 and contain some Turkish characters. First let me
show data correctly using commandline mysql:
···@cas-laptop:~$ mysql --host=10.15.4.100 -D "flac_deneme" -u emres -
p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.0.51a-3ubuntu5.4 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM Album WHERE ID=3;
+----+-------------------+--------+------+-------+--------+
| ID | Name | Artist | Date | Label | Studio |
+----+-------------------+--------+------+-------+--------+
| 3 | Agannaga Rüþvet | 362 | 1992 | 8 | |
+----+-------------------+--------+------+-------+--------+
1 row in set (0.00 sec)
mysql> exit
As you may see the critical part I'm interested in is the Name field
which contains the Turkish string "Rüþvet".
Now, let's see what kind of problem happens when I try to run the same
query within Allegro CL 8.1 (Free Express Edition) with the help of cl-
sql (some irrelevant message lines are ommited and shown as dots):
···@cas-laptop:~/Documents/applications/acl81_express$ ./alisp
;;; Installing locale patch, version 1.
International Allegro CL Free Express Edition
8.1 [Linux (x86)] (Nov 27, 2008 11:52)
Copyright (C) 1985-2007, Franz Inc., Oakland, CA, USA. All Rights
Reserved.
This development copy of Allegro CL is licensed to:
Trial User
CL-USER(1): (push #P"/usr/share/common-lisp/systems/" asdf:*central-
registry*)
; Autoloading for package "ASDF":
; Fast loading
; /home/cas/Documents/applications/acl81_express/code/asdf.fasl
(#P"/usr/share/common-lisp/systems/" *DEFAULT-PATHNAME-DEFAULTS*)
CL-USER(2): (asdf:operate 'asdf:load-op 'clsql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql.asd into #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql.asd
; loading system definition from
; /usr/share/common-lisp/systems/uffi.asd into #<The ASDF1 package>
; Loading /usr/share/common-lisp/systems/uffi.asd
; registering #<SYSTEM UFFI @ #x71619482> as UFFI
; Loading /etc/clsql-init.lisp
.
.
.
NIL
Now I connect to the database on another machine:
CL-USER(3): (clsql:connect '("10.15.4.100"
"flac_deneme"
"emres"
"emres")
:database-type :mysql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql-mysql.asd into
; #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql-mysql.asd
; registering #<SYSTEM :CLSQL-MYSQL @ #x71781db2> as CLSQL-MYSQL
; loading system definition from
; /usr/share/common-lisp/systems/clsql-uffi.asd into
.
.
.
#<CLSQL-MYSQL:MYSQL-DATABASE 10.15.4.100/flac_deneme/emres OPEN @
#x717a7aa2>
And I run the same query:
CL-USER(4): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga Rü?vet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")
Here is the first problematic version above, it returns "Rü?vet" not
the correct "Rüþvet".
Let's try harder:
CL-USER(5): (clsql-sys:execute-command "SET NAMES 'utf8'")
CL-USER(6): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RüÅvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")
Another problematic version above, it returns "RüÅvet" instead of the
correct version "Rüþvet".
What is the *locale* of my Allegro Common Lisp?
CL-USER(7): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x7130205a>
I give it another try even though it'll switch to iso-8859-9 encoding:
CL-USER(8): (setf *locale* (find-locale "tr_TR"))
;; Autoloading locale from #P"/home/cas/Documents/applications/
acl81_express/locales/tr_TR"
; Fast loading from bundle code/ef-iso8859-9.fasl.
; Fast loading from bundle code/efft-iso8859-9-base.fasl.
#<locale "tr_TR" [:ISO8859-9-BASE] @ #x716a72f2>
Here we go:
CL-USER(9): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RüÅvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")
No hope and it seems like en_EN.utf-8 returns the same result as
tr_TR.iso8859-9
My system is Ubuntu GNU/Linux Hardy. I'm using Allegro CL Express
Edition 8.1 (free version).
$ uname -a
Linux cas-laptop 2.6.24-22-generic #1 SMP Mon Nov 24 18:32:42 UTC 2008
i686 GNU/Linux
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Any ideas how I can get my utf-8 encoded Turkish characters from cl-
sql using Allegro CL?
Note: There are no problems reading and writing utf-8 encoded Turkish
characters to/from files on my local hard disk, Allegro CL can handle
that without problems:
$ cat /home/cas/Documents/source/Lisp/ilc/name.txt
Agannaga Rüþvet 362 1992 8
CL-USER(1): (with-open-file (stream "/home/cas/Documents/source/Lisp/
ilc/name.txt")
(format t "~a~%" (read-line stream)))
Agannaga Rüþvet 362 1992 8
NIL
Cheers,
--
Emre sevinc
On Dec 25, 10:04 am, Emre Sevinc <···········@gmail.com> wrote:
> mysql> SELECT * FROM Album WHERE ID=3;
> +----+-------------------+--------+------+-------+--------+
> | ID | Name | Artist | Date | Label | Studio |
> +----+-------------------+--------+------+-------+--------+
> | 3 | Agannaga Rüþvet | 362 | 1992 | 8 | |
> +----+-------------------+--------+------+-------+--------+
> 1 row in set (0.00 sec)
>
> mysql> exit
>
> As you may see the critical part I'm interested in is the Name field
> which contains the Turkish string "Rüþvet".
It seems like Google Groups messed up too!
I'm talkin about "Latin small letter s with cedilla" which I type as
"ş" (don't know if this will be rendered correctly when this message
shows up even though my Firefox's character encoding seems to be
utf-8).
--
Emre
From: Pascal J. Bourguignon
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?
Date:
Message-ID: <873agc32ig.fsf@informatimago.com>
Emre Sevinc <···········@gmail.com> writes:
>> As you may see the critical part I'm interested in is the Name field
>> which contains the Turkish string "Rüþvet".
>
> It seems like Google Groups messed up too!
Yes.
> I'm talkin about "Latin small letter s with cedilla" which I type as
> "ş" (don't know if this will be rendered correctly when this message
> shows up even though my Firefox's character encoding seems to be
> utf-8).
This one shows ok.
AFAIK, clsql doesn't manage encodings. You have to do it yourself
explicitely. Here is what I did:
(defparameter *weubcwk*
#+sbcl :iso-8859-1
#+clisp charset:iso-8859-1
#-(or sbcl clisp)
(error "What encoding is used by clsql in ~A?" (lisp-implementation-type))
"Whatever Encoding Used By Clsql Who Knows")
(defun utf-8-bytes (data)
#-(or sbcl clisp)
(error "utf-8-bytes not implemented in ~A?" (lisp-implementation-type))
(let ((bytes
#+sbcl (sb-ext:string-to-octets data :external-format *weubcwk*)
#+clisp (ext:convert-string-to-bytes data *weubcwk*)))
#+sbcl (sb-ext:octets-to-string bytes :external-format :utf-8)
#+clisp (ext:convert-string-from-bytes bytes charset:utf-8)))
(defun utf-8-string (string)
#-(or sbcl clisp)
(error "utf-8-string not implemented in ~A?" (lisp-implementation-type))
(let ((bytes
#+sbcl (sb-ext:string-to-octets string :external-format :utf-8)
#+clisp (ext:convert-string-to-bytes string charset:utf-8)))
#+sbcl (sb-ext:octets-to-string bytes :external-format *weubcwk*)
#+clisp (ext:convert-string-from-bytes bytes *weubcwk*)))
Then, when you define your views in clsql, use the :db-reader and
:db-writer to convert the strings:
(clsql:def-view-class example ()
((text :db-kind :base
:db-constraints :not-null
:db-reader utf-8-bytes
:db-writer utf-8-string
:type (varchar 75)
:reader example-text))
(:base-table "EXAMPLE"))
--
__Pascal Bourguignon__
> Then, when you define your views in clsql, use the :db-reader and
> :db-writer to convert the strings:
>
> (clsql:def-view-class example ()
> ((text :db-kind :base
> :db-constraints :not-null
> :db-reader utf-8-bytes
> :db-writer utf-8-string
> :type (varchar 75)
> :reader example-text))
> (:base-table "EXAMPLE"))
>
Thank you very much!
I tried it and it seems to work:
CG-USER(19): (defun utf-8-bytes (data)
(let ((bytes
(string-to-octets data :external-format :iso-8859-1)))
(octets-to-string bytes :external-format :utf-8)))
UTF-8-BYTES
CG-USER(26): (clsql:def-view-class gecici ()
((metin
:db-kind :base
:type (varchar 75)
:db-reader utf-8-bytes
:accessor text
:initarg :text))
(:base-table "gecici"))
#<CLSQL-SYS::STANDARD-DB-CLASS GECICI>
CG-USER(27): (let ((new-text (car
(clsql:select 'gecici
:flatp t))))
(format t "~A" (text new-text)))
rüðþvetçi
NIL
--
Emre
From: Jens Teich
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?
Date:
Message-ID: <m2iqp8idvu.fsf@jensteich.de>
For these reasons I switched to PostgreSQL and postmodern
(http://common-lisp.net/project/postmodern/) and am fine
with utf8 on LispWorks and SBCL.
Jens
From: Kenny
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?
Date:
Message-ID: <4953cfa8$0$14277$607ed4bc@cv.net>
Jens Teich wrote:
> For these reasons I switched to PostgreSQL and postmodern
> (http://common-lisp.net/project/postmodern/) and am fine
> with utf8 on LispWorks and SBCL.
>
I was not involved in the effort to get AllegroCL to read something
similar so I can offer little more than we too ended up with a
Postgres/Postmodern stack.
kt
From: Alex Mizrahi
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?
Date:
Message-ID: <4953b1c2$0$90262$14726298@news.sunsite.dk>
ES> Another problematic version above, it returns "Rü�vet" instead of the
ES> correct version "R��vet".
CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
it just puts them one by one into a lisp string. what you see here is UTF-8
interpreted
as if it was latin1.
it does so using uffi:convert-from-cstring, which in its turn uses Allegro
CL function
excl:native-to-string. according to documentation:
http://www.franz.com/support/documentation/7.0/doc/operators/excl/native-to-string.htm
you can customize this convesion setting *locale*'s locale-external-format.
HTH
(alternatively you can decode strings you get from CLSQL into octets (using
latin1 format
that is used by default) and concert this octets back to string with UTF-8
external format.
that's what Pascal have wrote, but he have got format order reversed.)
On Dec 25, 6:15 pm, "Alex Mizrahi" <········@users.sourceforge.net>
wrote:
> ES> Another problematic version above, it returns "RüÅvet" instead of the
> ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...
> you can customize this convesion setting *locale*'s locale-external-format.
> HTH
>
> (alternatively you can decode strings you get from CLSQL into octets (using
> latin1 format
> that is used by default) and concert this octets back to string with UTF-8
> external format.
> that's what Pascal have wrote, but he have got format order reversed.)
I gave it a try without success, here's what I did:
CG-USER(2): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x71e2d7ea>
CG-USER(3): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x71112242>
CG-USER(4): (find-external-format :utf-8)
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>
CG-USER(5): (SETF (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN")) (find-
external-format :utf-8))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>
CG-USER(8): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>
So now it seems like I have utf-8 enabled, now I try:
CG-USER(11): (clsql:execute-command "SET NAMES 'utf8'")
CG-USER(12): (CLSQL:QUERY "SELECT * FROM gecici")
(("rüÄÅvetçi"))
("metin")
CG-USER(13): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))))
2685176784
22
CG-USER(14): (NATIVE-TO-STRING 2685176784 :external-format :utf-8)
"rüÄÅvetçi"
13
21
CG-USER(15): (NATIVE-TO-STRING 2685176784)
"rüÄÅvetçi"
13
21
Seems to have no effect. Now I cannot understand because you said "it
does so using uffi:convert-from-cstring, which in its turn uses
Allegro CL function excl:native-to-string", I'm trying to use native-
to-string directly but it still seems problematic.
--
Emre
On Dec 25, 6:15 pm, "Alex Mizrahi" <········@users.sourceforge.net>
wrote:
> ES> Another problematic version above, it returns "RüÅvet" instead of the
> ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted
> as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function
> excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...
> you can customize this convesion setting *locale*'s locale-external-format.
> HTH
Hmm, after playing with Pascal's solution I guess I understood what
you meant, too. This time a correct example:
CG-USER(29): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))) :external-format :iso-8859-1)
2685292688
27
CG-USER(30): (NATIVE-TO-STRING 2685292688 :external-format :utf-8)
"rüğşvetçi ĞŞİıçö"
16
26
(Google Groups may mess up the characters but I had what I wanted in
my Allegro CL REPL happily :)
Cheers,
--
Emre