How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?

From: Emre Sevinc
Subject: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish 	characters from MySQL?
Date: Thu, 25 Dec 2008 08:04:26 +0000
Message-ID: <4ab95f09-6b0c-4a8d-8640-ceed580e68c6@m4g2000vbp.googlegroups.com>

Dear Lispers,

I'm trying to connect to a MySQL and retrieve some data that are
encoded in utf-8 and contain some Turkish characters. First let me
show data correctly using commandline mysql:

···@cas-laptop:~$ mysql --host=10.15.4.100 -D "flac_deneme" -u emres -
p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.0.51a-3ubuntu5.4 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM Album WHERE ID=3;
+----+-------------------+--------+------+-------+--------+
| ID | Name              | Artist | Date | Label | Studio |
+----+-------------------+--------+------+-------+--------+
|  3 | Agannaga Rüþvet |    362 | 1992 |     8 |        |
+----+-------------------+--------+------+-------+--------+
1 row in set (0.00 sec)

mysql> exit

As you may see the critical part I'm interested in is the Name field
which contains the Turkish string "Rüþvet".

Now, let's see what kind of problem happens when I try to run the same
query within Allegro CL 8.1 (Free Express Edition) with the help of cl-
sql (some irrelevant message lines are ommited and shown as dots):

···@cas-laptop:~/Documents/applications/acl81_express$ ./alisp
;;; Installing locale patch, version 1.
International Allegro CL Free Express Edition
8.1 [Linux (x86)] (Nov 27, 2008 11:52)
Copyright (C) 1985-2007, Franz Inc., Oakland, CA, USA.  All Rights
Reserved.

This development copy of Allegro CL is licensed to:
   Trial User

CL-USER(1): (push #P"/usr/share/common-lisp/systems/" asdf:*central-
registry*)

; Autoloading for package "ASDF":
; Fast loading
;    /home/cas/Documents/applications/acl81_express/code/asdf.fasl
(#P"/usr/share/common-lisp/systems/" *DEFAULT-PATHNAME-DEFAULTS*)

CL-USER(2): (asdf:operate 'asdf:load-op 'clsql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql.asd into #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql.asd
; loading system definition from
; /usr/share/common-lisp/systems/uffi.asd into #<The ASDF1 package>
;   Loading /usr/share/common-lisp/systems/uffi.asd
; registering #<SYSTEM UFFI @ #x71619482> as UFFI
; Loading /etc/clsql-init.lisp
.
.
.

NIL


Now I connect to the database on another machine:

CL-USER(3): (clsql:connect '("10.15.4.100"
                "flac_deneme"
                "emres"
                "emres")
               :database-type :mysql)
; loading system definition from
; /usr/share/common-lisp/systems/clsql-mysql.asd into
; #<The ASDF0 package>
; Loading /usr/share/common-lisp/systems/clsql-mysql.asd
; registering #<SYSTEM :CLSQL-MYSQL @ #x71781db2> as CLSQL-MYSQL
; loading system definition from
; /usr/share/common-lisp/systems/clsql-uffi.asd into
.
.
.
#<CLSQL-MYSQL:MYSQL-DATABASE 10.15.4.100/flac_deneme/emres OPEN @
  #x717a7aa2>


And I run the same query:

CL-USER(4): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga Rü?vet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

Here is the first problematic version above, it returns "Rü?vet" not
the correct "Rüþvet".

Let's try harder:

CL-USER(5): (clsql-sys:execute-command "SET NAMES 'utf8'")
CL-USER(6): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RÃ¼Åvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

Another problematic version above, it returns "RÃ¼Åvet" instead of the
correct version "Rüþvet".

What is the *locale* of my Allegro Common Lisp?

CL-USER(7): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x7130205a>

I give it another try even though it'll switch to iso-8859-9 encoding:

CL-USER(8): (setf *locale* (find-locale "tr_TR"))
;; Autoloading locale from #P"/home/cas/Documents/applications/
acl81_express/locales/tr_TR"
; Fast loading from bundle code/ef-iso8859-9.fasl.
; Fast loading from bundle code/efft-iso8859-9-base.fasl.
#<locale "tr_TR" [:ISO8859-9-BASE] @ #x716a72f2>

Here we go:

CL-USER(9): (clsql-sys:query "SELECT * FROM Album WHERE ID=3")
((3 "Agannaga RÃ¼Åvet" 362 "1992" 8 ""))
("ID" "Name" "Artist" "Date" "Label" "Studio")

No hope and it seems like en_EN.utf-8 returns the same result as
tr_TR.iso8859-9

My system is Ubuntu GNU/Linux Hardy. I'm using Allegro CL Express
Edition 8.1 (free version).

$ uname -a
Linux cas-laptop 2.6.24-22-generic #1 SMP Mon Nov 24 18:32:42 UTC 2008
i686 GNU/Linux

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Any ideas how I can get my utf-8 encoded Turkish characters from cl-
sql using Allegro CL?

Note: There are no problems reading and writing utf-8 encoded Turkish
characters to/from files on my local hard disk, Allegro CL can handle
that without problems:

$ cat /home/cas/Documents/source/Lisp/ilc/name.txt
Agannaga Rüþvet 362 1992 8

CL-USER(1): (with-open-file (stream "/home/cas/Documents/source/Lisp/
ilc/name.txt")
  (format t "~a~%" (read-line stream)))
Agannaga Rüþvet 362 1992 8
NIL

Cheers,

--
Emre sevinc

Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Emre Sevinc
- Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Pascal J. Bourguignon
  - Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Emre Sevinc
Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Jens Teich
- Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Kenny
Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Alex Mizrahi
- Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Emre Sevinc
- Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL? Emre Sevinc

From: Emre Sevinc
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish 	characters from MySQL?
Date: Thu, 25 Dec 2008 08:43:56 +0000
Message-ID: <a1168d5c-2875-48ce-b40b-54f72a8d6a5c@a12g2000pro.googlegroups.com>

On Dec 25, 10:04 am, Emre Sevinc <···········@gmail.com> wrote:

> mysql> SELECT * FROM Album WHERE ID=3;
> +----+-------------------+--------+------+-------+--------+
> | ID | Name              | Artist | Date | Label | Studio |
> +----+-------------------+--------+------+-------+--------+
> |  3 | Agannaga Rüþvet |    362 | 1992 |     8 |        |
> +----+-------------------+--------+------+-------+--------+
> 1 row in set (0.00 sec)
>
> mysql> exit
>
> As you may see the critical part I'm interested in is the Name field
> which contains the Turkish string "Rüþvet".

It seems like Google Groups messed up too!

I'm talkin about "Latin small letter s with cedilla" which I type as
"ş" (don't know if this will be rendered correctly when this message
shows up even though my Firefox's character encoding seems to be
utf-8).


--
Emre

From: Pascal J. Bourguignon
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish  characters from MySQL?
Date: Thu, 25 Dec 2008 11:41:59 +0000
Message-ID: <873agc32ig.fsf@informatimago.com>

Emre Sevinc <···········@gmail.com> writes:
>> As you may see the critical part I'm interested in is the Name field
>> which contains the Turkish string "Rüþvet".
>
> It seems like Google Groups messed up too!

Yes.

> I'm talkin about "Latin small letter s with cedilla" which I type as
> "ş" (don't know if this will be rendered correctly when this message
> shows up even though my Firefox's character encoding seems to be
> utf-8).

This one shows ok. 

AFAIK, clsql doesn't manage encodings.  You have to do it yourself
explicitely.  Here is what I did:

(defparameter *weubcwk*
  #+sbcl  :iso-8859-1
  #+clisp charset:iso-8859-1
  #-(or sbcl clisp) 
  (error "What encoding is used by clsql in ~A?" (lisp-implementation-type))
  "Whatever Encoding Used By Clsql Who Knows")

(defun utf-8-bytes  (data)
  #-(or sbcl clisp) 
  (error "utf-8-bytes not implemented in ~A?" (lisp-implementation-type))
  (let ((bytes
         #+sbcl  (sb-ext:string-to-octets     data  :external-format *weubcwk*)
         #+clisp (ext:convert-string-to-bytes data  *weubcwk*)))
    #+sbcl  (sb-ext:octets-to-string          bytes :external-format :utf-8)
    #+clisp (ext:convert-string-from-bytes    bytes charset:utf-8)))

(defun utf-8-string (string)
  #-(or sbcl clisp) 
  (error "utf-8-string not implemented in ~A?" (lisp-implementation-type))
  (let ((bytes
         #+sbcl  (sb-ext:string-to-octets     string :external-format :utf-8)
         #+clisp (ext:convert-string-to-bytes string charset:utf-8)))
    #+sbcl  (sb-ext:octets-to-string          bytes  :external-format *weubcwk*)
    #+clisp (ext:convert-string-from-bytes    bytes  *weubcwk*)))


Then, when you define your views in clsql, use the :db-reader and
:db-writer to convert the strings:

(clsql:def-view-class example ()
  ((text         :db-kind :base
                 :db-constraints :not-null
                 :db-reader utf-8-bytes
                 :db-writer utf-8-string
                 :type (varchar 75)
                 :reader example-text))
  (:base-table "EXAMPLE"))

-- 
__Pascal Bourguignon__

From: Emre Sevinc
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish 	characters from MySQL?
Date: Fri, 26 Dec 2008 12:08:21 +0000
Message-ID: <8e681c54-9953-4ed0-bf19-13f4e0bf6310@v5g2000prm.googlegroups.com>

> Then, when you define your views in clsql, use the :db-reader and
> :db-writer to convert the strings:
>
> (clsql:def-view-class example ()
>   ((text         :db-kind :base
>                  :db-constraints :not-null
>                  :db-reader utf-8-bytes
>                  :db-writer utf-8-string
>                  :type (varchar 75)
>                  :reader example-text))
>   (:base-table "EXAMPLE"))
>

Thank you very much!

I tried it and it seems to work:

CG-USER(19): (defun utf-8-bytes  (data)
  (let ((bytes
         (string-to-octets data :external-format :iso-8859-1)))
    (octets-to-string bytes :external-format :utf-8)))
UTF-8-BYTES

CG-USER(26): (clsql:def-view-class gecici ()
  ((metin
    :db-kind :base
    :type (varchar 75)
    :db-reader utf-8-bytes
    :accessor text
    :initarg :text))
  (:base-table "gecici"))
#<CLSQL-SYS::STANDARD-DB-CLASS GECICI>

CG-USER(27): (let ((new-text (car
                  (clsql:select 'gecici
                                :flatp t))))
  (format t "~A" (text new-text)))
rüðþvetçi
NIL

--
Emre

From: Jens Teich
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish  characters from MySQL?
Date: Thu, 25 Dec 2008 13:27:17 +0000
Message-ID: <m2iqp8idvu.fsf@jensteich.de>

For these reasons I switched to PostgreSQL and postmodern
(http://common-lisp.net/project/postmodern/) and am fine
with utf8 on LispWorks and SBCL.

Jens

From: Kenny
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish  characters from MySQL?
Date: Thu, 25 Dec 2008 18:23:36 +0000
Message-ID: <4953cfa8$0$14277$607ed4bc@cv.net>

Jens Teich wrote:
> For these reasons I switched to PostgreSQL and postmodern
> (http://common-lisp.net/project/postmodern/) and am fine
> with utf8 on LispWorks and SBCL.
> 

I was not involved in the effort to get AllegroCL to read something 
similar so I can offer little more than we too ended up with a 
Postgres/Postmodern stack.

kt

From: Alex Mizrahi
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish characters from MySQL?
Date: Thu, 25 Dec 2008 16:15:55 +0000
Message-ID: <4953b1c2$0$90262$14726298@news.sunsite.dk>

 ES> Another problematic version above, it returns "Rü�vet" instead of the
 ES> correct version "R��vet".

CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
it just puts them one by one into a lisp string. what you see here is UTF-8 
interpreted
as if it was latin1.

it does so using uffi:convert-from-cstring, which in its turn uses Allegro 
CL function
excl:native-to-string. according to documentation: 
http://www.franz.com/support/documentation/7.0/doc/operators/excl/native-to-string.htm
you can customize this convesion setting *locale*'s locale-external-format. 
HTH

(alternatively you can decode strings you get from CLSQL into octets (using 
latin1 format
that is used by default) and concert this octets back to string with UTF-8 
external format.
that's what Pascal have wrote, but he have got format order reversed.)

From: Emre Sevinc
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish 	characters from MySQL?
Date: Fri, 26 Dec 2008 11:56:56 +0000
Message-ID: <ba8eb894-fb66-492f-83af-588d1fc1e84c@q26g2000prq.googlegroups.com>

On Dec 25, 6:15 pm, "Alex Mizrahi" <········@users.sourceforge.net>
wrote:
>  ES> Another problematic version above, it returns "RÃ¼Åvet" instead of the
>  ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function  excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...
> you can customize this convesion setting *locale*'s locale-external-format.
> HTH
>
> (alternatively you can decode strings you get from CLSQL into octets (using
> latin1 format
> that is used by default) and concert this octets back to string with UTF-8
> external format.
> that's what Pascal have wrote, but he have got format order reversed.)

I gave it a try without success, here's what I did:

CG-USER(2): *locale*
#<locale "en_US" [:UTF8-BASE] @ #x71e2d7ea>

CG-USER(3): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x71112242>

CG-USER(4): (find-external-format :utf-8)
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

CG-USER(5): (SETF (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN")) (find-
external-format :utf-8))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

CG-USER(8): (LOCALE-EXTERNAL-FORMAT (FIND-LOCALE "en_EN"))
#<EXTERNAL-FORMAT :UTF8 [(CRLF-BASE-EF :UTF8)] @ #x711121c2>

So now it seems like I have utf-8 enabled, now I try:

CG-USER(11): (clsql:execute-command "SET NAMES 'utf8'")

CG-USER(12): (CLSQL:QUERY "SELECT * FROM gecici")
(("rÃ¼ÄÅvetÃ§i"))
("metin")

CG-USER(13): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))))
2685176784
22

CG-USER(14): (NATIVE-TO-STRING 2685176784 :external-format :utf-8)
"rÃ¼ÄÅvetÃ§i"
13
21

CG-USER(15): (NATIVE-TO-STRING 2685176784)
"rÃ¼ÄÅvetÃ§i"
13
21

Seems to have no effect. Now I cannot understand because you said "it
does so using uffi:convert-from-cstring, which in its turn uses
Allegro  CL function  excl:native-to-string", I'm trying to use native-
to-string directly but it still seems problematic.

--
Emre

From: Emre Sevinc
Subject: Re: How to: Allegro CL, cl-sql, utf-8 and problem getting my Turkish 	characters from MySQL?
Date: Fri, 26 Dec 2008 12:13:33 +0000
Message-ID: <eb18e4f5-f5b6-45fa-abcc-1dd2be1054f0@c36g2000prc.googlegroups.com>

On Dec 25, 6:15 pm, "Alex Mizrahi" <········@users.sourceforge.net>
wrote:
>  ES> Another problematic version above, it returns "RÃ¼Åvet" instead of the
>  ES> correct version "Rüþvet".
>
> CLSQL thinks that octets it recieves are in latin1, one-byte encoding, and
> it just puts them one by one into a lisp string. what you see here is UTF-8
> interpreted
> as if it was latin1.
>
> it does so using uffi:convert-from-cstring, which in its turn uses Allegro
> CL function
> excl:native-to-string. according to documentation:http://www.franz.com/support/documentation/7.0/doc/operators/excl/nat...
> you can customize this convesion setting *locale*'s locale-external-format.
> HTH

Hmm, after playing with Pascal's solution I guess I understood what
you meant, too. This time a correct example:

CG-USER(29): (string-to-native (CAR (CAR (CLSQL-SYS:QUERY "SELECT *
FROM gecici"))) :external-format :iso-8859-1)
2685292688
27

CG-USER(30): (NATIVE-TO-STRING 2685292688 :external-format :utf-8)
"rüğşvetçi ĞŞİıçö"
16
26

(Google Groups may mess up the characters but I had what I wanted in
my Allegro CL REPL happily :)

Cheers,

--
Emre