From: Mathias  Dahl
Subject: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158186063.992133.312300@e63g2000cwd.googlegroups.com>
Hello!

I am playing with a small web application of mine. I got SBCL + CLSQL +
TBNL + MySQL + Apache + mod_lisp working, so I was actually quite
happy. Until I thought I should test with Unicode data in the database.
First I needed to upgrade to MySQL 4.1 and after an hour or so it
worked and I can properly see Unicode characters both in Emacs (SLIME)
and gnome-terminal by setting the encoding to utf-8 in all places.

However, SBCL reports the following error as soon as it gets Unicode
data in a CLSQL-call. Before I did the call, I told MySQL to be nice,
from inside SBCL:

* (clsql:execute-command "SET NAMES utf8")

* (clsql:execute-command "SET CHARACTER SET utf8")

And here is the query that fails (other queries work at this point).
Btw, the data is the word "правда" (pravda = truth), which is one
of the few words I can type in Cyrillic:

* (clsql:query "SELECT t.name FROM tag_versions t")

debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread
#<THREAD "initial thread" {B2AA0D1}>:
  encoding error on stream #<SB-SYS:FD-STREAM for "standard output"
{B2AA339}>
  (:EXTERNAL-FORMAT :LATIN-1):
    the character with code 1087 cannot be encoded.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [OUTPUT-NOTHING] Skip output of this character.
  1: [ABORT         ] Exit debugger, returning to top level.

(SB-INT:STREAM-ENCODING-ERROR
 #<SB-SYS:FD-STREAM for "standard output" {B2AA339}>
 1087)
0] 1

Now, the error is quite "good" in my opinion; I can partly understand
what is going wrong (it seems to try to encode a character with a code
as high as 1087 as latin-1, which will never work).
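For reference: п, the first letter of "правда", is U+043F (decimal 1087), and Latin-1 only has code points 0 to 255, so there is simply no Latin-1 byte for it. The same failure can be reproduced outside Lisp. This is a minimal sketch assuming a POSIX shell with the `iconv` utility; `latin1_encodable` is a hypothetical helper name, not part of any library in this thread:

```shell
#!/bin/sh
# U+043F (CYRILLIC SMALL LETTER PE) in decimal: the code from the error.
printf '%d\n' 0x43F

# Hypothetical helper: succeeds if the UTF-8 string $1 can be converted
# to ISO-8859-1 (Latin-1), i.e. every character has a code <= 255.
latin1_encodable() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1
}

for word in 'pravda' 'café' 'правда'; do
    if latin1_encodable "$word"; then
        echo "$word: fits in Latin-1"
    else
        echo "$word: cannot be encoded as Latin-1"
    fi
done
```

Note that 'café' is non-ASCII but still fits (é is U+00E9); only the Cyrillic word fails, just as SBCL's Latin-1 standard output did.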

But I have a hard time figuring out where to begin debugging this. I
got all the above components (uffi and all too) installed by reading a
lot of blog posts etc., and it is probably a miracle that it works in the
first place :), and my Lisp experience is mostly with Emacs Lisp, so...
I scanned the CLSQL package for ":external-format" and only found it in
one place.

Any pointers on where to look further or what to tweak? Assuming I get
this step to work, I of course want TBNL to let the data through
aaaaall the way to the browser, and I don't expect that to happen by
itself either.

Okay, some version information might be in order:

SBCL 0.9.13
MySQL 4.1

My asdf:registry says the rest:

(setf asdf:*central-registry*
      '(*default-pathname-defaults*
	(concatenate 'string *lisp-dirs* "cl-base64-3.3.1/")
	(concatenate 'string *lisp-dirs* "cl-ppcre-1.2.14/")
	(concatenate 'string *lisp-dirs* "cl-who-0.6.0/")
	(concatenate 'string *lisp-dirs* "uffi-1.5.16/")
	(concatenate 'string *lisp-dirs* "clsql-3.6.6/")
	(concatenate 'string *lisp-dirs* "rt-20040621/")
	(concatenate 'string *lisp-dirs* "kmrcl-1.85/")
	(concatenate 'string *lisp-dirs* "md5-1.8.5/")
	(concatenate 'string *lisp-dirs* "tbnl-0.9.10/")
	(concatenate 'string *lisp-dirs* "rfc2388/")
	(concatenate 'string *lisp-dirs* "url-rewrite/")))

Thanks!

/Mathias

From: kavenchuk
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158225797.102742.200390@k70g2000cwa.googlegroups.com>
See src/strings.lisp in UFFI.
UFFI does not support non-ASCII (or non-:latin-1?) encodings on SBCL
(or on any implementation?).
From: Mathias  Dahl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158261202.443817.8200@h48g2000cwc.googlegroups.com>
> See src/strings.lisp in UFFI.
> UFFI does not support non-ASCII (or non-:latin-1?) encodings on SBCL
> (or on any implementation?).

So you're saying that I can never get this combination to work? I mean,
UFFI is used in CLSQL IIRC.

/Mathias
From: kavenchuk
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158305937.390942.177220@b28g2000cwb.googlegroups.com>
Mathias Dahl wrote:

> So you're saying that I can never get this combination to work? I mean,
> UFFI is used in CLSQL IIRC.

Ask the UFFI authors when support for encodings will be implemented.

-- 
WBR, Yaroslav Kavenchuk.
From: Ralf Mattes
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pan.2006.09.15.16.31.35.494231@mh-freiburg.de>
On Fri, 15 Sep 2006 00:38:57 -0700, kavenchuk wrote:

> Mathias:
> 
>> So you're saying that I can never get this combination to work? I mean,
>> UFFI is used in CLSQL IIRC.
> 
> Ask the UFFI authors when support for encodings will be implemented.

How utterly strange: I have been using UTF-8 encoded data from my
PostgreSQL database via CLSQL and SBCL ever since SBCL introduced
experimental Unicode support. It has never been a problem ...

 HTH Ralf Mattes  ;-)
From: Harald Hanche-Olsen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pcoodthkrhq.fsf@shuttle.math.ntnu.no>
+ Ralf Mattes <··@mh-freiburg.de>:

| How utterly strange: I have been using UTF-8 encoded data from my
| PostgreSQL database via CLSQL and SBCL ever since SBCL introduced
| experimental Unicode support. It has never been a problem ...

I don't think clsql uses uffi to access a postgresql database.  Mysql,
however, is a different story.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell
From: kavenchuk
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158348513.161227.288030@h48g2000cwc.googlegroups.com>
Harald Hanche-Olsen wrote:

> I don't think clsql uses uffi to access a postgresql database.  Mysql,
> however, is a different story.

:) Oops, yes: UFFI does support UTF-8 encoding for C strings on SBCL,
but only that encoding.

From: Ralf Mattes
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pan.2006.09.15.21.50.38.601115@mh-freiburg.de>
On Fri, 15 Sep 2006 18:46:09 +0200, Harald Hanche-Olsen wrote:

> + Ralf Mattes <··@mh-freiburg.de>:
> 
> | How utterly strange: I have been using UTF-8 encoded data from my
> | PostgreSQL database via CLSQL and SBCL ever since SBCL introduced
> | experimental Unicode support. It has never been a problem ...
> 
> I don't think clsql uses uffi to access a postgresql database.

Just for the records: it does.
You probably think of the postgres socket interface ...

 Cheers, RalfD
 
>  Mysql,
> however, is a different story.
From: Harald Hanche-Olsen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pco3bas8e2r.fsf@shuttle.math.ntnu.no>
+ Ralf Mattes <··@mh-freiburg.de>:

| On Fri, 15 Sep 2006 18:46:09 +0200, Harald Hanche-Olsen wrote:
|
|> I don't think clsql uses uffi to access a postgresql database.
|
| Just for the records: it does.
| You probably think of the postgres socket interface ...

Um, you're right, I probably do.  It's the one I am using after all.

From: vanekl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158233619.773485.96660@p79g2000cwp.googlegroups.com>
I had a similar problem. SBCL parses your environment's 'LANG' and
sets its standard-output encoding accordingly. In your shell, what do
you get when you do a 'locale' or 'echo $LANG'? When I added
   export LANG=en_US.UTF-8
to my .bashrc (and sourced it), sbcl started setting standard-output
to utf-8, which fixed the standard-output problem.
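Why LANG matters, and why setting LC_CTYPE also works later in the thread: under POSIX locale rules the character-encoding category, LC_CTYPE, is taken from LC_ALL if that is set and non-empty, otherwise from LC_CTYPE, otherwise from LANG. The sketch below mirrors that precedence. It assumes that SBCL derives its default external format from the C library's LC_CTYPE locale, which fits the behaviour reported in this thread but is not something the posters state; `effective_ctype` and `ctype_is_utf8` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: report which locale governs character encoding
# (LC_CTYPE), using POSIX precedence: LC_ALL > LC_CTYPE > LANG.
# Empty variables count as unset, as POSIX requires.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"   # implementation-defined default; glibc uses C
    fi
}

# Hypothetical helper: succeeds if that locale name has a UTF-8 codeset,
# e.g. en_US.UTF-8, en_US.utf8 or en_US.UTF-8@euro.
ctype_is_utf8() {
    case "$(effective_ctype)" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8) return 0 ;;
        *.[Uu][Tt][Ff]-8@*|*.[Uu][Tt][Ff]8@*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With only `LANG=en_US.UTF-8` exported, `effective_ctype` prints en_US.UTF-8; exporting `LC_CTYPE=en_US.UTF-8` gives the same answer whatever LANG says, while a non-empty LC_ALL (for example `LC_ALL=C`) overrides both.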

BTW, I believe most lispers just create
one or two directories that store asdf links back to the real asdf
files and push these dirs onto their central-registry.
That way, when using asdf, your start-up times will be a little quicker
if asdf only has to search one or two directories instead
of 11 directories (if that matters to you).
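A minimal sketch of that approach, assuming the libraries are unpacked one level below a single parent directory (the role `*lisp-dirs*` plays in the original post); `link_asd_files` and the `systems` directory name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: put a symlink to every .asd file found one level
# below SRC into the single directory DEST, so that
# asdf:*central-registry* only has to list DEST.
link_asd_files() {
    src=$(cd "$1" && pwd) || return 1   # absolute path, so links resolve anywhere
    dest=$2
    mkdir -p "$dest" || return 1
    for asd in "$src"/*/*.asd; do
        if [ ! -e "$asd" ]; then
            continue                    # glob matched nothing
        fi
        if [ -L "$asd" ]; then
            continue                    # skip links made by an earlier run
        fi
        ln -sf "$asd" "$dest/" || return 1
    done
}
```

After running, e.g., `link_asd_files "$HOME/lisp" "$HOME/lisp/systems"`, the Lisp side needs only one entry, such as `(push (merge-pathnames "lisp/systems/" (user-homedir-pathname)) asdf:*central-registry*)`. The trailing slash matters, since ASDF expects directory pathnames in that list.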

Lou



Mathias  Dahl wrote:
> debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread
> #<THREAD "initial thread" {B2AA0D1}>:
>   encoding error on stream #<SB-SYS:FD-STREAM for "standard output"
> {B2AA339}>
>   (:EXTERNAL-FORMAT :LATIN-1):
>     the character with code 1087 cannot be encoded.
>
> Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
>
> restarts (invokable by number or by possibly-abbreviated name):
>   0: [OUTPUT-NOTHING] Skip output of this character.
>   1: [ABORT         ] Exit debugger, returning to top level.
>
> (SB-INT:STREAM-ENCODING-ERROR
>  #<SB-SYS:FD-STREAM for "standard output" {B2AA339}>
>  1087)
From: Harald Hanche-Olsen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pco1wqe7u8e.fsf@shuttle.math.ntnu.no>
+ "Mathias  Dahl" <············@gmail.com>:

| And here is the query that fails (other queries work at this point).
| Btw, the data is the word "правда" (pravda = truth), which is one
| of the few words I can type in cyrillic:
|
| * (clsql:query "SELECT t.name FROM tag_versions t")
|
| debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread
| #<THREAD "initial thread" {B2AA0D1}>:
|   encoding error on stream #<SB-SYS:FD-STREAM for "standard output"
| {B2AA339}>
|   (:EXTERNAL-FORMAT :LATIN-1):
|     the character with code 1087 cannot be encoded.

So clearly, your standard output stream is latin-1 encoded, and all
the clsql/mysql stuff is just a red herring.  If anything, your test
shows that the communication to and from the database probably does
work as intended.  It's the communication link between sbcl and
emacs/slime that is the problem.

This is easily tested: Just try to evaluate "правда" from the slime
prompt.  Or to test just one direction or the other, try running
(char-code #\д) or (code-char #o434).

What is usually recommended (I think) is setting
slime-net-coding-system to 'utf-8-unix before starting slime.

I do the following instead, which also works:

(setf slime-lisp-implementations
      '((sbcl ("sbcl") :coding-system utf-8-unix)
	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))

From: Pascal Bourguignon
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <87irjqde8d.fsf@thalassa.informatimago.com>
Harald Hanche-Olsen <······@math.ntnu.no> writes:
> This is easily tested: Just try to evaluate "правда" from the slime
> prompt.  

Be careful.  It's not so easy.  Inputting and outputting whole UTF-8
strings doesn't tell much about the tested software's UTF-8 support.
At most, it tells that it's 8-bit transparent.

     (aref  "правда" 0) is a better test.


In "clisp -Eterminal utf-8":

[213]> (aref  "правда" 0)
#\CYRILLIC_SMALL_LETTER_PE


In "sbcl --noinform"  (version 0.9.12):

* (aref  "правда" 0)

#\LATIN_CAPITAL_LETTER_ETH
* 

Both accept and display the utf-8 strings, but in the first you have
unicode characters, while in the second you only have 8-bit
characters.


> Or to test just one direction or the other, try running
> (char-code #\д) or (code-char #o434).


* (char-code #\д)

debugger invoked on a READER-ERROR: READER-ERROR on #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDIN* {90CCFB1}>:
unrecognized character name: "д"


* (map 'vector (function identity) "д")

#(#\LATIN_CAPITAL_LETTER_ETH #\ACUTE_ACCENT)
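Both SBCL results can be decoded by hand, assuming the terminal sends UTF-8 while SBCL reads standard input as Latin-1, one character per byte. д is encoded as the two bytes D0 B4 and п as D0 BF; in Latin-1, D0 is Ð (LATIN CAPITAL LETTER ETH) and B4 is ´ (ACUTE ACCENT). That is exactly the vector above, and it is why (aref "правда" 0) returned ETH. A minimal sketch for looking at the raw bytes, assuming the POSIX `od` and `wc` utilities (`hex_bytes` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of string $1 as space-separated
# hex, e.g. "d0 b4". od works on bytes, so the locale does not matter.
hex_bytes() {
    bytes=$(printf '%s' "$1" | od -An -tx1)
    # Unquoted on purpose: word splitting collapses od's padding
    # and line breaks into single spaces.
    echo $bytes
}

hex_bytes 'д'                  # d0 b4 -> read as Latin-1: Ð ´
hex_bytes 'п'                  # d0 bf -> first "character" is Ð
printf '%s' 'правда' | wc -c   # 12 bytes for 6 Cyrillic letters
```

So a Lisp that treats each byte as a character sees a 12-character string where a Unicode-aware one sees 6, which is what Pascal's (aref ...) test exposes.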



> What is usually recommended (I think) is setting
> slime-net-coding-system to 'utf-8-unix before starting slime.
>
> I do the following instead, which also works:
>
> (setf slime-lisp-implementations
>       '((sbcl ("sbcl") :coding-system utf-8-unix)
> 	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

There is no worse tyranny than to force a man to pay for what he does not
want merely because you think it would be good for him. -- Robert Heinlein
From: Harald Hanche-Olsen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pcofyeumibw.fsf@shuttle.math.ntnu.no>
+ Pascal Bourguignon <···@informatimago.com>:

| Harald Hanche-Olsen <······@math.ntnu.no> writes:
|> This is easily tested: Just try to evaluate "правда" from the slime
|> prompt.  
|
| Be careful.  It's not so easy.  Inputting and outputting whole UTF-8
| strings doesn't tell much about the tested software's UTF-8 support.
| At most, it tells that it's 8-bit transparent.

Good point, though for slime <-> lisp communications (which is the
context here) I think my test will still reveal any problem, since you
have to work pretty hard to make slime and the backend have different
notions of the encoding on the channel - so long as the slime
developers have done that part right.  But the tests you propose will
certainly reveal any problems in that area that my simple test might
overlook.

From: Mathias  Dahl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158260120.245971.253790@i3g2000cwc.googlegroups.com>
> the clsql/mysql stuff is just a red herring.  If anything, your test
> shows that the communication to and from the database probably does
> work as intended.  It's the communication link between sbcl and
> emacs/slime that is the problem.

Sorry that I mixed things together. I get this error in "pure" SBCL,
running from the shell (gnome-terminal set to use utf-8):

* (clsql:query "SELECT t.name FROM tag_versions t")

debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread
#<THREAD "initial thread" {B2AA0D1}>:
  encoding error on stream #<SB-SYS:FD-STREAM for "standard output"
{B2AA339}>
  (:EXTERNAL-FORMAT :LATIN-1):
    the character with code 65533 cannot be encoded.

In this case, neither Emacs nor SLIME is involved.

> This is easily tested: Just try to evaluate "правда" from the slime
> prompt.

That works:

* "правда"

"правда"

> Or to test just one direction or the other, try running
> (char-code #\д) or (code-char #o434).

Here I get the same result as Pascal did:
* (char-code #\д)

debugger invoked on a READER-ERROR in thread
#<THREAD "initial thread" {B2AA0D1}>:
  READER-ERROR on #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDIN* {90CDF61}>:
unrecognized character name: "д"

and:

* (code-char #o434)

#\LATIN_CAPITAL_LETTER_G_WITH_CIRCUMFLEX
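The G-with-circumflex result comes from the radix rather than from any encoding problem: #o434 is octal, 4*64 + 3*8 + 4 = 284, which is U+011C (Ĝ). The code point of д is U+0434, i.e. hexadecimal 434 = 1076, so the test presumably intended was (code-char #x434). A quick check of the arithmetic, assuming a POSIX printf, which reads a leading 0 as octal and a leading 0x as hexadecimal:

```shell
#!/bin/sh
printf '%d\n' 0434      # octal 434, like #o434: 284
printf '%d\n' 0x434     # hex 434, like #x434: 1076 (д)
printf 'U+%04X\n' 284   # 284 as a code point: U+011C (Ĝ)
```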

> I do the following instead, which also works:
>
> (setf slime-lisp-implementations
>       '((sbcl ("sbcl") :coding-system utf-8-unix)
> 	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))

For me, doing that makes SLIME not start. I never get the REPL, instead
I get stuck in the inferior lisp buffer. It even fails if I use
iso-latin-1-unix for SBCL.

If we forget Emacs and SLIME for the moment, can you see any other
things that I can do to make the call to MySQL return the desired
results?

/Mathias
From: Harald Hanche-Olsen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <pcou039zedj.fsf@shuttle.math.ntnu.no>
+ "Mathias  Dahl" <············@gmail.com>:

| Sorry that I mixed things together. I get this error in "pure" SBCL,
| running from the shell (gnome-terminal set to use utf-8):
|
| * (clsql:query "SELECT t.name FROM tag_versions t")
|
| debugger invoked on a SB-INT:STREAM-ENCODING-ERROR in thread
| #<THREAD "initial thread" {B2AA0D1}>:
|   encoding error on stream #<SB-SYS:FD-STREAM for "standard output"
| {B2AA339}>
|   (:EXTERNAL-FORMAT :LATIN-1):
|     the character with code 65533 cannot be encoded.
|
| In this case, neither Emacs nor SLIME is involved.

No, but the error message still seems to mumble about communications
with the terminal, not with the database.  Try

(stream-external-format *standard-output*)
(stream-external-format *standard-input*)

to check that you have the right encoding.  (Which won't work under
slime, BTW.)

If you get :ascii rather than :utf-8, you may try setting LC_CTYPE to
some UTF-8 locale before starting sbcl.
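One caveat when trying this: exporting LC_CTYPE only helps if that locale is actually installed. If it is not, the C library typically falls back to the default C/POSIX locale, and (assuming SBCL takes its default external format from that locale) the streams stay at Latin-1 or ASCII. A minimal sketch for checking first, assuming a glibc-style `locale -a` that prints codesets in normalized form such as en_US.utf8; both helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: normalize a locale name the way glibc's
# `locale -a` prints it (codeset lowercased, '-' removed), e.g.
# en_US.UTF-8 -> en_US.utf8. Names without a codeset are unchanged.
normalize_locale() {
    case "$1" in
        *.*)
            codeset=$(printf '%s' "${1#*.}" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
            printf '%s.%s\n' "${1%%.*}" "$codeset" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

# Hypothetical helper: succeeds if `locale -a` lists locale $1.
locale_installed() {
    want=$(normalize_locale "$1")
    locale -a 2>/dev/null | while IFS= read -r have; do
        if [ "$(normalize_locale "$have")" = "$want" ]; then
            echo found
            break
        fi
    done | grep -q found
}

# Example: only switch the encoding if the locale really exists.
if locale_installed en_US.UTF-8; then
    export LC_CTYPE=en_US.UTF-8
fi
```

If the locale is missing, it usually has to be generated first (on Debian-style systems, for example, with dpkg-reconfigure locales or locale-gen).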

|> I do the following instead, which also works:
|>
|> (setf slime-lisp-implementations
|>       '((sbcl ("sbcl") :coding-system utf-8-unix)
|> 	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))
|
| For me, doing that makes SLIME not start. I never get the REPL,
| instead I get stuck in the inferior lisp buffer. It even fails if I
| use iso-latin-1-unix for SBCL.

Strange.  Works for me, though I admit I have not studied this
carefully.  I just tweak my environment until it works.  8-)

The slime mailing list may be a better place to ask about this bit.

| If we forget Emacs and SLIME for the moment, can you see any other
| things that I can do to make the call to MySQL return the desired
| results?

Not me personally, no.  There is also a clsql mailing list.

From: Mathias  Dahl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158443776.487376.291600@d34g2000cwd.googlegroups.com>
> No, but the error message still seems to mumble about communications
> with the terminal, not with the database.  Try
>
> (stream-external-format *standard-output*)
> (stream-external-format *standard-input*)
>
> to check that you have the right encoding.  (Which won't work under
> slime, BTW.)
>
> If you get :ascii rather than :utf-8, you may try setting LC_CTYPE to
> some UTF-8 locale before starting sbcl.

Yay! That's it!

When I ran this:

 (stream-external-format *standard-output*)

I got :LATIN-1 as a response.

So, I exited SBCL and did:

 export LC_CTYPE=en_US.UTF-8

Started SBCL again and checked the external-format. Got :UTF-8.
Promising!

Ran my database setup stuff and executed a query that should return
UTF-8 encoded data.

It worked! "правда" in all its glory! :)

Many thanks!

Now let's see if the rest of the "stuff" (TBNL and more) will let this
through all the way to the browser... :)

/Mathias
From: Mathias  Dahl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158444061.884595.35880@h48g2000cwc.googlegroups.com>
> Now let's see if the rest of the "stuff" (TBNL and more) will let this
> through all the way to the browser... :)

Aaaaand.... It works too! Wee! All systems go (etc etc)!

Now I can concentrate on the app instead. Well, as soon as I get this
running under SLIME as well (the way I previously tried to set the
encoding in SLIME did not seem to work)...

/Mathias
From: Mathias  Dahl
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <1158446842.425699.200800@e3g2000cwe.googlegroups.com>
Harald Hanche-Olsen wrote:

> + "Mathias  Dahl" <············@gmail.com>:
>
> | Sorry that I mixed things together. I get this error in "pure" SBCL,
> | running from the shell (gnome-terminal set to use
> |> (setf slime-lisp-implementations
> |>       '((sbcl ("sbcl") :coding-system utf-8-unix)
> |> 	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))
> |
> | For me, doing that makes SLIME not start. I never get the REPL,
> | instead I get stuck in the inferior lisp buffer. It even fails if I
> | use iso-latin-1-unix for SBCL.
>
> Strange.  Works for me, though I admit I have not studied this
> carefully.  I just tweak my environment until it works.  8-)

Seems I was using a very old SLIME version (1.2). After downloading
2.0, the following works perfectly:

(setq slime-net-coding-system 'utf-8-unix)

After doing that I see the Unicode chars in Emacs too. Yay! :)

I found that variable in the manual, here:

http://common-lisp.net/project/slime/doc/html/slime_42.html#SEC42
From: David Hansen
Subject: Re: Unicode with SBCL, CLSQL, TBNL and MySQL?
Date: 
Message-ID: <87r6yefc7g.fsf@robotron.kosmorama>
On Thu, 14 Sep 2006 10:00:49 +0200 Harald Hanche-Olsen wrote:

> What is usually recommended (I think) is setting
> slime-net-coding-system to 'utf-8-unix before starting slime.
>
> I do the following instead, which also works:
>
> (setf slime-lisp-implementations
>       '((sbcl ("sbcl") :coding-system utf-8-unix)
> 	(cmucl ("cmucl") :coding-system iso-latin-1-unix)))

Additionally I had to tell swank to use utf-8 for the
communication (from ~/.sbclrc):

(asdf:operate 'asdf:load-op 'swank)
(swank:create-swank-server
 4005 :spawn #'swank::simple-announce-function t :utf-8-unix)

David