Re: Printing unicode

From: blandest
Subject: Re: Printing unicode
Date: Mon, 10 Nov 2008 10:00:51 +0000
Message-ID: <9308eb7a-e403-456b-9c8c-3d5f6cf07ea7@s9g2000prm.googlegroups.com>

On Nov 9, 10:46 pm, ·············@gmail.com wrote:
> On Nov 9, 8:36 pm, Lars Rune Nøstdal <···········@gmail.com> wrote:
>
>
>
> > On Sun, 2008-11-09 at 11:20 -0800, ·············@gmail.com wrote:
> > > On Nov 9, 5:42 pm, smallpond <·········@juno.com> wrote:
> > > > The US DOS character set is CP437, which lists the spade character
> > > > as Unicode value dec 9824 / hex 2660.
>
> > > Seems I got my decimal and hex mixed up there:)
> > > But I still get the same message:
>
> > > Character #\u2660 cannot be represented in the character set
> > > CHARSET:ISO-8859-1
>
> > Just a guess, in my ~/.emacs I have:
>
> >   (setq current-language-environment "UTF-8")
>
> > ..also, at some point after loading Slime (after calling setup-slime), I
> > do:
>
> >   (setq slime-net-coding-system 'utf-8-unix)
>
> > ..but I'm using Debian, so might need some different values for Windows.
>
> I've pasted those into the scratch buffer, and evaluated it, but I
> still get the same result.  I'll see what happens when I put them in
> the .emacs file though. (First I'll have to figure out where that is,
> or is expected to be, on windows.)

It seems that you must change the *terminal-encoding* only when
running your code without SLIME (directly from CLISP), but you always
need to set the *default-file-encoding*. I've tested this assumption
on Windows XP SP3 with CLISP 2.45.
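
For reference, here is a minimal sketch of setting those two variables
from Lisp itself rather than with the -E command-line flag. It assumes
CLISP (the EXT, CUSTOM and CHARSET packages are CLISP-specific) and
assumes that UTF-8 with Unix line endings is what you want. It could be
evaluated at the REPL or from an init file.

```lisp
;; CLISP-only sketch; EXT:MAKE-ENCODING and the CUSTOM variables
;; do not exist in other Common Lisp implementations.
;; Assumption: UTF-8 with Unix line terminators is the desired encoding.
(let ((utf-8 (ext:make-encoding :charset charset:utf-8
                                :line-terminator :unix)))
  ;; Used for files opened without an explicit :external-format.
  (setf custom:*default-file-encoding* utf-8)
  ;; Used for the console; per the observation above this only matters
  ;; when CLISP is run directly rather than through SLIME.
  (setf custom:*terminal-encoding* utf-8))
```

Evaluating custom:*default-file-encoding* afterwards should show the
new encoding object.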

From: ·············@gmail.com
Subject: Re: Printing unicode
Date: Mon, 10 Nov 2008 15:20:20 +0000
Message-ID: <7c7740ab-6e9e-48ad-91fb-a671b4d7cb65@i18g2000prf.googlegroups.com>

On Nov 10, 11:00 am, blandest <··············@gmail.com> wrote:
> On Nov 9, 10:46 pm, ·············@gmail.com wrote:
>
>
>
> > On Nov 9, 8:36 pm, Lars Rune Nøstdal <···········@gmail.com> wrote:
>
> > > On Sun, 2008-11-09 at 11:20 -0800, ·············@gmail.com wrote:
> > > > On Nov 9, 5:42 pm, smallpond <·········@juno.com> wrote:
> > > > > The US DOS character set is CP437, which lists the spade character
> > > > > as Unicode value dec 9824 / hex 2660.
>
> > > > Seems I got my decimal and hex mixed up there:)
> > > > But I still get the same message:
>
> > > > Character #\u2660 cannot be represented in the character set
> > > > CHARSET:ISO-8859-1
>
> > > Just a guess, in my ~/.emacs I have:
>
> > >   (setq current-language-environment "UTF-8")
>
> > > ..also, at some point after loading Slime (after calling setup-slime), I
> > > do:
>
> > >   (setq slime-net-coding-system 'utf-8-unix)
>
> > > ..but I'm using Debian, so might need some different values for Windows.
>
> > I've pasted those into the scratch buffer, and evaluated it, but I
> > still get the same result.  I'll see what happens when I put them in
> > the .emacs file though. (First I'll have to figure out where that is,
> > or is expected to be, on windows.)
>
> It seems that you must change the *terminal-encoding* only when
> running your code without SLIME (directly from CLISP), but you always
> need to set the *default-file-encoding*. I've tested this assumption
> on Windows XP SP3 with CLISP 2.45.

Ok, I'll try to go through the latest suggestions I've gotten one by
one.

> Perhaps something like the following in ~/.emacs would help:
> (setf slime-net-coding-system 'utf-8-unix)

Done.

> (setf inferior-lisp-program "/usr/bin/clisp -ansi -q -K full -m 32M -I -E UTF-8 -Epathname ISO-8859-1 -Eforeign ISO-8859-1")

Done, with /usr/bin/clisp changed to the appropriate executable. I
think LispBox on Windows runs a .bat file to launch CLISP though, so
I've changed that one as well to have the flag "-E UTF-8". When I run
CLISP alone in the command prompt, this flag makes the error go away:
CLISP outputs UTF-8, but the DOS command prompt can't display it, so
it doesn't really help. It leads me to believe that the problem isn't
with CLISP, though. Right? I must say I don't fully understand the rest
of the technology stack, but I guess SLIME is what sits between Emacs
and CLISP, so maybe that's where the problem is.

> When you M-x slime, check that the first line in *inferior-lisp*
> mentions :utf-8-unix as:

> (progn (load "/usr/share/emacs/site-lisp/slime/swank-loader.lisp" :verbose t) (funcall (read-from-string "swank:start-server") "/tmp/slime.5749" :external-format :utf-8-unix))

I've not been able to find this line. LispBox loads a lot of .el files on
startup, and in the inferior-lisp buffer I just see a lot of "Loading
file C:\Program Files\LispBox\load.lisp ...". Looking at the
definition of (start-server), it doesn't take an external-format
argument. I did a search through the slime files for iso-8859, and
found this in the file swank-clisp.lisp:

	(defimplementation accept-connection (socket)
	  (socket:socket-accept socket
				:buffered nil ;; XXX should be t
				:element-type 'character
				:external-format (ext:make-encoding
						  :charset 'charset:iso-8859-1
						  :line-terminator :unix)))

So I changed ":charset 'charset:iso-8859-1" to be ":charset
'charset:utf-8", and restarted LispBox. This resulted in errors:

    Connecting to Swank on port 2671..
    Initial handshake...
    error in process filter: slime-process-available-input: Wrong type argument: listp, ¢¬
    error in process filter: Wrong type argument: listp, ¢¬

I also found this in clisp-2.33\emacs\clisp-indent.lisp

	(with-output-to-printer (s
	                         :external-format charset:iso-8859-1)

	  (form1))

Changed to charset:utf-8, no change.

No combination of the two changes above has worked either. The error
message in the first one is interesting, "Wrong type argument: listp,
¢¬  ". That character after the comma looks like something is treated
as utf-8, where it shouldn't be.

> It seems that you must change the *terminal-encoding* only when
> running your code without SLIME (directly from CLISP), but you always
> need to set the *default-file-encoding*.

I've been able to set *terminal-encoding* to utf-8, but it doesn't
help. I did that by adding the -E utf-8 flag to the call to launch
CLISP.

Here's my environment right now:

CL-USER> *default-file-encoding*
#<ENCODING CHARSET:UTF-8 :DOS>

CL-USER> *pathname-encoding*
#<ENCODING CHARSET:UTF-8 :DOS>

CL-USER> *terminal-encoding*
#<ENCODING CHARSET:UTF-8 :DOS>

CL-USER> *misc-encoding*
#<ENCODING CHARSET:UTF-8 :DOS>

CL-USER> *foreign-encoding*
#<ENCODING CHARSET:ASCII :UNIX>

I'm able to type/paste unicode into emacs, but it gets lost somewhere
along the read-eval-print.
CL-USER> (print "♠")

"`"
"`"

CL-USER> (format t "~a" "♠")
`
NIL

CL-USER> (format t "~a" (code-char 9824))
; Evaluation aborted
(Gives error "Character #\u2660 cannot be represented in the character
set CHARSET:ISO-8859-1")

Gah. This is killing me.
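
The symptoms above can be reproduced without any stream or SLIME
involved. The following is a hedged sketch, assuming CLISP
(EXT:CONVERT-STRING-TO-BYTES is CLISP-specific); the variable names are
made up for illustration. It encodes the spade character explicitly and
also shows that the backquote printed above is what you get if only the
low byte of the code point survives. That is consistent with a
single-byte encoding being applied somewhere on the way in, though it
does not prove where.

```lisp
;; CLISP-only sketch; all variable names here are hypothetical.
(defparameter *spade* (string (code-char 9824))) ; U+2660

;; UTF-8 can represent the character; it takes three bytes.
(defparameter *spade-utf-8*
  (ext:convert-string-to-bytes *spade* charset:utf-8))

;; ISO-8859-1 has no spade, so this conversion is expected to signal
;; an error like the one quoted above; it is caught and replaced by
;; the keyword :unencodable.
(defparameter *spade-latin-1*
  (handler-case (ext:convert-string-to-bytes *spade* charset:iso-8859-1)
    (error () :unencodable)))

;; Keeping only the low byte of #x2660 gives #x60, the backquote.
(defparameter *low-byte-char* (code-char (logand 9824 #xFF)))
```

If *spade-utf-8* holds three bytes and *low-byte-char* is the backquote,
CLISP itself handles the character fine, and the thing to look at is the
encoding used on the connection between Emacs and CLISP.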