unicode

From: paul johnston
Subject: unicode
Date: Thu, 15 Jun 2000 00:00:00 +0000
Message-ID: <3948F7A4.AC6ED328@ccl.umist.ac.uk>

I've got a bit of a problem, so could anyone out there help me?
As a computer officer in the Dept of Language Engineering at UMIST, I am
asked to supply software for various projects.

We tend to have to work in several languages with non-Latin scripts,
e.g. Greek, Cyrillic and even Arabic. Does anyone have a suggestion as to
a Unicode-compatible Lisp that we can use?
We have Allegro CL ver 5.0; has anyone any experience in using non-Latin
scripts with this, either under NT or Solaris 7?
Many Thanks

--
Paul Johnston
System Admin
Language Engineering
UMIST
Tel 0161 200 3111

From: Arthur Lemmens
Subject: Re: unicode
Date: Fri, 16 Jun 2000 00:00:00 +0000
Message-ID: <394A4A25.55E689EA@simplex.nl>

paul johnston wrote:
> 
> I've got a bit of a problem, so could anyone out there help me?
> As a computer officer in the Dept of Language Engineering at UMIST, I am
> asked to supply software for various projects.
> 
> We tend to have to work in several languages with non-Latin scripts,
> e.g. Greek, Cyrillic and even Arabic. Does anyone have a suggestion as to
> a Unicode-compatible Lisp that we can use?

LispWorks has reasonably good support for Unicode. I've used it to edit and 
process some Unicode files that contained characters from the ASCII, Latin1 
and Cyrillic character blocks.

It was pretty easy to configure the editor so it could switch between two 
sets of keyboard bindings. (I think it would be easy to support Greek in the 
same way, but configuring the editor for working with Arabic would be a lot 
more difficult, as you probably know.)

I had a few small problems with LispWorks's Unicode support, but no major
gotchas.

Arthur

From: William Deakin
Subject: Re: unicode
Date: Fri, 16 Jun 2000 00:00:00 +0000
Message-ID: <3949F30A.C9296AF@pindar.com>

paul johnston wrote:

> We tend to have to work in several languages with non-Latin scripts,
> e.g. Greek, Cyrillic and even Arabic. Does anyone have a suggestion as to
> a Unicode-compatible Lisp that we can use?
> We have Allegro CL ver 5.0; has anyone any experience in using non-Latin
> scripts with this, either under NT or Solaris 7?

Although I have very limited experience with unicode (I wrote a couple of
strings in Italian once), there was an interesting discussion on c.l.l. a
short while ago about issues related to this but I am not sure if it is
exactly what you wanted[1].

I would also contact Franz directly. Finally, having run a quick search on
the Franz website I found references to International Allegro CL which has
support for Japanese (kanji, ganji &c) so I would have thought Greek,
Cyrillic, Arabic or whatever must be tractable.

Best Regards,

:) will

[1] This was the thread `strings and characters'  see
www.deja.com/getdoc.xp?AN=598005460. Also Deja is always a good starting
point for researching historic postings from c.l.l.

From: ······@clisp.cons.org
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <8inqa3$s06$1@news.u-bordeaux.fr>

paul johnston <·····@ccl.umist.ac.uk> asks:
> Does anyone have a suggestion as to a unicode compatible lisp that
> we can use.

The Linux Unicode HOWTO [1], section 5.3, answers your question:

  The Common Lisp standard specifies two character types: `base-char'
  and `character'. It's up to the implementation to support Unicode or
  not. The language also specifies a keyword argument `:external-format'
  to `open', as the natural place to specify a character set or
  encoding.

  Among the free Common Lisp implementations, only CLISP http://clisp.cons.org/
  supports Unicode. You need a CLISP version from March 2000 or
  newer. ftp://clisp.cons.org/pub/lisp/clisp/source/clispsrc.tar.gz.
  The types `base-char' and `character' are both equivalent to 16-bit
  Unicode. The functions char-width and string-width provide an API
  comparable to wcwidth() and wcswidth(). The encoding used for file or
  socket/pipe I/O can be specified through the `:external-format'
  argument. The encodings used for tty I/O and the default encoding for
  file/socket/pipe I/O are locale dependent.

  Among the commercial Common Lisp implementations, only Eclipse
  http://www.elwood.com/eclipse/eclipse.htm supports Unicode. See
  http://www.elwood.com/eclipse/char.htm. The type `base-char' is
  equivalent to ISO-8859-1, and the type `character' contains all
  Unicode characters. The encoding used for file I/O can be specified
  through a combination of the `:element-type' and `:external-format'
  arguments to `open'. Limitations: Character attribute functions are
  locale dependent. Source and compiled source files cannot contain
  Unicode string literals.

  The commercial Common Lisp implementation Allegro CL does not support
  Unicode yet, but Erik Naggum is working on it.

Bruno                                         http://clisp.cons.org/~haible/
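
A quick way to see which of the character models described in the HOWTO
excerpt a given Lisp follows is to ask the implementation itself. The
following is only a minimal sketch: it uses standard Common Lisp operators
alone, the function name CHARACTER-TYPE-REPORT is made up for illustration,
and the values it returns will differ from one implementation and version
to the next.

```lisp
;; Sketch: report how the running Lisp maps the standard character types.
;; CHARACTER-TYPE-REPORT is a hypothetical helper, not part of any Lisp.
(defun character-type-report ()
  "Return a property list describing this implementation's character types."
  (list
   ;; Exclusive upper bound on character codes. A value of 65536 would
   ;; match the 16-bit model described in the excerpt above.
   :char-code-limit char-code-limit
   ;; True when every CHARACTER is also a BASE-CHAR, i.e. the two types
   ;; coincide. (BASE-CHAR is always a subtype of CHARACTER.)
   :base-char-is-character (values (subtypep 'character 'base-char))
   ;; GREEK SMALL LETTER ALPHA, or NIL if the code is out of range or has
   ;; no character in this implementation.
   :greek-alpha (and (< #x3B1 char-code-limit) (code-char #x3B1))))
```

Evaluating (character-type-report) at the listener of each candidate Lisp
gives a first impression of whether Greek or Cyrillic text can be held in
its strings at all.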

From: ······@clisp.cons.org
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <8inql9$s1t$1@news.u-bordeaux.fr>

> The Linux Unicode HOWTO [1], section 5.3

Oops, here's the URL:
  ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html

From: Erik Naggum
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <3170508962807354@naggum.no>

* Bruno Haible
| The Linux Unicode HOWTO [1], section 5.3, answers your question:
:
|   The commercial Common Lisp implementation Allegro CL does not support
|   Unicode yet, but Erik Naggum is working on it.

  Franz Inc has had Unicode support in Allegro CL for Windows for
  quite some time, now, thanks to the efforts of Charles Cox.  He has
  also been working on Unicode support for Allegro CL for Unix for
  quite some time, now.  Allegro CL 6.0 supports Unicode natively.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.

From: Marco Antoniotti
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <lwya40w6el.fsf@parades.rm.cnr.it>

Erik Naggum <····@naggum.no> writes:

> * Bruno Haible
> | The Linux Unicode HOWTO [1], section 5.3, answers your question:
> :
> |   The commercial Common Lisp implementation Allegro CL does not support
> |   Unicode yet, but Erik Naggum is working on it.
> 
>   Franz Inc has had Unicode support in Allegro CL for Windows for
>   quite some time, now, thanks to the efforts of Charles Cox.  He has
>   also been working on Unicode support for Allegro CL for Unix for
>   quite some time, now.  Allegro CL 6.0 supports Unicode natively.

Now the question is: are CLisp, ECLipse and ACL compatible in their
treatment of Unicode?

Cheers

-- 
Marco Antoniotti ===========================================

From: Erik Naggum
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <3170519803902342@naggum.no>

* Marco Antoniotti <·······@parades.rm.cnr.it>
| Now the question is: are CLisp, ECLipse and ACL compatible in their
| treatment of Unicode?

  Since basically the only useful thing to do with Unicode (data) is
  to have _real_ wide strings, with characters at least 16 bits wide
  _each_ and real character types that reflect real Unicoditude, and
  since Unicode (the standard) defines pretty much what you can do in
  the outside world, the question of what it means to be compatible
  appears to be a question of how each Common Lisp treats _streams_ of
  Unicode characters.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.

From: Kent M Pitman
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <sfwog4wqelg.fsf@world.std.com>

Erik Naggum <····@naggum.no> writes:

> 
> * Marco Antoniotti <·······@parades.rm.cnr.it>
> | Now the question is: are CLisp, ECLipse and ACL compatible in their
> | treatment of Unicode?
> 
>   Since basically the only useful thing to do with Unicode (data) is
>   to have _real_ wide strings, with characters at least 16 bits wide
>   _each_ and real character types that reflect real Unicoditude, and
>   since Unicode (the standard) defines pretty much what you can do in
>   the outside world, the question of what it means to be compatible
>   appears to be a question of how each Common Lisp treats _streams_ of
>   Unicode characters.

Not having played with it but just thinking about it for a second, I'd
think there'd also be the issue of what #\xxx you write to refer to such
a character, and whether a unicode A is char= to a non-unicode A (intended
to be constrained by the CL spec, but...), and probably many other 
little details.  It would certainly be interesting to hear about differences
people uncover.

From: ······@clisp.cons.org
Subject: Re: unicode
Date: Wed, 21 Jun 2000 00:00:00 +0000
Message-ID: <8ir8c3$e3r$1@news.u-bordeaux.fr>

Kent M Pitman <······@world.std.com> wrote:
>
> there'd also be the issue of what #\xxx you write to refer to such
> a character

Eclipse uses the syntax #\U+203E, CLISP uses #\U203E, and ACL and LispWorks
have no such notation.

> whether a unicode A is char= to a non-unicode A

The entire idea of Unicode is that there are no characters outside
Unicode. The only non-Unicode characters I have ever seen in use in Web pages
are Inuktitut (some Eskimo people in Canada).

The bits and font attributes in CL are a different issue, of course.

Bruno
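
Since the reader syntax for such characters differs between implementations,
code that has to load in several Lisps can sidestep the notation by building
the character from its code at run time. This is a rough sketch using only
standard operators. CHAR-FROM-CODE-POINT and LATIN-A-MATCHES-P are
hypothetical names, and the assumption that character codes coincide with
Unicode code points holds only in implementations that choose that mapping.

```lisp
;; Sketch: a portable stand-in for implementation-specific #\U... syntax.
;; Assumes CHAR-CODE values are Unicode code points, which the standard
;; does not promise.
(defun char-from-code-point (code)
  "Return the character whose code is CODE, or NIL if there is none."
  (and (integerp code)
       (<= 0 code)
       (< code char-code-limit)   ; CODE-CHAR requires a valid character code
       (code-char code)))

(defun latin-a-matches-p ()
  "True if the character built from code #x41 is CHAR= to the literal #\\A."
  (let ((ch (char-from-code-point #x41)))
    (and ch (char= ch #\A))))
```

On a Lisp with 16-bit characters, (char-from-code-point #x203E) would be
expected to return the overline character mentioned above; on a Lisp
without it the result is NIL rather than a reader error.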

From: Marco Antoniotti
Subject: Re: unicode
Date: Tue, 20 Jun 2000 00:00:00 +0000
Message-ID: <lw4s6ogjif.fsf@parades.rm.cnr.it>

Erik Naggum <····@naggum.no> writes:

> * Marco Antoniotti <·······@parades.rm.cnr.it>
> | Now the question is: are CLisp, ECLipse and ACL compatible in their
> | treatment of Unicode?
> 
>   Since basically the only useful thing to do with Unicode (data) is
>   to have _real_ wide strings, with characters at least 16 bits wide
>   _each_ and real character types that reflect real Unicoditude, and
>   since Unicode (the standard) defines pretty much what you can do in
>   the outside world, the question of what it means to be compatible
>   appears to be a question of how each Common Lisp treats _streams_ of
>   Unicode characters.

Well, Bruno mentioned CHAR-WIDTH and STRING-WIDTH.  He also mentioned
the treatment of :EXTERNAL-FORMAT.

Does ACL have these functions?

Cheers

-- 
Marco Antoniotti ===========================================

From: Erik Naggum
Subject: Re: unicode
Date: Wed, 21 Jun 2000 00:00:00 +0000
Message-ID: <3170598344242639@naggum.no>

* Marco Antoniotti <·······@parades.rm.cnr.it>
| Well, Bruno mentioned CHAR-WIDTH and STRING-WIDTH.

  From the names, I guess these are relics of coding systems.  If you
  think you need to work with coding systems, you are mistaken.  If
  you still think you need to work with coding systems, measuring the
  width of characters in bytes is wrong.

| He also mentioned the treatment of :EXTERNAL-FORMAT.

  The various external-formats you will need in the complex world of
  universal character sets are not covered by the standard.  Nor
  should they.  There are, however, several conflicting attempts to
  enumerate them outside of the Lisp world, and it is not necessarily
  useful to standardize on one of those.

| Does ACL have these functions?

  I hope to <deity> that there won't be any char-width or similar
  cruftitude in Allegro 6.0.  If anything, we should have learned from
  the Great Emacs Experience that exposing coding systems internals to
  users is Just Plain Wrong.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.

From: ······@clisp.cons.org
Subject: Re: unicode
Date: Wed, 21 Jun 2000 00:00:00 +0000
Message-ID: <8ir98c$eku$1@news.u-bordeaux.fr>

Erik Naggum <····@naggum.no> wrote:
>| Well, Bruno mentioned CHAR-WIDTH and STRING-WIDTH.
>
>  From the names, I guess these are relics of coding systems.

Naggum, you are guessing wrong, because you neglected to look up the
documentation of the things you are talking about.
   http://clisp.sourceforge.net/impnotes.html#string-width

These functions are needed for anyone wanting to perform tabular output
or word wrapping, assuming an output device with a fixed size font
and a double size font, like kterm or xterm.

>  If you still think you need to work with coding systems, measuring the
>  width of characters in bytes is wrong.

Measuring the width of characters in bytes is *only* useful when you deal
with memory allocation, which you don't normally do in Lisp.

>  If anything, we should have learned from the Great Emacs Experience
>  that exposing coding systems internals to users is Just Plain Wrong.

I agree with you. What do you think about the NATIVE-STRING-SIZEOF
function in ACL 5.0.1
(see http://www.franz.com/support/documentation/5.0.1/doc/cl/iacl.htm) ?

Bruno
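
To make the distinction concrete, display width counts terminal columns,
not bytes. The sketch below is not CLISP's STRING-WIDTH and does not
reproduce wcwidth(). It is a deliberately crude approximation that treats a
handful of East Asian blocks as two columns wide and everything else as
one, ignoring combining and control characters. The names
APPROX-CHAR-COLUMNS and APPROX-STRING-COLUMNS are hypothetical, and the
ranges assume that character codes are Unicode code points.

```lisp
;; Sketch: approximate terminal column count of a string.
;; The ranges below are a rough, incomplete selection of wide blocks.
(defun approx-char-columns (ch)
  "Guess how many terminal columns CH occupies: 2 for wide blocks, else 1."
  (let ((code (char-code ch)))
    (if (or (<= #x1100 code #x115F)    ; Hangul Jamo initial consonants
            (<= #x3040 code #x30FF)    ; Hiragana and Katakana
            (<= #x4E00 code #x9FFF)    ; CJK Unified Ideographs
            (<= #xAC00 code #xD7A3)    ; Hangul syllables
            (<= #xFF01 code #xFF60))   ; Fullwidth forms
        2
        1)))

(defun approx-string-columns (string)
  "Sum the approximate column widths of the characters in STRING."
  (loop for ch across string
        sum (approx-char-columns ch)))
```

For plain ASCII the result equals the string length, so
(approx-string-columns "abc") is 3. A single ideograph such as the one at
code #x4E2D counts as 2 columns in a Lisp that can represent it, however
many bytes it takes in any particular encoding.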

From: Erik Naggum
Subject: Re: unicode
Date: Wed, 21 Jun 2000 00:00:00 +0000
Message-ID: <3170612852124747@naggum.no>

* Bruno Haible
| Naggum, you are guessing wrong, because you neglected to look up the
| documentation of the things you are talking about.

  I love you, too.

| Measuring the width of characters in bytes is *only* useful when you deal
| with memory allocation, which you don't normally do in Lisp.

  Well, gee, _I_ think you may not have paid attention to the Great
  Emacs Experiment, but you wouldn't do something that would even make
  it _possible_ to claim you haven't looked up the documentation of
  the things you're talking about, would you?  Nah, of course not.

| I agree with you.

  I don't generally consider that comforting.  This is no exception.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.

From: ······@clisp.cons.org
Subject: Re: unicode
Date: Wed, 21 Jun 2000 00:00:00 +0000
Message-ID: <8ir7vf$dsj$1@news.u-bordeaux.fr>

Marco Antoniotti <·······@parades.rm.cnr.it> asked:
>
> Now the question is: are CLisp, ECLipse and ACL compatible in their
> treatment of Unicode?

Let me try to give a summary of the features in
  - CLISP 2000-03-06
  - Eclipse
  - ACL 6.0 (not yet released)
  - LispWorks 4.0.1

* Character and string types

  In Eclipse, ACL, LispWorks the type BASE-CHAR includes only Latin-1
  characters, whereas the CHARACTER type includes all of Unicode (16 bit).

  In CLISP, BASE-CHAR and CHARACTER are equivalent and include all of
  Unicode (16 bit). The memory representation of read-only strings
  (e.g. symbol print names and program literals) is optimized to 1
  byte/character if possible.

* Supported external formats of streams

  - CLISP 2000-03-06: Around 80 external formats, including all of the
    ones supported by browsers and Linux locales.
  - Eclipse: Only :ASCII (1 byte/character), :UCS (2 bytes/character),
    and :MULTI-BYTE (locale dependent multibyte representation, works
    only on OSes for which wchar_t is Unicode).
  - ACL 6.0: Lots of external formats, mostly table-driven.
  - LispWorks: Around 10 external formats, including Latin-1, Unicode
    (2 bytes/character), UTF-8, and the most important Japanese encodings
    (but not ISO-2022-JP).

  Different end-of-line conventions are indicated to OPEN through the
  :external-format argument in CLISP and LispWorks, and through an extra
  argument to OPEN in Eclipse.

* Additional API

  - CLISP: STRING-WIDTH returns the display width of a string, used by
    FORMAT ~T.
  - Eclipse: none.
  - ACL: unknown.
  - LispWorks: functions for guessing the encoding of a file (important
    for Japanese environments)

* FFI support

  - CLISP: FFI can pass strings only with single-byte encodings.
  - Eclipse, ACL: unknown
  - LispWorks: a few specialized macros for passing strings from/to C.

Bruno
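
Because the set of external formats differs so much between these Lisps,
code meant to run on more than one of them usually keeps the format
designator as a parameter instead of hard-coding it. The following is a
minimal sketch under that assumption. WRITE-TEXT-FILE and READ-TEXT-FILE
are hypothetical helpers, and :DEFAULT is the only :external-format value
the standard itself defines. Any other designator has to come from the
documentation of the Lisp in use.

```lisp
;; Sketch: text file I/O with the external format passed in by the caller.
;; Only :DEFAULT is portable; other designators are implementation-specific.
(defun write-text-file (path lines &optional (external-format :default))
  "Write each string in LINES to PATH, one per line."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :element-type 'character
                            :external-format external-format)
    (dolist (line lines)
      (write-line line out))))

(defun read-text-file (path &optional (external-format :default))
  "Return the lines of PATH as a list of strings."
  (with-open-file (in path :direction :input
                           :element-type 'character
                           :external-format external-format)
    (loop for line = (read-line in nil nil)
          while line
          collect line)))
```

A caller would then pass whatever its implementation calls the desired
encoding as the optional argument, which keeps the implementation-specific
part in one place.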

From: Pekka P. Pirinen
Subject: Re: unicode
Date: Fri, 23 Jun 2000 00:00:00 +0000
Message-ID: <ixhfakjzap.fsf@harlequin.co.uk>

······@clisp.cons.org writes:
> Let me try to give a summary of the features in
>   - CLISP 2000-03-06
>   - Eclipse
>   - ACL 6.0 (not yet released)
>   - LispWorks 4.0.1

Current version is LW 4.1, but the differences should be small.  Liquid
5.0 has many of the same interfaces.

> * Supported external formats of streams
>   - LispWorks: Around 10 external formats, including Latin-1, Unicode
>     (2 bytes/character), UTF-8, and the most important Japanese encodings
>     (but not ISO-2022-JP).

LW support would probably help you, if you needed another external
format.  There's a relatively painless way of adding one.

Also, on Windows, all the installed codepages are available as
external formats.

> * Additional API
>   - LispWorks: functions for guessing the encoding of a file (important
>     for Japanese environments)

Also functions for code conversions (see package EXTERNAL-FORMAT in
the Reference Manual), lots of string and character types and
predicates (package LISPWORKS), and *DEFAULT-CHARACTER-ELEMENT-TYPE*
for controlling the default size of strings &c.

> * FFI support
>   - LispWorks: a few specialized macros for passing strings from/to C.

That's true, but lest people think that describes a limitation, the
FLI is pretty C-oriented, and the macros provided together with the
foreign types :EF-MB-STRING and :EF-WC-STRING allow passing of strings
in any encoding.  The types take an external-format parameter that
defaults to the encoding used by the current C locale (assuming you
tell LW what that is, see SET-LOCALE in the FLI manual).
-- 
Pekka P. Pirinen, Adaptive Memory Management Team, Harlequin Limited
Controlling complexity is the essence of computer programming.
  - Kernighan

From: Steven M. Haflich
Subject: Re: unicode
Date: Sat, 24 Jun 2000 00:00:00 +0000
Message-ID: <3954E346.BEF30BB9@pacbell.net>

······@clisp.cons.org wrote:
> 
> Marco Antoniotti <·······@parades.rm.cnr.it> asked:

> Let me try to give a summary of the features in
>   - CLISP 2000-03-06
>   - Eclipse
>   - ACL 6.0 (not yet released)
>   - LispWorks 4.0.1
> ...

I'm sorry to say that Bruno's information about ACL (both 5.0 and 6.0)
is incorrect in a number of regards.  Anyone wanting accurate information
should get it directly from Franz, not usenet.