Reviews for lisp implementations

From: Loh Yoon Chao, Peter
Subject: Reviews for lisp implementations
Date: Wed, 14 Apr 1999 00:00:00 +0000
Message-ID: <3714671D.136215D2@singnet.com.sg>

Hi,
    Could someone recommend any good, independent
sites for the above?  In particular, I'm looking
for product comparisons between Harlequin's and
Franz's implementations.  I had tried the ALU site
(and the rest of the web) for a few hours without
any success.  Thanks in advance.

Best regards,
Peter

From: Arthur Lemmens
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <3715A6F9.51E830D0@simplex.nl>

"Loh Yoon Chao, Peter" wrote:

> In particular, I'm looking for product comparisons between 
> Harlequin's and Franz's implementations.  I had tried the ALU site
> (and the rest of the web) for a few hours without
> any success.  

Apart from Usenet snippets, the only product comparison I'm aware of
is by David Lamkins (http://www.teleport.com/~dlamkins). Unfortunately,
it's probably too old to be of any use.

I've waited two days for people with more experience to shed some
light here. But, apparently, nobody is willing to burn his fingers 
on a comparison between Harlequin and Franz. So here's my (very 
personal and very subjective) impression, based on about 1000 hours 
of working with Harlequin's Lispworks, 50 hours of experiments with 
Franz' previous version (don't remember version number) for Windows 
and about 5 hours of playing with Franz' current version. All of this 
on Windows 95/98.

* Price 
  Franz is a lot more expensive than Harlequin (at least a few thousand
vs. less than one thousand dollars). Also, Franz wants royalties for 
programs that you distribute; Harlequin doesn't (unless you use their 
Enterprise Edition).
  For personal use, both companies have a free version.

* Conformance to standards.
  My impression is that both companies are pretty good at conforming
to the ANSI spec, but that Harlequin takes it a bit more seriously
than Franz. Harlequin seems to be much better at supporting Unicode
and other character sets. Franz still seems to think that 256 characters
is more than enough (just like Bill Gates thought that 640K is more
than anyone would ever need).

* Integration with underlying platform
  My impression is that Franz puts more effort into this than Harlequin.
For Windows, Franz seems to support more platform-specific stuff
(e.g. multimedia extensions, tree views). Also, their development
environment has a more 'natural' feel.

* Performance
  I haven't run any benchmarks, but Lispworks feels a bit more 
sluggish (both in space and speed) than Allegro CL.

If money didn't matter, I would use Allegro for platform-dependent 
stuff and Lispworks for everything else. Personally, I can't afford
Allegro and I've settled for Lispworks. I've never regretted buying
it.

I'll be happy to have my impressions corrected by people who know
better.

Arthur Lemmens

From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <3133160386747264@naggum.no>

* Arthur Lemmens <·······@simplex.nl>
| I've waited two days for people with more experience to shed some
| light here.  But, apparently, nobody is willing to burn his fingers 
| on a comparison between Harlequin and Franz.

  that's because this is the kind of stuff lawsuits are made of.  you need
  a protective wrapper of serious legal quality to dive into this matter of
  comparing products in general.  not that I think Franz or Harlequin will
  sue anyone, but most professionals are aware of the problems of comparing
  products, and consequently avoid it, at least in public.

| So here's my (very personal and very subjective) impression, based on
| about 1000 hours of working with Harlequin's Lispworks, 50 hours of
| experiments with Franz' previous version (don't remember version number)
| for Windows and about 5 hours of playing with Franz' current version.
| All of this on Windows 95/98.

  although extremely important to inform your readers of (thanks), this
  makes your comparison "weak".  (I wouldn't be able to provide a stronger
  comparison, by the way.)

| * Price 

  price comparisons are more dangerous than any other comparisons.

| * Conformance to standards.

  this comparison should be performed by someone very familiar with the
  standard and its semantics, because impressions of non-conformance may
  actually be within the bounds of conformance, and some non-conformances
  may be insignificant and easily fixed if the vendor is alerted to them.

| My impression is that both companies are pretty good at conforming to the
| ANSI spec, but that Harlequin takes it a bit more seriously than Franz.

  this is _very_ difficult to establish from watching the products, as it
  refers to intentions and future, not to the past.  it _is_ fair to say
  that Harlequin's LispWorks conforms better to the specification in some
  areas than Franz's Allegro CL does, but it has to be an area-by-area
  comparison to be fair, and the severity of the non-conformance is also
  important for a fair comparison.  e.g., _my_ impression is that Allegro
  CL has a weaker safe mode (not all errors signal errors as they should)
  than one could hope for, but this is not an area where I need it, so it
  may or may not matter to a particular programmer.  (incidentally, I know
  that Franz Inc _is_ taking conformance seriously and I'm working with
  them to help us all get there.)

| Harlequin seems to be much better at supporting Unicode and other
| character sets.

  although very valuable for a user, this is not about conformance to the
  ANSI Common Lisp standard.  it is therefore important to state what you
  expect from a product.

| Franz still seems to think that 256 characters is more than enough (just
| like Bill Gates thought that 640K is more than anyone would ever need).

  such parenthetical remarks, however, make your "comparison" nigh useless.

  incidentally, Franz Inc has an "international" (= Japanese) version that
  covers the need of most present non-Latin speakers.  (I have had to do a
  little home-brewing to get ISO 8859-1 working as I want it to in Allegro
  CL, but I don't know whether LispWorks is any better.)
  
| * Integration with underlying platform

  this is a valuable comment to a user.

| * Performance

  comparisons here are fraught with danger and should be performed with
  published code and all sorts of things.  e.g., some property that makes
  it feel "sluggish" could be extremely easy to fix, and other properties
  can be very hard to change because they are pandemic to the design.  I
  think performance comparisons are _generally_ unfair, because after you
  have decided on a product, you learn how to make it faster.

| I'll be happy to have my impressions corrected by people who know
| better.

  I don't want to snap at you, but it's a _lot_ safer to talk to the person
  requesting a comparison and let it be a personal exchange, rather than
  post impressions and request correction; it usually requires a huge
  effort to correct simple misimpressions.  this is why comparisons often
  produce a tremendous amount of noise on the newsgroups.  also, most user
  impressions are exceedingly hard to quantify, and a lot of factors come
  into play.

  incidentally, I haven't had the opportunity to compare Allegro CL with
  much anything else.  (I went from CMUCL 17f to Allegro CL 4.3 and it was
  a world of difference, so I don't even consider CMUCL possible to compare
  in the area I think matters the most: the development environment.)  I
  get the performance I need, and I get the support I need from Franz Inc
  whenever I wonder about something or find a problem, and I see no reason
  to go look for a competing product.  now, this is more an accident of
  history than anything else, so it does in no way preclude similar
  experiences with Harlequin -- it just didn't happen to me.  my guess is
  that this is how most user impressions are formed: luck and good timing.

#:Erik
-- 
environmentalists are much too concerned with planet earth.  their geocentric
attitude prevents them from seeing the greater picture -- lots of planets are
much worse off than earth is.

From: Arthur Lemmens
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <37164EF3.F15982CB@simplex.nl>

Erik Naggum wrote:

>   price comparisons are more dangerous than any other comparisons.

Would you care to explain?
I would think that price is just about the only thing you can 
compare without the risk of giving "misimpressions".

>   this comparison should be performed by someone very familiar with the
>   standard and its semantics
>   [...]
>   but it has to be an area-by-area comparison to be fair, and the 
>   severity of the non-conformance is also important for a fair comparison. 

I can't disagree with this, of course.
But I tried to make it clear that I was giving my personal impression
and not attempting to make a fair comparison. (I couldn't possibly find 
the time for a fair comparison, but I didn't want to leave the original 
question unanswered.)

> | Franz still seems to think that 256 characters is more than enough (just
> | like Bill Gates thought that 640K is more than anyone would ever need).
> 
>   such parenthetical remarks, however, make your "comparison" nigh useless.

Sorry, I shouldn't have said that. 
This wasn't the right place to vent my frustration about the slow 
acceptance of a decent international character set.

>  (I have had to do a little home-brewing to get ISO 8859-1 working 
>  as I want it to in Allegro CL, but I don't know whether LispWorks 
>  is any better.)

I'm so glad that I only need to type 
  (code-char #x41A) 
to actually get a Russian K that I've forgiven Lispworks for 
returning NIL when I ask
  (alpha-char-p *)

But it  _does_ know something about Latin 1:

CL-USER 17 > (code-char #xF0)
#\ð

CL-USER 18 > (char-upcase *)
#\Ð

>   it's a _lot_ safer to talk to the person requesting a comparison 
>   and let it be a personal exchange, rather than post impressions 
>   and request correction;

Thanks for the advice. I don't know if I will actually follow it,
though. Having a public discussion increases the chance that _I_ can 
learn something as well. E.g., if I had sent my remarks privately, 
I wouldn't have learnt from you that Franz has an international 
version of Allegro CL.

>  (I went from CMUCL 17f to Allegro CL 4.3 and it was a world of 
>   difference, so I don't even consider CMUCL possible to compare
>   in the area I think matters the most: the development environment.)

Let's hope the CMUCL maintainers won't sue you for this remark ;-)

Arthur Lemmens

From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <3133200321383041@naggum.no>

* Arthur Lemmens <·······@simplex.nl>
| Would you care to explain?  I would think that price is just about the
| only thing you can  compare without the risk of giving "misimpressions".

  there are all sorts of pricing policies around, depending on who you are
  (student, commercial, educational), where you are (United States, Europe,
  Asia), how much you want to buy (trial, student, professional, enterprise
  edition), etc, etc.  it's actually difficult to compare the price that
  you would have to pay unless you're the person buying something and in
  position to weigh alternatives.  for instance, one might find that some
  add-on product is not worth the price from one vendor and roll your own,
  while from another vendor the price is acceptable.  the result may be
  that the former costs less from the vendor than the latter, but more
  after life-cycle costs for the new code are accounted for, but not with
  the initial prices, only.  stuff like this is why large companies have
  acquisitions departments who work like hell to get good package deals.

| But it  _does_ know something about Latin 1:
| 
| CL-USER 17 > (code-char #xF0)
| #\ð
| 
| CL-USER 18 > (char-upcase *)
| #\Ð

  good.  Allegro CL does this correctly only with my personal fixes.
  (which, incidentally, supports the entire ISO 8859 family, one by one,
  once properly invoked.)

| E.g., if I had sent my remarks privately, I wouldn't have learnt from
| you that Franz has an international version of Allegro CL.

  valid point.  however, it would have been prudent to ask Franz Inc if you
  tried to write a fair comparison.

#:Erik

From: Vassil Nikolov
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <7f5t8d$fet$1@nnrp1.dejanews.com>

Off-topic.

In article <·················@simplex.nl>,
  Arthur Lemmens <·······@simplex.nl> wrote:

(...)
> I'm so glad that I only need to type
>   (code-char #x41A)
> to actually get a Russian K

Cyrillic K, please.  There are a number of nations besides the
Russians, including Byelorussians, Macedonians, Serbs, Ukrainians,
as well as Bulgarians, who use (different versions of) this alphabet.

I am too lazy to look it up, but I believe the ISO 10646 name of
this character is CYRILLIC CAPITAL LETTER KA or something.

Historical Note:
The original version of the Cyrillic alphabet was developed in the
9th century on the basis of the Greek alphabet.  Its name is a
tribute to St. Cyril, the Eastern Roman scholar and missionary who
captured the phonetics of the (then common) Slavonic language
into a writing system (using a different alphabet, now extinct)
and who was a translator of the Bible, and its development is
credited to St. Climent of Okhrid, one of St. Cyril's disciples.

By the way, I am not aware of another alphabet besides the
contemporary Russian version of Cyrillics where the number
of letters is a power of 2 (2^5).

(...)
> But it  _does_ know something about Latin 1:
>
> CL-USER 17 > (code-char #xF0)
> #\ð
>
> CL-USER 18 > (char-upcase *)
> #\Ð
(...)

I'd rather have had the above as

  (char-code (char-upcase (code-char #xF0)))

instead of, or in addition to, the above, which makes little
apparent sense on my Macintosh with a Cyrillic font selected.

--
Vassil Nikolov <········@poboxes.com>

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own

From: Valeriy E. Ushakov
Subject: [off-topic] alphabets &c (Was: Reviews for lisp implementations)
Date: Fri, 16 Apr 1999 00:00:00 +0000
Message-ID: <7f6l5s$cnm$1@news.ptc.spbu.ru>

Vassil Nikolov <········@poboxes.com> wrote:

> Off-topic.

Yes.

> By the way, I am not aware of another alphabet besides the
> contemporary Russian version of Cyrillics where the number
> of letters is a power of 2 (2^5).

[mumbling the alphabet]... hmm, last time I checked - it was 33. :-)

You probably forgot cyrillic-letter-io, which is rarely used in
printed Russian (mostly in texts for children and foreigners and in
ambiguous cases) and is substituted with cyrillic-letter-ie.  Still
it's part of the alphabet/orthography.

SY, Uwe
-- 
···@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen

From: Vassil Nikolov
Subject: Re: [off-topic] alphabets &c (Was: Reviews for lisp implementations)
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <7f911i$6k6$1@nnrp1.dejanews.com>

In article <············@news.ptc.spbu.ru>,
  "Valeriy E. Ushakov" <···@ptc.spbu.ru> wrote:
> Vassil Nikolov <········@poboxes.com> wrote:
(...)
> > By the way, I am not aware of another alphabet besides the
> > contemporary Russian version of Cyrillics where the number
> > of letters is a power of 2 (2^5).
>
> [mumbling the alphabet]... hmm, last time I checked - it was 33. :-)
>
> You probably forgot cyrillic-letter-io, which is rarely used in
> printed Russian (mostly in texts for children and foreigners and in
> ambiguous cases) and is substituted with cyrillic-letter-ie.  Still
> it's part of the alphabet/orthography.

No, I had not forgotten it, I forgot to write something like
`mainstream use,' and I apologise for that.  (I believe I have
(almost) never seen this letter in a publication (and I have read a
_lot_ of Russian texts, still do from time to time) which was
not a children's book, a textbook, a dictionary, or some such.)
Specifically, in the context of the thread, I was thinking of the
32-character block that one sees in 8859-5 etc.

Of course, cyrillic letter io _is_ an integral part of the Russian
alphabet (Ukrainian? Byelorussian?), and I should have mentioned it.
By the way, with all those language reforms, the phrase `last time
I checked' is very appropriate...

--
Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: Arthur Lemmens
Subject: Re: Reviews for lisp implementations
Date: Fri, 16 Apr 1999 00:00:00 +0000
Message-ID: <3716EA3D.22D9BAD5@simplex.nl>

I wrote:
> I'm so glad that I only need to type
>   (code-char #x41A)
> to actually get a Russian K

Vassil Nikolov replied:
> Cyrillic K, please.  There are a number of nations besides the
> Russians, including Byelorussians, Macedonians, Serbs, Ukrainians,
> as well as Bulgarians, who use (different versions of) this alphabet.

Uhm, yes. Sorry. In my situation, (code-char #x41A) is usually a 
Russian K. But next time I'll call it Cyrillic. I sometimes forget 
I'm not the only Cyrillic speaking (;-) Lisp programmer on Usenet.

> I am too lazy to look it up, but I believe the ISO 10646 name of
> this character is CYRILLIC CAPITAL LETTER KA or something.

I looked it up. You're right.

> I'd rather have had the above as
> 
>   (char-code (char-upcase (code-char #xF0)))
> 
> instead of, or in addition to, the above, which makes little
> apparent sense on my Macintosh with a Cyrillic font selected.

Before sending, I verified that the content-type header included 
"charset=iso8859-1" to increase the probability of readers seeing 
what I meant.

Arthur Lemmens

From: Vassil Nikolov
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <7f93n5$8r1$1@nnrp1.dejanews.com>

In article <·················@simplex.nl>,
  Arthur Lemmens <·······@simplex.nl> wrote:
(...)
> In my situation, (code-char #x41A) is usually a
> Russian K. But next time I'll call it Cyrillic.

That sounds like an interesting situation.  If it is _usually_
that, what is it _sometimes_?  Does it ever happen to be a
Bulgarian K?  And what would the difference be, for your purposes,
between a Russian K and a Bulgarian K?  (I'd be hard pressed
to think of such a difference in terms of characters and their
codes.)

(Or do you sometimes use another 16-bit-per-character encoding
where #x41A is the code of some Chinese or Japanese ideogram?)

My point was that unless the context is appropriately specific,
the generic name (Cyrillic) should be used in preference to the
language-specific name (Russian).  In the same way, outside of a
specific context, it is appropriate to say `Roman K' (or `Latin
K'), rather than `English K' (or `Italian K' etc.).

If only the world had simply stuck to the good old Phoenician
alphabet as it was...

(...)
> > I'd rather have had the above as
> >
> >   (char-code (char-upcase (code-char #xF0)))
> >
> > instead of, or in addition to, the above, which makes little
> > apparent sense on my Macintosh with a Cyrillic font selected.
>
> Before sending, I verified that the content-type header included
> "charset=iso8859-1" to increase the probability of readers seeing
> what I meant.

I _did_ see what you meant---but not with my _eyes_ (with the mind's
eye, perhaps, if my mind has one---I have never seen it).

Well, I know I deserve to lose...  Having struggled on too many
occasions with all those 4-5 different Cyrillic encodings that are in
_active_ use around myself (and that are mutually exclusive with the
Roman letters with diacritical marks, for happiness to be complete),
and with all those different EBCDIC-ASCII mappings, etc.^1, I have become
somewhat hypersensitive to not having the character code itself on such
occasions.  I wish ``charset=...'' did work, always.  In a perfect world,
maybe.
__________
^1 the law of perverse solutions (`every problem has one') is also
   applicable here: there are character sets where the codes for
   the Roman _and_ Cyrillic letters A, C, E, etc. (that have the
   same glyphs) are the same...  KOI-8 (and even DKOI) is a blessing
   by comparison.

--
Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: David Fox
Subject: Re: Reviews for lisp implementations
Date: Fri, 16 Apr 1999 00:00:00 +0000
Message-ID: <epn2088xr7.fsf@harlequin.co.uk>

Arthur Lemmens <·······@simplex.nl> writes:

> >  (I have had to do a little home-brewing to get ISO 8859-1 working 
> >  as I want it to in Allegro CL, but I don't know whether LispWorks 
> >  is any better.)

LispWorks uses ISO 8859-1 for files by default. Currently users need
to do some configuration to use other encodings for files.

The internal encoding of LispWorks 4.x is Unicode. It has just one
executable.

> I'm so glad that I only need to type 
>   (code-char #x41A) 
> to actually get a Russian K that I've forgiven Lispworks for 
> returning NIL when I ask
>   (alpha-char-p *)
> 
> But it  _does_ know something about Latin 1:
> 
> CL-USER 17 > (code-char #xF0)
> #\ð
> 
> CL-USER 18 > (char-upcase *)
> #\Ð

Yes, in LispWorks we added the alphabetic property and case-pairs
(beyond those required by the ANSI standard) for Latin-1 only. I
should admit that this is rather half-baked, but allow me to
explain one of the technical problems...

Recall that BASE-STRINGs contain only BASE-CHARs. LispWorks also
provides a 16-bit string type (TEXT-STRING) which can contain all of
Unicode.

There is a particular difficulty (for LispWorks at least) with U+00FF
LATIN SMALL LETTER Y WITH DIAERESIS, which is a BASE-CHAR in LispWorks, yet
its uppercase pair (as defined by Unicode) is an EXTENDED-CHAR. Thus
if we were to make these particular characters BOTH-CASE-P then
STRING-UPCASE etc. could not be relied upon to preserve string
types. That might be acceptable by the ANSI standard (though
potentially dangerous to users whose code used specialized accessors)
but the real killer was NSTRING-UPCASE.
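
To see why a destructive operation is the sticking point, here is a minimal sketch that models a base string as a vector of 8-bit codes. The function names and the 8-bit model are assumptions made for illustration and say nothing about LispWorks internals; the only fact taken from the discussion is that Unicode pairs U+00FF with an uppercase character outside Latin-1 (U+0178).

```lisp
;; Hypothetical model: a "base string" is a vector whose elements
;; must stay in the range 0..255.  Not LispWorks internals.

(defun latin-1-upcase-code (code)
  "Return the uppercase partner of the Latin-1 code point CODE."
  (cond ((<= #x61 code #x7A) (- code #x20))   ; a-z
        ((and (<= #xE0 code #xFE)             ; a-grave through thorn
              (/= code #xF7))                 ; but not the division sign
         (- code #x20))
        ((= code #xFF) #x178)                 ; partner lies outside Latin-1
        (t code)))

(defun nupcase-codes (codes)
  "Destructively upcase the vector CODES in place.
Signal an error when a result does not fit in 8 bits."
  (dotimes (i (length codes) codes)
    (let ((up (latin-1-upcase-code (aref codes i))))
      (unless (< up 256)
        (error "Code #x~X has no place in an 8-bit string." up))
      (setf (aref codes i) up))))
```

In this model, (nupcase-codes (vector #x61 #xF0)) succeeds because both results stay below 256, while (nupcase-codes (vector #xFF)) has nowhere to store its result. A non-destructive version could return a wider vector instead, which is the type-changing behaviour described above for STRING-UPCASE.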

I suppose we could have defined a larger set of alphabetic characters
without such problems, but we didn't. Sorry!

There was some attempt to define extended character case (and other)
functions in the JEIDA Common Lisp Guideline. I don't know if anyone
actually implemented that.

LispWorks users needing case converters beyond Latin-1 should exploit
the fact that the internal encoding is Unicode to write their own
functions using range checks.  
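
A minimal sketch of such a range-check converter for the basic Cyrillic letters follows. The function names are hypothetical, the ranges are the standard Unicode Cyrillic block offsets, and the character-level wrapper assumes an implementation whose CHAR-CODE values are Unicode code points (as stated above for LispWorks 4.x).

```lisp
(defun cyrillic-upcase-code (code)
  "Map a lowercase Cyrillic code point to its uppercase partner.
Other code points are returned unchanged."
  (cond ((<= #x430 code #x44F) (- code #x20))  ; U+0430..U+044F -> U+0410..U+042F
        ((<= #x450 code #x45F) (- code #x50))  ; U+0450..U+045F -> U+0400..U+040F
        (t code)))

(defun cyrillic-char-upcase (char)
  "Character-level wrapper around CYRILLIC-UPCASE-CODE.
Assumes CHAR-CODE yields Unicode code points; falls back to CHAR
if CODE-CHAR returns NIL for the computed code."
  (or (code-char (cyrillic-upcase-code (char-code char)))
      char))
```

Working on integer codes keeps the core mapping independent of what characters a given implementation supports; only the wrapper depends on the internal encoding.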

-- 
Dave Fox                                  Email: ·····@harlequin.com
Harlequin Ltd, Barrington Hall,           Tel:   +44 1223 873879
Barrington, Cambridge CB2 5RG, England.   Fax:   +44 1223 873873
These opinions are not necessarily those of Harlequin.

From: Lars Marius Garshol
Subject: Re: Reviews for lisp implementations
Date: Fri, 16 Apr 1999 00:00:00 +0000
Message-ID: <wk3e20ify6.fsf@ifi.uio.no>

* David Fox
| 
| There is a particular difficulty (for LispWorks at least) with
| U+00FF LATIN SMALL LETTER Y WITH DIAERESIS which is a BASE-CHAR in
| LispWorks yet its uppercase pair (as defined by Unicode) is an
| EXTENDED-CHAR.

This problem is solved in the recently-approved ISO 8859-15, so
providing that as an alternative to 8859-1 may make sense.

--Lars M.

From: Vassil Nikolov
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <7fa9eo$631$1@nnrp1.dejanews.com>

In article <··············@ifi.uio.no>,
  Lars Marius Garshol <······@ifi.uio.no> wrote:
>
> * David Fox
> |
> | There is a particular difficulty (for LispWorks at least) with
> | U+00FF LATIN SMALL LETTER Y WITH DIAERESIS which is a BASE-CHAR in
> | LispWorks yet its uppercase pair (as defined by Unicode) is an
> | EXTENDED-CHAR.
>
> This problem is solved in the recently-approved ISO 8859-15, so
> providing that as an alternative to 8859-1 may make sense.

It's good that it has been solved (well, I shouldn't say that
when I don't know how).  I was never able to understand what
made them use M-DEL for a printable character in the first
place.

--
Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <3133358604132917@naggum.no>

* Vassil Nikolov <········@poboxes.com>
| It's good that it has been solved (well, I shouldn't say that when I
| don't know how).  I was never able to understand what made them use M-DEL
| for a printable character in the first place.

  ISO character sets come in 94-character and 96-character flavors, apart
  from ISO 10646.  the ISO 8859 family uses the ISO 4873 8-bit template,
  with a 94-character set in the left half and a 96-character set in the
  right half.

  in the 94-character set, 2/0 is SPACE and 7/15 is DELETE, both of which
  sort of dual as control and data characters.  in the 96-character set,
  2/0 and 7/15 are data characters.

  if you have a 94-character set and only 7 bits worth of data, the last
  bit is free to be used for other purposes, such as constant zero, parity,
  an application flag, or constant one.  most modern uses are constant zero
  and an application flag.  however, if you use an 8-bit character set, the
  only chance you have at using an application flag is with 10/0 and 15/15,
  in which case you'd probably want a non-breaking space and what IBM calls
  EO (eight ones), used as an "end of whatever" signal.  referring to 15/15
  as "M-DEL" regardless of whether it is a character or EO betrays a
  serious conceptual confusion about the usage of the code space.
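
The column/row notation used above (2/0, 7/15, 10/0, 15/15) names a position in a code table with 16 rows per column. A small sketch of the arithmetic, with hypothetical function names:

```lisp
(defun code-position (column row)
  "Return the integer code for position COLUMN/ROW in a 16-row code table."
  (check-type column (integer 0 15))
  (check-type row (integer 0 15))
  (+ (* 16 column) row))

(defun position-of-code (code)
  "Return the column and row of CODE as two values."
  (check-type code (integer 0 255))
  (floor code 16))
```

So 2/0 is code 32, 7/15 is 127, 10/0 is 160, and 15/15 is 255; the last two are the corners of the right half that the 96-character set fills with data characters.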

  incidentally, there _is_ no upper-case version of ÿ, just as there is no
  upper-case version of ß.  pining for LATIN CAPITAL LETTER Y WITH DIAERESIS
  is like pining for LATIN CAPITAL LETTER SHARP S -- a symptom of a strong
  inability to deal with practical matters and to understand the sometimes
  _very_ erratic history of writing systems.

  not that Vassil or anyone here is particularly to blame for this, but the
  history of the æ, oe (not in 8859-1 because some French moron told ECMA
  it wasn't needed and shouldn't be there, and then we got × and ÷ stuck in
  the middle of the O's, only to have the smart French guy who designed
  this stuff return fully recuperated after some serious accident or other,
  only the voting had completed, to demand an 8859 member with OE and oe --
  which they got from ISO after a few years, but which nobody uses, not
  even the French¹), and � are one of diphthongs that merged over the course
  of centuries and then assumed phonemes of their own.  ae -> æ in Denmark
  and Norway are almost the same as ä in Sweden, but different from ä in
  Germany (and the decoration used to be different, too, until ECMA had
  enough of it).  the French oe has a long and arduous story I don't know
  in detail, but it's not unlike ö in Germany.

  now, ÿ is not a y with diaeresis at all.  it has more in common with et
  (&) and ad (@) than y, since it's "ij" written together.  in Belgium and
  the Netherlands, it is pronounced like the English long I.  of course, as
  time goes by, various stupid people will do all kinds of stupid things,
  and in this case, we have the _reverse_ of what happened in France when
  some genius² decided that capital letters should not have accents because
  that was too hard to do with early typewriters and printers -- this has
  since been reversed when computers learned how to handle French.  so now
  that we have these nifty computerized thingamajigs, let's just forget
  that neither I nor J have dots on them, even though i and j do (despite
  the linguist³ who decided that Turkish i and j should upcase to I and J
  with dots, but I and J should downcase to i and j without dots, which I
  think is at least part of the reason awful movies get Turkey awards), so
  the nifty computers should produce a _really_ historically moronic letter
  that nobody in their right mind would ever want to use.

  so, the single cluon in danger of being annihilated by swarms of morons
  upon contact is that just as ß is upcased to SS, ÿ is upcased to IJ.

[ this article was best viewed with an ISO 8859-1 capable font. ]

#:Erik
-------
¹ the moral of this story is either to keep the morons away from standards
  bodies or not to have serious accidents if you're the only smart guy in
  France.
² read: moron -- it wasn't the only smart guy in France alluded to above.
³ another moron; wouldn't surprise me if he was French.

From: Philip Lijnzaad
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <u7iuau27uq.fsf@ebi.ac.uk>

[ interesting thread, this ]

On 17 Apr 1999 17:23:24 +0000, 
"Erik" == Erik Naggum <····@naggum.no> writes:

Erik> now, ÿ is not a y with diaeresis at all.  it has more in common with et
Erik> (&) and ad (@) than y, since it's "ij" written together.  

Being Dutch, I probably should have known or figured this out, but I didn't;
I always thought it was a Turkish letter.  I don't know who invented the
graphical form of this letter (ÿ), but it probably wasn't a Dutchman. In
actual practice, "ij", although one letter (actually, diphthong), is *always*
typed and typeset as an i followed by a j. As far as I'm concerned, I'd be
happy to cede this ascii value to more important purposes (capital sharp s?)
When upcased, both i and j have to be upcased (which is rare, but a good
example is 'IJsselmeer', the big watery hole in the middle of
Holland^H^H^H^H^H^H^H^HThe Netherlands). However, most dictionaries sort the
'ij' as two separate letters. Confusing, sort of. 
                                                                      Philip
-- 
To accurately forge this signature, use a lucidatypewriter-medium-12 font
-----------------------------------------------------------------------------
Philip Lijnzaad, ········@ebi.ac.uk | European Bioinformatics Institute
+44 (0)1223 49 4639                 | Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           | Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53

From: Lars Marius Garshol
Subject: Re: Reviews for lisp implementations
Date: Sun, 18 Apr 1999 00:00:00 +0000
Message-ID: <wk4sme43re.fsf@ifi.uio.no>

* Erik Naggum
|
| now, ÿ is not a y with diaeresis at all.  it has more in common with et
| (&) and ad (@) than y, since it's "ij" written together.  

* Philip Lijnzaad
| 
| [...] In actual practice, "ij", although one letter (actually,
| diphthong), is *always* typed and typeset as an i followed by a j. As
| far as I'm concerned, I'd be happy to cede this ascii value to more
| important purposes (capital sharp s?)  When upcased, both i and j
| have to be upcased [...]. However, most dictionaries sort the 'ij'
| as two separate letters. Confusing, sort of.

Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
separate letter after Z.  Can you elaborate on whether both happen or
whether I've been misinformed?

And if it's really sorted separately then I think it makes sense to
consider it a separate character, as Unicode more or less does
(although it calls it a ligature): U+0132 and U+0133.

--Lars M.

From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Sun, 18 Apr 1999 00:00:00 +0000
Message-ID: <3133424944726306@naggum.no>

* Lars Marius Garshol <······@ifi.uio.no>
| And if it's really sorted separately then I think it makes sense to
| consider it a separate character, as Unicode more or less does
| (although it calls it a ligature): U+0132 and U+0133.

  this is getting a bit far afield, but collation order, characterness, and
  glyphness are distinct properties of a writing system element.  for one
  thing, there is no _single_ correct collation order.  character sets do
  _not_ imply collation order.  characterness of a writing system element
  is a fairly fundamental concept and is strongly associated with meaning.
  glyphness of a writing system element is strongly associated with looks.
  finally, fonts are made up of instantiations of glyphs.  e.g., a writing
  system element may exhibit so different meanings that they deserve to be
  separate characters, although this is very rare.  in general, there is
  also one glyph per character, although some have more (the German short
  and long s, the open and baggy a, the open and broken vertical line), but
  more frequent is a glyph for a sequence of characters (ligatures in Latin
  scripts, but includes vowels in Indic scripts and Hebrew) or a character
  in context (the connectives (single, initial, medial, final) in Arabic
  scripts), etc.  collation order is tightly coupled with character, but
  for hysterical raisins many languages collate sequences of characters as
  a single unit.  to represent all of this correctly, you need a whole
  bunch of tables.  there are therefore glyph set standards that are very
  separate from character set standards, and their mapping is non-trivial.
  there are huge tables of correct collation orders for different scripts
  and languages (French requires a five-level deep collation system in full
  name and dictionary sorting), and conflation of representation makes up
  most of it (e.g., no significance is attached to the ring in "Ångström"
  in an English dictionary, where it is sorted with Angst, but you'll find
  it at the end of a Norwegian one because Å is a separate character).

  Unicode is a hybrid of a character and a glyph set.  the reason for this
  is fairly obvious when you consider its major proponents: Xerox and
  Microsoft.  Xerox makes printers and wanted a simple standard for which
  they could make huge fonts.  Microsoft are just too damn stupid to get it
  right or to respect any traditions.  (Xerox didn't want it to replace the
  first ISO 10646 draft, however, so they may be excused.)  in typical "is
  this a font or what?"-misunderstanding, æ was a ligature in Unicode, but
  I complained about it, so ISO 10646-1 has amended it to be a letter, and
  "ij" is a character, not a presentation form, which it should have been.

#:Erik

From: Arthur Lemmens
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <371C5BAC.F56EFC96@simplex.nl>

* Erik Naggum
|
| now, ÿ is not a y with diaeresis at all.  it has more in common with et
| (&) and ad (@) than y, since it's "ij" written together.

* Philip Lijnzaad
|
| [...] In actual practice, "ij", although one letter (actually,
| diphthong), is *always* typed and typeset as an i followed by a j. As
| far as I'm concerned, I'd be happy to cede this ascii value to more
| important purposes (capital sharp s?)  When upcased, both i and j
| have to be upcased [...]. However, most dictionaries sort the 'ij'
| as two separate letters. Confusing, sort of.

* Lars Marius Garshol
|
| Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
| separate letter after Z.  

Not that any of this has much to do with Lisp, but:

- U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode
  standard as being French, not Dutch. This probably explains why 
  Philip didn't recognize it as a Dutch letter. It also casts some
  doubt on Erik's explanation that it's "ij" written together. 
  I suppose we have to wait for the French to tell us more about this
  (I read some French from time to time, but I don't recall ever 
   having seen a ÿ.)

- The Unicode version of Dutch 'ij', which _is_ "ij" written together
  and is probably what Erik had in mind, is U+0133.  Its upper case 
  equivalent is U+0132.  

- IJ is _never_ sorted as a separate letter after Z. Maybe, sometimes,
  it has been sorted as Y (between X and Z). Modern dictionaries sort
  it as I followed by J. So you have '("iets" "ijdel" "ijsje" "ik").

- When a Dutchman doesn't have a U+0133 handy (which is very likely),
  he just uses #\i followed by #\j. As in "ijsje". If this needs 
  capitalizing, he'll use #\I followed by #\J. Capitalizing the
  above list would result in '("Iets" "IJdel" "IJsje" "Ik").
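
A minimal Common Lisp sketch of the last two points, assuming plain
#\i followed by #\j rather than U+0133.  DUTCH-CAPITALIZE is a made-up
name, not a standard function, and it only looks at the start of a word:

```lisp
;; Hypothetical helper (not part of ANSI CL): capitalize a Dutch word,
;; treating a leading "ij" as one unit so that both letters are upcased.
(defun dutch-capitalize (word)
  (cond ((and (>= (length word) 2)
              (string-equal "ij" word :end2 2))
         ;; Leading i+j (in any case) becomes "IJ".
         (concatenate 'string "IJ" (subseq word 2)))
        ((plusp (length word))
         ;; Ordinary word: upcase only the first character.
         (concatenate 'string
                      (string (char-upcase (char word 0)))
                      (subseq word 1)))
        (t word)))

;; Plain character-by-character STRING< already sorts "ij" as i then j.
(defun sorted-and-capitalized (words)
  (mapcar #'dutch-capitalize (sort (copy-list words) #'string<)))
```

Calling (sorted-and-capitalized '("ik" "ijsje" "iets" "ijdel")) gives
("Iets" "IJdel" "IJsje" "Ik"), matching both lists above.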

* Lars Marius Garshol
|
| And if it's really sorted separately then I think it makes sense to
| consider it a separate character, as Unicode more or less does
| (although it calls it a ligature): U+0132 and U+0133.

For _capitalization_ it makes some sense to consider it a separate
character. But _sorting_ will be much more likely to go wrong
when you use a separate character.

Arthur Lemmens

From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <3133600396484611@naggum.no>

* Arthur Lemmens <·······@simplex.nl>
| Not that any of this has much to do with Lisp, but:
| 
| - U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode
|   standard as being French, not Dutch.

  I said _from_ Dutch "ij".  it's an _imported_ character.  it is used in a
  bunch of names in Belgia that historically had "ij" in their name.

|   It also casts some doubt on Erik's explanation that it's "ij" written
|   together.

  it does?  so the fact that æ is a Danish and Norwegian letter casts doubt
  on its history of being imported from Latin as its a+e ligature, too?
  appreciate that the history of writing systems is not a couple years old.

| - The Unicode version of Dutch 'ij', which _is_ "ij" written together
|   and is probably what Erik had in mind, is U+0133.

  I probably had in mind what I wrote.  so do other people.  please assume
  this next time you feel an overpowering urge to tell people what they
  think.

#:Erik

From: Lieven Marchand
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <m3iuar181o.fsf@localhost.localdomain>

Erik Naggum <····@naggum.no> writes:

> * Arthur Lemmens <·······@simplex.nl>
> | Not that any of this has much to do with Lisp, but:
> | 
> | - U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode
> |   standard as being French, not Dutch.
> 
>   I said _from_ Dutch "ij".  it's an _imported_ character.  it is used in a
>   bunch of names in Belgia that historically had "ij" in their name.
> 

Could you name two please? I live in Belgium (no need to form a
plural) and I've never seen it.

-- 
Lieven Marchand <···@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker

From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <3133637703790224@naggum.no>

* Lieven Marchand <···@bewoner.dma.be>
| Could you name two please?

  not off-hand.  the rationale for ÿ that I have related here is that given
  to ECMA in 1982-6 when formulating and to ISO in 1987 when adopting ISO
  8859-1 through -4.

| I live in Belgium (no need to form a plural) and I've never seen it.
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^

  _very_ good!  never thought of it as a plural.  we call it "Belgia"
  in Norwegian.  I'm sure it's an editing glitch.  I hope it isn't receding
  language skills.  :)

  I have actually seen it, though, which is why I remember the minutes from
  the ISO work.  it's been a while (11 years), and I regret I'm not able to
  recall them in minute detail, anymore.

#:Erik

From: Breanndán Ó Nualláin
Subject: Re: Reviews for lisp implementations
Date: Tue, 27 Apr 1999 00:00:00 +0000
Message-ID: <m3k8uyvzd2.fsf@kotona.demon.nl>

>>>>> "Lieven" == Lieven Marchand <···@bewoner.dma.be> writes:
>>>>> Erik Naggum <····@naggum.no> writes:

    Erik> I said _from_ Dutch "ij".  it's an _imported_ character.  it
    Erik> is used in a bunch of names in Belgia that historically had
    Erik> "ij" in their name.

    Lieven> Could you name two please? I live in Belgium (no need to
    Lieven> form a plural) and I've never seen it.

Kortrijk springs to mind.  Maybe to yours too; is that why you asked
for two examples?  :-) 

I had to glance at a map to find a second one: Nijvel.

Would a Belgian spell these with "y trema"?

From: Casper H.S. Dik - Network Security Engineer
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <casper.924612995@nl-usenet.sun.com>

[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

Lars Marius Garshol <······@ifi.uio.no> writes:

>* Philip Lijnzaad
>| 
>| [...] In actual practice, "ij", although one letter (actually,
>| diphthong), is *always* typed and typeset as an i followed by a j. As
>| far as I'm concerned, I'd be happy to cede this ascii value to more
>| important purposes (capital sharp s?)  When upcased, both i and j
>| have to be upcased [...]. However, most dictionaries sort the 'ij'
>| as two separate letters. Confusing, sort of.

>Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
>separate letter after Z.  Can you elaborate on whether both happens or
>whether I've been misinformed?

I think you've been misinformed; all Dutch dictionaries I've ever seen
as well as "Het Groene Boekje" (the official list of Dutch words) sorts
"ij" as if it's an i followed by a j.

The only exception to this standard rule is the Dutch telephone
book; it sorts the ij as if it is an y.

>And if it's really sorted separately then I think it makes sense to
>consider it a separate character, as Unicode more or less does
>(although it calls it a ligature): U+0132 and U+0133.

I think the Dutch "ij" is a ligature, even though we learn differently
at school.  As you say, both I and J are upcased together as
in "IJmuiden" but that holds true for AE ligatures as well.

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

From: Philip Lijnzaad
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <u74smb2xz9.fsf@ebi.ac.uk>

On 20 Apr 1999 13:02:29 GMT, 
"Casper" == Casper H S Dik <··········@Holland.Sun.Com> writes:

>> Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
>> separate letter after Z.  

No, never.

Casper> all Dutch dictionaries I've ever seen as well as "Het Groene Boekje"
Casper> (the official list of Dutch words) sorts "ij" as if it's an i
Casper> followed by a j.

yes, although I remember having used dictionaries in school that had IJ
between X and Z. It's apparently obsolete now, but:

Casper> The only exception to this standard rule is the Dutch telephone
Casper> book; it sorts the ij as if it is an y.

(didn't know that ... a bit strange and confusing, I'd say)

Casper> As you say, both I and J are upcased together as in "IJmuiden" but
Casper> that holds true for AE ligatures as well.

another point is abbreviations: I'm fairly sure that the Dutch 'ij' would be
abbreviated to 'IJ'. Making up an example: Vereniging ter bevoordering van de
ijspret would be V.B.IJ, not V.B.I. The abbreviation issue must be correlated
with the capitalization issue, and I suspect it would be the same for
ligatures in other languages/scripts.

                                                                      Philip

From: Marco Antoniotti
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <lwpv4z8ii2.fsf@copernico.parades.rm.cnr.it>

#+:noise-ahead
Isn't Dutch a throat disease? :)

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa

From: Lieven Marchand
Subject: Re: Reviews for lisp implementations
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <m3hfqb17z3.fsf@localhost.localdomain>

Marco Antoniotti <·······@copernico.parades.rm.cnr.it> writes:

> #+:noise-ahead
> Isn't Dutch a throat disease? :)
> 

#+:further-noise 
No, we just have a fairly complete set of sounds. It helps to
recognize foreigners.

Schild en Vriend?

-- 
Lieven Marchand <···@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker

From: Lieven Marchand
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <m3676v2bqx.fsf@localhost.localdomain>

Erik Naggum <····@naggum.no> writes:

>   now, ÿ is not a y with diaeresis at all.  it has more in common with et
>   (&) and ad (@) than y, since it's "ij" written together.  in Belgia and
>   the Netherlands, it is pronounced like the English long I.  of course, as
>   time goes by, various stupid people will do all kinds of stupid things,

Except that in the Dutch speaking parts of Belgium and the
Netherlands, everybody writes it as ij. The confusion could have been
started because some morons (this time not even French) collated the
ij combination with the y, although modern dictionaries have stopped
this a long time ago. There is also some difference of opinion on how to
write an uppercase version of this. Some people use Ij but most -
especially in handwriting - will use a variant of uppercase Y with
diaeresis.

BTW: if Gordon's Introduction to Old Norse is accurate and can be
extrapolated to the modern variant, it's rather pronounced as the ei
diphthong in 'bein'.

-- 
Lieven Marchand <···@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker

From: Vassil Nikolov
Subject: Re: Reviews for lisp implementations
Date: Sun, 18 Apr 1999 00:00:00 +0000
Message-ID: <7fbt07$ffn$1@nnrp1.dejanews.com>

In article <················@naggum.no>,
  Erik Naggum <····@naggum.no> wrote:
> * Vassil Nikolov <········@poboxes.com>
> | It's good that it has been solved (well, I shouldn't say that when I
> | don't know how).  I was never able to understand what made them use M-DEL
> | for a printable character in the first place.
(...)
>   however, if you use an 8-bit character set, the
>   only chance you have at using an application flag is with 10/0 and 15/15,
>   in which case you'd probably want a non-breaking space and what IBM calls
>   EO (eight ones), used as an "end of whatever" signal.  referring to 15/15
>   as "M-DEL" regardless of whether it is a character or EO betrays a
>   serious conceptual confusion about the usage of the code space.

I don't know if what it _betrays_ is true (don't have such introspective
capabilities), but what it _is_ is inappropriate use of technical
jargon.  Sorry for that.

Correct me if I am wrong, but the above (quoted) paragraph does not
contradict a statement that using 15/15 for a printable character is
inappropriate.  Or did I miss anything?

(...)
>   _very_ erratic history of writing systems.

But very interesting, and from an information technology point
of view too.  (Writing is an information technology in my book
as this phrase does not necessarily mean computer technology.)

It is hard to encode the barely encodable.  (I.e. to transform
human speech into a sequence of signs.)  I find it interesting
that the same language can be used for speaking and writing.

>   not that Vassil or anyone here is particularly to blame for
    [inadequacies in standardised character sets]

:-)

(This reminded me of some Russian who allegedly said, `Cyril and
Methodius did such a bad thing to us...' (meaning that otherwise
Russians would be using the Roman alphabet, like e.g. the Polish
or the Czech, and be saved from many headaches, perhaps).)
__________
For the Russian-speaking: `Kiril i Metodij nam takoe nadelali...';
St. Methodius was St. Cyril's brother and co-developer/co-translator.

(...)
>   neither I nor J have dots on them, even though i and j do (despite
>   the linguist³ who decided that Turkish i and j should upcase to I and J
>   with dots, but I and J should downcase to i and j without dots, which I

I don't understand your point here.  In the version of the Roman alphabet
as used in Turkey (and adopted by an Act of Parliament from 1928, by the
way), there are two I's: one has dots both in the small and capital case
(and is pronounced as the `i' in `fit') and the other has no dots either
in the small or capital case (and is pronounced as the `i' in `fir' but
short and without any `r' of course).  Whether this is moronic is not
for me to say, but this is the way the Turkish alphabet is.  (As to J
in that alphabet, it has a dot in the small case only.)
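
The two-I scheme just described can be sketched in Common Lisp as
below.  This is only an illustration: the function and variable names
are made up, and it assumes an implementation whose characters cover
Unicode code points #x130 (capital I with dot above) and #x131 (small
dotless i), which ANSI CL does not require.

```lisp
;; Assumption: CODE-CHAR returns a character for these code points.
(defparameter *capital-dotted-i* (code-char #x130))
(defparameter *small-dotless-i*  (code-char #x131))

;; Hypothetical Turkish-aware variants of CHAR-UPCASE / CHAR-DOWNCASE.
(defun turkish-char-upcase (ch)
  (cond ((char= ch #\i) *capital-dotted-i*)      ; dotted stays dotted
        ((char= ch *small-dotless-i*) #\I)       ; dotless stays dotless
        (t (char-upcase ch))))

(defun turkish-char-downcase (ch)
  (cond ((char= ch #\I) *small-dotless-i*)
        ((char= ch *capital-dotted-i*) #\i)
        (t (char-downcase ch))))
```

All other letters, J included, fall through to the ordinary
CHAR-UPCASE and CHAR-DOWNCASE.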

(Turkish is a very rich language, having incorporated a lot from
Arabic and Persian; until Ataturk's reforms in the 1920's, Arabic
script (or some variety thereof) was used for writing.  I do not
know Turkish (apart from a few words), but I have a dictionary and
I know a few facts about its history (of the language, not the
dictionary).)

>   think is at least part of the reason awful movies get Turkey awards), so
>   the nifty computers should produce a _really_ historically moronic letter
>   that nobody in their right mind would ever want to use.

I.e. a small minority would never want to use it, and the majority will
just accept it as the latest and the greatest benefit coming from
computer technology.

>   so, the single cluon in danger of being annihilated by swarms of morons
>   upon contact is that just as ß is upcased to SS, ÿ is upcased to IJ.

I wondered (as an academic exercise) what should CHAR-UPCASE and
NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming
STRING-UPCASE is allowed to return a longer string which isn't
especially nice either).  Signal an error?  Or the implementation
would state that the character sets it uses do not include this
letter?  (Making CHAR-UPCASE return two values, like #\I and #\J
in this case, appears more than perverse, though who knows.)
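
One way to picture the "longer string" option is a string-level
function with a table of special cases, leaving CHAR-UPCASE alone.
The sketch below is hypothetical (UPCASE-EXPANDING and its table are
invented names, not anything an implementation provides) and lists
only the undisputed sharp s case; the character is built with
CODE-CHAR on the assumption that code 223 is available.

```lisp
;; Illustrative table: character -> multi-character uppercase form.
(defparameter *upcase-expansions*
  (list (cons (code-char 223) "SS"))    ; sharp s upcases to "SS"
  "Alist of characters whose uppercase form is longer than one character.")

;; Hypothetical string-level upcase that may return a longer string.
(defun upcase-expanding (string)
  (with-output-to-string (out)
    (loop for ch across string
          for expansion = (cdr (assoc ch *upcase-expansions*))
          do (if expansion
                 (write-string expansion out)
                 (write-char (char-upcase ch) out)))))
```

A six-character "stra", sharp s, "e" comes back as the seven-character
"STRASSE".  A destructive NSTRING-UPCASE cannot do this in place,
since the string would have to grow.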

> [ this article was best viewed with an ISO 8859-1 capable font. ]

I did use one this time, on a different machine.

(...)

--
Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: Erik Naggum
Subject: Re: Reviews for lisp implementations
Date: Sun, 18 Apr 1999 00:00:00 +0000
Message-ID: <3133417387581097@naggum.no>

* Vassil Nikolov <········@poboxes.com>
| Correct me if I am wrong, but the above (quoted) paragraph does not
| contradict a statement that using 15/15 for a printable character is
| inappropriate.  Or did I miss anything?

  yes.  10/0 and 15/15 are characters when the right-hand side of an 8-bit
  character set (GR) is filled with a 96-character set.  (the other 32 are
  control characters (C1).)  if you had filled it with a 94-character set,
  it would have been inappropriate to use 15/15 at all.

  the reason for this is that 10/0 and 15/15 are characters in their own
  right and must be coded with 8 bits, but if you use a shifting coding
  with only 7 bits and codes to swap between G0 and G1 (both now in GL)
  with the codes SO and SI, then it's important that 2/0 and 7/15 remain
  their usual semi-control characters even when G1 is invoked.
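
For readers not used to the column/row notation, here is a small
sketch of the arithmetic.  The function names are made up for
illustration; the areas follow the usual 8-bit layout of C0, GL, C1
and GR.

```lisp
(defun column/row->code (column row)
  "Convert column/row notation (each 0..15) to an 8-bit code value."
  (check-type column (integer 0 15))
  (check-type row (integer 0 15))
  (+ (* 16 column) row))

(defun code-area (code)
  "Name the area of the 8-bit code table that CODE falls in."
  (cond ((< code 32)  :c0)    ; columns 0-1, control characters
        ((< code 128) :gl)    ; columns 2-7, left-hand graphic area
        ((< code 160) :c1)    ; columns 8-9, control characters
        (t            :gr)))  ; columns 10-15, right-hand graphic area
```

So 10/0 is 160 and 15/15 is 255, the two corners of GR that only a
96-character set fills, while 2/0 is 32 (space) and 7/15 is 127 (DEL),
the corresponding corners of GL.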

| I don't understand your point here.

  seems I was mistaken about the up/downcasing of I with/without dots.
  (shoot, gotta check and go back and fix those files for Emacs.)

| I wondered (as an academic exercise) what should CHAR-UPCASE and
| NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming
| STRING-UPCASE is allowed to return a longer string which isn't especially
| nice either).   Signal an error?  Or the implementation would state that
| the character sets it uses do not include this letter?  (Making
| CHAR-UPCASE return two values, like #\I and #\J in this case, appears
| more than perverse, though who knows.)

  I have come to think that people who use sick writing systems should pay
  for their own mistakes so they will have reason to fix them.  forcing
  everybody else to pay for them only causes software not to be available.
  e.g., the Spanish purportedly undid the silly sorting requirements of ll
  (treated as a separate "letter" between k and l, I think it was) due to
  the force of simplicity and logic of computers (or was it marketing :).
  a German spelling reform (which people seem to hate rather strongly) does
  away with the sharp s and spells it "ss" in lowercase, too.  the Norwegian
  and Danish sillitude of sorting "aa" as equivalent to "å" (a ring), and
  the hysterical requirement that German spelled out with "ue" instead of
  "ü" should be sorted as if it wasn't spelled out are examples of morons
  who got into standards bodies.  (now, the right way to do this is to
  store a sort key and a print string, but since people don't use tools
  easily extendible that way, forcing stupid people to do this causes a lot
  of grief and problems when they try to print the sort key or vice versa.)
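
The sort-key-plus-print-string idea can be sketched as follows.
ENTRY and SORT-ENTRIES are hypothetical names, and the keys are
assigned by hand; deriving them for a given language is the hard part
and is not attempted here.

```lisp
;; Hypothetical record pairing what is printed with what is compared.
(defstruct entry
  print-string    ; shown to the user, never compared
  sort-key)       ; used only for ordering, never printed

(defun sort-entries (entries)
  "Return a fresh list of ENTRIES ordered by their sort keys."
  (sort (copy-list entries) #'string< :key #'entry-sort-key))

(defun example-order ()
  ;; "Mueller" carries the key it would have if written with u-umlaut
  ;; and sorted as plain u (an assumption made for this example).
  (mapcar #'entry-print-string
          (sort-entries
           (list (make-entry :print-string "Mueller" :sort-key "muller")
                 (make-entry :print-string "Mutter"  :sort-key "mutter")
                 (make-entry :print-string "Muffel"  :sort-key "muffel")))))
```

(example-order) returns ("Muffel" "Mueller" "Mutter"), whereas sorting
the print strings directly would put "Mueller" first.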

  anyway, let's just ignore the issue and ask them to spell it out as ij,
  like the Dutch correctly do.  (the ÿ is Belgian, _from_ Dutch ij.)  (I'm
  not sure upcasing "ij" to "IJ" is all that great an idea, although it is
  obvious if you look at fonts designed in or for The Netherlands: they
  sport "ij" and "IJ" ligatures, just as fonts designed for Norway have a
  ligature for "fj" just like "fi", because of "fjord" and "fjell".)

  anyway.  8 bits would have been enough if we had been using floating
  diacritics and upcasing and downcasing would have needed to worry about
  A-Z, only.  ISO tried that, too, (ISO 6937) but computer people were not
  able to appreciate it, because they were thinking fonts, not character
  sets.  sigh.

  if there's reincarnation, I hope I won't remember any of this the next
  time around.

#:Erik

From: Juanma Barranquero
Subject: Re: Reviews for lisp implementations
Date: Mon, 19 Apr 1999 00:00:00 +0000
Message-ID: <37205c1c.31713651@news.mad.ttd.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 18 Apr 1999 09:43:07 +0000, Erik Naggum <····@naggum.no> wrote:

>  e.g., the Spanish purportedly undid the silly sorting requirements
>  of ll (treated as a separate "letter" between k and l, I think it
>  was) due to the force of simplicity and logic of computers (or was
>  it marketing :).

Between "l" and "m".

What is stupid, IMHO, is not the fact of having "ll" as a single
letter, but having it so, and the same with "ch" (between "c" and "d")
and then having "rr" as r+r and "qu" as q+u. The sound of most of
those characters is not related to their spelling ("ll" is not an l+l,
etc., and "q" is *never* used in isolation in Spanish, it is *always*
q+u, the only case in Spanish where "u" is mute) so in a coherent
world either "ch", "ll", "rr" and "qu" should each be treated as a
single entity, or none of them at all (perhaps the best solution).

Regarding the reform of the sorting requirement, the Spanish RAE
("Real Academia Española de la Lengua") did it, but I think some
Latin American academies objected and the issue was dropped. Not sure,
though.

                                                       /L/e/k/t/u

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.0.2i

iQA/AwUBNxr0ev4C0a0jUw5YEQJRdQCfWI/MKMWEMIMt4a28s8WlrhWBlZwAn0Fp
+tn5lYZRhWnsoNfQMxuJ7fML
=n4bS
-----END PGP SIGNATURE-----

From: Howard R. Stearns
Subject: string collating (Re: Reviews for lisp implementations)
Date: Tue, 20 Apr 1999 00:00:00 +0000
Message-ID: <371C9BA7.5815BA50@elwood.com>

Vassil Nikolov wrote:
> ...
> I wondered (as an academic exercise) what should CHAR-UPCASE and
> NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming
> STRING-UPCASE is allowed to return a longer string which isn't
> especially nice either).  Signal an error?  Or the implementation
> would state that the character sets it uses do not include this
> letter?  (Making CHAR-UPCASE return two values, like #\I and #\J
> in this case, appears more than perverse, though who knows.)

Careful.  Recall that ANSI CL is an American standard and doesn't make
any attempt to accommodate other collating sequences. 

String-upcase and friends are specifically required to work character by
character, without reference to any context:

  "More precisely, each character of the result string is produced by   
  applying the function char-upcase to the corresponding character of 
  string."

I would have thought that ISO would have addressed this issue more
broadly in ISLisp, but it does not appear that they did.  There is no
string-upcase at all, and string< and friends are specifically defined
to work character by character:

   "Two strings string1 and string2 are in order (string<) if in the first
    position in which they differ the character of string1 is char< the
    corresponding character of string2, ..."

Given that Scheme is an ISO standard, apparently tries to do either the
right thing or nothing at all, and seems to try to not include useful
utilities which are "obvious" compositions or iterations of other
utilities, I would have expected that Scheme either wouldn't have string
operations at all or would have them do the contextually right thing. 
After all, if you just want to map over a sequence with some char< or
such function, just do it.  Of course, I'm wrong.  Scheme also defines
string< to work character by character, but at least it meets my
expectations by failing to define string-upcase at all.  In case I'm
misinterpreting, here's the definition for string<? and friends:

  "These procedures are the lexicographic extensions to strings of the
  corresponding orderings on characters. For example,
  string<? is the lexicographic ordering on strings induced by the ordering
  char<? on characters. If two strings differ in
  length but are the same up to the length of the shorter string, the
  shorter string is considered to be lexicographically less
  than the longer string."
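
To make the quoted definition concrete, here is a sketch in Common
Lisp of an ordering on strings induced purely by an ordering on
characters.  LEXICOGRAPHIC-LESS-P is an invented name; the only hook
it offers is the character predicate, which is exactly why no such
definition can treat a sequence like "ij" or "ll" as a unit.

```lisp
(defun lexicographic-less-p (s1 s2 &optional (char-less-p #'char<))
  "True if S1 precedes S2 in the order induced by CHAR-LESS-P."
  (let ((pos (mismatch s1 s2)))
    (cond ((null pos) nil)              ; identical strings
          ((= pos (length s1)) t)       ; s1 is a proper prefix of s2
          ((= pos (length s2)) nil)     ; s2 is a proper prefix of s1
          (t (funcall char-less-p (char s1 pos) (char s2 pos))))))
```

With the default CHAR<, "ijsje" precedes "ik" because the strings
first differ at the second character, and a shorter string precedes
any longer string it is a prefix of.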

From: Reini Urban
Subject: Re: Reviews for lisp implementations
Date: Fri, 16 Apr 1999 00:00:00 +0000
Message-ID: <3717b01c.47909860@judy>

David Fox <·····@harlequin.co.uk> wrote:
>Yes, in LispWorks we added the alphabetic property and case-pairs
>(beyond those required by the ANSI standard) for Latin-1 only. I
>should admit that this is rather half-baked, ...

If you need the case pairs for some other codepages, 
you can grab these. It's not the full set but you get the idea. 
I haven't found them on the net anywhere so I had to calculate them on
my own. I wrote it for AutoLISP; in CL you could maybe use vectors
instead.

I post this also (instead of linking to it) to let you see how weirdly
some codepages have been designed, considering case predicates and case
conversions. I guess most OSes do it by precalculating the tables, wasting
a lot of bytes.

Note: the third element <tolower> of each triple is the numeric
difference from the uppercase to the lowercase char, 
so (65 90 32) means that there are 26 uppercase chars from 65 to 90
with the lower brothers 32 above (65+32 up to 90+32).
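
For Common Lisp readers, a rough sketch of how triples in this format
can drive a predicate and both conversions.  The function names are
made up, only the ASCII range is used as sample data, and a linear
search over a list stands in for the vectors or bitfields a real
implementation would want.

```lisp
;; Sample table in the (<from> <to> <tolower>) format described above.
(defparameter *cap-ranges-ascii* '((65 90 32)))

(defun find-cap-range (code ranges)
  "Return the triple whose <from>..<to> range contains CODE, or NIL."
  (find-if (lambda (range) (<= (first range) code (second range)))
           ranges))

(defun code-upper-p (code ranges)
  (and (find-cap-range code ranges) t))

(defun code-downcase (code ranges)
  (let ((range (find-cap-range code ranges)))
    (if range (+ code (third range)) code)))

(defun code-upcase (code ranges)
  "Invert the mapping: find the range whose lowercase image holds CODE."
  (let ((range (find-if (lambda (r)
                          (<= (+ (first r) (third r))
                              code
                              (+ (second r) (third r))))
                        ranges)))
    (if range (- code (third range)) code)))
```

Because the offset is stored per range, the same functions cope with
negative offsets and single-character ranges like those in the tables
that follow.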

;;; Hardcoded charset capital letter ranges per codepage,
;;; kind of LC_CTYPE info. Format: list of: (<from> <to> <tolower>)
;;; Found the differences in toupper, tolower, isupper, islower
;;; by scanning the descriptive character names for upper and lower,
;;; unified the pairs into groups and came up with redefinitions
;;; of the upper/lower predicates and conversions.
(setq std:cp-cap-ascii     '((65 90 32)))	; this is simple
			   ;; there's a hole at 215; 223 (sharp s) is itself lowercase
(setq std:cp-cap-iso8859-1 '((65 90 32)(192 214 32)(216 222 32)))
(setq std:cp-cap-iso8859-2 '((65 90 32)(192 214 32)(216 222 32)
                             (161 161 16)(163 163 16)
                             (165 166 16)(169 172 16)(174 175 16)
                             ))
(setq std:cp-cap-iso8859-3 '((65 90 32)(192 214 32)(216 223 32)
                             (161 161 16)(166 166 16)
                             (169 172 16)(175 175 16)
			     ; 0xAE, 0xBE seem to be missing
                             ))

;;; A really weird charset (by IBM), very old.
;;; Thankfully the system-provided strcase should handle this most
;;; of the time (by static table lookup).
(setq std:cp-cap-dos850    '((65 90 32)
	(128 128 7)(142 142 -10)(143 143 -9)(144 144 -14)
	(146 146 -1) (153 153 -5) (154 154 -25) (157 157 -2) 
	(165 165 -1) (181 181 -21) (182 182 -51) (183 183 -50) 
	(185 185 -5) (186 186 -7)
	(188 188 29) (199 199 -1) (209 209 -1)
	(210 212 -74)(214 214 -53) (215 215 -75) (216 216 -77) 
	(222 222 -81)
	(224 224 -62) (226 226 -79) (227 227 -78) (229 229 -1) 
	(231 231 1)
	(233 233 -70) (234 235 -84) (237 237 -1)))
(setq std:cp-cap-iso8859-4 '((65 90 32) (192 214 32)(216 222 32)
			(152 152 71) (161 161 16) 
			(163 163 16) (165 166 16)
		        (169 172 16) (174 174 16) (189 189 2)
		        ;; not tested
		        ))
(setq std:cp-cap-koi-8r    '((65 90 32)
                             (179 179 -16)(224 255 -32)))
;; the weirdest charset ever (by microsoft), ignoring cp866, 
;; iso-8859-5 and koi8-r
(setq std:cp-cap-cp1251    '((65 90 32)(192 223 32)
                             (128 128 16)(129 129 2)
                             (138 138 16)(140 143 16)
                             (161 161 1)(163 163 25)(165 165 15)
                             (168 168 16)(170 170 16)(175 175 16)
                             (178 178 1)(189 189 1)
                             ))
(setq std:cp-cap-dos866    '((65 90 32)
                             (128 144 32)(145 159 80)
                             (240 240 1)(242 242 1)
                             (244 244 1)(246 246 1)
                             ))

;; Beware: dynamic AutoLISP code, just to give you the idea.
;; You really should store the pairs in a bitfield for the
;; predicates and in vectors for the converters.
(defun STD-ISUPPER (_i)
  (if (stringp _i)
    (setq _i (ascii _i)))
  (apply 'or
         (mapcar
           (function (lambda (l)
             (<= (car l) _i (cadr l))))
           std:actual-cp-cap)))

(defun STD-TOUPPER (i / cp x)
  (setq x (car (setq cp std:actual-cp-cap)))
  (while x
    (if (<= (+ (caddr x) (car x)) i (+ (caddr x) (cadr x)))
      (setq i (- i (caddr x))
            x nil)
      (setq cp (cdr cp) x (car cp))
    )
  )
  i
)

this is from http://xarch.tu-graz.ac.at/autocad/stdlib/STDLOCAL.LSP
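
Following the remark above that CL could use vectors, here is a minimal
Common Lisp sketch of the precalculated-table approach. It is not part of
STDLOCAL.LSP: the names *KOI8R-CAPS* and MAKE-CASE-TABLES are made up for
illustration, and the only data taken from the post is the koi-8r triple
list. It assumes an 8-bit codepage, so every table has 256 entries.

```lisp
;; Same (<from> <to> <tolower>) triples as std:cp-cap-koi-8r above.
(defparameter *koi8r-caps* '((65 90 32) (179 179 -16) (224 255 -32)))

(defun make-case-tables (ranges)
  "Return three values built from RANGES: a bit-vector that is 1 for
uppercase codes, a to-lower vector and a to-upper vector (256 entries each)."
  (let ((upper-p  (make-array 256 :element-type 'bit :initial-element 0))
        (to-lower (make-array 256 :element-type '(unsigned-byte 8)))
        (to-upper (make-array 256 :element-type '(unsigned-byte 8))))
    ;; Start with identity mappings: codes without a case pair map to themselves.
    (dotimes (code 256)
      (setf (aref to-lower code) code
            (aref to-upper code) code))
    ;; Fill in every uppercase code and its lowercase partner.
    (loop for (lo hi delta) in ranges
          do (loop for code from lo to hi
                   do (setf (sbit upper-p code) 1
                            (aref to-lower code) (+ code delta)
                            (aref to-upper (+ code delta)) code)))
    (values upper-p to-lower to-upper)))
```

Usage, again only as a sketch:

```lisp
(multiple-value-bind (upper-p to-lower to-upper)
    (make-case-tables *koi8r-caps*)
  (list (sbit upper-p 224) (aref to-lower 224) (aref to-upper 163)))
;; => (1 192 179)
```

The predicate costs 256 bits and each converter 256 bytes per codepage, and
every lookup is a single array access instead of a walk over the range list.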
---
Reini Urban
http://xarch.tu-graz.ac.at/autocad/news/faq/autolisp.html

From: Vassil Nikolov
Subject: Re: Reviews for lisp implementations
Date: Sat, 17 Apr 1999 00:00:00 +0000
Message-ID: <7fa950$5nv$1@nnrp1.dejanews.com>

In article <·················@judy>,
  ······@sbox.tu-graz.ac.at (Reini Urban) wrote:
(...)
> I am also posting this (instead of linking to it) to let you see how
> weirdly some codepages have been designed with regard to case predicates
> and case conversions. I guess most OSes handle it by precalculating the
> tables, wasting a lot of bytes.
(...)

First of all, it was nice of you to post a useful piece of data.

Second, I would like to make a few points, not to criticise, but
to show there are different ways to look at this.

* The sets you identified as weird all contain Cyrillic characters
  that by themselves look rather strange, even to one who knows the
  Greek alphabet (which helps a little).  Regarding the layout,
  weirdness comes at least in part from the fact that only the
  32 `mainstream' Cyrillic characters are in contiguous positions
  (even with `well-behaved' sets like 8859-5).  Since there are
  other characters in addition to these 32, they had to be fit
  elsewhere, while deciding which other characters (like left/right
  single/double quotes) to keep and which to sacrifice.

  (By the way, even limiting ourselves to ()[]{}<>, there isn't
  a simple operation like toggling a bit to convert an `opener'
  into a `closer,' so even 7-bit ASCII is not absolutely
  regular (not that it could have been, I believe).)

* Keeping tables to support case conversions etc. does not take
  up that much memory (especially now that memory is not as
  expensive as a couple of decades ago), and improves speed
  a lot; besides, with some sets like KOI-8 and effectively
  Macintosh Cyrillics^1 as well, tables are a must in order to
  do sorting even if we limit ourselves to the `mainstream'
  characters (because (< (CHAR-CODE a) (CHAR-CODE b)) does not
  produce alphabetical order).
__________
^1 uppercase: 80-9F, lowercase: E0-FE,DF
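
To make the sorting point concrete, here is a minimal Common Lisp sketch
of a collation table for KOI8-R. It works on raw 8-bit codes rather than
Lisp characters, since CHAR-CODE values depend on the implementation. The
names *KOI8R-ALPHABET*, MAKE-KOI8R-COLLATION-TABLE and KOI8R-CODE< are
made up for illustration, the 32 codes are the lowercase `mainstream'
letters as laid out in KOI8-R, and the sketch assumes that each uppercase
letter sits 32 above its lowercase partner.

```lisp
;; KOI8-R codes of the 32 lowercase letters, listed in alphabetical order.
(defparameter *koi8r-alphabet*
  '(#xC1 #xC2 #xD7 #xC7 #xC4 #xC5 #xD6 #xDA
    #xC9 #xCA #xCB #xCC #xCD #xCE #xCF #xD0
    #xD2 #xD3 #xD4 #xD5 #xC6 #xC8 #xC3 #xDE
    #xDB #xDD #xDF #xD9 #xD8 #xDC #xC0 #xD1))

(defun make-koi8r-collation-table ()
  "Return a 256-entry vector mapping each code to its sort key."
  (let ((table (make-array 256 :element-type '(unsigned-byte 8))))
    ;; Codes outside the alphabet keep their own value as sort key.
    (dotimes (code 256)
      (setf (aref table code) code))
    ;; Letters get keys in alphabetical order; lowercase keys occupy
    ;; C0-DF and uppercase keys E0-FF, so the table stays a permutation.
    (loop for code in *koi8r-alphabet*
          for rank from 0
          do (setf (aref table code) (+ #xC0 rank)
                   (aref table (+ code 32)) (+ #xE0 rank)))
    table))

(defparameter *koi8r-collation* (make-koi8r-collation-table))

(defun koi8r-code< (a b)
  "True if code A sorts before code B alphabetically."
  (< (aref *koi8r-collation* a) (aref *koi8r-collation* b)))
```

Sorting the first five letters of the alphabet shows the difference:

```lisp
(sort (list #xD7 #xC4 #xC1 #xC7 #xC2) #'<)
;; => (#xC1 #xC2 #xC4 #xC7 #xD7), which is not alphabetical
(sort (list #xD7 #xC4 #xC1 #xC7 #xC2) #'koi8r-code<)
;; => (#xC1 #xC2 #xD7 #xC7 #xC4), the alphabetical order
```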

Third, if anyone needs assistance with making sense out of Cyrillic
characters and sets (in particular, Bulgarian, Russian, and Serbian
<silly remarks deleted>), I'd be happy to be of any help, just send
me a private e-mail.

Good luck with character sets,
Vassil.

--
Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: Fernando D. Mato Mira
Subject: Re: Reviews for lisp implementations
Date: Thu, 15 Apr 1999 00:00:00 +0000
Message-ID: <3715EB24.B3FB332E@iname.com>

Arthur Lemmens wrote:

> I've waited two days for people with more experience to shed some
> light here. But, apparently, nobody is willing to burn his fingers
> on a comparison between Harlequin and Franz. So here's my (very

OK. I've used both Allegro (up to 4.2) and Harlequin (up to 1995),
and I must say, FFI issues aside, that I would go with Allegro. I just
feel safer (some might argue that Harlequin's other businesses make it
safer, but for me they give a feeling of lack of commitment; just an
impression, but that's what marketing is all about).
Maybe more importantly, I never `got' the Harlequin way. I'm not a fan
of IDEs. The primitive (but then, you have those cool menus) Xemacs
interface of Allegro feels good. I've never used ACL for Windows but it
looks pretty neat, so maybe it's just that I don't like _this_
Harlequin IDE.

I'd really like a Genera for SGI to play around with. I have no use for it right
now, but I'd gladly pay my own $800 for the manuals (upgradeable to a
commercial license).

--
Fernando D. Mato Mira
Real-Time SW Eng & Networking
Advanced Systems Engineering Division
CSEM
Jaquet-Droz 1                   email: matomira AT acm DOT org
CH-2007 Neuchatel                 tel:       +41 (32) 720-5157
Switzerland                       FAX:       +41 (32) 720-5720

www.csem.ch      www.vrai.com     ligwww.epfl.ch/matomira.html