From: Thomas Gagne
Subject: CLisp case sensitivity
Date: 
Message-ID: <MO-dndtuHLYq_yPcRVn-vQ@wideopenwest.com>
I've read that Common Lisp is case sensitive, but have also noticed that 
Allegro has a way of creating a case-sensitive image.  Can the same thing be 
done with clisp (on GNU/Linux)?

From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <873by9o6ee.fsf@thalassa.informatimago.com>
Thomas Gagne <······@wide-open-west.com> writes:

> I've read that Common Lisp is case sensitive, but have also noticed
> that Allegro has a way of creating a case-sensitive image.  Can the
> same thing be done with clisp (on GNU/Linux)?

clisp  is not  Common Lisp:  clisp is  but one  implementation  of the
language named Common Lisp.


Common Lisp IS case sensitive, BUT its reader can be configured, and
its default configuration is to upcase every symbol, which means that
it's case insensitive. Other configurations allow one to preserve
case, rendering it effectively case sensitive, or even to _invert_
case, rendering it completely schizophrenic about case.

Read about READTABLE-CASE in the CLHS.
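Concretely, the default behaviour can be observed at the REPL (portable
ANSI Common Lisp; an unmodified default readtable is assumed):

```lisp
;; The reader's case handling lives in the current readtable.
(readtable-case *readtable*)          ; => :UPCASE by default

;; Under :UPCASE the reader folds unescaped symbol names to upper
;; case, so differently-cased spellings name the same symbol:
(eq (read-from-string "foo") (read-from-string "FOO"))    ; => T

;; Escaped characters keep their case even under :UPCASE:
(symbol-name (read-from-string "|foo|"))                  ; => "foo"
```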


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Bruno Haible
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cpmph4$ge4$1@laposte.ilog.fr>
> have also noticed that Allegro has a way of creating a case-sensitive image.
> Can the same thing be done with clisp (on GNU/Linux)?

Yes, it can: Just start "clisp -modern". It uses the same memory image as
normal "clisp".

And it is even better than Allegro: In CLISP you can mix old-style source
code with modern case-sensitive source code. Thus you can migrate your big
applications to the modern case-sensitive mode slowly, package by package;
you're not forced to do it all at once.
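A sketch of what the per-package model looks like in source; the
DEFPACKAGE options and the CS-COMMON-LISP package name below are CLISP
extensions as I understand them from the CLISP implementation notes,
and the exact spellings may differ between versions:

```lisp
;; CLISP-only sketch; these options are not ANSI and will be rejected
;; by other implementations, hence the feature guard.
#+clisp
(defpackage "my-modern-package"
  (:case-sensitive t)        ; symbols intern exactly as written
  (:use "CS-COMMON-LISP"))   ; case-sensitive twin of COMMON-LISP

;; Old-style, upcasing packages in the same image are unaffected,
;; which is what allows migrating one package at a time.
```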

The feature is in CLISP CVS and will be part of clisp-2.34; the
implementation follows the lines presented at LSM 2004 [1].

               Bruno


[1] http://www-jcsu.jesus.cam.ac.uk/~csr21/papers/lightning/lightning.html#htoc4
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <86k6rkux55.fsf@goldenaxe.localnet>
Bruno Haible <·····@clisp.org> writes:

>> have also noticed that Allegro has a way of creating a case-sensitive image.
>> Can the same thing be done with clisp (on GNU/Linux)?
>
> Yes, it can: Just start "clisp -modern". It uses the same memory image as
> normal "clisp".

Ok, I do not get it. Why is case-sensitivity = modern? Looks like
clisp -old-school to me. 

Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouywtvkrmrn.fsf@panix3.panix.com>
Julian Stecklina <··········@web.de> writes:

> Bruno Haible <·····@clisp.org> writes:
> 
> >> have also noticed that Allegro has a way of creating a case-sensitive image.
> >> Can the same thing be done with clisp (on GNU/Linux)?
> >
> > Yes, it can: Just start "clisp -modern". It uses the same memory image as
> > normal "clisp".
> 
> Ok, I do not get it. Why is case-sensitivity = modern? Looks like
> clisp -old-school to me. 

Modern mode ought to have been named Franzlisp mode, reflecting the
lineage of the case sensitive reader algorithm in Allegro Common Lisp.
From: Bruno Haible
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cpp9f6$e33$1@laposte.ilog.fr>
Carl Shapiro wrote:
> Modern mode ought to have been named Franzlisp mode, reflecting the
> lineage of the case sensitive reader algorithm in Allegro Common Lisp.

clisp is not using Allegro CL's algorithm, but a new one.
In Allegro, the case-sensitivity bit is in the readtable. In clisp, it
is per package.

                          Bruno
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3brcv8j5v.fsf@rigel.goldenthreadtech.com>
Bruno Haible <·····@clisp.org> writes:

> Carl Shapiro wrote:
> > Modern mode ought to have been named Franzlisp mode, reflecting the
> > lineage of the case sensitive reader algorithm in Allegro Common Lisp.
> 
> clisp is not using Allegro CL's algorithm, but a new one.
> In Allegro, the case-sensitivity bit is in the readtable. In clisp, it
> is per package.

This smells pretty close to the right way to do it.  Or have a
readtable per package and leave it in the readtable (or maybe that is
what you meant, and that Allegro only has global tables).


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouysm67l0in.fsf@panix3.panix.com>
jayessay <······@foo.com> writes:

> Bruno Haible <·····@clisp.org> writes:
> 
> > Carl Shapiro wrote:
> > > Modern mode ought to have been named Franzlisp mode, reflecting the
> > > lineage of the case sensitive reader algorithm in Allegro Common Lisp.
> > 
> > clisp is not using Allegro CL's algorithm, but a new one.
> > In Allegro, the case-sensitivity bit is in the readtable. In clisp, it
> > is per package.
> 
> This smells pretty close to the right way to do it.  Or have a
> readtable per package and leave it in the readtable (or maybe that is
> what you meant, and that Allegro only has global tables).

The right way to handle different reader modes is to have a means to
declaratively specify a syntax.  A syntax definition would include,
among other things, a readtable, symbol-lookup mechanism and package
mappings.  Without at least these three features you are going to end
up with a zombie environment which is not self-consistent, and users
will have to resort to offbeat idioms to write code which works
everywhere.  Syntaxes can be invoked on a per-module basis, or switched
in and out of at the top level.  (In summary, associating this
behavior with packages is far from sufficient.)
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m36532880n.fsf@rigel.goldenthreadtech.com>
Carl Shapiro <·············@panix.com> writes:

> jayessay <······@foo.com> writes:
> 
> > Bruno Haible <·····@clisp.org> writes:
> > 
> > > Carl Shapiro wrote:
> > > > Modern mode ought to have been named Franzlisp mode, reflecting the
> > > > lineage of the case sensitive reader algorithm in Allegro Common Lisp.
> > > 
> > > clisp is not using Allegro CL's algorithm, but a new one.
> > > In Allegro, the case-sensitivity bit is in the readtable. In clisp, it
> > > is per package.
> > 
> > This smells pretty close to the right way to do it.  Or have a
> > readtable per package and leave it in the readtable (or maybe that is
> > what you meant, and that Allegro only has global tables).
> 
> The right way to handle different reader modes is to have a means to
> declaratively specify a syntax.  A syntax definition would include,
> among other things, a readtable, symbol-lookup mechanism and package
> mappings.  Without at least these three features you are going to end
> up with a zombie environment which is not self-consistent, and users
> will have to resort to offbeat idioms to write code which works
> everywhere.  Syntaxes can be invoked on a per-module basis, or switched
> in and out of at the top-level.  (In summary, associating this
> behavior with packages is far from sufficient.)

You've obviously thought about this more than I have.  These are good
points and sound on target.  Thanks.


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouy7jniouni.fsf@panix3.panix.com>
jayessay <······@foo.com> writes:

> Carl Shapiro <·············@panix.com> writes:

> > The right way to handle different reader modes is to have a means to
> > declaratively specify a syntax.  A syntax definition would include,
> > among other things, a readtable, symbol-lookup mechanism and package
> > mappings.  Without at least these three features you are going to end
> > up with a zombie environment which is not self-consistent, and users
> > will have to resort to offbeat idioms to write code which works
> > everywhere.  Syntaxes can be invoked on a per-module basis, or switched
> > in and out of at the top-level.  (In summary, associating this
> > behavior with packages is far from sufficient.)

> You've obviously thought about this more than I have.  These are good
> points and sound on target.  Thanks.

I did a little bit of thinking about this problem when I began to use
a half ML, half Prolog-like system which was built on top of Common
Lisp.  (You could do all sorts of grody academic stuff, and then
escape to Common Lisp when real work had to be done.)  In the Prolog
tradition variables and symbols were differentiated by the case of the
first character in a symbol's print-name.  Also, the semicolon was no
longer the comment delimiter and was used rampantly in program code.

I had the good fortune to use a Lisp which supported the syntax
facility that I described.  Syntaxes could be defined hierarchically,
so I subclassed the ANSI Common Lisp syntax, specified a hacked up
readtable, adjusted various package mappings (forward "common-lisp" to
"COMMON-LISP", use a different user package, etc.) and was good to go.
As long as I stuck "Syntax: <the name of my syntax>" at the top of
every source file, all of the development tools would automatically
adjust to the unique treatment of my program code.
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3wtvi6jbm.fsf@rigel.goldenthreadtech.com>
Carl Shapiro <·············@panix.com> writes:

> I had the good fortune to use a Lisp which supported the syntax
> facility that I described.  Syntaxes could be defined hierarchically,
> so I subclassed the ANSI Common Lisp syntax, specified a hacked up
> readtable, adjusted various package mappings (forward "common-lisp" to
> "COMMON-LISP", use a different user package, etc.) and was good to go.
> As long as I stuck "Syntax: <the name of my syntax>" at the top of
> every source file, all of the development tools would automatically
> adjust to the unique treatment of my program code.


Nice.  Which Lisp was this?  Also, is "was" the operative word
here?...


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouyy8fxooke.fsf@panix3.panix.com>
jayessay <······@foo.com> writes:

> Carl Shapiro <·············@panix.com> writes:
> 
> > I had the good fortune to use a Lisp which supported the syntax
> > facility that I described.  Syntaxes could be defined hierarchically,
> > so I subclassed the ANSI Common Lisp syntax, specified a hacked up
> > readtable, adjusted various package mappings (forward "common-lisp" to
> > "COMMON-LISP", use a different user package, etc.) and was good to go.
> > As long as I stuck "Syntax: <the name of my syntax>" at the top of
> > every source file, all of the development tools would automatically
> > adjust to the unique treatment of my program code.
> 
> 
> Nice.  Which Lisp was this?  Also, is "was" the operative word
> here?...

I was doing this work on Symbolics Genera.  (I posted a more detailed
description of the system in another follow-up which arrived at my
NNTP server before yours did.)  The syntax support required no Lisp
machine magic, just a few prescient design decisions.

If anybody wants to see this system in action, I'll make sure there is
at least one Lisp Machine and documentation set at the ILC 2005, this
June.
From: Edi Weitz
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <uvfb1vlcz.fsf@agharta.de>
On 16 Dec 2004 17:39:29 -0500, Carl Shapiro <·············@panix.com> wrote:

> If anybody wants to see this system in action, I'll make sure there
> is at least one Lisp Machine and documentation set at the ILC 2005,
> this June.

That's a very good idea.  How about a "tutorial" where a LispM wizard
demoes the system in action?  I'm looking forward to that.

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouyvfb1mnji.fsf@panix3.panix.com>
Edi Weitz <········@agharta.de> writes:

> On 16 Dec 2004 17:39:29 -0500, Carl Shapiro <·············@panix.com> wrote:
> 
> > If anybody wants to see this system in action, I'll make sure there
> > is at least one Lisp Machine and documentation set at the ILC 2005,
> > this June.
> 
> That's a very good idea.  How about a "tutorial" where a LispM wizard
> demoes the system in action?  I'm looking forward to that.

For what it's worth, past ILCs have always had several Lisp Machine
users in attendance.  (That is to say, people who still use The
Machine every day as part of their job function, not just people with
a dump truck full of fond memories.)  I know next year will be no
different.

Something past organizing chairs have consistently forgotten to do is
to secure a space with enough tables and chairs so people can
congregate around machines, show off programs, and work on code
together.  This time around we will have space where people can play
on various Lisp systems.  I cannot schedule a demonstration of the
Lisp Machine unless somebody would like to volunteer in advance...
However, there will be no shortage of people around who are capable of
giving informal tours.
From: Edi Weitz
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ur7lqgf63.fsf@agharta.de>
On 16 Dec 2004 15:28:01 -0500, Carl Shapiro <·············@panix.com> wrote:

> I had the good fortune to use a Lisp which supported the syntax
> facility that I described.  Syntaxes could be defined
> hierarchically, so I subclassed the ANSI Common Lisp syntax,
> specified a hacked up readtable, adjusted various package mappings
> (forward "common-lisp" to "COMMON-LISP", use a different user
> package, etc.) and was good to go.  As long as I stuck "Syntax: <the
> name of my syntax>" at the top of every source file, all of the
> development tools would automatically adjust to the unique treatment
> of my program code.

Which environment was that?  Genera?

Thanks,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Carl Shapiro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ouy3by5q3cl.fsf@panix3.panix.com>
Edi Weitz <········@agharta.de> writes:

> On 16 Dec 2004 15:28:01 -0500, Carl Shapiro <·············@panix.com> wrote:
> 
> > I had the good fortune to use a Lisp which supported the syntax
> > facility that I described.  Syntaxes could be defined
> > hierarchically, so I subclassed the ANSI Common Lisp syntax,
> > specified a hacked up readtable, adjusted various package mappings
> > (forward "common-lisp" to "COMMON-LISP", use a different user
> > package, etc.) and was good to go.  As long as I stuck "Syntax: <the
> > name of my syntax>" at the top of every source file, all of the
> > development tools would automatically adjust to the unique treatment
> > of my program code.
> 
> Which environment was that?  Genera?

It sho' was.

The amount of systems code needed to support the notion of a syntax is
incredibly tiny.  There were only a few minor changes to the package
system interface so an optional argument could be passed along with
the usual parameters.  The reader had to be taught about the triple
colon operator to make a syntax explicit when a symbol was being read.
You may occasionally see SYNTAX:::PACKAGE::SYMBOL in various obscure
places.

Package maps were the biggest win, and I wish that more people would
experiment with this rather than so-called "hierarchical packages".
The package structure adds two slots for relative name fields.  The
reader would examine one of these two slots when reading a symbol
before resolving the package.  A real hierarchy of names exists in
that one can look up names through other packages, recursively.

Here is an example of how this could work in practice.  Say I wanted
to port code from one system which referenced a package that already
existed on my Lisp but whose exported interface was different.  One
way to correct this problem would be to create a compatibility package
which had a unique name and the expected interfaces.  Unfortunately,
in order to ensure that symbols are resolved to the right package, you
would have to hunt down every reference to the old package name in the
ported source code and replace it with the compatibility package name.
A relative mapping could fix this problem by instructing the reader
that whenever a symbol is referenced from my ported program's package
to the (for example) MP package, forward that reference to this other
package, the compatibility MP package.

A lot of thought and care went into the syntax facility.  Its design
is quite complete and elegant.
From: Alexander Schmolck
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <yfsmzwfpeta.fsf@black4.ex.ac.uk>
Bruno Haible <·····@clisp.org> writes:

> Carl Shapiro wrote:
>> Modern mode ought to have been named Franzlisp mode, reflecting the
>> lineage of the case sensitive reader algorithm in Allegro Common Lisp.
>
> clisp is not using Allegro CL's algorithm, but a new one.
> In Allegro, the case-sensivity bit is in the readtable. In clisp, it
> is per package.

Do keywords work properly?

'as
From: sds
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <1103137657.986879.18380@c13g2000cwb.googlegroups.com>
Alexander Schmolck wrote:
> Bruno Haible <·····@clisp.org> writes:
>
> > Carl Shapiro wrote:
> >> Modern mode ought to have been named Franzlisp mode, reflecting the
> >> lineage of the case sensitive reader algorithm in Allegro Common Lisp.
> >
> > clisp is not using Allegro CL's algorithm, but a new one.
> > In Allegro, the case-sensitivity bit is in the readtable. In clisp, it
> > is per package.
>
> Do keywords work properly?

yes.
<http://www.podval.org/~sds/clisp/impnotes/package-case.html#cs-gensym-kwd>
From: Alexander Schmolck
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <yfsy8fybtbj.fsf@black4.ex.ac.uk>
"sds" <···@gnu.org> writes:
> Alexander Schmolck wrote:
>>
>> Do keywords work properly?
>
> yes.
> <http://www.podval.org/~sds/clisp/impnotes/package-case.html#cs-gensym-kwd>

Hmm, as to the "limited negative impact" of (eq :KeyWord :keyword) in
case-sensitive packages: keywords also seem to be quite popular for things
where case *does* matter (like XML tags), so isn't this likely to cause
compatibility problems with acl-modern or inverted readtable-case packages?

'as
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <86u0qndo98.fsf@goldenaxe.localnet>
Carl Shapiro <·············@panix.com> writes:

> Modern mode ought to have been named Franzlisp mode, reflecting the
> lineage of the case sensitive reader algorithm in Allegro Common Lisp.

Is there any information on why Franz Inc. introduced this?

Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.14.05.05.35.783757@consulting.net.nz>
Hi Thomas Gagne,

> I've read that Common Lisp is case sensitive, but have also noticed that 
> Allegro has a way of creating a case-sensitive image.  Can the same thing be 
> done with clisp (on GNU/Linux)?

If the question is: Do any of the free Common Lisp implementations provide
a build-time option to intern all symbols in the COMMON-LISP package in
lower case so that :PRESERVE is a suitable readtable option?

The answer is: No.

The next best alternative is to use the :INVERT readtable mode. This
inverts the symbol name of all lowercase or all uppercase symbols as they
are being read while leaving the symbol name of mixed-case symbols alone.

This is the only suitable readtable option that maintains case information
because the ANSI Common Lisp committee decided backwards compatibility
with traditional uppercasing Lisps was most important. The decision hasn't
stood the test of time. If they'd made a better choice the pain of
transition would have been long over.

Readtable case should be deprecated. Symbols should be interned as
written in source code and implementors should not have the burden of
implementing "historical" baggage that is difficult to get 100% right
(e.g. ABCL is continuing to squash :INVERT mode read and print errors).
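The :INVERT behaviour described above can be sketched in a few lines of
portable ANSI code:

```lisp
;; :INVERT flips the case of names that are uniformly lower- or
;; uppercase, and leaves mixed-case names alone.  Printing inverts
;; again, so round-tripping preserves what you typed.
(let ((rt (copy-readtable)))
  (setf (readtable-case rt) :invert)
  (let ((*readtable* rt))
    (mapcar (lambda (s) (symbol-name (read-from-string s)))
            '("foo" "FOO" "MixedCase"))))
;; => ("FOO" "foo" "MixedCase")
```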

Note that the ANSI Common Lisp specification is considered sacrosanct and
these comments heretical.

Regards,
Adam
From: Chris Capel
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <10rt9t1nsbctt22@corp.supernews.com>
Adam Warner wrote:

> Hi Thomas Gagne,
> 
>> I've read that Common Lisp is case sensitive, but have also noticed that
>> Allegro has a way of creating a case-sensitive image.  Can the same thing
>> be done with clisp (on GNU/Linux)?
> 
> If the question is: Do any of the free Common Lisp implementations provide
> a build-time option to intern all symbols in the COMMON-LISP package in
> lower case so that :PRESERVE is a suitable readtable option?
> 
> The answer is: No.
> 
> The next best alternative is to use the :INVERT readtable mode. This
> inverts the symbol name of all lowercase or all uppercase symbols as they
> are being read while leaving the symbol name of mixed-case symbols alone.
> 
> This is the only suitable readtable option that maintains case information
> because the ANSI Common Lisp committee decided backwards compatibility
> with traditional uppercasing Lisps was most important. The decision hasn't
> stood the test of time. If they'd made a better choice the pain of
> transition would have been long over.

Another example of the ramifications of this decision: inconsistent
function names.  For example, the convention of using an "f" suffix on
functions that operate on places isn't followed everywhere (getf, but
push).  The convention of using a p or -p suffix with many type-testing
functions, but not ATOM or NULL!  I'm sure others can point out other examples.

Be thankful it isn't as bad as the C standard library.  (Atoi? What's
Eh-toy? Some sort of faerie name?)

Chris Capel
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.14.10.49.34.276021@consulting.net.nz>
Hi Chris Capel,

> Another example of the ramifications of this decision: inconsistent
> functions names. For example, the convention of using an "f" suffix on
> functions using places that isn't followed everywhere (getf, push). The
> convention of using a p or -p suffix with many type testing functions, but
> not ATOM or NULL! I'm sure others can point out other examples.

Look at the character predicates:

                            characterp
                          alpha-char-p
                          digit-char-p
                        graphic-char-p
                       standard-char-p

You can guess why we don't have `charp'. I find using ? to denote
predicates helpful. I don't start a symbol with a non-alphanumeric so the
punctuation marks can still be used as non-terminating dispatching macro
characters [just as # can still be used within symbol names, e.g.
-#x00FF and abc#|this-is-not-a-comment|#def are both symbols].

> Be thankful it isn't as bad as the C standard library.  (Atoi? What's
> Eh-toy? Some sort of faerie name?)

ANSI Common Lisp's naming inconsistency no longer bothers me. At least
when annoyed I can resolve the issue using the package system.

Perhaps the most overlooked inconsistency is that LENGTH returns an
implementation-specific value for strings [largely depending upon whether
strings are implemented as sequences of octets (CMUCL, GCL, historically
SBCL), 16-bit values (ABCL) or 32-bit values (CLISP, SBCL)].

Regards,
Adam
From: Barry Margolin
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <barmar-B18246.08360914122004@comcast.dca.giganews.com>
In article <······························@consulting.net.nz>,
 Adam Warner <······@consulting.net.nz> wrote:

> Hi Chris Capel,
> 
> > Another example of the ramifications of this decision: inconsistent
> > functions names. For example, the convention of using an "f" suffix on
> > functions using places that isn't followed everywhere (getf, push). The
> > convention of using a p or -p suffix with many type testing functions, but
> > not ATOM or NULL! I'm sure others can point out other examples.
> 
> Look at the character predicates:
> 
>                             characterp
>                           alpha-char-p
>                           digit-char-p
>                         graphic-char-p
>                        standard-char-p

What's the problem there?  The convention, which I think is even 
explained explicitly in CLTL, is that "p" is appended to single words 
(e.g. "character"), and "-p" is appended to multiple words (e.g. 
"alpha-char").
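The convention is easy to check at the REPL (standard ANSI predicates):

```lisp
(characterp #\a)      ; single word "character", so "p"    => T
(alpha-char-p #\a)    ; multiple words "alpha-char", "-p"  => T
(digit-char-p #\7)    ; => 7 (the digit's weight, not just T)
```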

> Perhaps the most overlooked inconsistency is that LENGTH returns an
> implementation-specific value for strings [largely depending upon whether
> strings are implemented as sequences of octets (CMUCL, GCL, historically
> SBCL), 16-bit values (ABCL) or 32-bit values (CLISP, SBCL)].

What are you talking about?  It returns the number of array elements, 
regardless of their size.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Chris Riesbeck
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <criesbeck-3533E6.12380614122004@individual.net>
In article <····························@comcast.dca.giganews.com>,
 Barry Margolin <······@alum.mit.edu> wrote:

> In article <······························@consulting.net.nz>,
>  Adam Warner <······@consulting.net.nz> wrote:
> > 
> > Look at the character predicates:
> > 
> >                             characterp
> >                           alpha-char-p
> >                           digit-char-p
> >                         graphic-char-p
> >                        standard-char-p
> 
> What's the problem there?  The convention, which I think is even 
> explained explicitly in CLTL, is that "p" is appended to single words 
> (e.g. "character"), and "-p" is appended to multiple words (e.g. 
> "alpha-char").

Correct, though let us not forget our old friend
string-lessp, which is NOT an exception, but does
invoke an additional rule:

http://www.cliki.net/Naming%20conventions
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.14.22.45.29.948899@consulting.net.nz>
Hi Barry Margolin,

>> Hi Chris Capel,
>> 
>> > Another example of the ramifications of this decision: inconsistent
>> > functions names. For example, the convention of using an "f" suffix on
>> > functions using places that isn't followed everywhere (getf, push). The
>> > convention of using a p or -p suffix with many type testing functions, but
>> > not ATOM or NULL! I'm sure others can point out other examples.
>> 
>> Look at the character predicates:
>> 
>>                             characterp
>>                           alpha-char-p
>>                           digit-char-p
>>                         graphic-char-p
>>                        standard-char-p
> 
>> You can guess why we don't have `charp'.
>
> What's the problem there?  The convention, which I think is even
> explained explicitly in CLTL, is that "p" is appended to single words
> (e.g. "character"), and "-p" is appended to multiple words (e.g.
> "alpha-char").

"You can guess why we don't have `charp'": All the other character
predicates and equality tests use the contracted form of character, char.
So does Scheme. It just so happens that charp looks stupid and would be
pronounced like see-harp, kar-pee or krap (because of the convention).

Pity the language designers forgot the convention for DEFSTRUCT!  How hard
would it have been to parse the symbol name for #\-?  Obviously the
problem is that appending a #\p to a symbol name can create strange
combinations that are best avoided by not applying the convention.

>> Perhaps the most overlooked inconsistency is that LENGTH returns an
>> implementation-specific value for strings [largely depending upon whether
>> strings are implemented as sequences of octets (CMUCL, GCL, historically
>> SBCL), 16-bit values (ABCL) or 32-bit values (CLISP, SBCL)].
> 
> What are you talking about?  It returns the number of array elements, 
> regardless of their size.

Even with the same operating system and locale, different implementations
of ANSI Common Lisp cons up strings of different lengths when the input is
identical.  This means the position of corresponding characters is
implementation dependent, and the length of any resulting string is
implementation dependent.

[In the table below I'm assuming ABCL has completed its Java string
support so that Unicode characters are correctly read and stored in 16-bit
strings. Input is UTF-8]

           STRING     CLISP/SBCL     ABCL       CMUCL/GCL
             "A"	  1            1            1
             "Ā"	  1            1            2
             "✐"	  1            1            3
             "𐀀"	  1            2            4

Java dictates that all implementations have the same string representation
(a suboptimal one, but at least it's the same). Python has the same
issue as Common Lisp, but users can choose which way to build it:
<http://python.fyxm.net/peps/pep-0261.html>
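The two views can be reconciled: LENGTH always counts vector elements,
as Barry says; what differs between implementations is how many
elements a given external UTF-8 sequence decodes into.  A portable
sketch:

```lisp
;; LENGTH counts elements of the string, not octets:
(length (make-string 3 :initial-element #\a))   ; => 3 everywhere

;; Whether a non-ASCII character occupies one element depends on the
;; implementation's character representation.  CHAR-CODE-LIMIT hints
;; at it: e.g. 1114112 for full code points, 65536 for 16-bit
;; strings, 256 for octet strings.  ANSI only guarantees >= 96.
char-code-limit
```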

Regards,
Adam
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.14.22.46.30.996369@consulting.net.nz>
Hi Barry Margolin,

>> Hi Chris Capel,
>> 
>> > Another example of the ramifications of this decision: inconsistent
>> > functions names. For example, the convention of using an "f" suffix on
>> > functions using places that isn't followed everywhere (getf, push). The
>> > convention of using a p or -p suffix with many type testing functions, but
>> > not ATOM or NULL! I'm sure others can point out other examples.
>> 
>> Look at the character predicates:
>> 
>>                             characterp
>>                           alpha-char-p
>>                           digit-char-p
>>                         graphic-char-p
>>                        standard-char-p
> 
>You can guess why we don't have `charp'. What's the problem there?  The
>convention, which I think is even
> explained explicitly in CLTL, is that "p" is appended to single words
> (e.g. "character"), and "-p" is appended to multiple words (e.g.
> "alpha-char").

"You can guess why we don't have `charp'": All the other character
predicates and equality tests use the contracted form of character, char.
So does Scheme. It just so happens that charp looks stupid and would be
pronounced like see-harp, kar-pee or krap (because of the convention).

Pity the language designers forgot the convention for DEFSTRUCT! How hard
would it have been to parse the symbol name for #\-? An obvious problem is
that appending #\P to a symbol name can create strange combinations that
are best avoided by not applying the convention.

>> Perhaps the most overlooked inconsistency is that LENGTH returns an
>> implementation-specific value for strings [largely depending upon whether
>> strings are implemented as sequences of octets (CMUCL, GCL, historically
>> SBCL), 16-bit values (ABCL) or 32-bit values (CLISP, SBCL)].
> 
> What are you talking about?  It returns the number of array elements, 
> regardless of their size.

Even with the same operating system and locale different implementations
of ANSI Common Lisp cons up strings of different lengths when the input is
identical. This means the position of corresponding characters is
implementation dependent and the length of any resulting string is
implementation dependent.

[In the table below I'm assuming ABCL has completed its Java string
support so that Unicode characters are correctly read and stored in 16-bit
strings. Input is UTF-8]

           STRING     CLISP/SBCL     ABCL       CMUCL/GCL
             "A"	  1            1            1
             "Ā"	  1            1            2
             "✐"	  1            1            3
             "𐀀"	  1            2            4
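
For anyone who wants to check the arithmetic, the table can be recomputed from the code-unit counts of each encoding. A sketch in Python rather than CL (Python only because its encoders are handy); the column-to-implementation mapping is my reading of the table above, not something any of these implementations documents this way:

```python
def code_units(s, encoding, unit_bytes):
    """Number of code units when s is stored in the given encoding."""
    return len(s.encode(encoding)) // unit_bytes

# "A", "Ā", "✐", "𐀀" -- the four strings from the table
for ch in ["A", "\u0100", "\u2710", "\U00010000"]:
    print(code_units(ch, "utf-32-le", 4),   # 32-bit strings (CLISP/SBCL column)
          code_units(ch, "utf-16-le", 2),   # 16-bit strings (ABCL column)
          code_units(ch, "utf-8", 1))       # octet strings (CMUCL/GCL column)
```

The three columns printed reproduce the three columns of the table.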

Java dictates that all implementations have the same string representation
(a suboptimal one, but at least it's the same). Python has the same
issue as Common Lisp, but users can choose which way to build it:
<http://python.fyxm.net/peps/pep-0261.html>

Regards,
Adam
From: Barry Margolin
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <barmar-64A687.00481815122004@comcast.dca.giganews.com>
In article <······························@consulting.net.nz>,
 Adam Warner <······@consulting.net.nz> wrote:

> Hi Barry Margolin,
> 
> >> Hi Chris Capel,
> >> Perhaps the most overlooked inconsistency is that LENGTH returns an
> >> implementation-specific value for strings [largely depending upon whether
> >> strings are implemented as sequences of octets (CMUCL, GCL, historically
> >> SBCL), 16-bit values (ABCL) or 32-bit values (CLISP, SBCL)].
> > 
> > What are you talking about?  It returns the number of array elements, 
> > regardless of their size.
> 
> Even with the same operating system and locale different implementations
> of ANSI Common Lisp cons up strings of different lengths when the input is
> identical. This means the position of corresponding characters is
> implementation dependent and the length of any resulting string is
> implementation dependent.
> 
> [In the table below I'm assuming ABCL has completed its Java string
> support so that Unicode characters are correctly read and stored in 16-bit
> strings. Input is UTF-8]
> 
>            STRING     CLISP/SBCL     ABCL       CMUCL/GCL
>              "A"	  1            1            1
>              "Ā"	  1            1            2
>              "✐"	  1            1            3
>              "𐀀"	  1            2            4

This looks like bugs to me.  All those strings should have length 1, 
since they just contain a single character.  LENGTH is supposed to count 
the number of characters, *not* the number of bytes.

Are you sure these are really strings you're creating, and not byte 
arrays that you're filling in by reading a file as binary?

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.15.06.51.50.776046@consulting.net.nz>
Hi Barry Margolin,

[Snipped because you replied with a content type of text/plain]

> This looks like bugs to me.  All those strings should have length 1, 
> since they just contain a single character.  LENGTH is supposed to count 
> the number of characters, *not* the number of bytes.

It is not necessarily a bug for a string to be stored internally using a
particular encoding, whether that encoding be UTF-8, UTF-16 or UTF-32. All
three encodings can be thought of as variable-width encodings because none
can store all grapheme clusters within one "character":
<http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>

You told me in your previous reply that length "returns the number of
array elements, regardless of their size." As ANSI Common Lisp doesn't
define characters to be of a particular size please tell me what the
correct internal encoding should be. Then I'll tell you that every other
implementation now has to traverse each string to determine its length,
and that length will not necessarily equal the number of array elements in
the string.

And if you define the official character size as 32-bit/enough to hold a
Unicode code point then you'll have to explain why an implementation that
implements characters as grapheme clusters is non-conforming with respect
to LENGTH and CHAR.

> Are you sure these are really strings you're creating, and not byte 
> arrays that you're filling in by reading a file as binary?

It all depends upon what an operating system or language defines a
character to be. A character in Java is a 16-bit unsigned value because it
is defined that way, not because it can hold all Unicode code points.
Because of this definition the length of an arbitrary string is the same
across implementations. ANSI Common Lisp doesn't define the size of a
character. Allegro for example corresponds with Java:
<http://www.franz.com/support/documentation/7.0/doc/iacl.htm#memory-usage-2>

Do you claim that the decision to store characters internally as 16-bit
values is non-conforming? Length and array references will differ from an
implementation with 32-bit characters. So who's right? Are they all wrong
and the only implementation capable of returning the correct answer is one
which implements strings as sequences of grapheme clusters (like the
Parrot virtual machine)?

I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
size of a character the length of an arbitrary string will be
implementation specific. I am sure of this claim because no one has put
their foot down and told implementors, for better or worse, that
characters are a fixed size of n-bits or that characters must be handled
as grapheme clusters of variable size.

Regards,
Adam
From: Peter Seibel
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3k6rkjahc.fsf@javamonkey.com>
Adam Warner <······@consulting.net.nz> writes:

> I made a simple claim Barry: Since ANSI Common Lisp doesn't define
> the size of a character the length of an arbitrary string will be
> implementation specific. I am sure of this claim because no one has
> put their foot down and told implementors, for better or worse, that
> characters are a fixed size of n-bits or that characters must be
> handled as grapheme clusters of variable size.

Why should the size of characters have anything at all to do with the
length of strings? Strings are measured in characters so whether you
use 8 bits or 8 megs to represent each character should have nothing
to do with the value LENGTH returns when passed a string. In those
implementations that return some number greater than 1 for a
"one-character" string, what do they return for (char s 1) (char s 2)
and (char s 3)?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.15.08.44.43.362497@consulting.net.nz>
Hi Peter Seibel,

> Why should the size of characters have anything at all to do with the
> length of strings? Strings are measured in characters so whether you use
> 8 bits or 8 megs to represent each character should have nothing to do
> with the value LENGTH returns when passed a string.

It's the translation from a defined external encoding to the implementation's
internal encoding which determines the internal length of the string. The
internal length may differ between implementations because the size of a
"character" unit differs between implementations.

> In those implementations that return some number greater than 1 for a
> "one-character" string, what do they return for (char s 1) (char s 2)
> and (char s 3)?

Let's take a common example: Java/Windows/.NET/any implementation with
16-bit strings: Strings are stored in UTF-16, with code points >= 2^16
stored as high and low surrogates: <http://www.unicode.org/glossary/#UTF_16>

In such implementations a code point in the range #x10000 to #x10FFFF has
a length of two. Here are some tables setting out the translation:
<http://www.i18nguy.com/unicode/surrogatetable.html>

The 16-bit sequence #xD800 #xDC00 corresponds with the code point #x10000.
That's your (char s 0) and (char s 1) respectively. In a 32-bit character
implementation (char s 0) would be #x10000 and (char s 1) would be out of
range.
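
The surrogate arithmetic is easy to verify; here it is in Python (which exposes both views: encode to UTF-16 for the 16-bit-string picture, len for the 32-bit-character picture):

```python
s = "\U00010000"                      # one code point, U+10000
units = s.encode("utf-16-be")         # its UTF-16 representation
assert units == b"\xd8\x00\xdc\x00"   # high surrogate D800, low surrogate DC00
# A 16-bit-string implementation sees two array elements:
print(len(units) // 2)                # -> 2
# A 32-bit-character implementation sees one:
print(len(s))                         # -> 1
```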

Regards,
Adam
From: Barry Margolin
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <barmar-1DDB30.07504415122004@comcast.dca.giganews.com>
In article <······························@consulting.net.nz>,
 Adam Warner <······@consulting.net.nz> wrote:

> Hi Peter Seibel,
> 
> > Why should the size of characters have anything at all to do with the
> > length of strings? Strings are measured in characters so whether you use
> > 8 bits or 8 megs to represent each character should have nothing to do
> > with the value LENGTH returns when passed a string.
> 
> It's the translation from a defined external encoding to the implementation's
> internal encoding which determines the internal length of the string. The
> internal length may differ between implementations because the size of a
> "character" unit differs between implementations.

But the internal representation is not supposed to be visible to the 
user.

Consider the following:

(defvar *array*)
(setq *array* (make-array 1))
(setf (aref *array* 0) (expt 2 255))
(print (length *array*))

Even though the array contains a bignum whose representation takes at 
least 32 bytes, this should print 1.

Lisp is a high-level language, and LENGTH is supposed to deal in the 
high-level concept of characters, not bytes.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Harald Hanche-Olsen
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pco7jnjy59o.fsf@shuttle.math.ntnu.no>
+ Barry Margolin <······@alum.mit.edu>:

| Lisp is a high-level language, and LENGTH is supposed to deal in the
| high-level concept of characters, not bytes.

But if I have understood Adam Warner's point correctly, it is that
Unicode has /two different/ high-level concepts of characters, one at
a higher level than the other: the higher-level concept is that of
grapheme clusters, each of which contains one or more "ordinary"
unicode characters.

As far as I understand, end users should only ever see grapheme
clusters.  But Common Lisp programmers are hardly end users, and it is
not at all obvious (to me) if Lisp characters should correspond to
grapheme clusters or simple characters.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- Debating gives most of us much more psychological satisfaction
  than thinking does: but it deprives us of whatever chance there is
  of getting closer to the truth.  -- C.P. Snow
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87oegv6gu5.fsf@thalassa.informatimago.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Peter Seibel,
> 
> > Why should the size of characters have anything at all to do with the
> > length of strings? Strings are measured in characters so whether you use
> > 8 bits or 8 megs to represent each character should have nothing to do
> > with the value LENGTH returns when passed a string.
> 
> It's the translation from a defined external encoding to the implementation's
> internal encoding which determines the internal length of the string. The
> internal length may differ between implementations because the size of a
> "character" unit differs between implementations.
> 
> > In those implementations that return some number greater than 1 for a
> > "one-character" string, what do they return for (char s 1) (char s 2)
> > and (char s 3)?
> 
> Let's take a common example: Java/Windows/.NET/any implementation with
> 16-bit strings: Strings are stored in UTF-16, with code points >= 2^16
> stored as high and low surrogates: <http://www.unicode.org/glossary/#UTF_16>
> 
> In such implementations a code point in the range #x10000 to #x10FFFF has
> a length of two. Here's are some tables setting out the translation:
> <http://www.i18nguy.com/unicode/surrogatetable.html>
> 
> The 16-bit sequence #xD800 #xDC00 corresponds with the code point #x10000.
> That's your (char s 0) and (char s 1) respectively. In a 32-bit character
> implementation (char s 0) would be #x10000 and (char s 1) would be out of
> range.

You have to distinguish characters (code points) and codes (integers).

If you want to encode full unicode (ie, the #x110000 code points) in
16-bit, then use (vector (unsigned-byte 16)) and assume the
consequences (ie. (length vector-of-codes) is not the number of
characters, but the number of _codes_).

Otherwise, use a unicode-enabled lisp implementation, like clisp or
sbcl, put your unicode characters in a string and get the number of
character with (LENGTH string).

Trying to hold an encoded sequence in a lisp string is a cheap
kludge inherited from the bad C char==8-bit-integer mentality that
should not occur in lisp.

If you want to process unicode data in a lisp implementation that can
handle only iso-8859-1 characters, then you must not use the string
and character types, but only (vector (unsigned-byte 8)) for utf-8;
(vector (unsigned-byte 16)) for utf-16
and (vector (integer 0 #x10ffff)) for the full unicode.



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Peter Seibel
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3brcvjvmm.fsf@javamonkey.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Peter Seibel,
>
>> Why should the size of characters have anything at all to do with
>> the length of strings? Strings are measured in characters so
>> whether you use 8 bits or 8 megs to represent each character should
>> have nothing to do with the value LENGTH returns when passed a
>> string.
>
> It's the translation from a defined external encoding to the
> implementation's internal encoding which determines the internal
> length of the string. The internal length may differ between
> implementations because the size of a "character" unit differs
> between implementations.

But the "internal length" of the string has nothing to do with the
value of LENGTH, or should not. The LENGTH of a string is the number
of characters it contains.

>> In those implementations that return some number greater than 1 for
>> a "one-character" string, what do they return for (char s 1) (char
>> s 2) and (char s 3)?
>
> Let's take a common example: Java/Windows/.NET/any implementation
> with 16-bit strings: Strings are stored in UTF-16, with code points
> >= 2^16 stored as high and low surrogates:
> <http://www.unicode.org/glossary/#UTF_16>

Yes, I'm with you so far. That just means that LENGTH has to be
implemented in a smarter way--it has to scan the array of code-points
looking for surrogate pairs in order to determine how many characters
are in the string. (That Java's String.length() method doesn't do this
will no doubt cause no end of problems down the line.)
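
The smarter LENGTH described above amounts to one linear scan that counts a surrogate pair as a single character. A hypothetical sketch (in Python, operating on a list of 16-bit code units rather than a real CL string):

```python
def utf16_char_count(units):
    """Count characters in a sequence of UTF-16 code units,
    treating each high+low surrogate pair as one character."""
    count = 0
    skip = False
    for u in units:
        if skip:            # low half of a pair already counted
            skip = False
            continue
        count += 1
        if 0xD800 <= u <= 0xDBFF:   # high surrogate: a pair follows
            skip = True
    return count

# "A" followed by U+10000: three code units, two characters.
print(utf16_char_count([0x0041, 0xD800, 0xDC00]))  # -> 2
```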

> In such implementations a code point in the range #x10000 to
> #x10FFFF has a length of two.

A representational length. But it's still one character. Or ought to
be. Java blew this one and is now suffering the consequences. Some
Common Lisps may have taken the same approach, but that seems wrong.

> Here's are some tables setting out the translation:
> <http://www.i18nguy.com/unicode/surrogatetable.html>
>
> The 16-bit sequence #xD800 #xDC00 corresponds with the code point
> #x10000. That's your (char s 0) and (char s 1) respectively.

But that can't be because CHAR returns a character and (assuming a
Unicode capable Lisp) there is no char with the char-code #xD800,
right? Now if you're trying to process Unicode strings in a Lisp that
doesn't actually support Unicode, I'm not entirely surprised that it
doesn't work. But you've got all kinds of problems there; you can't,
for instance, say (code-char #x10000).

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ar%Dd.803$m31.9833@typhoon.sonic.net>
Adam Warner wrote:

> Let's take a common example: Java/Windows/.NET/any implementation with
> 16-bit strings: Strings are stored in UTF-16, with code points >= 2^16
> stored as high and low surrogates: <http://www.unicode.org/glossary/#UTF_16>
> 
> In such implementations a code point in the range #x10000 to #x10FFFF has
> a length of two. Here's are some tables setting out the translation:
> <http://www.i18nguy.com/unicode/surrogatetable.html>

But that's dumb.  It confuses characters with codons, and relies
on others emulating that confusion for compatibility. Confusing
characters with codepoints is marginally acceptable depending on
your interpretation of the standard, but confusing characters
with codons is just plain wrong.

The fundamental issue is that people need to decide what 'length' is
supposed to measure and then stick with the decision; and my choice
is that it should measure characters - what the Unicode standard terms
"grapheme clusters."

All the other choices give rise to much hair; if you count codepoints,
then the length of the string changes on casing operations.  If you
count codons, you get different string-lengths depending on the
interaction of codepoints with the codon size in use in the encoding
scheme.

				Bear
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2005.01.09.02.03.36.154178@consulting.net.nz>
Hi Ray Dillinger,
On Sun, 09 Jan 2005 01:05:10 +0000, Ray Dillinger wrote:
> Adam Warner wrote:
> 
>> Let's take a common example: Java/Windows/.NET/any implementation with
>> 16-bit strings: Strings are stored in UTF-16, with code points >= 2^16
>> stored as high and low surrogates: <http://www.unicode.org/glossary/#UTF_16>
>> 
>> In such implementations a code point in the range #x10000 to #x10FFFF has
>> a length of two. Here's are some tables setting out the translation:
>> <http://www.i18nguy.com/unicode/surrogatetable.html>
> 
> But that's dumb.  It confuses characters with codons, and relies
> on others emulating that confusion for compatibility. Confusing
> characters with codepoints is marginally acceptable depending on
> your interpretation of the standard, but confusing characters
> with codons is just plain wrong.
> 
> The fundamental issue is that people need to decide what 'length' is
> supposed to measure and then stick with the decision; and my choice
> is that it should measure characters - what the Unicode standard terms
> "grapheme clusters."
> 
> All the other choices give rise to much hair; if you count codepoints,
> then the length of the string changes on casing operations.

As it does with grapheme clusters. Remember the uppercase of LATIN SMALL
LETTER SHARP S (ß) is SS, which is usually viewed as a pair of grapheme
clusters: LATIN CAPITAL LETTER S and LATIN CAPITAL LETTER S with a length
of two.

<http://www.unicode.org/Public/UNIDATA/CaseFolding.txt>

   The data supports both implementations that require simple case
   foldings (where string lengths don't change), and implementations that
   allow full case folding (where string lengths may grow). Note that
   where they can be supported, the full case foldings are superior: for
   example, they allow "MASSE" and "Maße" to match.

Full case folding requires that string lengths may grow.
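
Python's str methods happen to implement the full Unicode case mappings, so the growth is directly observable there (a demonstration, not anything CL-specific):

```python
# LATIN SMALL LETTER SHARP S uppercases to two characters:
print("ß".upper())      # -> SS  (length 1 -> 2)
# Full case folding lets "MASSE" and "Maße" match, as the
# CaseFolding.txt note quoted above says:
print("Maße".casefold() == "MASSE".casefold())   # -> True
```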

Regards,
Adam
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <_LFEd.1303$m31.13578@typhoon.sonic.net>
Adam Warner wrote:
> Hi Ray Dillinger,
> On Sun, 09 Jan 2005 01:05:10 +0000, Ray Dillinger wrote:


>>The fundamental issue is that people need to decide what 'length' is
>>supposed to measure and then stick with the decision; and my choice
>>is that it should measure characters - what the Unicode standard terms
>>"grapheme clusters."
>>
>>All the other choices give rise to much hair; if you count codepoints,
>>then the length of the string changes on casing operations.
> 
> 
> As it does with grapheme clusters. Remember the uppercase of LATIN SMALL
> LETTER SHARP S (ß) is SS, which is usually viewed as a pair of grapheme
> clusters: LATIN CAPITAL LETTER S and LATIN CAPITAL LETTER S with a length
> of two.

I'm aware of that, but it's exactly one case.  There are also some
non-canonical characters (ligatures) that change actual length on
casing.  But making strings out of grapheme clusters rather than
codepoints makes these occurrences rare instead of ridiculously
common.

But that's not the only hair we eliminate by counting grapheme clusters
instead of codepoints.  If you count grapheme clusters, then string
equivalence can be the relationship induced by character equivalence
over the length of a string.

  consider:

precomposed-character1 codepoint base-codepoint2 nonspacing-codepoint2
vs.
base-codepoint1 nonspacing-codepoint1 codepoint precomposed-character2

Here you have two "equivalent" strings (or a unicode violation if
you regard them as non-equivalent) of the same length, but if
characters are codepoints, no character in any position is the
same!  If characters are grapheme clusters, then all the characters
in the same locations are equal.
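
A concrete instance of the above, sketched in Python with the standard unicodedata module (e-acute as one precomposed code point versus base letter plus combining accent):

```python
import unicodedata

pre = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE, precomposed
dec = "e\u0301"     # e + COMBINING ACUTE ACCENT, decomposed
# As code-point sequences they differ in length and at every position:
assert pre != dec and len(pre) == 1 and len(dec) == 2
# As grapheme clusters they are canonically equivalent -- normalizing
# either form makes them compare equal:
print(unicodedata.normalize("NFC", dec) == pre)   # -> True
```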

				Bear
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2005.01.11.02.33.08.737261@consulting.net.nz>
Hi Ray Dillinger,
On Tue, 11 Jan 2005 01:14:34 +0000, Ray Dillinger wrote:
>>>All the other choices give rise to much hair; if you count codepoints,
>>>then the length of the string changes on casing operations.
>> 
>> 
>> As it does with grapheme clusters. Remember the uppercase of LATIN SMALL
>> LETTER SHARP S (ß) is SS, which is usually viewed as a pair of grapheme
>> clusters: LATIN CAPITAL LETTER S and LATIN CAPITAL LETTER S with a length
>> of two.
> 
> I'm aware of that, but it's exactly one case.  There are also some
> non-canonical characters (ligatures) that change actual length on
> casing.  But making strings out of grapheme clusters rather than
> codepoints makes these occurrences rare instead of ridiculously
> common.

Rare cases still have to be handled correctly. Consider the impact upon
the Common Lisp NSTRING functions:

   nstring-upcase, nstring-downcase, and nstring-capitalize are identical
   to string-upcase, string-downcase, and string-capitalize respectively
   except that they modify string.

To support the rare cases all mutable strings also have to be adjustable.
This means all mutable strings should be implemented as objects that point
to another area of memory (so the memory allocated can be reallocated).
There's no clean shortcut to another level of indirection.

> But that's not the only hair we eliminate by counting grapheme clusters
> instead of codepoints.

Making the special cases more rare eliminates no hair. Defining strings to
be immutable and discarding the string mutation functions could eliminate
the above hair.

I see you've considered this issue extensively:
<http://groups.google.co.nz/groups?selm=3FB1CC83.A9EFEE54%40sonic.net&output=gplain>

We're unlikely to agree about the best way to handle this. I tend to
generalise special cases until they're no longer special (e.g. using
indirection) or define them away via a more constrained object definition
(e.g. immutability). I'd be unlikely to assign some Unicode code points
special meanings to handle these special cases.

> If you count grapheme clusters, then string equivalence can be the
> relationship induced by character equivalence over the length of a
> string.
> 
>   consider:
> 
> precomposed-character1 codepoint base-codepoint2 nonspacing-codepoint2
> vs.
> base-codepoint1 nonspacing-codepoint1 codepoint precomposed-character2
> 
> Here you have two "equivalent" strings (or a unicode violation if you
> regard them as non-equivalent) of the same length, but if characters are
> codepoints, no character in any position is the same!  If characters are
> grapheme clusters, then all the characters in the same locations are
> equal.

That's a nice example, thanks.

Regards,
Adam
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <d05Fd.1534$m31.15946@typhoon.sonic.net>
Adam Warner wrote:

>>I'm aware of that, but it's exactly one case.  There are also some
>>non-canonical characters (ligatures) that change actual length on
>>casing.  But making strings out of grapheme clusters rather than
>>codepoints makes these occurrences rare instead of ridiculously
>>common.
> 
> 
> Rare cases still have to be handled correctly.

This is true, but even when handled correctly, annoying cases should
not call themselves to the attention of the programmer or the user
more frequently than necessary.

> Consider the impact upon
> the Common Lisp NSTRING functions:
> 
>    nstring-upcase, nstring-downcase, and nstring-capitalize are identical
>    to string-upcase, string-downcase, and string-capitalize respectively
>    except that they modify string.
> 
> To support the rare cases all mutable strings also have to be adjustable.
> This means all mutable strings should be implemented as objects that point
> to another area of memory (so the memory allocated can be reallocated).
> There's no clean shortcut to another level of indirection.

Right.  My strings are balanced trees of 'strands' 1000 characters long,
with copy-on-write semantics for the strands and all the tree nodes. It
is a necessary structure when strings get 2M long or so, and limits the
effect of length-changing mutations to local copying instead of moving,
on average, half the string.

I didn't even consider immutable strings.  I would have to think very
carefully about how that would work from a programming perspective, but
initially, I think the splay-tree implementation would be needed anyway
so that a copy of a string could share structure with it.

> Making the special cases more rare eliminates no hair.

It eliminates no hair for the implementor, but it eliminates hair for
the vast majority of programmers; (those who use languages that don't
need eszett, in particular).

> We're unlikely to agree about the best way to handle this. I tend to
> generalise special cases until they're no longer special (e.g. using
> indirection) or define them away via a more constrained object definition
> (e.g. immutability). I'd be unlikely to assign some Unicode code points
> special meanings to handle these special cases.

It's been bothering me lately; I've been looking for a different way
to do it.  But I just can't get around the craziness of the Unicode
character set, no matter what; it seems that every other solution I've
considered is just as ugly or uglier.

I think that, for "reasonable" use, there *has* to be roundtripping
between upper and lower case.  Unicode, as written, breaks that.
How to deal with the situation?

				Bear
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87zmze6ab2.fsf@qrnik.zagroda>
Ray Dillinger <····@sonic.net> writes:

> I think that, for "reasonable" use, there *has* to be roundtripping
> between upper and lower case.  Unicode, as written, breaks that.
> How to deal with the situation?

Accept the fact that there is no roundtripping: lowercase(str)
is not necessarily the same as lowercase(uppercase(str)).

Also, lowercase(append(str1, str2)) is not the same as
append(lowercase(str1), lowercase(str2)), because of the Greek sigma,
which has distinct forms in lowercase (depending on whether it's at
the end of a word or not) but the same form in uppercase (encoded
once). Case mapping doesn't work character by character, even though
it's close.

Declaring a widely used writing system as broken doesn't help: Germans
and Greeks won't change their orthographies just because a programming
language has rules incompatible with it. Programming languages must be
adapted to the writing systems rather than the other way around.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Florian Weimer
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87u0pmk8ty.fsf@deneb.enyo.de>
* Marcin Kowalczyk:

> Declaring a widely used writing system as broken doesn't help: Germans
> and Greeks won't change their orthographies just because a programming
> language has rules incompatible with it. Programming languages must be
> adapted to the writing systems rather than the other way around.

The ß -> SS -> ss conversion (which is fortunately *not* specific to
German locales!) could be addressed by including an SS in Unicode.
Much more annoying are conversion rules that depend on the locale, for
example ı -> I, and i -> İ in a Turkish locale (and only there).
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87d5wabrrq.fsf@qrnik.zagroda>
Florian Weimer <··@deneb.enyo.de> writes:

> The ß -> SS -> ss conversion (which is fortunately *not* specific to
> German locales!) could be addressed by including an SS in Unicode.

IMHO it's unlikely.

> Much more annoying are conversion rules that depend on the locale,
> for example ı -> I, and i -> İ in a Turkish locale (and only there).

And Azeri, which since 1991 uses an alphabet based on Turkish.

There are also special rules for Lithuanian which retains the dot over i
when another accent is added over it. Thus in Lithuanian lowercasing
"I with grave" yields "i with dot above and grave", because "i with grave"
is rendered without the dot.
http://titus.uni-frankfurt.de/curric/gldv99/paper/tumason/tumasonx.pdf

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <qacFd.1555$m31.16339@typhoon.sonic.net>
Marcin 'Qrczak' Kowalczyk wrote:

> Declaring a widely used writing system as broken doesn't help: Germans
> and Greeks won't change their orthographies just because a programming
> language has rules incompatible with it. Programming languages must be
> adapted to the writing systems rather than the other way around.

Time will tell.  Remember what happened to american english more or
less as a result of our early limited printing technology and
predominating use of typewriters.

				Bear
From: Peter Seibel
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3pt0abjkv.fsf@javamonkey.com>
Ray Dillinger <····@sonic.net> writes:

> Marcin 'Qrczak' Kowalczyk wrote:
>
>> Declaring a widely used writing system as broken doesn't help: Germans
>> and Greeks won't change their orthographies just because a programming
>> language has rules incompatible with it. Programming languages must be
>> adapted to the writing systems rather than the other way around.
>
> Time will tell.  Remember what happened to american english more or
> less as a result of our early limited printing technology and
> predominating use of typewriters.

too bad they didn't leave the shift key off those typewriters. then we
could have avoided all these discussions, at least in the u.s. ;-)

-peter

-- 
peter seibel                                      ·····@javamonkey.com

         lisp is the red pill. -- john fraser, comp.lang.lisp
From: Karl A. Krueger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cs41eu$pu0$1@baldur.whoi.edu>
Peter Seibel <·····@javamonkey.com> wrote:
> Ray Dillinger <····@sonic.net> writes:
>> Marcin 'Qrczak' Kowalczyk wrote:
>>
>>> Declaring a widely used writing system as broken doesn't help: Germans
>>> and Greeks won't change their orthographies just because a programming
>>> language has rules incompatible with it. Programming languages must be
>>> adapted to the writing systems rather than the other way around.
>>
>> Time will tell.  Remember what happened to american english more or
>> less as a result of our early limited printing technology and
>> predominating use of typewriters.
> 
> too bad they didn't leave the shift key off those typewriters. then we
> could have avoided all these discussions, at least in the u.s. ;-)

It could be much worfe than that.

(Yes, I know the "long S" is not an F.  Ftill lookf like one.
Well, that or an integral sign.)

-- 
Karl A. Krueger <········@example.edu> { s/example/whoi/ }

Every program has at least one bug and can be shortened by at least one line.
By induction, every program can be reduced to one line which does not work.
From: Lars Brinkhoff
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <85is61std8.fsf@junk.nocrew.org>
Ray Dillinger <····@sonic.net> writes:
> Remember what happened to american english more or less as a result
> of our early limited printing technology and predominating use of
> typewriters.

I don't think I have any memory of that.  What happened?
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <zorFd.1661$m31.19401@typhoon.sonic.net>
Lars Brinkhoff wrote:
> Ray Dillinger <····@sonic.net> writes:
> 
>>Remember what happened to american english more or less as a result
>>of our early limited printing technology and predominating use of
>>typewriters.
> 
> 
> I don't think I have any memory of that.  What happened?

People ceased to regard words without accents as misspelled,
and several 'variants' such as long-s that were used in
handwritten or calligraphed documents fell into disuse.

				Bear
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87sm55blsd.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:

> Lars Brinkhoff wrote:
> > Ray Dillinger <····@sonic.net> writes:
> >
> >>Remember what happened to american english more or less as a result
> >>of our early limited printing technology and predominating use of
> >>typewriters.
> > I don't think I have any memory of that.  What happened?
> 
> People ceased to regard words without accents as misspelled,
> and several 'variants' such as long-s that were used in
> handwritten or calligraphed documents fell into disuse.

But that cannot come from limitations in printing technology: in other
European languages, we can print accented letters and still do.  Or do
you mean that American printing technology is less advanced?

So why were accents dropped from Old English?

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <DySFd.1881$m31.22000@typhoon.sonic.net>
Pascal Bourguignon wrote:
> Ray Dillinger <····@sonic.net> writes:
> 
> 
>>Lars Brinkhoff wrote:
>>
>>>Ray Dillinger <····@sonic.net> writes:
>>>
>>>
>>>>Remember what happened to american english more or less as a result
>>>>of our early limited printing technology and predominating use of
>>>>typewriters.
>>>
>>>I don't think I have any memory of that.  What happened?
>>
>>People ceased to regard words without accents as misspelled,
>>and several 'variants' such as long-s that were used in
>>handwritten or calligraphed documents fell into disuse.
> 
> 
> But that cannot come from limitations in printing technology: in other
> European languages, we can print accented letters and still do.  Or do
> you mean that American printing technology is less advanced?
> 

Early American newspapers were usually a lot more 'basic' in terms
of typographic capabilities than their european counterparts.  There
weren't really type foundries in the US until the 1850's or so, so
newspaper typesetters usually had more limited sets of letters to
choose from.  About the time our typesetters achieved parity with
europe, we got typewriters, and typewriters (which at the time
couldn't do accents) caught on here a lot faster than in europe.

				Bear
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87hdljan5z.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:

> Pascal Bourguignon wrote:
> > Ray Dillinger <····@sonic.net> writes:
> >
> >>Lars Brinkhoff wrote:
> >>
> >>>Ray Dillinger <····@sonic.net> writes:
> >>>
> >>>
> >>>>Remember what happened to american english more or less as a result
> >>>>of our early limited printing technology and predominating use of
> >>>>typewriters.
> >>>
> >>>I don't think I have any memory of that.  What happened?
> >>
> >>People ceased to regard words without accents as misspelled,
> >>and several 'variants' such as long-s that were used in
> >>handwritten or calligraphed documents fell into disuse.
> > But that cannot come from limitations in printing technology: in other
> > European languages, we can print accented letters and still do.  Or do
> > you mean that American printing technology is less advanced?
> >
> 
> Early American newspapers were usually a lot more 'basic' in terms
> of typographic capabilities than their european counterparts.  There
> weren't really type foundries in the US until the 1850's or so, so
> newspaper typesetters usually had more limited sets of letters to
> choose from.  About the time our typesetters achieved parity with
> europe, we got typewriters, and typewriters (which at the time
> couldn't do accents) caught on here a lot faster than in europe.

But I think the loss of accent is much older than that.  I've not been
able to google anything precise, but from what I could read it seems
that they were lost in the transition from Old English to English
(that was before the printing press if I'm not wrong).

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
I need a new toy.
Tail of black dog keeps good time.
Pounce! Good dog! Good dog!
From: Rahul Jain
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87is5wfy29.fsf@nyct.net>
Pascal Bourguignon <····@mouse-potato.com> writes:

> But I think the loss of accent is much older than that.  I've not been
> able to google anything precise, but from what I could read it seems
> that they were lost in the transition from Old English to English
> (that was before the printing press if I'm not wrong).

But the words with accents are from French (mostly, unless it's a
dieresis), so they came in long after Middle English. IIANM, the Norman
invasion and the subsequent domination of French is what started Modern
English. Maybe you're talking about losing the Germanic umlaut, but
that's shared with Dutch, is it not? So that would indicate that it
happened even before English unless it's a case of convergent evolution.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Hannah Schroeter
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <csgneo$ad9$1@c3po.use.schlund.de>
Hello!

Rahul Jain  <·····@nyct.net> wrote:
>Pascal Bourguignon <····@mouse-potato.com> writes:

>> But I think the loss of accent is much older than that.  I've not been
>> able to google anything precise, but from what I could read it seems
>> that they were lost in the transition from Old English to English
>> (that was before the printing press if I'm not wrong).

>But the words with accents are from French (mostly, unless it's a
>dieresis), so they came in long after Middle English. IIANM, the Norman
>invasion and the subsequent domination of French is what started Modern
>English. Maybe you're talking about losing the Germanic umlaut, but
>that's shared with Dutch, is it not? So that would indicate that it
>happened even before English unless it's a case of convergent evolution.

Ehm. Umlaut (the phenomenon) is present in English too. Witness man,
plural men, for example.

Kind regards,

Hannah.
From: Christopher C. Stacy
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <u7jmh426d.fsf@news.dtpq.com>
Ray Dillinger <····@sonic.net> writes:

> Lars Brinkhoff wrote:
> > Ray Dillinger <····@sonic.net> writes:
> >
> >>Remember what happened to american english more or less as a result
> >>of our early limited printing technology and predominating use of
> >>typewriters.
> > I don't think I have any memory of that.  What happened?
> 
> People ceased to regard words without accents as misspelled,
> and several 'variants' such as long-s that were used in
> handwritten or calligraphed documents fell into disuse.

Did this really happen in America, and in the era of typewriters?
I don't remember noticing any of those artifacts on early documents
from the 1700s (although I wasn't really looking for them).
Is it possible that the changes were just a natural evolution?
And do we think that this was a bad thing, anyway?
From: Ray Dillinger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <qAiJd.3481$m31.49847@typhoon.sonic.net>
Marcin 'Qrczak' Kowalczyk wrote:
> Ray Dillinger <····@sonic.net> writes:
> 
> 
>>I think that, for "reasonable" use, there *has* to be roundtripping
>>between upper and lower case.  Unicode, as written, breaks that.
>>How to deal with the situation?
> 
> 
> Accept the fact that there is no roundtripping: lowercase(str)
> is not necessarily the same as lowercase(uppercase(str)).

This is a very strong argument for case-sensitivity in any
Unicode-based LISP.  If case-insensitivity can result in
symbol ambiguities, it cannot be tolerated in a programming
language.

If you want to preserve case-insensitive symbols, and you don't
have roundtripping, you have to specify whether the lowercase
or uppercase form determines symbol uniqueness.

If uppercase forms determine symbol uniqueness, maße and masse
are the same symbol, which is ugly; if lowercase form
determines symbol uniqueness, then any identifier containing
SS is illegal because its lowercase form is ambiguous.

The alternative is to exclude characters which are not the
preferred altercase image of their own preferred altercase
images from the legal set of characters of which ordinary
symbols can be made (ie, to insist on roundtripping in the
alphabet used for symbols). In this case maße is an illegal
symbol.
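The collision is easy to demonstrate with any Unicode-aware case mapping. Note that standard CL's CHAR-UPCASE is character-to-character and never expands ß; Python's str methods implement the full case mappings of Unicode section 3.13, so they are used below purely as an illustration of the Unicode behavior, not of CL:

```python
# Unicode full case mappings are not round-trippable: "masse" ('mass')
# and "maße" ('measure') collide once upcased, because ß has no
# uppercase form and expands to "SS".
word = "maße"
up = word.upper()
print(up)                   # MASSE
print(up.lower())           # masse -- not the original "maße"
assert up == "MASSE"
assert up.lower() != word   # the round trip is lost
```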

> Declaring a widely used writing system as broken doesn't help: Germans
> and Greeks won't change their orthographies just because a programming
> language has rules incompatible with it. Programming languages must be
> adapted to the writing systems rather than the other way around.

The fact that they won't change it does not prevent a declaration
that it is broken from being accurate. You're right though that
such a declaration, though perhaps accurate, is not helpful.

Final Sigma is a presentation form, not a letter; like ligatures
and many compatibility glyphs, it belongs in the rendering
engine, not the alphabet.  I think it should be simply converted
to sigma as it's read, output simply as sigma, and any printing
as final-sigma handled by the rendering engine if appropriate.
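The context-dependence of final sigma can be seen in any implementation of Unicode's SpecialCasing rules; Python's str.lower() applies the Final_Sigma condition, so a single uppercase sigma lowercases two different ways depending on position (again an illustrative sketch, not CL behavior):

```python
# Capital sigma lowercases to ς word-finally and to σ elsewhere
# (the Final_Sigma rule), so the distinction is recoverable from
# context alone -- which is why it can live in the rendering engine.
lowered = "ΟΔΟΣ".lower()    # Greek ΟΔΟΣ ('road')
print(lowered)              # οδος -- the last letter is final sigma ς
assert lowered[-1] == "ς"
# Upcasing erases the distinction again: both forms map back to Σ.
assert "σ".upper() == "Σ" and "ς".upper() == "Σ"
```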

				Bear
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87brbem86y.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:

> Marcin 'Qrczak' Kowalczyk wrote:
> > Ray Dillinger <····@sonic.net> writes:
> >
> >>I think that, for "reasonable" use, there *has* to be roundtripping
> >>between upper and lower case.  Unicode, as written, breaks that.
> >>How to deal with the situation?
> > Accept the fact that there is no roundtripping: lowercase(str)
> > is not necessarily the same as lowercase(uppercase(str)).
> 
> This is a very strong argument for case-sensitivity in any
> Unicode-based LISP.  If case-insensitivity can result in
> symbol ambiguities, it cannot be tolerated in a programming
> language.

So I was right to proactively write COMMON-LISP symbols in my programs
in upper-case :-)  This way I won't have to change my sources when
implementations adopt Unicode as BASE-CHAR.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87is62egvb.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:
> I think that, for "reasonable" use, there *has* to be roundtripping
> between upper and lower case.  Unicode, as written, breaks that.
> How to deal with the situation?

Unicode is a different beast.  Keep the Common Lisp string functions
behaving as specified, and add a UNICODE package with
UNICODE:STRING-UPCASE doing the right thing for ß, and
UNICODE:STRING-DOWNCASE checking the language and doing a contextual
dictionary lookup to downcase SS correctly (unless downcasing
correctly is out of the scope of Unicode).  Don't forget to define a
UNICODE:AMBIGUOUS-LANGUAGE condition for when an uppercase word exists
in two languages with different rules for downcasing...

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
You never feed me.
Perhaps I'll sleep on your face.
That will sure show you.
From: Rahul Jain
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87is60bvwv.fsf@nyct.net>
Pascal Bourguignon <····@mouse-potato.com> writes:

> Ray Dillinger <····@sonic.net> writes:
>> I think that, for "reasonable" use, there *has* to be roundtripping
>> between upper and lower case.  Unicode, as written, breaks that.
>> How to deal with the situation?
>
> Unicode is a different beast.  Keep the Common Lisp string functions
> behaving as specified, and add a UNICODE package with
> UNICODE:STRING-UPCASE doing the right thing for ß, and
> UNICODE:STRING-DOWNCASE checking the language and doing a contextual
> dictionary lookup to downcase SS correctly (unless downcasing
> correctly is out of the scope of Unicode).  Don't forget to define a
> UNICODE:AMBIGUOUS-LANGUAGE condition for when an uppercase word exists
> in two languages with different rules for downcasing...

I think this is a good idea and goes along with the way FORMAT is
specified: to assume American English. This is because the internals of
the language are not designed for real general-purpose UI. They are
there to communicate with the programmer, who must understand American
English anyway in order to understand the meanings of many symbol names
in the language itself, to say nothing of reading the spec. I don't
think the ability to write Lisp code in a mixture of human languages is
something we need to go out of our way to support.

My gut feel is that unicode characters may be instances of CHARACTER,
but should be a distinct type from BASE-CHAR and unicode strings should
be a distinct type from STRING.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Duane Rettig
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <4is73uvko.fsf@franz.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Barry Margolin,

> You told me in your previous reply that length "returns the number of
> array elements, regardless of their size." As ANSI Common Lisp doesn't
define characters to be of a particular size, please tell me what the
> correct internal encoding should be. Then I'll tell you that every other
> implementation now has to traverse each string to determine its length,

This is exactly true.  And an implementation has a choice as to whether
to implement strings with a constant-width encoding, to make LENGTH
work efficiently, or to sacrifice LENGTH efficiency in order to use a
variable-width encoding.  Either way, LENGTH must work correctly, and
it is very simply defined on character count, independent of
its internal representation for strings.  Note that in a string where
the characters are of varying width, CHAR, AREF, and their setf inverses
also are no longer constant-time operations.

> and that length will not necessarily equal the number of array elements in
> the string.

This cannot be true, by definition.  A string is a vector of characters,
period.  If the characters are implemented in a variable-width manner, then
the elements themselves are of varying width, but still have the same count.

One could get around the need for more than 8, 16, or 32 bits to represent
some characters by defining all characters to be boxed values, instead of
immediates.  This wouldn't be efficient, but it would allow strings to be
implemented as lispval pointers to character boxed-objects.  But then that
would make the strings have fixed-width elements, wouldn't it? :-)

> And if you define the official character size as 32-bit/enough to hold a
> Unicode code point then you'll have to explain why an implementation that
> implements characters as grapheme clusters is non-conforming with respect
> to LENGTH and CHAR.

As has been stated elsewhere in this thread, Allegro CL implements strings
internally as fixed-width arrays of characters.  We provide versions of
the lisp that have 8-bit characters and 16-bit characters, but only one
in each lisp (it reduces type complexity and runtime discrimination
requirements).  All other encodings of strings are treated as external-formats,
and are handled by streams.  Since we use simple-streams to encode between
arrays of octets and arrays of characters, the translation from external
("native") to internal ("character", "string") is a simple matter of
using the external-format for the conversion.  I've lost track of how
many external-formats we provide, but representations currently being
discussed, such as grapheme clusters, could simply be a matter of
writing an external-format for them, if not already available.

The length of an external ("native") string is given in terms of
octets; it is calculable via excl:native-string-sizeof, which
returns the number of octets in the string argument (which is
not a Lisp string, but an external representation of a string;
we call it "native" because it is presumably native to the
operating system hosting our lisp).  This function must indeed
traverse the string to figure out how many characters are in it.
But it is not the LENGTH function; it would be a nonconformance
to replace LENGTH with this function.
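Duane's split between the internal character count and the external octet count has a direct analog in any language with opaque Unicode strings. A Python sketch, where len() plays the part of LENGTH and len(s.encode(...)) stands in for something like excl:native-string-sizeof (the Allegro name, used here only for comparison):

```python
# Internal length counts characters, always; external size counts
# octets in a chosen encoding, and varies with the encoding.
s = "\U00010000"                    # the code point Adam uses below
print(len(s))                       # 1 -- one character
print(len(s.encode("utf-8")))       # 4 -- four octets externally
print(len(s.encode("utf-16-be")))   # 4 -- two 16-bit code units
```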

> > Are you sure these are really strings you're creating, and not byte 
> > arrays that you're filling in by reading a file as binary?
> 
> It all depends upon what an operating system or language defines a
> character to be. A character in Java is a 16-bit unsigned value because it
> is defined that way, not because it can hold all Unicode code points.
> Because of this definition the length of an arbitrary string is the same
> across implementations. ANSI Common Lisp doesn't define the size of a
> character. Allegro for example corresponds with Java:
> <http://www.franz.com/support/documentation/7.0/doc/iacl.htm#memory-usage-2>

Yes, internally, Allegro CL uses 16-bit characters for strings.  An
8-bit version exists as well, and a 32-bit version could conceivably
be made available if demand were high (so far, it is not).  The link
you quote is an explanation of the size increase when moving from an
8-bit representation to a 16-bit representation.  It has nothing to do
with the interaction we provide for the external world.

> Do you claim that the decision to store characters internally as 16-bit
> values is non-conforming? Length and array references will differ from an
> implementation with 32-bit characters. So who's right? Are they all wrong
> and the only implementation capable of returning the correct answer is one
> which implements strings as sequences of grapheme clusters (like the
> Parrot virtual machine)?

I think you misunderstand Barry, because you are not allowing for a
split between internal representations and external formats.  I didn't
see any such claim to nonconformance in his response.

> I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
> size of a character the length of an arbitrary string will be
> implementation specific.

This claim is false, by definition, since length is specified in
terms of a count, and not in terms of widths in some other units
of measure.

> I am sure of this claim because no one has put
> their foot down and told implementors, for better or worse, that
> characters are a fixed size of n-bits or that characters must be handled
> as grapheme clusters of variable size.

The implementation decision is a choice, but the requirement to count
elements (i.e. characters) is not.  That seals the tradeoff consideration.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.16.00.18.18.272238@consulting.net.nz>
Hi Duane Rettig,

>> I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
>> size of a character the length of an arbitrary string will be
>> implementation specific.
> 
> This claim is false, by definition, since length is specified in terms
> of a count, and not in terms of widths in some other units of measure.

Here is an arbitrary string encoded in UTF-8: "𐀀" [You may generate it
in CLISP using (string (code-char #x10000))]. It consists of a single code
point.

I expect (cl:length "𐀀") will NOT return 1 in a 16-bit character Allegro
yet it will return 1 in CLISP and SBCL. I expect:

(let ((s (copy-seq "𐀀")))
  (setf (char s 0) #\A)
  s)

will return additional garbage because it replaces the first half of a
surrogate character. I expect this will also be legal in Allegro but
not CLISP and SBCL:

(let ((s (copy-seq "𐀀")))
  (setf (char s 0) #\A)
  (setf (char s 1) #\B)
  s)

[I can only expect these things because I haven't licensed Allegro (and
telnetting into prompt.franz.com appears to be the 8-bit version)]

LENGTH currently returns implementation specific values since the values
it returns differ between some implementations for some identical external
strings in the same encoding. If the Lisp community can't accept the
present situation then it has to agree upon an internal encoding format
for characters, which is likely to be Unicode code points.
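Adam's scenario can be modeled without an Allegro license: treat a hypothetical 16-bit-character string as a mutable array of UTF-16 code units and perform the analog of (setf (char s 0) #\A). A hedged Python sketch, in which a bytearray stands in for that string's storage:

```python
# U+10000 occupies two UTF-16 code units (a surrogate pair).
buf = bytearray("\U00010000".encode("utf-16-le"))
assert len(buf) // 2 == 2            # a 16-bit lisp would report length 2

# Analog of (setf (char s 0) #\A): clobber the first 16-bit unit.
buf[0:2] = ord("A").to_bytes(2, "little")

# The low surrogate is now unpaired -- the garbage Adam predicts.
try:
    buf.decode("utf-16-le")
except UnicodeDecodeError:
    print("unpaired surrogate left behind")
```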

Regards,
Adam
From: Duane Rettig
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <4d5xbukcp.fsf@franz.com>

Adam Warner <······@consulting.net.nz> writes:

> Hi Duane Rettig,
> 
> >> I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
> >> size of a character the length of an arbitrary string will be
> >> implementation specific.
> > 
> > This claim is false, by definition, since length is specified in terms
> > of a count, and not in terms of widths in some other units of measure.
> 
> Here is an arbitrary string encoded in UTF-8: "𐀀" [You may generate it
> in CLISP using (string (code-char #x10000))]. It consists of a single code
> point.

Right, but you should be checking the char-code-limit in each lisp
you are trying to use.  Some lisps will return nil for this expression.

> I expect (cl:length "𐀀") will NOT return 1 in a 16-bit character Allegro
> yet it will return 1 in CLISP and SBCL. I expect:

Your expectations are wrong:

CL-USER(1): (setq x
               (make-array 4 :element-type '(unsigned-byte 8)
                             :initial-contents '(#xf0 #x90 #x80 #x80)))
#(240 144 128 128)
CL-USER(2): (octets-to-string x)  ;; verify I got the right encoding
"𐀀"
4
4
CL-USER(3): (setq y (octets-to-string x :external-format :utf-8))
"?"
CL-USER(4): (length y)
1
CL-USER(5): (type-of y)
(SIMPLE-ARRAY CHARACTER (1))
CL-USER(6): 

The fact that the character overflows the 16-bit value doesn't change
the fact that the length is properly preserved.  As I said in my
previous response, we might offer a 32-bit-character lisp if the
demand is high enough, but so far it is not.

> (let ((s (copy-seq "𐀀")))
>   (setf (char s 0) #\A)
>   s)
> 
> will return additional garbage because it replaces the first half of a
> surrogate character. I expect this will also be legal in Allegro but
> not CLISP and SBCL:
> 

> (let ((s (copy-seq "𐀀")))
>   (setf (char s 0) #\A)
>   (setf (char s 1) #\B)
>   s)

You're still dealing with an internal representation of a string
in the lisp you're running; you haven't yet understood the difference
between an external format and an internal string representation.
You are trying to assume that CLISP and SBCL encode their strings
in utf-8 format, and that is evidently not true.  What is the
actual length of the string you created in these lisps?  What are
the actual characters in the string?  It really didn't get translated
at all, did it?  :-)

> [I can only expect these things because I haven't licensed Allegro (and
> telnetting into prompt.franz.com appears to be the 8-bit version)]

That is not a good method of setting your expectations.

> LENGTH currently returns implementation specific values since the values
> it returns differ between some implementations for some identical external
> strings in the same encoding.

You have not yet proven this, and I have proven it false for at
least Allegro CL.

> If the Lisp community can't accept the
> present situation then it has to agree upon an internal encoding format
> for characters, which is likely to be Unicode code points.

Since the premise is false, this conclusion doesn't follow.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   

From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.16.02.11.52.360086@consulting.net.nz>
Hi Duane Rettig,

>> (let ((s (copy-seq "𐀀")))
>>   (setf (char s 0) #\A)
>>   s)
>> 
>> will return additional garbage because it replaces the first half of a
>> surrogate character. I expect this will also be legal in Allegro but
>> not CLISP and SBCL:
>> 
>> (let ((s (copy-seq "𐀀")))
>>   (setf (char s 0) #\A)
>>   (setf (char s 1) #\B)
>>   s)
> 
> You're still dealing with an internal representation of a string
> in the lisp you're running; you haven't yet understood the difference
> between an external format and an internal string representation.

Yes I have. The string above is an encoding of a single code point with
the code #x10000. It just so happens to have a length of 4 octets in
my current locale, UTF-8, a locale that a Unicode Lisp should understand.
To the Lisp user you should be presenting a single character with
CHAR-CODE of #x10000. If you cannot, there is no consistency between
Unicode 3.1+ implementations of ANSI Common Lisp.

(If you did not see the string correctly there is probably an issue with
your NNTP client. Gnome has amazing Unicode support and will display the
numeric code point for a Unicode code point without a corresponding glyph.
My posts display a string with a boxed [010]
                                       [000] value, which is a very good
indication that I'm encoding this discussion correctly. The Content-Type
of your reply is bizarre. It should be something like text/plain;
charset=UTF-8 (or UTF-16, etc.). Yours is multipart/mixed;
boundary="=-=-=". Nowhere does it specify the encoding.)

If (char-code (char "𐀀" 0)) is not 65536 then there will be
inconsistent results between ANSI Common Lisp Unicode string
implementations.

I am impressed by your demonstration of Allegro correctly handling the
length of a string with a supplementary character ["The fact that the
character overflows the 16-bit value doesn't change the fact that the
length is properly preserved."] Will you also please confirm that SETF
CHAR correctly handles the destructive modification of a supplementary
character with a Basic Multilingual Plane character (and vice versa)
within your internal string representation. If you do correctly handle
this how the heck did you achieve it? As you cannot fit a supplementary
character within the space allocated to a BMP character you would have to
cons up a new string with SETF CHAR.

Regards,
Adam
From: Duane Rettig
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <4u0qmzleo.fsf@franz.com>
[As Adam noticed and I explain below, the encoding that ended up
being sent out was unspecified, and I've been informed by a colleague
that it crashed two of his X servers.  To anyone else that my article
caused the same crash, I apologize.  My colleague was able to display
with Windows, and I always use Redhat and gnus in xemacs, so I know
that at least those two should work.  In this message I have elided
all characters that various X displays might interpret as control
characters.  Again, I apologize for any crashes I have caused.]

Adam Warner <······@consulting.net.nz> writes:

> Hi Duane Rettig,
> 
> > You're still dealing with an internal representation of a string
> > in the lisp you're running; you haven't yet understood the difference
> > between an external format and an internal string representation.
> 
> Yes I have. The string above is an encoding of a single code point with
> the code #x10000.

No, it is only an encoding if your X display interprets it that way.
But what you actually sent to the lisp you were working with was
4 characters (not 4 octets) with char-code values of #xf0, #x90,
#x80, and #x80 respectively.  If you inspect this string you are making
in whatever lisp you are using that you claim is nonconforming, you will
see that it consists of four _characters_ (and _not_ octets).  I repeat,
you have failed to understand the difference between external and
internal representation. But before you become upset with this seemingly
argumentative response, read further...

> It just so happens to have a length of 4 octets in
> my current locale, UTF-8, a locale that a Unicode Lisp should understand.

No, ANSI Common Lisp does not define "locale", and makes no requirement
that any conforming implementation support it.

> To the Lisp user you should be presenting a single character with
> CHAR-CODE of #x10000. If you can not there is no consistency between
> Unicode 3.1+ implementations of ANSI Common Lisp.

ANSI Common Lisp makes no requirement that char-code-limit be any larger
than 96.  Thus, it explicitly allows conforming lisps not to support
Unicode.

> (If you did not see the string correctly there is probably an issue with
> your NNTP client. Gnome has amazing Unicode support and will display the
> numeric code point for a Unicode code point without a corresponding glyph.
> My posts display a string with a boxed [010]
>                                        [000] value, which is a very good
> indication that I'm encoding this discussion correctly.

No, it is only an indication that Gnome is fooling you into thinking you
are seeing 1 character.  After making the string in the "nonconforming"
implementation of your choice, switch locales, and see what the lisp
shows you.  You will see four characters, possibly nearly unprintable.
Or, if your window system still insists on interpreting the four characters,
do a (dotimes (i 4) (describe (char s i))) to see that the string truly is
the four characters that you insist are one.

On the 16-bit Allegro CL, the characters show up as

CL-USER(1): (code-char #xf0)
#\latin_small_letter_eth
CL-USER(2): (code-char #x90)
#\%^p
CL-USER(3): (code-char #x80)
#\%null
CL-USER(4): (code-char #x80)
#\%null
CL-USER(5): 

I.e. they are latin-1 characters each of whose high bit happens
to have been set.

So why does it appear that you are looking at a single character?  Well,
with your locale set to utf-8, any characters that are larger than
7 bits are going to be interpreted by your window system as if it
is a utf-8 encoding.  This is as you might expect.  However, what you've
just done was to take a string of (internal) characters from an 8-bit
lisp, and displayed them with an output device that only provides the
lower 7-bits as one-to-one mappings with the lower 7 bits of an 8-bit
latin-1 encoding, but it misinterprets the 8th bit.  So what should have
been interpreted as an 8-bit code of #xf0 was misinterpreted as the
flag byte for the full 4-octet encoding of a unicode character.  In
short, you thought you were dealing with an internal representation,
but your window manager was doing an extra external-format conversion
for you!

It is a tribute to utf-8 encoding style that 7-bit ascii translates
without change.  Also, the bottom half of latin-1 is mapped onto
7-bit ascii, so for all of these systems, you can get away with a utf-8
locale to display them all.  Once you get above 7-bits, though, you
must match locales with the intended formats.
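
The octet-level view makes the misreading concrete.  A small sketch (in
Python, purely as an editorial illustration; the Lisp transcripts above are
the thread's own examples) shows that U+10000 encodes to exactly the four
UTF-8 octets #xF0 #x90 #x80 #x80, the first of which is latin-1's eth,
matching the Allegro transcript:

```python
# U+10000 (the first supplementary code point) encoded as UTF-8:
# one character on the Unicode side, four octets on the wire.
s = "\U00010000"
octets = s.encode("utf-8")
print(len(s))                    # 1
print([hex(b) for b in octets])  # ['0xf0', '0x90', '0x80', '0x80']
# An 8-bit lisp reading these octets without a utf-8 external-format
# sees four latin-1 characters; 0xf0 is latin_small_letter_eth.
```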

> The Content-Type
> of your reply is bizarre. It should be something like text/plain;
> charset=UTF-8 (or UTF-16, etc.) Yours is multipart/mixed;
> boundary="=-=-=". Nowhere does it specify the encoding.

Yes.  And for those of you for whom I crashed their X servers,
I apologize.  I was trying to give you printed representations
of characters with no encodings, as seen by the lisps themselves.
Apparently it didn't work as I had expected.

> If (char-code (char [ ... ] 0)) is not 65536 then there will be
======================^^^^^^^  <== My editing
> inconsistent results between ANSI Common Lisp Unicode string
> implementations.

For 7 bit lisps, char-code-limit is likely to be 128.  For 8-bit
lisps, that limit is likely 256.  For 16-bit lisps, the limit is
likely to be 65536.  For any of these lisps, if you try (code-char N)
where N > char-code-limit, then you are writing nonportable code.
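
The same boundary check can be sketched outside Lisp.  In Python (an
editorial analogy only, not part of the original posts), the codespace
ceiling plays the role of char-code-limit:

```python
import sys

# sys.maxunicode is Python's analogue of (1- char-code-limit).
print(hex(sys.maxunicode))  # 0x10ffff

# Asking for a character beyond the limit fails, just as
# (code-char N) for N >= char-code-limit may return NIL.
try:
    chr(sys.maxunicode + 1)
except ValueError:
    print("no such character")
```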

> I am impressed by your demonstration of Allegro correctly handling the
> length of a string with a supplementary character ["The fact that the
> character overflows the 16-bit value doesn't change the fact that the
> length is properly perserved."]

Thank you.  It is a question of making sure the utf-8 external-format
gets 32-bit values right (i.e. it correctly rejects them because they
are larger than the char-code-limit).

>  Will you also please confirm that SETF
> CHAR correctly handles the destructive modification of a supplementary
> character with a Basic Multilingual Plane character (and vice versa)
> within your internal string representation.

You're barking up the wrong tree.  I will confirm this:

CL-USER(1): char-code-limit
65536
CL-USER(2): (code-char 65536)
NIL
CL-USER(3): 

which is correct behavior.  See
http://www.franz.com/support/documentation/7.0/ansicl/dictentr/code-cha.htm
and
http://www.franz.com/support/documentation/7.0/ansicl/dictentr/char-cod.htm


> If you do correctly handle
> this how the heck did you achieve it? As you cannot fit a supplementary
> character within the space allocated to a BMP character you would have to
> cons up a new string with SETF CHAR.

It is incorrect to assume that correct handling of external-format
sequences too large to fit into single-character spaces implies that
characters are thus created that are too large to fit into a character's
space in a string.
The specification of Ansi creates a consistent framework from which
this can be implemented correctly.  _Please_ read and understand
char-code-limit.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Mario S. Mommer
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <fzsm662yrr.fsf@germany.igpm.rwth-aachen.de>
Duane Rettig <·····@franz.com> writes:
> [As Adam noticed and I explain below, the encoding that ended up
> being sent out was unspecified, and I've been informed by a colleague
> that it crashed two of his X servers.  To anyone else that my article
> caused the same crash, I apologize.  My colleague was able to display
> with Windows, and I always use Redhat and gnus in xemacs, so I know
> that at least those two should work.  In this message I have elided
> all characters that various X displays might interpret as control
> characters.  Again, I apologize for any crashes I have caused.]

I'd say that the crashes were caused by bad software on their end. If
you can crash X servers with a simple usenet post, then something is
rotten.
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.16.11.30.49.161769@consulting.net.nz>
Hi Duane Rettig,

Many thanks for the thoughtful reply.

>> It just so happens to have a length of 4 octets in
>> my current locale, UTF-8, a locale that a Unicode Lisp should understand.
> 
> No, Ansi Common Lisp does not define "locale", and makes no requirement
> that any conforming implementation support it.
> 
>> To the Lisp user you should be presenting a single character with
>> CHAR-CODE of #x10000. If you can not there is no consistency between
>> Unicode 3.1+ implementations of ANSI Common Lisp.
> 
> Ansi Common Lisp makes no requirement that char-code-limit be any larger
> than 96.  Thus, it explicitly allows conforming lisps not to support
> Unicode.

Of course. But we're discussing the parameters of what makes a conforming
Lisp implementation _also_ conforming with Unicode. If you're not claiming
that Allegro fully supports Unicode 3.1+ then there's no live issue
(because differing semantics are expected). However I don't think you're
claiming this. See below for what I suspect may describe your position.

[big snip]

>> If (char-code (char [ ... ] 0)) is not 65536 then there will be
> ======================^^^^^^^  <== My editing
>> inconsistent results between ANSI Common Lisp Unicode string
>> implementations.
> 
> For 7 bit lisps, char-code-limit is likely to be 128.  For 8-bit
> lisps, that limit is likely 256.  For 16-bit lisps, the limit is
> likely to be 65536.  For any of these lisps, if you try (code-char N)
> where N > char-code-limit, then you are writing nonportable code.

[...]

>>  Will you also please confirm that SETF
>> CHAR correctly handles the destructive modification of a supplementary
>> character with a Basic Multilingual Plane character (and vice versa)
>> within your internal string representation.
> 
> You're barking up the wrong tree.  I will confirm this:
> 
> CL-USER(1): char-code-limit
> 65536
> CL-USER(2): (code-char 65536)
> NIL
> CL-USER(3): 
> 
> which is correct behavior.  See
> http://www.franz.com/support/documentation/7.0/ansicl/dictentr/code-cha.htm
> and
> http://www.franz.com/support/documentation/7.0/ansicl/dictentr/char-cod.htm

OK, you've demonstrated that "no such character [with code 65536] exists
and one cannot be created, [so] nil is returned." You therefore don't have
an ANSI defined Common Lisp _character_ interface to Unicode supplementary
code points. You can encode them in strings. You just can't represent them
as characters (and therefore one can't, for example, LOOP ACROSS a string
and expect to have a supplementary character of-type CHARACTER returned).

But this doesn't necessarily mean Allegro doesn't fully support the latest
Unicode standard because fully supporting Unicode is an extension to ANSI
Common Lisp. According to this interpretation an implementation is free to
choose any character code limit so long as internally strings can encode
Unicode code points and extensions are provided to, e.g., access those
code points.

Unfortunately this interpretation makes Unicode support vendor specific
and potentially subject to vendor lock in (I know this is furthest from
your mind and you've already raised the issue of making a "32-bit" version
of Allegro CL available, subject to customer demand).

It's unlikely to be in the interests of users to have fragmented Unicode
support when the ANSI standard defines a way to support all Unicode code
points via #\ notation, CODE-CHAR, CHAR-CODE, CHAR, CHAR-CODE-LIMIT, etc.
But so much else is already non-standard in Common Lisp that it would be
just another pity.

I hope we've reached a mutually acceptable understanding.

Regards,
Adam
From: Duane Rettig
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <4vfb2f8mz.fsf@franz.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Duane Rettig,
> 
> Many thanks for the thoughtful reply.

You're welcome.  I think we're close to coming to terms.  Just a couple
more issues.

> OK, you've demonstrated that "no such character [with code 65536] exists
> and one cannot be created, [so] nil is returned." You therefore don't have
> an ANSI defined Common Lisp _character_ interface to Unicode supplementary
=====^^^^^^^^^^^^^^^^^^^^^^^^
> code points. You can encode them in strings. You just can't represent them
> as characters (and therefore one can't, for example, LOOP ACROSS a string
> and expect to have a supplementary character of-type CHARACTER returned).
> 
> But this doesn't necessarily mean Allegro doesn't fully support the latest
> Unicode standard because fully supporting Unicode is an extension to ANSI
> Common Lisp. According to this interpretation an implementation is free to
> choose any character code limit so long as internally strings can encode
> Unicode code points and extensions are provided to, e.g., access those
> code points.
> 
> Unfortunately this interpretation makes Unicode support vendor specific
> and potentially subject to vendor lock in (I know this is furthest from
> your mind and you've already raised the issue of making a "32-bit" version
> of Allegro CL available, subject to customer demand).
> 
> It's unlikely to be in the interests of users to have fragmented Unicode
> support when the ANSI standard defines a way to support all Unicode code
> points via #\ notation, CODE-CHAR, CHAR-CODE, CHAR, CHAR-CODE-LIMIT, etc.
> But so much else is already non-standard in Common Lisp that it would be
===================^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> just another pity.

Two issues:

 1. The flagged lines indicate that you don't understand the ANSI Common Lisp
standard, or are at least not referring to it correctly.  The second
flagged line is simply wrong: Common Lisp is standard by definition.
Also, as indicated by the first flagged line, you are talking about an
ANSI-defined Common Lisp character interface.  There are a few possibilities
here that I can think of:

 a. You actually believe that Ansi Common Lisp defines character interfaces.
    This is incorrect.  Ansi Common Lisp goes to great lengths to allow
    any character interfaces.
 b. You are coming purely from a Unicode perspective, and talking about
    Common Lisp not supporting the full Unicode standard.  If so, it is
    slightly peculiar for you to talk about ANSI definitions in a Unicode
    context; Unicode is its own organization, and has ties to other
    standards organizations, but mostly to ISO, and these two organizations
    have agreement to follow each other closely - ISO tends to follow
    Unicode via its ISO/IEC 10646 standard.  Why mention ANSI from a
    Unicode/ISO perspective?
 c. Something else...?


2. If there were no negatives to moving to 32-bit Unicode, we would
jump at it.  However, although most people believe that the size issue
has been solved by the availability of cheaper and larger memory and
disk space, and by the presence of 64-bit systems (and by the presence
of our lisp on those systems), there are still many customers for
whom any size increase in underlying implementation is fatal; they
may tend to work primarily with strings, or they may push or plan to
push the limit of their memory or address-space resources, and increasing
the size of characters would push them over that limit.  This is true
both in the large-program domain, where there is never enough memory,
and in the tiny-program domain, where Lisp has traditionally been at
a disadvantage for being so "well-endowed".  It is also for this reason
that we provide the 8-bit version of our lisp, and we have customers
who still use it and plan never to move off of it.  "... a unique
number for every character ..." might be a nice mantra, but not
every programmer is of that faith.

> I hope we've reached a mutually acceptable understanding.

I hope so, too.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87vfb33w35.fsf@thalassa.informatimago.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Duane Rettig,
> 
> >> I made a simple claim Barry: Since ANSI Common Lisp doesn't define the
> >> size of a character the length of an arbitrary string will be
> >> implementation specific.
> > 
> > This claim is false, by definition, since length is specified in terms
> > of a count, and not in terms of widths in some other units of measure.
> 
> Here is an arbitrary string encoded in UTF-8: "𐀀" [You
> may generate it in CLISP using (string (code-char #x10000))]. It
> consists of a single code point.

No. You have to specify an external format; you cannot generate it just
with (string (code-char #x10000)).  For example, in my case it gives
this error:
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.16.01.17.04.895582@consulting.net.nz>
Hi Pascal Bourguignon,

>> (let ((s (copy-seq "𐀀")))
>>   (setf (char s 0) #\A)
>>   s)
> 
> You are abusing strings, using them to store _codes_ instead of
> characters.  This cannot be portable Common Lisp.
> 
> 
> All this subject is silly, it's like asking that (length "SGVsbG8K") 
> returns 5 because (to-base64 "Hello") returns "SGVsbG8K".

Please stop this nonsense! We are discussing Unicode encoding that
consists of code points in the range 0 to #x10FFFF. This is one of the
Universal conceptions of a "character." The code #x10000 is a single code
point in Unicode. `A' is a single code point in Unicode with code 65. By
destructively modifying a string with a single code point by inserting a
new single code point you should end up with a string of a single code
point. This isn't abusing strings. This is the whole point of Unicode
strings at implementation level 2:
<http://www.unicode.org/faq/char_combmark.html#7>

Please stop spreading misinformation and claiming this subject is silly
until you have a some idea of what you're talking about.

Regards,
Adam
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.16.11.51.50.958450@consulting.net.nz>
Hi Pascal Bourguignon,

> You are abusing strings, using them to store _codes_ instead of
> characters.  This cannot be portable Common Lisp.
> 
> 
> All this subject is silly, it's like asking that (length "SGVsbG8K") 
> returns 5 because (to-base64 "Hello") returns "SGVsbG8K".

I suspect my reply offended you just as much as your claims offended me. I
hope we can put these differences aside and proceed anew. I think you may
believe I'm abusing strings because I keep using the term "code point".
This is the technical term for what a computer programmer would consider a
"character". I was really working with characters above. I just use the
term code point to distinguish a character from its unit encoding and to
distinguish a character from its grapheme cluster.

<http://www.unicode.org/glossary/#C>

   Character. (1) The smallest component of written language that has
   semantic value; refers to the abstract meaning and/or shape, rather
   than a specific shape (see also glyph), though in code tables some form
   of visual representation is essential for the reader’s understanding.
   (2) Synonym for abstract character. (3) The basic unit of encoding for
   the Unicode character encoding. (4) The English name for the
   ideographic written elements of Chinese origin. (See  ideograph (2).)

   Code Point. Any value in the Unicode codespace; that is, the range of
   integers from 0 to 10FFFFh. (See definition D4b in Section 3.4,
   Characters and Encoding.)

Using the term "Code Point" is more precise than discussing the storing of
characters even though it may give the impression I'm abusing strings to
store codes instead of characters.

Regards,
Adam
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87brct4zym.fsf@thalassa.informatimago.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Pascal Bourguignon,
> 
> > You are abusing strings, using them to store _codes_ instead of
> > characters.  This cannot be portable Common Lisp.
> > 
> > 
> > All this subject is silly, it's like asking that (length "SGVsbG8K") 
> > returns 5 because (to-base64 "Hello") returns "SGVsbG8K".
> 
> I suspect my reply offended you just as much as your claims offended me. I
> hope we can put these differences aside and proceed anew. I think you may
> believe I'm abusing strings because I keep using the term "code point".
> This is the technical term for what a computer programmer would consider a
> "character". I was really working with characters above. I just use the
> term code point to distinguish a character from its unit encoding and to
> distinguish a character from its grapheme cluster.

No offense. I guess the problem was that my gnus setting showed your
single character unicode string as a string of four 8-bit characters
on my screen, so I assumed you wanted a 4-character string to be of
LENGTH 1.

  
> <http://www.unicode.org/glossary/#C>
> 
>    Character.
>    (3) The basic unit of encoding for the Unicode character encoding.

I guess this unicode-specific meaning is at cross-purposes with the
Common Lisp definition of a character.  Common Lisp uses external bits
(:EXTERNAL-FORMAT) or small integers for encodings (CHAR-CODE/CODE-CHAR),
and Common Lisp CHARACTERs correspond to unicode Code Points.


>    Code Point. Any value in the Unicode codespace; that is, the range of
>    integers from 0 to 10FFFFh. (See definition D4b in Section 3.4,
>    Characters and Encoding.)
> 
> Using the term "Code Point" is more precise than discussing the storing of
> characters even though it may give the impression I'm abusing strings to
> store codes instead of characters.
> 
> Regards,
> Adam

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <B8CdnWETuM3jc1zcRVn-1g@golden.net>
Adam Warner wrote:
> Here is an arbitrary string encoded in UTF-8:

Hilarious. Every time someone quoted this article, I got a different 
character. Adam's original showed up as a question mark, Pascal's as a 
diamond, and Duane's was, well, interesting.
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <877jnh7sw4.fsf@qrnik.zagroda>
Cameron MacKinnon <··········@clearspot.net> writes:

> Hilarious. Every time someone quoted this article, I got a different
> character. Adam's original showed up as a question mark, Pascal's as
> a diamond, and Duane's was, well, interesting.

I didn't see a correct character because of poor Unicode support in
Emacs Lisp.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Thomas A. Russ
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ymid5xb598q.fsf@sevak.isi.edu>
Adam Warner <······@consulting.net.nz> writes:

> Look at the character predicates:
> 
>                             characterp
>                           alpha-char-p
>                           digit-char-p
>                         graphic-char-p
>                        standard-char-p

Actually, there is a principle at work here.  The principle is that
function names that are single words take "p" as the suffix, whereas
those that are multiple hyphenated words take "-p".

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <86oegwuxdv.fsf@goldenaxe.localnet>
Adam Warner <······@consulting.net.nz> writes:
> This is the only suitable readtable option that maintains case information
> because the ANSI Common Lisp committee decided backwards compatibility
> with traditional uppercasing Lisps was most important. The decision hasn't
> stood the test of time. If they'd made a better choice the pain of
> transition would have been long over.
>
> Readtable case should be deprecated. Symbols should be interned as
> written in source code and implementors should not have the burden of
> implementing "historical" baggage that is difficult to get 100% right
> (e.g. ABCL is continuing to squash :INVERT mode read and print errors).

What pain is it to have symbols be converted to upcase by default? Do
you want a case-sensitive Lisp? Do you want Read, read and READ to be
three distinct symbols?

> Note that the ANSI Common Lisp specification is considered sacrosanct and
> these comments heretical.

:)

Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <H8ednVhHbYzYWyLcRVn-vg@golden.net>
Julian Stecklina wrote:
> What pain is it to have symbols be converted to upcase by default? Do
> you want a case-sensitive Lisp? Do you want Read, read and READ to be
> three distinct symbols?

Of course. Who wouldn't?

Are there people out there who use the case insensitivity of symbols to 
advantage in their own code? No. Are there teams of programmers working 
on projects where each programmer uses his own capitalization style? 
Possibly, but we should not encourage the errant ones.

Are there programmers who would like to aesthetically improve their code 
(by their standards, not mine) or encode more information into their 
symbols via selective capitalization? Yes. They should not be made to 
feel that their choice is in any way unnatural or discouraged.

Or are you paranoid that one day YOU WILL BE STUCK IN AN ELECTRONIC 
JUNKYARD IN A THIRD WORLD METROPOLIS, AND YOUR ONLY CONNECTION TO YOUR 
LISP IMAGE WILL BE THROUGH A TERMINAL THAT DOES NOT SUPPORT MIXED CASE?
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3fz278j9a.fsf@rigel.goldenthreadtech.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> Julian Stecklina wrote:
> > What pain is it to have symbols be converted to upcase by default? Do
> > you want a case-sensitive Lisp? Do you want Read, read and READ to be
> > three distinct symbols?
> 
> Of course. Who wouldn't?

A lot of people.  Myself included.  For symbols with such string-equal
names as the one indicated, case sensitivity is a broken mode.

> Are there programmers who would like to aesthetically improve their
> code (by their standards, not mine) or encode more information into
> their symbols via selective capitalization? Yes

But this is irrelevant (as you should understand).


The mode mechanism as provided in CLISP is much more on track.


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <GsednetAXIMfEl3cRVn-sA@golden.net>
jayessay wrote:
> Cameron MacKinnon <··········@clearspot.net> writes:
> 
> 
>>Julian Stecklina wrote:
>>
>>>What pain is it to have symbols be converted to upcase by default? Do
>>>you want a case-sensitive Lisp? Do you want Read, read and READ to be
>>>three distinct symbols?
>>
>>Of course. Who wouldn't?
> 
> 
> A lot of people.  Myself included.  For symbols with such string-equal
> names as the one indicated, case sensitivity is a broken mode.

Yes, it is. Do you have a body of code for which this is a problem? I 
don't think there are any such codebases out there, whose owners 
wouldn't a) admit that the random capitalization is unintentional cruft 
which ought, ideally, to be more uniform and b) be able to fix it in 
minutes with a quickie script written for the purpose.
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m31xdq87vn.fsf@rigel.goldenthreadtech.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> jayessay wrote:
> > Cameron MacKinnon <··········@clearspot.net> writes:
> >
> >>Julian Stecklina wrote:
> >>
> >>>What pain is it to have symbols be converted to upcase by default? Do
> >>>you want a case-sensitive Lisp? Do you want Read, read and READ to be
> >>>three distinct symbols?
> >>
> >>Of course. Who wouldn't?
> > A lot of people.  Myself included.  For symbols with such
> > string-equal
> > names as the one indicated, case sensitivity is a broken mode.
> 
> Yes, it is. Do you have a body of code for which this is a problem?

I don't know, which is part of the point.

> I don't think there are any such codebases out there, whose owners
> wouldn't a) admit that the random capitalization is unintentional
> cruft which ought, ideally, to be more uniform and b) be able to fix
> it in minutes with a quickie script written for the purpose.

Yes, but each of these is irrelevant to the point that case
sensitivity here is wrong.


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Thomas A. Russ
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ymibrcv58vr.fsf@sevak.isi.edu>
Cameron MacKinnon <··········@clearspot.net> writes:

> Are there programmers who would like to aesthetically improve their code 
> (by their standards, not mine) or encode more information into their 
> symbols via selective capitalization? Yes. They should not be made to 
> feel that their choice is in any way unnatural or discouraged.

Of course they should.  Their choice is evil and will lead to bad things
happening.  Cats and dogs cohabitating and the end of the world.

Distinguishing symbol names just based on case is IMNSHO a really bad
thing to do, and programming languages SHOULD discourage it.  The
problem is that it leads to various subtle bugs that are hard to
visually pick up when one has to deal with the fact, that, for example
the two symbols  subclassOf and subClassOf are actually different.

One can perhaps make a weak case that all uppercase is visually
different enough to be distinguished from mixed and lowercase, but that
is a much more arcane restriction.  If you look at the direction of
"modern" file systems (Windows and Macintosh), you will find that they
are case-preserving but case-insensitive.  That seems to me to be a
reasonable approach to the issue, since the case of the letters is
generally just too subtle a distinguishing item.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Ed Symanzik
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cpqps2$2t3g$1@msunews.cl.msu.edu>
Thomas A. Russ wrote:
>  Cats and dogs cohabitating and the end of the world.

The end is near:

Utah Town Ends Law Barring Cat-Dog Cohabitation
http://www.local6.com/family/3984840/detail.html
From: David Steuber
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <873by6jbc1.fsf@david-steuber.com>
Ed Symanzik <···@msu.edu> writes:

> Thomas A. Russ wrote:
> >  Cats and dogs cohabitating and the end of the world.
> 
> The end is near:
> 
> Utah Town Ends Law Barring Cat-Dog Cohabitation
> http://www.local6.com/family/3984840/detail.html

Mass hysteria ensues.

-- 
An ideal world is left as an excercise to the reader.
   --- Paul Graham, On Lisp 8.1
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87llbzro5z.fsf@qrnik.zagroda>
···@sevak.isi.edu (Thomas A. Russ) writes:

> Distinguishing symbol names just based on case is IMNSHO a really bad
> thing to do, and programming languages SHOULD discourage it.

I disagree.

Distinguishing function names from other variable names just based on
the position in a function application expression is IMNSHO a really
bad thing to do, and programming languages SHOULD discourage it.

It would be better to distinguish them by case. Or distinguish local
variables from global variables by case (this often coincides). This
is more readable than distinguishing by position in an expression.

> If you look at the direction of "modern" file systems (Windows
> and Macintosh), you will find that they are case-preserving but
> case-insensitive.

But modern Unix file systems are case-sensitive.

> That seems to be to be a reasonable approach to the issue, since the
> case of the letters is generally just too subtle a distinguishing item.

It's more important to have unambiguous rules. Case mapping in Unicode
is non-trivial: context-dependent (not character -> character but
string -> string) and locale-dependent. Common Lisp already gets this
wrong by insisting that uppercasing a string is the same as uppercasing
each of its characters separately.
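
The string -> string point is easy to demonstrate.  A short sketch (in
Python, added editorially as an illustration; its str.upper performs full
Unicode case mapping):

```python
# German sharp s uppercases to two characters: the mapping is
# string -> string, not character -> character.
sharp_s = "\N{LATIN SMALL LETTER SHARP S}"  # ß
print(sharp_s.upper())                      # SS
print(len(sharp_s), len(sharp_s.upper()))   # 1 2
# A per-character CHAR-UPCASE table would have to map ß to some
# single character, and so cannot express this mapping.
```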

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Svein Ove Aas
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cproi1$2mc$1@services.kq.no>
begin  quoting Marcin 'Qrczak' Kowalczyk :

>> If you look at the direction of "modern" file systems (Windows
>> and Macintosh), you will find that they are case-preserving but
>> case-insensitive.
> 
> But modern Unix file systems are case-sensitive.
> 
There is no such thing as a modern unix filesystem in this respect; they're
still based on the original filesystem semantics, with only very small
changes.

It isn't case-sensitive because that is the right thing to do, or even a
good thing to do; it's case-sensitive because that was the *easiest* thing
to do, as with many other things in unix.
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87fz264i3r.fsf@thalassa.informatimago.com>
Svein Ove Aas <·········@aas.no> writes:

> begin  quoting Marcin 'Qrczak' Kowalczyk :
> 
> >> If you look at the direction of "modern" file systems (Windows
> >> and Macintosh), you will find that they are case-preserving but
> >> case-insensitive.
> > 
> > But modern Unix file systems are case-sensitive.
> > 
> There is no such thing as a modern unix filesystem in this respect; they're
> still based on the original filesystem semantics, with only very small
> changes.
> 
> It isn't case-sensitive because that is the right thing to do, or even a
> good thing to do; it's case-sensitive because that was the *easiest* thing
> to do, as with many other things in unix.

I still think case sensitivity is the best thing to do, all the more so
with unicode, because of the impossibility of upcasing/downcasing
letters like ß consistently.  It's better to leave the case alone.

Anyways, what's wrong in file systems is not the case sensitivity,
it's the file system itself (for the _users_).  Instead of a strict
hierarchical organization, _users_ want a bag of _unnamed_ files, and
will retrieve them by icon position, by subject or project, or by color.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Raffael Cavallaro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <2004122021215016807%raffaelcavallaro@pasdespamsilvousplaitdotmaccom>
On 2004-12-15 21:07:36 -0500, Marcin 'Qrczak' Kowalczyk 
<······@knm.org.pl> said:

> modern Unix file systems


Isn't this really for backward compatibility?
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87is6wysub.fsf@qrnik.zagroda>
Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-dot-mac.com> writes:

>> modern Unix file systems
>
> Isn't this really for backward compatibility?

You can say the same about Windows and Mac.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Raffael Cavallaro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <2004122200475275249%raffaelcavallaro@pasdespamsilvousplaitdotmaccom>
On 2004-12-21 03:10:36 -0500, Marcin 'Qrczak' Kowalczyk 
<······@knm.org.pl> said:

> You can say the same about Windows and Mac.

Not really. Both were developed after the first Unix file systems. The 
implementors of Mac OS and Windows had the Unix example available and 
explicitly rejected it.

Mac OS X is a good counterexample. It has both a Unix command line, and 
the Mac OS X GUI available. They could have made the whole system
conform to the Unix way (case sensitive), but they (I think wisely) 
chose to make the GUI layer that most users see, and thus, file system 
naming conventions, case-preserving but case-insensitive.

The reason is obvious - it is less confusing to users.
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87wtvad1l2.fsf@thalassa.informatimago.com>
Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-dot-mac.com> writes:

> On 2004-12-21 03:10:36 -0500, Marcin 'Qrczak' Kowalczyk
> <······@knm.org.pl> said:
> 
> > You can say the same about Windows and Mac.
> 
> Not really. Both were developed after the first Unix file systems. The
> implementors of Mac OS and Windows had the Unix example available and
> explicitly rejected it.
> 
> Mac OS X is a good counterexample. It has both a Unix command line,
> and the Mac OS X GUI available. They could have made the whole system
> conform to the Unix way (case sensitive), but they (I think wisely)
> chose to make the GUI layer that most users see, and thus, file system
> naming conventions, case-preserving but case-insensitive.
> 
> The reason is obvious - it is less confusing to users.

Please explain to me the purpose of case-insensitivity on a system
where you select files by clicking on icons instead of typing names?

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
You never feed me.
Perhaps I'll sleep on your face.
That will sure show you.
From: Edi Weitz
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <u7jna7ezx.fsf@agharta.de>
On 22 Dec 2004 12:17:13 +0100, Pascal Bourguignon <····@mouse-potato.com> wrote:

> Please, explain me the purpose of case-insensitivity on a system
> where you select the files by clicking on icons instead of typing
> names?

How do you think users name new files, e.g. when saving documents?  By
clicking on icons?

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87pt12d0q1.fsf@thalassa.informatimago.com>
Edi Weitz <········@agharta.de> writes:

> On 22 Dec 2004 12:17:13 +0100, Pascal Bourguignon <····@mouse-potato.com> wrote:
> 
> > Please, explain me the purpose of case-insensitivity on a system
> > where you select the files by clicking on icons instead of typing
> > names?
> 
> How you think users name new files, e.g. when saving documents?  By
> clicking on icons?

You mean your users have files with names other than "Untitled"?

You're complaining about case-sensitivity, but what about these _true_
names:

    "Compte rendu brasilia2.doc"
    "  Compte rendu brasilia2.doc"
    "Compte rendu  BRASILIA2.doc"

Do you think it was really useful to ask the user to type a name?
She had no difficulty in point-and-clicking the icons though.

Let's do a little experiment. Take the last few sheets of paper you
wrote on with a pen: did you write a "document name" on those sheets?
If you put them in a folder, did you write a "folder name" on that folder?


-- 
__Pascal_Bourguignon__               _  Software patents are endangering
()  ASCII ribbon against html email (o_ the computer industry all around
/\  1962:DO20I=1.100                //\ the world http://lpf.ai.mit.edu/
    2001:my($f)=`fortune`;          V_/   http://petition.eurolinux.org/
From: David Golden
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <USeyd.44392$Z14.21111@news.indigo.ie>
Pascal Bourguignon wrote:

> 
> Let's do a little experiment. Take the latest sheets of paper you
> wrote with a pen: did you write a "document name" on these sheets?
> If you put them in a folder, did you write a "folder name" on these
> folder?
> 
> 

Not the OP. But: yes, DUH! Surely everyone's taught to do that
in primary school? "Descriptive title, author name and date..."

Then again, I have often been surprised at poor educational
standards elsewhere, so maybe many people aren't taught that.
From: Edi Weitz
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <u1xdiv2vz.fsf@agharta.de>
On 22 Dec 2004 12:35:50 +0100, Pascal Bourguignon <····@mouse-potato.com> wrote:

> You're complaining about case-sensitivity

Where exactly did I do that?

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87d5x2cr2z.fsf@thalassa.informatimago.com>
Edi Weitz <········@agharta.de> writes:

> On 22 Dec 2004 12:35:50 +0100, Pascal Bourguignon <····@mouse-potato.com> wrote:
> 
> > You're complaining about case-sensitivity
> 
> Where exactly did I do that?

Sorry, not you.  Somebody else prefers case-insensitivity.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The rule for today:
Touch my tail, I shred your hand.
New rule tomorrow.
From: Raffael Cavallaro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <2004122214075616807%raffaelcavallaro@pasdespamsilvousplaitdotmaccom>
On 2004-12-22 06:17:13 -0500, Pascal Bourguignon <····@mouse-potato.com> said:

> Please, explain me the purpose of case-insensitivity on a system where
> you select the files by clicking on icons instead of typing names?

You have a wildly caricaturish notion of how users interact with a GUI shell.

As has already been pointed out, files are *named* by typing, not 
clicking on icons. If, for example, Mac OS X were case sensitive, then 
a user could name two files thus on succeeding weeks:

Friday receipts

friday receipts

On the following Monday, much confusion would ensue if this user opened
one, but saw the contents of the other. Worse still, if this user 
opened one, expecting the other, and then edited it and saved the 
result. This would be data loss, the cardinal sin of a GUI.

Instead this confusion is prevented thus:

'The name "friday receipts" is already taken. Please choose another one.'

          ^^^^^^  note lower case f, even though the existing file has 
a capital F

Mac OS X and Windows have opted for case-preserving but 
case-insensitive because it avoids just this sort of confusion, a 
confusion which often leads to data loss.
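The case-preserving-but-case-insensitive behaviour described above
(keep the capitalization the user typed, but refuse a new name that
collides under case folding) can be sketched in a few lines.  This is
a toy model, not any actual HFS+ or NTFS API; `Catalog` and its
methods are invented names:

```python
class Catalog:
    """A toy case-preserving, case-insensitive name table."""

    def __init__(self):
        self._names = {}          # casefolded key -> name as typed

    def add(self, name):
        key = name.casefold()
        if key in self._names:
            # Mirrors the GUI dialog: the name is taken regardless of case.
            raise ValueError('The name "%s" is already taken.' % name)
        self._names[key] = name   # preserve the user's capitalization

    def lookup(self, name):
        # Any capitalization retrieves the same entry.
        return self._names[name.casefold()]

c = Catalog()
c.add("Friday receipts")
print(c.lookup("friday receipts"))   # prints the preserved "Friday receipts"
```

Adding "friday receipts" to this catalog afterwards raises the "already taken" error, which is precisely the confusion-preventing check the dialog text illustrates.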


regards,

Ralph
From: Karl A. Krueger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cqc89l$nt8$1@baldur.whoi.edu>
Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-dot-mac.com> wrote:
> Mac OS X is a good counterexample. It has both a Unix command line, and 
> the Mac OS X GUI available. They could have made the whole system
> conform to the Unix way (case sensitive), but they (I think wisely) 
> chose to make the GUI layer that most users see, and thus, file system 
> naming conventions, case-preserving but case-insensitive.

It's not quite that simple.  OS X has two different filesystem types:
HFS+, a variation of the old Mac "Hierarchical File System"; and UFS,
the "Unix File System".

HFS+ volumes are case-insensitive but case-preserving:  you can't have
files "Foo" and "FOO" in the same directory, and if you have "Foo",
opening "FOO" gets you the same file.  UFS volumes are case-sensitive
like any other Unix filesystem.

Unix shell globbing and so forth is case-sensitive even on HFS+ volumes:
if you have a file "Foo", matching "*O*" will NOT match it.
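That split (the volume resolves names case-insensitively, while glob
patterns are matched case-sensitively in userspace) is easy to
demonstrate.  Python's `fnmatch.fnmatchcase` mirrors shell globbing
semantics and is explicitly case-sensitive on every platform:

```python
from fnmatch import fnmatchcase

# What a directory listing of an HFS+ volume might return:
names = ["Foo", "bar"]

# The shell expands "*O*" in userspace, comparing characters
# case-sensitively -- so "Foo" is not matched, even though the
# volume itself would happily open "FOO" or "foo".
matches = [n for n in names if fnmatchcase(n, "*O*")]
print(matches)   # []
```

With the lowercase pattern `"*o*"`, the same filter does return `"Foo"`, since pattern matching never consults the filesystem's own case rules.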

The Finder is wise to the difference:  If you have files named "FOO" and
"foo" in a UFS volume and you select both of them and try to copy them
to a folder on an HFS+ volume, you get an error.

-- 
Karl A. Krueger <········@example.edu> { s/example/whoi/ }

Every program has at least one bug and can be shortened by at least one line.
By induction, every program can be reduced to one line which does not work.
From: Raffael Cavallaro
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <2004122214551375249%raffaelcavallaro@pasdespamsilvousplaitdotmaccom>
On 2004-12-22 11:44:39 -0500, "Karl A. Krueger" <········@example.edu> said:

> Raffael Cavallaro 
> <················@pas-d'espam-s'il-vous-plait-dot-mac.com> wrote:
>> Mac OS X is a good counterexample. It has both a Unix command line, and 
>> the Mac OS X GUI available. They could have made the whole system
>> conform to the Unix way (case sensitive), but they (I think wisely) 
>> chose to make the GUI layer that most users see, and thus, file system 
>> naming conventions, case-preserving but case-insensitive.
> 
> It's not quite that simple.  OS X has two different filesystem types:
> HFS+, a variation of the old Mac "Hierarchical File System"; and UFS,
> the "Unix File System".

I'm aware of that, but note that Mac OS X machines ship with HFS+ 
volumes by default - UFS is for power users only, who, presumably, know 
what they are doing.

> 
> HFS+ volumes are case-insensitive but case-preserving:  you can't have
> files "Foo" and "FOO" in the same directory, and if you have "Foo",
> opening "FOO" gets you the same file.  UFS volumes are case-sensitive
> like any other Unix filesystem.

Yes, but I was talking about HFS+, which is the only sort of volume 
that the overwhelming majority of Mac OS X users will have in their 
machine. (They'll undoubtedly access Unix server volumes on the net, 
but they won't be naming files on these network volumes).

> 
> Unix shell globbing and so forth is case-sensitive even on HFS+ volumes:
> if you have a file "Foo", matching "*O*" will NOT match it.
> 
> The Finder is wise to the difference:  If you have files named "FOO" and
> "foo" in a UFS volume and you select both of them and try to copy them
> to a folder on an HFS+ volume, you get an error.

Not just the Finder. Unlike shell globbing, certain command line 
operations do treat 'foo' and 'Foo' as the same file on HFS+, a 
potential source of error for Unix adepts on the Mac OS X command line:

raffaelc$ cat > foo <<EOF
> this will get clobbered
> EOF

raffaelc$ cat > FOO <<EOF
> unwittingly overwritten
> EOF

raffaelc$ cat foo
unwittingly overwritten

So our first file has been overwritten, because 'foo' and 'FOO' (as
well as 'Foo', 'fOo', etc.) are the same file name on HFS+.

Yes, they're different on UFS, but GUI-only users (again, the 
overwhelming majority of Mac OS X users) are unlikely to be naming 
files on UFS volumes. Command line users are, presumably, aware of the 
pitfalls.

The choice to make HFS+ on Mac OS X case-preserving but 
case-insensitive is clearly to accommodate the expectations of GUI 
users, not Unix power users. Case-preserving but case-insensitive 
reduces potential confusion by eliminating the possibility of having 
two separate files that differ only by the capitalization of a single 
letter.

regards

Ralph
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <873bxychs3.fsf@thalassa.informatimago.com>
"Karl A. Krueger" <········@example.edu> writes:

> Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-dot-mac.com> wrote:
> > Mac OS X is a good counterexample. It has both a Unix command line, and 
> > the Mac OS X GUI available. They could have made the whole system
> > conform to the Unix way (case sensitive), but they (I think wisely) 
> > chose to make the GUI layer that most users see, and thus, file system 
> > naming conventions, case-preserving but case-insensitive.
> 
> It's not quite that simple.  OS X has two different filesystem types:
> HFS+, a variation of the old Mac "Hierarchical File System"; and UFS,
> the "Unix File System".
> 
> HFS+ volumes are case-insensitive but case-preserving:  you can't have
> files "Foo" and "FOO" in the same directory, and if you have "Foo",
> opening "FOO" gets you the same file.  UFS volumes are case-sensitive
> like any other Unix filesystem.

 
> Unix shell globbing and so forth is case-sensitive even on HFS+ volumes:
> if you have a file "Foo", matching "*O*" will NOT match it.

I'm not sure.  At least, I happened to write once: LS -L
and got the expected output.


> The Finder is wise to the difference:  If you have files named "FOO" and
> "foo" in a UFS volume and you select both of them and try to copy them
> to a folder on an HFS+ volume, you get an error.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/IT d? s++:++(+++)>++ a C+++  UB+++L++++$S+X++++>$ P- L+++ E++ W++
N++ o-- K- w------ O- M++$ V PS+E++ Y++ PGP++ t+ 5? X+ R !tv b++(+)
DI+++ D++ G++ e+++ h+(++) r? y---? UF++++
------END GEEK CODE BLOCK------
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87d5x2gos4.fsf@qrnik.zagroda>
Pascal Bourguignon <····@mouse-potato.com> writes:

>> Unix shell globbing and so forth is case-sensitive even on HFS+ volumes:
>> if you have a file "Foo", matching "*O*" will NOT match it.
>
> I'm not sure.  At least, I happened to write once: LS -L
> and got the expected output.

······@ppc-osx2:~$ ls /sw/bin/GREP
/sw/bin/GREP
······@ppc-osx2:~$ ls /sw/bin/GR*
ls: /sw/bin/GR*: No such file or directory

It works in the way I would expect from a case-insensitive volume used
by libraries which assume case-sensitive matching. Accessing a file by
an explicitly given name is done by the OS, case-insensitively, and
wildcard expansion happens in userspace by filtering the list of
received file names. The same happens when Linux is accessing FAT.
Yes, mixing conventions is confusing.

-l and -L have different meanings for ls, no matter what the
 filesystem is.

Do Windows and Mac use the same rules of case mapping? If not,
the confusion can affect filenames passed across that border.
A case-insensitive filesystem looked innocent when filenames were
all ASCII.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <XtmdnRX1dPpqd1zcRVn-sQ@golden.net>
Thomas A. Russ wrote:

> Distinguishing symbol names just based on case is IMNSHO a really bad
> thing to do, and programming languages SHOULD discourage it.  The
> problem is that it leads to various subtle bugs that are hard to
> visually pick up when one has to deal with the fact, that, for example
> the two symbols  subclassOf and subClassOf are actually different.

By advocating a case-insensitive language, you force subtle writers to 
interface with tin-eared listeners. It's an autistic machine interface; 
one that gets the big picture, but doesn't understand the subtleties 
which the writer introduced into his prose, and which human readers do 
understand.

In no field outside of computers are there case-insensitive languages 
which leave capitalization as a completely arbitrary yet 
information-free choice of the writer. The posting to which I am
replying itself contains various examples of information encoded with
selective capitalization.

Typographers, whose job it is to make text look good so it can quickly 
and delightfully convey information, do not see the choice between 
majuscules and minuscules as an arbitrary one, and they use it to great
effect. Your belief that the distinction is slight and should be ignored 
by our machines belies centuries of typographic wisdom.

> One can perhaps make a weak case that all uppercase is visually
> different enough to be distinguished from mixed and lowercase, but that
> is a much more arcane restriction.  If you look at the direction of
> "modern" file systems (Windows and Macintosh), you will find that they
> are case-preserving but case-insensitive.  That seems to be to be a
> reasonable approach to the issue, since the case of the letters is
> generally just too subtle a distinguishing item.

File systems' designs have a lot more to do with being compatible with 
other operating systems, past and present, than with user desires. 
Besides which, users don't mind if their expression of ideas is somewhat
limited in a file's name; it is simply a tag which is consulted
briefly as an index. Users know that their ideas can be given full 
expression inside the files, not in their names.

Case insensitivity in our computer systems has everything to do with
Baudot five-bit codes, low resolution displays and nine pin dot matrix
printers. In that austere and harsh environment, ALL CAPS really was the 
most legible and economical choice. Not anymore.


A few comments regarding your subclassOf and subClassOf example:

I easily notice capitalization errors when I'm reading prose. If a coder 
really did have code where both subclassOf and subClassOf were defined 
and in scope, well, how far are you willing to go to save him from 
himself? The benefits to allowing writers more style in their works 
outweigh, I think, the risks that bad writers will use the tools to 
create eyesores -- bad writers will find ways of writing bad code no 
matter how much you constrain them, but good writers can only surpass 
mediocrity with good, case sensitive tools.

(HELP, MY LISP READER KEEPS SHOUTING AT ME)  :-)
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <8765314zeu.fsf@thalassa.informatimago.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> Thomas A. Russ wrote:
> 
> > Distinguishing symbol names just based on case is IMNSHO a really bad
> > thing to do, and programming languages SHOULD discourage it.  The
> > problem is that it leads to various subtle bugs that are hard to
> > visually pick up when one has to deal with the fact, that, for example
> > the two symbols  subclassOf and subClassOf are actually different.
> 
> By advocating a case-insensitive language, you force subtle writers to
> interface with tin-eared listeners. It's an autistic machine
> interface; one that gets the big picture, but doesn't understand the
> subtleties which the writer introduced into his prose, and which human
> readers do understand.

I'm all in favor of case sensitivity.  I've never lost a file on a
case-sensitive file system, but I did on a case-insensitive file
system, overwriting some File with some other file.

Anyway, what users want really is not case insensitivity, it's
phonetic spelling of file names.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Kenny Tilton
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <rDuwd.29714$ld2.12785593@twister.nyc.rr.com>
Cameron MacKinnon wrote:
> Thomas A. Russ wrote:
> 
>> Distinguishing symbol names just based on case is IMNSHO a really bad
>> thing to do, and programming languages SHOULD discourage it.  The
>> problem is that it leads to various subtle bugs that are hard to
>> visually pick up when one has to deal with the fact, that, for example
>> the two symbols  subclassOf and subClassOf are actually different.
> 
> 
> By advocating a case-insensitive language, you force subtle writers to 
> interface with tin-eared listeners. It's an autistic machine interface; 
> one that gets the big picture, but doesn't understand the subtleties 
> which the writer introduced into his prose, and which human readers do 
> understand.
> 
> In no field outside of computers are there case-insensitive languages 
> which leave capitalization as a completely arbitrary yet 
> information-free choice of the writer. The posting to which I am 
> replying contains various examples of information encoded with selective 
> capitalization, for example.
> 
> Typographers, whose job it is to make text look good so it can quickly 
> and delightfully convey information, do not see the choice between 
> majuscules and minuscules as an arbitrary one, and they use it to great 
> effect. Your belief that the distinction is slight and should be ignored 
> by our machines belies centuries of typographic wisdom.

I must say, that was a beautiful speech. But your correspondent offered 
subclassOf vs. subClassOf, while you offered nothing. subclassOf vs. 
subClassOf is silly because no two different things or functions could 
ever end up with those names. Which is the point:

what two different things could be named correctly with names differing 
only in the case of one or more letters?

Safely assuming the null set to be your response...why introduce 
gratuitous case sensitivity? Remember, some few of us are actually 
trying to get some programming done.

kt


-- 
Cells? Cello? Celtik?: http://www.common-lisp.net/project/cells/
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
From: Thomas F. Burdick
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <xcv4qikkcgd.fsf@conquest.OCF.Berkeley.EDU>
Kenny Tilton <·······@nyc.rr.com> writes:

> I must say, that was a beautiful speech. But your correspondent offered 
> subclassOf vs. subClassOf, while you offered nothing. subclassOf vs. 
> subClassOf is silly because no two different things or functions could 
> ever end up with those names. which is the point:

This example is silly because it's the same as subclass-of vs
sub-class-of.  Yet we still use hyphens.

> what two different things could be named correctly with names differing 
> only in the case of one or more letters?

At one point in my Lisping, I had taken to writing the names of
Special-Variables like that, and lexical-variables like that.
CONSTANTS were in all caps.  Really, it's the same namespacing that we
use *stars* and +pluses+ for.  So a legitimate example could be:

  (defun make-foo-printer (some-context &key (output-stream Output-Stream))
    (lambda (foo)
      (let ((Output-Stream output-stream))
        (print-foo foo some-context))))

I later switched back to using the normal conventions, but I think the
above style could be fine in a non-CL Lisp.  

> Safely assuming the null set to be your response...why introduce 
> gratuitous case sensitivity? Remember, some few of us are actually 
> trying to get some programming done.

Now and then it's not gratuitous.  But the important thing is, it
doesn't need to be introduced, it's already there.  And it's easy to
turn on on a per-file basis, because LOAD binds *READTABLE*.  With a
simple USE-READTABLE macro that expands to an eval-when around a SETF
of *READTABLE*, you can start your files like this:

  (in-package :my-project)
  (use-readtable *objc-readtable*)

Inconsistent typists like Kenny can go ahead and use the default
readtable, and quote the names of |methodsInObjectiveC| with pipes.
Of course, one has to be careful editing a file authored by someone
else, but that's always the case.
From: Björn Lindberg
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <hcsoegs8zn1.fsf@my.nada.kth.se>
···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Kenny Tilton <·······@nyc.rr.com> writes:
> 
> > I must say, that was a beautiful speech. But your correspondent offered 
> > subclassOf vs. subClassOf, while you offered nothing. subclassOf vs. 
> > subClassOf is silly because no two different things or functions could 
> > ever end up with those names. which is the point:
> 
> This example is silly because it's the same as subclass-of vs
> sub-class-of.  Yet we still use hyphens.

Possibly. But for me at least, I have never had any problems
remembering the hyphenation of Lisp symbols, whereas I have had trouble
remembering the capitalization of words in Java. I will refrain from
extrapolating this experience to the whole of humanity, though. :-)


Björn
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87brctfe9p.fsf@qrnik.zagroda>
Kenny Tilton <·······@nyc.rr.com> writes:

> what two different things could be named correctly with names
> differing only in the case of one or more letters?

For example in my language Kogut I distinguish local names,
global type names and other global names by case convention.
There can be names list, LIST and List in one scope.

Lisp distinguishes its meanings of "list" by context of usage.
I prefer my rules: no need for an equivalent of funcall/apply or #'.

Now let's flip the question. What name in a case-insensitive language
could be spelled correctly with inconsistent capitalization (forgetting
about uppercase Lisp names which are usually spelled with lowercase,
because this would be solved differently if we designed rules today)?

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <uYydnccFEOsttF7cRVn-pQ@golden.net>
Kenny Tilton wrote:
> 
> I must say, that was a beautiful speech. But your correspondent offered 
> subclassOf vs. subClassOf, while you offered nothing. subclassOf vs. 
> subClassOf is silly because no two different things or functions could 
> ever end up with those names. which is the point:
> 
> what two different things could be named correctly with names differing 
> only in the case of one or more letters?

I'm no poet. Let me try to explain with the example of A and a.

They're not the same, though their names suggest a relation. Maybe A is 
a function that operates on a, and the coder didn't want to write Fa, 
oops, f-a.

You're asking me to justify case sensitivity by coming up with an 
example that will look better than the traditional long-words-and-dashes 
style, even to someone with your blighted sensibilities. This hardly 
seems fair, as you've learned to love your tools as they are.

CL's case rules are a bizarre anachronism that could only be the result 
of an incomplete, suspended evolution from a world with no lowercase. No 
committee would have dreamed them up in the absence of legacy and 
interoperability requirements -- and I'm not given to underestimating 
the stupidities that committees can produce. Given that, why in the heck 
should the onus be on me to prove that a more anthropomorphic approach 
is the correct one?

> Safely assuming the null set to be your response...why introduce 
> gratuitous case sensitivity? Remember, some few of us are actually 
> trying to get some programming done.

Gratuitous is the wrong word here. If anything, the current rules 
instruct the computer to gratuitously ignore distinctions that the 
programmer attempts to make.

Your old code won't break unless you were in the habit of gratuitous and 
inconsistent uppercasing when you wrote it. And, in the tradition of 
computer languages which strive for compatibility with earlier 
incarnations of themselves, a readtable-case setting for the current 
regime would continue to be available.
From: Karl A. Krueger
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <cpvvgp$j34$1@baldur.whoi.edu>
Cameron MacKinnon <··········@clearspot.net> wrote:
> CL's case rules are a bizarre anachronism that could only be the result 
> of an incomplete, suspended evolution from a world with no lowercase.

I don't know.  There seem to be other factors:

Some texts use case differences to distinguish user input from Lisp
output.  FOO and foo are the "same thing", but the former appears when
Lisp is returning FOO to you, and the latter when you are typing foo in
to Lisp.  A typographic distinction here is useful -- for a similar
idea, see Iain Ferguson's _The Schemer's Guide_, which uses black text
for unevaluated expressions and red for evaluated ones.

In most natural languages, case distinctions do not change what is being
talked about -- they convey differences in emphasis or sometimes
grammar, but not in substance.  A GREEN FROG and a green frog are the
same amphibian.  Indeed, students when learning case-sensitive languages
such as C often run into difficulties (albeit brief ones) in coming to
terms with case sensitivity ... and "Is your Caps Lock on?" is the first
question a sysadmin asks when a user complains that their password is
not accepted.

So, while case sensitivity seems to be considered "modern" (though, with
my growing skepticism of my own Unix background I have to wonder if that
means "Unixy") there might be some distinct advantages to smashing case
once in a while.  :)

-- 
Karl A. Krueger <········@example.edu> { s/example/whoi/ }

Every program has at least one bug and can be shortened by at least one line.
By induction, every program can be reduced to one line which does not work.
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87brcs2lcc.fsf@thalassa.informatimago.com>
"Karl A. Krueger" <········@example.edu> writes:
> In most natural languages, case distinctions do not change what is being
> talked about -- they convey differences in emphasis or sometimes
> grammar, but not in substance.  A GREEN FROG and a green frog are the
> same amphibian.  

Not exactly.  It's because you interpret it so.  Typographers
invented "small caps" to be able to put emphasis in this way on
some expression, while still keeping the distinction between true
uppercase and the small-capped lowercase.



> Indeed, students when learning case-sensitive languages
> such as C often run into difficulties (albeit brief ones) in coming to
> terms with case sensitivity ... and "Is your Caps Lock on?" is the first
> question a sysadmin asks when a user complains that their password is
> not accepted.

They're learning that computers are not as smart as Homo sapiens.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
You're always typing.
Well, let's see you ignore my
sitting on your hands.
From: Thomas A. Russ
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ymi65306d28.fsf@sevak.isi.edu>
Cameron MacKinnon <··········@clearspot.net> writes:

> 
> Thomas A. Russ wrote:
> 
> > Distinguishing symbol names just based on case is IMNSHO a really bad
> > thing to do, and programming languages SHOULD discourage it.  The
> > problem is that it leads to various subtle bugs that are hard to
> > visually pick up when one has to deal with the fact, that, for example
> > the two symbols  subclassOf and subClassOf are actually different.
> 
> By advocating a case-insensitive language, you force subtle writers to 
> interface with tin-eared listeners. It's an autistic machine interface; 
> one that gets the big picture, but doesn't understand the subtleties 
> which the writer introduced into his prose, and which human readers do 
> understand.

A nice sentiment, but I fear it is out of place in an engineering
endeavor.  The point of writing code is to make the intent of the
programmer crystal clear, both to the machine and to other readers of
the code.  This is not a place where subtlety is generally called for.
In fact, quite the opposite.  Truly professional code is easy to follow
and therefore to maintain.  If great cleverness is called for, then it
is also accompanied by comments which point out the cleverness so that
it does not mystify the next reader of the code.  Mystifying other code
readers can lead to ill-founded changes to the code during future
maintenance.

> In no field outside of computers are there case-insensitive languages 
> which leave capitalization as a completely arbitrary yet 
> information-free choice of the writer. The posting to which I am 
> replying contains various examples of information encoded with selective 
> capitalization, for example.

But in those fields outside of computers, changing the capitalization of
the words does not change what the word refers to.  The capitalization
is generally an embellishment added to the underlying information content
of the text.  Typically it highlights a particular part of the text.  In
the text strings "Happy Birthday" and "HAPPY BIRTHDAY" the second word
really denotes the same celebration in each case, irrespective of the
differing capitalization.

> Typographers, whose job it is to make text look good so it can quickly 
> and delightfully convey information, do not see the choice between 
majuscules and minuscules as an arbitrary one, and they use it to great 
> effect. Your belief that the distinction is slight and should be ignored 
> by our machines belies centuries of typographic wisdom.

Perhaps I am using the wrong brand of computers, but the machines I'm in
daily contact with are boring dolts incapable of understanding and
appreciating the beauty of such aesthetic judgements.  The readers of
the code, on the other hand, are the ones that should be addressed by
the visual presentation.  In fact, I will go so far as to claim that if
one has the language be case-insensitive, one allows greater freedom to
the programmer to use (albeit limited) typography to convey additional
information about the emphasis of the code.  Having a case insensitive
language doesn't mean one can't use capitalization in the code.  It can
be used to convey information other than symbol identity, just as the
typesetting of prose doesn't generally change the meaning of the
underlying sentence.

> > One can perhaps make a weak case that all uppercase is visually
> > different enough to be distinguished from mixed and lowercase, but that
> > is a much more arcane restriction.  If you look at the direction of
> > "modern" file systems (Windows and Macintosh), you will find that they
are case-preserving but case-insensitive.  That seems to me to be a
> > reasonable approach to the issue, since the case of the letters is
> > generally just too subtle a distinguishing item.
> 
> File systems' designs have a lot more to do with being compatible with 
> other operating systems, past and present, than with user desires.

This is an interesting claim, especially since it seems that just about
every operating system chooses different conventions for what is allowed
in file names and how they are treated with respect to case.

> Besides which, users don't mind if their expression of ideas is somewhat 
> limited in the file's name, it is simply a tag which is consulted 
> briefly as an index. Users know that their ideas can be given full 
> expression inside the files, not in their names.

I still don't see why having case be supported by a programming language
without allowing it to differentiate identifiers stifles expression of
ideas.  I think it frees it, since one is no longer constrained to use
the same case everywhere.  Words at the beginning of sentences are
capitalized, but words in the middle (at least in English) aren't.  But
they still refer to the same thing.

> Case insensitivity in our computer systems has everything to do with 
> baudot five bit codes, low resolution displays and nine pin dot matrix 
> printers. In that austere and harsh environment, ALL CAPS really was the 
> most legible and economical choice. Not anymore.

Which is why my preferred design choice in a language would be something
akin to the Mac/Windows file system paradigm:  You can use any case you
like in the names, but they all get folded into a single
representation.  The canonical name for a file, for example, is the one
used when the file (or symbol) is first created.  The printer preserves
this canonical case.
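That paradigm can be sketched in a few lines of Lisp. This is only a toy illustration; the table and `intern-name` are hypothetical names invented here, not part of any real file system or Lisp implementation:

```lisp
;; Toy case-preserving, case-insensitive name table: lookups ignore
;; case, but the stored name keeps the capitalization used at creation.
(defvar *names* (make-hash-table :test #'equal))

(defun intern-name (name)
  (let ((key (string-upcase name)))          ; fold for comparison only
    (or (gethash key *names*)                ; existing canonical name wins
        (setf (gethash key *names*) name)))) ; first spelling becomes canonical

;; (intern-name "Invoice") ; => "Invoice"
;; (intern-name "INVOICE") ; => "Invoice"  (same entry, canonical case kept)
```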

> A few comments regarding your subclassOf and subClassOf example:
> 
> I easily notice capitalization errors when I'm reading prose. If a coder 
> really did have code where both subclassOf and subClassOf were defined 
> and in scope, well, how far are you willing to go to save him from 
> himself? The benefits to allowing writers more style in their works 
> outweigh, I think, the risks that bad writers will use the tools to 
> create eyesores -- bad writers will find ways of writing bad code no 
> matter how much you constrain them, but good writers can only surpass 
> mediocrity with good, case sensitive tools.

I also notice them when I am reading the prose closely.  It is much
harder to detect if I am quickly scanning a page instead of analyzing
and understanding the details of the text.  I often don't want to 

> (HELP, MY LISP READER KEEPS SHOUTING AT ME)  :-)

(setf (readtable-case *readtable*) :INVERT)

There.  I've used capitalization to emphasize a point in the code and
draw attention to it.  That only works because the Lisp system knows
that :invert and :INVERT are the same.  At least until after it executes
the form......
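For the record, the inversion is symmetric; here is a minimal sketch of what :invert does to tokens, using a copied readtable so the change stays local:

```lisp
(let ((*readtable* (copy-readtable)))   ; don't clobber the global readtable
  (setf (readtable-case *readtable*) :invert)
  ;; With :invert, tokens whose unescaped letters are all one case are
  ;; flipped to the other case; mixed-case tokens are left alone.
  (list (symbol-name (read-from-string "foo"))    ; => "FOO"
        (symbol-name (read-from-string "FOO"))    ; => "foo"
        (symbol-name (read-from-string "Foo"))))  ; => "Foo"
```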

But more to the point.  I am not advocating upper-case as a good choice
for language design.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87is702ljk.fsf@thalassa.informatimago.com>
···@sevak.isi.edu (Thomas A. Russ) writes:
> But in those fields outside of computers, changing the capitalization of
> the words does not change what the word refers to.  The capitalization
> is generally an embellishment added to the underlying information content
> of the text.  Typically it highlights a particular part of the text.  In
> the text strings "Happy Birthday" and "HAPPY BIRTHDAY" the second word
> really denotes the same celebration in each case, irrespective of the
> differing capitalization.

What about NeXT and next, or windows and Windows, or cisco and Cisco?
But perhaps we've not left the field of computers...


> I still don't see why having case be supported by a programming language
> without allowing it to differentiate identifiers stifles expression of
> ideas.  I think it frees it, since one is no longer constrained to use
> the same case everywhere.  Words at the beginning of sentences are
> capitalized, but words in the middle (at least in English) aren't.  But
> they still refer to the same thing.

bob and Bob ?  bill and Bill ? etc.  Case is significant in most
natural languages.  Granted, with natural readers, if you make an
error in  capitalization, the reader can recover gracefully and
restore the original capitalization and meaning autonomously.
Nonetheless, the capitalization is significant.


> Which is why my preferred design choice in a language would be something
> akin to the Mac/Windows file system paradigm:  You can use any case you
> like in the names, but they all get folded into a single
> representation.  The canonical name for a file, for example, is the one
> used when the file (or symbol) is first created.  The printer preserves
> this canonical case.

Then you'll explain to users why their invoice file was erased when
they wrote a letter to Mr Gates.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
You never feed me.
Perhaps I'll sleep on your face.
That will sure show you.
From: Thomas A. Russ
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ymi4qig6243.fsf@sevak.isi.edu>
Pascal Bourguignon <····@mouse-potato.com> writes:

> 
> ···@sevak.isi.edu (Thomas A. Russ) writes:
> > But in those fields outside of computers, changing the capitalization of
> > the words does not change what the word refers to.  The capitalization
> > is generally an embellishment added to the underlying information content
> > of the text.  Typically it highlights a particular part of the text.  In
> > the text strings "Happy Birthday" and "HAPPY BIRTHDAY" the second word
> > really denotes the same celebration in each case, irrespective of the
> > differing capitalization.
> 
> What about NeXT and next, or windows and Windows, or cisco and Cisco?
> But perhaps we've not left the field of computers...
> 
> 
> > I still don't see why having case be supported by a programming language
> > without allowing it to differentiate identifiers stifles expression of
> > ideas.  I think it frees it, since one is no longer constrained to use
> > the same case everywhere.  Words at the beginning of sentences are
> > capitalized, but words in the middle (at least in English) aren't.  But
> > they still refer to the same thing.
> 
> bob and Bob ?  bill and Bill ? etc.  Case is significant in most
> natural languages.  Granted, with natural readers, if you make an
> error in  capitalization, the reader can recover gracefully and
> restore the original capitalization and meaning autonomously.
> Nonetheless, the capitalization is significant.

But not required or even sufficient.  "bob" becomes "Bob" when it heads
up a sentence, even if the sentence is "Bob for an apple."  Similarly
"Bill the head office for the new drapes." uses the non-proper noun
meaning of "bill".  To support this distinction, one would have to allow
(or maybe even require?) something really interesting like:

  (DEFUN bill (x y) ...)

  (Bill x y)

because in the second form, "bill" heads the sentence, whereas it does
not in the definition.  But then one would need to use a lot more semantic
reasoning to distinguish this from

   (DEFUN Bill () ...)

   (Bill)

Clearly one could do it, but why?

> > Which is why my preferred design choice in a language would be something
> > akin to the Mac/Windows file system paradigm:  You can use any case you
> > like in the names, but they all get folded into a single
> > representation.  The canonical name for a file, for example, is the one
> > used when the file (or symbol) is first created.  The printer preserves
> > this canonical case.
> 
> Then you'll explain to users why their invoice file was erased when
> they wrote a letter to Mr Gates.

Well, if Mr. Gates' software can't get its own operating system's
paradigm correct, I can't be responsible for that.  At least on the Mac
(when not using the Unix file system), the software would recognize the
name conflict and warn you about the replacement....

But since this is, perhaps, fundamentally a religious argument, I doubt
there will be any particular resolution.  My preference is to not have
case be used to distinguish symbol identity.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87u0qgzf8b.fsf@thalassa.informatimago.com>
···@sevak.isi.edu (Thomas A. Russ) writes:
> > > Which is why my preferred design choice in a language would be something
> > > akin to the Mac/Windows file system paradigm:  You can use any case you
> > > like in the names, but they all get folded into a single
> > > representation.  The canonical name for a file, for example, is the one
> > > used when the file (or symbol) is first created.  The printer preserves
> > > this canonical case.
> > 
> > Then you'll explain to users why their invoice file was erased when
> > they wrote a letter to Mr Gates.
> 
> Well, if Mr. Gates' software can't get its own operating system's
> paradigm correct, I can't be responsible for that.  At least on the Mac
> (when not using the Unix file system), the software would recognize the
> name conflict and warn you about the replacement....
> 
> But since this is, perhaps, fundamentally a religious argument, I doubt
> there will be any particular resolution.  My preference is to not have
> case be used to distinguish symbol identity.

This user lost files on HFS+/MacOSX to a name collision.  Perhaps the
higher levels, closer to the user, do warn, but not the Unix layer, and
neither does the random open-source Unix tool.

I find the current setup in Common Lisp quite satisfying: symbols ARE
case sensitive: (assert (not (eq '|Foo| '|FOO|))) 
but the user interface can be configured to not distinguish between
Foo and FOO: (assert (eq 'FOO 'Foo))

I don't feel the need for a "modern" mode, I'm happy with :PRESERVE or
with :UPCASE.
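The setup described above can be checked directly at the REPL; a minimal sketch, assuming a fresh image with the default :UPCASE readtable:

```lisp
;; Symbols themselves are case sensitive: these name different symbols.
(assert (not (eq '|Foo| '|FOO|)))
(assert (string= (symbol-name '|Foo|) "Foo"))

;; But the default :UPCASE readtable folds unescaped input upward, so
;; foo, Foo and FOO all denote the symbol named "FOO".
(assert (eq 'foo 'Foo))
(assert (string= (symbol-name 'foo) "FOO"))

;; Under :PRESERVE the reader keeps what you type, and the two
;; spellings become distinct symbols.
(let ((*readtable* (copy-readtable)))
  (setf (readtable-case *readtable*) :preserve)
  (assert (not (eq (read-from-string "foo") (read-from-string "FOO")))))
```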

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey36531uqe7.fsf@cley.com>
* Cameron MacKinnon wrote:
> Typographers, whose job it is to make text look good so it can quickly
> and delightfully convey information, do not see the choice between
> majuscules and minuscules as an arbitrary one, and they use it to
> great effect. Your belief that the distinction is slight and should be
> ignored by our machines belies centuries of typographic wisdom.

Right. and this is why case distinctions in programming languages are
bad!  For hundreds of years most of the text people read was created
by people with a lot of training in how to make it look good: correct
use of capitalisation, typeface, whitespace and so on.  *Even then*
there were horrors perpetrated: 19th century capitalisation was often
really awful, and a large amount of 19th century typography was just
appalling - all those `modern' typefaces, yuck!  It took a revolution
in typography to restore some kind of sanity.

Now look at what happened when people got let loose with word
processors and DTP packages.  Terror!  It turns out that the training
typographers have is worth something.

Programmers usually don't have any kind of training at all in
typography.  Most of them are not even very literate.  The very last
thing you want to do is give them too many options about case.

But of course, in the form of C-family languages they did get options.
And it's interesting to look at what happened.  The early Unix
programmers - who were highly literate - made extremely sparing and
careful use of case distinctions.  Later on, as less literate people
moved in, we got the current obscurely capitalised horrors.

--tim
From: Fred Gilham
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <u7sm64agzh.fsf@snapdragon.csl.sri.com>
Tim Bradshaw wrote:

> Cameron MacKinnon wrote:
> > Typographers, whose job it is to make text look good so it can
> > quickly and delightfully convey information, do not see the choice
> > between majuscules and minuscules as an arbitrary one, and they
> > use it to great effect. Your belief that the distinction is slight
> > and should be ignored by our machines belies centuries of
> > typographic wisdom.
> 
> ...
>
> Programmers usually don't have any kind of training at all in
> typography.  Most of them are not even very literate.  The very last
> thing you want to do is give them too many options about case.


Has anyone mentioned the fact that decorating code with case, in Lisp,
distracts from the cues we learn over time that help us grasp the
code?  I believe that when I (or some other experienced Lisp programmer)
look at Lisp code, I subconsciously pick up on many cues that tell me
about the meaning of the pieces of the program.  I experimented with
caps at different times, trying to emulate some of the styles I saw in
other code (e.g. Garnet and CLX code used various capitalization
conventions), but found that it either distracted or (at best) wasn't
necessary.

I think people should capitalize their code any way they want.  If it
turns out to be effective, it will catch on. All I can say is that I
like the way it's done in Lisp.  The fact that I don't have to use
"camel case" (which outrages my eyes, my fingers, and my literary
sensibilities) is, to me, a big advantage.


-- 
Fred Gilham                                        ······@csl.sri.com
A common sense interpretation of the facts suggests that a
superintellect has monkeyed with physics, as well as with chemistry
and biology, and that there are no blind forces worth speaking about
in nature. --- Fred Hoyle
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3u0pyu1ew.fsf@cley.com>
* Fred Gilham wrote:
> Has anyone mentioned the fact that decorating code with case, in Lisp,
> distracts from the cues we learn over time that help us grasp the
> code?  I believe that I (or some other experienced Lisp programmer)
> look at Lisp code I subconsciously pick up on many cues that tell me
> about the meaning of the pieces of the program.  I experimented with
> caps at different times, trying to emulate some of the styles I saw in
> other code (e.g. Garnet and CLX code used various capitalization
> conventions), but found that it either distracted or (at best) wasn't
> necessary.

> I think people should capitalize their code any way they want.  If it
> turns out to be effective, it will catch on. All I can say is that I
> like the way it's done in Lisp.  The fact that I don't have to use
> "camel case" (which outrages my eyes, my fingers, and my literary
> sensibilities) is, to me, a big advantage.

Related to this is the important (but conveniently ignored by the
case-sensitivity cultists) point that in most natural languages
capitalisation is optional, and a matter of style and not meaning.
For instance: it is possible to use English as a spoken language, and
there are no case distinctions in spoken English. (Please note: I am
not trying to claim that Lisp should be a spoken language!)  Anyone
who has worked for a publisher will also be aware of the various house
rules which cover capitalisation as well as aspects of punctuation and
so on.  This:

                              Chapter 12
               Order and Disorder in Portuguese Grammar

is only stylistically different to:

    12. Order and disorder in Portuguese grammar

I'd find this:

    12. order and disorder in portuguese grammar

objectionable, but the meaning is the same.


All this is completely different to the case in (most) programming
languages, where case changes meaning.  Except, of course, Common Lisp
(with the normal readtable), where case is a stylistic matter, as it
is in natural languages.

--tim

(Yes, I know there are cases where case distinctions affect meaning in
natural language: the point is that they are rare compared to the
cases where they affect readability only.)
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <KM2dndQXRu3xP0TcRVn-oA@golden.net>
Tim Bradshaw wrote:
> * Fred Gilham wrote:
> 
>>I think people should capitalize their code any way they want.  If it
>>turns out to be effective, it will catch on. All I can say is that I
>>like the way it's done in Lisp.  The fact that I don't have to use
>>"camel case" (which outrages my eyes, my fingers, and my literary
>>sensibilities) is, to me, a big advantage.
> 
> 
> Related to this is the important (but conveniently ignored by the
> case-sensitivity cultists) point that in most natural languages
> capitalisation is optional, and a matter of style and not meaning.

Capitalization has meaning. It does not usually convey much meaning, and 
people can usually reconstruct the meaning that is lost when it is 
missing, but it is not merely a matter of style, especially for a writer 
who only has one font at his disposal.

> For instance: it is possible to use English as a spoken language, and
> there are no case distinctions in spoken English. (Please note: I am
> not trying to claim that Lisp should be a spoken language!)

Certainly. But written English and spoken English are, as a result of 
this and other issues, subtly different languages. For example, I can 
write 'I saw The English Patient last night' without ambiguity, but to 
say it, I'd likely replace 'saw' with 'watched' or 'screened' or 
'rented' to clue in the listener. When someone writes that something is 
a Good Thing, it is not a mere stylistic choice -- there is meaning 
there, and a lot of it.

Case insensitive Lisp is akin to a spoken language - there are 
distinctions that cannot be made, which could be made in a case 
sensitive language. Saying 'You are free to use capitalization as you 
wish, but the Lisp reader will ignore it' is tantamount to banning caps, 
because the smart programmer will never use a differentiator that the 
computer ignores -- it is risky behaviour to have a different symbol 
classification algorithm than the entity you are communicating with.

> (Yes, I know there are cases where case distinctions affect meaning in
> natural language: the point is that they are rare compared to the
> cases where they affect readability only.)

Every human reader of Latin languages has a case interpretation 
algorithm - we notice and draw meaning from the presence or absence of 
caps when we are reading, and we use caps to convey meaning when we are 
writing. Those among us who choose not to bother (knowing that a 
moderately literate reader can take up the slack) say a lot about 
themselves, too.

It is certainly within the capabilities of our computers to take note of 
capitalization, to attempt to infer meaning from its use, and to point 
out inexplicable inconsistencies in its use as indicative of a possible 
communication failure between man and machine. Doing this would empower 
people who want to use caps to convey meaning, while not failing any but 
the most spastic legacy code(r).

It is probably already too late for me. I've been following the 'modern' 
Lisp convention of lowercase-with-dashes, and may always see that form 
as natural, just as I'm sure that those whose Lisping dates from the 
Regency spent years seeing ALL-CAPS-LISP as normal and lowercase code as 
odd-looking.

And does anyone really think they're fooling newcomers to the language 
when they backfill around the historical realities by suggesting that 
the current convention is how reasonable people would do it if starting 
from scratch? Once a hacker discovers the baroquen steam calliope that 
is readtable-case :invert, the jig is up.
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3d5wmtdj9.fsf@cley.com>
* Cameron MacKinnon wrote:

> Capitalization has meaning. It does not usually convey much meaning,
> and people can usually reconstruct the meaning that is lost when it is
> missing, but it is not merely a matter of style, especially for a
> writer who only has one font at his disposal.

It is very often a matter of style.  Or, if I was German or perhaps
Victorian: it is very often a matter of Style.

Of course there are cases where there is significant meaning.  But, as
we've seen in other (computer) languages, making too much use of that
meaning leads us straight to a hell which is even more painful than
written German (although I think there have been reforms in German
orthography which make it a lot better). All the evidence is that if
you make case distinctions matter in programming languages you end up
in a nightmare of studly capitalisation.

> And does anyone really think they're fooling newcomers to the language
> when they backfill around the historical realities by suggesting that
> the current convention is how reasonable people would do it if
> starting from scratch? Once a hacker discovers the baroquen steam
> calliope that is readtable-case :invert, the jig is up.

I don't know the historical realities: when I started using CL 17
years ago the case behaviour of the default readtable already seemed
`archaic'.  I presume the history is use on machines which did not
support case distinctions very well, or at all.

However the history is irrelevant: what matters is that the result of
that history is actually rather good, other than when dealing with
languages which do make excessive use of case distinction.
Unfortunately the prevalence of those languages, and of stupidity in
general, will almost certainly lead to case sensitivity by default in
future CLs, with the horrors that implies.

I'm just glad I don't have to deal with idiot programmers (in other
words: all programmers) any more!

--tim
From: Tobias C. Rittweiler
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <1104790554.527489.254250@f14g2000cwb.googlegroups.com>
Tim Bradshaw wrote:

> Of course there are cases where there is significant meaning. But,
> as we've seen in other (computer) languages, making too much use of
> that meaning leads us straight to a hell which is even more painful
> than written German

I think you meant `writing German', since the outcome is actually
very pleasant to read. And writing in a not-overly-formal context
isn't very hard at all, because the rule of thumb of capitalizing
every substantive (including substantivated forms) matches almost
all cases.


--tcr.
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey38y7at9ri.fsf@cley.com>
* Tobias C Rittweiler wrote:

> I think you meant `writing German', since the outcome is actually
> very pleasant to read. And writing in a not-overly-formal context
> isn't very hard at all, because the rule of thumb of capitalizing
> every substantive (including substantivated forms) matches almost
> all cases.

No. I meant what I wrote.

--tim
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <86ekh1w4d6.fsf@goldenaxe.localnet>
Tim Bradshaw <···@cley.com> writes:

> written German (although I think there have been reforms in German
> orthography which make it a lot better).

I think many people in Germany (and Austria and perhaps Switzerland)
have quite a different opinion. ;)

Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3vfadrs21.fsf@cley.com>
* Julian Stecklina wrote:
> Tim Bradshaw <···@cley.com> writes:

> I think many people in Germany (and Austria and perhaps Switzerland)
> have quite a different opinion. ;)

Well, of course they do. I don't want to pass judgement on the reforms
(I don't know them in any detail!), but it's more-or-less axiomatic
that a large number of people will object to any reform.  Do you think
we (in the UK) feel positive about metric units?  Of course we
don't. What does this tell you about their benefits or otherwise?
Nothing.  How do you think people feel about split-infinitives?

--tim
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <etGdnYBkN50b20HcRVn-sA@golden.net>
Tim Bradshaw wrote:
> * Cameron MacKinnon wrote:
> 
> 
>>Capitalization has meaning.

> Of course there are cases where there is significant meaning.  But, as
> we've seen in other (computer) languages, making too much use of that
> meaning leads us straight to a hell which is even more painful than
> written German (although I think there have been reforms in German
> orthography which make it a lot better). All the evidence is that if
> you make case distinctions matter in programming languages you end up
> in a nightmare of studly capitalisation.

Regarding German, I find the capitalization helps me when reading it, 
but I think language features should be for the primary benefit of power 
users, not illiterate clods such as myself. Gothic German is 
eye-wateringly painful for me, but I understand it was deprecated by 
fiat in 1941*. Have there been more recent reforms?

But you talk of "making TOO MUCH use" of the meaning conveyed in case 
distinctions. Maybe there's hope for common ground between us...

> However the history is irrelevant: what matters is that the result of
> that history is actually rather good, other than when dealing with
> languages which do make excessive use of case distinction.

Too bad those languages are the lingua franca of computing. But again, 
"EXCESSIVE use of case distinction."

> Unfortunately the prevalence of those languages, and of stupidity in
> general, will almost certainly lead to case sensitivity by default in
> future CLs, with the horrors that implies.

So we find ourselves today. Half a millennium after the adoption of 
lowercase Latin, binary computers and sundry telegraphic devices are 
invented. For the sake of expediency, early models had no lowercase. 
Later models bifurcated into two world views: that 'a' and 'A' are 
identical, or that they bear no relation to each other. Five centuries 
of progress in type snarled by a few decades of blundering 
communications engineers and "computer scientists"!

Lisp had a history of DWIM. Making intelligent decisions regarding which 
case distinctions in source code are meaningful would require examining 
tokens in the context of the entire stream that they reside in -- quite 
a change from the operation of Common Lisp's READ. Perhaps 
READ-SENSITIVELY?

If you don't like that idea, then I agree with you that future Lisps 
will likely follow in the path of languages where 'a' and 'A' are 
considered unrelated, to our detriment.


* - http://www.waldenfont.com/content.asp?contentpageID=8
From: Julian Stecklina
Subject: [OT] German orthography (was: Re: CLisp case sensitivity)
Date: 
Message-ID: <86u0pvv9pe.fsf_-_@goldenaxe.localnet>
Cameron MacKinnon <··········@clearspot.net> writes:

> Tim Bradshaw wrote:
>> * Cameron MacKinnon wrote:
>>
>>>Capitalization has meaning.
>
>> Of course there are cases where there is significant meaning.  But, as
>> we've seen in other (computer) languages, making too much use of that
>> meaning leads us straight to a hell which is even more painful than
>> written German (although I think there have been reforms in German
>> orthography which make it a lot better). All the evidence is that if
>> you make case distinctions matter in programming languages you end up
>> in a nightmare of studly capitalisation.
>
> Regarding German, I find the capitalization helps me when reading it, but I think language features should be for the primary
> benefit of power users, not illiterate clods such as myself. Gothic German is eye-wateringly painful for me, but I understand it
> was deprecated by fiat in 1941*. Have there been more recent reforms?

Yes, there was another reform in 1996:
http://de.wikipedia.org/wiki/Neue_deutsche_Rechtschreibung

If by "Gothic" German you mean Frakturschrift, I find that quite
easy to read. 
http://de.wikipedia.org/wiki/Fraktur_%28Schrift%29

If you mean Sütterlin, I cannot read that either. ;)
http://de.wikipedia.org/wiki/S%C3%BCtterlinschrift


Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Rahul Jain
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <87sm5fp2lm.fsf@nyct.net>
Cameron MacKinnon <··········@clearspot.net> writes:

> Lisp had a history of DWIM. Making intelligent decisions regarding which
> case distinctions in source code are meaningful would require examining
> tokens in the context of the entire stream that they reside in -- quite
> a change from the operation of Common Lisp's READ. Perhaps
> READ-SENSITIVELY?

Do you really think that hoping the DWIM system will guess your meaning
correctly is a good idea? DWIM is fine for user interfaces, such as code
editors, but the actual code and language shouldn't be overly complex to
understand semantically. Please don't throw out the principle of least
surprise.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <tJSdnSsFUvQZfUHcRVn-3A@golden.net>
Rahul Jain wrote:
> Cameron MacKinnon <··········@clearspot.net> writes:
> 
> 
>>Lisp had a history of DWIM. Making intelligent decisions regarding which
>>case distinctions in source code are meaningful would require examining
>>tokens in the context of the entire stream that they reside in -- quite
>>a change from the operation of Common Lisp's READ. Perhaps
>>READ-SENSITIVELY?
> 
> 
> Do you really think that hoping the DWIM system will guess your meaning
> correctly is a good idea? DWIM is fine for user interfaces, such as code
> editors, but the actual code and language shouldn't be overly complex to
> understand semantically. Please don't throw out the principle of least
> surprise.

I don't think that developing a DWIM system to do this is difficult, and 
I would expect such a system not to proceed in the face of ambiguity. 
One could even (horrors) write a proof-of-concept preprocessor similar 
to the original C++ implementation, which would examine an input file 
and rewrite it to CL, escaping or renaming tokens as necessary.
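A toy pass of that sort is easy to sketch, under the simplifying assumption that we only pipe-escape mixed-case tokens and ignore strings, comments and other read-macro subtleties a real preprocessor would have to handle (the function name is invented for illustration):

```lisp
;; Toy sketch of such a preprocessor: wrap any mixed-case token in
;; |...| so the default upcasing reader preserves its spelling.
(defun escape-mixed-case (line)
  (with-output-to-string (out)
    (let ((token (make-string-output-stream)))
      (flet ((flush-token ()
               (let ((tok (get-output-stream-string token)))
                 (cond ((string= tok ""))
                       ((and (some #'upper-case-p tok)
                             (some #'lower-case-p tok))
                        (format out "|~A|" tok))
                       (t (write-string tok out))))))
        (loop for ch across line
              do (if (member ch '(#\Space #\Tab #\( #\)))
                     (progn (flush-token) (write-char ch out))
                     (write-char ch token)))
        (flush-token)))))

;; (escape-mixed-case "(setPosition myWidget 10)")
;; => "(|setPosition| |myWidget| 10)"
```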

I do think that CL's current state of case sensitivity is archaic, and 
that those who are defending it are fighting a rearguard action. 
Interoperability with C/C++ and the expectations of younger programmers 
are two good reasons why, sooner or later, greater sensitivity to case 
is likely coming to Lisp. In fact, various implementations have 
nonstandard case sensitivity modes already. As time goes on, the status 
quo will become a bunch of Lisps with incompatible case sensitivity 
rules in addition to the CL mandated ones.

If the people who like aspects of Lisp's current case-blind behaviour 
don't come up with a better solution, then the likely one adopted will 
be that of other languages -- which many consider to be pedantic 
oversensitivity.
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3brc2ssej.fsf@cley.com>
* Cameron MacKinnon wrote:
> I do think that CL's current state of case sensitivity is archaic, and
> that those who are defending it are fighting a rearguard
> action.

Probably.  `Archaic' in much the same sense that the typefaces
designed in the first 50-100 years of movable-type printing are
archaic, of course.

> Interoperability with C/C++ and the expectations of younger
> programmers are two good reasons why, sooner or later, greater
> sensitivity to case is likely coming to Lisp. 

They're reasons, anyway.  The cretins will win, again.  Well done,
cretins.

--tim
From: Wade Humeniuk
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <CldDd.34136$nN6.19385@edtnps84>
Cameron MacKinnon wrote:

> 
> I do think that CL's current state of case sensitivity is archaic, and 
> that those who are defending it are fighting a rearguard action. 
> Interoperability with C/C++ and the expectations of younger programmers 
> are two good reasons why, sooner or later, greater sensitivity to case 
> is likely coming to Lisp. In fact, various implementations have 
> nonstandard case sensitivity modes already. As time goes on, the status 
> quo will become a bunch of Lisps with incompatible case sensitivity 
> rules in addition to the CL mandated ones.
> 

And I say let them create their own case-sensitive Lisps.  They have the
freedom to do that with CL.  If someone feels that CL _mandates_ anything,
that is only their feeling and their own psychological need for authority.
If young programmers feel compelled to follow and be constrained by
their own and others' expectations, then they are free to do that.
I find that CL has allowed both case-sensitive and case-insensitive
sensibilities to be acknowledged, and I do not feel constrained to
feel strongly about case sensitivity at all.  Nor would I care
whether multiple Lisps are incompatible.  If people have become
interested enough in Lisp to create their own dialects, that is a good
thing.

Wade
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <873bxe8r12.fsf@thalassa.informatimago.com>
Wade Humeniuk <····················@telus.net> writes:

> Cameron MacKinnon wrote:
> 
> > I do think that CL's current state of case sensitivity is archaic,
> > and that those who are defending it are fighting a rearguard
> > action. Interoperability with C/C++ and the expectations of younger
> > programmers are two good reasons why, sooner or later, greater
> > sensitivity to case is likely coming to Lisp. In fact, various
> > implementations have nonstandard case sensitivity modes already. As
> > time goes on, the status quo will become a bunch of Lisps with
> > incompatible case sensitivity rules in addition to the CL mandated
> > ones.
> >
> 
> And I say let them create their own case-sensitive Lisps.  They have the
> freedom to do that with CL.  If someone feels that CL _mandates_ anything,
> that is only their feeling and their own psychological need for authority.
> If young programmers feel compelled to follow and be constrained by
> their own and others' expectations, then they are free to do that.
> I find that CL has allowed both case-sensitive and case-insensitive
> sensibilities to be acknowledged, and I do not feel constrained to
> feel strongly about case sensitivity at all.  Nor would I care
> whether multiple Lisps are incompatible.  If people have become
> interested enough in Lisp to create their own dialects, that is a good
> thing.

I don't understand this. 

                Common Lisp IS case sensitive.

    (not (eq (intern "Hello") (intern "HELLO")))       !!!

It is only the default reader setting that converts every symbol name to upper case:

    (eq 'hello 'HELLO)



But you can just write:

    (SETF (READTABLE-CASE *READTABLE*) :PRESERVE)

to get:

    (NOT (EQ 'hello 'HELLO)) 



So the question isn't whether Lisp is case sensitive; it's how the reader is configured.


Perhaps what's bothering some people is that the COMMON-LISP package
exports symbols whose names are all upper case?  Well, this is not the
only language whose "keywords" are defined in upper case: Modula-2 and
Modula-3 are case sensitive and define their keywords in upper case
too, for example.  If you can't stand upper case, then just keep the
default readtable-case setting and go on typing lower case.
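Concretely, these behaviors can be checked at the REPL; the sketch below binds a copy of the standard readtable, so the global configuration is left untouched:

```lisp
;; Symbols themselves are case sensitive: INTERN does no case folding.
(eq (intern "Hello") (intern "HELLO"))   ; => NIL

;; But the default reader upcases token names, so these are one symbol:
(eq 'hello 'HELLO)                       ; => T

;; With READTABLE-CASE set to :PRESERVE, source case is kept.
;; Bind a copy so the global readtable is not disturbed:
(let ((*readtable* (copy-readtable nil)))
  (setf (readtable-case *readtable*) :preserve)
  (eq (read-from-string "hello")
      (read-from-string "HELLO")))       ; => NIL
```

Under :PRESERVE, remember that the standard symbols are named in upper case, so source read that way must write EQ, LET, etc. in upper case (or use :INVERT, or CLISP's -modern mode).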



-- 
__Pascal_Bourguignon__               _  Software patents are endangering
()  ASCII ribbon against html email (o_ the computer industry all around
/\  1962:DO20I=1.100                //\ the world http://lpf.ai.mit.edu/
    2001:my($f)=`fortune`;          V_/   http://petition.eurolinux.org/
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3sm5equu9.fsf@cley.com>
* Pascal Bourguignon wrote:
> I don't understand this. 

>                 Common Lisp IS case sensitive.

Yes, it is.

> Perhaps what's bothering some people is that the COMMON-LISP package
> exports symbols whose names are all upper case?  Well, this is not the
> only language whose "keywords" are defined in upper case: Modula-2 and
> Modula-3 are case sensitive and define their keywords in upper case
> too, for example.  If you can't stand upper case, then just keep the
> default readtable-case setting and go on typing lower case.

I think this (case wars) is a really good example of what happens when
people have run horribly out of ideas (or perhaps never had any in the
first place): they start fighting over things that are absolutely and
utterly trivial, like case.  Has it stopped anyone writing programs?
No, of course it hasn't.  Has it stopped anyone interfacing with
Studly-by-default languages? No, of course it hasn't (I mean, even if
you're generating the interface manually, it really isn't that hard to
type |Foo| is it?  And are you *actually* generating the interface
manually? Surely you're not, and surely, like me you have utilities
which convert fooBar to foo-bar and FooBar to !foo-bar (if you need
the distinction)).
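Such a conversion utility is only a few lines; here is an illustrative sketch (the function name and the !foo-bar convention are just examples of the kind of utility described, not anyone's actual library, and it assumes a non-empty identifier):

```lisp
;; Map a studly-caps identifier to a Lisp-style hyphenated name,
;; prefixing "!" when the original started with an upper-case letter
;; (to keep fooBar and FooBar distinguishable).
(defun studly-to-lisp (name)
  (with-output-to-string (out)
    (when (upper-case-p (char name 0))
      (write-char #\! out))
    (loop for ch across name
          for first = t then nil
          do (when (and (upper-case-p ch) (not first))
               (write-char #\- out))
             (write-char (char-downcase ch) out))))

;; (studly-to-lisp "fooBar") => "foo-bar"
;; (studly-to-lisp "FooBar") => "!foo-bar"
```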

You can find many other wars like this on CLL.  I rather suspect that
CLL posters have rather few good ideas...

And of course, like a fool, I've let myself get dragged in.  Well:
never again.

--tim
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <3OKdnWPypJmpE0DcRVn-sA@golden.net>
Tim Bradshaw wrote:
> Has it stopped anyone interfacing with
> Studly-by-default languages? No, of course it hasn't (I mean, even if
> you're generating the interface manually, it really isn't that hard to
> type |Foo| is it?  And are you *actually* generating the interface
> manually? Surely you're not, and surely, like me you have utilities
> which convert fooBar to foo-bar and FooBar to !foo-bar (if you need
> the distinction)).

This is the "before" discussion. For software engineering practice like 
this, the "after" discussion is held in comp.risks
From: Rob Warnock
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <1uGdnbf_U4gslUPcRVn-pg@speakeasy.net>
Cameron MacKinnon  <··········@clearspot.net> wrote:
+---------------
| Tim Bradshaw wrote:
| > Has it stopped anyone interfacing with
| > Studly-by-default languages? No, of course it hasn't (I mean, even if
| > you're generating the interface manually, it really isn't that hard to
| > type |Foo| is it?  And are you *actually* generating the interface
| > manually? Surely you're not, and surely, like me you have utilities
| > which convert fooBar to foo-bar and FooBar to !foo-bar (if you need
| > the distinction)).
| 
| This is the "before" discussion. For software engineering practice
| like  this, the "after" discussion is held in comp.risks
+---------------

Indeed.  The only problem is that -- at least, based on the
case sensitivity discussions I've seen around here -- it's hard to guess
ahead of time which side(s) of the argument(s) will turn out to generate
the greater RISKs in the long run!!  ;-}  ;-}  [or maybe :-( ]

Seriously, I've seen cogent arguments in several directions, each of
which points out a RISK of some other direction than the "favored" one.
From all of this, the only thing I can take away with some confidence
is: "Be afraid.  Be *very* afraid."  [Or at least be very careful...]


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey31xcx2rt0.fsf@cley.com>
* Cameron MacKinnon wrote:

> This is the "before" discussion. For software engineering practice
> like this, the "after" discussion is held in comp.risks

I think you just mentioned Hitler.  I'll go away now.
From: Peter Seibel
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3acrmz7ck.fsf@javamonkey.com>
Tim Bradshaw <···@cley.com> writes:

> You can find many other wars like this on CLL.  I rather suspect that
> CLL posters have rather few good ideas...
>
> And of course, like a fool, I've let myself get dragged in.  Well:
> never again.

Hey, at least we don't spend any time arguing about where the {}'s go.
;-)

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Larry Clapp
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <slrncts3o7.1tp.larry@theclapp.ddts.net>
In article <··············@thalassa.informatimago.com>, Pascal
Bourguignon wrote:
> Perhaps what's bothering some people is that the COMMON-LISP package
> exports symbols whose names are all upper case?  Well, this is not
> the only language whose "keywords" are defined in upper case:
> Modula-2 and Modula-3 are case sensitive and define their keywords
> in upper case too, for example.

After learning Turbo Pascal as my second computer language, and
enjoying it a lot, the case sensitivity of Modula-2's keywords really
turned me off.

I don't mind if a language has case-sensitive keywords as long as
they're *all lower case*.

The more Lisp I do, the more I like variables-like-this.

-- Larry
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <vMydnQazZZgyr17cRVn-uw@golden.net>
Tim Bradshaw wrote:

> Now look at what happened when people got let loose with word
> processors and DTP packages.  Terror!  It turns out that the training
> typographers have is worth something.

Yes, we went through a decade of DTP hell. But I wouldn't go back to the 
days before the DTP revolution.

> Programmers usually don't have any kind of training at all in
> typography.  Most of them are not even very literate.  The very last
> thing you want to do is give them too many options about case.

Yikes! Better lock up LAMBDA and the macroexpander, too, or they could 
get up to real mischief.
From: jayessay
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <m3sm647l2d.fsf@rigel.goldenthreadtech.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> Tim Bradshaw wrote:
> 
> > Programmers usually don't have any kind of training at all in
> > typography.  Most of them are not even very literate.  The very last
> > thing you want to do is give them too many options about case.
> 
> Yikes! Better lock up LAMBDA and the macroexpander, too, or they could
> get up to real mischief.

category error ...


/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com
From: Tim Bradshaw
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <ey3y8fau33z.fsf@cley.com>
* Cameron MacKinnon wrote:
> Yes, we went through a decade of DTP hell. But I wouldn't go back to
> the days before the DTP revolution.

Anyone with any feeling for good typography knows that we are still in
DTP hell, and there is little hope of escape.

--tim
From: Cameron MacKinnon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <-4OdnbUWGpJwPkTcRVn-2g@golden.net>
Tim Bradshaw wrote:
> * Cameron MacKinnon wrote:
> 
>>Yes, we went through a decade of DTP hell. But I wouldn't go back to
>>the days before the DTP revolution.
> 
> 
> Anyone with any feeling for good typography knows that we are still in
> DTP hell, and there is little hope of escape.

Elite or Pica, and Liquid Paper
It's that for you, or the engraver.
From: Peter Seibel
Subject: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <m3pt0mzb2l.fsf_-_@javamonkey.com>
Tim Bradshaw <···@cley.com> writes:

> * Cameron MacKinnon wrote:
>> Yes, we went through a decade of DTP hell. But I wouldn't go back to
>> the days before the DTP revolution.
>
> Anyone with any feeling for good typography knows that we are still
> in DTP hell, and there is little hope of escape.

So I don't know if I have any feeling for good typography but I'm
curious, what tools would someone who did use to generate the output
for, say, a book. I've been thinking about this lately because after
writing my book and generating various kinds of output using my
homebrew toolchain (with many thanks to Marc Battyani for cl-pdf and
cl-typesetting) I'm annoyed at the tools I have to use for the final
stages of book production. (Microsoft Word plays a role; enough said.)
As I understand it the final version of the book is going to be
produced using Framemaker or Quark. Can those tools be used with taste
and style? Or are they just part of DTP hell? I've heard folks here
say plenty of snide things about TeX but at least has the appealing
feature of being text based--you could keep TeX files in a version
control system and diff them and edit them in Emacs and all that good
stuff. Is there something better? What do "real" typographers use?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Cameron MacKinnon
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <ZKidnd8vDYboLETcRVn-rA@golden.net>
Peter Seibel wrote:
> 
> So I don't know if I have any feeling for good typography but I'm
> curious, what tools would someone who did use to generate the output
> for, say, a book.

I highly recommend reading "The Elements of Typographic Style". Then 
find books whose style you like. You will then be in a position to talk 
to your Quark driver, with examples and a knowledge of the issues.

DTP hell is mostly poor font choices, bad layout, and poor font choices 
again. It is a state of mind, not of tools.
From: Peter Seibel
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <m33bxixamf.fsf@javamonkey.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> Peter Seibel wrote:
>> So I don't know if I have any feeling for good typography but I'm
>> curious, what tools would someone who did use to generate the output
>> for, say, a book.
>
> I highly recommend reading "The Elements of Typographic Style". Then
> find books whose style you like. You will then be in a position to
> talk to your Quark driver, with examples and a knowledge of the
> issues.

Well, at this point, I'm going to defer to their professional judgment
as I still have a couple *first* drafts to write and I just got my
first page proofs. Anyway, the first page proofs I've looked at look
pretty nice.

The reason I was asking about tools was mostly idle curiosity about if
there's a better way. My current problem is that while I have my whole
book in version control once I had to start dealing with Apress's
tools I lost the ability to track changes. Copy editing is done in
Word and then final layout in Framemaker and both of those file
formats are opaque to me. In a perfect world the book would stay in
version control all the way through and I'd still be able to do diffs,
greps, etc. of the latest typesetting source to see if certain fixes
are in or whatever.

> DTP hell is mostly poor font choices, bad layout, and poor font
> choices again. It is a state of mind, not of tools.

Yeah. I guess another way to frame my question is: are there purely
text-based layout tools that can do as good a job as graphical tools
like Quark and Framemaker, at least for relatively simple things such
as a book? (Simple compared to, say, a fancy brochure or Wired
Magazine.)

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Tim Bradshaw
Subject: Re: OT: Typography
Date: 
Message-ID: <ey34qhxtlrn.fsf@cley.com>
* Peter Seibel wrote:

> Yeah. I guess another way to frame my question is: are there purely
> text-based layout tools that can do as good a job as graphical tools
> like Quark and Framemaker, at least for relatively simple things such
> as a book? (Simple compared to, say, a fancy brochure or Wired
> Magazine.)

Yes: TeX.  What the graphical tools get you is the ability to see what
you are doing, and this matters for very visual layouts but not for
things like normal books.  Actually, that's not right: it matters just
as much but books generally have more repetition (each spread looks
the same!) and less complexity, so you only have to get things right
once.

--tim
From: rydis (Martin Rydstr|m) @CD.Chalmers.SE
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <w4csm5gdq7u.fsf@boris.cd.chalmers.se>
Peter Seibel <·····@javamonkey.com> writes:
> The reason I was asking about tools was mostly idle curiosity about if
> there's a better way. My current problem is that while I have my whole
> book in version control once I had to start dealing with Apress's
> tools I lost the ability to track changes. Copy editing is done in
> Word and then final layout in Framemaker and both of those file
> formats are opaque to me. In a perfect world the book would stay in
> version control all the way through and I'd still be able to do diffs,
> greps, etc. of the latest typesetting source to see if certain fixes
> are in or whatever.

Framemaker, at least, has a text-based file format (that I believe is
one-to-one with regard to its binary format(s?)) called MIF. There are
tools to convert from binary to textual format, as well. The diffs
might be less than great, though, I guess.

'mr

-- 
[Emacs] is written in Lisp, which is the only computer language that is
beautiful.  -- Neal Stephenson, _In the Beginning was the Command Line_
From: Thomas A. Russ
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <ymiu0py3uh4.fsf@sevak.isi.edu>
Peter Seibel <·····@javamonkey.com> writes:

> So I don't know if I have any feeling for good typography but I'm
> curious, what tools would someone who did use to generate the output
> for, say, a book. I've been thinking about this lately because after
> writing my book and generating various kinds of output using my
> homebrew toolchain (with many thanks to Marc Battyani for cl-pdf and
> cl-typesetting) I'm annoyed at the tools I have to use for the final
> stages of book production. (Microsoft Word plays a role; enough said.)
> As I understand it the final version of the book is going to be
> produced using Framemaker or Quark. Can those tools be used with taste
> and style? Or are they just part of DTP hell?

Well, they certainly can be used with taste and style.  So can Word,
although it may be a little more difficult.  The main problem with all
of these tools is that using them with taste and style requires a bit
more work and, most importantly, knowledge of what taste and style
entail in publishing.

The problem is that the WYSIWYG approach provides some powerful tools
that can be easily misused without some grounding in design and
aesthetics.

Speaking of which, are there any good recommendations for books which
describe how to think about good typography?  Every once in a while I
feel like I should learn more, but don't know where to look.

> I've heard folks here
> say plenty of snide things about TeX but at least has the appealing
> feature of being text based--you could keep TeX files in a version
> control system and diff them and edit them in Emacs and all that good
> stuff. Is there something better? What do "real" typographers use?

Well, one appeal of TeX, or my preferred variant LaTeX, is that there is
a lot of typographical design already built into the system.  In LaTeX
with its document classes, this is even more true.  That makes it easier
to produce something reasonable without having to learn the typographic
rules yourself.  On the other hand, if what you wish to accomplish is
not anticipated by the document class or styles, it can be very hard to
coerce the tool into doing it your way.

Being text-based does, as you note, have advantages from the perspective
of source control.  It also has a number of advantages if you want to
try generating parts of the document automatically.  It is fairly easy
to generate decent looking Lisp documentation from a source file and the
string comments by emitting LaTeX code.

> 
> -Peter
> 
> -- 
> Peter Seibel                                      ·····@javamonkey.com
> 
>          Lisp is the red pill. -- John Fraser, comp.lang.lisp

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Tim Bradshaw
Subject: Re: OT: Typography
Date: 
Message-ID: <ey3zmzps6v7.fsf@cley.com>
* Thomas A Russ wrote:

> Speaking of which, are there any good recommendations for books which
> describe how to think about good typography?  Every once in a while I
> feel like I should learn more, but don't know where to look.

The first book I read was `The Thames and Hudson manual of Typography',
by Ruari McLean, which I still return to frequently.  I think it is
out of print unfortunately.

The most recent book I read was `The New Typography', by Jan
Tschichold.  This is hardly a new book - first published, in German,
in 1928 - and Tschichold later changed his views very significantly,
but it was profoundly influential, and still makes you think.  The
current edition is unfortunately on annoyingly shiny paper, which I am
fairly sure the original was not.

--tim
From: Scott Andrew Borton
Subject: Re: OT: Typography
Date: 
Message-ID: <ubrc4i517.fsf@pp.htv.fi.invalid>
Tim Bradshaw <···@cley.com> writes:

> The first book I read was `The Thames and Hudson manual of Typography',
> by Ruari McLean, which I still return to frequently.  I think it is
> out of print unfortunately.

It's still in print in the UK. One nice thing about this book compared to
many of the others available today is that it was written before the age of
desktop publishing, so it concentrates on the actual art of typography,
rather than devoting chapters to installing fonts on Macs or whatnot. There
are still descriptions of historical printing technologies that I suppose
few of us are ever likely to use, but these continue to be interesting
because they provide background for many concepts and terms still in use
today.



--scott

-- 
Notebook on music and airports:
http://two-wugs.net/scott/
From: Tim Bradshaw
Subject: Re: OT: Typography
Date: 
Message-ID: <ey3mzvorrdu.fsf@cley.com>
* Scott Andrew Borton wrote:

> It's still in print in the UK. One nice thing about this book compared to
> many of the others available today is that it was written before the age of
> desktop publishing, so it concentrates on the actual art of typography,
> rather than devoting chapters to installing fonts on Macs or
> whatnot. 

It's good to know it's still in print.  As you say, a very important
criterion when choosing a book on typography is to pick one that
*doesn't* spend too much time talking about computers.

> There
> are still descriptions of historical printing technologies that I suppose
> few of us are ever likely to use, but these continue to be interesting
> because they provide background for many concepts and terms still in use
> today.

When I used to spend more time thinking about typography, I had
arguments with people who wanted to learn to design type by using some
computer package, while I insisted (and insist) that you should learn
to *draw* type (with pencil and paper, not a machine) first.  I also
think that if you ever get a chance to typeset things letterpress you
should take it like a shot.

--tim
From: Cameron MacKinnon
Subject: Re: OT: Typography
Date: 
Message-ID: <gf2dnVJHCp9R1UHcRVn-uA@golden.net>
Tim Bradshaw wrote:
> I also think that if you ever get a chance to typeset things
> letterpress you should take it like a shot.

A Linotype man from L.A.
Was having a difficult day.
With fingers abused
ETAOIN was used
And SHRDLU brought into play.

               - http://www.metaltype.co.uk/stories/story22.shtml
From: Thomas A. Russ
Subject: Re: OT: Typography
Date: 
Message-ID: <ymimzvn4p23.fsf@sevak.isi.edu>
Thanks, Tim and Gareth.
I'll add these to my list of things to look at.

-Tom.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Gareth McCaughan
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <87vfacvo0n.fsf@g.mccaughan.ntlworld.com>
Thomas Russ wrote:

> Speaking of which, are there any good recommendations for books which
> describe how to think about good typography?  Every once in a while I
> feel like I should learn more, but don't know where to look.

"The elements of typographic style" by Robert Bringhurst;
already recommended elsewhere in this thread. Very good
indeed.

> Well, one appeal of TeX, or my preferred variant LaTeX, is that there is
> a lot of typographical design already built into the system.  In LaTeX
> with its document classes, this is even more true.  That makes it easier
> to produce something reasonable without having to learn the typographic
> rules yourself.  On the other hand, if what you wish to accomplish is
> not anticipated by the document class or styles, it can be very hard to
> coerce the tool into doing it your way.

And, unfortunately, LaTeX's built in document classes
are mostly rather nasty. To my eye, anyway.

-- 
Gareth McCaughan
.sig under construc
From: Tim Bradshaw
Subject: Re: OT: Typography
Date: 
Message-ID: <ey3hdlytmms.fsf@cley.com>
* Peter Seibel wrote:
> Tim Bradshaw <···@cley.com> writes:

> So I don't know if I have any feeling for good typography but I'm
> curious, what tools would someone who did use to generate the output
> for, say, a book. I've been thinking about this lately because after
> writing my book and generating various kinds of output using my
> homebrew toolchain (with many thanks to Marc Battyani for cl-pdf and
> cl-typesetting) I'm annoyed at the tools I have to use for the final
> stages of book production. (Microsoft Word plays a role; enough said.)
> As I understand it the final version of the book is going to be
> produced using Framemaker or Quark. Can those tools be used with taste
> and style? Or are they just part of DTP hell? I've heard folks here
> say plenty of snide things about TeX but at least has the appealing
> feature of being text based--you could keep TeX files in a version
> control system and diff them and edit them in Emacs and all that good
> stuff. Is there something better? What do "real" typographers use?

I think that `DTP hell' has more to do with the user than the tool.
I do not know enough of Word's abilities to be able to make a
judgement on it: a lot of atrocities are produced with Word, but then
a lot of typographically extremely naive people use it.  I'm sure good
results can be achieved with framemaker (having seen some) and Quark.

From a systems rather than typographical point of view I'd be wary of
anything that didn't keep its documents in a form which could be
handled well by a general-purpose revision-control system (so: not one
specific to the application).  I think that effectively rules out
Word, anyway.  Anything that doesn't allow enforced style rules is
probably bad for multi-author documents, anyway.  Anything that
listens too much to whitespace in the input is death.

TeX is typographically fine (modulo awful default styles in LaTeX, and
computer modern), but is horrible to program.

We use a hacked-together toolchain with source in DTML, generating
HTML which we then process with html2ps and a ps->pdf converter.
Extensive use of DTML macros means the source documents are very
structured. It's OK but there are limitations (page breaks are a
pain). If I had time I'd investigate cl-pdf and so on and use one of
those as a backend for dtml documents (but I never will have time!).

I think `real' typographers (I'm definitely not one) use something
like Quark and a lot of experience.

--tim
From: Peter Seibel
Subject: Re: OT: Typography
Date: 
Message-ID: <m3acrqz5ra.fsf@javamonkey.com>
Tim Bradshaw <···@cley.com> writes:

> * Peter Seibel wrote:
>> Tim Bradshaw <···@cley.com> writes:
>
>> So I don't know if I have any feeling for good typography but I'm
>> curious, what tools would someone who did use to generate the output
>> for, say, a book. I've been thinking about this lately because after
>> writing my book and generating various kinds of output using my
>> homebrew toolchain (with many thanks to Marc Battyani for cl-pdf and
>> cl-typesetting) I'm annoyed at the tools I have to use for the final
>> stages of book production. (Microsoft Word plays a role; enough said.)
>> As I understand it the final version of the book is going to be
>> produced using Framemaker or Quark. Can those tools be used with taste
>> and style? Or are they just part of DTP hell? I've heard folks here
> say plenty of snide things about TeX, but it at least has the appealing
>> feature of being text based--you could keep TeX files in a version
>> control system and diff them and edit them in Emacs and all that good
>> stuff. Is there something better? What do "real" typographers use?
>
> I think that `DTP hell' has more to do with the user than the tool.
> I do not know enough of Word's abilities to be able to make a
> judgement on it: a lot of atrocities are produced with Word, but
> then a lot of typographically extremely naive people use it. I'm
> sure good results can be achieved with framemaker (having seen some)
> and Quark.
>
> From a systems rather than typographical point of view I'd be wary
> of anything that didn't keep its documents in a form which could be
> handled well by a general-purpose revision-control system (so: not
> one specific to the application). I think that effectively rules out
> Word, anyway. 

Indeed. Worse, Word has its own "revision tracking" feature which
Apress is really into but which is pretty much a nightmare.

> Anything that doesn't allow enforced style rules is probably bad for
> multi-author documents, anyway. Anything that listens too much to
> whitespace in the input is death.

Yes.

> TeX is typographically fine (modulo awful default styles in LaTeX, and
> computer modern), but is horrible to program.
>
> We use a hacked-together toolchain with source in DTML, generating
> HTML which we then process with html2ps and a ps->pdf converter.
> Extensive use of DTML macros mean the source documents are very
> structured. It's OK but there are limitations (page breaks are a
> pain). If I had time I'd investigate cl-pdf and so on and use one of
> those as a backend for dtml documents (but I never will have time!).

Well, once the actual writing of the book is done and I turn to
putting all the example code up on the web, I'm planning to include a
colophon section with all the code I used while writing the
book. I can tell you I was very happy when I got my cl-typesetting
backend working and could bag the html2ps + ps->pdf method of
generating PDF. The output was much nicer. Maybe you'll find something
in there to jumpstart your cl-pdf/cl-typesetting use. 

I also have three backends for my homebrew markup--HTML, PDF, and RTF.
The last one saves me from having to deal with Word except in the
final round with the copy editor.

I also used cl-pdf to generate the diagrams in my book, though I don't
know if Apress will actually use those versions. But it was a lot less
painful for me to "write" the diagrams than to muck around with an
actual drawing program. With a bit of time I can probably turn my
figure drawing macros into at least a better PIC.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Christopher C. Stacy
Subject: Re: OT: Typography [was: CLisp case sensitivity]
Date: 
Message-ID: <uwtuuno21.fsf@news.dtpq.com>
Write the entire book in all-caps, using some font that
looks like an ASR-33.  Re-title the book as "SHOUT LISP"
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <86y8fzdoc1.fsf@goldenaxe.localnet>
Cameron MacKinnon <··········@clearspot.net> writes:

> Julian Stecklina wrote:
>> What pain is it to have symbols be converted to upcase by default? Do
>> you want a case-sensitive Lisp? Do you want Read, read and READ to be
>> three distinct symbols?
>
> Of course. Who wouldn't?

I would not. I think it's a pain in C. Why should it be better in CL?

> Are there programmers who would like to aesthetically improve their
> code (by their standards, not mine) or encode more information into
> their symbols via selective capitalization? Yes. They should not be
> made to feel that their choice is in any way unnatural or discouraged.

Why can't they?

> Or are you paranoid that one day YOU WILL BE STUCK IN AN ELECTRONIC
> JUNKYARD IN A THIRD WORLD METROPOLIS, AND YOUR ONLY CONNECTION TO YOUR
> LISP IMAGE WILL BE THROUGH A TERMINAL THAT DOES NOT SUPPORT MIXED CASE?

Come on...

Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful
From: Adam Warner
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <pan.2004.12.15.03.21.32.706951@consulting.net.nz>
Hi Julian Stecklina,

>> Readtable case should be deprecated. Symbols should be interned as
>> written in source code and implementors should not have the burden of
>> implementing "historical" baggage that is difficult to get 100% right
>> (e.g. ABCL is continuing to squash :INVERT mode read and print errors).
> 
> What pain is it to have symbols be converted to upcase by default?

It's the reason _why_ they're uppercased that is the issue: symbols in
the "COMMON-LISP" package are interned in uppercase.

> Do you want a case-sensitive Lisp?

Yes.

> Do you want Read, read and READ to be three distinct symbols?

Yes, with the symbol name corresponding to the textual name. This
eliminates many printing issues.

By the way I've started using uppercase text to refer to constants. It's
better than the +constant+ convention, especially when mixing constants
and arithmetic: (+ CONSTANT1 CONSTANT2) is simply more legible than
(+ +constant1+ +constant2+).

Also consider this: one Lisp is Unicode code point aware, like CLISP
and SBCL, and cannot uppercase a character such as #\ß within a string
(case conversion in CL works character by character, and ß has no
single-character uppercase). So when the symbol ß is interned it's
given the symbol name "ß".

Another hypothetical Lisp implementation uppercases the string "ß" as
"SS". So the symbol "ß" is interned with the symbol name "SS".

When converting back from the internal encodings to a common external
encoding the symbol names /no longer correspond/. This is caused by the
unnecessary case conversion. If the case had been left alone the
differing Unicode capabilities of the implementations would have been
irrelevant to textual symbol identity.
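For what it's worth, the divergence is easy to demonstrate outside
Lisp. This is only an illustration in Python, whose str.upper applies
the full Unicode case mapping that the hypothetical implementation
above would use; the intern_upcasing and intern_preserving helpers are
made up for the sketch, not anything from a real reader:

```python
# Full Unicode case mapping turns one character into two:
assert "ß".upper() == "SS"

def intern_upcasing(name):
    """A case-normalizing reader: intern symbols under the upcased name."""
    return name.upper()

def intern_preserving(name):
    """A case-preserving reader: intern symbols exactly as written."""
    return name

source = "ß"
# The upcasing reader loses the textual identity of the symbol...
assert intern_upcasing(source) == "SS"
# ...while the preserving reader keeps source text and symbol name equal.
assert intern_preserving(source) == source
```

A Lisp whose case conversion works character by character would leave
ß untouched instead, so two conforming readers can intern the same
source token under different names; preserving case sidesteps the
whole problem.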

Regards,
Adam
From: Pascal Bourguignon
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <873by8maii.fsf@thalassa.informatimago.com>
Adam Warner <······@consulting.net.nz> writes:
> Another hypothetical Lisp implementation uppercases the string "ß" as
> "SS". So the symbol "ß" is interned with the symbol name "SS".
> 
> When converting back from the internal encodings to a common external
> encoding the symbols names /no longer correspond/. This is caused by the
> unnecessary case conversion. If the case had been left alone the
> differing Unicode capabilities of the implementations would have been
> irrelevant to textual symbol identity.

Of course. That's why you should put:
    (setf (readtable-case *readtable*) :preserve)
in your ~/.clisprc, and use my Emacs M-x upcase-lisp RET command
to update old code.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Cats meow out of angst
"Thumbs! If only we had thumbs!
We could break so much!"
From: Brian Downing
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <d56wd.238920$HA.100632@attbi_s01>
In article <··············@thalassa.informatimago.com>,
Pascal Bourguignon  <····@mouse-potato.com> wrote:
> Adam Warner <······@consulting.net.nz> writes:
> > Another hypothetical Lisp implementation uppercases the string "ß" as
> > "SS". So the symbol "ß" is interned with the symbol name "SS".
> > 
> > When converting back from the internal encodings to a common external
> > encoding the symbols names /no longer correspond/. This is caused by the
> > unnecessary case conversion. If the case had been left alone the
> > differing Unicode capabilities of the implementations would have been
> > irrelevant to textual symbol identity.
> 
> Of course. That's why you should put:
>     (setf (readtable-case *readtable*) :preserve)
> in  your ~/.clisprc, and use my emacs M-x upcase-lisp RET command
> to update old code.

I'm glad that works for you.  Personally, I think that's impossibly
ugly, and would not write or maintain code like that.

(I'm not arguing for a case-sensitive Lisp, either.  I have no problem
personally with the current standard, but I can understand why some
would.)

-bcd
-- 
*** Brian Downing <bdowning at lavos dot net> 
From: Julian Stecklina
Subject: Re: CLisp case sensitivity
Date: 
Message-ID: <863by7f315.fsf@goldenaxe.localnet>
Adam Warner <······@consulting.net.nz> writes:

>> Do you want Read, read and READ to be three distinct symbols?
>
> Yes. With the symbol name to correspond with the textual name. This
> eliminates many printing issues.

E.g.?

> Also consider this: One Lisp is Unicode code point aware like CLISP and
> SBCL and cannot uppercase a character such as #\ß within a string. So
> when the symbol ß is interned it's given the symbol name "ß".
> Another hypothetical Lisp implementation uppercases the string "ß" as
> "SS". So the symbol "ß" is interned with the symbol name "SS".

This is a problem of not supporting Unicode and getting file formats wrong.


Regards,
-- 
                    ____________________________
 Julian Stecklina  /  _________________________/
  ________________/  /
  \_________________/  LISP - truly beautiful