From: Russell Wallace
Subject: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4004c6e2.127807819@news.eircom.net>
A trivial little question, but one that's been bugging me: Is there a
name for that set of characters legal in Lisp identifiers? For most
languages this would be "alphanumeric" (perhaps with a footnote that _
is regarded as a letter in this context), but Lisp includes characters
like + and - that most languages regard as punctuation.

Thanks,

-- 
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace

From: Wade Humeniuk
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <r24Nb.10822$wf1.9018@edtnps89>
Russell Wallace wrote:
> A trivial little question, but one that's been bugging me: Is there a
> name for that set of characters legal in Lisp identifiers?

In CL that would be _all_.

Wade
From: Erik Naggum
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <3283047574505462KL2065E@naggum.no>
* Russell Wallace
| A trivial little question, but one that's been bugging me: Is there
| a name for that set of characters legal in Lisp identifiers?  For
| most languages this would be "alphanumeric" (perhaps with a footnote
| that _ is regarded as a letter in this context), but Lisp includes
| characters like + and - that most languages regard as punctuation.

  The type STANDARD-CHAR covers the set of characters from which all
  symbols in the standard packages are made.  This simple fact may
  give rise to the invalid assumption that there must be a particular
  character set from which all symbols must be made.

  However, the functions INTERN and MAKE-SYMBOL take a STRING as the
  name of the symbol to be created, and there is no restriction on
  this /string/ to be of type BASE-STRING.  Likewise, the value of
  SYMBOL-NAME is only specified to be of type STRING, with no mention
  of the common observation that it may be a SIMPLE-STRING regardless
  of whether the corresponding argument to INTERN or MAKE-SYMBOL was.

  Since the symbols are normally created by the Common Lisp reader,
  your question is therefore really which characters the reader is
  able to build into a string that it will pass to INTERN.  There is
  no upper bound on this character set in the standard, but an actual
  implementation will necessarily place restrictions on this set.  In
  the worst case, the Common Lisp reader does not understand which
  character it has just read the encoding of, and may produce symbols
  with garbage bytes that nevertheless reproduce the character in your
  editor or other character display equipment.

  Pessimistically, therefore, your question is whether you will find
  any mention in the standard of any invalid characters in symbols,
  but you find quite the opposite: After a single-escape character,
  normally \, any following character will be a constituent character
  in the symbol name being read, and between the multiple-escape
  characters, normally |, all characters will be constituent.  The
  best you can hope for is thus that whatever reads the byte stream
  that is your source file will reject unacceptable encodings.  As
  long as you use an encoded character set that includes the standard
  characters, there is no restriction on what you can do, and if you
  use an encoding that does not confuse standard characters and one of
  your other characters even in the least capable decoders, you will
  find that there is not even any useful restriction on the /length/
  of Common Lisp symbol names.

  Optimistically, however, the answer to your question is that the set
  of characters that are legal in identifiers is the system class
  CHARACTER, but you may not be able to produce all of them in any
  given source file.
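
A minimal sketch of the point about INTERN, MAKE-SYMBOL and the escape characters, assuming a conforming implementation with the standard readtable in effect. The variable names are illustrative only and appear nowhere in the standard.

```lisp
;; Symbol names are plain strings; MAKE-SYMBOL does not filter characters.
(defparameter *odd-name* "two words (and parens)")

;; MAKE-SYMBOL returns a fresh uninterned symbol with exactly this name.
(defparameter *odd-symbol* (make-symbol *odd-name*))

;; The reader reaches the same kind of name through escapes:
;; |...| is the multiple escape, \ is the single escape.
(defparameter *read-with-bars*  (read-from-string "|)(')(|"))
(defparameter *read-with-slash* (read-from-string "\\)\\(\\'\\)\\("))

;; Both spellings name the same interned symbol in the current package.
(eq *read-with-bars* *read-with-slash*)
```

Both reads go through INTERN with the five-character name )(')( so the two results are the same symbol.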

  I am particularly fond of using the non-breaking space in symbol
  names, just as I use it in filenames under operating systems that
  believe that ordinary spaces are separators regardless of how much
  effort one puts into convincing its various programs otherwise.  I
  know people who think there ought to be laws against this practice,
  but sadly, the Common Lisp standard does not come to their aid.

-- 
Erik Naggum | Oslo, Norway                      Yes, I survived 2003.

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Duane Rettig
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4n08r6nnd.fsf@franz.com>
> Erik Naggum | Oslo, Norway                      Yes, I survived 2003.

Welcome back, Erik!

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4004f268.138951618@news.eircom.net>
On 14 Jan 2004 05:39:34 +0000, Erik Naggum <····@naggum.no> wrote:

>  However, the functions INTERN and MAKE-SYMBOL take a STRING as the
>  name of the symbol to be created, and there is no restriction on
>  this /string/ to be of type BASE-STRING.  Likewise, the value of
>  SYMBOL-NAME is only specified to be of type STRING, with no mention
>  of the common observation that it may be a SIMPLE-STRING regardless
>  of whether the corresponding argument to INTERN or MAKE-SYMBOL was.

Welcome back, Erik!

Thanks for the explanation - okay, so basically any character _can_ be
part of a symbol... fair enough... my question is really about the
English terminology, though. That is, say you write...

 (defun +-?-+ ...)

...that's fine, you can use the characters +, - and ? in a function
name, they're... "constituent characters", one poster said? Whereas if
you write...

 (defun )(')( ...)

That won't work; (, ) and ' are "punctuation" (?) and normally
recognized by the reader as special characters. (I'm talking about the
normal case, not what you can persuade the reader, interner or
whatever to do if you try hard enough :)) So there's "whitespace",
"punctuation" and... what's the third category called? Not
"alphanumeric"... "constituent characters"?

From: Erik Naggum
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <3283057362064279KL2065E@naggum.no>
* Russell Wallace
| Thanks for the explanation - okay, so basically any character _can_
| be part of a symbol... fair enough... my question is really about
| the English terminology, though.

  The terminology is really pretty simple, but you have to look at it
  from the right angle.  In languages that require identifiers to be
  made up of particular characters, there is obviously a name for the
  character set, but in a language that goes out of its way to make it
  possible to use absolutely any character you want, there are only
  names for those characters that need special treatment to become
  part of a symbol name because their "normal" function is not to.

| Whereas if you write...
| 
|  (defun )(')( ...)
| 
| That won't work; (, ) and ' are "punctuation" (?) and normally
| recognized by the reader as special characters.

  Well, they are known as "macro characters".  The important thing is
  that the set of macro characters is not defined by the language, but
  by the readtable in effect when the Common Lisp reader processes
  your source.  There is a standard readtable, however, and one would
  have to say "unescaped terminating macro characters in the standard
  readtable" or another phrasing that tries to hide the obvious anal
  retentiveness to really speak about the characters that will not be
  part of a symbol name unless you have changed the rules.  There is
  nothing particularly special about any of these macro characters.
  There are some restrictions on what the readtable can do and how the
  reader collects characters into symbol names.  If you really insist,
  calling them "constituent characters" will help, but realize that
  this property is a result of falling through every other test --
  unless it is escaped, in which case it wins its constituency right
  away.  (There's an awful pun waiting to happen here, about Iowa, but
  I'll ignore the temptation.)
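
To illustrate that the set of macro characters belongs to the readtable rather than to the language, here is a small sketch. The function name is hypothetical; it works on a copy of the standard readtable so the readtable in effect is left alone.

```lisp
(defun read-with-quote-as-constituent (string)
  "Read STRING with a copy of the standard readtable in which ' is a constituent."
  (let ((rt (copy-readtable nil)))       ; NIL means: copy the standard readtable
    (set-syntax-from-char #\' #\a rt)    ; give ' the syntax type of the letter a
    (let ((*readtable* rt))
      (read-from-string string))))

;; Standard readtable: ' is a terminating macro character, so the token ends at A.
(let ((*readtable* (copy-readtable nil)))
  (read-from-string "a'b"))

;; Modified readtable: the whole string is one token, a symbol named "A'B".
(read-with-quote-as-constituent "a'b")
```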

| (I'm talking about the normal case, not what you can persuade the
| reader, interner or whatever to do if you try hard enough :))

  While this may seem reasonable from the angle you chose to look at
  this problem, it is the a priori reasonability of the position that
  has produced your problem.  It is in fact unreasonable to approach
  Common Lisp from this angle.  The problem does not exist.  This

  (defun |)(')(| ...)

  is in fact fully valid Common Lisp code.  You cannot define away the
  solution to the problem and insist that you still have a problem in
  need of an answer.

| So there's "whitespace", "punctuation" and... what's the third
| category called? Not "alphanumeric"... "constituent characters"?

  I have to zoom out and ask you what you would do with the elusive
  name for this category.  If I guess correctly at your intentions, I
  would perhaps have said that "any character can be part of a symbol
  name, but most macro characters need to be escaped to prevent them
  from having their macro function".  (The important exception is #,
  the only non-terminating macro character in the standard readtable,
  meaning that #xF will be interpreted as a hexadecimal number, but F#x
  is a three-character-long symbol name with a # in it.)
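
The #xF versus F#x distinction can be checked directly. This is a sketch under standard syntax; READ-STANDARD is a helper name made up for the example.

```lisp
(defun read-standard (string)
  "Read one object from STRING with the standard readtable and package."
  (with-standard-io-syntax
    (read-from-string string)))

(read-standard "#xF")                 ; => 15, the dispatching macro #x reads hexadecimal
(symbol-name (read-standard "F#x"))   ; => "F#X", here # is just part of the token

;; The second value of GET-MACRO-CHARACTER is true for a non-terminating
;; macro character and false for a terminating one.
(nth-value 1 (get-macro-character #\# (copy-readtable nil)))
(nth-value 1 (get-macro-character #\( (copy-readtable nil)))
```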

  Unless you have a simple need that can be resolved by a nice, vague
  explanation that only informs your reader that Common Lisp is a lot
  different from languages that require particular characters in the
  names of identifiers/symbols, I think Chapter 23 in the standard, on
  the Common Lisp Reader, would be a really good suggestion right now.

  Yeah, I'm back all right, with undesirably high levels of precision,
  scaring away frail newbies from day one.  Maybe I'll go hibernate.

From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <400527b7.152600479@news.eircom.net>
On 14 Jan 2004 08:22:42 +0000, Erik Naggum <····@naggum.no> wrote:

>  Well, they are known as "macro characters".  The important thing is
>  that the set of macro characters is not defined by the language, but
>  by the readtable in effect when the Common Lisp reader processes
>  your source.  There is a standard readtable, however, and one would
>  have to say "unescaped terminating macro characters in the standard
>  readtable" or another phrasing that tries to hide the obvious anal
>  retentiveness to really speak about the characters that will not be
>  part of a symbol name unless you have changed the rules.

Right, so another way of phrasing my question would be: is there a
shorter term for the noun phrase "unescaped..." above :)

>  While this may seem reasonable from the angle you chose to look at
>  this problem, it is the a priori reasonability of the position that
>  has produced your problem.  It is in fact unreasonable to approach
>  Common Lisp from this angle.  The problem does not exist.

You're right, of course, and if my objective was to understand Common
Lisp, I wouldn't give this issue any more thought - it isn't a problem
in that language.

>  I have to zoom out and ask you what you would do with the elusive
>  name for this category.

What I'm actually doing is designing a new language that's intended to
share Lisp's property of allowing characters like + and - in symbols
(though not the feature of also allowing things like brackets in
symbols if you ask nicely), and I found when thinking about the syntax
I was making heavy use of a concept I didn't have a name for, which
rather bugged me; Lisp is one of the very few languages which allow
non-alphanumeric characters in symbols, so I was wondering if it had a
name for the concept.

It seems the answer is that it doesn't have a name because it doesn't
particularly need the concept... hmm. I think I'll call them "ordinary
characters".

>  Yeah, I'm back all right, with undesirably high levels of precision,
>  scaring away frail newbies from day one.  Maybe I'll go hibernate.

*grin* No, stick around. The newsgroup's more fun with you around.

From: Don Geddis
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <873cainy43.fsf@sidious.geddis.org>
················@eircom.net (Russell Wallace) writes:
> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely)

I think you're still missing the point.  As Erik explained, _all_ characters
are valid in a Lisp symbol name.

You seem to be trying to find the set of characters that don't require
escaping in order to use them in symbol names.  This is really a question about
the Lisp reader.  Basically, things will get turned into symbols if they don't
parse as some other kind of thing.

I think you're mistaken to assume there is some subset of characters in CL that
does what you want.  Otherwise, what do you think of this:

        Lisp> (type-of '123)
        FIXNUM
        Lisp> (type-of '123d0)
        DOUBLE-FLOAT
        Lisp> (type-of 'd1230)
        SYMBOL
        Lisp> (type-of '123j0)
        SYMBOL

If your concern is what you can type to the reader, to result in a symbol,
the answer is not simply a subset of characters.  The syntax of those
characters matters a lot as well.  Are numerals in your set?  By themselves,
without escaping, the reader will turn them into numbers, not symbols.
How about the letter "d", along with some numerals?  Depends where in the
sequence it appears.

All of the sequences above, if escaped, can be the names of symbols.  If not
escaped, then whether they become symbols or not when passed through the
reader is _not_ a simple matter of character subsets; it's a matter of
fallthrough in a series of parse attempts.
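
A short sketch of the escaping point, assuming standard syntax. READ-TYPE is a hypothetical helper that only classifies what the reader produced.

```lisp
(defun read-type (string)
  "Read STRING under standard syntax and return a coarse classification."
  (with-standard-io-syntax
    (let ((object (read-from-string string)))
      (cond ((symbolp object)  'symbol)
            ((integerp object) 'integer)
            ((floatp object)   'float)
            (t (type-of object))))))

(read-type "123")      ; unescaped digits parse as an integer
(read-type "123d0")    ; digits with an exponent marker parse as a float
(read-type "|123|")    ; multiple escape: a symbol whose name is "123"
(read-type "\\123")    ; single escape: also a symbol whose name is "123"
```

Any escape in the token rules out the number interpretation, so the same three characters give a symbol.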

(And yes, I'm sure you can find a sufficiently small subset of characters, such
that any sequence from the subset will parse only as a symbol.  But that set
is much _smaller_ than alphanumeric, whereas you were clearly looking for
a subset of characters larger than that, e.g. including punctuation.)

        -- Don
_______________________________________________________________________________
Don Geddis                  http://don.geddis.org/               ···@geddis.org
Underachievement:  The tallest blade of grass is the first to be cut by the
lawnmower.  -- Despair.com
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4005c07e.191717219@news.eircom.net>
On 14 Jan 2004 11:12:12 -0800, Don Geddis <···@geddis.org> wrote:

>I think you're still missing the point.  As Erik explained, _all_ characters
>are valid in a Lisp symbol name.

No, that's fine, I understand that - my question wasn't about Lisp,
but about English terminology. I gather from Erik's explanation that
the answer is "Lisp doesn't regard any such set as special enough to
merit a short name", though, so I'll just make up one myself,
something like "ordinary characters".

From: Don Geddis
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <878yk9maa2.fsf@sidious.geddis.org>
················@eircom.net (Russell Wallace) writes:
> my question wasn't about Lisp, but about English terminology. I gather from
> Erik's explanation that the answer is "Lisp doesn't regard any such set as
> special enough to merit a short name", though, so I'll just make up one
> myself, something like "ordinary characters".

I think you're still making a conceptual error.  You're all concerned about
the name for this concept, but the problem (in Lisp) is that the concept
itself doesn't exist.

There are CHARACTERs, which for example can be put together into STRINGs.
SYMBOLs have names which are STRINGs, composed of any CHARACTER at all.

There is _no_ (sub)set of CHARACTERs in Lisp which does what you want.
You're searching for the name of a concept, but the concept itself is not
well-formed.  No wonder it doesn't have a name.

(In particular: whether the CL reader interprets a token as a symbol is a
result of a parsing algorithm, not a result of whether the constituent
characters are in your magic subset or not.  If the parser can't interpret
the token as some other data type, then it becomes a symbol.  You're imagining
the wrong algorithm for choosing to make a token into a symbol.)

        -- Don
_______________________________________________________________________________
Don Geddis                  http://don.geddis.org/               ···@geddis.org
From: Thomas F. Burdick
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <xcvisjel3td.fsf@famine.OCF.Berkeley.EDU>
················@eircom.net (Russell Wallace) writes:

> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely)

So you won't be having first-class symbols?  I'd be pretty appalled if
I couldn't give make-symbol any arbitrary string.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4005bfbc.191523371@news.eircom.net>
On 14 Jan 2004 11:37:18 -0800, ···@famine.OCF.Berkeley.EDU (Thomas F.
Burdick) wrote:

>················@eircom.net (Russell Wallace) writes:
>
>> What I'm actually doing is designing a new language that's intended to
>> share Lisp's property of allowing characters like + and - in symbols
>> (though not the feature of also allowing things like brackets in
>> symbols if you ask nicely)
>
>So you won't be having first-class symbols?

Right.

>I'd be pretty appalled if
>I couldn't give make-symbol any arbitrary string.

Well, in Common Lisp you'd probably be right. Arete (provisional name
for my new language) is designed differently - symbols are only used
for lexically scoped name-value mappings; strings do most of the other
things you use symbols for in Lisp. (For example, 'FOO is just
syntactic sugar for "FOO", it's not a symbol.)

From: Lars Brinkhoff
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <85d69m1ywt.fsf@junk.nocrew.org>
················@eircom.net (Russell Wallace) writes:
> I was making heavy use of a concept I didn't have a name for, which
> rather bugged me; Lisp is one of the very few languages which allow
> non-alphanumeric characters in symbols

So does Forth, so perhaps programmers using that language have a name
for it.

-- 
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <40058fce.179251201@news.eircom.net>
On 14 Jan 2004 13:45:54 +0100, Lars Brinkhoff <·········@nocrew.org>
wrote:

>················@eircom.net (Russell Wallace) writes:
>> I was making heavy use of a concept I didn't have a name for, which
>> rather bugged me; Lisp is one of the very few languages which allow
>> non-alphanumeric characters in symbols
>
>So does Forth, so perhaps programmers using that language have a name
>for it.

So it does; good idea. I'll try asking there, thanks.

From: Pascal Costanza
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <bu4h0q$d62$1@newsreader2.netcologne.de>
Russell Wallace wrote:

> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely), and I found when thinking about the syntax
> I was making heavy use of a concept I didn't have a name for, which
> rather bugged me; Lisp is one of the very few languages which allow
> non-alphanumeric characters in symbols, so I was wondering if it had a
> name for the concept.

I don't know any language that has a name for this concept. Instead, you 
will find grammars for most languages, in BNF notation or something 
along these lines, that define what characters are accepted as part of 
identifiers. Section 2.2 in the HyperSpec is pretty close to what other 
languages do in this regard, for example.

When defining a new language, it's probably a good idea to define such a 
grammar at a certain stage anyway, and try to convince yourself that 
it's an LL(1) grammar. Minimizing the lookahead that's needed for 
parsing a program source is likely to improve the programmer's 
understanding of the language.

As a result you will get a single definitive point to refer to when 
someone wants to know what characters are accepted. That's probably 
better than inventing a term for this concept. Later on you can just use 
terms like "identifier" or "symbol", and it's clear from the grammar 
what is meant.

Further note that the idea to include characters like + and - in 
identifiers is IMHO only a good idea in prefix and probably postfix 
languages. In infix languages, it's very likely to be confusing when a+b 
and a + b mean different things. (If your language is not an infix 
language, then just forget this remark. ;)


Pascal

-- 
Tyler: "How's that working out for you?"
Jack: "Great."
Tyler: "Keep it up, then."
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4005c949.193968539@news.eircom.net>
On Wed, 14 Jan 2004 23:49:02 +0100, Pascal Costanza <········@web.de>
wrote:

>When defining a new language, it's probably a good idea to define such a 
>grammar at a certain stage anyway, and try to convince yourself that 
>it's an LL(1) grammar. Minimizing the lookahead that's needed for 
>parsing a program source is likely to improve the programmer's 
>understanding of the language.

*nod-nod* I agree completely. I've the outline of a BNF grammar
sketched in my head, and I'm pretty sure it's LL(1). Simple grammar is
good ^.^

>Further note that the idea to include characters like + and - in 
>identifiers is IMHO only a good idea in prefix and probably postfix 
>languages. In infix languages, it's very likely to be confusing when a+b 
>and a + b mean different things. (If your language is not an infix 
>language, then just forget this remark. ;)

It is an infix language, and I agree that's a downside. I just think
it's very heavily outweighed by the ability to write multiword
identifiers with dashes instead of mixed case.

From: james anderson
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <40052048.2C429E1F@setf.de>
Erik Naggum wrote:
> 
> * Russell Wallace
> | Thanks for the explanation - okay, so basically any character _can_
> | be part of a symbol... fair enough... my question is really about
> | the English terminology, though.
> 
>   ...
> 
> | So there's "whitespace", "punctuation" and... what's the third
> | category called? Not "alphanumeric"... "constituent characters"?
> 
>   I have to zoom out and ask you what you would do with the elusive
>   name for this category.  If I guess correctly at your intentions, I
>   would perhaps have said that "any character can be part of a symbol
>   name, but most macro characters need to be escaped to prevent them
>   from having their macro function".  (The important exception is #,
>   the only non-terminating macro character in the standard readtable,
>   meaning that #xF will be interpreted as a hexadecimal number, but F#x
>   is a three-character-long symbol name with a # in it.)
> 
>   Unless you have a simple need that can be resolved by a nice, vague
>   explanation that only informs your reader that Common Lisp is a lot
>   different from languages that require particular characters in the
>   names of identifiers/symbols, I think Chapter 23 in the standard, on
>   the Common Lisp Reader, would be a really good suggestion right now.
> 

i would have thought that a useful characterization would be "constituent
character in the current readtable, with the constituent traits 'alphabetic'
or 'alphadigit'", as that describes the set of characters which could be read,
without escaping, as part of a symbol name, by means of readtable adjustments
with set-syntax-from-char.

upon experimentation, however, i observe that

? (defun test-constituent-character (code)
  (handler-case
    (read-from-string (concatenate 'string "a" (string (code-char code)) "b"))
    (error (e) e)))
TEST-CONSTITUENT-CHARACTER
? (let ((*rt* (copy-readtable)))
    (dotimes (i char-code-limit)
      (set-syntax-from-char (code-char i)  #\a *rt*))
    (let ((result nil)
          (*readtable* *rt*))
      (dotimes (i char-code-limit)
        (typecase (setf result (test-constituent-character i))
          (symbol )
          (t (format *trace-output* "~%~6,'0d (~c) : *** : ~a"
                     i (code-char i) result))))))

000058 (:) : *** : There is no package named "A" .
NIL
? 

i would have expected the token parser to have signaled errors when reading
from strings which contained those characters for which 2.1.4.2 specifies the
constituent trait 'invalid'.

is this an implementation bug, or have i misunderstood 2.1.4.2?

...
From: Erik Naggum
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <3283091270658632KL2065E@naggum.no>
* james anderson
| upon experimentation, however, i observe that

  Your experiment has only uncovered that it is impossible to override
  the package marker status of colon.  Other than that, you have only
  clobbered the constituent traits of all characters, forcing them the
  same as for #\a.  It is unclear which hypotheses your experiment has
  actually tested.

  This goes to show that : must always be escaped if it is to be part
  of a symbol name, however, further complicating the "name" for the
  set of allowable characters in a symbol.

From: james anderson
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <40058985.FF0A6B21@setf.de>
Erik Naggum wrote:
> 
> * james anderson
> | upon experimentation, however, i observe that
> 
>   Your experiment has only uncovered that it is impossible to override
>   the package marker status of colon.  Other than that, you have only
>   clobbered the constituent traits of all characters, forcing them the
>   same as for #\a.  It is unclear which hypotheses your experiment has
>   actually tested.
> 
the hypothesis was that the constituent traits as set out in the table on
standard and semi-standard characters, which traits are not supposed to be
clobbered by set-syntax-from-char, would be useful to characterise the set of
characters which could be used in symbol names without explicit escaping.

>   This goes to show that : must always be escaped if it is to be part
>   of a symbol name, however, further complicating the "name" for the
>   set of allowable characters in a symbol.

i would have expected the same status as that for #\: to apply to whitespace
characters and to rubout.

...
From: Erik Naggum
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <3283105015922595KL2065E@naggum.no>
* james anderson
| the hypothesis was that the constituent traits as set out in the
| table on standard and semi-standard characters, which traits are not
| supposed to be clobbered by set-syntax-from-char, would be useful to
| characterise the set of characters which could be used in symbol
| names without explicit escaping.

  That does not appear to be an unreasonable hypothesis, but it was
  not the hypothesis you tested.  You tested whether a string of three
  characters, varying the middle one, would be read as a symbol or
  would signal an error.  Any number of middle characters that cause a
  termination of the reader algorithm will produce a symbol read from
  the first character, a letter.

| i would have expected the same status as that for #\: to apply to
| whitespace characters and to rubout.

  But (read-from-string "a b") will return a symbol, namely A, when
  the constituent trait of the space is /invalid/.  You did not test
  the length or any other property of the symbol-name of the returned
  symbol, only that it did not error.  The secondary value returned
  from READ-FROM-STRING should be educational.
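
What the secondary value shows can be seen with the standard readtable, where Space is ordinary whitespace. This sketch uses a helper name invented for the example.

```lisp
(defun read-and-position (string)
  "Return a list of the object read from STRING and the index where reading stopped."
  (with-standard-io-syntax
    (multiple-value-list (read-from-string string))))

(read-and-position "a b")   ; symbol A and index 2: the token ended at the space
(read-and-position "axb")   ; symbol AXB and index 3: the whole string was one token
```

A position smaller than the length of the string means the reader stopped early and the returned symbol covers only part of the input.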

From: james anderson
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4005CB00.6301FFB3@setf.de>
Erik Naggum wrote:
> 
> * james anderson
> | the hypothesis was that the constituent traits as set out in the
> | table on standard and semi-standard characters, which traits are not
> | supposed to be clobbered by set-syntax-from-char, would be useful to
> | characterise the set of characters which could be used in symbol
> | names without explicit escaping.
> 
>   That does not appear to be an unreasonable hypothesis, but it was
>   not the hypothesis [the posted code] tested.  [It tested] whether a string of three
>   characters, varying the middle one, would be read as a symbol or
>   would signal an error.  Any number of middle characters that cause a
>   termination of the reader algorithm will produce a symbol read from
>   the first character, a letter.
> 
> | i would have expected the same status as that for #\: to apply to
> | whitespace characters and to rubout.
> 
>   But (read-from-string "a b") will return a symbol, namely A, when
>   the constituent trait of the space is /invalid/.

i had thought that circumstance was specified to signal an error. there was a
different version, which printed a bit too much to post, which noted and
printed everything - exactly because the result was a surprise, which neither
signalled an error, nor did it demonstrate the length-1-symbol-name behaviour. 

>       [The posted code] did not test
>   the length or any other property of the symbol-name of the returned
>   symbol, only that it did not error.  The secondary value returned
>   from READ-FROM-STRING should be educational.

it was always 3. 

...
From: Erik Naggum
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <3283128022497673KL2065E@naggum.no>
* Erik Naggum
> But (read-from-string "a b") will return a symbol, namely A, when
> the constituent trait of the space is /invalid/.

* james anderson
| i had thought that circumstance was specified to signal an error.

  Hm.  This appears to be unexplored territory.  You deserve credit
  for pointing to the map and the real world and urging me to take a
  closer look at both.

  We have the following situation: A character whose syntax type is
  /constituent/ is used to set the syntax type of a character whose
  previous syntax type was /whitespace/, but this means that the
  constituent trait of that character remains /invalid/, which makes
  the syntax type /invalid/.  According to the specification, such a
  character can never occur in the input except under the control of a
  single escape character, so (read-from-string "a b") should indeed
  signal an error, as per 2.1.4.3.  (In case anyone else wonders, the
  multiple escape mechanism already forces all characters to have the
  alphabetic trait.)
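
  The situation can be probed with a sketch along these lines.  The name
  PROBE-SPACE-AS-CONSTITUENT is hypothetical, and this is not the test
  that was posted; it only reports what a given implementation does.  A
  reading of 2.1.4.3 as above would expect the :SIGNALLED branch, but the
  outcome is implementation-dependent in practice.

```lisp
;; Give #\Space the syntax type of #\a in a scratch readtable, then read.
;; Returns (:READ object position) or (:SIGNALLED condition-type).
(defun probe-space-as-constituent (&optional (text "a b"))
  (let ((*readtable* (copy-readtable nil)))   ; copy of the standard readtable
    ;; TO-CHAR is #\Space, FROM-CHAR is #\a; the from-readtable defaults
    ;; to the standard readtable.
    (set-syntax-from-char #\Space #\a)
    (handler-case
        (multiple-value-bind (object position) (read-from-string text)
          (list :read object position))
      (error (condition)
        (list :signalled (type-of condition))))))
```

  A result such as (:READ |A B| 3) would match the "it was always 3"
  observation; (:READ A 2) would mean the space still acted as whitespace.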

  I thought I caught an obvious oversight in your test, but it would
  have been strong enough to test the hypothesis, were it not for the
  sorry fact that none of the Common Lisp environments I have access
  to signal an error when encountering invalid characters in the input
  stream.

| it was always 3. 

  OK, then this is definitely surprising and in clear violation of the
  standard.  You're right that SET-SYNTAX-FROM-CHAR should not clobber
  the constituent trait for any character, not just the package marker.

  Where is that annoying conformance test guy who stresses the useless
  corners and boundary conditions of the standard when you need him?

From: Paul F. Dietz
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <tZGdncYisbYlHpvd4p2dnA@dls.net>
Erik Naggum wrote:

>   Where is that annoying conformance test guy who stresses the useless
>   corners and boundary conditions of the standard when you need him?

I haven't tested the reader (much) yet, so I don't feel comfortable
offering an opinion on this at this time.

	Paul
	(who is trying to recover from attempting to test section 19)
From: Christophe Rhodes
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <sqd69ld39t.fsf@lambda.dyndns.org>
Erik Naggum <····@naggum.no> writes:

>   Where is that annoying conformance test guy who stresses the useless
>   corners and boundary conditions of the standard when you need him?

Since he may not respond to that description, I'll just say that
Paul's tests are currently in progress up to chapter 21 (Streams), so
it shouldn't be too long before chapter 23 (Reader) is breached.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Thomas A. Russ
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <ymiptdk6eo9.fsf@sevak.isi.edu>
················@eircom.net (Russell Wallace) writes:
>  (defun )(')( ...)

(defun |)(;)| ( ...)

> That won't work; (, ) and ' are "punctuation" (?) and normally
> recognized by the reader as special characters. (I'm talking about the
> normal case, not what you can persuade the reader, interner or
> whatever to do if you try hard enough :))

Of course, surrounding the symbol name with vertical bars might be
considered "trying hard enough" by some people.
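
A small sketch of what the vertical bars buy you. ODD-NAME-DEMO is a
made-up name for illustration; the point is only that the multiple escape
makes every enclosed character part of the symbol name, so the result is an
ordinary symbol that can name a function.

```lisp
;; Inside |...| the parentheses and the semicolon are just name characters.
(defun |)(;)| ()
  :called)

;; Hypothetical demo: the name, a call through the symbol, and a check
;; that INTERN with the same string finds the same symbol.
(defun odd-name-demo ()
  (list (symbol-name '|)(;)|)
        (funcall '|)(;)|)
        (eq '|)(;)| (intern ")(;)" (symbol-package '|)(;)|)))))
```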


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Kenny Tilton
Subject: XML->sexpr ideas [was Re: Name for the set of characters legal in identifiers]
Date: 
Message-ID: <JMFNb.122098$4F2.13538779@twister.nyc.rr.com>
Thomas A. Russ wrote:

> ················@eircom.net (Russell Wallace) writes:
> 
>> (defun )(')( ...)
> 
> 
> (defun |)(;)| ( ...)
> 
> 
>>That won't work; (, ) and ' are "punctuation" (?) and normally
>>recognized by the reader as special characters. (I'm talking about the
>>normal case, not what you can persuade the reader, interner or
>>whatever to do if you try hard enough :))
> 
> 
> Of course, surrounding the symbol name with vertical bars might be
> considered "trying hard enough" by some people.

That's how I might have felt until I found myself with the requirement 
to parse some useful metadata out of an XML dtd. <yechh> Now it seems 
like the most natural thing in the world to code:

    (case tag-id
      (|BeginString| ...)
      (|MsgType| ...)
      (|CheckSum| ...))
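
A minimal sketch of how such a dispatch might be fed from tag strings.
CLASSIFY-TAG and its return values are assumptions made up for this
example, and it uses keywords rather than symbols in the current package so
that the lookup does not depend on *PACKAGE*. FIND-SYMBOL is used instead
of INTERN so that unknown tag names do not create new keywords.

```lisp
;; Hypothetical dispatcher: map a case-sensitive tag name to a category.
(defun classify-tag (tag-name)
  (case (find-symbol tag-name :keyword)   ; NIL when no such keyword exists
    (:|BeginString| :begin-string)
    (:|MsgType| :message-type)
    (:|CheckSum| :checksum)
    (otherwise :unknown)))
```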

Speaking of which, this is related to a Lisp NYC project to create a toy 
exchange with a FIX (Financial Info Exchange) protocol interface. The 
original protocol was a flat "tag=value;"+ format. An XML version was 
developed and a DTD along with it, which I used just to get the metadata.

Now we want to leave the land of XML behind and write out a nice sexpr 
variant of the same metadata as /our/ spec (we do not have to worry 
about matching the real FIX tit for tat since this is a toy exchange not 
meant as a FIX client testbed).

Of course it is easy enough for me to come up with a sexpr format off 
the top of my head, but I seem to recall someone (Erik? Tim? Other?) 
saying they had done some work on a formal approach to an alternative to 
XML/HTML/whatever.

True that? If so, I am all ears.

kt


-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Erik Naggum
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3283503882585065KL2065E_-_@naggum.no>
* Kenny Tilton
| Of course it is easy enough for me to come up with a sexpr format off
| the top of my head, but I seem to recall someone (Erik? Tim? Other?)
| saying they had done some work on a formal approach to an alternative
| to XML/HTML/whatever.
| 
| True that? If so, I am all ears.

  Really?  You are?  Maybe I didn't survive 2003 and this is some Hell
  where people have to do eternal penance, and now I get to do SGML all
  over again.

  Much processing of SGML-like data appears to be stream-like and will
  therefore appear to be equivalent to an in-order traversal of a tree,
  which can therefore be represented with cons cells while the traverser
  maintains its own backward links elsewhere, but this is misleading.

  The amount of work and memory required to maintain the proper backward
  links and to make the right decisions is found in real applications to
  balloon and to cause random hacks; the query languages reflect this
  complexity.  Ease of access to the parent element is crucial to the
  decision-making process, so if one wants to use a simple list to keep
  track of this, the most natural thing is to create a list of the
  element type, the parent, and the contents, such that each element has
  the form (type parent . contents), but this has the annoying property
  that moving from a particular element to the next can only be done by
  remembering the position of the current element in a list, just as one
  cannot move to the next element in a list unless you keep the cons
  cell around.  However, the whole point of this exercise is to be able
  to keep only one pointer around.  So the contents of an element must
  have the form (type parent contents . tail) if it has element contents
  or simply a list of objects, or just the object if simple enough.

  Example: <foo>123</foo> would thus be represented by (foo nil "123"),
  <foo>123</foo><bar>456</bar> by (foo nil "123" bar nil "456"), and
  <zot><foo>123</foo><bar>456</bar></zot> by #1=(zot nil (foo #1# "123"
  bar #1# "456")).
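
  The third example can also be built programmatically, which avoids
  typing the #1= labels.  This is only a sketch: MAKE-ZOT-EXAMPLE and the
  ELEMENT-* accessors are names invented here.  Bind *PRINT-CIRCLE* to T
  before printing the result, since the structure is circular.

```lisp
;; Build #1=(zot nil (foo #1# "123" bar #1# "456")) without reader labels.
(defun make-zot-example ()
  (let ((zot (list 'zot nil nil)))
    ;; The contents slot holds the children; each child points back at ZOT.
    (setf (third zot) (list 'foo zot "123" 'bar zot "456"))
    zot))

;; Hypothetical accessors for the (type parent contents . tail) layout.
(defun element-type (element) (car element))
(defun element-parent (element) (cadr element))
(defun element-contents (element) (caddr element))
(defun element-tail (element) (cdddr element))
```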

  Navigation inside this kind of structure is easy: When the contents in
  CADDR is exhausted, the CDDDR is the next element, or if NIL, we have
  exhausted the contents of the parent and move up to the CADR and look
  for its next element, etc.  All the important edges of the containers
  that make up the *ML document are easily detectable, and the operations
  that are usually found at the edges, normally tied to the element
  type (or as modified by its parents), are easily computable.  However,
  using a list for this is cumbersome, so I cooked up the «quad».  The
  «quad» is devoid of any intrinsic meaning because it is intended to be
  a general data structure, so I looked for the best meaningless names
  for the slots/accessors, and decided on QAR, QBR, QCR, and QDR.  The
  quad points to the element type (like the operator in a sexpr) in the
  QAR, the parent (or back) quad in the QBR, the contents of the element
  in the QCR, and the usual pointer to the next quad in the QDR.
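
  A minimal sketch of such a structure using DEFSTRUCT, under the
  assumption that a plain structure is an acceptable stand-in for a native
  type.  Only the accessor names QAR, QBR, QCR and QDR come from the
  description above; MAKE-QUAD, NEXT-QUAD and MAKE-ZOT-QUADS are invented
  for illustration.

```lisp
;; The conc-name Q plus slots AR, BR, CR, DR yields QAR, QBR, QCR, QDR.
(defstruct (quad (:conc-name q)
                 (:constructor make-quad (ar &optional br cr dr)))
  ar    ; element type
  br    ; parent (back) quad
  cr    ; contents
  dr)   ; next quad

;; Quads are circular through QBR, so print only the element type.
(defmethod print-object ((quad quad) stream)
  (print-unreadable-object (quad stream :type t :identity t)
    (prin1 (qar quad) stream)))

;; Follow QDR if there is one; otherwise climb through QBR and retry.
(defun next-quad (quad)
  (loop for q = quad then (qbr q)
        while q
        when (qdr q) return (qdr q)))

;; <zot><foo>123</foo><bar>456</bar></zot> as three quads.
(defun make-zot-quads ()
  (let* ((zot (make-quad 'zot))
         (bar (make-quad 'bar zot "456"))
         (foo (make-quad 'foo zot "123" bar)))
    (setf (qcr zot) foo)
    zot))
```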

  Since the intent with this model is to «load» SGML/XML/SALT documents
  into memory, one important issue is how to represent long stretches of
  character content or binary content.  The quad can easily be used to
  represent a (sequence of) entity fragments, with the source in QAR,
  the start position in QBR, and the end position in QCR, thereby using
  a minimum of memory for the contents.  Since very large documents are
  intended to be loaded into memory, this property is central to the
  ability to search only selected elements for their contents -- most
  searching processors today parse the entire entity structure and do
  very little to maintain the parsed element structure.
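
  As a rough illustration of the entity-fragment idea, the sketch below
  uses a four-slot simple vector as a stand-in for a quad: slot 0 plays
  the role of QAR (the source), slot 1 QBR (start), slot 2 QCR (end) and
  slot 3 QDR (next fragment).  MAKE-FRAGMENT and FRAGMENT-TEXT are
  hypothetical names, and the source is assumed to be a string already in
  memory.

```lisp
;; A fragment records where the text lives instead of copying it.
(defun make-fragment (source start end &optional next)
  (vector source start end next))

;; Materialize the characters only when they are actually needed.
(defun fragment-text (fragment)
  (subseq (svref fragment 0) (svref fragment 1) (svref fragment 2)))
```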

  Speaking of memory, one simple and efficient way to implement the quad
  on systems that lack the ability to add native types without overhead,
  is to use a two-dimensional array with a second dimension of 4 and let
  quad pointers be integers, which is friendly to garbage collection and
  is unambiguous when the quad is used in the way explained above.
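
  One way the array-backed variant might look, as a sketch only.  The
  names QUAD-STORE, ALLOCATE-QUAD and QUAD-SLOT are assumptions, as is the
  fixed capacity; a quad pointer here is simply the row index.

```lisp
;; CELLS is a CAPACITY x 4 array; columns 0..3 stand for QAR, QBR, QCR, QDR.
(defstruct (quad-store (:constructor %make-quad-store (cells)))
  cells
  (next-free 0))

(defun make-quad-store (capacity)
  (%make-quad-store (make-array (list capacity 4) :initial-element nil)))

;; Fill the next free row and return its index, which serves as the pointer.
(defun allocate-quad (store &optional qar qbr qcr qdr)
  (let ((index (quad-store-next-free store))
        (cells (quad-store-cells store)))
    (assert (< index (array-dimension cells 0)) () "quad store is full")
    (setf (aref cells index 0) qar
          (aref cells index 1) qbr
          (aref cells index 2) qcr
          (aref cells index 3) qdr)
    (incf (quad-store-next-free store))
    index))

(defun quad-slot (store index slot)
  (aref (quad-store-cells store) index slot))

(defun (setf quad-slot) (value store index slot)
  (setf (aref (quad-store-cells store) index slot) value))
```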

  Maybe I'll talk about SALT some other day.

From: Kenny Tilton
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <60VOb.229245$0P1.199252@twister.nyc.rr.com>
Erik Naggum wrote:

> * Kenny Tilton
> | Of course it is easy enough for me to come up with a sexpr format off
> | the top of my head, but I seem to recall someone (Erik? Tim? Other?)
> | saying they had done some work on a formal approach to an alternative
> | to XML/HTML/whatever.
> | 
> | True that? If so, I am all ears.
> 
>   Really?  You are?  Maybe I didn't survive 2003 and this is some Hell
>   where people have to do eternal penance, and now I get to do SGML all
>   over again.

First, thx, <<quad>>s are nice. I was thinking about compiling some 
XML-alternative syntax into internal Lisp structures (which is why I was 
wondering why I even need someone else's proposal, I can just write the 
internal structures out as READable forms).

I see <<quads>> are something that allow one to navigate the structure 
itself, and that this is useful if one does not want to gobble up the 
whole of the structure. I'll keep <<quad>>s in mind if I ever want a 
random-access markup store.

kenny

From: Erik Naggum
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3283560535880752KL2065E@naggum.no>
* Kenny Tilton
| First, thx, <<quad>>s are nice.

  Heh.  My absence from news shows.  Over here in Europe, «» are the
  proper quotation marks, instead of the various versions of " that are
  not in ISO 8859-1.  The « and » are not integral to the name of the
  type, it's just "quad".

| I was thinking about compiling some XML-alternative syntax into
| internal Lisp structures (which is why I was wondering why I even need
| someone else's proposal, I can just write the internal structures out
| as READable forms).

  You always have to consider how much information you want to retain
  from the parsing process.  The sexpr contains just enough information
  for its uses, but the only navigation you ever do with sexprs is to go
  down the CAR or CDR.

| I see <<quads>> are something that allow one to navigate the structure
| itself, and that this is useful if one does not want to gobble up the
| whole of the structure.

  Hm, I think it makes most sense when you do want to gobble up the
  whole of the structure.  The point about storing pointers to entity
  fragments using quads, too, was that contents usually dwarfs the
  markup in volume.  When end-tags make up 25% of the volume of the
  document, however, the start-tags make up another 25%, and when I
  designed the quad and its various implementations in Common Lisp and
  in special languages, the strong desire was to be able to load large
  documents into memory.

| I'll keep <<quad>>s in mind if I ever want a random-access markup
| store.

  That seems like you decided on their utility before trying them out,
  while I have really tried to build a system as useful for XML-like
  data as the cons cell is for Lisp-like data.  Instead of inventing the
  array and regarding cons cells as random access into the list, we just
  use lists made up of cons cells because that affords the navigation we
  need when processing them.  Likewise, the quad affords the navigation
  we need when processing XML-like structures.  When I suggest that an
  implementation that does not provide the ability to add native types
  use a two-dimensional array, it is not because it makes random access
  into the document possible but because it saves a lot of memory.

From: Kenny Tilton
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <ON4Pb.87593$cM1.15644702@twister.nyc.rr.com>
Erik Naggum wrote:

> * Kenny Tilton
> | I was thinking about compiling some XML-alternative syntax into
> | internal Lisp structures (which is why I was wondering why I even need
> | someone else's proposal, I can just write the internal structures out
> | as READable forms).
> 
>   You always have to consider how much information you want to retain
>   from the parsing process.  The sexpr contains just enough information
>   for its uses, but the only navigation you ever do with sexprs is to go
>   down the CAR or CDR.

Maybe I should have said more about what I am doing. I wrote a poor 
man's XML parser just so I could read a DTD just so I could get metadata 
about the required structure of Financial Info Exchange (FIX) protocol 
messages. The funny thing is we do not plan to support FIXML, but the 
DTD for it looked like the best source of metadata about the original 
"tag=value" format.

What we want to do is now leave the world of XML behind and just write 
out the metadata in some nice Lisp-friendly way.

The DTD was nothing more than !ENTITYs, !ELEMENTs, and !ATTLISTs. 
Anyway, I just created hashtables for entities and elements which I 
converted to structs, and the element struct had a slot for attributes, 
etc etc etc.

Now I want to write it all out readably so we can leave XML behind. As 
it is, I had to fill in some gaps by adding to the DTD so the parse 
could produce the Right Thing (this being more fun than the alternative 
of hardcoded additions to the internal structures post-parse).

It's a little fuzzy, but one element would define a record and have a 
content string that listed all the field elements. So at run time I 
dynamically use bits of the same parser to read that string and 
determine the fields (I sensed that I had to leave it to the last second 
to support dynamic redefinition of elements, but perhaps this step could 
also be <<pre-compiled>>) and then look up the fields to determine their 
attributes in turn to assist with parsing of the field data.

> | I'll keep <<quad>>s in mind if I ever want a random-access markup
> | store.
> 
>   That seems like you decided on their utility before trying them out,

Maybe I just misunderstood. If quads just give me a link to the parent, 
well, in the case of the DTD, all the entities, elements, and attributes 
had the same parent, the XML dtd document. So I imagined an awful lot of 
serial searching, repeated over and over again for the same message 
type, and yes, I made a gut determination that I could use the names of 
things as keys to a hash table and turn a record expansion into so many 
keyed lookups.

Well, maybe I am all wet. If performance is my concern, I need only 
memoize things like record expansions, something I should do anyway even 
with the keyed lookups. Memoization will internally involve its own hash 
tables, but at least they are hidden behind a functional interface, 
which would be nice.
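
A minimal sketch of that kind of memoization, hiding the hash table behind
a closure. MEMOIZE here is a hand-rolled helper written for this example,
not a standard function, and it assumes the memoized function takes a
single argument that works as an EQUAL hash key (a record name, say).

```lisp
;; Wrap FN so that each distinct KEY is computed only once.
(defun memoize (fn &key (test 'equal))
  (let ((cache (make-hash-table :test test)))
    (lambda (key)
      (multiple-value-bind (value foundp) (gethash key cache)
        (if foundp
            value
            (setf (gethash key cache) (funcall fn key)))))))
```

A record expansion would then be wrapped once, for example
(memoize #'expand-record) with EXPAND-RECORD standing for whatever
function does the expensive lookup, and called through FUNCALL.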

> However, the whole point of this exercise is to be able
>   to keep only one pointer around.  So the contents of an element must
>   have the form (type parent contents . tail) if it has element contents
>   or simply a list of objects, or just the object if simple enough.
> 
>   Example: <foo>123</foo> would thus be represented by (foo nil "123"),
>   <foo>123</foo><bar>456</bar> by (foo nil "123" bar nil "456"), and
>   <zot><foo>123</foo><bar>456</bar></zot> by #1=(zot nil (foo #1# "123"
>   bar #1# "456")).

Do we need each child to refer to its parent? Why not a format with the 
parent first and then one or more children understood to share the same 
parent?

    #1=(nil zot (#1# foo "123" bar "456"))

?

kt


From: Marco Antoniotti
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <v8fPb.456$Nq.112066@typhoon.nyu.edu>
Kenny Tilton wrote:
> 
> 
> Erik Naggum wrote:
> 
>> * Kenny Tilton
>> | I was thinking about compiling some XML-alternative syntax into
>> | internal Lisp structures (which is why I was wondering why I even need
>> | someone else's proposal, I can just write the internal structures out
>> | as READable forms).
>>
>>   You always have to consider how much information you want to retain
>>   from the parsing process.  The sexpr contains just enough information
>>   for its uses, but the only navigation you ever do with sexprs is to go
>>   down the CAR or CDR.
> 
> 
> Maybe I should have said more about what I am doing. I wrote a poor 
> man's XML parser just so I could read a DTD just so I could get metadata 
> about the required structure of Financial Info Exchange (FIX) protocol 
> messages. The funny thing is we do not plan to support FIXML, but the 
> DTD for it looked like the best source of metadata about the original 
> "tag=value" format.

What's wrong with CL-XML?


Cheers
--
Marco
From: Kenny Tilton
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3rfPb.88620$cM1.15783932@twister.nyc.rr.com>
Marco Antoniotti wrote:
> 
> 
> Kenny Tilton wrote:
> 
>>
>>
>> Erik Naggum wrote:
>>
>>> * Kenny Tilton
>>> | I was thinking about compiling some XML-alternative syntax into
>>> | internal Lisp structures (which is why I was wondering why I even need
>>> | someone else's proposal, I can just write the internal structures out
>>> | as READable forms).
>>>
>>>   You always have to consider how much information you want to retain
>>>   from the parsing process.  The sexpr contains just enough information
>>>   for its uses, but the only navigation you ever do with sexprs is to go
>>>   down the CAR or CDR.
>>
>>
>>
>> Maybe I should have said more about what I am doing. I wrote a poor 
>> man's XML parser just so I could read a DTD just so I could get 
>> metadata about the required structure of Financial Info Exchange (FIX) 
>> protocol messages. The funny thing is we do not plan to support FIXML, 
>> but the DTD for it looked like the best source of metadata about the 
>> original "tag=value" format.
> 
> 
> What's wrong with CL-XML?

I couldn't understand the installation instructions.

And the doc said it pulled things into CLOS instances, and then I would 
use XQuery/XPath/XCrap to get at the info. And I did not really want to 
do XML (in which case getting all fancy like that might make sense), I 
just wanted to suck some info out of a DTD.

In fact, someone on the team already said I screwed up, I should have 
parsed an HTML file for the same info, which would be more accurate in 
certain dark corners where the XML orientation of the DTD diminishes the 
correspondence to the non-XML syntax.

And this is Lisp, I wrote my crappy hard-coded parser in less time than 
it would have taken to figure out how to install cl-xml. And about 100 
lines of code so no one on our team has to bother with cl-xml.

And now it's mine! All mine!!

:)

kt


From: Edi Weitz
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <m38yk2nyrl.fsf@bird.agharta.de>
On Tue, 20 Jan 2004 19:36:31 GMT, Kenny Tilton <·······@nyc.rr.com> wrote:

> Marco Antoniotti wrote:
>>
>> What's wrong with CL-XML?
>
> I couldn't understand the installation instructions.

So I'm not the only one... :)

But you know that there are some more "lightweight" solutions out
there? You could have said

  (asdf-install:install :pxmlutils)

or

  (asdf-install:install :xmls)

and - voilà! (At least I hope so...)

CLiki is your friend.

Edi.
From: Kenny Tilton
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <ZkjPb.88700$cM1.15857356@twister.nyc.rr.com>
Edi Weitz wrote:
> On Tue, 20 Jan 2004 19:36:31 GMT, Kenny Tilton <·······@nyc.rr.com> wrote:
> 
> 
>>Marco Antoniotti wrote:
>>
>>>What's wrong with CL-XML?
>>
>>I couldn't understand the installation instructions.
> 
> 
> So I'm not the only one... :)
> 
> But you know that there are some more "lightweight" solutions out
> there? 

Aw, c'mon, I can write a hard-coded parser in my sleep. Besides, now I 
can put XML on the resume. I'll just have to pretend I wrote it in a 
real language.

:)

kt

From: james anderson
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <400DB7CF.299789DB@setf.de>
Edi Weitz wrote:
> 
> On Tue, 20 Jan 2004 19:36:31 GMT, Kenny Tilton <·······@nyc.rr.com> wrote:
> 
> > Marco Antoniotti wrote:
> >>
> >> What's wrong with CL-XML?
> >
> > I couldn't understand the installation instructions.
> 
> So I'm not the only one... :)

the mind boggles.

the distribution unpacks to a directory at the top level of which is a
collection of files with names in the form

  load{+,-}cl-http{+,-}instanceNames.lisp

and a file

  load.lisp

which is a symbolic link to

  load-cl-http+instanceNames.lisp

that is to say, if one is using one of the supported lisp implementations and
one types

(load #p"<pathname to the load.lisp file>")<return>

at the repl prompt, one compiles and loads the parser in an environment
without cl-http and in a mode which implements names as instances (as opposed
to symbols).

which aspect of this prospective process does one find difficult to understand?

> 
> But you know that there are some more "lightweight" solutions out
> there? You could have said
> 
>   (asdf-install:install :pxmlutils)
> 
> or
> 
>   (asdf-install:install :xmls)
> 
> and - voilà! (At least I hope so...)

should there be any cause for uncertainty as to how to proceed, the release
includes a directory

[tschichold:XML-0-949-20030409T2320-MACOS/tests/implementation]
janson% ls -l
total 0
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 acl-5-0-1
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 cmucl-18e+
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 lispworks-4-2
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 lispworks-4-3
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 mcl-5-0b
drwxr-xr-x  3 janson  admin  102 Apr  9  2003 openmcl-0-13-3
[tschichold:XML-0-949-20030409T2320-MACOS/tests/implementation]
janson% 

which contains transcripts of the load process and the results of the oasis
conformance tests in the respective implementations.

...
From: Edi Weitz
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <m3r7xuma0e.fsf@bird.agharta.de>
On Wed, 21 Jan 2004 00:20:48 +0100, james anderson <··············@setf.de> wrote:

> the mind boggles.
>
> the distribution unpacks to a directory at the top level of which is
> a collection of files with names in the form
>
>   load{+,-}cl-http{+,-}instanceNames.lisp
>
> and a file
>
>   load.lisp
>
> which is a symbolic link to
>
>   load-cl-http+instanceNames.lisp
>
> that is to say, if one is using one of the supported lisp
> implementations and one types
>
> (load #p"<pathname to the load.lisp file>")<return>
>
> at the repl prompt, one compiles and loads the parser in an
> environment without cl-http and in a mode which implements names as
> instances (as opposed to symbols).
>
> which aspect of this prospective process does one find difficult to
> understand?

What you describe here is rather easy to understand. I suggest you add
it to the webpage

  <http://pws.prserv.net/James.Anderson/XML/documentation/howto/load.html>.

I'm not 100% sure but I think I remember that the last time I checked
I was supposed to install CL-HTTP before I could compile
CL-XML. That's a bit harder than just (LOAD "load.lisp").

Also, if I unpack the file XML-0-949-20030409.tgz on my machine the
"documentation" directory is empty except for three GIFs - no "README"
or "INSTALL" file. The file "load.lisp" is just one of five
"load*.lisp" files. The fact that it once was a symbolic link
obviously got lost in the tarball.

Edi.
From: james anderson
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <400DCFE1.7CA8A5FB@setf.de>
Edi Weitz wrote:
> 
> On Wed, 21 Jan 2004 00:20:48 +0100, james anderson <··············@setf.de> wrote:
> 
> > ...
> >
> > which aspect of this prospective process does one find difficult to
> > understand?
> 
> What you describe here is rather easy to understand. I suggest you add
> it to the webpage
> 
>   <http://pws.prserv.net/James.Anderson/XML/documentation/howto/load.html>.
> 

ok.

the irony of which is, that page was composed in response to an assertion, from
someone who had found the various load*.lisp files, that, once he had looked
at them, it was not clear how to load and use the xml-path library rather than
just the parser.

> I'm not 100% sure but I think I remember that the last time I checked
> I was supposed to install CL-HTTP before I could compile
> CL-XML. That's a bit harder than just (LOAD "load.lisp").

it would be helpful to hear what might have led you to that supposition. so
far as i can ascertain (i have code going back 4 years only) that has not been
the case for a long time. perhaps there's a note somewhere in the
documentation which is misleading?

> 
> Also, if I unpack the file XML-0-949-20030409.tgz on my machine the
> "documentation" directory is empty except for three GIFs - no "README"
> or "INSTALL" file. The file "load.lisp" is just one of five
> "load*.lisp" files. The fact that it once was a symbolic link
> obviously got lost in the tarball.

hmm. it would appear that i have to be more selective as to which version of
tar i use in the future. thanks for the hint.

...
From: Edi Weitz
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <m3ektum7fu.fsf@bird.agharta.de>
On Wed, 21 Jan 2004 02:03:33 +0100, james anderson <··············@setf.de> wrote:

> Edi Weitz wrote:
>> 
>> I'm not 100% sure but I think I remember that the last time I
>> checked I was supposed to install CL-HTTP before I could compile
>> CL-XML. That's a bit harder than just (LOAD "load.lisp").
>
> it would be helpful to hear what might have led you to that
> supposition. so far as i can ascertain (i have code going back 4
> years only) that has not been the case for a long time. perhaps
> there's a note somewhere in the documentation which is misleading?

Hmm, it definitely wasn't four years ago - I hardly knew about CL at
that time. It must have been around 2001/2002 and I didn't really try
hard to get CL-XML installed. (I didn't need it, I was just browsing.)
I read the docs and said to myself "That's too much of a hassle. Let's
try again later." (Mind you, that was when I was still fighting with
things like MK:DEFSYSTEM and ASDF. I have learned a thing or two
since.)

If CL-HTTP wasn't required then maybe the preferred system definition
utility (or the one described in the docs) was from CL-HTTP? I can't
remember but I know that I left with the impression that I had to be
somewhat familiar with CL-HTTP (which I wasn't) in order to use
CL-XML. It's good to know that this isn't the case.

Cheers,
Edi.
From: a
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <nIzPb.56333$vn.147205@sea-read.news.verio.net>
But one does have to use an "experimental" version of CMUCL, if one uses
CMUCL. It's documented on the CL-XML website but I certainly overlooked it
the first time I tried to get CL-XML working. It's a CLOS issue, IIRC.

"Edi Weitz" <···@agharta.de> wrote in message
···················@bird.agharta.de...
...
>I know that I left with the impression that I had to be
> somewhat familiar with CL-HTTP (which I wasn't) in order to use
> CL-XML. It's good to know that this isn't the case.
From: Raymond Toy
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <4nd69dccwy.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "a" == a  <···@def.gh> writes:

    a> But one does have to use an "experimental" version of CMUCL, if one uses
    a> CMUCL. It's documented on the CL-XML website but I certainly overlooked it

What does "experimental" mean?  AFAIK, experimental versions were from
years ago.  There are, however, some monthly snapshots available[1], with
some other random CVS snapshots.  And a release is coming Real Soon
Now too.


Ray


Footnotes: 
[1]  Sort of.  cons.org is mostly down right now.  But should be back
real soon now.
From: a
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <zXBPb.56340$vn.147494@sea-read.news.verio.net>
See http://cl-xml.org and under Availability | Releases follow the "A
separate {{document}}" link to the CL-XML Releases page. The cmucl entry is
marked "yes[1]" under the "wo/CL-HTTP w/ name symbols" column. Footnote [1]
says "the experimental CLOS/MOP is required. tests were done with the
{{i686-linux}} version." The i686-linux link points to
http://cvs2.cons.org/ftp-area/cmucl/experimental/pcl/cmucl-2003-03-28--17-20-37-i686-Linux.tar.gz,
if I spelled that correctly. CL-XML did not compile
with any other version of CMUCL that I had tried but it worked fine with
that one.



"Raymond Toy" <···@rtp.ericsson.se> wrote in message
···················@edgedsp4.rtp.ericsson.se...
> >>>>> "a" == a  <···@def.gh> writes:
>
>     a> But one does have to use an "experimental" version of CMUCL, if one uses
>     a> CMUCL. It's documented on the CL-XML website but I certainly overlooked it
>
> What does "experimental" mean?  AFAIK, experimental versions were from
> years ago.  There are, however, some monthly snapshots available[1], with
> some other random CVS snapshots.  And a release is coming Real Soon
> Now too.
>
>
> Ray
>
>
> Footnotes:
> [1]  Sort of.  cons.org is mostly down right now.  But should be back
> real soon now.
>
From: Raymond Toy
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <4nvfn5at7y.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "a" == a  <···@def.gh> writes:

    a> marked "yes[1]" under the "wo/CL-HTTP w/ name symbols" column. Footnote [1]
    a> says "the experimental CLOS/MOP is required. tests were done with the

Ah, ok.  That's Gerd's PCL stuff.  Yeah, it was experimental, but it's
not anymore.  It will go out in the next release, and has been
the default for quite a while.

Ray
From: james anderson
Subject: CL-XML [ Re: XML->sexpr ideas
Date: 
Message-ID: <400F15B9.1FF939F3@setf.de>
there were a number of things which the experimental pcl did better than the
then official release at the time i was porting, so i've been waiting for some
indication that it had been folded into a stable release before rechecking
compatibility and updating any documentation.

if anyone has built it with more recent cvs snapshots, please let me know, so
that i can update the notes. otherwise i'll just keep watching the releases.

Raymond Toy wrote:
> 
> >>>>> "a" == a  <···@def.gh> writes:
> 
>     a> marked "yes[1]" under the "wo/CL-HTTP w/ name symbols" column. Footnote [1]
>     a> says "the experimental CLOS/MOP is required. tests were done with the
> 
> Ah, ok.  That's Gerd's PCL stuff.  Yeah, it was experimental, but it's
> not anymore.  It will go out in the next release, and has been
> the default for quite a while.
> 
> Ray

...
From: Erik Naggum
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3283657498520793KL2065E@naggum.no>
* Kenny Tilton
| Maybe I just misunderstood.  If quads just give me a link to the
| parent, well, in the case of the DTD, all the entities, elements, and
| attributes had the same parent, the XML dtd document.  So I imagined
| an awful lot of serial searching, repeated over and over again for the
| same message type, and yes, I made a gut determination that I could
| use the names of things as keys to a hash table and turn a record
| expansion into so many keyed lookups.

  You assume way too much.  I lack the information to unwind your many
  assumptions, but you may have noticed that I wrote that QAR would
  point to the element type, like the operator in the CAR of a sexpr.
  This is obviously a symbol-like structure.  For some reason, you have
  read what I wrote to refer to the prolog of an SGML/XML document,
  while I talked about the document instance.  I have written elsewhere
  that the very concept of a DTD was a huge mistake, so I really wish
  you had asked me instead of running with your assumptions.

  Just as Common Lisp is defined on objects in a tree structure, but
  still manages to have clearly defined semantics, I had hoped it would
  be rather obvious that I intend the same to hold true for the SGML
  tree.  Defining element types and processors on them is clearly part
  of the whole approach, and just as Common Lisp systems do not search
  source files linearly for definitions of operators, this part of the
  language is not restricted to being represented with quads.

  But I don't know where to begin to explain things to you so you don't
  assume things without asking.  It is very difficult to predict what
  someone who guesses a lot will need to invalidate an assumption.

| Do we need each child to refer to its parent?

  Yes.

| Why not a format with the parent first and then one or more children
| understood to share the same parent?

  That would require more pointers to be kept around in a stack-like
  structure when traversing the document, while an explicit design goal
  of my approach is to move all this information into the tree.
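
A minimal sketch of the idea that parent and next-sibling links stored in each node let a traversal proceed without an external stack. The structure NODE, its slot names, and the helpers ADD-CHILD, NEXT-IN-DOCUMENT-ORDER and DOCUMENT-ORDER are all hypothetical names chosen for illustration; this is not the actual quad layout discussed in the thread, only an assumption-laden approximation of it.

```lisp
;; Hypothetical node layout: every node records its type, its parent,
;; its first child and its next sibling.  Slot names are illustrative.
(defstruct (node (:constructor %make-node))
  type parent first-child next-sibling)

;; Nodes point at their parents, so keep printing from following the cycle.
(defmethod print-object ((node node) stream)
  (print-unreadable-object (node stream :type t)
    (prin1 (node-type node) stream)))

(defun add-child (parent type)
  "Append a new node of TYPE as the last child of PARENT and return it."
  (let ((child (%make-node :type type :parent parent)))
    (if (null (node-first-child parent))
        (setf (node-first-child parent) child)
        (loop for last = (node-first-child parent) then (node-next-sibling last)
              while (node-next-sibling last)
              finally (setf (node-next-sibling last) child)))
    child))

(defun next-in-document-order (node)
  "Return the node after NODE in preorder, using only links stored in the tree."
  (or (node-first-child node)
      ;; No children: climb until some ancestor (or NODE itself) has a sibling.
      (loop for n = node then (node-parent n)
            while n
            when (node-next-sibling n) return it)))

(defun document-order (root)
  "Collect the types of all nodes under ROOT in preorder, without a stack."
  (loop for n = root then (next-in-document-order n)
        while n
        collect (node-type n)))
```

With a root of type :DOC, a child :A that itself has a child :B, and a second child :C, DOCUMENT-ORDER visits :DOC, :A, :B, :C while keeping only the current node in hand.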

-- 
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Kenny Tilton
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <ZyLPb.243803$0P1.168275@twister.nyc.rr.com>
Erik Naggum wrote:
> * Kenny Tilton
> | Maybe I just misunderstood.  If quads just give me a link to the
> | parent, well, in the case of the DTD, all the entities, elements, and
> | attributes had the same parent, the XML dtd document.  So I imagined
> | an awful lot of serial searching, repeated over and over again for the
> | same message type, and yes, I made a gut determination that I could
> | use the names of things as keys to a hash table and turn a record
> | expansion into so many keyed lookups.
> 
>   You assume way too much.  I lack the information to unwind your many
>   assumptions, but you may have noticed that I wrote that QAR would
>   point to the element type, like the operator in the CAR of a sexpr.
>   This is obviously a symbol-like structure.  For some reason, you have
>   read what I wrote to refer to the prolog of an SGML/XML document,
>   while I talked about the document instance.

No, I figured out you must be talking about doc instances. That is why I 
confessed to parsing a DTD. Anyway, I think I follow. The only reason I 
thought mad serial searching was involved was the parent pointer, but 
hey, all my tree nodes know their parents so I certainly see the value 
in that.

>   I have written elsewhere
>   that the very concept of a DTD was a huge mistake,

It seems the world agrees. The DTD is dead, long live the Schema.

>   But I don't know where to begin to explain things to you so you don't
>   assume things without asking.

This reminds me of my attempt to teach someone rollerblading, who 
persisted in pitching himself headlong and often spinning thru the air 
and onto the asphalt. After a few tries at talking the lad down I 
realized it was just his learning style and let him be.

Anyway, as I said, I have always been a parent-aware node designer 
myself, so it is fun seeing that elevated to the status of a car or cdr.

Not that it matters, but why isn't the parent the first slot? As for the 
tail being dotted, bold stroke that. I've always felt bad writing code 
to ask my parent who is next after me just to get to my next sibling.

kt
From: Erik Naggum
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3283757471329193KL2065E@naggum.no>
* Kenny Tilton
| The only reason I thought mad serial searching was involved was the
| parent pointer, but hey, all my tree nodes know their parents so I
| certainly see the value in that.

  OK, but the point with my approach is to "load" a document into memory
  and then work on and navigate around in the in-memory representation
  instead of the edge-detection scheme that is used in the most popular
  tools.  That is, a DOM without any of the insanity.

| It seems the world agrees. The DTD is dead, long live the Schema.

  I am not pleased with this development, either, FWIW.

| Not that it matters, but why isn't the parent the first slot?

  Because the QAR is the "operator".  In the case of an entity fragment,
  the QAR is the source, and the meaning of the QBR is different, but if
  the two-dimensional vector with indexes instead of pointers is used,
  both a parent and a start position in a source would be a number.

| As for the tail being dotted, bold stroke that.  I've always felt bad
| writing code to ask my parent who is next after me just to get to my
| next sibling.

  Precisely.  That is so wrong.

From: Björn Lindberg
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <hcsr7xug1fy.fsf@knatte.nada.kth.se>
Erik Naggum <····@naggum.no> writes:

> * Kenny Tilton
> | First, thx, <<quad>>s are nice.
> 
>   Heh.  My absence from news shows.  Over here in Europe, «» are the
>   proper quotation marks, instead of the various versions of " that are
>   not in ISO 8859-1.

What do you mean? The proper quotation marks for Swedish are
"ninety-nine ninety-nine", while in e.g. the UK "sixty-six ninety-nine"
is used. "Gåsögon", the marks you used, can also be used in Swedish but
are then often used »like this». In Norway they are pointed outwards,
like you did, but in Denmark they are »pointed inwards« instead[1].

Or did you just mean to say that since ISO 8859-1 is lacking the
proper "-style quotation marks, it is better to use «»? Because I
believe the quote marks to be used are dictated by the language the
text is written in, so that English text should be written using
English quotation marks, Swedish text using Swedish quotation marks,
etc.

[1] (in Swedish)
  http://susning.nu/Citat
  http://susning.nu/G%e5s%f6gon


Björn
From: Erik Naggum
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <3283641017896220KL2065E@naggum.no>
* Björn Lindberg
| What do you mean?

  That my habits have changed.

From: Joe Marshall
Subject: Re: XML->sexpr ideas
Date: 
Message-ID: <4quqifde.fsf@ccs.neu.edu>
Erik Naggum <····@naggum.no> writes:

>   Maybe I didn't survive 2003 and this is some Hell where people
>   have to do eternal penance...

It's worse than that, this is comp.lang.lisp
From: Russell Wallace
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <4007e883.333055215@news.eircom.net>
On 15 Jan 2004 14:16:22 -0800, ···@sevak.isi.edu (Thomas A. Russ)
wrote:

>Of course, surrounding the symbol name with vertical bars might be
>considered "trying hard enough" by some people.

The fact that it works proves it's enough, n'est-ce pas? ^.~

From: Barry Margolin
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <barmar-46176C.13332416012004@netnews.comcast.net>
In article <··················@news.eircom.net>,
 ················@eircom.net (Russell Wallace) wrote:

>  (defun )(')( ...)
> 
> That won't work; (, ) and ' are "punctuation" (?) and normally
> recognized by the reader as special characters. (I'm talking about the
> normal case, not what you can persuade the reader, interner or
> whatever to do if you try hard enough :)) So there's "whitespace",
> "punctuation" and... what's the third category called? Not
> "alphanumeric"... "constituent characters"?

Yes, that's the phrase used in the specification.

Note, however, that a token consisting only of constituent characters is 
*not* necessarily going to be parsed as a symbol.  Both numbers and 
symbols are made up only of constituent characters (unless you make use 
of radix prefixes like #o and #b).  Thus, 123e.456 is a symbol, 123.456 
and 123e456 are floats, and 123e is a symbol or integer depending on the 
value of *READ-BASE*.
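
One rough way to check this at a REPL is to read each token and look at the type of the resulting object. The helper TOKEN-KIND below is a hypothetical name introduced only for this sketch. It uses 123e4 in place of 123e456, since the latter has float syntax but is too large for the usual float formats to represent. Tokens such as 123e.456 and 123e (in base 10) look like numbers without actually being numbers; the sketch assumes an implementation that reads such tokens as symbols, which is common but not something to rely on everywhere.

```lisp
(defun token-kind (text &optional (base 10))
  "Read TEXT as a single token with *READ-BASE* bound to BASE and
report what kind of object the reader produced."
  (with-standard-io-syntax
    (let* ((*read-base* base)
           (object (read-from-string text)))
      (typecase object
        (symbol :symbol)
        (integer :integer)
        (float :float)
        (t :other)))))

(token-kind "123.456")   ; a float
(token-kind "123e4")     ; a float, E being the exponent marker
(token-kind "123e" 16)   ; an integer, since E is a digit in base 16
(token-kind "123e")      ; a symbol where such tokens are read as symbols
(token-kind "123e.456")  ; likewise
```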

There are tables in the ANSI spec and CLTL that list all the standard 
character types and constituent character attributes.  The character 
types are whitespace, terminating macro, non-terminating macro, single 
escape, multiple escape, and constituent (the text also mentions 
"illegal" characters, although no standard characters are of this type).

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
From: Joe Marshall
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <u12yqrn7.fsf@ccs.neu.edu>
Erik Naggum <····@naggum.no> writes:

[snip]

Welcome back!
From: Marc Spitzer
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <86isjew4gm.fsf@bogomips.optonline.net>
Erik Naggum <····@naggum.no> writes:

Glad to see you here again,

marc
From: Kent M Pitman
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <sfwisjd5mm6.fsf@shell01.TheWorld.com>
Erik Naggum <····@naggum.no> writes:

(Hi, Erik!  A pleasure to see you here.)

> * Russell Wallace
> | A trivial little question, but one that's been bugging me: Is there
> | a name for that set of characters legal in Lisp identifiers?  For
> | most languages this would be "alphanumeric" (perhaps with a footnote
> | that _ is regarded as a letter in this context), but Lisp includes
> | characters like + and - that most languages regard as punctuation.
> 
>   The type STANDARD-CHAR covers the set of characters from which all
>   symbols in the standard packages are made.  This simple fact may
>   give rise to the invalid assumption that there must be a particular
>   character set from which all symbols must be made.

Although, in contrast, if you're trying to write code to share around,
it's a good conservative set.  In the same sense as it's conservative
to write your programs in English.

I experimented with a multi-user, multi-lingual system (not Lisp-based)
for a while, and we eventually concluded that multilingualism is cool but
is best left to the interface.  At the programming level, the ability to
have one name for one function is important to being able to search for
and update callers.  Making multiple names (for each language) for a 
function both impedes search and makes programs look dumb.  Making each
package impose its own language choice makes it hard to read programs and
sometimes raises argument order/naming issues.  And so, in the end, if
you retreat to some language to program in as a common language, English
once again rears its ugly chauvinistic self as the obvious alternative.
And with it, the standard characters are a nice safe set to build out of,
since there's no real reason to invite portability problems when you're 
already within striking distance of easy portability.

The more power you get, the more the burden is on you to use it
wisely.  Just because you can do something doesn't mean you should...

>   However, the functions INTERN and MAKE-SYMBOL take a STRING as the
>   name of the symbol to be created, and there is no restriction on
>   this /string/ to be of type BASE-STRING.  Likewise, the value of
>   SYMBOL-NAME is only specified to be of type STRING, with no mention
>   of the common observation that it may be a SIMPLE-STRING regardless
>   of whether the corresponding argument to INTERN or MAKE-SYMBOL was.

Yeah, I think this last is left to implementations.  I don't think there 
is any really good reason to require it to be a simple string.  An 
implementation might want to experiment with non-simple strings in ways
the designers didn't anticipate.

>   I am particularly fond of using the non-breaking space in symbol
>   names, just as I use it in filenames under operating systems that
>   believe that ordinary spaces are separators regardless of how much
>   effort one puts into convincing its various programs otherwise.  I
>   know people who think there ought to be laws against this practice,
>   but sadly, the Common Lisp standard does not come to their aid.

Erik, I have missed your singular ability to make me mad and make me smile
at the same time.  I wish I could decide whether I think this practice is
clever and forward thinking or just an irritating loophole.  But either way,
the problem exists, and you're just highlighting it.
From: rydis (Martin Rydstr|m) @CD.Chalmers.SE
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <w4cy8sb5dr1.fsf@basil.cd.chalmers.se>
················@eircom.net (Russell Wallace) writes:

> A trivial little question, but one that's been bugging me: Is there a
> name for that set of characters legal in Lisp identifiers? For most
> languages this would be "alphanumeric" (perhaps with a footnote that _
> is regarded as a letter in this context), but Lisp includes characters
> like + and - that most languages regard as punctuation.

I think "constituent character" is quite close, if not "it".

Regards,

'mr

-- 
[Emacs] is written in Lisp, which is the only computer language that is
beautiful.  -- Neal Stephenson, _In the Beginning was the Command Line_
From: Kent M Pitman
Subject: Re: Name for the set of characters legal in identifiers
Date: 
Message-ID: <sfwn08p5na5.fsf@shell01.TheWorld.com>
················@eircom.net (Russell Wallace) writes:

> A trivial little question, but one that's been bugging me: Is there a
> name for that set of characters legal in Lisp identifiers?

A character.

I think you don't mean what you wrote.

A Lisp identifier is a symbol, not a piece of text.  Some code is 
constructed entirely from programs and never even goes through the
text phase and has no such thing.

All characters are, in principle, allowed in a symbol.  You have to
use \x or |xxx| escaping to get some in.
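
A small sketch of both escaping styles, plus MAKE-SYMBOL, which bypasses the reader altogether. READ-SYMBOL-NAME is a hypothetical helper name used only here, and the sketch assumes the standard readtable, which upcases unescaped characters.

```lisp
(defun read-symbol-name (text)
  "Read TEXT as one token under standard syntax and return the symbol's name."
  (with-standard-io-syntax
    (symbol-name (read-from-string text))))

;; Multiple escape: everything between the bars is taken literally,
;; including the space and the lower case.
(read-symbol-name "|hello world|")   ; => "hello world"

;; Single escape: the backslash protects only the character after it,
;; so the parenthesis is kept while A and B are still upcased.
(read-symbol-name "a\\(b")           ; => "A(B"

;; No reader involved at all: any string can name a symbol.
(symbol-name (make-symbol ")(')("))  ; => ")(')("
```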

If your question is about symbols, rather than about identifiers, 
that's a legit thing to ask, but is a completely different matter.
Not all symbols are identifiers, though.

> For most languages this would be "alphanumeric" (perhaps with a
> footnote that _ is regarded as a letter in this context), but Lisp
> includes characters like + and - that most languages regard as
> punctuation.

Most languages are parsed from text to program, with no intermediate
phase.  In Lisp, text (if there was any) has been parsed prior to the
time that expressions start to become considered as programs.  Lisp 
programs are not made out of characters, they are made out of structured
(i.e., already extant and composed) objects  (conses, symbols, numbers,
etc.).
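
As a small illustration of code that never passes through a text phase, the sketch below builds a DEFUN form directly out of conses, symbols and a number, then evaluates it. MAKE-ADDER-FORM and the name "add five!" are hypothetical, chosen only to show that the function's name need not be something one would normally type.

```lisp
(defun make-adder-form (name amount)
  "Build a DEFUN form as list structure.  NAME is any string; it becomes
the name of a symbol interned in the current package."
  (let ((function-name (intern name))
        (argument (make-symbol "X")))
    (list 'defun function-name (list argument)
          (list '+ argument amount))))

;; The form is constructed and evaluated without ever being printed or read.
(eval (make-adder-form "add five!" 5))

(funcall (intern "add five!") 37)   ; => 42
```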