XML and lisp

From: Jacek Generowicz
Subject: XML and lisp
Date: Thu, 23 Aug 2001 09:34:00 +0000
Message-ID: <g07kvvjf1z.fsf@scumbag.ecs.soton.ac.uk>

I've been doing my best to ignore XML thus far, but repeatedly
encountering comparisons of XML to lisp has piqued my interest. I am
wondering whether I can advance my understanding of lisp by learning
about its relation to XML.

Can you recommend any books or URLs which could help me to learn
about XML, with the aim of being able to discuss intelligently the
relative merits of the two?

Jacek


From: Marco Antoniotti
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 13:42:48 +0000
Message-ID: <y6cvgjeuc2v.fsf@octagon.mrl.nyu.edu>

Jacek Generowicz <···@ecs.soton.ac.uk> writes:

> I've been doing my best to ignore XML thus far, but repeatedly
> encountering comparisons of XML to lisp has piqued my interest. I am
> wondering whether I can advance my understanding of lisp by learning
> about its relation to XML.

Roughly said (and with my agent provocateur hat on :) ) XML is a
re-invention of the wheel. The wheel is Lisp.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.

From: Ian Wild
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 14:14:54 +0000
Message-ID: <3B85106D.C0EFD770@cfmu.eurocontrol.int>

Marco Antoniotti wrote:
> 
> Jacek Generowicz <···@ecs.soton.ac.uk> writes:
> 
> > I've been doing my best to ignore XML thus far, but repeatedly
> > encountering comparisons of XML to lisp has piqued my interest. I am
> > wondering whether I can advance my understanding of lisp by learning
> > about its relation to XML.
> 
> Roughly said (and with my agent provocateur hat on :) ) XML is a
> re-invention of the wheel. The wheel is Lisp.

Similarities:

-o- both are regarded by some as the best thing since sliced bread

-o- both go in heavily for balanced delimiters

-o- both are regarded as overly-bracketful by many people


Differences:

-o- One is a text markup language with little or no semantics

-o- One is a programming language with little or no syntax

From: Marco Antoniotti
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 18:30:29 +0000
Message-ID: <y6csneityre.fsf@octagon.mrl.nyu.edu>

Ian Wild <···@cfmu.eurocontrol.int> writes:

> Marco Antoniotti wrote:
> > 
> > Jacek Generowicz <···@ecs.soton.ac.uk> writes:
> > 
> > > I've been doing my best to ignore XML thus far, but repeatedly
> > > encountering comparisons of XML to lisp has piqued my interest. I am
> > > wondering whether I can advance my understanding of lisp by learning
> > > about its relation to XML.
> > 
> > Roughly said (and with my agent provocateur hat on :) ) XML is a
> > re-invention of the wheel. The wheel is Lisp.
> 

> Differences:
> 
> -o- One is a text markup language with little or no semantics
> 
> -o- One is a programming language with little or no syntax

I like this one! :)

Cheers


From: Tim Bradshaw
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 15:17:57 +0000
Message-ID: <nkjn14qzty2.fsf@omega.tardis.ed.ac.uk>

Ian Wild <···@cfmu.eurocontrol.int> writes:
> Differences:
> 
> -o- One is a text markup language with little or no semantics
> 
> -o- One is a programming language with little or no syntax

((:reply :title "Lisp is not just a programming language")
 (:body
  (:p "It is also a text-markup language, 
and many other things, as you can see here"
      "For instance with a suitable (small) macro, this is quite legal 
Lisp syntax, which is compiled to *ML.  I have written significantly-sized 
documents in this notation."))
 (:signature "--tim"))
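A rough sketch of how a form like the one above could be turned into markup, written in Python rather than as the Lisp macro the post mentions (that macro is not shown in the thread). The function name `render` and the convention that a form's head is either a tag name or a (tag, attribute, value, ...) tuple are assumptions made for illustration only.

```python
from xml.sax.saxutils import escape, quoteattr

def render(form):
    """Render a nested tuple as markup.

    Assumed convention (hypothetical, modelled on the example form):
    a form is either a string (text) or a tuple whose first item is
    a tag name or a (tag, attr, value, ...) tuple.
    """
    if isinstance(form, str):
        return escape(form)
    head, *body = form
    if isinstance(head, str):
        tag, plist = head, []
    else:
        tag, *plist = head
    # Pair up the alternating attribute names and values.
    attrs = "".join(
        f" {key}={quoteattr(value)}"
        for key, value in zip(plist[0::2], plist[1::2])
    )
    inner = "".join(render(item) for item in body)
    return f"<{tag}{attrs}>{inner}</{tag}>"

reply = (
    ("reply", "title", "Lisp is not just a programming language"),
    ("body", ("p", "It is also a text-markup language.")),
    ("signature", "--tim"),
)
print(render(reply))
```

The point of the design is that the data is an ordinary nested structure, so the renderer is a short recursive walk and program code can build or transform the document before it is printed.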

From: Erik Naggum
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 07:21:01 +0000
Message-ID: <3207626455633924@naggum.net>

* Tim Bradshaw <···@tfeb.org>
> ((:reply :title "Lisp is not just a programming language")
>  (:body
>   (:p "It is also a text-markup language, 
> and many other things, as you can see here"
>       "For instance with a suitable (small) macro, this is quite legal 
> Lisp syntax, which is compiled to *ML.  I have written significantly-sized 
> documents in this notation."))
>  (:signature "--tim"))

  As long as we think aloud in alternative syntaxes, I actually prefer to
  break the _incredibly_ stupid syntactic-only separation of elements and
  attribute values.  SGML and its descendants have made a crucial mistake:
  For every level of container (there are about 7 of them), there is a new
  syntax for _two_ properties of the container: (1) the contents is wrapped
  in one syntax, but (2) the "writing on the box" is in quite another.
  This means that information and meta-information are massively different
  concepts, and this artificial separation runs through the whole SGML
  design.  Each level offers a new way to write the two differently.  This
  is what makes it so goddamn hard to reason about SGML documents and to do
  reasonably intelligent transformations on them without working your butt
  off specifying all sorts of irrelevant stuff that does _nothing_ but get
  in your way.

  I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools
  use and produce, because it makes XML just as evil in Lisp as it was in
  XML to begin with, and we have gained absolutely nothing in either power
  of processing or in abstraction, which is so very un-Lisp-like.

<foo bar="zot">quux</foo>

  should be read as

(foo (bar "zot") "quux")

  and most definitely _NOT_ as ((:foo :bar "zot") "quux"), which turns this
  fairly reasonable structure into a morass of complexity worse than it was
  to begin with.  And it does _NOT_ help to represent empty elements only
  with a keyword.  Using three different levels of nesting to represent a
  single concept is Just Plain Wrong.  Also, using keywords is not a good
  idea because there needs to be a lot of related information associated
  with elements and attributes, in different contexts, not to mention all
  the things they do with their funny "namespaces" these days.
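A minimal sketch of this reading, in Python with the standard `xml.etree.ElementTree` parser. The helper names `xml_to_sexp` and `to_lisp` are made up for illustration; the only idea taken from the text is that an attribute becomes one more sub-list of its element, with no separate nesting level.

```python
import xml.etree.ElementTree as ET

def xml_to_sexp(element):
    """Return [name, child, ...]; attributes are treated as sub-elements."""
    node = [element.tag]
    # Attributes become ordinary [name, value] sub-lists.
    for name, value in element.attrib.items():
        node.append([name, value])
    if element.text and element.text.strip():
        node.append(element.text)
    for child in element:
        node.append(xml_to_sexp(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            node.append(child.tail)
    return node

def to_lisp(node):
    """Format the nested list in Lisp notation."""
    if isinstance(node, str):
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "(" + " ".join([node[0]] + [to_lisp(n) for n in node[1:]]) + ")"

print(to_lisp(xml_to_sexp(ET.fromstring('<foo bar="zot">quux</foo>'))))
# prints (foo (bar "zot") "quux")
```

Under this reading `<foo bar="zot"/>` and `<foo><bar>zot</bar></foo>` produce the same list, so the attribute/element choice is deliberately not recorded in the result.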

  Whether something is an attribute or element is _completely_ arbitrary.
  It is based on some arbitrary choices in the design process that reveal
  absolutely no inherent qualities.  For purely pragmatic reasons, SGML
  folks will use attributes for some things and elements for others because
  their tools can deal with some things in attributes and some things in
  elements.  The faulty idea that attributes say something "about" the
  element and sub-elements somehow constitute their contents is the same
  premature structuring that premature optimization of code suffers from.
  The whole language is incredibly misdesigned in making that distinction.

  As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of
  the annoying verbosity of these stupid languages while _retaining_ that
  mistake between attribute values and elements, because it is quite hard
  to make simple regular expression-based conversions retain enough data
  about an element to decide what should be attribute and element.  An
  element has the form <name [attributes] | [contents]>.  Attributes have
  the form <name | value>.  Internal whitespace is only for readability.

XML                             Enamel (NML)            CL
<foo/>                          <foo>                   (foo)
<foo bar="zot"/>                <foo <bar|zot>>         (foo (bar "zot"))
<foo>zot</foo>                  <foo|zot>               (foo "zot")
<foo bar="zot">quux</foo>       <foo <bar|zot> |quux>   (foo (bar "zot") "quux")
<foo>Hey, &quux;!</foo>         <foo|Hey, [quux]!>      (foo "Hey, " quux "!")
<foo>AT&amp;T you will</foo>    <foo|AT&T you will>     (foo "AT&T you will")
<foo><bar>zot</bar></foo>       <foo|<bar|zot>>         (foo (bar "zot"))

  So I have almost none of the annoying and arbitrary quote/escape mania in
  attribute values or contents alike, either.  Entities I write as [name],
  and they end up in the Lisp version as symbols if not the character they
  represent purely for syntactic reasons.  Writing "code" in this language
  is actually amazingly painless compared to the produced noise.  Besides,
  with a few simple modify-syntax-entry calls in Emacs, I get < and > to
  match and blink and I can move up and down the structure very easily.

  For processing this stuff in Common Lisp, it is _sometimes_ neat to
  convert the single | attribute/content marker into the zero-length
  symbol, ||, so pathological cases like

<foo bar="zot"><bar>"zot"</bar></foo>

  which could have been written like this to show how arbitrary the
  syntactic distinction in SGML/XML is

<foo <bar|zot>|<bar|zot>>

  come out as

(foo (bar "zot") || (bar "zot"))

  The really interesting thing is that writing in Enamel and producing XML
  is so easy that a simple Perl or Lisp function that takes an Enamel
  string as argument and produces XML is quite simple and straight-
  forward.  This makes for some interesting-looking "scripting" that blows
  the mind of the miserable little wrecks that think they have to type the
  endtag, the quotes and all the other user-inimical features of SGML/XML.
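A sketch of what such a translator might look like, in Python. Enamel was unpublished when this was written, so the grammar here is inferred only from the table above: `<name attributes | contents>`, attributes written as `<name|value>` before the `|`, and entities written as `[name]`. The function names are invented, and literal `>`, `[` or `]` characters in text and the `{expression}` interpolation are not handled.

```python
from xml.sax.saxutils import escape, quoteattr

NAME_END = set(" \t\n<|>")

def _name(src, i):
    """Read a name starting at index i; return (name, next_index)."""
    j = i
    while j < len(src) and src[j] not in NAME_END:
        j += 1
    if j == i:
        raise ValueError(f"expected a name at position {i}")
    return src[i:j], j

def _contents(src, i):
    """Translate contents up to the closing '>'; return (xml, next_index)."""
    out = []
    while i < len(src):
        c = src[i]
        if c == ">":
            return "".join(out), i + 1
        if c == "<":                       # nested element
            xml, i = _element(src, i)
            out.append(xml)
        elif c == "[":                     # entity reference [name]
            end = src.index("]", i)
            out.append("&" + src[i + 1:end] + ";")
            i = end + 1
        else:                              # plain text, '&' gets escaped
            out.append(escape(c))
            i += 1
    raise ValueError("unterminated contents")

def _element(src, i):
    """Translate the element at src[i] == '<'; return (xml, next_index)."""
    name, i = _name(src, i + 1)
    attrs = []
    while True:
        while i < len(src) and src[i].isspace():
            i += 1
        if i >= len(src):
            raise ValueError(f"unterminated element <{name}")
        if src[i] == "<":                  # attribute: <name|value>
            attr, i = _name(src, i + 1)
            if src[i:i + 1] != "|":
                raise ValueError(f"attribute {attr} needs a |value")
            end = src.index(">", i)
            attrs.append(f" {attr}={quoteattr(src[i + 1:end])}")
            i = end + 1
        elif src[i] == ">":                # no '|' at all: empty element
            return f"<{name}{''.join(attrs)}/>", i + 1
        elif src[i] == "|":
            body, i = _contents(src, i + 1)
            return f"<{name}{''.join(attrs)}>{body}</{name}>", i
        else:
            raise ValueError(f"unexpected {src[i]!r} at position {i}")

def enamel_to_xml(src):
    """Translate one complete Enamel element into XML."""
    src = src.strip()
    if not src.startswith("<"):
        raise ValueError("input must start with '<'")
    xml, i = _element(src, 0)
    if i != len(src):
        raise ValueError("trailing input after element")
    return xml

print(enamel_to_xml("<foo <bar|zot> |quux>"))
# prints <foo bar="zot">quux</foo>
```

The translator can be this small because the same bracket pair delimits both attributes and elements, so a single recursive routine covers every level of nesting.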

  In my personal view, Lisp "markup" has the disadvantage of needing lots
  of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx
  is always symbolic and yyy is always a string of characters subject to
  interpretation by whatever the symbolic part instructs in context.

  Since the key feature of markup languages is the separation of text from
  markup, the simple idea in Enamel should carry enough force to make this
  a fully realizable goal without making an artificial syntactic separation
  between information and meta-information at any level.  If the syntax is
  good enough for the information, it should be good enough for the meta-
  information, and I think Enamel is.  Fortunately, I do not have to create
  a whole new international following and engage in godawful politics to
  use a better syntax for XML and the like, since XML and the like are only
  used as interchange syntaxes these days.  Nobody in their right mind
  actually writes anything by hand in such stupid languages that require so
  much attention to incredibly insignificant details and incomprehensibly
  irrelevant redundancy, anyway, do they?  :)

  Finally, note that in Enamel, a complete element is enclosed in <...> and
  that means it can be subject to a nice little Common Lisp reader macro,
  and it can be taught to recognize other stuff, as well, such as the neat
  concept of interpolating expression values where {expression} occurs.

  Still at "internal use" stage, I plan to publish some stuff about Enamel
  not too far into the future.

///

From: Tim Bradshaw
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 10:29:25 +0000
Message-ID: <nkjpu9leooq.fsf@omega.tardis.ed.ac.uk>

Erik Naggum <····@naggum.net> writes:

> <foo bar="zot">quux</foo>
> 
>   should be read as
> 
> (foo (bar "zot") "quux")
> 
>   and most definitely _NOT_ as ((:foo :bar "zot") "quux"), which turns this
>   fairly reasonable structure into a morass of complexity worse than it was
>   to begin with.  And it does _NOT_ help to represent empty elements only
>   with a keyword.  Using three different levels of nesting to represent a
>   single concept is Just Plain Wrong.  Also, using keywords is not a good
>   idea because there needs to be a lot of related information associated
>   with elements and attributes, in different contexts, not to mention all
>   the things they do with their funny "namespaces" these days.
> 

I don't think I disagree with any of this - my lhtml hack was never
meant as more than that - it originated out of dissatisfaction with
the WITH-x syntax that CL-HTTP uses which is really painful to type,
and you also need to define millions of macros, and it's only meant to
be better than that. I consciously ignored the whole namespace stuff,
because I was really only interested in spitting out something a
browser could render efficiently, and embedding it in lisp programs in
such a way that I can skip easily between lhtml and lisp (the macro
just checks if the car is a keyword basically...).  So really I just
want to say that I'm not proposing the syntax I gave as anything other
than a quick hack.

I'm curious about your syntax though: If I want to go from Lisp to
something (rather than from something to Lisp), it seems that the
syntax you give is ambiguous because of this (I cut the lines that
don't seem relevant).

> XML                             Enamel (NML)            CL
> <foo bar="zot"/>                <foo <bar|zot>>         (foo (bar "zot"))
> <foo><bar>zot</bar></foo>       <foo|<bar|zot>>         (foo (bar "zot"))

What I considered for my own hack was to avoid the whole ((...) ...)
thing by always requiring an attribute list, which could be nil, so
these would come out as

(foo (bar "zot")) and (foo () (bar () "zot")) respectively.  But for
most cases that was more typing than I liked (since I was typing in
Lisp not a better syntax).
> 
>   In my personal view, Lisp "markup" has the disadvantage of needing lots
>   of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx
>   is always symbolic and yyy is always a string of characters subject to
>   interpretation by whatever the symbolic part instructs in context.
> 

Yes, this is a really good point,  the quoting gets tedious.

--tim

From: Erik Naggum
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 15:17:11 +0000
Message-ID: <3207655030850317@naggum.net>

* Tim Bradshaw <···@tfeb.org>
> I'm curious about your syntax though: If I want to go from Lisp to
> something (rather than from something to Lisp), it seems that the syntax
> you give is ambiguous because of this (I cut the lines that don't seem
> relevant).
> 
> > XML                             Enamel (NML)            CL
> > <foo bar="zot"/>                <foo <bar|zot>>         (foo (bar "zot"))
> > <foo><bar>zot</bar></foo>       <foo|<bar|zot>>         (foo (bar "zot"))

  The key to this is the relationship between foo and bar.  Whether bar is
  an attribute or a sub-element of foo is irrelevant to processing them,
  but when you need to turn this back into SGML/XML/Enamel, you need to
  know which it is.  This is why I said:

    As for writing SGML/XML/HTML/whatever, I have a simple way to get rid
    of the annoying verbosity of these stupid languages while _retaining_
    that mistake between attribute values and elements, because it is quite
    hard to make simple regular expression-based conversions retain enough
    data about an element to decide what should be attribute and element.

  ... implying that I would normally have such information and use it when
  generating attribute/value or sub-element/contents.
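One way to picture this, sketched in Python: the internal form keeps no attribute/element distinction, and the writer consults separate knowledge about the document type when it has to emit XML. The `ATTRIBUTES` table and the function name `sexp_to_xml` are hypothetical; in practice the information might come from a DTD or from the application itself.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema knowledge: for each element name, the names of
# the children that must be written as attributes.
ATTRIBUTES = {"foo": {"bar"}}

def sexp_to_xml(node, attributes=ATTRIBUTES):
    """Write a nested list [name, child, ...] as XML."""
    if isinstance(node, str):
        return escape(node)
    name, *children = node
    attr_names = attributes.get(name, set())
    attrs, body = [], []
    for child in children:
        is_attr = (
            not isinstance(child, str)
            and child[0] in attr_names
            and len(child) == 2
            and isinstance(child[1], str)
        )
        if is_attr:
            attrs.append(f" {child[0]}={quoteattr(child[1])}")
        else:
            body.append(sexp_to_xml(child, attributes))
    if body:
        return f"<{name}{''.join(attrs)}>{''.join(body)}</{name}>"
    return f"<{name}{''.join(attrs)}/>"

tree = ["foo", ["bar", "zot"]]
print(sexp_to_xml(tree))                  # bar is declared an attribute of foo
print(sexp_to_xml(tree, attributes={}))   # nothing declared: bar is a sub-element
```

The same list yields `<foo bar="zot"/>` in the first call and `<foo><bar>zot</bar></foo>` in the second, which is the ambiguity raised in the question; it is resolved by the table, not by the list.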

///

From: Tim Bradshaw
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 08:39:39 +0000
Message-ID: <ey3g0afp644.fsf@cley.com>

* Erik Naggum wrote:

>   The key to this is the relationship between foo and bar.  Whether bar is
>   an attribute or a sub-element of foo is irrelevant to processing them,

Yes, that's a good point.  Whenever I've tried to design DTDs I've
always ended up having no attributes but doing everything as
subelements, and it's interesting that another very rich & successful
markup language - CL - does everything as `subelements' except when
people like me try and make it mirror HTML.

>   ... implying that I would normally have such information and use it when
>   generating attribute/value or sub-element/contents.

Yes, that's the crucial information I don't have in my application.

--tim

From: Barry Fishman
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 15:46:40 +0000
Message-ID: <m3k7ztv50s.fsf@barry_fishman.att.net>

Tim Bradshaw <···@tfeb.org> writes:

> Erik Naggum <····@naggum.net> writes:
>
>> XML                             Enamel (NML)            CL
>> <foo bar="zot"/>                <foo <bar|zot>>         (foo (bar "zot"))
>> <foo><bar>zot</bar></foo>       <foo|<bar|zot>>         (foo (bar "zot"))
>
> What I considered for my own hack was to avoid the whole ((...) ...)
> thing by always requiring an attribute list, which could be nil, so
> these would come out as
>
> (foo (bar "zot")) and (foo () (bar () "zot")) respectively.  But for
> most cases that was more typing than I liked (since I was typing in
> Lisp not a better syntax).

Wouldn't attribute lists need to have a more `let' like syntax (and behavior)?

(foo ((bar "zot")) "text")

or for some HTML:

(font ((size 10) (color :yellow)) "text")

Which is just a lisp program, after applying even my minimal skills
with macros.  The hard part is not overwhelming the text.

(fragment ((layout :html-like) (feeling :pompous)) "
SGML and TeX, being just markup, did their best to preserve the bulk
of text without any transformation.  Their goal is to take a normal
text document and " (tquote "mark it up") " for computer interpretation.
SGML markers are ugly but they weren't intended to dominate the file.
" (p) "
As people's interest has moved from SGML to XML, they now talk more of
" (italic "structured data") ", and although this is a somewhat subtle
change of mindset, it makes the markup the dominant part of the file.
Unfortunately, once people start down a course of action they rarely
stop to consider if the original design guidelines and intent may have
been lost.
" (p) "
Just by " (emph "standardizing") " a straightforward mapping of XML
into and back from lisp, the ugliness and verbosity of XML would be
less of an issue.  You could use the syntax you liked.  I suspect when
the enthusiasm for XML has died down a bit, the benefits of a
standardized lisp notation could become better recognized.
" (p) "
Without such standards, of course, forget it.
")

You do need to step around the native lisp functions like quote.

Barry Fishman

From: Erik Naggum
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 17:05:27 +0000
Message-ID: <3207661527508051@naggum.net>

* Barry Fishman <·············@acm.org>
> Wouldn't attribute lists need to have a more `let' like syntax (and behavior)?

  No.  Please forget the attributes.  There _are_ no attributes.  Whether
  something is an attribute or not is completely arbitrary and irrelevant.
  Your access to that information is _not_ dependent on its representation
  in SGML/XML.  Treat everything as a subordinate element.  This is the key
  idea to gaining power of abstraction over the XML data.  Holding on to
  the mythical distinction between attribute and sub-element is the key
  idea to losing any and all power of abstraction.

> Just by " (emph "standardizing") " a straightforward mapping of XML into
> and back from lisp, the ugliness and verbosity of XML would be less of an
> issue.  You could use the syntax you liked.  I suspect when the
> enthusiasm for XML has died down a bit, the benefits of a standardized
> lisp notation could become better recognized.

  Please understand that that is what I was trying to do.  The only way to
  deal with the mistake that they made in syntactically separating
  attributes from contents is to undo that mistake.  Any and all catering
  to it is only making it worse.

> You do need to step around the native lisp functions like quote.

  Huh?

///

From: Barry Fishman
Subject: Re: XML and lisp
Date: Sat, 25 Aug 2001 03:31:26 +0000
Message-ID: <m3ae0o7r3t.fsf@barry_fishman.att.net>

Erik Naggum <····@naggum.net> writes:
>   No.  Please forget the attributes.  There _are_ no attributes.  Whether
>   something is an attribute or not is completely arbitrary and irrelevant.
>   Your access to that information is _not_ dependent on its representation
>   in SGML/XML.  Treat everything as a subordinate element.  This is the key
>   idea to gaining power of abstraction over the XML data.  Holding on to
>   the mythical distinction between attribute and sub-element is the key
>   idea to losing any and all power of abstraction.

I looked again, and your incantations did not work.  Attributes still
seem to be in the language.  I agree that when XML is used as a data
definition they are "completely arbitrary" and make a syntactic
separation which is destructive.  I, personally, just avoid using them
when I have control of the XML I use to define data.  But I can't
ignore them or re-format them, when I need to generate XML which
someone or some standard defined to use them.  That battle belongs in
the XML standards committees, and I am afraid it's a bit late to change
their minds.

If I just treat attributes as subordinate elements, I lose the ability
to simply translate from lisp into XML.  In other news articles
you seem to suggest that you use information outside the lisp
representation to make that determination.  This means that my tools
would require a priori knowledge, which I feel a simple lisp->XML
(non-interpreting) translator should not need.  I don't think
lisp->XML translators should have constraints that XML parsers don't
have.

In code which interprets the lispified XML, I know what the grammar is,
so can't I (at that time) bury any abstraction issues in the access
methods?  I admit I don't fully understand the abstraction benefits
with which you are concerned.  I've been overwhelmed in tracking
all the XML languages which are being defined.  I was hoping that
being able to map them into lisp syntax would help avoid being buried
in XML's confusing syntax.  When looking at them in a lisp syntax,
things can become clearer (and seem less innovative).

I don't agree that the distinction between attributes and entities is
always arbitrary.  SGML does stand for Simple Graphical *Markup*
Language, and in a markup language, I think it is important to
distinguish the text of a document from its markup.  Multiple
translators may be used, and they should not need to be kept up to
date on what attributes are used in the other translators.  In an
expression like:

<header1><italic>Wow</italic>, this is difficult.</header1> 

or as lisp (which I think is more readable):
(header1 (italic "Wow") " this is difficult")

it isn't clear whether "Wow" is text or the value of an attribute
unless you have prior knowledge of whether `italic` is an attribute in
the context of a header1 directive.  So here the distinction is
simple, clear, and useful.  (I am not commenting on the syntax.)

This is still important for things like xhtml -- and probably docbook,
whose standard I have not yet assimilated.

In my previous message I suggested that:
   <header1 italic="Wow"> this is difficult</header1>

become:
   (header1 ((italic "Wow")) " this is difficult")

With minimal (but I admit real) damage to the syntax.

Barry Fishman

From: Erik Naggum
Subject: Re: XML and lisp
Date: Sat, 25 Aug 2001 12:08:48 +0000
Message-ID: <3207730126705119@naggum.net>

* Barry Fishman <·············@acm.org>
> I looked again, and your incantations did not work.  Attributes still
> seem to be in the language.

  Sigh.

> I agree that when XML is used as a data definition they are "completely
> arbitrary" and make a syntactic separation which is destructive.  I,
> personally, just avoid using them when I have control of the XML I use to
> define data.  But I can't ignore them or re-format them, when I need to
> generate XML which someone or some standard defined to use them.  That
> battle belongs in the XML standards committees, and I am afraid it's a bit
> late to change their minds.

  How you work with XML is not defined by those standards bodies.  What
  your _internal_ representation of XML looks like is not defined by those
  standards bodies.  One of the fundamental properties of Lisp is that we
  have a very nice and well-defined mapping between external and internal
  representation for most of our object types.  There is no well-defined
  mapping between XML syntax and internal representation.  Lots of ways are
  equally valid.  Insisting on only some of them is counter-productive.

> If I just treat attributes as subordinate elements, I lose the ability to
> simply translate from lisp into XML.

  You have made up your mind about this, so I shall not try to convince you
  of the errors of your ways.  People who are dead set on their ways should
  be left alone, mostly because they get cranky when faced with alternatives.

> In other news articles you seem to suggest that you use information
> outside the lisp representation to make that determination.

  No, you do not understand, and that is because you do not even try.

> This means that my tools would require a priori knowledge, which I feel a
> simple lisp->XML (non-interpreting) translator should not need.

  I see that you have to be very hard and fast on how you represent your
  information.  This is your choice.  I wish you would recognize it as a
  choice, and not try to impose a very specific view on the reality that is
  far more flexible and adaptable than you have shown to believe it to be.

> I don't think lisp->XML translators should have constraints that XML
> parsers don't have.

  Well, that is another choice you have made.  Other people, other choices.

> In code which interprets the lispified XML, I know what the grammar is,
> so can't I (at that time) bury any abstraction issues in the access
> methods?

  What does it matter to your access whether something is an attribute or a
  sub-element?  Why do you need to retain the distinction internally?

> I admit I don't fully understand the abstraction benefits with which you
> are concerned.

  I appreciate that you state this, because you certainly have not.

> I've been overwhelmed in tracking all the XML languages which are being
> defined.

  Yes, overwhelmed by bad design, most people's brains shut down and they
  refuse to deal with a massive simplification because it threatens to be
  as painful as dealing with the complexity they have barely survived.

> I was hoping that being able to map them into lisp syntax would help
> avoid being buried in XML's confusing syntax.

  That is my idea.  I am sorry for you that you have to define away the
  solution to your problem by insisting on a trivial one-to-one mapping of
  conceptual elements that effectively blocks your own conceptualization.

> When looking at them in a lisp syntax, things can become clearer (and seem
> less innovative).

  How very true.

> I don't agree that the distinction between attributes and entities is
> always arbitrary.

  Attributes and entities are very different concepts and the distinction
  between them is of fundamental importance.  I fail to see how you think I
  have made any claims about their relationship, however.  I am talking
  about _elements_.

> SGML does stand for Simple Graphical *Markup* Language,

  It stands for Standard Generalized Markup Language, actually.  The key
  to understanding the name is that "generalized markup" is something more
  than mere markup.  SGML has aspirations beyond simply marking up text.

> and in a markup language, I think it is important to distinguish the text
> of a document from its markup.

  I think I already said that.

> Multiple translators may be used, and they should not need to be kept up
> to date on what attributes are used in the other translators.

  Your value judgments are your choice.  I happen to disagree with them.
  If you try to deny me this, please realize that I do not care at all.

> In an expression like:
> 
> <header1><italic>Wow</italic>, this is difficult.</header1> 
> 
> or as lisp (which I think is more readable):
> (header1 (italic "Wow") " this is difficult")
> 
> it isn't clear whether "Wow" is text or the value of an attribute
> unless you have prior knowledge of whether `italic` is an attribute in
> the context of a header1 directive.

  Well, first off: You _have_ that prior knowledge.  Your application will
  actually need to know what to do with it whether it is an attribute or a
  sub-element.  If your application does not know what to do with it, I
  fail to see how whether it is an attribute or an element can matter to
  you.  If you _do_ know what to do with it, how does it matter to you
  whether it came from an attribute value or a sub-element?

> So here the distinction is simple, clear, and useful.

  It is arbitrary.

> This is still important for things like xhtml -- and probably docbook,
> whose standard I have not yet assimilated.

  No, it is fundamentally unimportant.  Please try to accept this premise
  for the sake of discussion, and see if something you believe falls out
  and shows itself to you as more important than your simple protestations.

> In my previous message I suggested that:
>    <header1 italic="Wow"> this is difficult</header1>
> 
> become:
>    (header1 ((italic "Wow")) " this is difficult")
> 
> With minimal (but I admit real) damage to the syntax.

  Keeping the distinction between attributes and content is keeping you
  from realizing how simply and efficiently you can deal with XML data.
  But that is your choice.  I fully expect that loads of people who have
  fused their brains shut and have fully "integrated" the false dichotomy
  of attributes and contents will never be able to unfuse it and open up to
  a very simple realization that it has absolutely no bearing on anything
  _other_ than the specific syntax in SGML/XML whether something is an
  attribute or an element.

  Those who grasp the concepts involved, will see that attributes are just
  another form of contents.  Those who do not grasp the concepts involved,
  will think that attributes are different from contents because they have
  been given syntactically different expression.  But it is always the
  syntax that follows the function.  Someone believed that meta-information
  should be fundamentally different from information.  Someone believed
  that the contents of elements should be text that wound up in the final
  document on the printed page and the values of attributes should not, but
  should only influence the processing of the information.  This worked
  only as long as SGML was used as a markup language for documents and had
  no aspirations towards being an abstract structuring syntax.  When it
  came to use it as a more abstract syntax, there _is_ no inherent quality
  that determines whether some value ends up displayed or not.  That has to
  be supplied by the software that processes the information, which is
  precisely prior knowledge of the structure and its meaning.

///

From: Barry Fishman
Subject: Re: XML and lisp
Date: Sat, 25 Aug 2001 20:00:59 +0000
Message-ID: <m3ae0n29lk.fsf@barry_fishman.att.net>

Erik Naggum <····@naggum.net> writes:
> * Barry Fishman <·············@acm.org>
>> If I just treat attributes as subordinate elements, I lose the ability to
>> simply translate from lisp into XML.
>
>   You have made up your mind about this, so I shall not try to convince you
>   of the errors of your ways.  People who are dead set on their ways should
>   be left alone, mostly because they get cranky when faced with
>   alternatives.

Crankiness is just a part of facing new ways of looking at things.  It's
only a terminal disease in the very young.  At my age, old ways of
thinking, still, do not give way without an internal fight.  I have a
great many years of Java/C/C++/Perl to overcome.  I do comprehend that
writing the equivalent code in lisp is pointless, although easy.

I wouldn't be taking the time to learn and work in lisp, if I didn't
recognize that it could significantly improve the ways I analyze
and solve problems.  This was made obvious by looking at (what I
presume is) good lisp code.

I will follow your suggestion and remove the entity/attribute
distinctions in my lisp code.  I am then left with a strong desire to
keep the names of XML attribute names in a list, and use that in a
generic XML output translator.

I suspect this is still avoiding the issues you have raised.  Instead
I will start by writing specific code for each case and see if a less
"C" like way of sharing code becomes evident.

I am open to any suggestions, although I can not guarantee I will
immediately grasp their rationale.  (I think I am past my cranky
stage.  I am never cranky when I get to write code.)

>> SGML does stand for Simple Graphical *Markup* Language,
>
>   It stands for Standard Generalized Markup Language, actually.

Yes, Yes, Yes.  I was focused on the _markup_ part, but there really is
no excuse when the answer is just an `info psgml' away.

Barry Fishman
-- 
I am used to working from the general to the specific.  Problems seem
to have the same design patterns in C/C++/Java and probably XML,
although the implementations may use slightly different language
features.  However, this does not seem to follow with lisp.  A new set
of approaches take center stage, and I do not yet have the judgement
to understand their implications.  They just dangle before me benefits
which aren't present otherwise.  They also seem to carry the seed of
complexities which could bury the project as a whole.  These disasters
of course are present in other languages, but there I can trust my
traditional ways of avoiding them.  The answer, as my music teacher
would say, is practice, practice, practice.

From: Boris Schaefer
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 09:45:29 +0000
Message-ID: <873d6frw7a.fsf@qiwi.uncommon-sense.net>

Erik Naggum <····@naggum.net> writes:

| * Barry Fishman <·············@acm.org>
| 
| > In an expression like:
| > 
| > <header1><italic>Wow</italic>, this is difficult.</header1> 
| > 
| > or as lisp (which I think is more readable):
| > (header1 (italic "Wow") " this is difficult")
| > 
| > it isn't clear whether "Wow" is text or the value of an attribute
| > unless you have prior knowledge of whether `italic` is an attribute
| > in the context of a header1 directive.
| 
|   Well, first off: You _have_ that prior knowledge.  Your
|   application will actually need to know what to do with it whether
|   it is an attribute or a sub-element.  If your application does not
|   know what to do with it, I fail to see how whether it is an
|   attribute or an element can matter to you.  If you _do_ know what
|   to do with it, how does it matter to you whether it came from an
|   attribute value or a sub-element?

Well, I agree that in most cases you will know whether something was
an attribute or contents, when you're processing it, but what about:

  <foo bar="1"><bar>2</bar></foo>

If I understand you correctly (and I'm not exactly sure about that),
you would represent this in Lisp as:

  (foo (bar 1) (bar 2))

I don't see how you can distinguish attributes and contents in this
case, and how you can translate this back into the same XML.  Probably
I'm missing something.

Boris

-- 
·····@uncommon-sense.net - <http://www.uncommon-sense.net/>

If you want to put yourself on the map, publish your own map.

From: Erik Naggum
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 09:30:31 +0000
Message-ID: <3207807028652689@naggum.net>

* Boris Schaefer <·····@uncommon-sense.net>
> Well, I agree that in most cases you will know whether something was
> an attribute or contents, when you're processing it, but what about:
> 
>   <foo bar="1"><bar>2</bar></foo>
> 
> If I understand you correctly (and I'm not exactly sure about that),
> you would represent this in Lisp as:
> 
>   (foo (bar 1) (bar 2))
> 
> I don't see how you can distinguish attributes and contents in this case,
> and how you can translate this back into the same XML.  Probably I'm
> missing something.

  Yes, you are definitely missing the constraints of SGML in real life.
  There are some problems that are not worth solving because they never
  come up even if they superficially could appear to come up if you do not
  pay attention.  This is such a problem.  You have failed to consider the
  ramifications of the solutions and pose a problem that simply would not
  exist if you did.  This taxes my patience, which is already legendary in its
  general absence.

  However, I apparently need to insist that you understand that in SGML and
  XML alike, you do in fact know what attributes an element has.  It cannot
  possibly be ambiguous.  If you decide to name a sub-element the same as
  an attribute, however massively stupid that is even with SGML/XML as it
  is, you _still_ know that you have an attribute with that name.  That
  there is a sub-element with that name, as well, is coincidental to the
  representation.  There simply is no way you can _not_ know that, unless
  you go out of your way to destroy the information that SGML provides you
  with.  If you destroy the information that is available to you, you will
  not get me to do stupid human tricks answering your resulting questions.

  I truly wonder what is so hard to understand about this.  We Lisp people
  are quite used to association lists, right?  Keyword-value pairs do not
  need to be in property lists to be understandable by Lisp people, do
  they?  To my mind, whether you store something in a property list or an
  association list is arbitrary.  However, in the reactions that I have
  seen to obliterating the false dichotomy between attributes and contents
  in SGML, there somehow seems to be a _fundamental_ difference between
  property lists and association lists.  I completely fail to understand
  how that can be.

  The whole deal is so simple I do not even know how to explain it so
  people get it if they do not get it immediately.  It is somewhat like
  seeing someone struggle with fractions.  They either get it or they do
  not, and although I have managed to make many a struggling child get the
  idea, I have _no_ idea what precisely caused them to grasp it.  It just
  happened, and they laughed in relief.  This attribute/container thing is
  equally intuitively evident.

  Case in point: An element has a fixed number of attributes.  That is
  reflected in a fixed length of the association list that makes up the
  attributes.  Attributes are not repeatable and not omissible, so if there
  are n attributes in the attribute list for an element, there will be n
  conses with attributes in the cdr of the element representation.  There
  are no two ways about this.  It is completely and irrevocably unambiguous.

  By exploiting the rich information we have about the elements and their
  makeup in SGML, we can reason about things with much simpler means than
  by adhering strictly to the particular representational issues in SGML.
  If it matters to you that some values are attributes, you ask for the
  attribute information.  If it does not matter to you, you can be relieved
  of the distinction.  If you want to transform attribute to contents or
  vice versa, modify the information about the element, not the element; if
  and when you print it out, the modifications will manifest themselves in
  new SGML/XML syntax, but nothing happened to your internal representation.
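
A minimal sketch of the idea above, not Erik's actual code: the element is
stored as a plain list, and the knowledge of which leading items are
attributes lives in a separate table.  The table `*element-attributes*` and
the functions `attribute-names`, `split-element` and `element-to-xml` are
hypothetical names invented for this illustration, and the sketch assumes a
fully parsed structure in which every declared attribute is present.

```lisp
;; Hypothetical table: element name -> ordered list of its attribute names.
;; In a real system this would come from the DTD; here it is hand-written.
(defparameter *element-attributes*
  '((foo bar)))

(defun attribute-names (element-name)
  "Return the attribute names declared for ELEMENT-NAME, or NIL."
  (cdr (assoc element-name *element-attributes*)))

(defun split-element (element)
  "Return the attribute conses and the contents of ELEMENT as two values.
The first n items of the cdr are the attributes, where n is the number of
attributes declared for the element (assumes all of them are present)."
  (let ((n (length (attribute-names (car element)))))
    (values (subseq (cdr element) 0 n)
            (nthcdr n (cdr element)))))

(defun element-to-xml (element)
  "Render ELEMENT, a string, number or (name item...) list, as an XML string."
  (if (atom element)
      (princ-to-string element)
      (multiple-value-bind (attributes contents) (split-element element)
        (let ((name (string-downcase (symbol-name (car element)))))
          (format nil "<~A~:{ ~A=\"~A\"~}>~{~A~}</~A>"
                  name
                  ;; One (name value) pair per attribute.
                  (mapcar (lambda (attribute)
                            (list (string-downcase
                                   (symbol-name (first attribute)))
                                  (second attribute)))
                          attributes)
                  (mapcar #'element-to-xml contents)
                  name)))))
```

With the table as written, `(element-to-xml '(foo (bar 1) (bar 2)))` gives
`<foo bar="1"><bar>2</bar></foo>`.  Rebinding `*element-attributes*` to
`'((foo))` and printing the same list gives
`<foo><bar>1</bar><bar>2</bar></foo>`: the information about the element
changed, the element did not.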

///

From: Kent M Pitman
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 12:50:58 +0000
Message-ID: <sfw8zg7yogd.fsf@world.std.com>

Erik Naggum <····@naggum.net> writes:

>   I truly wonder what is so hard to understand about this.

I think in situations like this the answer is that you need to stay concrete.
One often can't say specifically why one finds something difficult or hard, 
but one can generate a test case that is at their fringe.  It was asked 
whether

   <foo bar="1"><bar>2</bar></foo>

would be represented as 

   (foo (bar 1) (bar 2))

and you've sort of hinted yes.  You've made allusions to alists as a way
of understanding this, but as a sense of intuition, of course, that doesn't
help a Lisp programmer a lot since plainly an alist is about the leftmost
of each named thing, and people are uneasy about accessing the next-leftmost
element behind it--that usually violates some sense of a-list/stack discipline.

You haven't offered an operator whose goal is to be like destructuring-bind
and so to get around this, so the burden seems, to those looking on, to be
on the programmer to pick apart this structure manually and the set of tools
seems light.  That's probably only an artifact of not seeing your tools, 
rather than anyone's belief that you have no such tools.

Likewise you haven't shown any syntax which is, by loose analogy, the 
equivalent of Lisp's arglist strangeness for keywords where you map a keyword
to a differently-named variable by doing 
   (lambda (&key ((:foo fu) 3)) fu)
It is by having an abstraction like this that you can assure the person 
that the caller's name for things will not confuse the callee.  I toyed with
coming up with an analogously absurd example for Lisp and the following
was my best go of it.  If it seems unhelpful, just ignore it.  But the point
is just to show that you can manage 
(let ((weird 'ee) (apartheid 'ii) (pie 'ii) (pier 'ee))  
  (labels ((fn1 (&key ((:ei e)) ((:ie i))) (list 'ee e 'ii i))
           (fn2 (&key ((:ie e)) ((:ei i))) (list 'ee e 'ii i))
           (sort-by-sound (&rest keys &key (first-vowel-wins-p t) 
                           &allow-other-keys)
             (apply (if first-vowel-wins-p #'fn1 #'fn2) 
                    :allow-other-keys t
                    keys)))
    (list (sort-by-sound :first-vowel-wins-p t   :ei 'weird     :ie 'pie)
          (sort-by-sound :first-vowel-wins-p nil :ei 'apartheid :ie 'pier))))
((EE WEIRD II PIE)
 (EE PIER  II APARTHEID))

That is, somehow you'd expect the external representation (:ie vs :ei) to
have a fixed effect on what two functions that each have the same body
might return, but the arglist mappings (the "magic" in your example, of a
different kind than the "magic" here of &key, but still magic in a way)
manage to sort things out.  It isn't their behavior but the cross-bar you
plug between them that is doing something cool, and people don't see what
that cool thing is, probably only for lack of specificity rather than 
disbelief that what you say might be true.  Just as my example above is ho-hum
to a Lisp programmer, not mysterious, once they understand how keyargs work.

I think it would help if you posted the NML which helps you manipulate 
these, and perhaps a small code fragment that showed an end-to-end use
of constructing an expression in Lisp and having it appear in the XML with
this notation Boris suggests, and the reverse.  Then people would be
talking concrete still.

From: Thomas F. Burdick
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 19:45:40 +0000
Message-ID: <xcvae0moba3.fsf@conquest.OCF.Berkeley.EDU>

Kent M Pitman <······@world.std.com> writes:

> I think it would help if you posted the NML which helps you manipulate 
> these, and perhaps a small code fragment that showed an end-to-end use
> of constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse.  Then people would be
> talking concrete still.

I'd like to echo this sentiment.  I'm intrigued, but Dog knows
intriguing things can turn out to be pretty awful in practice, or
divine, or anywhere in between, but it takes actual experience to tell
the difference most of the time.

From: Erik Naggum
Subject: Re: XML and lisp
Date: Mon, 27 Aug 2001 05:14:29 +0000
Message-ID: <3207878053204561@naggum.net>

* Kent M Pitman <······@world.std.com>
> You've made allusions to alists as a way of understanding this, but as a
> sense of intuition, of course, that doesn't help a Lisp programmer a lot
> since plainly an alist is about the leftmost of each named thing, and
> people are uneasy about accessing the next-leftmost element behind
> it--that usually violates some sense of a-list/stack discipline.

  Well, this is why association lists work as a metaphor -- attributes in
  SGML/XML cannot be repeated.  If there are more keys in the remainder of
  the contents, they are not attributes.
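
The association-list metaphor can be sketched as follows.  This is only an
illustration under the assumption that the structure is fully parsed, so a
declared attribute is always present and always comes before the contents;
`attribute-value` is a hypothetical helper, not part of any library.

```lisp
(defun attribute-value (name element)
  "Return the value of the leftmost (NAME value) item in ELEMENT's cdr.
Like ASSOC, only the leftmost match counts; later items with the same
name are contents, not attributes.  Strings and other atoms are skipped."
  (second (find name (cdr element)
                :key (lambda (item) (and (consp item) (car item))))))
```

For `(foo (bar 1) (bar 2))`, `(attribute-value 'bar ...)` returns 1; the
second `(bar 2)` is never consulted.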

> You haven't offered an operator whose goal is to be like
> destructuring-bind and so to get around this, so the burden seems, to
> those looking on, to be on the programmer to pick apart this structure
> manually and the set of tools seems light.  That's probably only an
> artifact of not seeing your tools, rather than anyone's belief that you
> have no such tools.

  Which tools are available for the contents?  Why are they _not_ usable
  directly for the attributes?  I fail to grasp what you want to _do_ with
  the attributes that you cannot do with them if they are sub-elements.

  You imply that people are unable to deal with sub-elements and need
  special tools to deal with attributes.  This _must_ be wrong.

> I think it would help if you posted the NML which helps you manipulate
> these, and perhaps a small code fragment that showed an end-to-end use of
> constructing an expression in Lisp and having it appear in the XML with
> this notation Boris suggests, and the reverse.  Then people would be
> talking concrete still.

  I assume that people who voice their concerns in this discussion know
  SGML.  I have no inclination to write tutorials for people who do not.
  It is a waste of my time, and I know that I will hate it.  I have about
  500 pages of a book entitled "A Conceptual Introduction to SGML" that I
  swear to whichever deity is on duty today will _never_ be published,
  because the design flaws of SGML are so pervasive that the only thing I
  want to do with them is get rid of them.  Accept the fact that I deal
  with a history of personal pain in this regard.  I invested 6 years of my
  life on SGML and related standards, and the more I worked with it, the
  more I found that SGML actively destroyed any hope of achieving what it
  had set out to do, because it is introducing several poisons into the
  conceptual processes of structuring information.  Taking a look at what
  people do with SGML and XML today has not shown _one_ case of anyone
  waking up and smelling the coffee, and it has been _burning_ in the
  coffee machine for a decade.

  This is my view: You were told that you needed attributes in addition to
  sub-element contents.  Why did you ever _agree_ to that?  The onus of
  proof is normally on he who asserts the positive, and I challenge you to
  explain to me why you _need_ attributes rather than accepting any
  challenge to explain why you do _not_ need them when what I say is that
  _you_ already know perfectly well how to deal with sub-elements.  If you
  have worked with SGML at all, you _know_ that people screw up attributes
  and sub-elements, and you _have_ had to deal with one that should have
  been the other in your processing.  It is _impossible_ to get them
  "right" because the notion that there is a "right" solution depends on
  information that is not available at the time the distinction is made.

  Over the years, I have thought of _many_ different ways to deal with the
  colossal braindamage that is attributes in SGML.  One might think of them
  as (keyword) arguments to functions, but which other information should
  influence a "function" that deals with an element?  Well, first and
  foremost, its _parentage_.  That means that I have already had to get rid
  of the notion that <foo bar="x" zot="y"> is "really" a function call like
  (foo :bar "x" :zot "y").  It has to know _so_ much more to do _anything_
  right that it is completely useless to cast one's thinking in such terms.

  SGML must be _questioned_, not accepted as gospel or natural science
  reporting on some findings.  Somebody made a decision to add attributes,
  and I know for a fact that that was back in the days of typesetting and
  document production when the idea was that you should be able to "remove"
  the "tags" and end up with the readable text of the document as it would
  be printed.  That was the _real_ rationale for attributes.  I happen to
  think that was a brilliant idea at the time -- competing markup languages
  have a serious problem in using notations that destroy the ability to
  figure out easily what is intended for humans and what is intended for the
  machine.  (In particular, TeX is a monster.)  I tended towards explaining
  to people that they should not let stuff that should not be displayed be
  in sub-elements.  What a crock of shit that advice is!  As soon as GML
  became more general than producing print documents, for which it was well
  suited and still is, the attribute concept had become a mill-stone around
  its neck and it dragged it down fast.  It was _wrong_ to keep attributes
  around when their rationale had been completely eradicated from its set
  of operating conditions.  It made everything incredibly complex.  I was
  one of very few people on this planet to really _study_ the standard, and
  my brain works in such a way that I still _know_ with immediate certainty
  whether something is or is not supported by the standard language and how
  to express it.  (It works the exact same way with Common Lisp, Ada (1983,
  unfortunately :), C (1991), and any number of things I have really sat
  down to study and understand, and it is so efficient that I even get an
  emotional response to violations before I see the logic of them.)  I love
  the way my brain works, but it also has serious drawbacks: Overriding and
  updating old information is something I have to work really hard at.  The
  end result of the way I think and the way the standard is defined is that
  I immediately saw these massively complex ways to do things that "nobody"
  understood.  Take HyTime and what it calls "architectural forms" -- I
  vividly remember a long walk around a quiet Tallahassee one summer night
  with the creator of this concept, when I questioned some of the designs
  and how it would be implemented, and he was quiet for the longest time
  before he said that I was probably the first person to have understood
  what he was _really_ trying to accomplish.  That would have been _such_ a
  great thing if it had been, say, rocket science, but it was not.  It was
  a man-made complexity so great that it had required _months_ of brain-
  wracking to really get my intuition working.  That was the first time I
  had really serious doubts about the wisdom of SGML's structuring process,
  because the massive complexity of it all is _completely_ pointless and a
  result of spreading the semantics so thin that you had to keep mental
  track of an enormous number of relationships to end up with an idea of
  what something should do or mean.  It does not have to be that way.  It
  was _profoundly_ disappointing to discover that at the end of this long
  process of grasping something that looked intellectually challenging lay
  only a complexity that resulted from _rejecting_ simplicity of design at
  a few crucial points.  Hell, it still took me years to figure out what
  alternatives they _should_ have picked up, and by then it was too late.

  Now you are probably thinking "how F hard can it be?" and looking half
  condescending on a retarded monkey who cannot figure out the purposes of
  the mathematical relationships in calculus.  But it is the same problem
  we find in C++.  The question to be asked of massive complexity like that
  is not "what wonderful things did you find out that made this necessary",
  but "whatever did you _miss_ that made this so horribly complex"?  You
  can sometimes see people who are really, really dumb go about some simple
  tasks in a way that tells you that they have arrived at their ways of
  performing it through an incredibly painful process that they are loath
  to reopen or examine at all no matter how hard it is to get it right for
  them.  Some people will construct ways of performing their job so that
  they utilize all available brainpower, simply because that is indeed a
  very satisfying feeling.  However, when it comes to grasping someone
  else's _wrong_ ideas, there is no upper bound on complexity.  Some people
  have the most bizarrely convoluted thinking processes and they completely
  fail to monitor their thinking so they traipse off into oblivion and may
  or may not come back, but if they do, it is with these spectacularly
  irrational ideas that they _love_ before they discard them.  This is the
  kind of complexity that befell the SGML community.  That I could figure
  this mess out and think about it and have something dramatic to say about
  it to the creators, frankly scares me.

  In any case, I think the core problem is that a request for a rationale
  for _removing_ a complexifying misfeature is completely bogus.  We should
  not look at what we wound up with, we should look at _how_ we wound up
  where we are.  I have explained how attributes got invented in the first
  place and it _was_ a good idea at the time.  However, as soon as elements
  got more abstract and elements could contain _no_ information that would
  wind up on the printed page, but instead other elements that would, and
  those "abstract" elements would influence the way their sub-elements'
  contents would wind up on the printed page, it should have been clear
  that the attribute concept should be scheduled for extinction because
  some of its roles had now been moved into a different realm where _all_
  of its roles could be moved without sacrificing anything.

  The core idea that went horribly wrong with SGML _because_ of the very
  sad lack of re-examination of the rationale for attributes is almost so
  fundamental that removing it will tear down everything that SGML has
  built with it.  This is likely why people resist thinking about it,
  because it was so painful to learn SGML, it is better to keep out any
  risk of having to re-experience that pain from another angle.  I shall
  probably have to repeat this core idea forever because so few people
  really grasp it: SGML claims that some things you want to say about
  something are meta-information and some things are "normal" information.
  Like the historical baggage from the characters in the file that wound up
  in print and the characters that vanished in processing, SGML's view on
  meta-information and information is that they are inherently different
  and thus not only distinguishable, but in need of being kept apart, so
  much so that there are two wildly different languages to describe them.
  This core mistake leads to an inability to move between views of your own
  information and conceptualization of its structure, and that is just the
  way to kill your information.

  As a result of this dichotomy, SGML imposes an incredibly hard structure
  on the information.  If the information wants to break out of it, the
  whole structure breaks.  (XML is really _nothing_ better, but has all the
  appeal of tooth decay the way it touts its caries as "extensibility".)
  There are so many rules in the SGML standard that effectively prohibit a
  rational way to "flex" its design that people do not refrain from it
  because they do not consider it useful to be able to, but because any
  change to a document type definition is associated with an unknowable
  increase in complexity of processing, especially in the area of bringing
  legacy documents in line with the change.  The extreme _brittleness_ of
  the SGML structure is a direct result of the core mistake to strike a
  dichotomy between meta-information and information, because in real life,
  the two are in fact _exactly_ the same thing, it is just a matter of who
  looks at it for which purpose.  If you do not believe that, it is because
  you still think that there _has_ to be a difference.  Of course there is,
  but it is not _inherent_ or _intrinsic_ to the information, it is highly
  pragmatically determined which is which at any given time.

  Structuring information is one of the _easiest_ tasks we humans do.  All
  the time, we add meta-information to information and we do not even mark
  it up as we go.  Human languages are chock full of meta-information: "I
  did not know darkness could be so illuminating", he said, expectingly.
  We _have_ no desire to mark meta-information as such and directly because
  it is part and parcel of how we interpret what other people tell us.  If
  I say "yesterday" today, I probably mean "2001-08-26", so I could write
  <date <formal 2001-08-26> yesterday>, but I could also talk about the
  past in some general term like <date <formal past> yesterday>, and so on
  and so forth.  What we really grasp about the information we receive _is_
  invariably meta-information.  The problem is then entirely artificial,
  since we do this almost automatically.  What we really need are means to
  make the meta-information explicit.  I used to believe that this would be
  a good idea, but until we find ways to "intuit" meta-information from a
  human context, I believe it is a waste of effort and it could well be
  counterproductive.  What we need is a very limited and very practical
  approach to obtain a minimal level of meta-information.  The more we
  specify the more we exclude, because as soon as we aim for a certain
  "depth" of representation, the alternative representations at the same
  level grow exponentially in number.

///

From: Rob Warnock
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 14:05:43 +0000
Message-ID: <9mavnn$ec8ip$1@fido.engr.sgi.com>

Erik Naggum  <····@naggum.net> wrote:
+---------------
| * Boris Schaefer <·····@uncommon-sense.net>
| > Well, I agree that in most cases you will know whether something was
| > an attribute or contents, when you're processing it, but what about:
| >   <foo bar="1"><bar>2</bar></foo>
...
| >   (foo (bar 1) (bar 2))
| > I don't see how you can distinguish attributes and contents in this case,
| > and how you can translate this back into the same XML.  ...
...
|   Case in point: An element has a fixed number of attributes.  That is
|   reflected in a fixed length of the association list that makes up the
|   attributes.  Attributes are not repeatable and not omissible, so if there
|   are n attributes in the attribute list for an element, there will be n
|   conses with attributes in the cdr of the element representation.  There
|   are no two ways about this.  It is completely and irrevocably unambiguous.
+---------------

While not repeatable, attributes *are* omissible if the DTD for those
attributes contains either default values or the "#IMPLIED" status keyword,
are they not? So if the DTD said:

	<!ELEMENT foo (bar | #PCDATA)*>
	<!ATTLIST foo bar NUMBER #IMPLIED>

that is, the "foo" element has an optional "bar" attribute *and* also
allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar 2))
*would* be ambiguous.

I see two obvious ways to preserve the simplicity you seek:

1. Do what CL does for declarations, that is, reserve a symbol to
   tag lists of attributes (like "declare" does), which are optional,
   but if present may only appear before all non-attribute subforms:

	(foo (attr (bar 1)) (bar 2))

2. Force attribute names and element names into different packages, e.g.:

	(foo (attr:bar 1) (bar 2))

   or if the current package is never the keyword package, simply:

	(foo (:bar 1) (bar 2))
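
Both options can be sketched in a few lines.  This is only an illustration
of the two encodings listed above; the function names
`split-marked-element`, `keyword-named-p` and `split-keyword-element` are
hypothetical, and `attr` is assumed to be a symbol reserved for this purpose.

```lisp
(defun split-marked-element (element)
  "Option 1: attributes live in a leading (attr ...) subform.
Returns the attribute list and the contents as two values."
  (let ((first-item (second element)))
    (if (and (consp first-item) (eq (car first-item) 'attr))
        (values (cdr first-item) (cddr element))
        (values nil (cdr element)))))

(defun keyword-named-p (item)
  "True if ITEM is a list whose name is a keyword."
  (and (consp item) (keywordp (car item))))

(defun split-keyword-element (element)
  "Option 2: attribute names are keywords, element names are not.
Returns the attribute list and the contents as two values."
  (values (remove-if-not #'keyword-named-p (cdr element))
          (remove-if #'keyword-named-p (cdr element))))
```

Either way, `(foo (attr (bar 1)) (bar 2))` and `(foo (:bar 1) (bar 2))` both
split into the attribute `bar` with value 1 and the contents `((bar 2))`, at
the cost of encoding the attribute/content distinction in the data itself.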


-Rob

p.s. The article "Element/Attribute Distinction Considered Harmful"
<URL:http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Aug-1999/0375.html>
from the XML-DEV list discusses precisely the same issue, starting with:

	After writing the usual 'when to use elements and when to use
	attributes' bit for a new book and then spending some time
	close up with the XLink specs, I'm really starting to wonder
	if we haven't painted ourselves into a corner by treating leaf
	elements and attributes differently.

Unfortunately, no significant followups seem to have been posted!!

-----
Rob Warnock, 30-3-510		<····@sgi.com>
SGI Network Engineering		<http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

From: Kent M Pitman
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 14:25:48 +0000
Message-ID: <sfw3d6ezymr.fsf@world.std.com>

····@rigden.engr.sgi.com (Rob Warnock) writes:

> 2. Force attribute names and element names into different packages, e.g.:
> 
> 	(foo (attr:bar 1) (bar 2))
> 
>    or if the current package is never the keyword package, simply:
> 
> 	(foo (:bar 1) (bar 2))

Don't forget XML has a package namespace of its own.  You'd need nested
namespaces to pull this off, no?

From: Rob Warnock
Subject: Re: XML and lisp
Date: Mon, 27 Aug 2001 08:17:41 +0000
Message-ID: <9mcvn5$eh87a$1@fido.engr.sgi.com>

Kent M Pitman  <······@world.std.com> wrote:
+---------------
| ····@rigden.engr.sgi.com (Rob Warnock) writes:
| > 2. Force attribute names and element names into different packages, e.g.:
| > 	(foo (attr:bar 1) (bar 2))
| >    or if the current package is never the keyword package, simply:
| > 	(foo (:bar 1) (bar 2))
| 
| Don't forget XML has a package namespace of its own.  You'd need nested
| namespaces to pull this off, no?
+---------------

Oh, heavens! I certainly wasn't trying to open *that* can of worms again!
But yes, you're right, of course, if one were to try to use Lisp namespaces
directly for XML names. But...

I think Erik's parallel response gets it absolutely correct [which I
missed on first reading of his earlier article -- oops!], namely, once
parsed (and defaulted, if necessary) all the stuff about what's an
"attribute" and what's not should be a property of the Lisp representation
of the element [CLOS class, whatever], and not necessarily encoded
in any way in the Lisp data structure per se.

Likewise, I suspect the right answer for dealing with XML namespaces
will turn out to be to have the Lisp representation of each element
worry about that, and use directly-corresponding names for XML elements
and Lisp symbols only to the extent that it's convenient, and *NOT*
attempt to force any rigid or automatic 1-to-1 correspondence.

I was intending to use Lisp packages only to encode the one bit of
"attribute/non-attribute", not encode XML namespace, but Erik rightly
showed that approach was still trapped in the SGML/XML worldview. Hence,
I retract the suggestion (except in the case that the Lisp representation
of a particular element *chooses* to use that distinction, purely for
its own convenience).


-Rob

-----
Rob Warnock, 30-3-510		<····@sgi.com>
SGI Network Engineering		<http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

From: Erik Naggum
Subject: Re: XML and lisp
Date: Mon, 27 Aug 2001 05:23:23 +0000
Message-ID: <3207878601378705@naggum.net>

* ····@rigden.engr.sgi.com (Rob Warnock)
> While not repeatable, attributes *are* omissible if the DTD for those
> attributes contains either default values or the "#IMPLIED" status keyword,
> are they not?

  That depends on whether you represent the parsed or pre-parsed structure.
  In a Common Lisp setting, we are dealing with parsed structure.  If the
  attribute value is "implied" in the source, it still needs to be there in
  the parsed structure.

> So if the DTD said:
> 
> 	<!ELEMENT foo (bar | PCDATA)*>
> 	<!ATTLIST foo bar NUMBER #IMPLIED>
> 
> that is, the "foo" element has an optional "bar" attribute *and* also
> allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar
> 2)) *would* be ambiguous.

  If you choose to represent a pre-parsed SGML instance in Common Lisp, I
  would argue strongly against that before I would even attempt to answer
  anything else.

  I _really_ mean it when I say that the attribute list has a fixed length.

  I also indicated that for pragmatic reasons, I sometimes use a marker to
  separate the attributes from the contents in the cdr of the element, such
  as when the task at hand would be wastefully slow if I were to deal with
  a fully parsed structure.  Dirty hacks should be within reach because the
  world is sometimes not clean.  I am probably not going to get used to the
  habit of some people who see a problem in one part of a proposal and
  ignore the fact that there is a solution in another part of the same
  proposal (like the next paragraph), and I am certainly not patient enough
  with all the rampant idiocy in the SGML/XML world to explain this over
  and over, but please go back and read the whole message.  If you find a
  need to use a marker in _some_ cases, I have in fact covered it.  In the
  fully parsed, fully general case, that need does _not_ arise, because the
  attribute list is a fixed set of "slots" in the structure.  This should
  have no bearing on how to process them, however, but of course it matters
  to and from SGML/XML representation.

///

From: Rob Warnock
Subject: Re: XML and lisp
Date: Mon, 27 Aug 2001 10:39:11 +0000
Message-ID: <9md80f$ei0bi$1@fido.engr.sgi.com>

Erik Naggum  <····@naggum.net> wrote:
+---------------
| ····@rigden.engr.sgi.com (Rob Warnock)
| > While not repeatable, attributes *are* omissible if the DTD for those
| > attributes contains either default values or the "#IMPLIED" status keyword,
| > are they not?
| 
|   That depends on whether you represent the parsed or pre-parsed structure.
|   In a Common Lisp setting, we are dealing with parsed structure.  If the
|   attribute value is "implied" in the source, it still needs to be there
|   in the parsed structure.
+---------------

*Doh!* I think I finally get what you were trying to say, thanks!

+---------------
| > So if the DTD said:
| > 	<!ELEMENT foo (bar | PCDATA)*>
| > 	<!ATTLIST foo bar NUMBER #IMPLIED>
| > that is, the "foo" element has an optional "bar" attribute *and* also
| > allows an arbitrary number of "bar" sub-elements, then (foo (bar 1) (bar
| > 2)) *would* be ambiguous.
| 
|   If you choose to represent a pre-parsed SGML instance in Common Lisp...
+---------------

Or a half-parsed (i.e., half-assed)?  ;-}

+---------------
|   I would argue strongly against that before I would even attempt to
|   answer anything else.
| 
|   I _really_ mean it when I say that the attribute list has a fixed length.
+---------------

Got it. Now let's see if I can explain it to others who may not have:

My understanding of what Erik is suggesting [very strongly!] is that one
should *NOT* try to invent any kind of direct "Lispified" or S-expr
restatement of XML/HTML/SGML *syntax* per se, but instead to *parse*
the XML document and choose convenient (potentially element-specific)
CL representations for the parsed elements. This parsing process will
involve filling in default values for omitted attributes, including those
whose default is "#IMPLIED". Once you have done this parsing, there is
nothing "optional" at all about any of the attributes -- you now have
*all* of their values. [Whether you choose to explicitly store defaulted
ones or not is a separate decision -- in any event you know their values.]

Now, having parsed the element and filled in the defaults, how you
choose to represent it in CL data is pretty much up to you. One way
might be as an instance of a CLOS class, with the attributes as slots
[plus a slot for the sub-elements, if it's not an empty element]. This
would allow you to use a generic function (print-element elem style)
that specialized on both the element type and the desired output style
to output completely different texts from the same parsed document.

Another way is a simple list of the element name[*] followed by the
values of the attributes (with or without attendant "keywords" to
make them readable to humans debugging the program) followed by the
rest of the contained elements (if any). Without any attribute markers
at all, this might have a form similar in appearance (only!) to a
function call with positional parameters, that is:

	<foo bar="1"><bar>2</bar></foo>

after parsing might be internally represented as:

	(foo 1 (bar 2))

Or if you choose to add some element-like structure to the attributes,
you can do that, too. [You might choose to do that if (*ugh!* *shudder!*)
some attributes contain further internal structure, and you'd like to
represent the *parsed* version of that structure in a pleasing way.]
That gets us to:

	(foo (bar 1) (bar 2))

But again, since all of the application routines that have to deal with
a "foo" element *know* that "foo" has a "bar" attribute, all of the code
[that cares about attributes] knows that the CADADR is the attribute value
and the CDDR is the content.

Now suppose that the application-implied value for the attribute "bar"
is zero, and we are given this to parse:

	<foo><bar>2</bar><bar>17</bar></foo>

What I (finally) heard Erik say is that the only reasonable internal
representation for that (depending on whether you chose the "positional"
or "element-like" representation for foo's attributes) would be one of
these forms:

	(foo 0 (bar 2) (bar 17))
or:
	(foo (bar 0) (bar 2) (bar 17))

That is, the structure of the CL representation *must* be invariant
w.r.t. inclusion or omission of attributes in the source text. So in
the second form, the CADADR is still the attribute value and the CDDR
is still the content, even though the attribute was omitted in the
source text.
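
A minimal Python sketch of that invariant, assuming a hypothetical
per-element table (ATTRIBUTE_DEFAULTS) that stands in for what a DTD or
the application would supply. Attribute values stay strings here, where
the examples above show numbers; the point is only that the attribute
section has the same length whether or not the source text spelled the
attribute out.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: for each element name, its attributes in a fixed
# order, each with the default to use when the source omits it.
ATTRIBUTE_DEFAULTS = {
    "foo": [("bar", "0")],
}

def parse_element(elem):
    """Return [name, (attr, value)..., content...] with a fixed attribute section."""
    slots = [(name, elem.get(name, default))
             for name, default in ATTRIBUTE_DEFAULTS.get(elem.tag, [])]
    content = []
    if elem.text and elem.text.strip():
        content.append(elem.text.strip())
    for child in elem:
        content.append(parse_element(child))
        if child.tail and child.tail.strip():
            content.append(child.tail.strip())
    return [elem.tag, *slots, *content]

def parse(source):
    """Parse an XML string into the nested-list form."""
    return parse_element(ET.fromstring(source))

with_attr = parse('<foo bar="1"><bar>2</bar></foo>')
without_attr = parse('<foo><bar>2</bar><bar>17</bar></foo>')
```

In both results, index 1 holds the attribute and everything from index 2
onward is content, which mirrors the CADADR/CDDR observation above.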

+---------------
|   I also indicated that for pragmatic reasons, I sometimes use a marker to
|   separate the attributes from the contents in the cdr of the element, such
|   as when the task at hand would be wastefully slow if I were to deal with
|   a fully parsed structure.  Dirty hacks should be within reach because the
|   world is sometimes not clean.
+---------------

I now understand & agree.

+---------------
|   I am probably not going to get used to the habit of some people who
|   see a problem in one part of a proposal and ignore the fact that there
|   is a solution in another part of the same proposal (like the next
|   paragraph), and I am certainly not patient enough with all the rampant
|   idiocy in the SGML/XML world to explain this over and over, but please
|   go back and read the whole message.
+---------------

I did, and that's when the light finally dawned, but I have to say
that until one *does* finally understand it's not at all obvious.
No, I don't know how you could have said it any more clearly. I can
only say (from personal experience now!) that if one *ever* falls into
the trap of trying to "Lispify" the *syntax* of XML instead of representing
the *parsed* structure, it can be very hard to let go of that fixation.

Hmmm... Perhaps it's some sort of "figure/ground" thing, as in that
classic picture <URL:http://www.lcsc.edu/ss150/u5s1p6.htm> used in
gestalt psychology. If you see the young woman first, it's sometimes
hard to then see the old hag (or vice versa). And one's history or
prejudices may strongly affect which one you see first, e.g., young
men tend to see the young woman first.

[Of course, once you've seen *both*, then it's much, much easier
to flip your perception back and forth at will between them.]


-Rob

[*] That is, as I mentioned in my parallel reply to Kent, a CL symbol
    chosen to *represent* the XML element name, not necessarily or even
    desirably any automatic conversion of the XML element name to a CL
    symbol.

-----
Rob Warnock, 30-3-510		<····@sgi.com>
SGI Network Engineering		<http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

[Note: ·········@sgi.com and ········@sgi.com aren't for humans ]

From: Boris Schaefer
Subject: Re: XML and lisp
Date: Tue, 28 Aug 2001 02:16:54 +0000
Message-ID: <8766b97wtl.fsf@qiwi.uncommon-sense.net>

Erik Naggum <····@naggum.net> writes:

| That depends on whether you represent the parsed or pre-parsed
| structure.  In a Common Lisp setting, we are dealing with parsed
| structure.  If the attribute value is "implied" in the source, it
| still needs to be there in the parsed structure.

Aahh, I think this clears things up for me.  I think I understand now.
Thanks.  

| I am probably not going to get used to the habit of some people who
| see a problem in one part of a proposal and ignore the fact that
| there is a solution in another part of the same proposal (like the
| next paragraph), and I am certainly not patient enough with all the
| rampant idiocy in the SGML/XML world to explain this over and over,
| but please go back and read the whole message.

I did.  I actually read that part about the marker before already,
somehow it just didn't enter my brain.  I also really didn't realize
that the attribute list _really_ is fixed length after parsing.
Thanks for stressing your patience and explaining it again.

Boris

-- 
·····@uncommon-sense.net - <http://www.uncommon-sense.net/>

Facts, apart from their relationships, are like labels on empty bottles.
		-- Sven Italla

From: Michael Livshin
Subject: Re: XML and lisp
Date: Sun, 26 Aug 2001 08:21:08 +0000
Message-ID: <s37kvrmdu3.fsf@yahoo.com.cmm>

Boris Schaefer <·····@uncommon-sense.net> writes:

> Well, I agree that in most cases you will know whether something was
> an attribute or contents, when you're processing it, but what about:
> 
>   <foo bar="1"><bar>2</bar></foo>
> 
> If I understand you correctly (and I'm not exactly sure about that),
> you would represent this in Lisp as:
> 
>   (foo (bar 1) (bar 2))
> 
> I don't see how you can distinguish attributes and contents in this
> case, and how you can translate this back into the same XML.  Probably
> I'm missing something.

assuming I understood Erik's point, you can surely distinguish
attributes and contents when translating back to XML.  that is, your
Lisp->XML translator should know that `(bar 1)' under `foo' is
supposed to be an attribute.

having this knowledge _only_ in the output translator frees you from
distinguishing attributes and contents in your program.

this does mean that the value of foo's `bar' attribute (er, content)
should magically always be atomic, because if it's not you won't be
able to output valid XML.  but note that if your program can deal with
non-atomic values of `bar', then you probably have chosen the wrong XML
format in the first place...
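
A rough Python sketch of such an output translator, assuming nested
lists stand in for the Lisp forms and that a hypothetical table
(ATTRIBUTE_NAMES) records which leading positions of each element are
attributes. Only the translator consults the table; it refuses a
non-atomic attribute value, since that could not be written as valid XML.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical table: for each element name, the attribute names that
# occupy the leading positions of its list form.
ATTRIBUTE_NAMES = {"foo": ["bar"]}

def to_xml(node):
    """Serialize a nested-list element back to XML text."""
    if isinstance(node, str):
        return escape(node)
    name, *rest = node
    attr_names = ATTRIBUTE_NAMES.get(name, [])
    slots, content = rest[:len(attr_names)], rest[len(attr_names):]
    attrs = []
    for expected, slot in zip(attr_names, slots):
        attr, value = slot
        # An attribute value must be atomic (a string in this sketch).
        if attr != expected or not isinstance(value, str):
            raise ValueError(f"{name}: bad value for attribute {expected!r}")
        attrs.append(f" {attr}={quoteattr(value)}")
    body = "".join(to_xml(child) for child in content)
    return f"<{name}{''.join(attrs)}>{body}</{name}>"

round_trip = to_xml(["foo", ["bar", "1"], ["bar", "2"]])
```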

-- 
What is this talk of 'release'? Klingons do not make software
'releases.' Our software 'escapes' leaving a bloody trail of designers
and Quality Assurance people in its wake.
                                        -- Klingon Programmer

From: Kent M Pitman
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 15:19:56 +0000
Message-ID: <sfw7kvto57n.fsf@world.std.com>

Erik Naggum <····@naggum.net> writes:

> 
> * Tim Bradshaw <···@tfeb.org>
> > ((:reply :title "Lisp is not just a programming language")
> >  (:body
> >   (:p "It is also a text-markup language, 
> > and many other things, as you can see here"
> >       "For instance with a suitable (small) macro, this is quite legal 
> > Lisp syntax, which is compiled to *ML.  I have written significantly-sized 
> > documents in this notation."))
> >  (:signature "--tim"))
> 
>   As long as we think aloud in alternative syntaxes, I actually prefer to
>   break the _incredibly_ stupid syntactic-only separation of elements and
>   attribute values.  SGML and its descendants have made a crucial mistake:
>   For every level of container (there are about 7 of them), there is a new
>   syntax for _two_ properties of the container: (1) the contents is wrapped
>   in one syntax, but (2) the "writing on the box" is in quite another.

Certainly what you say is undeniably true in terms of practice, and I'd even
give you that the notational distinction is not worth the mechanism, but
is there somewhere that the language actually forces this "role" relationship?

I wrote a package in Java at a prior employer which automatically generated
XML representations for classes as elements based on Java metadata, and the
tack I took was not that the XML attributes contain meta-data and the contents
data but rather that the XML attributes contain atomic data and the
contents contain compound data, since this is IN FACT what the real distinction
is. Some type Foo with

 int x;
 Date y;
 ElementList z;

might produce 

 <Foo x="3" y="3207653971"><z><Element>...</Element>...</z></Foo>

In effect, what I got out of this was a description that allowed two 
syntaxes: an easy syntax for easy things, and a hard syntax for hard
things.  Of course, there are all kinds of problems even then because of
subclass relationships (the analog of the problem of *print-readably*
and strings of base-char vs char, or the loss of fill-pointer, etc. in
printing a string.  Fixing these gets very verbose very quickly.
So I'm not defending the notation in that regard.)
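
The atomic-versus-compound mapping can be sketched in a few lines of
Python. This is an illustration only, not the Java package described
above: the function name to_element and the dict-of-fields input are
assumptions, and a compound field is modelled as a list of
(tag, fields) pairs.

```python
import xml.etree.ElementTree as ET

def to_element(tag, fields):
    """Atomic field values become attributes; list values become a sub-element."""
    elem = ET.Element(tag)
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            # Compound data: wrap the items in a child named after the field.
            child = ET.SubElement(elem, name)
            for item_tag, item_fields in value:
                child.append(to_element(item_tag, item_fields))
        else:
            # Atomic data: store its string form as an attribute.
            elem.set(name, str(value))
    return elem

foo = to_element("Foo", {
    "x": 3,
    "y": 3207653971,
    "z": [("Element", {}), ("Element", {})],
})
```

Calling ET.tostring(foo, encoding="unicode") then yields a Foo element
with x and y as attributes and z as a child holding the two Element
items.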

But what I'm really wondering is whether SGML has some "intended use" spec
that tells you that you have to put meta-info in the "car" of the "form",
and info in the "cdr".  I thought the use of these containers was
semantics-free.

>   I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools
>   use and produce, because it makes XML just as evil in Lisp as it was in
>   XML to begin with, and we have gained absolutely nothing in either power
>   of processing or in abstraction, which is so very un-Lisp-like.
> 
> <foo bar="zot">quux</foo>
> 
>   should be read as
> 
> (foo (bar "zot") "quux")
> 

Maybe. Macsyma used a similar notation for years (though without the restriction
on container-ness).  I don't think the answer is to change to do the rewrite
you suggest.  I don't understand why it's not natural to add the
following as legal syntaxes:

 <foo bar=<zot/>>

or

 <foo bar=<string>zot</string>>quux</foo>

This would keep people from feeling the attribute list was a shorthand
area and would also allow the storing of complex meta-data.  Right now,
the lack of any use for <...> in the attribute position seems a terrible waste.
The only rationales I can figure for this were either the desire to 
periodically beat someone on the back of the hand for syntax errors 
by having a regular application of over-applied syntax or else some sort of
efficiency bum to make the acquisition of strings in the attribute list
uselessly faster.  Do you know what the reason was that recursive structures
were not allowed in this position in XML?

Or perhaps it was the fact that the "real world" substitutes for "parsed
structure" things like that weird assembly code like notation which looks
like

 (A
 AHREF=foo.html
 -Text
 )A

Perhaps someone was just being uncreative about how a compound-structure
could be offered as an attribute.

>   As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of
>   the annoying verbosity of these stupid languages while _retaining_ that
>   mistake between attribute values and elements, because it is quite hard
>   to make simple regular expression-based conversions retain enough data
>   about an element to decide what should be attribute and element.  An
>   element has the form <name [attributes] | [contents]>.  Attributes have
>   the form <name | value>.  Internal whitespace is only for readability.
> 
> XML                             Enamel (NML)            CL
> <foo/>                          <foo>                   (foo)
> <foo bar="zot"/>                <foo <bar|zot>>         (foo (bar "zot"))
> <foo>zot</foo>                  <foo|zot>               (foo "zot")
> <foo bar="zot">quux</foo>       <foo <bar|zot> |quux>   (foo (bar "zot") "quux")
> <foo>Hey, &quux;!</foo>         <foo|Hey, [quux]!>      (foo "Hey, " quux "!")
> <foo>AT&amp;T you will</foo>    <foo|AT&T you will>     (foo "AT&T you will")
> <foo><bar>zot</bar></foo>       <foo|<bar|zot>>         (foo (bar "zot"))
> 
>   In my personal view, Lisp "markup" has the disadvantage of needing lots
>   of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx
>   is always symbolic and yyy is always a string of characters subject to
>   interpretation by whatever the symbolic part instructs in context.

I'd like to see a side-by-side elaboration of this problem to better 
understand it.   

>   Still at "internal use" stage, I plan to publish some stuff about Enamel
>   not too far into the future.

Good.  I'd hate for it to be "lost" as merely a post here, though I think
it's fun that you felt comfortable in sharing your thoughts.

From: Erik Naggum
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 20:03:19 +0000
Message-ID: <3207672197075433@naggum.net>

* Kent M Pitman <······@world.std.com>
> Certainly what you say is undeniably true in terms of practice, and I'd even
> give you that the notational distinction is not worth the mechanism, but
> is there somewhere that the language actually forces this "role" relationship?

  No, there is nothing that requires there to be element attributes as a
  distinct concept from element contents.  There are, however, a number of
  practical things that follow from making that arbitrary distinction which
  can look like rationales, but if you ask yourself "why can it not be a
  subelement", there are no real answers, only appeals to the idea that
  there somehow _has_ to be a distinction.  It took me years to figure out
  that the whole attribute idea is completely vacuous, and I worked with
  the creator of SGML himself for several years on several SGML-related
  standards and projects.  I started writing "A conceptual introduction to
  SGML" back in 1994, but as I had pained my way through five chapters, I
  had to realize that it was all wrong.  There was a basic design mistake
  in the whole language framework.  That mistake is, simply put: "what
  is good enough for the users of the language is not good enough for its
  creators".  Each and every level of "containership" in SGML has its own
  syntax, optimized for the task.  Each and every level has a different
  syntax for "the writing on the box" as opposed to "the contents of the
  box".  This follows from a very simple, yet amazingly elusive principle
  in its design: Meta-data is conceptually incompatible with data.  This is
  in fact wrong.  Meta-data is only data viewed from a different angle, and
  vice versa.  SGML forces you to remain loyal to your chosen angle of view.

> I wrote a package in Java at a prior employer which automatically
> generated XML representations for classes as elements based on Java
> metadata, and the tack I took was not that the XML attributes contain
> meta-data and the contents data but rather that the XML attributes
> contain atomic data and the contents contain compound data, since this is
> IN FACT what the real distinction is.

  The key to understanding this is that there is no _one_ real distinction.
  There are in fact any number of "real distinctions".  You just found one
  way to wrap your world in the attribute/contents dichotomy because it was
  there.  What would you do if it was not?  What would you do if you had
  only sub-elements?  Would you have _invented_ attributes?  I do not think
  anyone would have, because using sub-elements exacts no higher cost than
  using attributes.

> In effect, what I got out of this was a description that allowed two
> syntaxes: an easy syntax for easy things, and a hard syntax for hard
> things.

  I propose an easier syntax for the harder things and a slightly harder
  syntax for the easier things so they do not impose any easy-vs-hard
  misconceptions on the user and designer.  By making both things cost the
  same, the decision to use an attribute or a sub-element becomes a very
  different choice.

> But what I'm really wondering is whether SGML has some "intended use"
> spec that tells you that you have to put meta-info in the "car" of the
> "form", and info in the "cdr".  I thought the use of these containers was
> semantics-free.

  The intended use has less to do with it than the notion that you can
  define what is meta-information and what is information at the time you
  want to decide whether something goes in an attribute or a sub-element.
  My argument is that this is impossible.  Whether it is meta-information
  or information is a reflection of the actual use, not the intended use.

  However, given that the mechanism was created, and I will argue that it
  was not so much created as it was never thought possible to be any other
  way, it was used to define several language properties.  "Now that we
  have this, would it not also be nice to have that."  This means that
  several of the attribute types grew very far apart from the contents of
  sub-elements and you sort of "had" to use them as attributes, but only
  sort of, because the application can and does define the semantics of
  everything, and if you want ID and IDREF, you can make the same choice as
  you would in Common Lisp to use symbols or a hash table of strings.

> >   I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools
> >   use and produce, because it makes XML just as evil in Lisp as it was in
> >   XML to begin with, and we have gained absolutely nothing in either power
> >   of processing or in abstraction, which is so very un-Lisp-like.
> > 
> > <foo bar="zot">quux</foo>
> > 
> >   should be read as
> > 
> > (foo (bar "zot") "quux")
> > 
> 
> Maybe. Macsyma used a similar notation for years (though without the restriction
> on container-ness).  I don't think the answer is to change to do the rewrite
> you suggest.

  I cannot follow you here.  I am not suggesting a rewrite.  I suggest that
  there is _no_ distinction between attribute and sub-element contents.
  What I am trying to communicate is so emphatically _NOT_ syntax that we
  will have a severe communications problem if this is not understood.  The
  syntax has a function, and I am challenging the _function_ of the syntax
  that is believed by many people to support a concept I _also_ challenge.
  What do you gain from the attribute-vs-contents dichotomy?  Why do you
  need it?  What does it do for you?  What would you have done if it were
  not there?  What choices and design decisions went into attributes that
  would go into contents if you did not have attributes?

> I don't understand why it's not natural to add the
> following as legal syntaxes:
> 
>  <foo bar=<zot/>>
> 
> or
> 
>  <foo bar=<string>zot</string>>quux</foo>

  Imagine that all attributes are in fact sub-elements, and this problem
  just goes away.  Please, discard the concept of attributes.  They no
  longer exist.  What used to be called "attributes" are only sub-elements
  with special treatment and a whole bunch of arbitrary restrictions, one
  of which is lack of internal structure (except insofar as defined by the
  NOTATION attribute of attributes in SGML).

> This would keep people from feeling the attribute list was a shorthand
> area and would also allow the storing of complex meta-data.

  But that is not my goal.  My goal is to get rid of the idea that there is
  a distinction that can be made once and for all, and prematurely at that,
  that some information is meta-data and some information is data.  The
  core philosophical mistake in SGML is that you can specify these things
  before you know them.  SGML is great for after-the-fact description of
  structures you already know how to deal with perfectly.  It absolutely
  sucks for structures that are in any way yet to be defined.  This is
  _because_ it is impossible to define what is considered meta-information
  and what is considered information before you actually have a full-blown
  software application that is hard to change your mind about.  SGML was
  supposedly designed to free data from the vagaries of software, but when
  it adopted the attribute-content dichotomy, it dove right into dependency
  on the software design process instead of the information design process.

> Do you know what the reason was that recursive structures were not
> allowed in this position in XML?

  Yes, as a matter of fact, I do.  Recursive structures are in fact allowed
  in attribute values, provided that your application processes them and not
  the SGML/XML parser.  Back in the SGML days, the NOTATION attribute of
  both elements and attribute values was designed as an "escape" to the
  application to let some other syntax processor deal with the string of
  characters.  (Please understand that everything SGML/XML is a string of
  characters.  There are no _values_.  Imposing valuedom on strings is the
  kind of semantics that SGML/XML specifically does _not_ support.)

> Or perhaps it was the fact that the "real world" substitutes for "parsed
> structure" things like that weird assembly code like notation which looks
> like
> 
>  (A
>  AHREF=foo.html
>  -Text
>  )A
> 
> Perhaps someone was just being uncreative about how a compound-structure
> could be offered as an attribute.

  No, they never actually thought of it that way.  You have to understand
  and appreciate that the design process for SGML was such that some people
  had a very clear picture of the meta-information-vs-information dichotomy
  and that it never occurred to anyone that meta-information had exactly
  the same properties as information.

  Whoever first decided to define HTML in such a way that unknown elements
  should be displayed suffered from exactly the same problem.  As a sorry
  consequence, we have elements that have to contain _comments_ that are
  the real contents because that somebody did not foresee the need to have
  meta-information in contents.  I argue that this is a result of "getting"
  the invalid meta-information/information dichotomy.  If that person had
  not been bitten by the false idea that meta-information is fundamentally
  different from information, he would have realized that there would be a
  need to use element contents for meta-information, as well.

> Good.  I'd hate for it to be "lost" as merely a post here, though I think
> it's fun that you felt comfortable in sharing your thoughts.

  Well, it took ten years of discomfort with the "attribute" concept before
  I went back to examine the genesis of the various forms of attributes and
  persisted in asking the question "could it not have been done with
  sub-elements", and finally found that the reason it could not was that
  somebody did not _want_ it to be done with sub-elements, and that the
  root cause of this was a fundamental misunderstanding of the relationship
  between information and meta-information.  Just like Plato and Aristotle
  agreed that ideas and concepts were somehow "inherent" in the things we
  saw and not a property of the person who observed and organized them in
  his own mind, SGML embodies the false premise that structuring has some
  inherent qualities and processing that structure should reflect its
  inherent qualities.  The result is that the processing defines the
  structure.  If there is a mismatch between the two, the result is a very
  painful and elaborate processing, and it can be solved very simply by
  removing the attribute/sub-contents dichotomy, because once we do that,
  we return to first principles and can move forward with the same
  knowledge and experience that created the attributes, but now we can do
  it with sub-elements, instead, and I can promise you that once you start
  off on that road, the least of your worries will be recursive structure
  in attribute values.

///

From: Ray Blaak
Subject: Re: XML and lisp
Date: Fri, 24 Aug 2001 21:54:28 +0000
Message-ID: <upu9l9l9n.fsf@infomatch.com>

Erik Naggum <····@naggum.net> writes:
>   The key to understanding this is that there is no _one_ real distinction.
>   There are in fact any number of "real distinctions".  You just found one
>   way to wrap your world in the attribute/contents dichotomy because it was
>   there.

I fully agree with Erik here, and have myself implemented S-expression file
formats that in fact collapsed attributes to be just child elements in the
same way.

The only useful information was the name of some data, and the assumed or
explicit type of the data value. It made no difference in terms of processing
if the data was logically an "attribute" or an "element" -- the problem of
extraction is exactly the same in both cases.

That there is a difference with XML is only due to its artificial distinction
of attributes vs elements.

The only useful distinction I have found for attributes vs elements was
aesthetic: how did the element look to a human reader (i.e. me) of the XML
file? Whether I would choose a simple vs compound approach depended solely on
my mental picture of the data in question, e.g.

  <foo name="joe" size="big"/>

vs

  <foo>
    <name>joe</name>
    <size>big</size>
  </foo>

In an S-expression format however, it doesn't matter, and the aesthetic
distinction is only:

  (foo (name joe) (size big))

vs 

  (foo
    (name joe)
    (size big))
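
As a small Python sketch of that collapse (the function name collapse is
made up for illustration, and nested lists stand in for S-expressions):
attributes are simply emitted as leading child entries, so both XML
spellings of foo above reduce to the same structure. Text that follows a
child element is ignored here, which is a simplification.

```python
import xml.etree.ElementTree as ET

def collapse(elem):
    """Turn an element into [name, child...], treating attributes as children."""
    node = [elem.tag]
    for name, value in elem.attrib.items():
        node.append([name, value])
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(collapse(child))
    return node

as_attributes = collapse(ET.fromstring('<foo name="joe" size="big"/>'))
as_elements = collapse(ET.fromstring(
    "<foo>\n  <name>joe</name>\n  <size>big</size>\n</foo>"))
```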

>   Recursive structures are in fact allowed in attribute values, provided
>   that your application processes them and not the SGML/XML parser.

<rant>

As a separate topic, I just *hate* when people encode complicated data into
attributes, forcing applications to solve yet another parsing problem. The whole
point of something like XML is to have a standard encoding structure. The
"parsing" problem is supposed to be solved once, and only the semantic
interpretation should remain.

E.g. 

<this_bad choice="a|b|c"/>

<this_better>
  <choice>
    <alternative>a</alternative>
    <alternative>b</alternative>
    <alternative>c</alternative>
  </choice>
</this_better>

</rant>
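
To make the extra parsing step concrete, here is a short Python sketch
using the standard xml.etree.ElementTree module. With the attribute
encoding, the application has to know about and split the "|" separator
itself; with the element encoding, the XML parser has already delivered
the alternatives.

```python
import xml.etree.ElementTree as ET

bad = ET.fromstring('<this_bad choice="a|b|c"/>')
# The "|" convention is private to this one attribute, so the
# application must parse the string a second time.
from_attribute = bad.get("choice").split("|")

better = ET.fromstring(
    "<this_better><choice>"
    "<alternative>a</alternative>"
    "<alternative>b</alternative>"
    "<alternative>c</alternative>"
    "</choice></this_better>"
)
# No second parser: the structure is already in the tree.
from_elements = [alt.text for alt in better.iter("alternative")]
```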

Of course, S-expressions are far preferable. The only "cool" things
about XML that I like are the ability to specify character encodings (ASCII vs
Unicode, etc.) and the schema namespaces business, such that one can mix
"tags" from semantically different spaces.

Mind you, the details of Schemas are overly complicated and gross to use, but
they are better than DTDs which should just die die die. DTDs are a lesson in
why a separate language should *not* be invented.

-- 
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@infomatch.com                            The Rhythm has my soul.

From: Bob Bane
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 16:11:24 +0000
Message-ID: <3B852B2C.63397A1E@removeme.gst.com>

I've been using this as a .signature line on Slashdot for a while:

	To a Lisp hacker, XML is S-expressions in drag.

-- 
Remove obvious stuff to e-mail me.
Bob Bane

From: Kaz Kylheku
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 16:57:55 +0000
Message-ID: <nCah7.95612$B37.2156813@news1.rdc1.bc.home.com>

In article <·················@removeme.gst.com>, Bob Bane wrote:
>I've been using this as a .signature line on Slashdot for a while:
>
>	To a Lisp hacker, XML is S-expressions in drag.

To every other hacker, XML is just a drag, period.

From: Kaz Kylheku
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 15:10:43 +0000
Message-ID: <T19h7.95203$B37.2152414@news1.rdc1.bc.home.com>

In article <··············@scumbag.ecs.soton.ac.uk>, Jacek Generowicz wrote:
>I've been doing my best to ignore XML thus far, but repeatedly
>encountering comparisons of XML to lisp has piqued my interest.

Comparisons between XML and Lisp as a whole are meaningless.
Only comparisons between XML and Lisp as a data representation are
meaningful. XML is not a programming language, it is merely a syntax
for data representation which squanders bandwidth, memory and processing
time. Lisp as a data representation is more frugal. It's close to being
as compact as you can make a notation for structured data while remaining
in readable plain text.
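
A rough way to check the compactness claim on a toy record (an illustration
added here, not a benchmark from the original post). The two strings and the
helper encoded_size are assumptions chosen for the example; the sketch
measures encoded size only and says nothing about memory or processing time.

```python
# The same record in both notations, written without optional whitespace in the XML.
XML_TEXT = "<foo><name>joe</name><size>big</size></foo>"
SEXPR_TEXT = "(foo (name joe) (size big))"

def encoded_size(text, encoding="utf-8"):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

if __name__ == "__main__":
    print("XML bytes:         ", encoded_size(XML_TEXT))
    print("S-expression bytes:", encoded_size(SEXPR_TEXT))
```

The S-expression comes out shorter here mainly because each XML closing tag
repeats the element name, where the S-expression spends a single ")".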

From: Graham Ward
Subject: Re: XML and lisp
Date: Thu, 23 Aug 2001 22:06:01 +0000
Message-ID: <877kvuxwhi.fsf@albertine.gorgeous.org>

Jacek Generowicz <···@ecs.soton.ac.uk> writes:

> I've been doing my best to ignore XML thus far, but repeatedly
> encountering comparisons of XML to lisp has piqued my interest. I am
> wondering whether I can advance my understanding of lisp by learning
> about its relation to XML.
> 
> Can you recommend any books or URLs which could help me to learn
> about XML, with the aim of being able to discuss intelligently the
> relative merits of the two.


http://www-formal.stanford.edu/jmc/cbcl.html


g



> Jacek