From: Stanley Knutson
Subject: More on XML name representation in Lisp
Date: 
Message-ID: <abhbmtosbmq1mn6ma3javuvnur6de87nuj@4ax.com>
On Mon, 09 Jul 2001 22:21:18 +0200, james anderson
<··············@setf.de> wrote on 8-jul-2001: 
[Entire thread on "XML names" deleted; search for "Re: XML names and
lisp symbols (was Re: package system history)" in deja news.]

I read the thread referenced above about XML namespaces.
After reading that thread, I'm not sure what is the "best" way to
represent XML element names and namespaces in a Lisp program.

I felt it worth summarizing my thinking, and suggesting an
implementation approach.  I am interested in suggestions for
improving the implementation, and for how to deal with the mess that
W3C has left us.

Now I just re-read http://www.w3.org/TR/REC-xml-names/
It seems to indicate the following axioms:
1) XML namespaces are defined by a URI.
2) XML namespaces have one or more prefixes that are used in the
actual document.
3) Namespace prefixes are scoped by the defining element.
This means a particular namespace prefix is not globally unique
and is not necessarily even unique within one document.
[i.e., the name "ns:lcl" could be interpreted differently in two
locations within one document due to inclusion or binding of a
namespace prefix].
4) the namespace xml is "builtin" and thus _is_ global.  xmlns prefix
is used to adjust namespace bindings
5) There may be a "default namespace" which is provided either
external to the document, or within the document via its
DTD/Schema.
There can also be "implicitly defined" prefixes external to the
document [for example, SOAP assumes xsi and xsd prefixes are defined]
6) The default namespace can be explicitly set to an empty string,
meaning "no default" 
7) Attribute names don't use the default namespace: they are either
explicitly qualified or are in the "no default" namespace.
8) The prefix for an element name can be defined by an xmlns attribute
within that element.
 Example: <m:GetLastTradePriceResponse xmlns:m="Some-URI">
This means the creation of an element name needs to be deferred until
after the attributes have been read [even in a non-validating
parser!].
BTW: I think this is a particularly ugly feature of XML namespaces.
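
A minimal sketch of what axioms 3, 7 and 8 imply for a parser: the
xmlns attributes of a start tag are scanned first, and only then are
the element and attribute names expanded.  All function names here
are hypothetical, bindings are kept in a plain alist, and the caller
is assumed to seed the outer bindings (for example with the builtin
xml prefix).

```lisp
;; Hypothetical helpers; none of these names come from an existing library.
(defun split-qname (qname)
  "Return the prefix (or NIL) and the local part of QNAME as two values."
  (let ((colon (position #\: qname)))
    (if colon
        (values (subseq qname 0 colon) (subseq qname (1+ colon)))
        (values nil qname))))

(defun xmlns-declaration-p (attr-name)
  "True for \"xmlns\" and for names of the form \"xmlns:prefix\"."
  (or (string= attr-name "xmlns")
      (and (> (length attr-name) 6)
           (string= "xmlns:" attr-name :end2 6))))

(defun extend-bindings (attributes bindings)
  "Push the xmlns declarations found in ATTRIBUTES onto BINDINGS.
BINDINGS is an alist of (prefix . uri); the default namespace uses prefix \"\"."
  (dolist (attr attributes bindings)
    (destructuring-bind (name . value) attr
      (when (xmlns-declaration-p name)
        (push (cons (if (string= name "xmlns") "" (subseq name 6)) value)
              bindings)))))

(defun expand-name (qname bindings &key attribute)
  "Return (uri . local-name) for QNAME; an empty URI means no namespace.
Unprefixed attribute names do not pick up the default namespace (axiom 7)."
  (multiple-value-bind (prefix local) (split-qname qname)
    (let ((binding (cond (prefix (or (assoc prefix bindings :test #'string=)
                                     (error "Unbound prefix ~S" prefix)))
                         (attribute nil)
                         (t (assoc "" bindings :test #'string=)))))
      (cons (if binding (cdr binding) "") local))))

(defun resolve-start-tag (tag-name attributes outer-bindings)
  "Return the expanded element name, expanded attributes, and new bindings."
  (let ((bindings (extend-bindings attributes outer-bindings)))
    (values (expand-name tag-name bindings)
            (loop for (name . value) in attributes
                  unless (xmlns-declaration-p name)
                    collect (cons (expand-name name bindings :attribute t)
                                  value))
            bindings)))
```

The third return value is what a parser would pass down as the outer
bindings for child elements, which is one way to get the "namespace
stack" behaviour without an explicit stack object.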

These are the kinds of uses for the "xml name" within a lisp
program:
1) mapping of element names to various actions
[classes to store them, function names, etc.].  To do this, an object
that can be compared via "eq" is desirable, so as to allow use of case
statements as well as eq-type hashtables.
2) If a "DOM-like" structure is created by reading, it is desirable
to be able to write it back out again and reuse the same abbreviations
that were used when it was read.
3) Interpretation of a "DOM-like" structure may require knowing the
URI that is associated with a particular namespace prefix.
4) Names with "no default namespace" have to be kept unique from those
with a namespace.  This is especially important for attribute names.
5) Attribute names are always disjoint from element names in
interpretation

Some observations:
1) Using lisp packages for namespaces and symbols for names is
not a particularly good idea -- doing so loses the associated URI.
This can be managed by creating ugly package names [with the full
URI], but I don't like that approach.
2) The binding of namespaces (and the default namespace) by element
requires some kind of "namespace stack" during parsing.
Any "DOM-like" structure has to preserve this for proper output.
3) The desire for eq-ness of prefix:local for comparison is a problem
because the meaning of "n1:a" is not unique; only the associated
URIs make names unique.
4) While it is possible for "n1:a" and "n2:a" to refer to the same
name such alias usage is pretty rare.  [this is more of a belief than
observation.  It simplifies implementation]

As a result, I reach the following conclusions:

A) A "namespace" is a URI.
B) A namespace prefix is a binding of a namespace [URI] with a prefix.
These bindings are not really preserved after the document is read
since the "expanded name" captures the binding.
C) A "name" is a tuple of [namespace, prefix, local-name].
The prefix is kept for convenience in printing, since it is not part
of uniqueness.  This is the 'expanded name' as defined in the
XML-namespace spec.
D) It is convenient to have eq representation [singleton objects] for
a name. To do this, there has to be a way to create eq
objects directly.
This is why people have preferred symbols as the XML names: the eq
object is easily created.
E) It is convenient to have a "standard prefix" for a namespace
so as to allow prettier printing  (example "xsi:integer", not
|http://www.w3.org/1999/XMLSchema-instance|:|integer|)
F) It is probably ok to keep uniqueness based on the triple
of [namespace-uri, prefix, local-name], rather than just
[namespace-uri, local-name].
This allows eq to be used in most cases if a name is a clos object.
(the function name-equal can be provided to compare names without the
prefix).
Note: I'd specialize the "equal" function if this was java :)
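
A minimal sketch of conclusions A, D and F, assuming CLOS classes and
equal hash tables for interning.  The class, accessor and function
names (other than name-equal, mentioned above) are hypothetical.

```lisp
;; Hypothetical names throughout; this is not taken from an existing library.
(defvar *namespaces* (make-hash-table :test #'equal)
  "Maps a namespace URI string to its XML-NAMESPACE object.")

(defclass xml-namespace ()
  ((uri   :initarg :uri :reader namespace-uri)
   (names :initform (make-hash-table :test #'equal) :reader namespace-names)))

(defclass xml-name ()
  ((namespace :initarg :namespace :reader name-namespace)
   (prefix    :initarg :prefix    :reader name-prefix)
   (local     :initarg :local     :reader name-local)))

(defun find-namespace (uri)
  "Return the unique namespace object for URI, creating it on first use."
  (or (gethash uri *namespaces*)
      (setf (gethash uri *namespaces*)
            (make-instance 'xml-namespace :uri uri))))

(defun intern-name (uri prefix local)
  "Return the unique name object for the triple [URI, PREFIX, LOCAL]."
  (let* ((namespace (find-namespace uri))
         (key (cons prefix local))
         (table (namespace-names namespace)))
    (or (gethash key table)
        (setf (gethash key table)
              (make-instance 'xml-name :namespace namespace
                                       :prefix prefix
                                       :local local)))))

(defun name-equal (a b)
  "Compare two names by namespace and local part, ignoring the prefix."
  (and (eq (name-namespace a) (name-namespace b))
       (string= (name-local a) (name-local b))))
```

With this layout, two reads of "n1:a" under the same URI yield the
same object, so eq works; "n1:a" and "n2:a" under the same URI are
distinct objects that still satisfy name-equal.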

So here is an implementation suggestion:
1) Element names and attribute names can be distinct things sometimes.
2) Attribute names can be keywords, UNLESS they are qualified.
3) Qualified attribute names must keep their reference to the
namespace URI, so they need to be objects of the same type as element
names.
4) A namespace class [as a CLOS object].  It retains the URI, and
holds the names within that namespace.
5) A name class [as a CLOS object].  This is interned [as a singleton]
by the containing namespace class, so that "a" _is_ kept as a unique
object within the namespace (if there is only one prefix).
The printing of this object depends on the "current ns context"
to determine if the prefix or the full URI must be printed for the
name.
6) A "namespace context" that keeps associations of prefixes and
namespaces AFTER reading.
The "current ns context" is an instance of this class -- it is
referenced to determine if "n1:a" can be printed for the name that is
really "http://myns.com/":"a".
This context is bound by a global variable, so it is possible to have
different contexts available simultaneously if needed [hopefully such
cases are rare]
7) A "namespace binding manager" that can be used during parsing so as
to maintain the current setting for default namespace and which
maintains the "binding stack" that maps prefixes to namespaces.  
8) The "no default" namespace is simplest to do as an empty string for
element names.
9) There needs to be a way to create an "XML name" object that can be
used in writing code.
This requires some further experimentation with the "current ns
context" that manages "current prefixes".
A proposal is this kind of syntax:
(eval-when (:compile-toplevel :load-toplevel)
  (define-ns-prefix "xsi" "http://www.w3.org/1999/XMLSchema-instance")
  (defconstant xsi-type #.(ns-intern :xsi "type")))
Then xsi-type is a constant which prints as EITHER:
	#<XNAME xsi:type>
or if there is a different definition of :xsi prefix
	#<XNAME xsi:type "http://www.w3.org/1999/XMLSchema-instance">
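
A minimal sketch of how the two printed forms could come about,
assuming the "current ns context" is a hash table from prefix to URI.
Only define-ns-prefix, ns-intern and the XNAME printed form come from
the proposal; everything else is hypothetical.  The sketch creates the
name at load time with defvar rather than with #., since read-time
evaluation runs while the enclosing form is still being read, i.e.
before a define-ns-prefix in that same form has been evaluated.

```lisp
;; Hypothetical sketch; only DEFINE-NS-PREFIX and NS-INTERN are proposed names.
(defvar *ns-context* (make-hash-table :test #'equal)
  "Current ns context: maps a prefix string to a namespace URI.")

(defvar *xnames* (make-hash-table :test #'equal)
  "Interning table keyed on the list (uri prefix local).")

(defclass xname ()
  ((uri    :initarg :uri    :reader xname-uri)
   (prefix :initarg :prefix :reader xname-prefix)
   (local  :initarg :local  :reader xname-local)))

(defun prefix-string (prefix)
  "Strings are used as-is; symbols such as :XSI are downcased (an assumption)."
  (if (symbolp prefix) (string-downcase prefix) prefix))

(defun define-ns-prefix (prefix uri)
  "Bind PREFIX to URI in the current ns context."
  (setf (gethash (prefix-string prefix) *ns-context*) uri))

(defun ns-intern (prefix local)
  "Return the unique XNAME for PREFIX and LOCAL in the current ns context."
  (let* ((prefix (prefix-string prefix))
         (uri (or (gethash prefix *ns-context*)
                  (error "No namespace bound to prefix ~S" prefix)))
         (key (list uri prefix local)))
    (or (gethash key *xnames*)
        (setf (gethash key *xnames*)
              (make-instance 'xname :uri uri :prefix prefix :local local)))))

(defmethod print-object ((name xname) stream)
  ;; Print the URI only when the prefix no longer maps to it in *NS-CONTEXT*.
  (print-unreadable-object (name stream)
    (format stream "XNAME ~A:~A" (xname-prefix name) (xname-local name))
    (unless (equal (gethash (xname-prefix name) *ns-context*)
                   (xname-uri name))
      (format stream " ~S" (xname-uri name)))))

(define-ns-prefix "xsi" "http://www.w3.org/1999/XMLSchema-instance")
(defvar *xsi-type* (ns-intern :xsi "type"))
```

Rebinding *ns-context* to a table where "xsi" means something else
makes the same object print with its URI attached.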

It would be possible to introduce a reader macro for names as well.
However, my experience has been this complicates compilation a lot
since I don't like defining a "project global" reader macro.

I'd appreciate any comments on this set of implementation decisions.
I'm starting on this "XML and SOAP in Lisp" project this week.

- Stanley Knutson
stanley @ [nospam] ktiworld.com

From: Sunil Mishra
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B66471B.4010506@notmyemail.com>
Stanley Knutson wrote:

> 
> Now I just re-read http://www.w3.org/TR/REC-xml-names/
> It seems to indicate the following axioms:
> 1) XML namespaces are defined by a URI.
> 2) XML namespaces have one or more prefixes that are used in the
> actual document.
> 3) Namespace prefixes are scoped by the defining element.
> This means a particular namespace prefix is not globally unique
> and is not necessarily even unique within one document.
> [i.e., the name "ns:lcl" could be interpreted differently in two
> locations within one document due to inclusion or binding of a
> namespace prefix].
> 4) the namespace xml is "builtin" and thus _is_ global.  xmlns prefix
> is used to adjust namespace bindings
> 5) There may be a "default namespace" which is provided either
> external to to the document, or within the document via its
> DTD/Schema.
> There are can also be "implicitly defined" prefixes external to the
> document [for example, SOAP assumes xsi and xsd prefixes are defined]
> 6) The default namespace can be explicitly set to an empty string,
> meaning "no default" 
> 7) Attribute names don't use the default namespace: they are either
> explicitly qualified or are in the "no default" namespace.
I'm quite certain that the attributes take on the namespace of the 
container element as their default. Take a look at the first example in 
section 5 of the namespace specification, specifically the <html:a ...> 
tag. Would have been nice if they had actually made this detail a little 
more obvious.

> 8) The prefix for an element name can be defined by an xmlns attribute
> within that element.
>  Example: <m:GetLastTradePriceResponse xmlns:m="Some-URI">
> This means the creation of an element name needs to be deferred until
> after the attributes have been read [even in a non-validating
> parser!].
> BTW: I think this is a particularly ugly feature of XML namespaces.

Worse, it means that any implementation that attempts to parse and 
assign namespaces in a single pass will turn out to be horribly complex.

> 
> These are these kinds of uses for the "xml name" within a lisp
> program:
> 1) mapping of element names to various actions
> [classes to store them, function names, etc.].  To do this, an object
> that can be compared via "eq" is desirable, so as to allow use of case
> statements as well as eq-type hashtables.

I personally consider comparison via eq a necessity. There is a sense of 
identity here that a weaker predicate would not be able to satisfy. It 
is not something I can satisfactorily explain though.

> 2) If a "DOM-like" structure is created by reading, it is desirable
> to be able to write it back out again and reuse the same abbreviations
> that were used when it was read.
> 3) Interpretation of a "DOM-like" structure may require knowing the
> URI that is associated with a particular namespace prefix.

There is no "may" here. I don't think you can get a reasonable DOM 
representation without being able to compare elements across different 
parts of the document correctly.

> 4) Names with "no default namespace" have to be kept unique from those
> with a namespace.  This is especially important for attribute names.

See earlier comment about attributes.

> 5) Attribute names are always disjoint from element names in
> interpretation

What gave you this impression? A name is a name. I don't see anything in 
the standard that keeps you from using the same name for an element and 
an attribute. Having them in the same element might be improper though, 
from the point of view of SGML semantics. SGML considers the element 
name to be the value of an implicit attribute, and XML clearly prohibits 
two attributes with the same name in a single open tag form. Wonder how 
other parsers deal with this detail...

> Some observations:
> 1) Using lisp packages for namespaces and symbols for names is 
> not a particularly good idea -- doing so loses the associated URI
> This can be managed by creating ugly package names [with the full
> URI], but I don't like that approach.

I consider this practice to be analogous to sticking data in symbol 
property lists, just because it happens to be convenient. Managing 
explicit data structures, IMHO, makes a program more transparent and 
maintainable. If namespaces are equated to packages, attaching data to a 
namespace would necessarily require hiding it in non-obvious ways in 
internal lisp data structures. Ugh.

> 2) The binding of namespaces (and the default namespace) by element
> requires some kind of "namespace stack" during parsing.
> Any "DOM-like" structure has to preserve this for proper output.
> 3) The desire for eq-ness of prefix:local for comparison is a problem
> since the meaining of "n1:a" is not unique, since only the associated
> URI's make them unique.
> 4) While it is possible for "n1:a" and "n2:a" to refer to the same
> name such alias usage is pretty rare.  [this is more of a belief than
> observation.  It simplifies implementation]
> 
> As a result, I reach the following conclusions:
> 
> A) A "namespace" is a URI.
> B) A namespace prefix is a binding of a namespace [URI] with a prefix.
> These bindings are not really preserved after the document is read
> since the "expanded name" captures the binding.
> C) A "name" is a tuple of [namespace, prefix, local-name].  
> The prefix is kept for convenience in printing, since is not part of
> uniquess.  This is the 'expanded names' as defined in XML-namespace
> spec.

I would disagree with this. A name is merely the tuple [namespace, 
local-name]. If anything, the particular prefix used with the name is 
something that an individual document (or document fragment) knows.

> D) It is convenient to have eq representation [singleton objects] for
> a name. To do this, there has to be a way to create eq
> objectsdirectly.
> This is why people have preferred symbols as the XML names: the eq
> object is easily created.
> E) It is convenient to have a "standard prefix" for a namespace
> so as to allow prettier printing  (example "xsi:integer", not
> |http://www.w3.org/1999/XMLSchema-instance|:|integer|)
> F) It is probably ok to keep uniqueness based on the triple
> of [namespace-uri, prefix, local-name], rather than just
> [namespace-uri, local-name].

See above. I would keep the nice, readable prefix associated with the 
namespace rather than with an individual name. This allows for 
consistent naming of elements when developing an application, while 
keeping the names easy to remember, etc.

> This allows eq to be used in most cases if a name is a clos object.
> (the function name-equal can be provided to compare names without the
> prefix).
> Note: I'd specialize the "equal" function if this was java :)
> 
> So here is an implementation suggestion:
> 1) Element names and attribute names can be distinct things sometimes.

I still don't see a reason for this.

> 2) Attribute names can be keywords, UNLESS they are qualified.

Why handle a special case? And see above for attribute scoping rules.

> 3) Qualified attribute names must be keep their reference to the
> namespace URI, so they need to be objects of the same type as element
> names.
> 4) A namespace class [as a CLOS object].  It retains the URI, and
> holds the names within that namespace.
> 5) A name class [as a CLOS object].  This is interned [as a singleton]
> by the containing namespace class, so that "a" _is_ keep as a unique
> object within the namespace (if there is only one prefix).  
> The printing of this object depends on the "current ns context"
> to determine if the prefix or the full URI must be printed for the
> name.
> 6) A "namespace context" that keeps associations of prefixes and
> namespaces AFTER reading.
> The "current ns context" is an instance of this class -- it is
> referenced to determine if "n1:a" can be printed for the name that is
> really "http://myns.com/":"a".
> This context is bound by a global variable, so it is possible to have
> different contexts available simultaneously if needed [hopefully such
> cases are rare]
> 7) A "namespace binding manager" that can be used during parsing so as
> to maintain the current setting for default namespace and which
> maintains the "binding stack" that maps prefixes to namespaces.  
> 8) The "no default" namespace is simplest to do as an empty string for
> element names.
> 9) There needs to be a way to create an "XML name" object that can be
> used in writing code.
> This requires some further experimentation with the "current ns
> context" that manages "current prefixes".
> A proposal is this kind of syntax:
> (eval-when (:compile-toplevel :load-toplevel)
>   (define-ns-prefix "xsi" "http://www.w3.org/1999/XMLSchema-instance")
>   (defconstant xsi-type #.(ns-intern :xsi "type")))
> Then xsi-type is a constant which prints as EITHER:
> 	#<XNAME xsi:type>
> or if there is a different definition of :xsi prefix
> 	#<XNAME xsi:type "http://www.w3.org/1999/XMLSchema-instance">
> 
> It would be possible to introduce a reader macro for names as well.
> However, my experience has been this complicates compilation a lot
> since I don't like defining a "project global" reader macro.

I have already implemented a reader macro oriented approach as part of 
an XML processing package. I'm no longer developing the package, but it's 
under LGPL. You should be able to use pieces of it as you wish. Have a 
look at http://sourceforge.net/projects/lsp. It uses structures for 
names and namespaces, has a reader macro, a meta-circular rule engine 
that is capable of matching XML structures based on patterns defined in 
the reader macro syntax. The backend parser is expat (a standard C based 
XML parser), so there is a bit of foreign binding ugliness. But with a 
little work you should be able to replace the parser, if so needed. The 
printer and reader definitely do not follow XML rules correctly. But 
then it's only version 0.1.4 :-)

Regards,

Sunil
From: Stanley Knutson
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <60tdmtkt76umtv3ao6efvdcr352irnkaph@4ax.com>
On Mon, 30 Jul 2001 22:50:19 -0700, Sunil Mishra
<·······@notmyemail.com> wrote:

>Stanley Knutson wrote:
>
>> 
>> Now I just re-read http://www.w3.org/TR/REC-xml-names/
>> 5) Attribute names are always disjoint from element names in
>> interpretation
>
>What gave you this impression? A name is a name. I don't see anything in 
>the standard that keeps you from using the same name for an element and 
>an attribute. Having them in the same element might be improper though, 
>from the point of view of SGML semantics. SGML considers the element 
>name to be the value of an implicit attribute, and XML clearly prohibits 
>two attributes with the same name in a single open tag form. Wonder how 
>other parsers deal with this detail...
>

Thanks for pointing this out.  A third reading of section A.3
clarifies this.

>> C) A "name" is a tuple of [namespace, prefix, local-name].  
>> The prefix is kept for convenience in printing, since is not part of
>> uniquess.  This is the 'expanded names' as defined in XML-namespace
>> spec.
>
>I would disagree with this. A name is merely the tuple [namespace, 
>local-name]. If anything, the particular prefix used with the name is 
>something that an individual document (or document fragment) knows.
>

After thinking about this more, I agree.  What I will do is to allow a
namespace to have _one_ "normal prefix".  This prefix is used for
printing [only].  Within a "namespace context" the prefixes will need
to be unique. This may force some items to print with their URI if
their preferred prefix is already taken when introduced.  Sigh ...

[in particular, it's not really possible to re-generate the document in
any other way due to other possible conflicts in prefix usage between
documents that are being combined].
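
A minimal sketch of that printing rule, assuming a namespace context
is a hash table from prefix to the URI that claimed it.  The function
names are hypothetical, and the |uri|:|local| fallback form is
borrowed from the example earlier in the thread.

```lisp
;; Hypothetical sketch; the function names are invented for illustration.
(defun make-ns-context ()
  "A namespace context: maps a prefix string to the URI that claimed it."
  (make-hash-table :test #'equal))

(defun claim-prefix (context prefix uri)
  "Try to reserve PREFIX for URI in CONTEXT.
Returns true if PREFIX is now (or was already) bound to URI."
  (let ((owner (gethash prefix context)))
    (cond ((null owner)
           (setf (gethash prefix context) uri)
           t)
          (t (string= owner uri)))))

(defun printed-name (context normal-prefix uri local)
  "Return prefix:local when the normal prefix is available, else a URI form."
  (if (claim-prefix context normal-prefix uri)
      (format nil "~A:~A" normal-prefix local)
      (format nil "|~A|:|~A|" uri local)))
```

The first namespace to ask for a prefix within a context gets it; a
later namespace with the same preferred prefix falls back to the URI
form.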

>> 2) Attribute names can be keywords, UNLESS they are qualified.
>
>Why handle a special case? And see above for attribute scoping rules.

I think I'll give up on this.  My goal was to simplify initialization
in the "common case" of creating a CLOS object for the element.
However, I don't know if it is actually very common to have
unqualified names.  Also, when reading/interpreting an XML-SOAP
message, the actual lisp object to be created might not be a CLOS
object at all [look at how arrays are handled!]


>> It would be possible to introduce a reader macro for names as well.
>> However, my experience has been this complicates compilation a lot
>> since I don't like defining a "project global" reader macro.
>
>I have already implemented a reader macro oriented approach as part of 
>an XML processing package. I'm no longer developing the package, but its 
>under LGPL. You should be able to use pieces of it as you wish. Have a 
>look at http://sourceforge.net/projects/lsp. It uses structures for 
>names and namespaces, has a reader macro, a meta-circular rule engine 
>that is capable of matching XML structures based on patterns defined in 
>the reader macro syntax. The backend parser is expat (a standard C based 
>XML parser), so there is a bit of foreign binding ugliness. But with a 
>little work you should be able to replace the parser, if so needed. The 
>printer and reader definitely do not follow XML rules correctly. But 
>then its only version 0.1.4 :-)
>

I don't think it is easy to build a lisp reader for XML that works.
The parsing and recognition of <![CDATA[ requires ability to do
multi-character lookahead, which is not easy with a normal reader
macro.  The Franz XML reader appears to handle this correctly, but it
uses its own lookahead mechanism.
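
A minimal sketch of one way to get multi-character lookahead on top of
a standard character stream, assuming a small wrapper that keeps its
own pushback list (unread-char alone only guarantees one character).
This is not how the Franz reader does it; all names are hypothetical.

```lisp
;; Hypothetical sketch; not how the Franz reader or any other parser does it.
(defstruct (lookahead-stream (:conc-name la-))
  source            ; the underlying character input stream
  (pending '()))    ; characters read ahead but not yet consumed

(defun la-read-char (la)
  "Return the next character, or NIL at end of input."
  (if (la-pending la)
      (pop (la-pending la))
      (read-char (la-source la) nil nil)))

(defun la-looking-at-p (la string)
  "True if the upcoming characters spell STRING.  Consumes nothing."
  (let ((seen '())
        (match t))
    (loop for expected across string
          while match
          do (let ((char (la-read-char la)))
               (when char (push char seen))
               (setf match (and char (char= char expected)))))
    ;; Put everything back, in order, ahead of any older pending characters.
    (setf (la-pending la) (nconc (nreverse seen) (la-pending la)))
    match))

(defun la-skip (la count)
  "Consume COUNT characters."
  (loop repeat count do (la-read-char la)))
```

A parser built on this would test (la-looking-at-p la "<![CDATA[")
before deciding how to treat a "<", then skip the nine characters of
the marker.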

- Stanley Knutson
From: Sunil Mishra
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B675FEB.4090605@notmyemail.com>
>>>It would be possible to introduce a reader macro for names as well.
>>>However, my experience has been this complicates compilation a lot
>>>since I don't like defining a "project global" reader macro.
>>>
>>I have already implemented a reader macro oriented approach as part of 
>>an XML processing package. I'm no longer developing the package, but its 
>>under LGPL. You should be able to use pieces of it as you wish. Have a 
>>look at http://sourceforge.net/projects/lsp. It uses structures for 
>>names and namespaces, has a reader macro, a meta-circular rule engine 
>>that is capable of matching XML structures based on patterns defined in 
>>the reader macro syntax. The backend parser is expat (a standard C based 
>>XML parser), so there is a bit of foreign binding ugliness. But with a 
>>little work you should be able to replace the parser, if so needed. The 
>>printer and reader definitely do not follow XML rules correctly. But 
>>then its only version 0.1.4 :-)
>>
>>
> 
> I don't think it is easy to build a lisp reader for XML that works.
> The parsing and recognition of <![CDATA[ requires ability to do
> multi-character lookahead, which is not easy with a normal reader
> macro.  The Franz XML reader appears to handle this correctly, but it
> uses its own looahead mechanism.
> 
> - Stanley Knutson
> 

Ack! Realized that this part had not come out right. The expat parser 
does a great job of parsing the XML correctly. Its efficiency is really 
quite impressive, even with the foreign binding overhead. I have been 
able to parse (through the bindings) many, many examples from XML test 
suites. The inaccuracies in the parser and printer have to do with how I 
have implemented whitespace handling on top of what expat returns. That 
part of the code resides entirely in the lisp world.

In any case, the reader macro, data structures, etc are all independent 
of the parser and printer. So in theory you should be able to pick up 
the data structures and manipulators, wrap them with your own parser and 
printer (perhaps even Franz's) and do what you wish with it.

Sunil
From: Louis Theran
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <theran-3107012306060001@10.0.1.6>
In article <················@notmyemail.com>, Sunil Mishra
<·······@notmyemail.com> wrote:


> > 8) The prefix for an element name can be defined by an xmlns attribute
> > within that element.
> >  Example: <m:GetLastTradePriceResponse xmlns:m="Some-URI">
> > This means the creation of an element name needs to be deferred until
> > after the attributes have been read [even in a non-validating
> > parser!].
> > BTW: I think this is a particularly ugly feature of XML namespaces.
> 
> Worse, it means that any implementation that attempts to parse and 
> assign namespaces in a single pass will turn out to be horribly complex.

Can you elaborate on this?  I would think that this has more to do with
the data structure you are trying to build from the XML document than
anything about deferring namespace expansion until after the attribute
list has been parsed.
 

From: Sunil Mishra
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B68130D.3060607@notmyemail.com>
Louis Theran wrote:
> In article <················@notmyemail.com>, Sunil Mishra
> <·······@notmyemail.com> wrote:
> 
> 
> 
>>>8) The prefix for an element name can be defined by an xmlns attribute
>>>within that element.
>>> Example: <m:GetLastTradePriceResponse xmlns:m="Some-URI">
>>>This means the creation of an element name needs to be deferred until
>>>after the attributes have been read [even in a non-validating
>>>parser!].
>>>BTW: I think this is a particularly ugly feature of XML namespaces.
>>>
>>Worse, it means that any implementation that attempts to parse and 
>>assign namespaces in a single pass will turn out to be horribly complex.
>>
> 
> Can you elaborate on this?  I would think that this has more to do with
> the data structure you are trying to build from the XML document than
> anything about deferring namespace expansion until after the attribute
> list has been parsed.
>  
> 
> 

The problem is that you cannot know the namespace of an open tag without 
first parsing the whole open tag. You absolutely must revisit all the 
data associated with the open tag to figure out the namespace 
assignments. Perhaps horribly complex was an exaggeration, but the 
parser would be far simpler if it had the namespace information before 
it started parsing the open tag.

Also, with the current structure, it is not possible to write a truly 
fine grain event driven (callback oriented) parser. All the data for the 
open tag must be in a single event. It is not possible to separate the 
tag name and the attribute/value pairs into independent events. This is 
usually not an issue, admittedly.

Sunil
From: Marco Antoniotti
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <y6cbslz7lo1.fsf@octagon.mrl.nyu.edu>
Sunil Mishra <·······@notmyemail.com> writes:

> Louis Theran wrote:
> > In article <················@notmyemail.com>, Sunil Mishra
> > <·······@notmyemail.com> wrote:
> > 
> > 
> > 
> >>>8) The prefix for an element name can be defined by an xmlns attribute
> >>>within that element.
> >>> Example: <m:GetLastTradePriceResponse xmlns:m="Some-URI">
> >>>This means the creation of an element name needs to be deferred until
> >>>after the attributes have been read [even in a non-validating
> >>>parser!].
> >>>BTW: I think this is a particularly ugly feature of XML namespaces.
> >>>
> >>Worse, it means that any implementation that attempts to parse and 
> >>assign namespaces in a single pass will turn out to be horribly complex.
> >>
> > 
> > Can you elaborate on this?  I would think that this has more to do with
> > the data structure you are trying to build from the XML document than
> > anything about deferring namespace expansion until after the attribute
> > list has been parsed.
> >  
> > 
> > 
> 
> The problem is that you cannot know the namespace of an open tag without 
> first parsing the whole open tag. You absolutely must revisit all the 
> data associated with the open tag to figure out the namespace 
> assignments. Perhaps horribly complex was an exaggeration, but the 
> parser would be far simpler if it had the namespace information before 
> it started parsing the open tag.
> 
> Also, with the current structure, it is not possible to write a truly 
> fine grain event driven (callback oriented) parser. All the data for the 
> open tag must be in a single event. It is not possible to separate the 
> tag name and the attribute/value pairs into independent events. This is 
> usually not an issue, admittedly.
> 

I must say that I am not that privy to XML intricacies, but, from the
postings on this thread, I gather that most of the gripes w.r.t. XML
namespaces, come from the fact that you are better off writing a two
pass parser.  Why is this such a big problem anyway?

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Greg Menke
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <m38zh3iqvv.fsf@europa.mindspring.com>
> 
> I must say that I am not that privy to XML intricacies, but, from the
> postings on this thread, I gather that most of the gripes w.r.t. XML
> namespaces, come from the fact that you are better off writing a two
> pass parser.  Why is this such a big problem anyway?
> 

It makes parsing streams a bit more tedious for one thing.

Gregm
From: Louis Theran
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <m2itg7bpmc.fsf@localhost.cs.umass.edu>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> I must say that I am not that privy to XML intricacies, but, from the
> postings on this thread, I gather that most of the gripes w.r.t. XML
> namespaces, come from the fact that you are better off writing a two
> pass parser.  

A standard two-level parser structure works fine.  No second pass over
the data is required.  No, it's not a big deal.

From: james anderson
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B68C583.C101DFF8@setf.de>
the constraint is that interning of attribute names and the element
identifier for a given tag must be deferred until after the namespace
bindings have been processed.

i would call this a "restriction" rather than a "problem". it affects
how one can do things like inline validation and unmarshalling, but i'm
not sure that makes it a "problem".

Marco Antoniotti wrote:
> 
> Sunil Mishra <·······@notmyemail.com> writes:
> 
> > Louis Theran wrote:
> > > In article <················@notmyemail.com>, Sunil Mishra
> > > <·······@notmyemail.com> wrote:
> > >
> > >
> > >
> > ..
> >
> 
> I must say that I am not that privy to XML intricacies, but, from the
> postings on this thread, I gather that most of the gripes w.r.t. XML
> namespaces, come from the fact that you are better off writing a two
> pass parser.  Why is this such a big problem anyway?
> 
> Cheers
>
From: james anderson
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B68C33E.E0616307@setf.de>
one may be surprised how fine grained the call back / event stream can be.
the standard for java-based parsing (SAX2) notifies one of the start and
end of the scope of each namespace binding individually. these precede /
follow the events for the start / end of the respective element.
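
a minimal sketch of that event ordering, assuming events are plain
lists. the event keywords are hypothetical and are not the api of
sax2 or of cl-xml; ending the mappings in reverse order is also an
assumption.

```lisp
;; Hypothetical sketch of the event order only; the event keywords are
;; invented and are not the API of SAX2 or of cl-xml.
(defun element-events (name declarations child-events)
  "Return the event list for one element.
DECLARATIONS is an alist of (prefix . uri) declared on the start tag."
  (append
   ;; Each binding's scope opens before the element starts ...
   (loop for (prefix . uri) in declarations
         collect (list :start-prefix-mapping prefix uri))
   (list (list :start-element name))
   child-events
   (list (list :end-element name))
   ;; ... and closes after the element ends.
   (loop for declaration in (reverse declarations)
         collect (list :end-prefix-mapping (car declaration)))))
```

a handler that sees the prefix-mapping events first can already
expand the element name by the time the start-element event arrives.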

the next cl-xml version will afford application access to any reduction
event (basically any phrase from the xml bnf + internal operations like
binding a namespace prefix). this was implemented in order to drive a
streaming rdf parser. that parser actually needed only sax2-resolution
events, but there is also interest in reducing resource usage while
parsing large documents, for which it turns out to be possible to afford
application access to any event for which the parser would cons.

...

Sunil Mishra wrote:
> 
> ...
> 
> The problem is that you cannot know the namespace of an open tag without
> first parsing the whole open tag. You absolutely must revisit all the
> data associated with the open tag to figure out the namespace
> assignments. Perhaps horribly complex was an exaggeration, but the
> parser would be far simpler if it had the namespace information before
> it started parsing the open tag.
> 
> Also, with the current structure, it is not possible to write a truly
> fine grain event driven (callback oriented) parser. All the data for the
> open tag must be in a single event. It is not possible to separate the
> tag name and the attribute/value pairs into independent events. This is
> usually not an issue, admittedly.
> 
> Sunil
From: Howard Stearns
Subject: Re: More on XML name representation in Lisp
Date: 
Message-ID: <3B66B0F7.E0AE5743@curl.com>
I suspect that you've got it right, but you used some wording that
suggests that it is POSSIBLE that your understanding is slightly off. 
I've rewritten some of your points to clarify them.

Stanley Knutson wrote:
> ...
> Now I just re-read http://www.w3.org/TR/REC-xml-names/
> It seems to indicate the following axioms:
> 1) XML namespaces are defined by a URI.

XML namespaces can be identified by a URI.
(Nothing has to actually exist at that URI, and if something does, it
need not have any content that actually "defines" the namespace.  The
URI is simply a global name for the namespace.)

> 2) XML namespaces have one or more prefixes that are used in the
> actual document.

Each document can define its own prefixes (which can vary through the
document) to use as a shorthand for the namespace URI.  The prefixes
have lexical scope. 
(That is, the namespaces themselves don't have any inherent prefix
predefined for them.)

> 3) Namespace prefixes are scoped by the defining element.
> This means a particular namespace prefix is not globally unique
> and is not necessarily even unique within one document.
> [i.e., the name "ns:lcl" could be interpreted differently in two
> locations within one document due to inclusion or binding of a
> namespace prefix].

(Not sure what you mean here, but I suspect it's covered by my rewritten
rule 2.)
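
A small illustration of that lexical scoping, using Python's standard
`xml.etree.ElementTree`, which reports each element under its expanded
`{uri}local` name. The document text and the `urn:example:...` URIs are
invented for the example; `expanded_tags` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The same prefix "ns" is bound to two different URIs at two nesting levels.
DOC = """\
<root xmlns:ns="urn:example:outer">
  <ns:lcl/>
  <inner xmlns:ns="urn:example:inner">
    <ns:lcl/>
  </inner>
</root>"""

def expanded_tags(text):
    """Return every element's expanded name in document order."""
    return [element.tag for element in ET.fromstring(text).iter()]
```

Both `ns:lcl` elements are written identically, yet `expanded_tags(DOC)`
reports them under different namespace URIs, because the inner element
rebinds the prefix for its own subtree.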

> 4) the namespace xml is "builtin" and thus _is_ global.  xmlns prefix
> is used to adjust namespace bindings

No prefix can be defined that begins with the three characters "xml". 
The namespace xml is "builtin" and thus _is_ global.  
Something that looks like a prefix named xmlns is actually the syntax
for defining a new lexical prefix.

> 5) There may be a "default namespace" which is provided either
> external to to the document, or within the document via its
> DTD/Schema.
> There are can also be "implicitly defined" prefixes external to the
> document [for example, SOAP assumes xsi and xsd prefixes are defined]

(Not sure what you're saying here.  This might be covered by the
rewritten next rule.)

> 6) The default namespace can be explicitly set to an empty string,
> meaning "no default"

A pseudo-attribute of the form xmlns=URI can be thought of as defining
an empty prefix. (As though it were xmlns:||=URI in a pseudo-XML/Lisp
syntax.)  Tags that then use no prefix (i.e., the empty prefix) use this
namespace.  (As though an in-scope occurrence of the unqualified tag foo
were really ||:foo)

> 7) Attribute names don't use the default namespace: they are either
> explicitly qualified or are in the "no default" namespace.

The empty string as a URI names a namespace that is unique for each
document. 
Unqualified attribute names are in the namespace of the element type in
which they appear.

(That is, if you think of there being a class that implements namespace
functionality, element definitions can be thought of as instances of
some class that inherits from the namespace mixin.  Unqualified
attribute names are interned directly in the element type
definition-cum-namespace.  Significantly, they are NOT in the
per-document namespace-with-the-empty-URI that I think you are calling
the "no default" namespace.  This means that the unqualified attribute
named "title" in MyElement is different than the unqualified attribute
named "title" in YourElement.  As far as I can tell, XML doesn't
directly provide for inheritance between MyElement and YourElement so
that "title" can actually mean the same thing in both.  But I don't
think it precludes it either, and RDF seems to cover this.)
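
One way to picture this reading, as a sketch only: Python's standard
`xml.etree.ElementTree` reports an unqualified attribute under its bare
name even when a default namespace is in scope, and a model that follows
the interpretation above could then key such attributes by the owning
element type. The helper `attribute_identity` and the `urn:example:...`
URIs are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

def attribute_identity(element_tag, attribute_name):
    """Return a key that identifies an attribute within a model.

    Qualified attributes (reported as "{uri}local") are global names.
    Unqualified attributes are keyed by the element type that owns them.
    """
    if attribute_name.startswith("{"):
        return attribute_name
    return (element_tag, attribute_name)

def attribute_identities(text):
    """Parse one element and return the identities of its attributes."""
    element = ET.fromstring(text)
    return {attribute_identity(element.tag, name) for name in element.attrib}
```

Under this scheme the `title` of `MyElement` and the `title` of
`YourElement` get distinct keys, while a prefixed attribute keeps one
identity wherever it appears.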

> ...
> 
> These are these kinds of uses for the "xml name" within a lisp
> program:
...
> 2) If a "DOM-like" structure is created by reading, it is desirable
> to be able to write it back out again and reuse the same abbreviations
> that were used when it was read.

Careful.  The prefix used is at most a property of the particular
element instance that you read from this particular file.  It is not a
property of the namespace (as used by a collection of files) nor even of
the element type definition (which can be referred to many times in the
same file, with each instance defining a different prefix).

It gets worse.  Imagine that you build a quasi-persistent tree of
element instances, where each is interned from a file that creates it. 
If you read in another file that builds another tree yet somehow refers
to an already existent instance (XPointers), then both trees share the
same interned instance.  You might do this, for example, to implement
reading a document file and one or more separate files of XLinks that
create a unified model in memory.  

OK, now try to serialize the combined model to an XML file, where an
element instance might have been created by merging info from two
different files, where each of them used different qualifier names.  
Have fun.....  

You have an implementation strategy marked "C)" that might or might not
help here, but I doubt it.  My intuition is that you are better off with
(my interpretation of) your "E)" strategy of abandoning the
document-specified prefixes within your model, and instead creating your
definitions with separately prepared canonical prefixes for each
namespace.
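
A rough sketch of what writing with canonical prefixes could look like,
assuming the model stores names as (uri, local) pairs and that the
prefix table is prepared by hand. The table contents and both function
names are hypothetical.

```python
# Hypothetical table: one canonical prefix per namespace URI.
CANONICAL_PREFIXES = {
    "urn:example:trade": "trade",
    "urn:example:meta": "meta",
}

def canonical_qname(name, prefixes=CANONICAL_PREFIXES):
    """Map an expanded (uri, local) name to 'prefix:local' for output."""
    uri, local = name
    if uri is None:
        return local                     # no namespace: write the bare name
    return prefixes[uri] + ":" + local   # KeyError: namespace not in table

def declarations(names, prefixes=CANONICAL_PREFIXES):
    """Return the xmlns attributes needed to cover every name used."""
    uris = sorted({uri for uri, _ in names if uri is not None})
    return ['xmlns:%s="%s"' % (prefixes[uri], uri) for uri in uris]
```

Since the prefix is looked up from the URI at write time, it no longer
matters which prefixes the source files happened to use, or how many
files contributed to one element instance.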

> ...
> So here is an implementation suggestion:
> ...
> 2) Attribute names can be keywords, UNLESS they are qualified.

In Lisp, we have the idea that a slot has a first class name object (a
symbol) and the separate idea that instances can be initialized with
keyword arguments to the initialize method -- where the keywords might
or might not be related in some way to the slot names.  This works because
instantiation is a much more specific context than slot access, so a
many-to-one map from slot names to initializer names MAY be possible. 
Even so, I wouldn't assume that CLOS got this right in a mathematically
general way, only in some common specific cases. 

I bring this up because I think you can use keywords for unqualified
attribute names in initializing the element instances.  However, to
refer to the individual slot values of the instances after constructing
your model, you may need to distinguish between the slot or attribute
named :title in one element and the one named :title in a different type
of element.