From: David Lichteblau
Subject: Re: XML generator
Date: 
Message-ID: <slrng0bpif.vgc.usenet-2008@radon.home.lichteblau.com>
On 2008-04-16, Jeff Shrager <········@gmail.com> wrote:
> You'd think that this would be an obviously solved problem, and that
> someone would have actually DOCUMENTED it.. but apparently not. Anyone
> care to give a brief explanation with working examples of how to
> actually do lisp-based XML generation. I'm sure it's done a thousand
> times a day, but I can't seem to make it work.

Let me give you the answer for XML serialization using Closure XML.

As for documentation, some of this is explained well in cxml's
documentation (I hope), but perhaps the layering isn't documented as
well as it should be.

(Scroll down to "Serializing Lisp lists" for the answer to your original
question.)


Lowest level: Serialization using SAX
-------------------------------------

At the lowest level, serialization is the inverse of parsing.

Instead of having the parser generate SAX events to a DOM builder while
parsing the document, here user code generates SAX events, sends them to
a `serialization sink', which writes XML for each event that it receives.

An XML document needs to contain a root element, so the minimal sequence
of events for a complete XML document is START-DOCUMENT, START-ELEMENT,
END-ELEMENT, END-DOCUMENT.

Example:

CL-USER> (let ((sink (cxml:make-string-sink)))
           (sax:start-document sink)
           (sax:start-element sink "" "test" "test" nil)
           (sax:characters sink " abc ")
           (sax:end-element sink "" "test" "test")
           (sax:end-document sink))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test> abc </test>"

By convention, END-DOCUMENT is the function that returns the "result" of
a SAX handler.  In this case, the string sink returns the accumulated
document as a string.

Obviously, SAX-based serialization is not something you would do
manually.  Instead, it serves as a building block for higher-level APIs.


Higher Level: Convenience functions and macros
----------------------------------------------

One higher-level API making use of the SAX layer is the WITH-ELEMENT set
of convenience macros and functions.

Example:

CL-USER> (cxml:with-xml-output (cxml:make-string-sink)
           (cxml:with-element "test"
             (cxml:text " abc ")))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test> abc </test>"

Here, WITH-XML-OUTPUT calls START-DOCUMENT and END-DOCUMENT for you, and
WITH-ELEMENT calls START-ELEMENT and END-ELEMENT.

(Actually, the implementation is slightly more clever than that: The
START-ELEMENT event is delayed until the first child is seen, so that
we know that all calls to CXML:ATTRIBUTE have been seen.)
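
To make the attribute ordering concrete, here is a minimal sketch.  The
function name GREETING-XML is made up for illustration, and the code
assumes cxml is already loaded into the image.  CXML:ATTRIBUTE has to be
called inside WITH-ELEMENT before any child content is written:

```lisp
;; Assumption: cxml is already loaded (for example through ASDF).
;; GREETING-XML is a hypothetical name, not part of cxml.
(defun greeting-xml ()
  "Return a one-element document with one attribute, as a string."
  (cxml:with-xml-output (cxml:make-string-sink)
    (cxml:with-element "test"
      ;; Attributes first: START-ELEMENT is only emitted once the
      ;; first child (here, the text node) shows up.
      (cxml:attribute "id" "42")
      (cxml:text " abc "))))
```

Calling (greeting-xml) should give the usual XML declaration followed by
a " (greeting-xml)))
</test>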

I've called this API the "convenience layer" because it's more
convenient than manual SAX calls.  But admittedly it is still more
verbose than some other macros for XML-syntax-in-Lisp stuff.

(Personally I actually use WITH-ELEMENT a lot though.  I like it,
because it integrates into normal Lisp code a little better than, say,
net.html.generator, and has somewhat more obvious semantics.)


Side note: Parse HTML to a sink directly
----------------------------------------

Here's some silly toy code, just to illustrate the possibilities:

Since the parser sends SAX events, and a sink serializes SAX events, we
can connect parser and serialization sink directly, without building an
intermediate representation.

Example:

CL-USER> (cxml:parse "<test> abc&#10;def </test>"
                     (cxml:make-string-sink))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test> abc
def </test>"

(I've included a character reference for newline there just to
illustrate that the parser actually did something here.  But except for
character and entity references, which will have been resolved, this
example doesn't actually do anything interesting, of course.)

Okay, so now you're wondering what the example is about then.
There's actually a cute use case for this:

If you're parsing with Closure HTML (rather than Closure XML), and send
events to an XML sink, you've got an HTML cleanup program that converts
broken HTML into well-formed XHTML.

And it's a one-liner!  Here it is:
  (chtml:parse #p"test.html" (cxml:make-string-sink))

Or in the opposite direction: Work with XML/XHTML internally, then as
the very last step before sending it to the browser, send events to an
HTML sink (e.g., chtml:make-string-sink), which writes HTML rather than
XML.  Same API, different output format.
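
Both directions can be wrapped up in a few lines.  This is only a sketch:
the function names HTML-TO-XHTML and XML-TO-HTML are made up here, both
Closure HTML and Closure XML are assumed to be loaded, and the input is
assumed to be a string rather than a pathname:

```lisp
;; Assumption: closure-html and cxml are already loaded.
;; Both function names are hypothetical, chosen for this example.

(defun html-to-xhtml (html-string)
  "Parse possibly broken HTML and serialize it as well-formed XML."
  (chtml:parse html-string (cxml:make-string-sink)))

(defun xml-to-html (xml-string)
  "Parse well-formed XML/XHTML and serialize it using HTML syntax."
  (cxml:parse xml-string (chtml:make-string-sink)))
```

For instance, (html-to-xhtml "<p>unclosed paragraph") should return a
string in which the paragraph has been closed properly.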


Serializing in-memory-representations
-------------------------------------

It should be obvious now how DOM is serialized: There's a function mapping
the DOM structure, sending SAX events for the nodes it sees.

(Let me give you the example using STP instead of DOM, because I like
it better.)

CL-USER> (stp:serialize (stp:make-document (stp:make-element "test"))
			(cxml:make-string-sink))

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test/>"


Serializing Lisp lists
----------------------

And here is the example you've been waiting for: Serializing a lisp list
structure.

Technically, this is just another example for the serialization of
in-memory representations: But this time we're using cxml's
compatibility package for XMLS.

An XMLS list looks like this:
  ("name" (...attributes...) ...children...)

CL-USER> (defparameter *example*
           `("test" () " abc "))
*EXAMPLE*

So let's serialize that:

CL-USER> (cxml-xmls:map-node (cxml:make-string-sink)
			     *example*
			     :include-namespace-uri nil)
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test> abc </test>"

(What's the include-namespace-uri business for?  The xmls compatibility
is beginning to diverge from xmls here, since xmls has only rather, uh,
`incomplete' support for namespaces.  By default, we expect conses of
name and namespace URI rather than strings in the CAR position of the
element.  Some of this code is a bit in a state of flux.  If you've got
trouble with xmls compatibility currently, please report it, so that we
can fix those issues.)
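
Child elements nest in the obvious way.  A slightly larger sketch,
assuming the same :include-namespace-uri nil convention as above; the
names *NESTED-EXAMPLE* and NESTED-EXAMPLE-XML are made up for this
example, and attributes are left empty on purpose:

```lisp
;; Assumption: cxml (including its xmls compatibility package) is loaded.
;; *NESTED-EXAMPLE* and NESTED-EXAMPLE-XML are hypothetical names.

(defparameter *nested-example*
  '("doc" ()                 ; element name, empty attribute list
    ("item" () "one")        ; child element with a text child
    ("item" () "two")))

(defun nested-example-xml ()
  "Serialize *NESTED-EXAMPLE* to a string."
  (cxml-xmls:map-node (cxml:make-string-sink)
                      *nested-example*
                      :include-namespace-uri nil))
```

The result should be a <doc> element containing the two <item> children,
without any added indentation.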


More about sinks
----------------

What kind of sinks are there?

We've already mentioned the difference between HTML syntax sinks and XML
syntax sinks, i.e.  cxml:make-string-sink vs chtml:make-string-sink.

But there are also different sink classes in both packages:

  make-string-sink            Creates a vector of (character)        
  make-character-stream-sink  Writes to a stream/file
  make-octet-vector-sink      Creates a vector of (unsigned-byte 8)
  make-octet-stream-sink      Writes to a stream/file

(plus some weirder classes with support for Lisps without Unicode.)

Note: the latter two sinks handle encoding to UTF-8 for you (and in
the upcoming cxml release encoding to any user-specified encoding
supported by Babel.)

The former two sinks don't encode anything, so you get Unicode Lisp
characters.
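
As an illustration of the stream variants, here is a sketch that writes
a document to a file through an octet stream sink.  The function name
WRITE-EXAMPLE-FILE is made up, and cxml is assumed to be loaded.  The
file is opened as a binary stream, since the sink does the encoding
itself:

```lisp
;; Assumption: cxml is already loaded.
;; WRITE-EXAMPLE-FILE is a hypothetical name for this example.
(defun write-example-file (pathname)
  "Write a small XML document to PATHNAME as UTF-8 octets."
  (with-open-file (out pathname
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :supersede)
    ;; The sink receives the SAX events and writes encoded bytes to OUT.
    (cxml:with-xml-output (cxml:make-octet-stream-sink out)
      (cxml:with-element "test"
        (cxml:text " abc ")))))
```

Usage: (write-example-file #p"test.xml")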


Conclusion
----------

I hope this article clears up the layering of serialization features in
cxml.

It is easy to write your own higher-level code based on sinks.  If the
existing WITH-ELEMENT isn't your favourite API, make your own!  The
sinks solve all the hard problems, and you get to define the API you
like in a few more lines on top of that.
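
To give an idea of how little code that takes, here is a sketch of a
tiny home-grown layer.  The list format ("name" child...) and the names
EMIT-TREE and TREE-TO-XML are invented for this example and are not part
of cxml; attributes and namespaces are left out entirely.  Only the SAX
calls shown at the lowest level are used:

```lisp
;; Assumption: cxml is already loaded.
;; EMIT-TREE, TREE-TO-XML and the ("name" child...) format are
;; hypothetical, made up to illustrate building on sinks.

(defun emit-tree (sink tree)
  "Send SAX events for TREE, a string or a list (name child...)."
  (etypecase tree
    (string
     (sax:characters sink tree))
    (cons
     (destructuring-bind (name &rest children) tree
       ;; No namespace URI and no attributes in this sketch.
       (sax:start-element sink "" name name nil)
       (dolist (child children)
         (emit-tree sink child))
       (sax:end-element sink "" name name)))))

(defun tree-to-xml (tree)
  "Serialize TREE to a string of XML."
  (let ((sink (cxml:make-string-sink)))
    (sax:start-document sink)
    (emit-tree sink tree)
    ;; END-DOCUMENT returns the sink's result, here the string.
    (sax:end-document sink)))
```

Usage: (tree-to-xml '("a" ("b" "hi") "there"))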

Similarly, if the existing xmls-compat list structures aren't what you
want, just copy&paste the code to create your own version of that.
(Edi Weitz did just that for cl-webdav!)

So, perhaps someone would like to write a cl-quasiquote backend for cxml
sinks?


d.

PS Personally I am a big fan of XSLT though.  So I'd never use
WITH-ELEMENT to generate all of my HTML.  I just generate small XML
snippets containing the data, then send that through XSLT stylesheets to
turn it into formatted XML or HTML.  Nothing beats XML documents for XML
generation.