a Scheme based library for SGML Documents

From: Jeremy CALLES
Subject: a Scheme based library for SGML Documents
Date: Wed, 19 Aug 1998 00:00:00 +0000
Message-ID: <35DAD92D.DC1147CA@sophia.inria.fr>

I've written a Scheme-based library for SGML documents, which uses the
NSGMLS parser. It is meant to make it easy to transform an SGML document
and to apply built-in or user-defined functions to it.

As examples, this package contains 2 translators, made with the library,
that translate documents from the LinuxDoc DTD to the DocBook DTD and
then to a LaTeX file. 

It also contains many other functions, such as wc-like and grep-like
functions that respect the specifics of SGML documents.

Moreover, the user can define, in R4RS Scheme, any function he wants
to apply to his SGML documents, and redefine any built-in function he
doesn't like.

You can download it and find more information at
        
        http://www.mygale.org/07/jcalles/XML


-- 
Jeremy CALLES  --- ·······@mygale.org
home page      --- http://www.mygale.org/07/jcalles

From: Christopher Browne
Subject: Re: a Scheme based library for SGML Documents
Date: Thu, 20 Aug 1998 00:00:00 +0000
Message-ID: <6rfvkk$2kl$1@blue.hex.net>

On Wed, 19 Aug 1998 15:54:53 +0200, Jeremy CALLES
<·············@sophia.inria.fr> wrote: 
>As examples, this package contains 2 translators, made with the library,
>that translate documents from the LinuxDoc DTD to the DocBook DTD and
>then to a LaTeX file. 
>        http://www.mygale.org/07/jcalles/XML

I'm glad to hear of it.

I've been using a lightly hacked version of Paul Prescod's 'l2db'
DSSSL script in conjunction with Jade for this purpose.

It <em/almost/ works well; there is one unfortunate exception that I
haven't yet been able to overcome except by fiddling with the results
using a postprocessor:

In LinuxDoc, itemized lists are handled thus:
<itemize>
<item> Main text
<p> Another paragraph
</itemize>

Note that there is an "implicit paragraph" containing "Main text"
attached to <item>.  Arguably that's a bad thing; starting over, it
might be reasonable to mandate that the item look like:

<item> <p><!-- mandatory paragraph indicator --> Main text

In DocBook, the item and the text are indeed separate...  (end tags
omitted, which is legal)

<ItemizedList>
<ListItem><para> Main text
<para> Another paragraph
</ItemizedList>

Note that you need a <para> for that initial "Main text" in DocBook
where it wasn't needed with LinuxDoc. 

The relevant bit of Paul's script is:
(element item
   (make element gi: "ListItem" 
	(make element gi: "Para")))

This causes the generation of something like:
<ItemizedList>
<Listitem><Para> Main text
<Para> Another paragraph </Para> 
</Para>  <!--  Problem -- This Para close is associated with the
               initial paragraph! -->
</ListItem>
</ItemizedList>

In effect, we really need to "eat" text into that initial paragraph
until we get something that ends that paragraph, and close that
paragraph rather sooner. 
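
To make that "eat text into the initial paragraph" idea concrete, here is a
minimal sketch in Python (not DSSSL, and not Paul's script). It assumes a
flat, non-nested list that uses only the <itemize>, <item> and <p> tags from
the example above, with no attributes; the function name is made up for
illustration. The point is simply that an <item> opens an implicit paragraph,
and that paragraph is closed as soon as the next <p>, <item> or </itemize>
arrives.

```python
import re

# Split the input into tags and the text between them (tags are kept).
TOKEN = re.compile(r"(<[^>]+>)")


def linuxdoc_items_to_docbook(src):
    """Convert a flat LinuxDoc <itemize> fragment to DocBook markup.

    Hypothetical helper: handles only <itemize>, <item>, <p> and
    </itemize>, without nesting or attributes.
    """
    out = []
    in_item = False
    in_para = False

    def close_para():
        nonlocal in_para
        if in_para:
            out.append("</Para>")
            in_para = False

    def close_item():
        nonlocal in_item
        if in_item:
            out.append("</ListItem>")
            in_item = False

    for tok in TOKEN.split(src):
        tag = tok.lower()
        if tag == "<itemize>":
            out.append("<ItemizedList>")
        elif tag == "<item>":
            # A new item ends whatever was open before it and starts
            # the implicit first paragraph.
            close_para()
            close_item()
            out.append("<ListItem><Para>")
            in_item = True
            in_para = True
        elif tag == "<p>":
            # An explicit <p> ends the paragraph that is currently
            # "eating" text, then starts a new one.
            close_para()
            out.append("<Para>")
            in_para = True
        elif tag == "</itemize>":
            close_para()
            close_item()
            out.append("</ItemizedList>")
        else:
            # Text (or any tag this sketch does not know) passes through.
            out.append(tok)
    return "".join(out)
```

Fed the four-line LinuxDoc example above, this closes the "Main text"
paragraph before "Another paragraph" begins, so no <Para> ends up nested
inside another one.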

I'm using an assortment of Perl and Scheme scripts to postprocess the
output, removing the extra </Para> tags.  Tag minimization makes it legal
to omit those </Para> tags.
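
For illustration, a postprocessing pass of this kind might look like the
Python sketch below. It is not the Perl/Scheme scripts mentioned above, and
the function name is hypothetical. It also goes one step further than merely
deleting tags: when a <Para> opens while another is still open, it emits the
missing </Para> first and then drops the stray </Para> that shows up later.
It assumes <Para> tags carry no attributes and that a paragraph never
legitimately contains another paragraph.

```python
import re

# Split the input into tags and the text between them (tags are kept).
TOKEN = re.compile(r"(<[^>]+>)")


def repair_nested_paras(sgml):
    """Close a <Para> before a nested one opens; drop the stray close.

    Hypothetical helper for illustration only.
    """
    out = []
    para_open = False   # is a paragraph currently open?
    stray_closes = 0    # </Para> tags that are now redundant

    for tok in TOKEN.split(sgml):
        tag = tok.lower()
        if tag == "<para>":
            if para_open:
                # Close the outer paragraph here, and remember that its
                # original end tag further down must be discarded.
                out.append("</Para>")
                stray_closes += 1
            para_open = True
            out.append(tok)
        elif tag == "</para>":
            if para_open:
                para_open = False
                out.append(tok)
            elif stray_closes:
                stray_closes -= 1   # drop the misplaced end tag
            else:
                out.append(tok)     # unexpected; leave it untouched
        else:
            out.append(tok)
    return "".join(out)
```

Applied to the problematic output shown earlier, this moves the close of the
"Main text" paragraph to just before the second <Para>, leaving every
paragraph explicitly closed.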

I'd like a better answer, preferably one that generates valid DocBook in
the first place, like:
<ItemizedList>
<Listitem><Para> Main text</Para>  
<Para> Another paragraph </Para> 
</ListItem>
</ItemizedList>

Hopefully your scripts handle this situation better.
-- 
"Instant coffee is like pouring hot water over the cremated remains of a
good friend."
········@ntlug.org- <http://www.ntlug.org/~cbbrowne/lsf.html>