I've written a Scheme-based library for SGML documents that uses the
NSGMLS parser. It is meant to make it easy to transform an SGML document
and to apply built-in or user-defined functions to it.
As examples, this package contains 2 translators, made with the library,
that translate documents from the LinuxDoc DTD to the DocBook DTD and
then to a LaTeX file.
It also contains many other functions, such as wc-like and grep-like
functions that respect the specifics of SGML documents.
Moreover, the user can define, in R4RS Scheme, any function he wants
to apply to his SGML documents, and redefine any built-in function he
doesn't like.
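To illustrate what "respecting the specifics of SGML" can mean for a wc-like function, here is a minimal sketch in Python. It is not the library's code (the library is written in Scheme); it only assumes the line-oriented output format of nsgmls, where character data lines start with "-" and element start and end lines start with "(" and ")". The function name esis_word_count is hypothetical.

```python
import re

# Escapes that nsgmls uses inside data lines: \\ (backslash), \n (record end),
# \| (SDATA delimiter), \nnn (octal character code).
_ESCAPE = re.compile(r"\\(\\|n|\||[0-7]{3})")


def _unescape(match):
    token = match.group(1)
    if token == "\\":
        return "\\"
    if token == "n":
        return "\n"
    if token == "|":
        return ""  # SDATA brackets carry no text of their own
    return chr(int(token, 8))


def esis_word_count(esis_lines):
    """Count words in character data only, ignoring tags and attributes."""
    words = 0
    for line in esis_lines:
        if line.startswith("-"):  # "-" introduces a data line
            text = _ESCAPE.sub(_unescape, line[1:].rstrip("\n"))
            words += len(text.split())
    return words


sample = [
    "(ITEMIZE",
    "(ITEM",
    "-Main text\\n",
    "(P",
    "-Another paragraph",
    ")P",
    ")ITEM",
    ")ITEMIZE",
    "C",
]
count = esis_word_count(sample)
```

Element names such as ITEM and P never reach the counter, which is the difference from running plain wc on the source file. A word split across two data lines would be counted twice in this sketch.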
You can download it and find more information at
http://www.mygale.org/07/jcalles/XML
--
Jeremy CALLES --- ·······@mygale.org
home page --- http://www.mygale.org/07/jcalles
From: Christopher Browne
Subject: Re: a Scheme based library for SGML Documents
Date:
Message-ID: <6rfvkk$2kl$1@blue.hex.net>
On Wed, 19 Aug 1998 15:54:53 +0200, Jeremy CALLES
<·············@sophia.inria.fr> wrote:
>As examples, this package contains 2 translators, made with the library,
>that translate documents from the LinuxDoc DTD to the DocBook DTD and
>then to a LaTeX file.
> http://www.mygale.org/07/jcalles/XML
I'm glad to hear of it.
I've been using a lightly hacked-on version of Paul Prescod's 'l2db'
DSSSL script in conjunction with Jade for this purpose.
It <em/almost/ works well; there is one unfortunate exception that I
haven't yet been able to overcome except by fiddling with the results
using a postprocessor:
In LinuxDoc, itemized lists are handled thus:
<itemize>
<item> Main text
<p> Another paragraph
</itemize>
Note that there is an "implicit paragraph" containing "Main text"
attached to <item>. Arguably that's a bad thing; starting over, it
might be reasonable to mandate that the item look like:
<item> <p><!-- mandatory paragraph indicator --> Main text
In DocBook, the item and the text are indeed separate... (end tags
omitted, which is legal)
<ItemizedList>
<ListItem><para> Main text
<para> Another paragraph
</ItemizedList>
Note that you need a <para> for that initial "Main text" in DocBook
where it wasn't needed with LinuxDoc.
The relevant bit of Paul's script is:
(element item
  (make element gi: "ListItem"
    (make element gi: "Para")))
This causes the generation of something like:
<ItemizedList>
<Listitem><Para> Main text
<Para> Another paragraph </Para>
</Para> <!-- Problem -- This Para close is associated with the
initial paragraph! -->
</ListItem>
</ItemizedList>
In effect, we really need to "eat" text into that initial paragraph
until we get something that ends that paragraph, and close that
paragraph rather sooner.
I'm using an assortment of Perl and Scheme scripts to postprocess the
output, removing the extra </Para> tags. Tag minimization allows the
</Para> end tags to be omitted, so the result still parses.
I'd like a better answer, preferably to generate valid DocBook in the
first place, like:
<ItemizedList>
<Listitem><Para> Main text</Para> <!-- End that paragraph when it hits
the next one... -->
<Para> Another paragraph </Para>
</ListItem>
</ItemizedList>
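For what it's worth, here is a rough sketch of a postprocessor that produces this form from the nested output shown earlier. It is written in Python rather than the Perl and Scheme scripts mentioned above, and the function name close_implied_paras is made up for the example. It assumes that <Para> never legitimately nests in the generated output (so no footnotes or similar containers holding their own paragraphs) and that the tags carry no attributes.

```python
import re

# Split the text on Para start/end tags, keeping the tags as separate pieces.
PARA_TAG = re.compile(r"(</?Para>)", re.IGNORECASE)


def close_implied_paras(sgml):
    """Close an open <Para> when the next <Para> starts; drop stray </Para>."""
    out = []
    para_open = False
    for piece in PARA_TAG.split(sgml):
        tag = piece.lower()
        if tag == "<para>":
            if para_open:
                # A new paragraph ends the one still being "eaten".
                out.append("</Para>")
            para_open = True
            out.append(piece)
        elif tag == "</para>":
            if para_open:
                out.append(piece)
                para_open = False
            # Otherwise this is the leftover close of the initial
            # paragraph, which was already closed earlier: drop it.
        else:
            out.append(piece)
    return "".join(out)


nested = (
    "<ListItem><Para> Main text\n"
    "<Para> Another paragraph </Para>\n"
    "</Para>\n"
    "</ListItem>"
)
fixed = close_implied_paras(nested)
```

This is still fiddling with the results after the fact; it does not fix the DSSSL rule itself, which would be the better answer.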
Hopefully your scripts handle this situation better.
--
"Instant coffee is like pouring hot water over the cremated remains of a
good friend."
········@ntlug.org- <http://www.ntlug.org/~cbbrowne/lsf.html>