From: Christopher Browne
Subject: Re: data structure for markup text
Date: 
Message-ID: <29ha3.14962$_m4.301127@news2.giganews.com>
DaveP wrote:
>Fernando <·······@must.die> wrote in message <·····················@news.nova.es>...
>> Hi!
>>
>> What's the best data structure to represent markup text? O:-)
>> I need to move around text, identify each marked group and have random
>> access to it
>
>Err, SGML (or maybe XML).
>Then look for the tools to play with text in that form.
>See www.xml.com for a start.

XML/SGML are "merely" textual representations, and do not specify data
structures. 

It would be entirely reasonable to represent an SGML document as a LISP
list, perhaps thusly, in a form that (rather consciously!) resembles
DocBook.

Note that if you use Jade to process a document, something not unlike
this tree gets generated, albeit perhaps not in so explicit a form.
Similarly, loading an XML document in, and throwing it at a DOM
processor, will involve constructing a tree that is not unlike this... 

(My personal goal: Edit document as a big Lisp tree.  This allows using
Lisp tools to "attack" problems, as well as possibly having some pieces
that are dynamically evaluated.  Then take a Lisp "tree walker" that
shakes out DocBook results.  Ideal situation is to be able to go both
ways: DocBook-to-Lisp, and Lisp-to-DocBook...  I'd doubtless become
popular if I built one that shakes the tree into Texinfo form...)

(define document-as-tree
'(ARTICLE 
 (ID "FINANCES")
 (ARTHEADER
  (TITLE "Finances, Linux, and Stuff...")
  (AUTHOR
   (FIRSTNAME "Christopher")
   (SURNAME "Browne")))
 (ABSTRACT	
  (PARA "This document collects links I have collected relating to
financial software that runs on the " (ULINK (URL "linux.html") "Linux
Operating System.") " It also contains other information salient to
taxation and finance.")
  (PARA "Apparently, this set of pages has been deemed popular enough to
be considered a " 
	(ULINK (URL "http://www.links2go.com/award/") "Links2Go
``Key Resource.''" 
	       (INLINEGRAPHIC (FILEREF "http://www.links2go.com/award/pics/key.gif")))))
 
 (SECT1 
  (ID "PFINBG")
  (TITLE "Personal Financial Background")
  (ANCHOR "FINBG") 
  (PARA "Due to having received a rather intense " 
	(LINK (LINKEND "EDUCATION") " education  ") 
	" in accounting, as well as several years experience in "
	(LINK (LINKEND "ACCOUNTING") " public accounting, ") 
	"I tend to ``have some clue'' in this area.  Perhaps
 surprisingly, I have a degree of intellectual and academic interest
 in financial matters as well.") 
  
  (PARA "One might reasonably expect this to also imply that I have a
certain degree of competence when I express opinions. Just keep in mind
that I " (EMPHASIS "do not")  " have any professionally recognized financial
designation, and thus cannot express opinions that have any kind of
legal weight according to the laws of either Canada or the United
States. I cannot say " (EMPHASIS "In my professional opinion...")))))

It would then be a reasonable idea to create "tree-walking" functions
to head through the resulting tree so as to access whatever links you
wanted to look at.
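
A minimal sketch of such a walker, in Python for the sake of a runnable
illustration (the thread's forms are Lisp; nested tuples stand in for the
lists, and the tag names follow the example above):

```python
def collect_links(node):
    """Recursively gather (url, text) pairs from every ULINK node in a
    tree whose nodes are (TAG, child, child, ...) tuples."""
    links = []
    if isinstance(node, tuple):
        tag, children = node[0], node[1:]
        if tag == "ULINK":
            # assumes each ULINK carries exactly one (URL ...) child
            url = next(c[1] for c in children
                       if isinstance(c, tuple) and c[0] == "URL")
            text = "".join(c for c in children if isinstance(c, str))
            links.append((url, text))
        for child in children:
            links.extend(collect_links(child))
    return links

doc = ("PARA", "financial software that runs on the ",
       ("ULINK", ("URL", "linux.html"), "Linux Operating System."),
       " It also contains other information.")

collect_links(doc)   # -> [('linux.html', 'Linux Operating System.')]
```

The same shape works for any tag: swap the `"ULINK"` test for whatever
element you want random access to.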

The use of this representation has the advantage that you can edit
using a LISP-oriented editor, and if you ensure that the structure
remains syntactically valid in LISP, it may then be dumped out (again,
via a suitable tree-walker) as SGML/XML that will be valid.

The powerful part is that you can programmatically *construct* such
trees, which provides a greater guarantee of correctness than people
usually get out of programmed-generation of HTML/SGML/XML.  (The
*really* powerful part would then be to construct a LISP equivalent to
a DTD to allow the tree to be validated, perhaps at the subtree
level...)
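
Such a validator could be sketched as follows (again Python for
runnability; the content model below is an invented fragment, not a real
DocBook DTD):

```python
# Hypothetical content model: which child tags each element may contain.
CONTENT_MODEL = {
    "SECT1": {"ID", "TITLE", "ANCHOR", "PARA"},
    "PARA":  {"LINK", "ULINK", "EMPHASIS"},
}

def valid(node):
    """Validate a (TAG, child, ...) subtree against CONTENT_MODEL.
    Strings are character data and always valid; tags absent from the
    model are treated as leaves that may only hold character data."""
    if isinstance(node, str):
        return True
    tag, children = node[0], node[1:]
    allowed = CONTENT_MODEL.get(tag)
    if allowed is None:
        return all(isinstance(c, str) for c in children)
    return all(isinstance(c, str) or (c[0] in allowed and valid(c))
               for c in children)

valid(("SECT1", ("TITLE", "t"), ("PARA", "text")))   # True
valid(("SECT1", ("ULINK", ("URL", "x"), "y")))       # False: ULINK not allowed there
```

Because `valid` is recursive, it already works "at the subtree level":
you can check a single PARA before splicing it into a larger tree.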

The above approach comes out as my reaction to reading the chapter on
HTML generation in Graham's book "ANSI Common Lisp."  He describes a
scheme that amounts to dynamically generating fragments of HTML.

My reaction was that I didn't like his approach very well; I regard
the approach as "throwing bits of HTML over the wall," which provides
*no* guarantee that the end result will represent valid HTML.

I far and away prefer the idea of building those fragments into a tree
structure that thereby represents a whole page, and then, once the
tree is built, walking through the complete tree and generating HTML
(or whatever) at *that* point.
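
That final generation step is itself just a tree-walk; a Python sketch (a
Lisp version would be structurally identical):

```python
def to_markup(node):
    """Serialize a finished (TAG, child, ...) tree to markup text.
    Tags are emitted only here, at output time, so mismatched tags can
    only come from an ill-formed tree, never from the emitter."""
    if isinstance(node, str):
        return node
    tag, children = node[0], node[1:]
    inner = "".join(to_markup(c) for c in children)
    return "<%s>%s</%s>" % (tag, inner, tag)

to_markup(("PARA", "Just keep in mind that I ",
           ("EMPHASIS", "do not"), " have any designation."))
# -> '<PARA>Just keep in mind that I <EMPHASIS>do not</EMPHASIS> have any designation.</PARA>'
```

(A real emitter would also need attribute handling and escaping of
character data; this sketch omits both.)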

The main downside to this approach is that it requires holding the
whole tree in memory at once.  I would counterargue thusly:

a) Web pages shouldn't be *that* large, so this shouldn't cause a problem;

b) If the document is intended to be so large that this *does* cause a
problem, then it is probably necessary to link to an out-of-memory
"object database" system, in which case the above approach is even
*more* appropriate, albeit with the consideration that the tree will
mostly sit on disk.

I have not yet come up with the "perfect" solution for handling
attributes.  (e.g. - how to represent ID="FINANCES" in the SGML <SECT1
ID="FINANCES"> ... </SECT1>)  

The issue here is that there are components that are sequential, and
others that are not.  For instance, a title is associated with the
section, whilst the paragraphs are sequential:

<SECT1> <TITLE> Here's the Title </TITLE>
<PARA> First Paragraph
<PARA> Second Paragraph
<PARA> Third Paragraph
</SECT1>

I present this thus:

(SECT1 
  (TITLE "Here's the title")
  (PARA "First Paragraph")
  (PARA "Second Paragraph")
  (PARA "Third Paragraph"))

But the title possibly ought to be part of some form of property list
for this SECT1 instance.

I'm quite certain that Erik Naggum has thought this through further than
I have; I'd welcome his thoughts, which might very well contrast slightly
;-] with the .signature below... 
--
"DTDs are not common knowledge because programming students are not
taught markup.  A markup language is not a programming language."
-- Peter Flynn <········@m-net.arbornet.org>
········@hex.net- <http://www.hex.net/~cbbrowne/lsf.html>

From: William F. Hammond
Subject: Re: data structure for markup text
Date: 
Message-ID: <i7r9n9wo24.fsf@hilbert.math.albany.edu>
········@news.hex.net (Christopher Browne) writes:

> The main downside to this approach is that it requires holding the
> whole tree in memory at once.  I would counterargue thusly:
> 
> a) Web pages shouldn't be *that* large, so this shouldn't cause a problem;
> 
> b) If the document is intended to be so large that this *does* cause a
> problem, then it is probably necessary to link to an out-of-memory
> "object database" system, in which case the above approach is even
> *more* appropriate, albeit with the consideration that the tree will
> mostly sit on disk.

(Don't I recall a time in the ancient past when a file of size over
10^6 bytes was considered to be a "trojan horse"?  This is certainly
not so any more, but it is still true that size is relative.)

Relative to most present installations, swallowing a whole tree is
broken by design in that there exist instances whose size will
swamp existing memory resources long before existing peripheral
storage is exhausted.

My understanding of SGML (and XML) is that it is mainly about fast,
nearly linear, handling of text-like data.  I have the perception that
things like the "mixed content model" prohibition are there for this
reason.

The whole scheme carries a bias toward single-pass processing.  Why,
for example, does James Clark's "nsgmls" put attributes in front of
element content?  (A rhetorical question.)

I think about these things in designing a DTD.  After all, DTDs
and the processors that go with them are optimally created together.

But my processors, rather than being strictly single-pass, are what I
would describe as single-pass with limited look-back.  I think of
"paragraph-size", whatever that is, as the largest size for *full*
look-back.  Global forms of look-back are done by storing special
small things.
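
That discipline can be sketched as follows (Python; the event names are
invented for illustration): full look-back lives in a paragraph-sized
buffer, while global look-back keeps only small stored items such as
anchor ids.

```python
def process(events):
    """Single pass with limited look-back: buffer text only until the
    current paragraph ends; globally retain just small items (anchors)."""
    anchors, paragraphs, buffer = {}, [], []
    for kind, value in events:
        if kind == "anchor":
            anchors[value] = len(paragraphs)  # small global record
        elif kind == "text":
            buffer.append(value)              # full look-back, one paragraph deep
        elif kind == "para-end":
            paragraphs.append(" ".join(buffer))
            buffer = []                       # the buffer never outlives a paragraph
    return anchors, paragraphs

events = [("anchor", "FINBG"), ("text", "first"), ("text", "para"),
          ("para-end", None), ("text", "second"), ("para-end", None)]
process(events)   # -> ({'FINBG': 0}, ['first para', 'second'])
```

Memory use is bounded by the largest paragraph plus the anchor table, no
matter how large the document grows.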

When several *full* passes are needed, my inclination is to chain
independent SGML processors with a system-level script, one for each pass.

William F. Hammond                   Dept. of Mathematics & Statistics
518-442-4625                                  The University at Albany
·······@math.albany.edu                      Albany, NY 12222 (U.S.A.)
http://math.albany.edu:8000/~hammond/          Dept. FAX: 518-442-4731

Never trust an SGML/XML vendor whose web page is not valid HTML.
From: Erik Naggum
Subject: Re: data structure for markup text
Date: 
Message-ID: <3138714312189865@naggum.no>
* ········@news.hex.net (Christopher Browne)
| I have not yet come up with the "perfect" solution for handling
| attributes.  (e.g. - how to represent ID="FINANCES" in the SGML <SECT1
| ID="FINANCES"> ... </SECT1>)

  in proper SGML terms the Generic Identifier is an attribute of the
  element the same way any other attributes are, but it has a special role.
  that is, <foo bar=1>zot</foo> should be regarded as a structure with the
  attributes GI=FOO and BAR=1, and the contents "zot".  note that this
  property of the generic identifier is used productively in the HyTime and
  SMDL standards, in that it is moved to a different attribute in order to
  use "architectural forms".  this is very clever, but made into a horribly
  obscure technical point because people think of the Generic Identifier as
  somehow the _primary_ name of an element.  in reality, it's _only_ a
  means of making the structure-verification support in SGML work, and some
  other attribute may well be much more important.

  so if we allow the same kind of pun on name spaces as Common Lisp already
  uses in (foo bar) where FOO is in the function namespace and bar is in
  the value namespace, my suggestion is ((foo :bar 1) "zot"), which I think
  is a clear winner over the runner-up (foo (:bar 1) "zot") for the simple
  reason that you can view (foo :bar 1) as a function call that returns a
  function that can deal with the contents.  experienced Lisp or Scheme
  programmers may think of this as "currying".
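
  the currying reading can be made literal; a sketch in Python rather
  than Lisp, purely for illustration (the serialization format chosen by
  the returned function is incidental):

```python
def element(tag, **attrs):
    """((foo :bar 1) "zot") read as currying: calling the head with its
    attributes returns a function that deals with the contents."""
    def handle(*contents):
        attr_text = "".join(' %s="%s"' % item for item in sorted(attrs.items()))
        return "<%s%s>%s</%s>" % (tag, attr_text, "".join(contents), tag)
    return handle

element("foo", bar=1)("zot")   # -> '<foo bar="1">zot</foo>'
```

  the point is the shape of the call, not the output: the (tag,
  attributes) head is evaluated first and yields the thing that consumes
  the contents.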

  the best way to deal with SGML is to specify an execution model along the
  lines of the functional form in Lisp.  the above may look like a Scheme
  form where (foo :bar 1) might be called in the standard Scheme execution
  model, but this is not so.  we have to view that form as returning a
  function that should take the subforms literally, like a macro (which
  Scheme doesn't have in the way we need it).  instead of a lambda list to
  take ordinary arguments, envision a function definition that takes an
  SGML content model and which may or may not check against this model when
  called depending on the processing mode (like validate, safe, unsafe).
  that is, we call processing functions from the outside and in, not from
  the inside and out, which is Scheme's and Lisp's execution model.

| The issue here is that there are components that are sequential, and
| others that are not.  For instance, a title is associated with the
| section, whilst the list of paragraphs are sequential:

  very good point.  this is one of the many serious problems with SGML.  in
  essence, the contents of an element and the values of attributes are
  semantic "equals", but just as one attribute is elevated to Generic
  Identifier to help define SGML's concept of structure, contents is more
  powerful than attributes, in that it has substructure; the difference is
  that SGML does not have a "list" concept for attributes (except a string
  with one-level structure).  Lisp programmers may think of the contents
  the same way they think of "implicit PROGN", and think of HANDLER-CASE
  and HANDLER-BIND as having body attributes with and without implicit
  PROGN.  in many cases, SGML forces you to use sub-elements because
  attributes cannot hold the information, although this is an artificial
  separation and a sub-element-cum-attribute could be as influential on the
  way an element is processed as other attributes, and attributes that
  don't actually have such a role might as well be sub-elements if it
  weren't for the cluttered namespace and the inability of elements in SGML
  to have different contents depending on their contexts.

| I'm quite certain that Erik Naggum has thought this through further than
| I have; I'd welcome his thoughts...

  well, I hope it helped.

| "DTDs are not common knowledge because programming students are not
| taught markup.  A markup language is not a programming language."
| -- Peter Flynn <········@m-net.arbornet.org>

  I'll venture a different explanation: DTDs aren't common knowledge
  because they are extremely badly designed: they don't have any semantics
  apart from internal consistency of the "document", and there's no
  execution model for them that people can understand, but several that try
  to conflate different points of view onto them, like trying to insist on
  only one way to look at a Picasso, or like insisting that a Lisp form is
  _either_ code _or_ data in spite of the obvious need for a _viewpoint_ to
  make meaning emerge.  programmers from the Algol family naturally have a
  hard time regarding their data as programs and probably find it mildly
  insane to talk about an execution model for the data in a document,
  despite the obvious fact that other markup languages _do_ come with an
  execution model -- they _are_ procedural.  this leads to another problem
  in SGML that might be addressed if we were to think in Common Lisp terms.

  the main problem with SGML is the lack of Lisp-style macros, and this is
  most visible in HTML.  Cascading Style Sheets were invented to address
  this point.  a better solution would simply have been to define macros
  that massaged the structure contained within them and returned something
  more specific.  this, however, would probably mean that users would like
  to deal with macros and application-level functions the same way, but as
  is now apparent, SGML has set up a lot of conceptual barriers between
  things that are no more than different aspects of the same core principle:

1 attributes and generic identifiers.  one might want to use some attribute
  to define processing and another to define the SGML-defined structure.
  or one might want to modify the processing according to any number of
  attributes.

2 attributes and contents.  the artificial prohibition against attributes
  with useful structure means some attributes need to be content.  the
  flat namespace means some contents would be turned into attributes to
  avoid collisions.

3 element name and processing.  the application is forced to deal with
  every issue in processing elements, _except_ validating the structure.
  why stop there?  what is the big deal in prohibiting user-defined macros
  that could produce the same elements the application used to see?  it's
  not like they needed to be invented -- all other markup languages at the
  time SGML was defined had them, but SGML viewed them as a weakness.

  too much of SGML's history, including HTML and XML, consists of attempts
  at solving problems that the design itself introduced, rather than of
  things that fell out of simple and correct design.  if these artificial
  barriers had not been set
  up, DTDs would not _need_ to be "common knowledge", because they would
  have fit in with other language definitions, and instead of insisting
  that other people do something wrong when they don't understand you, it's
  much better to realize that there's something you can do differently that
  might help them understand.

  insisting that programmers are the enemies of text processing, which goes
  back to an early rationale for SGML, that the users needed to take back
  control over the data formats from the programmers, has caused people who
  could program to ignore SGML, and people who wanted SGML to work to do a
  whole lot of amazingly stupid things that have alienated programmers even
  more.  but people who ignore programmers while they want their services
  live in a world of unhealthy delusions, one of which is that programmers
  cannot understand user needs, which is an attitude that made sense when
  programmers were separate from users.  every author of a LaTeX document
  knows that the line between user and programmer is very thin, indeed.
  the same applies to anyone who has written any Visual Basic to make Word
  behave as they want, or indeed anyone who has defined his own functions
  in Excel.  a computer user _is_ a programmer in the late 1990's, but we
  still see remnants of the 1960's attitude that users had to take whatever
  the programmers in white coats gave them.  the sad thing is that both
  users and programmers lose big time from this silly schism: users can't
  _program_ HTML to do useful stuff, so have to invent or accept tools that
  destroy what SGML was intended to further: independence of data from
  applications, and writing HTML is performed by at least 50 different
  "languages" with unbelievably bad design, orders of magnitude worse than
  what the purists say would happen to SGML if it had user-defined macros,
  but then they got XML and were apparently not too alarmed by that.

  how do you get people to understand markup languages?  just make them
  into programming languages, again, and people will understand them, too.
  but insist that markup is not like programming, and you break the concept
  that input to some processor can change its behavior and affect it in
  useful ways.  insisting that the processor be maintained by someone
  else is very counter-productive.  giving people too much control is also
  counter-productive, as the failed WYSIWYG experiment has proven.

  I think something has to transcend SGML and XML and all the HTML cruft to
  get at the issues _people_ want, which is no more than an understandable
  conceptual route from input to output.  SGML sets up roadblocks on that
  route and insists that you do not want to go all the way.  XML removed one
  roadblock among several and people could go further, but they still don't
  "get it": they are prohibited by politics, not technology, from going all
  the way.  most of that politics lies in the insistence on reinventing all the
  programming language concepts needed to process SGML, like groves, or in
  being anal-retentive about the textual representation.

  DTDs are not common knowledge among programming students because markup
  languages are artificially different from programming languages.

  me, I believe in the division of labor and see no purpose in spending any
  time on formatting my own documents.  people who know how to do that well
  should be allowed to do their job as best they can, but anything that
  actually helps _both_ the author and the formatter (for lack of a better
  term) should be welcome.  practice shows that very little formalism is
  required to make this work right, and once you have reinstalled division
  of labor concepts, the languages and notations used in the resulting
  output are quite immaterial, the same way HTML has become immaterial to
  users of FrontPage and other cruft that generates HTML that is tied very
  closely to the expected application environment.

  I came from a Lisp background to SGML and saw a huge potential.  I also
  saw that without a Lisp background, most of SGML was a complete mystery
  to people, and since Lisp was viewed as an abstract, academic programming
  language, and SGML people were allowed to dislike programmers, neither
  would they _acquire_ the results of a Lisp background.  what I can bring
  to Lisp or to programming in general because of my SGML work is a better
  design of data formats and communications protocols, but I regard SGML as
  a major detour that had value the same way any costly mistake has value
  if you are determined to learn from it.  what's more, I don't think SGML
  people will ever realize that they deal with artificial complications of
  rather simple ideas and have no reason to be smug about it towards people
  who prefer simple ideas simple and move on to really complex ideas.

  I suggest that people who want to understand how to process SGML should
  spend the time on something else entirely, like trying to figure out
  which things SGML does well and to build their own environment from that;
  my suggestion in that regard is to think about processing documents from
  the outside and in through a functional execution model, contextualizing
  the processing through dynamic binding of functions, etc.  experienced
  SGML users will think that this is a conflation of the data format with
  the processing language, and that's precisely the idea: SGML has made a
  mistake in dividing them _too_ much.  they don't have to be the same _or_
  entirely separate.  but once you grasp this idea, why do you need SGML in
  your documents?  just save the Lisp forms -- after all, you already have
  a Lisp pretty printer at your disposal and the syntax is sane and simple,
  compared to the contorted mess that is SGML, plus you get a lot more
  flexibility when you do this.  Lisp offers the same structuring means as
  SGML does, only without artificial restrictions, and the only difference
  is that you will have to quote your strings and tighten up the overly
  relaxed entity model in SGML, but it is still a necessary component.
  Lisp programmers should think of the entity model as the #. reader macro
  or backquoted forms.

  in the end, you will find that you don't need anything from SGML if you have
  a programming language that can treat code as data and vice versa.

  it seems I could go on for a long time, so I'll stop now.

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century
From: ······@my-deja.com
Subject: Re: data structure for markup text
Date: 
Message-ID: <7kk865$ua0$1@nnrp1.deja.com>
In article <················@naggum.no>,
  Erik Naggum <····@naggum.no> wrote:
[some very helpful information]
You are probably uniquely qualified to explain the inherent problems
with SGML and XML.  It would be a real public service if you were to
write them up.  I know that I would like to read this and to be able to
point others to this information to avoid costly future mistakes.


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
From: Erik Naggum
Subject: Re: data structure for markup text
Date: 
Message-ID: <3138960936337701@naggum.no>
* ······@my-deja.com
| You are probably uniquely qualified to explain the inherent problems with
| SGML and XML.

  well, thank you.

| It would be a real public service if you were to write them up.

  it might be, but I have already donated several thousand hours of work to
  the SGML community and found it very hard to get anything in return on
  the investment despite the fact that other people benefited commercially
  and significantly so from my work, so I'm through with "public services".

| I know that I would like to read this and to be able to point others to
| this information to avoid costly future mistakes.

  I can tell you this: I started to write a book on SGML (working title: "A
  Conceptual Introduction to SGML") and received very serious interest from
  Prentice-Hall, but my work on that book brought to the surface all the
  problems I had found with SGML, such as the sorry fact that as a syntax,
  it is incredibly complex for its simple task and its semantic powers are
  way too simplistic to be worth the straight-jacket.  not to mention the
  then (1994) obvious future impact of HTML on publishing, but HTML is so
  amazingly stupidly designed that whatever was left of SGML's goals and of
  the representation of information for longevity completely evaporated
  after HTML 2.0 became obsolete, both in theory and in practice.  I spent
  a year or so agonizing over the fact that it would be really expensive
  for me to cut the losses and move elsewhere, but that's what I had to do,
  because I would only get even more unhappy if I didn't get out of it:
  SGML grew in the back-end of the production line, but it really has no
  value except insofar as it is able to capture meta-information, and that
  means capturing the _intent_ of some particular forms of expression, but
  what intent can you possibly discern when you "convert" a poor user's
  struggle to get something that just looks OK out of Microsoft Word into
  some DTD that was designed to be easily mapped to Word in the first
  place?  such uses of SGML are hugely expensive wastes of time.  if SGML
  (or a similarly capable system) isn't with you from the start, you should
  be happy to get something that looks OK in print.  and if you are clever
  enough to use SGML on the front-end, chances are you will develop a
  system that far surpasses SGML, anyway, and you're back to using SGML as
  formatting back-end if that's the kind of tools you have available.  in
  this process, all that SGML can offer is a syntax for delimiting tags on
  elements of contents, and some important restrictions on structuring that
  means you have to be needlessly clever in the machine-generated output, such
  as using HyTime architectural forms, but by so doing, you severely limit
  the application independence of the data, and you discover that it's way
  easier to generate different SGML depending on need rather than use the
  same SGML for different uses.

  what good SGML does is in the document management arena, and sometimes, a
  straight-jacket is just what you need to keep insane people in check,
  such as those who base an entire company's document base on Microsoft's
  products and secret and unintelligible document "formats".  this aspect of
  SGML _is_ very important, but once the structuring process is in the
  works, document types are defined, etc, and the applications are written,
  where did _SGML_ go, and how much did using SGML cost on top of the very
  necessary cleanup process?  if you can't do this process without SGML, by
  all means, go for it, but if you can, SGML represents additional cost and
  no benefits that cannot be obtained cheaper and better by other means.

  a succinct summary of the lessons I have learned is a pun on the old
  advice: "know SGML, forget SGML".  or in other words, transcend syntax
  and look at the concepts that SGML affords expression of, then push
  forward and desire the concepts that SGML does not afford expression of
  and see that you're better off without SGML, but needed to understand it
  in order to move further.  if you stick with SGML, you hit your head in
  the glass ceiling and spend all your effort being clever within very
  restricted bounds -- that's the waste of time you should avoid.

  however, that said, if you find yourself comfortable within the bounds of
  what SGML makes relatively easy and desire no more, I'm not going to ask
  you to choose a better approach to a problem you don't have.  if you want
  to use the quality tools that eat SGML, design simple DTDs that capture
  structure in a natural way relative to the end result, don't try to be
  clever in the DTD design phase.  if it's hard to do in SGML, use some
  other language or tool, such as a real programming language and database
  support.

  the core problem is that neither SGML, nor HTML, nor XML actually _scale_
  or help you _evolve_, and human endeavors are wont to grow and evolve.
  longevity and stability are not to be found in solid structure, but in the
  ability to turn around effortlessly yet _without_ losing the past.  SGML
  is just as bad as any other static structure in that latter regard.

  anyway, this discussion started (as far as I got into it, anyway) with a
  desire to represent SGML-like structure in a dynamic programming language
  such as Common Lisp, and that's the next step: when your data can easily
  be interpreted as function calls, you can write programs that produce the
  desired output as you "execute" the document.  the syntax you use for
  this is not particularly important and it might as well be SGML if you
  can trivially read it into the program and process it, but SGML is such a
  pain to read and process that you should think twice about using it.  put
  another way: whether you write <foo bar=1>zot</foo> or ((foo :bar 1)
  "zot") is not important; that you think in terms of function calls,
  programming languages, and dynamic semantics is.  but why then accept all
  the bother with a truly arcane syntax?  _that's_ the question I couldn't
  answer.  watching other people struggle like mad with the syntax and not
  even _getting_ the semantic relationships and purposes was what tipped me
  off: SGML had a constructive goal, but is counter-productive in practice.
  the productive approach is to understand the constructive goal and do it
  some much less involved way.  now, if you want to do that, I'm all ears.
  
#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century
From: Paolo Amoroso
Subject: Re: data structure for markup text
Date: 
Message-ID: <376f5af8.96368@news.mclink.it>
On 21 Jun 1999 13:35:36 +0000, Erik Naggum <····@naggum.no> wrote:

> * ······@my-deja.com
[...]
> | It would be a real public service if you were to write them up.
> 
>   it might be, but I have already donated several thousand hours of work to
>   the SGML community and found it very hard to get anything in return on
>   the investment despite the fact that other people benefited commercially
>   and significantly so from my work, so I'm through with "public services".
[...]
>   I can tell you this: I started to write a book on SGML (working title: "A
>   Conceptual Introduction to SGML") and received very serious interest from
>   Prentice-Hall, but my work on that book brought to the surface all the
[...]
>   anyway, this discussion started (as far as I got into it, anyway) with a
>   desire to represent SGML-like structure in a dynamic programming language
>   such as Common Lisp, and that's the next step: when your data can easily
>   be interpreted as function calls, you can write programs that produce the
>   desired output as you "execute" the document.  the syntax you use for

If you ever decide to continue writing such a book, I'm personally willing
to pay the price of my copy in advance. Your points on representing
document structure in a dynamic language, and on designing for both users
and programmers, are particularly interesting. In the meantime, thanks for
sharing your thoughts.

Incidentally, you said that your preliminary work for the Prentice-Hall
project brought to the surface all the problems with SGML. Petroski's "To
Engineer is Human" is a great example of a book about learning from design
mistakes.


Paolo
-- 
Paolo Amoroso <·······@mclink.it>
From: William F. Hammond
Subject: Re: data structure for markup text
Date: 
Message-ID: <i7vhcfx8u4.fsf@hilbert.math.albany.edu>
SGML (or a subcategory such as XML) is the best extant choice if you
want a *reliable* way to have different presentation formats from a
single source document.

(I do know another; it is called Texinfo.  But SGML, even XML, is better.)

Erik Naggum <····@naggum.no> writes:

[snip]
>   I can tell you this: I started to write a book on SGML (working title: "A
>   Conceptual Introduction to SGML") and received very serious interest from
>   Prentice-Hall, but my work on that book brought to the surface all the
>   problems I had found with SGML, such as the sorry fact that as a syntax,
>   it is incredibly complex for its simple task and its semantic powers are
>   way too simplistic to be worth the straight-jacket.
[snip]

Many of the "problems" with SGML arise from the fact that so few good
examples of SGML processors -- aside from some excellent parsers --
are easily accessible.  (Even then getting a good parser up and running
is a bit of a challenge the first time.)

The lack of easy accessibility involves money, processing-language
choices, lack of education about SGML, and lack of understanding of
staged processing.

There is also a need for authors and editors to have contact with
application language designers and application process designers.  It
appears to me that the consulting business, especially when marketed
as "XML" and where the consultants are competent, is good both for
consultants and clients.  This is rather incompatible with the use of
processors that are not freely available with open source.
Consultants doing business that way seem to be busy.  (But I am not
one of them, and I cannot offer real evidence.)

>   SGML grew in the back-end of the production line, but it really has no
>   value except insofar as it is able to capture meta-information, and that
>   means capturing the _intent_ of some particular forms of expression, but
>   what intent can you possibly discern when you "convert" a poor user's
>   struggle to get something that just looks OK out of Microsoft Word into
>   some DTD that was designed to be easily mapped to Word in the first
>   place?  such uses of SGML are hugely expensive wastes of time.

It captures *structural content*.  Meta-information is part of
structural content.  I recommend the following operational definition
of what "structural content" is:   A piece of markup represents
structural content if it has meaning for at least two different
presentation formats where neither may be derived systematically and
robustly from the other.

[snip]
>                                             and if you are clever
>   enough to use SGML on the front-end, chances are you will develop a
>   system that far surpasses SGML, anyway, and you're back to using SGML as
>   formatting back-end if that's the kind of tools you have available.  in
>   this process, all that SGML can offer is a syntax for delimiting tags on
>   elements of contents, and some important restrictions on structuring that
>   means you have to be needlessly clever in the machine-generated output, such
>   as using HyTime architectural forms,

This is the design of SGML:
   (1) A good system need not use all features.
   (2) It is very flexible.
   (3) It is a framework slanted toward processing efficiency.

SGML is only a framework.  One always needs to supply the language and
a family of processors for the various targets.  Such a processor
usually is a pipeline of length greater than 1.  A well designed
pipeline component consists of small modular pieces that are easy to
adjust as needed.  (If one finds that not to be the case, then the
design needs revisiting.)

There are many features in SGML one may not want to use.  (But others
may want to use them.)

There is almost always more than one way to proceed at the design
stage.  The absence of a canonical way to proceed can create the
appearance of complexity.  Such a human perception is only
psychological.

[snip]                               but by so doing, you severely limit
>  the application independence of the data, and you discover that it's way
>  easier to generate different SGML depending on need rather than use the
>  same SGML for different uses.

Is this an objection?   (DTDs may be designed to consist of packages
of plug-in "modules".)

[snip]
>                             if you can't do this process without SGML, by
>   all means, go for it, but if you can, SGML represents additional cost and
>   no benefits that cannot be obtained cheaper and better by other means.

If one only wants a single presentation, forget SGML.

If one wants two presentations, say, print and (valid) HTML, then
Texinfo may be an option.  But someday I expect that there will be a
DTD for Texinfo, and it will come to be understood that the best way
to make Texinfo from something like DocBook is to format DocBook into
Texinfo-SGML and then use a canonical Texinfo formatter for that
language.

>   a succinct summary of the lessons I have learned is a pun on the old
>   advice: "know SGML, forget SGML".  or in other words, transcend syntax
>   and look at the concepts that SGML affords expression of, then push
>   forward and desire the concepts that SGML does not afford expression of
>   and see that you're better off without SGML, but needed to understand it
>   in order to move further.

[snipped from later]
>   anyway, this discussion started (as far as I got into it, anyway) with a
>   desire to represent SGML-like structure in a dynamic programming language
>   such as Common Lisp, and that's the next step: when your data can easily
>   be interpreted as function calls, you can write programs that produce the
>   desired output as you "execute" the document.  the syntax you use for
>   this is not particularly important and it might as well be SGML if you
>   can trivially read it into the program and process it, but SGML is such a
>   pain to read and process that you should think twice about using it.

The lisp-like structure of the earlier posting is very interesting and
worthwhile.  In fact, I prefer that markup style to SGML tagging,
which my eyes do not like to look at.  (Hence, my GELLMU project that
involves still another markup style that is LaTeX-like.)

But markup languages in these alternative notations are isomorphic
(though not always canonically, but canonical within a rather limited
range of sane choices) to SGML applications.  Even though one might
not choose to use SGML processors with them, that possibility will
always exist.  With a good overall design SGML processors should be
optimal.

[snip]
>   the core problem is that neither SGML, nor HTML, nor XML actually _scale_
>   or help you _evolve_, and human endeavors are wont to grow and evolve.
>   longevity and stability is not to be found in solid structure, but in the
>   ability to turn around effortlessly yet _without_ losing the past.  SGML
>   is just as bad as any other static structure in that latter regard.

But also just as good.

As the years go by, the test of a markup language created today will
be its amenability to the automatic processing of legacy documents
into the formats of the future.


William F. Hammond                   Dept. of Mathematics & Statistics
518-442-4625                                  The University at Albany
·······@math.albany.edu                      Albany, NY 12222 (U.S.A.)
http://math.albany.edu:8000/~hammond/          Dept. FAX: 518-442-4731

Never trust an SGML/XML vendor whose web page is not valid HTML.
From: Erik Naggum
Subject: Re: data structure for markup text
Date: 
Message-ID: <3139226293475107@naggum.no>
* William F. Hammond
| Many of the "problems" with SGML arise from the fact that so few good
| examples of SGML processors ... are easily accessible.

  the problems with SGML are conceptual.  practical problems have practical
  solutions and are uninteresting from a language design point of view.
  some conceptual problems have practical solutions, and while interesting
  for an implementor, are also uninteresting insofar as they don't have
  unreasonably high costs.  the rest of the conceptual problems must have
  conceptual solutions, too -- and _those_ are the interesting ones from a
  language design point of view.

  my rule of thumb: if "more money" is the answer, you have an uninteresting
  practical problem -- conversely, the measure of success of a conceptually
  good solution is that solving the problem consumes fewer resources from
  then on.

| It captures *structural content*.

  it's unclear what you're trying to tell me that you think I don't know.

| SGML is only a framework.

  indeed it was intended as such, but it is its capacity as framework that
  I have been lamenting.

| There is almost always more than one way to proceed at the design stage.
| The absence of a canonical way to proceed can create the appearance of
| complexity.  Such a human perception is only psychological.

  while I dismiss practical problems out of hand -- smart people will solve
  them sooner or later -- respecting human nature is what good design is
  all about, and the more you respect it, the better the design is.

| If one only wants a single presentation, forget SGML.

  this is the most dangerous statement you can make if you want to destroy
  SGML completely.  it's like saying a programming language is only good
  for the really complex problems -- it will lose the competition for the
  people who believe their problems are simple, or who need simple steps
  from problem to solution.  (hint: such people abound.)

| The lisp-like structure of the earlier posting is very interesting and
| worthwhile.  In fact, I prefer that markup style to SGML tagging, which
| my eyes do not like to look at.

  I'm glad to hear it was so intuitively appealing.

| (Hence, my GELLMU project that involves still another markup style that
| is LaTeX-like.)

  one of the problems that come with making text the primary syntactic
  element is that you have to invent so much black magic to keep markup
  distinct from text.  I prefer a much simpler way to deal with this, the
  one used in programming languages: delimit the data, not the code.  in
  particular, Common Lisp's very simple syntax: a string is delimited by
  double quotes; a backslash precedes a literal character inside a string.
  no black magic as in C, and absolute predictability both reading and
  writing the strings.  (note that one of the uninteresting practical
  problems of SGML is that SGML's syntax differs according to the SGML
  declaration, which makes some character sequences magic and others not --
  the only way you can _really_ be safe is using character entities for
  every character.)
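
  a tiny illustration of that predictability (the node shape and helper
  here are my own invention, not from any standard):

```lisp
;; Illustrative sketch: the double quotes are the only delimiters, and
;; backslash escapes a literal quote character inside the data.
(defparameter *node*
  '(para "He said \"hello\" -- the inner quotes are plain data."))

(defun node-text (node)
  "Return the text content of a (tag \"text\") node."
  (second node))

;; PRIN1-TO-STRING writes the string back with the same escapes, so
;; reading and writing round-trips with no surprises.
```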

* Erik Naggum
| SGML is just as bad as any other static structure in that latter regard.

* William F. Hammond
| But also just as good.

  hello?  the whole point of my article was that static structures are
  insufficient for any publishing problem worth solving.

| As the years go by, the test of a markup language created today will be
| its amenability to the automatic processing of legacy documents into the
| formats of the future.

  I actually agree.  SGML is uniquely slated to flunk this test.  if you
  don't agree, I expect to see your solution to the problem of updating a
  document automatically when its DTD changes.  if this is "impractical",
  take a look at SQL and the tools created for it: the unique strength of
  that language is that you can dynamically improve your database without
  having to dump and reload it, which is what people had to do prior to SQL
  and its quiet revolution.  (yes, that phrase was first used about SQL.)
  the more complex structures become, the more people need to be able to
  change them as they learn more about them.  SGML is the worst possible
  language in which to do just that.

  because SGML/*ML does not support structure rewriting, it cannot survive
  any serious amount of change.  all "macro languages" have such rewriting
  -- it's what "macro" is all about -- but SGML decided to discard this
  aspect of being a programming language.  without it, people will have to
  write tons and tons of code to deal with special cases, build front-ends
  that deal with stuff SGML doesn't, etc., etc.  it is no coincidence that
  there are lots of "scripting languages" that produce HTML out there, just
  as it is no coincidence that tools that process SGML come with their very
  own languages, way more arcane than anything programming language people
  could dream up.

  wouldn't you just _love_ to have a Turing complete markup language, with
  a nice syntax that both humans and machines could read with ease, which
  allowed you to do structure rewriting _in_ the language?  the only way
  you can do that is to fully realize that data and code are the _same_.
  in so doing, you realize that creating short-term convenience barriers
  between parts of an inextricably linked whole is counterproductive in
  non-immediate terms: the net effect can only be to force people on both
  sides of each barrier to reinvent the rest of the whole on their own,
  which is a phenomenal waste of time, regardless of what they manage to do
  that is productive and useful, _unless_ your only concern is the short
  term, in which case such waste has no bearing on the evaluation.

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century
From: Fernando
Subject: Re: data structure for markup text
Date: 
Message-ID: <377660e0.451134@news.nova.es>
On 24 Jun 1999 15:18:13 +0000, Erik Naggum <····@naggum.no> wrote:

>* William F. Hammond
>| Many of the "problems" with SGML arise from the fact that so few good
>| examples of SGML processors ... are easily accessible.
>
>  the problems with SGML are conceptual.  
[snip]

OK, then, what would be the ideal solution/substitute for the sgml
problems? O:-)




//-----------------------------------------------
//	Fernando Rodriguez Romero
//
//	frr at mindless dot com
//------------------------------------------------
From: Christopher Browne
Subject: Re: data structure for markup text
Date: 
Message-ID: <99Ac3.88269$_m4.713298@news2.giganews.com>
On 23 Jun 1999 10:58:11 -0400, William F. Hammond
<·······@hilbert.math.albany.edu> wrote: 
>SGML (or a subcategory such as XML) is the best extant choice if you
>want a *reliable* way to have different presentation formats from a
>single source document.

This doesn't address Erik's issues.

If I take a document and represent it as a big Lisp list, I can then
take a Lisp program that renders it in multiple formats.

Taking a document, representing it in SGML, and writing a
multiple-format renderer in Lisp (à la DSSSL) is no different from that
perspective.

The "represent as Lisp" approach has the disadvantage that there is
not a "de jure" STANDARDIZED way of using this as a GML.  But
standardization was not particularly at issue.
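
A minimal sketch of that approach (element names and renderers here are
illustrative, not any standard vocabulary):

```lisp
;; One document as a nested list, and one tree walker per output format.
(defparameter *doc*
  '(article (title "Finances") (para "Linux " (emphasis "money") " links.")))

(defun render-html (node)
  "Walk the tree and emit HTML-style tags."
  (if (stringp node)
      node
      (let ((tag (string-downcase (symbol-name (first node)))))
        (format nil "<~a>~{~a~}</~a>"
                tag (mapcar #'render-html (rest node)) tag))))

(defun render-text (node)
  "Walk the same tree and emit plain text, discarding the markup."
  (if (stringp node)
      node
      (format nil "~{~a~}" (mapcar #'render-text (rest node)))))
```

(render-html *doc*) and (render-text *doc*) give two presentations of the
single source tree; a third format is just a third walker.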

>(I do know another; it called Texinfo.  But SGML, even XML, is better.)
>
>Erik Naggum <····@naggum.no> writes:
>
>[snip]
>>   I can tell you this: I started to write a book on SGML (working title: "A
>>   Conceptual Introduction to SGML") and received very serious interest from
>>   Prentice-Hall, but my work on that book brought to the surface all the
>>   problems I had found with SGML, such as the sorry fact that as a syntax,
>>   it is incredibly complex for its simple task and its semantic powers are
>>   way too simplistic to be worth the straight-jacket.
>[snip]
>
>Many of the "problems" with SGML arise from the fact that so few good
>examples of SGML processors -- aside from some excellent parsers --
>are easily accessible.  (Even then getting a good parser up and running
>is a bit of a challenge the first time.)
>
>The lack of easy accessibility involves money, processing-language
>choices, lack of education about SGML, and lack of understanding of
>staged processing.

Tools != syntax.

Or, in Lisp syntax:
  (not (equalp 'tools 'syntax))

The criticism is not of the tools.  The criticism is deeper than that;
it implies that the problems that are encountered with SGML are
systematic, resulting from the nature of SGML, applicable regardless
of the tools.

The point is that SGML is inherently hard to create tools for.  Its
syntax mandates highly complex tools, which means that the problems
you acknowledge are the *result* of what Erik has said.

>>SGML grew in the back-end of the production line, but it really has no
>>value except insofar as it is able to capture meta-information, and that
>>means capturing the _intent_ of some particular forms of expression, but
>>what intent can you possibly discern when you "convert" a poor user's
>>struggle to get something that just looks OK out of Microsoft Word into
>>some DTD that was designed to be easily mapped to Word in the first
>>place?  such uses of SGML are hugely expensive wastes of time.
>
>It captures *structural content*.  Meta-information is part of
>structural content.  I recommend the following operational definition
>of what "structural content" is:   A piece of markup represents
>structural content if it has meaning for at least two different
>presentation formats where neither may be derived systematically and
>robustly from the other.

Again, presenting a definition of structural content is pretty
orthogonal to the issue Erik raised.

If the poor user is sitting out there trying to get Word to make
something "look right," and is not composing documents with a mindset
oriented to structural content, then what you get at the end of the
day won't be structural markup.

And if there's a *bit* of structural markup thrown in, this makes the
task of assigning *proper* structure to the document a tougher job.

>[snip]
>>                                             and if you are clever
>>   enough to use SGML on the front-end, chances are you will develop a
>>   system that far surpasses SGML, anyway, and you're back to using SGML as
>>   formatting back-end if that's the kind of tools you have available.  in
>>   this process, all that SGML can offer is a syntax for delimiting tags on
>>   elements of contents, and some important restrictions on structuring that
>>   means you have to be needlessly clever in the machine-generated output, such
>>   as using HyTime architectural forms,
>
>This is the design of SGML:
>   (1) A good system need not use all features.
>   (2) It is very flexible.
>   (3) It is a framework slanted toward processing efficiency.

I don't think that's the "design of SGML;" that merely represents
three attractive properties of a flexible system.

>SGML is only a framework.  One always needs to supply the language and
>a family of processors for the various targets.  Such a processor
>usually is a pipeline of length greater than 1.  A well designed
>pipeline component consists of small modular pieces that are easy to
>adjust as needed.  (If one finds that not to be the case, then the
>design needs revisiting.)

Closer to the issue now, indeed, SGML is "only a framework."
Furthermore, it is only a framework for representing markup syntax.
By the time you create a powerful enough set of tools to represent
that "pipeline" of processors, you have added a whole lot of further,
decidedly NONgeneralized, functionality.

>There are many features in SGML one may not want to use.  (But others
>may want to use them.)
>
>There is almost always more than one way to proceed at the design
>stage.  The absence of a canonical way to proceed can create the
>appearance of complexity.  Such a human perception is only
>psychological.
>
>[snip]                               but by so doing, you severely limit
>>  the application independence of the data, and you discover that it's way
>>  easier to generate different SGML depending on need rather than use the
>>  same SGML for different uses.
>
>Is this an objection?   (DTDs may be designed to consist of packages
>of plug-in "modules".)
>
>[snip]
>>                             if you can't do this process without SGML, by
>>   all means, go for it, but if you can, SGML represents additional cost and
>>   no benefits that cannot be obtained cheaper and better by other means.
>
>If one only wants a single presentation, forget SGML.
>
>If one wants two presentations, say, print and (valid) HTML, then
>Texinfo may be an option.  But someday I expect that there will be a
>DTD for Texinfo, and it will come to be understood that the best way
>to make Texinfo from something like DocBook is to format DocBook into
>Texinfo-SGML and then use a canonical Texinfo formatter for that
>language.

Far more likely is for someone to write DSSSL that takes a DocBook
parse tree and generates TeXinfo.

I see *no* value in creating a new TeXInfo with an SGML parser when
that will change the structure of the application enough as to make it
more sensible to redesign it from scratch, which would probably amount
to subsuming its functionality into a DocBook application "suite."

>>   a succinct summary of the lessons I have learned is a pun on the old
>>   advice: "know SGML, forget SGML".  or in other words, transcend syntax
>>   and look at the concepts that SGML affords expression of, then push
>>   forward and desire the concepts that SGML does not afford expression of
>>   and see that you're better off without SGML, but needed to understand it
>>   in order to move further.
>
>[snipped from later]
>>   anyway, this discussion started (as far as I got into it, anyway) with a
>>   desire to represent SGML-like structure in a dynamic programming language
>>   such as Common Lisp, and that's the next step: when your data can easily
>>   be interpreted as function calls, you can write programs that produce the
>>   desired output as you "execute" the document.  the syntax you use for
>>   this is not particularly important and it might as well be SGML if you
>>   can trivially read it into the program and process it, but SGML is such a
>>   pain to read and process that you should think twice about using it.
>
>The lisp-like structure of the earlier posting is very interesting and
>worthwhile.  In fact, I prefer that markup style to SGML tagging,
>which my eyes do not like to look at.  (Hence, my GELLMU project that
>involves still another markup style that is LaTeX-like.)
>
>But markup languages in these alternative notations are isomorphic
>(though not always canonically, but canonical within a rather limited
>range of sane choices) to SGML applications.  Even though one might
>not choose to use SGML processors with them, that possibility will
>always exist.  With a good overall design SGML processors should be
>optimal.

Optimal in what sense?

- Optimal in that they are interoperable with a bunch of parsers?
- Optimal in usage of resources?
- Optimal in terms of minimal complexity of syntax?

>[snip]
>> the core problem is that neither SGML, nor HTML, nor XML actually _scale_
>> or help you _evolve_, and human endeavors are wont to grow and evolve.
>> longevity and stability is not to be found in solid structure, but in the
>> ability to turn around effortlessly yet _without_ losing the past.  SGML
>> is just as bad as any other static structure in that latter regard.
>
>But also just as good.

If it's just as bad, then this indicates that it's flawed for the
purpose.

>As the years go by, the test of a markup language created today will
>be its amenability to the automatic processing of legacy documents
>into the formats of the future.

The problem, always, is in getting automated support for finding the
structure that we *later* find that we *actually* care about amongst
the structuring that we *thought* we cared about.

The problem with SGML in this regard is that you need to have tools
that take the existing structure and rewrite the documents.
Unfortunately, this winds up requiring not one, not two, but *three*
design efforts:
a) The original DTD,
b) The transformed DTD,
c) A program that groks the original DTD, can search for structural
patterns, and generates new results conformant with the transformed
DTD.

I'd say that a Lisp is likely to be *real* good at c); the only things
that SGML supports at all are a) and b).
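
For what it's worth, effort c) can be remarkably small in Lisp.  A
hypothetical sketch (element names invented for illustration): rewrite
every BOLD element of an old DTD into the EMPHASIS form of a new one.

```lisp
;; Walk a document tree conforming to the original DTD, search for a
;; structural pattern, and emit a tree in the transformed DTD's terms.
(defun rewrite-tree (node)
  (cond ((atom node) node)              ; text leaves pass through unchanged
        ((eq (first node) 'bold)        ; old-DTD pattern found...
         `(emphasis (role "strong")     ; ...emit the new-DTD structure
           ,@(mapcar #'rewrite-tree (rest node))))
        (t (cons (first node)
                 (mapcar #'rewrite-tree (rest node))))))
```

Everything that is not matched is copied over untouched, which is
exactly the behavior one wants when a DTD evolves incrementally.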

>Never trust an SGML/XML vendor whose web page is not valid HTML.

Good comment...
-- 
C program run -- Run program run -- Run, C program, Run! -- (please)
········@ntlug.org- <http://www.ntlug.org/~cbbrowne/sgml.html>
From: Norman Gray
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l024n$lh1$1@lenzie.cent.gla.ac.uk>
Greetings,

Erik Naggum <····@naggum.no>:
> wouldn't you just _love_ to have a Turing complete markup language, with
> a nice syntax that both humans and machines could read with ease, which
> allowed you to do structure rewriting _in_ the language?  the only way
> you can do that is to fully realize that data and code are the _same_.

But we seem to have that already, in the form of (La)TeX, and it seems
to be a bad solution to the problems of document structuring for reuse
(let us avoid the issue of `nice syntax'...).

I'm currently working on moving a document set from LaTeX to SGML.
The previous system had worked reasonably well because LaTeX can be
reasonably well-structured, TeX is flexible so you can do clever
things, the data and the code are united: TeX has a lot in common with
Lisp.  However, the system was crumbling precisely because it had the
features Erik seems to be advocating: it's Turing complete, so authors
could do clever things.  This meant that repurposing the documents
merely to the extent of converting them to HTML was exceedingly hard
and far from robust.

I ended up feeling that the fundamental reason this is a bad solution
is that it allows the authors to be too clever.  If an author includes
a burst of macro magic in a document -- either to deal with some
particular problem, or just because they're bored writing
documentation -- it's quite easy to make it effectively unprocessable
in anything other than the original context.  If you want to do
anything other than print some of our documents out, you either have
to work wonders in the TeX output routine, reimplement TeX's mouth in
Perl, or go through the damn thing by hand.  Now, before anyone else
says it, I'll say that the bulk of this problem is because LaTeX is
somewhat underspecified, and TeX is overly specialised to print
output, but an analogous limitation would surely be reached eventually
if the document were written in Lisp or indeed any system where the
author could scribble on Turing's paper tape.

I can't help feeling that if you restrict what an author _can_ do,
then you limit what a future parser _has_ to do to repurpose the
document, and that this is a good thing.  SGML's strength seems to be
precisely that it doesn't have LaTeX's flexibility (at least from the
point of view of the document author), but perhaps I'm insufficiently
separating the particular from the general, in which case my mind
needs expanding.

On the other hand...

········@news.hex.net (Christopher Browne):

> Unfortunately, this winds up requiring not one, not two, but *three*
> design efforts:
> a) The original DTD,
> b) The transformed DTD,
> c) A program that groks the original DTD, can search for structural
> patterns, and generates new results conformant with the transformed
> DTD.
> 
> I'd say that a Lisp is likely to be *real* good at c); the only things
> that SGML supports at all are a) and b).

Once you have constructed a grove from the input document, is it not
then equivalent to the document expressed in Lisp, so that shaking the
grove from a) to b) (perhaps using DSSSL, since we're on the topic of
Lisp, but possibly using XSLT and being up-to-the-minute) is nothing
more interesting than an exercise which, to recall Erik's memorable
rule of thumb, merely requires more money?

In other words, while the document-as-characters-plus-tags is
agreeably restricted (from my point of view as the author of the SGML
application, as _well_ as from my point of view as the author of the
documentation -- I want to write the words, not do clever things with
the document), the document-as-directed-acyclic-graph-of-properties
seems agreeably flexible and functional.

...or am I missing something?  (I do hope so: in three separate contexts
recently, I've found myself on the side of a status quo, and it's
making me feel old!)

Best wishes,

Norman

---------------------------------------------------------------------------
Norman Gray                        http://www.astro.gla.ac.uk/users/norman/
From: Valeriy E. Ushakov
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l0ais$bgf$1@news.ptc.spbu.ru>
In comp.lang.lisp Norman Gray <····@udcf.gla.ac.uk> wrote:

> I'm currently working on moving a document set from LaTeX to SGML.
> The previous system had worked reasonably well because LaTeX can be
> reasonably well-structured, TeX is flexible so you can do clever
> things, the data and the code are united: TeX has a lot in common with
> Lisp.  However, the system was crumbling precisely because it had the
> features Erik seems to be advocating: it's Turing complete, so authors
> could do clever things.  This meant that repurposing the documents
> merely to the extent of converting them to HTML was exceedingly hard
> and far from robust.

But doesn't the Turing-completeness get in your way because you use
something _other_ than TeX to convert it to HTML?  If I read Erik
right, he has a _single_ language in mind.

SY, Uwe
-- 
···@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen
From: Erik Naggum
Subject: Re: data structure for markup text
Date: 
Message-ID: <3139321655035574@naggum.no>
* Norman Gray <····@udcf.gla.ac.uk>
| But we seem to have that already, in the form of (La)TeX, and it seems
| to be a bad solution to the problems of document structuring for reuse
| (let us avoid the issue of `nice syntax'...).

  well, if you program in (La)TeX, I share your pain.  there's nothing in
  TeX that forbids being smart, but both the syntax and the semantics sure
  make it a pain to write reasonably capable functions and abstractions.

| However, the system was crumbling precisely because it had the features
| Erik seems to be advocating: it's Turing complete, so authors could do
| clever things.

  one problem with TeX is the low level at which you can do clever things.
  another problem is that you _have_ to be clever, because straightforward,
  regular smartness doesn't cut it.

| I ended up feeling that the fundamental reason this is a bad solution is
| that it allows the authors to be too clever.

  I don't think society as a whole condones lobotomy just because somebody
  is considered too smart for their own good.

| If an author includes a burst of macro magic in a document -- either to
| deal with some particular problem, or just because they're bored writing
| documentation -- it's quite easy to make it effectively unprocessable in
| anything other than the original context.

  ah!  you completely misunderstand my use of "macro".  not so strange,
  considering your TeX background and my Lisp background, but let me
  clarify: a macro in Lisp is a form that takes a bunch of arguments and
  returns a _new_ form that replaces the first form.  TeX macros are more
  like functions that actually do their own stuff.  that is, a TeX macro
  implements, while a Lisp macro expresses.

  suppose you have a list of somethings and the DTD doesn't support
  defaulting or computing attributes from the last use within a particular
  element (sounds familiar? ;), you could write a macro that expands the
  "tags" of the sub-elements into having that uniform attribute.  in other
  words, you do NOT go below the language defined by the DTD, but you
  enable interesting ways to _produce_ the structure with interestingly
  defaulted or "computed" attributes.  the output of the macro is just what
  you expect it to be: valid SGML (function calls), not random noise or
  low-level magic, like TeX does.  another important point with Lisp macros
  is that they rewrite one form into another, but it's always a one-to-one
  transformation.  (if you need any other number than 1, you also need a
  generic container form, which SGML also lacks, because it breaks all
  possibility of validating the structure, unless supported by the core of
  the semantics, but SGML lacks a general "list of things" concept, too.)
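
  a hypothetical sketch of such a macro (the names are invented, not
  from any real DTD): it rewrites a list form so that every sub-element
  carries the defaulted attribute, and the expansion is still ordinary
  markup forms, never low-level formatter code.

```lisp
;; ITEMIZED-LIST expands into markup structure in which every LISTITEM
;; carries the defaulted MARK attribute, without the author spelling it
;; out at each item.  the macro rewrites form into form, one-to-one.
(defmacro itemized-list ((&key (mark "bullet")) &body items)
  `(list 'itemizedlist
         ,@(mapcar (lambda (item)
                     `(list 'listitem (list 'mark ,mark) ,item))
                   items)))
```

  (itemized-list (:mark "dash") "one" "two") thus yields a tree whose
  two LISTITEM sub-elements both carry the (MARK "dash") attribute.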

  now, if you had noticed that I want tags-with-attributes to be function
  _call_ forms, obviously into the application language for which the DTD
  is a small part, you would not have confused a macro with a function.  I
  was also very explicit in making the markup execution model fundamentally
  different from the Lisp execution model, namely that Lisp is inside-out,
  while markup needs to be outside-in.

  TeX is the wrong approach, but it is not because it has computational
  power: it's because it's a glorified assembly language for formatters.

| I can't help feeling that if you restrict what an author _can_ do, then
| you limit what a future parser _has_ to do to repurpose the document, and
| that this is a good thing.

  while your empirical basis is significant in respect to TeX, I would
  recommend that you not run away scared just because you see something
  scary that reminds you of how much TeX hurts.

| Once you have constructed a grove from the input document, is it not
| then equivalent to the document expressed in Lisp,

  this doesn't seem like a valid or even likely conclusion to me.  how did
  you arrive at it?

| In other words, while the document-as-characters-plus-tags is agreeably
| restricted (from my point of view as the author of the SGML application,
| as _well_ as from my point of view as the author of the documentation --
| I want to write the words, not do clever things with the document), the
| document-as-directed-acyclic-graph-of-properties seems agreeably flexible
| and functional.

  you know, I don't think syntax is irrelevant to semantics, because syntax
  is relevant to pragmatics and if the pragmatics of clean semantics is on
  par with the pragmatics of ugly semantics, people will choose randomly,
  meaning they get it ugly 90% of the time, given human nature and all. ;)
  if you make ugly cost 10 times more than beautiful, you'll get a much
  nicer end result.  this is also why 90% of the languages are butt ugly,
  and why 90% of the code written in them is ugly.  Lisp has a tradition of
  making ugly hugely expensive and beautiful cheap, which means only 50% of
  the Lisp code is ugly, given human nature and all.

  if you haven't noticed, I'm trying very hard to invalidate your empirical
  basis from TeX and Perl and other atrocities, both syntax-, pragmatics-
  and semanticswise.  it may seem like a cliché, but people actually do
  think differently when they deal with nice things.

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century
From: Lieven Marchand
Subject: Re: data structure for markup text
Date: 
Message-ID: <m3bte4dk1q.fsf@localhost.localdomain>
Erik Naggum <····@naggum.no> writes:
>   TeX is the wrong approach, but it is not because it has computational
>   power: it's because it's a glorified assembly language for formatters.
> 

What adds insult to injury is that Knuth knew better even at the
time. In his first design document for TeX he mentions Lisp and in a
Q&A article recently published he mentions the following:

"In some sense I put in many of TeX's programming features only after
kicking and screaming; ... In the 70s, I had a negative reaction to
software that tried to be all things to all people. ... So I thought,
"Well, I'm not going to design a programming language; I want to have
just a typesetting language." Little by little, however, I needed more
features and so the programming constructs grew. Guy Steele began
lobbying for more capabilities early on, and I put many such things
into the second version of TeX, TeX82, because of his urging."

Too bad Steele couldn't convert him to a sensible design. Any sort of
real programming language would have been better than TeX macrology
and the arcana of "TeX's mouth".

ObLisp: has anything ever come from the NTS rewrite that some people
were doing in Common Lisp?

-- 
Lieven Marchand <···@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker
From: Rolf Marvin Bøe Lindgren
Subject: Re: data structure for markup text
Date: 
Message-ID: <lbziu8bkn4m.fsf@morgoth.uio.no>
[ Lieven Marchand

| ObLisp: has anything ever come from the NTS rewrite that some people
| were doing in Common Lisp?

CL was discussed at one time, but I think they have settled for Java,
being under the impression that Java is more platform-independent than
Lisp.  

sad though, several of those involved thought well of Lisp.

-- 
Rolf Lindgren                                        http://www.uio.no/~roffe/
·····@tag.uio.no
From: David Combs
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l3b37$am3@dfw-ixnews19.ix.netcom.com>
I still think you guys should look at "Scribe", done as phd
thesis at cmu by brian reid -- a real work of genius (probably
applies to both).

Has macros (non-lispish, via naggum's definition), environments
that can be defined and modified, with "beforeEntry", "afterExit",
etc args.

Seems to me that he was the only person designing these things
who UNDERSTOOD THE PROBLEM.

And best of all it's monolithic, not a chain of programs
a la troff or, worse, tex, latex, ..., where *who knows* where
something goes wrong.

Scribe has a really elegant design.  Just who are these fools
designing this sgml stuff anyway?

I got that thick black book on sgml by the guy who invented it;
jesus, I asked myself as I went through it, what a horribly
misdesigned system.

And I look at tex, worse than assembly language, more like
using those long instructions that *implement* the assembly (machine)
lang!

Returning to html:
What kind of fool does it take to design a language where
you have a "begin" construct and an *optional* "end" to
match it?  And what kind of TOTAL IDIOTS does it take to
make such a thing a STANDARD!!!!?

It's a wonder *anything* works in software written for wintel.

------

A good subject for further discussion would be human nature,
or "unfettered" capitalism, or who knows what, that lets such
garbage become the world standard.   :-(


-----

Who wants to take bets on what will happen to M$ in these 
antitrust-long-gone days, these "corporations own 80% of
every politician" days?


---


Anyone note those three supreme-court decisions three
days ago, about no one able to sue states to make them obey
FEDERAL laws, except federal agencies themselves, which of
course are under 100% control of the current President, ie if the
president doesn't "believe in" a certain law, it cannot be
enforced, period -- not by any court.

Sorry, I'm slowly going crazy...

David
From: Hartmann Schaffer
Subject: Re: data structure for markup text
Date: 
Message-ID: <4nad3.37888$%65.97159@tor-nn1.netcom.ca>
In article <··········@dfw-ixnews19.ix.netcom.com>,
	·······@netcom.com (David Combs) writes:
> I still think you guys should look at "Scribe", done as phd
> thesis at cmu by brian reid -- a real work of genius (probably
> applies to both).
> 
> Has macros (non-lispish, via naggum's definition), environments
> that can be defined and modified, with "beforeEntry", "afterExit",
> etc args.
> 
> Seems to me that he was the only person designing these things
> who UNDERSTOOD THE PROBLEM.
> 
> And best of all it's monolothic, not a chain of programs
> a la troff or, worse, tex, latex, ..., where *who knows* where
> something goes wrong.
> 
> Scribe has a really elegant design.  Just who are these fools
> designing this sgml stuff anyway?
> ...

Looks interesting.  Is there a URL?

Hartmann Schaffer
From: David Combs
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l9mk9$2qq@dfw-ixnews5.ix.netcom.com>
Here is some information:

800-365-2946 412-471-2070 
Cygnet Publishing Technologies  ==> ······@@g.gp.cs.cmu.edu
355 Fifth Avenue, Suite 1515, Pittsburgh, PA  15222
Ask for or leave msg for Elaine Newborn 


Tell her or leave in msg you got the recommendation
from me (David Combs).

As far as I am concerned, it is far and away the best
system out there; certainly, the most intelligently designed.

The place has shrunk to only a couple of full time people,
what with every idiot on the planet using M$.

This scribe is designed for printing LARGE books, like
1000 pages (or far more), with index, tbl of contents,
all kinds of neat things.

And, the postscript output does the %%page stuff.

----

Elaine told me a few months ago that their M$/intel version
is 16bits (or was it 32?), and that converting it to
whatever was needed for NT would be a bitch, and what with
the competition on that platform from M$ word, it might not
be worth doing.

So on intel it works on whatever M$ has except for NT.

I myself use it on a sparcstation 5 (before that, I had it on a dec 2060,
then vax, then sun 3/160) and on each of all those it is WONDERFUL!

Do ask her what about merced, with eg solaris (which is the OS I
use now on sparc);  I'd like to know her answer, since looks like
sun will someday be switching to merced???

There is an email addr, but she tells me she doesn't read it very
often.  No web site that I know of.

Funny situation for the best system on the planet, no?
Well, what else is new, in this corporate-controlled world
we live in now?

Hey, want to read a neat book?  "A People's History of the United States",
by Howard Zinn.  All the U.S. history they didn't (and would not, or
weren't allowed to) teach you in school.

Keep in touch re Scribe.

David Combs
From: Craig Brozefsky
Subject: Re: data structure for markup text
Date: 
Message-ID: <87hfnupubt.fsf@piracy.red-bean.com>
·······@netcom.com (David Combs) writes:

> I still think you guys should look at "Scribe", done as phd
> thesis at cmu by brian reid -- a real work of genius (probably
> applies to both).

I just spent an hour searching for information about Scribe, and all I
have been able to dig up are some dead links and a few examples of the
MicroEmacs manual.  Is there a current distribution of it?  The
Omegasoft pages on Sprynet have vanished, and I have no other leads.
I'm very interested, but stymied.

> Sorry, I'm slowy going crazy...

You and me both.

-- 
Craig Brozefsky                         <·····@red-bean.com>
Free Scheme/Lisp Software     http://www.red-bean.com/~craig
Less matter, more form!                       - Bruno Schulz
ignazz, I am truly korrupted by yore sinful tzourceware. -jb
From: Rainer Joswig
Subject: Re: data structure for markup text
Date: 
Message-ID: <joswig-2606992242480001@194.163.195.67>
In article <··········@dfw-ixnews19.ix.netcom.com>, ·······@netcom.com (David Combs) wrote:

> I still think you guys should look at "Scribe", done as phd
> thesis at cmu by brian reid -- a real work of genius (probably
> applies to both).

Concordia.
From: Valeriy E. Ushakov
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l3lvg$ov6$1@news.ptc.spbu.ru>
In comp.lang.lisp David Combs <·······@netcom.com> wrote:

> I still think you guys should look at "Scribe", done as phd
> thesis at cmu by brian reid -- a real work of genius (probably
> applies to both).

I performed a good search some time ago, but didn't find anything
available online.  Lout was (by its author's words) influenced by
Scribe, so I was interested in learning more about it.

Lout is a batch document formatter programmed in a (sort of) lazy
functional language.

See <http://www.ptc.spbu.ru/~uwe/lout/lout.html>.


SY, Uwe
-- 
···@ptc.spbu.ru                         |       Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/            |       Ist zu Grunde gehen
From: David Dubin
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l4f3g$2o1$1@vixen.cso.uiuc.edu>
David Combs (·······@netcom.com) wrote:
: I still think you guys should look at "Scribe", done as phd
: thesis at cmu by brian reid -- a real work of genius (probably
: applies to both).
[...]
: Scribe has a really elegant design.  Just who are these fools
: designing this sgml stuff anyway?

Well, I suppose Ms. Laplante would be the best person to comment, in light of
her connection to both Scribe/Cygnet Publishing and to SGML Open/Oasis.
But personally, I used Scribe up until just a couple years ago, and I agree
that from an author's perspective it was quite elegant. But designing new or
modifying document types seemed as complex as doing macros in TeX or classes
in LaTeX. The Scribe database administrator's manual didn't even pretend that
it was easy. 

Writing simple SGML DTDs is a lot easier, especially if I can then map them to
existing LaTeX classes that are exactly or almost exactly what I need. Sure, 
SGML has plenty of baroque and esoteric stuff. But nothing stops me from 
designing simple DTDs that don't use those features.

Dave
  
From: David Combs
Subject: Re: data structure for markup text
Date: 
Message-ID: <7l9mou$sau@dfw-ixnews14.ix.netcom.com>
Please see my post of 2 minutes ago.

And here is info re scribe:

800-365-2946 412-471-2070 
Cygnet Publishing Technologies  ==> ······@@g.gp.cs.cmu.edu
355 Fifth Avenue, Suite 1515, Pittsburgh, PA  15222
Ask for or leave msg for Elaine Newborn 


Tell her that I recommended it (david combs).

Do read the other post, I said a lot more in it.

David Combs
From: Richard Tietjen
Subject: Curl formatting language
Date: 
Message-ID: <87wvwm2vrd.fsf_-_@kale.connix.com>
·······@netcom.com (David Combs) writes:

> 
> I still think you guys should look at "Scribe", done as phd
> thesis at cmu by brian reid -- a real work of genius (probably
> applies to both).
> 


Curl is an interesting language that mixes text and programming
nicely, in a Lispy way with the box concept from TeX.  I don't think
it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
mentions it in his book, Philip and Alex's Guide to Web Publishing.

http://curl.lcs.mit.edu/curl/wwwpaper.html
From: IBMackey
Subject: Re: Curl formatting language
Date: 
Message-ID: <7le7ol$9jv@nnrp3.farm.idt.net>
········@kale.connix.com <········@kale.connix.com> wrote:
>·······@netcom.com (David Combs) writes:
>
>> 
>> I still think you guys should look at "Scribe", done as phd
>> thesis at cmu by brian reid -- a real work of genius (probably
>> applies to both).
>> 
>
>
>Curl is an interesting language that mixes text and programming
>nicely, in a Lispy way with the box concept from TeX.  I don't think
>it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
>mentions it in his book, Philip and Alex's Guide to Web Publishing.
>
>http://curl.lcs.mit.edu/curl/wwwpaper.html

It looks great. The language would probably revolutionize LaTeX. By
the way, I did an excite search. There's a webpage for curl 2.0.

http://munkora.cs.mu.oz.au/~ad/curl/announce.html

I don't know which way the program's going, but I'm going to suggest
to the authors that they consider LaTeX or Lout as an additional
engine.

i.b.
From: William Tanksley
Subject: Re: Curl formatting language
Date: 
Message-ID: <slrn7no9ve.27l.wtanksle@dolphin.openprojects.net>
On 30 Jun 1999 23:03:49 GMT, IBMackey wrote:
>········@kale.connix.com <········@kale.connix.com> wrote:
>>·······@netcom.com (David Combs) writes:

>>Curl is an interesting language that mixes text and programming
>>nicely, in a Lispy way with the box concept from TeX.  I don't think
>>it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
>>mentions it in his book, Philip and Alex's Guide to Web Publishing.

>>http://curl.lcs.mit.edu/curl/wwwpaper.html

>It looks great. The language would probably revolutionize LaTeX. By
>the way, I did an excite search. There's a webpage for curl 2.0.

The language would certainly fix a lot of problems with HTML, Javascript,
and Java.  Very nice.  With the addition of a TeX layout engine it would
be a killer.

>http://munkora.cs.mu.oz.au/~ad/curl/announce.html

Um...  I believe this is a totally different program.  Doesn't seem to
have any connection.

>I don't know which way the program's going, but I'm going to suggest
>to the authors that they consider LaTeX or Lout as an additional
>engine.

That'd be great.

>i.b.

-- 
-William "Billy" Tanksley
From: Jessica Hekman
Subject: Re: Curl formatting language
Date: 
Message-ID: <Pine.GSO.4.10.9907011403210.7675-100000@shell3.shore.net>
On 29 Jun 1999, Richard Tietjen wrote:

> Curl is an interesting language that mixes text and programming
> nicely, in a Lispy way with the box concept from TeX.  I don't think
> it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
> mentions it in his book, Philip and Alex's Guide to Web Publishing.

It's going commercial; I work for them. For more information, see

http://www.curl.com/

-Jessica
From: Jeffrey B. Siegal
Subject: Re: Curl formatting language
Date: 
Message-ID: <377BC59F.EE0CBAA8@quiotix.com>
Jessica Hekman wrote:
> It's going commercial

Well, who isn't?

See "MIT Students Work for Professors' Firms And Jobs Often Cross Over
Into Classroom," _Wall Street Journal_, June 24, 1999.  

Relevant quote:

--->
[EECS Department head, Professor John Guttag] sometimes turns down
faculty who ask for leaves to start companies; otherwise he wouldn't
have enough professors to teach courses.
<---

Very sad.
From: Sebastian Rahtz
Subject: Re: Curl formatting language
Date: 
Message-ID: <87vhc3xrx3.fsf@spqr.oucs.ox.ac.uk>
Jessica Hekman <········@arborius.net> writes:

> On 29 Jun 1999, Richard Tietjen wrote:
> 
> > Curl is an interesting language that mixes text and programming
> > nicely, in a Lispy way with the box concept from TeX.  I don't think
> > it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
> > mentions it in his book, Philip and Alex's Guide to Web Publishing.
> 
> It's going commercial; I work for them. For more information, see
> 
> http://www.curl.com/
> 
I thought you people had these laws about government-funded research
having to be made freely available:

  The Curl project is supported by the Information Technology Office of
  the Defense Advanced Research Projects Agency as part of its
  Intelligent Collaboration and Visualization program.

so how can you start selling it?

sebastian
From: Kent M Pitman
Subject: Re: Curl formatting language
Date: 
Message-ID: <sfwn1xfnjyi.fsf@world.std.com>
[ replying to comp.lang.lisp only
  http://world.std.com/~pitman/pfaq/cross-posting.html ]

Sebastian Rahtz <···············@oucs.ox.ac.uk> writes:

> 
> Jessica Hekman <········@arborius.net> writes:
> 
> > On 29 Jun 1999, Richard Tietjen wrote:
> > 
> > > Curl is an interesting language that mixes text and programming
> > > nicely, in a Lispy way with the box concept from TeX.  I don't think
> > > it's going anywhere, unfortunately, but it looks brilliant.  Greenspun
> > > mentions it in his book, Philip and Alex's Guide to Web Publishing.
> > 
> > It's going commercial; I work for them. For more information, see
> > 
> > http://www.curl.com/
> > 
> I thought you people had these laws about government-funded research
> having to be made freely available:
> 
>   The Curl project is supported by the Information Technology Office of
>   the Defense Advanced Research Projects Agency as part of its
>   Intelligent Collaboration and Visualization program.
> 
> so how can you start selling it?

The government, like any other organization, writes contracts.  It is
the terms of the contracts, not the identities of the participants,
that determine what you can and can't do with the result.  (Even plea
bargain agreements in criminal cases are, as I understand it, dealt
with under contract law, and are not exercises in just assuming that a
particular player--government or defendant--carries with them a
particular pre-determined disposition.)

As I understand it, though I'm not an authority on this issue, the trend
in DARPA is to focus on promoting COTS [commercial off-the-shelf]
software as a way of assuring that not only can they get the result of
the research, but also that there is a healthy competition in the
marketplace in case the research spawns competitors that can be more
featureful or cheaper.

Also, again just according to my understanding, it may be in some
cases that the government can get the result at a greatly reduced
price but that the private citizenry still has to buy it at market
prices.  I don't think there's an automatic implication that if the
government paid it's free to the public.  If there were, I think no
business would find much incentive in developing things for the
government; I think it's the practical reality of this fact which has
led to the changes in the policy over time.

It all depends on the nature of the specific contract, though.  I
haven't seen their contract so wouldn't dare to opine sight-unseen as
to the details.  The purpose of this message on my part was less to
say what the information/ownership/rights policy is and more to say
that it's likely hard to say what the policy is from the data
observed, even though some people seem to be trying.  Even within a
single government funding source, I think there are multiple kinds of 
agreements for funding they can write.
From: Jeffrey B. Siegal
Subject: Re: Curl formatting language
Date: 
Message-ID: <377D3F17.8B56A62D@quiotix.com>
Sebastian Rahtz wrote:
> I thought you people had these laws about government-funded research
> having to be made freely available:

Out of curiosity, does anyone know what laws these are?

> so how can you start selling it?

Even software that is freely available can also be sold (whether someone
chooses to buy what they can get free is another question).
From: Rolf Marvin Bøe Lindgren
Subject: Re: Curl formatting language
Date: 
Message-ID: <lbziu83e6m2.fsf@morgoth.uio.no>
[ Sebastian Rahtz

| I thought you people had these laws about government-funded research
| having to be made freely available:
...
| so how can you start selling it?

they can a) develop it further b) sell services for the product

-- 
Rolf Lindgren                                        http://www.uio.no/~roffe/
·····@tag.uio.no
From: Tim Bradshaw
Subject: Re: Curl formatting language
Date: 
Message-ID: <nkj4sjntfdy.fsf@tfeb.org>
Sebastian Rahtz <···············@oucs.ox.ac.uk> writes:


> so how can you start selling it?
> 

I think they can sell descendent products?  Does this mean that the
original MIT LispM code should be free(ly available?)?

--tim
From: Sebastian Rahtz
Subject: Re: Curl formatting language
Date: 
Message-ID: <87iu83p2jh.fsf@spqr.oucs.ox.ac.uk>
Tim Bradshaw <···@tfeb.org> writes:

> Sebastian Rahtz <···············@oucs.ox.ac.uk> writes:
> 
> > so how can you start selling it?
> 
> I think they can sell descendent products? 
I don't know. You tell me. I just remember endless bleats from over
the ocean about how "our taxes paid for this so it's free for ever and
ever, and I can use my god-given right to carry arms to protect my
freedom to use this software"

>  Does this mean that the
> original MIT LispM code should be free(ly available?)?
> 
I assume that pre-dated the legislation?

I don't question whether Curl can be developed onwards and sold as
such; it's just a question of whether today's Curl can be _withdrawn_
from the public domain.

lord, I don't care! I am just curious.

sebastian
From: Brian Peisley
Subject: Re: Curl formatting language
Date: 
Message-ID: <slrn7nrr5r.1jh.brian@helka.mutagenic.org>
On 02 Jul 1999 14:08:34 +0000, Sebastian Rahtz wrote:

>I dont question whether Curl can be developed onwards and sold as
>such; its just a question of whether today's Curl can be _withdrawn_
>from the public domain.

I don't think the original source can be withdrawn.  This sounds very similar
to what happened with Glimpse, ( http://glimpse.cs.arizona.edu/ ).  From what
I understand, they do not have to release any new code to the public but the
existing source must remain available.

-- 
Brian Peisley
·····@mutagenic.org
From: David Peterson
Subject: Re: Curl formatting language
Date: 
Message-ID: <davep-0307991934560001@dialup-209.244.224.87.boston2.level3.net>
In article <····················@helka.mutagenic.org>,
 ·····@mutagenic.org wrote:

> I don't think the original source can be withdrawn.  This sounds very similar
> to what happened with Glimpse, ( http://glimpse.cs.arizona.edu/ ).  From what
> I understand, they do not have to release any new code to the public but the
> existing source must remain available.

Not available in the sense that they have any obligation to provide it on
demand.  But if someone does have a copy, they can't prohibit its further
replication.

Dave Peterson
SGMLWorks!

·····@acm.org
From: John Atwood
Subject: Re: Curl formatting language
Date: 
Message-ID: <7m5vcj$l45$1@news.NERO.NET>
Is anyone familiar enough with Curl and Tex to have a feel for 
how they differ?  Toss XML into that comparison as well.



John Atwood
From: Thant Tessman
Subject: Re: data structure for markup text
Date: 
Message-ID: <377CC8C9.7D78C974@acm.org>
David Combs wrote:

> A good subject for further discussion would be human nature,
> or "unfettered" capitalism, or who knows what, that lets such
> garbage become the world standard.   :-(

You really think that deciding on standards by government mandate would
be better? The only thing stupider than consumers is politicians.

-thant
From: David Fox
Subject: Re: data structure for markup text
Date: 
Message-ID: <lu908x5dwx.fsf@pipeline.ucsd.edu>
Thant Tessman <·····@acm.org> writes:

> David Combs wrote:
> 
> > A good subject for further discussion would be human nature,
> > or "unfettered" capitalism, or who knows what, that lets such
> > garbage become the world standard.   :-(
> 
> You really think that deciding on standards by government mandate would
> be better? The only thing stupider than consumers is politicians.

As the years go by this sentiment is starting to feel glib.
-- 
David Fox           http://hci.ucsd.edu/dsf             xoF divaD
UCSD HCI Lab                                         baL ICH DSCU
From: Rick Jelliffe
Subject: Re: data structure for markup text
Date: 
Message-ID: <7lsuqs$ke4$1@news1.sinica.edu.tw>
I missed the posting by Erik, but I am very interested. Can anyone email it
to me?

Erik Naggum <····@naggum.no> wrote in message
·····················@naggum.no...

>... you also need a
>   generic container form, which SGML also lacks, because it breaks all
>   possibility of validating the structure, unless supported by the core of
>   the semantics, but SGML lacks a general "list of things" concept, too.

I think SGML (and even more so, XML and XML Schemas) try to make the
"schema" or DTD serve too many purposes: is it an assert statement?  is it
a template for document construction?  can it be used to generate templates
for screen forms?  is it the grammar on which context-sensitive parsing is
conducted, to allow different delimiters to have particular meanings in
various contexts?

I think it is primarily the last (i.e., a compiler compiler), then
incidentally the first, and accidentally the middle two (if at all).
Even as a kind of assert statement to allow validation, there are many
kinds of document structures that it cannot handle.  I have some
articles at http://www.ascc.net/xml/en/utf-8/schemas.html if anyone is
interested.

Now when XML is fresh is a great time to consider alternatives to DTDs.
I think the XML Schemas:Structures draft at http://www.w3.org/TR really
misses the mark so far, not the least because it does not try to make
"schemas" do much more than DTDs do already.

I am very interested in hearing from anyone with ideas about "schemas" that
include more dynamic or LISPish ideas. Email ·····@gate.sinica.edu.tw


Rick Jelliffe

Computing Centre,
Academia Sinica
Taipei, Taiwan

Author, "The XML & SGML Cookbook: Recipes for Structured Information"
Prentice Hall, 1998, ISBN 0-13-614223-0
From: Rolf Marvin Bøe Lindgren
Subject: Re: data structure for markup text
Date: 
Message-ID: <lbzemizkmxi.fsf@morgoth.uio.no>
[ Norman Gray

| But we seem to have that already, in the form of (La)TeX, and it seems
| to be a bad solution to the problems of document structuring for reuse
| (let us avoid the issue of `nice syntax'...).

I don't think LaTeX's shortcomings can be used as an argument either
way.  LaTeX is too closely tied to being a markup language for
printable paper-based documents.  there is no standard way to handle
semantics.  TeX's syntax can be changed by hacking catcode
values. 

but most importantly, are code and data the same in LaTeX?  I wouldn't
think so. 

-- 
Rolf Lindgren                                        http://www.uio.no/~roffe/
·····@tag.uio.no
From: William F. Hammond
Subject: Re: data structure for markup text
Date: 
Message-ID: <i7emj0qmjh.fsf@hilbert.math.albany.edu>
········@news.hex.net (Christopher Browne) writes:

I think that we see things in much the same way.

> On 23 Jun 1999 10:58:11 -0400, William F. Hammond
> <·······@hilbert.math.albany.edu> wrote: 
> >SGML (or a subcategory such as XML) is the best extant choice if you
> >want a *reliable* way to have different presentation formats from a
> >single source document.
> 
> This doesn't address Erik's issues.
> 
> If I take a document, and represent it a big Lisp list, I can then
> take a Lisp program that renders it in multiple formats.

I addressed Erik's issue by pointing out that his markup via lisp is
(non-canonically) isomorphic to a subcategory of SGML.  I added that
my LaTeX-like markup (but not LaTeX itself) is parallel.

So Erik's Lisp approach and my approach with "Generalized Extensible
LaTeX-Like Markup" are parallel MUI's for SGML, where the acronym MUI
stands for "markup user interface" (as opposed to GUI = "graphical
user interface").

Humans do need MUI's.  GUI's cannot be expected to work for complex
languages.  This point was made by Donald Knuth when he invented TeX,
and IMHO the point's validity has stood the test of time.

While TeX remains an excellent typesetting language for print, in
order to have smart documents in the networked library, in the way
that the Linux folk are reaching for smarter documents, we need
SGML-based MUIs for mathematics that are powerful enough to meet the
needs of present high-end TeX users.

[snip]
> >The lack of easy accessibility involves money, processing-language
> >choices, lack of education about SGML, and lack of understanding of
> >staged processing.
> 
> Tools != syntax.

Of course.

[snip]
> The criticism is not of the tools.  The criticism is deeper than that;
> it implies that the problems that are encountered with SGML are
> systematic, resulting from the nature of SGML, applicable regardless
> of the tools.

Yes, I understand that the criticism was directed to the design of
SGML itself.

My way of looking at it differs from Erik Naggum's in that he is
saying that the glass is 75% empty, while I am saying that it is at
least 25% full.  I think we agree that better markup interfaces to
SGML are needed.  :-)

> The point is that SGML is inherently hard to create tools for.  Its
> syntax mandates highly complex tools, which means that the problems
> you acknowledge are the *result* of what Erik has said.

I do believe that too much is locked up behind closed doors.  This is
a good bit of SGML's political problem.  Pick up a random topic, go to
Robin Cover's web site, and observe the number of dead ends if you
want real information.

Yes, it is hard to create tools for general SGML.  But James Clark's
SP did the hardest part of that job with great success.

There are many things in SGML that each of us on our own platforms
with our own assumptions can avoid dealing with.  So we are inclined
to dismiss them as unnecessary.  It is a very general framework.  I
expect that much of it has never been exercised.  I also expect that
as we become more sophisticated, we will find uses for many of these
things.
 
[snip]
> >>SGML grew in the back-end of the production line, but it really has no
> >>value except insofar as it is able to capture meta-information, and that
> >>means capturing the _intent_ of some particular forms of expression, but

(">>" cites my quotation of Erik Naggum -- 21 Jun 1999 13:35:36 +0000)

[snip]
> >It captures *structural content*.  Meta-information is part of
> >structural content.  I recommend the following operational definition
> >of what "structural content" is:   A piece of markup represents
> >structural content if it has meaning for at least two different
> >presentation formats where neither may be derived systematically and
> >robustly from the other.
> 
> Again, presenting a definition of structural content is pretty
> orthogonal to the issue Erik raised.

I was saying that there is more to structural content than
"meta-information".  For example, the organization of a document
into sections, subsections, and paragraphs is structural content
that I would not be happy characterizing as meta-information.

My definition of structural content was my off-the-cuff attempt to
recast the static v. dynamic issue.  (This issue deserves more
discussion.)  More operationally, that definition characterizes what I
am willing to create as an element in a document type definition.  For
that purpose, even if hypothetically HTML were my only target format
with an authoring language, I still might view as content an authoring
tag designed to have meaning when handled by (1) a monolithic web
browser and (2) a browser for the visually impaired, inasmuch as these
are two different ultimate formattings.

[snip]
> If the poor user is sitting out there trying to get Word to make
> something "look right," and is not composing documents with a mindset
> oriented to structural content, then what you get at the end of the
> day won't be structural markup.

Even with a mindset toward structural content I wonder whether
this is a reasonable expectation except with the simplest of SGML
languages.

As far as "look right" is concerned, wasn't it Brian Kernighan who
noted that the problem with WYSIWYG is that WYSIAYG ("what you see
is all you get")?  [Does anyone have the citation for this?]

[snip]
> >SGML is only a framework.  One always needs to supply the language and
> >a family of processors for the various targets.  Such a processor
> >usually is a pipeline of length greater than 1.  A well designed
> >pipeline component consists of small modular pieces that are easy to
> >adjust as needed.  (If one finds that not to be the case, then the
> >design needs revisiting.)
> 
> Closer to the issue now, indeed, SGML is "only a framework."
> Furthermore, it is only a framework for representing markup syntax.

SGML may be used to represent much that is not classical markup.  For
example, any classical assembly language can be modeled as an SGML
language.  DVI, the "binary" format into which Donald Knuth's TeX
compiles documents, is equivalent to a text language, DTL, created by
Geoffrey Tobin.  DTL has the structure of a classical assembly
language.

Whether it is useful to model DVI this way is not the point.  The
point is that the SGML framework covers more ground than one might
imagine at first.
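
To make the assembly-language claim concrete, here is a rough sketch
of how an instruction stream might be declared and written in SGML.
The element and attribute names are my own invention for illustration,
not DTL's actual vocabulary:

<!ELEMENT program     - - (instruction*)>
<!ELEMENT instruction - O EMPTY>
<!ATTLIST instruction
          opcode  CDATA #REQUIRED
          operand CDATA #IMPLIED>

<program>
<instruction opcode="down"    operand="655360">
<instruction opcode="setchar" operand="65">
</program>

Each "line" of the assembly language becomes an empty element whose
attributes carry the opcode and operands, and tag minimization keeps
the instance nearly as terse as the original text form.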

Another example: EDI

[snip]
> By the time you create a powerful enough set of tools to represent
> that "pipeline" of processors, you have added a whole lot of further,
> decidedly NONgeneralized, functionality.

Indeed, but that extra functionality is for in-house use.  It can
still be platform independent.

I do tend to take the view that net-served SGML should be XML with
"style" (however, that works out).

[snip]
> >If one wants two presentations, say, print and (valid) HTML, then
> >Texinfo may be an option.  But someday I expect that there will be a
> >DTD for Texinfo, and it will come to be understood that the best way
> >to make Texinfo from something like DocBook is to format DocBook into
> >Texinfo-SGML and then use a canonical Texinfo formatter for that
> >language.
> 
> Far more likely is for someone to write DSSSL that takes a DocBook
> parse tree and generates TeXinfo.

I believe that such work is under way.  However, any processor for
formatting something for ultimate processing toward TeX, the program,
is going to run into (surmountable) difficulty with the conflict
between the SGML attitude toward whitespace and the TeX attitude
(where blank lines are significant and newlines following certain
markup in input are not viewed as whitespace).  So it is best to
deal with these issues once and be done with them.
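
A small illustration of the mismatch (my own example).  In TeX input,
a blank line is a paragraph break, while the newline after a control
word is swallowed; an SGML processor, by contrast, tends to treat both
as ordinary normalizable whitespace:

First paragraph.

Second paragraph.        % the blank line above is significant to TeX

\TeX{} ignores the newline
after a control word.    % SGML sees both of these newlines alike

A DocBook-to-TeX formatter has to decide, once, which newlines in its
output are structural and which are incidental.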

This is a very good example of the kind of thing that really needs
to be freely available open source.

[big snip]
> >As the years go by, the test of a markup language created today will
> >be its amenability to the automatic processing of legacy documents
> >into the formats of the future.
> 
> The problem, always, is in getting automated support for finding the
> structure that we *later* find that we *actually* care about amongst
> the structuring that we *thought* we cared about.

It is important in language (i.e., document type) design to try to
anticipate what might be wanted in the future.  (It's the kind of
job for which you need a mathematician, rather than a computer
scientist.   :-)   )

[snip]
> I'd say that a Lisp is likely to be *real* good at c); the only things
> that SGML supports at all are a) and b).

(Elisp is used in my [not yet released] GELLMU MUI.  I wish that I
had a freely available open source parser interfacing framework
equivalent to "sgmlspl" in Elisp.)

SGML provides a framework for language definition (applicable to
certain kinds of languages) and a style of processing that is designed
to be efficient and easily customizable.

Goldfarb's design is good.

But, yes, it is vulnerable to the effects of poor language design and
poor processing design.  And, yes, don't forget Gresham's Law:
(1) Bad design drives out good.  (2) Bad practice drives out good.


William F. Hammond                   Dept. of Mathematics & Statistics
518-442-4625                                  The University at Albany
·······@math.albany.edu                      Albany, NY 12222 (U.S.A.)
http://math.albany.edu:8000/~hammond/          Dept. FAX: 518-442-4731

Never trust an SGML/XML vendor whose web page is not valid HTML.
From: Will Fitzgerald
Subject: Re: data structure for markup text
Date: 
Message-ID: <blra3.2573$ZD4.11638@newsfeed.slurp.net>
Christopher Browne wrote in message
<······················@news2.giganews.com>...
>
> [in a discussion of how to represent XML/SGML as Lisp data structures]
>
>I have not yet come up with the "perfect" solution for handling
>attributes.  (e.g. - how to represent ID="FINANCES" in the SGML <SECT1
>ID="FINANCES"> ... </SECT1>)

One option is to treat attributes as a separate, special property, with
the attributes given as an association list:

(sect1
  (attribute (id "finances"))
  (title "Title")
    ... )
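
Under that representation, accessors stay short.  A sketch in Common
Lisp (the names ELEMENT-ATTRIBUTES and GET-ATTRIBUTE are mine, not
anything standard):

;; Attributes live in a distinguished (ATTRIBUTE ...) entry whose
;; tail is an association list of (name value) pairs.
(defun element-attributes (element)
  "Return the (name value) attribute pairs of ELEMENT, if any."
  (let ((entry (assoc 'attribute (rest element))))
    (if entry (rest entry) '())))

(defun get-attribute (element name)
  "Return the value of attribute NAME in ELEMENT, or NIL."
  (second (assoc name (element-attributes element))))

;; (get-attribute '(sect1 (attribute (id "finances"))
;;                        (title "Title"))
;;                'id)
;; => "finances"

One nice property: element content and attributes never collide,
because ATTRIBUTE is the only child position with that head symbol.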
From: ········@my-deja.com
Subject: Re: data structure for markup text
Date: 
Message-ID: <7km4tq$jr0$1@nnrp1.deja.com>
In article <····················@newsfeed.slurp.net>,
  "Will Fitzgerald" <··········@no-spam-please-neodesic.com> wrote:

> One option is to treat attributes as a separate, special property,
> with the attributes as a association list:

If I understand you correctly, this is exactly what K, a Scheme/APL
hybrid (http://www.kx.com/) does. Maps are built-in, which also
helps if you want to build hierarchical documents with named nodes.
Attributes are just special maps attached to each node. There are
many interesting ideas in K that other Scheme implementations might
borrow from.

-- O.L.


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
From: David Bakhash
Subject: Re: data structure for markup text
Date: 
Message-ID: <cxju2s4up30.fsf@acs5.bu.edu>
One place to look for Lisp-related data structures is Emacs-w3, the web browser
for Emacs and XEmacs.  Also, there was a recent release of a Linux-based web
browser that worked under ACL, which is open-source.  Just some starting
points.  Hope they help.

http://www.xemacs.org (then look for w3)

dave