From: Rafal Strzalinski
Subject: HTML parser for CMUCL
Date: 
Message-ID: <bf43be7e.0105291119.4b2d0161@posting.google.com>
Hi,

I'm looking for a simple-to-use HTML parser that works under CMUCL.
I've found the PHTML package from Franz, which is fine, but unfortunately it
doesn't work under CMUCL. Has anyone ported it?


--
Rafal Strzalinski
lisp hacker wannabe :-)

From: Pierre R. Mai
Subject: Re: HTML parser for CMUCL
Date: 
Message-ID: <873d9ngaer.fsf@orion.bln.pmsf.de>
·····@e-point.pl (Rafal Strzalinski) writes:

> I'm looking for a simple-to-use HTML parser that works under CMUCL.
> I've found the PHTML package from Franz, which is fine, but unfortunately it
> doesn't work under CMUCL. Has anyone ported it?

The question is, do you need an HTML parser that parses real-world
HTML, i.e. the completely bogus stuff that people think is HTML, and
put on their web-pages, or let their scripts generate, or can you make
do with SGML/XML parsers that will allow you to parse stuff that
conforms to the HTML/XHTML DTDs?
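
To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about CL). It uses the standard library's strict XML parser, which checks well-formedness only and does not validate against the HTML/XHTML DTDs. The two sample strings and the helper name parses_as_xml are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: one well-formed XHTML-style fragment, one typical tag soup.
WELL_FORMED = "<html><body><p>Hello<br/>world</p></body></html>"
TAG_SOUP = "<html><body><p>Hello<br>world</body></html>"  # <br> and <p> never closed


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # A strict parser stops at the first well-formedness error.
        return False


if __name__ == "__main__":
    print(parses_as_xml(WELL_FORMED))
    print(parses_as_xml(TAG_SOUP))
```

A strict parser gives up at the first unclosed tag, so it only helps if the input is known to be well-formed.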

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein

From: Eugene Sandulenko
Subject: Re: HTML parser for CMUCL
Date: 
Message-ID: <bbb1c4f1.0106021613.11bb353a@posting.google.com>
"Pierre R. Mai" <····@acm.org> wrote in message news:<··············@orion.bln.pmsf.de>...
> The question is, do you need an HTML parser that parses real-world
> HTML, i.e. the completely bogus stuff that people think is HTML, and
> put on their web-pages, or let their scripts generate, or can you make
> do with SGML/XML parsers that will allow you to parse stuff that
> conforms to the HTML/XHTML DTDs?
What would be the solution for real-world HTML?

Eugene

From: Pierre R. Mai
Subject: Re: HTML parser for CMUCL
Date: 
Message-ID: <87k82to45m.fsf@orion.bln.pmsf.de>
·········@yahoo.com (Eugene Sandulenko) writes:

> "Pierre R. Mai" <····@acm.org> wrote in message news:<··············@orion.bln.pmsf.de>...
> > The question is, do you need an HTML parser that parses real-world
> > HTML, i.e. the completely bogus stuff that people think is HTML, and
> > put on their web-pages, or let their scripts generate, or can you make
> > do with SGML/XML parsers that will allow you to parse stuff that
> > conforms to the HTML/XHTML DTDs?
> What would be the solution for real-world HTML?

The short answer:  There are no real solutions to parsing real-world
HTML; it's all just a morass of heuristics.  The only way to win is
not to play, or something like that.

Most browser engines contain stuff that tries its best in the face of
real-world HTML, though I don't know of any CL stuff that goes to
similar lengths.  Not that I have looked closely for such a thing, as
I'm lucky enough not to have needed it in the past.  If you only need
to lift bits of information out of HTML files, it is often best not to
try to parse it, but rather to search for the information directly
based on knowledge of its structure.
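
A minimal sketch of that approach, in Python purely for illustration (the same idea carries over to CL with any string-search or regex facility). The page fragment, the "Label:<td>value" layout, and the names PAGE, FIELD and extract_fields are all assumptions invented for this example; a real page needs a pattern written against its own structure.

```python
import re

# Hypothetical page fragment: malformed HTML (unclosed <TD> and <B>, mixed case).
PAGE = """
<TABLE><TR><TD><B>Price:<TD>42.50 EUR
<TR><TD><B>Stock:<TD>17
</table>
"""

# Assumed layout: a label ending in ':' followed by a <td> tag,
# with the value running up to the next tag or the end of the line.
FIELD = re.compile(r"(\w+):\s*<td[^>]*>\s*([^<\r\n]+)", re.IGNORECASE)


def extract_fields(text):
    """Return {label: value} for every 'Label:<td>value' occurrence in text."""
    return {label.lower(): value.strip() for label, value in FIELD.findall(text)}


if __name__ == "__main__":
    print(extract_fields(PAGE))
```

The pattern never cares whether the tags balance, which is the point: it depends only on the small piece of structure around the wanted data, and it will break if that layout changes.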

Regs, Pierre.

From: Simon András
Subject: Re: HTML parser for CMUCL
Date: 
Message-ID: <vcdu22392jd.fsf@russell.math.bme.hu>
·····@e-point.pl (Rafal Strzalinski) writes:


> I've found the PHTML package from Franz, which is fine, but unfortunately it
> doesn't work under CMUCL. Has anyone ported it?

Yes, I'll mail you my quick and dirty port. 
I hope it works for you. 

Andras