In article <·············@WINTERMUTE.eagle>, SDS <···········@cctrading.com> writes:
>> I am pretty sure that someone has implemented HTML parsing using LISP
>> already, so I would appreciate some pointers/hints.
>>
Here's the function from my Charlotte sources that does the first
level of parsing into tags, which does what you want.
My code further "intern"s the tags into real objects,
and parses the tag attributes. The tag objects are defined
using a macro that expands (ultimately) into a defclass form
for each tag type, and parses the tag attributes for the given tag.
I haven't included that code here.
Hope this is helpful.
-Kelly Murray ···@franz.com
;;
;; parse the html text file into a list of tags and content strings
;;
(function parse-html-into-tags (buffer)
  (loop with start = 0
        with items = nil
        with done = nil
        until done
        finally (return (nreverse items))
        do
        (let tag-start = (position #\< buffer :start start)
             tag-end = (when tag-start (position #\> buffer :start tag-start))
          do
          (if (and tag-start tag-end)
           then
             (if (not (eq start tag-start))
              then
                (let text = (string-trim nil ; '(#\space #\tab #\newline)
                                         (subseq buffer start tag-start))
                  do
                  (if (not (zerop (length text)))
                   then
                     (push text items))))
             (push (subseq buffer tag-start tag-end) items)
             (setf start (1+ tag-end))
           else
             (setf done t)
             (let text = (string-trim '(#\space #\tab #\newline)
                                      (subseq buffer start))
               do
               (if (not (zerop (length text)))
                then
                  (push text items)))
             ))))
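The function above relies on `function`, `let ... = ... do`, and `if ... then ... else` forms, which are not part of standard Common Lisp and presumably come from macros in the author's own environment. For readers without those macros, here is a rough sketch of the same splitting logic in portable Common Lisp. The name `parse-html-into-tags*` is made up for this sketch, and the behavior is only my reading of the posted code: text between tags is kept untrimmed, trailing text is trimmed, and each tag string stops just before its closing `>`.

```lisp
;; Portable sketch of the tag splitter above. PARSE-HTML-INTO-TAGS* is a
;; hypothetical name; the logic follows the posted code as I read it.
(defun parse-html-into-tags* (buffer)
  "Split the string BUFFER into a list of tag strings and content strings."
  (let ((start 0)
        (items '()))
    (loop
      (let* ((tag-start (position #\< buffer :start start))
             (tag-end (and tag-start
                           (position #\> buffer :start tag-start))))
        (cond ((and tag-start tag-end)
               ;; Text between the previous tag and this one, kept untrimmed.
               (when (< start tag-start)
                 (push (subseq buffer start tag-start) items))
               ;; As in the posted code, the tag string excludes the ">".
               (push (subseq buffer tag-start tag-end) items)
               (setf start (1+ tag-end)))
              (t
               ;; No complete tag is left: keep trailing text if non-blank.
               (let ((text (string-trim '(#\Space #\Tab #\Newline)
                                        (subseq buffer start))))
                 (unless (zerop (length text))
                   (push text items)))
               (return (nreverse items))))))))
```

For example, `(parse-html-into-tags* "<p>Hello <b>world</b>")` should give `("<p" "Hello " "<b" "world" "</b")`.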
(function intern-html-tags (taglist)
  (loop for tags on taglist
        for tag = (first tags)
        with objs = nil
        finally (return (nreverse objs))
        do
        ;;
        ;; see if a tag or content
        (if (equal (char tag 0) #\<)
         then
           (push (intern-html-tag tag) objs)
         else
           (push (make-instance 'html-content :string tag) objs)
           )))
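The post leaves out the macro that defines the tag classes, along with `intern-html-tag` and the `html-content` class that `intern-html-tags` calls. What follows is only a guess at the general shape, written in portable Common Lisp, and not the Charlotte code. The names `define-html-tag`, `html-tag`, `*tag-classes*`, and all the slot and reader names are assumptions made for illustration. The attribute text is kept as a raw string here, whereas the real code is said to parse it.

```lisp
;; Hypothetical sketch, not the Charlotte sources. Only INTERN-HTML-TAG and
;; HTML-CONTENT are names taken from the post; the rest is made up.

(defclass html-content ()
  ((text :initarg :string :reader content-text)))

(defclass html-tag ()
  ((closing-p  :initarg :closing-p  :initform nil :reader tag-closing-p)
   (attributes :initarg :attributes :initform ""  :reader tag-attributes)))

(defvar *tag-classes* (make-hash-table :test #'equalp)
  "Maps a tag name (compared case-insensitively) to a class name.")

(defmacro define-html-tag (name)
  "Define the class HTML-<NAME> and register it under the tag name."
  (let ((class-symbol (intern (format nil "HTML-~A" (string-upcase name)))))
    `(progn
       (defclass ,class-symbol (html-tag) ())
       (setf (gethash ,(string name) *tag-classes*) ',class-symbol))))

(define-html-tag p)
(define-html-tag b)

(defun intern-html-tag (tag)
  "Turn a tag string such as \"<b\" or \"</b\" into a tag object."
  (let* ((closing-p (and (> (length tag) 1) (char= (char tag 1) #\/)))
         (name-start (if closing-p 2 1))
         (name-end (or (position-if-not #'alphanumericp tag :start name-start)
                       (length tag)))
         ;; Unregistered tag names fall back to the generic HTML-TAG class.
         (tag-class (gethash (subseq tag name-start name-end)
                             *tag-classes* 'html-tag)))
    (make-instance tag-class
                   :closing-p closing-p
                   ;; Raw attribute text only; no attribute parsing here.
                   :attributes (string-trim '(#\Space #\Tab #\Newline #\>)
                                            (subseq tag name-end)))))
```

With these definitions, `(intern-html-tag "<p align=center")` returns an `html-p` instance whose `tag-attributes` is the string `"align=center"`.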