From: Kelly Murray
Subject: Re: parsing HTML using LISP
Date: 
Message-ID: <60v7nc$gqn$1@vapor.franz.com>
In article <·············@WINTERMUTE.eagle>, SDS <···········@cctrading.com> writes:
>> I am pretty sure that someone has implemented HTML parsing using LISP
>> already, so I would appreciate some pointers/hints.
>> 

Here's the function from my Charlotte sources that does the first
level of parsing into tags, which does what you want.
My code further "intern"s the tags into real objects, 
and parses the tag attributes.  The tag objects are defined
using a macro that expands (ultimately) into a defclass form
for each tag type, and parses the tag attributes for the given tag.
I haven't included that code here.  

Hope this is helpful.

-Kelly Murray  ···@franz.com

;;
;; parse the html text file into a list of tags and content strings
;;
(function parse-html-into-tags (buffer)
  (loop with start = 0
        with items = nil
        with done = nil
        until done
        finally (return (nreverse items))
      do
	(let tag-start = (position #\< buffer :start start)
	     tag-end = (when tag-start (position #\> buffer :start tag-start))
	  do
	  (if (and tag-start tag-end)
	    then
	      (if (not (eq start tag-start))
		then
		 (let text = (string-trim nil ; '(#\space #\tab #\newline)
					  (subseq buffer start tag-start))
		    do
		    (if (not (zerop (length text)))
		      then
		       (push text items))))
	      (push (subseq buffer tag-start tag-end) items)
	      (setf start (1+ tag-end))
	    else
	      (setf done t)
	      (let text = (string-trim '(#\space #\tab #\newline)
				       (subseq buffer start))
		do
		(if (not (zerop (length text)))
		  then
		   (push text items)))
	      ))))

(function intern-html-tags (taglist)
  (loop for tags on taglist
        for tag = (first tags)
        with objs = nil
        finally (return (nreverse objs))
      do
	;;
	;; see if a tag or content
	(if (equal (char tag 0) #\<)
	  then
	    (push (intern-html-tag tag) objs)
	  else
	    (push (make-instance 'html-content :string tag) objs)
	    )))
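
The listing above relies on forms that are not part of standard Common Lisp (`function` as a defining form, `let ... = ... do`, and `if ... then ... else`), so it will not load without the macros that define them. As a rough sketch only, the same first-level split could be written in portable Common Lisp as follows. The name `split-html-into-tags` is hypothetical and not from the Charlotte sources; the behavior is meant to mirror the posted code, including the fact that each tag string stops just before its closing `>`.

```lisp
;; Hypothetical portable rendering of the first-level tag split.
;; Returns a list of strings: tags (starting with <) and text runs, in order.
(defun split-html-into-tags (buffer)
  (let ((start 0)
        (items '()))
    (loop
      (let* ((tag-start (position #\< buffer :start start))
             (tag-end (and tag-start
                           (position #\> buffer :start tag-start))))
        (cond ((and tag-start tag-end)
               ;; Text before the tag is kept untrimmed, as in the original.
               (when (< start tag-start)
                 (push (subseq buffer start tag-start) items))
               ;; As in the original, the closing > is not part of the tag.
               (push (subseq buffer tag-start tag-end) items)
               (setf start (1+ tag-end)))
              (t
               ;; No complete tag remains: keep trailing text if non-blank.
               (let ((text (string-trim '(#\Space #\Tab #\Newline)
                                        (subseq buffer start))))
                 (when (plusp (length text))
                   (push text items)))
               (return (nreverse items))))))))
```

For example, `(split-html-into-tags "<p>Hello <b>world</b></p>")` yields the six strings `"<p"`, `"Hello "`, `"<b"`, `"world"`, `"</b"` and `"</p"`.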
From: Gareth McCaughan
Subject: Re: parsing HTML using LISP
Date: 
Message-ID: <86u3f0tjt2.fsf@g.pet.cam.ac.uk>
Kelly Murray wrote:

> (function parse-html-into-tags (buffer)
>   (loop with start = 0
>         with items = nil
>         with done = nil
>         until done
>         finally (return (nreverse items))
>       do
> 	(let tag-start = (position #\< buffer :start start)
> 	     tag-end = (when tag-start (position #\> buffer :start tag-start))

<IMG ... ALT="<foo>"> ?
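
The case above breaks the quoted code because `position` stops at the first `>`, which here sits inside the quoted ALT value. One possible way around it, as a sketch only, is to scan for the closing `>` while tracking whether the scan is inside a quoted attribute value. The helper name `find-tag-end` is hypothetical, and the sketch assumes values are delimited by `"` or `'`; it does not handle comments or other markup declarations.

```lisp
;; Hypothetical helper: index of the first > at or after TAG-START that
;; is outside a quoted attribute value, or NIL if there is none.
(defun find-tag-end (buffer tag-start)
  (let ((quote-char nil))
    (loop for i from tag-start below (length buffer)
          for ch = (char buffer i)
          do (cond (quote-char
                    ;; Inside a quoted value: only the matching quote ends it.
                    (when (char= ch quote-char)
                      (setf quote-char nil)))
                   ((or (char= ch #\") (char= ch #\'))
                    (setf quote-char ch))
                   ((char= ch #\>)
                    (return i))))))
```

Under these assumptions, `(find-tag-end buffer tag-start)` could stand in for the `(position #\> buffer :start tag-start)` call in the quoted code, so that the `>` inside `ALT="<foo>"` no longer ends the tag early.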

-- 
Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.