html parser

From: Alex Mizrahi
Subject: html parser
Date: Sat, 11 Nov 2006 15:30:45 +0000
Message-ID: <4555eca8$0$49205$14726298@news.sunsite.dk>

Hello, All!

I want to do some simple parsing of HTML data.
Before I hack something together with regexes, I'd like to check which libs are 
currently available.

I've tried cl-html-parse, which is a port of Allegro's parser, but in ABCL it 
doesn't parse anything: it just returns the whole string instead. Possibly 
that's ABCL's fault.
Are there any other parsers out there?

With best regards, Alex Mizrahi.

From: Lars Rune Nøstdal
Subject: Re: html parser
Date: Sat, 11 Nov 2006 15:49:39 +0000
Message-ID: <pan.2006.11.11.15.49.36.847681@gmail.com>

On Sat, 11 Nov 2006 17:30:45 +0200, Alex Mizrahi wrote:

> Hello, All!
> 
> I want to do some simple parsing of HTML data.
> Before I hack something together with regexes, I'd like to check which libs are 
> currently available.
> 
> I've tried cl-html-parse, which is a port of Allegro's parser, but in ABCL it 
> doesn't parse anything: it just returns the whole string instead. Possibly 
> that's ABCL's fault.
> Are there any other parsers out there?

I've used S-XML ( http://www.cliki.net/S-XML ) for this kind of thing.
Hasty copy/paste-job:


(defmacro aif (result-symbol test-form 
               then-form &optional else-form)
  `(let ((,result-symbol ,test-form))
     ,result-symbol ;; To avoid warning.
     (if ,result-symbol
         ,then-form
         ,else-form)))


(defmacro awhen (result-symbol test-form &body body)
  `(aif ,result-symbol ,test-form
        (progn ,@body)))


(defun fillTemplate (template template-data)
  "Fill `template' (string) using data from `template-data' (a-list)."
  (with-output-to-string (ss)
    (print-xml 
     (let ((xml (parse-xml-string template :output-type :sxml)))
       (labels ((walkDOM (elt)
                  (when (listp elt)
                    (if (and (listp (cadr elt)) (eq (caadr elt) :@))
                        (awhen (new-data (find (second (find :ID (cdadr elt) :key #'first))
                                               template-data
                                               :key #'car
                                               :test #'equal))
                          (setf (third elt) (cdr new-data)))
                        (dolist (elt elt)
                          (walkDOM elt))))))
         (walkDOM (cadr xml)))
       xml)
     :stream ss
     :input-type :sxml
     :pretty t)))


(defun test ()
  (fillTemplate (getTextFile "test.html")
                '(("heading" . "The Dynamic Header")
                  ("main-content" . "The Dynamic Content"))))


#| test.html |#
;;;;;;;;;;;;;;;

<html>
  <body>
    <h1 id="heading">Default heading</h1>
    <div id="main-content">Our programmer should fill in dynamic content here.</div>
    <p/>
    
  </body>
</html>


Well, it kinda works.. :}
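
The `getTextFile' helper called in `test' is not included in the paste above. A minimal sketch of what it might look like, assuming it only needs to read a whole text file into a string (this definition is a guess, not part of the original post):

```lisp
;; Hypothetical stand-in for the getTextFile helper used in `test';
;; the original definition was not posted.
(defun getTextFile (pathname)
  "Return the contents of the text file at PATHNAME as one string."
  (with-open-file (in pathname :direction :input)
    (with-output-to-string (out)
      ;; READ-LINE returns NIL at end of file, which ends the loop.
      ;; Every line is written back with a trailing newline, so the
      ;; result always ends in a newline even if the file does not.
      (loop for line = (read-line in nil nil)
            while line
            do (write-line line out)))))
```

With this in place, (getTextFile "test.html") returns the template text that `fillTemplate' then hands to S-XML.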


-- 
Lars Rune Nøstdal
http://lars.nostdal.org/

From: Alex Mizrahi
Subject: Re: html parser
Date: Sat, 11 Nov 2006 15:37:01 +0000
Message-ID: <4555ee20$0$49198$14726298@news.sunsite.dk>

(message (Hello 'Alex)
(you :wrote :to '(All) :on '(Sat, 11 Nov 2006 17:30:45 +0200))
(

 AM> are there any other parsers out there?

By the way, I have cl-ppcre working under ABCL, so if someone can share just 
a bunch of functions that use cl-ppcre to extract some information from 
HTML, that would be useful too. (Basically, I need to extract all the text.)

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity")
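
For the "extract all the text" request above, a rough regex-only sketch with CL-PPCRE could look like the following. The names `html-text', `*script-or-style-scanner*' and `*comment-scanner*' are made up for illustration. The approach is deliberately naive: it does not decode entities such as &amp; and will be confused by a ">" inside an attribute value, so treat it as a starting point rather than a parser.

```lisp
;; Assumes CL-PPCRE is already loaded, for example with
;; (asdf:oos 'asdf:load-op :cl-ppcre).

(defparameter *script-or-style-scanner*
  (cl-ppcre:create-scanner "<(script|style)\\b.*?</(script|style)\\s*>"
                           :case-insensitive-mode t
                           :single-line-mode t)
  "Matches <script> and <style> elements together with their contents.")

(defparameter *comment-scanner*
  (cl-ppcre:create-scanner "<!--.*?-->" :single-line-mode t)
  "Matches HTML comments, including ones that span several lines.")

(defun html-text (html)
  "Return the text of the string HTML with markup removed (naive)."
  (let* (;; Drop script/style blocks first so their bodies do not leak out.
         (s (cl-ppcre:regex-replace-all *script-or-style-scanner* html " "))
         ;; Drop comments.
         (s (cl-ppcre:regex-replace-all *comment-scanner* s " "))
         ;; Replace every remaining tag with a space.
         (s (cl-ppcre:regex-replace-all "<[^>]*>" s " "))
         ;; Collapse runs of whitespace into a single space.
         (s (cl-ppcre:regex-replace-all "\\s+" s " ")))
    (string-trim " " s)))
```

For example, (html-text "<p>Hello <b>world</b></p>") returns "Hello world".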

From: ·······@gmail.com
Subject: Re: html parser
Date: Sat, 11 Nov 2006 20:49:32 +0000
Message-ID: <1163278172.799381.120520@i42g2000cwa.googlegroups.com>

Alex Mizrahi wrote:

> if someone can share just
> a bunch of functions that use cl-ppcre to extract some information from
> HTML, that would be useful too. (Basically, I need to extract all the text.)

Not CL-PPCRE, but it should be easy to port this one:

  http://weitz.de/html-extract/

Cheers,
Edi.

From: Petter Gustad
Subject: Re: html parser
Date: Sat, 11 Nov 2006 21:34:12 +0000
Message-ID: <7dlkmhr7l7.fsf@www.gratismegler.no>

"Alex Mizrahi" <········@users.sourceforge.net> writes:

> I've tried cl-html-parse, which is a port of Allegro's parser, but in ABCL it 
> doesn't parse anything: it just returns the whole string instead. Possibly 
> that's ABCL's fault.

Can you get this little example code to run?

http://tinyurl.com/y2gxdg

Petter
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?