From: Rupert Swarbrick
Subject: Reading random bits of clhs
Date: 
Message-ID: <g0c7cg$p1f$1@news.albasani.net>
After Kenny's misadventure with logbitp the other day, and the ensuing
comments of "I should be reading random snippets of the hyperspec"
(which I don't think I've made up, but can't find now...) I thought
I'd do something to make it (slightly) more convenient.

So I present some horribly horribly hacky code to grab a list of the
relevant pages from the lispworks site and store it to
~/.hyperspec-links  (this isn't even remotely portable. Meh)

I didn't really want to add attachments, so I'm pasting
inline. There's a simple asd, the lisp itself and (most importantly
for me!)  a snippet of code that you can plonk in ~/.stumpwmrc to add
a command that actually gets firefox to open stuff.

If anyone actually wants this other than me, I'd be happy to clean it
up (a lot). For example, if you change a couple of the parameters, it
works on lisp.org so that would be easy to make nicer. Also, it's easy
enough to make the stumpwm stuff better. But I won't bother unless
anyone wants it!


Rupert


Without further ado (small stuff first):

From my stumpwmrc:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defvar *hyperspec-mutex*
  (sb-thread:make-mutex :name "Hyperspec Lock"))

(defun visit-webpage (url)
  (run-shell-command
   (concatenate 'string
                "/opt/firefox/firefox " url)))

(defun hyperspec-thread-fun ()
  (let ((did-job))
    (sb-thread:with-mutex
        (*hyperspec-mutex* :wait-p nil)
      (require :clhs-picker)
      (visit-webpage (clhs-picker:get-random-link))
      (setf did-job t))
    (unless did-job
      (stumpwm:message "There's already a thread working on this."))))

(define-stumpwm-command "hyperspec-random" ()
  (sb-thread:make-thread #'hyperspec-thread-fun))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


The asd:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;; -*- Mode: lisp -*-

(defsystem clhs-picker
  :author "Rupert Swarbrick <··········@gmail.com>"
  :licence "GPLv3"
  :description "Get a random page reference from the clhs."
  :components ((:file "clhs-picker"))

  :depends-on (:iterate :drakma :cl-ppcre))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


The lisp:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defpackage :clhs-picker
  (:use :cl :iterate)
  (:export get-random-link))

(in-package :clhs-picker)

;;;; UTILS ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defmacro aif (test-form then-form &optional else-form)
  `(let ((it ,test-form))
     (if it ,then-form ,else-form)))

(defmacro awhen (test-form &body body)
  `(aif ,test-form
     (progn ,@body)))

(defmacro join-strings (&rest strings)
  `(concatenate 'string ,@strings))

(defun drop-nils (list)
  (cond
    ((null list)
     nil)
    ((car list)
     (cons (car list) (drop-nils (cdr list))))
    (t
     (drop-nils (cdr list)))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


(defparameter *root-url*
  "http://www.lispworks.com/documentation/HyperSpec/")

(defparameter *relative-master*
  "Front/X_Master.htm")

(defparameter *frontmatter-dir*
  "Front/")

(defparameter *non-nested-anchor-re*
  "<[Aa][^<]+[Hh][Rr][Ee][Ff]=\"([^\"]*)\">[^<]+</[Aa]>")

(defparameter *links-fname*
  (join-strings (sb-ext:posix-getenv "HOME")
                "/.hyperspec-links"))

(defun all-simple-anchors (url)
  ;; Fetch URL and return the href of every simple (non-nested) anchor.
  (let ((hrefs '()))
    (cl-ppcre:do-register-groups (href)
        (*non-nested-anchor-re*
         (drakma:http-request url))
      (push href hrefs))

    hrefs))

(defun all-anchors-firstmatch (url re)
  (let ((urls (all-simple-anchors url)))
    
    (iterate
      (for x in urls)
      (cl-ppcre:do-register-groups (match)
          (re x)
        (awhen match (collect it))))))

(defun get-letter-pages ()
  (drop-nils
   (mapcar
    (lambda (url)
      (join-strings
       *root-url* *frontmatter-dir* url))

    
    (all-anchors-firstmatch
     (join-strings *root-url* *relative-master*)
     ".*/([^/]*[Mm]ast[^/]*\\.html?)"))))
;; (the above regexp should work for lispworks.com or lisp.org)

(defun get-page-locs (index-url)
  (drop-nils
   (mapcar
    (lambda (page)
      (join-strings *root-url* "Body/" page))

    (all-anchors-firstmatch
     index-url
     ".*/Body/(.*)"))))

(defun grab-links ()
  (remove-duplicates
   (apply #'concatenate 'list
          (mapcar #'get-page-locs
                  (get-letter-pages)))
   :test #'string=))

(defun store-links ()
  (let ((urls (grab-links)))
    (with-open-file (f *links-fname* :direction :output
                       :if-exists :supersede)
      (iterate (for url in urls) (format f "~A~%" url)))

    urls))
  
(defun get-links-somehow ()
  (aif
   (with-open-file (f *links-fname* :direction :input
                      :if-does-not-exist nil)
     (when (streamp f)
       (iterate
         ;; READ-LINE returns NIL at end of file; its second value only
         ;; says whether the line lacked a trailing newline.
         (for line = (read-line f nil))
         (while line)
         (collect line)))))

   it
   
   (store-links)))

(defun get-random-link ()
  (let ((urls (get-links-somehow)))
    (when urls
      (nth (random (length urls)) urls))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Reading random bits of clhs
Date: 
Message-ID: <rem-2008may13-005@yahoo.com>
> From: Rupert Swarbrick <··········@gmail.com>
> If anyone actually wants this other than me, I'd be happy to
> clean it up (a lot). ...
> (defun visit-webpage (url)
>   (run-shell-command
>    (concatenate 'string
>                 "/opt/firefox/firefox " url)))

Ugly!! You concatenate all the pieces of a shell command, just so
the shell can then parse your string to break it back apart. If the
URL contains any special characters interpreted specially by the
shell, such as backslant or apostrophe, it breaks.

Better (in CMUCL) to call EXT:RUN-PROGRAM where you give the path
to the executable and explicitly give a list of the arguments to
that program, so the shell parse never sees any of it, and there's
not the extra layer of process for the shell in the first place.
More efficient and cleaner:

(defun visit-webpage (url)
  (ext:run-program "/opt/firefox/firefox" (list url)))

I haven't tested that, since I'm on a VT100 dialup where the only
Web browser I can run is lynx. I think I have the syntax correct.
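
For what it's worth, the stumpwmrc in the original post is SBCL code
(it uses sb-thread), and there the same function lives in the SB-EXT
package rather than EXT. A minimal, untested sketch, assuming the same
firefox path as above; the :wait nil is an assumption, added so the
call returns without waiting for the browser to exit:

```lisp
;; SBCL counterpart of the CMUCL call above (a sketch, not tested
;; against a real firefox install).
(defun visit-webpage (url)
  ;; The URL is passed as a single argv element, so no shell ever
  ;; gets to parse it.
  (sb-ext:run-program "/opt/firefox/firefox" (list url)
                      :wait nil))
```

Because the program path is absolute, no :search argument should be
needed here.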

Try this both ways:
 (visit-webpage "http://www.rawbw.com/~rem/HelloPlus/h2.php?foo=a'b")
I betcha your way will give a shell parse error due to unmatched '
 (Hey, Rocky! Watch me pull three vertically-aligned apostrophes '
  out of my hat!)

> (defparameter *non-nested-anchor-re*
>   "<[Aa][^<]+[Hh][Rr][Ee][Ff]=\"([^\"]*)\">[^<]+</[Aa]>")

Ouch, you're in love with regular expressions? G[R]EEK CHICKEN SCRATCHES!!!
Why not use something nicely obviously structured?
For example, there was a recent thread about something similar to BNF.

> ".*/([^/]*[Mm]ast[^/]*\\.html?)"

Ouch again!
From: Rupert Swarbrick
Subject: Re: Reading random bits of clhs
Date: 
Message-ID: <g0h5br$lt8$1@news.albasani.net>
···················@SpamGourmet.Com (Robert Maas, http://tinyurl.com/uh3t) writes:

>> From: Rupert Swarbrick <··········@gmail.com>
>> If anyone actually wants this other than me, I'd be happy to
>> clean it up (a lot). ...
>> (defun visit-webpage (url)
>>   (run-shell-command
>>    (concatenate 'string
>>                 "/opt/firefox/firefox " url)))
>
> Ugly!! You concatenate all the pieces of a shell command, just so
> the shell can then parse your string to break it back apart. If the
> URL contains any special characters interpreted specially by the
> shell, such as backslant or apostrophe, it breaks.

That's a good point. I admit I was being lazy (and using the stumpwm
"program runner" rather than thinking). But I'll change that.

> Ouch, you're in love with regular expressions? G[R]EEK CHICKEN SCRATCHES!!!
> Why not use something nicely obviously structured?
> For example, there was a recent thread about something similar to BNF.
>
>> ".*/([^/]*[Mm]ast[^/]*\\.html?)"
>
> Ouch again!

Yeah, that's true, but again I was trying to do this quickly and know
regexp syntax already. I admit that the "a href"-finding code is
butt-ugly. Could you give me a pointer to the thread you mention? It
sounds interesting.


Incidentally, clearly no-one actually tried the stuff to integrate
with stumpwm, since the nested require doesn't work (unless the
package has already been loaded). Only noticed this when I restarted
the window manager this morning. Sigh.
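
The likely cause: the reader has to resolve clhs-picker:get-random-link
when it reads the whole defun, and that happens before the
(require :clhs-picker) inside the function ever runs, so the package
does not exist yet. One possible workaround is to look the symbol up
at run time. A sketch only; call-get-random-link is a made-up helper
name, not part of the posted code:

```lisp
(defun call-get-random-link ()
  "Load clhs-picker if necessary, then call its GET-RANDOM-LINK by name."
  (require :clhs-picker)
  ;; Resolve the package and symbol at run time, after REQUIRE has had
  ;; a chance to create them, instead of at read time.
  (let* ((pkg (or (find-package "CLHS-PICKER")
                  (error "Package CLHS-PICKER was not loaded.")))
         (sym (find-symbol "GET-RANDOM-LINK" pkg)))
    (unless (and sym (fboundp sym))
      (error "GET-RANDOM-LINK is not defined in CLHS-PICKER."))
    (funcall sym)))
```

hyperspec-thread-fun could then call (visit-webpage (call-get-random-link))
and drop its own require.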

Rupert
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Reading random bits of clhs
Date: 
Message-ID: <rem-2008jun06-001@yahoo.com>
> From: Rupert Swarbrick <··········@gmail.com>
> > Ouch, you're in love with regular expressions? G[R]EEK CHICKEN SCRATCHES!!!
> > Why not use something nicely obviously structured?
> > For example, there was a recent thread about something similar to BNF.
> >> ".*/([^/]*[Mm]ast[^/]*\\.html?)"
> > Ouch again!

> Yeah, that's true, but again I was trying to do this quickly and
> know regexp syntax already.

But don't you just *hate* needing to know that keystroke-efficient
but human-interface-horrid syntax? Don't you wish you never needed
to learn it in the first place? Wouldn't it be nice if nobody else
ever had to learn what you suffered learning?

> I admit that the "a href"-finding code is butt-ugly. Could you
> give me a pointer to the thread you mention? It sounds interesting.

The article I posted announcing my BNF-like table-based parser:
 <http://groups.google.com/group/comp.programming/msg/4b6f406ace0655c3>
= Message-ID: <·················@Yahoo.Com>
From: Pascal J. Bourguignon
Subject: Re: Reading random bits of clhs
Date: 
Message-ID: <7cwslx18cm.fsf@pbourguignon.anevia.com>
Rupert Swarbrick <··········@gmail.com> writes:

> After Kenny's misadventure with logbitp the other day, and the ensuing
> comments of "I should be reading random snippets of the hyperspec"
> (which I don't think I've made up, but can't find now...) I thought
> I'd do something to make it (slightly) more convenient.
>
> [...]
>
> Without further ado (small stuff first):
>
> From my stumpwmrc:

Well, Emacs has more things built in, so you can just do:

(defun random-hyperspec ()
  (interactive)
  (let* ((random-hyperspec-symbol
          (let ((syms '()))
            (do-symbols (sym common-lisp-hyperspec-symbols) (push sym syms))
            (nth (random (length syms)) syms)))
         (random-page (let ((pages (symbol-value random-hyperspec-symbol)))
                        (nth (random (length pages)) pages))))
    (browse-url (concat common-lisp-hyperspec-root "Body/" random-page))))


But of course, the HyperSpec has not only dictionary pages but also
presentation pages that are good reading too.


-- 
__Pascal Bourguignon__