From: Peter Seibel
Subject: Practical Common Lisp gets practical
Date: 
Message-ID: <m3r7qp6xi5.fsf@javamonkey.com>
I just put up a new chapter (chapter 18) of Practical Common Lisp.
This is the first of what will be several "practical" chapters that
will finish the book (or me ;-)) wherein I write some piece of actual
code and talk about it. I realize that I may have used some language
features that I haven't discussed elsewhere in the book--I'm not going
to worry too much about it right now; I'll figure out where to work that
stuff in when I revise. So caveat newbie. (But feel free to send me
comments if there are places you get particularly lost or confused.)
As always I'm interested in any and all feedback from newbies and
wizards alike.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Jeff
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <tpzPc.200582$%_6.99505@attbi_s01>
Peter Seibel wrote:

> I just put up a new chapter (chapter 18) of Practical Common Lisp.
> This is the first of what will be several "practical" chapters that
> will finish the book (or me ;-)) wherein I write some piece of actual
> code and talk about it. 

[...]

> As always I'm interested in any and all feedback from newbies and
> wizards alike.

Very nice. I really enjoyed reading it. It is nice to see practical
examples of any language in use. Especially examples that benefit from
using said language.

Jeff
From: Kenny Tilton
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <tQAPc.111540$a92.81904@twister.nyc.rr.com>
Peter Seibel wrote:
> I just put up a new chapter (chapter 18) of Practical Common Lisp.

You mean this?:

    http://www.gigamonkeys.com/book/practical-a-spam-filter.html

:)

kenny

-- 
Cells? Cello? Celtik?: http://www.common-lisp.net/project/cells/
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
From: Peter Seibel
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <m3isc07v5v.fsf@javamonkey.com>
Kenny Tilton <·······@nyc.rr.com> writes:

> Peter Seibel wrote:
>> I just put up a new chapter (chapter 18) of Practical Common Lisp.
>
> You mean this?:
>
>     http://www.gigamonkeys.com/book/practical-a-spam-filter.html

Ah, yes. Thank you.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Michael Hudson
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <m3pt68gw7p.fsf@pc150.maths.bris.ac.uk>
Kenny Tilton <·······@nyc.rr.com> writes:

> Peter Seibel wrote:
> > I just put up a new chapter (chapter 18) of Practical Common Lisp.
> 
> You mean this?:
> 
>     http://www.gigamonkeys.com/book/practical-a-spam-filter.html
> 
> :)

The first mention of Gary Robinson seems to suggest that you already
know you're using his algorithms:

   Robinson suggested a function that is based on the Bayesian notion
   of incorporating observed data into prior knowledge or assumptions.

Did some edit reorder things in this respect?

Also, you could use a reference to his work, maybe

   http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html

?  Finally, I notice that you skip over the icky issues of how to
parse real world email :-)

Otherwise, looks pretty good!

Cheers,
mwh

-- 
31. Simplicity does not precede complexity, but follows it.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
From: Peter Seibel
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <m3acxc6xrl.fsf@javamonkey.com>
Michael Hudson <···@python.net> writes:

> Kenny Tilton <·······@nyc.rr.com> writes:
>
>> Peter Seibel wrote:
>> > I just put up a new chapter (chapter 18) of Practical Common Lisp.
>> 
>> You mean this?:
>> 
>>     http://www.gigamonkeys.com/book/practical-a-spam-filter.html
>> 
>> :)
>
> The first mention of Gary Robinson seems to suggest that you already
> know you're using his algorithms:
>
>    Robinson suggested a function that is based on the Bayesian notion
>    of incorporating observed data into prior knowledge or assumptions.
>
> Did some edit reorder things in this respect?

No doubt. Thanks for pointing that out.

> Also, you could use a reference to his work, maybe
>
>    http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html

Yup.

> ?  Finally, I'll notice that you skip over the icky issues of how to
> parse real world email :-)

Indeed I did. ;-) Though to be fair you don't actually have to parse
the email any more than I did to get (on my personal ham/spam corpus)
~98% accuracy. And I did mention at the end that that was an area ripe
for improvement.

> Otherwise, looks pretty good!

Thanks.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: neo88
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <6a73bb68.0408030453.1fec2d6f@posting.google.com>
Peter Seibel <·····@javamonkey.com> wrote in message news:<··············@javamonkey.com>...
> I just put up a new chapter (chapter 18) of Practical Common Lisp.
> This is the first of what will be several "practical" chapters that
> will finish the book (or me ;-)) wherein I write some piece of actual
> code and talk about it. I realize that I may have used some language
> features that I haven't discussed elsewhere in the book--I'm not going
> to worry too much about it right now and figure out where to work that
> stuff in when I revise. So caveat newbie. (But feel free to send me
> comments if there are places you get particularly lost or confused.)
> As always I'm interested in any and all feedback from newbies and
> wizards alike.
> 
> -Peter

I have been waiting for that chapter to come out! I always wanted to
see what it took to write a spam-filter, since I am interested in
eventually writing one of my own. Excellent chapter and book.

-- 
May the Source be with you.
neo88 (Philip Haddad)
From: Björn Lindberg
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <hcspt68mg51.fsf@my.nada.kth.se>
Peter Seibel <·····@javamonkey.com> writes:

> I just put up a new chapter (chapter 18) of Practical Common Lisp.
> This is the first of what will be several "practical" chapters that
> will finish the book (or me ;-)) wherein I write some piece of actual
> code and talk about it. I realize that I may have used some language
> features that I haven't discussed elsewhere in the book--I'm not going
> to worry too much about it right now and figure out where to work that
> stuff in when I revise. So caveat newbie. (But feel free to send me
> comments if there are places you get particularly lost or confused.)
> As always I'm interested in any and all feedback from newbies and
> wizards alike.

I have a question regarding your use of DECODE-FLOAT under the
"Avoiding Underflow" heading; this way of handling underflow seems
unnecessarily complicated to me -- for instance, several helper
functions are needed. Why don't you just take the logarithm of all
probabilities and sum them up? Especially since the logarithm is what
you want in the end:

  (log (reduce #'* probs))

becomes

  (reduce #'+ probs :key #'log)
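To make the underflow concrete, here is a minimal sketch. The name SUM-OF-LOGS and the 1000 probabilities of 1d-5 are made up for illustration and are not taken from the chapter.

```lisp
;; Hypothetical helper: the log of the product of PROBS, computed
;; without ever forming the product itself.
(defun sum-of-logs (probs)
  "Return the sum of the natural logs of the numbers in PROBS."
  (reduce #'+ probs :key #'log))

;; Illustrative data: 1000 probabilities of 1d-5 each. Their true
;; product is 1d-5000, far below LEAST-POSITIVE-DOUBLE-FLOAT, so
;; (reduce #'* probs) would underflow to zero or signal
;; FLOATING-POINT-UNDERFLOW, depending on the implementation.
(let ((probs (make-list 1000 :initial-element 1d-5)))
  (sum-of-logs probs))
```

The sum stays an ordinary double-float (roughly -11512.9 here), because each term is only about -11.5 and the terms are added rather than multiplied.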

If you need to guard against some of the probabilities being zero, you
can perhaps use something similar to

  (block foo
    (reduce #'+ probs
            :key #'(lambda (x)
                     (and (zerop x) 
                          (return-from foo 'division-by-zero))
                     (log x))))
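The same guard can be packaged as a function, since DEFUN already provides a block named after the function. This is only a sketch: the name SUM-OF-LOGS-OR-NIL and the choice of NIL as the "no answer" value are assumptions for illustration.

```lisp
;; Hypothetical variant of the guarded REDUCE above. RETURN-FROM
;; exits through the implicit block that DEFUN establishes, so no
;; explicit BLOCK form is needed.
(defun sum-of-logs-or-nil (probs)
  "Return the sum of the logs of PROBS, or NIL if any element is zero."
  (reduce #'+ probs
          :key (lambda (x)
                 (when (zerop x)
                   (return-from sum-of-logs-or-nil nil))
                 (log x))))
```

A caller can then test the result with an ordinary IF or WHEN instead of comparing against a symbol.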

Quite likely I overlooked something here though, and I am mostly
curious why you decided on using such an esoteric operator as
DECODE-FLOAT when it seems to me that something simpler would do. On
the other hand, I like the way your text often casually
identifies a need for something, and then shows how the needed
functionality is already a part of standard Lisp. It gives the
impression that CL is a very rich language.


Björn
From: Peter Seibel
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <m365806x9y.fsf@javamonkey.com>
·······@nada.kth.se (Björn Lindberg) writes:

> Peter Seibel <·····@javamonkey.com> writes:
>
>> I just put up a new chapter (chapter 18) of Practical Common Lisp.
>> This is the first of what will be several "practical" chapters that
>> will finish the book (or me ;-)) wherein I write some piece of actual
>> code and talk about it. I realize that I may have used some language
>> features that I haven't discussed elsewhere in the book--I'm not going
>> to worry too much about it right now and figure out where to work that
>> stuff in when I revise. So caveat newbie. (But feel free to send me
>> comments if there are places you get particularly lost or confused.)
>> As always I'm interested in any and all feedback from newbies and
>> wizards alike.
>
> I have a question regarding your use of DECODE-FLOAT under the
> "Avoiding Underflow" heading; this way of handling underflow seems
> unnecessarily complicated to me -- for instance, several helper
> functions are needed. Why don't you just take the logarithm of all
> probabilities and sum them up? Especially since the logarithm is what
> you want in the end:
>
>   (log (reduce #'* probs))
>
> becomes
>
>   (reduce #'+ probs :key #'log)

I knew the mathematical equivalence, but the light didn't go on that I
could do that with REDUCE (I forgot about the :key argument). Also, the
way I did it was how they did it in Spambayes. Then I assumed--without
benchmarking, and you know what that gets you--that taking one log
would be faster. But I just actually timed the two, and the
summing-logs approach is much faster (roughly six times in the run
below). So that'll be a good way to cut a good chunk out of a too-long
chapter.

  SPAM> (let ((nums (loop repeat 100000
                          collect (let ((x (random 1.0d0)))
                                    (if (zerop x)
                                        least-positive-double-float
                                        x)))))
          (time (log-of-product nums))
          (time (reduce #'+ nums :key #'log)))
  ; cpu time (non-gc) 1,450 msec user, 0 msec system
  ; cpu time (gc)     220 msec user, 0 msec system
  ; cpu time (total)  1,670 msec user, 0 msec system
  ; real time  1,672 msec
  ; space allocation:
  ;  2 cons cells, 80,403,192 other bytes, 0 static bytes
  ; cpu time (non-gc) 280 msec user, 0 msec system
  ; cpu time (gc)     0 msec user, 0 msec system
  ; cpu time (total)  280 msec user, 0 msec system
  ; real time  280 msec
  ; space allocation:
  ;  5 cons cells, 4,800,016 other bytes, 0 static bytes
  -100642.8796056704d0
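
LOG-OF-PRODUCT is defined in the chapter and is not shown in this thread. For readers without the chapter at hand, the DECODE-FLOAT technique under discussion looks roughly like the sketch below. It is a hypothetical reconstruction, not the chapter's code, and it assumes every probability is a positive number.

```lisp
;; Hypothetical sketch of the DECODE-FLOAT approach (not the
;; chapter's LOG-OF-PRODUCT). Assumes every element of PROBS is
;; strictly positive.
(defun log-of-product-sketch (probs)
  "Return the log of the product of PROBS without underflowing."
  (let ((significand 1d0)   ; running significand, kept in [0.5, 1]
        (exponent 0))       ; running sum of binary exponents
    (dolist (p probs)
      ;; Renormalize after every multiplication so the running
      ;; product never gets small enough to underflow.
      (multiple-value-bind (s e) (decode-float (* significand p))
        (setf significand s)
        (incf exponent e)))
    ;; product = significand * 2^exponent
    (+ (log significand) (* exponent (log 2d0)))))
```

It calls LOG only once for the whole list, but calls DECODE-FLOAT once per element, which is the cost the timings above are measuring.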

> Quite likely I overlooked something here though, and I am mostly
> curious why you decided on using such an esoteric operator as
> DECODE-FLOAT when it seems to me that something simpler would do.

Because the simpler thing is only simpler if you think of it. ;-)

> On the other hand, I like the way your text many times casually
> identifies a need for something, and then shows how the needed
> functionality is already a part of standard Lisp. It gives the
> impression that CL is a very rich language.

It is that.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Björn Lindberg
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <hcs3c34m9sw.fsf@my.nada.kth.se>
Peter Seibel <·····@javamonkey.com> writes:

> ·······@nada.kth.se (Björn Lindberg) writes:
> 
> > Peter Seibel <·····@javamonkey.com> writes:
> >
> >> I just put up a new chapter (chapter 18) of Practical Common Lisp.
> >> This is the first of what will be several "practical" chapters that
> >> will finish the book (or me ;-)) wherein I write some piece of actual
> >> code and talk about it. I realize that I may have used some language
> >> features that I haven't discussed elsewhere in the book--I'm not going
> >> to worry too much about it right now and figure out where to work that
> >> stuff in when I revise. So caveat newbie. (But feel free to send me
> >> comments if there are places you get particularly lost or confused.)
> >> As always I'm interested in any and all feedback from newbies and
> >> wizards alike.
> >
> > I have a question regarding your use of DECODE-FLOAT under the
> > "Avoiding Underflow" heading; this way of handling underflow seems
> > unnecessarily complicated to me -- for instance, several helper
> > functions are needed. Why don't you just take the logarithm of all
> > probabilities and sum them up? Especially since the logarithm is what
> > you want in the end:
> >
> >   (log (reduce #'* probs))
> >
> > becomes
> >
> >   (reduce #'+ probs :key #'log)
> 
> I knew the mathematical equivalence but the light didn't go on that I
> could do that with REDUCE (I forgot about the :key argument.) Also the
> way I did it was how they did it in Spambayes. Then I assumed--without
> benchmarking, and you know what that gets you--that taking one log
> would be faster. But I just actually timed the two and the summing
> logs approach is much faster. So that'll be a good way to cut a good
> chunk out of a too long chapter.

Yes, I did some quick measurements, and at least on CMUCL,
DECODE-FLOAT seems to take longer than LOG to execute.

>   SPAM> (let ((nums (loop repeat 100000 collect (let ((x (random 1.0d0))) (if (zerop x) least-positive-double-float x)))))
>           (time (log-of-product nums))
>           (time (reduce #'+ nums :key #'log)))
>   ; cpu time (non-gc) 1,450 msec user, 0 msec system
>   ; cpu time (gc)     220 msec user, 0 msec system
>   ; cpu time (total)  1,670 msec user, 0 msec system
>   ; real time  1,672 msec
>   ; space allocation:
>   ;  2 cons cells, 80,403,192 other bytes, 0 static bytes
>   ; cpu time (non-gc) 280 msec user, 0 msec system
>   ; cpu time (gc)     0 msec user, 0 msec system
>   ; cpu time (total)  280 msec user, 0 msec system
>   ; real time  280 msec
>   ; space allocation:
>   ;  5 cons cells, 4,800,016 other bytes, 0 static bytes
>   -100642.8796056704d0
> 
> > Quite likely I overlooked something here though, and I am mostly
> > curious why you decided on using such an esoteric operator as
> > DECODE-FLOAT when it seems to me that something simpler would do.
> 
> Because the simpler thing is only simpler if you think of it. ;-)

:-)

> > On the other hand, I like the way your text many times casually
> > identifies a need for something, and then shows how the needed
> > functionality is already a part of standard Lisp. It gives the
> > impression that CL is a very rich language.
> 
> It is that.

It is very nice how your text captures that. I have to say, I am
really enjoying your book so far. I think it will be the perfect kind
of book that you can show to non-lisp friends or acquaintances to get
them interested in Lisp.


Björn
From: Raymond Toy
Subject: Re: Practical Common Lisp gets practical
Date: 
Message-ID: <sxd6580rq9z.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Björn" == Björn Lindberg <·······@nada.kth.se> writes:

    Björn> Yes, I did some quick measurements, and at least on CMUCL,
    Björn> DECODE-FLOAT seems to take longer than LOG to execute.

Assuming you ran this on an x86, I'm not really surprised.  x86 has a
pretty fast log function in hardware.  DECODE-FLOAT has no such
hardware assist, and may not be as well optimized.
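
A comparison like this is easy to repeat on another implementation or machine. The helper below is a minimal sketch: TIME-CALLS is a hypothetical name, not something from the thread, and the numbers it returns depend on the implementation, compiler settings, and hardware.

```lisp
;; Hypothetical timing helper: calls FN on every element of VALUES
;; and returns the elapsed real time in seconds (as a rational).
(defun time-calls (fn values)
  "Return the elapsed real time, in seconds, of calling FN on each of VALUES."
  (let ((start (get-internal-real-time)))
    (dolist (v values)
      (funcall fn v))
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

;; Illustrative usage: compare LOG and DECODE-FLOAT on the same data.
(let ((nums (loop repeat 100000 collect (+ 0.5d0 (random 0.5d0)))))
  (list :log (time-calls #'log nums)
        :decode-float (time-calls #'decode-float nums)))
```

The resolution is limited by INTERNAL-TIME-UNITS-PER-SECOND, so short runs may report zero; a longer list or repeated runs give steadier figures.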

Ray