From: sv0f
Subject: New Paul Graham Article
Date: 
Message-ID: <none-1608021330010001@129.59.212.53>
On a statistical approach to filtering spam, with Lisp code,
here:

http://www.paulgraham.com/spam.html

Discussion of this article is currently happening on Slashdot,
for the interested.

From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajk8mj$1c3qah$3@ID-125932.news.dfncis.de>
In an attempt to throw the authorities off his trail, ····@vanderbilt.edu (sv0f) transmitted:
> On a statistical approach to filtering spam, with Lisp code,
> here:
>
> http://www.paulgraham.com/spam.html
>
> Discussion of this article is currently happening on Slashdot,
> for the interested.

And if you want to work with an existing package that has been mature
for several years now, you might look at the URL below for "Ifile."  I
helped tune it to become pretty fast.

And I have to disagree somewhat with Graham's article; Naive Bayesian
filtering _doesn't_ provide _quite_ as good results as he implies.
Having both "sex" and "sexy" in a message does _not_ guarantee at P >
0.99 that messages will get tossed into the "spam" category.

My statistics for those words in my corpus are thus:

sexy 4525 424:28 426:2 449:1 456:1 

sex 62535 160:16 169:6 171:5 173:2 184:1 190:1 194:2 211:1 215:4 218:1
    221:15 224:3 226:1 234:2 237:11 238:1 239:2 241:1 244:1 247:1 249:11
    251:1 264:2 273:2 278:2 285:7 289:2 295:1 306:2 321:5 322:2 323:4
    324:9 327:14 332:2 334:2 343:15 346:2 347:1 350:5 352:1 354:2 362:4
    366:6 368:10 369:3 370:1 397:20 411:2 413:3 414:6 415:15 416:16 418:3
    421:17 423:1 424:338 425:11 426:23 432:2 433:2 439:3 442:1 459:2 465:3

The "424:28" indicates that the word "sexy" occurred 28 times in
folder #424, which happens to be the "Spam/Phonesex" folder.  #426 is
Spam/Snakeoil, #449 is X/Advocacy, with an instance of a quote about
people being "mesmerized by sexy glitz which distracts them from the
work at hand."  #456 pointed to a .signature with the word "sexy."

Frankly, the word "sexy" is a very _useful_ one.  (And looking at the
stats here has caused me to modify a couple email messages in my
archives, which will strengthen the result :-).)

Unfortunately, it's not only found in the "Phonesex" folder.
Instances are found here and there everywhere.  And there are other
words that are very common both in "evil spam" and in everyday
conversation.  Integrating the whole set of statistics together
requires adding up statistics for _all_ the words found in a message,
not just the words "sex" and "sexy."
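Adding up statistics for all the words across many folders is just the multinomial naive Bayes sum. A minimal Python sketch of the idea, with invented folder names and counts standing in for the ifile-style stats above:

```python
import math
from collections import Counter

# Hypothetical per-folder word counts (folder -> word -> count),
# loosely in the spirit of the ifile statistics quoted above.
folders = {
    "Spam/Phonesex": Counter({"sexy": 28, "sex": 338, "live": 40}),
    "Apps/Ifile":    Counter({"ifile": 120, "corpus": 30, "sexy": 1}),
    "Personal":      Counter({"dinner": 15, "sexy": 1, "sex": 2}),
}

def classify(words, folders, alpha=1.0):
    """Score a message against every folder with multinomial naive
    Bayes, summing log-probabilities over *all* the message's words,
    not just "sex" and "sexy"."""
    vocab = set().union(*folders.values())
    best, best_score = None, float("-inf")
    for name, counts in folders.items():
        total = sum(counts.values())
        score = 0.0
        for w in words:
            # Laplace smoothing so an unseen word doesn't zero a folder
            score += math.log((counts[w] + alpha) /
                              (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = name, score
    return best

# A message containing "sexy" still lands in Apps/Ifile, because the
# rest of its vocabulary dominates the sum.
print(classify(["ifile", "corpus", "sexy"], folders))
```

The point the statistics above make survives in miniature: a single "spammy" word is outvoted by everything else in the message.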

My finding is that it is _nowhere_ near sufficient to have two
populations, "spam" versus "not spam."  

If you muddle together the Nigerian Pyramid schemes with the "Penis
enhancement" ads along with the offers of new credit cards as well as
the latest sites where you can talk to "hot, horny girls LIVE!", the
statistics don't work out nearly so well.

It's hard to tell, on the face of it, why Nigerian scams _should_ be
considered textually similar to phone sex ads, and in practice, the
result of throwing them all together is that the statistics get
muddied.

I have my spam split into categories so that the filtering is _even
more finely discriminating_:

  Credit
  Foreign
  Gambling
  Investigators
  Newsletters
  Phonesex
  Pyramid
  Snakeoil
  Viruses

There are a few things left to improve about Ifile, and I'd like to
redo it in some language fundamentally less painful to work with than
C.  The project I periodically consider is to redo the filtering
software in Lisp.  Unfortunately, I wind up running into _tremendous_
bottlenecks each time I do so.  Some combination of my skills and the
tools at hand prove not quite adequate.  Maybe next time...
-- 
(concatenate 'string "chris" ·@cbbrowne.com")
http://cbbrowne.com/info/mail.html#ifile
Out of my mind. Back in five minutes. 
From: Herb Martin
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <5Dq79.300114$q53.9596537@twister.austin.rr.com>
> And if you want to work with an existing package that has been mature
> for several years now, you might look at the URL below for "Ifile."  I
> helped tune it to become pretty fast.

IFile's documentation and download page are
included at the end of Graham's article.

    http://www.ai.mit.edu/~jrennie/ifile/

> And I have to disagree somewhat with Graham's article; Naive Bayesian
> filtering _doesn't_ provide _quite_ as good results as he implies.
> Having both "sex" and "sexy" in a message does _not_ guarantee at P >
> 0.99 that messages will get tossed into the "spam" category.

I am not certain of your 'naive' filtering usage with
the example of only "included" words.  IFile's doc
page describes its algorithm as "naive Bayesian
filtering" as well.

Graham is using the words included in "good mail"
to counter this, as IFile seems to do.

Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajlmod$1bspp4$1@ID-125932.news.dfncis.de>
A long time ago, in a galaxy far, far away, "Herb Martin" <·····@LearnQuick.Com> wrote:
>> And if you want to work with an existing package that has been mature
>> for several years now, you might look at the URL below for "Ifile."  I
>> helped tune it to become pretty fast.
>
> IFile's documentation and download page is
> included at the end of Graham's article.
>
>     http://www.ai.mit.edu/~jrennie/ifile/
>
>> And I have to disagree somewhat with Graham's article; Naive Bayesian
>> filtering _doesn't_ provide _quite_ as good results as he implies.
>> Having both "sex" and "sexy" in a message does _not_ guarantee at P >
>> 0.99 that messages will get tossed into the "spam" category.
>
> I am not certain of your 'naive' filtering usage with the example of
> only "included" words.  IFile's doc page describes its algorithm as
> "naive Bayesian filtering" as well.
>
> Graham is using the words included in "good mail" to counter this,
> as IFile seems to do.

The point is that _all_ the words in the message are considered.

For instance, if I throw my message, which conspicuously contains both
the word "sex" and the word "sexy," purportedly surefire indications
of spam, at ifile, the fact that it mentions Ifile several times means
that it heads to the "Apps/Ifile" folder, where my archive of the
last five years of Ifile discussions resides.

To consider _only_ the words "sex" and "sexy" is a severe
oversimplification.
-- 
(reverse (concatenate 'string ··········@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lisp.html
Objects & Markets
"Object-oriented programming is about the modular separation of what
from how. Market-oriented, or agoric, programming additionally allows
the modular separation of why."
-- Mark Miller
From: Herb Martin
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <oBt79.300442$q53.9631966@twister.austin.rr.com>
> The point is that _all_ the words in the message are considered.
>
> For instance, if I throw my message, which conspicuously contains both
> the word "sex" and the word "sexy," purportedly surefire indications
> of spam, at ifile, the fact that it mentions Ifile several times means
> that it heads to the "Apps/Ifile" folder where resides my archives of
> the last five years of Ifile discussions.
>
> To consider _only_ the words "sex" and "sexy" is a severe
> oversimplification.

Well, that makes more sense.

What about Graham's method leads one to believe that IFile
would not be considered?  Several of the examples he gives
(using 'Lisp' for himself instead of 'Ifile' as you would) are
isomorphic to this issue -- he is including words from the "good
mail" as well.


--
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

From: Michael Sullivan
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <1fh626u.1f3i7v7stu6xaN%michael@bcect.com>
Christopher Browne <········@acm.org> wrote:

> A long time ago, in a galaxy far, far away, "Herb Martin"
> <·····@LearnQuick.Com> wrote:
> >> And if you want to work with an existing package that has been mature
> >> for several years now, you might look at the URL below for "Ifile."  I
> >> helped tune it to become pretty fast.
> >
> > IFile's documentation and download page is
> > included at the end of Graham's article.
> >
> >     http://www.ai.mit.edu/~jrennie/ifile/
> >
> >> And I have to disagree somewhat with Graham's article; Naive Bayesian
> >> filtering _doesn't_ provide _quite_ as good results as he implies.
> >> Having both "sex" and "sexy" in a message does _not_ guarantee at P >
> >> 0.99 that messages will get tossed into the "spam" category.
> >
> > I am not certain of your 'naive' filtering usage with the example of
> > only "included" words.  IFile's doc page describes it's algorythm as
> > "naive bayesian filtering" as well.
> >
> > Graham is using the words included in "good mail" to counter this,
> > as IFile seems to do.

> The point is that _all_ the words in the message are considered.

Graham's algorithm *does* consider all the words, sort of.  It does a
hash lookup on every word, then takes the fifteen words in that
mail that are the strongest signals (whether of "good" mail or "bad")
and does the Bayes calculation on those.  It seems to me that it
wouldn't be all that computationally intensive to extend the Bayes
calculation to more words.
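That selection-plus-combination step can be sketched in a few lines of Python. The combining rule prod(p)/(prod(p)+prod(1-p)) is the one Graham's article uses; the token probabilities below are made up for illustration:

```python
from functools import reduce

def interesting(probs, n=15):
    """Keep the n token probabilities farthest from the neutral 0.5 --
    the 'strongest signals' described above."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]

def combine(probs):
    """Graham's combined probability: prod(p) / (prod(p) + prod(1-p))."""
    prod = reduce(lambda a, b: a * b, probs, 1.0)
    comp = reduce(lambda a, b: a * b, (1 - p for p in probs), 1.0)
    return prod / (prod + comp)

# Hypothetical per-token spam probabilities for one message:
token_probs = [0.99, 0.99, 0.95, 0.9, 0.2, 0.5, 0.5]
print(combine(interesting(token_probs)))   # well above 0.99 -> spam
```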

I just did a very quick implementation of just the math, and it looks
like speed is not the problem; arbitrary precision is.  With thousands
of words, some of the intermediate products run off the edge of the
IEEE floating point range, leading to a (/ x 0) situation.  With a
good arbitrary-precision math library this is not an issue, but it
also appears that using the most significant 100-500 words produces a
decisive result so often that it ought to be plenty.
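One standard way around that precision wall, for what it's worth, is to do the same combination in log space rather than reaching for an arbitrary-precision library. A sketch (not from Graham's article):

```python
import math

def combine_log(probs, eps=1e-6):
    """Same combined probability as prod(p)/(prod(p)+prod(1-p)), but
    computed in log space so thousands of terms can't underflow the
    products to a 0/0.  Probabilities are clamped away from 0 and 1."""
    log_p = log_q = 0.0
    for p in probs:
        p = min(max(p, eps), 1 - eps)
        log_p += math.log(p)
        log_q += math.log(1 - p)
    # prod(p)/(prod(p)+prod(1-p)) == 1/(1 + exp(log_q - log_p))
    diff = log_q - log_p
    if diff > 700:          # exp() would overflow; the answer is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

# 5000 mildly spammy tokens: the naive products underflow long before
# this, but the log-space sum is perfectly well behaved.
print(combine_log([0.6] * 5000))
```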

I fed my bayes calculation pseudo random numbers and found that it was
generating probabilities over 4 sigma one way or another more than 1/2
the time using 100 numbers.  At 200 numbers, something like 80% were 5+
sigma, and a 100 run test did not produce a single probability between 5
and 95%.

So I'm guessing that using the most significant 200 numbers is unlikely
to produce results any different from doing the Bayes calculation on
every last word.

The one scenario where I see trouble is a real message which for some
legitimate reason includes a forward of a spam example.  If there's
enough stuff added to the real message, his over-weighting of "good"
indicators will probably tip the scale.

But if it's a fairly short forward message, followed by an actual spam
(especially with full headers), it would almost certainly be tagged as
"spam", even though this might be somebody trading information trying
to track down a spammer.  Or perhaps someone with too much time on their
hands read a spam and found it funny or otherwise interesting and
decided to pass it on to somebody.

I'm not sure how you can filter spam well without risking a false
positive in at least this case, but I suspect that this naive Bayesian
algorithm won't do the trick, unless there's a fair bit of "good"
content.  

> For instance, if I throw my message, which conspicuously contains both
> the word "sex" and the word "sexy," purportedly surefire indications
> of spam, at ifile, the fact that it mentions Ifile several times means
> that it heads to the "Apps/Ifile" folder where resides my archives of
> the last five years of Ifile discussions.

> To consider _only_ the words "sex" and "sexy" is a severe
> oversimplification.

Except that he doesn't actually do this.


Michael

-- 
Michael Sullivan
Business Card Express of CT             Thermographers to the Trade
Cheshire, CT                                      ·······@bcect.com
From: Rahul Jain
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <87lm72veb9.fsf@photino.localnet>
·······@bcect.com (Michael Sullivan) writes:

> But if it's a fairly short forward message, followed by an actual spam
> (especially with full headers), it would almost certainly be tagged as
> "spam", even though, this might be somebody trading information trying
> to track down a spammer.  Or perhaps someone with too much time on their
> hands read a spam and found it funny or otherwise interesting and
> decided to pass it on to somebody.
> 
> I'm not sure how you can filter spam well without risking a false
> positive in at least this case, but I suspect that this naive Bayesian
> algorithm won't do the trick, unless there's a fair bit of "good"
> content.  

You can have the filter disabled for people you know won't send you
worthless messages.

-- 
-> -/                        - Rahul Jain -                        \- <-
-> -\  http://linux.rice.edu/~rahul -=-  ············@techie.com   /- <-
-> -X "Structure is nothing if it is all you got. Skeletons spook  X- <-
-> -/  people if [they] try to walk around on their own. I really  \- <-
-> -\  wonder why XML does not." -- Erik Naggum, comp.lang.lisp    /- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   (c)1996-2002, All rights reserved. Disclaimer available upon request.
From: Michael Hudson
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <lkbs7xd6s7.fsf@pc150.maths.bris.ac.uk>
Rahul Jain <·····@rice.edu> writes:

> You can have the filter disabled for people you know won't send you
> worthless messages.

Until the next klez.

Cheers,
M.

-- 
  My hat is lined with tinfoil for protection in the unlikely event
  that the droid gets his PowerPoint presentation working.
                               -- Alan W. Frame, alt.sysadmin.recovery
From: Håkon Alstadheim
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <m0ofbv5l4c.fsf@alstadhome.dyndns.org>
Michael Hudson <···@python.net> writes:

> Rahul Jain <·····@rice.edu> writes:
>
>> You can have the filter disabled for people you know won't send you
>> worthless messages.
>
> Until the next klez.
>

All this should in _theory_ be taken care of automatically by Graham's
filter: Known senders will be added to the "good" set of words (with
their usual posting hosts etc.) with P=1, and after a short while a
virus will be added to the "bad" set, with P=1. At the same time the
known senders will have their goodness reduced.
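For concreteness, here is a Python sketch of a Graham-style per-token probability. The doubling of good counts and the 0.01/0.99 clamps follow the article's description (so in practice no token quite reaches P=1, as assumed above); the function name and the numbers are invented:

```python
def token_prob(good, bad, ngood, nbad):
    """Per-token spam probability roughly as in Graham's article:
    occurrences in good mail are double-counted to bias against false
    positives, and the result is clamped to [0.01, 0.99] so that no
    single token is ever treated as an absolute verdict."""
    g = min(1.0, 2 * good / ngood)
    b = min(1.0, bad / nbad)
    return max(0.01, min(0.99, b / (g + b)))

# A token seen only in spam caps at 0.99, never 1.0 -- which is why a
# virus outbreak can later pull a "known good" sender's tokens back
# down as bad counts accumulate.
print(token_prob(good=0, bad=50, ngood=1000, nbad=1000))   # 0.99
print(token_prob(good=50, bad=50, ngood=1000, nbad=1000))
```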
-- 
Håkon Alstadheim, hjemmepappa.
From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajtjni$193rfm$2@ID-125932.news.dfncis.de>
In the last exciting episode, Rahul Jain <·····@rice.edu> wrote::
> You can have the filter disabled for people you know won't send you
> worthless messages.

That doesn't work.  I get a lot of messages that claim to be from
·········@acm.org", which is the identity of someone I would
usually _presume_ I could trust fairly well.
-- 
(concatenate 'string "cbbrowne" ·@acm.org")
http://www3.sympatico.ca/cbbrowne/sap.html
Rules of the  Evil Overlord #215. "If I ever MUST  put a digital timer
on my  doomsday device,  I will buy  one free from  quantum mechanical
anomalies. So many brands on the market keep perfectly good time while
you're  looking at  them,  but whenever  you  turn away  for a  couple
minutes then turn back, you  find that the countdown has progressed by
only a few seconds." <http://www.eviloverlord.com/>
From: Rahul Jain
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <87ptwd4dfg.fsf@photino.localnet>
Christopher Browne <········@acm.org> writes:

> In the last exciting episode, Rahul Jain <·····@rice.edu> wrote::
> > You can have the filter disabled for people you know won't send you
> > worthless messages.
> 
> That doesn't work.  I get a lot of messages that claim to be from
> ·········@acm.org", which is the identity of someone I usually
> _presume_ that I'd be prepared to trust fairly well.

I forgot to mention that you trace the Received headers and only
accept what trusted servers say (and assume that the servers on the
path between you and a friend are trusted).
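That header-walking idea might look something like this in Python; the hostnames and the crude extraction of the "by" host are purely illustrative:

```python
# Hosts whose Received stamps we are willing to believe (invented names).
TRUSTED = {"mail.mydomain.example", "smtp.friendsisp.example"}

def sender_path_trusted(received_headers):
    """Received headers are listed newest-first; trust the claimed
    sender only if every relay that stamped the message is a host
    we trust."""
    for header in received_headers:
        parts = header.split()
        if "by" in parts:
            # crude: take the token right after 'by' as the relay host
            host = parts[parts.index("by") + 1]
            if host not in TRUSTED:
                return False
    return True

print(sender_path_trusted([
    "from smtp.friendsisp.example by mail.mydomain.example",
    "from friends-pc by smtp.friendsisp.example",
]))
```

A forged From: line doesn't help a spammer here, because the forgery can't retroactively insert itself into a trusted relay chain.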

-- 
-> -/                        - Rahul Jain -                        \- <-
-> -\  http://linux.rice.edu/~rahul -=-  ············@techie.com   /- <-
-> -X "Structure is nothing if it is all you got. Skeletons spook  X- <-
-> -/  people if [they] try to walk around on their own. I really  \- <-
-> -\  wonder why XML does not." -- Erik Naggum, comp.lang.lisp    /- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   (c)1996-2002, All rights reserved. Disclaimer available upon request.
From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ak2j42$1f74v0$2@ID-125932.news.dfncis.de>
A long time ago, in a galaxy far, far away, Rahul Jain <·····@rice.edu> wrote:
> ·······@bcect.com (Michael Sullivan) writes:
>
>> But if it's a fairly short forward message, followed by an actual spam
>> (especially with full headers), it would almost certainly be tagged as
>> "spam", even though, this might be somebody trading information trying
>> to track down a spammer.  Or perhaps someone with too much time on their
>> hands read a spam and found it funny or otherwise interesting and
>> decided to pass it on to somebody.
>> 
>> I'm not sure how you can filter spam well without risking a false
>> positive in at least this case, but I suspect that this naive Bayesian
>> algorithm won't do the trick, unless there's a fair bit of "good"
>> content.  
>
> You can have the filter disabled for people you know won't send you
> worthless messages.

That's one method.

Another is that the headers of a 'real' message will look rather more
like those appropriate for 'real' message folders.

This sort of thing is the pathological sort of situation for such
discrimination systems, and if it _does_ get confused at this, it
shouldn't come as any great surprise, because _any_ kind of censorship
system will have problems with this sort of thing.

For instance, if "pictures of naked people" are considered to be
obscenity, and thereby "illegal" (in some manner), what happens when
you have:

 a) An anatomy guide, which _intentionally_ has pictures of naked
    people?

 b) An issue of the "Journal of Surgery" which not only has pictures
    of "naked people," but, in its special issue on "Treating
    Victims of Sexual Crimes," has Really, Really Nasty Stuff?

 c) A special issue of "Abnormal Psychology Today" specifically on the
    effects of viewing pornography?

 d) An explicit documentary about the ill effects of pornography on
    the status of women?  (This film exists as the Canadian NFB's _Not
    a Love Story_, where viewings apparently are often plagued by
    visits by the police carrying notice of obscenity charges.)

A researcher or doctor, doing their work, occasionally needs this sort
of material.  

This sort of material is fairly likely, in other sorts of hands, to be
treated in a prurient manner.

If you were to want to send me an archive of "spam," it would be
advisable for you to bundle it carefully, putting it into a
compressed, possibly even encrypted, archive, so that it _doesn't_
look like spam whilst in transit.

Ditto if you and I were trying to share a copy of a "virulent"
computer virus.  Don't send it as is, so that I might conceivably be
infected by it: when the CDC transfers potentially dangerous
substances from one lab to another, they need to bundle the substances
up _carefully_.

I've got a fairly nice "corpus" of spam; if someone wanted a copy, you
can be _sure_ that I'd put warning labels on it mentioning (amongst
other things):

 -> This archive contains illegal business proposals;

 -> This archive contains sexually oriented material, some of which is
    _highly_ offensive, some of which is likely to be considered illegal
    obscenity in your jurisdiction;

 -> This may contain attempts at computer security exploits, which
    might be damaging to your computer system.

"If you do not expressly promise to be careful and discreet in what
 you do with what I'm sending you, I'm certainly not giving it to you.
 I don't want trouble to result from sending it to you."

If you think I'm not serious, think again.  I would think _hard_ about
the legal implications before I would _consider_ passing on a copy of
my "spam corpus," and the possibility of unexpected legal action most
_certainly_ would be in my mind.  That Russian fellow spent much of
last year in jail just because of an _accusation_ of breaking the
DMCA.  The makers of _Not A Love Story_ saw the chilling effect
firsthand: people got arrested for showing their movie.
-- 
(concatenate 'string "aa454" ·@freenet.carleton.ca")
http://www.ntlug.org/~cbbrowne/ifilter.html
"Over a hundred years ago, the German poet Heine
 warned the French not to underestimate the power of ideas:
 philosophical concepts nurtured in the stillness of a
 professor's study could destroy a civilization."
    --Isaiah Berlin in /The Power of Ideas/
From: Herb Martin
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <Wl899.91927$eK6.2900264@twister.austin.rr.com>
> This sort of thing is the pathological sort of situation for such
> discrimination systems, and if it _does_ get confused at this, it
> shouldn't come as any great surprise, because _any_ kind of censorship
> system will have problems with this sort of thing.
>
> For instance, if "pictures of naked people" are considered to be
> obscenity, and thereby "illegal" (in some manner), what happens when
> you have:
>
>  a) An anatomy guide, which _intentionally_ has pictures of naked
>     people?
>
>  b) An issue of the "Journal of Surgery" which not only has pictures
>     of "naked people," but, in its special issue on "Treating
>     Victims of Sexual Crimes," has Really, Really Nasty Stuff?
>
>  c) A special issue of "Abnormal Psychology Today" specifically on the
>     effects of viewing pornography?
>
>  d) An explicit documentary about the ill effects of pornography on
>     the status of women?  (This film exists as the Canadian NFB's _Not
>     a Love Story_, where viewings apparently are often plagued by
>     visits by the police carrying notice of obscenity charges.)

You know, Graham discusses this sort of thing
in his article and indicates that his filter handles it very well.

--
Herb Martin, PP-SEL
(...and aerobatic student)
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

From: c hore
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ca167c61.0208170058.6a226070@posting.google.com>
> On a statistical approach to filtering spam, with Lisp code,
> here:
> http://www.paulgraham.com/spam.html

Most of the spam I receive seems to be images, presumably
to bypass text-based filters.  I suppose you would have to
run character recognition first on an image before any
text filter, Bayesian or otherwise, could be applied?
From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajlj8s$1bhq9o$2@ID-125932.news.dfncis.de>
The world rejoiced as ·······@yahoo.com (c hore) wrote:
>> On a statistical approach to filtering spam, with Lisp code,
>> here:
>> http://www.paulgraham.com/spam.html
>
> Most of the spam I receive seems to be images, presumably to bypass
> text-based filters.  I suppose you would have to run character
> recognition first on an image before any text filter, Bayesian or
> otherwise, could be applied?

No, it wouldn't be necessary.

If you have a population of messages that consist just of images,
that's going to bias the vocabulary statistics since there will be
lots of words like "multipart" and "alternative" and "jpeg", and very
few of the "legitimate" words that people use when they send you real
mail.

Remember, if this is being used well, you're not merely classifying
between "spam" and "not spam;" you're classifying into a multiplicity
of _legitimate_ categories, such as:

-> Mail from family members
-> Mail from this friend 
-> Mail from that friend 
-> Mail from the other friend 
-> Email from "technical associates," by person
-> Email from mailing lists, arranged _by mailing list_
-> And so forth, for legitimate categories...

combined, preferably, with "spam" that gets classified so that you can
get finer discrimination:

-> Pyramid scams
-> Credit card offers
-> Breast/Penis enhancements, Viagra ads, weight loss, stop smoking
   plans, ...
-> Computer Viruses
and such.

The spam _isn't_ likely to have similar vocabulary to the email you
get from legitimate sources.

If something with totally new characteristics comes along, it may get
misfiled, at which point you move it to a more appropriate folder
(perhaps even a new folder), and it becomes part of the new corpus,
directing future similar spam to the right place.
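Browne's multi-folder scheme is just naive Bayes with more than two classes. A minimal sketch of it (in Python rather than the thread's Lisp; the folder names and training tokens below are invented for illustration):

```python
import math
from collections import Counter, defaultdict

class FolderClassifier:
    """Multi-category naive Bayes: file each message into its most
    probable folder, spam folders and legitimate folders alike."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> token -> count
        self.msg_counts = Counter()              # folder -> messages trained
        self.vocab = set()

    def train(self, folder, tokens):
        self.word_counts[folder].update(tokens)
        self.msg_counts[folder] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        total = sum(self.msg_counts.values())
        best, best_score = None, float("-inf")
        for folder, counts in self.word_counts.items():
            # log prior plus Laplace-smoothed log likelihoods
            score = math.log(self.msg_counts[folder] / total)
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best, best_score = folder, score
        return best
```

Each message lands in the folder with the highest posterior; moving a misfiled message to the right folder and retraining on it is what keeps the statistics current.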
-- 
(reverse (concatenate 'string ····················@" "454aa"))
http://www.ntlug.org/~cbbrowne/ifilter.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." <http://www.eviloverlord.com/>
From: AFS97209
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <6dfa3582.0208170120.330064a8@posting.google.com>
How effective is it in filtering out requests from African governments
to launder money?
From: Herb Martin
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <t2q79.300108$q53.9593581@twister.austin.rr.com>
> How effective is it in filtering out requsts from African govenments
> to launder money?

Apparently very effective -- Graham discusses that
specifically.

But the key is that it is TUNED to the particular user
by running a pre-processor through both "good mail"
and "spam mail" databases.

The article is worth a quick read.

--
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

"AFS97209" <········@yahoo.com> wrote in message
·································@posting.google.com...
> How effective is it in filtering out requsts from African govenments
> to launder money?
From: Herb Martin
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3wq79.300113$q53.9596214@twister.austin.rr.com>
The article is worth a quick read.

There is also a FAQ listed at the bottom.

> How effective is it in filtering out requsts from African govenments
> to launder money?

Apparently very effective -- Graham discusses that
specifically.

But the key is that it is TUNED to the particular user
by running a pre-processor through both "good mail"
and "spam mail" databases.

From the FAQ (someone in this thread asked about
graphics):

<quote from faq>
What if spammers sent their messages as images?

Such an email would include a lot of damning content,
actually. The headers, to start with, would be as bad
as ever. And remember that we scan all the html as
well as the text. Within the message body there would
probably be a link as well as the image, both containing
urls, which would probably score high. "Href" and "img"
themselves both have spam probabilities approaching
pornographic words.

<end quote from faq>
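The scoring behind that FAQ answer is spelled out in the article itself; a sketch of Graham's per-token probability and the naive Bayes combination, transcribed into Python (the original is Lisp, and this omits the article's rule of ignoring tokens seen fewer than five times):

```python
def token_probability(good, bad, ngood, nbad):
    """P(spam | token), a la Graham: occurrences in the good corpus
    are doubled to bias against false positives, and the result is
    clamped to [0.01, 0.99] so no single token is ever conclusive.
    Called only for tokens that actually occur in some corpus."""
    g = min(1.0, 2 * good / ngood)   # frequency in legitimate mail
    b = min(1.0, bad / nbad)         # frequency in spam
    return max(0.01, min(0.99, b / (g + b)))

def combined_probability(probs):
    """Combine the most interesting tokens' probabilities
    (the naive Bayes combination used in the article)."""
    p, q = 1.0, 1.0
    for x in probs:
        p *= x
        q *= 1.0 - x
    return p / (p + q)
```

Tokens like "href" or "img" that are frequent in the spam corpus and rare in legitimate mail come out near 0.99, which is why an image-only spam can still score high.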
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds


--
Herb Martin, PP-SEL
(...and aerobatic student)
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

"AFS97209" <········@yahoo.com> wrote in message
·································@posting.google.com...
> How effective is it in filtering out requsts from African govenments
> to launder money?
From: Christopher Browne
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajlj8r$1bhq9o$1@ID-125932.news.dfncis.de>
In the last exciting episode, ········@yahoo.com (AFS97209) wrote:
> How effective is it in filtering out requsts from African govenments
> to launder money?

Very much so.  Those messages head to Spam/Pyramid and nowhere else.

The messages draw on a vocabulary that is quite repetitive from one
message to the next, so they are an _ideal_ candidate for naive
Bayesian classification.
-- 
(reverse (concatenate 'string ·············@" "sirhc"))
http://cbbrowne.com/info/spiritual.html
"There are two ways of  constructing a software design:  One way is to
make it so  simple that there are  obviously no deficiencies,  and the
other   way is to make it   so complicated that   there are no obvious
deficiencies.  The first method is far more difficult."
-- C.A.R. Hoare
From: Erik Naggum
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3238545065548483@naggum.no>
* ····@vanderbilt.edu (sv0f)
| On a statistical approach to filtering spam, with Lisp code, here:

  Spam has to be dealt with at the transport level.  The ability of strangers
  to send you mail must be curtailed.  Several large sites offer a system to
  reject all mail from unknown correspondents, temporarily or permanently, and
  wait for the reader of the log to accept incoming mail from addresses that
  look familiar.  Another option is to accept delivery but return transport-
  like error messages if the user does not want the message.  Yet another
  option is to see if the smtp client is set up to accept mail for the domain
  that it tries to deliver mail from.  Yet another option is to temporarily
  reject all mail from unknown sources and utilize the fact that spammers have
  no resources to queue messages for later delivery.  And then you can always
  implement a scheme that returns a temporary rejection, but sends a mail to
  the originator independently asking for confirmation that he is human and by
  accepting the conditions that unsolicited commercial e-mail carries a fee
  that /will/ be collected.  Failure to accept the conditions will cause the
  temporary rejection never to be lifted, thus using up queue space in the
  offending server, which any sysadmin will notice and take care of even if
  they do not bother to fix their system configuration to avoid relaying spam.
  Should the conditions be accepted, the message is allowed through.

  If you allow the message to be delivered and waste CPU or brain time, the
  spammers have won a small victory.  That is just wrong.  Spammers must die.
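The temporary-rejection option (what later came to be known as greylisting) is mechanically simple; a minimal sketch, with an arbitrary retry delay and the usual (ip, sender, recipient) triplet as the key:

```python
import time

class Greylist:
    """Temporarily reject mail from unseen (ip, sender, recipient)
    triplets; legitimate MTAs queue the message and retry, while
    most spamware has no resources to queue for later delivery."""

    def __init__(self, delay=300):
        self.delay = delay       # seconds a triplet must wait
        self.first_seen = {}     # triplet -> time of first attempt

    def check(self, ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        triplet = (ip, sender, recipient)
        if triplet not in self.first_seen:
            self.first_seen[triplet] = now
            return "451 try again later"   # temporary SMTP rejection
        if now - self.first_seen[triplet] < self.delay:
            return "451 try again later"
        return "250 ok"                    # retried after the delay: accept
```

A 451 reply is transient by definition, so a well-behaved sender loses nothing but time; a spammer that never retries never gets through.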

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: JB
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3d5e014a_3@news.newsgroups.com>
Erik Naggum wrote:

>   If you allow the message to be delivered and waste CPU
>   or brain time, the
>   spammers have won a small victory.  That is just wrong. 
>   Spammers must die.
> 

The countermeasures you mention in your message should be
taken by the mail service provider. Otherwise I would have
to implement a mail client myself.

In my case the following happened: Immediately after I 
started posting to newsgroups, I started getting mails in 
which I was offered help with my debts or I was given 
advice as to how to make certain parts of my body larger.

I did the following:

(1) I stopped appending a valid email address to my mails
(2) I set up several mail accounts. All but one contain my 
initials in some way and there I sometimes still get spam. 
But one account is well hidden and only my friends know it. 
I never got spam there.

I think that users should first agree that spam is evil.
(There is no such agreement yet.) Then there should
be a law against spam, and then police action could be
taken.

-- 
Janos Blazi


-----------== Posted via Newsfeed.Com - Uncensored Usenet News ==----------
   http://www.newsfeed.com       The #1 Newsgroup Service in the World!
-----= Over 100,000 Newsgroups - Unlimited Fast Downloads - 19 Servers =-----
From: xah
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <B9838E4A.2CAF%xah@xahlee.org>
There are two lispy bigwigs, namely Paul Graham and Erik Naggum, who think
their hotshot mouthing on spamming is something of value.

Their outpouring is not unlike the damming of a drivel flood.

In the treatment of flood, there is the brute force of building of a dam.
This is the no-nonsense solution of brutes the likes of technology geeks.
Among them are the brute elite the likes of Graham'n'Naggum, who speak on
fine engineering far ahead of fellow brutes.

The phenomenon of spam is a human-social phenomenon. Spammers spam because
it is effective. Consumers' mouths say no but their actions say yes,
because the vast majority of them are unthinking and happy-go-lucky brutes.
These brutes decline spam, but when they happen upon a spam that pleases
them, such as a great porno site advertisement or something else that catches
their personal interest, they will follow the advertisement. For spammers,
spam is effective if there is one response out of one thousand. Similarly,
brutes will respond to one out of one thousand hateful spams. The prospect
of spamming is thus kept alive by the populous brutes, everywhere, despite
the tumultuous noises they make.

As a human community or society, the treatment of spam is up to us, not
fucking technology. (fuck Erik Naggum. Fuck you. You got it?) The gist is
"what do we want?". (Note: not "what technology geeks want" or "what fucking
unix morons want".) If we do not want spam, there are two ways to get rid of
it: thru law, and thru inaction. The former is a well-known proposal already
in process. The latter is possible only if humans are not lazy unthinking
beer-drinking brutes, which is never going to happen.

--
(gratuitous poem)

  i will be building a dam, when there is a pussy flood.
  for i am a collector of nature, and a lover of its treasure.
  i will use my savings, to quench those thirsty, and lubricate those angry,
  for i have suffered and suffered; the pain and anguish.
               -- Xah Lee

I dedicate this poem to Erik Naggum.

PS i request that anyone who read so far and find it worthwhile to send me
an email with the line "Xah, you are beautiful.". Thanks in advance.

 Xah
 ···@xahlee.org
 http://xahlee.org/PageTwo_dir/more.html


in article ················@naggum.no, Erik Naggum at ····@naggum.no wrote
on 8/16/02 8:51 PM:

> * ····@vanderbilt.edu (sv0f)
> | On a statistical approach to filtering spam, with Lisp code, here:
> 
> Spam has to be dealt with at the transport level.  The ability of strangers
> to send you mail must be curtailed.  Several large sites offer a system to
> reject all mail from unknown correspondents, temporarily or permanently, and
> wait for the reader of the log to accept incoming mail from addresses that
> look familiar.  Another option is to accept delivery but return transport-
> like error messages if the user does not want the message.  Yet another
> option is to see if the smtp client is set up to accept mail for the domain
> that it tries to deliver mail from.  Yet another option is to temporarily
> reject all mail from unknown sources and utilize the fact that spammers have
> no resources to queue messages for later delivery.  And then you can always
> implement a scheme that returns a temporary rejection, but sends a mail to
> the originator independently asking for confirmation that he is human and by
> accepting the conditions that unsolicited commercial e-mail carries a fee
> that /will/ be collected.  Failure to accept the conditions will cause the
> temporary rejection never to be lifted, thus using up queue space in the
> offending server, which any sysadmin will notice and take care of even if
> they do not bother to fix their system configuration to avoid relaying spam.
> Should the conditions be accepted, the message is allowed through.
> 
> If you allow the message to be delivered and waste CPU or brain time, the
> spammers have won a small victory.  That is just wrong.  Spammers must die.
    
From: Erik Naggum
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3238605749940964@naggum.no>
* xah <···@xahlee.org>
| (fuck Erik Naggum. Fuck you. You got it?)

  Got it.  Now get on with your life.  Thank you.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Kaz Kylheku
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <ajmi85$3sd$1@luna.vcn.bc.ca>
In article <·················@xahlee.org>, xah wrote:
> The phenomenon of spam is a human-social phenomenon. Spammers spam because
> it is effective.

That's only because you can't see spammers for the anti-social twits that they
are, who will keep spamming even when it's not effective.  Or they will define
their acceptable effectiveness to be something ridiculously low, like one
positive response from ten million spams. Or even define negative responses as
good responses, so that ``don't send me this crap'' earns one a permanent spot
in their list.

Spamming is not effective in any sense of the word that an actual marketer
would comprehend.

Now, why *don't* you see spammers for the anti-social twits that they are?  I
have my own idea about that.
From: Thien-Thi Nguyen
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <kk9heht9k08.fsf@glug.org>
Kaz Kylheku <···@ashi.footprints.net> writes:

> Spamming is not effective in any sense of the word that an actual marketer
> would comprehend.

well clearly you have been lucky enough not to spend too much time around
(professional) marketers, who take great pains in safe-guarding their power to
comprehend everything positively.  their job is to foist this inability to
discern the feedback loop onto others (primarily professional sales people).
this is because when business is good, nobody cares, it's only when business
is bad that self-examination is painful.  it's no surprise that professional
sales people also take it to be their job to point the finger back at the
marketers.

whoever thought up the sales / marketing (organizational) partitioning was
probably a consultant weeping in anticipation of the spoils to be reaped from
the turf wars imminent.  split the mind and sell aspirin...

(actually, i have no clue what your background w/ these professions is; these
ramblings are from my own limited experience as a naive geek co-founding a
chip company where the only lisp involved was emacs lisp...  for round two,
i'd like to work my way up through the tool chain w/ lisp but somehow i got
distracted lo these last four years.)

thi
From: Robert St. Amant
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <lpnu1lqod2k.fsf@haeckel.csc.ncsu.edu>
Kaz Kylheku <···@ashi.footprints.net> writes:

> In article <·················@xahlee.org>, xah wrote:
> > The phenomenon of spam is a human-social phenomenon. Spammers spam because
> > it is effective.
> 
> That's only because you can't see spammers for the anti-social twits that they
> are, who will keep spamming even when it's not effective.  Or they will define
> their acceptable effectiveness to be something ridiculously low, like one
> positive response from ten million spams. Or even define negative responses as
> good responses, so that ``don't send me this crap'' earns one a permanent spot
> in their list.
> 
> Spamming is not effective in any sense of the word that an actual marketer
> would comprehend.

From an article in this week's Newsweek, titled "Spamming the World"
(http://www.msnbc.com/news/792491.asp):

     One bulk e-mailer says that when she started spamming in 1999,
     she could send out 100,000 e-mails and get 25 responses. Today,
     she has to send out a million messages to get the same response
     (a 0.0025 percent hit rate).

It's interesting reading.  I don't think spammers will ever stop (like
telemarketers), as long as they're getting *any* responses.  Short of
lawsuits, that is.

-- 
Rob St. Amant
http://www4.ncsu.edu/~stamant
From: Erik Naggum
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3238779570482046@naggum.no>
* Robert St. Amant
| It's interesting reading.  I don't think spammers will ever stop (like
| telemarketers), as long as they're getting *any* responses.  Short of
| lawsuits, that is.

  I am actually amazed that out of the million people needed to get 25
  responses, there has not yet been a single potential psychopathic axe
  murderer living in the spammer's city.  Imagine just /one/ such case.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Frode Vatvedt Fjeld
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <2hofbxhmow.fsf@vserver.cs.uit.no>
Erik Naggum <····@naggum.no> writes:

> I am actually amazed that out of the million people needed to get 25
> responses, there has not yet been a single potential psychopathic
> axe murderer living in the spammer's city.  Imagine just /one/ such
> case.

And when the judge asks the axeman if he's got anything to say in his
defense, he'd say he "just wanted to help the economy and add to the
GNP. Please realize this."

-- 
Frode Vatvedt Fjeld
From: synthespian
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <pan.2002.08.21.01.12.12.340528.18918@debian-rs.org>
On Mon, 19 Aug 2002 17:59:30 -0300, Erik Naggum wrote:

> 
>   I am actually amazed that out of the million people needed to get 25
>   responses, there has not yet been a single potential psychopathic axe
>   murderer living in the spammer's city.  Imagine just /one/ such case.
> 

 Hahahaha... :-) But that's because psychopaths are 1% of the
population (read Robert Hare). We need more spam! :-))

	Cheers,
	Henry
-- 
________________________________________________________________
Micro$oft-Free Human         100% Debian GNU/Linux
     KMFMS              "Bring the genome to the people!
From: ilias
Subject: Xah, you are beautiful
Date: 
Message-ID: <3D5FB39E.2060506@pontos.net>
xah wrote:
> There are two lispy big wigs, namely Paul Graham and Erik Naggum, who thinks
> their hotshot mouthing on spamming is something of value.
> 
> Their outpouring, is not unlike that of damming of drivel flood.
> 
> In the treatment of flood, there is the brute force of building of a dam.
> This is the no-nonsense solution of brutes the likes of technology geeks.
> Among them are the brute elite the likes of Graham'n'Naggum, who speak on
> fine engineering far ahead of fellow brutes.

...
> These brutes decline spam, but when they happen upon a spam that pleases
> them, such as a great porno site advertisement or something else that caught
> their personal interest, they will follow the advertisement.

i don't like spam.
when something is interesting (it happens, technically, tittically) i 
try to push the delete-button before i read more,
sometimes i'm not able.

> For spammers,
> spam is effective if there is one response out of one thousand. Similarly,
> brutes will respond to one out of one thousand hateful spams. The prospect
> of spamming is thus kept alive by the populous brutes, everywhere, despite
> tumultuous noises they makes.
> 
> As a human community or society, the treatment of spam is up to us, not
> fucking technology. 

that is partly correct.

is up to us, assisted by (fucking or not) technology

> (fuck Erik Naggum. Fuck you. You got it?)

"fuck Erik Naggum".

"Fuck *you*" [relates to 'Erik Naggum', relates to 'the reader', relates 
to 'Paul Graham', relates to technology-lovers?]

"you got it?"

no, please clarify.


> The gist is
> "what do we want?". 

"we". who belongs to "we".

> (Note: not "what technology geeks want" or "what fucking
> unix morons want".) 

are they not included in "we"?

> If we do not want spam, there are two ways to get rid of
> it: Thru law, and thru inaction. 

- law
- inaction
- wisdom
- technology
- creativity
- cooperation
- understanding
- ...

solving the problem when working together.

> The former is a well-known proposal in the
> process. The latter, is possible only if human are not lazy unthinking
> beer-drinking brutes, which is never going to happen.

Take a human of common intelligence and place him in a group of
gorillas: he'll be a brilliant individual (relatively).

if he insists on his brilliance in that group, the gorillas will give
him a brilliant fuck.

The "lazy unthinking beer-drinking brutes" group belongs to the
problem domain; basically it is the most important and unchangeable part.

When ignoring this, you declare yourself a complete idiot.

but maybe you're simply jealous [someone wants something that another one
has], because of your inability to "don't think - drink beer - and fuck -
be happy".

> PS i request that anyone who read so far and find it worthwhile to send me
> an email with the line "Xah, you are beautiful.". Thanks in advance.

sorry, no email.

i've placed it in the subject.
From: Michael Sullivan
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <1fh63id.15bxu5pizyf4bN%michael@bcect.com>
xah <···@xahlee.org> wrote:

> The phenomenon of spam is a human-social phenomenon. Spammers spam because
> it is effective. Consumers'S mouths says no but their actions says yes,
> because for the vast majority they are unthinking and happy-go-lucky brutes.

In fact, the problem with spam is not that large numbers of people
respond, but that it is so cheap to send (for the spammer anyway, since
the cost is distributed amongst the recipients and those who share their
systems and networks) that nearly *any* response is effective for them.

The problem with spam is that it is theft.  If spammers actually had to
bear the costs of their spam, they would never send it, because the
response rates are ridiculously low.  Since they do not, and it is cheap
and easy to send out hundreds of millions of messages, a response rate
of ten in a million is perfectly acceptable to them.

I think Graham may be right: if good spam filtering became
normal and automatic in nearly every email client (or server),
response rates might eventually drop so low that it would become
worthless to spam.


Michael

-- 
Michael Sullivan
Business Card Express of CT             Thermographers to the Trade
Cheshire, CT                                      ·······@bcect.com
From: Joe Marshall
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <o8f89.2487$aA.632@sccrnsc02>
"Michael Sullivan" <·······@bcect.com> wrote in message ···································@bcect.com...
>
> In fact, the problem with spam is not that large numbers of people
> respond, but that it is so cheap to send (for the spammer anyway, since
> the cost is distributed amongst the recipients and those who share their
> systems and networks) that nearly *any* response is effective for them.

Nearly any *valid* response is effective.  One part of the reason that
spam works is that it is possible to `identify' the 25 people out of
the million that act upon the message.  When you spam a million email
addresses most of the recipients discard or ignore the message.
The set of people that respond to the spam is *much* richer
in suckers than the original set of people identified by their
addresses.

If *every* spam yielded a (possibly bogus) response, then the
value of spamming would be severely decreased.  Spamming a set
of email addresses would yield no information about which recipients
are suckers because they *all* seem to be.  Putting a URL in the
spam would be useless because it would simply cause a million
automatic `hits' on the page.
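Marshall's "richer in suckers" point is just Bayes' rule, and a few invented numbers make it concrete: if only the 25 suckers in a million recipients respond, a response identifies a sucker outright; if every recipient (or their filter) auto-responds, a response tells the spammer nothing beyond the base rate.

```python
def p_sucker_given_response(n_recipients, n_suckers,
                            p_respond_sucker, p_respond_other):
    """Bayes' rule: how much 'richer in suckers' is the set of
    responders than the original mailing list?"""
    p_sucker = n_suckers / n_recipients
    p_response = (p_sucker * p_respond_sucker
                  + (1 - p_sucker) * p_respond_other)
    return p_sucker * p_respond_sucker / p_response

# only suckers respond: a response identifies a sucker with certainty
print(p_sucker_given_response(1000000, 25, 1.0, 0.0))   # 1.0
# everyone responds: the response carries no information at all
print(p_sucker_given_response(1000000, 25, 1.0, 1.0))   # the base rate, 2.5e-05
```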

> The problem with spam is that it is theft.  If spammers actually had to
> bear the costs of their spam, they would never send it, because the
> response rates are ridiculously low.  Since they do not, and it is cheap
> and easy to send out hundreds of millions of messages, a response rate
> of ten in a million is perfectly acceptable to them.

But a response rate of a million in a million would *not* be
acceptable.
From: John Carroll
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <johnca-96172B.11322320082002@ames.central.susx.ac.uk>
In article <·················@sccrnsc02>,
 "Joe Marshall" <·············@attbi.com> wrote:

> "Michael Sullivan" <·······@bcect.com> wrote in message 
> ···································@bcect.com...
> >
> > In fact, the problem with spam is not that large numbers of people
> > respond, but that it is so cheap to send (for the spammer anyway, since
> > the cost is distributed amongst the recipients and those who share their
> > systems and networks) that nearly *any* response is effective for them.
> 
> Nearly any *valid* response is effective.  One part of the reason that
> spam works is that it is possible to `identify' the 25 people out of
> the million that act upon the message.  When you spam a million email
> addresses most of the recipients discard or ignore the message.
> The set of people that respond to the spam is *much* richer
> in suckers than the original set of people identified by their
> addresses.
> 
> If *every* spam yielded a (possibly bogus) response, then the
> value of spamming would be severely decreased.  Spamming a set
> of email addresses would yield no information about which recipients
> are suckers because they *all* seem to be.  Putting a URL in the
> spam would be useless because it would simply cause a million
> automatic `hits' on the page.

So the most effective spam processing system would extract URLs /
email / fax / telephone contacts from the spam and automatically
respond with a plausible sounding covering message (of course making
sure not to identify the sender in any way).

Then the replies from the 25 suckers would be submerged. Of course
the spammers might then resort to software that tried to
automatically detect just the real messages from the suckers and
discard the rest.

John
From: Xah Lee
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <7fe97cc4.0208202155.6a55643f@posting.google.com>
A series of replies in this thread reminded me of Spy vs Spy.

Recall that Spy vs Spy was a popular comic by Antonio Prohias that
appeared in Mad magazine.

Here are a few snapshots:
http://images.amazon.com/images/P/0823050211.01.LZZZZZZZ.jpg
http://www.collectmad.com/britishcovers/pro_spy1.jpg
http://www.collectmad.com/collectibles/bbsvsc.jpg

the theme being two archenemy spies, one colored white and one black,
who try to best each other with schemes and technologies. One creates a
voice-recognition missile, then the other invents a voice-exchanging
device. The final frame of the comic would have the second spy
shrieking with mirth and striking a victory pose over the mishaps of
the other. Turn to the next installment and the winner & loser are
reversed: we see one spy excitedly planning a booby trap. When he
enters the other spy's house to install the bomb, he gets blown up
because the other spy has spied on his scheme. Again the hilariously
smug victory pose over the misfortune of the other.

Their fight is endless. Over and over we read with glee about the silly
stratagems and incredible technologies they devise, which befall the
devisers themselves.

As i sit here and read the technology-geeking morons fighting with
spammers, i am reminded of the same.


Bibliography:

some snapshots
http://www.leedberg.com/mad/spies/spies.html

a Mad cover featuring Spy vs Spy
http://www.collectmad.com/britishcovers/5thmad.htm
http://www.collectmad.com/collectibles/bbsvsc.htm

Spy Vs. Spy: The Complete Casebook by Antonio Prohias
http://www.amazon.com/exec/obidos/ASIN/0823050211/xahhome-20

A dress-up of Spy vs Spy
http://members.aol.com/nebula5/spyvspy.html

 Xah
 ···@xahlee.org
 http://xahlee.org/PageTwo_dir/more.html
From: ilias
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <3D636249.2090807@pontos.net>
Xah Lee wrote:
> A series of replies in this thread reminded me Spy vs Spy.
> 
> Recall that Spy vs Spy was a popular comic by Antonio Prohias that
> appears in Mad magazine.
one of my favorite.

> Their fight is endless. Over and over we read with glee over the silly
> stratagems and incredible technologies they devices that befalls on
> themselves.
> 
> As i sit here and read the technology geeking morons fighting with
> spammers.

Spy vs. Spy categories:

- "Spy vs. Spy"-fighters.

- "Spy vs. Spy"-talkers.

wake up.
From: Joe Marshall
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <MLN89.185371$UU1.33866@sccrnsc03>
"Xah Lee" <···@xahlee.org> wrote in message ·································@posting.google.com...
> A series of replies in this thread reminded me Spy vs Spy.
>
> As i sit here and read the technology geeking morons fighting with
> spammers.

Touché.
From: Joe Marshall
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <pry79.59612$983.72590@rwcrnsc53>
"sv0f" <····@vanderbilt.edu> wrote in message ··························@129.59.212.53...
> On a statistical approach to filtering spam, with Lisp code,
> here:
>
> http://www.paulgraham.com/spam.html
>
> Discussion of this article is currently happening on Slashdot,
> for the interested.

Perhaps this technique could be used to filter out the large
amount of crap postings on this newsgroup.
From: Raffael Cavallaro
Subject: Re: New Paul Graham Article
Date: 
Message-ID: <aeb7ff58.0208202052.476729e@posting.google.com>
"Joe Marshall" <·············@attbi.com> wrote in message news:<·····················@rwcrnsc53>...

> Perhaps this technique could be used to filter out the large
> amount of crap postings on this newsgroup.

Perhaps you were being facetious, but the Bayesian approach described
by Paul Graham could certainly be used to filter out whatever you
consider to be undesirable content from *any* set of text based
messages. And it "learns," adapting to novel garbage as it arises.
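The "learning" is plain bookkeeping over token counts: classify against the current counts, and when a message is filed (or refiled), fold its tokens into the right corpus. A sketch of just that bookkeeping (the function names are mine):

```python
from collections import Counter

good_counts = Counter()   # token -> occurrences in wanted messages
bad_counts = Counter()    # token -> occurrences in unwanted messages

def learn(tokens, unwanted):
    """Fold a newly judged message into the statistics, so the filter
    adapts as novel garbage (or novel legitimate topics) appears."""
    (bad_counts if unwanted else good_counts).update(tokens)

def refile(tokens, was_unwanted):
    """Correct a misclassification by moving the message's tokens
    from one corpus to the other."""
    src, dst = ((bad_counts, good_counts) if was_unwanted
                else (good_counts, bad_counts))
    src.subtract(tokens)
    dst.update(tokens)
```

Because the probabilities are recomputed from these counts, a refiled message immediately shifts how similar future messages score.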

Raf