lisp and security

From: John Thingstad
Subject: lisp and security
Date: Tue, 24 Feb 2004 17:37:57 +0000
Message-ID: <opr3vxxjq0xfnb1n@news.chello.no>

We were having a discussion on lisp l-expressions vs XML the other day.
I mentioned that I saw security problems in using read eval to input lisp
data structures.
This was rudely dismissed as a security problem, not a language problem.
Very well.
But it is an issue if you create a template-based language in lisp and
allow arbitrary users to write/modify the templates. The point is that
although it is easy to write such a template structure using macros, it is
by no means trivial to prevent the user from entering arbitrary code.
This is the security issue.
In XML with XML schema checking you have good control over the data being
read.
Such a mechanism does not exist in lisp.
I am still not convinced that it is easier to prototype using lisp
l-expressions, and then later in the production code define and implement
an XML grammar and compile it to these l-expressions, than to try to read
the file securely.
Anyhow, it troubles me. I need more experience in implementing this, I guess.
Any thoughts?

John

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Re: lisp and security Erann Gat
Re: lisp and security Christopher C. Stacy
Re: lisp and security Wade Humeniuk
- Re: lisp and security Erann Gat
  - Re: lisp and security Will Hartung
    - Re: lisp and security Erann Gat
      - Re: lisp and security Barry Margolin
        Re: lisp and security Erann Gat
        Re: lisp and security Barry Margolin
        Re: lisp and security Pascal Bourguignon
  - Re: lisp and security Joe Marshall
  - Re: lisp and security Wade Humeniuk
    - Re: lisp and security Steven E. Harris
      - Re: lisp and security Wade Humeniuk
      - Re: lisp and security Christopher C. Stacy
        Re: lisp and security Steven E. Harris
        Re: lisp and security Christopher C. Stacy
      - Re: lisp and security Christopher C. Stacy
        Re: lisp and security Steven E. Harris
        Re: lisp and security Christopher C. Stacy
    - Re: lisp and security Erann Gat
  - Re: lisp and security Vladimir Sedach
- Re: lisp and security Chris Perkins
  - Re: lisp and security Pascal Bourguignon
Re: lisp and security Pascal Costanza

From: Erann Gat
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 18:04:30 +0000
Message-ID: <gNOSPAMat-2402041004300001@k-137-79-50-101.jpl.nasa.gov>

In article <················@news.chello.no>, John Thingstad
<··············@chello.no> wrote:

> The point is although it is easy to write such a 
> template structure using macroes it is by no means trivial
> to prevent the user from entering arbitrary code.

It is in fact impossible to prevent the user from entering anything they
want, including arbitrary code (or, to be more precise, arbitrary text,
including text that looks like code).

It is easy, however, to prevent text that looks like code from actually
becoming executing code.

> Any thoughts?

You seem to be fundamentally confused about something, though I'm not sure
exactly what it is.  My guess is that you do not understand the
distinction between text, s-expressions, and code.  These are three
completely separate things, and you seem to be conflating them, which
results in your perceiving a problem where in fact there is none.

E.

From: Christopher C. Stacy
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 17:50:46 +0000
Message-ID: <u8yiswemh.fsf@news.dtpq.com>

>>>>> On Tue, 24 Feb 2004 18:37:57 +0100, John Thingstad ("John") writes:
 John> We were having a discussion on lisp l-expressions vs XML the
 John> other day.  I mentioned that I saw security problems in using
 John> read eval to input lisp data structures.  This was rudely
 John> dismissed as a security problem, not a language problem.

I don't know what an "l-expression" is supposed to be,
but in Lisp we have these things called "s-expressions".
An "s-expression" is nothing more than some tokens that
are enclosed by parentheses.  There is no reason why such
a piece of data would unexpectedly be executed as a program.

 John> Anyhow it troubles me. I need more experience in implementing
 John> this I guess.  Any thoughts?

Try learning Lisp?

From: Wade Humeniuk
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 18:49:00 +0000
Message-ID: <w0N_b.37117$n17.1385@clgrps13>

John Thingstad wrote:
> 
> We were having a discussion on lisp l-expressions vs XML the other day.
> I mentioned that I saw security problems in using read eval to input lisp
> data structures.
> This was rudely dismissed as a security problem, not a language problem.
> Very well.

If I am reading you right, you are thinking of allowing users to
input arbitrary Lisp expressions to be EVAL'ed??  READ can be used
with *READ-EVAL* set to nil.  Then no READ expression can be EVAL'ed
without your explicit consent.

Possible problem I see with using READ for arbitrary expressions
is that someone can send an expression so long that it could cause
the app to run out of system resources.  Or, someone keeps sending
arbitrary/random symbols that are interned and eventually overrun
the system.  But, there are programming ways around this.

Wade
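
A minimal sketch of the *READ-EVAL* point, assuming standard Common Lisp
only. READ-UNTRUSTED is a hypothetical helper name, not something from the
thread. It turns text into a data structure and never calls EVAL; with
*READ-EVAL* bound to NIL the #. reader macro signals READER-ERROR instead
of evaluating its argument.

```lisp
;; Hypothetical helper: read one form from untrusted text without letting
;; the #. reader macro evaluate anything.
(defun read-untrusted (string)
  (with-standard-io-syntax            ; standard readtable, base 10, CL-USER
    (let ((*read-eval* nil))          ; must be rebound inside: the macro sets it to T
      (values (read-from-string string)))))

;; Usage:
(read-untrusted "(1 2 3)")            ; returns the list (1 2 3)
(handler-case (read-untrusted "#.(+ 1 2)")
  (reader-error () :rejected))        ; returns :REJECTED, nothing is evaluated
```

This only closes the code-execution hole. It does nothing about oversized
input or symbol interning, which are separate concerns.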

From: Erann Gat
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 19:11:41 +0000
Message-ID: <gNOSPAMat-2402041111410001@k-137-79-50-101.jpl.nasa.gov>

In article <····················@clgrps13>, Wade Humeniuk
<····································@telus.net> wrote:

> John Thingstad wrote:
> > 
> > We were having a discussion on lisp l-expressions vs XML the other day.
> > I mentioned that I saw security problems in using read eval to input lisp
> > data structures.
> > This was rudely dismissed as a security problem, not a language problem.
> > Very well.
> 
> If I am reading you right, you are thinking of allowing users to
> input arbitrary Lisp expressions to be EVAL'ed??  READ can be used
> with *READ-EVAL* set to nil.  Then no READ expression can be EVAL'ed
> without your explicit consent.
> 
> Possible problem I see with using READ for arbitrary expressions
> is that someone can send an expression so long that it could cause
> the app to run out of system resources.  Or, someone keeps sending
> arbitrary/random symbols that are interned and eventually overrun
> the system.  But, there are programming ways around this.

There are?  What are they?  (And what about an arbitrarily long string, or
a symbol with an arbitrarily long name, or an arbitrarily large integer,
or an arbitrarily long string of open-parens

I think it is true that the standard Lisp READ function is not secure and
cannot be made secure except with vendor-specific extensions.  (But note
this is very different from saying that S-expressions are insecure, which
they are not.)

E.

From: Will Hartung
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 21:57:37 +0000
Message-ID: <c1ggev$1hh6ms$1@ID-197644.news.uni-berlin.de>

"Erann Gat" <·········@jpl.nasa.gov> wrote in message
·······························@k-137-79-50-101.jpl.nasa.gov...
> In article <····················@clgrps13>, Wade Humeniuk
> <····································@telus.net> wrote:
>
> > John Thingstad wrote:
> >
> > Possible problem I see with using READ for arbitrary expressions
> > is that someone can send an expression so long that it could cause
> > the app to run out of system resources.  Or, someone keeps sending
> > arbitrary/random symbols that are interned and eventually overrun
> > the system.  But, there are programming ways around this.
>
> There are?  What are they?  (And what about an arbitrarily long string, or
> a symbol with an arbitrarily long name, or an arbitrarily large integer,
> or an arbitrarily long string of open-parens

And how this differs from someone blindly using the DOM model of XML, which
essentially sucks in the entire tree into memory, thereby risking the exact
same problem, eludes me.

Regards,

Will Hartung
(·····@msoft.com)

From: Erann Gat
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 00:23:41 +0000
Message-ID: <gNOSPAMat-2402041623410001@k-137-79-50-101.jpl.nasa.gov>

In article <···············@ID-197644.news.uni-berlin.de>, "Will Hartung"
<·····@msoft.com> wrote:

> "Erann Gat" <·········@jpl.nasa.gov> wrote in message
> ·······························@k-137-79-50-101.jpl.nasa.gov...
> > In article <····················@clgrps13>, Wade Humeniuk
> > <····································@telus.net> wrote:
> >
> > > John Thingstad wrote:
> > >
> > > Possible problem I see with using READ for arbitrary expressions
> > > is that someone can send an expression so long that it could cause
> > > the app to run out of system resources.  Or, someone keeps sending
> > > arbitrary/random symbols that are interned and eventually overrun
> > > the system.  But, there are programming ways around this.
> >
> > There are?  What are they?  (And what about an arbitrarily long string, or
> > a symbol with an arbitrarily long name, or an arbitrarily large integer,
> > or an arbitrarily long string of open-parens
> 
> And how this differs from someone blindly using the DOM model of XML, which
> essentially sucks in the entire tree into memory, thereby risking the exact
> same problem, eludes me.

It doesn't.  Just because other approaches may encounter the same problem
doesn't mean it's not a problem.

E.

From: Barry Margolin
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 05:09:28 +0000
Message-ID: <barmar-E5B54B.00092825022004@comcast.ash.giganews.com>

In article <··························@k-137-79-50-101.jpl.nasa.gov>,
 ·········@jpl.nasa.gov (Erann Gat) wrote:

> > And how this differs from someone blindly using the DOM model of XML, which
> > essentially sucks in the entire tree into memory, thereby risking the exact
> > same problem, eludes me.
> 
> It doesn't.  Just because other approaches may encounter the same problem
> doesn't mean it's not a problem.

Go back to the original post.  The problem was raised in a discussion 
about using Lisp S-Expressions versus XML, and the claim was that this 
is one of the reasons not to use Lisp.  If both approaches share a 
security failing, then it's irrelevant in the comparison.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Erann Gat
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 07:18:27 +0000
Message-ID: <gNOSPAMat-2402042318270001@192.168.1.52>

In article <····························@comcast.ash.giganews.com>, Barry
Margolin <······@alum.mit.edu> wrote:

> In article <··························@k-137-79-50-101.jpl.nasa.gov>,
>  ·········@jpl.nasa.gov (Erann Gat) wrote:
> 
> > > And how this differs from someone blindly using the DOM model of XML, which
> > > essentially sucks in the entire tree into memory, thereby risking the exact
> > > same problem, eludes me.
> > 
> > It doesn't.  Just because other approaches may encounter the same problem
> > doesn't mean it's not a problem.
> 
> Go back to the original post.  The problem was raised in a discussion 
> about using Lisp S-Expressions versus XML, and the claim was that this 
> is one of the reasons not to use Lisp.  If both approaches share a 
> security failing, then it's irrelevant in the comparison.

To paraphrase Will, how this differs from what I said eludes me.

E.

From: Barry Margolin
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 21:48:28 +0000
Message-ID: <barmar-A1BB07.16482825022004@comcast.ash.giganews.com>

In article <··························@192.168.1.52>,
 ·········@jpl.nasa.gov (Erann Gat) wrote:

> In article <····························@comcast.ash.giganews.com>, Barry
> Margolin <······@alum.mit.edu> wrote:
> 
> > In article <··························@k-137-79-50-101.jpl.nasa.gov>,
> >  ·········@jpl.nasa.gov (Erann Gat) wrote:
> > 
> > > > And how this differs from someone blindly using the DOM model of XML, which
> > > > essentially sucks in the entire tree into memory, thereby risking the exact
> > > > same problem, eludes me.
> > > 
> > > It doesn't.  Just because other approaches may encounter the same problem
> > > doesn't mean it's not a problem.
> > 
> > Go back to the original post.  The problem was raised in a discussion 
> > about using Lisp S-Expressions versus XML, and the claim was that this 
> > is one of the reasons not to use Lisp.  If both approaches share a 
> > security failing, then it's irrelevant in the comparison.
> 
> To paraphrase Will, how this differs from what I said eludes me.

It might be a problem, but it's not the problem we're discussing.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Pascal Bourguignon
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 05:58:16 +0000
Message-ID: <877jyb66pz.fsf@thalassa.informatimago.com>

Barry Margolin <······@alum.mit.edu> writes:
> Go back to the original post.  The problem was raised in a discussion 
> about using Lisp S-Expressions versus XML, and the claim was that this 
> is one of the reasons not to use Lisp.  If both approaches share a 
> security failing, then it's irrelevant in the comparison.

AFAIK, Microsoft  does not use Lisp  for its software,  but they still
demonstrate a  dramatic confusion  of data and  program leading  to at
least half of their security issues.  

So indeed, the lisp reader cannot be held against lisp on the security
balance.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/

From: Joe Marshall
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 21:01:12 +0000
Message-ID: <8yisjip3.fsf@ccs.neu.edu>

·········@jpl.nasa.gov (Erann Gat) writes:

> In article <····················@clgrps13>, Wade Humeniuk
> <····································@telus.net> wrote:
>> 
>> Possible problem I see with using READ for arbitrary expressions
>> is that someone can send an expression so long that it could cause
>> the app to run out of system resources.  Or, someone keeps sending
>> arbitrary/random symbols that are interned and eventually overrun
>> the system.  But, there are programming ways around this.
>
> There are?  What are they?  (And what about an arbitrarily long string, or
> a symbol with an arbitrarily long name, or an arbitrarily large integer,
> or an arbitrarily long string of open-parens

How about hacking the stream itself to discard colons and truncate
after a certain number of characters?

From: Wade Humeniuk
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 20:25:01 +0000
Message-ID: <xqO_b.50955$Ff2.33978@clgrps12>

Erann Gat wrote:
> In article <····················@clgrps13>, Wade Humeniuk
> <····································@telus.net> wrote:

> There are?  What are they?  (And what about an arbitrarily long string, or
> a symbol with an arbitrarily long name, or an arbitrarily large integer,
> or an arbitrarily long string of open-parens
> 

The simplest thing I can think of is to wrap the READ in a WITH-TIMEOUT-like
protection.  READ will be interrupted if it takes longer than a designated
time.  Another is to use READ-CHAR to gather input into a string/temp-file, applying
on-the-go resource restriction checks and parens detection, then READ from the string/
temp-file.

Overloading the symbol tables with garbage symbols can be
fixed by interning the symbols in a temporary package.  Wipe out the
symbols deemed as garbage or even wipe out all local symbols between each
READ.

There seem to be built-in limits for integers, strings/arrays that READ
would ERROR on.  Arbitrarily long lists could be a problem, but WITH-TIMEOUT
might be made to handle it.

All this being said, in my own web server I eschewed READ for these
very concerns and resorted to my own character-based parser.

> I think it is true that the standard Lisp READ function is not secure and
> cannot be made secure except with vendor-specific extensions.  (But note
> this is very different from saying that S-expressions are insecure, which
> they are not.)

True, one can write one's own shadowed READ that handles malicious/erroneous inputs
better.

Wade
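
The temporary-package idea can be sketched as follows, assuming standard
Common Lisp only. READ-IN-SCRATCH-PACKAGE is a hypothetical name. Unqualified
symbols are interned in a throwaway package that is deleted after each READ,
so they do not pile up in a long-lived package.

```lisp
;; Hypothetical helper: intern unqualified symbols in a package that is
;; deleted as soon as the form has been read.
(defun read-in-scratch-package (string)
  (let ((scratch (make-package (gensym "SCRATCH-") :use '())))
    (unwind-protect
         (let ((*package* scratch)
               (*read-eval* nil))
           (values (read-from-string string)))
      (delete-package scratch))))

;; Usage: the symbols come back with the expected names, but they are
;; not the symbols of the caller's package.
(mapcar #'symbol-name (read-in-scratch-package "(foo bar)"))  ; ("FOO" "BAR")
```

Two caveats. Because the scratch package uses no other package, a token
such as nil or t reads as a fresh symbol rather than CL:NIL or CL:T. And a
package-qualified token (one containing a colon) is still resolved against
the named package, so this does not by itself stop interning elsewhere.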

From: Steven E. Harris
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 21:39:19 +0000
Message-ID: <q67wu6c88e0.fsf@L75001820.us.ray.com>

Wade Humeniuk <····································@telus.net> writes:

> Another is to use READ-CHAR to gather input into a string/temp-file,
> applying on-the-go resource restriction checks and parens detection,
> then READ from the string/ temp-file.

That sounds like, by the time you're done, you will have rewritten
most of READ, considering you'd have to properly ignore quoted
parentheses.

[...]

> All this being said, in my own web server I eschewed READ for these
> very concerns and resorted to my own character-based parser.

That's depressing to hear, considering how often we hear that one need
not write such parsers with Common Lisp: "Just use READ, it's built
in!"

Being relatively new to Lisp, I have the tendency to consider
serialization formats in terms of how I'd write a parser for them. I
have been trying to unlearn that thinking and understand how to use
what's already there in Common Lisp to do less, and do it better. But
then threads like this one come along that seem to say, "It's a neat
idea, but in practice it doesn't work."

This makes me think back to the distinction in XML parser styles:
stream-based (a la SAX), or graph-based (a la DOM). One can feed a
huge, arbitrarily-structured XML instance to a DOM parser and bring
the system to its knees, for the parser has no criteria for when to
stop or give up. It's only done when the final end tag is read.

By contrast, the SAX parser puts the consumer in control. The consumer
can halt the parse the first time some invalid structure is
encountered. If you're in the middle of collecting characters for a
<name> element and an <employee> element starts, you can bail out. Or
you can bail out if a name exceeds, say, 256 characters.

It would be nice if there was some callback model like this for READ
that would allow the consumer to monitor what structures are being
assembled from the input stream. I recall a thread here last year
suggesting using READ-FROM-STRING instead, consuming small buffers
collected from some input stream, but I'm not sure that would work for
large trees or graphs that start out like

(graph :name "foo"
(node :id 1 ...)
(node :id 2 ...)
(node :id 3 ...)
...

because READ won't finish until all the "nodes" in the enclosing
"graph" have been read -- just like the DOM analogy above. So we're
back to rethinking the serialization format to help overcome
deficiencies in the parser. For the example above, if our intention is
not to hold the entire "graph" in memory, but maybe to filter it or
write it to a database, we would have to write something more like

(graph :name "foo")
(add-node "foo" :id 1 ...)
(add-node "foo" :id 2 ...)

so that each sexp presents a small, tenable job for READ to handle.

--
Steven E. Harris :: ········@raytheon.com
Raytheon :: http://www.raytheon.com
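
The "one small sexp per READ" layout can be consumed with a loop like the
one below. This is a sketch under assumptions: PROCESS-FORMS and its
MAX-FORMS parameter are hypothetical, and the handler is whatever the
consumer wants to do with each top-level form (filter it, write it to a
database, and so on).

```lisp
;; Hypothetical helper: hand each top-level form to HANDLER as soon as it
;; has been read, and give up after MAX-FORMS forms.
(defun process-forms (stream handler &key (max-forms 1000))
  (let ((*read-eval* nil)
        (eof (gensym "EOF")))           ; unique end-of-file marker
    (loop for count from 0
          for form = (read stream nil eof)
          until (eq form eof)
          when (>= count max-forms)
            do (error "More than ~D forms in input" max-forms)
          do (funcall handler form))))

;; Usage: print the operator of each form as it arrives.
(with-input-from-string (in "(graph :name \"foo\") (add-node \"foo\" :id 1)")
  (process-forms in (lambda (form) (print (first form)))))
```

The consumer regains control between forms, but each individual form is
still read in one gulp, so a single huge form remains a problem.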

From: Wade Humeniuk
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 22:54:37 +0000
Message-ID: <NCQ_b.64499$Hy3.41664@edtnps89>

Steven E. Harris wrote:
> Wade Humeniuk <····································@telus.net> writes:
> 
> 
>>Another is to use READ-CHAR to gather input into a string/temp-file,
>>applying on-the-go resource restriction checks and parens detection,
>>then READ from the string/ temp-file.
> 
> 
> That sounds like, by the time you're done, you will have rewritten
> most of READ, considering you'd have to properly ignore quoted
> parentheses.
> 
> [...]
> 

It's not that bad, maybe 20-30 lines of Lisp (of course the readtable
has to be pretty well fixed for that).

> 
>>All this being said, in my own web server I eschewed READ for these
>>very concerns and resorted to my own character-based parser.
> 
> 
> That's depressing to hear, considering how often we hear that one need
> not write such parsers with Common Lisp: "Just use READ, it's built
> in!"
> 

One of the reasons I used my own parser is that the input was HTTP
which is definitely not s-expression based and it is known how
malicious http-clients could potentially be.  For your own Lisp to
Lisp communication, with trusted clients and servers it can
work (Worse is Better).

The real world is a messy place, so I do not find it depressing, it's
a fact of life.

> Being relatively new to Lisp, I have the tendency to consider
> serialization formats in terms of how I'd write a parser for them. I
> have been trying to unlearn that thinking and understand how to use
> what's already there in Common Lisp to do less, and do it better. But
> then threads like this one come along that seem to say, "It's a neat
> idea, but in practice it doesn't work."
> 

It does work, just with constraints.

> 
> This makes me think back to the distinction in XML parser styles:
> stream-based (a la SAX), or graph-based (a la DOM). One can feed a
> huge, arbitrarily-structured XML instance to a DOM parser and bring
> the system to its knees, for the parser has no criteria for when to
> stop or give up. It's only done when the final end tag is read.
> 
> By contrast, the SAX parser puts the consumer in control. The consumer
> can halt the parse the first time some invalid structure is
> encountered. If you're in the middle of collecting characters for a
> <name> element and an <employee> element starts, you can bail out. Or
> you can bail out if a name exceeds, say, 256 characters.
> 

Yes the exact same behaviour with the standard read.  Though I am willing
to bet there existed a Lisp that had a bulletproof READ.

Wade

From: Christopher C. Stacy
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 23:14:28 +0000
Message-ID: <ubrnom5nw.fsf@news.dtpq.com>

>>>>> On Tue, 24 Feb 2004 13:39:19 -0800, Steven E Harris ("Steven") writes:
 Steven> By contrast, the SAX parser puts the consumer in control.

 Steven> It would be nice if there was some callback model like this
 Steven> for READ that would allow the consumer to monitor what
 Steven> structures are being assembled from the input stream.

This would be handy, and that sort of protocol has been available
in some older versions of Lisp, but Common Lisp doesn't have it.
On the other hand, as someone noted, it's quite trivial 
to implement your own version of READ.

READ can be useful when you're processing an input that has 
known properties (e.g. that you produced yourself).  But its
primary purpose is for the programmer to manipulate Lisp code,
not as a true general-purpose serialization operator.

When people talk about s-expressions having benefits over XML,
the implementation issue is that it's trivial to implement a
suitable version of READ, and highly non-trivial to implement 
an XML parser.  The thing is, the s-expression solution will
buy you the same thing as the hairy XML solution.  And once
you've trivially parsed that input, you have a structure that 
Lisp is naturally good at fiddling with (including lists
and symbols), rather than some weird artificial XML abstraction.

From: Steven E. Harris
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 23:34:36 +0000
Message-ID: <q67ishw831v.fsf@L75001820.us.ray.com>

······@news.dtpq.com (Christopher C. Stacy) writes:

> On the other hand, as someone noted, it's quite trivial to implement
> your own version of READ.

Can you point out any examples of customized user-defined READ
variants? Is there a set of functions one builds up from (such as
read-delimited-list¹)?

Footnotes: 
¹ Which happens to exhibit the "don't stop eating until all the food
  is gone" behavior we've been discussing.

-- 
Steven E. Harris        :: ········@raytheon.com
Raytheon                :: http://www.raytheon.com

From: Christopher C. Stacy
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 23:40:52 +0000
Message-ID: <uwu6ckpvf.fsf@news.dtpq.com>

>>>>> On Tue, 24 Feb 2004 15:34:36 -0800, Steven E Harris ("Steven") writes:

 Steven> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> On the other hand, as someone noted, it's quite trivial to implement
 >> your own version of READ.

 Steven> Can you point out any examples of customized user-defined
 Steven> READ variants? Is there a set of functions one builds up from
 Steven> (such as read-delimited-list¹)?

You're making it too complicated. 
Just do it by hand with READ-CHAR, VECTOR-PUSH-EXTEND,
CHAR=, a stack or two, and INTERN.
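
A hand-rolled reader along those lines might look like the sketch below.
It is an illustration under assumptions, not anyone's production code: the
names SIMPLE-READ, READ-TOKEN and TOKEN->ATOM and both limits are
hypothetical, it uses recursion in place of an explicit stack, and it only
understands lists, unsigned integers and symbols (no strings, quotes or
reader macros).

```lisp
(defparameter *max-token-length* 256)   ; assumed limits, tune to taste
(defparameter *max-depth* 64)

(defun whitespace-p (ch)
  (member ch '(#\Space #\Tab #\Newline #\Return)))

(defun read-token (stream)
  "Collect characters up to whitespace or a paren; error if too long."
  (let ((buf (make-array 16 :element-type 'character
                            :adjustable t :fill-pointer 0)))
    (loop for ch = (peek-char nil stream nil nil)
          while (and ch (not (whitespace-p ch)) (not (find ch "()")))
          do (when (>= (length buf) *max-token-length*)
               (error "Token longer than ~D characters" *max-token-length*))
             (vector-push-extend (read-char stream) buf))
    buf))

(defun token->atom (token package)
  "All-digit tokens become integers; anything else is interned in PACKAGE."
  (if (every #'digit-char-p token)
      (parse-integer token)
      (intern (string-upcase token) package)))

(defun simple-read (stream &key (package *package*) (depth 0))
  "Read one list-or-atom form from STREAM."
  (when (> depth *max-depth*)
    (error "Nesting deeper than ~D" *max-depth*))
  (let ((ch (peek-char t stream)))       ; skips whitespace, errors on EOF
    (cond ((char= ch #\()
           (read-char stream)
           (loop until (char= (peek-char t stream) #\))
                 collect (simple-read stream :package package
                                             :depth (1+ depth))
                 finally (read-char stream)))   ; consume the close paren
          ((char= ch #\))
           (error "Unbalanced close paren"))
          (t (token->atom (read-token stream) package)))))

;; Usage:
(with-input-from-string (in "(1 (2 3) 44)")
  (simple-read in))                      ; returns (1 (2 3) 44)
```

A colon has no special meaning here, so a token like a:b simply becomes a
symbol named "A:B" in the chosen package. The number of elements in a list
is not limited in this sketch.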

From: Christopher C. Stacy
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 23:35:04 +0000
Message-ID: <u65dwm4pj.fsf@news.dtpq.com>

>>>>> On Tue, 24 Feb 2004 13:39:19 -0800, Steven E Harris ("Steven") writes:

 Steven> By contrast, the SAX parser puts the consumer in control.
 Steven> The consumer can halt the parse the first time some invalid
 Steven> structure is encountered. If you're in the middle of
 Steven> collecting characters for a <name> element and an <employee>
 Steven> element starts, you can bail out. Or you can bail out if a
 Steven> name exceeds, say, 256 characters.

For those not familiar with SAX, what he's talking about is that
you can register "handler" functions that will be called upon
certain parse events in the XMLReader interface.

The analogous thing in Lisp would be handler functions that could 
get called when READ sees an open-paren, close-paren, a token, 
whitespace, and comments.

 Steven> Or you can bail out if a name exceeds, say, 256 characters.

So you could XMLReader.setContentHandler a ContentHandler.characters
method that would notice when characters are received.

But I don't think there's any promise from SAX that this won't overrun
some implementation-defined limits.  The API just says that it's going
to hand you an array of characters when it feels like it.   It doesn't
even guarantee that Unicode characters will not be split across calls.

Maybe there's some way to tell SAX to call this for every character, and
also control the resources (all down the line) used for input buffering?

Certainly that's not how people really write SAX programs, though.
Therefore they are no more immune to this problem than calling READ.

From: Steven E. Harris
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 17:35:45 +0000
Message-ID: <q671xoj83ke.fsf@L75001820.us.ray.com>

······@news.dtpq.com (Christopher C. Stacy) writes:

> But I don't think there's any promise from SAX that this won't
> overrun some implementation-defined limits.  The API just says that
> it's going to hand you an array of characters when it feels like it.
> It doesn't even guarantee that Unicode characters will not be split
> across calls.

Back in 1998 I spent a lot of time working with expat, an XML parser
written in C, with an interface similar to SAX. One could force expat
to use a provided buffer for its input rather than giving it access to
a stream; a single-character buffer was enough to feed it for each
parsing step. To permit this kind of chunked parsing, expat maintained
state in between provided chunks.¹ If the input contained some huge
element name, expat would have to buffer that name before reporting a
"start element" event, but the buffers used for that kind of storage
were of configurable size.

> Maybe there's some way to tell SAX to call this for every character,
> and also control the resources (all down the line) used for input
> buffering?

I don't know about SAX today, but expat permitted this as described
above.

> Certainly that's not how people really write SAX programs, though.
> Therefore they are no more immune to this problem than calling READ.

That's exactly how I wrote my programs wrapped around expat. My
programs had hard limits, and I can't think of how I could have driven
expat to exceed limits that it didn't let me set or catch.

Footnotes: 
¹ I recall reading that expat was written this way to enable inclusion in
  Netscape, anticipating the input XML would arrive in small chunks
  read over HTTP. This non-blocking scheme allows one to avoid
  dedicating a thread to expat's parsing.

-- 
Steven E. Harris        :: ········@raytheon.com
Raytheon                :: http://www.raytheon.com

From: Christopher C. Stacy
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 18:40:05 +0000
Message-ID: <ufzczgfzu.fsf@news.dtpq.com>

>>>>> On Wed, 25 Feb 2004 09:35:45 -0800, Steven E Harris ("Steven") writes:

 Steven> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> But I don't think there's any promise from SAX that this won't
 >> overrun some implementation-defined limits.  The API just says that
 >> it's going to hand you an array of characters when it feels like it.
 >> It doesn't even guarantee that Unicode characters will not be split
 >> across calls.

 Steven> Back in 1998 I spent a lot of time working with expat, an XML parser
 Steven> written in C, with an interface similar to SAX. One could force expat
 Steven> to use a provided buffer for its input rather than giving it access to
 Steven> a stream; a single-character buffer was enough to feed it for each
 Steven> parsing step. To permit this kind of chunked parsing, expat maintained

If you just want to prevent a data overflow, then you would use 
the extensible stream facility (either "Gray" or "Simple" streams) 
to chunk the input to READ.  That alone does not provide a way to
discriminate events such as token completion, though.
But it does protect you from unbounded input.

If you just have one bufferful that you're willing to allocate 
up front (and assume everything must fit), then you can just call
READ-SEQUENCE to read the buffer, and then READ-FROM-STRING it.

If you really need to know when every token is being parsed, 
then you have to write your own version of READ, as described 
earlier.  (Or maybe your implementation's vendor has the hooks
that you need, and can help you, but that's not standard.)
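
The single-buffer variant can be sketched like this, assuming standard
Common Lisp only. READ-BOUNDED and its LIMIT parameter are hypothetical
names. Whatever does not fit in the buffer is simply never handed to the
reader.

```lisp
;; Hypothetical helper: read at most LIMIT characters from STREAM, then
;; parse one form from just those characters.
(defun read-bounded (stream &key (limit 4096))
  (let* ((buffer (make-string limit))
         (end (read-sequence buffer stream))  ; index one past the last char read
         (*read-eval* nil))
    (values (read-from-string buffer t nil :end end))))

;; Usage:
(with-input-from-string (in "(a b) trailing text")
  (read-bounded in))                     ; returns (A B)
```

A form that is cut off by the limit makes READ-FROM-STRING signal an error
instead of consuming more input. Note that READ-SEQUENCE waits for LIMIT
characters or end of file, so on a network stream this may block; that
part is left to the stream machinery.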

From: Erann Gat
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 00:22:12 +0000
Message-ID: <gNOSPAMat-2402041622120001@k-137-79-50-101.jpl.nasa.gov>

In article <·····················@clgrps12>, Wade Humeniuk
<····································@telus.net> wrote:

> Erann Gat wrote:
> > In article <····················@clgrps13>, Wade Humeniuk
> > <····································@telus.net> wrote:
> 
> > There are?  What are they?  (And what about an arbitrarily long string, or
> > a symbol with an arbitrarily long name, or an arbitrarily large integer,
> > or an arbitraryly long string of open-parens
> > 
> 
> The simplest thing I can think of is to wrap the READ in a WITH-TIMEOUT-like
> protection.

Sounds scary to me.  You'd need a delay that was at least as long as what
might reasonably be caused by the network.  And then what happens when
machines get faster?

> Overloading the symbol tables with garbage symbols can be
> fixed by interning the symbols in a temporary package.

But how do you arrange for that to happen?  Just setting *package* won't
work because the input could use colon syntax.

> There seem to be built-in limits for integers, strings/arrays that READ
> would ERROR on.

But that's not in the standard so you can't count on it.

Personally I like the solution of reading characters into a buffer through
a simple-minded syntax checker first and then using READ-FROM-STRING.  Or
using READ on a stream that has some built-in protection, like shutting
down after a certain number of characters, and punting on any suspect
character like colons.

But this isn't really solving the problem of "using READ for arbitrary
expressions", it's sidestepping the problem by trying to ensure that READ
never gets applied to arbitrary expressions.

E.
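
The "simple-minded syntax checker, then READ-FROM-STRING" approach could
be sketched as below. Everything here is an assumption for illustration:
the names COLLECT-CHECKED and READ-CHECKED, the limits, and the particular
set of characters treated as suspect. The checker copies exactly one
parenthesized form into a string while counting characters and nesting,
and only that string is given to the reader.

```lisp
;; Hypothetical pre-scanner: copy one parenthesized form from STREAM into
;; a string, enforcing a length limit and a depth limit, and rejecting
;; colon, #, |, backslash and semicolon outside of string literals.
(defun collect-checked (stream &key (max-chars 4096) (max-depth 64))
  (let ((out (make-string-output-stream))
        (count 0) (depth 0) (in-string nil))
    (flet ((next ()
             (when (>= count max-chars)
               (error "Input exceeds ~D characters" max-chars))
             (incf count)
             (write-char (read-char stream) out)))  ; EOF mid-form is an error
      (unless (char= (peek-char t stream) #\()
        (error "Expected an open paren"))
      (loop
        (let ((ch (next)))
          (cond (in-string
                 (case ch
                   (#\\ (next))                     ; escaped char, copied as is
                   (#\" (setf in-string nil))))
                ((char= ch #\") (setf in-string t))
                ((find ch ":#|\\;") (error "Suspect character ~S" ch))
                ((char= ch #\()
                 (when (> (incf depth) max-depth)
                   (error "Nesting deeper than ~D" max-depth)))
                ((char= ch #\))
                 (when (zerop (decf depth))
                   (return (get-output-stream-string out))))))))))

(defun read-checked (stream)
  (let ((*read-eval* nil))
    (values (read-from-string (collect-checked stream)))))

;; Usage: parens inside a string literal do not confuse the depth count.
(with-input-from-string (in "  (a \"x)y\" (1 2)) tail")
  (read-checked in))                     ; returns (A "x)y" (1 2))
```

As noted above, this sidesteps the problem rather than solving it: READ
only ever sees input that has already been vetted.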

From: Vladimir Sedach
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 02:22:01 +0000
Message-ID: <873c906zui.fsf@shawnews.cg.shawcable.net>

·········@jpl.nasa.gov (Erann Gat) writes:

> In article <····················@clgrps13>, Wade Humeniuk
> <····································@telus.net> wrote:
> 
> > Possible problem I see with using READ for arbitrary expressions
> > is that someone can send an expression so long that it could cause
> > the app to run out of system resources.  Or, someone keeps sending
> > arbitrary/random symbols that are interned and eventually overrun
> > the system.  But, there are programming ways around this.
> 
> There are?  What are they?  (And what about an arbitrarily long string, or
> a symbol with an arbitrarily long name, or an arbitrarily large integer,
> or an arbitrarily long string of open-parens

What about
        (progn (fill *string-of-safe-length* #\Space)
               (read-sequence *string-of-safe-length* *unsafe-stream*)
               (clear-input *unsafe-stream*)
               (read-from-string *string-of-safe-length*)) ?

This won't handle time-outs and other DoS attacks, but that's really
the job of the stream machinery anyway (and that's how it gets done
with network sockets).

> I think it is true that the standard Lisp READ function is not secure and
> cannot be made secure except with vendor-specific extensions.  (But note
> this is very different from saying that S-expressions are insecure, which
> they are not.)

I think READ is far too nice to complicate it with issues of
security. There's more than enough standard functions to provide your
own safe reading, and if you really want to do anything about DoS
attacks, you will need to tune machine and OS dependent stuff anyway.

> E.

From: Chris Perkins
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 06:21:21 +0000
Message-ID: <6cb6c81f.0402242221.f40c1e2@posting.google.com>

Wade Humeniuk <····································@telus.net> wrote in message news:<····················@clgrps13>...

> If I am reading you right, you are thinking of allowing users to
> input arbitrary Lisp expressions to be EVAL'ed??  READ can be used
> with *READ-EVAL* set to nil.  Then no READ expression can be EVAL'ed
> without your explicit consent.

I'm guessing that the distinction between READ and EVAL is what John
(the OP) has confused. They occur one right after the other in the
listener. LOAD combines both steps as well.  If you aren't breathing
Lisp every day, it's an easy distinction to miss.

Certainly, I know that when I first started using Lisp the learning
curve was steep and the mountain of information vast.  Terms like
listener, top level, and REPL get thrown around.  I remember learning
that REPL stood for Read Eval Print Loop, but only later learned that
it was not only a lovely description, but the actual code for
constructing a top level:  (LOOP (PRINT (EVAL (READ))))   Just one of
many Lisp "Ah Ha" moments.

Chris

From: Pascal Bourguignon
Subject: Re: lisp and security
Date: Wed, 25 Feb 2004 08:00:03 +0000
Message-ID: <87wu6b4mik.fsf@thalassa.informatimago.com>

········@medialab.com (Chris Perkins) writes:

> Wade Humeniuk <····································@telus.net> wrote in message news:<····················@clgrps13>...
> 
> > If I am reading you right, you are thinking of allowing users to
> > input arbitrary Lisp expressions to be EVAL'ed??  READ can be used
> > with *READ-EVAL* set to nil.  Then no READ expression can be EVAL'ed
> > without your explicit consent.
> 
> I'm guessing that the distinction between READ and EVAL is what John
> (the OP) has confused. They occur one right after the other in the
> listener. LOAD combines both steps as well.  If you aren't breathing
> Lisp every day, it's an easy distinction to miss.
> 
> Certainly, I know that when I first started using Lisp the learning
> curve was steep and the mountain of information vast.  Terms like
> listener, top level, and REPL get thrown around.  I remember learning
> that REPL stood for Read Eval Print Loop, but only later learned that
> it was not only a lovely description, but the actual code for
> constructing a top level:  (LOOP (PRINT (EVAL (READ))))   Just one of
> many Lisp "Ah Ha" moments.

     (with-input-from-string (*standard-input* "#.(+ 3 4)")
         (read))

     --> 7

vs.:

     (with-input-from-string (*standard-input* "#.(+ 3 4)")
         (let ((*read-eval* nil)) (read)))

     --> *** - READ from #<INPUT STRING-INPUT-STREAM>: 
             *READ-EVAL* = NIL does not allow the evaluation of (+ 3 4)

But otherwise data stays data:

     (with-input-from-string (*standard-input* "(+ 3 4)")
        (read))

     --> (+ 3 4)
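
In a program you would probably trap that error rather than land in the
debugger. A small sketch, assuming a helper called READ-DATA-ONLY (the
name is invented here) that catches any ERROR from the reader, which is
broader than strictly necessary:

```lisp
(defun read-data-only (text)
  "Read one form from the string TEXT with #. disabled.
Returns two values: the form and T on success, or NIL and NIL if the
reader signaled an error (for example on #. or on truncated input)."
  (handler-case
      (let ((*read-eval* nil))
        (values (read-from-string text) t))
    (error () (values nil nil))))

;; The #. form is refused; the plain list comes back as data.
(read-data-only "#.(+ 3 4)")
(read-data-only "(+ 3 4)")
```

The second return value is there so that a caller can tell a rejected
input apart from an input that legitimately reads as NIL.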

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/

From: Pascal Costanza
Subject: Re: lisp and security
Date: Tue, 24 Feb 2004 18:32:55 +0000
Message-ID: <c1g5cp$nhd$1@newsreader2.netcologne.de>

John Thingstad wrote:

> Anyhow it troubles me. I need more experience in implementing this I guess.
> Any thoughts?

What you need is a way to validate s-expressions. Lisp gives you 
everything you need to destructure an s-expression and apply any kind of 
conformance rule that you might think of.

This topic is discussed to a certain extent at 
http://c2.com/cgi/wiki?XmlIsaPoorCopyOfEssExpressions and 
http://c2.com/cgi/wiki?LispVsXml
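
As a rough illustration of such a conformance rule, here is a sketch of
a whitelist check for a small template language. The tag names and the
functions VALID-TEMPLATE-P and VALID-ARGUMENTS-P are hypothetical, and
the rule itself (strings, real numbers, and lists headed by an allowed
tag) is just one possible choice:

```lisp
(defparameter *allowed-tags* '("PAGE" "TITLE" "PARAGRAPH" "EMPHASIS")
  "Assumed set of operator names that a template may use.")

(defun valid-arguments-p (tail)
  "True if TAIL is a proper list whose elements are all valid templates."
  (cond ((null tail) t)
        ((consp tail) (and (valid-template-p (car tail))
                           (valid-arguments-p (cdr tail))))
        (t nil)))                       ; dotted tail such as (page . 3)

(defun valid-template-p (form)
  "True if FORM is a string, a real number, or a proper list headed by a
symbol named in *ALLOWED-TAGS* whose remaining elements are also valid."
  (cond ((or (stringp form) (realp form)) t)
        ((consp form)
         (and (symbolp (car form))
              ;; Compare by name so the check does not depend on the
              ;; package the symbol was read into.
              (member (symbol-name (car form)) *allowed-tags*
                      :test #'string=)
              (valid-arguments-p (cdr form))))
        (t nil)))

;; Usage: the first form conforms, the second smuggles in a foreign operator.
(valid-template-p '(page (title "Hello") (paragraph "Some " (emphasis "text"))))
(valid-template-p '(page (eval (delete-file "important.txt"))))
```

The check runs on data that has already been read, so it complements
rather than replaces whatever is done to make the READ step itself safe.
It does not guard against circular structure.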

Pascal

-- 
Tyler: "How's that working out for you?"
Jack: "Great."
Tyler: "Keep it up, then."