From: Alex Mizrahi
Subject: reader safe
Date: 
Message-ID: <458e9794$0$49202$14726298@news.sunsite.dk>
Hello, All!

1. what are best practices for evaluating code coming from users -- i mean i 
want users to use some 'safe' subset of a language. is it possible to do 
that just configuring reader in some way?

2. is there some reader trick to read XML code intermixed with lisp code 
transparently?
afaik it's possible to trigger some function on < symbol, but is it possible 
to do the rest?

With best regards, Alex Mizrahi. 

From: Wade Humeniuk
Subject: Re: reader safe
Date: 
Message-ID: <IPwjh.97831$hn.16411@edtnps82>
Alex Mizrahi wrote:
> Hello, All!
> 
> 1. what are best practices for evaluating code coming from users -- i mean i 
> want users to use some 'safe' subset of a language. is it possible to do 
> that just configuring reader in some way?
> 

I really do not think so.  There are a number of problems with the
standard reader, though particular implementations might have ways
around them.  For example, a peer sending any of these expressions
can mess things up:

1) (((((((((((((((((((((((((.............
2) infinitesymbolnamewithabunchofgarbagethatkeepsgoingandgoing........
3) Sending a 4GB (hello 1 2 3 4 5 ...) list.
4) Sending random symbols until your symbol table packs up.
5) Sending a ( and then never sending anything again.  The reader gets
stuck unless it has a way of timing out.

Eventually a Lisp app will error out, but you are pretty
well hosed by then.

Wade

From: Alex Mizrahi
Subject: Re: reader safe
Date: 
Message-ID: <458eb1c6$0$49207$14726298@news.sunsite.dk>
(message (Hello 'Wade)
(you :wrote  :on '(Sun, 24 Dec 2006 15:14:48 GMT))
(

 WH> Eventually a Lisp app will error out, but it you are pretty
 WH> well hosed by then.

suppose that all IO is already handled, and i have a string of reasonable 
size.
is it possible to read it with #'read?

the only pitfall i see is #., but i think it can be disabled.

you've pointed also to following problem:

 WH> 4) sending random symbols until your symbol table packs up.

but i think it can be handled -- e.g. create a fresh package for each user 
input, so package will not be polluted.

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity") 
From: Wade Humeniuk
Subject: Re: reader safe
Date: 
Message-ID: <xxyjh.90353$YV4.77118@edtnps89>
Alex Mizrahi wrote:

> 
> but i think it can be handled -- e.g. create a fresh package for each user 
> input, so package will not be polluted.
> 

And if the user explicitly uses the package name? i.e. cl-user::blub ?
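
A minimal sketch of that objection, assuming a hypothetical helper named SYMBOLS-OUTSIDE-SCRATCH (not from any library).  It reads one form with *PACKAGE* bound to a throwaway package and reports every symbol that still ended up somewhere else, which is exactly what a qualified name like cl-user::blub causes.  Keywords are reported too, and circular structure built with #n= is not guarded against here.

```lisp
(defun symbols-outside-scratch (string)
  "Read one form from STRING inside a throwaway package and return the
symbols in it whose home package is neither that package nor COMMON-LISP."
  (let ((scratch (make-package (symbol-name (gensym "SCRATCH-"))
                               :use '("COMMON-LISP")))
        (found '()))
    (unwind-protect
         (labels ((walk (x)
                    (cond ((consp x)
                           (walk (car x))
                           (walk (cdr x)))
                          ((and (symbolp x)
                                (not (member (symbol-package x)
                                             (list scratch
                                                   (find-package "COMMON-LISP")))))
                           (pushnew x found)))))
           (walk (with-standard-io-syntax
                   (let ((*read-eval* nil)
                         (*package* scratch))
                     (read-from-string string))))
           found)
      ;; Drop the scratch package whether or not reading succeeded.
      (delete-package scratch))))
```

Deleting the scratch package cleans up unqualified symbols, but anything the reader interned into another package through an explicit prefix stays there.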


I know you stated an assumption that the reader will have a string
of a reasonable size to work on, but what is reasonable?  When you
posted originally my brain thought, ahh, he wants code submitted over
a TCP connection and then safely executed within some jail.  To get
the string of reasonable size you need a pre-reader that does that.
You would not plug the standard reader onto an arbitrary TCP stream and
expect to be safe.
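
A rough sketch of the kind of check such a pre-reader could apply before the string ever reaches READ.  The function name and both limits are made up for illustration.  Counting every open parenthesis over-estimates the real nesting depth, so the check is deliberately conservative.

```lisp
(defun acceptable-input-p (string &key (max-length 4096) (max-open-parens 200))
  "True if STRING is short enough and cannot nest deeper than MAX-OPEN-PARENS.
The total number of open parentheses is an upper bound on nesting depth."
  (and (<= (length string) max-length)
       (<= (count #\( string) max-open-parens)))
```

This only bounds what the reader is asked to do; it says nothing about what the resulting form does when evaluated.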


W
From: Pascal Costanza
Subject: Re: reader safe
Date: 
Message-ID: <4v7qhgF1alpsnU1@mid.individual.net>
Alex Mizrahi wrote:
> (message (Hello 'Wade)
> (you :wrote  :on '(Sun, 24 Dec 2006 15:14:48 GMT))
> (
> 
>  WH> Eventually a Lisp app will error out, but it you are pretty
>  WH> well hosed by then.
> 
> suppose that all IO is already handled, and i have a string of reasonable 
> size.
> is it possible to read it with #'read?

No, but you can use read-from-string.

> the only pitfal i see is #., but i think it can be disabled.

Yes, it can be disabled by setting *read-eval* to nil.

But no, this is not your only pitfall. In your OP you stated that you 
want to evaluate the code sent by other users. Such code could contain 
forms like the following:

(setf (symbol-function
       (intern "SOME-NAME" (find-package "SOME-PACKAGE")))
      ;; format-harddisk is a stand-in for any destructive operation
      (lambda (&rest args)
        (declare (ignore args))
        (format-harddisk)))

Once you evaluate such code, you are probably in trouble.


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
From: Alex Mizrahi
Subject: Re: reader safe
Date: 
Message-ID: <458eb8eb$0$49206$14726298@news.sunsite.dk>
(message (Hello 'Pascal)
(you :wrote  :on '(Sun, 24 Dec 2006 18:09:01 +0100))
(

 ??>> suppose that all IO is already handled, and i have a string of
 ??>> reasonable size. is it possible to read it with #'read?

 PC> No, but you can use read-from-string.

i can use with-input-from-string..

 ??>> the only pitfal i see is #., but i think it can be disabled.

 PC> Yes, it can be disabled by setting *read-eval* to nil.

 PC> But not, this is not your only pitfall. In your OP you stated that you
 PC> want to evaluate the code sent by other users.

at least i'd like to read it -- i don't want to write my own reader.
then i can filter out symbols that are not supposed to be available, and 
only then evaluate.

if this evaluateable subset will be turing-complete, user would be able to 
consume infinite memory/time, but that's another question..

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity") 
From: Greg Johnston
Subject: Re: reader safe
Date: 
Message-ID: <1166987398.675280.220700@73g2000cwn.googlegroups.com>
Alex Mizrahi wrote:
> at least i'd like to read it -- i don't want to write my own reader.
> then i can filter out symbols that are not supposed to be available, and
> only then evaluate.

If you do (setf *read-eval* nil) then you can use #'read without
eval'ing.
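
A small sketch of that idea, using a binding instead of a global SETF so the setting stays local to the call.  READ-WITHOUT-EVAL is a hypothetical name; WITH-STANDARD-IO-SYNTAX is used on the assumption that you also want the standard readtable rather than whatever the running image has installed.

```lisp
(defun read-without-eval (string)
  "Read one form from STRING with the #. reader macro disabled."
  (with-standard-io-syntax
    ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so rebind it inside.
    (let ((*read-eval* nil))
      (read-from-string string))))
```

With *READ-EVAL* bound to NIL, an input such as "#.(delete-file ...)" signals a READER-ERROR instead of running.  This covers only read-time evaluation; the other reader problems discussed in this thread remain.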
From: Richard M Kreuter
Subject: Re: reader safe
Date: 
Message-ID: <87odpt3tk1.fsf@progn.net>
"Alex Mizrahi" <········@users.sourceforge.net> writes:
> (message (Hello 'Pascal)

>  ??>> the only pitfal i see is #., but i think it can be disabled.
>
>  PC> Yes, it can be disabled by setting *read-eval* to nil.
>
>  PC> But not, this is not your only pitfall. In your OP you stated that you
>  PC> want to evaluate the code sent by other users.
>
> at least i'd like to read it -- i don't want to write my own reader.
> then i can filter out symbols that are not supposed to be available, and 
> only then evaluate.

This may be possible on some implementations, but it's not going to be
portable (dunno if that matters to you).  The CL reader is somewhat
underspecified for use as a general purpose data input mechanism (even
though this was evidently among some people's goals during
standardization, see [1]).

There are many standard things that can go wrong during reading, and
then implementation-specific extensions can add some more wrinkles:

(0) sharp-dot, which can be inhibited via *read-eval* (as you know);

(1) a package-qualified token may have a package prefix that names
    a package that doesn't exist;

(2) a package-qualified token may name a symbol that doesn't exist in
    the package named by the package prefix;

(3) a package-qualified token containing only one package marker may
    name a symbol that isn't exported from the package named by the
    package prefix;

(4) a spurious comma or comma-at;

(5) a comma or comma-at that's /not/ spurious, potentially;

(6) any other syntax error;

(7) a character outside the standard repertoire may not have either
    a syntax type or constituent traits;

(8) implementation-dependent extensions such as package locking can
    cause conditions to be signaled if a syntactically valid token
    would cause a symbol to be interned into a locked package;

(9) other implementation-dependent extensions (such as read macros)
    might need to be handled or suppressed.
    
You get the idea.  Of course, some implementation might provide
convenient workarounds for any of these.  OTOH, it might be that working
around all these things isn't much easier than implementing the
subset of the reader that your protocol needs.
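
As an illustration of what handling these failures looks like in portable code, here is a hedged sketch.  TRY-READ and its status keywords are invented for this example.  It turns every condition of type ERROR raised during reading into a return value, which does not make the reader safe but at least keeps a bad input from landing in the debugger.

```lisp
(defun try-read (string)
  "Read one form from STRING.  Returns the form and :OK, or NIL and a
status keyword (plus the condition) when reading fails."
  (handler-case
      (with-standard-io-syntax
        (let ((*read-eval* nil))
          (values (read-from-string string) :ok)))
    ;; Input ended in the middle of an object, e.g. a missing close paren.
    (end-of-file () (values nil :incomplete))
    ;; Everything else: unknown packages, bad syntax, disabled #., etc.
    (error (condition) (values nil :rejected condition))))
```

Note that symbols interned before the error was signaled stay interned, so this handles items such as (1) through (6) only in the sense of reporting them.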

However, the next problem you'll have is that reader errors tend to
leave streams in an unspecified state (e.g., an unbalanced open
delimiter probably will cause too much input to be read into a form,
whereas an extra closing delimiter will cause reading to stop), and so
it may be tricky to resynchronize the stream after a read error.

One way to get around this would be to implement your protocol as (at
least) two layers: a packet-oriented lower layer and a Lisp reader
upper layer, with the constraint that Lisp forms may not span packets.
The packets might be fixed-size, or length-prefixed, etc.  You might
find RFC 1037 [2] interesting in this context.
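
One possible shape for the length-prefixed variant, as a sketch only.  The wire format assumed here (a decimal character count, a newline, then exactly that many characters) and the names READ-FRAME-LENGTH and READ-FRAME are inventions for this example, not part of any protocol mentioned above.

```lisp
(defun read-frame-length (stream &key (max-digits 6))
  "Read a decimal length terminated by a newline, at most MAX-DIGITS digits."
  (let ((digits (make-string-output-stream)))
    (loop for seen from 0
          for ch = (read-char stream)
          until (char= ch #\Newline)
          do (unless (and (< seen max-digits) (digit-char-p ch))
               (error "Malformed frame header"))
             (write-char ch digits))
    (parse-integer (get-output-stream-string digits))))

(defun read-frame (stream &key (max-length 4096))
  "Read one frame from STREAM and return its payload as a string."
  (let ((size (read-frame-length stream)))
    (when (> size max-length)
      (error "Frame of ~D characters exceeds the limit of ~D" size max-length))
    (let ((payload (make-string size)))
      ;; READ-SEQUENCE returns the index of the first element not updated.
      (unless (= (read-sequence payload stream) size)
        (error "Truncated frame"))
      payload)))
```

Each payload can then be handed to the reader as a bounded string.  A read error in one frame does not disturb the position of the next frame, which is the resynchronization property the two-layer design is after.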

--

[1] http://www.lisp.org/HyperSpec/Issues/iss089.html

[2] http://www.rfc-archive.org/getrfc.php?rfc=1037
From: Richard M Kreuter
Subject: Re: reader safe
Date: 
Message-ID: <87fyb44pe6.fsf@progn.net>
Richard M Kreuter <·······@progn.net> writes:

> There are many standard things that can go wrong during reading, and
> then implementation-specific extensions can add some more wrinkles:

Of course I forgot several read macros that all signal errors:

(-1) sharpsign followed by less-than, backspace, tab, newline,
     linefeed, page, return, space, right parenthesis.

> (0) sharp-dot, which can be inhibited via *read-eval* (as you know);
>
> (1) a package-qualified token may have a package prefixe that names
>     a package that doesn't exist;
>
> (2) a package-qualified token may name a symbol that doesn't exist in
>     the package named by the package prefix;
>
> (3) a package-qualified token containing only one package marker may
>     name a symbol that isn't exported from the package named by the
>     package prefix;
>
> (4) a spurious comma or comma-at;
>
> (5) a comma or comma-at that's /not/ spurious, potentially;
>
> (6) any other syntax error;
>
> (7) a character outside the standard repertoire may not have either
>     a syntax type or constituent traits;
>
> (8) implementation-dependent extensions such as package locking can
>     cause conditions to be signaled if a syntactically valid token
>     would cause a symbol to be interned into a locked package;
>
> (9) other implementation-dependent extensions (such as read macros)
>     might need to be handled or suppressed.
From: Wade Humeniuk
Subject: Re: reader safe
Date: 
Message-ID: <j2zjh.97848$hn.11429@edtnps82>
Alex Mizrahi wrote:

> 
> at least i'd like to read it -- i don't want to write my own reader.
> then i can filter out symbols that are not supposed to be available, and 
> only then evaluate.
> 

Do you have a list of them yet?  It would probably be easier if you
are using an OS like BSD where you fork a separate Lisp process in a jail
with very limited access (like no file descriptors), small VM use and
limited cpu quotas, unable to fork children, etc, etc, etc.

W
From: Pascal Costanza
Subject: Re: reader safe
Date: 
Message-ID: <4v7t29F18kamoU1@mid.individual.net>
Alex Mizrahi wrote:
> (message (Hello 'Pascal)
> (you :wrote  :on '(Sun, 24 Dec 2006 18:09:01 +0100))
> (
> 
>  ??>> suppose that all IO is already handled, and i have a string of
>  ??>> reasonable size. is it possible to read it with #'read?
> 
>  PC> No, but you can use read-from-string.
> 
> i can use with-input-from-string..

Ah, right. Forgot about that one.

>  ??>> the only pitfal i see is #., but i think it can be disabled.
> 
>  PC> Yes, it can be disabled by setting *read-eval* to nil.
> 
>  PC> But not, this is not your only pitfall. In your OP you stated that you
>  PC> want to evaluate the code sent by other users.
> 
> at least i'd like to read it -- i don't want to write my own reader.
> then i can filter out symbols that are not supposed to be available, and 
> only then evaluate.

For the symbols, it's too late after reading. Reading interns symbols 
already.

But you can indeed use the Common Lisp reader, and then analyze the 
resulting s-expression to check whether it would try to call 
functionality that you don't want it to call when evaluated.
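
A sketch of such an analysis, under the assumption that the acceptable language is just arithmetic on real numbers.  The operator list and the name FORM-ALLOWED-P are illustrative only.  The node budget is there because the reader can build circular structure with #n= and #n#, which would otherwise make a naive walk loop forever.

```lisp
(defparameter *allowed-operators* '(+ - * max min)
  "Operators a user-supplied form may call (an illustrative choice).")

(defun form-allowed-p (form &optional (budget 1000))
  "True if FORM consists only of real numbers and calls to *ALLOWED-OPERATORS*.
Gives up and returns NIL after visiting BUDGET nodes."
  (labels ((walk (x)
             (when (minusp (decf budget))
               (return-from form-allowed-p nil))
             (cond ((realp x) t)
                   ((consp x)
                    (and (member (car x) *allowed-operators* :test #'eq)
                         (walk-args (cdr x))))
                   (t nil)))
           (walk-args (args)
             (cond ((null args) t)
                   ((consp args)
                    (and (walk (car args))
                         (walk-args (cdr args))))
                   ;; A dotted tail is not a valid argument list.
                   (t nil))))
    (walk form)))
```

This is an allow-list: anything not explicitly recognized is rejected, including bare symbols, strings, and every special operator and macro.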

> if this evaluateable subset will be turing-complete, user would be able to 
> consume infinite memory/time, but that's another question..

You could even place restrictions on the acceptable sublanguage such 
that it is not Turing-complete anymore...
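
For illustration, a tiny interpreter for one such sublanguage: integer arithmetic with no variables, no definitions and no loops.  The operator table and the name EVAL-ARITHMETIC are assumptions made for this sketch.  It never calls EVAL, and on a finite, non-circular form it always terminates because it only recurses into subforms.

```lisp
(defparameter *arithmetic-operators*
  (list (cons '+ #'+) (cons '- #'-) (cons '* #'*)
        (cons 'max #'max) (cons 'min #'min))
  "Operator symbols and the functions that implement them.")

(defun eval-arithmetic (form)
  "Evaluate FORM if it is an integer or a call to a known operator on
such forms; signal an error for anything else."
  (cond ((integerp form) form)
        ((and (consp form) (assoc (car form) *arithmetic-operators*))
         (apply (cdr (assoc (car form) *arithmetic-operators*))
                (mapcar #'eval-arithmetic (cdr form))))
        (t (error "Not in the arithmetic subset: ~S" form))))
```

For example, (eval-arithmetic '(+ 1 (* 2 3))) returns 7, while a form that mentions any other operator is refused before anything runs.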


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
From: Rob Warnock
Subject: Re: reader safe
Date: 
Message-ID: <-cGdnVTA8bDLzhLYnZ2dnUVZ_t2tnZ2d@speakeasy.net>
Alex Mizrahi <········@users.sourceforge.net> wrote:
+---------------
| at least i'd like to read it -- i don't want to write my own reader.
+---------------

Actually, writing your own Lisp reader for the subset of forms
to which you're probably going to want to restrict the user input
*anyway* is not a big deal. I hand-compiled a large subset of the
CMUCL reader [mainly so I didn't forget any edge cases] into less
than 500 lines of C [including copious comments!] plus a few tables
that were autogenerated by extracting the initial readtable from
CMUCL. You could do the same thing in CL itself with much less effort.
And if you used a similar readtable/attribute-table approach, the
performance should be quite good. For me, the biggest hunks were
READ-TOKEN, READ-LIST, READ-AFTER-DOT, and then INTERN & FIND-SYMBOL.

Try a quick hack at it. You may find it easier than trying to secure
the built-in READ.
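
To give a feel for the size of such a hack, here is a sketch of a reader for a very small subset: parenthesized lists, non-negative integers, and symbols taken from a fixed list.  All names and limits here are made up, and it is not modelled on the CMUCL code described above.  It never calls INTERN and knows no reader macros, so an unknown name or a # simply ends in an error.

```lisp
(defparameter *known-symbols* '(+ - * max min)
  "The only symbols the restricted reader will return.")

(defun delimiter-p (ch)
  (member ch '(#\Space #\Tab #\Newline #\Return #\( #\))))

(defun read-token (stream &key (max-length 64))
  "Collect characters up to the next delimiter or end of input."
  (with-output-to-string (out)
    (loop for ch = (peek-char nil stream nil nil)
          for size from 0
          while (and ch (not (delimiter-p ch)))
          do (when (>= size max-length)
               (error "Token too long"))
             (write-char (read-char stream) out))))

(defun parse-token (token)
  "Turn TOKEN into an integer or one of *KNOWN-SYMBOLS*; never interns."
  (cond ((every #'digit-char-p token) (parse-integer token))
        ((find token *known-symbols*
               :key #'symbol-name :test #'string-equal))
        (t (error "Unknown symbol ~S" token))))

(defun read-restricted (stream &optional (depth 0))
  "Read one form of the restricted syntax from STREAM."
  (when (> depth 50)
    (error "Nesting too deep"))
  ;; PEEK-CHAR with T skips whitespace and signals END-OF-FILE at end of input.
  (case (peek-char t stream)
    (#\( (read-char stream)
         (loop until (char= (peek-char t stream) #\))
               collect (read-restricted stream (1+ depth))
               finally (read-char stream)))
    (#\) (error "Unbalanced close parenthesis"))
    (t (parse-token (read-token stream)))))
```

Calling it on a string stream holding "(+ 1 (* 2 3))" yields the list (+ 1 (* 2 3)).  Input such as "(foo 1)" or "#.(bar)" signals an error without creating any symbols.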


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Pascal Costanza
Subject: Re: reader safe
Date: 
Message-ID: <4v7q5kF1aun74U1@mid.individual.net>
Alex Mizrahi wrote:
> Hello, All!
> 
> 1. what are best practices for evaluating code coming from users -- i mean i 
> want users to use some 'safe' subset of a language. is it possible to do 
> that just configuring reader in some way?

My guess, based on a couple of (not extensive) experiments with the 
reader, is that it is probably possible to configure the reader to do 
this, but that you are also likely to miss a few things and that it 
will probably take some time until your configuration is safe.

Common Lisp has obviously been designed on the premise that protection 
between different parts of code is based on conventions, and that there 
should always be a way around abstraction barriers. While in general 
this ensures that you can never paint yourself into a corner, which is 
quite useful in a lot of circumstances, this also means that you cannot 
easily ensure that someone else indeed stays in some corner.

Therefore, it is probably better that you write your own reader where 
you have better control over what is acceptable and what is not.


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/