Lisp code security

From: Chris Capel
Subject: Lisp code security
Date: Sun, 05 Dec 2004 00:13:52 +0000
Message-ID: <10r4kjmqt529086@corp.supernews.com>

Hi everyone,

I'm interesting in defining a subset of Lisp that is safe in the sense that
any arbitrary code written in the subset can be executed without fear of
the code compromising the security of the system, or taking down the lisp
image (absent impl. bugs), or accessing certain protected information in
the lisp image, or hanging the lisp image in a tight loop, or doing other
malicious things. The code would have to be verified to exist in that
subset with a function that reads in the code from text (with read-time
evaluation disabled, of course) and returns whether it's safe. Is it
possible to define such a subset?

My initial thought would be that you just make sure every symbol is in the
COMMON-LISP package, and that no symbol is INTERN, EVAL, COMPILE-FILE,
MAKE-SYMBOL, or functions that do file I/O, (what am I missing?) so that
the only symbols the code has access to are those that the controlling
process gives to it and a useful part of CL and so it can't mess with the
disk or open sockets or anything. That doesn't solve the tight loop
problem, though, unless the Lisp has OS MP so the code can be run in
another process--I think. Do green threads like in CMUCL allow one process
to interrupt another in the middle of any arbitrary computation, such that
one can guarantee that the controller thread will regain control at some
point?

Chris Capel

Re: Lisp code security Chris Capel
Re: Lisp code security Chris Capel
- Re: Lisp code security Pascal Bourguignon
Re: Lisp code security Chris Capel
- Re: Lisp code security Christopher C. Stacy
  - Re: Lisp code security Chris Capel
    - Re: Lisp code security Pete Kirkham
Re: Lisp code security Pascal Bourguignon
- Re: Lisp code security Chris Capel
  - Re: Lisp code security Peter Seibel
    - Re: Lisp code security Chris Capel
      - Re: Lisp code security Kalle Olavi Niemitalo
      - Re: Lisp code security Peter Seibel
      - Re: Lisp code security Steven M. Haflich
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Chris Capel
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Chris Capel
        Re: Lisp code security Christopher C. Stacy
        Re: Lisp code security Matthias Buelow
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Tim Bradshaw
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Tim Bradshaw
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Tim Bradshaw
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Tim Bradshaw
        Re: Lisp code security Thomas F. Burdick
        Re: Lisp code security Chris Capel
        Re: Lisp code security Eric Daniel
        Re: Lisp code security Christopher C. Stacy
        Re: Lisp code security Chris Capel
        Re: Lisp code security Eric Daniel
        Re: Lisp code security Pascal Bourguignon
        Re: Lisp code security Thomas F. Burdick
        Re: Lisp code security Nikodemus Siivola
Re: Lisp code security Thomas F. Burdick
Re: Lisp code security Wade Humeniuk
- Re: Lisp code security Chris Capel
  - Re: Lisp code security Peter Seibel
    - Re: Lisp code security Chris Capel
Re: Lisp code security Tim Bradshaw
- Re: Lisp code security Pascal Bourguignon
  - Re: Lisp code security Tim Bradshaw

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 00:47:40 +0000
Message-ID: <10r4mj2d9u5lec1@corp.supernews.com>

Chris Capel wrote:

> My initial thought would be that you just make sure every symbol is in the
> COMMON-LISP package,

or in the package of the verified code, of course. Sorry.

In a word, "sandbox".

Chris Capel

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 01:12:31 +0000
Message-ID: <10r4o1ljp6go235@corp.supernews.com>

Chris Capel wrote:

> My initial thought would be that you just make sure every symbol is in the
> COMMON-LISP package, and that no symbol is INTERN, EVAL, COMPILE-FILE,
> MAKE-SYMBOL, or functions that do file I/O, (what am I missing?) so that
> the only symbols the code has access to are those that the controlling
> process gives to it and a useful part of CL and so it can't mess with the
> disk or open sockets or anything.

Is there a way around the situation where a symbol's function in one you
want to allow this untrusted code, but not its variable? For instance,
untrusted code doesn't need *any* access to the REPL variables * and +, but
it's probably going to want to use the functions * and +. Can this be
handled?

Chris Capel

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 05:10:23 +0000
Message-ID: <87d5xp72kg.fsf@thalassa.informatimago.com>

Chris Capel <······@iba.nktech.net> writes:

> Chris Capel wrote:
> 
> > My initial thought would be that you just make sure every symbol is in the
> > COMMON-LISP package, and that no symbol is INTERN, EVAL, COMPILE-FILE,
> > MAKE-SYMBOL, or functions that do file I/O, (what am I missing?) so that
> > the only symbols the code has access to are those that the controlling
> > process gives to it and a useful part of CL and so it can't mess with the
> > disk or open sockets or anything.
> 
> Is there a way around the situation where a symbol's function in one you
> want to allow this untrusted code, but not its variable? For instance,
> untrusted code doesn't need *any* access to the REPL variables * and +, but
> it's probably going to want to use the functions * and +. Can this be
> handled?

A symbol is a symbol. You cannot cut it in two!

But instead of using THE symbol COMMON-LISP:+, you can as well use the
symbol SAFE:+

(DEFPACKAGE "SAFE"
    (:USE "COMMON-LISP")
    (:SHADOW "+")
    (:EXPORT "+"))
(IN-PACKAGE "SAFE")

(DEFUN + (&REST ARGS) (APPLY (FUNCTION COMMON-LISP:+) ARGS))
;; or more simply:
(SETF (SYMBOL-FUNCTION +) (SYMBOL-FUNCTION COMMON-LISP:+))
;; but that may be lacking...

(in-package "SAFE-USER)
(+ 1 2) ==> 3
+ ==> unbound error


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 06:42:34 +0000
Message-ID: <10r5bch94blcga8@corp.supernews.com>

Chris Capel wrote:

> Hi everyone,

Another potential attack. The malicious code conses up as much memory as it
can as fast as it can. This would be bad. Is there any way in ANSI CL to
know how much memory a section of code is using, as opposed to other
sections? Or will I have to shadow all consing functions to keep up with
the memory used?

Then, I can add a GC trigger (all decent implementations have those, right?)
that looks at the memory usage of each untrusted code module and terminates
it if it's above a limit. Would this work, assuming I knew which code was
being a memory hog?

Chris Capel

From: Christopher C. Stacy
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 06:48:10 +0000
Message-ID: <uoeh9utp1.fsf@news.dtpq.com>

Chris Capel <······@iba.nktech.net> writes:
> Another potential attack. The malicious code conses up as much
> memory as it can as fast as it can.

How does Java handle this issue?

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 07:50:48 +0000
Message-ID: <10r5fcfl3j2no14@corp.supernews.com>

Christopher C. Stacy wrote:

> Chris Capel <······@iba.nktech.net> writes:
>> Another potential attack. The malicious code conses up as much
>> memory as it can as fast as it can.
> 
> How does Java handle this issue?

A quick look into the issue doesn't turn up anything. All the pages that
talk about applet/sandbox security features don't mention this attack. And
that's not surprising, as it's not really a very threatening action if
there's a user at the machine there to kill Internet Explorer when it
freezes. But in an unattended server environment, this becomes a threat. I
wonder if I missed something, or if .NET handles this.

Yes, I really am planning on writing a program that downloads code and runs
it, unattended. :-P

Chris Capel

From: Pete Kirkham
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 13:13:01 +0000
Message-ID: <41b3095e$0$16575$cc9e4d1f@news-text.dial.pipex.com>

Prior to Java (1.)5, an OutOfMemoryError would be thrown killing the VM 
unless it is caught in the outermost method invocation of the thread.

In addition, Java (1.)5 adds memory management 'beans'* which provide a 
callback mechanism for when the gc detects that a limit is about to be 
reached, which may be used to monitor memory usage, but (since a tight 
loop might not allow the gc thread to schedule between any limit being 
reached and the OOM exception) is not guaranteed to be called.


Pete

* formerly Sun terminology for a class with 'getFoo' and 'setFoo' 
methods, now apparently used for callback interfaces too.

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 05:06:40 +0000
Message-ID: <87hdn172qn.fsf@thalassa.informatimago.com>

Chris Capel <······@iba.nktech.net> writes:

> Hi everyone,
> 
> I'm interesting in defining a subset of Lisp that is safe in the sense that
> any arbitrary code written in the subset can be executed without fear of
> the code compromising the security of the system, or taking down the lisp
> image (absent impl. bugs), or accessing certain protected information in
> the lisp image, or hanging the lisp image in a tight loop, or doing other
> malicious things. The code would have to be verified to exist in that
> subset with a function that reads in the code from text (with read-time
> evaluation disabled, of course) and returns whether it's safe. Is it
> possible to define such a subset?
> 
> My initial thought would be that you just make sure every symbol is in the
> COMMON-LISP package, and that no symbol is INTERN, EVAL, COMPILE-FILE,
> MAKE-SYMBOL, or functions that do file I/O, (what am I missing?) so that
> the only symbols the code has access to are those that the controlling
> process gives to it and a useful part of CL and so it can't mess with the
> disk or open sockets or anything. That doesn't solve the tight loop
> problem, though, unless the Lisp has OS MP so the code can be run in
> another process--I think. Do green threads like in CMUCL allow one process
> to interrupt another in the middle of any arbitrary computation, such that
> one can guarantee that the controller thread will regain control at some
> point?

(defpackage "SAFE")
    (:use #|UNSAFE|#"COMMON-LISP")
    (:shadow "CONS" "CAR" "CDR" "NULL" ...)
    (:export "CONS" "CAR" "CDR" "NULL" ...))
(in-package "SAFE")


(defun cons (car cdr)
    ;; do whatever it takes to render COMMON-LISP:CONS "SAFE"
    (saneated (cons car cdr)))

...


(defpackage "SAFE-USER")
    (:use "SAFE"))
(in-package "SAFE-USER")

;; here you can only do "SAFE" things.


You'll have to implement, or neutralize,  a lot in SAFE. READ and a
REPL, I/O, error handling, oackage management, etc etc.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 05:37:44 +0000
Message-ID: <10r57iv36nhtn76@corp.supernews.com>

Pascal Bourguignon wrote:

> Chris Capel <······@iba.nktech.net> writes:
> 
>> My initial thought would be that you just make sure every symbol is in
>> the COMMON-LISP package, and that no symbol is INTERN, EVAL,
>> COMPILE-FILE, MAKE-SYMBOL, or functions that do file I/O, (what am I
>> missing?) so that the only symbols the code has access to are those that
>> the controlling process gives to it and a useful part of CL and so it
>> can't mess with the disk or open sockets or anything. That doesn't solve
>> the tight loop problem, though, unless the Lisp has OS MP so the code can
>> be run in another process--I think. Do green threads like in CMUCL allow
>> one process to interrupt another in the middle of any arbitrary
>> computation, such that one can guarantee that the controller thread will
>> regain control at some point?
> 
> (defpackage "SAFE")
>     (:use #|UNSAFE|#"COMMON-LISP")
>     (:shadow "CONS" "CAR" "CDR" "NULL" ...)
>     (:export "CONS" "CAR" "CDR" "NULL" ...))
> (in-package "SAFE")
> 
> 
> (defun cons (car cdr)
>     ;; do whatever it takes to render COMMON-LISP:CONS "SAFE"
>     (saneated (cons car cdr)))

Are you implying here that CAR itself isn't safe, or just using it as a
prototype? Does CAR need any wrapping to be safe? I figure not everything
does, and CAR seems to be the most benign of all functions. It couldn't
harm a fly. So, for those harmless functions, what about

(defpackage "SAFE"
  (:use :cl)
  (:export car ...))

Then in my function, I just make sure that whatever symbols are used are
exported from the SAFE package. It doesn't matter what the home package of
the symbol is.

I'm not trying to "wrap" anything to make sure that calls to function X
aren't going to be malicious. I just want to completely remove access to
those functions. Like a java sandbox, the code can't get at the system, and
that's OK. So, for instance, none of the global variables in the
COMMON-LISP package are relevant to anything a sandoxed application would
want to do, so I just don't export those. So, for instance, I don't make
available *any* way to get at files, because I don't want to provide that
functionality at all.

> You'll have to implement, or neutralize,  a lot in SAFE. READ and a
> REPL, I/O, error handling, oackage management, etc etc.

I was just thinking "don't include". Is there anything wrong with this
strategy? The advice I'm looking for is any particular holes that I might
need to be aware of in advance--tricky things that I might miss.

Chris Capel

From: Peter Seibel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 06:06:58 +0000
Message-ID: <m3zn0txoql.fsf@javamonkey.com>

Chris Capel <······@iba.nktech.net> writes:

> Pascal Bourguignon wrote:
>
>> Chris Capel <······@iba.nktech.net> writes:
>> 
>>> My initial thought would be that you just make sure every symbol is in
>>> the COMMON-LISP package, and that no symbol is INTERN, EVAL,
>>> COMPILE-FILE, MAKE-SYMBOL, or functions that do file I/O, (what am I
>>> missing?) so that the only symbols the code has access to are those that
>>> the controlling process gives to it and a useful part of CL and so it
>>> can't mess with the disk or open sockets or anything. That doesn't solve
>>> the tight loop problem, though, unless the Lisp has OS MP so the code can
>>> be run in another process--I think. Do green threads like in CMUCL allow
>>> one process to interrupt another in the middle of any arbitrary
>>> computation, such that one can guarantee that the controller thread will
>>> regain control at some point?
>> 
>> (defpackage "SAFE")
>>     (:use #|UNSAFE|#"COMMON-LISP")
>>     (:shadow "CONS" "CAR" "CDR" "NULL" ...)
>>     (:export "CONS" "CAR" "CDR" "NULL" ...))
>> (in-package "SAFE")
>> 
>> 
>> (defun cons (car cdr)
>>     ;; do whatever it takes to render COMMON-LISP:CONS "SAFE"
>>     (saneated (cons car cdr)))
>
> Are you implying here that CAR itself isn't safe, or just using it as a
> prototype? Does CAR need any wrapping to be safe? 

Depends what you mean by "safe". Do you want to prevent folks from
signalling errors? What if someone sends you code that contains (car
1)? Anyway, you might want to approach this from the other end--write
a simple "compiler" that compiles a Lisp like language into Lisp. Then
you have complete control over the constructs that your "safe"
language supports. Then you can even build in things to prevent denial
of service attacks. For instance, suppose your language provided a
simple WHILE construct so folks can write this:

  (while (whatever) (do-stuff))

You might compile that to:

  (do ((max-iterations 5) (iterations 0 (1+ iterations)))
      ((not (whatever)))
    (when (> iterations max-iterations)
      (error "Denial of service attack detected. Runaway loop."))
    (print 'hello))

Obviously how much work this is depends on how much you want your safe
language to be like Common Lisp. And you'll also want to be careful
about the reader, for sure you'll need to set *READ-EVAL* to NIL but
you may also need to worry about denial of service attacks at that
level: what if someone send you a "program" designed to make the
reader eat up all available memory trying to read it in? Or simply by
populating a bunch of random packages with random symbols.

Whatever level of safety you want (within reason), there's probably a
way to achive it but the techniques you'll need to use depend a lot on
what safe means.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Chris Capel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 06:31:12 +0000
Message-ID: <10r5an76vqv2udb@corp.supernews.com>

Peter Seibel wrote:

> Chris Capel <······@iba.nktech.net> writes:
> 
>> Pascal Bourguignon wrote:
>>
>>> (defpackage "SAFE")
>>>     (:use #|UNSAFE|#"COMMON-LISP")
>>>     (:shadow "CONS" "CAR" "CDR" "NULL" ...)
>>>     (:export "CONS" "CAR" "CDR" "NULL" ...))
>>> (in-package "SAFE")
>>> 
>>> 
>>> (defun cons (car cdr)
>>>     ;; do whatever it takes to render COMMON-LISP:CONS "SAFE"
>>>     (saneated (cons car cdr)))
>>
>> Are you implying here that CAR itself isn't safe, or just using it as a
>> prototype? Does CAR need any wrapping to be safe?
> 
> Depends what you mean by "safe". Do you want to prevent folks from
> signalling errors? What if someone sends you code that contains (car
> 1)?

You can catch any errors like this that are signalled, right? (Of course, I
would disallow any access to the debugger.) I don't think it's possible to
ensure that any given function will never signal an error. :-) So I was
planning on just catching those signals/conditions and terminating that
part of the application.

> Anyway, you might want to approach this from the other end--write 
> a simple "compiler" that compiles a Lisp like language into Lisp. Then
> you have complete control over the constructs that your "safe"
> language supports.

I could do that, but I don't really think it fits my needs. I really want
this to be as much common lisp as possible.

> Then you can even build in things to prevent denial 
> of service attacks. For instance, suppose your language provided a
> simple WHILE construct so folks can write this:
> 
>   (while (whatever) (do-stuff))
> 
> You might compile that to:
> 
>   (do ((max-iterations 5) (iterations 0 (1+ iterations)))
>       ((not (whatever)))
>     (when (> iterations max-iterations)
>       (error "Denial of service attack detected. Runaway loop."))
>     (print 'hello))

Wrapping all looping constructs with max-iters isn't really a solution. Then
the malicious code could just do something like this:

(dotimes (x (1- max-iters)) (dotimes (x (1- max-iters)) ...))

even if max-iters has to be determined experimentally.

So the only way to protect against unending loops is to have a threaded
environment where you can detect a thread that's been processing for too
long and just terminate it. So my question is: do Lisp-level threads
support this?

Or maybe... Here's something I could do. First: disallow gotos. Then, I
could wrap all the loops to include in their expansions a call back to the
controlling code that signals something if that thread's been taking too
long, and catch that signal in the code that starts off the untrusted code.
(I could optimize this in various ways.) I can ensure that the untrusted
code doesn't catch a given signal by making sure the code doesn't contain
that signal's symbol, can't I?

> Obviously how much work this is depends on how much you want your safe
> language to be like Common Lisp. And you'll also want to be careful
> about the reader, for sure you'll need to set *READ-EVAL* to NIL but
> you may also need to worry about denial of service attacks at that
> level: what if someone send you a "program" designed to make the
> reader eat up all available memory trying to read it in? Or simply by
> populating a bunch of random packages with random symbols.

I would definitely not allow access to anything that can intern symbols or
access packages. My only concern is whether my "disallowing" is really
disallowing it. So if my verification function sets the current package to
some temporary package X before it reads the untrusted code, I'm wondering
whether searching the parsed code for any symbols outside of my safe
package and package X would be good enough to make sure that it can't
access any of those symbols.

Maybe the best way is for me to come up with a package I think is safe and
have you guys poke holes in it. :-)

> Whatever level of safety you want (within reason), there's probably a
> way to achive it but the techniques you'll need to use depend a lot on
> what safe means.

Thanks for the help,
Chris Capel

From: Kalle Olavi Niemitalo
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 09:40:06 +0000
Message-ID: <87653hkrrd.fsf@Astalo.kon.iki.fi>

Chris Capel <······@iba.nktech.net> writes:

> I can ensure that the untrusted code doesn't catch a given
> signal by making sure the code doesn't contain that signal's
> symbol, can't I?

You also need to ensure that the code doesn't handle any of the
parent condition types, such as CONDITION.

UNWIND-PROTECT might provide an avenue for attack, as well.

From: Peter Seibel
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 19:27:33 +0000
Message-ID: <m3mzwsy28q.fsf@javamonkey.com>

Chris Capel <······@iba.nktech.net> writes:

> Maybe the best way is for me to come up with a package I think is
> safe and have you guys poke holes in it. :-)

Go for it. However keep in mind that it's not really a "package" that
you want. You can define whatever package you want but you still need
to prevent someone from just writing:

  (cl:eval ...)

So at the very least you need to take control of the reader, either by
reading the code yourself (i.e. use SAFE:READ rather than CL:READ) or
by walking the tree after it is read and checking for symbols not in
the SAFE package. And at that point you've essentially started down
the path of implementing a "compiler" from the safe language to Common
Lisp.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Steven M. Haflich
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 19:52:27 +0000
Message-ID: <%FJsd.54836$QJ3.7825@newssvr21.news.prodigy.com>

Chris Capel wrote:

> You can catch any errors like this that are signalled, right? (Of course, I
> would disallow any access to the debugger.) I don't think it's possible to
> ensure that any given function will never signal an error. :-) So I was
> planning on just catching those signals/conditions and terminating that
> part of the application.

I think most of the items in this thread have missed the real issue.

The problem is not so much in detecting or preventing execution from
signalling errors.  The intractible problem is that there is no way to
force the implementation to detect situations where you would _prefer_
an error be signalled, but where the implementation is not required to
do so.

The goal is for malicious or `incompetent' code running in the sandbox
not to be able to compromise the Lisp system.  One way systems are
compromised is by creating bad pointers.  Attacks on browsers do this
with buffer overruns.  In Lisp systems it can be done by passing
incorrect arguments or dereferencing data objects incorrectly.

For example, car and cdr are safe because (in safe code) they are required
to signal error if given a non-list argument.  From the ANS: "The
functions car and cdr should signal type-error if they receive an argument
which is not a list" and "should signal" is a term of art in the Error
Terminology section.  But most of the arithmetic functions are not
required to detect and signal errors, the ANS specifying on the class of
any incorrect type error.  With sufficient knowledge of the implementation
malicious code can construct illegal objects and manipulate pointers in
ways that violate security.

It is possible to build a fairly secure sandbox, but you would need to
implement nearly the entire sandbox language on top of Lisp and keep it
isolated from Lisp.  But if your sandbox does not need to be guaranteed
secure against malicious code, but only helpfully protective against
student code, then perhaps you can do something allowing access to the
underlying Lisp.

BTW, one of the hard problems is stack overflow.  Most implementations
detect stack overflow most of the time, and some allow the limit of stack
growth to be controlled, it is hard to guarantee that stack overflow will
be detected in all circumstances.

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 21:12:53 +0000
Message-ID: <87sm6k4ffu.fsf@thalassa.informatimago.com>

"Steven M. Haflich" <·················@alum.mit.edu> writes:
> BTW, one of the hard problems is stack overflow.  Most implementations
> detect stack overflow most of the time, and some allow the limit of stack
> growth to be controlled, it is hard to guarantee that stack overflow will
> be detected in all circumstances.

Yes if you use the system stack.
No if you use this stack:

    (defun make-stack (size) (make-array (list size) :fill-pointer 0))
    (defun stack-empty-p (stack) (zerop (fill-pointer stack)))
    (defun stack-push (stack item) (vector-push item stack))
    (defun stack-pop (stack) (vector-pop stack))
    (defun stack-top (stack) (if (plusp (fill-pointer stack))
                                 (aref stack (1- (fill-pointer stack)))
                                 (error "Stack empty")))


(one more good reason to implement it _above_ common-lisp).
-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Chris Capel
Subject: Re: Lisp code security
Date: Mon, 06 Dec 2004 05:02:26 +0000
Message-ID: <10r7pssl5l51v0d@corp.supernews.com>

Steven M. Haflich wrote:

> For example, car and cdr are safe because (in safe code) they are required
> to signal error if given a non-list argument.  From the ANS: "The
> functions car and cdr should signal type-error if they receive an argument
> which is not a list" and "should signal" is a term of art in the Error
> Terminology section.  But most of the arithmetic functions are not
> required to detect and signal errors, the ANS specifying on the class of
> any incorrect type error.  With sufficient knowledge of the implementation
> malicious code can construct illegal objects and manipulate pointers in
> ways that violate security.

Really? There's actually an implementation that you can do this with?

> It is possible to build a fairly secure sandbox, but you would need to
> implement nearly the entire sandbox language on top of Lisp and keep it
> isolated from Lisp.

I don't think, for what I'm going to use it for, implementing CL entirely on
top of CL will be efficient enough without going all the way. The executed
code needs to run rather quickly. And that sounds like a lot of work! I was
hoping it would be easier than that.

It really seems that there *should* be some sort of guaranteed type-safety
with normal optimization settings. What you're saying is that there really
isn't? In real implementations, there isn't?

> BTW, one of the hard problems is stack overflow.  Most implementations
> detect stack overflow most of the time, and some allow the limit of stack
> growth to be controlled, it is hard to guarantee that stack overflow will
> be detected in all circumstances.

This is another one of those things. It seems like there *should* be some
sort of guarantee against a stack overflow trashing the lisp image.

Are there extenuating circumstances where this is particularly hard? I can't
figure out how to make an infinitely recursing function in SBCL that
doesn't throw a control-stack-exhausted-error.

Chris Capel

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Mon, 06 Dec 2004 06:24:30 +0000
Message-ID: <87d5xo2bc1.fsf@thalassa.informatimago.com>

Chris Capel <······@iba.nktech.net> writes:

> Steven M. Haflich wrote:
> 
> > For example, car and cdr are safe because (in safe code) they are required
> > to signal error if given a non-list argument.  From the ANS: "The
> > functions car and cdr should signal type-error if they receive an argument
> > which is not a list" and "should signal" is a term of art in the Error
> > Terminology section.  But most of the arithmetic functions are not
> > required to detect and signal errors, the ANS specifying on the class of
> > any incorrect type error.  With sufficient knowledge of the implementation
> > malicious code can construct illegal objects and manipulate pointers in
> > ways that violate security.
> 
> Really? There's actually an implementation that you can do this with?

How do you think the compiler can do it?  All what the compiler can
do, a dedicate user program can do.

An implementation based on a virtual machine like clisp could be safer
(it's not because it allow external modules, (I've a nice PEEK and
POKE one  :-). But in implementations that allow their compiler
provide native code like cmucl or sbcl, you can do the same and
generate PEEK and POKE without even needed an external module.

> It really seems that there *should* be some sort of guaranteed type-safety
> with normal optimization settings. What you're saying is that there really
> isn't? In real implementations, there isn't?

In addition, it depends on the declarations used when compiling the
code.  Some implementation remove all boundary check when you compile
with (speed 3), so you get some surprizing results for things like:

    (replace "Hello" "titi" :start 4)

> This is another one of those things. It seems like there *should* be some
> sort of guarantee against a stack overflow trashing the lisp image.

This is a problem with the host system capacities, and a
performance/security compromize.

If the host system allows the user process to change the
read/write/execute flags on virtual memory pages, then you could mark
program page as not writeable, so when the stack collides with the
program you get informed immediately.  For collisions between the
stack and the heap, it's more difficult.  And in any case, you're back
to the virtual machine problem because if the program can lock a page,
it can unlock it also (for valid reason as well as by runaway code).

Or, said otherwise, complain to Intel, Motorola, Compaq and IBM that
their processors don't do this kind of check in the hardware.  But
they're not in the specialized processor market, they're in the
general processor market.  If you want security, pay the price, slow
down and have a virtual machine that implement the security you
want. Check all operations on the stack pointer, check all pointer
dereferences, check all arithmetic operations, etc.

> Are there extenuating circumstances where this is particularly hard? I can't
> figure out how to make an infinitely recursing function in SBCL that
> doesn't throw a control-stack-exhausted-error.

Perhaps (it depends on the compiler) try:

    (let ((skip (make-array (list (* 400 1024 1024)))) ;adjust!
          (poke (unsigned-byte 32))
      (declare (dynamic-extent skip poke))
      (setf poke #xDEADBEEF))

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Chris Capel
Subject: Re: Lisp code security
Date: Mon, 06 Dec 2004 16:40:46 +0000
Message-ID: <10r92qaei1r5t40@corp.supernews.com>

Pascal Bourguignon wrote:

> Or, said otherwise, complain to Intel, Motorola, Compaq and IBM that
> their processors don't do this kind of check in the hardware.  But
> they're not in the specialized processor market, they're in the
> general processor market.  If you want security, pay the price, slow
> down and have a virtual machine that implement the security you
> want. Check all operations on the stack pointer, check all pointer
> dereferences, check all arithmetic operations, etc.

I don't see why I should be complaining to the processor makers, when both
Sun and Microsoft have made compilers that provide these kind of guarantees
on conventional processors, as far as I know. The fact that they implement
a VM is moot, because both of the good VMs implement JIT compilers, and all
a bytcode interpreter (or lisp interpreter) is anyway is a runtime
compiler, so it's not a significant fact in any way in this discussion.

A lisp implementation's natively generated code can check all array bounds
and argument types just as much as its uncompiled code can. That's what
(declare (optimize ...)) is for. Is it guaranteed by ANSI? Maybe not.
Neither is garbage collection. What I'm interested in is what
implementations actually do in this regard.

Chris Capel

From: Christopher C. Stacy
Subject: Re: Lisp code security
Date: Mon, 06 Dec 2004 15:50:30 +0000
Message-ID: <uu0qzify1.fsf@news.dtpq.com>

Pascal Bourguignon <····@mouse-potato.com> writes:

> Chris Capel <······@iba.nktech.net> writes:
> 
> > Steven M. Haflich wrote:
> > 
> > > For example, car and cdr are safe because (in safe code) they are required
> > > to signal error if given a non-list argument.  From the ANS: "The
> > > functions car and cdr should signal type-error if they receive an argument
> > > which is not a list" and "should signal" is a term of art in the Error
> > > Terminology section.  But most of the arithmetic functions are not
> > > required to detect and signal errors, the ANS specifying on the class of
> > > any incorrect type error.  With sufficient knowledge of the implementation
> > > malicious code can construct illegal objects and manipulate pointers in
> > > ways that violate security.
> > 
> > Really? There's actually an implementation that you can do this with?
> 
> How do you think the compiler can do it?  All what the compiler can
> do, a dedicate user program can do.

I would claim that those are bugs in the implementation, 
and that it's not supposed to let you do that sort of thing,
even though "the consequences are undefined".

Which implementations actually do that, though?

From: Matthias Buelow
Subject: Re: Lisp code security
Date: Fri, 10 Dec 2004 01:11:26 +0000
Message-ID: <41B8F7BE.7050707@mukappabeta.de>

Pascal Bourguignon wrote:
> program you get informed immediately.  For collisions between the
> stack and the heap, it's more difficult.  And in any case, you're back

All contemporary Unix systems have at least one unmapped page between 
the top of the "heap" and the bottom of the stack, and stack overflows 
are always caught.  I bet that goes for Windows aswell.

-- 
   Matthias Buelow; ···@{mukappabeta,informatik.uni-wuerzburg}.de

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Fri, 10 Dec 2004 01:21:36 +0000
Message-ID: <87k6rrt0bj.fsf@thalassa.informatimago.com>

Matthias Buelow <···@mukappabeta.de> writes:

> Pascal Bourguignon wrote:
> > program you get informed immediately.  For collisions between the
> > stack and the heap, it's more difficult.  And in any case, you're back
> 
> All contemporary Unix systems have at least one unmapped page between
> the top of the "heap" and the bottom of the stack, and stack overflows
> are always caught.  I bet that goes for Windows aswell.

That does not help much. Try: alloca(1024*1024*1024);

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Fri, 10 Dec 2004 12:44:37 +0000
Message-ID: <ey3fz2exqyy.fsf@cley.com>

* Pascal Bourguignon wrote:

> That does not help much. Try: alloca(1024*1024*1024);

Are you assuming some broken implementation of alloca which does not
check?

--tim

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Fri, 10 Dec 2004 18:09:42 +0000
Message-ID: <874qiut47t.fsf@thalassa.informatimago.com>

Tim Bradshaw <···@cley.com> writes:

> * Pascal Bourguignon wrote:
> 
> > That does not help much. Try: alloca(1024*1024*1024);
> 
> Are you assuming some broken implementation of alloca which does not
> check?

What would you check exactly?


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Mon, 13 Dec 2004 11:40:38 +0000
Message-ID: <ey3brcyxw7d.fsf@cley.com>

* Pascal Bourguignon wrote:
> Tim Bradshaw <···@cley.com> writes:

>> Are you assuming some broken implementation of alloca which does not
>> check?

> What would you check exactly?

that it hasn't overflowed the stack.

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Mon, 13 Dec 2004 16:27:20 +0000
Message-ID: <87y8g2noyf.fsf@thalassa.informatimago.com>

Tim Bradshaw <···@cley.com> writes:

> * Pascal Bourguignon wrote:
> > Tim Bradshaw <···@cley.com> writes:
> 
> >> Are you assuming some broken implementation of alloca which does not
> >> check?
> 
> > What would you check exactly?
> 
> that it hasn't overflowed the stack.

So it would have to browse a table of mapped pages, the current value
of sbrk(), to identify the heap zones and the other stacks (eg. in
case of multithreading), and to check whether the stack pointer would
cross any heap zone or other stack.  That would quite defeat the
purpose of alloca which is to get a fast allocation on stack.  I don't
think many do.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Mon, 13 Dec 2004 17:44:32 +0000
Message-ID: <ey33byaxfcv.fsf@cley.com>

* Pascal Bourguignon wrote:

> So it would have to browse a table of mapped pages, the current value
> of sbrk(), to identify the heap zones and the other stacks (eg. in
> case of multithreading), and to check whether the stack pointer would
> cross any heap zone or other stack.  That would quite defeat the
> purpose of alloca which is to get a fast allocation on stack.  I don't
> think many do.

Um. No, of course it wouldn't.  It just needs to do a single bound
check (`am I trying to allocate beyond the end of the heap').  If that
fails it might have to do something complicated, but more likely just
return failure.

--tim

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Mon, 13 Dec 2004 19:34:02 +0000
Message-ID: <87llc2ngb9.fsf@thalassa.informatimago.com>

Tim Bradshaw <···@cley.com> writes:

> * Pascal Bourguignon wrote:
> 
> > So it would have to browse a table of mapped pages, the current value
> > of sbrk(), to identify the heap zones and the other stacks (eg. in
> > case of multithreading), and to check whether the stack pointer would
> > cross any heap zone or other stack.  That would quite defeat the
> > purpose of alloca which is to get a fast allocation on stack.  I don't
> > think many do.
> 
> Um. No, of course it wouldn't.  It just needs to do a single bound
> check (`am I trying to allocate beyond the end of the heap').  If that
> fails it might have to do something complicated, but more likely just
> return failure.

The limit is dependent on the stack (there are several stacks with
multithreading) and varies each time you call mmap, shmat, sbrk, etc.
If you allow FFI and foreign libraries, you cannot count on them to
update the stack limit table, so you'll have to scan more than what is
done currently. You'd have the choice between doing it at alloca time
or whenever a FFI returns, with perhaps some more optimization
depending on the actual stack you're on,  but this is not something
that's done by alloca(3).

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Tue, 14 Dec 2004 15:39:12 +0000
Message-ID: <ey3y8g0x527.fsf@cley.com>

* Pascal Bourguignon wrote:

> The limit is dependent on the stack (there are several stacks with
> multithreading) and varies each time you call mmap, shmat, sbrk, etc.
> If you allow FFI and foreign libraries, you cannot count on them to
> update the stack limit table, so you'll have to scan more than what is
> done currently. You'd have the choice between doing it at alloca time
> or whenever a FFI returns, with perhaps some more optimization
> depending on the actual stack you're on,  but this is not something
> that's done by alloca(3).

So, how do all the systems which treat stack pages specially (no
execute perm for instance) do it?  Let's see... they know the bounds
of the stack.  All the time.

--tim

From: Thomas F. Burdick
Subject: Re: Lisp code security
Date: Mon, 06 Dec 2004 18:25:05 +0000
Message-ID: <xcvpt1njncu.fsf@conquest.OCF.Berkeley.EDU>

Chris Capel <······@iba.nktech.net> writes:

> Steven M. Haflich wrote:
> 
> > For example, car and cdr are safe because (in safe code) they are required
> > to signal error if given a non-list argument.  From the ANS: "The
> > functions car and cdr should signal type-error if they receive an argument
> > which is not a list" and "should signal" is a term of art in the Error
> > Terminology section.  But most of the arithmetic functions are not
> > required to detect and signal errors, the ANS specifying on the class of
> > any incorrect type error.  With sufficient knowledge of the implementation
> > malicious code can construct illegal objects and manipulate pointers in
> > ways that violate security.
> 
> Really? There's actually an implementation that you can do this with?
> 
> > It is possible to build a fairly secure sandbox, but you would need to
> > implement nearly the entire sandbox language on top of Lisp and keep it
> > isolated from Lisp.
> 

> I don't think, for what I'm going to use it for, implementing CL entirely on
> top of CL will be efficient enough without going all the way. The executed
> code needs to run rather quickly. And that sounds like a lot of work! I was
> hoping it would be easier than that.

A simple CL->CL compiler isn't that much work, and you'll then have a
tiny CL subset that is effectively protected from accessing the host
CL.  Then you're back to your first question of selecting a subset of
the host to expose.

> It really seems that there *should* be some sort of guaranteed type-safety
> with normal optimization settings. What you're saying is that there really
> isn't? In real implementations, there isn't?

The standard requires some things to be safe, but not everything.
Your project isn't portable, though; it might end out being
easy-to-port, but it won't be portable ANSI CL.  SBCL makes a lot of
safety guarantees for safe code -- if safe code produces unsafe
behavior, it's a bug (unless you were doing alien or sap stuff).

> > BTW, one of the hard problems is stack overflow.  Most implementations
> > detect stack overflow most of the time, and some allow the limit of stack
> > growth to be controlled, it is hard to guarantee that stack overflow will
> > be detected in all circumstances.
> 
> This is another one of those things. It seems like there *should* be some
> sort of guarantee against a stack overflow trashing the lisp image.

Standards are compromises between all the users and all the
implementors.  Not everything that every user thinks is important is
going to go in.  So, just insist on using an implementation that has
this...

> Are there extenuating circumstances where this is particularly hard? I can't
> figure out how to make an infinitely recursing function in SBCL that
> doesn't throw a control-stack-exhausted-error.

... like SBCL.  You'll probably be interested in helping with improve
heap-exhaustion behavior, though.

From: Chris Capel
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 02:43:44 +0000
Message-ID: <10ra64uqnhn6l3b@corp.supernews.com>

Thomas F. Burdick wrote:

> Chris Capel <······@iba.nktech.net> writes:
> 
>> I don't think, for what I'm going to use it for, implementing CL entirely
>> on top of CL will be efficient enough without going all the way. The
>> executed code needs to run rather quickly. And that sounds like a lot of
>> work! I was hoping it would be easier than that.
> 
> A simple CL->CL compiler isn't that much work,

It doesn't comfort me that much that Thomas Burdick thinks a CL->CL compiler
wouldn't be that much work. :-)

On the other hand, I'm probably not clear on what you mean by "compiler".
What does it involve? Something like Pascal was talking about, where you
simply wrap functions like

(cl:defun cons (car cdr)
  (cl:if (cl:< *allocated* +max-mem+)
     (cl:cons car cdr)
     (error "Too much memory")))

(doing similar things for any other functions that cons) and

(cl:defun + (cl:&rest addends)
  (cl:dolist (x addends)
     ;;check the type and signal error if not number
     )
  (cl:apply #'cl:+ addends))

and so on? If this is what you mean by compiler, then it's not much more
than I was talking about originally. Couldn't you simply re-export many of
CL's symbols?

But I'm not clear on this point. We have Pascal over here saying

Pascal Bourguignon wrote:
> Yes if you use the system stack.
> No if you use this stack:
> 
>     (defun make-stack (size) (make-array (list size) :fill-pointer 0))
>     (defun stack-empty-p (stack) (zerop (fill-pointer stack)))
>     (defun stack-push (stack item) (vector-push item stack))
>     (defun stack-pop (stack) (vector-pop stack))
>     (defun stack-top (stack) (if (plusp (fill-pointer stack))
>                                  (aref stack (1- (fill-pointer stack)))
>                                  (error "Stack empty")))
> 
> 
> (one more good reason to implement it above common-lisp).

With an execution stack implemented entirely in CL (I think this is what
Pascal means), I don't see any other way than to completely rewrite every
function I want to include. Then I end up with some sort of
CL-to-lots-more-CL compiler that's not much faster than an interpreter. Is
this what you mean by implementing it above common lisp? 

Thomas F. Burdick wrote:
> ... and you'll then have a
> tiny CL subset that is effectively protected from accessing the host
> CL.

Would you call a subset that provided CL macros, lists, arrays, a lot of
CLOS, and all utility macros and functions "tiny"? At what point close to
"tiny" does including more functions dispreportionately increase the danger
of missing a security hole?

Chris Capel

From: Eric Daniel
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 18:47:30 +0000
Message-ID: <EdOdna8ivMNfZyjcRVn-qw@newedgenetworks.com>

In article <···············@corp.supernews.com>, Chris Capel wrote:
>  
>  On the other hand, I'm probably not clear on what you mean by "compiler".
>  What does it involve? Something like Pascal was talking about, where you
>  simply wrap functions like
>  
>  (cl:defun cons (car cdr)
>    (cl:if (cl:< *allocated* +max-mem+)
>       (cl:cons car cdr)
>       (error "Too much memory")))

I'm not Pascal, but I think something like this would work for your
purpose. Except that you couldn't track the allocated memory this way,
because you wouldn't know when garbage collections occur. You could
however limit the amount of consing (GC-ed or not) any user program is
allowed to perform.

[...]
>  and so on? If this is what you mean by compiler, then it's not much more
>  than I was talking about originally. Couldn't you simply re-export many of
>  CL's symbols?

It depends whether you a) trust the standard for requiring all the safety
checks that matter to you, and b) the implementation for actually
implementing these checks. If you don't, you would have to include these
checks in your version of cons, +, etc.

THEN you have to trust that c) after the safety checks are passed your
implementation doesn't produce buggy machine code.

Security is annoying :-)

Personally I would not let c) bother me, but would be mindful of a) and
b).

>  But I'm not clear on this point. We have Pascal over here saying
>  
>  Pascal Bourguignon wrote:
> > Yes if you use the system stack.
> > No if you use this stack:
> > 
> >     (defun make-stack (size) (make-array (list size) :fill-pointer 0))
> >     (defun stack-empty-p (stack) (zerop (fill-pointer stack)))
> >     (defun stack-push (stack item) (vector-push item stack))
> >     (defun stack-pop (stack) (vector-pop stack))
> >     (defun stack-top (stack) (if (plusp (fill-pointer stack))
> >                                  (aref stack (1- (fill-pointer stack)))
> >                                  (error "Stack empty")))
> > 
> > 
> > (one more good reason to implement it above common-lisp).
>  
>  With an execution stack implemented entirely in CL (I think this is what
>  Pascal means), I don't see any other way than to completely rewrite every
>  function I want to include. Then I end up with some sort of
>  CL-to-lots-more-CL compiler that's not much faster than an interpreter. Is
>  this what you mean by implementing it above common lisp? 

Strictly speaking, Pascal is right. But you could also limit the number of
recursive calls at run time (insert the appropriate check in your version
of DEFUN). You would still have to check what else your implementation
uses the stack for.

>  Thomas F. Burdick wrote:
> > ... and you'll then have a
> > tiny CL subset that is effectively protected from accessing the host
> > CL.
>  
>  Would you call a subset that provided CL macros, lists, arrays, a lot of
>  CLOS, and all utility macros and functions "tiny"? At what point close to
>  "tiny" does including more functions dispreportionately increase the danger
>  of missing a security hole?

I don't think anybody knows for sure... Look at the OpenBSD guys. They
set out specifically to produce a secure OS, and they still get holes
every once in a while.

Consider this: even if you rewrite CL-on-top-of-CL, you can still
introduce your own bugs. They wouldn't be related to the system stack or
wild pointers, but they could allow, say, the user to access or modify
objects that should be protected.

This being said, I think your project can come "close enough" to being
secure. It may take more work than the rest of your anthill simulation,
but it would have uses beyond that.

Actually I may need this for some other project. Would you like to work
together on it?

-- 
Eric Daniel

From: Christopher C. Stacy
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 20:02:24 +0000
Message-ID: <ubrd5khbj.fsf@news.dtpq.com>

1. Why not use the operating system facilities for limiting the amount 
of memory each process can use?  That sounds easier to me than
re-implementing and verifying a Common Lisp implementation to
add such a feature.

2. The way that Java addresses this problem should be instructive.

From: Chris Capel
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 05:30:18 +0000
Message-ID: <10rd49cecao1lc3@corp.supernews.com>

Eric Daniel wrote:

> In article <···············@corp.supernews.com>, Chris Capel wrote:
>>  
>>  On the other hand, I'm probably not clear on what you mean by
>>  "compiler". What does it involve? Something like Pascal was talking
>>  about, where you simply wrap functions like
>>  
>>  (cl:defun cons (car cdr)
>>    (cl:if (cl:< *allocated* +max-mem+)
>>       (cl:cons car cdr)
>>       (error "Too much memory")))
> 
> I'm not Pascal, but I think something like this would work for your
> purpose. Except that you couldn't track the allocated memory this way,
> because you wouldn't know when garbage collections occur. You could
> however limit the amount of consing (GC-ed or not) any user program is
> allowed to perform.

That gives me another idea. Given that we restrict the safety package to
running on implementations with real threads and with
total-allocated-memory type counters, and given that only one sandboxed
process is run at a time, it'd be possible to run a loop in a high-priority
thread that checks to make sure that the sandboxed code (a) hasn't run for
too long and (b) hasn't allocated more than X bytes, and even to keep up
with the total number of bytes allocated by that process and set a limit on
that. If it runs every five thousand cycles, there's not much a process can
do in that time.

>>  Pascal Bourguignon wrote:
>> > Yes if you use the system stack.
>> > No if you use this stack:
>> > 
>> >     (defun make-stack (size) (make-array (list size) :fill-pointer 0))
>> >     (defun stack-empty-p (stack) (zerop (fill-pointer stack)))
>> >     (defun stack-push (stack item) (vector-push item stack))
>> >     (defun stack-pop (stack) (vector-pop stack))
>> >     (defun stack-top (stack) (if (plusp (fill-pointer stack))
>> >                                  (aref stack (1- (fill-pointer stack)))
>> >                                  (error "Stack empty")))
>> > 
>> > 
>> > (one more good reason to implement it above common-lisp).
>>  
>>  With an execution stack implemented entirely in CL (I think this is what
>>  Pascal means), I don't see any other way than to completely rewrite
>>  every function I want to include. Then I end up with some sort of
>>  CL-to-lots-more-CL compiler that's not much faster than an interpreter.
>>  Is this what you mean by implementing it above common lisp?
> 
> Strictly speaking, Pascal is right.

Right that this is the only way to portably protect and limit the execution
stack?

In any case, I don't think this is feasible. You end up using the host lisp
process as a virtual machine in which to implement a child lisp! CL wasn't
designed to be a VM!

> This being said, I think your project can come "close enough" to being
> secure. It may take more work than the rest of your anthill simulation,
> but it would have uses beyond that.

That's what I was thinking.

> Actually I may need this for some other project. Would you like to work
> together on it?

I think I might get some first steps published in a Arch archive pretty soon
now. But I really wanted to get a basic simulation up and going first.
After I get the simulator itself down, which shouldn't be too terribly
hard, I'd be happy to collaborate.

Chris Capel
---
(concatenate 'string "pdf" "2" "3ds" ·@" "gmail" "." "com")

From: Eric Daniel
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 06:25:36 +0000
Message-ID: <9aKdnWZ4tJf9AyvcRVn-tQ@newedgenetworks.com>

In article <···············@corp.supernews.com>, Chris Capel wrote:
>  
>  Right that this is the only way to portably protect and limit the execution
>  stack?
>  

As far as I know:
1) there may be implementation-dependent way to know the current size
of the stack (e.g ROOM on CMUCL)

2) it doesn't guarantee safety, because what is needed is to predict that
an operation (function call, etc.) is about to blow the stack before it
happens!

So yes, for full portability you need to rool out your own call stack.
I'm not 100% sure how to do it. A good start would probably be
Chapter 20 on Graham's /On Lisp/ (Continuations).  That is, you would
perform a slightly complicated transformation on the user-provided code,
and compile the resulting Lisp code. This is, by the way, a testament to
the power of Lisp. I wouldn't even consider doing it in any other
language.

-- 
Eric Daniel

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 14:45:46 +0000
Message-ID: <87wtvsx2zp.fsf@thalassa.informatimago.com>

Chris Capel <······@iba.nktech.net> writes:
> Right that this is the only way to portably protect and limit the execution
> stack?
> 
> In any case, I don't think this is feasible. You end up using the host lisp
> process as a virtual machine in which to implement a child lisp! CL wasn't
> designed to be a VM!

Ah but you should not underestimate the power of virtual machines.

Common-Lisp is a general language, you can use it to implement a
virtual machine. It's even easy and fast to do. (Writting it in
Modula-2 could give a faster VM, but it would be harder and slower to
implement).

Now, the killer feature of Common-Lisp for virtual machines is JIT:
once you've got a sequence a VM instructions, you can build a
Common-Lisp function from the VM microcode and compile it, which would
produce an optimized native code for the original function.

Since you can do this with a page or two of Common-Lisp code, and
could not do this with less than a lot more of C (even if you tried to
system("gcc ...") and dlopen("lib...")). Well, if it was so simple in
C, Apple would have delivered Xcode ten years ago.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Thomas F. Burdick
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 21:24:28 +0000
Message-ID: <xcvk6rsjxf7.fsf@conquest.OCF.Berkeley.EDU>

Chris Capel <······@iba.nktech.net> writes:

> Thomas F. Burdick wrote:
> 
> > Chris Capel <······@iba.nktech.net> writes:
> > 
> >> I don't think, for what I'm going to use it for, implementing CL entirely
> >> on top of CL will be efficient enough without going all the way. The
> >> executed code needs to run rather quickly. And that sounds like a lot of
> >> work! I was hoping it would be easier than that.
> > 
> > A simple CL->CL compiler isn't that much work,
> 
> It doesn't comfort me that much that Thomas Burdick thinks a CL->CL compiler
> wouldn't be that much work. :-)
>
> On the other hand, I'm probably not clear on what you mean by "compiler".

I mean a program that takes input in the form of s-expressions, views
that input as the source code to a CL-safe program, and produces an
equivalent CL program.  You can then compile that CL code with the
host compiler.  The guts of it might look something like this:

  (defun compile-form (form env)
    (let ((form (macroexpand form env)))
      (cond
  	((symbolp form)
  	 (if (forbidden-symbol-p form)
  	     (error "Access to this symbol is forbidden: ~S" form)
  	     form))
  	((not (listp form))
  	 (if (forbidden-object-p form)
  	     (error "Access to this object is forbidden: ~S" form)
  	     form))
  	((special-operator-p (first form))
  	 (ecase (first form)
  	   (quote
  	     (if (forbidden-object-p (second form))
  		 (error "You get the idea.")
  		 `(quote ,(second form))))
  	   (unwind-protect
  	     (destructuring-bind (protected &rest cleanup) (rest form)
  	       `(unwind-protect
  		     ,(compile-form protected env)
                  ;; We want to be able to throw past user code if we're trying
                  ;; to abort their execution.
  		  (unless *aborting*
  		    ,@(mapcar (lambda (x) (compile-form x env))
  			      cleanup)))))
  	   ;; ... etc ...
  	   ))
  	((not (symbolp (first form)))
  	 (compile-form `(funcall #',(first form) ,@(rest form)) env))
  	;; If we get here, we have a normal function call
  	((or (find-lexical-function (first form) env)
  	     (safe-function-p (first form)))
  	 (list* (first form) (mapcar (lambda (x) (compile-form x env))
  				     (rest form))))
  	(t `(funcall (get-safe-fun ',(first form))
  		     ,@(mapcar (lambda (x) (compile-form x env))
  			       (rest form)))))))

I'm kind of fudging on the issue of lexical environments above, but if
you reuse your implementation's lexical environments, you can do it
like this.  Also, you'll might want to check for the safety of macros
before you expand them.  Now you can compile CL-safe functions to CL:

  (defun safe-eval (form)
    (let ((cl (compile-form form (make-null-environment))))
      (eval cl)))

The functions SAFE-FUNCTION-P, FORBIDDEN-SYMBOL-P, and
FORBIDDEN-OBJECT-P define the allowed subset, probably by looking
things up in tables.

> Thomas F. Burdick wrote:
> > ... and you'll then have a
> > tiny CL subset that is effectively protected from accessing the host
> > CL.
> 
> Would you call a subset that provided CL macros, lists, arrays, a lot of
> CLOS, and all utility macros and functions "tiny"? At what point close to
> "tiny" does including more functions dispreportionately increase the danger
> of missing a security hole?

I meant you'll have a tiny subset before you populate the tables used
by SAFE-FUNCTION-P, etc.  At first your CL-safe subset won't be able
to call any functions except those it defines itself.  So, no CONS, no
LIST, no ERROR.  Not all that useful, but the host Lisp is protected
from it.  Then you start exposing parts of the host Lisp, to make it
useful, which is the hard part, not the compiler per se.

Oh, and in SBCL, I'd consider any access to CLOS unsafe, because PCL
can really fuck itself up, breaking any CLOS-using code in the image.

From: Nikodemus Siivola
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 10:14:04 +0000
Message-ID: <cp3vpc$iqitf$1@midnight.cs.hut.fi>

Chris Capel <······@iba.nktech.net> wrote:

> Are there extenuating circumstances where this is particularly hard? I can't
> figure out how to make an infinitely recursing function in SBCL that
> doesn't throw a control-stack-exhausted-error.

In current SBCL you should not be able to: stack overflow protection is
implemented with a pair of protected pages at the end of the allocated stack
space: sbcl-devel definitely wants to know if you manage to bypass that
without first corrupting the image in some other way.

That said, the situation is _really_ slightly hairier. Try this on for size:

 (defun killer ()
    (unwind-protect
       (progn (killer) (killer))
     (killer)))

Not nice, eh?

Cheers,

 -- Nikodemus

From: Thomas F. Burdick
Subject: Re: Lisp code security
Date: Sun, 05 Dec 2004 16:57:29 +0000
Message-ID: <xcv3byklm2u.fsf@conquest.OCF.Berkeley.EDU>

Chris Capel <······@iba.nktech.net> writes:

> Hi everyone,
> 
> I'm interesting in defining a subset of Lisp that is safe in the sense that
> any arbitrary code written in the subset can be executed without fear of
> the code compromising the security of the system, or taking down the lisp
> image (absent impl. bugs), or accessing certain protected information in
> the lisp image, or hanging the lisp image in a tight loop, or doing other
> malicious things. The code would have to be verified to exist in that
> subset with a function that reads in the code from text (with read-time
> evaluation disabled, of course) and returns whether it's safe. Is it
> possible to define such a subset?

I think that's going about it inside-out.  Instead of trying to
sanitize Common Lisp, implement a safe Lisp dialect that runs inside a
CL image.  Then establish limits on all the resources your hosted lisp
is allowed to consume (this is obviously the tricky part, and is
definately an implementation-dependent problem).

> My initial thought would be that you just make sure every symbol is in the
> COMMON-LISP package, and that no symbol is INTERN, EVAL, COMPILE-FILE,
> MAKE-SYMBOL, or functions that do file I/O, (what am I missing?)

If you take the hosted lisp approach, you don't have to worry about,
say, EVAL, because the user's code that looks like (eval *foo*) will
be equivalent to:

  (funcall (get 'eval 'safe-lisp-fn) (get '*foo* 'safe-lisp-special))

not:

  (funcall #'eval (symbol-value '*foo*))

This came up earlier this year in this thread, which might be interesting reading:

  http://groups.google.com/groups?threadm=xcvad3ycvuw.fsf%40famine.OCF.Berkeley.EDU

An interpreter is stupid-simple to implement, but a compiler that
targets CL isn't very hard.  If you don't have a copy of _Paradigms of
Artificial Intelligence Programming_, I highly recommend getting one.

From: Wade Humeniuk
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 05:04:45 +0000
Message-ID: <NRatd.239752$df2.147282@edtnps89>

Chris Capel wrote:
> Hi everyone,
> 
> I'm interesting in defining a subset of Lisp that is safe in the sense that
> any arbitrary code written in the subset can be executed without fear of
> the code compromising the security of the system, or taking down the lisp
> image (absent impl. bugs), or accessing certain protected information in
> the lisp image, or hanging the lisp image in a tight loop, or doing other
> malicious things. The code would have to be verified to exist in that
> subset with a function that reads in the code from text (with read-time
> evaluation disabled, of course) and returns whether it's safe. Is it
> possible to define such a subset?

I think this problem is exactly equivalent to running any arbitrary Program
within an Operating System.  Even the best Operating Systems can get compromised
by either oversight, malice or pure random chance (say a cosmic ray randomly
mutating memory).  The general problem is so hard that no OS can handle it
fully.  Perhaps you could narrow down your scope in what you
specifically want to do?

Wade

From: Chris Capel
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 05:22:10 +0000
Message-ID: <10rafe1gekeq6b6@corp.supernews.com>

Wade Humeniuk wrote:

> Chris Capel wrote:
>> Hi everyone,
>> 
>> I'm interesting in defining a subset of Lisp that is safe in the sense
>> that any arbitrary code written in the subset can be executed without
>> fear of the code compromising the security of the system, or taking down
>> the lisp image (absent impl. bugs), or accessing certain protected
>> information in the lisp image, or hanging the lisp image in a tight loop,
>> or doing other malicious things. The code would have to be verified to
>> exist in that subset with a function that reads in the code from text
>> (with read-time evaluation disabled, of course) and returns whether it's
>> safe. Is it possible to define such a subset?
> 
> I think this problem is exactly equivalent to running any arbitrary
> Program
> within an Operating System.  Even the best Operating Systems can get
> compromised by either oversight, malice or pure random chance (say a
> cosmic ray randomly
> mutating memory).  The general problem is so hard that no OS can handle it
> fully.  Perhaps you could narrow down your scope in what you
> specifically want to do?

I'm not exactly sure the scope can be narrowed. What I'm planning on using
it for is a Terrarium-like[1] server process that communicates with other
servers over the internet automatically exchanging bits of creature AI code
and running them in a simulation. So I want to define a subset of CL that
can be verified so that the code that's exchanged can be guaranteed not to
do bad things, but can still be used to define a sophisticated and
efficient creature AI.

Terrarium itself was built by Microsoft as an example of the sort of
security built into the .NET framework. I don't know that it's exactly fair
to compare Lisp on this count, though, because the sort of code security
necessary to do this thing was one of the major goals of .NET, and it never
has been for Lisp. So Terrarium was built to showcase that aspect of .NET.
One might say it's a language designed for the application.

Of course, end the end, it's just a novelty. Maybe it would be more
important if a *real* application needed to do this sort of thing. 

Chris Capel

[1]
http://www.windowsforms.net/Applications/application.aspx?PageID=30&tabindex=8

From: Peter Seibel
Subject: Re: Lisp code security
Date: Tue, 07 Dec 2004 18:42:04 +0000
Message-ID: <m3vfberlvn.fsf@javamonkey.com>

Chris Capel <······@iba.nktech.net> writes:

> Wade Humeniuk wrote:
>
>> Chris Capel wrote:
>>> Hi everyone,
>>> 
>>> I'm interesting in defining a subset of Lisp that is safe in the sense
>>> that any arbitrary code written in the subset can be executed without
>>> fear of the code compromising the security of the system, or taking down
>>> the lisp image (absent impl. bugs), or accessing certain protected
>>> information in the lisp image, or hanging the lisp image in a tight loop,
>>> or doing other malicious things. The code would have to be verified to
>>> exist in that subset with a function that reads in the code from text
>>> (with read-time evaluation disabled, of course) and returns whether it's
>>> safe. Is it possible to define such a subset?
>> 
>> I think this problem is exactly equivalent to running any arbitrary
>> Program within an Operating System. Even the best Operating Systems
>> can get compromised by either oversight, malice or pure random
>> chance (say a cosmic ray randomly mutating memory). The general
>> problem is so hard that no OS can handle it fully. Perhaps you
>> could narrow down your scope in what you specifically want to do?
>
> I'm not exactly sure the scope can be narrowed. What I'm planning on
> using it for is a Terrarium-like[1] server process that communicates
> with other servers over the internet automatically exchanging bits
> of creature AI code and running them in a simulation.

But why do the creatures need to be implemented in full Common Lisp?
Surely they don't need to open files, manipulate pathnames, or any of
a number of other things.

> So I want to define a subset of CL that can be verified so that the
> code that's exchanged can be guaranteed not to do bad things, but
> can still be used to define a sophisticated and efficient creature
> AI.

Hmmmm. So having taken a quick glance at the Terrarium stuff I think
you're making your problem harder than the problem they solved--it
sounds like they just scan the code for certain "operations" which are
disallowed such as accessing the local disk or system registry. I
didn't see anything that suggested the Terrarium environment protects
itself from stack overflow, etc. I think you can get this level
security by walking the tree of source code you get and looking for
occurences of symbols that name bad operations. If you want to be
slick you write a real tree walker that recognizes special operators
and expands macros so you can tell if the suspect symbol is actually
being used as the name of an operation. But more conservatively you
just do a simple tree walk:

  (defun check-code (code)
    (cond
      ((null code) nil)
      ((symbolp code)
       (when (bad-name-p code))
         (error "Malicious code detected: ~s" code))
      ((atom code) nil)
      (t (check-code (car code))
         (check-code (cdr code)))))

(Of course with this implementation you've got to worry about stack
overflow while checking the code. ;-))

I'd be interested to know what Terrarium does about DOS attacks such
as creatures that allocate tons of memory or try to overflow the
stack. (The later is easier to deal with as long as the .NET runtime
signals an appropriate exception--you just have to watch out that
creature code can't touch shared objects that require mutual exclusion
to be thread safe since a creature thread could obtain a lock on an
object, temporarily violate its invariants, and then blow the stack,
leaving that object in a corrupted state. Out of memory is harder
since the final failure may affect an innocent thread that just
happens to allocate memory at the wrong time.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Chris Capel
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 04:40:14 +0000
Message-ID: <10rd1bg82h6uaa6@corp.supernews.com>

Peter Seibel wrote:

> Chris Capel <······@iba.nktech.net> writes:
> 
>> Wade Humeniuk wrote:
>>
>>> I think this problem is exactly equivalent to running any arbitrary
>>> Program within an Operating System. Even the best Operating Systems
>>> can get compromised by either oversight, malice or pure random
>>> chance (say a cosmic ray randomly mutating memory). The general
>>> problem is so hard that no OS can handle it fully. Perhaps you
>>> could narrow down your scope in what you specifically want to do?
>>
>> I'm not exactly sure the scope can be narrowed. What I'm planning on
>> using it for is a Terrarium-like[1] server process that communicates
>> with other servers over the internet automatically exchanging bits
>> of creature AI code and running them in a simulation.
> 
> But why do the creatures need to be implemented in full Common Lisp?
> Surely they don't need to open files, manipulate pathnames, or any of
> a number of other things.

No, not at all. But pathname and package functions are just two really big
holes. Once I've filled in all the big holes, there might be smaller ones.

>> So I want to define a subset of CL that can be verified so that the
>> code that's exchanged can be guaranteed not to do bad things, but
>> can still be used to define a sophisticated and efficient creature
>> AI.
> 
> Hmmmm. So having taken a quick glance at the Terrarium stuff I think
> you're making your problem harder than the problem they solved--it
> sounds like they just scan the code for certain "operations" which are
> disallowed such as accessing the local disk or system registry. I
> didn't see anything that suggested the Terrarium environment protects
> itself from stack overflow, etc.

The .NET runtime environment provides these guarantees. It's pretty
complicated, I understand. Full type safety, for one. Also, it assigns
different parts of the standard library to different permission groups, and
any given code can be executed with a reduced set of permissions based on
just about any kind of policy. Two assemblies (.dll's) that call each other
back and forth can run under two different permission sets, etc. etc.

I'm pretty sure it guarantees that no code that doesn't call out into
foreign functions can crash the runtime. Of course, if an application runs
out of memory, it's going down. But if a sub-process causes everything to
run out of memory, the parent can handle the OutOfMemoryException (called
when memory is low, I believe) and terminate some threads and free
resources so a GC can reclaim it all.

I was running a Terrarium server for a few months way back when it first
started. I never had it crash, or heard anything about there being
successful malware-creatures.

> (Of course with this implementation you've got to worry about stack
> overflow while checking the code. ;-))

Hehe. Good point.

> I'd be interested to know what Terrarium does about DOS attacks such
> as creatures that allocate tons of memory or try to overflow the
> stack.

Terrarium's only safety features that aren't provided by the runtime itself
were cutting off a creature's processing after a certain amount of time (2
to 5 ms). I could be wrong on this, of course.

> (The later is easier to deal with as long as the .NET runtime 
> signals an appropriate exception--you just have to watch out that
> creature code can't touch shared objects that require mutual exclusion
> to be thread safe since a creature thread could obtain a lock on an
> object, temporarily violate its invariants, and then blow the stack,
> leaving that object in a corrupted state. Out of memory is harder
> since the final failure may affect an innocent thread that just
> happens to allocate memory at the wrong time.

This is an interesting issue that I'm not sure about the answer to.

Chris Capel

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 10:13:41 +0000
Message-ID: <ey3wtvtxfl6.fsf@cley.com>

* Chris Capel wrote:
> I'm interesting in defining a subset of Lisp that is safe in the sense that
> any arbitrary code written in the subset can be executed without fear of
> the code compromising the security of the system, or taking down the lisp
> image (absent impl. bugs), or accessing certain protected information in
> the lisp image, or hanging the lisp image in a tight loop, or doing other
> malicious things. The code would have to be verified to exist in that
> subset with a function that reads in the code from text (with read-time
> evaluation disabled, of course) and returns whether it's safe. Is it
> possible to define such a subset?

I looked at doing this for CL subsets.  My conclusion was that (a) in
practice it is probably doable in a mildly implementation-dependent
way by controlling the subset of symbols you allow (you need to rely
on the implementation catching stack overflows, being able to kill
threads in tight loops and so on), and (b) if you want it done
*properly* you need a language & runtime designed with safety in mind
from the ground up, which CL obviously was not.

Java was intended to satisfy (b) I think, but even then there are
issues both with what you need to do and just plain bugs as far as I
can tell (I've never looked at Java's security stuff).

If I was doing this now I'd use one of the `container' facilities
offered by various systems as well - *BSD jails or Solaris containers
or something.

--tim

From: Pascal Bourguignon
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 15:11:52 +0000
Message-ID: <87pt1kx1s7.fsf@thalassa.informatimago.com>

Tim Bradshaw <···@cley.com> writes:

> * Chris Capel wrote:
> > I'm interesting in defining a subset of Lisp that is safe in the sense that
> > any arbitrary code written in the subset can be executed without fear of
> > the code compromising the security of the system, or taking down the lisp
> > image (absent impl. bugs), or accessing certain protected information in
> > the lisp image, or hanging the lisp image in a tight loop, or doing other
> > malicious things. The code would have to be verified to exist in that
> > subset with a function that reads in the code from text (with read-time
> > evaluation disabled, of course) and returns whether it's safe. Is it
> > possible to define such a subset?
> 
> I looked at doing this for CL subsets.  My conclusion was that (a) in
> practice it is probably doable in a mildly implementation-dependent
> way by controlling the subset of symbols you allow (you need to rely
> on the implementation catching stack overflows, being able to kill
> threads in tight loops and so on), and (b) if you want it done
> *properly* you need a language & runtime designed with safety in mind
> from the ground up, which CL obviously was not.
> 
> Java was intended to satisfy (b) I think, but even then there are
> issues both with what you need to do and just plain bugs as far as I
> can tell (I've never looked at Java's security stuff).
> 
> If I was doing this now I'd use one of the `container' facilities
> offered by various systems as well - *BSD jails or Solaris containers
> or something.


For a small example of (a), see my following implementation of AIM-8
lisp in Common-Lisp (and in AIM-8 lisp).  You don't need any
implementation-dependent stuff to control stack overflow, if the
underlying implementation is safe in this respect, just a
strategically placed HANDLER-CASE.  Only for timeout an implementation
specific API should be used:  WITH-TIMEOUT is provided in the
non-standard PROCESS package.


;;****************************************************************************
;;FILE:               aim-8.lisp
;;LANGUAGE:           Common-Lisp
;;SYSTEM:             Common-Lisp
;;USER-INTERFACE:     NONE
;;DESCRIPTION
;;    
;;    Implements the LISP described in AIM-8 in Common-Lisp.
;;    Usage:  (load "aim-8.lisp") 
;;            (aim-8:repl)
;;    Then at the aim-8 prompt, you have LISP, plus:
;;       (DEFINE name sexp)     corresponding to =
;;       (RELOAD)               to reload aim-8 if you edit it.
;;       (DUMP-ENVIRONMENT)     to dump the defined symbols.
;;       (LOAD "path")          to load an aim-8 source. Try "aim-8.aim-8".
;;    
;;AUTHORS
;;    <PJB> Pascal Bourguignon <···@informatimago.com>
;;MODIFICATIONS
;;    2004-10-24 <PJB> Created.
;;BUGS
;;LEGAL
;;    GPL
;;    
;;    Copyright Pascal Bourguignon 2004 - 2004
;;    
;;    This program is free software; you can redistribute it and/or
;;    modify it under the terms of the GNU General Public License
;;    as published by the Free Software Foundation; either version
;;    2 of the License, or (at your option) any later version.
;;    
;;    This program is distributed in the hope that it will be
;;    useful, but WITHOUT ANY WARRANTY; without even the implied
;;    warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
;;    PURPOSE.  See the GNU General Public License for more details.
;;    
;;    You should have received a copy of the GNU General Public
;;    License along with this program; if not, write to the Free
;;    Software Foundation, Inc., 59 Temple Place, Suite 330,
;;    Boston, MA 02111-1307 USA
;;****************************************************************************
;;    
;; AIM-8 -- 23 MARCH 1959 -- J. MCCARTHY


(DEFPACKAGE "AIM-8"
  (:USE "COMMON-LISP")
  (:EXPORT "REPL")
  (:DOCUMENTATION
   "Implements the lisp of AIM-8 -- 23 MARCH 1959 -- J. McCarthy"));;AIM-8
(IN-PACKAGE "AIM-8")


(DEFPACKAGE "AIM-8-USER"
  (:USE)
  (:IMPORT-FROM "AIM-8"
                "DEFINE" "LAMBDA" "LABEL"
                "COND"  "COMBINE" "FIRST" "REST"
                "NULL" "ATOM" "EQ" "NIL" "T" "QUOTE"));;AIM-8-USER

(DEFPARAMETER *ENVIRONMENT* (MAKE-HASH-TABLE :TEST (FUNCTION EQ)))
(DEFMACRO DEF     (NAME)       `(GETHASH ,NAME *ENVIRONMENT*))
(DEFUN   %BOUNDP  (NAME) (MULTIPLE-VALUE-BIND (VAL BND) (DEF NAME)
                          (DECLARE (IGNORE VAL)) BND))
(DEFMACRO DEFINE  (NAME VALUE) `(SETF (GETHASH ',NAME *ENVIRONMENT*) ',VALUE))
(DEFUN   FDEFINE  (NAME VALUE)  (SETF (GETHASH NAME *ENVIRONMENT*) VALUE))

(DEFINE NIL ())
(DEFINE F   ())
(DEFINE T   T)
(DEFINE AND     (LAMBDA (A B) (COND (A (COND (B T) (T NIL))) (T NIL))))
(DEFINE OR      (LAMBDA (A B) (COND (A T) (B T) (T NIL))))
(DEFINE NOT     (LAMBDA (A)   (COND (A NIL) (T NIL))))
(DEFINE MAPLIST 
        (LAMBDA (X F)
          (COND ((NULL X) NIL)
                (T (COMBINE (F X) (MAPLIST (REST X) F))))))
(DEFINE SUBST 
        (LAMBDA (X Y A)
          (COND ((NULL A) NIL)
                ((ATOM A) (COND ((EQ Y A) X) (T A)))
                (T (COMBINE (SUBST X Y (FIRST A))
                            (SUBST X Y (REST A))))
                )));;SUBST


(DEFUN %SUBST (X Y A)
  (COND ((NULL A) NIL)
        ((ATOM A) (COND ((EQ Y A) X) (T A)))
        (T (CONS (%SUBST X Y (FIRST A)) (%SUBST X Y (REST A))))))


(DEFUN %SUBSQ (X Y Z)
  (COND ((NULL Z) NIL)
        ((ATOM Z) (COND ((EQ Y Z) X)  (T Z)))
        ((EQ (FIRST Z) 'QUOTE) Z)
        (T (CONS (%SUBSQ X Y (FIRST Z)) (%SUBSQ X Y (REST Z))))));;%SUBSQ


(DEFUN %EVCON (C)
  (COND ((%EVAL (FIRST (FIRST C))) (%EVAL (FIRST (REST (FIRST C)))))
        (T (%EVCON (REST C)))))


(DEFUN %EVLAM (VARS EXP ARGS)
  (COND ((NULL VARS) (%EVAL EXP))
        (T (%EVLAM (REST VARS) (%SUBSQ (FIRST ARGS) (FIRST VARS) EXP)
                   (REST ARGS)))))


(DEFUN %APPLY (F ARGS) (%EVAL (CONS F ARGS)))


(DEFUN %EVAL (E)
  (COND
    ;; begin extensions:
    ((ATOM E) (COND ((%BOUNDP E) (DEF E))
                    (T (ERROR "Undefined: ~A" (FIRST E)))))
    ;; end extensions.
    (T (CASE (FIRST E)
         ((NULL)    (NULL  (%EVAL (FIRST (REST E)))))
         ((ATOM)    (ATOM  (%EVAL (FIRST (REST E)))))
         ((QUOTE)   (FIRST (REST E)))
         ((EQ)      (EQ    (%EVAL (FIRST (REST E)))
                           (%EVAL (FIRST (REST (REST E))))))
         ((COMBINE) (CONS  (%EVAL (FIRST (REST E)))
                           (%EVAL (FIRST (REST (REST E))))))
         ((FIRST)   (FIRST (%EVAL (FIRST (REST E)))))
         ((REST)    (REST  (%EVAL (FIRST (REST E)))))
         ((COND)    (%EVCON (REST E)))
         ;; begin extensions:
         ((LOAD)    (LOAD  (%EVAL (FIRST (REST E)))))
         ((PRINT)   (PRINT (%EVAL (FIRST (REST E)))))
         ((READ)    (READ))
         (OTHERWISE
          (COND
            ((ATOM (FIRST E))
             (COND ((%BOUNDP (FIRST E)) (%APPLY (DEF (FIRST E)) (REST E)))
                   (T (ERROR "Undefined: ~A" (FIRST E)))))
            ;; end extensions.
            (T (CASE (FIRST (FIRST E))
                 ((LAMBDA) (%EVLAM (FIRST (REST (FIRST E)))
                              (FIRST (REST (REST (FIRST E))))
                              (REST E)))
                 ((LABEL) (%EVAL (CONS (%SUBST (FIRST E)
                                               (FIRST (REST (FIRST E)))
                                               (FIRST (REST (REST (FIRST E)))))
                                       (REST E))))
                 (OTHERWISE (ERROR "Invalid: ~A" (FIRST E)))))))))));;%EVAL



(DEFUN HELP ()
  (FORMAT T "~&You've got:  
    DEFINE LAMBDA LABEL
    COND AND OR NOT  COMBINE FIRST REST
    NULL ATOM EQ NIL T QUOTE
    QUIT"));;HELP


(DEFMACRO HANDLING-ERRORS (&BODY BODY)
  `(HANDLER-CASE (PROGN ,@BODY)
     (ERROR (ERR)
            (APPLY (FUNCTION FORMAT) *ERROR-OUTPUT*
                   (SIMPLE-CONDITION-FORMAT-CONTROL ERR)
                   (SIMPLE-CONDITION-FORMAT-ARGUMENTS ERR)))));;HANDLING-ERRORS

(DEFUN REPL ()
  (LET ((*PACKAGE* (FIND-PACKAGE "AIM-8")))
    (HELP)
    (LOOP
       (TERPRI)
       (PRINC "AIM-8> ")
       (HANDLING-ERRORS
        (LET ((SEXP (READ)))
          (COND
            ((EQUAL SEXP '(QUIT))
             (FORMAT T "GOOD BYE") (RETURN-FROM REPL))
            ((EQUAL SEXP '(RELOAD))
             (LOAD "aim-8") (REPL) (RETURN-FROM REPL))
            ((EQUAL SEXP '(DUMP-ENVIRONMENT))
             (FORMAT T ·······@A = ~A~%~}" 
                     (LET ((RES '()))
                       (MAPHASH (LAMBDA (K V) (PUSH (LIST K V) RES)) 
                                *ENVIRONMENT*) RES)))
            ((AND (LISTP SEXP) (EQ (FIRST SEXP) 'DEFINE))
             (FDEFINE (SECOND SEXP) (THIRD SEXP))
             (FORMAT T "~A" (SECOND SEXP)))
            (T 
             (FORMAT T "~S" (%EVAL SEXP))))))))
  (TERPRI)
  (VALUES));;REPL
         


;; (in-package "COMMON-LISP-USER")
;; (aim-8:repl)

;;;; aim-8.lisp                       --                     --          ;;;;






;; -*- mode: Lisp -*-
;;****************************************************************************
;;FILE:               aim-8.aim-8
;;LANGUAGE:           AIM-8 LISP
;;SYSTEM:             AIM-8 LISP
;;USER-INTERFACE:     NONE
;;DESCRIPTION
;;    
;;    Implements the LISP described in AIM-8 in AIM-8 LISP
;;    Usage: At the AIM-8> prompt:  
;;      (LOAD (QUOTE "aim-8.aim-8"))
;;      (REPL)
;;    The AIM-8 evaluation algorithm is pure substitution, no binding!
;;    Therefore (read) is called every time it's _used_:
;;    you need to enter the same sexp several times to get it processed!
;;
;;      (EVAL (QUOTE (FF1 (QUOTE ((())()()((A)B C)))))
;;            (QUOTE ((FF1 LAMBDA (X) (COND ((OR (NULL X) (ATOM X)) X) 
;;                                 (T (COND ((FF1 (FIRST X)) (FF1 (FIRST X)))
;;                                          (T (FF1 (REST X)))))))))
;;   Note: Since this implements an substitution interpreter on an interpreter
;;   on a lisp system, it usually quite slow to compute the result of non
;;   trivial functions.
;;
;;AUTHORS
;;    <PJB> Pascal Bourguignon <···@informatimago.com>
;;MODIFICATIONS
;;    2004-10-24 <PJB> Created;
;;BUGS
;;LEGAL
;;    GPL
;;    
;;    Copyright Pascal Bourguignon 2004 - 2004
;;    
;;    This program is free software; you can redistribute it and/or
;;    modify it under the terms of the GNU General Public License
;;    as published by the Free Software Foundation; either version
;;    2 of the License, or (at your option) any later version.
;;    
;;    This program is distributed in the hope that it will be
;;    useful, but WITHOUT ANY WARRANTY; without even the implied
;;    warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
;;    PURPOSE.  See the GNU General Public License for more details.
;;    
;;    You should have received a copy of the GNU General Public
;;    License along with this program; if not, write to the Free
;;    Software Foundation, Inc., 59 Temple Place, Suite 330,
;;    Boston, MA 02111-1307 USA
;;****************************************************************************

;; NOTE: These functions are defined in the AIM-8/Common-Lisp environment
;;       and don't appear in the AIM-8/AIM-8 environment. 
;;       With (repl), you start with an empty environment...


(DEFINE NIL ())
(DEFINE F   ())
(DEFINE T   T)
(DEFINE AND     (LAMBDA (A B) (COND (A (COND (B T) (T NIL))) (T NIL))))
(DEFINE OR      (LAMBDA (A B) (COND (A T) (B T) (T NIL))))
(DEFINE NOT     (LAMBDA (A)   (COND (A NIL) (T NIL))))

(DEFINE MAPLIST 
        (LAMBDA (X F)
          (COND ((NULL X) NIL)
                (T (COMBINE (F X) (MAPLIST (REST X) F))))))

(DEFINE SUBST 
        (LAMBDA (X Y A)
          (COND ((NULL A) NIL)
                ((ATOM A) (COND ((EQ Y A) X) (T A)))
                (T (COMBINE (SUBST X Y (FIRST A))
                            (SUBST X Y (REST A)))))));;SUBST

(DEFINE SUBSQ
        (LAMBDA (X Y Z)
          (COND ((NULL Z) NIL)
                ((ATOM Z) (COND ((EQ Y Z) X)  (T Z)))
                ((EQ (FIRST Z) (QUOTE QUOTE)) Z)
                (T (COMBINE (SUBSQ X Y (FIRST Z)) (SUBSQ X Y (REST Z)))))));;SUBSQ


(DEFINE EVCON 
        (LAMBDA (C ENV)
          (COND ((EVAL (FIRST (FIRST C)) ENV)
                 (EVAL (FIRST (REST (FIRST C))) ENV))
                (T (EVCON (REST C) ENV)))));;EVCON


(DEFINE EVLAM 
        (LAMBDA  (VARS EXP ARGS ENV)
          (COND ((NULL VARS) (EVAL EXP ENV))
                (T (EVLAM (REST VARS) (SUBSQ (FIRST ARGS) (FIRST VARS) EXP)
                          (REST ARGS) ENV)))));;EVLAM


(DEFINE APPLY (LAMBDA (F ARGS ENV) (EVAL (COMBINE F ARGS) ENV)))


(DEFINE ASSOC
        (LAMBDA (ITEM LIST)
          (COND ((NULL LIST) LIST)
                ((EQ ITEM (first (first LIST))) (first LIST))
                (T (ASSOC ITEM (rest LIST))))));;ASSOC


(DEFINE APPEND 
        (LAMBDA (L1 L2)
          (COND ((NULL L2) L1)
                ((NULL L1) L2)
                (T (COMBINE (FIRST L1) (APPEND (REST L1) L2))))));;APPEND

(DEFINE LIST1 (LAMBDA (ON)       (COMBINE ON NIL)))
(DEFINE LIST2 (LAMBDA (ON TW)    (COMBINE ON (COMBINE TW NIL))))
(DEFINE LIST3 (LAMBDA (ON TW TH) (COMBINE ON (COMBINE TW (COMBINE TH NIL)))))
        
(DEFINE GET (LAMBDA (NAME ENV) (rest (ASSOC NAME ENV))))
(DEFINE SET (LAMBDA (NAME VALUE ENV) (COMBINE (COMBINE NAME VALUE) ENV)))
(DEFINE BOUNDP (LAMBDA (NAME ENV) (ASSOC NAME ENV)))

(DEFINE ERROR  (LAMBDA (MESSAGE VALUES) (COMBINE (QUOTE :ERROR)
                                            (APPEND MESSAGE VALUES))))


(DEFINE 
 EVAL 
 (LAMBDA (E ENV)
   (COND
     ((ATOM E) (COND ((BOUNDP E ENV) (GET E ENV))
                     (T (ERROR (LIST1 (QUOTE :UNDEFINED))
                               (LIST1 (FIRST E))))))
     ((EQ (FIRST E) (QUOTE NULL))    (NULL  (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE ATOM))    (ATOM  (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE QUOTE))   (FIRST (REST E)))
     ((EQ (FIRST E) (QUOTE EQ))      (EQ    (EVAL (FIRST (REST E)) ENV)
                                            (EVAL (FIRST (REST (REST E))) ENV)))
     ((EQ (FIRST E) (QUOTE COMBINE)) (COMBINE (EVAL (FIRST (REST E)) ENV)
                                            (EVAL (FIRST (REST (REST E))) ENV)))
     ((EQ (FIRST E) (QUOTE FIRST))   (FIRST (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE REST))    (REST  (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE COND))    (EVCON (REST E)))
     ((EQ (FIRST E) (QUOTE LOAD))    (LOAD  (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE PRINT))   (PRINT (EVAL (FIRST (REST E)) ENV)))
     ((EQ (FIRST E) (QUOTE READ))    (READ))
     ((ATOM (FIRST E))
      (COND ((BOUNDP (FIRST E) env) (APPLY (GET (FIRST E) ENV) (REST E) ENV))
            (T (ERROR (LIST1 (QUOTE :UNDEFINED))
                      (LIST1 (FIRST E))))))
     ((EQ (FIRST (FIRST E)) (QUOTE LAMBDA))
      (EVLAM (FIRST (REST (FIRST E)))
             (FIRST (REST (REST (FIRST E))))
             (REST E) ENV))
     ((EQ (FIRST (FIRST E)) (QUOTE LABEL))
      (EVAL (COMBINE (SUBST (FIRST E)
                         (FIRST (REST (FIRST E)))
                         (FIRST (REST (REST (FIRST E)))))
                  (REST E)) ENV))
     (T (ERROR (LIST1 (QUOTE :INVALID))
               (LIST1 (FIRST E)))))));;EVAL

(DEFINE REPL
         (LAMBDA (ENV)
           (COND
             ((PRINT (LIST3 (QUOTE AIM-8) (QUOTE IN) (QUOTE AIM-8>)))
              ((LAMBDA (SEXP ENV)
                 (COND
                   ((ATOM SEXP) (PRINT (EVAL SEXP ENV)) (REPL))
                   ((EQ (FIRST SEXP) (QUOTE DEFINE))
                    (REPL (SET (FIRST (REST SEXP))
                               (FIRST (REST (REST SEXP))))))
                   ((EQ (FIRST SEXP) (QUOTE QUIT))
                    (PRINT (QUOTE GOOD-BYE)))
                   (T (PRINT (EVAL SEXP ENV)) (REPL)))) (READ) ENV)))));;REPL

                          
        
;;;; aim-8.aim-8                      --                     --          ;;;;





-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.

From: Tim Bradshaw
Subject: Re: Lisp code security
Date: Wed, 08 Dec 2004 17:20:32 +0000
Message-ID: <ey3oeh4yae7.fsf@cley.com>

* Pascal Bourguignon wrote:

> For a small example of (a), see my following implementation of AIM-8
> lisp in Common-Lisp (and in AIM-8 lisp).  You don't need any
> implementation-dependent stuff to control stack overflow, if the
> underlying implementation is safe in this respect, just a
> strategically placed HANDLER-CASE.  

That's what I meant by `mildly implementation-dependent': you need to
know that the implementation is safe. In particular the code should
refuse to run on one that is not.

--tim