Is this legal ANSI CL behaviour?

From: Thomas Schilling
Subject: Is this legal ANSI CL behaviour?
Date: Sun, 24 Oct 2004 19:37:00 +0000
Message-ID: <opsgd3fyt71gy3cn@news.cis.dfn.de>

My general experience is that CL is so well designed that when functions
are given not-so-sane arguments, they mostly default to doing the Right
Thing (tm).

So this seems to me like a bug in acl62:

   CL-USER> (subseq "abcd" 0 40)
   ······@······@··@·····@······@··@·····@······@··@abcd"

I would have expected it to return "abcd", that is, to realize
automatically that 40 > (length "abcd") and to use the latter as the
upper bound, or at least to signal an error like clisp does.

regards

-ts

From: Christopher C. Stacy
Subject: Re: Is this legal ANSI CL behaviour?
Date: Sun, 24 Oct 2004 20:23:23 +0000
Message-ID: <u4qkjc1c4.fsf@news.dtpq.com>

"Thomas Schilling" <······@yahoo.de> writes:

> My general experience is that CL is so well designed that when
> functions are given not-so-sane arguments, they mostly default to
> doing the Right Thing (tm).
> 
> So this seems to me like a bug in acl62:
> 
>    CL-USER> (subseq "abcd" 0 40)
>    ······@······@··@·····@······@··@·····@······@··@abcd"
> 
> I would have expected it to return "abcd", that is, to realize
> automatically that 40 > (length "abcd") and to use the latter as the
> upper bound, or at least to signal an error like clisp does.

I don't think it's an ANSI compliance bug.

It's an error for the programmer to reference past the end of a sequence;
the consequences are undefined.  Many implementations signal an error.
Some sequence functions (for example, ELT) are specified more strictly: they
"should signal an error of type type-error if index is not a valid sequence
index for sequence."

I'm not sure how one easily learns all that using CLHS.
I found it out by starting with the X3J13 cleanup issue
SUBSEQ-OUT-OF-BOUNDS referenced on the SUBSEQ page.
(I'm therefore not 100% sure myself that it's true.)

Your suggestion of truncating the result sequence to the active length
doesn't seem like a better solution.  Since SUBSEQ is not documented to
signal an error in this situation, you must not be relying on that to
allow handling of the problem at runtime.  Therefore, you must be expecting
that this out-of-bounds situation represents a bug in the program.
I think that returning the semi-garbage answer you see above is actually
slightly more likely to help you debug than returning the truncated answer.
If you want a truncating version of SUBSEQ, you should make that explicit in
your program by giving it a new name.

I do think it would be optimal runtime behaviour if an error were signalled.
I am not sure what the advantage is of not doing so, at normal safety levels.
When the cleanup issue was written, all the "current practice" implementations
signalled an error.
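
A minimal sketch of such a separately named truncating variant. The name
SUBSEQ-CLAMPED is hypothetical; it is not part of ANSI CL or of any
implementation mentioned in this thread, and it only illustrates keeping
the truncating behaviour out of SUBSEQ itself:

```lisp
;; Hypothetical helper (assumed name, not a standard function):
;; a truncating SUBSEQ that clamps its bounds to the sequence.
(defun subseq-clamped (sequence start &optional end)
  "Like SUBSEQ, but clamp START and END into [0, (length SEQUENCE)]."
  (let* ((len (length sequence))
         ;; A missing END means the end of the sequence.
         (end (max 0 (min (or end len) len)))
         ;; START may not be negative and may not exceed END.
         (start (min (max start 0) end)))
    (subseq sequence start end)))

(subseq-clamped "abcd" 0 40)   ; => "abcd"
(subseq-clamped "abcd" 2)      ; => "cd"
(subseq-clamped "abcd" 10 40)  ; => ""
```

Because the clamping lives under its own name, a call to plain SUBSEQ with
bad bounds still shows up as the bug it is.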

From: Vladimir Sedach
Subject: Re: Is this legal ANSI CL behaviour?
Date: Sun, 24 Oct 2004 21:17:38 +0000
Message-ID: <87y8hww39f.fsf@shawnews.cg.shawcable.net>

Hi,

Take a look at the SUBSEQ-OUT-OF-BOUNDS issue in the HyperSpec.
http://www.lisp.org/HyperSpec/Issues/iss332-writeup.html

It didn't become part of the official standard, so in fact ACL is free
to give you that result (looks to me like it's just doing a mod length
on the string index and getting some header as a result - look at what
disassemble says to be sure). As Harald pointed out, chances are
this is because you have a low safety declaration. I'm running ACL 6.2
trial on Linux, and I haven't been able to reproduce this behavior (it
always throws an error). However CMUCL 19a returns "abcd" when you
compile the form in a function, and throws an error in interpreted
code, regardless of declarations. So these things are definitely
implementation-dependent ("pragmatically unportable code" as the issue
calls it).

Vladimir

From: Harald Hanche-Olsen
Subject: Re: Is this legal ANSI CL behaviour?
Date: Sun, 24 Oct 2004 19:52:33 +0000
Message-ID: <pcoy8hv51xa.fsf@shuttle.math.ntnu.no>

+ "Thomas Schilling" <······@yahoo.de>:

| My general experience is that CL is so well designed that when
| functions are given not-so-sane arguments, they mostly default to
| doing the Right Thing (tm).
| 
| So this seems to me like a bug in acl62:
| 
|    CL-USER> (subseq "abcd" 0 40)
|    ······@······@··@·····@······@··@·····@······@··@abcd"
| 
| I would have expected it to return "abcd", that is, to realize
| automatically that 40 > (length "abcd") and to use the latter as the
| upper bound, or at least to signal an error like clisp does.

I guess it would depend on the currently declared safety level.  Of
course, this is Invoking Undefined Behaviour, and a conforming
implementation might cause the universe to implode or all the toilets
in the White House to flush as a result.  But I think the only two
reasonable results would be the one you're seeing (if safety=0), or a
signaled error (if safety is larger).  In fact, checking the array
bounds seems like such a small task compared to allocating a new array
for the result anyway, that it would be reasonable to do the checking
in any case, and signal an error when appropriate.
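
As a rough illustration of how small that check is, here is a sketch of a
wrapper that validates the bounds before calling SUBSEQ. The name
CHECKED-SUBSEQ and the error message are assumptions made for this
example; a real implementation would do the check inside SUBSEQ and would
probably signal a more specific condition type:

```lisp
;; Hypothetical wrapper (assumed name): validate the bounding indices
;; first, so a bad request signals an error at any safety level.
(defun checked-subseq (sequence start &optional end)
  (let* ((len (length sequence))
         ;; A missing END means the end of the sequence.
         (stop (or end len)))
    ;; One comparison chain, versus allocating a whole new sequence.
    (unless (and (integerp start) (integerp stop)
                 (<= 0 start stop len))
      (error "Bounds ~S and ~S are invalid for a sequence of length ~D."
             start stop len))
    (subseq sequence start stop)))

(checked-subseq "abcd" 1 3)              ; => "bc"
(handler-case (checked-subseq "abcd" 0 40)
  (error () :out-of-bounds))             ; => :OUT-OF-BOUNDS
```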

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- Debating gives most of us much more psychological satisfaction
  than thinking does: but it deprives us of whatever chance there is
  of getting closer to the truth.  -- C.P. Snow

From: Adam Warner
Subject: Re: Is this legal ANSI CL behaviour?
Date: Sun, 24 Oct 2004 22:27:07 +0000
Message-ID: <pan.2004.10.24.22.27.06.997912@consulting.net.nz>

Hi Thomas Schilling,

> My general experience is that CL is so well designed that when functions
> are given not-so-sane arguments, they mostly default to doing the Right
> Thing (tm).
> 
> So this seems to me like a bug in acl62:
> 
>    CL-USER> (subseq "abcd" 0 40)
>    ······@······@··@·····@······@··@·····@······@··@abcd"
> 
> I would have expected it to return "abcd", that is, to realize
> automatically that 40 > (length "abcd") and to use the latter as the
> upper bound, or at least to signal an error like clisp does.

Regardless of the outcome of literal interpretations of the ANSI
specification, this is unacceptable behaviour in safe code.

I telnetted into prompt.franz.com and played around with some
examples:

[5] CL-USER(42): (compile nil (lambda ()
                                (declare (optimize (safety 3)))
                                'secret-password
                                (subseq "abcd" 101 116)))
[5] CL-USER(43): (funcall *)
"SECRET-PASSWORD"

This should be fixed as a security bug.

You can probably also use this bug to extract information about the
running process. By playing with some bounds I came across these paths:

/acl/alisp.dxl
/acl/devel.lic
/home/user5001/
/home/lisp/acl/acli623.tpl

A user who is granted access to a safely compiled version of SUBSEQ should
not be able to use the command to extract such information.

Regards,
Adam

From: Christopher C. Stacy
Subject: Re: Is this legal ANSI CL behaviour?
Date: Mon, 25 Oct 2004 05:25:02 +0000
Message-ID: <uzn2b9xox.fsf@news.dtpq.com>

Adam Warner <······@consulting.net.nz> writes:
> A user who is granted access to a safely compiled version of SUBSEQ
> should not be able to use the command to extract such information.

ANSI Common Lisp doesn't make any representations about anything 
like that -- it doesn't say anything about not being able to subvert
an implementation in any way, and in fact carefully says that entirely
undefined things (like that) can happen in many, many situations.

I guess it would be nice if it did what you suggest, though.

From: Adam Warner
Subject: Re: Is this legal ANSI CL behaviour?
Date: Mon, 25 Oct 2004 07:44:02 +0000
Message-ID: <pan.2004.10.25.07.43.59.123949@consulting.net.nz>

Hi Christopher C. Stacy,

> Adam Warner <······@consulting.net.nz> writes:
>> A user who is granted access to a safely compiled version of SUBSEQ
>> should not be able to use the command to extract such information.
> 
> ANSI Common Lisp doesn't make any representations about anything like
> that -- it doesn't say anything about not being able to subvert an
> implementation in any way, and in fact carefully says that entirely
> undefined things (like that) can happen in many, many situations.
> 
> I guess it would be nice if it did what you suggest, though.

It is implementations that make representations about how they handle
potentially undefined behaviour. My ideal ANSI conforming implementation
would be as safe as Java (bounds checking) with "Safe code" and as fast as
C (no bounds checking) with "Unsafe code".

ANSI Common Lisp doesn't have to change for an implementation to perform
sufficient bounds checking of safe code. It's certainly nice if an
implementation performs sufficient checking of safe code because then you
get a security model with information hiding and enforced APIs for free:
Closures.

Regards,
Adam
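
A small sketch of the closure-based information hiding described above.
The names MAKE-PASSWORD-CHECKER and *CHECK* are hypothetical and chosen
only for illustration. The hiding holds only on the assumption that safe
code cannot read arbitrary memory, which is exactly what an unchecked
SUBSEQ would break:

```lisp
;; Hypothetical example: SECRET is reachable only through the returned
;; closure, which answers yes or no and never returns the secret itself.
(defun make-password-checker (secret)
  "Return a function that tests a guess against SECRET."
  (lambda (guess)
    (string= guess secret)))

;; Callers receive the closure, not the string it closes over.
(defparameter *check* (make-password-checker "secret-password"))

(funcall *check* "guess")            ; => NIL
(funcall *check* "secret-password")  ; => true
```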

From: Vladimir Sedach
Subject: Re: Is this legal ANSI CL behaviour?
Date: Mon, 25 Oct 2004 20:41:14 +0000
Message-ID: <87y8hug8lj.fsf@shawnews.cg.shawcable.net>

It looks like the reason I haven't been able to reproduce this
behavior is that this bug has indeed been fixed (and, IMO, it's a
pretty nasty one!). My (lisp-implementation-version) => "6.2 [Linux
(x86)] (Aug 3, 2004 17:03)", but the one on prompt.franz.com is "6.2
[Linux (x86)] (May 21, 2003 12:21)".

Vladimir