From: Adam Warner
Subject: read-sequence
Date: 
Message-ID: <ala6ak$1opqvu$1@ID-105510.news.dfncis.de>
Hi all,

The ANSI definition of read-sequence has been interpreted as allowing
adjustable vectors with fill pointers to be ignored:

http://groups.google.co.nz/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&th=52b954f6340ea11f&seekm=HAIBLE.95Aug21192430%40laplace.ilog&frame=off

I just want to confirm that there is no safe way to use read-sequence on
*terminal-io* because there is no way to find out the size of the
*terminal-io* buffer.

Thanks,
Adam

From: Steven M. Haflich
Subject: Re: read-sequence
Date: 
Message-ID: <3D7C3507.4040405@alum.mit.edu>
Adam Warner wrote:
> The ANSI definition of read-sequence has been interpreted as allowing
> adjustable vectors with fill pointers to be ignored:
> 
> http://groups.google.co.nz/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&th=52b954f6340ea11f&seekm=HAIBLE.95Aug21192430%40laplace.ilog&frame=off

I don't follow the alleged argument.  In detail, citing from the ANS:

 From the read-sequence dictionary page:

   start, end - _bounding index designators_ of sequence.
   The defaults for start and end are 0 and nil, respectively.

 From the Glossary entry for _bounding index designators_:

   bounding index designator:
   (for a sequence) one of two objects that, taken together as an ordered pair,
   behave as a designator for bounding indices of the sequence; that is, they
   denote bounding indices of the sequence, and are either: an integer
   (denoting itself) and nil  (denoting the _length_ of the sequence), or two
   integers (each denoting themselves).

 From the Glossary entry for _length_:

   length:
   n. (of a sequence) the number of elements in the sequence. (Note that if
   the sequence is a vector with a fill pointer, its length is the same as the
   fill pointer even though the total allocated size of the vector might be
   larger.)

It clearly follows that read-sequence observes but does not modify the
fill-pointer of a vector with a fill pointer.  Nor may it adjust the vector
it is given.
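
A minimal sketch of that conclusion, using a string stream in place of a
real input source. The name FILL-POINTER-DEMO is made up for this
illustration; the behaviour is what the glossary definitions above imply,
so a conforming implementation should return 4, 4, 4 and 10.

```lisp
;; Hypothetical demo: read-sequence sees only the active part of a
;; vector with a fill pointer and leaves the fill pointer alone.
(defun fill-pointer-demo ()
  (let ((buf (make-array 10 :element-type 'character
                            :fill-pointer 4
                            :initial-element #\-)))
    (with-input-from-string (in "abcdefghij")
      (let ((pos (read-sequence buf in)))
        ;; Values: first index not updated, length as seen by the
        ;; sequence functions, the fill pointer, the allocated size.
        (values pos
                (length buf)
                (fill-pointer buf)
                (array-dimension buf 0))))))
```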

> I just want to confirm that there is no safe way to use read-sequence on
> *terminal-io* because there is no way to find out the size of the
> *terminal-io* buffer.

This premise makes no sense at all.  The notion of a stream "buffer" is
addressed nowhere in the ANS (except for the existence of force-output and
friends).  Whether any given stream is or is not buffered is implementation
dependent, and none of the portable behavior of streams depends on
buffering.  In particular, the buffer that any particular stream might or
might not have has nothing to do with the sequence argument to read-sequence.
Whether the read-sequence request can be satisfied with data from an
internal buffer, or whether the buffer must be refilled many times to
satisfy the request, is all invisible to the calling code.  read-sequence
can read fewer elements than requested only in the case of eof.  If the
implementation's array-dimension-limit allows you to allocate a string
of 2^20 characters, then a read-sequence to that string is required to
fill that entire string before returning, unless eof is encountered,
even if the stream is *terminal-io* connected to a slow typist.
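
As a small illustration of the eof case (SHORT-READ-DEMO is a made-up
name, and a string stream stands in for the terminal): the return value
is the index of the first element that was not updated, and the rest of
the string is left untouched.

```lisp
;; Hypothetical demo: a short read happens only because the stream
;; runs out of data before the sequence is full.
(defun short-read-demo ()
  (let ((buf (make-string 10 :initial-element #\.)))
    (with-input-from-string (in "abc")
      (let ((pos (read-sequence buf in)))
        ;; POS is 3; positions 3..9 still hold the initial #\. filler.
        (values pos buf)))))
```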

read-sequence is not the low-level buffering function you imagine it to
be.  read-sequence can be used (with care) to speed IO, but it is
basically a failed attempt to integrate efficient, buffered IO within
the original CLtL1 IO paradigm.  Usually when one writes an application
that depends upon high-speed high-volume IO, it will be necessary to
operate below the level of the existing portable ANS API.
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <alhga4$1pdkke$1@ID-105510.news.dfncis.de>
Hi Steven M. Haflich,

> Adam Warner wrote:
>> The ANSI definition of read-sequence has been interpreted as allowing
>> adjustable vectors with fill pointers to be ignored:
>> 
>> http://groups.google.co.nz/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&th=52b954f6340ea11f&seekm=HAIBLE.95Aug21192430%40laplace.ilog&frame=off

<snip>

> It clearly follows that read-sequence observes but does not modify the
> fill-pointer of a vector with a fill pointer.  Nor may it adjust the
> vector it is given.

Fine, you concur with the interpretation that I loosely paraphrased. It
was wishful thinking on my part that a vector with a fill pointer could be
extended by read-sequence. I had to check archives to confirm my
misinterpretation of the spec.
 
>> I just want to confirm that there is no safe way to use read-sequence
>> on *terminal-io* because there is no way to find out the size of the
>> *terminal-io* buffer.
> 
> This premise makes no sense at all. <snip>

> read-sequence is not the low-level buffering function you imagine it to
> be.  read-sequence can be used (with care) to speed IO, but it is
> basically a failed attempt to integrate efficient, buffered IO within
> the original CLtL1 IO paradigm.  Usually when one writes an application
> that depends upon high-speed high-volume IO, it will be necessary to
> operate below the level of the existing portable ANS API.

Let me explain why it could make sense:

In CGI programming data being POSTed is sent to the CGI program via the
terminal. It would be ideal to read this data in one step into a sequence
using read-sequence. But unlike a file on a hard disk there appears to be
no way to determine the size of the data. We conceptually know it is a
block of data produced by one web request. But I know of no way to find
out the size of that data so I can first set the size of the vector.

If read-sequence would just auto-extend the vector this would not be a
problem.

There is a way around this. You use read-sequence in stages, manually
extending the vector each time if there is still data remaining on stdin.

Lisp is so high level that I just get surprised when this kind of
functionality is not built in. You have these really cool data types like
adjustable vectors with fill pointers that it would be nice to leverage.

If the functionality did exist it would probably be named
READ-SEQUENCE-EXTEND (to mirror the handy VECTOR-PUSH-EXTEND).
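
A rough sketch of the staged approach described above. READ-SEQUENCE-EXTEND
is not part of ANSI Common Lisp; the definition and its CHUNK-SIZE keyword
are assumptions made for illustration. It expects an adjustable vector
with a fill pointer and reads until eof, so it would block on a stream
that never signals eof.

```lisp
;; Hypothetical helper, not a standard function: append everything
;; remaining on STREAM to VECTOR, growing VECTOR as needed.
(defun read-sequence-extend (vector stream &key (chunk-size 4096))
  (loop
    (let* ((start (fill-pointer vector))
           (end (+ start chunk-size)))
      ;; Make room for one more chunk; at least double the allocation
      ;; so that the total copying stays linear.
      (when (< (array-dimension vector 0) end)
        (setf vector
              (adjust-array vector
                            (max end (* 2 (array-dimension vector 0))))))
      ;; Expose the new region to read-sequence, then pull the fill
      ;; pointer back to the first element that was not updated.
      (setf (fill-pointer vector) end)
      (let ((pos (read-sequence vector stream :start start :end end)))
        (setf (fill-pointer vector) pos)
        (when (< pos end)               ; short read means eof
          (return vector))))))
```

A caller would pass something like (make-array 0 :element-type 'character
:adjustable t :fill-pointer 0) as the vector.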

Regards,
Adam
From: Len Charest
Subject: Re: read-sequence
Date: 
Message-ID: <alj4n2$i34$1@nntp1.jpl.nasa.gov>
Adam Warner wrote:

> In CGI programming data being POSTed is sent to the CGI program via the
> terminal. [...]  But unlike a file on a hard disk there appears to be
> no way to determine the size of the data. We conceptually know it is a
> block of data produced by one web request. But I know of no way to find
> out the size of that data so I can first set the size of the vector.

If it's CGI, you have the CONTENT_LENGTH environment variable. If you 
don't have CONTENT_LENGTH, it's not CGI.

http://www.w3.org/CGI/
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <alj8bv$1qk8oh$1@ID-105510.news.dfncis.de>
Hi Len Charest,

> Adam Warner wrote:
> 
>> In CGI programming data being POSTed is sent to the CGI program via the
>> terminal. [...]  But unlike a file on a hard disk there appears to be
>> no way to determine the size of the data. We conceptually know it is a
>> block of data produced by one web request. But I know of no way to find
>> out the size of that data so I can first set the size of the vector.
> 
> If it's CGI, you have the CONTENT_LENGTH environment variable. If you
> don't have CONTENT_LENGTH, it's not CGI.
> 
> http://www.w3.org/CGI/

Yes the CONTENT_LENGTH environment variable is even one of the ways of
distinguishing a POST request from a GET request. Furthermore it is the
canonical way of reading in the data because you will not be told when
there is an EOF:

http://hoohoo.ncsa.uiuc.edu/cgi/forms.html

   If your form has METHOD="POST" in its FORM tag, your CGI program will
   receive the encoded form input on stdin. The server will NOT send you
   an EOF on the end of the data, instead you should use the environment
   variable CONTENT_LENGTH to determine how much data you should read from
   stdin.

Even though a single READ-LINE on x-www-form-urlencoded seems to work I
should follow this advice. If I don't perhaps stdin could run dry before
all the data has been read.

Regards,
Adam
From: Erann Gat
Subject: Re: read-sequence
Date: 
Message-ID: <gat-0909021633030001@k-137-79-50-101.jpl.nasa.gov>
In article <···············@ID-105510.news.dfncis.de>, "Adam Warner"
<······@consulting.net.nz> wrote:

> Hi Len Charest,
> 
> > Adam Warner wrote:
> > 
> >> In CGI programming data being POSTed is sent to the CGI program via the
> >> terminal. [...]  But unlike a file on a hard disk there appears to be
> >> no way to determine the size of the data. We conceptually know it is a
> >> block of data produced by one web request. But I know of no way to find
> >> out the size of that data so I can first set the size of the vector.
> > 
> > If it's CGI, you have the CONTENT_LENGTH environment variable. If you
> > don't have CONTENT_LENGTH, it's not CGI.
> > 
> > http://www.w3.org/CGI/
> 
> Yes the CONTENT_LENGTH environment variable is even one of the ways of
> distinguishing a POST request from a GET request. Furthermore it is the
> canonical way of reading in the data because you will not be told when
> there is an EOF:
> 
> http://hoohoo.ncsa.uiuc.edu/cgi/forms.html
> 
>    If your form has METHOD="POST" in its FORM tag, your CGI program will
>    receive the encoded form input on stdin. The server will NOT send you
>    an EOF on the end of the data, instead you should use the environment
>    variable CONTENT_LENGTH to determine how much data you should read from
>    stdin.
> 
> Even though a single READ-LINE on x-www-form-urlencoded seems to work I
> should follow this advice. If I don't perhaps stdin could run dry before
> all the data has been read.
> 
> Regards,
> Adam

Or, even worse, the client could send you *more* than the advertised
number of bytes without an EOL as a denial-of-service attack.  If you care
about security you should never use anything but read-char or read-byte
directly on a TCP stream.

E.
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <aljdru$1p3l3s$1@ID-105510.news.dfncis.de>
Hi Erann Gat,

>> Yes the CONTENT_LENGTH environment variable is even one of the ways of
>> distinguishing a POST request from a GET request. Furthermore it is the
>> canonical way of reading in the data because you will not be told when
>> there is an EOF:
>> 
>> http://hoohoo.ncsa.uiuc.edu/cgi/forms.html
>> 
>>    If your form has METHOD="POST" in its FORM tag, your CGI program will
>>    receive the encoded form input on stdin. The server will NOT send you
>>    an EOF on the end of the data, instead you should use the environment
>>    variable CONTENT_LENGTH to determine how much data you should read from
>>    stdin.
>> 
>> Even though a single READ-LINE on x-www-form-urlencoded seems to work I
>> should follow this advice. If I don't perhaps stdin could run dry before
>> all the data has been read.
>> 
>> Regards,
>> Adam
> 
> Or, even worse, the client could send you *more* than the advertised
> number of bytes without an EOL as a denial-of-service attack.  If you care
> about security you should never use anything but read-char or read-byte
> directly on a TCP stream.

Apache is the front end shielding me from many of those concerns Erann. I just
have to read the data from stdin:

;;Check for a type of POST request (*post-raw-data* becomes non-nil)
;;At this stage I only accept CONTENT_TYPE=application/x-www-form-urlencoded
;;
;;If I expand my parsing to a binary encoded request I will need to set
;;terminal encoding to a faithfully reproducing character set. It is not
;;necessary in this case because x-www-form-urlencoded data is encoded in
;;ASCII and compatible with my default UTF-8 terminal encoding.
;;
(defparameter *post-raw-data* nil)
(when (and (string= (getenv "CONTENT_TYPE") "application/x-www-form-urlencoded")
	   (getenv "CONTENT_LENGTH"))
  (setf *post-raw-data* (make-string (parse-integer (getenv "CONTENT_LENGTH"))))
  (read-sequence *post-raw-data* *terminal-io*))

Note that GETENV is a CLISP extension.

I could check for a CONTENT_LENGTH "too large" and abort but there is now no
chance of reading more data than CONTENT_LENGTH. However I strongly
suspect Apache would never supply more data than CONTENT_LENGTH. Still, it
is better to have security in depth.
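
One way the "too large" check could look, as a sketch only. READ-POST-BODY
and its MAX-LENGTH keyword are hypothetical names; the function takes the
CONTENT_LENGTH string as an argument so that it does not depend on the
CLISP-specific GETENV. It also assumes one character per byte, which
holds for ASCII x-www-form-urlencoded data.

```lisp
;; Hypothetical helper: read a POST body of the advertised length,
;; refusing lengths outside 0..MAX-LENGTH.
(defun read-post-body (content-length-string stream
                       &key (max-length 65536))
  (let ((n (and content-length-string
                (parse-integer content-length-string :junk-allowed t))))
    (cond ((null n) nil)                ; missing or non-numeric length
          ((not (<= 0 n max-length))
           (error "CONTENT_LENGTH ~D is outside 0..~D" n max-length))
          (t
           (let* ((body (make-string n))
                  (pos (read-sequence body stream)))
             ;; If eof arrived early, return only what was read.
             (if (< pos n) (subseq body 0 pos) body))))))
```

With CLISP the call might be
(read-post-body (getenv "CONTENT_LENGTH") *terminal-io*).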

Thanks for the advice.

Regards,
Adam
From: Erann Gat
Subject: Re: read-sequence
Date: 
Message-ID: <gat-0909022328290001@192.168.1.50>
In article <···············@ID-105510.news.dfncis.de>, "Adam Warner"
<······@consulting.net.nz> wrote:

> Apache is the front end shielding me from many of those concerns

Oh, yeah, right.  Duh.  CGI.

I did all my Lisping on a Mac where luxuries such as Apache weren't
available in my day (pre OSX), so I had to write my own server.  That got
me locked into a certain mindset, particularly with regards to security
(which has become a rather touchy issue around here of late.  :-(

E.
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey3it1e2sok.fsf@cley.com>
* Erann Gat wrote:

> I did all my Lisping on a Mac where luxuries such as Apache weren't
> available in my day (pre OSX), so I had to write my own server.  That got
> me locked into a certain mindset, particularly with regards to security
> (which has become a rather touchy issue around here of late.  :-(

I'd *still* try and be paranoid - just because there's Apache between
you and the world doesn't mean it's not full of bugs (:-).

--tim
From: John Wiseman
Subject: Re: read-sequence
Date: 
Message-ID: <m2fzwhn2ce.fsf@server.local.lemon>
"Adam Warner" <······@consulting.net.nz> writes:

> Even though a single READ-LINE on x-www-form-urlencoded seems to
> work I should follow this advice. If I don't perhaps stdin could run
> dry before all the data has been read.

I'm guessing that using read-line is also pretty non-portable because
it uses the newline conventions of the lisp implementation, not HTTP
(which uses ASCII CRLF).


John Wiseman
From: Christopher Browne
Subject: Re: read-sequence
Date: 
Message-ID: <allg9n$1qoor5$4@ID-125932.news.dfncis.de>
Oops! John Wiseman <·······@server.local.lemon> was seen spray-painting on a wall:
> "Adam Warner" <······@consulting.net.nz> writes:
>
>> Even though a single READ-LINE on x-www-form-urlencoded seems to
>> work I should follow this advice. If I don't perhaps stdin could run
>> dry before all the data has been read.
>
> I'm guessing that using read-line is also pretty non-portable because
> it uses the newline conventions of the lisp implementation, not HTTP
> (which uses ASCII CRLF).

I'd expect to be able to modify Newline on a stream-by-stream basis.

That certainly _seems_ to be something that an External File Format
would be intended for...
-- 
(concatenate 'string "cbbrowne" ·@ntlug.org")
http://cbbrowne.com/info/sap.html
Did you  hear about the  Buddhist who refused his  dentist's novocaine
during root canal work? He wanted to transcend dental medication.
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <alm92k$1rkkvt$1@ID-105510.news.dfncis.de>
Hi John Wiseman,

>> Even though a single READ-LINE on x-www-form-urlencoded seems to work I
>> should follow this advice. If I don't perhaps stdin could run dry
>> before all the data has been read.
> 
> I'm guessing that using read-line is also pretty non-portable because it
> uses the newline conventions of the lisp implementation, not HTTP (which
> uses ASCII CRLF).

As I mentioned earlier John any newline characters are escaped in
urlencoded form. That's why read-line reads in all available data. So the
fact that HTTP uses CRLF is a job for the decoder function, not read.

For example this is a result from entering three lines of text in a
textarea named address:

   address=Line+1%0D%0ALine+2%0D%0ALine+3

The CRLF combinations are escaped as hex (and the spaces become +). The
fact that the lines are separated by %0D%0A isn't going to bother
read-line.
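
A sketch of what such a decoder function might look like. URL-DECODE is a
made-up name, and the code assumes every %XX escape stands for a single
character code (fine for ASCII; multi-byte encodings such as UTF-8 would
need an extra decoding step).

```lisp
;; Hypothetical decoder: "+" becomes a space, "%XX" becomes the
;; character with hex code XX, everything else is copied unchanged.
(defun url-decode (string)
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((ch (char string i)))
                 (cond ((char= ch #\+)
                        (write-char #\Space out)
                        (incf i))
                       ;; A well-formed escape: % plus two hex digits.
                       ((and (char= ch #\%)
                             (<= (+ i 3) n)
                             (digit-char-p (char string (+ i 1)) 16)
                             (digit-char-p (char string (+ i 2)) 16))
                        (write-char
                         (code-char (parse-integer string
                                                   :start (+ i 1)
                                                   :end (+ i 3)
                                                   :radix 16))
                         out)
                        (incf i 3))
                       (t
                        (write-char ch out)
                        (incf i))))))))
```

Applied to the value above, "Line+1%0D%0ALine+2%0D%0ALine+3", this gives
three lines separated by CR LF pairs.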

Regards,
Adam
From: Erik Naggum
Subject: Re: read-sequence
Date: 
Message-ID: <3240579789398792@naggum.no>
* Adam Warner
| There is a way around this. You use read-sequence in stages, manually
| extending the vector each time if there is still data remaining on stdin.

  This is massively wasteful.  If you have this kind of need, which indicates
  that something is wrong at the outset, read a bunch of chunks and
  concatenate them at the end.  The function `concatenate' should compute the
  sum of the lengths of its arguments and copy each chunk only once.
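
A sketch of the chunk-and-concatenate approach. READ-ALL-CHUNKS and its
CHUNK-SIZE keyword are made-up names. It uses REPLACE into one
preallocated string rather than applying `concatenate' to a list, only to
stay clear of CALL-ARGUMENTS-LIMIT; each chunk is still copied exactly
once.

```lisp
;; Hypothetical helper: read STREAM to eof in fixed-size chunks, then
;; assemble the result with a single copy of each chunk.
(defun read-all-chunks (stream &key (chunk-size 4096))
  (let ((chunks '())                    ; (chunk . used-length), newest first
        (total 0))
    (loop
      (let* ((chunk (make-string chunk-size))
             (pos (read-sequence chunk stream)))
        (push (cons chunk pos) chunks)
        (incf total pos)
        (when (< pos chunk-size)        ; short read means eof
          (return))))
    (let ((result (make-string total))
          (offset total))
      ;; The list is newest first, so fill RESULT from the end.
      (dolist (entry chunks result)
        (decf offset (cdr entry))
        (replace result (car entry)
                 :start1 offset :end2 (cdr entry))))))
```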

| Lisp is so high level that I just get surprised when this kind of
| functionality is not built in.

  It is because it is high-level that you do not need these things.  It
  appears that you are still thinking in very low-level terms.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Rob Warnock
Subject: Re: read-sequence
Date: 
Message-ID: <unq3haff9ukt4c@corp.supernews.com>
Adam Warner <······@consulting.net.nz> wrote:
+---------------
| In CGI programming data being POSTed is sent to the CGI program via the
| terminal. It would be ideal to read this data in one step into a sequence
| using read-sequence. But unlike a file on a hard disk there appears to be
| no way to determine the size of the data. We conceptually know it is a
| block of data produced by one web request. But I know of no way to find
| out the size of that data so I can first set the size of the vector.
+---------------

Read RFC 2616 "Hypertext Transfer Protocol -- HTTP/1.1"
<URL:ftp://ftp.isi.edu/in-notes/rfc2616.txt>:

    4.3 Message Body
	...
	The presence of a message-body in a request is signaled by the
	inclusion of a Content-Length or Transfer-Encoding header field
	in the request's message-headers.
	...
    4.4 Message Length
	...
	When a message-body is included with a message, the transfer-length
	of that body is determined by one of the following (in order of
	precedence):

	1. Any response message which "MUST NOT" include a message-body...

	2. ...Transfer-Encoding header field... [e.g., "chunked" encoding,
	   which carries a length marker per chunk]

	3. ...Content-Length header field...

	4. If the message uses the media type "multipart/byteranges",
	   and the transfer-length is not otherwise specified, then
	   this self-delimiting media type defines the transfer-length.

	5. By the server closing the connection. (Closing the connection
	   cannot be used to indicate the end of a request body, since
	   that would leave no possibility for the server to send back
	   a response.)
    ...
    7.2.2 Entity Length
	The entity-length of a message is the length of the message-body
	before any transfer-codings have been applied. Section 4.4 defines
	how the transfer-length of a message-body is determined.
    ...
    9.5 POST
	The POST method is used to request that the origin server accept
	the entity enclosed in the request...

That is, the client side is *required* to send the length -- in some form
or another, whether Content-Length, chunked transfers, or byteranges --
for any entity (message body) it sends with a "POST". If the client uses
Content-Length, then you know *before* you start reading the data exactly
how long it is. If it uses chunked or byteranges, you at least know how
big each chunk or range is before reading it (and may need to concatenate
them later).

If your server is not prepared to deal with the full generality of
HTTP/1.1 chunks/ranges, you might want to consider having it declare
itself as an HTTP/1.0 server, in which case:

	<URL:ftp://ftp.isi.edu/in-notes/rfc1945.txt>
        ...
	8.3 POST
	    A valid Content-Length is required on all HTTP/1.0 POST
	    requests. An HTTP/1.0 server should respond with a 400
	    (bad request) message if it cannot determine the length
	    of the request message's content.

Almost all clients these days will still send an HTTP/1.1 "Host:" header
even when talking to an HTTP/1.0 server, which still allows virtual
domains (virtual servers) to work, so don't let that bother you too much.

Or if your CL program isn't the server itself, but is running via
CGI under some other server (such as Apache), then AFAIK that server
is required to "gather up" any chunked input and present it to you
in a single blob, the same way an HTTP/1.0 server would, with the
"CONTENT_LENGTH" environment variable set to the entity length.

[The CL implementation you're using *does* provide some way to get
at the CGI-mandated <URL:http://hoohoo.ncsa.uiuc.edu/cgi/env.html>
environment variables, yes?]

+---------------
| If read-sequence would just auto-extend the vector this would not be
| a problem.
+---------------

You don't need this. As noted above, the client is required to tell
you how big the data is.


-Rob

-----
Rob Warnock, PP-ASEL-IA		<····@rpw3.org>
627 26th Avenue			<URL:http://www.rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <alj78d$1qdskr$1@ID-105510.news.dfncis.de>
Hi Rob Warnock,

> That is, the client side is *required* to send the length -- in some
> form or another, whether Content-Length, chunked transfers, or
> byteranges -- for any entity (message body) it sends with a "POST". If
> the client uses Content-Length, then you know *before* you start reading
> the data exactly how long it is. If it uses chunked or byteranges, you
> at least know how big each chunk or range is before reading it (and may
> need to concatenate them later).

...

> You don't need this. As noted above, the client is required to tell you
> how big the data is.

Thanks! As you say I need to read the RFCs.

To make matters worse, one thing I forgot to mention is that with
application/x-www-form-urlencoded requests newline characters are escaped
as something else. This means that all the raw data can be read in using a
single READ-LINE, with no need to even pre-initialise a sequence to a
certain length!

Regards,
Adam
From: Steven M. Haflich
Subject: Re: read-sequence
Date: 
Message-ID: <3D7E03A7.90803@franz.com>
Adam Warner wrote:

> Fine, you concur with the interpretation that I loosely paraphrased. It
> was wishful thinking on my part that a vector with a fill pointer could be
> extended by read-sequence. I had to check archives to confirm my
> misinterpretation of the spec.

The issues with Content-Length and CGI have pretty much been beaten into
the ground by now, but I don't want to let drop this question about
whether (perhaps for other purposes) read-sequence could have or should
have been defined to extend the target vector automatically.  It raises
interesting language design issues.

As read-sequence is currently defined, the length of the sequence specifies
exactly how much data read-sequence will collect, unless eof occurs first.

If read-sequence automatically extended its argument array, then there is
no limit how large a malicious data source could cause the vector to grow.
So this feature would be an obvious and pernicious vulnerability for
denial of service attacks.  That much is obvious.

But even in applications where malicious data sources are not an issue,
one still must examine what is gained by this feature.  read-sequence
exists only for efficiency; otherwise, it is entirely equivalent to
simple iteration with calls to read-byte or read-char.  As a portable
interface to IO efficiency, I believe it is a failure, since the
circumstances under which it will be able to optimize throughput are
not portably defined, nor are they easily predicted by an unsophisticated
programmer.

But even so, what would be gained by allowing read-sequence to operate in
the manner of vector-push-extend?  I suspect nothing.  read-sequence can
sometimes attain efficiency because it can call or rewrite to lower-level
code that efficiently transfers data in bulk from the underlying data
stream to the vector, without copying the data element by element.  It
does so based on the manifest length of the argument array.  Without the
requested feature of automatic vector extension, an application program
facing data of unknown size must either read the data laboriously element by
element, or else must read in large read-sequence chunks, then if the
data exceeds the size of an individual chunk, either develop an internal
protocol to deal with collections of chunks, or else repeatedly
copy/append the data from each chunk into some cumulative chunk.

This copying and appending can be expensive.  It has N^2 behavior if
done stupidly, and 2*N behavior if done intelligently.  But it doesn't
have 1*N behavior, such as one would desire of read-sequence.
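
The difference can be made concrete by counting element copies. This is
only a model, under the assumption that every growth step copies the
whole existing allocation; ELEMENTS-COPIED is a made-up name.

```lisp
;; Hypothetical model: count elements copied while a vector grows from
;; capacity 1 until it can hold N elements.  GROW maps the old
;; capacity to the new one.
(defun elements-copied (n grow)
  (let ((capacity 1)
        (copied 0))
    (loop while (< capacity n)
          do (incf copied capacity)     ; old contents move to new storage
             (setf capacity (funcall grow capacity)))
    copied))
```

For N = 1024, growing by one element at a time, (elements-copied 1024 #'1+),
copies 523776 elements, roughly N^2/2. Doubling the capacity each time,
(elements-copied 1024 (lambda (c) (* 2 c))), copies 1023, which together
with the N elements stored in the first place stays under 2*N.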

So wouldn't it be neat to have read-sequence extend arrays automatically?

No.  CL is defined around a double-handful of special operators, and a
larger but still reasonably-compact set of primitive operators (e.g.
cons, car, cdr, aref) that could not be written by the application
programmer.  But there are a great number of additional operators that
exist for one of two different reasons: (1) They express non-primitive
but common operations that programmers could write themselves but which
are needed repeatedly by many programmers (e.g. dotimes,
member, position, with-open-file); (2) they are operators which while not
themselves primitive, can be coded more efficiently by the implementation
than portably by the user programmer, because the implementation is free
to use its special knowledge of the implementation and make non-portable
assumptions about datatypes, memory management, environment, etc.

Now, for all current practical hardware, extending a vector (when the
fill-pointer exceeds the allocation) requires copying the data from the
original array storage to the new extended storage, assuming the CL
implementation is unable to get from the OS fine-grained control over the
memory mapping hardware, or is too sane to try.

So asking read-sequence to extend vectors automatically, in addition to
being dangerous, doesn't magically execute with zero cost just because
the application programmer can't count the clock cycles expended in his
code.  Given that read-sequence is already somewhat a failed function,
it would not be good to further compound the opaqueness of its execution
efficiency.

> Lisp is so high level that I just get surprised when this kind of
> functionality is not built in. You have these really cool data types like
> adjustable vectors with fill pointers that it would be nice to leverage.

read-sequence exists _entirely_ for efficiency, not for expressivity.
It is semantically exactly equivalent to an obvious iteration over
the sequence executing repeated read-{char,byte} calls.  While it _is_
therefore higher-level than that equivalent loop, that isn't the reason
it exists.  Giving it the ability to extend its argument sequence would --
I suppose -- make it yet higher-level, but would not be compatible with
the high efficiency which was the motivation for its addition to the
language back around 1991.
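
For reference, the obvious iteration might be sketched as follows for
strings and character streams. NAIVE-READ-SEQUENCE is a made-up name, and
the sketch skips the list and binary-stream cases that the real function
also covers.

```lisp
;; Hypothetical equivalent of read-sequence for strings: one read-char
;; call per element, stopping at END or at eof.
(defun naive-read-sequence (string stream &key (start 0) end)
  (let ((end (or end (length string)))
        (i start))
    (loop
      (when (>= i end)
        (return i))
      (let ((ch (read-char stream nil nil)))
        (unless ch                      ; eof before the string was full
          (return i))
        (setf (char string i) ch)
        (incf i)))))
```

Like read-sequence, it returns the index of the first element that was
not updated.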
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <pan.2002.09.11.12.14.44.481.379@consulting.net.nz>
Steven M. Haflich wrote:

> As read-sequence is currently defined, the length of the sequence
> specifies exactly how much data read-sequence will collect, unless eof
> occurs first.
> 
> If read-sequence automatically extended its argument array, then there
> is no limit how large a malicious data source could cause the vector to
> grow. So this feature would be an obvious and pernicious vulnerability
> for denial of service attacks.  That much is obvious.

It is equally obvious that a READ-SEQUENCE-EXTEND could accept an optional
argument for the maximum size of the extended vector if this were an issue.

So long as a computer is configured with copious amounts of virtual memory
then reading in a huge file that is partially swapped out to disk still
works even if it doesn't turn out to be as efficient as implementing the
buffering yourself.

<big snip>

> This copying and appending can be expensive.  It has N^2 behavior if
> done stupidly, and 2*N behavior if done intelligently.  But it doesn't
> have 1*N behavior, such as one would desire of read-sequence.
> 
> So wouldn't it be neat to have read-sequence extend arrays
> automatically?
> 
> No.  CL is defined around a double-handful of special operators, and a
> larger but still reasonably-compact set of primitive operators (e.g.
> cons, car, cdr, aref) that could not be written by the application
> programmer.  But there are a great number of additional operators that
> exist for one of two different reasons: (1) They express non-primitive
> but common operations that programmers could write themselves but which
> are needed repeatedly by many programmers (e.g. dotimes,
> member, position, with-open-file); (2) they are operators which while
> not themselves primitive, can be coded more efficiently by the
> implementation than portably by the user programmer, because the
> implementation is free to use its special knowledge of the
> implementation and make non-portable assumptions about datatypes, memory
> management, environment, etc.
> 
> Now, for all current practical hardware, extending a vector (when the
> fill-pointer exceeds the allocation) requires copying the data from the
> original array storage to the new extended storage, assuming the CL
> implementation is unable to get from the OS fine-grained control over
> the memory mapping hardware, or is too sane to try.
> 
> So asking read-sequence to extend vectors automatically, in addition to
> being dangerous, doesn't magically execute with zero cost just because
> the application programmer can't count the clock cycles expended in his
> code.  Given that read-sequence is already somewhat a failed function,
> it would not be good to further compound the opaqueness of its execution
> efficiency.

And a second optional argument would take away this opaqueness: you could
specify the block increments that the vector would be adjusted by. Doesn't
such an argument embody the essential tradeoff: how much space do we want
to waste each time in preallocating the vector so that we minimise the
number of times the vector has to be adjusted?

And the implementation could provide the most efficient method to achieve
this that's available.
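
To make the proposal concrete, here is a sketch of how those two optional
arguments might behave. READ-SEQUENCE-EXTEND-BOUNDED and the INCREMENT and
MAX-LENGTH keywords are invented for this illustration; nothing like it
exists in the standard. It assumes a character stream (it uses PEEK-CHAR)
and an adjustable vector with a fill pointer.

```lisp
;; Hypothetical function: append the rest of STREAM to VECTOR, growing
;; by INCREMENT elements at a time and never beyond MAX-LENGTH.
(defun read-sequence-extend-bounded (vector stream
                                     &key (increment 4096)
                                          (max-length 65536))
  (loop
    (let* ((start (fill-pointer vector))
           (end (min (+ start increment) max-length)))
      (when (>= start end)
        ;; The limit is reached: more input is an error, eof is fine.
        (if (peek-char nil stream nil nil)
            (error "Input exceeds ~D elements" max-length)
            (return vector)))
      (when (< (array-dimension vector 0) end)
        (setf vector (adjust-array vector end)))
      (setf (fill-pointer vector) end)
      (let ((pos (read-sequence vector stream :start start :end end)))
        (setf (fill-pointer vector) pos)
        (when (< pos end)               ; short read means eof
          (return vector))))))
```

Growing by a fixed increment keeps the wasted space bounded, but every
adjustment may copy the data read so far, which is the tradeoff in
question.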

Furthermore in implementations like CLISP it is typically faster to use
one higher level function than to reimplement it using a number of lower
level functions. So the built in functionality could gain you efficiency.
This would probably not apply to native code compilers.

>> Lisp is so high level that I just get surprised when this kind of
>> functionality is not built in. You have these really cool data types
>> like adjustable vectors with fill pointers that it would be nice to
>> leverage.
> 
> read-sequence exists _entirely_ for efficiency, not for expressivity. It
> is semantically exactly equivalent to an obvious iteration over the
> sequence executing repeated read-{char,byte} calls.  While it _is_
> therefore higher-level than that equivalent loop, that isn't the reason
> it exists.  Giving it the ability to extend its argument sequence would
> -- I suppose -- make it yet higher-level, but would not be compatible
> with the high efficiency which was the motivation for its addition to
> the language back around 1991.

Talking about read/write efficiency, I have to live with the fact that
CLISP prohibits binary data from being sent or received through the terminal
streams (stdin/stdout). Think about how that impacts upon CGI programming!
You have to read or send all binary data using a faithfully reproducing
character set, and convert binary data to/from this character set. At
least CLISP implements the extension functions CONVERT-STRING-FROM-BYTES
and CONVERT-STRING-TO-BYTES.

Regards,
Adam
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey37khkkwwm.fsf@cley.com>
* Adam Warner wrote:

> Furthermore in implementations like CLISP it is typically faster to use
> one higher level function than to reimplement it using a number of lower
> level functions. So the built in functionality could gain you efficiency.
> This would probably not apply to native code compilers.

I think it's worth pointing out that this kind of feature of
implementations is regarded by many as pretty undesirable.  If it's not
possible to make your own code as fast as primitively-provided
functions then it's a barrier to writing your own abstractions - better
to use lots of primitives, because they will be faster.

Obviously it's hard for a byte-coded or interpreted implementation to
get around this.  My approach when using these kinds of system is to
try and ignore the speed differences as far as possible, but that's
not really a good solution because it then becomes very hard to do
things like profiling.

The classic example of this was in the early days of CLOS (well, the
early days of something-a-bit-like-CLOS being available, in the late
80s) when there were often really large differences between the
performance of CLOS and older facilities (obviously a factor of a few
between GF call and normal function call is OK, but a factor of 100 or
1000 really isn't).  Happily this seems to be mostly a non-issue now -
it was for CMUCL for a while, but I measured some small things
recently and its CLOS now seems to be pretty much OK.

(There are always some things where you don't expect performance to be
reasonable - for instance I don't think it really matters that class
redefinition probably takes a while on most implementations because
it's not really something you expect to do in inner loops.  It is kind
of annoying that changing the class of instances is often very slow
(or was 5 years ago...).)

--tim
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <am8ime$3lrjk$1@ID-105510.news.dfncis.de>
Hi Tim Bradshaw,

> * Adam Warner wrote:
> 
>> Furthermore in implementations like CLISP it is typically faster to use
>> one higher level function than to reimplement it using a number of
>> lower level functions. So the built in functionality could gain you
>> efficiency. This would probably not apply to native code compilers.
> 
> I think it's worth pointing out that this kind of feature of
> implementations is regarded by many as pretty undesirable.  If it's not
> possible to make your own code as fast as primitively-provided functions
> then it's a barrier to writing your own abstractions - better to use
> lots of primitives, because they will be faster.
> 
> Obviously it's hard for a byte-coded or interpreted implementation to
> get around this.  My approach when using these kinds of system is to try
> and ignore the speed differences as far as possible, but that's not
> really a good solution because it then becomes very hard to do things
> like profiling.

I think this kind of feature being undesirable is overstated:

1. As a consequence of the design the range of hardware and operating
system support is far better. I'm not even tied to x86-32. Perhaps it's
also the reason that CLISP has superb Unicode support (cf CMUCL).

2. CLISP runs uncompiled code extremely fast compared to native code
compilers like CMUCL. There are dynamic situations where it is just not
helpful having to recompile everything in advance to get somewhat decent
speed.

3. CLISP's memory consumption is extraordinarily low. It will run in as little
as 2MiB of RAM and has very fast startup.

4. It encourages you to write your own abstractions using the highest
abstractions already available. This leads to easy to understand code. As
Sam Steingold recently remarked on the CLISP list:

   Actually, CLISP favors abstraction much more than some other lisps.

   E.g. in most lisps it is faster to open-code by hand, while in CLISP it
   is not.

   E.g., (list* 1 2 3 4 5) is (probably) faster in CLISP than the
   equivalent (cons 1 (cons 2 (cons 3 (cons 4 5)))) (which is probably
   faster, in, say, ACL).

   A simple way to estimate the speed of a compiled function is to look at
   its disassembly.  The longer the slower.  So if you can use a single
   built-in function instead of a bunch of simpler built-ins, you probably
   win.
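
A small sketch of the comparison in the quote. WITH-LIST* and
WITH-CONSES are names invented here; which one disassembles shorter or
runs faster depends on the implementation, so no timings are claimed:

```lisp
;; LIST* conses its arguments onto the last one, so both functions build
;; the same dotted list (1 2 3 4 . 5).
(defun with-list* ()
  (list* 1 2 3 4 5))

(defun with-conses ()
  (cons 1 (cons 2 (cons 3 (cons 4 5)))))

;; To compare the generated code, compile and disassemble each one:
;; (disassemble (compile 'with-list*))
;; (disassemble (compile 'with-conses))
```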

Tim, why is it better for the implementation to encourage the wholesale
rewriting of built in functionality?

> The classic example of this was in the early days of CLOS (well, the
> early days of something-a-bit-like-CLOS being available, in the late
> 80s) when there were often really large differences between the
> performance of CLOS and older facilities (obviously a factor of a few
> between GF call and normal function call is OK, but a factor of 100 or
> 1000 really isn't).  Happily this seems to be mostly a non-issue now -
> it was for CMUCL for a while, but I measured some small things recently
> and its CLOS now seems to be pretty much OK.

I can turn your reasoning around. If this kind of functionality is
supplied by CLISP it should be slower to attempt to reimplement it
yourself, because you can only leverage byte code instead of the natively
implemented code. You are being encouraged to abstract further. Not
reimplement.

When you say abstraction you seem to mean being able to rewrite the basic
functionality. When I say abstraction I essentially mean being able to
build upon the already provided functionality.

It is a relief to know that when I use a built-in CLISP function it is
generally the fastest way to achieve the functionality. So I abstract
upon the basis of these built in functions instead of more primitive Lisp
functions.

Regards,
Adam
From: Bulent Murtezaoglu
Subject: Re: read-sequence
Date: 
Message-ID: <87znug2jex.fsf@acm.org>
>>>>> "AW" == Adam Warner <······@consulting.net.nz> writes:
[...]
    AW> 2. CLISP runs uncompiled code extremely fast compared to
    AW> native code compilers like CMUCL. There are dynamic situations
    AW> where it is just not helpful having to recompile everything in
    AW> advance to get somewhat decent speed.  [...]

Is this substantiated somewhere?

    AW> 4. It encourages you to write your own abstractions using the
    AW> highest abstractions already available. This leads to easy to
    AW> understand code. As Sam Steingold recently remarked on the
    AW> CLISP list:  [... I deleted SS's argument as the above is a fair 
    AW> summary]

I don't buy this.  If I want to write my own abstraction, it just might be
the case that there's a good reason for it.  It is a perfectly reasonable 
trade-off to choose a byte-coded implementation for _other_ reasons.  
What I object to is the presentation of the ensuing disparity in efficiency
between built-ins and similar user-provided functions as an advantage.

[...]
    AW> Tim, why is it better for the implementation to encourage the
    AW> wholesale rewriting of built in functionality? [...]

I don't think Tim said that.  That an implementation enables you to roll 
your own analogues to the built-ins without much loss in efficiency does
not mean that it is encouraging you to reinvent the wheel.  It just is not
getting in the way if you choose to do that.

cheers,

BM
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey3wupjk6id.fsf@cley.com>
* Bulent Murtezaoglu wrote:

> Is this substantiated somewhere?

But, really, who gives a damn if it does run interpreted code fast?  I
used (a)kcl on Vaxen and Sun 3/50s, and on those machines you really
wanted to not have to compile things too much because it took ages -
cranking up the C compiler and the linker on a ~1 MIPS machine without
enough real memory was something you really did not want to do too
often (which was OK: you *couldn't* do it too often because it took
about 60 seconds to recompile a function).  But Lisp compilers now are
practically instantaneous: the time it takes to recompile a function
- even a large complicated function - is completely dominated by the
time it takes to press the keys to get it recompiled (even on old slow
HW like our ~300MHz Sun).  And in a decent environment that's not very
long.  I essentially never run interpreted definitions, because the
compiler helps me find bugs.  At the other end of the scale, it takes
me ~25 seconds to compile and load my ~20,000 line system from cold
(~5 seconds to load it when precompiled): this isn't very long.

--tim
From: Bulent Murtezaoglu
Subject: Re: read-sequence
Date: 
Message-ID: <87u1kn2lee.fsf@acm.org>
>>>>> "TimB" == Tim Bradshaw <···@cley.com> writes:

    TimB> * Bulent Murtezaoglu wrote:
    >> Is this substantiated somewhere?

    TimB> But, really, who gives a damn if it does run interpreted
    TimB> code fast?  

Not me.  The OP was talking about 'dynamic situations' which I didn't 
take to mean interpreted code during development.  I was hoping maybe 
there was an interesting run-time code generation issue he had in mind
where calling the compiler didn't make sense.

    TimB> ...  I essentially never run interpreted
    TimB> definitions, because the compiler helps me find bugs.  

Someone in a particularly bratty mood might take this sentence out of 
context and re-start the ML vs. Lisp thread.  
  
    TimB> At
    TimB> the other end of the scale, it takes me ~25 seconds to
    TimB> compile and load my ~20,000 line system from cold (~5
    TimB> seconds to load it when precompiled): this isn't very long.

You must be doing substantial initialization at load time.  5 s load only 
seems too long on HW that can compile 20k LOC in 25 s.

cheers,

BM
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey3k7lil0qa.fsf@cley.com>
* Bulent Murtezaoglu wrote:
> You must be doing substantial initialization at load time.  5 s load only 
> seems too long on HW that can compile 20k LOC in 25 s.

yes, lots of big structures getting set up by the expansions of
macros.

--tim
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <amb8ev$4cg3b$1@ID-105510.news.dfncis.de>
Hi Bulent Murtezaoglu,

>>>>>> "TimB" == Tim Bradshaw <···@cley.com> writes:
> 
>     TimB> * Bulent Murtezaoglu wrote:
>     >> Is this substantiated somewhere?
> 
>     TimB> But, really, who gives a damn if it does run interpreted
>     TimB> code fast?
> 
> Not me.  The OP was talking about 'dynamic situations' which I didn't
> take to mean interpreted code during development.  I was hoping maybe
> there was an interesting run-time code generation issue he had in mind
> where calling the compiler didn't make sense.

Tim Bradshaw used an argument that computers are now so comparatively fast
as a reason to never choose to run interpreted code!

The run-time code generation issue, Bulent, that I can expand upon but have
not explicitly mentioned [because even though I have been working on it
for months the sites that employ this CMS are yet to go live] is that I am
wrapping code around what I call a proGrammable Document Format (GDF, not
PDF for obvious reasons). These are Lisp programs in disguise, and the
results are evaluated on the fly.

This document format has a functional design because you don't print
output within the format. Instead functions and macros return values that
become part of the end document. A simple string is a GDF document. And
that string can be manipulated by the wrapper functions to, for example,
transparently escape characters that are not safe in the chosen output
format (e.g. you need to escape the { character when generating a plain
LaTeX string).
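
A minimal sketch of such an escaping wrapper. ESCAPE-LATEX and its
default set of special characters are assumptions made for this
example, not the actual GDF code:

```lisp
;; Hypothetical helper: put a backslash in front of characters that are
;; special in LaTeX.  The default set is illustrative and incomplete; a
;; literal backslash, for instance, would need separate handling.
(defun escape-latex (string &optional (specials "{}%&$#_"))
  (with-output-to-string (out)
    (loop for ch across string
          do (when (find ch specials)
               (write-char #\\ out))
             (write-char ch out))))
```

Because a GDF document returns its value instead of printing it, a
wrapper can apply a function like this to the returned string once the
output format is known.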

Since these programs are created and evaluated on the fly and will
generate different output depending upon the output format specified and
whether the document generates dynamic data (e.g. the current date) there
are situations where it makes little sense to compile the GDF output in
advance because the results will either be inapplicable or just plain
stale.

These practical considerations along with CLISP's very good Unicode
support, fast startup and small footprint make it an ideal choice to
generate this document output on the fly.

Regards,
Adam
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey3elbqkzoh.fsf@cley.com>
* Adam Warner wrote:

> Tim Bradshaw used an argument that computers are now so comparatively fast
> as a reason to never choose to run interpreted code!

yes, and I meant it seriously.  We have an application - a documentation
system - that lets you define little (or sometimes not so little)
macros which end up as lisp functions.  These can be defined in the
source document, so you can say something like:

    <def :name title
         :value "My document">

And this ends up somewhere as something like

     (lambda ()
       (block title
        "My document"))

The question is do you want to bother compiling these tiny fragments,
especially if you only use them once?  I did some measurements of this
and I found that it was hard to find a case where the time cost of
compiling things was easy to measure - it was completely in the noise
compared to reading and parsing the document, say.  So now I don't
stress, I just compile everything.
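
A sketch of compiling one such fragment at run time with the standard
COMPILE function. The variable name *TITLE-FN* is made up for this
example, and the lambda is the one shown above:

```lisp
;; Compile the generated lambda expression instead of evaluating it.
;; COMPILE with a NIL name returns the compiled function object.
(defparameter *title-fn*
  (compile nil '(lambda ()
                  (block title
                    "My document"))))

;; Calling it yields the macro's value.
(funcall *title-fn*)
```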

In response to another message on this thread: yes I do think the
compiler can *help* find bugs.  But I don't take the little-green-man
position that it can find *all* bugs.

--tim
From: Thomas F. Burdick
Subject: Re: read-sequence
Date: 
Message-ID: <xcvn0qdx68l.fsf@hurricane.OCF.Berkeley.EDU>
Well, I have one example of a time you want a good interpreter: I have
some machine-generated code that must be interpreted.  You can't
compile it -- if you try, you discover that the person who wrote the
code-generator was very confused on the issues of different
environments and times in CL.  He understands them now (I think).  At
some point he and I are going to go through and clean up his code so
it can be compiled, but right now we're pretty happy that CLISP
interprets code quickly, because it lets us use this code--which
behaves correctly when interpreted--and move on to working on the rest
of the project.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <am8pbn$3e7mv$1@ID-105510.news.dfncis.de>
Hi Bulent Murtezaoglu,

>     2. CLISP runs uncompiled code extremely fast compared to native code
>     compilers like CMUCL. There are dynamic situations where it is just
>     not helpful having to recompile everything in advance to get
>     somewhat decent speed.  [...]
> 
> Is this substantiated somewhere?

Only by my limited experience. I understand it is widely agreed that
CLISP's interpreter is generally more efficient than CMUCL's. It makes
sense: The majority of effort in CMUCL has gone into efficient native code
compiling.

I can only make up arbitrary examples on the spot. You are welcome to do
likewise. Here's one:

   (defun test ()
      (loop for x from 1 to 1000 do
         (loop for y from 1 to 1000 collect (+ x y))))

Running (time (test)) on CLISP 2.30 takes about 7.7s when uncompiled and
1.1s when byte compiled on my system. In comparison CMUCL 18d+ takes 25s
when uncompiled and 0.3s when compiled.
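
A sketch of how the same comparison could be scripted. SECONDS-TO-RUN
and the three parameters are names invented here, and whether EVAL
really produces an interpreted function is implementation dependent:

```lisp
(defun seconds-to-run (function)
  "Return the wall-clock seconds taken by one call of FUNCTION."
  (let ((start (get-internal-real-time)))
    (funcall function)
    (/ (- (get-internal-real-time) start)
       internal-time-units-per-second)))

;; The body of TEST above, kept as data so it can be both evaluated and
;; compiled.
(defparameter *test-source*
  '(lambda ()
     (loop for x from 1 to 1000 do
       (loop for y from 1 to 1000 collect (+ x y)))))

;; EVAL gives an interpreted function only where the implementation has
;; an interpreter and uses it here; some implementations compile instead.
(defparameter *evaluated-test* (eval *test-source*))
(defparameter *compiled-test* (compile nil *test-source*))

;; Usage:
;; (float (seconds-to-run *evaluated-test*))
;; (float (seconds-to-run *compiled-test*))
```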

>     AW> 4. It encourages you to write your own abstractions using the
>     AW> highest abstractions already available. This leads to easy to
>     AW> understand code. As Sam Steingold recently remarked on the
>     AW> CLISP list:  [... I deleted SS's argument as the above is a fair
>     AW> summary]
> 
> I don't buy this.  If I want to write my own abstraction, it just might
> be the case that there's a good reason for it.  It is a perfectly
> reasonable trade-off to choose a byte-coded implementation for _other_
> reasons. What I object to is the presentation of the ensuing disparity
> in efficiency between built-ins and similar user-provided functions as
> an advantage.
> 
> [...]
>     AW> Tim, why is it better for the implementation to encourage the
>     AW> wholesale rewriting of built in functionality? [...]
> 
> I don't think Tim said that.  That an implementation enables you to roll
> your own analogues to the built-ins without much loss in efficiency
> does not mean that it is encouraging you to reinvent the wheel.  It just is
> not getting in the way if you choose to do that.

If you can gain a speed advantage by rewriting built in functionality
using more primitive forms then there is an incentive to use those more
primitive forms in your code. Especially if you can create smarter
algorithms than the implementors (you'd have to be amazingly smarter to
beat the built-in functionality of a byte-compiled system).

Yes I would like this ability as I could create code in CLISP that
performs as fast as the implementor's code. But not having this ability
does encourage me to fully learn the language. The more I learn of the
language the faster the code becomes as I learn how to put the higher
level pieces together.

Regards,
Adam
From: Christophe Rhodes
Subject: Re: read-sequence
Date: 
Message-ID: <sqfzw73cxp.fsf@lambda.jcn.srcf.net>
"Adam Warner" <······@consulting.net.nz> writes:

> Hi Tim Bradshaw,
> 
> > Obviously it's hard for a byte-coded or interpreted implementation to
> > get around this.  My approach when using these kinds of system is to try
> > and ignore the speed differences as far as possible, but that's not
> > really a good solution because it then becomes very hard to do things
> > like profiling.
> 
> I think this kind of feature being undesirable is overstated:
> 
> 1. As a consequence of the design the range of hardware and operating
> system support is far better. I'm not even tied to x86-32. 

I don't buy this, sorry.

Having spent a certain amount of time in the CMUCL/SBCL backends, I
would like to point out firstly that these systems are far from being
"tied to x86-32"; they run on SPARC, Alpha, MIPS (both endiannesses),
HPPA, PPC; under Linux, Solaris, Irix, Tru64, and HP-UX.  Allegro CL
runs under a similar number of platforms, too, and that also has the
all-important Windows support :-)

But that's more by the by; the question is whether the design of CLISP
helps this.  To a certain extent it must; the byte code strategy means
that there's less work to do to support a new platform. However, the
essential thought that needs to be done to support a new platform is,
I suspect, close to the same in all of these implementations: the ABI
must be understood, otherwise there are unexplained segfaults on
various platforms; the OS interface must be used properly, and so on.
Once this is done properly, the rest is straightforward.

> Perhaps it's also the reason that CLISP has superb Unicode support
> (cf CMUCL).

Perhaps.  If you had said that the easy recompilation of CLISP was a
factor in making it easily hackable, I might have agreed with you.  But
I don't think you can claim that CLISP has superb Unicode support
because it's an implementation that byte-compiles; as far as I can
see, there's no connection at all (otherwise, maybe CLISP would also
have superb ANSI standard support, as opposed to the missing
functionality in its CLOS implementation[*])

Cheers,

Christophe

[*] workaroundable by using PCL; and, to be fair, it has improved in
the last couple of years. It's still missing CHANGE-CLASS and useful
method-combinations to the best of my knowledge, though.
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
From: Thomas F. Burdick
Subject: Re: read-sequence
Date: 
Message-ID: <xcv7khj9j2m.fsf@conquest.OCF.Berkeley.EDU>
Christophe Rhodes <·····@cam.ac.uk> writes:

> (otherwise, maybe CLISP would also have superb ANSI standard
> support, as opposed to the missing functionality in its CLOS
> implementation[*])
>
> [*] workaroundable by using PCL; and, to be fair, it has improved in
> the last couple of years. It's still missing CHANGE-CLASS and useful
> method-combinations to the best of my knowledge, though.

Hmm, are you sure you can use PCL?  I tried to recently, and failed,
and I recall seeing something about someone trying to load PCL into a
recent CLISP, failing, and I don't recall seeing a solution (although
I might have missed it).  So far I've just been faking method
combinations by hand on CLISP; it would be nice to remove the
conditionals from my code, but I have no idea how much of a time-pit
it would be to get PCL up and running on CLISP.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Will Deakin
Subject: Re: read-sequence
Date: 
Message-ID: <amamir$klq$1@knossos.btinternet.com>
Thomas F. Burdick wrote:
> Hmm, are you sure you can use PCL?  I tried to recently, and failed,
> and I recall seeing something about someone trying to load PCL into a
> recent CLISP, failing, and I don't recall seeing a solution (although
> I might have missed it). 
I know that in the past I have pointed to the clos stuff. I have used this
-- but only in my gentlemanly pursuits. However, the pointer is still on 
the clisp site to the pcl download[1].

> So far I've just been faking method combinations by hand on CLISP;
Ouch.

:)w

[1] clisp.sourceforge.net/impnotes.html#clos-diff
From: Christophe Rhodes
Subject: Re: read-sequence
Date: 
Message-ID: <sqptvbt6v0.fsf@lambda.jcn.srcf.net>
···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Christophe Rhodes <·····@cam.ac.uk> writes:
> 
> > (otherwise, maybe CLISP would also have superb ANSI standard
> > support, as opposed to the missing functionality in its CLOS
> > implementation[*])
> >
> > [*] workaroundable by using PCL; and, to be fair, it has improved in
> > the last couple of years. It's still missing CHANGE-CLASS and useful
> > method-combinations to the best of my knowledge, though.
> 
> Hmm, are you sure you can use PCL?  I tried to recently, and failed,
> and I recall seeing something about someone trying to load PCL into a
> recent CLISP, failing, and I don't recall seeing a solution (although
> I might have missed it).  So far I've just been faking method
> combinations by hand on CLISP; it would be nice to remove the
> conditionals from my code, but I have no idea how much of a time-pit
> it would be to get PCL up and running on CLISP.

I'm afraid I don't know; I was going on hearsay about the ability to
work around CLISP's broken CLOS. It may indeed no longer be possible
to load PCL into it (or it may); I haven't tried recently.

Sorry to dash your hopes,

Cheers,

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <am9lrl$438e2$1@ID-105510.news.dfncis.de>
Hi Christophe Rhodes,

> "Adam Warner" <······@consulting.net.nz> writes:
> 
>> Hi Tim Bradshaw,
>> 
>> > Obviously it's hard for a byte-coded or interpreted implementation to
>> > get around this.  My approach when using these kinds of system is to
>> > try and ignore the speed differences as far as possible, but that's
>> > not really a good solution because it then becomes very hard to do
>> > things like profiling.
>> 
>> I think this kind of feature being undesirable is overstated:
>> 
>> 1. As a consequence of the design the range of hardware and operating
>> system support is far better. I'm not even tied to x86-32.
> 
> I don't buy this, sorry.

I specifically mentioned 32-bit x86 with an eye to future 64-bit Intel/AMD
processors becoming available and affordable. I estimate that an
implementation like CLISP will be first to support such platforms in
64-bit mode.

In the past there was significant academic and financial support for an
implementation like CMUCL. I just don't see who will be providing the same
kind of resources to undertake such native porting efforts. In comparison a
robust C compiler for these and future platforms is a sure bet.
 
> Having spent a certain amount of time in the CMUCL/SBCL backends, I
> would like to point out firstly that these systems are far from being
> "tied to x86-32"; they run on SPARC, Alpha, MIPS (both endiannesses),
> HPPA, PPC; under Linux, Solaris, Irix, Tru64, and HP-UX.  Allegro CL
> runs under a similar number of platforms, too, and that also has the
> all-important Windows support :-)
> 
> But that's more by the by; the question is whether the design of CLISP
> helps this.  To a certain extent it must; the byte code strategy means
> that there's less work to do to support a new platform. However, the
> essential thought that needs to be done to support a new platform is, I
> suspect, close to the same in all of these implementations: the ABI must
> be understood, otherwise there are unexplained segfaults on various
> platforms; the OS interface must be used properly, and so on. Once this
> is done properly, the rest is straightforward.

For a native code compiler "the rest is straightforward" involves
supporting a new instruction set and optimising for a new processor
design.

>> Perhaps it's also the reason that CLISP has superb Unicode support (cf
>> CMUCL).
> 
> Perhaps.  If you had said that the easy recompilation of CLISP was a
> factor in making it easily hackable, I might have agreed with you. But
> I don't think you can claim that CLISP has superb Unicode support
> because it's an implementation that byte-compiles; as far as I can see,
> there's no connection at all (otherwise, maybe CLISP would also have
> superb ANSI standard support, as opposed to the missing functionality in
> its CLOS implementation[*])

I think the connection is quite strong: 8-bit strings are/were wired into
the native code. It's a huge undertaking rewriting the native string
support. Here's a mailing list post from Dec 2000 that highlights this
issue (note the phrase "moderately intractable" for complete support):

From:   Robert Maclachlan
To:     Winton Davies

Date:   06 Dec 2000 12:40:30 -0500

   Can anyone give me a ball park figure for the amount of time it would
   take to implement these (assuming they are feasible):

It's kind of hard to give a time estimate without considering the CMUCL
experience of who would be doing it.

   (i) Unicode support.

This could be more or less painful depending on what you mean.  Adding a
new string type wouldn't be so bad, perhaps two weeks for someone who knows
what they are doing.  However, if you want every function in CMU CL that
takes a string argument to work on extended strings, then that's moderately
intractable.  A reasonable intermediate point would be to support extended
character sets in the string utility and sequence functions and low-level file
I/O.  This would allow new code to be written so that it supports
international characters.  This might add another week.

   (ii) 64 bit machines.

Do you mean 64 bit Lisp pointers?  The Alpha is a 64 bit machine, and is
already supported.  I'd guess two wizard-months to change to 64 bit
pointers on an already supported architecture with 64 bit capability.

   (iii) SSL Socket Streams.

I'm not really up on SSL, but I think it's pretty clear the complexity here
would be mainly related to implementing the protocol itself, rather than
lisp specific aspects.  Opening sockets and defining new stream classes is
pretty straightforward.  Perhaps you were already assuming this, but it
would be definitely worth seeing if an existing C/C++ implementation could
be accessed via foreign function call.

  Rob

Regards,
Adam
From: Daniel Barlow
Subject: Re: read-sequence
Date: 
Message-ID: <871y7rqpj4.fsf@noetbook.telent.net>
"Adam Warner" <······@consulting.net.nz> writes:

> For a native code compiler "the rest is straightforward" involves
> supporting a new instruction set and optimising for a new processor
> design.

I think you're comparing apples with bananas here.  For a native
compiler that generates code-comparable-with-C, sure, you need to
optimize it.  Here, you're comparing against a bytecode interpreter,
though, so it's really not going to make much difference if the native
code is a bit naive.

Gary Byers ported (an earlier version of) OpenMCL to the SPARC in
something like six weeks, according to the notes on
http://openmcl.clozure.com/ToDo/OtherPorts


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <amb5q5$4be3r$1@ID-105510.news.dfncis.de>
Hi Daniel Barlow,

> "Adam Warner" <······@consulting.net.nz> writes:
> 
>> For a native code compiler "the rest is straightforward" involves
>> supporting a new instruction set and optimising for a new processor
>> design.
> 
> I think you're comparing apples with bananas here.  For a native
> compiler that generates code-comparable-with-C, sure, you need to
> optimize it.  Here, you're comparing against a bytecode interpreter,
> though, so it's really not going to make much difference if the native
> code is a bit naive.
> 
> Gary Byers ported (an earlier version of) OpenMCL to the SPARC in
> something like six weeks, according to the notes on
> http://openmcl.clozure.com/ToDo/OtherPorts

Thanks for making this point. To paraphrase it: because byte-code
compilers can be significantly slower than native code compilers I
shouldn't compare the time taken to produce an optimal byte-code compiler
port against an optimal native code compiler port. I only need to look at
the time it would take to create a port of a somewhat native code compiler
that runs as fast as byte-code.

[Plus all other factors would have to be equal--for example the somewhat
native port would have to be able to generate as reliable results as the
byte-code implementation.]

I suspect that you would need someone with "Wizard" calibre working on
such a project for it to be as successful as in the link above. Gary Byers
noted that for some byzantine platforms it would take 3, 4 or more months
of full-time effort to do a port, and while doable by volunteers it "would
be a very noble effort."

Regards,
Adam
From: Erik Naggum
Subject: Re: read-sequence
Date: 
Message-ID: <3241396476606010@naggum.no>
* Adam Warner
| I only need to look at the time it would take to create a port of a somewhat
| native code compiler that runs as fast as byte-code.

  A first approximation to a native port would indeed be to produce assembly
  code for an abstract machine that would probably require several native CPU
  instructions per abstract instruction, such that you would in fact produce
  native code and not require any byte-code interpreters.  Once the system
  runs on the new platform, you could change the target assembly language.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Thomas F. Burdick
Subject: Re: read-sequence
Date: 
Message-ID: <xcvptv9x6lq.fsf@hurricane.OCF.Berkeley.EDU>
"Adam Warner" <······@consulting.net.nz> writes:

> I suspect that you would need someone with "Wizard" calibre working on
> such a project for it to be as successful as in the link above. Gary Byers
> noted that for some byzantine platforms it would take 3, 4 or more months
> of full-time effort to do a port, and while doable by volunteers it "would
> be a very noble effort."

Yes, it's pretty atypical -- PPC -> SPARC is (one of?) the easiest
ports I could think of, particularly since they ignored the SPARC's
register windows.  Shoot, even going the other direction would
probably be a *little* more difficult.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Paul Dietz
Subject: Re: read-sequence
Date: 
Message-ID: <3D8A249E.B0FE7177@motorola.com>
Daniel Barlow wrote:

> I think you're comparing apples with bananas here.  For a native
> compiler that generates code-comparable-with-C, sure, you need to
> optimize it.  Here, you're comparing against a bytecode interpreter,
> though, so it's really not going to make much difference if the native
> code is a bit naive.

And if you use a dynamic code generation package (like GNU Lightning)
you can generate the machine code in a completely portable way
(assuming someone has ported the code generator to the new platform).

http://www.gnu.org/software/lightning/

	Paul
From: Paolo Amoroso
Subject: Re: read-sequence
Date: 
Message-ID: <R7SJPZjxRFCzLmFpO0ethfluBalT@4ax.com>
On 18 Sep 2002 09:43:46 +0100, Christophe Rhodes <·····@cam.ac.uk> wrote:

> see, there's no connection at all (otherwise, maybe CLISP would also
> have superb ANSI standard support, as opposed to the missing
> functionality in its CLOS implementation[*])
[...]
> [*] workaroundable by using PCL; and, to be fair, it has improved in
> the last couple of years. It's still missing CHANGE-CLASS and useful
> method-combinations to the best of my knowledge, though.

Is the CLISP port of PCL still being maintained?


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://www.paoloamoroso.it/ency/README
From: Will Deakin
Subject: Re: read-sequence
Date: 
Message-ID: <amd6lk$2l3$1@helle.btinternet.com>
Paolo Amoroso wrote:
> Is the CLISP port of PCL still being maintained?
Since the download is called pcl.sept92f.clisp.tar.gz I suspect not.

:)w
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey31y7rlluw.fsf@cley.com>
* Adam Warner wrote:

> Tim, why is it better for the implementation to encourage the wholesale
> rewriting of built in functionality?

Where did I say this?  Where did I say anything at *all* about
reimplementing anything?  You seem to have some large chip on your
shoulder about CLISP: try and read what I actually wrote rather than
interpreting it as an attack on your pet implementation.

--tim
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <am9jeq$3u10c$1@ID-105510.news.dfncis.de>
Hi Tim Bradshaw,

> * Adam Warner wrote:
> 
>> Tim, why is it better for the implementation to encourage the wholesale
>> rewriting of built in functionality?
> 
> Where did I say this?  Where did I say anything at *all* about
> reimplementing anything?  You seem to have some large chip on your
> shoulder about CLISP: try and read what I actually wrote rather than
> interpreting it as an attack on your pet implementation.

This is a particularly nasty piece of work, Tim. Six days later you restart
a thread telling me that these kinds of systems are "pretty undesirable"
and a "barrier to writing your own abstractions". I respond in a quite
detailed way and in part specifically address your point. So you turn
around and execute an ad hominem attack.

Adam
From: Tim Bradshaw
Subject: Re: read-sequence
Date: 
Message-ID: <ey3d6rbk31q.fsf@cley.com>
* Adam Warner wrote:

> This is a particularly nasty piece of work, Tim. Six days later you restart
> a thread telling me that these kinds of systems are "pretty undesirable"
> and a "barrier to writing your own abstractions". I respond in a quite
> detailed way and in part specifically address your point. So you turn
> around and execute an ad hominem attack.

I'm sorry if you regarded it as ad-hominem.  If it would help you just
take out the bit about chips on your shoulder and concentrate on the
question about where I said anything about encouraging the wholesale
rewriting of built in functionality.  I don't think you'll find it in
my article.

I have no idea what you meant about six days, either: your article to
which I responded is dated Mon, 16 Sep 2002 21:48:48 GMT, my response
is 18 Sep 2002 00:40:25 +0100.  Looks like less than 30 hours to me.
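
A quick check of that arithmetic, assuming the two header dates quoted above
are accurate. The helper name hours-between is made up for this sketch;
encode-universal-time is standard, and its optional time-zone argument counts
hours west of Greenwich, so GMT is 0 and +0100 is -1.

```lisp
(defun hours-between (start end)
  "Return the elapsed time from START to END, both universal times, in hours."
  (/ (- end start) 3600.0))

;; Arguments are second, minute, hour, date, month, year, time-zone.
(let ((article  (encode-universal-time 48 48 21 16 9 2002 0))    ; 21:48:48 GMT
      (response (encode-universal-time 25 40 0 18 9 2002 -1)))   ; 00:40:25 +0100
  (format t "~,2F hours~%" (hours-between article response)))
```

The two timestamps are 93097 seconds apart, that is 25 hours, 51 minutes and
37 seconds.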

I'm not going to respond further in this thread as you seem to not be
very good at reading what I have actually written rather than what you
think I wrote (and this is ad-hominem, I freely admit it).

--tim
From: Adam Warner
Subject: Re: read-sequence
Date: 
Message-ID: <am9ogc$3vgud$1@ID-105510.news.dfncis.de>
Hi Tim Bradshaw,

> * Adam Warner wrote:
> 
>> This is a particularly nasty piece of work, Tim. Six days later you
>> restart a thread telling me that these kinds of systems are "pretty
>> undesirable" and a "barrier to writing your own abstractions". I
>> respond in a quite detailed way and in part specifically address your
>> point. So you turn around and execute an ad hominem attack.
> 
> I'm sorry if you regarded it as ad-hominem.  If it would help you just
> take out the bit about chips on your shoulder and concentrate on the
> question about where I said anything about encouraging the wholesale
> rewriting of built in functionality.  I don't think you'll find it in my
> article.
> 
> I have no idea what you meant about six days, either: your article to
> which I responded is dated Mon, 16 Sep 2002 21:48:48 GMT, my response is
> 18 Sep 2002 00:40:25 +0100.  Looks like less than 30 hours to me.

Whatever the date it propagated, you can look at the headers to see that it
was sent on 11 September:

http://groups.google.co.nz/groups?selm=pan.2002.09.11.12.14.44.481.379%40consulting.net.nz&oe=UTF-8&output=gplain

Look at this part of the message ID: 2002.09.11

Look at this header: X-Original-Trace: 11 Sep 2002 12:14:51 +1200,
scream.auckland.ac.nz

It appears to have taken around 6 days for the message to finally propagate
from one of the New Zealand news servers to wherever you read it, a fact I
wasn't aware of.

> I'm not going to respond further in this thread as you seem to not be
> very good at reading what I have actually written rather than what you
> think I wrote (and this is ad-hominem, I freely admit it).

It appears you did not grok my point: since an implementation like CLISP
encourages someone to use the higher-level language abstractions, it may
also encourage someone to learn more of the language and create more
understandable code (instead of using fast primitives). This was a
counterpoint to your contention that not being able to make your own code
as fast as primitively-provided functions is pretty undesirable.

Even if you disagreed with the contention you could have addressed it
instead of belittling me.

Regards,
Adam
From: Erik Naggum
Subject: Re: read-sequence
Date: 
Message-ID: <3241343277174291@naggum.no>
* Adam Warner
| Even if you disagreed with the contention you could have addressed it
| instead of belittling me.

  Even if you feel belittled, you could have addressed his points.

  It is clearly more important to you to feel well and well treated than to argue
  well.  News is not a good place for people who feel first and think later, if
  at all.  So instead of making a huge fuss about your personal feelings, just
  address the points in question.  You cannot always demand to be treated well,
  either.  Sometimes, everybody does things that look stupid to others.  What
  matters is not how it makes you look, but how maturely you respond.  If all
  you can do is whine that you were treated badly, that does a lot more damage
  to the impression people have of you than if you simply ignore it and do
  better next time, which in effect, if not in words, rejects and repudiates
  what you feel is an attack more than you could by denying it, which actually
  gives it credibility in the eyes of the beholders.

  If you dislike that others do not address the point, just address the point.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Daniel Barlow
Subject: Re: read-sequence
Date: 
Message-ID: <87u1knpaez.fsf@noetbook.telent.net>
"Adam Warner" <······@consulting.net.nz> writes:

> It appears you did not grok my point: since an implementation like CLISP
> encourages someone to use the higher-level language abstractions, it may
> also encourage someone to learn more of the language and create more
> understandable code (instead of using fast primitives). This was a
> counterpoint to your contention that not being able to make your own code
> as fast as primitively-provided functions is pretty undesirable.

This is really only an argument for using CLISP for someone who (a)
doesn't know CL very well, and (b) considers that he won't make the
effort to learn unless his implementation gives him negative
reinforcement for being lazy.  

I don't think it's a counterpoint to Tim's argument, which I
considered to be about creating domain-specific or otherwise
extra-standard abstractions which aren't already built into CLISP and
aren't likely to be by virtue of their nature.  There's no reason to
penalise that style of programming, and many to encourage it.


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources