implementations allowing bivalent streams

From: Marc Battyani
Subject: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 08:40:27 +0000
Message-ID: <ADA489C7B383B9D7.1587ACB0FD6BC652.E8652236114002D3@lp.airnews.net>

In cl-pdf I have to write characters and binary data (for JPEG) to the pdf
stream.
At least LW and ACL allow this, but I would like to know which
implementations do not, so that I can add some conditional
code to handle it if needed.

Thanks.

Marc

Re: implementations allowing bivalent streams Thomas F. Burdick
- Re: implementations allowing bivalent streams Duane Rettig
  - Re: implementations allowing bivalent streams Thomas F. Burdick
    - Re: implementations allowing bivalent streams Duane Rettig
      - pread in simple streams Alexander Kjeldaas
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Bulent Murtezaoglu
        Re: pread in simple streams Kent M Pitman
        Re: pread in simple streams Peter Seibel
        Re: pread in simple streams Alexander Kjeldaas
        Re: pread in simple streams Tim Bradshaw
        Re: pread in simple streams Thomas F. Burdick
        Re: pread in simple streams Tim Bradshaw
        Re: pread in simple streams Thomas F. Burdick
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Thomas F. Burdick
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Tim Bradshaw
        Re: pread in simple streams Doug McNaught
        Re: pread in simple streams Tim Bradshaw
        Re: pread in simple streams Raymond Toy
        Re: pread in simple streams Alexander Kjeldaas
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Peter Seibel
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Peter Seibel
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Peter Seibel
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Alexander Kjeldaas
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Alexander Kjeldaas
        Re: pread in simple streams Duane Rettig
        Re: pread in simple streams Alexander S A Kjeldaas
        Re: pread in simple streams Scott Schwartz
  - Re: implementations allowing bivalent streams Duane Rettig
- Re: implementations allowing bivalent streams Marc Battyani
Re: implementations allowing bivalent streams Sam Steingold
- Re: implementations allowing bivalent streams Marc Battyani

From: Thomas F. Burdick
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 18:52:47 +0000
Message-ID: <xcvsmww8olb.fsf@conquest.OCF.Berkeley.EDU>

"Marc Battyani" <·············@fractalconcept.com> writes:

> In cl-pdf I have to write characters and binary data (for JPEG) to the pdf
> stream.
> At least LW and ACL allow this, but I would like to know which
> implementations do not, so that I can add some conditional
> code to handle it if needed.

Are you sure you need this?  Doesn't the PDF spec define PDFs as being
a stream of octets?  (Not rhetorical, I'm working from memory).  If
so, it might be a good idea to always write 8-bit bytes.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Duane Rettig
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 20:00:01 +0000
Message-ID: <44r9ca1ho.fsf@beta.franz.com>

···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> "Marc Battyani" <·············@fractalconcept.com> writes:
> 
> > In cl-pdf I have to write characters and binary data (for JPEG) to the pdf
> > stream.
> > At least LW and ACL allow this, but I would like to know which
> > implementations do not, so that I can add some conditional
> > code to handle it if needed.
> 
> Are you sure you need this?  Doesn't the PDF spec define PDFs as being
> a stream of octets?  (Not rhetorical, I'm working from memory).  If
> so, it might be a good idea to always write 8-bit bytes.

This is in fact how I read Marc's article.  How he _writes_ data
and how it comes out at the device are not necessarily the same.
Simple-streams does write external data in terms of 8-bit bytes.
So if characters are not 8-bit bytes, they must be translated by
an external-format.
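
As a rough illustration of that translation step, here is a minimal sketch
in portable Common Lisp that turns text into octets by hand and appends raw
binary data, so that everything ends up as 8-bit bytes.  The names
string-to-octets and pdf-chunk are made up for this example, and the code
assumes that char-code returns ASCII/Latin-1 codes (true on common
implementations, but not guaranteed by the standard).  A real
external-format does considerably more than this.

```lisp
(defun string-to-octets (string)
  "Return a fresh (unsigned-byte 8) vector holding the char-codes of STRING.
Signals an error for any character whose code does not fit in 8 bits."
  (let ((octets (make-array (length string) :element-type '(unsigned-byte 8))))
    (dotimes (i (length string) octets)
      (let ((code (char-code (char string i))))
        ;; Assumption: char-code yields ASCII/Latin-1 values for these characters.
        (when (> code 255)
          (error "Character ~S does not fit in one octet." (char string i)))
        (setf (aref octets i) code)))))

(defun pdf-chunk (text binary)
  "Concatenate TEXT (a string) and BINARY (an octet vector) into one octet vector."
  (concatenate '(vector (unsigned-byte 8)) (string-to-octets text) binary))
```

The resulting vector can be handed to write-sequence on a stream opened
with :element-type '(unsigned-byte 8), which works whether or not the
implementation offers bivalent streams.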

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Thomas F. Burdick
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 20:12:11 +0000
Message-ID: <xcvpts08kx0.fsf@conquest.OCF.Berkeley.EDU>

Duane Rettig <·····@franz.com> writes:

> ···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> 
> > "Marc Battyani" <·············@fractalconcept.com> writes:
> > 
> > > In cl-pdf I have to write characters and binary data (for JPEG) to the pdf
> > > stream.
> > > At least LW and ACL allow this, but I would like to know which
> > > implementations do not, so that I can add some conditional
> > > code to handle it if needed.
> > 
> > Are you sure you need this?  Doesn't the PDF spec define PDFs as being
> > a stream of octets?  (Not rhetorical, I'm working from memory).  If
> > so, it might be a good idea to always write 8-bit bytes.
> 
> This is in fact how I read Marc's article.  How he _writes_ data
> and how it comes out at the device are not necessarily the same.
> Simple-streams does write external data in terms of 8-bit bytes.
> So if characters are not 8-bit bytes, they must be translated by
> an external-format.

(I'm not sure where my brain was when I wrote this...) yes, of course.
Simple-streams seem nice, but aren't widely supported (yet?).  You can
do this with Gray streams easily enough, and they're probably your
best bet for portability.  The way I've done this is to open the base
stream as a binary stream, and make Gray stream objects that sit over
it like a filter:

  (with-open-file (raw "foo.txt" :element-type '(unsigned-byte 8))
    (with-filtered-input (in raw my-character-stream-class)
      (read-line in)))

I've never once profiled anything where I've used this approach, though :)
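
The idea of layering character reading over a binary base stream can be
sketched without Gray streams at all.  Gray streams are not part of the
standard and their package names vary between implementations, so the
sketch below uses a plain function instead of a stream class.  The name
read-octet-line is hypothetical, and the code assumes that code-char maps
octets to ASCII/Latin-1 characters and that lines end with a linefeed
(code 10).

```lisp
(defun read-octet-line (binary-stream)
  "Read octets from BINARY-STREAM up to a linefeed (code 10) or end of file
and return them as a string, or NIL if the stream is already at end of file."
  (let ((chars '()))
    (loop
      (let ((octet (read-byte binary-stream nil nil)))
        (cond ((null octet)
               ;; End of file: return what was collected, or NIL if nothing was.
               (return (if chars (coerce (nreverse chars) 'string) nil)))
              ((= octet 10)
               ;; Linefeed ends the line and is not included in the result.
               (return (coerce (nreverse chars) 'string)))
              (t
               ;; Assumption: one octet corresponds to one character.
               (push (code-char octet) chars)))))))
```

It would be called on a stream opened with :element-type '(unsigned-byte 8),
in the place where the filtering stream calls read-line.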

From: Duane Rettig
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 22:00:01 +0000
Message-ID: <4vg1s8i89.fsf@beta.franz.com>

···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > ···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> > 
> > > "Marc Battyani" <·············@fractalconcept.com> writes:
> > > 
> > > > In cl-pdf I have to write characters and binary data (for JPEG) to the pdf
> > > > stream.
> > > > At least LW and ACL allow this, but I would like to know which
> > > > implementations do not, so that I can add some conditional
> > > > code to handle it if needed.
> > > 
> > > Are you sure you need this?  Doesn't the PDF spec define PDFs as being
> > > a stream of octets?  (Not rhetorical, I'm working from memory).  If
> > > so, it might be a good idea to always write 8-bit bytes.
> > 
> > This is in fact how I read Marc's article.  How he _writes_ data
> > and how it comes out at the device are not necessarily the same.
> > Simple-streams does write external data in terms of 8-bit bytes.
> > So if characters are not 8-bit bytes, they must be translated by
> > an external-format.
> 
> (I'm not sure where my brain was when I wrote this...) yes, of course.
> Simple-streams seem nice, but aren't widely supported (yet?).

I am hoping some interest was generated at ILC 2002, and the feedback
from other vendors was positive, but I have not yet heard from them.
Paul Foley has a preliminary implementation for CMUCL, at his page:
http://users.actrix.gen.nz/mycroft/cl.html
which can be downloaded and experimented with.

>  You can
> do this with Gray streams easily enough, and they're probably your
> best bet for portability.  The way I've done this is to open the base
> stream as a binary stream, and make Gray stream objects that sit over
> it like a filter:
> 
>   (with-open-file (raw "foo.txt" :element-type '(unsigned-byte 8))
>     (with-filtered-input (in raw my-character-stream-class)
>       (read-line in)))

The kinds of things that this makeshift encapsulation is doing can
be done in simple-streams without encapsulation; simple character encodings
and translations tend to come under the purview of external-formats.
However, you can easily perform arbitrary binary-to-binary,
binary-to-character, or character-to-character translations using
encapsulations in simple-streams.  You can find two examples of
encapsulating streams at
http://www.franz.com/support/documentation/6.2/doc/streams.htm#encapsulation-examples-2

> I've never once profiled anything where I've used this approach, though :)

Like any well-designed tool, you should expect to be able to pay a
cost-per-functionality performance price, so yes, I would expect
the addition of a filter would slow things down a bit.  If you
are in fact _afraid_ to profile it, well, perhaps you should consider
simple-streams for encapsulations after all ... :-)

From: Alexander Kjeldaas
Subject: pread in simple streams
Date: Wed, 18 Dec 2002 11:00:57 +0000
Message-ID: <atpjpq$2t7$1@news.broadnet.no>

Duane Rettig wrote:

> 
> I am hoping some interest was generated at ILC 2002, and the feedback
> from other vendors was positive, but I have not yet heard from them.
> Paul Foley has a preliminary implementation for CMUCL, at his page:
> http://users.actrix.gen.nz/mycroft/cl.html
> which can be downloaded and experimented with.
> 

This simple streams thing is exciting. After reading some of the
documentation on simple streams, I have a few questions for you.

I do not see support for pread-like reading of files.  Is this intentional?
It seems to me that the most efficient low-level I/O interface is
pread-like.

I see that you support streams that are mmapped. Is it possible to get a
direct reference to the mmapped area/to the internal buffer to avoid
copying huge amounts of data in I/O intense applications?

Have you considered an interface for asynchronous read/write/open from
files? For server applications, having asynchronous I/O available often
makes it possible to totally avoid threads.  Maybe especially interesting
for implementations that do not support threads - emulating asynchronous
read/write/open can be done with special helper threads that work around a
garbage-collector that is not thread-aware.

astor

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Wed, 18 Dec 2002 17:00:01 +0000
Message-ID: <4el8fcljt.fsf@beta.franz.com>

Alexander Kjeldaas <··········@fast.no> writes:

> Duane Rettig wrote:
> 
> > 
> > I am hoping some interest was generated at ILC 2002, and the feedback
> > from other vendors was positive, but I have not yet heard from them.
> > Paul Foley has a preliminary implementation for CMUCL, at his page:
> > http://users.actrix.gen.nz/mycroft/cl.html
> > which can be downloaded and experimented with.
> > 
> 
> 
> This simple streams thing is exciting. After reading some of the
> documentation on simple streams, I have a few questions for you.

Sure, no problem.  Perhaps further questions and replies should be
taken to email, as I suspect that this conversation will get pretty
low-level and off-group.

> I do not see support for pread-like reading of files.  Is this intentional? 
> It seems to me that the most efficient low-level I/O interface is
> pread-like.

After over 25 years of unix hacking, I've never used pread() and didn't
know it existed.  After reading the man pages for it, I see very little
to be gained in efficiency; I've always thought of a file position as
an efficient thing to change (or not) - it is just a number to set in
the iob; when it comes down to the nuts and bolts of the transfer, it
must be translated into drive/track/sector info anyway.  I'm willing
to be shown what efficiency gain can be had by pread and under what
circumstances, but we should do this offline in email.

> I see that you support streams that are mmapped. Is it possible to get a
> direct reference to the mmapped area/to the internal buffer to avoid
> copying huge amounts of data in I/O intense applications?

Yes, of course; that's precisely what mapped file streams are for.
The underlying "buffer" attached to the stream is actually just a
memory address; operations into and out of the streams are reading
and modifying mapped memory directly.  The buffer is aligned according
to operating-system requirements, and thus the kernel makes the actual
data transfers to disk during syncing without having to do the extra
copy from user space to kernel memory.

Incidentally, Paul Foley asked me a lot of questions about mapped files,
and told me that mapped files were the very reason why he had looked into
simple-streams; CMUCL did not have support for mapped files, and he wanted
to use them.

> Have you considered an interface for asynchronous read/write/open from
> files? For server applications, having asynchronous I/O available often
> makes it possible to totally avoid threads.  Maybe especially interesting
> for implementations that do not support threads - emulating asynchronous
> read/write/open can be done with special helper threads that work around a
> garbage-collector that is not thread-aware.

Although the Allegro CL implementation of simple-streams does not take
advantage of asynchronous i/o, it is intended that the simple-streams concept
is compatible with async io.  Both device-read and device-write tend to be
called with nil as their buffers (which implies that they should use their
"standard" buffers from the stream).  This allows these methods to perform
their job asynchronously by detaching these buffers and scheduling a task
to actually perform the transfer, or in fact the stream class can lock the
buffer with a semaphore, so that the transfer can take place while other
operations are occurring, and the next write will only be blocked when
the previous write has not yet finished.

As with any of the additional features that we've added over the past two
or three years, or as we use simple-streams in more and more situations,
and as we find bugs in the specification, we find that we need to tweak
that specification here and there.  I suspect that we will need to do so
whenever someone comes up with an actual asynch io implementation.  For
example, I suspect that the blocking argument to at least device-write
will really need to be a trinary (t, :bnb, or nil) instead of non-nil
meaning :bnb.  For info and definitions on blocking behavior, see:
http://www.franz.com/support/documentation/6.2/doc/streams.htm#block-non-block-3

From: Bulent Murtezaoglu
Subject: Re: pread in simple streams
Date: Wed, 18 Dec 2002 20:22:13 +0000
Message-ID: <8765trt6ve.fsf@acm.org>

>>>>> "DR" == Duane Rettig <·····@franz.com> writes:
[on interesting stuff about simple streams]
    DR> Sure, no problem.  Perhaps further questions and replies
    DR> should be taken to email, as I suspect that this conversation
    DR> will get pretty low-level and off-group. [...]

No, please keep it here.  I'm sure I'm not the only one who's 
interested.

cheers,

BM

From: Kent M Pitman
Subject: Re: pread in simple streams
Date: Wed, 18 Dec 2002 23:01:45 +0000
Message-ID: <sfwd6nz2ap2.fsf@shell01.TheWorld.com>

Bulent Murtezaoglu <··@acm.org> writes:

> >>>>> "DR" == Duane Rettig <·····@franz.com> writes:
> [on interesting stuff about simple streams]
>     DR> Sure, no problem.  Perhaps further questions and replies
>     DR> should be taken to email, as I suspect that this conversation
>     DR> will get pretty low-level and off-group. [...]
> 
> No, please keep it here.  I'm sure I'm not the only one who's 
> interested.

I agree.

From: Peter Seibel
Subject: Re: pread in simple streams
Date: Wed, 18 Dec 2002 23:02:18 +0000
Message-ID: <m3ptrz53sg.fsf@localhost.localdomain>

Bulent Murtezaoglu <··@acm.org> writes:

> >>>>> "DR" == Duane Rettig <·····@franz.com> writes:
> [on interesting stuff about simple streams]
>     DR> Sure, no problem.  Perhaps further questions and replies
>     DR> should be taken to email, as I suspect that this conversation
>     DR> will get pretty low-level and off-group. [...]
> 
> No, please keep it here.  I'm sure I'm not the only one who's 
> interested.

FWIW, I second that.

-Peter

-- 
Peter Seibel
·····@javamonkey.com

From: Alexander Kjeldaas
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 11:15:23 +0000
Message-ID: <ats90t$n1c$1@news.broadnet.no>

Duane Rettig wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
>> 
>> This simple streams thing is exciting. After reading some of the
>> documentation on simple streams, I have a few questions for you.
> 
> Sure, no problem.  Perhaps further questions and replies should be
> taken to email, as I suspect that this conversation will get pretty
> low-level and off-group.
> 

I will answer here as a number of people requested it.

>> I do not see support for pread-like reading of files.  Is this
>> intentional? It seems to me that the most efficient low-level I/O
>> interface is pread-like.
> 
> After over 25 years of unix hacking, I've never used pread() and didn't
> know it existed.  After reading the man pages for it, I see very little
> to be gained in efficiency; I've always thought of a file position as
> an efficient thing to change (or not) - it is just a number to set in
> the iob; when it comes down to the nuts and bolts of the transfer, it
> must be translated into drive/track/sector info anyway.  I'm willing
> to be shown what efficiency gain can be had by pread and under what
> circumstances, but we should do this offline in email.
> 

pread and pwrite are typically used for access to a random-access file. The
file backing a database would be an ideal example. If such a database is
single-threaded, pread and pwrite save the lseek/llseek system call.  That
can be important in cases where system calls are expensive.

On the other hand, if the database is multi-threaded, pread and pwrite avoid
serializing all access to the file.  Unless you use pread, you have to take
a mutex, do an (l)lseek, read, and release the mutex.  Thus no other thread
can do I/O on the file at the same time.  pread elegantly solves this
problem by making the seek and the read/write one atomic operation. You can
work around this, but it can be ugly.  One method would be to open(2) the
file once for each thread.  Another method would be to have a pool of
open(2)ed file descriptors for the file where the size of the pool limits
the number of concurrent I/O requests you can have on the file.  If the
application wants to use mapped files however, mapping the file several
times is probably out of the question (both from a performance perspective
and a virtual memory starvation perspective) and there might not be any
workaround except going outside the API.

So basically lseek + read/write + multi-threading is not a good combination
and one solution on unix has been to make the read/write accept a position
as an argument and make it an atomic operation.
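
To make the seek-then-read pattern concrete in Lisp terms, here is a
minimal sketch of a positional read built from standard file-position and
read-sequence.  The name read-at is hypothetical.  Note that this is the
non-atomic combination described above, not a real pread: the stream's
position is shared state, so threaded callers would need to hold a lock
around each call.

```lisp
(defun read-at (stream position buffer)
  "Seek STREAM to POSITION, then fill BUFFER with READ-SEQUENCE.
Returns the number of elements actually read.  The seek and the read are
two separate steps, so threaded code would have to hold a lock around this
call; that is the serialization a pread-style primitive avoids."
  (unless (file-position stream position)
    (error "Cannot seek to position ~D." position))
  ;; READ-SEQUENCE returns the index of the first element it did not fill,
  ;; which equals the number of elements read when starting at index 0.
  (read-sequence buffer stream))
```

A pread-like interface would take the position as an argument in the same
way, but would leave the stream's own file position untouched.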

>> I see that you support streams that are mmapped. Is it possible to get a
>> direct reference to the mmapped area/to the internal buffer to avoid
>> copying huge amounts of data in I/O intense applications?
> 
> Yes, of course; that's precisely what mapped file streams are for.
> The underlying "buffer" attached to the stream is actually just a
> memory address; operations into and out of the streams are reading
> and modifying mapped memory directly.  The buffer is aligned according
> to operating-system requirements, and thus the kernel makes the actual
> data transfers to disk during syncing without having to do the extra
> copy from user space to kernel memory.

Sounds good! 
There are two copies I would like to avoid by using mapped files.  One is
from kernel space to user space, and the other is from the buffer in user
space to the octet-array or sequence used in a read-sequence/read-vector
call. In my question I was thinking of the second copy. To avoid the second
copy, the memory address where the file is mapped, the address returned by
the mmap call, should be (at least) available as an octet array, so that it
can be used as a sequence directly.
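
The kind of copy-free access meant here can be illustrated with a displaced
array, which is a window onto existing storage rather than a copy of it.
This is only an analogy: mmap is not part of standard Common Lisp, and
whether an implementation exposes a mapped region as a Lisp array is
implementation-specific.  The name octet-window is hypothetical, and the
buffer is assumed to have been created with element type (unsigned-byte 8).

```lisp
(defun octet-window (buffer start length)
  "Return an array that shares storage with BUFFER, covering LENGTH octets
beginning at START.  No octets are copied."
  (make-array length
              :element-type '(unsigned-byte 8)
              :displaced-to buffer
              :displaced-index-offset start))
```

Writes through the window show up in the underlying buffer and vice versa,
which is the property one would want from an array backed by the mapped
region.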

> 
> Incidentally, Paul Foley asked me a lot of questions about mapped files,
> and told me that mapped files were the very reason why he had looked into
> simple-streams; CMUCL did not have support for mapped files, and he wanted
> to use them.
> 

The lack of mmap (or general I/O issues) was my biggest concern the first
time I looked at lisp.  It is very good to hear that it is being supported.

> 
> Although the Allegro CL implementation of simple-streams does not take
> advantage of asynchronous i/o, it is intended that the simple-streams
> concept
> is compatible with async io.  Both device-read and device-write tend to be
> called with nil as their buffers (which implies that they should use their
> "standard" buffers from the stream).  This allows these methods to perform
> their job asynchronously by detaching these buffers and scheduling a task
> to actually perform the transfer, or in fact the stream class can lock the
> buffer with a semaphore, so that the transfer can take place while other
> operations are occurring, and the next write will only be blocked when
> the previous write has not yet finished.
> 

What would be great to have is some sort of event system that could take
advantage of completion ports on windows, sigio, async read/write,
select(2), poll(2) on unix.  There are several asynchronous i/o delivery
mechanisms on various operating systems.  I do not know of any languages
that combine a great streams implementation with a great asynchronous
interface, but lisp would be the language where it could be done elegantly
because it has keyword arguments.  Let me give an example of what I am
thinking of when I say an asynchronous interface (with pread
functionality):

(defun read-3-sequences (stream)
  (let ((a1 (make-array 100 :element-type '(unsigned-byte 8)))
        (a2 (make-array 100 :element-type '(unsigned-byte 8)))
        (a3 (make-array 100 :element-type '(unsigned-byte 8)))
        (counter 0))
    (flet ((inc-counter (stream sequence)
             (declare (ignore stream sequence))
             ;; atomic-incf is assumed here; it is not part of standard CL.
             (atomic-incf counter)))
      ;; Start 3 reads.  inc-counter handler is called when the read
      ;; is finished. When they have all finished, counter should
      ;; be 3.
      ;; :position is used to indicate the position in the stream we
      ;; want to read from (like pread(2)).
      ;; :when-finished implies that the operation should be asynchronous
      (read-sequence a1 stream :position 0   :when-finished #'inc-counter)
      (read-sequence a2 stream :position 100 :when-finished #'inc-counter)
      (read-sequence a3 stream :position 200 :when-finished #'inc-counter))
    ;; Busy-wait until all three handlers have run.
    (do ()
        ((eql counter 3) (values a1 a2 a3)))))

astor

From: Tim Bradshaw
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 14:53:34 +0000
Message-ID: <ey3k7i6m55d.fsf@cley.com>

* Alexander Kjeldaas wrote:

> pread and pwrite are typically used for access to a random-access file. The
> file backing a database would be an ideal example. If such a database is
> single-threaded, pread and pwrite save the lseek/llseek system call.  That
> can be important in cases where system calls are expensive.  

Does anyone actually do this rather than using mmap though?  Actually,
I guess they might, because on certain primitive machines, there isn't
enough address space to mmap reasonably-sized files.

--tim

From: Thomas F. Burdick
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 19:44:43 +0000
Message-ID: <xcvy96lrdxw.fsf@famine.OCF.Berkeley.EDU>

Tim Bradshaw <···@cley.com> writes:

> * Alexander Kjeldaas wrote:
> 
> > pread and pwrite are typically used for access to a random-access file. The
> > file backing a database would be an ideal example. If such a database is
> > single-threaded, pread and pwrite save the lseek/llseek system call.  That
> > can be important in cases where system calls are expensive.  
> 
> Does anyone actually do this rather than using mmap though?

Yes.  Some Unixes (I'm looking in FreeBSD's general direction) freak
out when you use mmap'ing too heavily and are prone to kernel panics.
Hopefully this isn't true any more, but it certainly was at one time
(FreeBSD 3.3, I think).

> Actually, I guess they might, because on certain primitive machines,
> there isn't enough address space to mmap reasonably-sized files.

Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
long until Apple comes out with 64-bit machines...)

From: Tim Bradshaw
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 19:54:04 +0000
Message-ID: <ey3vg1plr8j.fsf@cley.com>

* Thomas F Burdick wrote:

> Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
> long until Apple comes out with 64-bit machines...)

aren't the powerpc macs 64bit?  I admit to being confused as to the
bitness of powerpc...

--tim

From: Thomas F. Burdick
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 21:14:04 +0000
Message-ID: <xcvhed93e5f.fsf@apocalypse.OCF.Berkeley.EDU>

Tim Bradshaw <···@cley.com> writes:

> * Thomas F Burdick wrote:
> 
> > Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
> > long until Apple comes out with 64-bit machines...)
> 
> aren't the powerpc macs 64bit?  I admit to being confused as to the
> bitness of powerpc...

Alas, no, PowerPC chips can be either 32- or 64-bit.  Apple uses
32-bit chips.  However, a while ago, IBM announced that they were
going to make lower-power-consumption (ie, appropriate for a Mac
desktop machine) 64-bit PowerPC chips.  Having previously refused to
put a vector unit in their PowerPC chips, IBM quietly mentioned that
these will include the vector unit that Apple makes heavy use of.  I
normally don't follow Mac rumors, because there's too many and they're
too unreliable ... but who else could IBM be making these for?  No one
else makes PowerPC-based PCs.

(Being a strictly-two-years-ago's technology consumer when it comes to
computers, this means these things are still quite some distance in my
future, but it's still kind of exciting.)

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 22:00:01 +0000
Message-ID: <4vg1p3d9t.fsf@beta.franz.com>

···@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Tim Bradshaw <···@cley.com> writes:
> 
> > * Thomas F Burdick wrote:
> > 
> > > Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
> > > long until Apple comes out with 64-bit machines...)
> > 
> > aren't the powerpc macs 64bit?  I admit to being confused as to the
> > bitness of powerpc...
> 
> Alas, no, PowerPC chips can be either 32- or 64-bit.

Why do you say this?  The PowerPC architecture itself includes 64-bit
instructions, so if it is true that no 64-bit PowerPC single-chip
implementations exist today, I would think that that is
an implementation detail only, and that it might not be so in the
future.  It's all in the economics of the manufacture...

From: Thomas F. Burdick
Subject: Re: pread in simple streams
Date: Sat, 21 Dec 2002 03:50:11 +0000
Message-ID: <xcvk7i4hvyk.fsf@conquest.OCF.Berkeley.EDU>

Duane Rettig <·····@franz.com> writes:

> ···@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> 
> > Tim Bradshaw <···@cley.com> writes:
> > 
> > > * Thomas F Burdick wrote:
> > > 
> > > > Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
> > > > long until Apple comes out with 64-bit machines...)
> > > 
> > > aren't the powerpc macs 64bit?  I admit to being confused as to the
> > > bitness of powerpc...
> > 
> > Alas, no, PowerPC chips can be either 32- or 64-bit.
> 
> Why do you say this?  The PowerPC architecture itself includes 64-bit
> instructions, so if it is true that no 64-bit PowerPC single-chip
> implementations exist today, I would think that that is
> an implementation detail only, and that it might not be so in the
> future.  It's all in the economics of the manufacture...

I think you misread me.  I was lamenting that implementations can be
32-bit, because the result is that Macs are all 32-bit PPCs.
Certainly there are 64-bit PPC implementations from IBM, and they're
even developing one that doesn't require a jet engine for a fan, so
Apple can use them.

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 22:00:01 +0000
Message-ID: <4znr13duk.fsf@beta.franz.com>

Tim Bradshaw <···@cley.com> writes:

> * Thomas F Burdick wrote:
> 
> > Yeah, 32-bit weenies get all kinds of problems.  (Hmmm, I wonder how
> > long until Apple comes out with 64-bit machines...)
> 
> aren't the powerpc macs 64bit?  I admit to being confused as to the
> bitness of powerpc...

Macs are 32-bit.

The whole set of Power* terminology is rather confusing.  The original
RS/6000 implemented the Power architecture.  Then, Power2 was defined
at around the same time IBM, Motorola, and Apple got together and defined
the PowerPC architecture.  Motorola implemented the MPC601 chip, which
was PowerPC (sort-of) with Power instructions for back compatibility.

The Power3 and Power4 architectures define the first bona fide 64-bit
implementations.  We did our 64-bit AIX port on a Power3 machine,
which then is portable to the big Power4 mainframes.

To my knowledge, though, Apple has not gone public with any plans
for a 64-bit version of MacOSX.  I suppose they would tend to follow
sometime after BSD goes wide ...

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 17:00:01 +0000
Message-ID: <4heda3qpa.fsf@beta.franz.com>

Tim Bradshaw <···@cley.com> writes:

> * Alexander Kjeldaas wrote:
> 
> > pread and pwrite are typically used for access to a random-access file. The
> > file backing a database would be an ideal example. If such a database is
> > single threaded pread and pwrite saves the lseek/llseek system call.  That
> > can be important in cases where system calls are expensive.  
> 
> Does anyone actually do this rather than using mmap though?  Actually,
> I guess they might, because on certain primitive machines, there isn't
> enough address space to mmap reasonably-sized files.

Gotta think big; what really is a reasonably sized file?  In the last
5 years disks have jumped from the hundreds-of-megabytes range to the
10s-of-gigabytes range.  Is a 10 Gb database file a reasonable-size
file?  Five years ago, maybe not.  Five years from now, perhaps it
will be considered small.  And perhaps in 5 years, we will no longer
be working with measly 32-bit machines, so mmapping such a 10 Gb file
will no longer be the problem that it is now ...


From: Tim Bradshaw
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 18:06:51 +0000
Message-ID: <ey365tpnaro.fsf@cley.com>

* Duane Rettig wrote:
> Gotta think big; what really is a reasonably sized file?  In the last
> 5 years disks have jumped from the hundreds-of-megabytes range to the
> 10s-of-gigabytes range.  Is a 10 Gb database file a reasonable-size
> file?  Five years ago, maybe not.  Five years from now, perhaps it
> will be considered small.  

Yes, I think 10-100GB files (maybe up to 1TB) are pretty reasonable.

> And perhaps in 5 years, we will no longer be working with measly
> 32-bit machines, so mmapping such a 10 Gb file will no longer be the
> problem that it is now ...

Well, some of us don't work with them now (:-), but yes, I hope so.  I
guess that depends on Itanic though, so I'm not holding my breath...

--tim

From: Doug McNaught
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 19:59:56 +0000
Message-ID: <m3lm2lixtv.fsf@abbadon.mcnaught.org>

Tim Bradshaw <···@cley.com> writes:

> * Duane Rettig wrote:
> > And perhaps in 5 years, we will no longer be working with measly
> > 32-bit machines, so mmapping such a 10 Gb file will no longer be the
> > problem that it is now ...
> 
> Well, some of us don't work with them now (:-), but yes, I hope so.  I
> guess that depends on Itanic though, so I'm not holding my breath...

Or AMD's upcoming x86-64.  Or existing Sparc64/PA-RISC/MIPS/Alpha...

-Doug

From: Tim Bradshaw
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 19:56:23 +0000
Message-ID: <ey3r8cdlr4o.fsf@cley.com>

* Doug McNaught wrote:

> Or AMD's upcoming x86-64.  Or existing Sparc64/PA-RISC/MIPS/Alpha...

I think that x86-64 is the only one that holds out serious potential
for being a major desktop cpu, since the others have already lost
there.  But yes, one of the others is what I meant when I said `some
of us don't work with them now...' ...

--tim

From: Raymond Toy
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 16:49:13 +0000
Message-ID: <4nptrwmy9i.fsf@edgedsp4.rtp.ericsson.se>

>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:

    Tim> * Doug McNaught wrote:
    >> Or AMD's upcoming x86-64.  Or existing Sparc64/PA-RISC/MIPS/Alpha...

    Tim> I think that x86-64 is the only one that holds out serious potential
    Tim> for being a major desktop cpu, since the others have already lost
    Tim> there.  But yes, one of the others is what I meant when I said `some
    Tim> of us don't work with them now...' ...

I think there's some hope for a 64-bit ppc if Apple should adopt one.
At least it would be more mainstream than sparc64/pa-risc/mips/alpha.

Ray

From: Alexander Kjeldaas
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 09:07:24 +0000
Message-ID: <atulst$5on$1@news.broadnet.no>

Tim Bradshaw wrote:

> * Alexander Kjeldaas wrote:
> 
>> pread and pwrite are typically used for access to a random-access file.
>> The file backing a database would be an ideal example. If such a database
>> is single threaded pread and pwrite saves the lseek/llseek system call.
>> That can be important in cases where system calls are expensive.
> 
> Does anyone actually do this rather than using mmap though?  Actually,
> I guess they might, because on certain primitive machines, there isn't
> enough address space to mmap reasonably-sized files.
> 

I would say both methods are reasonable, on both 32-bit and 64-bit
machines.  It all depends on the application.  pread/pwrite can win over
mmap because of reduced TLB pressure.  mmap can lose for an application
that wants to do its own caching because the data will be copied anyway,
and you lose the opportunity to use direct io (DMA directly from disk into
the buffer provided by pread/pwrite).

Additionally, compared to aio, there is no notification when the pages
requested with madvise(..., MADV_WILLNEED) are in core.  mincore is a
polled interface.  If you do not want to use such a polled interface, you
end up with more threads than with aio, which means more context switches.

As already mentioned - the OS might not handle mmap well.  There are small
issues, such as the kernel handling "code" and "data" pages differently
under memory pressure.  It can be hard for the kernel to know that an
mmapped file is in fact data.  Support for mmap can be buggy in
interesting ways.

astor

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 20:00:02 +0000
Message-ID: <4d6nx4ww3.fsf@beta.franz.com>

Alexander Kjeldaas <··········@fast.no> writes:

> Duane Rettig wrote:
> 
> > Alexander Kjeldaas <··········@fast.no> writes:
> >> 
> >> This simple streams thing is exciting. After reading some of the
> >> documentation on simple streams, I have a few questions to you.
> > 
> > Sure, no problem.  Perhaps further questions and replies should be
> > taken to email, as I suspect that this conversation will get pretty
> > low-level and off-group.
> > 
> 
> I will answer here as a number of people requested it.

Yes, I also got the message very clearly :-)

For those of you who requested that we stay here: well, you asked for
it, and I'll direct any complaints regarding the length of this reply
your way ... :-)

> >> I do not see support for pread-like reading of files.  Is this
> >> intentional? It seems to me that the most efficient low-level I/O
> >> interface is pread-like.
> > 
> > After over 25 years of unix hacking, I've never used pread() and didn't
> > know it existed.  After reading the man pages for it, I see very little
> > to be gained in efficiency; I've always thought of a file position as
> > an efficient thing to change (or not) - it is just a number to set in
> > the iob; when it comes down to the nuts and bolts of the transfer, it
> > must be translated into drive/track/sector info anyway.  I'm willing
> > to be shown what efficiency gain can be had by pread and under what
> > circumstances, but we should do this offline in email.
> 
> pread and pwrite are typically used for access to a random-access file. The
> file backing a database would be an ideal example. If such a database is
> single threaded pread and pwrite saves the lseek/llseek system call.  That
> can be important in cases where system calls are expensive.  

Saving the time for an lseek call isn't the argument that sways me here;
the time taken for the kernel call is completely swamped by the seek and
rotation latency that occurs when the read is actually done.

If this had been your only argument, I would have asked for some data
to show timings showing the efficiency of this.  However ...

> On the other hand, if the database is multi-threaded, pread and pwrite saves
> serializing all access to the file.  Unless you use pread, you have to take
> a mutex, do an (l)lseek, read, and release the mutex.  Thus no other thread
> can do I/O on the file at the same time.  pread elegantly solves this
> problem by making the seek and the read/write one atomic operation. You can
> work around this, but it can be ugly.  One method would be to open(2) the
> file once for each thread.  Another method would be to have a pool of
> open(2)ed file descriptors for the file where the size of the pool limits
> the number of concurrent I/O requests you can have on the file.  If the
> application wants to use mapped files however, mapping the file several
> times is probably out of the question (both from a performance perspective
> and a virtual memory starvation perspective) and there might not be any
> workaround except going outside the API.
> 
> So basically lseek + read/write + multi-threading is not a good combination
> and one solution on unix has been to make the read/write accept a position
> as an argument and make it an atomic operation.

Ah, yes; atomicity at the kernel level is _the_ cogent argument for
using such an operation.
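
To make the atomicity point concrete, here is a rough single-threaded
model in portable Common Lisp.  It is only a sketch: the handle structure
and the functions seek, read-next and read-at are hypothetical stand-ins
for a file descriptor, lseek(2), read(2) and pread(2), and the "threads"
are simulated by calling those functions in an unlucky order.

```lisp
;; Hypothetical model: HANDLE stands in for an open file descriptor whose
;; position is shared by every thread that uses it.
(defstruct handle
  (data "" :type vector)
  (position 0 :type integer))

(defun seek (handle position)
  "Model of lseek(2): overwrite the shared position."
  (setf (handle-position handle) position))

(defun read-next (handle count)
  "Model of read(2): read COUNT elements at the shared position, then advance it."
  (let* ((data (handle-data handle))
         (start (min (handle-position handle) (length data)))
         (end (min (+ start count) (length data))))
    (setf (handle-position handle) end)
    (subseq data start end)))

(defun read-at (handle position count)
  "Model of pread(2): the offset is an argument; the shared position is untouched."
  (let* ((data (handle-data handle))
         (start (min position (length data)))
         (end (min (+ start count) (length data))))
    (subseq data start end)))

(defun interleaved-seek-and-read ()
  "Thread A seeks to 0, thread B seeks to 4 before A reads: A gets B's data."
  (let ((h (make-handle :data "AAAABBBB")))
    (seek h 0)          ; thread A
    (seek h 4)          ; thread B runs in between
    (read-next h 4)))   ; thread A reads at B's position

(defun interleaved-read-at ()
  "The same schedule with positioned reads: each caller gets what it asked for."
  (let ((h (make-handle :data "AAAABBBB")))
    (list (read-at h 4 4)     ; thread B
          (read-at h 0 4))))  ; thread A
```

In the first schedule the seek and the read are separate steps, so they
would need a mutex around them; in the second there is no shared state to
protect.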

Well, I see no inherent issue here in simple-streams support for pread.
You can easily specialize file-simple-stream for this kind of thing.
Let me outline the specialization, which should be fairly easy:

 1. Define a new class to specialize file-simple-stream.  Let's call it
    pread-file-stream, just for discussion.  This new class will have
    an extra slot; let's say it's called "current-position".

 2. Define a new device-file-position method (and its setf inverse) on
    pread-file-stream.   Instead of performing low-level seek and tell
    operations, as the file-simple-stream methods do, this method pair
    will simply access and set the current-position slot.

 3. Define new device-read and device-write methods which call pread()
    and pwrite() instead of read() and write(), taking the desired offset
    argument from the current-position slot in the stream.  The number of
    octets that were successfully transferred can then be added to the
    current-position slot before device-read or device-write returns.

There may be sundry other issues involved, but this is the main idea,
and although untested, it should be easy enough to implement.
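
The three steps can be modelled roughly as follows.  This is a sketch in
portable Common Lisp, not Allegro CL code: an in-memory octet vector stands
in for the file, positioned-read stands in for pread(), and every name
(pread-file-model, backing, positioned-read, model-device-read) is
hypothetical.  The real device-read and device-write methods take more
arguments than shown here.

```lisp
;; Portable model of the three steps; none of this is Allegro CL code.
;; BACKING (an octet vector) stands in for the file and POSITIONED-READ
;; stands in for pread(2).  All names are hypothetical.

;; Step 1: a class with an extra current-position slot.
;; Step 2: "seeking" is just reading or writing that slot.
(defclass pread-file-model ()
  ((backing :initarg :backing :reader backing)
   (current-position :initform 0 :accessor current-position)))

(defun positioned-read (backing buffer offset)
  "Copy octets from BACKING, starting at OFFSET, into BUFFER.
Returns the number of octets copied and keeps no position of its own."
  (let ((count (max 0 (min (length buffer) (- (length backing) offset)))))
    (when (plusp count)
      (replace buffer backing :start2 offset :end2 (+ offset count)))
    count))

;; Step 3: read at the cached position, then advance it by the count read.
(defun model-device-read (stream buffer)
  (let ((count (positioned-read (backing stream) buffer
                                (current-position stream))))
    (incf (current-position stream) count)
    count))
```

With a six-octet backing vector, setting current-position to 3 and reading
into a two-octet buffer transfers octets 3 and 4 and leaves the position at
5, without any separate seek operation.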

This gives you the capability to manipulate files using FILE-POSITION,
and use the standard CL api, without actually performing any real seeks
until they are necessary for the next read or write.  There is a lot to
be said for not forcing a change to the api - it allows you to provide
the desired optimization and still allow CL programmers to use an
interface they are accustomed to.

Note that the device-read (device-write) methods for file-simple-stream
will read(write) from any handle; if the handle is an integer it will
call read(write), but if the handle is another stream (due to an
encapsulation), it will do the equivalent of read-vector(write-vector)
(which are new api functions defined by simple-streams to supplement
read-sequence and write-sequence without bothering their definitions).
However, the simple description I gave in step 3 above doesn't take this
into account, and so streams built on pread-file-streams would not be
encapsulatable.  However, it should be easy enough to enhance the
device-read and device-write methods to check the handle for stream-ness,
and to simply call file-position followed by read-vector or write-vector
on the stream.  And although this is not an atomic operation, it is still
atomic wrt the kernel, because the encapsulated stream, if it is also a
pread-file-stream, will delay the seek until the read operation anyway.

Finally: It may be that you really want an API that goes beyond the
CL spec, as your example below shows.  You can still do this.
Simple-streams modularizes itself into API, strategy, and device
layers, and of course the strategy layer implements the CL standard
API.  However, as we expose the strategy layer more and more, I
suspect that people will be able to write their own APIs for their
own languages (Lisp is, after all, the language-writing-language).

I presented a tutorial on simple-streams at ILC 2002, and it will be
included in the proceedings.  In addition, you can download the
powerpoint from ftp://ftp.franz.com/pub/duane/Simp-stms.ppt.  In this
presentation I describe simple-streams and go into a little more detail
about "strategy".  The best way to view this powerpoint is by going to
the "Notes Page" view.  I followed most of my notes at the conference,
but even those of you who were at the presentation might enjoy reading
what notes I had actually written down.

> >> I see that you support streams that are mmapped. Is it possible to get a
> >> direct reference to the mmapped area/to the internal buffer to avoid
> >> copying huge amounts of data in I/O intense applications?
> > 
> > Yes, of course; that's precisely what mapped file streams are for.
> > The underlying "buffer" attached to the stream is actually just a
> > memory address; operations into and out of the streams are reading
> > and modifying mapped memory directly.  The buffer is aligned according
> > to operating-system requirements, and thus the kernel makes the actual
> > data transfers to disk during syncing without having to do the extra
> > copy from user space to kernel memory.
> 
> Sounds good! 
> There are two copies I would like to avoid by using mapped files.  One is
> from kernel space to user space, and the other is from the buffer in user
> space to the octet-array or sequence used in a read-sequence/read-vector
> call. In my question I was thinking of the second copy.

Since read-sequence and write-sequence are in fact transfer operations, I
don't see how you could use them and yet avoid copying.

>    To avoid the second
> copy, the memory address where the file is mapped, the address returned by
> the mmap call, should be (at least) available as an octet array, so that it
> can be used as a sequence directly.

I assume that you want to bypass the stream and its interface, and look
into the memory itself.

An octet array is in CL a lisp object of either type
(simple-array (unsigned-byte 8) (*)) or (simple-array (signed-byte 8) (*)).
But a lisp object arbitrarily allocated in the Lisp heap most definitely does
not have the alignment requirements that are needed in order for mmap() to
succeed (mmap requires at least page-alignment, and Lisp objects are
generally allocated on 8 or 16-byte alignments; they also have headers, so
the alignment is further messed up).

So for a mapped-file stream, we let the operating system choose the
location of the mapping, and then represent its address as simply an
integer.  In Allegro CL, which represents fixnums with 0 in the two or
three least significant bits, such a page-aligned address is easy to
represent without consing; it is simply the address of the memory,
divided by 4 (on 32-bit machines) or by 8 (on 64-bit machines) which
can always be represented by a positive or negative fixnum.  Here
are two fixnums which represent the memory addresses #x48d00000 and
#xb7300000, conveniently shown by the inspector:

CL-USER(1): #x12340000
305397760
CL-USER(2): :i *
fixnum 305397760 [#x48d00000]
CL-USER(3): #x-12340000
-305397760
CL-USER(4): :i *
fixnum -305397760 [#xb7300000]
CL-USER(5): 
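
The arithmetic behind that transcript can be sketched in portable Common
Lisp.  This models only the 32-bit case with two zero tag bits; the function
names fixnum-to-address and address-to-fixnum are hypothetical and are not
part of Allegro CL.

```lisp
;; Sketch only: models a 32-bit Lisp whose fixnums carry two zero tag bits.
(defun fixnum-to-address (n)
  "Raw 32-bit word that represents fixnum N: N shifted left past the tag bits."
  (ldb (byte 32 0) (ash n 2)))

(defun address-to-fixnum (address)
  "Signed fixnum whose 32-bit representation is the 4-byte-aligned ADDRESS."
  (let ((signed (if (logbitp 31 address)
                    (- address (expt 2 32))   ; top bit set: negative fixnum
                    address)))
    (ash signed -2)))
```

Under these assumptions (fixnum-to-address #x12340000) gives #x48d00000 and
(fixnum-to-address (- #x12340000)) gives #xb7300000, matching the two
inspector lines.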

Anyway, the mmap'd workspace can be retrieved by looking in the
"buffer" slot of the mapped stream, where you'll find a fixnum.

> > Although the Allegro CL implementation of simple-streams does not take
> > advantage of asynchronous i/o, it is intended that the simple-streams
> > concept is compatible with async io.  Both device-read and device-write
> > tend to be called with nil as their buffers (which implies that they
> > should use their "standard" buffers from the stream).  This allows these
> > methods to perform their job asynchronously by detaching these buffers
> > and scheduling a task to actually perform the transfer, or in fact the
> > stream class can lock the buffer with a semaphore, so that the transfer
> > can take place while other operations are occurring, and the next write
> > will only be blocked when the previous write has not yet finished.
> > 
> 
> What would be great to have was some sort of event system that could take
> advantage of completion ports on windows, sigio, async read/write,
> select(2), poll(2) on unix.  There are several asynchronous i/o delivery
> mechanisms on various operating systems.  I do not know of any languages
> that combine a great streams implementation with a great asynchronous
> interface, but lisp would be the language where it could be done elegantly
> because it has keyword arguments.  Let me give an example of what I am
> thinking of when I say an asynchronous interface (with pread
> functionality):
> 
> (defun read-3-sequences (stream)
>   (let ((a1 (make-array 100 :element-type '(unsigned-byte 8)))
>         (a2 (make-array 100 :element-type '(unsigned-byte 8)))
>         (a3 (make-array 100 :element-type '(unsigned-byte 8)))
>         (counter 0))
>     (flet ((inc-counter (stream sequence) (atomic-incf counter)))
>       ;; Start 3 reads.  inc-counter handler is called when the read
>       ;; is finished. When they have all finished, counter should
>       ;; be 3.
>       ;; :position is used to indicate the position in the stream we
>       ;; want to read from (like pread(2)).
>       ;; :when-finished implies that the operation should be asynchronous
>       (read-sequence a1 stream :position 0   :when-finished #'inc-counter)
>       (read-sequence a2 stream :position 100 :when-finished #'inc-counter)
>       (read-sequence a3 stream :position 200 :when-finished #'inc-counter))
>     (do () 
>         ((= counter 3) (values a1 a2 a3)))))

Note that you are assuming extensions to the CL definition of read-sequence.
See my notes above about strategy.  Whether or not these extensions are
really needed, however, I'd love to see you explore this further and
propose a design integrated with simple-streams (with changes to
simple-streams, if appropriate) to accomplish what you want.
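
As a starting point for such exploration, here is a sketch of the calling
convention as a separate function rather than as a change to
cl:read-sequence, which a conforming program may not redefine.  Everything
in it is an assumption made for illustration: read-sequence-at and
read-3-sequences-sync are hypothetical names, the source is an in-memory
octet vector rather than a stream, and the "asynchronous" completion runs
synchronously, so the callback has already fired when the call returns.

```lisp
;; Sketch only: a synchronous stand-in for the proposed keyword interface.
(defun read-sequence-at (sequence source &key (position 0) when-finished)
  "Copy octets from SOURCE, starting at POSITION, into SEQUENCE.
Calls WHEN-FINISHED, if supplied, with SOURCE and SEQUENCE after the copy.
Returns the number of octets copied."
  (let ((count (max 0 (min (length sequence) (- (length source) position)))))
    (when (plusp count)
      (replace sequence source :start2 position :end2 (+ position count)))
    (when when-finished
      (funcall when-finished source sequence))
    count))

(defun read-3-sequences-sync (source)
  "Issue three positioned reads and count their completion callbacks."
  (let ((a1 (make-array 100 :element-type '(unsigned-byte 8)))
        (a2 (make-array 100 :element-type '(unsigned-byte 8)))
        (a3 (make-array 100 :element-type '(unsigned-byte 8)))
        (counter 0))
    (flet ((inc-counter (source sequence)
             (declare (ignore source sequence))
             (incf counter)))
      (read-sequence-at a1 source :position 0   :when-finished #'inc-counter)
      (read-sequence-at a2 source :position 100 :when-finished #'inc-counter)
      (read-sequence-at a3 source :position 200 :when-finished #'inc-counter))
    (values a1 a2 a3 counter)))
```

A real asynchronous version would need an atomic counter update and a
proper wait primitive in place of the busy loop in the quoted example.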


From: Duane Rettig
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 20:00:02 +0000
Message-ID: <48yyl4wda.fsf@beta.franz.com>

Duane Rettig <·····@franz.com> writes:

>  3. Define new device-read and device-write methods which call pread()
>     and pwrite() instead of read() and write(), taking the desired offset
>     argument from the current-position slot in the stream.  The number of
>     octets that were successfully transferred can then be added to the
>     current-position slot before device-read or device-write returns.

Actually, whether or not device-read and device-write increment the
current-position or leave it alone is a design decision that would
have to be made if and when the stream class is designed and implemented.
I naturally chose the behavior that would make the stream semantics
the same as for other CL streams, but I suppose arguments could be
made for requiring streams of this class to always have file-position
called before each read or write operation, since pread and pwrite
act that way.  Such behavior is counter-intuitive, however, and I
would tend to recommend against it.


From: Peter Seibel
Subject: Re: pread in simple streams
Date: Thu, 19 Dec 2002 22:59:16 +0000
Message-ID: <m3hed962eh.fsf@localhost.localdomain>

Duane Rettig <·····@franz.com> writes:

> I assume that you want to bypass the stream and its interface, and
> look into the memory itself.

[...]

> Anyway, the mmap'd workspace can be retrieved by looking in the
> "buffer" slot of the mapped stream, where you'll find a fixnum.

This may be a simple question, but what can you do with the fixnum? Is
there some standard or implementation-defined way to get back to a
"raw" address and from there to the contents of the memory as you
surmised the OP wanted to do?

-Peter

-- 
Peter Seibel
·····@javamonkey.com

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 02:00:01 +0000
Message-ID: <4znr1fpha.fsf@beta.franz.com>

Peter Seibel <·····@javamonkey.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > I assume that you want to bypass the stream and its interface, and
> > look into the memory itself.
> 
> [...]
> 
> > Anyway, the mmap'd workspace can be retrieved by looking in the
> > "buffer" slot of the mapped stream, where you'll find a fixnum.
> 
> This may be a simple question, but what can you do with the fixnum? Is
> there some standard or implementation-defined way to get back to a
> "raw" address and from there to the contents of the memory as you
> surmised the OP wanted to do?

Allegro CL provides two primitive functions, sys:memref and
sys:memref-int, to access arbitrary memory or lisp objects and
to interpret the bits in any way desired (i.e. point gun, shoot
foot ...)

memref-int accepts an integer and treats it as a memory address.
It decodes the integer (shifting if a fixnum and pulling out bigits
if a bignum) and adds the index and loads from that memory address.

memref, on the other hand, takes a LispVal (a number of bits of tag
and the rest either pointer or immediate data) and uses that entire
LispVal as the address.  For most Lisp objects, this is easily
understood; you take the object, subtract the tag, and add the byte
index, and that is the address to load from or to store into.  For
fixnums, whose LispVal representations are immediate values, the
bits plus tag in fact _are_ the base address of the object.  So there
is no consing; when the index is added, the memory is referenced and
interpreted according to the call, and you've accessed memory.

These accesses are optimized when conditions are right, so you might
in fact see a memref call compile to one instruction.

This fixnum-punning of addresses is in fact how mapped-file streams
perform the accesses to the "buffer", and how a user can perform random
accesses on the same memory area.  Be sure the address is not in fact
pointing to a Lisp Object in the heap, however; if it moves, you'll be
trashing someone else's object (but of course this won't happen to the
buffer-slot address in a mapped file stream, since it doesn't move).
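
The difference between the two base-address interpretations can be modelled
in portable Common Lisp.  This is a sketch, not the real primitives: it
assumes a 32-bit Lisp with two fixnum tag bits, simulates memory with an
octet vector placed at a made-up address, reads single octets only, and
uses the hypothetical names model-memref and model-memref-int.

```lisp
;; Sketch only: a simulated mapping at a made-up, page-aligned address.
(defparameter *mapping-base* #x48d00000)
(defparameter *mapping*
  (map '(vector (unsigned-byte 8)) #'char-code "abcdefg"))

(defun load-octet (address)
  "Fetch one octet from the simulated mapping."
  (aref *mapping* (- address *mapping-base*)))

(defun model-memref-int (address offset pos)
  "ADDRESS is an integer whose numeric value is the base address."
  (load-octet (+ address offset pos)))

(defun model-memref (fixnum offset pos)
  "FIXNUM's tagged 32-bit representation (its value times 4) is the base address."
  (load-octet (+ (ldb (byte 32 0) (ash fixnum 2)) offset pos)))
```

In this model the punned fixnum #x12340000 passed to model-memref and the
integer #x48d00000 passed to model-memref-int refer to the same octet.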


From: Peter Seibel
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 03:44:37 +0000
Message-ID: <m37ke55p6w.fsf@localhost.localdomain>

Duane Rettig <·····@franz.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > I assume that you want to bypass the stream and its interface, and
> > > look into the memory itself.
> > 
> > [...]
> > 
> > > Anyway, the mmap'd workspace can be retrieved by looking in the
> > > "buffer" slot of the mapped stream, where you'll find a fixnum.
> > 
> > This may be a simple question, but what can you do with the fixnum? Is
> > there some standard or implementation-defined way to get back to a
> > "raw" address and from there to the contents of the memory as you
> > surmised the OP wanted to do?
> 
> Allegro CL provides two primitive functions, sys:memref and
> sys:memref-int, to access arbitrary memory or lisp objects and
> to interpret the bits in any way desired (i.e. point gun, shoot
> foot ...)

So despite the danger (and the warnings in the ACL docs that start
with, "This function provides low-level memory access ...") these
functions (memref-int in particular) would be the correct way to get
at the contents of a memory mapped file, assuming you don't want to
use read-sequence and write-sequence because they will copy data? Is
there any way to use Allegro's foreign-type facility on the mapped
memory? If not I suppose one could write your own classes that wrap up
a mapped-stream and an offset into that stream representing the
beginning of that object's data and then implement accessors that use
memref under the covers to get and set their "slots".

-Peter

-- 
Peter Seibel
·····@javamonkey.com

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 06:00:01 +0000
Message-ID: <4u1h9xnlf.fsf@beta.franz.com>

Peter Seibel <·····@javamonkey.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Peter Seibel <·····@javamonkey.com> writes:
> > 
> > > Duane Rettig <·····@franz.com> writes:
> > > 
> > > > I assume that you want to bypass the stream and its interface, and
> > > > look into the memory itself.
> > > 
> > > [...]
> > > 
> > > > Anyway, the mmap'd workspace can be retrieved by looking in the
> > > > "buffer" slot of the mapped stream, where you'll find a fixnum.
> > > 
> > > This may be a simple question, but what can you do with the fixnum? Is
> > > there some standard or implementation-defined way to get back to a
> > > "raw" address and from there to the contents of the memory as you
> > > surmised the OP wanted to do?
> > 
> > Allegro CL provides two primitive functions, sys:memref and
> > sys:memref-int, to access arbitrary memory or lisp objects and
> > to interpret the bits in any way desired (i.e. point gun, shoot
> > foot ...)
> 
> So despite the danger (and the warnings in the ACL docs that start
> with, "This function provides low-level memory access ...") these
> functions (memref-int in particular) would be the correct way to get
> at the contents of a memory mapped file, assuming you don't want to
> use read-sequence and write-sequence because they will copy data?

Well, sort of.  I should apologize for missing the first part of your
question; Stream slots can be accessed via slot-value, although the
names of those slots are not exported. You can also use a fast
accessor macro which works because stream slots are fixed-index (all
documented in or around the streams document).  It is the contents of
the memory-mapping itself that you can get via memref (not memref-int).
See the example below.

> Is
> there any way to use Allegro's foreign-type facility on the mapped
> memory?

Yes.  I suppose that is preferred - you would probably define a foreign
:array type, and access it using an :aligned access (which is what these
punned-fixnums really are).  But beware anyway; going into structures
with memref or with foreign-types is equally as dangerous...

> If not I suppose one could write your own classes that wrap up
> a mapped-stream and an offset into that stream representing the
> beginning of that object's data and then implement accessors that use
> memref under the covers to get and set their "slots".

No, slot-value works fine on the streams themselves:

CL-USER(1): (shell "cat xxx")
abcdefg
0
CL-USER(2): (setq xxx (open "xxx" :mapped t))
; Autoloading for class MAPPED-FILE-SIMPLE-STREAM:
; Fast loading from bundle code/streamm.fasl.
#<MAPPED-FILE-SIMPLE-STREAM #p"xxx" mapped for input pos 0 @ #x7161ca02>
CL-USER(3): :i * skip 25
MAPPED-FILE-SIMPLE-STREAM @ #x7161ca02 = #<MAPPED-FILE-SIMPLE-STREAM
                                           #p"xxx" mapped for input pos 0
                                           @
                                           #x7161ca02>
   ...
  25 CHARPOS ------> fixnum 0 [#x00000000]
  26 BUFFER-PTR ---> fixnum 8 [#x00000020]
  27 BUFFPOS ------> fixnum 0 [#x00000000]
  28 BUFFER -------> fixnum 268455936 [#x40014000]
  29 RECORD-END ---> The symbol NIL
  30 UNREAD-PAST-SOFT-EOF -> The symbol NIL
  31 MODE ---------> fixnum -16 [#xffffffc0]
  32 SRC-POSITION-TABLE -> The symbol NIL
CL-USER(4): (setq buf (slot-value xxx 'excl::buffer))
268455936
CL-USER(5): (loop for i below 8 collect (code-char (sys:memref buf 0 i :unsigned-byte)))
(#\a #\b #\c #\d #\e #\f #\g #\Newline)
CL-USER(6): 


From: Peter Seibel
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 07:15:49 +0000
Message-ID: <m33cot5few.fsf@localhost.localdomain>

Duane Rettig <·····@franz.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > Peter Seibel <·····@javamonkey.com> writes:
> > > 
> > > > Duane Rettig <·····@franz.com> writes:
> > > > 
> > > > > I assume that you want to bypass the stream and its interface, and
> > > > > look into the memory itself.
> > > > 
> > > > [...]
> > > > 
> > > > > Anyway, the mmap'd workspace can be retrieved by looking in the
> > > > > "buffer" slot of the mapped stream, where you'll find a fixnum.
> > > > 
> > > > This may be a simple question, but what can you do with the fixnum? Is
> > > > there some standard or implementation-defined way to get back to a
> > > > "raw" address and from there to the contents of the memory as you
> > > > surmised the OP wanted to do?
> > > 
> > > Allegro CL provides two primitive functions, sys:memref and
> > > sys:memref-int, to access arbitrary memory or lisp objects and
> > > to interpret the bits in any way desired (i.e. point gun, shoot
> > > foot ...)
> > 
> > So despite the danger (and the warnings in the ACL docs that start
> > with, "This function provides low-level memory access ...") these
> > functions (memref-int in particular) would be the correct way to get
> > at the contents of a memory mapped file, assuming you don't want to
> > use read-sequence and write-sequence because they will copy data?
> 
> Well, sort of. I should apologize for missing the first part of your
> question; Stream slots can be accessed via slot-value, although the
> names of those slots are not exported. You can also use a fast
> accessor macro which works because stream slots are fixed-index (all
> documented in or around the streams document). It is the contents of
> the memory-mapping itself that you can get via memref (not
> memref-int). See the example below.

No, no apology required--I think I'm confusing you rather than the
other way around. I may have skipped a step in my question. I wasn't
worried about how to get the slot values off the stream itself; I
assumed slot-value would work for that. Rather, I was imagining
wanting to implement something where I have persistent objects that I
store and retrieve from in a memory-mapped file without a lot of
copying while hiding the fact that these objcets are sitting on top of
a file as much as possible. Rather than instantiate an object with a
bunch of slots that hold values that I fill in by slurping data out of
the memory-mapped file, I might instead write a class and define the
appropriate methods to get and set values directly on the
memory-mapped file.

For instance, if I were implementing btrees in Lisp, I might want to
have a btree-node class to represent the nodes that make up a btree
(each node consists of a number of key/value/pointer triples, where the
pointers point to other btree-nodes). The class and some of the methods
specialized on that class that I might need for reading data out of
the file might look something like this (incomplete and untested):

  (defpackage "BTREE" (:use "COMMON-LISP"))

  (in-package "BTREE")

  (defconstant *number-of-keys-offset* 0)
  (defconstant *number-of-keys-size* 2)
  (defconstant *index-start* (+ *number-of-keys-offset* *number-of-keys-size*))
  (defconstant *key-size* 4)
  (defconstant *value-size* 4)
  (defconstant *pointer-size* 4)
  (defconstant *index-entry-size* (+ *key-size* *value-size* *pointer-size*))

  (defclass btree-node ()
    ((buffer :initarg :buffer :accessor buffer)
     (offset :initarg :offset :accessor offset)))

  (defmethod number-of-keys ((node btree-node))
    (with-slots (buffer offset) node
      (sys:memref buffer offset *number-of-keys-offset* :unsigned-word)))

  (defmethod value-for-key ((node btree-node) key)
    (long32-at node (value-position (index-for-key node key))))

  (defmethod pointer-for-key ((node btree-node) key)
    (long32-at node (pointer-position (index-for-key node key))))

  (defmethod key-at ((node btree-node) idx)
    (long32-at node (key-position idx)))

  (defmethod index-for-key ((node btree-node) key)
    ;; Linear search over the node's keys; returns NIL if KEY is absent.
    (dotimes (i (number-of-keys node))
      (when (= (key-at node i) key) (return-from index-for-key i)))
    nil)

  (defmethod long32-at ((node btree-node) pos)
    (with-slots (buffer offset) node
      (sys:memref buffer offset pos :unsigned-long32)))

  (defun key-position (idx)
    (+ *index-start* (* idx *index-entry-size*)))

  (defun value-position (idx)
    (+ (key-position idx) *key-size*))

  (defun pointer-position (idx)
    (+ (value-position idx) *value-size*))

Where you'd make a new btree-node something like:

  (defun new-btree-node (pointer)
    (make-instance 'btree-node
                   :buffer (slot-value some-stream 'excl::buffer)
                   :offset (compute-offset-from-pointer pointer)))

where a 'pointer' is a pointer within this file, maybe a raw offset or
maybe encoded somehow.
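
For comparison, the same fixed-offset layout can be sketched in C over a
plain byte buffer standing in for the mapped file. This is only an
illustration under assumptions not stated above: the 2-byte count and
4-byte fields are taken to be in host byte order, a missing key is
reported with a return code, and the C names are made up to mirror the
Lisp sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout mirrors the constants in the Lisp sketch: a 2-byte key count,
   then entries of 4-byte key, 4-byte value, 4-byte pointer. */
enum {
  NUMBER_OF_KEYS_OFFSET = 0,
  NUMBER_OF_KEYS_SIZE = 2,
  INDEX_START = NUMBER_OF_KEYS_OFFSET + NUMBER_OF_KEYS_SIZE,
  KEY_SIZE = 4,
  VALUE_SIZE = 4,
  POINTER_SIZE = 4,
  INDEX_ENTRY_SIZE = KEY_SIZE + VALUE_SIZE + POINTER_SIZE
};

/* A node is only a view: base address of the mapping plus node offset. */
struct btree_node {
  const unsigned char *buffer;
  size_t offset;
};

static size_t key_position(size_t idx) { return INDEX_START + idx * INDEX_ENTRY_SIZE; }
static size_t value_position(size_t idx) { return key_position(idx) + KEY_SIZE; }
static size_t pointer_position(size_t idx) { return value_position(idx) + VALUE_SIZE; }

/* memcpy avoids unaligned loads; host byte order is assumed. */
static uint32_t long32_at(const struct btree_node *node, size_t pos)
{
  uint32_t v;
  memcpy(&v, node->buffer + node->offset + pos, sizeof v);
  return v;
}

static uint16_t number_of_keys(const struct btree_node *node)
{
  uint16_t n;
  memcpy(&n, node->buffer + node->offset + NUMBER_OF_KEYS_OFFSET, sizeof n);
  return n;
}

/* Linear search; returns the entry index, or -1 when the key is absent. */
static int index_for_key(const struct btree_node *node, uint32_t key)
{
  uint16_t i, n = number_of_keys(node);
  for (i = 0; i < n; i++)
    if (long32_at(node, key_position(i)) == key)
      return (int)i;
  return -1;
}

/* Stores the value for key in *out and returns 1, or returns 0 if absent. */
static int value_for_key(const struct btree_node *node, uint32_t key, uint32_t *out)
{
  int idx = index_for_key(node, key);
  assert(out != NULL);
  if (idx < 0) return 0;
  *out = long32_at(node, value_position((size_t)idx));
  return 1;
}

/* Same as value_for_key, but fetches the child pointer field. */
static int pointer_for_key(const struct btree_node *node, uint32_t key, uint32_t *out)
{
  int idx = index_for_key(node, key);
  assert(out != NULL);
  if (idx < 0) return 0;
  *out = long32_at(node, pointer_position((size_t)idx));
  return 1;
}
```

Nothing is copied into the node itself; every accessor reads straight
from the buffer, which is the property the Lisp version is after.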

Beyond that, what I was really thinking is that there's a bunch of
hand-written code here that I could probably generate with appropriate
macros--def-persistent-object or something. Or maybe write a metaclass
that (if I understand correctly) changes the behavior of SLOT-VALUE
such that it uses some information passed in the defclass form to
figure out where exactly it needs to memref to get/set each "slot".

Anyway, hopefully I haven't muddied the waters even worse with my
untested and inexpert code sketch. Regardless, thanks for all your
help.

-Peter

-- 
Peter Seibel
·····@javamonkey.com

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 09:00:05 +0000
Message-ID: <4ptrxxf8k.fsf@beta.franz.com>

Peter Seibel <·····@javamonkey.com> writes:

>  Rather, I was imagining
> wanting to implement something where I have persistent objects that I
> store and retrieve from in a memory-mapped file without a lot of
> copying while hiding the fact that these objects are sitting on top of
> a file as much as possible. Rather than instantiate an object with a
> bunch of slots that hold values that I fill in by slurping data out of
> the memory-mapped file, I might instead write a class and define the
> appropriate methods to get and set values directly on the
> memory-mapped file.

 [ example elided ... ]

> Anyway, hopefully I haven't muddied the waters even worse with my
> untested and inexpert code sketch. Regardless, thanks for all your
> help.

No problem.  No, your intention is clear.  I would tend to say that
the problem you're trying to solve isn't really a streams problem,
and that you might consider just making foreign calls to mmap
(or MS's equivalent) to get the mapped file memory space you desire.

Having said that, though, I can attest to and remember the pain that
getting memory-mapped files right put me through, and I suppose using
OPEN and CLOSE to get the mapping (especially with such a trivial
interface) isn't such a bad thing after all; it would be a "file"
thing rather than a "stream" thing.

Be sure to call force-output or finish-output when you want any changes
to show up in the file; some operating systems' mmap implementations
do aggressive synching of the memory to the file, but others do as
little as possible, so if the file remains open and you expect to see
the changes, one of these functions is required.
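
At the C level, the closest analogue to that explicit flush is probably
msync on a shared mapping. The sketch below is an assumption-laden
illustration, not a description of how force-output is implemented for
mapped simple-streams: the function name and its error handling are made
up, and it maps the whole file for simplicity.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite len bytes at pos in the file through a shared mapping and
   push them to the file with msync before unmapping.
   Returns 0 on success, -1 on failure. The file must already be at
   least pos + len bytes long; mmap does not extend files. */
int patch_via_mmap(const char *path, size_t file_size, size_t pos,
                   const void *data, size_t len)
{
  int fd, rc;
  char *map;

  assert(pos + len <= file_size);
  fd = open(path, O_RDWR);
  if (fd < 0) return -1;
  map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED) { close(fd); return -1; }

  memcpy(map + pos, data, len);         /* change is in memory only so far */
  rc = msync(map, file_size, MS_SYNC);  /* explicit sync; do not rely on the OS */

  munmap(map, file_size);
  close(fd);
  return rc;
}
```

A caller that keeps the mapping open for a long time would call msync at
the points where other readers of the file are expected to see the data.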

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Alexander Kjeldaas
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 20:08:15 +0000
Message-ID: <atvsk1$c40$1@news.broadnet.no>


Duane Rettig wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
>> 
>> pread and pwrite are typically used for access to a random-access file.
>> The file backing a database would be an ideal example. If such a
>> database is single threaded, pread and pwrite save the lseek/llseek
>> system call.  That can be important in cases where system calls are
>> expensive.
> 
> Saving the time for an lseek call isn't the argument that sways me here;
> the time taken for the kernel call is completely swamped by the seek and
> rotation latency that occurs when the read is actually done.
> 
> If this had been your only argument, I would have asked for some data
> to show timings showing the efficiency of this.  However ...
> 

Assuming that the disk is involved in the read, yes.  If only the cache in
the kernel is involved, the system call latency _can_ be an issue.  I have
never _not_ used pread when it was available and I wanted fast random
access, so I have not measured this.  Just for fun I wrote a simple test
program which I have attached that you can run to test various random reads
of different sizes.  I do not think the results I got were stable enough,
but to me they hint that on my machine (a linux desktop box), I got around
10% savings when doing 1M random reads of 1024 bytes from a 20MB file.  At
10kB reads pread was less than 2% faster (linux).  However system calls on
linux are known to be fast.

>> So basically lseek + read/write + multi-threading is not a good
>> combination and one solution on unix has been to make the read/write
>> accept a position as an argument and make it an atomic operation.
> 
> Ah, yes; atomicity at the kernel level is _the_ cogent argument for
> using such an operation.
> 
> Well, I see no inherent issue here in simple-streams support for pread.
> You can easily specialize file-simple-stream for this kind of thing.
> Let me outline the specialization, which should be fairly easy:
> 

[removed description of how a pread-enabled simple-streams stream could be
implemented]

> This gives you the capability to manipulate files using FILE-POSITION,
> and use the standard CL api, without actually performing any real seeks
> until they are necessary for the next read or write.  There is a lot to
> be said for not forcing a change to the api - it allows you to provide
> the desired optimization and still allow CL programmers to use an
> interface they are accustomed to.

Thank you for your thorough explanation.  I will probably be offline for a 
few weeks now so I can not continue this discussion.

I will just ask one question before I leave.  When introducing
simple-streams you tried to change as little as possible.  That is natural. 
However, with some foresight it seems that CL apis can be very extensible. 
For example if the standard says that any specialization of the generic
function read-sequence should include &allow-other-keys in their lambda
lists, it would not be problematic for an implementation to have their
read-sequence gf take an extra keyword argument and old code should still
compile. I see that you have added the partial-fill argument to
read-sequence and other functions so CL might already be designed that way.  

How does the community view addition of non-standard keyword arguments to
functions like read-sequence?  As you saw, I added a few in my example, and
I think CL is one of a handful of languages where you can extend an api
like that. 

[...]
> 
> Finally: It may be that you really want an API that goes beyond the
> CL spec, as your example below shows.  You can still do this.
> Simple-streams modularizes itself into API, strategy, and device
> layers, and of course the strategy layer implements the CL standard
> API.  However, as we expose the strategy layer more and more, I
> suspect that people will be able to write their own APIs for their
> own languages (Lisp is, after all, the language-writing-language).
> 

I am thinking of features that are beyond the CL spec, but I wonder how far
you can take the old api without breaking it.  To me it seems that an api
can be stretched further in CL than in other languages.

> I presented a tutorial on simple-streams at ILC 2002, and it will be
> included in the proceedings.  In addition, you can download the
> powerpoint from ftp://ftp.franz.com/pub/duane/Simp-stms.ppt.  In this
> presentation I describe simple-streams and go into a little more detail
> about "strategy".  The best way to view this powerpoint is by going to
> the "Notes Page" view.  I followed most of my notes at the conference,
> but even those of you who were at the presentation might enjoy reading
> what notes I had actually written down.
> 

I will take a look.

>> There are two copies I would like to avoid by using mapped files.  One is
>> from kernel space to user space, and the other is from the buffer in user
>> space to the octet-array or sequence used in a read-sequence/read-vector
>> call. In my question I was thinking of the second copy.
> 
> Since read-sequence and write-sequence are in fact transfer operations, I
> don't see how you could use them and yet avoid copying.
> 

My question was answered in the thread with Peter Seibel.

> Note that you are assuming extensions to the CL definition of
> read-sequence.  See my notes above about strategy.  Whether or not
> these extensions are really needed, however, I'd love to see you
> explore this further and propose a design integrated with
> simple-streams (with changes to simple-streams, if appropriate) to
> accomplish what you want.
> 

One day when all other projects are finished.. ;-)

astor

[Attachment: testreadvspread.c, a test program comparing lseek/read versus pread]


#define _XOPEN_SOURCE 500
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#define PERROR_ENSURE(x, msg) if (x) { perror(msg); exit(EXIT_FAILURE); }
#define ENSURE(x, msg) if (x) { fprintf(stderr, msg); exit(EXIT_FAILURE); }
#ifndef timersub
# define timersub(a, b, result)                                               \
  do {                                                                        \
    (result)->tv_sec = (a)->tv_sec - (b)->tv_sec;                             \
    (result)->tv_usec = (a)->tv_usec - (b)->tv_usec;                          \
    if ((result)->tv_usec < 0) {                                              \
      --(result)->tv_sec;                                                     \
      (result)->tv_usec += 1000000;                                           \
    }                                                                         \
  } while (0)
#endif

int main(int argc, char **argv) 
{
  int fd;
  unsigned int size;
  unsigned int num_reads;
  int *req;
  int i;
  struct stat st;
  char *buffer;
  struct timeval start, end, res;
  ENSURE(argc != 4, "Usage testreadvspread size num-reads file\n");
  PERROR_ENSURE((fd = open(argv[3], O_RDONLY)) < 0, "Could not open file");
  PERROR_ENSURE(fstat(fd, &st) < 0, "Could not stat file");
  size = strtoul(argv[1], NULL, 10);
  num_reads = strtoul(argv[2], NULL, 10);
  printf("Reading %s into memory\n", argv[3]);
  ENSURE((buffer = (char *)malloc(st.st_size)) == NULL, "malloc error!");
  PERROR_ENSURE(read(fd, buffer, st.st_size) < 0, "Could not read file");
  ENSURE(!(buffer = (char *)realloc(buffer, size)), "realloc error!");
  ENSURE(!(req = (int *)malloc(num_reads * sizeof(int))), "malloc error(3)!");
  for (i = 0; i < num_reads; i++) {
    req[i] = (int)(((double)(st.st_size-size))*random()/(RAND_MAX+1.0));
  }

  printf("Warmup cycle...");
  for (i = 0; i < num_reads; i++) {
    PERROR_ENSURE(pread(fd, buffer, size, req[i]) < 0, "pread error!");
  }
  printf("done\n");

  printf("Doing %u random reads of size %u (pread)\n", num_reads, size);
  PERROR_ENSURE(gettimeofday(&start, NULL) != 0, "gettimeofday error(1)!");
  for (i = 0; i < num_reads; i++) {
    PERROR_ENSURE(pread(fd, buffer, size, req[i]) < 0, "pread error!");
  }
  PERROR_ENSURE(gettimeofday(&end, NULL) != 0, "gettimeofday error(2)!");
  timersub(&end, &start, &res);
  printf("took %ld.%06lds\n", (long)res.tv_sec, (long)res.tv_usec);

  printf("Doing %u random reads of size %u (seek/read)\n", num_reads, size);
  PERROR_ENSURE(gettimeofday(&start, NULL) != 0, "gettimeofday error(1)!");
  for (i = 0; i < num_reads; i++) {
    PERROR_ENSURE(lseek(fd, req[i], SEEK_SET) < 0, "seek error!");
    PERROR_ENSURE(read(fd, buffer, size) < 0, "read error!");
  }
  PERROR_ENSURE(gettimeofday(&end, NULL) != 0, "gettimeofday error(2)!");
  timersub(&end, &start, &res);
  printf("took %ld.%06lds\n", (long)res.tv_sec, (long)res.tv_usec);

  exit(EXIT_SUCCESS);
}


From: Duane Rettig
Subject: Re: pread in simple streams
Date: Fri, 20 Dec 2002 23:00:02 +0000
Message-ID: <43coswbi1.fsf@beta.franz.com>

Alexander Kjeldaas <··········@fast.no> writes:

> Duane Rettig wrote:
> 
> Thank you for your thorough explanation.  I will probably be offline for a 
> few weeks now so I can not continue this discussion.
> 
> I will just ask one question before I leave.  When introducing
> simple-streams you tried to change as little as possible.  That is natural. 
> However, with some foresight it seems that CL apis can be very extensible. 
> For example if the standard says that any specialization of the generic
> function read-sequence should include &allow-other-keys in their lambda
> lists, it would not be problematic for an implementation to have their
> read-sequence gf take an extra keyword argument and old code should still
> compile. I see that you have added the partial-fill argument to
> read-sequence and other functions so CL might already be designed that way.  
> 
> How does the community view addition of non-standard keyword arguments to
> functions like read-sequence?  As you saw, I added a few in my example, and
> I think CL is one of a handful of languages where you can extend an api
> like that. 

I had no problem specifically with your addition of a :position keyword to
read-sequence.  The problem I had was with the semantics of the addition:
if a stream has pread characteristics, then _all_ of the stream api functions
would have to allow for a position argument, and I'm not sure what the
expected behavior would be on such a stream where the position argument
is not specified.  For example, what would you propose as the new interface
to read-char and/or read-byte?

There would be a conflation between the current position of the stream
and the arguments to each stream api function, and that would make
for a confusing api.  Mostly, I am applying the KISS principle.

> > Finally: It may be that you really want an API that goes beyond the
> > CL spec, as your example below shows.  You can still do this.
> > Simple-streams modularizes itself into API, strategy, and device
> > layers, and of course the strategy layer implements the CL standard
> > API.  However, as we expose the strategy layer more and more, I
> > suspect that people will be able to write their own APIs for their
> > own languages (Lisp is, after all, the language-writing-language).
> > 
> 
> I am thinking of features that are beyond the CL spec, but I wonder how far
> you can take the old api without breaking it.

As far as possible.

>  To me it seems that an api
> can be stretched further in CL than in other languages.

Yes.  This is only my opinion, but I think that either consciously or
subconsciously, the designers of CL took expansion into consideration
(much of the wording of the spec points to a more conscious process).

 [ ... ]

> > Whether or not these extensions are
> > really needed, however, I'd love to see you explore this further and
> > propose a design integrated with simple-streams (with changes to
> > simple-streams, if appropriate) to accomplish what you want.
> 
> One day when all other projects are finished.. ;-)

Oh, well, it was worth a try...

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Alexander Kjeldaas
Subject: Re: pread in simple streams
Date: Mon, 06 Jan 2003 10:44:20 +0000
Message-ID: <avblu3$uio$1@news.broadnet.no>

Duane Rettig wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
>
>> How does the community view addition of non-standard keyword arguments to
>> functions like read-sequence?  As you saw, I added a few in my example,
>> and I think CL is one of a handful of languages where you can extend an
>> api like that.
> 
> I had no problem specifically with your addition of a :position keyword to
> read-sequence.  The problem I had was with the semantics of the addition:
> if a stream has pread characteristics, then _all_ of the stream api
> functions would have to allow for a position argument, and I'm not sure
> what the expected behavior would be on such a stream where the position
> argument is not specified.  For example, what would you propose as the
> new interface to read-char and/or read-byte?
> 
> There would be a conflation between the current position of the stream
> and the arguments to each stream api function, and that would make
> for a confusing api.  Mostly, I am applying the KISS principle.
>

This calls for at least one more posting on this subject :-)
I view a call to read-sequence/char/byte with the :position keyword as an
atomic seek to the given position + read-sequence/char/byte + a seek back
to the position we were at before we started.

The stream position is only affected when the :position keyword is not used. 
To me that is not confusing.  Maybe a more lispish name for :position would
be :with-position to make the temporary quality apparent.

astor

From: Duane Rettig
Subject: Re: pread in simple streams
Date: Mon, 06 Jan 2003 18:00:01 +0000
Message-ID: <47kdijjfq.fsf@beta.franz.com>

Alexander Kjeldaas <··········@fast.no> writes:

> Duane Rettig wrote:
> 
> > Alexander Kjeldaas <··········@fast.no> writes:
> >
> >> How does the community view addition of non-standard keyword arguments to
> >> functions like read-sequence?  As you saw, I added a few in my example,
> >> and I think CL is one of a handful of languages where you can extend an
> >> api like that.
> > 
> > I had no problem specifically with your addition of a :position keyword to
> > read-sequence.  The problem I had was with the semantics of the addition:
> > if a stream has pread characteristics, then _all_ of the stream api
> > functions would have to allow for a position argument, and I'm not sure
> > what the expected behavior would be on such a stream where the position
> > argument is not specified.  For example, what would you propose as the
> > new interface to read-char and/or read-byte?
> > 
> > There would be a conflation between the current position of the stream
> > and the arguments to each stream api function, and that would make
> > for a confusing api.  Mostly, I am applying the KISS principle.
> >
> 
> This calls for at least one more posting on this subject :-)

No problem.  I do enjoy technical discussions, even if protracted
(although with the Holidays, it does seem like yesterday that we were
discussing this :-)

> I view a call to read-sequence/char/byte with the :position keyword as an
> atomic seek to the given position + read-sequence/char/byte + a seek back
> to the position we were at before we started.
> 
> The stream position is only affected when the :position keyword is not used. 
> To me that is not confusing.

Your proposed interface is certainly not confusing at the surface, but
allowing atomic and non-atomic seek/transfer operations on a stream class
complicates the stream implementation and renders any buffering
ineffective, causing large performance degradation.  It also confuses
the semantics of just where the atomicity occurs in the low-level operations.

As an example, suppose we have an output stream X of (unsigned-byte 8)
and its underlying stream handle X' (which might either be a file number
or an encapsulated stream).
Suppose further that X has been recently sync'd with a finish-output
and has position N, and then a series of write-bytes to it places 10
octets into the buffer.  So now the state of the streams is that X'
still has position N, and X has file-position N+10 (a call to
file-position on X might change the position of X', or it might just
note the position of X' and add the 10 octets in the buffer to get the
right answer).

Now, suppose we do a write-sequence on X at position N+2.  In order to
preserve the effect of the previous writes, the contents of the buffer
cannot be discounted.  Two methods could be used to ensure that the file
contents' integrity is not compromised:

 1. The buffer can first be flushed by a call to finish-output, and then
the write can be performed.  Of course, it might be confusing as to why
the position of X' has changed by an operation that is not supposed to
make any position changes, but I can overlook this issue, which really
is an artifact of the synchronization that is necessary due to the
mixed atomic/non-atomic nature of your proposed interface.

 2. The write-sequence can notice that there are octets in the buffer
at the desired position, and can write into the buffer starting at
position 2 of the buffer.

Unfortunately, approach #1 removes the atomicity of the write-sequence
call, because a finish-output call will be doing a write to X' before
the atomic write of the write-sequence is performed.  It also forces
the buffer to be flushed, which is a performance degradation because
the buffering is thwarted.

And approach #2 defeats the whole purpose of the atomic write-sequence.
When it comes time to write the buffer, what method is used to do
the write?  An atomic one?  If so, by what means is the atomicity
of the buffer write flagged?

Finally, there may be a temptation to say "When an atomic write is
performed, it just bypasses the buffer".  Avoid this temptation; it
will lead to bugs.  If in the above example your write-sequence just
does the pwrite, then when the buffer is finally written, octets 2
through 9 will be the wrong values, because they had been written
_before_ the atomic write, and yet they persist after the buffer is
flushed rather than the first 8 octets that had been written atomically,
which are the octets which should have been preserved.
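
This last hazard can be replayed with a toy buffered stream in C. The
sketch is hypothetical throughout: the struct and function names are
invented, N is arbitrarily set to 4, and the "stream" is nothing more
than a byte buffer in front of a file descriptor. It follows the example
above: ten buffered octets at N, a positional write of eight octets at
N+2 that skips the buffer, then a flush.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy write-buffered stream: base is the file offset of buf[0]
   (position N in the example), used the number of pending octets. */
struct out_stream {
  int fd;
  off_t base;
  size_t used;
  unsigned char buf[64];
};

/* Buffered write-byte: nothing reaches the file yet. */
void stream_write_byte(struct out_stream *s, unsigned char octet)
{
  assert(s->used < sizeof s->buf);
  s->buf[s->used++] = octet;
}

/* The tempting shortcut: a positional write that ignores the buffer. */
ssize_t stream_pwrite_bypass(struct out_stream *s, const void *data,
                             size_t len, off_t pos)
{
  return pwrite(s->fd, data, len, pos);
}

/* finish-output: write the pending octets at base and advance. */
int stream_flush(struct out_stream *s)
{
  ssize_t n = pwrite(s->fd, s->buf, s->used, s->base);
  if (n < 0 || (size_t)n != s->used) return -1;
  s->base += n;
  s->used = 0;
  return 0;
}

/* Replays the example with N = 4: ten buffered 'b' octets, an 8-octet
   positional write of 'P' at N+2, then a flush. Copies the ten octets
   at N..N+9 of the resulting file into out. Returns 0 on success. */
int replay_bypass_example(const char *path, unsigned char out[10])
{
  struct out_stream s = { -1, 4, 0, {0} };
  int i, ok;

  s.fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
  if (s.fd < 0) return -1;
  ok = pwrite(s.fd, "....", 4, 0) == 4;          /* file synced up to N */
  for (i = 0; i < 10; i++) stream_write_byte(&s, 'b');
  ok = ok && stream_pwrite_bypass(&s, "PPPPPPPP", 8, s.base + 2) == 8;
  ok = ok && stream_flush(&s) == 0;              /* clobbers the 'P' octets */
  ok = ok && pread(s.fd, out, 10, 4) == 10;
  close(s.fd);
  unlink(path);
  return ok ? 0 : -1;
}
```

After the replay, all ten octets at N..N+9 are 'b': the later positional
write has been silently overwritten by the older buffered data.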

>  Maybe a more lispish name for :position would
> be :with-position to make the temporary quality apparent.

I think you would have some more design work to do in order to make
your approach workable, before deciding between :position and
:with-position.  I personally think that you would have to change a
lot more of the interface than just adding a keyword, whatever the
name.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Alexander S A Kjeldaas
Subject: Re: pread in simple streams
Date: Tue, 07 Jan 2003 16:36:54 +0000
Message-ID: <in7fzs4dihl.fsf@bacchus.pvv.ntnu.no>

Duane Rettig <·····@franz.com> writes:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
> > I view a call to read-sequence/char/byte with the :position keyword as an
> > atomic seek to the given position + read-sequence/char/byte + a seek back
> > to the position we were at before we started.
> > 
> > The stream position is only affected when the :position keyword is not used. 
> > To me that is not confusing.

What I wrote above is wrong!  There is no reason the seek +
read-sequence/char/byte + seek should be atomic.  It should be
isolated i.e. side-effect free.  N (read-sequence :position)
operations should be able to be in flight at the same time without
there being any side-effect issues wrt the file position.  What
happens when there are other write-sequences at the same time would be
undefined in the general case.

In this case there should not be issues with encapsulated streams. If
you are encapsulating a stream that can not do several read-sequences
in parallel then you have to use locking.

Sorry for the confusion.

astor

From: Scott Schwartz
Subject: Re: pread in simple streams
Date: Sat, 21 Dec 2002 07:21:41 +0000
Message-ID: <8gel8bvnui.fsf@galapagos.cse.psu.edu>

Duane Rettig <·····@franz.com> writes:
> After over 25 years of unix hacking, I've never used pread() and didn't
> know it existed.

It's a recent invention.

>  After reading the man pages for it, I see very little
> to be gained in efficiency;

It supports multi-threaded programs.  lseek() followed by read() or
write() isn't atomic, so someone could move the seek pointer in
between.  pread()/pwrite() let you manipulate random access files
without touching that global state.
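
A small C sketch of that difference, for readers who have not used the
calls. The helper names and the scratch-file demonstration are made up
for illustration; only lseek, read, and pread themselves are standard.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Positional read via lseek + read. Moves the descriptor's shared
   offset, so another thread seeking between the two calls breaks it. */
ssize_t read_at_seek(int fd, void *buf, size_t len, off_t pos)
{
  assert(buf != NULL);
  if (lseek(fd, pos, SEEK_SET) < 0) return -1;
  return read(fd, buf, len);
}

/* Positional read via pread: one call, offset left alone. */
ssize_t read_at_pread(int fd, void *buf, size_t len, off_t pos)
{
  assert(buf != NULL);
  return pread(fd, buf, len, pos);
}

/* Current offset of the descriptor, without moving it. */
off_t current_offset(int fd)
{
  return lseek(fd, 0, SEEK_CUR);
}

/* Demonstration: create a 16-byte scratch file at path, do one 4-byte
   read of each kind at offset 8, and report the descriptor offset
   after each. Returns 0 on success, -1 on failure. */
int compare_offsets(const char *path, off_t *after_pread, off_t *after_seek)
{
  char buf[4];
  int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

  if (fd < 0) return -1;
  if (write(fd, "0123456789abcdef", 16) != 16 ||
      lseek(fd, 0, SEEK_SET) < 0) { close(fd); return -1; }

  if (read_at_pread(fd, buf, sizeof buf, 8) != 4) { close(fd); return -1; }
  *after_pread = current_offset(fd);   /* unchanged: still 0 */

  if (read_at_seek(fd, buf, sizeof buf, 8) != 4) { close(fd); return -1; }
  *after_seek = current_offset(fd);    /* moved to 8 + 4 = 12 */

  close(fd);
  unlink(path);
  return 0;
}
```

With the descriptor rewound to 0 beforehand, the offset is still 0 after
the pread and 12 after the lseek/read pair.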

From: Duane Rettig
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 20:00:02 +0000
Message-ID: <4znr48mq3.fsf@beta.franz.com>

Duane Rettig <·····@franz.com> writes:

> Simple-streams does write external data in terms of 8-bit bytes.
       ^^^       ^^^

English. Yech.  Tripping over my native tongue.

"The simple-streams package does ..."  or

"Simple-streams do ..." or

"A simple-stream does ..."

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Marc Battyani
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 20:30:22 +0000
Message-ID: <FCAE57838187F845.6F91A3352B844A66.FADF06D59726B327@lp.airnews.net>

"Thomas F. Burdick" <···@conquest.OCF.Berkeley.EDU> wrote
> "Marc Battyani" <·············@fractalconcept.com> writes:
>
> > In cl-pdf I have to write characters and binary data (for JPEG) to the
> > pdf stream.
> > At least LW and ACL allows for this but I would like to know what
> > implementations does not allow it. So that I can add the some
> > conditional code to handle it if needed.
>
> Are you sure you need this?  Doesn't the PDF spec define PDFs as being
> a stream of octets?  (Not rhetorical, I'm working from memory).  If
> so, it might be a good idea to always write 8-bit bytes.

A PDF file is a sequence of 8-bit bytes, but all the content of this file
is in ASCII except for the content streams, which can also be compressed
ASCII or directly in binary format, like image data.
As every PDF construct is in ASCII, it's easier to deal with a character
file where we can use #'format for almost everything and just consider
binary data as a special case than the other way around. IIRC, in cl-pdf
I have only one line where I have to worry about binary data.

Marc

From: Sam Steingold
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 16:32:43 +0000
Message-ID: <m3lm2oeha1.fsf@loiso.podval.org>

> * In message <··················································@lp.airnews.net>
> * On the subject of "implementations allowing bivalent streams"
> * Sent on Tue, 17 Dec 2002 09:40:27 +0100
> * Honorable "Marc Battyani" <·············@fractalconcept.com> writes:
>
> In cl-pdf I have to write characters and binary data (for JPEG) to the
> pdf stream.  At least LW and ACL allows for this but I would like to
> know what implementations does not allow it. So that I can add the
> some conditional code to handle it if needed.

CLISP allows changing STREAM-ELEMENT-TYPE:
<http://clisp.cons.org/impnotes/stream-dict.html#stream-eltype>
<http://clisp.cons.org/impnotes.html#stream-eltype>

-- 
Sam Steingold (http://www.podval.org/~sds) running RedHat8 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
Let us remember that ours is a nation of lawyers and order.

From: Marc Battyani
Subject: Re: implementations allowing bivalent streams
Date: Tue, 17 Dec 2002 20:28:56 +0000
Message-ID: <7609058E9111A22A.CA4B876C22D914AD.9E78D0782B1FCE33@lp.airnews.net>

"Sam Steingold" <···@gnu.org> wrote
> > * On the subject of "implementations allowing bivalent streams"
> > * Sent on Tue, 17 Dec 2002 09:40:27 +0100
> > * Honorable "Marc Battyani" <·············@fractalconcept.com> writes:
> >
> > In cl-pdf I have to write characters and binary data (for JPEG) to the
> > pdf stream.  At least LW and ACL allows for this but I would like to
> > know what implementations does not allow it. So that I can add the
> > some conditional code to handle it if needed.
>
> CLISP allows changing STREAM-ELEMENT-TYPE:
> <http://clisp.cons.org/impnotes/stream-dict.html#stream-eltype>
> <http://clisp.cons.org/impnotes.html#stream-eltype>

OK so it can be optimized for CLISP.
Volunteers somewhere ?

Marc