From: Peter Seibel
Subject: Modern GC's and destructive operators
Date: 
Message-ID: <m33cir6l8b.fsf@javamonkey.com>
In the Java world the folks writing Java virtual machines have put a
bunch of effort into improving performance on what they called "small
object" garbage collection. I.e. how does the GC perform when lots of
small objects are being created and quickly becoming garbage. (E.g.
generational garbage collectors help with this problem.)

Lisp implementors have obviously been working on GC technology for
quite some time. As I was thinking about destructive operators
(DELETE, NSUBSTITUTE, etc.) it occurred to me that in a system where
the GC were sufficiently good the performance benefit of using
destructive operators would approach zero.

My question is, does anyone know if any current CL implementations
have set out to tune their GC such that there would really be no
reason to use those destructive functions?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Pascal Costanza
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <costanza-B7DB89.21322603062003@news.netcologne.de>
In article <··············@javamonkey.com>,
 Peter Seibel <·····@javamonkey.com> wrote:

> In the Java world the folks writing Java virtual machines have put a
> bunch of effort into improving performance on what they called "small
> object" garbage collection. I.e. how does the GC perform when lots of
> small objects are being created and quickly becoming garbage. (E.g.
> generational garbage collectors helps with this problem.)

This is not generally true - although I know what you mean.

The JVM implementations that do these kinds of things are foremost the 
HotSpot VM by Sun and the JVM by IBM. Then there is the Jikes Research 
VM, also by IBM, which is mainly an experimental platform. Apart from 
that, no implementation that I know of can do these stunts. (I am not so 
sure about the implementation by Microsoft.)

The point here is that they have thrown lots of money and manpower at 
this stuff, much more than anybody else can probably afford.

> Lisp implementors have obviously been working on GC technology for
> quite some time. As I was thinking about destructive operators
> (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> the GC were sufficiently good the performance benefit of using
> destructive operators would approach zero.
> 
> My question is, does anyone know if any current CL implementations
> have set out to tune their GC such that there would really be no
> reason to use those destructive functions?

For the above reason I wouldn't expect that. (But I might be wrong.)


Pascal
From: Michael Parker
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <030620031926084552%michaelparker@earthlink.net>
In article <··············@javamonkey.com>, Peter Seibel
<·····@javamonkey.com> wrote:

> In the Java world the folks writing Java virtual machines have put a
> bunch of effort into improving performance on what they called "small
> object" garbage collection. I.e. how does the GC perform when lots of
> small objects are being created and quickly becoming garbage. (E.g.
> generational garbage collectors helps with this problem.)

Historically, most of the research into garbage collection in general,
and generational collectors in particular, was done for Lisp and
Smalltalk.  Most (all) commercial Lisps and Smalltalks use generational
collectors.  I know Lispworks, ACL, and ParcPlace Smalltalk do.  They
do indeed deal very well with that type of memory usage.
 
> Lisp implementors have obviously been working on GC technology for
> quite some time. As I was thinking about destructive operators
> (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> the GC were sufficiently good the performance benefit of using
> destructive operators would approach zero.

Theoretical performance or actual performance?  Theoretically yes.  In
actual practice, no, no collector really does achieve this on real
problems, for a variety of reasons, cache and VM being two of the big
ones.  If you mutate an object there's a chance it'll be in cache, while
if you're always creating new objects there's little chance that memory
will be in cache.  Andrew Appel had a "proof" years ago (early 90's,
don't have the reference offhand) that what you are asking for was
possible, but this was purely theoretical -- it never happened in the
real world.

> My question is, does anyone know if any current CL implementations
> have set out to tune their GC such that there would really be no
> reason to use those destructive functions?

No.  But they are good enough that there's no real reason to go out of
your way to use the destructive operators except as an optimization
(after you've done some profiling and decided that it's a problem).
From: Kaz Kylheku
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <cf333042.0306032114.96ca186@posting.google.com>
Michael Parker <·············@earthlink.net> wrote in message news:<································@earthlink.net>...
> In article <··············@javamonkey.com>, Peter Seibel
> <·····@javamonkey.com> wrote: 
> > Lisp implementors have obviously been working on GC technology for
> > quite some time. As I was thinking about destructive operators
> > (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> > the GC were sufficiently good the performance benefit of using
> > destructive operators would approach zero.
> 
> Theoretical performance or actual performance?  Theoretically yes.  In
> actual practice, no, no collector really does achieve this on real
> problems, for a variety of reasons, cache and VM being two of the big
> ones.  If you mutate an object there's a chance it'll be cache, while
> if you're always creating new objects there's little chance that memory
> will be in cache.  Andrew Appel had a "proof" years ago (early 90's,
> don't have the reference offhand) that what you are asking for was
> possible, but this was purely theoretical -- it never happened in the
> real world.

Sorry, I don't believe this. Suppose I have a twenty kilobyte object,
and I want to make one almost like it, but differing in one word of
memory. You're telling me that in theory, I can make a brand new
object, copy all twenty kilobytes into  it, and substitute my change,
as cheaply as just modifying that word in place.

That can only be true if memory accesses don't cost anything. If
memory accesses don't take any cycles, then what does? Maybe we can
close our eyes and pretend that no operation takes any cycles.
Addition, multiplication, all zero cost, what the heck. Gate
propagation delay? What is that?
From: Barry Margolin
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <gwnDa.6$GF1.158@paloalto-snr1.gtei.net>
In article <···························@posting.google.com>,
Kaz Kylheku <···@ashi.footprints.net> wrote:
>Sorry, I don't believe this. Suppose I have a twenty kilobyte object,
>and I want to make one almost like it, but differing in one word of
>memory. You're telling me that in theory, I can make a brand new
>object, copy all twenty kilobytes into  it, and substitute my change,
>as cheaply as just modifying that word in place.

Generational GC deals well with creation of lots of small, short-lived
objects.  The larger and longer-lived the objects get, the further from
this theoretical perfection you go.

-- 
Barry Margolin, ··············@level3.com
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <874r3530eb.fsf@darkstar.cartan>
Barry Margolin <··············@level3.com> writes:

> In article <···························@posting.google.com>,
> Kaz Kylheku <···@ashi.footprints.net> wrote:

> > Sorry, I don't believe this. Suppose I have a twenty kilobyte
> > object, and I want to make one almost like it, but differing
> > in one word of memory. You're telling me that in theory, I
> > can make a brand new object, copy all twenty kilobytes into
> > it, and substitute my change, as cheaply as just modifying
> > that word in place.
> 
> Generational GC deals well with creation of lots of small,
> short-lived objects.  The larger and longer-lived the objects
> get, the further from this theoretical perfection you go.

That's true, but the original question was whether great
generational GCs make destructive operations unnecessary.  They
don't, of course.  Just consider the following program:

(defun nwarp (cell)
  (declare ((cons (unsigned-byte 8)) cell)
           (optimize speed (safety 0)
                     #+lispworks (fixnum-safety 0)))
  (setf (car cell) (logand #xFF (+ 13 (* (car cell) 17))))
  cell)

(defun warp (cell)
  (declare ((cons (unsigned-byte 8)) cell)
           (optimize speed (safety 0)
                     #+lispworks (fixnum-safety 0)))
  (cons (logand #xFF (+ 13 (* (car cell) 17)))
        (cdr cell)))

(defun test ()
  (format t "~&Timing WARP:")
  (let ((cell (cons 19 42)))
    (time (loop repeat 8000000 do
            (setq cell (warp cell))))
    (format t "~&Cell is ~A." cell))
  (format t "~&Timing NWARP:")
  (let ((cell (cons 19 42)))
    (time (loop repeat 8000000 do
            (setq cell (nwarp cell))))
    (format t "~&Cell is ~A." cell)))

LispWorks has a really good generational GC.  So good that I
often don't mind generating lots of garbage in loops.  But still,
that doesn't mean that we need not use destructive operations
anymore.  Some timings:

CL-USER 21 > (test)
Timing WARP:
Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (WARP CELL)))

user time    =      1.100
system time  =      0.000
Elapsed time =   0:00:01
Allocation   = 544 bytes standard / 88002167 bytes fixlen
0 Page faults
Cell is (19 . 42).
Timing NWARP:
Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (NWARP CELL)))

user time    =      0.340
system time  =      0.000
Elapsed time =   0:00:00
Allocation   = 672 bytes standard / 2387 bytes fixlen
0 Page faults
Cell is (19 . 42).
NIL



The destructive version is much faster.  Does anybody really
believe that a better GC would make this difference go away?
Give me a break ;-)

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Michael Parker
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <040620031908534581%michaelparker@earthlink.net>
In article <··············@darkstar.cartan>, Nils Goesche
<···@cartan.de> wrote:

> 
> That's true, but the original question was whether great
> generational GCs make destructive operations unnecessary.  They
> don't, of course.  Just consider the following program:
...
> CL-USER 21 > (test)
> Timing WARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (WARP CELL)))
> 
> user time    =      1.100
...
> Timing NWARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (NWARP CELL)))
> 
> user time    =      0.340
> The destructive version is much faster.  Does anybody really
> believe that a better GC would make this difference go away?
> Give me a break ;-)

I don't know how advanced the Lispworks GC is -- from using it I'd say
it's pretty good, but I don't really know.  In general I'd say that
Lisp GC's aren't quite as advanced as Haskell/ML/Self GC's, simply
because the availability and widespread utility of destructive
operations means that Lisp GC's don't *have* to be as advanced.

But back to your question, with all due respect, yes, it is possible that a
"better" GC would theoretically make this difference go away.  The
emphasis is on *theoretically*, which was the point of my original
post.  In *practice*, cache effects and VM effects and register effects
mean that destructive operators are usually faster.  There are GC's out
there for which the example code would have resulted in the copying
version being faster than mutation.  You may be forgetting about
write-barrier overhead and other related bookkeeping operations needed
to make generational gc work.  Depending on the GC system, that simple
setf could easily mean 5-20 extra instructions each time.
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87vfvl1hf0.fsf@darkstar.cartan>
Michael Parker <·············@earthlink.net> writes:

> In article <··············@darkstar.cartan>, Nils Goesche
> <···@cartan.de> wrote:
> 
> > That's true, but the original question was whether great
> > generational GCs make destructive operations unnecessary.  They
> > don't, of course.  Just consider the following program:
> ...
> > CL-USER 21 > (test)
> > Timing WARP:
> > Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (WARP CELL)))
> > 
> > user time    =      1.100
> ...
> > Timing NWARP:
> > Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (NWARP CELL)))
> > 
> > user time    =      0.340
> > The destructive version is much faster.  Does anybody really
> > believe that a better GC would make this difference go away?
> > Give me a break ;-)
> 
> I don't know how advanced the Lispworks GC is -- from using it
> I'd say it's pretty good, but I don't really know.  In general
> I'd say that Lisp GC's aren't quite as advanced as
> Haskell/ML/Self GC's, simply because the availability and
> widespread utility of destructive operations means that Lisp
> GC's don't *have* to be as advanced.

Sorry.  The point of the example was not really that the
destructive version is faster with a certain ``good'' GC.  That
alone wouldn't prove that it would be faster even with an ideal
GC, either, of course.  I just wanted to encourage people to
think about whether this particular simple example could
/possibly/ have a different outcome with an ideal GC.

> But back to your question, with all regards, yes, it is
> possible that a "better" GC would theoretically make this
> difference go away.

I cannot believe this, so far.

> The emphasis is on *theoretically*, which was the point of my
> original post.  In *practice*, cache effects and VM effects and
> register effects mean that destructive operators are usually
> faster.

Even without those, I do not believe (yet) that it is possible
that, with /any/ GC, NWARP could be as fast as WARP.

> There are GC's out there for which the example code would have
> resulted in the copying version being faster than mutation.

It would really baffle me if that was true.

> You may be forgetting about write-barrier overhead and other
> related bookkeeping operations needed to make generational gc
> work.  Depending on the GC system, that simple setf could
> easily mean 5-20 extra instructions each time.

Well, I am not really forgetting about it because I don't even
know what you mean by ``write-barrier''.  Sure, (SETF CAR)
writes, but how could CONS create a copy without writing, too?

Here is what NWARP compiles to in my version of LispWorks:

       0:      8B78FF           move  edi, [eax-1]
       3:      6BFF11           imul  edi, edi, 11
       6:      81C7000D0000     add   edi, D00
      12:      81E700FF0000     and   edi, FF00
      18:      8978FF           move  [eax-1], edi
      21:      FD               std
      22:      C3               ret

Here is what WARP compiles to:

       0:      8B78FF           move  edi, [eax-1]
       3:      6BFF11           imul  edi, edi, 11
       6:      81C7000D0000     add   edi, D00
      12:      81E700FF0000     and   edi, FF00
      18:      8B4003           move  eax, [eax+3]
      21:      FF3424           push  [esp]
      24:      897C2404         move  [esp+4], edi
      28:      E95F7D4600       jmp   20B4DBBA

The `jmp' will go to CONS.  This is unavoidable and, obviously,
totally independent of the GC implementation.  It is friggin'
obvious (I think) that this is slower.  How can a better GC
improve on this?

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Michael Parker
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <040620032311129322%michaelparker@earthlink.net>
> > I don't know how advanced the Lispworks GC is -- from using it
> > I'd say it's pretty good, but I don't really know.  In general
> > I'd say that Lisp GC's aren't quite as advanced as
> > Haskell/ML/Self GC's, simply because the availability and
> > widespread utility of destructive operations means that Lisp
> > GC's don't *have* to be as advanced.
> 
> Sorry.  The point of the example was not really that the
> destructive version is faster with a certain ``good'' GC.  That
> alone wouldn't prove that it would be faster even with an ideal
> GC, either, of course.  I just wanted to encourage people to
> think about whether this particular simple example could
> /possibly/ have a different outcome with an ideal GC.
> 
> > But back to your question, with all regards, yes, it is
> > possible that a "better" GC would theoretically make this
> > difference go away.
> 
> I cannot believe this, so far.
> 
> > The emphasis is on *theoretically*, which was the point of my
> > original post.  In *practice*, cache effects and VM effects and
> > register effects mean that destructive operators are usually
> > faster.
> 
> Even without those, I do not believe (yet) that it is possible
> that, with /any/ GC, NWARP could be as fast as WARP.
> 
> > There are GC's out there for which the example code would have
> > resulted in the copying version being faster than mutation.
> 
> It would really baffle me if that was true.

See Clinger's post.  It does happen; there are implementations that
are that fast.  Part of it is the write-barrier (or equivalent)
overhead that many of the more advanced GC methods need.


> > You may be forgetting about write-barrier overhead and other
> > related bookkeeping operations needed to make generational gc
> > work.  Depending on the GC system, that simple setf could
> > easily mean 5-20 extra instructions each time.
> 
> Well, I am not really forgetting about it because I don't even
> know what you mean by ``write-barrier''.  Sure, (SETF CAR)
> writes, but how could CONS create a copy without writing, too?

Initializing updates (like the ones CONS does) are special, and don't
involve write barriers.


> Here is what NWARP compiles to in my version of LispWorks:
> 
>        0:      8B78FF           move  edi, [eax-1]
>        3:      6BFF11           imul  edi, edi, 11
>        6:      81C7000D0000     add   edi, D00
>       12:      81E700FF0000     and   edi, FF00
>       18:      8978FF           move  [eax-1], edi
>       21:      FD               std
>       22:      C3               ret
> 
> Here is what WARP compiles to:
> 
>        0:      8B78FF           move  edi, [eax-1]
>        3:      6BFF11           imul  edi, edi, 11
>        6:      81C7000D0000     add   edi, D00
>       12:      81E700FF0000     and   edi, FF00
>       18:      8B4003           move  eax, [eax+3]
>       21:      FF3424           push  [esp]
>       24:      897C2404         move  [esp+4], edi
>       28:      E95F7D4600       jmp   20B4DBBA
> 
> The `jmp' will go to CONS.  This is unavoidable and, obviously,
> totally independant of the GC implementation.  It is friggin'
> obvious (I think) that this is slower.  How can a better GC
> improve on this?

It's not obvious at all.  GC systems are extremely varied and you'd be
amazed at the levels of optimization and tuning that go into a really
good one. 

A GC implementation that requires a write barrier will impose an
additional overhead on the nwarp implementation of 5-20 instructions,
several of which are branches; and a good copying generational
collector can result in CONS being inlined into about 5 instructions --
seriously.  In such a system the allocation is a test-and-branch
(almost never taken), increment, mov to the car, and mov to the cdr.
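
To make that concrete, here's a toy bump allocator written as plain
Lisp over an array -- just a sketch of why an inlined CONS can be that
cheap, not how any real implementation does it; *REGION*,
*FREE-POINTER* and *REGION-END* are made-up names:

(defvar *region* (make-array 4096))
(defvar *free-pointer* 0)
(defvar *region-end* 4096)

(defun toy-cons (car-value cdr-value)
  ;; test-and-branch, almost never taken
  (when (> (+ *free-pointer* 2) *region-end*)
    (error "region full -- a real system would GC or open a new region"))
  (let ((cell *free-pointer*))
    (incf *free-pointer* 2)                     ; increment
    (setf (aref *region* cell) car-value        ; mov to the car
          (aref *region* (1+ cell)) cdr-value)  ; mov to the cdr
    cell))                                      ; "pointer" = array index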

The fact that Lispworks has to call a function for CONS implies to me
that it doesn't use a copying collector (probably to make interfacing
to C easier?).  At any rate, it doesn't seem to use as advanced a
collector as SML/NJ or Self (or ACL AFAICT).  If indeed it made this
tradeoff to make it easier to hook to C, it's probably a worthwhile
tradeoff.

I seem to remember a paper by Appel or one of his students describing a
particular ML program that on average allocated a word every 20
instructions, that nonetheless had only a 20% total memory management
overhead -- that's allocation and GC combined.
From: Brian Downing
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <2dHDa.1121367$F1.134993@sccrnsc04>
In article <································@earthlink.net>,
Michael Parker  <·············@earthlink.net> wrote:
> It's not obvious at all.  GC systems are extremely varied and you'd be
> amazed at the levels of optimization and tuning that goes into a really
> good one. 
> 
> A GC implementation that requires a write barrier will impose an
> additional overhead on the nwarp implementation of 5-20 instructions,
> several of which are branches; and a good copying generational
> collector can result in CONS being inlined into about 5 instructions --
> seriously.  In such a system the allocation is a test-and-branch
> (almost never taken), increment, mov to the car, and mov to the cdr.

Perhaps I'm wrong, but I've always heard that write barriers are
typically implemented with the CPU's paging unit.  The page is set
to read-only, and you take a trap on the first write, when it is
changed to read-write.  This way you'll only take a trap if you are
writing to a page that hasn't been written to yet.

Here is the code from CMUCL (18e) - I know that CMUCL uses a generational
collector (:GENCGC is on *FEATURES*), and that requires write
barriers.  There are no branches in NWARP:

4802F6B0:     .ENTRY NWARP()               ; FUNCTION
      C8:     POP   DWORD PTR [EBP-8]
      CB:     LEA   ESP, [EBP-32]
      CE:     MOV   EAX, [EDX-3]           ; No-arg-parsing entry point
      D1:     MOV   ECX, EAX
      D3:     SHL   ECX, 4
      D6:     ADD   EAX, ECX
      D8:     ADD   EAX, 52
      DB:     AND   EAX, 1020
      E0:     MOV   [EDX-3], EAX
      E3:     MOV   ECX, [EBP-8]
      E6:     MOV   EAX, [EBP-4]
      E9:     ADD   ECX, 2
      EC:     MOV   ESP, EBP
      EE:     MOV   EBP, EAX
      F0:     JMP   ECX

Compare with WARP, which clearly has an inline allocation (the only
call is to alloc_overflow_ecx, which sounds exceptional).  It looks
like if you ignore the atomicity stuff, the allocator is only about
6 or 7 instructions, like you said; from 1D1 to 1EF:

48045190:     .ENTRY WARP()                ; FUNCTION
     1A8:     POP   DWORD PTR [EBP-8]
     1AB:     LEA   ESP, [EBP-32]
     1AE:     MOV   EAX, [EDX-3]           ; No-arg-parsing entry point
     1B1:     MOV   ECX, EAX
     1B3:     SHL   ECX, 4
     1B6:     ADD   EAX, ECX
     1B8:     ADD   EAX, 52
     1BB:     AND   EAX, 1020
     1C0:     MOV   EDX, [EDX+1]
     1C3:     MOV   BYTE PTR [#x280001D4], 0 ; CL::*PSEUDO-ATOMIC-INTERRUPTED*
     1CA:     MOV   BYTE PTR [#x280001BC], 4 ; CL::*PSEUDO-ATOMIC-ATOMIC*
     1D1:     MOV   ECX, 8
     1D6:     ADD   ECX, [#x28000504] ; X86::*CURRENT-REGION-FREE-POINTER*
     1DC:     CMP   ECX, [#x2800051C] ; X86::*CURRENT-REGION-END-ADDR*
     1E2:     JBE   L0
     1E4:     CALL  #xB0000038        ; #xB0000038: alloc_overflow_ecx
     1E9: L0: XCHG  ECX, [#x28000504] ; X86::*CURRENT-REGION-FREE-POINTER*
     1EF:     LEA   ECX, [ECX+3]
     1F2:     MOV   BYTE PTR [#x280001BC], 0 ; CL::*PSEUDO-ATOMIC-ATOMIC*
     1F9:     CMP   BYTE PTR [#x280001D4], 0 ; CL::*PSEUDO-ATOMIC-INTERRUPTED*
     200:     JEQ   L1
     202:     BREAK 9                      ; Pending interrupt trap
     204: L1: MOV   [ECX-3], EAX
     207:     MOV   [ECX+1], EDX
     20A:     MOV   EDX, ECX
     20C:     MOV   ECX, [EBP-8]
     20F:     MOV   EAX, [EBP-4]
     212:     ADD   ECX, 2
     215:     MOV   ESP, EBP
     217:     MOV   EBP, EAX
     219:     JMP   ECX

-bcd
--
(format nil "~:(~{~A ~}~)~(<····@·@{~A~#[~:;.~]~}~}>~)"
        '(brian downing) '(bdowning lavos net))
From: Paul Dietz
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <3EDF5718.769080C8@motorola.com>
Brian Downing wrote:
 
> Perhaps I'm wrong, but I've always heard that write barriers are
> typically implemented with the CPU's paging unit.  The page is set
> to read-only, and you take a trap on the first write, when it is
> changed to read-write.  This way you'll only take a trap if you are
> writing to a page that hasn't been written to yet.

They can be implemented this way, but this approach has problems,
particularly if page sizes are large.  Alternate user-space approaches
(such as 'card marking') are also used; these enable more precise
localization of modifications to old spaces.
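
A little sketch of card marking, purely to illustrate the idea (the
names, the card size and the table size are all invented for the
example):

(defconstant +card-size+ 512)   ; bytes of old space covered per card
(defvar *card-table*
  (make-array 1024 :element-type 'bit :initial-element 0))

(defun note-old-space-write (offset)
  ;; Done (conceptually) on every store into old space: mark the card
  ;; dirty so the next scavenge only rescans marked cards for
  ;; old->young pointers.
  (setf (bit *card-table* (floor offset +card-size+)) 1))

(defun dirty-cards ()
  (loop for i below (length *card-table*)
        when (= 1 (bit *card-table* i)) collect i))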

	Paul
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <xXMDa.844221$OV.797959@rwcrnsc54>
"Brian Downing" <···········@lavos.net> wrote in message ····························@sccrnsc04...
> In article <································@earthlink.net>,
> Michael Parker  <·············@earthlink.net> wrote:
> > It's not obvious at all.  GC systems are extremely varied and you'd be
> > amazed at the levels of optimization and tuning that goes into a really
> > good one.
> >
> > A GC implementation that requires a write barrier will impose an
> > additional overhead on the nwarp implementation of 5-20 instructions,
> > several of which are branches; and a good copying generational
> > collector can result in CONS being inlined into about 5 instructions --
> > seriously.  In such a system the allocation is a test-and-branch
> > (almost never taken), increment, mov to the car, and mov to the cdr.
>
> Perhaps I'm wrong, but I've always heard that write barriers are
> typically implemented with the CPU's paging unit.  The page is set
> to read-only, and you take a trap on the first write, when it is
> changed to read-write.  This way you'll only take a trap if you are
> writing to a page that hasn't been written to yet.

That has been tried, but it turns out on most CPU's and OS's, the
overhead of taking a page fault is *far* higher than taking an
inline test and branch on each write.

On WARP and NWARP, the compiler has noticed that a fixnum is being
written to the location; consequently, the write barrier test need
not be performed.
From: Brian Downing
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <8INDa.877391$Zo.198436@sccrnsc03>
In article <······················@rwcrnsc54>,
Joe Marshall <·············@attbi.com> wrote:
> That has been tried, but it turns out on most CPU's and OS's, the
> overhead of taking a page fault is *far* higher than taking an
> inline test and branch on each write.
> 
> On WARP and NWARP, the compiler has noticed that a fixnum is being
> written to the location, consequently the write barrier test need
> not be performed.

CMUCL definitely seems to take the page-fault approach.  You can
look through the GENCGC code and see all kinds of write-protecting
going on.  The performance of it has never seemed to be horrible
(especially since it is conservative as well, due to the x86's tiny
register file).

The disassembly for:

(defun nwarp-2 (cell value)
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (setf (car cell) value)  ; Can make older gens point to newer ones!
  cell)

is:

485CB818:       .ENTRY NWARP-2()             ; FUNCTION
      30:       POP   DWORD PTR [EBP-8]
      33:       LEA   ESP, [EBP-32]
      36:       MOV   [EDX-3], EDI           ; No-arg-parsing entry point
      39:       MOV   ECX, [EBP-8]
      3C:       MOV   EAX, [EBP-4]
      3F:       ADD   ECX, 2
      42:       MOV   ESP, EBP
      44:       MOV   EBP, EAX
      46:       JMP   ECX

Definitely no branching going on there.

Also, if you assume that most of your mutations are happening in
gen 0, then you are not taking any traps at all.

Inline tests would seem to win when:

1. Most mutations are on many old gen objects spanning many different 
   pages.
2. Your OS's trap handling is horribly slow.
3. You want more portable code.  Page protection is very processor
   and OS specific.
4. You want to run on hardware without a paging unit.  :-)

Write protected pages would seem to win when:

1. Most mutations are on gen 0 objects, or on old gen objects spanning
   few pages (e.g. a single object mutated 10000 times between GC's
   only takes one trap, as opposed to 10000 tests and branches
   with inline tests.)  Gen 0 mutations obviously take no trap
   even the first time.
2. Your OS's trap handling is of a reasonable speed.

Which of the above is better is a good question, and probably depends
on the situation at hand.
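
To make the inline-test alternative concrete, here's a sketch in Lisp
(OLD-P and *REMEMBERED-SET* are invented names; a real barrier would
be emitted by the compiler around every store into a mutable slot):

(defvar *remembered-set* '())

(defun barrier-set-car (cell value old-p)
  ;; The per-write test and branch that the trap-based scheme avoids:
  ;; only stores into old objects get recorded for the collector.
  (when (funcall old-p cell)
    (push cell *remembered-set*))
  (setf (car cell) value)
  cell)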

-bcd
--
*** Brian Downing <bdowning at lavos dot net> 
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <r5ODa.54500$DV.75796@rwcrnsc52.ops.asp.att.net>
"Brian Downing" <···········@lavos.net> wrote in message ···························@sccrnsc03...
>
> Inline tests would seem to win when:
>
> 1. Most mutations are on many old gen objects spanning many different
>    pages.
> 2. Your OS's trap handling is horribly slow.
> 3. You want more portable code.  Page protection is very processor
>    and OS specific.
> 4. You want to run on hardware without a paging unit.  :-)
>
> Write protected pages would seem to win when:
>
> 1. Most mutations are on gen 0 objects, or on old gen objects spanning
>    few pages (e.g. a single object mutated 10000 times between GC's
>    only takes one trap, as opposed to 10000 tests and branches
>    with inline tests.)  Gen 0 mutations obviously take no trap
>    even the first time.
> 2. Your OS's trap handling is of a reasonable speed.
>
> Which of the above is better is a good question, and probably depends
> on the situation at hand.

OS trap handling is typically in the thousands of instructions
(which I count as horribly slow).
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <lyvfvk7lea.fsf@cartan.de>
Michael Parker <·············@earthlink.net> writes:

> > Here is what NWARP compiles to in my version of LispWorks:
> > 
> >        0:      8B78FF           move  edi, [eax-1]
> >        3:      6BFF11           imul  edi, edi, 11
> >        6:      81C7000D0000     add   edi, D00
> >       12:      81E700FF0000     and   edi, FF00
> >       18:      8978FF           move  [eax-1], edi
> >       21:      FD               std
> >       22:      C3               ret
> > 
> > Here is what WARP compiles to:
> > 
> >        0:      8B78FF           move  edi, [eax-1]
> >        3:      6BFF11           imul  edi, edi, 11
> >        6:      81C7000D0000     add   edi, D00
> >       12:      81E700FF0000     and   edi, FF00
> >       18:      8B4003           move  eax, [eax+3]
> >       21:      FF3424           push  [esp]
> >       24:      897C2404         move  [esp+4], edi
> >       28:      E95F7D4600       jmp   20B4DBBA
> > 
> > The `jmp' will go to CONS.  This is unavoidable and, obviously,
> > totally independant of the GC implementation.  It is friggin'
> > obvious (I think) that this is slower.  How can a better GC
> > improve on this?
> 
> It's not obvious at all.  GC systems are extremely varied and you'd
> be amazed at the levels of optimization and tuning that goes into a
> really good one.
> 
> A GC implementation that requires a write barrier will impose an
> additional overhead on the nwarp implementation of 5-20
> instructions, several of which are branches; and a good copying
> generational collector can result in CONS being inlined into about 5
> instructions -- seriously.

I don't doubt that (I don't know why LispWorks doesn't inline CONS,
either.  Possibly to reduce code size?  Code has to be cached, too,
after all :-).  However, this sounds a bit like saying ``I can make an
implementation where (SETF CAR) is so slow that WARP will be faster
than NWARP in that implementation'' :-)

> In such a system the allocation is a test-and-branch (almost never
> taken), increment, mov to the car, and mov to the cdr.
> 
> The fact that Lispworks has to call a function for CONS implies to
> me that it doesn't use a copying collector (probably to make
> interfacing to C easier?).

Well, I don't know how they scan generations, but I do know that its
GC normally moves objects around.  You can explicitly request objects
to be allocated in the ``static area'', though, where they will still
be GC'd, but not moved (so you can pass an object's address to C and
find it later at the same place).

> At any rate, it doesn't seem to use as advanced a collector as
> SML/NJ or Self (or ACL AFAICT).  If indeed it made this tradeoff to
> make it easier to hook to C, it's probably a worthwhile tradeoff.

Somehow I doubt that.  I have always been very impressed by the
performance of LW's GC in practice.

Regards,
-- 
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Barry Margolin
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <DNIDa.9$9Q2.107@paloalto-snr1.gtei.net>
In article <··············@cartan.de>, Nils Goesche  <······@cartan.de> wrote:
>Well, I don't know how they scan generations, but I do know that its
>GC normally moves objects around.

An intuitive assumption is that a copying GC will be slower than a
mark/sweep GC, because of all the data being moved around.  However,
copying GC's also compact memory and tend to bring related objects close
together, so they improve locality.  If any paging is going on, this can
result in a significant performance boost after the GC is done.

The paradoxical result is that slower intermediate operations may result in
an overall speedup.  Similar results may be obtained by using a well-tuned
GC and non-destructive operations.

-- 
Barry Margolin, ··············@level3.com
Level(3), Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <lyof1c7d28.fsf@cartan.de>
Barry Margolin <··············@level3.com> writes:

> In article <··············@cartan.de>, Nils Goesche  <······@cartan.de> wrote:

> The paradoxical result is that slower intermediate operations may
> result in an overall speedup.  Similar results may be obtained by
> using a well-tuned GC and non-destructive operations.

Ok, I admit that that's theoretically possible.  If something like
that were ever successfully implemented, it should be possible to
make Haskell as fast as, or even faster than, say, ML.  But I am not
holding my breath :-)

Regards,
-- 
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <lyk7c07czv.fsf@cartan.de>
I <······@cartan.de> wrote:

> Barry Margolin <··············@level3.com> writes:
> 
> > In article <··············@cartan.de>, Nils Goesche  <······@cartan.de> wrote:
[nothing]

Sorry for the garbled quoting.

Regards,
-- 
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Tim Bradshaw
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <ey3llwgx263.fsf@cley.com>
* Nils Goesche wrote:

> I cannot believe this, so far.

Well, an ideal GC/runtime can spot nondestructive operations and turn
them into destructive ones.

--tim
From: Peter Seibel
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <m37k815pfw.fsf@javamonkey.com>
Nils Goesche <···@cartan.de> writes:

> Barry Margolin <··············@level3.com> writes:
> 
> > In article <···························@posting.google.com>,
> > Kaz Kylheku <···@ashi.footprints.net> wrote:
> 
> > > Sorry, I don't believe this. Suppose I have a twenty kilobyte
> > > object, and I want to make one almost like it, but differing
> > > in one word of memory. You're telling me that in theory, I
> > > can make a brand new object, copy all twenty kilobytes into
> > > it, and substitute my change, as cheaply as just modifying
> > > that word in place.
> > 
> > Generational GC deals well with creation of lots of small,
> > short-lived objects.  The larger and longer-lived the objects
> > get, the further from this theoretical perfection you go.
> 
> That's true, but the original question was whether great
> generational GCs make destructive operations unnecessary. They
> don't, of course. Just consider the following program:

Well I listed generational GC as *one* technique that has been used to
make allocating small, short-lived objects cheaper. Probably I
shouldn't even have limited my question just to GC when I was really
interested in implementation techniques that obviate the need for the
semantically redundant destructive operators such as NREVERSE, DELETE,
et al.

For instance, maybe the way to make non-destructive functions as fast
as destructive functions (assuming one cares to) is to figure out how
to let the compiler figure out when a non-destructive call can be
safely replaced with a destructive version.
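
(Something like a compiler macro could already do a crude version of
this today.  This is only a sketch -- MY-REVERSE and the short list of
"definitely fresh" constructors are made up, and a real compiler would
need proper flow analysis:)

(defun my-reverse (list)
  (reverse list))

(define-compiler-macro my-reverse (&whole form arg)
  ;; If the argument form itself allocates a fresh list, nothing else
  ;; can hold a reference to its spine, so reversing it in place is
  ;; safe.  APPEND is deliberately absent: its result can share
  ;; structure with its last argument.
  (if (and (consp arg)
           (member (first arg) '(list copy-list mapcar)))
      `(nreverse ,arg)
      form))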

But there are other possible ways to enlist the GC than just
generational garbage collection. For instance (and this is very
handwavy, I know) if your GC used a one-bit reference counter
supplemented with standard mark-and-sweep to reclaim cycles, then you
might imagine that at runtime you could use those reference counts to
detect that the cells in a list you were REVERSING were not referenced
from elsewhere and therefore could safely be immediately reused, a la
NREVERSE. (Well, you'd need some help from the compiler to let you
know that the variable that references the first cons cell in the list
is about to be reassigned.) Thus the programmer could write REVERSE and
still get the efficiency of NREVERSE in the cases where it would have
been safe to use NREVERSE explicitly. Now, whether adding that kind of
complexity--even assuming it is possible--to the GC system is a good
idea, I don't know. Just thinking out loud.

> (defun nwarp (cell)
>   (declare ((cons (unsigned-byte 8)) cell)
>            (optimize speed (safety 0)
>                      #+lispworks (fixnum-safety 0)))
>   (setf (car cell) (logand #xFF (+ 13 (* (car cell) 17))))
>   cell)
> 
> (defun warp (cell)
>   (declare ((cons (unsigned-byte 8)) cell)
>            (optimize speed (safety 0)
>                      #+lispworks (fixnum-safety 0)))
>   (cons (logand #xFF (+ 13 (* (car cell) 17)))
>         (cdr cell)))
> 
> (defun test ()
>   (format t "~&Timing WARP:")
>   (let ((cell (cons 19 42)))
>     (time (loop repeat 8000000 do
>             (setq cell (warp cell))))
>     (format t "~&Cell is ~A." cell))
>   (format t "~&Timing NWARP:")
>   (let ((cell (cons 19 42)))
>     (time (loop repeat 8000000 do
>             (setq cell (nwarp cell))))
>     (format t "~&Cell is ~A." cell)))
> 
> LispWorks has a really good generational GC.  So good that I
> often don't mind generating lots of garbage in loops.  But still,
> that doesn't mean that we need not use destructive operations
> anymore.  Some timings:
> 
> CL-USER 21 > (test)
> Timing WARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (WARP CELL)))
> 
> user time    =      1.100
> system time  =      0.000
> Elapsed time =   0:00:01
> Allocation   = 544 bytes standard / 88002167 bytes fixlen
> 0 Page faults
> Cell is (19 . 42).
> Timing NWARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (NWARP CELL)))
> 
> user time    =      0.340
> system time  =      0.000
> Elapsed time =   0:00:00
> Allocation   = 672 bytes standard / 2387 bytes fixlen
> 0 Page faults
> Cell is (19 . 42).
> NIL
> 
> 
> 
> The destructive version is much faster.  Does anybody really
> believe that a better GC would make this difference go away?
> Give me a break ;-)

Well, probably not just arbitrary motherhood-and-apple-pie
improvements to GC generally. But it does seem that if one did want to
make WARP faster than NWARP, GC would play a role. Especially because
the cost of CONSing is presumably tied to your GC system.

Clearly NWARP is going to be faster as long as one call to (SETF CAR)
is faster than a call to CONS and a call to CDR plus the cost of
GC'ing the short-lived cons cells. Assuming the best possible GC could
somehow reduce the (amortized) cost of GC'ing the extra cons cells to
effectively zero, the question boils down to whether there's any
conceivable implementation where (SETF CAR) could be more expensive
than the call to CONS plus the call to CDR.

For the sake of further argument, let's say that the cost of the CAR
part of (SETF CAR) and the CDR are basically negligible. So now the
question is, is it possible that the machinery of the CONS could be
made faster than that of the SETF. I certainly don't know that it can
be and there's probably a lot more ways to implement things such that
it's not. But imagine that the GC/allocation system was tuned for
creating millions of almost instantly dead cons cells and had a
generation that lived entirely in the cache. All the cells that are
allocated by the first 7,999,999 iterations of WARP could live and die
in the cache without ever being flushed to main memory. Given that the
data touched by NWARP would also presumably stay in the cache during
the loop, the relative performance would boil down to raw instruction
counts and all the deep voodoo that happens in modern CPUs with
pipelining, etc. (Which I don't pretend to understand.)

I'm not saying that it's necessarily possible and I certainly don't
know enough about the relative tradeoffs that would have to be made
within an implementation to make it possible.

I guess what I was really getting at with my original question was whether
any implementors (or academics) had ever played around with trying to
make destructive operators that don't exist for any semantic reason
unnecessary.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87r8691fnl.fsf@darkstar.cartan>
Peter Seibel <·····@javamonkey.com> writes:

> Nils Goesche <···@cartan.de> writes:
> 
> > That's true, but the original question was whether great
> > generational GCs make destructive operations
> > unnecessary. They don't, of course. Just consider the
> > following program:
> 
> Well I listed generational GC as *one* technique that has been
> used to make allocating small, short-lived objects
> cheaper. Probably I shouldn't even have limited my question
> just to GC when I was really interested in implementation
> techniques that obviate the need for the semantically redundant
> destructive operators such as NREVERSE, DELETE, et al.

Fine with me.  I don't think any other technique would make WARP
as fast as NWARP, either :-)  And I am /very/ surprised that
there are people who think it would.

And these operators are /not/ ``semantically redundant''.
NREVERSE has /different/ semantics than REVERSE!  That some
functional programming adepts think they can do without them does
not make them redundant!

> For instance, maybe the way to make non-destructive functions
> as fast as destructive functions (assuming one cares to) is to
> figure out how to let the compiler figure out when a
> non-destructive call can be safely replaced with a destructive
> version.

Well, that's an entirely different question.  Sure, if the
compiler can figure out that it can replace REVERSE by NREVERSE,
and I guess some can do that already in some cases, there will be
no difference anymore, but that is trivial.  Instead, the
question is: Will the compiler /always/ be able to decide that?
The answer is ``no'', of course, especially in a dynamic language
like Lisp.  But even in a purely static language this would not
be possible, I think.  This just screams ``Halting Problem!'' :-)
At least the Haskell compiler writers apparently still have a long
way to go towards that goal (which I think is unsolvable, anyway ;-)

> > The destructive version is much faster.  Does anybody really
> > believe that a better GC would make this difference go away?
> > Give me a break ;-)
> 
> Well, probably not just arbitrary motherhood-and-apple-pie
> improvements to GC generally. But it does seem that if one did
> want to make WARP faster than NWARP, GC would play a
> role. Especially because the cost of CONSing is presumably tied
> to your GC system.

Any sentence beginning with ``If I was Napoleon, ...'' is true.
You cannot make WARP as fast as NWARP, and not even God's personal
GC will help you there, I think.

> Clearly NWARP is going to be faster as long as one call to
> (SETF CAR) is faster than a call to CONS and a call to CDR plus
> the cost of GC'ing the short-lived cons cells.

Exactly.  But how could it possibly be otherwise?

> Assuming the best possible GC could somehow reduce the
> (amortized) cost of GC'ing the extra cons cells to effectively
> zero, the question boils down to whether there's any
> conceivable implementation where (SETF CAR) could be more
> expensive than the call to CONS plus the call to CDR.

That's it.  Exactly.  I am saying there isn't, /even if/ you make
the bogus assumption that you could GC with /zero/(!) cost.

> For the sake of further argument, let's say that the cost of
> the CAR part of (SETF CAR) and the CDR are basically
> negligible. So now the question is, is it possible that the
> machinery of the CONS could be made faster than that of the
> SETF. I certainly don't know that it can be and there's
> probably a lot more ways to implement things such that it's
> not.

This probably means that we agree :-)

> But imagine that the GC/allocation system was tuned for
> creating millions of almost instantly dead cons cells and had a
> generation that lived entirely in the cache. All the cells that
> are allocated by the first 7,999,999 iterations of WARP could
> live and die on the cache without ever being flushed to main
> memory. Given that the data touched by NWARP would also
> presumably stay in the cache during the loop, the relative
> performance would boils down to raw instruction counts and all
> the deep vodoo that happens in modern CPUs with pipelining,
> etc. (Which I don't pretend understand.)

Yes, but NWARP simply doesn't have to do all that work.  WARP has
to do the /same/ work, and then some!  That's why it can /never/
be as fast.

> I guess what I was really getting by my original question was
> whether any implementors (or academics) had every played around
> with trying to make destructive operators that don't exist for
> any semantic reason unnecessary.

I cannot parse this, but maybe you should think about Haskell for
a moment.

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Peter Seibel
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <m3llwh47l2.fsf@javamonkey.com>
Nils Goesche <···@cartan.de> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Nils Goesche <···@cartan.de> writes:
> > 
> > > That's true, but the original question was whether great
> > > generational GCs make destructive operations
> > > unnecessary. They don't, of course. Just consider the
> > > following program:
> > 
> > Well I listed generational GC as *one* technique that has been
> > used to make allocating small, short-lived objects
> > cheaper. Probably I shouldn't even have limited my question
> > just to GC when I was really interested in implementation
> > techniques that obviate the need for the semantically redundant
> > destructive operators such as NREVERSE, DELETE, et al.
> 
> Fine with me.  I don't think any other technique would make WARP
> as fast as NWARP, either :-)  And I am /very/ surprised that
> there are people who think it would.
> 
> And these operators are /not/ ``semantically redundant''.
> NREVERSE has /different/ semantics than REVERSE!  That some
> functional programming adepts think they can do without them does
> not make them redundant!

What are the different semantics of NREVERSE that I can rely on? Or,
put another way, suppose I've just finished my new FooLisp Common Lisp
implementation which happens to implement NREVERSE as:

  (defun nreverse (sequence) (reverse sequence))

Is FooLisp not a compliant implementation of Common Lisp (assuming I
got everything else right)? If not, can you show me a program that
demonstrates its non-compliance.

> > Clearly NWARP is going to be faster as long as one call to (SETF
> > CAR) is faster than a call to CONS and a call to CDR plus the cost
> > of GC'ing the short-lived cons cells.
> 
> Exactly.  But how could it possibly be otherwise?

It couldn't. I was just laying the ground work to make sure we agreed
on the basics. Which we appear to. ;-)

> > Assuming the best possible GC could somehow reduce the (amortized)
> > cost of GC'ing the extra cons cells to effectively zero, the
> > question boils down to whether there's any conceivable
> > implementation where (SETF CAR) could be more expensive than the
> > call to CONS plus the call to CDR.
> 
> That's it. Exactly. I am saying there isn't, /even if/ you make the
> bogus assumption that you could GC with /zero/(!) cost.
>
> > For the sake of further argument, let's say that the cost of the
> > CAR part of (SETF CAR) and the CDR are basically negligible. So
> > now the question is, is it possible that the machinery of the CONS
> > could be made faster than that of the SETF. I certainly don't know
> > that it can be and there's probably a lot more ways to implement
> > things such that it's not.
> 
> This probably means that we agree :-)
> 
> > But imagine that the GC/allocation system was tuned for
> > creating millions of almost instantly dead cons cells and had a
> > generation that lived entirely in the cache. All the cells that
> > are allocated by the first 7,999,999 iterations of WARP could
> > live and die on the cache without ever being flushed to main
> > memory. Given that the data touched by NWARP would also
> > presumably stay in the cache during the loop, the relative
> > performance would boils down to raw instruction counts and all
> > the deep vodoo that happens in modern CPUs with pipelining,
> > etc. (Which I don't pretend understand.)
> 
> Yes, but NWARP simply doesn't have to do all that work.  WARP has
> to do the /same/ work, and then some!  That's why it can /never/
> be as fast.

Well, as another poster pointed out, it's possible that the SETF in
NWARP may require flushing the assignment from cache out to main
memory (since you're mutating an object that could be referenced from
elsewhere) while monkeying with freshly created cons cells doesn't
have to touch main memory. Repeat 8,000,000 times and the WRAP is
looking *a lot* faster. (Note, that's about as much detail as I
understand about these issues--maybe someone who really knows what
they're talking about can jump in here. I dealt with these issues when
trying to understand the Java memory model but have never personally
programmed at a level where this matters.)

Admittedly, this kind of analysis is straying away from pure Common
Lisp since those kinds of issues only really matter in multithreaded
systems and, as we all know, Common Lisp is silent on the topic of
multithreading. But in a Lisp implementation that supported
multithreading on SMP machines, these issues would arise and might be
decided in ways that affect the relative costs of NWARP vs. WARP.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87n0gx1ce3.fsf@darkstar.cartan>
Peter Seibel <·····@javamonkey.com> writes:

> Nils Goesche <···@cartan.de> writes:
> 
> > And these operators are /not/ ``semantically redundant''.
> > NREVERSE has /different/ semantics than REVERSE!  That some
> > functional programming adepts think they can do without them
> > does not make them redundant!
> 
> What are the different semantics of NREVERSE that I can rely
> on? Or, put another way, suppose I've just finished my new
> FooLisp Common Lisp implementation which happens to implement
> NREVERSE as:
> 
>   (defun nreverse (sequence) (reverse sequence))
> 
> Is FooLisp not a compliant implementation of Common Lisp
> (assuming I got everything else right)? If not, can you show me
> a program that demonstrates it's non-compliance.

One difference is that REVERSE /guarantees/ that

(defun test ()
  (let ((list (list 1 2 3)))
    (reverse list)
    list))

will return a list (1 2 3), whereas

(defun test ()
  (let ((list (list 1 2 3)))
    (nreverse list)
    list))

possibly won't.  That's a semantic difference.  Of course, there
are better examples, like

(defun test ()
  (let ((cell-1 (cons 2 3))
        (cell-2 (cons 2 3)))
    (setf (car cell-1) 42)
    (cons 42 (cdr cell-2))
    (values cell-1 cell-2)))

Here, the return values

(42 . 3)
(2 . 3)

are even guaranteed.

> > Yes, but NWARP simply doesn't have to do all that work.  WARP
> > has to do the /same/ work, and then some!  That's why it can
> > /never/ be as fast.
> 
> Well, as another poster pointed out, it's possible that the
> SETF in NWARP may require flushing the assignment from cache
> out to main memory (since you're mutating an object that could
> be referenced from elsewhere) while monkeying with freshly
> created cons cells doesn't have touch main memory.

It doesn't?  Why not?  Or rather, why would the first assignment have
to be flushed back in the first place?  Who would read it but the
processor?  The processor would read it from the cache in any
case!  We are not talking about anything like DMA here.  I do not
know very much about PCs, but I am an embedded systems programmer
and have encountered quite a few systems with caches so far and
this argument simply doesn't hold on the systems I know, at
least.

> Repeat 8,000,000 times and the WRAP is looking *a lot* faster.

No, I don't think so (and it's called WARP :-)

> (Note, that's about as much detail as I understand about these
> issues--maybe someone who really knows what they're talking
> about can jump in here. I dealt with this issues when trying to
> understand the Java memory model but have never personally
> progammed at a level where this matters.)

I have, actually :-)

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Peter Seibel
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <m34r353zp5.fsf@javamonkey.com>
Nils Goesche <···@cartan.de> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Nils Goesche <···@cartan.de> writes:
> > 
> > > And these operators are /not/ ``semantically redundant''.
> > > NREVERSE has /different/ semantics than REVERSE!  That some
> > > functional programming adepts think they can do without them
> > > does not make them redundant!
> > 
> > What are the different semantics of NREVERSE that I can rely
> > on? Or, put another way, suppose I've just finished my new
> > FooLisp Common Lisp implementation which happens to implement
> > NREVERSE as:
> > 
> >   (defun nreverse (sequence) (reverse sequence))
> > 
> > Is FooLisp not a compliant implementation of Common Lisp
> > (assuming I got everything else right)? If not, can you show me
> > a program that demonstrates it's non-compliance.
> 
> One difference is that REVERSE /guarantees/ that
> 
> (defun test ()
>   (let ((list (list 1 2 3)))
>     (reverse list)
>     list))
> 
> will return a list (1 2 3), whereas
> 
> (defun test ()
>   (let ((list (list 1 2 3)))
>     (nreverse list)
>     list))
> 
> possibly won't. 

But it could. My point was that there are no semantics of NREVERSE
that you can rely on because it's only *allowed* to do certain things,
not *required* to. I don't believe there's any "conforming" (per CLHS
1.5.2) Common Lisp program that uses NREVERSE that wouldn't run in my
hypothetical FooLisp.

> That's a semantic difference. 

Okay. Let me be more precise. NREVERSE is semantically different from
REVERSE only in that it has less restricted semantics. I.e. REVERSE is
required to leave the sequence argument untouched, NREVERSE is not.
But NREVERSE has no *specified* semantics that are not also semantics
of REVERSE.

> Of course, there are better examples, like
> 
> (defun test ()
>   (let ((cell-1 (cons 2 3))
>         (cell-2 (cons 2 3)))
>     (setf (car cell-1) 42)
>     (cons 42 (cdr cell-2))
>     (values cell-1 cell-2)))
> 
> Here, the return values
> 
> (42 . 3)
> (2 . 3)
> 
> are even guaranteed.

I didn't mean to imply that *all* operations with side-effects are
without specified semantics. I was merely talking about functions like
NREVERSE and DELETE, which are the "destructive" analogs of other
non-destructive functions and don't provide (as far as I can tell) any
additional or different semantics (except that they have weaker
postconditions). Interestingly at least one of the functions that
seem, at first glance, to be of this variety actually has specified
semantics. In the dictionary entry for NSUBSTITUTE it explicitly says:

  nsubstitute and nsubstitute-if can be used in for-effect-only
  positions in code.

Similarly, the behavior of NCONC is also carefully specified, so one
could rely on it being different from its non-destructive analog
APPEND. But that's not the case for NREVERSE, unless I'm misreading
something.
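
So, for example, if I'm reading that note right, something like this
is guaranteed to work, whereas the analogous for-effect-only use of
NREVERSE is not:

    (let ((s (list 1 2 3 2 1)))
      (nsubstitute 0 2 s)   ; called purely for effect, per the note above
      s)                    ; => (1 0 3 0 1)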

> > > Yes, but NWARP simply doesn't have to do all that work. WARP has
> > > to do the /same/ work, and then some! That's why it can /never/
> > > be as fast.
> > 
> > Well, as another poster pointed out, it's possible that the
> > SETF in NWARP may require flushing the assignment from cache
> > out to main memory (since you're mutating an object that could
> > be referenced from elsewhere) while monkeying with freshly
> created cons cells doesn't have to touch main memory.
> 
> It doesn't have to?  Why not?  Or rather, why does the first assignment
> have to be flushed back in the first place?  Who would read it but the
> processor?

Another thread running on a different processor.

> The processor would read it from the cache in any case! We are not
> talking about anything like DMA here. I do not know very much about
> PCs, but I am an embedded systems programmer and have encountered
> quite a few systems with caches so far and this argument simply
> doesn't hold on the systems I know, at least.

Are any of those systems symmetric multi-processors? That's what I'm
talking about.

> > Repeat 8,000,000 times and the WRAP is looking *a lot* faster.
> 
> No, I don't think so (and it's called WARP :-)
> 
> > (Note, that's about as much detail as I understand about these
> > issues--maybe someone who really knows what they're talking about
> > can jump in here. I dealt with these issues when trying to
> > understand the Java memory model but have never personally
> > programmed at a level where this matters.)
> 
> I have, actually :-)

Okay. So then you should be the one explaining it to me. ;-) I think
what I'm talking about is systems with "relaxed" (as opposed to
"sequential") memory consistency models such as the Alpha, Sparc, and
PowerPC which (as I understand it) provide instructions that force
memory actions (writes and reads) performed by individual processors
to be reflected in main memory (i.e. visible to other processors). On
the Alpha these instructions are MB (for Memory Barrier) and WMB
(Write Memory Barrier); on the Sparc MEMBAR; and on PowerPC SYNC.

Again, I haven't written code myself that uses these instructions
(directly anyway--who knows what compilers may be doing under the
covers) but I think I'm not totally on crack here. Anyway, while
searching around my bookshelf and then the web I found this
introduction to memory consistency issues:

  <http://citeseer.nj.nec.com/adve95shared.html>

Maybe once I've finished reading it I'll be able to explain myself
better.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <CIMDa.1139560$S_4.1171156@rwcrnsc53>
"Peter Seibel" <·····@javamonkey.com> wrote in message ···················@javamonkey.com...

> the question boils down to whether there's any
> conceivable implementation where (SETF CAR) could be more expensive
> than the call to CONS plus the call to CDR.

It's quite conceivable.  CONSing a fresh cell doesn't require interaction
with the write barrier.  SETF CAR does.  If the write barrier is
in some shared address space, there may be an atomic interlock
as well.  If fresh storage is allocated on a per-thread basis,
CONS can avoid the interlock.
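
Very roughly (a toy model, with invented names, of the asymmetry):

    (defvar *remembered-set* '())  ; old cells now pointing into young space

    (defun barriered-set-car (cell value &key oldp)
      ;; OLDP stands in for "CELL lives in an older generation".  A real
      ;; collector tests the address; if the remembered set is shared,
      ;; the push may also need an atomic interlock.
      (when oldp
        (push cell *remembered-set*))
      (setf (car cell) value)
      cell)

    (defun fresh-cons (a b)
      ;; Stands in for allocation from a per-thread nursery: a pointer
      ;; bump that no write barrier or interlock ever has to see.
      (cons a b))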
From: Pascal Bourguignon
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <873cipqvex.fsf@thalassa.informatimago.com>
Nils Goesche <···@cartan.de> writes:
> The destructive version is much faster.  Does anybody really
> believe that a better GC would make this difference go away?
> Give me a break ;-)

Of course!  
And you did not even copy 10000 conses to modify: (cdr (nth 10000 l))


-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87znkx1j8c.fsf@darkstar.cartan>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Nils Goesche <···@cartan.de> writes:

> > The destructive version is much faster.  Does anybody really
> > believe that a better GC would make this difference go away?
> > Give me a break ;-)

> Of course!  And you did not even copy 10000 conses to modify:
> (cdr (nth 10000 l))

Right (although you probably mean NTHCDR), but then the evil
opposition could claim that a list with 10000 elements is not a
small object.

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <ADMDa.53847$DV.75588@rwcrnsc52.ops.asp.att.net>
"Nils Goesche" <···@cartan.de> wrote in message ···················@darkstar.cartan...
>
> CL-USER 21 > (test)
> Timing WARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (WARP CELL)))
>
> user time    =      1.100
> system time  =      0.000
> Elapsed time =   0:00:01
> Allocation   = 544 bytes standard / 88002167 bytes fixlen
> 0 Page faults
> Cell is (19 . 42).
> Timing NWARP:
> Timing the evaluation of (LOOP REPEAT 8000000 DO (SETQ CELL (NWARP CELL)))
>
> user time    =      0.340
> system time  =      0.000
> Elapsed time =   0:00:00
> Allocation   = 672 bytes standard / 2387 bytes fixlen
> 0 Page faults
> Cell is (19 . 42).
> NIL
>
>
>
> The destructive version is much faster.

``Much'' faster?  Not even an order of magnitude....

But seriously, that factor of 4 can dwindle if you start
adding in the overhead for managing resources.
From: Michael Parker
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <040620031857082274%michaelparker@earthlink.net>
In article <···························@posting.google.com>, Kaz
Kylheku <···@ashi.footprints.net> wrote:
 
> Sorry, I don't believe this. Suppose I have a twenty kilobyte object,
> and I want to make one almost like it, but differing in one word of
> memory. You're telling me that in theory, I can make a brand new
> object, copy all twenty kilobytes into  it, and substitute my change,
> as cheaply as just modifying that word in place.

I don't mean to get snippy here, but the OP was talking about *small*
objects:
> > In the Java world the folks writing Java virtual machines have put a
> > bunch of effort into improving performance on what they called "small
> > object" garbage collection. I.e. how does the GC perform when lots of
> > small objects are being created and quickly becoming garbage. (E.g.
> > generational garbage collectors helps with this problem.)

Admittedly, he didn't come out and say what he meant by "small", but I'm
pretty sure that 20k wasn't it.
From: Pascal Bourguignon
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87y90grflr.fsf@thalassa.informatimago.com>
Michael Parker <·············@earthlink.net> writes:

> In article <···························@posting.google.com>, Kaz
> Kylheku <···@ashi.footprints.net> wrote:
>  
> > Sorry, I don't believe this. Suppose I have a twenty kilobyte object,
> > and I want to make one almost like it, but differing in one word of
> > memory. You're telling me that in theory, I can make a brand new
> > object, copy all twenty kilobytes into  it, and substitute my change,
> > as cheaply as just modifying that word in place.
> 
> I don't mean to get snippy here, but the OP was talking about *small*
> objects:

Ah! The evil opposition ! :-)

For me, a  cons is a small  object, and 10000 of them  still make them
small  objects.    And  that  was  the   point:  generational  garbage
collectors are good with a lot (like 10000) of small objects!


> > > In the Java world the folks writing Java virtual machines have put a
> > > bunch of effort into improving performance on what they called "small
> > > object" garbage collection. I.e. how does the GC perform when lots of
> > > small objects are being created and quickly becoming garbage. (E.g.
> > > generational garbage collectors helps with this problem.)
> 
> Admittedly, he didn't come out and say what he meant by "small", but I'm
> pretty sure that 20k wasn't it.

Perhaps he meant small AND few objects...

-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.
From: Michael Sullivan
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <1fw31z8.1kbtiw36k46nrN%michael@bcect.com>
Pascal Bourguignon <····@thalassa.informatimago.com> wrote:

> Michael Parker <·············@earthlink.net> writes:
> 
> > In article <···························@posting.google.com>, Kaz
> > Kylheku <···@ashi.footprints.net> wrote:
> >  
> > > Sorry, I don't believe this. Suppose I have a twenty kilobyte object,
> > > and I want to make one almost like it, but differing in one word of
> > > memory. You're telling me that in theory, I can make a brand new
> > > object, copy all twenty kilobytes into  it, and substitute my change,
> > > as cheaply as just modifying that word in place.
> > 
> > I don't mean to get snippy here, but the OP was talking about *small*
> > objects:
> 
> Ah! The evil opposition ! :-)
> 
> For me, a  cons is a small  object, and 10000 of them  still make them
> small  objects.    And  that  was  the   point:  generational  garbage
> collectors are good with a lot (like 10000) of small objects!

In fact, a 10000 node cons sequence/tree is not necessarily going to
cause you to copy 10000 cons cells in order to perform a nondestructive
operation.  Implementationally, all those conses are basically pointers,
and it could happen that the compliler just pointer maps out the
modified cell and still keeps the other 9999 cells in common...
Admittedly, this is a SMOP, and I'm not certain how much it buys|hurts
you over the long haul, but then we were talking *theory* here and not
practice.  That said, you've still got to do *some* copying/remapping in
a non-destructive operation that is unnecessary in a destructive
operation.  

I can see that some implementations might force overhead onto
destructive operations that will make non-destructive operations just as
fast or even faster, but that's really cheating, no?  Unless that
overhead actually buys speed on average in the *destructive* operations
as well, it's like inserting a pile of no-ops in the code and saying
"See, the non-destructive operator is now faster."

And even if it does help on the whole -- how much, and according to what
profile?  Are there certain kinds of operations where it *doesn't*
increase the average speed, and how common are they?  Seems like you'd
have a lot of work to do to demonstrate that you weren't taking away
some optimization capabilities by doing this, and the primary
justification is a removal of operators.  If you're coming from a "pure"
FP perspective that makes lots of sense as a goal, but most lisps are
not intended as pure FP, CL explicitly so.


Michael
From: William D Clinger
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <b84e9a9f.0306041931.2b74dbb8@posting.google.com>
Peter Seibel wrote:
> Lisp implementors have obviously been working on GC technology for
> quite some time. As I was thinking about destructive operators
> (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> the GC were sufficiently good the performance benefit of using
> destructive operators would approach zero.

That is true in certain limited situations.  Consider for example
the following simplified form of Scheme's MAP procedure (similar
to MAPCAR in Common Lisp):

    (define (map f x)
      (do ((x x (cdr x))
           (result '() (cons (f (car x)) result)))
        ((null? x)
         (reverse result))))

Many articles and textbooks have used something like this as an
example of when a call to REVERSE can be replaced by a call to
REVERSE! (or NREVERSE in CL).
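
In CL terms the textbook rewrite looks something like this (a sketch
of a one-list MAPCAR):

    (defun my-map (f x)
      (do ((x x (cdr x))
           (result '() (cons (funcall f (car x)) result)))
          ((null x)
           (nreverse result))))  ; RESULT is freshly consed and unshared,
                                 ; so the destructive reversal is safe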

But does that make MAP run faster?  In the two implementations
of Scheme that have the best garbage collectors, on most of the
SPARC machines I have benchmarked, changing REVERSE to REVERSE!
actually makes MAP _slower_ in the usual case when F allocates
little storage and the list x contains no more than a few tens
of thousands of elements.

This is a very simple example.  A lot depends on the write barrier,
allocator, object lifetimes, and other arcana.  In general, not
even the experts understand the tradeoffs very well, and you ought
to be skeptical of the answers you'll get on Usenet.  One of the
few experimental studies on this was done for a master's thesis
under my direction:

    Lars Thomas Hansen.  "The Impact of Programming Style on the
    Performance of Scheme Programs".  Masters Thesis.  University
    of Oregon.  August 1992.  Available online at
    ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/impact.ps.gz

But that thesis is pretty old now, and changing technology has
changed some of the tradeoffs.

Will
From: William D Clinger
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <b84e9a9f.0306050843.562f1d44@posting.google.com>
Nils Goesche posted an NWARP/WARP benchmark.  Here are elapsed
times (except ACL reports only CPU time for gc), best of three runs,
(proclaim '(optimize (speed 3) (safety 0) (space 0) (debug 0)))
or the closest equivalent in Scheme, for three different systems on
two different Sun workstations.

On adara:
                                   warp              nwarp
                               elapsed   gc       elapsed   gc
Allegro CL 4.3.1                4.92    .03         4.81     0
Chez Scheme 6.1                 4.02    .04         3.82     0
Larceny v0.51                   2.32    .10         1.98     0

On cygnus:
                                   warp              nwarp
                               elapsed   gc       elapsed   gc
Allegro CL 4.3.1                5.26    .03         4.98     0
Chez Scheme 6.1                 4.25    .00         4.19     0
Larceny v0.51                   2.32    .07         2.12     0


Before you conclude that destructive operations will always be
faster than consing, ponder this:  In Larceny, the write barrier
normally allocates four bytes.  In the benchmark above, this
allocation is bypassed because there is only one object being
mutated, and it is in the youngest generation.

Will
From: Nils Goesche
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87adcwkup6.fsf@darkstar.cartan>
······@qnci.net (William D Clinger) writes:

> Nils Goesche posted an NWARP/WARP benchmark.  Here are elapsed
> times (except ACL reports only CPU time for gc), best of three runs,
> (proclaim '(optimize (speed 3) (safety 0) (space 0) (debug 0)))
> or the closest equivalent in Scheme, for three different systems on
> two different Sun workstations.
> 
> On adara:
>                                    warp              nwarp
>                                elapsed   gc       elapsed   gc
> Allegro CL 4.3.1                4.92    .03         4.81     0
> Chez Scheme 6.1                 4.02    .04         3.82     0
> Larceny v0.51                   2.32    .10         1.98     0
> 
> On cygnus:
>                                    warp              nwarp
>                                elapsed   gc       elapsed   gc
> Allegro CL 4.3.1                5.26    .03         4.98     0
> Chez Scheme 6.1                 4.25    .00         4.19     0
> Larceny v0.51                   2.32    .07         2.12     0
> 
> 
> Before you conclude that destructive operations will always be
> faster than consing,

Thanks for this; quite impressive!  NWARP might be slightly
faster, but anything below one order of magnitude should not
upset anybody.  It is great that with modern implementations we
can write our code the way we like without worrying all too much
about GC times.  I still doubt that the original point, that
advanced GCs will make destructive operations unnecessary for
gaining performance, is valid in general (or ever will be) (not to
mention that we might not even /want/ to get rid of
destructive operations even if that would be the case), but
actually this is not so terribly important.  With current
technology, we can already write code without thinking about
consing as much as in earlier times.  This is great!

Regards,
-- 
Nils Gösche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID #xD26EF2A0
From: Kent M Pitman
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <sfwy90gc4ba.fsf@shell01.TheWorld.com>
Peter Seibel <·····@javamonkey.com> writes:

> In the Java world the folks writing Java virtual machines have put a
> bunch of effort into improving performance on what they called "small
> object" garbage collection. I.e. how does the GC perform when lots of
> small objects are being created and quickly becoming garbage. (E.g.
> generational garbage collectors helps with this problem.)
> 
> Lisp implementors have obviously been working on GC technology for
> quite some time. As I was thinking about destructive operators
> (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> the GC were sufficiently good the performance benefit of using
> destructive operators would approach zero.
> 
> My question is, does anyone know if any current CL implementations
> have set out to tune their GC such that there would really be no
> reason to use those destructive functions?

To what end?

It's fine to make it such that non-destructive operations run with no
serious performance penalty, but that's not the entire issue.

Supposing that the list to be reversed occupied more than 50% of
available memory, only NREVERSE could work since REVERSE could not
work fast enough to succeed.  NREVERSE effectively gives the processor
permission to succeed in this case where REVERSE could not possibly.

Dropping support for NREVERSE is like a kind of voluntary suicide in
this situation.

IMO, NREVERSE amounts to a declaration that the old list will no
longer be needed, and that it's ok to recycle its structure in the
process of the reversal.  A compiler might or might not be able to
prove that this is the case, but if the compiler could not prove this,
I see no reason to ignore the user's declaration.

The introduction of the term of "destructive" implies a certain degree
of pejorative action.  IMO, it's "functional" operations that people
should look at with a smirk.  The entire universe is constructed of
"destructive" stuff--that is, in a 4-dimensional universe, all actions
occur by destroying T=n's state to produce T=n+1's state.  There is no
evidence at present that T=n+1 occurs by consing new matter.  Among
other things, if you believe time to be continuous, this would require
infinite consing for any finite unit "1" to be added to any "n".
Consequently, I see no shame in asserting that destructive operations
are the ones that can best "keep up" with real world simulations
because they consume space that is big-O compatible with the march of
time.  The idea that to compute a new state, one should use a consing
model and then optimize it seems a philosophically silly position to
me, and I see no reason to apologize for not using this as a base
case.  I'm sorry if mathematicians find this an unpleasant reality,
but then, mathematicians seem to often find a lot of things about
reality unpleasant.

Not that my personal opinion really matters in the Grand Scheme of 
Things, but...

If you want _me_ to seriously engage this question, your burden is
first to respond to my alternate question: What IS this passion for
non-destructive functions?  For anything that models real world
statefulness, why on earth wouldn't I want to use a representation
that is isomorphic to the observed real world, that is, that has state
and that by its nature _calls for_ destructive operations just to make
it easy to prove the correctness of its behavior.  If you imagine the
universe to be memory for a processor running a "real world"
simulation, then there is a row-major projection of the matter in the
universe that I can imagine happily nreversing.  Requiring me to
reverse the projection instead requires me to buy new hardware, which
I'm not sure I have the budget for.

None of this is an argument against working on good GC's.  I just
don't think the end of "eliminating destructive operators" is a good
one.  I _do_ think the end of "making non-destructive operators incur
minimal performance penalty" is a fine one.

Incidentally, as a total aside, in the presence of cdr-coding,
NREVERSE is also well-behaved memory-wise while REVERSE is a bit
philosophically adrift since it has to guess at an appropriate
cdr-coding for the result.  If it packs the result, it might be
setting itself up to cons (and fragment) unnecessarily later, which
yes, eventually a GC might fix, but which can have bad effects
depending on timing of GC's; if it does not pack the result, it might
be bloating unnecessarily.  NREVERSE, by contrast, nicely defers the
decision by reusing prior consing and further, two NREVERSE's undo the
storage effects cleanly with no risk of screwing the cdr-coding.  A
bidirectional plist constructed by (list* x y more) can then cleanly
be extended by SETF of GETF, can be nreversed, and can continue to be
extended, while operating using compact cdr-coding.  The very nature
of cdr-coding is to ask the programmer to become involved in
identifying which cdr-links are likely to be snapped and which ones
are not.  It seems to me that the GC, which uses only dynamic
knowledge, cannot (short of neural net technology I've not seen)
predict such things, which might be known only statically to the
programmer, and so might thrash, never realizing it was working
against itself.
From: Peter Seibel
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <m3znkw2x5q.fsf@javamonkey.com>
Kent M Pitman <······@world.std.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > In the Java world the folks writing Java virtual machines have put a
> > bunch of effort into improving performance on what they called "small
> > object" garbage collection. I.e. how does the GC perform when lots of
> > small objects are being created and quickly becoming garbage. (E.g.
> > generational garbage collectors helps with this problem.)
> > 
> > Lisp implementors have obviously been working on GC technology for
> > quite some time. As I was thinking about destructive operators
> > (DELETE, NSUBSTITUTE, etc.) it occured to me that in a system where
> > the GC were sufficiently good the performance benefit of using
> > destructive operators would approach zero.
> > 
> > My question is, does anyone know if any current CL implementations
> > have set out to tune their GC such that there would really be no
> > reason to use those destructive functions?
> 
> To what end?

First let me be clear, in case you haven't read all the rest of this
thread, that I'm only talking about those "destructive" operators that
don't add any semantics to their "functional" counterparts. I.e. I'm
talking about NREVERSE and DELETE, not NSUBSTITUTE and NCONC. And I'm
definitely *not* talking about SETF.

That said, to answer your question, "to what end?" Conceptual clarity
and the reduction of action-at-a-distance errors. I recognize that
those goals do not automatically trump all other goals. But if they
can be had at reasonable cost they seem worth thinking about.

What struck me was that NREVERSE (to keep things concrete) is
basically a way of defeating automatic memory management. We used to
do this a lot in the early days of Java because the Java garbage
collectors weren't very good--make pools of objects that we could
manually allocate and release. And it tended to lead to similar
problems as problems as free and malloc--releasing and reusing objects
that were still in use elsewhere and forgetting to release them. The
Java VM guys then worked pretty hard on optimizing their GC so that
most programs written in the natural "create new objects when you need
'em" style would outperform the object-pooling versions. NREVERSE is
sort of like a built in object-pooling mechanism and as such has
similar problems--if you use it at the wrong time your code breaks.

> It's fine to make it such that non-destructive operations run with
> no serious performance penalty, but that's not the entire issue.
> 
> Supposing that the list to be reversed occupied more than 50% of
> available memory, only NREVERSE could work since REVERSE could not
> work fast enough to succeed.  NREVERSE effectively gives the processor
> permission to succeed in this case where REVERSE could not possibly.

That's an interesting point. Of course it's somewhat analogous to
saying a copying GC is wrong/broken/bad whatever if it's possible that
your app will need more than 50% of available memory. It's true but it
has to be weighed against other considerations; ultimately, in both
cases, it's a quality of implementation issue since the standard
doesn't *require* that NREVERSE work in this case.

> Dropping support for NREVERSE is like a kind of voluntary suicide in
> this situation.
> 
> IMO, NREVERSE amounts to a declaration that the old list will no
> longer be needed, and that it's ok to recycle its structure in the
> process of the reversal.  A compiler might or might not be able to
> prove that this is the case, but if the compiler could not prove this,
> I see no reason to ignore the user's declaration.

Yup. That's how I see it too--like other declarations (except
"special") it doesn't affect the semantics of a correct program but
lets the programmer tell the compiler something it might not be able
to figure out. I was actually imagining that it might be neat to have
something like a "recyclable" declaration that makes:

    (defun foo (some-big-list)
      (declare (recyclable some-big-list))
      (reverse some-big-list))

equivalent to:

    (defun foo (some-big-list)
      (nreverse some-big-list))


> The introduction of the term of "destructive" implies a certain degree
> of pejorative action.

Hey, don't look at me. I didn't make up that term. I actually prefer
my own term "recycling" since that seems to get more to the point of
what's really going on. And it's less pejorative. ;-)

> IMO, it's "functional" operations that people should look at with a
> smirk. The entire universe is constructed of "destructive"
> stuff--that is, in a 4-dimensional universe, all actions occur by
> destroying T=n's state to produce T=n+1's state. There is no
> evidence at present that T=n+1 occurs by consing new matter. Among
> other things, if you believe time to be continuous, this would
> require infinite consing for any finite unit "1" to be added to any
> "n". Consequently, I see no shame in asserting that destructive
> operations are the ones that can best "keep up" with real world
> simulations because they consume space that is big-O compatible with
> the march of time. The idea that to compute a new state, one should
> use a consing model and then optimize it seems a philosophically
> silly position to me, and I see no reason to apologize for not using
this as a base case. I'm sorry if mathematicians find this an
unpleasant reality, but then, mathematicians seem to often find a
> lot of things about reality unpleasant.

Well, I find it unpleasant that my computer doesn't have an infinite
amount of memory. However, using languages with garbage collection
allows me to write many programs without having to deal with that
unpleasant fact--it's not a perfect simulation but for most purposes GC
makes it seem as though I have so much memory I don't have to think
about where it's coming from.

I'm not saying I want to program in a purely functional language--I
know where to find them if that's what I want. I'm just saying that if
I could get all the benefits of NREVERSE (speed and, as you point out,
lower peak memory usage) *automatically* because my compiler and
runtime are smart enough to give it to me, that'd be good. If they
can't, well, there it is; nobody said programming had to be easy.

> Not that my personal opinion really matters in the Grand Scheme of
> Things, but...
> 
> If you want _me_ to seriously engage this question, your burden is
> first to respond to my alternate question: What IS this passion for
> non-destructive functions?  For anything that models real world
> statefulness, why on earth wouldn't I want to use a representation
> that is isomorphic to the observed real world, that is, that has state
> and that by its nature _calls for_ destructive operations just to make
> it easy to prove the correctness of its behavior.  If you imagine the
> universe to be memory for a processor running a "real world"
> simulation, then there is a row-major projection of the matter in the
> universe that I can imagine happily nreversing.  Requiring me to
> reverse the projection instead requires me to buy new hardware, which
> I'm not sure I have the budget for.

So, hopefully I've made myself clear I don't want to program using
only non-destructive functions. In fact I'd be happy if there were
*more* destructive functions if they also had defined side-effecting
semantics. For instance, if SORT could be called in for-effect-only
positions, it wouldn't upset me at all.

> None of this is an argument against working on good GC's. I just
> don't think the end of "eliminating destructive operators" is a good
> one. I _do_ think the end of "making non-destructive operators
> incur minimal performance penalty" is a fine one.

Okay, that's all I'm talking about.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Kent M Pitman
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <sfwllwcrwft.fsf@shell01.TheWorld.com>
Peter Seibel <·····@javamonkey.com> writes:

>     (defun foo (some-big-list)
>       (declare (recyclable some-big-list))
>       (reverse some-big-list))
> 
> equivalent to:
> 
>     (defun foo (some-big-list)
>       (nreverse some-big-list))

Without prejudice as to your other remarks, this wouldn't work.
What would work is
 (reverse (the last-use some-big-list))
or
 (defun foo (some-big-list)
   (declare (otherwise-inaccessible some-big-list))
   (reverse some-big-list))

That is, as a point of meaningful semantics, at the point of your 
declaration, the list is not recyclable.  Moreover, it's an invitation
for something to get broken if you later add code. Consider:

 (defun foo (some-big-list)
   (declare (recyclable some-big-list))
   (setq *foo* some-big-list)
   (reverse some-big-list))

This wouldn't be a correct program, but you have invited it.  Consider 

 (defun foo (some-big-list)
   (declare (recyclable some-big-list))
+---------------------------------------+
|  (setq *foo* some-big-list)           |
|  (reverse some-big-list))             |

wouldn't make it apparent that the declaration was in effect if the bars
are the screen view.  Whereas, if all you'd done is declared it to be
otherwise-inaccessible above, you wouldn't necessarily void the declaration.
Moreover, in the last-use case, it's clearer where the last use is.

A similar case comes up with the place where people like to use the icky
LispM declaration 
 (lambda (...)
   (declare (sys:downward-function))
   ...)
where it isn't obvious what the scope is relative to.  In
 (foo (lambda (...)
       (declare (sys:downward-function))
       ...))
there is technically still a return upward from the (lambda...) to the
arg-accumulator for FOO before going downward.  I prefer in my code
 (with-dynamic-funargs
   (foo (lambda (...) ...)))
that turns into
 (flet ((temp (...) ...))
   (declare (dynamic-extent #'temp))
   (foo #'temp))
so that the extent of the declaration is clearly bounded.
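
A minimal version of such a macro--handling only literal LAMBDA
arguments of a single call form, nothing like a real code walker--might
look like this; the point is the shape of the expansion, not this
particular implementation:

    (defmacro with-dynamic-funargs (call)
      ;; Turn each literal (LAMBDA ...) argument of CALL into a local
      ;; FLET function declared DYNAMIC-EXTENT, and pass #'NAME instead.
      (let ((bindings '()))
        (flet ((walk (form)
                 (if (and (consp form) (eq (car form) 'lambda))
                     (let ((name (gensym "DYNAMIC-FN")))
                       (push (list* name (second form) (cddr form)) bindings)
                       `(function ,name))
                     form)))
          (let ((new-call (cons (first call) (mapcar #'walk (rest call)))))
            (setq bindings (nreverse bindings))
            `(flet ,bindings
               (declare (dynamic-extent
                         ,@(mapcar (lambda (b) `(function ,(first b)))
                                   bindings)))
               ,new-call)))))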

-- -- --

My comments above are purely as to syntax, and I'm not sure whether I agree
that using a declaration mechanism is the right thing here.  It makes it hard
to funcall with the appropriate semantics.  That is, one has to do
 (f #'(lambda (x) (reverse (the last-use x))))
instead of 
 (f #'nreverse)
I'm not sure that making the name for these things hard to get to is good.

I also basically think that since these are two utterly different things,
they need different names.  The algorithms for reverse and nreverse are
quite different in some implementations, especially cdr-coded.  So the real
question is whether you want to make inaccessible an operation that some
people need only because _you_ don't.  And on that point, I just don't see
it.

If you were making a new language, that'd be another deal, but given that the
language is deployed, all you can seek to do is to hide one of the names.
And that seems silly to me. YMMV.
From: Pascal Bourguignon
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87r864fbz3.fsf@thalassa.informatimago.com>
Peter Seibel <·····@javamonkey.com> writes:
> First let me be clear, in case you haven't read all the rest of this
> thread, that I'm only talking about those "destructive" operators that
> don't add any semantics to their "functional" counterparts. I.e. I'm
> talking about NREVERSE and DELETE, not NSUBSTITUTE and NCONC. And I'm
> definitely *not* talking about SETF.


What do you mean by "they don't add any semantics"?

[12]> (setq l1 '(a b c))
(A B C)
[13]> (setq l2 l1)
(A B C)
[14]> (remove 'b l2)
(A C)
[15]> l1
(A B C)
[16]> (delete 'b l2)
(A C)
[17]> l1
(A C)
[18]> (reverse l2)
(C A)
[19]> l1
(A C)
[20]> (nreverse l2)
(C A)
[21]> l1
(C A)



-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.
From: Peter Seibel
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <m3n0grwzqo.fsf@javamonkey.com>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> > First let me be clear, in case you haven't read all the rest of this
> > thread, that I'm only talking about those "destructive" operators that
> > don't add any semantics to their "functional" counterparts. I.e. I'm
> > talking about NREVERSE and DELETE, not NSUBSTITUTE and NCONC. And I'm
> > definitely *not* talking about SETF.
> 
> 
> What do you mean by "they don't add any semantics"?

I mean the side effects that NREVERSE is allowed to have are not
required. A conforming Lisp implementation could (though it would be
quite strange and a poor choice) implement NREVERSE exactly the same
as REVERSE. So no conforming program can rely on any particular
semantics provided by NREVERSE that are different than those provided
by REVERSE. Likewise DELETE *could* be strictly speaking correctly
implemented as an alias for REMOVE. Your examples below just show that
in whatever implementation you're using they happen to be implemented to
take advantage of the latitude the standard gives them and how that
interacts with aliasing.

> [12]> (setq l1 '(a b c))
> (A B C)
> [13]> (setq l2 l1)
> (A B C)
> [14]> (remove 'b l2)
> (A C)
> [15]> l1
> (A B C)
> [16]> (delete 'b l2)
> (A C)
> [17]> l1
> (A C)
> [18]> (reverse l2)
> (C A)
> [19]> l1
> (A C)
> [20]> (nreverse l2)
> (C A)
> [21]> l1
> (C A)

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <cUMDa.581518$Si4.531784@rwcrnsc51.ops.asp.att.net>
"Kent M Pitman" <······@world.std.com> wrote in message ····················@shell01.TheWorld.com...

>  The entire universe is constructed of
> "destructive" stuff--that is, in a 4-dimensional universe, all actions
> occur by destroying T=n's state to produce T=n+1's state.  There is no
> evidence at present that T=n+1 occurs by consing new matter.  Among
> other things, if you believe time to be continuous, this would require
> infinite consing for any finite unit "1" to be added to any "n".

On the other hand, given that baryon number, charge, and lepton number,
(among other things) is conserved, you could say that *no* destructive
action is taking place, only a functional mapping from state to state.

(Incidentally, since the Hamiltonian of a quantum system evolves in
a unitary manner, it's a function.)
From: Kent M Pitman
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <sfwd6hsfez8.fsf@shell01.TheWorld.com>
"Joe Marshall" <·············@attbi.com> writes:

> "Kent M Pitman" <······@world.std.com> wrote in message ····················@shell01.TheWorld.com...
> 
> >  The entire universe is constructed of
> > "destructive" stuff--that is, in a 4-dimensional universe, all actions
> > occur by destroying T=n's state to produce T=n+1's state.  There is no
> > evidence at present that T=n+1 occurs by consing new matter.  Among
> > other things, if you believe time to be continuous, this would require
> > infinite consing for any finite unit "1" to be added to any "n".
> 
> On the other hand, given that baryon number, charge, and lepton number,
> (among other things) is conserved, you could say that *no* destructive
> action is taking place, only a functional mapping from state to state.

Nice try.

I find this uninteresting, though, since the same could be said about
the computer that I'm simulating it on.  No destruction is occurring.
Just a functional "transition" from 'shape' at time T1 to 'shape' at 
time T2.

The sense of "destruction" in both cases is not that the prior matter is 
unavailable, but that its physical state is inaccessible.  

If matter is never really destroyed, then the term "destruction" has no
meaning if it doesn't mean "makes a prior state inaccessible", usually by
applying a kind of trapdoor transformation, i.e., one that is easier to
compute forward than in reverse.
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <D%NDa.1140086$S_4.1171285@rwcrnsc53>
"Kent M Pitman" <······@world.std.com> wrote in message ····················@shell01.TheWorld.com...
> "Joe Marshall" <·············@attbi.com> writes:
>
> > "Kent M Pitman" <······@world.std.com> wrote in message ····················@shell01.TheWorld.com...
> >
> > >  The entire universe is constructed of
> > > "destructive" stuff--that is, in a 4-dimensional universe, all actions
> > > occur by destroying T=n's state to produce T=n+1's state.  There is no
> > > evidence at present that T=n+1 occurs by consing new matter.  Among
> > > other things, if you believe time to be continuous, this would require
> > > infinite consing for any finite unit "1" to be added to any "n".
> >
> > On the other hand, given that baryon number, charge, and lepton number,
> > (among other things) is conserved, you could say that *no* destructive
> > action is taking place, only a functional mapping from state to state.
>
> Nice try.
>
> I find this uninteresting, though, since the same could be said about
> the computer that I'm simulating it on.  No destruction is occurring.
> Just a functional "transition" from 'shape' at time T1 to 'shape' at
> time T2.
>
> The sense of "destruction" in both cases is not that the prior matter is
> unavailable, but that its physical state is inaccessible.

In the quantum world, it appears that the prior state is simply the
inverse function.  Hardly inaccessible.

> If matter is never really destroyed, then the term "destruction" has no
> meaning if it doesn't mean "makes a prior state inaccessible", usually by
> applying a kind of trapdoor transformation, i.e., one that is easier to
> compute forward than in reverse.

You'd have to invoke thermodynamics to get irreversibility.
From: Kent M Pitman
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <sfw3ciotd8k.fsf@shell01.TheWorld.com>
"Joe Marshall" <·············@attbi.com> writes:

> You'd have to invoke thermodynamics to get irreversibility.

I didn't say irreversible.  I said "much higher expense".
"Just wait a moment" to move forward in time.
Apply complex math/physics to move backward.
I'd say that was "much higher expense".

Anyway, this is off the track and at the wrong abstraction level.
Even if we used "functional" solutions, it would not change the fact
that it compiles down to the same computer with the same instruction
set.  So either that instruction set is already inherently functional
(in which case there is nothing to complain about) or it is inherently
not functional (in which case the "inherent" part dominates, and
there's nothing we can change from software, for lack of a 
change-hardware opcode).  Anything happening at this level is not 
materially interesting.

What IS interesting in this discussion is whether the operators you're
using work with O(1) storage or not.  NREVERSE does.  REVERSE does
not.  NREVERSE can reverse any argument it is given without fear of
running out of space; REVERSE cannot.  Absent special-case compiler
optimizations for specific patterns of dataflow, REVERSE has inherent
intermediate expression swell that NREVERSE does not.  I don't see a
way to get around that.  If you abort a REVERSE operation half-way
through, no one's pointers have been screwed up; while for NREVERSE
they have.  But the price you pay for this is that you must be able
to, at least instantaneously, store two 'copies' (more or less) of the
item being reversed.
From: Erann Gat
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <gat-0506031709500001@k-137-79-50-101.jpl.nasa.gov>
In article <···············@shell01.TheWorld.com>, Kent M Pitman
<······@world.std.com> wrote:

> REVERSE has inherent intermediate expression swell that NREVERSE does not.

Only for one particular representation of lists (albeit the usual one). 
There are representations of lists for which non-destructive reverse can
be done in O(1) space, e.g. a vector containing the list elements, and a
separate record containing pointers to the beginning and end of the list,
with the proviso that if the beginning is "after" the end then the vector
elements are to be interpreted as being in reverse order.
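
Something like this, say (just a sketch, with invented names):

    (defstruct rlist
      elements   ; a simple vector holding the list elements
      start      ; index of the first element
      end)       ; index of the last element; start > end means reversed

    (defun rlist-reverse (rl)
      ;; Non-destructive reverse in O(1) space and time: share the
      ;; vector and swap the endpoints.
      (make-rlist :elements (rlist-elements rl)
                  :start    (rlist-end rl)
                  :end      (rlist-start rl)))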

E.
From: William D Clinger
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <b84e9a9f.0306052040.1931e1b6@posting.google.com>
I see two small flaws in your argument, Kent.  IMO the first flaw
is fatal to half of your argument.  The second half of your argument
becomes more compelling when its flaw is repaired.

Kent M Pitman wrote:
> Anyway, this is off the track and at the wrong abstraction level.
> Even if we used "functional" solutions, it would not change the fact
> that it compiles down to the same computer with the same instruction
> set....

If your level of abstraction were the right one, programming
languages wouldn't matter because they all compile down to the
same instruction set.  The fact is that programming languages
do matter because they are the abstractions that the programmer
sees and reasons about.

> What IS interesting in this discussion is whether the operators you're
> using work with O(1) storage or not.  NREVERSE does.  REVERSE does
> not.

When you remember the space occupied by the argument to these
procedures, both require O(n) storage.  (This accounting becomes
more complicated when the argument is shared with some other
structure, but NREVERSE is seldom usable in that situation so
I'm going to ignore that problem.)

Your real point is that a constant factor of 2 is significant
here.

Will
From: Pascal Bourguignon
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <87k7bzf011.fsf@thalassa.informatimago.com>
"Joe Marshall" <·············@attbi.com> writes:
> > I find this uninteresting, though, since the same could be said about
> > the computer that I'm simulating it on.  No destruction is occurring.
> > Just a functional "transition" from 'shape' at time T1 to 'shape' at
> > time T2.
> >
> > The sense of "destruction" in both cases is not that the prior matter is
> > unavailable, but that its physical state is inaccessible.
> 
> In the quantum world, it appears that the prior state is simply the
> inverse function.  Hardly inaccessible.
> 
> > If matter is never really destroyed, then the term "destruction" has no
> > meaning if it doesn't mean "makes a prior state inaccessible", usually by
> > applying a kind of trapdoor transformation, i.e., one that is easier to
> > compute forward than in reverse.
> 
> You'd have to invoke thermodynamics to get irreversibility.

With thermodynamics, you don't get irreversibility, you get improbability.

However,  with  quantum   dynamics,  and  the  Heisenberg  uncertainty
principle,   you  get   irreversibility  (and   unpredictability)  per
uncertainty.



-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.
From: Joe Marshall
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <65ni9b49.fsf@ccs.neu.edu>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> "Joe Marshall" <·············@attbi.com> writes:
> > > I find this uninteresting, though, since the same could be said about
> > > the computer that I'm simulating it on.  No destruction is occurring.
> > > Just a functional "transition" from 'shape' at time T1 to 'shape' at
> > > time T2.
> > >
> > > The sense of "destruction" in both cases is not that the prior matter is
> > > unavailable, but that its physical state is inaccessible.
> > 
> > In the quantum world, it appears that the prior state is simply the
> > inverse function.  Hardly inaccessible.
> > 
> > > If matter is never really destroyed, then the term "destruction" has no
> > > meaning if it doesn't mean "makes a prior state inaccessible", usually by
> > > applying a kind of trapdoor transformation, i.e., one that is easier to
> > > compute forward than in reverse.
> > 
> > You'd have to invoke thermodynamics to get irreversibility.
> 
> With thermodynamics, you don't get irreversibility, you get improbability.
> 
> However,  with  quantum   dynamics,  and  the  Heisenberg  uncertainty
> principle,   you  get   irreversibility  (and   unpredictability)  per
> uncertainty.

The Second Law of Thermodynamics states that entropy increases with
time.  It is not symmetric about time, so therefore irreversible.

Quantum dynamics, however, *is* symmetrical about time, and therefore
*is* reversible.
From: Christophe Rhodes
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <sqk7byz0od.fsf@lambda.jcn.srcf.net>
Joe Marshall <···@ccs.neu.edu> writes:

> Pascal Bourguignon <····@thalassa.informatimago.com> writes:
>
>> With thermodynamics, you don't get irreversibility, you get improbability.
>> 
>> However,  with  quantum   dynamics,  and  the  Heisenberg  uncertainty
>> principle,   you  get   irreversibility  (and   unpredictability)  per
>> uncertainty.
>
> The Second Law of Thermodynamics states that entropy increases with
> time.  It is not symmetric about time, so therefore irreversible.
>
> Quantum dynamics, however, *is* symmetrical about time, and therefore
> *is* reversible.

You're both right, OK?

Viewed classically, the second law of thermodynamics ("heat cannot of
itself pass from one body to a hotter body" :-) is an expression of
the statistics of large numbers, and not a fundamental physical law.
It is an expression of the volume in phase space occupied by otherwise
indistinguishable microstates corresponding to a given macrostate;
put another way, it says that there are many more ways to have a
scrambled egg than a whole one.  On the other hand, even though the
second law is "just" a statistical law, the statistics are so good
that you won't see any violations in your lifetimes, or, indeed, in
the lifetime of the Universe.

Schroedinger's equation for the evolution of a quantum state, and the
Klein-Gordon or Dirac equations, are symmetrical in time; indeed, since
the latter two are relativistic, they'd better even be symmetrical in
time and space.  So quantum dynamics are time-symmetric, except that
there must be something more to it than that (though what, no-one's
terribly sure), as observation seems to reduce the quantum state to
just one of the available eigenstates.  This process, the mechanics of
which are unknown at present, is not time-symmetric (one can't "undo"
a state reduction).

At this point, I might be tempted to wander off into the realms of
speculation about quantum gravity and consciousness, but Roger Penrose
has already done that; even if I don't agree with many of his
conclusions, I recommend reading _The Emperor's New Mind_ for the
physics and the philosophy.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Erann Gat
Subject: Re: Modern GC's and destructive operators
Date: 
Message-ID: <gat-0706030900350001@192.168.1.51>
In article <··············@lambda.jcn.srcf.net>, Christophe Rhodes
<·····@cam.ac.uk> wrote:

> So quantum dynamics are time-symmetric, except that
> there must be something more to it than that (though what, no-one's
> terribly sure),

Adami and Cerf have IMO a pretty convincing theory.  See
http://www.flownet.com/QM.PDF (section 5 in particular) for a layman's
summary.

Very quick summary: "measurement" and "entanglement" are essentially the
same thing, except that measurement involves a macroscopic number of
particles.  It _is_ possible to reverse an entanglement, but only by
bringing the entangled particles together.  Therefore, it is possible to
reverse a measurement, but only by bringing _all_ the entangled particles
back together again with enough precision to undo all the many
entanglements in the system, which is practically impossible, though not
impossible in theory.

E.