From: Daniel Barlow
Subject: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <87k7wn4nkx.fsf_-_@noetbook.telent.net>
The standard says "progn evaluates forms, in the order in which they
are given".  It's not unheard-of for compilers in other languages - or
CPUs, for that matter - to re-order instructions to avoid pipeline
stalls and so forth.  Does the definition of progn forbid this for CL?

The obvious answer is "not unless it would make a difference to the
result".  The matter I'm pondering is that in a single-threaded Lisp
you could therefore perform a lot of reordering, but in a
multiprocessing lisp where one thread is updating shared data, another
thread could see the effects of this reordering - so is this still a
legitimate optimization?  Which is true?

(1) single-threaded lisps can potentially go faster by exploiting 
    parallelism opportunities to fill pipelines more effectively

or

(2) as soon as we use multiple processes we only have a weak C-like
    guarantee of ordering as seen from another process in the same 
    Lisp image  

(3) some other possibility I missed

I was unable to find any mention of threading in CLHS, but I'm sure
there were implementations available at the time that provided it and
purported to conform, so I expect that it might have come up at some
point.

I'm expecting (2) is actually going to be the case for most
implementations, which leads me to my next question: what do different
MP implementations provide in the way of ordering guarantees for
applications that need them?


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 

From: Tim Bradshaw
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <fbc0f5d1.0111190244.570d5f82@posting.google.com>
Daniel Barlow <···@telent.net> wrote in message news:<·················@noetbook.telent.net>...
> The obvious answer is "not unless it would make a difference to the
> result".  The matter I'm pondering is that in a single-threaded Lisp
> you could therefore perform a lot of reordering, but in a
> multiprocessing lisp where one thread is updating shared data, another
> thread could see the effects of this reordering - so is this still a
> legitimate optimization?  Which is true?

I would have thought that in a multithreaded system where one thread
is updating shared data it's more-or-less mandatory to do some kind of
synchronization (locks / semaphores / whatever) and that the
synchronization mechanism would control the ordering.

--tim
From: Daniel Pittman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <87snbbrmmc.fsf@inanna.rimspace.net>
On 19 Nov 2001, Tim Bradshaw wrote:
> Daniel Barlow <···@telent.net> wrote in message
> news:<·················@noetbook.telent.net>...
>> The obvious answer is "not unless it would make a difference to the
>> result". The matter I'm pondering is that in a single-threaded Lisp
>> you could therefore perform a lot of reordering, but in a
>> multiprocessing lisp where one thread is updating shared data,
>> another thread could see the effects of this reordering - so is this
>> still a legitimate optimization? Which is true?
> 
> I would have thought that in a multithreaded system where one thread
> is updating shared data it's more-or-less mandatory to do some kind of
> synchronization (locks / semaphores / whatever) and that the
> synchronization mechanism would control the ordering.

If you want anything like a reliable piece of software[1], you do
that. Reordering within an "operation" is less well defined, though.

In C, the concept of a "sequence point" exists. ";" at the end of a
statement is one, so is "," in the statement stream, but *NOT* in an
argument list or static initializer.

When the compiler hits a sequence point, it *MUST* have completed all
operations between that sequence point and the previous one.

Between them, however, the order of operations is undefined. This is
why the order of evaluation of function arguments is not defined in the
standard. At least, it's the technical reason why...

So, for the C code:

int x = 1 * 2 + 3;

The implementor can process the arithmetic "1 * 2 + 3" and the
assignment to "x" in any order they feel like, and they can do anything
else they want in that period, no limits.

Once they hit that semi-colon, though, the entire statement *must* be
complete. You can't legally delay the math, for example, until "x" is
used -- you must assign to it at the semi-colon.


This, incidentally, maps relatively well to the read and write barriers
in weakly-ordered CPU implementations. :)


Under Lisp, I didn't find any assurance in the standard that anything
was the moral equivalent of a sequence point. That is, I believe it
would be legal for a vendor to delay operations as long as they
wanted...

...if they could prove that this didn't change the state of the image
for the next operation in the order of evaluation.

This is actually a *stronger* assurance than C, because there is no
operation that lacks a well defined order of execution.

It's also a nicer one, because it lets the compiler be as smart as it
wants and as lazy as it wants.


How, exactly, this makes life miserable for a threads programmer, of
course, I don't know. I don't have enough experience there -- but high
level locking primitives should just do the right thing, because the
Lisp standard *enforces* the order of operations.

        Daniel

Footnotes: 
[1]  This doesn't seem to be high on the priority lists of many people
     who write threaded code out there, from my cynical perspective. :)

-- 
The past is a foreign country: they do things differently there.
        -- L P Hartley, _The Go-Between_
From: Kent M Pitman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <sfw7ksm3b7k.fsf@shell01.TheWorld.com>
Daniel Pittman <······@rimspace.net> writes:

> In C, the concept of a "sequence point" exists. ";" at the end of a
> statement is one, so is "," in the statement stream, but *NOT* in an
> argument list or static initializer.
> 
> When the compiler hits a sequence point, it *MUST* have completed all
> operations between that sequence point and the previous one.
> 
> Between them, however, the order of operations is undefined. This is
> why the order of evaluation of function arguments is not defined in the
> standard. At least, it's the technical reason why...

Interesting.

In CL, we found that there were too many opportunities for non-portability
if we let implementations re-order things in a way that was not user-visible.
Yes, this does tie the hands of implementors sometimes.  But it also enables
users to write certain effects more concisely because the order is defined.
 (foo (read) (read))
for example does not require any special notation to make sure the two read
calls occur in the right order. 

The belief is, and I stand behind it 100%, that the function of optimizations
should be to enhance semantics, not to drive semantics.  Put another way, an 
optimization is not an optimization if it destroys meaning.  And at the end
of the day, changing the instruction pipeline seems to get me little constant 
factors of time, but at the cost of code readability, predictability, 
portability, and other things I value more.  I'd rather just wait 10 minutes
and get a new release of some faster chip if I was that close to a necessary
speed and needed more.

I basically am not a believer in the whole ratrace created by making every
program compete for speed with other programs on the same processor.  It's
like a never-ending arms race spiral.  I know it's essential in some cases,
but largely I think it's the siren's call because it just encourages one to
waste large amounts of one's valuable life doing stupid little optimizations
that don't improve quality of life.  For example, long ago, lisp paid a price
for making just about everything a pointer indirection.  And implementors have
done a wonderful job of narrowing the perceived effect of that choice so
that in practice it doesn't even cost us a factor of 2 in most cases.  But
I'd still trade that factor of 2 in any remaining cases for the flexibility.
And if I want more speed, I'll get a faster processor.  Others might compare
it to C on the same processor and say "yeah, but the competition will be
faster on that new processor".  And maybe they'll be right.  But it is also
a crutch that keeps the competition (C) from ever being flexible because,
by induction, they will never have a processor fast enough that they are EVER
willing to lose some speed in exchange for flexibility.

It reminds me of money.  I know people with lots more money than me (which
is not hard to do) who won't spend their money because that's how they got
rich and they don't want to lose it.  But if you have money and you can't use
it for anything, how rich are you really?  If your goal in life is to retire
and pass money on to your kids, after having passed along to them your money
ethic that says they must never use the family money because they have to pass
it on to THEIR kids, how have you enriched anyone's life?  As Aristotle said
in his discussion of what I think is called virtue ethics, virtues are the
means between ridiculous extremes.  

The whole purpose of faster processors, to me, is not to take the same old
lumbering code that didn't quite run fast enough on the earlier processor and
declare it to not run quite fast enough on a new one.  It's to give you enough
of a speed boost that you can afford to do new and fun things with the next
processor.  And to me, indulging a failure to use the cache fully is a good
thing, if it buys me debuggability, portability, and other things that 
materially affect my productivity, the quality (not just speed but, for 
example, correctness) of the things I make, etc.

Life is short and it was not meant to be spent making people feel guilty
about instruction pipelines being only partly full or caches being missed.
Those things are there to serve me, not vice versa.

Just my opinion, of course.  And it doesn't imply Lisp can't win on these
points.  It's just not gonna be me that is either asking for it or responding
to others asking for it.

> Under Lisp, I didn't find any assurance in the standard that anything
> was the moral equivalent of a sequence point. That is, I believe it
> would be legal for a vendor to delay operations as long as they
> wanted...

The sequence points are implicit, based on what the compiler can infer has
no observable side-effect.  In essence, every evaluation or sub-evaluation
that might potentially have a side-effect is bounded by a sequence point.

Implementations are permitted to have declarations that identify functions
as non-side-effecting in order to improve this, and users are permitted
to program in a side-effect-free way in order to further improve this.

> This is actually a *stronger* assurance than C, because there is no
> operation that lacks a well defined order of execution.
>
> It's also a nicer one, because it lets the compiler be as smart as it
> wants and as lazy as it wants.

Yes.

> How, exactly, this makes life miserable for a threads programmer, of
> course, I don't know. I don't have enough experience there -- but high
> level locking primitives should just do the right thing, because the
> Lisp standard *enforces* the order of operations.

Among programmers who like to synchronize their code, I expect the
difference is minimal.  There are issues in HOW to express
synchronization, but once expressed I don't think there's any worse of
an issue in multitasking.  We in the Lisp design community talked
about the issue of people wanting mapping operations to just
"naturally" work by farming out the mapping to different processors
with no synchronization markers and we concluded this was really an
utterly different language.  (See the connection machine, for example,
and its associated lisp.)  The Scheme community has defined arg
evaluation to work in any order because it didn't want to thwart
compilers, but that's yet another reason I always post here and not on
comp.lang.scheme.  I have used Scheme a lot, and I don't begrudge them
their personal choice, but I just have a lot of problems with letting
known optimization choices lead my design.  Not the least of the
problems this causes is that it fails to take into account the next
set of optimizations that are going to be discovered right after you
release the language, so the language will have to always be back in
the shop being repaired.  I'd rather pick a surface goal, whether
achievable or not, and let my language drive the optimization
community for a while.
From: Daniel Barlow
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <8766864mzy.fsf@noetbook.telent.net>
Daniel Pittman <······@rimspace.net> writes:

> How, exactly, this makes life miserable for a threads programmer, of
> course, I don't know. I don't have enough experience there -- but high
> level locking primitives should just do the right thing, because the
> Lisp standard *enforces* the order of operations.

Yes, I have heard of locks ;-)

Consider this example:

Thread A is traversing (but not modifying) a list, which we'll call *my-list*
Thread B is inserting a new element (we'll call it `new') into it.  
`pred' is the element in the list which we're inserting after

If Thread B does - 
 
  (setf (cdr new) (cdr pred))
  (setf (cdr pred) new)

then *my-list* is always consistent, and writes don't need to be
locked against readers - only other writers.

If the reverse sequence is attempted (so that `new' is put in place
first and then the rest of the tail is joined onto it second) then a
simultaneous reader may only get halfway down the list and miss the
second half.

If we can guarantee ordering, readers of this list do not need to lock
it (writers obviously do, but that's another matter).  Otherwise they
do, and everything gets slower.  If you're about to tell me that I
should just use a lock and expect the Lisp implementation to work this
out for itself, then I will be both gladdened and amazed, because it
really does sound like SSC territory.


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 
From: Daniel Pittman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <87bshyns0c.fsf@inanna.rimspace.net>
On 19 Nov 2001, Daniel Barlow wrote:
> Daniel Pittman <······@rimspace.net> writes:

You dropped my qualification about this being threads under Lisp.
*sulk* :)

>> How, exactly, this makes life miserable for a threads programmer, of
>> course, I don't know. I don't have enough experience there -- but
>> high level locking primitives should just do the right thing, because
>> the Lisp standard *enforces* the order of operations.

[...]

> Thread A is traversing (but not modifying) a list, which we'll call
> *my-list* Thread B is inserting a new element (we'll call it `new')
> into it. `pred' is the element in the list which we're inserting after
> 
> If Thread B does - 
> 
>   (setf (cdr new) (cdr pred))
>   (setf (cdr pred) new)
> 
> then *my-list* is always consistent, and writes don't need to be
> locked against readers - only other writers.
> 
> If the reverse sequence is attempted (so that `new' is put in place
> first and then the rest of the tail is joined onto it second) then a
> simultaneous reader may only get halfway down the list and miss the
> second half.
> 
> If we can guarantee ordering, readers of this list do not need to lock
> it (writers obviously do, but that's another matter).  

...assuming that the Lisp makes the (setf ...) form atomic, so you can't
read a partial value in the other (OS) thread. :)

> Otherwise they do, and everything gets slower. If you're about to tell
> me that I should just use a lock and expect the Lisp implementation to
> work this out for itself, then I will be both gladdened and amazed,
> because it really does sound like SSC territory.

Depends on your implementation, but it /should/ be safe on any
implementation out there that says (setf ...) and friends are atomic
operations across OS threads, or that implements Lisp level threads.

I think, but I am not a /lisp/ threading expert. KMP says as much in his
comments above, though.

        Daniel

-- 
Then he started in to dealing with slaves / And something inside of him died
     She had to sell everything she owned / And froze up inside.
        -- Bob Dylan, _Tangled Up In Blue_
From: Kent M Pitman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <sfwk7wmgpli.fsf@shell01.TheWorld.com>
Daniel Pittman <······@rimspace.net> writes:

> Depends on your implementation, but it /should/ be safe on any
> implementation out there that says (setf ...) and friends are atomic
> operations across OS threads, or that implements Lisp level threads.
> 
> I think, but I am not a /lisp/ threading expert. KMP says as much in his
> comments above, though.

I don't think I meant to say setf and friends were atomic.  Just that their
order of argument evaluation was not something an implementation could 
rearrange.  

I'm not a threading expert myself, though I've certainly programmed a lot
of multi-threaded lisp stuff.  My knowledge is mostly seat-of-the-pants 
based on long experience with existing implementations, and is not
theoretical.
From: Kaz Kylheku
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <zSjK7.51409$Ud.2523094@news1.rdc1.bc.home.com>
In article <···············@shell01.TheWorld.com>, Kent M Pitman wrote:
>Daniel Pittman <······@rimspace.net> writes:
>
>> Depends on your implementation, but it /should/ be safe on any
>> implementation out there that says (setf ...) and friends are atomic
>> operations across OS threads, or that implements Lisp level threads.
>> 
>> I think, but I am not a /lisp/ threading expert. KMP says as much in his
>> comments above, though.
>
>I don't think I meant to say setf and friends were atomic.  Just that their
>order of argument evaluation was not something an implementation could 
>rearrange.  

Can the order be interleaved or parallel?
From: Thomas F. Burdick
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <xcv8zd1zt5g.fsf@conquest.OCF.Berkeley.EDU>
···@ashi.footprints.net (Kaz Kylheku) writes:

> In article <···············@shell01.TheWorld.com>, Kent M Pitman wrote:
> >Daniel Pittman <······@rimspace.net> writes:
> >
> >> Depends on your implementation, but it /should/ be safe on any
> >> implementation out there that says (setf ...) and friends are atomic
> >> operations across OS threads, or that implements Lisp level threads.
> >> 
> >> I think, but I am not a /lisp/ threading expert. KMP says as much in his
> >> comments above, though.
> >
> >I don't think I meant to say setf and friends were atomic.  Just that their
> >order of argument evaluation was not something an implementation could 
> >rearrange.  
> 
> Can the order be interleaved or parallel?

If you mean what I think you mean, no!  The assignments in PSETF are
done in parallel, but (I think) the evaluation of the value-generating
forms still needs to go left-to-right.  As in:

  * (let ((a ())
          b c d)
      (psetf b (push :b a)
             c (push :c a)
             d (push :d a))
      (values a b c d))
  (:D :C :B)
  (:B)
  (:C :B)
  (:D :C :B)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Daniel Pittman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <87d72em8x7.fsf@inanna.rimspace.net>
On Tue, 20 Nov 2001, Kent M. Pitman wrote:
> Daniel Pittman <······@rimspace.net> writes:
> 
>> Depends on your implementation, but it /should/ be safe on any
>> implementation out there that says (setf ...) and friends are atomic
>> operations across OS threads, or that implements Lisp level threads.
>> 
>> I think, but I am not a /lisp/ threading expert. KMP says as much in
>> his comments above, though.
> 
> I don't think I meant to say setf and friends were atomic. Just that
> their order of argument evaluation was not something an implementation
> could rearrange.

My reading of your statement, which I admit could be flawed, was that
you felt that the sort of structure he described was safe, in the way he
described.

Basically, you said that the order of evaluation was fixed to be
left-to-right in an externally visible way. As such, an implementation
would be correct if it did what was stated in the post, and incorrect if
it reordered the statements.

I am sorry if you feel I misrepresented you. I certainly had no such
intention.

> I'm not a threading expert myself, though I've certainly programmed a
> lot of multi-threaded lisp stuff. My knowledge is mostly
> seat-of-the-pants based on long experience with existing
> implementations, and is not theoretical.

Actually, the last point makes me regard your points as valid. I have
had "fun" recently, in an Erik kind of way, dealing with someone here at
work who has plenty of theoretical threading knowledge, but no actual
practical knowledge.

Java is warping the minds of young people today, rant, rant, rant. :)

        Daniel

-- 
Glass is, we now know, a 'slow liquid;' and we're slow dust.
        -- Albert Goldbarth
From: Hartmann Schaffer
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <3bf9a57c@news.sentex.net>
In article <··············@inanna.rimspace.net>,
	Daniel Pittman <······@rimspace.net> writes:
> ...
> In C, the concept of a "sequence point" exists. ";" at the end of a
> statement is one, so is "," in the statement stream, but *NOT* in an
> argument list or static initializer.
> 
> When the compiler hits a sequence point, it *MUST* have completed all
> operations between that sequence point and the previous one.

is that true?  i always assumed that the only purpose of the sequence
points is to define from which point on you can rely on all results of
side effects in expressions being available for use in other
expressions.  i still assume that the compiler can move code past a
sequence point when it can prove that it doesn't change the result of
the program execution.  The loop unrolling optimization mostly makes
sense only if you are wrong.

> Between them, however, the order of operations is undefined. This is
> why the order of evaluation of function arguments is not defined in the
> standard. At least, it's the technical reason why...
> 
> So, for the C code:
> 
> int x = 1 * 2 + 3;
> 
> The implementor can process the operation math "1 * 2 + 3" and the
> assignment to "x" in any order they feel like, and they can do anything
> else that they want in that period, no limits.

this is clearly wrong:  the compiler is free to arrange the execution
order of the right hand side in any order it likes, also whatever it
has to generate for the left hand side, it also can mix up the code
sequences for both sides, but the assignment *must come last*,
otherwise x won't be 5.

> Once they hit that semi-colon, though, the entire statement *must* be
> complete. You can't legally delay the math, for example, until "x" is
> used -- you must assign to it at the semi-colon.

which one?  at the semicolon or when x is used.  just assume the
following situation:  the code for x= 1*2+3 is causing many scheduling
delays and is followed by a lengthy code sequence using x with many
more scheduling delays and sequence points, but no reference to x and
no side or aliassing effect threatening to modify x.  spreading the
code for the statement through the following code would make sense and
afaict is legal

hs

-- 

Apart from the obvious disagreement about who the good guys are, what
is the difference between "You are either with us or against us" and
"There are only good muslim and infidels"?
From: Daniel Pittman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <876686nrf9.fsf@inanna.rimspace.net>
On 19 Nov 2001, Hartmann Schaffer wrote:
> In article <··············@inanna.rimspace.net>,
> 	Daniel Pittman <······@rimspace.net> writes:
>> ...
>> In C, the concept of a "sequence point" exists. ";" at the end of a
>> statement is one, so is "," in the statement stream, but *NOT* in an
>> argument list or static initializer.
>> 
>> When the compiler hits a sequence point, it *MUST* have completed all
>> operations between that sequence point and the previous one.
> 
> is that true?  

No. It's the standard, though, which is not the same thing. :)

> i always assumed that the only purpose of the sequence points is to
> define from which point on you can rely on all results of side effects
> in expressions being available for use in other expressions. 

That is the point of view of the current generation of C compiler
writers, which is fair and reasonable. It's the same decision as x86
architecture CPU makers made:

        If the end user can't actually /tell/ that we broke
        the ordering rules, did we actually break them?

> i still assume that the compiler can move code past a sequence point
> when it can prove that it doesn't change the result of the program
> execution. The loop unrolling optimization mostly makes sense only
> if you are wrong.

It's living in a gray area according to the standard, but you probably
shouldn't count on the standard too much.

>> Between them, however, the order of operations is undefined. This is
>> why the order of evaluation of function arguments is not defined in
>> the standard. At least, it's the technical reason why...
>> 
>> So, for the C code:
>> 
>> int x = 1 * 2 + 3;
>> 
>> The implementor can process the operation math "1 * 2 + 3" and the
>> assignment to "x" in any order they feel like, and they can do
>> anything else that they want in that period, no limits.
> 
> this is clearly wrong:  the compiler is free to arrange the execution
> order of the right hand side in any order it likes, also whatever it
> has to generate for the left hand side, it also can mix up the code
> sequences for both sides, but the assignment *must come last*,
> otherwise x won't be 5.

Maybe my example wasn't brilliant. :)  Seriously, though, if you have a
CPU architecture where you can do the store before you finish the math,
it's legal to do it then.

IIRC, some of the clockless ARM designs were capable of doing that with
the integer pipeline and a spare memory store unit, but don't quote
me on that.

Anyway, so long as any caller can see the value 5 when they say "content
of x", you are legal enough for most authors. :)

>> Once they hit that semi-colon, though, the entire statement *must* be
>> complete. You can't legally delay the math, for example, until "x" is
>> used -- you must assign to it at the semi-colon.
> 
> which one?  at the semicolon or when z is used.  

By the standard, you must have assigned it at the semi-colon, at least
for the rest of the code. OTOH, C doesn't specify resistance to signals,
let alone threads, so it's a vague statement anyway.

> just assume the following situation: the code for x= 1*2+3 is causing
> many scheduling delays and is followed by a lengthy code sequence
> using x with many more scheduling delays and sequence points, but no
> reference to x and no side or aliassing effect threatening to modify
> x. spreading the code for the statement through the following code
> would make sense and afaict is legal

Makes sense, improves performance, is done ... but is less clearly legal
according to the standard.


Maybe I should point out that this is my reading of a standard and, as
such, is my interpretation. This is a corner case that they didn't spec
well, so I may be reading it wrong.

        Daniel

-- 
If a man is pictured chopping off a woman's breast, it only gets a R rating,
but if, God forbid, a man is pictured kissing a woman's breast, it gets an X
rating. Why is violence more acceptable than tenderness?
        -- Sally Struthers
From: Kaz Kylheku
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <A9kK7.51470$Ud.2525161@news1.rdc1.bc.home.com>
In article <··············@inanna.rimspace.net>, Daniel Pittman wrote:
>On 19 Nov 2001, Tim Bradshaw wrote:
>> Daniel Barlow <···@telent.net> wrote in message
>> news:<·················@noetbook.telent.net>...
>>> The obvious answer is "not unless it would make a difference to the
>>> result". The matter I'm pondering is that in a single-threaded Lisp
>>> you could therefore perform a lot of reordering, but in a
>>> multiprocessing lisp where one thread is updating shared data,
>>> another thread could see the effects of this reordering - so is this
>>> still a legitimate optimization? Which is true?
>> 
>> I would have thought that in a multithreaded system where one thread
>> is updating shared data it's more-or-less mandatory to do some kind of
>> synchronization (locks / semaphores / whatever) and that the
>> synchronization mechanism would control the ordering.
>
>If you wanted anything like a reliable piece of software[1], you do
>that. Reordering within an "operation" is less well defined, though.
>
>In C, the concept of a "sequence point" exists. ";" at the end of a
>statement is one, so is "," in the statement stream, but *NOT* in an
>argument list or static initializer.

It would be meaningless in a static initializer because such
initialization happens before program startup, at least conceptually. No
side effects can take place and in fact only constant expressions can
be used in such initializers.

C++ churns the waters because it allows more types of expressions;
and so the program can tell, or rely on, the initialization happening
as late as possible. C++ compilers generate code which uses hidden
variables associated with statement blocks to tell whether the
static variables need to be initialized. I don't know any compiler
which does this in a thread safe way.

>When the compiler hits a sequence point, it *MUST* have completed all
>operations between that sequence point and the previous one.

Not really; the rules are very loose. Only the visible results have to
agree. For instance, modifying an object is supposedly a side effect.
Does that mean that a modification must be written to the object at the
next sequence point? Heck no! Not if it can be proven that nobody can
tell! As a result, for instance, local variables can often be entirely
cached in registers for their entire lifetimes, sequence points be damned.

>Between them, however, the order of operations is undefined. This is
>why the order of evaluation of function arguments is not defined in the
>standard. At least, it's the technical reason why...
>
>So, for the C code:
>
>int x = 1 * 2 + 3;
>
>The implementor can process the operation math "1 * 2 + 3" and the
>assignment to "x" in any order they feel like, and they can do anything

Careful: this is an initialization, not an assignment! :)

>else that they want in that period, no limits.

In the abstract semantics, the assignment to the object cannot take
place until the new value is computed.  Moreover, the addition
cannot be done first, because then the result would be wrong.

In the actual semantics, you will probably have some machine instruction
which loads 5 into a register or something like that.

>Once they hit that semi-colon, though, the entire statement *must* be
>complete. You can't legally delay the math, for example, until "x" is
>used -- you must assign to it at the semi-colon.

Or, more accurately, a sequence point occurs at the end of every
full expression, such as the controlling expression of an if,
while or switch, and in certain other contexts, like operators
that have sequencing properties.

>This, incidentally, maps relatively well to the read and write barriers
>in weakly-ordered CPU implementations. :)

You'd be surprised how poorly things at this level map to such hardware
considerations.  Memory barriers are optimization barriers. Sequence
points are not optimization barriers; the only form of optimization
barrier in the C language is the volatile keyword, and that has weakly
defined semantics.

>Under Lisp, I didn't find any assurance in the standard that anything
>was the moral equivalent of a sequence point.

I suspect that such a thing is not necessary, because in Lisp you
don't have ambiguities.

The whole purpose of sequence points in C is to punish programmers
who write something like

	a[i++] = i;

This is wrong because something is done between two consecutive sequence
points that ought not to be done. The sequence points are needed so that
we can say that the timing of certain effects is not determined within
the expression, and so that we can slap the programmer's wrist for not
understanding it.

In Lisp we don't have the moral equivalent of a[i++] = i; or do we?

>That is, I believe it
>would be legal for a vendor to delay operations as long as they
>wanted...
>
>...if they could prove that this didn't change the state of the image
>for the next operation in the order of evaluation.
>
>This is actually a *stronger* assurance than C, because there is no
>operation that lacks a well defined order of execution.
>
>It's also a nicer one, because it lets the compiler be as smart as it
>wants and as lazy as it wants.

Some people, entirely wrongly in my opinion, believe that C's
permissiveness helps to produce faster code. That may have been
true of braindamaged compilers 20+ years ago.

>How, exactly, this makes life miserable for a threads programmer, of
>course, I don't know. 

It doesn't; these problems of sequencing make the life of the non-threads
programmer miserable.  In C implementations that support threading, like
POSIX environments, you must simply use the synchronization functions to
assure that side effects are properly propagated from the thread which
``publishes'' them to the one which ``consumes'' them.  Propagated at
every level, from the abstract language level, to the optimization level
of the actual semantics down to the memory hardware level with its own
caching and reordering.
From: Roger Corman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <3bf8bebd.1557611386@news.callatg.com>
On 19 Nov 2001 00:21:50 +0000, Daniel Barlow <···@telent.net> wrote:

>
>The standard says "progn evaluates forms, in the order in which they
>are given".  It's not unheard-of for compilers in other languages - or
>CPUs, for that matter - to re-order instructions to avoid pipeline
>stalls and so forth.  Does the definition of progn forbid this for CL?
Actually I think the standard specifies order of evaluation in all cases,
not just progn: function argument evaluation, for example, and the order
of evaluation for all special operators. I think you know that, but just
to clarify.

>
>The obvious answer is "not unless it would make a difference to the
>result".  
Right, if by "result" you mean that the difference would be perceptible by a
user or another part of the program. I think this excludes debuggers and
debugging tools, so that compiler writers can at least have a bit of freedom.

>The matter I'm pondering is that in a single-threaded Lisp
>you could therefore perform a lot of reordering, but in a
>multiprocessing lisp where one thread is updating shared data, another
>thread could see the effects of this reordering - so is this still a
>legitimate optimization?  Which is true?
The moment you speak of multiprocessing, you are out of the domain of the
standard, and potentially anything goes. An implementation can pretty much
define how it behaves with multiple threads, IMO. This is an obvious area that
could use standardization. I think that in a multi-threaded system a compiler
should not perform re-ordering of operations when it could trip up another
thread. However, the details and implications of that are pretty complex.
I think it would make sense to use the safety setting, or add some non-standard
declarations to specifically allow reordering in some cases.

Many Lisp atomic operations actually require a series of instructions, and there
is no reason those can't be reordered. Of course, if you have data structures
shared between threads you probably need to synchronize on them somehow anyway.
>
>(1) single-threaded lisps can potentially go faster by exploiting 
>    parallelism opportunities to fill pipelines more effectively
>
>or
>
>(2) as soon as we use multiple processes we only have a weak C-like
>    guarantee of ordering as seen from another process in the same 
>    Lisp image  
Well, an implementation could do both, and let the user decide via declarations
which was preferable. 

>
>(3) some other possibility I missed
>
>I was unable to find any mention of threading in CLHS, but I'm sure
>there were implementations available at the time that provided it and
>purported to conform, so I expect that it might have come up at some
>point.
An obvious Kent question. If he is still around...
>
>I'm expecting (2) is actually going to be the case for most
>implementations, which leads me to my next question: what do different
>MP implementations provide in the way of ordering guarantees for
>applications that need them?
Corman Lisp does not re-order operations, although it will eliminate them if
possible. Actually, I should say if it does re-order them I consider it a defect
to be fixed. Not that I don't see a possible value in that kind of optimization,
but there are many higher priority things to do that don't lead into the type of
predicament you bring up.

Roger
From: Hartmann Schaffer
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <3bf9a125@news.sentex.net>
In article <···················@news.callatg.com>,
	·····@corman.net (Roger Corman) writes:
> ...
>>The matter I'm pondering is that in a single-threaded Lisp
>>you could therefore perform a lot of reordering, but in a
>>multiprocessing lisp where one thread is updating shared data, another
>>thread could see the effects of this reordering - so is this still a
>>legitimate optimization?  Which is true?
> The moment you speak of multiprocessing, you are out of the domain of the
> standard, and potentially anything goes. An implementation can pretty much
> define how it behaves with multiple threads, IMO. This is an obvious area
> that
> could use standardization. I think that in a multi-threaded system a compiler
> should not perform re-ordering of operations when it could trip up another
> thread. However, the details and implications of that are pretty complex.

multithreading usually doesn't specify at which relative speed the
parallel threads proceed.  if a thread relies on data being supplied by
another thread, some synchronisation is required.  this would imply
that the reordering has to make sure that at synchronisation points
the state would be identical to unoptimized code.  otherwise a compiler
should be free to reorder the same way as in a single threaded program.

hs

-- 

Apart from the obvious disagreement about who the good guys are, what
is the difference between "You are either with us or against us" and
"There are only good muslim and infidels"?
From: Kent M Pitman
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <sfwadxi3cge.fsf@shell01.TheWorld.com>
Daniel Barlow <···@telent.net> writes:

> The standard says "progn evaluates forms, in the order in which they
> are given".  It's not unheard-of for compilers in other languages - or
> CPUs, for that matter - to re-order instructions to avoid pipeline
> stalls and so forth.  Does the definition of progn forbid this for CL?

> The obvious answer is "not unless it would make a difference to the
> result". 

Exactly.  You can do any optimization you want as long as there's no
conforming operation you can do that violates it.

Lisp in general guarantees left-to-right order of nearly everything.
There's a huge amount of mechanism in SETF support, for example, to
assure users don't botch this as they extend it.  (Even contagion in
the math stuff obeys this, not that that's a multi-threading issue.)

> The matter I'm pondering is that in a single-threaded Lisp
> you could therefore perform a lot of reordering, but in a
> multiprocessing lisp where one thread is updating shared data, another
> thread could see the effects of this reordering - so is this still a
> legitimate optimization?

Since the standard does not speak to multithreading, I think you have
to assume each thread is a separate program.  Conformance claims can
only be made in that context.  The whole of the program cannot be said
to be conforming since it relies on features of an implementation not
part of the standard, so the matter is moot from a conformance point 
of view.  From a practical point of view, I think vendors try to do
reasonable things and try to seek an audience of users based on their
ability to defend their choice of what is reasonable.

>
> Which is true?
> 
> (1) single-threaded lisps can potentially go faster by exploiting 
>     parallelism opportunities to fill pipelines more effectively
>
> or
> 
> (2) as soon as we use multiple processes we only have a weak C-like
>     guarantee of ordering as seen from another process in the same 
>     Lisp image  
> 
> (3) some other possibility I missed

I prefer 3, I think, but only in the sense that it subsumes (2),
assuming I understand 2, which I may or may not since C multithreading
is not something I've ever done.  I think introducing multiple
processes responsibly means introducing synchronization mechanisms
that will allow each user a database-like "consistent view" of a local
region of code and data so that it isn't changing underneath him.  I
don't see how one can reasonably program absent such synchronization.
I think the compiler should compile as if for single-threading and 
not worry about what the view is from other processes (so I don't understand
the use of "as seen from another process" in (2)).  The only issue in my
mind is "as seen from the same process" and then "how to lock out other
processes".  Other processes that use unrelated data are not an issue.

There is a never-much-talked-about issue about whether symbol plists,
which are sort-of-private, sort-of-not, are to be properly
interlocked.  On the LispM, where multithreading was commonplace, they
were not interlocked, and it was recommended not to call LOAD from two
processes at once, primarily because some global effects like symbol
value/function/plist updating could get subtle race conditions in rare
cases.  This was so rare most people didn't think twice about it, but
the support folks did. :-) And people use plists much less these days
than they once did, since there is always that subtle risk someone
will setf SYMBOL-PLIST uncarefully.  So no one really knows if GET and
its SETF should be synchronized. It would probably slow it massively
if it were... 

> I was unable to find any mention of threading in CLHS, but I'm sure
> there were implementations available at the time that provided it and
> purported to conform, so I expect that it might have come up at some
> point.
> 
> I'm expecting (2) is actually going to be the case for most
> implementations, which leads me to my next question: what do different
> MP implementations provide in the way of ordering guarantees for
> applications that need them?

I would assume no different than single-threaded lisps, per my remarks
above.  I would ask instead, "how do they synchronize".
without-preemption, without-aborts, without-interrupts, etc. are the
kinds of names you see bandied about, and they do vary subtly between
implementations.  there are also usually variants on a with-lock
operation.  I often wish for a test-and-set and/or atomic-update
operation, and sometimes imperfectly mimic this with
without-interrupts, even though it makes some people cringe.  For
anything longer than just a test-and-set thing, I prefer
without-preemption in order to bound the risk of locking up the
machine.  Whether I am a model of good style in this arena, I can't
really say.  I do know I think the usual tools for locking are a bit too
heavyweight to use as often as I need them, and the usual options for
managing interrupts/aborts/etc. a bit too coarse.  I think there is
room for improvement in the set of linguistic options available to
users.  
From: Tim Bradshaw
Subject: Re: Instruction re-ordering and MP lisp implementations
Date: 
Message-ID: <fbc0f5d1.0111200243.4e783d4c@posting.google.com>
Kent M Pitman <······@world.std.com> wrote in message news:<···············@shell01.TheWorld.com>...
> I prefer 3, I think, but only in the sense that it subsumes (2),
> assuming I understand 2, which I may or may not since C multithreading
> is not something I've ever done.  I think introducing multiple
> processes responsibly means introducing synchronization mechanisms
> that will allow each user a database-like "consistent view" of a local
> region of code and data so that it isn't changing underneath him.  I
> don't see how one can reasonably program absent such synchronization.
> I think the compiler should compile as if for single-threading and 
> not worry about what the view is from other processes (so I don't understand
> the use of "as seen from another process" in (2)).  The only issue in my
> mind is "as seen from the same process" and then "how to lock out other
> processes".  Other processes that use unrelated data are not an issue.

I agree strongly with this.  In particular I think there's a
significant difference between evaluation-order per thread and
evaluation-order as seen between threads.  CL has this wonderful
guarantee of ordering within a thread but I don't think it's safe to
assume that that implies ordering between threads.  For instance, on a
machine with more than one processor, it seems to me that to ensure
ordering *as seen by another thread* you'd need (apart from ensuring
atomicity of everything that you cared about) to ensure that any state
the compiler was keeping in registers for a given thread was pushed
out to memory every time anything that might affect another thread on
another CPU changed, and further you'd need to either assume a
cache-coherent system or manually flush the cache all the time.  I
think that this is an unreasonable burden on the system.

--tim