differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING

From: Steven M. Haflich
Subject: differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING
Date: Fri, 24 Jan 2003 05:42:28 +0000
Message-ID: <3E30D244.8010307@no_spiced_ham_franz.com>

I've noticed an obscure but unfortunate inconsistency between CLtL and
the ANS (actually a spinoff from Paul Dietz's test suite development).
So far as I can recollect, it was _not_ something X3J13 did
intentionally.  This is mostly a request for corroboration by any
other old X3J13 members, and especially Kent.

The question concerns the :INITIAL-ELEMENT argument to MAKE-SEQUENCE
and MAKE-STRING and its effect on the initial contents of vectors
created by these functions.

CLtL says of MAKE-SEQUENCE:

  If the :INITIAL-ELEMENT argument is not specified, then the sequence
  will be initialized in an implementation-dependent way.

and of MAKE-STRING:

  If an :initial-element argument is not specified, then the string
  will be initialized in an implementation-dependent way.

Presumably, this allows almost any harmless behavior, including
initialization to "what happened already to be in memory."

But this is what the ANS says about the INITIAL-ELEMENT argument to
MAKE-SEQUENCE:

  initial-element - an object. The default is implementation-dependent.

and similarly of MAKE-STRING:

  initial-element - a character. The default is implementation-dependent.

Now, the CLtL and ANS definitions differ in a detectable way.
According to CLtL, after a call to either of these functions which
omits the :INITIAL-ELEMENT argument, user code can make no assumptions
about the elements of the sequence or string.  With a literal
language-lawyer interpretation of the ANS, one still can make no
assumption about the specific value to which each element of the
sequence is initialized, but one _can_ depend on every element of the
sequence/string being eql to every other element.
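
To make the distinction concrete, here is a minimal sketch of the only
property at issue.  The helper name ALL-ELEMENTS-EQL-P is hypothetical
(it appears in neither CLtL nor the ANS).  Under the CLtL wording its
result on a freshly made, uninitialized sequence is unspecified; under
the literal ANS wording it would have to return true.

```lisp
;; Hypothetical helper, for illustration only.
(defun all-elements-eql-p (sequence)
  "Return true if every element of SEQUENCE is EQL to its first element.
An empty sequence trivially satisfies the property."
  (or (zerop (length sequence))
      (let ((head (elt sequence 0)))
        (every (lambda (x) (eql x head)) sequence))))

;; The two calls under discussion: no :INITIAL-ELEMENT is supplied, so
;; the result depends on which reading the implementation follows.
(all-elements-eql-p (make-string 16))
(all-elements-eql-p (make-sequence 'vector 16))
```

Note that the check says nothing about which value the elements hold,
only that they agree with one another.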

Speaking both as an implementer and a user, this change in the ANS is
obviously a bad idea.  It _requires_ the implementation to traverse
each newly-created sequence, initializing each element to the same
value, regardless of the fact that the user code doesn't care.  Any
user application that _cared_ about the initialization of the sequence
would have specified the initialization.
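
A typical case where the caller does not care is one where every
element is overwritten before it is ever read.  The following is only a
sketch; the function name REVERSED-COPY is made up for this example.

```lisp
;; Hypothetical example: the initial contents of RESULT never matter,
;; because each position is written before the string is returned.
(defun reversed-copy (string)
  "Return a fresh string holding the characters of STRING in reverse order."
  (let* ((n (length string))
         (result (make-string n)))      ; no :INITIAL-ELEMENT on purpose
    (dotimes (i n result)
      (setf (char result i) (char string (- n i 1))))))
```

Any pass the implementation makes over RESULT inside MAKE-STRING is
wasted work here, since the loop stores into every position anyway.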

[With a particularly jaundiced reading, the ANS definition could provide
even worse semantics!  There is no implication that the
implementation-dependent default value must be a legal element for the
specified sequence type.  For example
   (make-sequence '(vector bit) 10)
could default initial-element to a double-float, which arguably should
signal error one way or another, either at array creation time or array
reference time.  No call to MAKE-SEQUENCE or MAKE-STRING that results in
a more-specialized actual array-element-type than T or CHARACTER
respectively would even be portably legal.  Ugh!]
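
As a sketch of what a call that is safe under either reading looks
like, the caller can supply an element known to belong to the element
type.  The name PORTABLE-BIT-VECTOR is hypothetical.

```lisp
;; Hypothetical helper: 0 is of type BIT, so this call does not depend
;; on whatever default the implementation would otherwise choose.
(defun portable-bit-vector (n)
  "Return a fresh bit vector of length N with every element set to 0."
  (make-sequence '(vector bit) n :initial-element 0))

;; A double-float is not a legal element of such a vector.
(typep 1.0d0 (array-element-type (portable-bit-vector 4)))
```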

In a quick search, the only tangentially-relevant discussion I could
find in X3J13 discussions was a set of comments by Moon and others,
basically that the ANS should not preclude an implementation from
signaling error when an uninitialized array element is referenced.
It is easy to argue that this is a very-useful feature both at
program-development time and at run time, providing the hardware is
capable of implementing the uninitialized check with little overhead.

So I believe that this change from CLtL to ANS was an inadvertent
editorial construction, and something not intended by X3J13.  The ANS
was approved as it now stands, and says exactly what it says, but if
the change was unintended, it is something that could be noted and
perhaps fixed.

Any other X3J13 participants care to comment?

Steve Haflich
Chair, INCITS/J13

From: Kent M Pitman
Subject: Re: differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING
Date: Fri, 24 Jan 2003 18:19:34 +0000
Message-ID: <sfwlm1al8ax.fsf@shell01.TheWorld.com>

"Steven M. Haflich" <···@no_spiced_ham_franz.com> writes:

> Speaking both as an implementer and a user, this change in the ANS is
> obviously a bad idea.  It _requires_ the implementation to traverse
> each newly-created sequence, initializing each element to the same
> value, regardless of the fact that the user code doesn't care.  Any
> user application that _cared_ about the initialization of the sequence
> would have specified the initialization.

I vaguely recall what happened here.  My dim recollection says someone
convinced me that the implementation has to do this anyway for GC
reasons.

In retrospect, I suppose there are some 32-bit-fixnum and 32-bit-float
arrays that might not be marked through and might not need a GC scan to
validate as safe for allocation.  Oh well.

[Personally, I think it's good if the array is required to have data of a
 consistent value (though I would not in my editorial role have intentionally
 made such a technical change to the standard without a committee vote).
 It's not that I think that the idea of "getting me arbitrary garbage" is
 never wanted, I just think you ought to have to be more clear about 
 how you get that.]

> [With a particularly jaundiced reading, the ANS definition could provide
> even worse semantics!  There is no implication that the
> implementation-dependent default value must be a legal element for the
> specified sequence type.  For example
>    (make-sequence '(vector bit) 10)
> could default initial-element to a double-float, which arguably should
> signal error one way or another, either at array creation time or array
> reference time.  No call to MAKE-SEQUENCE or MAKE-STRING that results in
> a more-specialized actual array-element-type than T or CHARACTER
> respectively would even be portably legal.  Ugh!]

I don't think this is a likely reading in practice.

> In a quick search, the only tangentially-relevant discussion I could
> find in X3J13 discussions was a set of comments by Moon and others,
> basically that the ANS should not preclude an implementation from
> signaling error when an uninitialized array element is referenced.
> It is easy to argue that this is a very-useful feature both at
> program-development time and at run time, providing the hardware is
> capable of implementing the uninitialized check with little overhead.

Yes, I recall there was discussion of how an uninitialized structure signals
an error but an uninitialized array does not, and that that was irregular.

> So I believe that this change from CLtL to ANS was an inadvertent
> editorial construction,

Mostly.  See above.

> and something not intended by X3J13.

Well, as with all other things, X3J13 did vote the whole of the result. 
What I don't recall is whether I brought it to the attention of the 
committee as a whole--I do think that some reviewers had a dialog with 
me on it and that it wasn't only me who saw it before it went out.

> The ANS was approved as it now stands, and says exactly what it
> says, but if the change was unintended, it is something that could
> be noted and perhaps fixed.

IMO it would be better to support a keyword that says :DONT-INITIALIZE T
(or :INITIALIZE-P NIL, if you prefer), which would be plainer if that's the
functionality you wanted to give.  I think the present behavior is, for
better or worse, useful.

From: Duane Rettig
Subject: Re: differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING
Date: Fri, 24 Jan 2003 20:23:02 +0000
Message-ID: <4u1fypaah.fsf@beta.franz.com>

Kent M Pitman <······@world.std.com> writes:

> "Steven M. Haflich" <···@no_spiced_ham_franz.com> writes:
> 
> > Speaking both as an implementer and a user, this change in the ANS is
> > obviously a bad idea.  It _requires_ the implementation to traverse
> > each newly-created sequence, initializing each element to the same
> > value, regardless of the fact that the user code doesn't care.  Any
> > user application that _cared_ about the initialization of the sequence
> > would have specified the initialization.
> 
> I vaguely recall what happened here.  My dim recollection says someone
> convinced me that the implementation has to do this anyway for GC
> reasons.

The GC ramifications only tend to occur on non-specialized arrays (i.e.
arrays of type T).  Specialized arrays tend to place values unboxed
into specialized (and sometimes smaller) slots, and are not looked at
by the GC.  In a sense, it is the array itself which "boxes" the
set of values contained within it.
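
Which element types get a specialized representation is mostly
implementation-dependent (the standard only requires it for BIT and
CHARACTER), but it can be probed with UPGRADED-ARRAY-ELEMENT-TYPE.  The
predicate name SPECIALIZED-P below is hypothetical, and an upgraded
type other than T only suggests, rather than proves, that the elements
are stored unboxed.

```lisp
;; Hypothetical helper, for illustration only.
(defun specialized-p (element-type)
  "True if arrays requested with ELEMENT-TYPE are not upgraded to
element type T in the current implementation."
  (not (eq (upgraded-array-element-type element-type) 't)))

;; Results for types such as these vary between implementations.
(mapcar (lambda (type) (list type (specialized-p type)))
        '(t bit character (unsigned-byte 32) single-float double-float))
```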

> > The ANS was approved as it now stands, and says exactly what it
> > says, but if the change was unintended, it is something that could
> > be noted and perhaps fixed.
> 
> IMO it would be better to support a keyword that says :DONT-INITIALIZE T
> (or :INITIALIZE-P NIL, if you prefer), which would be plainer if that's the
> functionality you wanted to give.  I think the present behavior is, for
> better or worse, useful.

It is, from a performance aspect, anti-competitive with C.  The C libraries
provide both calloc() (which always initializes to 0) and malloc() (which
does no initialization). Guess which one C programmers use most often? :-)

I think such an extra keyword would be good (I would prefer :initialize-p
over :dont-initialize, for a number of reasons), and the defaults you
suggest would additionally afford backward compatibility, and would also
allow the faster allocation.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Duane Rettig
Subject: Re: differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING
Date: Fri, 24 Jan 2003 22:30:02 +0000
Message-ID: <4ptqmp4et.fsf@beta.franz.com>

Duane Rettig <·····@franz.com> writes:

> Kent M Pitman <······@world.std.com> writes:
> 
> > I vaguely recall what happened here.  My dim recollection says someone
> > convinced me that the implementation has to do this anyway for GC
> > reasons.
> 
> The GC ramifications only tend to occur on non-specialized arrays (i.e.
> arrays of type T).  Specialized arrays tend to place values unboxed
> into specialized (and sometimes smaller) slots, and are not looked at
> by the GC.  In a sense, it is the array itself which "boxes" the
> set of values contained within it.


> > IMO it would be better to support a keyword that says :DONT-INITIALIZE T
> > (or :INITIALIZE-P NIL, if you prefer), which would be plainer if that's the
> > functionality you wanted to give.  I think the present behavior is, for
> > better or worse, useful.
> 
> It is, from a performance aspect, anti-competitive with C.  The C libraries
> provide both calloc() (which always initializes to 0) and malloc() (which
> does no initialization). Guess which one C programmers use most often? :-)
> 
> I think such an extra keyword would be good (I would prefer :initialize-p
> over :dont-initialize, for a number of reasons), and the defaults you
> suggest would additionally afford backward compatibility, and would also
> allow the faster allocation.

I also forgot to mention that a specification extension such as
 (make-array 10 :initialize-p nil ...)
should only be advisory in cases where GC is indeed an issue.  The
implementation would need to be free to ignore the request not to
initialize, if leaving the array uninitialized poses a danger to the GC.
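
A rough user-level sketch of that advisory behavior follows.  Neither
the function MAKE-VECTOR-ADVISORY nor the :INITIALIZE-P keyword exists
in the standard; both are assumptions made for illustration.  Portable
code cannot really skip initialization, so the "fast" branch here only
omits :INITIAL-ELEMENT and leaves the choice to the implementation.

```lisp
;; Hypothetical wrapper.  The default INITIAL-ELEMENT of 0 suits
;; element type T and numeric element types only; callers using other
;; element types would have to pass a suitable value themselves.
(defun make-vector-advisory (size element-type
                             &key (initialize-p t) (initial-element 0))
  "Make a vector of SIZE elements.  :INITIALIZE-P NIL is only a hint."
  (if (or initialize-p
          ;; Unspecialized vectors hold object references that the GC
          ;; may scan, so the request not to initialize is ignored.
          (eq (upgraded-array-element-type element-type) 't))
      (make-array size :element-type element-type
                       :initial-element initial-element)
      ;; Specialized vector: contents are left to the implementation.
      (make-array size :element-type element-type)))
```

For example, (make-vector-advisory 1024 '(unsigned-byte 8) :initialize-p nil)
requests a buffer whose contents the caller promises to overwrite,
while the same call with element type T is still filled.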

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Steven M. Haflich
Subject: Re: differences CLtL -> ANS in MAKE-SEQUENCE and MAKE-STRING
Date: Fri, 24 Jan 2003 22:55:23 +0000
Message-ID: <vvjY9.68$FK4.45@newssvr19.news.prodigy.com>

Duane Rettig wrote:
> Kent M Pitman <······@world.std.com> writes:

>>IMO it would be better to support a keyword that says :DONT-INITIALIZE T
>>(or :INITIALIZE-P NIL, if you prefer), which would be plainer if that's the
>>functionality you wanted to give.  I think the present behavior is, for
>>better or worse, useful.

> I think such an extra keyword would be good (I would prefer :initialize-p
> over :dont-initialize, for a number of reasons), and the defaults you
> suggest would additionally afford backward compatibility, and would also
> allow the faster allocation.

I'm sorry to say I can't follow the logic of this interface design.  There
already is a keyword arg that specifies the initialization of each element in
the new sequence.  I would think that omitting it would be adequate indication
of the programmer's intention that the initial contents don't matter.  It
seems a little silly, and/or confusing, to have keyword arguments that control
how other keyword arguments are interpreted.

   (defun find (item sequence &key key test test-not ignore-test-not-p)
      ...)

I'm also hard pressed to imagine any _portable_ application that would blow
up if make-string and make-sequence were reverted to the CLtL initialization
semantics, which allow the programmer to express his choice between
efficiency and initialization, which (Kent suggests) is likely useful while
debugging, if useful at all.  Any sane application that depended on
initialization would surely have expressed that by including the :initial-element
argument, rather than relying on an implementation-dependent default.  Remember,
the semantic point we are discussing here is _only_ whether all elements of the
new sequence are eql to one another, and _not_ the identity of that value.

There is a tension in language design between allowing the programmer to express
his intentions and always doing things the same way, avoiding shortcuts.
I generally lean towards the former.

Anyway, we have agreed on two things, at least: that the ANS currently specifies
that all elements of the new sequence will be initialized the same; and that this
requirement was an editorial addition that was subsequently passed by X3J13,
although perhaps few reviewers appreciated it at the time.