Re: string assembly in lisp?? help??

From: Sunil Mishra
Subject: Re: string assembly in lisp?? help??
Date: Wed, 17 Mar 1999 00:00:00 +0000
Message-ID: <efylngw7wx3.fsf@whizzy.cc.gatech.edu>

Tim Bradshaw <···@tfeb.org> writes:

> Lars Marius Garshol <······@ifi.uio.no> writes:
> 
> > * Lars Marius Garshol
> > | 
> > | I've stuffed the chars into a vector (using vector-push-extend) and
> > | then used coerce to make a string of it.
> > 
> > * Kent M. Pitman
> > | 
> > | Coerce can make a string from a list.  You don't have to first convert
> > | it to a vector.  (coerce '(#\a #\b #\c) 'string)
> > 
> > I assumed the vector approach to be more effective (and made sure to
> > allocate a reasonable initial-size vector to begin with). That is a
> > reasonable assumption, no?
> > 
> 
> No, it's a bad assumption.  Your approach allocates a vector (why not
> a string, in the first place?) and then makes a string with the same
> contents (remember that COERCE does not modify its argument, it makes
> a new object `corresponding to' its argument but with the right type).
> Using COERCE directly will traverse the list (I guess) twice, but only
> allocate a single string of the right length.

Well, making a string using the vector-push-extend approach is not all that 
difficult. In LW4.1,

EVENT-SEARCH 20 > (setq x (make-array 20 :element-type 'base-char
                                         :adjustable t :fill-pointer 0))
""

EVENT-SEARCH 21 > (vector-push-extend #\a x)
0

EVENT-SEARCH 22 > (vector-push-extend #\b x)
1

EVENT-SEARCH 23 > (vector-push-extend #\c x)
2

EVENT-SEARCH 24 > x
"abc"

EVENT-SEARCH 25 > (stringp x)
T

The only difference when using another version of lisp is the type of
character you use to make up the elements of the array. I believe in MCL,
for instance, it's base-character.
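
As a side-by-side sketch of the two approaches discussed so far: a general vector has to be coerced into a fresh string, while a vector created with a character element type already is one. The function names are hypothetical, and CHARACTER as the default element type is an assumption of this sketch; substitute whatever your implementation reports for its strings.

```lisp
;; Approach 1: collect into a general (element-type T) vector, then COERCE.
;; The vector is not a string, so COERCE has to allocate a second object.
(defun collect-then-coerce (chars)
  (let ((buffer (make-array 16 :adjustable t :fill-pointer 0)))
    (dolist (ch chars)
      (vector-push-extend ch buffer))
    (coerce buffer 'string)))

;; Approach 2: collect straight into an adjustable string.
;; ELEMENT-TYPE defaults to CHARACTER here (an assumption of this sketch).
(defun collect-into-string (chars &key (element-type 'character))
  (let ((buffer (make-array 16 :element-type element-type
                               :adjustable t :fill-pointer 0)))
    (dolist (ch chars)
      (vector-push-extend ch buffer))
    buffer))
```

Both return a string with the same contents, but only the second avoids the extra copy. Its result has a fill pointer, so it is a STRING but not a SIMPLE-STRING.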

Sunil

From: Lars Marius Garshol
Subject: Re: string assembly in lisp?? help??
Date: Wed, 17 Mar 1999 00:00:00 +0000
Message-ID: <wk4snjyklx.fsf@ifi.uio.no>

* Sunil Mishra
| 
| The only difference when using another version of lisp is the type
| of character you use to make up the elements of the array. I believe
| in MCL, for instance, it's base-character.

Is there a standardized way to do this? It really would be nice to
have extensible strings for this, since in my case I'm doing this in
an OMG IDL parser (which preferably shouldn't barf on long names).

--Lars M.

From: Sunil Mishra
Subject: Re: string assembly in lisp?? help??
Date: Wed, 17 Mar 1999 00:00:00 +0000
Message-ID: <efyk8wf90d3.fsf@whizzy.cc.gatech.edu>

Lars Marius Garshol <······@ifi.uio.no> writes:

> * Sunil Mishra
> | 
> | The only difference when using another version of lisp is the type
> | of character you use to make up the elements of the array. I believe
> | in MCL, for instance, it's base-character.
> 
> Is there a standardized way to do this? It really would be nice to
> have extensible strings for this, since in my case I'm doing this in
> an OMG IDL parser (which preferably shouldn't barf on long names).
> 
> --Lars M.

This *is* the standard way. You can get the element type of a string by
typing:

(array-element-type "a")

in any common lisp. Again, in LW4.1,

GRADER 90 > (array-element-type "a")
BASE-CHAR

Sunil

From: Tim Bradshaw
Subject: Re: string assembly in lisp?? help??
Date: Thu, 18 Mar 1999 00:00:00 +0000
Message-ID: <nkjemmnqhdg.fsf@tfeb.org>

Lars Marius Garshol <······@ifi.uio.no> writes:

> Is there a standardized way to do this? It really would be nice to
> have extensible strings for this, since in my case I'm doing this in
> an OMG IDL parser (which preferably shouldn't barf on long names).

Yes, this should be easy.  You can use your approach of an adjustable
array with a fill pointer, but make the element-type be whatever is
right (I forget what it is now for strings, but the hyperspec will say).

--tim

From: Erik Naggum
Subject: Re: string assembly in lisp?? help??
Date: Fri, 19 Mar 1999 00:00:00 +0000
Message-ID: <3130830258347173@naggum.no>

* Lars Marius Garshol <······@ifi.uio.no>
| Is there a standardized way to do this?  It really would be nice to have
| extensible strings for this, since in my case I'm doing this in an OMG
| IDL parser (which preferably shouldn't barf on long names).

  from the description of the system class STRING in ANSI X3.226:

A string is a specialized vector whose elements are of type CHARACTER or a
subtype of type CHARACTER.  When used as a type specifier for object
creation, STRING means (VECTOR CHARACTER).

  so CHARACTER is already standard.  there is no need to use BASE-CHAR, and
  no need to worry about portability problems.

  in my view, however, the actual task at hand is to extract a subsequence
  of a stream's input buffer.  I do this with a mark in the stream and
  avoid copying until absolutely necessary.  this, however, requires access
  to and meddling with stream internals.

  a less "internal" solution is to use WITH-OUTPUT-TO-STRING and simply
  write characters to it until the terminating condition is met, to wit:

(with-output-to-string (name)
  (stream-copy <input-stream> name <condition>))
=> <string>

  the function STREAM-COPY could be defined like this:

(defun stream-copy (input output &key (count -1) end-test filter transform)
  "Copy characters from INPUT to OUTPUT.
COUNT is the maximum number of characters to copy.
END-TEST if specified, causes termination when true for a character.
FILTER if specified, causes only characters for which it is true to be copied.
TRANSFORM if specified, causes its value to be copied instead of character."
  (loop
    (when (zerop count)
      (return))
    (let ((character (read-char input nil :eof)))
      (when (eq :eof character)
        (return))
      (when (and end-test (funcall end-test character))
        (unread-char character input)
        (return))
      (when (or (null filter) (funcall filter character))
        (write-char (if transform
                      (funcall transform character)
                      character)
                    output)))
    (decf count)))
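
To try the WITH-OUTPUT-TO-STRING idea on its own, here is a minimal sketch that reads one identifier-like token from a stream. READ-IDENTIFIER is a hypothetical name, and the rule "alphanumerics and underscores" is only an assumed stand-in for whatever the real terminating condition is.

```lisp
(defun read-identifier (input)
  "Read leading alphanumeric or underscore characters from INPUT.
Returns them as a string; the first other character is left unread."
  (with-output-to-string (name)
    (loop for ch = (read-char input nil nil)
          while ch
          do (if (or (alphanumericp ch) (char= ch #\_))
                 (write-char ch name)
                 (progn
                   ;; Put the terminator back for the next reader.
                   (unread-char ch input)
                   (return))))))

;; Usage: the token comes back as a string of whatever length is needed,
;; and the stream is left positioned at the terminating character.
(with-input-from-string (in "a_rather_long_name;rest")
  (list (read-identifier in) (read-char in)))
```

No buffer size is chosen anywhere, so long names are not a special case.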

#:Erik

From: Sunil Mishra
Subject: Re: string assembly in lisp?? help??
Date: Fri, 19 Mar 1999 00:00:00 +0000
Message-ID: <efyg171rlzc.fsf@whizzy.cc.gatech.edu>

Erik Naggum <····@naggum.no> writes:

>   from the description of the system class STRING in ANSI X3.226:
> 
> A string is a specialized vector whose elements are of type CHARACTER or a
> subtype of type CHARACTER.  When used as a type specifier for object
> creation, STRING means (VECTOR CHARACTER).
> 
>   so CHARACTER is already standard.  there is no need to use BASE-CHAR, and
>   no need to worry about portability problems.

Erik,

This is exactly what I had thought when I had first tried this in lispworks
3.2. Alas, what I got was

CL-USER 6 > (make-array 10 :element-type 'character)
#(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL)

Hardly a string... Which had me pretty confused and led me to believe that
'character was not the right type to use. (I probably should have asked,
but I was much more of a newbie back then than I am now, and this bit of
information became a decontextualized fact over time.)

Thankfully, lispworks 4.1 does the right thing:

CL-USER 13 > (make-array 10 :element-type 'character)
"

Thanks for clearing things up.

Sunil

From: Erik Naggum
Subject: Re: string assembly in lisp?? help??
Date: Sat, 20 Mar 1999 00:00:00 +0000
Message-ID: <3130915884917754@naggum.no>

* Sunil Mishra <·······@whizzy.cc.gatech.edu>
| This is exactly what I had thought when I had first tried this in
| lispworks 3.2.  Alas, what I got was
| 
| CL-USER 6 > (make-array 10 :element-type 'character)
| #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL)

  ouch.  (I'm glad it has been fixed later.)

| Hardly a string...  Which had me pretty confused and led me to believe that
| 'character was not the right type to use.  (I probably should have asked,
| but I was much more of a newbie back then than I am now, and this bit of
| information became a decontextualized fact over time.)

  stuff like this is why I think programmers who don't read specifications
  learn bad habits: failure to get what you expect must be investigated and
  the culprit must actually be _found_: either you did something wrong, or
  somebody else did something wrong.  "oh, that didn't work, let's try
  something else" is good if you deal with the physical world and people,
  but when you're dealing with computers and programming languages, it's
  the _last_ property of the physical world I want to imitate.  if an
  expectation doesn't come true, either the expectation is wrong, you made
  a mistake in preparation for it, or there is a flaw in the system.  if
  you don't do the work necessary to figure out which of these three is the
  right one, you have 1/3 chance of getting it right by luck.  I think the
  most important desideratum for a programmer is an _unwillingness_ just to
  try something until it works -- a good programmer needs to know _why_.

| Thanks for clearing things up.

  sure.

#:Erik

From: Kent M Pitman
Subject: Re: string assembly in lisp?? help??
Date: Fri, 19 Mar 1999 00:00:00 +0000
Message-ID: <sfwyakt5x8j.fsf@world.std.com>

Erik Naggum <····@naggum.no> writes:

> * Lars Marius Garshol <······@ifi.uio.no>
> | Is there a standardized way to do this?  It really would be nice to have
> | extensible strings for this, since in my case I'm doing this in an OMG
> | IDL parser (which preferably shouldn't barf on long names).
> 
>   from the description of the system class STRING in ANSI X3.226:
> 
> A string is a specialized vector whose elements are of type CHARACTER or a
> subtype of type CHARACTER.  When used as a type specifier for object
> creation, STRING means (VECTOR CHARACTER).
> 
>   so CHARACTER is already standard.  there is no need to use BASE-CHAR, and
>   no need to worry about portability problems.

The first of the two uses of the word "need" here is odd.  Strictly,
there is no "need" to use Lisp, nor even to use computers.  There is a
"need" for food, clothing, and shelter.  But if we extend "need" to
sometimes mean "want" (which I assume you mean here), then there is
sometimes a "need" to use BASE-CHAR because in some implementations
CHARACTER may be inefficient (e.g., it might reserve a much more
heavy-duty space capable of holding multi-byte characters), and it may
sometimes be "necessary" to avoid this.  The problem Lars might be
perceiving, and I think it's a legit concern in certain limited
contexts, is that you can't know when one "needs" to use BASE-CHAR to
avoid overallocating space.

I think Erik is saying that one shouldn't pre-optimize something without
first knowing the general case will be a problem.  And what Lars is
saying is that he perceives it will be a problem.  This is something of
a clash of absolutes and both have some merit.  Mostly I think one should
probably define Erik's approach to be the most conservative, even if not
the one most people do.  I, too, prefer to write general code first and
get the shape and functionality right, and then to tune as needed where
a problem is discovered.  Since there is no a priori functional problem
with CHARACTER, one should just use it until a problem is discovered.
And then one might find one "needs" BASE-CHAR.  But not otherwise.
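
Whether BASE-CHAR buys anything can be checked rather than guessed. The following is a rough sketch (DESCRIBE-STRING-ELEMENT-TYPES is a hypothetical helper, and the values it reports are implementation-dependent) that asks the running Lisp which element types it actually uses for strings:

```lisp
(defun describe-string-element-types ()
  "Return a plist describing how this implementation represents strings."
  (list :literal           (array-element-type "foo")
        :make-string       (array-element-type (make-string 1))
        :base-char-upgrade (upgraded-array-element-type 'base-char)
        :character-upgrade (upgraded-array-element-type 'character)
        ;; True when CHARACTER holds more than BASE-CHAR, i.e. when a
        ;; BASE-CHAR string could plausibly be the smaller representation.
        :distinct-p        (not (subtypep 'character 'base-char))))
```

If :DISTINCT-P comes back NIL, the two types are the same and there is nothing to optimize.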

Pre-optimizing the type specifier before knowing that CHARACTER leads
to problems is mostly a bad idea.  It needlessly increases program
complexity at a time when you're just exploring what you want from
your program.  You may later throw away that line of code, and there's
no sense in having optimized it.  Or you may later find the code
doesn't get enough play and doesn't need a declaration.

Life is short, and one isn't meant to spend it writing gratuitous
declarations that don't actually do anything other than make code
harder to write.  At least, that's my own personal religious belief.
(Apologies in advance to those in this multicultural forum if I've
stepped on the toes of anyone whose religion teaches them that this IS
a good way to spend one's life.)

Over-optimizing also encourages you to build fragile interfaces.  For
example, I ran into a bug in some Lisp implementation where the vendor
had decided to use simple strings for symbol names.  Maybe an ok
assumption, but they didn't get it from the book, and they didn't fix
INTERN and friends to coerce symbol names to be simple, so when you
made a symbol with an adjustable string as a name, it let you do this,
but this made a mess.  (Note that the spec says "a string" not "a
simple string" as the argument to INTERN.  It doesn't forbid the
implementation from copying the string to be simple or another
element-type more suited to the characters it contains or whatever.
But it says this fact shouldn't be revealed to users.)  The
particulars of the bug are not as relevant as the point about your
responsibility when you narrow a type: your interface points must
minimally check the incoming type and preferably should coerce
reasonable alternate types so that people don't do what I had to do
while learning some Java a few weeks ago to cast an "integer
represented as an object" back to an integer by something nutty like
[if memory serves me right--my Java memory is very flaky]:
((Integer)someObject).getIntVal() having to do not one but two
coercions just to say "this is in fact the integer it looks like".  I
don't want to get into a long discussion about Java or my inability to
navigate it smoothly or the fact that this clumsiness doesn't
overwhelm me with a desire to give up Lisp for it.  My point here is
simply this: type restriction has its place but it also has its cost.
And you should try to avoid paying the cost because that cost can
include infection of unrelated modules with needless paranoia.  (For
varying values of "need" again.)  In the INTERN case above, the spec
said it wasn't supposed to work the way the implementation did it, so
it was easy for me to complain, but when you design your own
interfaces your users won't have that luxury--they have to do what you
implement.  So make sure you're being sensitive to what's rational for
them to use.

- - - -

If one does feel compelled to pre-optimize this, what I recommend
doing is something akin to:

 ;;; !!! KLUDGE: Hide BASE-CHAR (ANSI) vs BASE-CHARACTER (CLTL2) distinction
 (defconstant +base-char-type+
   '#.(array-element-type "foo"))

 (make-array <dimensions> :element-type +base-char-type+ ...options...)

This is a kludge and isn't quite 100% right because theoretically the
system could allocate a more restricted string representation for the
constant string "foo" of known character composition than it would
for base-char, but in practice I haven't observed implementations to
do this and so the kludge is pretty portable.

Sometimes MAKE-STRING saves you from having to do this, but MAKE-STRING
doesn't take all the arguments MAKE-ARRAY does (no :ADJUSTABLE or
:FILL-POINTER, for instance), so in practice I've had to do this in
some cases.

(Sometimes you'll run into cases where the +base-char-type+ needs to be
used in a not-for-evaluation situation and it may be helpful to use
#.+base-char-type+ in that case.  If you do this, an EVAL-WHEN around the
DEFCONSTANT to make sure the variable is ready in the read-time environment
may be needed.  I left it out of the above just to keep things simple.)
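
A sketch of the EVAL-WHEN arrangement just described, with the constant used both in an evaluated position and, via #., inside a declaration. MAKE-NAME-BUFFER and PUSH-NAME-CHAR are hypothetical names, and the sketch assumes the forms are read one at a time (as LOAD and COMPILE-FILE do), so that the constant exists before the later #. is read.

```lisp
;; Make the constant available at read/compile time as well as load time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defconstant +base-char-type+ '#.(array-element-type "foo")))

;; Evaluated position: the constant can be used directly.
(defun make-name-buffer (&optional (size 32))
  (make-array size :element-type +base-char-type+
                   :adjustable t :fill-pointer 0))

;; Not-for-evaluation position: a declaration needs the type spliced in
;; at read time, hence #.+base-char-type+.
(defun push-name-char (ch buffer)
  (declare (type (vector #.+base-char-type+) buffer))
  (vector-push-extend ch buffer)
  buffer)
```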