From: Jim Menard
Subject: String manipulation
Date: 
Message-ID: <wsqd6fr24f4.fsf@io.com>
As with any new language I try to learn, I'm quickly overwhelmed not by the
core language but rather by the libraries.

I'm trying to write a function that takes a string and returns a new string
with all XHTML special characters replaced by their XHTML equivalents. For
example, "Hello, <World>" becomes "Hello, &lt;World&gt;".

I've read about with-output-to-string and with-input-from-string. Are they
what I should be using? Here's what I have come up with so far. This
function should just return a copy of the original string, but it doesn't.

(defun escape-xhtml (str)
  (let ((out-str (make-array '(0) :element-type 'base-char
                             :fill-pointer 0 :adjustable t)))
    (with-input-from-string (s str)
      (with-output-to-string (new out-str)
        (write-char (read-char s) new)))
    out-str))

> (escape-xhtml "foo")
> "f"

I'm using CLISP, if that matters.

Thank you in advance for your help.

Jim
-- 
Jim Menard, ····@io.com, http://www.io.com/~jimm/
"Thanks to the joint efforts of OpenOffice, Mozilla, and a few others, Emacs
officially entered the category of lightweight utilities." -- kalifa on /.

From: Erann Gat
Subject: Re: String manipulation
Date: 
Message-ID: <gat-3007031337340001@k-137-79-50-101.jpl.nasa.gov>
In article <···············@io.com>, Jim Menard <····@io.com> wrote:

> As with any new language I try to learn, I'm quickly overwhelmed not by the
> core language but rather by the libraries.
> 
> I'm trying to write a function that takes a string and returns a new string
> with all XHTML special characters replaced by their XHTML equivalents. For
> example, "Hello, <World>" becomes "Hello, &lt;World&gt;".
> 
> I've read about with-output-to-string and with-input-from-string. Are they
> what I should be using?

Probably not, but it depends on what you really want to do.

If you really want to just operate on strings, then this is probably not
what you want.  What you want instead is (loop for c across str ...) or
(map nil ...), and vector-push-extend for output.
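
A minimal sketch of the string-only approach, assuming only the three
entities discussed in this thread. The name ESCAPE-XHTML-VPE and the local
helper EMIT are made up for illustration; they are not from the original
post.

```lisp
;; Hypothetical sketch: walk the input with LOOP ... ACROSS and collect the
;; output with VECTOR-PUSH-EXTEND into an adjustable, fill-pointered string.
(defun escape-xhtml-vpe (str)
  (let ((out (make-array (length str)
                         :element-type 'character
                         :fill-pointer 0
                         :adjustable t)))
    (flet ((emit (text)
             ;; Append every character of TEXT to the output vector.
             (loop for c across text
                   do (vector-push-extend c out))))
      (loop for c across str
            do (case c
                 (#\< (emit "&lt;"))
                 (#\> (emit "&gt;"))
                 (#\& (emit "&amp;"))
                 (otherwise (vector-push-extend c out)))))
    out))
```

The result is a string with a fill pointer, so it is not a simple string.
If a simple string is needed, (coerce out 'simple-string) makes a copy.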

But if you want to write a function that actually does this substitution
on streams rather than strings (which is probably what you'd actually be
doing in a real application) and you want to test that function on some
strings, then WITH-INPUT-FROM-STRING and WITH-OUTPUT-TO-STRING are what
you'd use.  But the code would look like this:

(defun escape-xhtml (instream outstream)
  ...)

(defun test-escape-xhtml-on-a-string (str)
  (with-input-from-string (in str)
    (with-output-to-string (out)
      (escape-xhtml in out))))
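
The body elided above could be filled in many ways. One possible sketch,
assuming the same three entities used elsewhere in this thread, is shown
here under the hypothetical names ESCAPE-XHTML-STREAM and
TEST-ESCAPE-XHTML-STREAM so that it does not claim to be the intended
definition.

```lisp
;; Hypothetical sketch of a stream-to-stream escaper.
(defun escape-xhtml-stream (instream outstream)
  ;; READ-CHAR with EOF-ERROR-P nil returns NIL at end of input,
  ;; which ends the loop.
  (loop for c = (read-char instream nil nil)
        while c
        do (case c
             (#\< (write-string "&lt;" outstream))
             (#\> (write-string "&gt;" outstream))
             (#\& (write-string "&amp;" outstream))
             (otherwise (write-char c outstream)))))

;; Test wrapper in the same shape as the one above.
(defun test-escape-xhtml-stream (str)
  (with-input-from-string (in str)
    (with-output-to-string (out)
      (escape-xhtml-stream in out))))
```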

> Here's what I have come up with so far. This
> function should just return a copy of the original string, but it doesn't.
> 
> (defun escape-xhtml (str)
>   (let ((out-str (make-array '(0) :element-type 'base-char
>                              :fill-pointer 0 :adjustable t)))
>     (with-input-from-string (s str)
>       (with-output-to-string (new out-str)
>         (write-char (read-char s) new)))
>     out-str))
> 
> > (escape-xhtml "foo")
> > "f"

You should probably be using WITH-OUTPUT-TO-STRING with just one argument
for now, and dispense with OUT-STR until you know what you're doing.  Try
this:

  (with-output-to-string (s) (print "hello" s) (print "goodbye" s))

But the main problem is that you're only reading one character from str
because you don't have a loop in your code.
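
A small sketch of both points, with hypothetical function names. In the
one-argument form WITH-OUTPUT-TO-STRING returns the accumulated string. When
a string with a fill pointer is supplied as the second argument, the output
is appended to that string and the form returns the value of its last body
form instead. Note also that PRINT writes a newline before the object and a
space after it, and prints strings with their quotes, so WRITE-CHAR and
WRITE-STRING are the better fit for copying text.

```lisp
;; One-argument form: the macro itself returns the new string.
(defun copy-via-stream (str)
  (with-output-to-string (out)
    (loop for c across str
          do (write-char c out))))

;; Two-argument form: output is appended to BUFFER, which must have a
;; fill pointer; the macro's own return value is not the string.
(defun append-via-stream (str)
  (let ((buffer (make-array 0 :element-type 'character
                              :fill-pointer 0 :adjustable t)))
    (with-output-to-string (out buffer)
      (write-string str out))
    buffer))
```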

> I'm using CLISP, if that matters.

Nope.

E.
From: Brian Downing
Subject: Re: String manipulation
Date: 
Message-ID: <z2eWa.31895$YN5.27515@sccrnsc01>
In article <····················@k-137-79-50-101.jpl.nasa.gov>,
Erann Gat <···@jpl.nasa.gov> wrote:
> Probably not, but it depends on what you really want to do.
> 
> If you really want to just operate on strings, then this is probably not
> what you want.  What you want instead is (loop for c across str ...) or
> (map nil ...), and vector-push-extend for output.

This is what I thought as well, but with CMUCL at least,
WITH-OUTPUT-TO-STRING combined with WRITE-CHAR and WRITE-STRING is
much faster than VECTOR-PUSH-EXTEND.  V-P-E is more space-efficient,
but only with large strings.  Does anybody know why?  It would seem
to me that the streams interface should be slower.

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(declaim (inline lookup-entity))
(defun lookup-entity (char)
  (declare (base-char char))
  (case char
    (#\< "&lt;")
    (#\> "&gt;")
    (#\& "&amp;")
    (otherwise nil)))

;  20 char string: 15,000-20,000 CPU cycles, 216 bytes consed.
; 838 char string: 313,644 CPU cycles, 13,440 bytes consed.
(defun escape-xhtml (string)
  (declare (type (simple-base-string *) string))
  (with-output-to-string (new)
    (loop for char of-type base-char across string
          for escaped = (lookup-entity char)
          if escaped
          do (write-string escaped new)
          else do (write-char char new))))


;  20 char string: 41,134 CPU cycles, 384 bytes consed.
; 838 char string: 1,814,163 CPU cycles, 8,736 bytes consed.
(defun escape-xhtml-2 (string)
  (declare (type (simple-base-string *) string))
  ;; :adjustable t is needed for portable use of VECTOR-PUSH-EXTEND.
  (let ((new (make-array 0 :fill-pointer 0 :adjustable t
                           :element-type 'base-char)))
    (loop for char of-type base-char across string
          for escaped = (lookup-entity char)
          if escaped
          do (loop for char-2 of-type base-char across escaped do
                   (vector-push-extend char-2 new))
          else do (vector-push-extend char new))
    new))
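
For anyone who wants to repeat the comparison on another implementation, a
rough portable harness might look like the following. It reports run time in
seconds rather than CPU cycles or bytes consed, so its numbers are not
directly comparable to the figures in the comments above. TIME-ESCAPER and
its ITERATIONS parameter are hypothetical names.

```lisp
;; Hypothetical sketch: call FN on STRING repeatedly and return the
;; elapsed run time in seconds as a rational number.
(defun time-escaper (fn string &optional (iterations 10000))
  (let ((start (get-internal-run-time)))
    (dotimes (i iterations)
      (funcall fn string))
    (/ (- (get-internal-run-time) start)
       internal-time-units-per-second)))
```

A call such as (time-escaper #'escape-xhtml some-string) can then be compared
with (time-escaper #'escape-xhtml-2 some-string). The standard TIME macro is
an alternative when implementation-specific detail is wanted.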

-bcd
--
*** Brian Downing <bdowning at lavos dot net> 
From: Erann Gat
Subject: Re: String manipulation
Date: 
Message-ID: <gat-3107031342010001@k-137-79-50-101.jpl.nasa.gov>
In article <·····················@sccrnsc01>, Brian Downing
<···········@lavos.net> wrote:

> In article <····················@k-137-79-50-101.jpl.nasa.gov>,
> Erann Gat <···@jpl.nasa.gov> wrote:
> > Probably not, but it depends on what you really want to do.
> > 
> > If you really want to just operate on strings, then this is probably not
> > what you want.  What you want instead is (loop for c across str ...) or
> > (map nil ...), and vector-push-extend for output.
> 
> This is what I thought as well, but with CMUCL at least,
> WITH-OUTPUT-TO-STRING combined with WRITE-CHAR and WRITE-STRING is
> much faster than VECTOR-PUSH-EXTEND.

Interesting.  In MCL it's the exact opposite.  That's why I think the best
solution is to add a layer of abstraction (like the GMAP that I proposed
in another thread) so you can transparently change implementations when
you move from one Lisp to another.

E.
From: Eric Marsden
Subject: Re: String manipulation
Date: 
Message-ID: <wzi7k5xaif3.fsf@melbourne.laas.fr>
>>>>> "bd" == Brian Downing <···········@lavos.net> writes:

  bd> This is what I thought as well, but with CMUCL at least,
  bd> WITH-OUTPUT-TO-STRING combined with WRITE-CHAR and WRITE-STRING is
  bd> much faster than VECTOR-PUSH-EXTEND.  V-P-E is more space-efficient,
  bd> but only with large strings.  Does anybody know why?  It would seem
  bd> to me that the streams interface should be slower.

WITH-OUTPUT-TO-STRING operates similarly to VECTOR-PUSH-EXTEND behind
the scenes: initially it allocates a string of size 40, and each time
the string is exhausted it allocates a new string that's two times
longer, copying over the previous contents to the new string using
REPLACE. 

The difference is that W-O-T-S is working with simple-base-strings,
whereas V-P-E is operating on base-strings (which aren't simple,
because they have a fill pointer). Operations on simple arrays are
much more efficient than on non-simple arrays.
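
The strategy described above can be sketched by hand in portable code. This
is an illustration of the idea, not CMUCL's actual implementation: the
function name, the helper names, and the use of SUBSEQ for the final trim
are all assumptions. The initial size of 40 and the doubling step are taken
from the description.

```lisp
;; Hypothetical sketch: accumulate into a simple string, doubling it and
;; copying the old contents with REPLACE whenever it fills up.
(defun escape-xhtml-buffered (string)
  (let ((buffer (make-string 40))   ; initial size taken from the description
        (pos 0))                    ; number of characters written so far
    (labels ((ensure-room (n)
               ;; Grow the buffer if N more characters would not fit.
               (when (> (+ pos n) (length buffer))
                 (let ((bigger (make-string (max (* 2 (length buffer))
                                                 (+ pos n)))))
                   (replace bigger buffer :end2 pos)
                   (setf buffer bigger))))
             (emit-char (c)
               (ensure-room 1)
               (setf (schar buffer pos) c)
               (incf pos))
             (emit-string (s)
               (ensure-room (length s))
               (replace buffer s :start1 pos)
               (incf pos (length s))))
      (loop for c across string
            do (case c
                 (#\< (emit-string "&lt;"))
                 (#\> (emit-string "&gt;"))
                 (#\& (emit-string "&amp;"))
                 (otherwise (emit-char c))))
      ;; Return only the part of the buffer that was filled.
      (subseq buffer 0 pos))))
```

Every access here goes through SCHAR or REPLACE on a simple string, which is
the property the paragraph above credits for the speed difference.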
  
-- 
Eric Marsden                          <URL:http://www.laas.fr/~emarsden/>
From: Paul F. Dietz
Subject: Re: String manipulation
Date: 
Message-ID: <K5OcnRgL-axT_LeiXTWJkA@dls.net>
Eric Marsden wrote:

> The difference is that W-O-T-S is working with simple-base-strings,
> whereas V-P-E is operating on base-strings (which aren't simple,
> because they have a fill pointer). Operations on simple arrays are
> much more efficient than on non-simple arrays.

If I understand correctly, CMUCL (and SBCL) need to have extensions
to their type systems that let them reason about the extra bells-and-
whistles on arrays.  Like, 'this is a vector of base characters with
a fill pointer, but without displacement or adjustability.'  Given that,
the compiler could generate code for v-p-e that would be as fast
as w-o-t-s.

(See SBCL bug #257.)

	Paul
From: Raymond Toy
Subject: Re: String manipulation
Date: 
Message-ID: <4nadat4it9.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Paul" == Paul F Dietz <·····@dls.net> writes:

    Paul> Eric Marsden wrote:

    >> The difference is that W-O-T-S is working with simple-base-strings,
    >> whereas V-P-E is operating on base-strings (which aren't simple,
    >> because they have a fill pointer). Operations on simple arrays are
    >> much more efficient than on non-simple arrays.

    Paul> If I understand correctly, CMUCL (and SBCL) need to have extensions
    Paul> to their type systems that let them reason about the extra bells-and-
    Paul> whistles on arrays.  Like, 'this is a vector of base characters with
    Paul> a fill pointer, but without displacement or adjustability.'  Given that,
    Paul> the compiler could generate code for v-p-e that would be as fast
    Paul> as w-o-t-s.

Are you sure?  I thought simple-base-strings are specialized arrays
which therefore have fast access such that a single memory read gives
you access to the desired element.  Other arrays need at least an
extra memory read or two to get at the desired element.

Ray, who didn't look at SBCL bug 257.
From: Christophe Rhodes
Subject: Re: String manipulation
Date: 
Message-ID: <sqsmola5pa.fsf@lambda.jcn.srcf.net>
"Paul F. Dietz" <·····@dls.net> writes:

> Eric Marsden wrote:
>
>> The difference is that W-O-T-S is working with simple-base-strings,
>> whereas V-P-E is operating on base-strings (which aren't simple,
>> because they have a fill pointer). Operations on simple arrays are
>> much more efficient than on non-simple arrays.
>
> If I understand correctly, CMUCL (and SBCL) need to have extensions
> to their type systems that let them reason about the extra bells-and-
> whistles on arrays.  Like, 'this is a vector of base characters with
> a fill pointer, but without displacement or adjustability.'  Given that,
> the compiler could generate code for v-p-e that would be as fast
> as w-o-t-s.

Is that enough?  I'd have thought the major problem was the extra
level of indirection in a non-simple array.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Paul F. Dietz
Subject: Re: String manipulation
Date: 
Message-ID: <YZmcnZYgiLAV97eiXTWJjQ@dls.net>
Christophe Rhodes wrote:

>>If I understand correctly, CMUCL (and SBCL) need to have extensions
>>to their type systems that let them reason about the extra bells-and-
>>whistles on arrays.  Like, 'this is a vector of base characters with
>>a fill pointer, but without displacement or adjustability.'  Given that,
>>the compiler could generate code for v-p-e that would be as fast
>>as w-o-t-s.
> 
> 
> Is that enough?  I'd have thought the major problem was the extra
> level of indirection in a non-simple array.

Well, ok.  It also needs to be able to lift that out of the loop,
and infer that it isn't otherwise changed in the loop.

	Paul
From: Thomas F. Burdick
Subject: Re: String manipulation
Date: 
Message-ID: <xcvhe515gls.fsf@famine.OCF.Berkeley.EDU>
"Paul F. Dietz" <·····@dls.net> writes:

> Christophe Rhodes wrote:
> 
> >>If I understand correctly, CMUCL (and SBCL) need to have extensions
> >>to their type systems that let them reason about the extra bells-and-
> >>whistles on arrays.  Like, 'this is a vector of base characters with
> >>a fill pointer, but without displacement or adjustability.'  Given that,
> >>the compiler could generate code for v-p-e that would be as fast
> >>as w-o-t-s.
> > Is that enough?  I'd have thought the major problem was the extra
> > level of indirection in a non-simple array.
> 
> Well, ok.  It also needs to be able to lift that out of the loop,
> and infer that it isn't otherwise changed in the loop.

And once again, we get to Python's need to recognize loop invariants.
Someone really ought to implement that.

(Ducking out the back door...)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Daniel Barlow
Subject: Re: String manipulation
Date: 
Message-ID: <87brv9vya0.fsf@noetbook.telent.net>
Brian Downing <···········@lavos.net> writes:

> This is what I thought as well, but with CMUCL at least,
> WITH-OUTPUT-TO-STRING combined with WRITE-CHAR and WRITE-STRING is
> much faster than VECTOR-PUSH-EXTEND.  V-P-E is more space-efficient,
> but only with large strings.  Does anybody know why?  It would seem
> to me that the streams interface should be slower.

If you ask on the cmucl-imp list you may find that someone offers to
slow the streams interface down to help you.  Even if nobody has time
available to do this for free, I imagine that you could contract
someone to make the changes for pay.

Merging the changes into the base CMUCL might prove politically more
difficult, though.  I'm not sure how best to go about that.


-dan

-- 

   http://www.cliki.net/ - Link farm for free CL-on-Unix resources 
From: Brian Downing
Subject: Re: String manipulation
Date: 
Message-ID: <qJTWa.49718$uu5.5332@sccrnsc04>
In article <··············@noetbook.telent.net>,
Daniel Barlow  <···@telent.net> wrote:
> If you ask on the cmucl-imp list you may find that someone offers to
> slow the streams interface down to help you.  Even if nobody has time
> available to do this for free, I imagine that you could contract
> someone to make the changes for pay.
> 
> Merging the changes into the base CMUCL might prove politically more
> difficult, though.  I'm not sure how best to go about that.

I'm sure I can handle slowing things down on my own.  8)

-bcd
--
*** Brian Downing <bdowning at lavos dot net> 
From: David Lichteblau
Subject: Re: String manipulation
Date: 
Message-ID: <87d6foq90a.fsf@neon.rz.fhtw-berlin.de>
Brian Downing <···········@lavos.net> writes:
[...]
>           else do (vector-push-extend char new))
[...]

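;; Pass an extension so the vector roughly doubles each time it grows.
;; The extension must be a positive integer, hence MAX for an empty vector.
(vector-push-extend char new (max 1 (length new)))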
From: Zachary Beane
Subject: Re: String manipulation
Date: 
Message-ID: <slrnbig8ii.c8n.xach@localhost.localdomain>
In article <···············@io.com>, Jim Menard wrote:
> As with any new language I try to learn, I'm quickly overwhelmed not by the
> core language but rather by the libraries.
> 
> I'm trying to write a function that takes a string and returns a new string
> with all XHTML special characters replaced by their XHTML equivalents. For
> example, "Hello, <World>" becomes "Hello, &lt;World&gt;".
> 
> I've read about with-output-to-string and with-input-from-string. Are they
> what I should be using? Here's what I have come up with so far. This
> function should just return a copy of the original string, but it doesn't.
> 
> (defun escape-xhtml (str)
>   (let ((out-str (make-array '(0) :element-type 'base-char
>                              :fill-pointer 0 :adjustable t)))
>     (with-input-from-string (s str)
>       (with-output-to-string (new out-str)
>         (write-char (read-char s) new)))
>     out-str))
> 
>> (escape-xhtml "foo")
>> "f"
> 
> I'm using CLISP, if that matters.

In the code above, you'd need to loop, reading characters from the
stream until you don't have any left to read. If you were going to
stick with the above strategy, it might be something like:

   (defun escape-xhtml (str)
     (with-input-from-string (s str)
       (with-output-to-string (new)
         (loop for char = (read-char s nil nil)
               while char do (write-char char new)))))

If you wanted to stick in actual escaping, it might be something like:

   (defun escape-xhtml (string output-stream)
     (loop for char across string
           if (needs-escaping-p char)
           do (write-string (lookup-entity char) output-stream)
           else do (write-char char output-stream)))
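
NEEDS-ESCAPING-P and LOOKUP-ENTITY are not defined in the code above. One
possible pair of definitions, assuming only the three entities mentioned in
this thread, is sketched here. The stream function is repeated under the
hypothetical name ESCAPE-XHTML-TO-STREAM so the block runs on its own, and
ESCAPE-XHTML-TO-STRING is an assumed convenience wrapper.

```lisp
;; Hypothetical helper: the entity string for CHAR, or NIL if none.
(defun lookup-entity (char)
  (case char
    (#\< "&lt;")
    (#\> "&gt;")
    (#\& "&amp;")
    (otherwise nil)))

;; Hypothetical helper: true when CHAR has an entity replacement.
(defun needs-escaping-p (char)
  (not (null (lookup-entity char))))

(defun escape-xhtml-to-stream (string output-stream)
  (loop for char across string
        if (needs-escaping-p char)
          do (write-string (lookup-entity char) output-stream)
        else
          do (write-char char output-stream)))

;; Collect the stream output into a fresh string.
(defun escape-xhtml-to-string (string)
  (with-output-to-string (out)
    (escape-xhtml-to-stream string out)))
```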

Zach
From: Erann Gat
Subject: Re: String manipulation
Date: 
Message-ID: <gat-3007031323530001@k-137-79-50-101.jpl.nasa.gov>
In article <···················@localhost.localdomain>, ····@xach.com wrote:

> In article <···············@io.com>, Jim Menard wrote:
> > As with any new language I try to learn, I'm quickly overwhelmed not by the
> > core language but rather by the libraries.
> > 
> > I'm trying to write a function that takes a string and returns a new string
> > with all XHTML special characters replaced by their XHTML equivalents. For
> > example, "Hello, <World>" becomes "Hello, &lt;World&gt;".
> > 
> > I've read about with-output-to-string and with-input-from-string. Are they
> > what I should be using? Here's what I have come up with so far. This
> > function should just return a copy of the original string, but it doesn't.
> > 
> > (defun escape-xhtml (str)
> >   (let ((out-str (make-array '(0) :element-type 'base-char
> >                              :fill-pointer 0 :adjustable t)))
> >     (with-input-from-string (s str)
> >       (with-output-to-string (new out-str)
> >         (write-char (read-char s) new)))
> >     out-str))
> > 
> >> (escape-xhtml "foo")
> >> "f"
> > 
> > I'm using CLISP, if that matters.
> 
> In the code above, you'd need to loop, reading characters from the
> stream until you don't have any left to read. If you were going to
> stick with the above strategy, it might be something like:
> 
>    (defun escape-xhtml (str)
>      (with-input-from-string (s str)
>        (with-output-to-string (new)
>          (loop for char = (read-char s nil nil)
>                while char do (write-char char new)))))

If you're going to use loop (which IMO is a good thing to do) you may as
well simply do:

  (loop for c across str ...

and dispense with the WITH-INPUT-FROM-STRING.

If you don't like LOOP you can use (map nil ...) instead.
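
A sketch of the (map nil ...) variant, for comparison. The function name
ESCAPE-XHTML-MAP is hypothetical, and the entity list is assumed to be the
same three characters discussed in this thread.

```lisp
;; Hypothetical sketch: MAP with a result type of NIL calls the function on
;; each character for its side effect and builds no result sequence.
(defun escape-xhtml-map (str)
  (with-output-to-string (out)
    (map nil
         (lambda (c)
           (case c
             (#\< (write-string "&lt;" out))
             (#\> (write-string "&gt;" out))
             (#\& (write-string "&amp;" out))
             (otherwise (write-char c out))))
         str)))
```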

E.
From: Adam Warner
Subject: Re: String manipulation
Date: 
Message-ID: <pan.2003.07.31.00.00.37.316110@consulting.net.nz>
Hi Jim Menard,

> As with any new language I try to learn, I'm quickly overwhelmed not by the
> core language but rather by the libraries.
> 
> I'm trying to write a function that takes a string and returns a new string
> with all XHTML special characters replaced by their XHTML equivalents. For
> example, "Hello, <World>" becomes "Hello, &lt;World&gt;".

(defun escape-xhtml (string)  
  (with-output-to-string (stream)  
    (loop for char across string do  
          (case char  
            (#\& (write-string "&amp;" stream))  
            (#\< (write-string "&lt;" stream))  
            (#\> (write-string "&gt;" stream))  
            (otherwise (write-char char stream)))))) 

[2]> (escape-xhtml "Hello, <World>")
"Hello, &lt;World&gt;"

> I've read about with-output-to-string and with-input-from-string. Are they
> what I should be using?

Just with-output-to-string, as above. Feel free to use the example
unencumbered.

Regards,
Adam