From: Adam Warner
Subject: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.04.21.26.497108@consulting.net.nz>
Hi all,

Say I want to redefine how strings are printed. Why when I add the method
below to the generic function PRINT-OBJECT do strings continue to print as
normal?

(defmethod print-object ((object string) stream)
  (funcall #'undefined object stream))

(make-string 10 :initial-element #\a) prints "aaaaaaaaaa" instead of
signalling an error that the function UNDEFINED is undefined.

It works with user-defined classes:

(defclass user-defined () ())
(defmethod print-object ((object user-defined) stream)
  (funcall #'undefined object stream))

(make-instance 'user-defined) breaks into the debugger as expected (the
function UNDEFINED is undefined).

Thanks,
Adam

From: Christopher C. Stacy
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <u3c8nok0x.fsf@news.dtpq.com>
>>>>> On Fri, 05 Mar 2004 17:21:29 +1300, Adam Warner ("Adam") writes:
 Adam> Say I want to redefine how strings are printed.

In general, you're not allowed to redefine the Common Lisp runtime system.
Defining a PRINT-OBJECT method on a built-in class like STRING would be
in the same vein as redefining LISP:CAR.  Lisp might continue to work,
might do what you expected, or (as in your experiment with CMUCL) might
sometimes do sort of what you expected, or might just crash.

CLHS says:
  "Users may write methods for print-object for their own classes
   if they do not wish to inherit an implementation-dependent method."

It also says that the Lisp implementation has to define enough methods
so that somehow, everything will print (not necessarily readably).
But notice that it doesn't anywhere give you license to redefine
PRINT-OBJECT methods on system classes - just your own classes.
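The sanctioned route is to hang the behaviour on a class of your own. A
minimal sketch (UTF8-STRING is a made-up name for illustration, not
anything from the spec):

```lisp
;; Instead of specializing PRINT-OBJECT on STRING, wrap the string in
;; a class of your own and specialize on that:
(defclass utf8-string ()
  ((value :initarg :value :reader utf8-string-value)))

(defmethod print-object ((object utf8-string) stream)
  ;; Full control over the output here.
  (format stream "#<UTF8 ~S>" (utf8-string-value object)))

;; (print (make-instance 'utf8-string :value "abc"))
;; prints #<UTF8 "abc">
```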
From: Pascal Bourguignon
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <878yifafsb.fsf@thalassa.informatimago.com>
······@news.dtpq.com (Christopher C. Stacy) writes:

> >>>>> On Fri, 05 Mar 2004 17:21:29 +1300, Adam Warner ("Adam") writes:
>  Adam> Say I want to redefine how strings are printed.
> 
> In general, you're not allowed to redefine the Common Lisp runtime system.
> Defining a PRINT-OBJECT method on a built-in class like STRING would be
> in the same vein as redefining LISP:CAR.  Lisp might continue to work,
> might do what you expected, or (as in your experiment with CMUCL) might
> sometimes do sort of what you expected, or might just crash.
> 
> CLHS says:
>   "Users may write methods for print-object for their own classes
>    if they do not wish to inherit an implementation-dependent method."
> 
> It also says that the Lisp implementation has to define enough methods
> so that somehow, everything will print (not necessarily readably).
> But notice that it doesn't anywhere give you license to redefine
> PRINT-OBJECT methods on system classes - just your own classes.

Yep, you would have to have 042, license to redefine COMMON-LISP symbols.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.08.04.29.362813@consulting.net.nz>
Hi Christopher C. Stacy,

>>>>>> On Fri, 05 Mar 2004 17:21:29 +1300, Adam Warner ("Adam") writes:
>  Adam> Say I want to redefine how strings are printed.
> 
> In general, you're not allowed to redefine the Common Lisp runtime
> system. Defining a PRINT-OBJECT method on a built-in class like STRING
> would be in the same vein as redefining LISP:CAR.  Lisp might continue
> to work, might do what you expected, or (as in your experiment with
> CMUCL) might sometimes do sort of what you expected, or might just
> crash.
> 
> CLHS says:
>   "Users may write methods for print-object for their own classes
>    if they do not wish to inherit an implementation-dependent method."

Thanks for explaining the issue Christopher (I initially misunderstood the
sentence by skipping over "their own"). I was hoping to transparently
print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by redefining the
string printer. I can easily do the subset of UTF-8 to ISO-8859-1 at input
by redefining the double-quote macro character. But there appears to be no
defined way to alter the string printer's output short of hacking each
implementation's source.
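Roughly, the idea on the input side looks like this (a sketch only, not
my actual code: it assumes an 8-bit Lisp whose CHAR-CODE covers 0-255,
handles only two-byte UTF-8 sequences, and does no error checking):

```lisp
;; Double-quote reader macro that folds two-byte UTF-8 sequences for
;; code points 128-255 into single ISO-8859-1 characters.
(defun utf8-double-quote-reader (stream char)
  (declare (ignore char))
  (with-output-to-string (out)
    (loop for c = (read-char stream t nil t)
          until (char= c #\")
          do (cond ((char= c #\\)          ; keep escaped chars verbatim
                    (write-char (read-char stream t nil t) out))
                   ((<= #xC2 (char-code c) #xC3)
                    ;; Lead byte of a two-byte sequence: combine it
                    ;; with the continuation byte into one 8-bit code.
                    (let ((c2 (read-char stream t nil t)))
                      (write-char
                       (code-char
                        (logior (ash (logand (char-code c) #x1F) 6)
                                (logand (char-code c2) #x3F)))
                       out)))
                   (t (write-char c out))))))

(set-macro-character #\" #'utf8-double-quote-reader)
```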

Regards,
Adam
From: Pascal Bourguignon
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <87znav9099.fsf@thalassa.informatimago.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Christopher C. Stacy,
> 
> >>>>>> On Fri, 05 Mar 2004 17:21:29 +1300, Adam Warner ("Adam") writes:
> >  Adam> Say I want to redefine how strings are printed.
> > 
> > In general, you're not allowed to redefine the Common Lisp runtime
> > system. Defining a PRINT-OBJECT method on a built-in class like STRING
> > would be in the same vein as redefining LISP:CAR.  Lisp might continue
> > to work, might do what you expected, or (as in your experiment with
> > CMUCL) might sometimes do sort of what you expected, or might just
> > crash.
> > 
> > CLHS says:
> >   "Users may write methods for print-object for their own classes
> >    if they do not wish to inherit an implementation-dependent method."
> 
> Thanks for explaining the issue Christopher (I initially misunderstood the
> sentence by skipping over "their own"). I was hoping to transparently
> print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by redefining the
> string printer. I can easily do the subset of UTF-8 to ISO-8859-1 at input
> by redefining the double-quote macro character. But there appears to be no
> defined way to alter the string printer's output short of hacking each
> implementation's source.

However, I think this kind of thing should be easier to do in cmucl or
sbcl than, for example, in clisp. Since a lot of clisp is written in C,
it's harder to modify from Lisp. But that's not the case with
cmucl/sbcl. You should be able to modify the COMMON-LISP package (or
define a UNICODE-COMMON-LISP package if you want to keep the old
strings around), where you'll change all the external _and_ internal
functions that need patching to handle unicode. This would include
the reader and the printer functions.

It would be nice to have unicode support in cmucl & sbcl too.


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.08.47.35.504734@consulting.net.nz>
Hi Pascal Bourguignon,

>> Thanks for explaining the issue Christopher (I initially misunderstood
>> the sentence by skipping over "their own"). I was hoping to
>> transparently print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by
>> redefining the string printer. I can easily do the subset of UTF-8 to
>> ISO-8859-1 at input by redefining the double-quote macro character. But
>> there appears to be no defined way to alter the string printer's output
>> short of hacking each implementation's source.
> 
> However, I think this kind of thing should be easier to do in cmucl or
> sbcl than, for example, in clisp. Since a lot of clisp is written in C,
> it's harder to modify from Lisp. But that's not the case with
> cmucl/sbcl. You should be able to modify the COMMON-LISP package (or
> define a UNICODE-COMMON-LISP package if you want to keep the old
> strings around), where you'll change all the external _and_ internal
> functions that need patching to handle unicode. This would include
> the reader and the printer functions.

I've already been working on this, representing Unicode strings as vectors
of pointers to Unicode character structures. Because of this limitation I
cannot control the printing of built-in vectors solely containing Unicode
character objects. The only way I can control it is by creating a Unicode
string _structure_ and the second I do that I have to rewrite all the
sequence-related functions, including LOOP :-( [e.g. I could no longer
write (loop for unicode-char across the-unicode-string do ...)].

Note that if I was able to redefine the string printer it should have
worked transparently as ASCII strings would have printed identically and
CMUCL and SBCL internals probably only depend upon ASCII strings (if the
internals use strings of characters for (unsigned-byte 8) binary
transportation then I'd be in trouble).

Perhaps I should refine the question as to whether there is an
implementation-defined way to change the printing of built-in classes
within CMUCL and SBCL.

Regards,
Adam
From: Marco Baringer
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <m2hdx3ejoa.fsf@bese.it>
Adam Warner <······@consulting.net.nz> writes:

> I've already been working on this, representing Unicode strings as vectors
> of pointers to Unicode character structures. Because of this limitation I
> cannot control the printing of built-in vectors solely containing Unicode
> character objects. The only way I can control it is by creating a Unicode
> string _structure_ and the second I do that I have to rewrite all the
> sequence-related functions, including LOOP :-( [e.g. I could no longer
> write (loop for unicode-char across the-unicode-string do ...)].

i haven't given enough thought to this, but wouldn't it be enough to
create, and provide operators for, unicode code points? A unicode
string would then just be a regular vector of unicode points, which
you could manipulate as you wish. 

Another issue here is that more often than not your unicode strings
will have two different formats: one for internal processing and
another for printing and writing (and reading).

-- 
-Marco
Ring the bells that still can ring.
Forget your perfect offering.
There is a crack in everything.
That's how the light gets in.
     -Leonard Cohen
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.10.02.38.845153@consulting.net.nz>
Hi Marco Baringer,

>> I've already been working on this, representing Unicode strings as vectors
>> of pointers to Unicode character structures. Because of this limitation I
>> cannot control the printing of built-in vectors solely containing Unicode
>> character objects. The only way I can control it is by creating a Unicode
>> string _structure_ and the second I do that I have to rewrite all the
>> sequence-related functions, including LOOP :-( [e.g. I could no longer
>> write (loop for unicode-char across the-unicode-string do ...)].
> 
> i haven't given enough thought to this, but wouldn't it be enough to
> create, and provide operators for, unicode code points? A unicode
> string would then just be a regular vector of unicode points, which
> you could manipulate as you wish.

It would not be transparent. But it would work. Say you encode the string
"abc" as a vector of fixnums. Then the REPL would return #(97 98 99)
instead of a string. So you would have to add explicit code to print the
string instead of a vector. I think the best notation would be #"..."
for Unicode strings compared to "..." for standard strings. CLISP has
put a spanner in this notation by reserving #" as a dispatching reader
macro for pathnames (yes, CLISP developers and users can write #"..."
instead of #P"..." for pathnames).

The other drawback of this approach is no type checking between fixnums
and Unicode characters. Unicode character structures fix this issue
with the overhead of type T vectors (i.e. pointers to structures instead
of plain fixnums).

> Another issue here is that more often than not your unicode strings
> will have two different formats: one for internal processing and
> another for printing and writing (and reading).

The ANSI Common Lisp standard makes strings a subclass of vector. This
somewhat precludes the internal storage of variable-width character
strings with list-like access properties (e.g. UTF-8). So it's easiest to
store Unicode as a vector of constant-sized integers internally and
translate to UTF-8 to talk to the rest of the world.
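The boundary translation itself is simple; a sketch of the code-point-
to-UTF-8 step (function names are mine, and there is no surrogate or
range checking):

```lisp
;; Translate one code point (up to #x10FFFF) into its UTF-8 octets.
(defun code-point-to-utf8 (cp)
  (cond ((< cp #x80)    (list cp))
        ((< cp #x800)   (list (logior #xC0 (ash cp -6))
                              (logior #x80 (logand cp #x3F))))
        ((< cp #x10000) (list (logior #xE0 (ash cp -12))
                              (logior #x80 (logand (ash cp -6) #x3F))
                              (logior #x80 (logand cp #x3F))))
        (t              (list (logior #xF0 (ash cp -18))
                              (logior #x80 (logand (ash cp -12) #x3F))
                              (logior #x80 (logand (ash cp -6) #x3F))
                              (logior #x80 (logand cp #x3F))))))

;; Translate an internal vector of code points at the output boundary.
(defun ustring-to-octets (code-points)
  (coerce (mapcan #'code-point-to-utf8 (coerce code-points 'list))
          '(vector (unsigned-byte 8))))

;; (ustring-to-utf8 is hypothetical; e.g. code point 233 encodes as
;; the two octets #xC3 #xA9.)
```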

Regards,
Adam
From: Ray Dillinger
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <40525975.A24CF969@sonic.net>
Marco Baringer wrote:
> 
> Adam Warner <······@consulting.net.nz> writes:
> 
> > I've already been working on this, representing Unicode strings as vectors
> > of pointers to Unicode character structures. Because of this limitation I
> > cannot control the printing of built-in vectors solely containing Unicode
> > character objects. The only way I can control it is by creating a Unicode
> > string _structure_ and the second I do that I have to rewrite all the
> > sequence-related functions, including LOOP :-( [e.g. I could no longer
> > write (loop for unicode-char across the-unicode-string do ...)].
> 
> i haven't given enough thought to this, but wouldn't it be enough to
> create, and provide operators for, unicode code points? A unicode
> string would then just be a regular vector of unicode points, which
> you could manipulate as you wish.

It depends on what character set you're using unicode to implement.  

If the OP is, like me, going the route where a "character" is a 
unicode base character codepoint plus a nondefective sequence of 
combining characters, he needs to be able to box his characters as 
some kind of structure because he doesn't necessarily know how long 
each character is going to be.  

I looked at it and it's just incredibly clear that Unicode is a 
representation method, not a character set.  The character set 
isn't the (finite) set of unicode codepoints. There are two 
character sets implemented in Unicode, which are the characters 
expressible as NFKC and NFC.  Those are both infinite sets, although
charset:NFKC is a proper subset of charset:NFC.

Beyond that, you have a set of "pseudocharacters" which are not valid 
sequences of NFKC or NFC.  Pseudocharacters don't start with a base 
codepoint, or don't have their combining codepoints in a valid 
order, etc. 

				Bear
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.13.22.00.51.770901@knm.org.pl>
On Sat, 13 Mar 2004 00:44:17 +0000, Ray Dillinger wrote:

> If the OP is, like me, going the route where a "character" is a 
> unicode base character codepoint plus nondefective sequence of 
> combining characters, he needs to be able to box his characters as 
> some kind of structure because he doesn't necessarily know how long 
> each character is going to be.

It's a controversial viewpoint. I haven't seen it in use anywhere and
somehow doubt it will work well.

How would you represent strings: by a vector of those compound characters
or by a vector of code points?

In the first case characters in strings should be tagged like standalone
objects. It would complicate processing like string comparison & hashing,
I/O, encoding to streams of bytes, interfacing with other languages etc.

In the second case string indexing is no longer O(1), and mutating a
string in the middle may require unobvious shifting of its contents.

(IMHO in dynamically typed languages a separate type for characters
is mostly a historical artifact and efficiency hack. Characters can
be represented as strings of length 1, or if we mean a base character
with combining marks - as strings of a larger length. Since conceptual
characters are no longer restricted to a single string cell, strings
can use the convenient representation of code points without limiting
conceptual characters. Python, Perl, Ruby and Icon represent characters
as strings.)

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Ray Dillinger
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <4058A270.68F07CDC@sonic.net>
Marcin 'Qrczak' Kowalczyk wrote:
> 
> On Sat, 13 Mar 2004 00:44:17 +0000, Ray Dillinger wrote:
> 
> > If the OP is, like me, going the route where a "character" is a
> > unicode base character codepoint plus nondefective sequence of
> > combining characters, he needs to be able to box his characters as
> > some kind of structure because he doesn't necessarily know how long
> > each character is going to be.
> 
> It's a controversial viewpoint. I haven't seen it in use anywhere and
> somehow doubt it will work well.
> 
> How would you represent strings: by a vector of those compound characters
> or by a vector of code points?

By a vector of characters.  Most of the characters are single 
codepoints which fit in "immediate" values; occasionally one is 
a pointer, which points to a boxed "long" character.  This is 
basically the same way vectors of integers are handled except 
the typetags on the immediate values are different. 

> In the first case characters in strings should be tagged like standalone
> objects. It would complicate processing like string comparison & hashing,
> I/O, encoding to streams of bytes, interfacing with other languages etc.

String comparison is still easy; char=? characters are eq? (have the 
same address), which means I don't have to look at the boxed characters.  
I compare pointers with the same machine code that compares immediate 
characters.  Hashing got a little harder, but not really complicated; 
If a boxed character is collected and then re-allocated because it 
results from later computation, it may get a different address.  So to 
get reliable hashcodes persistent across string and character lifetimes, 
I have to look inside the boxed characters and get their values. 

I/O did get somewhat complicated; ports now have types attached.  There 
are character ports (which specify an encoding and character set) and 
there are binary ports (which allow reading/writing raw bits).  There 
are also conversions that use a port type to convert a string to a byte 
vector or codepoint vector, or vice versa; but codepoint vectors and 
byte vectors are vectors of bit fields, not vectors of characters. 

My take on things was that having the programmer sweat about the 
representation of individual graphemes on the level of codepoints 
is something like having the user sweat about the representation 
of big integers on the level of words; something that should be 
accessible through declarations, but not something that should be 
the "default" method of dealing with them.  If somebody wants a 
guarantee that a string has no boxed characters, he can declare it 
to be a string of charset:NFC-1 or even a string of charset:ascii - 
and then it's an error to assign any character in that string a 
value that's out of range for the declaration. 

Sigh, not that I've implemented these declarations yet.  :-(
 
> (IMHO in dynamically typed languages a separate type for characters
> is mostly a historical artifact and efficiency hack. Characters can
> be represented as strings of length 1, or if we mean a base character
> with combining marks - as strings of a larger length. Since conceptual
> characters are no longer restricted to a single string cell, strings
> can use the convenient representation of code points without limiting
> conceptual characters. Python, Perl, Ruby and Icon represent characters
> as strings.)

There's a good case for it.  I already have a type "text" 
which is a supertype of both character and string.  The 
more I work with this string-handling system, the more it 
seems to me like the distinction between characters and 
length-one strings is pointless.

I started out implementing a scheme.  But now I'm way *way* 
off the path to any kind of standard.  Scheme's macrology was 
too limited, so I dumped it and implemented something different.
Now I have macros which are first-class values at runtime, and 
changing a macro may trigger recompilation of routines during 
the run.  This is completely unlike CL's macrology, too.  

Once I got okay with the idea that I was implementing some lisp
that wasn't a scheme, I dumped case-insensitive identifiers too,
because I no longer cared about legacy code, it increased the 
namespace, and case identities in Unicode are a bugger.  Next 
came types and declarations because I wanted something that could
produce fast machine code.   That got me looking at CL, but I 
decided that CL had a lot of redundant stuff;  If your symbols 
have property lists for example, and your code is arbitrary 
list structure which may be cyclic in spots, then, if you 
implement the property lists as proper namespaces with their 
own hashtables, the structs and the package and module system 
are strictly redundant.  You can replace them completely with 
a few more operations on property lists.  Also I didn't need 
or want the separate function and data namespaces, and without 
them a lot of operations that are part of CL just aren't needed. 

Then I started looking for ways to make it more succinct and 
decided that many of the basic, most frequently used functions 
can just be implicit and nameless -- open-paren, arguments, 
close paren, and you can tell which function to call by the 
argument types.  

So, it's getting weird...  It's still a lisp, but it's less 
and less like any mainstream lisp.

				Bear
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.09.53.31.154826@knm.org.pl>
On Fri, 05 Mar 2004 21:47:39 +1300, Adam Warner wrote:

> The only way I can control it is by creating a Unicode
> string _structure_ and the second I do that I have to rewrite all the
> sequence-related functions, including LOOP :-( [e.g. I could no longer
> write (loop for unicode-char across the-unicode-string do ...)].

Why doesn't CL have user defined sequences (which work with loop, map etc.,
probably described by generic functions)?

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Jacek Generowicz
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <tyfk71z7c1o.fsf@pcepsft001.cern.ch>
Marcin 'Qrczak' Kowalczyk <······@knm.org.pl> writes:

> Why doesn't CL have user defined sequences (which work with loop, map etc.,
> probably described by generic functions)?

A while ago (about a year, is my guess) this topic was discussed in
c.l.l, and (if my memory isn't playing nasty tricks of fantasy with
me) someone posted an approach which allowed you to make your own
arbitrary sequences play nicely with loop. I was trying to find this
again a couple of weeks ago, but failed. Anyone else remember this? 
(Or was I dreaming?)
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.14.03.43.177863@consulting.net.nz>
Hi Jacek Generowicz,

> Marcin 'Qrczak' Kowalczyk writes:
> 
>> Why doesn't CL have user defined sequences (which work with loop, map etc.,
>> probably described by generic functions)?
> 
> A while ago (about a year, is my guess) this topic was discussed in
> c.l.l, and (if my memory isn't playing nasty tricks of fantasy with
> me) someone posted an approach which allowed you to make your own
> arbitrary sequences play nicely with loop. I was trying to find this
> again a couple of weeks ago, but failed. Anyone else remember this? 
> (Or was I dreaming?)

I've never heard of it. But I've just developed a rather neat solution.
It should be acceptable as is, since LOOP has no built-in way to build
vectors the way COLLECT builds lists. I've included the minimum setup
below.

(defpackage #:ucl
  (:use #:cl)
  (:shadow #:loop))

(in-package ucl)

(defstruct (ustring)
  (value nil :type t))

(defmacro loop (&rest args)
  (let (code wrap (temp (gensym "TEMP")))
    (dolist (item args)
      (cond (wrap (push `(let ((,temp ,item))
			  (if (ustring-p ,temp)
			      (ustring-value ,temp)
			      ,temp)) code)
		  (setf wrap nil))
	    (t (push item code)))
      (when (or (eq item 'across) (eq item :across))
	(setf wrap t)))
    `(cl:loop ,@(nreverse code))))

Now in the package UCL LOOP can also operate upon USTRING structures, e.g.:

(loop for char across (make-ustring :value "abcdef") collect char)
=> (#\a #\b #\c #\d #\e #\f)

Regards,
Adam
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.10.27.40.864866@consulting.net.nz>
Hi Marcin 'Qrczak' Kowalczyk,

>> The only way I can control it is by creating a Unicode string
>> _structure_ and the second I do that I have to rewrite all the
>> sequence-related functions, including LOOP :-( [e.g. I could no longer
>> write (loop for unicode-char across the-unicode-string do ...)].
> 
> Why doesn't CL have user defined sequences (which work with loop, map
> etc., probably described by generic functions)?

I was speaking to a Ruby programmer the other day and he remarked that my
concerns about the extra overhead of its generic dispatch made me sound
like a C programmer.

It made me realise that there may soon come a day when we have to rethink
many of the non-generic elements of Common Lisp that exist solely for
higher speed. But at least the rest of the world is blind to Lisp's
infinitely superior syntax and statement/expression non-distinction. I'd
be scared of a Ruby with more Lisp-like syntax, semantics and macros.

Regards,
Adam
From: Paul Dietz
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <4048A880.7F42B44C@motorola.com>
Adam Warner wrote:

> > Why doesn't CL have user defined sequences (which work with loop, map
> > etc., probably described by generic functions)?
> 
> I was speaking to a Ruby programmer the other day and he remarked that my
> concerns about the extra overhead of its generic dispatch made me sound
> like a C programmer.
> 
> It made me realise that there may soon come a day when we have to rethink
> many of the non-generic elements of Common Lisp that exist solely for
> higher speed.

It's not obvious to me that there would be a speed penalty.
In CL, it is not conforming for a program to redefine the behavior
that standardized generic functions have on standardized classes
(specifically, the user can't define a method which is applicable
when all arguments are direct instances of standardized classes.)

This means the compiler, when it knows the types of the
arguments, can compile to a call to a nongeneric function
implementing the builtin method, or even inline that function.
This is just like what compilers do now on some sequence
functions.
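For instance (a sketch; the exact code generated is implementation-
dependent):

```lisp
;; With a declared type the compiler needs no dispatch at all and can
;; open-code the access; the same escape hatch would remain available
;; if the sequence functions were made user-extensible generics.
(defun first-char (s)
  (declare (type simple-string s)
           (optimize speed))
  (schar s 0))  ; compiles to a direct array access
```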

As for the general case, the sequence functions already have
to do the equivalent of finding the applicable method at runtime.
If user extensions are added, this just means the last 'signal
an error' case gets converted to a lookup of the user-defined method.

BTW, a conforming lisp implementation may make any standardized
function be a generic function.

	Paul
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.08.57.58.982962@consulting.net.nz>
I wrote:

> Perhaps I should refine the question as to whether there is an
> implementation-defined way to change the printing of built-in classes
> within CMUCL and SBCL.

There isn't. Here's the CMUCL code that prints strings (see
src/code/print.lisp):

(defun output-vector (vector stream)
  (declare (vector vector))
  (cond ((stringp vector)
         (cond ((or *print-escape* *print-readably*)
                (write-char #\" stream)
                (quote-string vector stream)
                (write-char #\" stream))
               (t
                (write-string vector stream))))
              ...
              ...

(defun quote-string (string stream)
  (macrolet ((frob (char)
               ;; Probably should look at readtable, but just do this for now.
               `(or (char= ,char #\\)
                    (char= ,char #\"))))
    (with-array-data ((data string) (start) (end (length string)))
      (do ((index start (1+ index)))
          ((>= index end))
        (let ((char (schar data index)))
          (when (frob char) (write-char #\\ stream))
          (write-char char stream))))))

Regards,
Adam
From: Tim Bradshaw
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <ey38yifeav5.fsf@cley.com>
* Adam Warner wrote:

> Thanks for explaining the issue Christopher (I initially misunderstood the
> sentence by skipping over "their own"). I was hoping to transparently
> print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by redefining the
> string printer. I can easily do the subset of UTF-8 to ISO-8859-1 at input
> by redefining the double-quote macro character. But there appears to be no
> defined way to alter the string printer's output short of hacking each
> implementation's source.

Why is this a property of how strings are printed?  Surely the
encoding of a character set is a property of the stream, and how
strings get printed should just follow that: the printer just sends
characters to whatever stream, and the stream deals with encoding
issues.

--tim
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.13.25.09.398911@consulting.net.nz>
Hi Tim Bradshaw,

> * Adam Warner wrote:
> 
>> Thanks for explaining the issue Christopher (I initially misunderstood
>> the sentence by skipping over "their own"). I was hoping to
>> transparently print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by
>> redefining the string printer. I can easily do the subset of UTF-8 to
>> ISO-8859-1 at input by redefining the double-quote macro character. But
>> there appears to be no defined way to alter the string printer's output
>> short of hacking each implementation's source.
> 
> Why is this a property of how strings are printed?  Surely the encoding
> of a character set is a property of the stream, and how strings get
> printed should just follow that: the printer just sends characters to
> whatever stream, and the stream deals with encoding issues.

If you have Lisp source code with UTF-8 encoded strings the semantics of a
Lisp character will still be messed up if 8 bit Unicode character _codes_
are read into an 8 bit Lisp. Unicode character codes between 128 and 255
are represented as two bytes in UTF-8 encoding. Without translation these
bytes would _semantically appear to be two Lisp characters_.

So to map Unicode character codes 0-255 between UTF-8 and ISO-8859-1
semantics a Lisp would have to translate some of the bytes in the input
stream.

The same 8 bit Lisp would also have to translate its 8 bit character codes
into a UTF-8 byte stream for output. This is a property of how the strings
are printed because 8 bit character codes encoded as single bytes
(ISO-8859-1) aren't valid UTF-8 encoding (only the lower 7 bits are).

So you can either store 8 bit character codes in UTF-8 encoded strings
and lose character semantics or store 8 bit character codes as ISO-8859-1
strings and perform explicit input and output translation to a UTF-8
encoded world.
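A concrete illustration, using character code 233 (e-acute):

```lisp
;; Character code 233 shows the mismatch:
;;   ISO-8859-1: one octet,  #xE9
;;   UTF-8:      two octets, #xC3 #xA9
;; Read back without translation, the two UTF-8 octets semantically
;; appear to be two ISO-8859-1 characters:
(map 'string #'code-char '(#xC3 #xA9))  ; a 2-character string
```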

Regards,
Adam
From: Tim Bradshaw
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <ey34qt3e5jv.fsf@cley.com>
* Adam Warner wrote:

> If you have Lisp source code with UTF-8 encoded strings the semantics of a
> Lisp character will still be messed up if 8 bit Unicode character _codes_
> are read into an 8 bit Lisp. Unicode character codes between 128 and 255
> are represented as two bytes in UTF-8 encoding. Without translation these
> bytes would _semantically appear to be two Lisp characters_.

I'm confused by what you want.  Are you proposing to have strings *in
the image* with UTF-8 encoding, or files containing a UTF-8-encoded
representation of some Lisp source code?

If the former, then I think this is mad.  Indexing into such a string
is some horrible problem which involves walking down the string rather
than just computing an offset based on the index and the size of a
character in bytes.

If the latter, then surely the encoding/decoding happens at the point
where characters are read from the stream (and applies equally to all
characters - for instance Lisp symbols can have arbitrary strings as
their names).

I guess I'm kind of worried by all this talk of UTF-8 or ISO-8859-1
*strings*.  Strings are arrays of characters, not arrays of encodings
of characters (which would likely be arrays of octets).

--tim (not an expert on unicode)
From: Peter Seibel
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <m37jxzgrmn.fsf@javamonkey.com>
Adam Warner <······@consulting.net.nz> writes:

> Hi Christopher C. Stacy,
>
>>>>>>> On Fri, 05 Mar 2004 17:21:29 +1300, Adam Warner ("Adam") writes:
>>  Adam> Say I want to redefine how strings are printed.
>> 
>> In general, you're not allowed to redefine the Common Lisp runtime
>> system. Defining a PRINT-OBJECT method on a built-in class like STRING
>> would be in the same vein as redefining LISP:CAR.  Lisp might continue
>> to work, might do what you expected, or (as in your experiment with
>> CMUCL) might sometimes do sort of what you expected, or might just
>> crash.
>> 
>> CLHS says:
>>   "Users may write methods for print-object for their own classes
>>    if they do not wish to inherit an implementation-dependent method."
>
> Thanks for explaining the issue Christopher (I initially
> misunderstood the sentence by skipping over "their own"). I was
> hoping to transparently print CMUCL and SBCL ISO-8859-1 8-bit
> strings as UTF-8 by redefining the string printer. I can easily do
> the subset of UTF-8 to ISO-8859-1 at input by redefining the
> double-quote macro character. But there appears to be no defined way
> to alter the string printer's output short of hacking each
> implementation's source.

Depending on what exactly you're trying to do, you may be able to use
the pretty printer since as far as I can tell there's no prohibition
against creating your own *print-pprint-dispatch* table with custom
dispatch functions for built-in types. For instance, here's an example
of using the pretty printer to output strings in something like Java's
unicode escape syntax.

  CL-USER> (flet ((output-encoded (stream char)
                    (if (<= 0 (char-code char) 255)
                        (format stream "~c" char)
                        (format stream "\\u~4,'0d" (char-code char)))))
             (let ((*print-pprint-dispatch* (copy-pprint-dispatch)))
               (set-pprint-dispatch
                'string
                #'(lambda (stream string)
                    (let ((*print-pretty* nil))
                      (format stream "#u\"")
                      (loop for char across string
                            do (output-encoded stream char))
                      (format stream "\""))))
               (pprint (format nil "foo~cbar" (code-char 300)))))

  #u"foo\u0300bar"
  ; No value
  CL-USER> 

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Adam Warner
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <pan.2004.03.05.22.45.35.483699@consulting.net.nz>
Hi Peter Seibel,

> Depending on what exactly you're trying to do, you may be able to use
> the pretty printer since as far as I can tell there's no prohibition
> against creating your own *print-pprint-dispatch* table with custom
> dispatch functions for built-in types.

This is valuable information thanks Peter! And yes it would solve the
problem of printing the built-in 8 bit strings as another encoding.

Tim, here's an example of the issue. Like the rest of the Linux/Unix world
I am standardising upon UTF-8 encoding. In a traditional 8 bit string Lisp
if I evaluate (string (code-char 240)) the string does not print correctly
since I fed garbage to a UTF-8 stream.

If I go over to CLISP and evaluate the same form "ð" is returned. If I
paste this string into CMUCL or SBCL I will find it has a length of 2. The
semantics of the character have been broken (it appears like two
characters to CMUCL and SBCL when it is the single character with
character code 240). These semantics can be fixed by modifying the
double-quote reader macro so whenever these two bytes are seen the single
byte 240 is fed to the Lisp.

In the other direction I can now create a custom pretty printer dispatch
for 8 bit strings. Then whenever a character code from 128 to 255 is about
to be printed the UTF-8 sequence of dual bytes will be printed instead.
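
A minimal sketch of the input half, assuming the only multi-byte sequences
of interest are the two-byte ones for code points 128-255 (lead bytes #xC2
and #xC3), and ignoring backslash escape handling for brevity:

  (defun read-utf-8-string (stream quote-char)
    "Hypothetical double-quote reader macro: folds two-byte UTF-8
  sequences for code points 128-255 into single 8 bit characters."
    (declare (ignore quote-char))
    (with-output-to-string (out)
      (loop for c = (read-char stream t nil t)
            until (char= c #\")
            do (write-char
                (if (<= #xC2 (char-code c) #xC3)   ; UTF-8 lead byte
                    (let ((next (read-char stream t nil t)))
                      (code-char
                       (logior (ash (logand (char-code c) #b11) 6)
                               (logand (char-code next) #b111111))))
                    c)
                out))))

  ;; (set-macro-character #\" #'read-utf-8-string)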

Regards,
Adam
From: Tim Bradshaw
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <ey3ishgsue7.fsf@cley.com>
* Adam Warner wrote:

> If I go over to CLISP and evaluate the same form "ð" is returned. If I
> paste this string into CMUCL or SBCL I will find it has a length of 2. The
> semantics of the character have been broken (it appears like two
> characters to CMUCL and SBCL when it is the single character with
> character code 240). These semantics can be fixed by modifying the
> double-quote reader macro so whenever these two bytes are seen the single
> byte 240 is fed to the Lisp.

I started to write a long reply to this, but I don't have time.
If only Erik was still here he'd put it much better than I can.

To see what is wrong with your proposed `solution' and how you've
misunderstood what the problem is, consider what happens if you have
your clever string-reader.  What happens, for instance, for symbols?
What happens if I write a parser for my own little language which
expects to be able to read characters from a stream?  What is the fix
for this?

--tim
From: Steven M. Haflich
Subject: Re: Redefining how a standard object prints
Date: 
Message-ID: <bWv2c.34990$6U1.2168@newssvr25.news.prodigy.com>
Adam Warner wrote:

> Thanks for explaining the issue Christopher (I initially misunderstood the
> sentence by skipping over "their own"). I was hoping to transparently
> print CMUCL and SBCL ISO-8859-1 8-bit strings as UTF-8 by redefining the
> string printer. I can easily do the subset of UTF-8 to ISO-8859-1 at input
> by redefining the double-quote macro character. But there appears to be no
> defined way to alter the string printer's output short of hacking each
> implementation's source.

The implication that you are not allowed to define print-object methods on
system classes is simply incorrect.

You _are_ allowed to define a print-object method provided one of the
two required arguments to the gf is not a system class.  If the stream
argument is of your own specialized stream, then the definition is
allowed with some serious caveats:

  - There is no way in the portable language to define your own
specialized stream class.  But it is possible in some implementations.
  - Argument precedence and method sorting might still prevent your method
from being called.  In particular, you might define a method on string,
while the implementation might have different methods on subclasses of
string.
  - As mentioned in the ANS, it is generally not a good idea to specialize
print-object methods on the stream argument because some printer operations
may use encapsulating streams (or whatever) and the stream to which an
object is printed might not be the stream passed to the outermost print
function, e.g. print, write, format, or prin1.  This is most likely to be
a problem when pretty printing.

So the suggestion elsewhere, that you can customize printer syntax with the
pretty printer using a custom pprint-dispatch table, is a good one.
print-object methods are global and not under localized program control,
but a pprint-dispatch table can be passed around and
*print-pprint-dispatch* rebound at will.

But you should also consider whether you want to use the printer at all.
Programmers often use the reader or printer for specialized I/O operations
that have little to do with what the reader and printer do, which is
process Lisp syntax.  You probably should be using the printer if you
are printing Lisp data of which strings are only one component.  But if you
are outputting textual data to make some sort of report with no other Lisp
syntax, there is little reason to use the printer at all.
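
For that last case a plain stream operation suffices. A sketch, assuming
the report goes to a file:

  ;; Writing textual data directly, without printer escaping:
  ;; WRITE-STRING emits the characters as-is, whereas PRIN1 would add
  ;; the surrounding double quotes and escape sequences.
  (with-open-file (out "report.txt" :direction :output
                                    :if-exists :supersede)
    (write-string "no quoting, no escaping" out)
    (terpri out))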