From: Duncan Harvey
Subject: Computing the most specialised type representing a union of types
Date: 
Message-ID: <1ew2zeo.1atrs664nwaygN%spam@hubris2.demon.co.uk>
I *think* the simplest method of generating a type specification
representing the union of some types is to place all those type
specifications into a list that has OR as its car, e.g.

  (union-type 'fixnum 'base-char 'trout)
    => (OR FIXNUM BASE-CHAR TROUT)

This could be used as in the following (simplified) example code:

(defun most-specialised-union-type (list-of-type-specs)
  (case (length list-of-type-specs)
    ((0) 't)
    ((1) (first list-of-type-specs))
    (otherwise (list* 'or list-of-type-specs))))

(defun join-strings (list-of-strings)
  "A safe version of (apply #'concatenate 'string list-of-strings)."
  (let ((result (make-string (reduce #'+ (mapcar #'length list-of-strings))
                             :element-type (most-specialised-union-type 
                                            (mapcar #'array-element-type
                                                    list-of-strings))))
        (insertion-point 0))
    (dolist (string list-of-strings result)
      (setf result (replace result string :start1 insertion-point))
      (incf insertion-point (length string)))))

As can be seen from the following transcript, JOIN-STRINGS will now provide
the 'thinnest' string-type that can accommodate all the characters provided
in the arguments.  In an implementation that provides, say, 8-bit BASE-CHARs
and 16-bit EXTENDED-CHARs this could conserve memory.

> (type-of "Hello World!")
(SIMPLE-BASE-STRING 12)
> (join-strings (list "Hello" " " "world" "!"))
"Hello world!"
> (type-of (join-strings (list "Hello" " " "world" "!")))
(SIMPLE-BASE-STRING 12)
> (setf wide-string (string #\454))
[string with weird character present]
> (type-of wide-string)
(SIMPLE-ARRAY CHARACTER (1))
> (type-of (join-strings (list "Hello" wide-string "world" "!")))
(SIMPLE-ARRAY CHARACTER (12))

My two questions are:

1) is this kind of run-time type handling OK?  Or is it frowned upon (as
wanton use of EVAL would be for instance)?  Is what I've done above 'bad' in
some way?

2) if it is a reasonable thing to do, is there a better means of determining
the union type?  Specifically one that doesn't require consing the typespec
list every time?  (I see the phrase 'false economy' fast approaching ;->)

-- 
Duncan Harvey
"Smiling and waving. Before letting himself fall."
                       -- Anja Garbarek, The Diver

From: Barry Margolin
Subject: Re: Computing the most specialised type representing a union of types
Date: 
Message-ID: <j7117.28$on2.293@burlma1-snr2>
In article <···························@hubris2.demon.co.uk>,
Duncan Harvey <······@hubris2.demon.co.uk> wrote:
>I *think* the simplest method of generating a type specification
>representing the union of some types is to place all those type
>specifications into a list that has OR as its car, e.g.
>
>  (union-type 'fixnum 'base-char 'trout)
>    => (OR FIXNUM BASE-CHAR TROUT)
>
>This could be used as in the following (simplified) example code:
>
>(defun most-specialised-union-type (list-of-type-specs)
>  (case (length list-of-type-specs)
>    ((0) 't)
>    ((1) (first list-of-type-specs))
>    (otherwise (list* 'or list-of-type-specs))))

You might want to include a check for all the types being the same:

     (otherwise
       (if (loop with first = (first list-of-type-specs)
                 for type in (rest list-of-type-specs)
                 every (equal first type))
           (first list-of-type-specs)
           (cons 'or list-of-type-specs)))

In your tests it looks like the implementation's handling of OR performs
this check (and other checks for one type being a supertype of the rest),
but since this is a pretty easy check to do beforehand you might want to do
it just in case the implementation doesn't optimize.

>1) is this kind of run-time type handling OK?  Or is it frowned upon (as
>wanton use of EVAL would be for instance)?  Is what I've done above 'bad' in
>some way?

It seems reasonable to me.  It's not often necessary, as the application
writer usually knows the type appropriate in a situation.  It might not
necessarily be the most specialized type; for instance, if you're working
with strings you might use a general string rather than a specialized one,
and put up with the extra overhead even when no extended characters are in
it.

>2) if it is a reasonable thing to do, is there a better means of determining
>the union type?  Specifically one that doesn't require consing the typespec
>list every time?  (I see the phrase 'false economy' fast approaching ;->)

If you're only doing this in places where you're already doing quite a bit
of other consing, as in your example, I don't think this would cause much
trouble.  These typespecs will become garbage almost immediately, so an
ephemeral GC will deal with them efficiently.

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Duncan Harvey
Subject: Re: Computing the most specialised type representing a union of types
Date: 
Message-ID: <1ew914w.18xgw8gv90udwN%spam@hubris2.demon.co.uk>
Barry Margolin <······@genuity.net> wrote:

> In article <···························@hubris2.demon.co.uk>,
> Duncan Harvey <······@hubris2.demon.co.uk> wrote:
> >1) is this kind of run-time type handling OK?  Or is it frowned upon (as
> >wanton use of EVAL would be for instance)?  Is what I've done above 'bad' in
> >some way?
> 
> It seems reasonable to me.  It's not often necessary, as the application
> writer usually knows the type appropriate in a situation.  

I'm mildly surprised that Common Lisp doesn't have a standard way of
computing (not just expressing) this for any and all types.

Given two objects you can call SUBTYPEP twice, with both objects,
reversing the arguments in successive calls.  This will give you a
common supertype if and only if both types share a 'path' through the
type DAG (are on the same branch of the tree).[1]  But, in general, you
can no further with the standard language.

In practice the only problem I have with generating the compound (OR
...) type specifier at run-time is the consing.  And if that consing is
*really*, *actually* burning you, I'd imagine you'd be able to solve the
problem in some other, non-general manner.

Errr, so I'm agreeing with Barry above.  And definitely /not/ whining
about the language, OK? :-)

[1] Warning: for a paragraph that sounds so sure of itself, it's based
on too little knowledge.  Skepticism recommended.
-- 
Duncan Harvey
"Smiling and waving. Before letting himself fall."
                       -- Anja Garbarek, The Diver