From:  (typep 'nil '(satisfies identity)) => ?
Subject: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130084870.135431.100230@g14g2000cwa.googlegroups.com>
Hi, I have been programming intensively in Common Lisp for two years. I
find it a truly dynamic and smart language. There is one thing about it
that I really hate: "t" stands for the type true and for the boolean
expression true.

Does anybody have an idea how this could be fixed in Common Lisp?

From: Kaz Kylheku
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130086894.581785.315870@g49g2000cwa.googlegroups.com>
> (typep 'nil '(satisfies identity)) => ? wrote:
> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.

T does not stand for ``true''. There is a class called T which is the
superclass of all classes. When you specialize a method to that type,
it can ``catch'' any object---including NIL!!! So how can you call it a
``true type''?

So you should instead consider the letter T, when it denotes a class,
as standing for some (t)ype --- any type.
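
A minimal sketch of this point, assuming a hypothetical generic function KIND-OF that is not part of the original post: a method specialized on T applies to every object, NIL included.

```lisp
;; KIND-OF is a hypothetical name used only for illustration.
(defgeneric kind-of (object)
  (:documentation "Return a keyword naming the method that applied."))

;; Specialized on T: the class T is a superclass of every class.
(defmethod kind-of ((object t))
  :some-object)

;; A more specific method, to show that ordinary dispatch still works.
(defmethod kind-of ((object integer))
  :an-integer)

(kind-of 42)    ; => :AN-INTEGER
(kind-of "abc") ; => :SOME-OBJECT
(kind-of nil)   ; => :SOME-OBJECT, the T method catches NIL too
```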

You see this convention outside of Lisp. For instance, in C++
templates, the letter T is often used as the name of the template
parameter that takes the substituted type.

This T thing is quite useful for tri-state interfaces. For instance in
FORMAT, the symbol T represents the standard output stream, and NIL
means output is collected into a string via a string stream. Or you can
pass a stream object.
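
The three kinds of destination can be sketched as follows. The control string and the number 200 are made up for the example; only standard FORMAT behaviour is assumed.

```lisp
;; T as destination: write to *STANDARD-OUTPUT*; FORMAT returns NIL.
(format t "~a bases~%" 200)

;; NIL as destination: nothing is printed, the text comes back as a string.
(format nil "~a bases" 200)            ; => "200 bases"

;; A stream object as destination: here, a string output stream.
(with-output-to-string (stream)
  (format stream "~a bases" 200))      ; => "200 bases"
```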

> Has anybody an idea how this could be fixed in common lisp?

Something that you hate probably needs to be fixed in your head. At
least that's the first place to look.

In Lisp, you have to get used to the idea that a symbol is an
intersection of meanings, whose interpretation is context-dependent.
From: justinhj
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130089577.611665.190440@g47g2000cwa.googlegroups.com>
Kaz Kylheku wrote:

> Something that you hate probably needs to be fixed in your head. At
> least that's the first place to look.

That sounds like a very useful piece of advice for life in general.
From:  (typep 'nil '(satisfies identity)) => ?
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130107748.377793.18610@f14g2000cwb.googlegroups.com>
hi,
first of all, in some implementations:
;; Current reader case mode: :case-sensitive-lower  "t" corresponds to
"T"!

second, I am aware of the different meanings of t in cl; there are at
least 3!

... and an intersection of meanings is often a loss of information. in
some cases i would like to keep it. that was the question!

my head is all right, sometimes I get a headache from postings.
From: Paul F. Dietz
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <bsSdndPuj_A0IsbenZ2dnUVZ_tOdnZ2d@dls.net>
> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.
> 
> Has anybody an idea how this could be fixed in common lisp?

You could stop hating it.  That would fix your problem.

	Paul
From: Pascal Bourguignon
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <87u0f813xw.fsf@thalassa.informatimago.com>
" (typep 'nil '(satisfies identity)) => ?" <··············@yahoo.it> writes:

> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.
>
> Has anybody an idea how this could be fixed in common lisp?

In CL, most of the time, true is not only T, it's anything not NIL.

And since booleans map to sets:

    For a,b in {T,NIL}  a and b = a inter b 
                        a or  b = a union b 
                          not a = complement of a in T.

    T   = the set of all possible values.
    NIL = the empty set.

it's natural to use T to denote true, and NIL to denote false.
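
A small sketch of this correspondence, using TYPEP so that the results do not depend on how much a given implementation's SUBTYPEP can prove. The test object 42 is arbitrary.

```lisp
;; Generalized booleans: any object other than NIL counts as true.
(if 0   :true :false)    ; => :TRUE
(if ""  :true :false)    ; => :TRUE
(if nil :true :false)    ; => :FALSE

;; As type specifiers, T is the set of all objects and NIL the empty set.
(typep 42 't)            ; => true, everything is in T
(typep 42 'nil)          ; => NIL,  nothing is in the type NIL

;; AND, OR and NOT on type specifiers act as intersection, union
;; and complement.
(typep 42 '(and t nil))  ; => NIL
(typep 42 '(or t nil))   ; => true
(typep 42 '(not nil))    ; => true
(typep 42 '(not t))      ; => NIL
```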


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Litter box not here.
You must have moved it again.
I'll poop in the sink. 
From:  (typep 'nil '(satisfies identity)) => ?
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130109089.648464.191950@g43g2000cwa.googlegroups.com>
I'm aware of the different meanings of t in cl and of those for nil.
but sometimes I would like to distinguish between them. In all user
functions anybody is able to use more or different return values. I've
seen no way for doing so in lisp functions ... except to define my own
and shadow the old ones ...
;-((
From: ··············@hotmail.com
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130117980.413299.272430@f14g2000cwb.googlegroups.com>
 (typep 'nil '(satisfies identity)) => ? wrote:
> I'm aware of the different meanings of t in cl and of those for nil.
> but sometimes I would like to distinguish between them.

Humans can make use of this wonderful thing called "context" to make
sense out of what they read. It seems to me that the context of "naming
a class of objects" and "a boolean value" are quite different things.

You should be quite glad, however, to know that there is a distinction
between NIL and the type NULL. Meditate on this for some time.
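
For readers who want to check the distinction at a REPL, a short sketch using only standard operators:

```lisp
;; NULL is the type whose only element is the object NIL.
(typep nil 'null)            ; => true
(typep 42  'null)            ; => NIL

;; NIL used as a type specifier names the empty type: not even the
;; object NIL belongs to it.
(typep nil 'nil)             ; => NIL

;; The object NIL is an instance of the class NULL, and it is also
;; both a symbol and a list.
(class-name (class-of nil))  ; => NULL
(typep nil 'symbol)          ; => true
(typep nil 'list)            ; => true
```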

As for "how this could be fixed in Common Lisp?" the answer is

1) it can't  (it is basic to the definition of Common Lisp)
2) it doesn't need to be fixed (you have failed, as is true for most
trolls in this group, to provide a single concrete case in which this
causes the slightest difficulty in programming).

Perhaps you could stop using computers, and turn to something more
amenable to your personality.
From: Cameron MacKinnon
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <a9adnap8q6eKx8HeRVn-3g@rogers.com>
··············@hotmail.com wrote:
>  (typep 'nil '(satisfies identity)) => ? wrote:
> 
>>I'm aware of the different meanings of t in cl and of those for nil.
>>but sometimes I would like to distinguish between them.

> As for "how this could be fixed in Common Lisp?" the answer is
> 
> 1) it can't  (it is basic to the definition of Common Lisp)

Really? Certainly if the OP wants a dialect where the various meanings 
of T have distinct symbols, it shouldn't be too difficult to create a 
code walker which would transform the preferred dialect to CL. Then the 
OP could program in (typep OP)'s preferred dialect.
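
A deliberately naive sketch of the idea, not a real code walker: it substitutes symbols blindly with SUBLIS, so it would also rewrite them inside quoted data and where they are used as variable names. The dialect symbols TRUE and TOP and the names *DIALECT-MAP* and TRANSLATE are assumptions made up for this example.

```lisp
;; Hypothetical dialect: TRUE for boolean truth, TOP for the supertype
;; of all types. Both map back to CL:T.
(defparameter *dialect-map*
  '((true . t)
    (top  . t))
  "Alist from dialect symbols to the Common Lisp symbols they stand for.")

(defun translate (form)
  "Return FORM with every dialect symbol replaced by its CL counterpart."
  (sublis *dialect-map* form))

(translate '(if true (typep 42 'top) :unreachable))
;; => (IF T (TYPEP 42 'T) :UNREACHABLE)
```

A real walker would have to understand special forms, macros and quoting, which is where most of the work lies.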

> 2) it doesn't need to be fixed (you have failed, as is true for most
> trolls in this group, to provide a single concrete case in which this
> causes the slightest difficulty in programming).

The beauty of Lisp is that it is malleable enough that one isn't a slave 
to linguistic martinets. The bar to language change is low enough that 
"I want to" suffices as a reason.
From: Thomas F. Burdick
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <xcvacgzz89i.fsf@conquest.OCF.Berkeley.EDU>
" (typep 'nil '(satisfies identity)) => ?" <··············@yahoo.it> writes:

> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.

T is the "top" type.  I'm personally glad that the bottom type is
named NIL, not _|_, not the least because of CL's escaping rules.

The true type is (MEMBER T)
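
A quick REPL sketch of the difference between the top type and the type that contains only the object T (the test values are arbitrary):

```lisp
;; (MEMBER T) contains exactly one object: the symbol T.
(typep t   '(member t))  ; => true
(typep 42  '(member t))  ; => NIL
(typep nil '(member t))  ; => NIL

;; The top type T, by contrast, contains every object.
(typep 42 't)            ; => true

;; The standard type BOOLEAN is the two-element type (MEMBER T NIL).
(typep t   'boolean)     ; => true
(typep nil 'boolean)     ; => true
(typep 42  'boolean)     ; => NIL
```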

> Has anybody an idea how this could be fixed in common lisp?

What's broken?

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: ····@unreal.uncom
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <27gol1tjh5360g3s23p5qmi6lhd23j5frg@4ax.com>
On 23 Oct 2005 09:27:50 -0700, " (typep 'nil '(satisfies identity)) =>
?" <··············@yahoo.it> wrote:

>really hate: "t" stands for the type true and for the boolean
>expression true.

The overloading is not what bothers me about T.  What bothers me is
that T uses the entire namespace of one-letter symbols that start with
the letter T, while NIL only uses a small fraction of the namespace of
3-letter symbols that start with the letter N.  Why that bothers me is
that NIL is many orders of magnitude more important than T, and
deserves a much bigger share of the namespace pie.  It's simply not
fair, that T can be such a pig, while NIL has to do most of the work.

A practical limitation imposed by this arbitrary unfairness is having
to find other names for such common variables as T for temperature, T
for time, T for temporary, T for test, etc.  CL fans complain about
having to use LST instead of LIST in Scheme, but that's a minor
complaint compared to hogging T.

As for overloading, the use of NIL as both false and empty is really
an abstraction, taking the ideas of false and empty, and abstracting
what they have in common, into a more fundamental and more powerful
idea, and making that powerful idea a fundamental concept of our
language.  So it's not overloading, just abstraction.
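
Both points can be checked at a REPL. This is only a sketch using standard operators:

```lisp
;; T and NIL are constant variables, which is why neither can be
;; rebound as an ordinary variable name.
(constantp 't)               ; => true
(constantp 'nil)             ; => true

;; NIL as "false" and NIL as "empty" are one and the same object.
(eq 'nil '())                ; => true
(endp '())                   ; => true
(if '() :non-empty :empty)   ; => :EMPTY
```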
From: Cameron MacKinnon
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <SaSdneRSZZd1wMHenZ2dnUVZ_tGdnZ2d@rogers.com>
····@unreal.uncom wrote:
> The overloading is not what bothers me about T.  What bothers me is
> that T uses the entire namespace of one-letter symbols that start with
> the letter T [...]
> 
> A practical limitation imposed by this arbitrary unfairness is having
> to find other names for such common variables as T for temperature, T
> for time, T for temporary, T for test, etc.

(let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :rot13)

Of course, then you have to find another name for all your 'g' one 
letter symbols...
From: Pascal Costanza
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3s34skFlvftkU1@individual.net>
····@unreal.uncom wrote:

> A practical limitation imposed by this arbitrary unfairness is having
> to find other names for such common variables as T for temperature, T
> for time, T for temporary, T for test, etc.  CL fans complain about
> having to use LST instead of LIST in Scheme, but that's a minor
> complaint compared to hogging T.

It's indeed a pity that the name 't is "used up" by the common-lisp 
package because it is indeed too easy to forget about that fact. 
However, fortunately this is straightforward to fix. You can just shadow 
't and then redefine it in whatever way you like:

(shadow 't)

(let ((t 50))
   (print t))

It's still possible to refer to Common Lisp's 't by typing common-lisp:t 
or cl:t.

However, instead of using the function 'shadow this way, it's better to 
set up a package in which you have shadowed 't correctly. This has 
better maintainability characteristics. See the docs for defpackage for 
further details.
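
A minimal sketch of the package-based variant. The package name T-FREE-DEMO and the functions ELAPSED and REPORT-ELAPSED are made up for the example.

```lisp
(defpackage #:t-free-demo
  (:use #:common-lisp)
  (:shadow #:t)             ; T read in this package is T-FREE-DEMO::T
  (:export #:elapsed #:report-elapsed))

(in-package #:t-free-demo)

(defun elapsed (t0 t)
  "Time elapsed between the instants T0 and T."
  (- t t0))

(defun report-elapsed (t0 t)
  ;; Inside this package the standard symbol needs its prefix.
  (format cl:t "~a s~%" (elapsed t0 t)))

(in-package #:common-lisp-user)

(t-free-demo:elapsed 3 10)  ; => 7
```

The price is that every place that really means CL:T inside the package, such as the last clause of a COND or the destination argument of FORMAT, has to be written as cl:t.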


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Thomas A. Russ
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <ymill0i221q.fsf@sevak.isi.edu>
····@unreal.uncom writes:

> A practical limitation imposed by this arbitrary unfairness is having
> to find other names for such common variables as T for temperature, T
> for time, T for temporary, T for test, etc.  CL fans complain about
> having to use LST instead of LIST in Scheme, but that's a minor
> complaint compared to hogging T.

Well, of course encouraging programmers to name their variables
TEMPERATURE (or TEMP), TIME, TEMP, TEST etc. could be considered a good
thing.

But on balance, I suspect that if things were being done de novo, T would
today be called TRUE instead....

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Christophe Rhodes
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <sq3bmqbric.fsf@cam.ac.uk>
···@sevak.isi.edu (Thomas A. Russ) writes:

> ····@unreal.uncom writes:
>
>> A practical limitation imposed by this arbitrary unfairness is having
>> to find other names for such common variables as T for temperature, T
>> for time, T for temporary, T for test, etc.  CL fans complain about
>> having to use LST instead of LIST in Scheme, but that's a minor
>> complaint compared to hogging T.
>
> Well, of course encouraging programmers to name their variables
> TEMPERATURE (or TEMP), TIME, TEMP, TEST etc. could be considered a good
> thing.

It "could be considered" a good thing, but I haven't heard an argument
that convinces me in the general case, particularly for the cases of
time and temperature: often, code that uses those concepts will be
implementing algorithms described in mathematical notation, and in
such cases staying close to the notation of the paper has value over
and above the ability to read long variables in simple isolation.

Christophe
From: Alan Crowe
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <86br1fc6h9.fsf@cawtech.freeserve.co.uk>
····@unreal.uncom writes:
> A practical limitation imposed by this arbitrary unfairness is having
> to find other names for such common variables as T for temperature, T
> for time, T for temporary, T for test, etc. 

I agree. I do this

;; I hate getting caught out by the fact that CL:T is a constant
;; I want to use t for time
(defconstant true 'cl:t)
(shadow 't)
;; Of course, now I get caught out when (format t ...) doesn't work.
;; It needs to be (format true ...) so that format sees CL:T
;; Format doesn't accept CL-USER:T 

in http://alan.crowe.name/clx/3D-viewer/index.html

it would have been much better to have used "true" instead
of "t"

Don't forget t for thymine. All your DNA code wants to use
the letters A,C,G,T as variable names.

Alan Crowe
Edinburgh
Scotland
From: Pascal Costanza
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3s4b81Flsq1dU1@individual.net>
Alan Crowe wrote:
> ····@unreal.uncom writes:
> 
>>A practical limitation imposed by this arbitrary unfairness is having
>>to find other names for such common variables as T for temperature, T
>>for time, T for temporary, T for test, etc. 
> 
> 
> I agree. I do this
> 
> ;; I hate getting caught out by the fact that CL:T is a constant
> ;; I want to use t for time
> (defconstant true 'cl:t)
> (shadow 't)
> ;; Of course, now I get caught out when (format t ...) doesn't work.
> ;; It needs to be (format true ...) so that format sees CL:T
> ;; Format doesn't accept CL-USER:T 
> 
> in http://alan.crowe.name/clx/3D-viewer/index.html

What about modifying the readtable such that #t gives cl:t and #f gives 
cl:nil? (just brainstorming...)
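
A sketch of how that brainstorm might look, installed in a private copy of the standard readtable so that the global one is left alone. The variable name *BOOLEAN-READTABLE* is an assumption made for the example.

```lisp
(defparameter *boolean-readtable* (copy-readtable nil)
  "A copy of the standard readtable extended with #t and #f.")

;; #t reads as CL:T.
(set-dispatch-macro-character
 #\# #\t
 (lambda (stream sub-char arg)
   (declare (ignore stream sub-char arg))
   't)
 *boolean-readtable*)

;; #f reads as CL:NIL.
(set-dispatch-macro-character
 #\# #\f
 (lambda (stream sub-char arg)
   (declare (ignore stream sub-char arg))
   'nil)
 *boolean-readtable*)

(let ((*readtable* *boolean-readtable*))
  (read-from-string "(if #t #f :never)"))
;; => (IF T NIL :NEVER) as the first value
```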


Pascal

From: Alan Crowe
Subject: Re: "t" stands for the type true and for the boolean expression true  ..
Date: 
Message-ID: <8664rmdgi7.fsf@cawtech.freeserve.co.uk>
Pascal Costanza <··@p-cos.net> writes:
> What about modifying the readtable such that #t gives cl:t and #f gives 
> cl:nil? (just brainstorming...)

I don't have a problem with nil. If it had been shortened to
"n" for no that would have caused problems.

I don't like punctuation characters in my code. I much
prefer "true" to "#t". Perhaps that is due to being a native
English speaker. 

Alan Crowe
Edinburgh
Scotland
From: Marco Antoniotti
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <Tzt7f.46$pa3.19838@typhoon.nyu.edu>
Alan Crowe wrote:
> ····@unreal.uncom writes:
> 
>>A practical limitation imposed by this arbitrary unfairness is having
>>to find other names for such common variables as T for temperature, T
>>for time, T for temporary, T for test, etc. 
> 
> 
> I agree. I do this
> 
> ;; I hate getting caught out by the fact that CL:T is a constant
> ;; I want to use t for time
> (defconstant true 'cl:t)
> (shadow 't)
> ;; Of course, now I get caught out when (format t ...) doesn't work.
> ;; It needs to be (format true ...) so that format sees CL:T
> ;; Format doesn't accept CL-USER:T 
> 
> in http://alan.crowe.name/clx/3D-viewer/index.html
> 
> it would have been much better to have used "true" instead
> of "t"
> 
> Don't forget t for thymine. All your DNA code wants to use
> the letters A,C,G,T as variable names.

Nope.  My DNA handling code wants to use the symbol 't to denote 
thymine.  As such I can use it as much as I want.

Cheers
--
Marco
From: Rob Warnock
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <u8adnWTFMriGrMLeRVn-sA@speakeasy.net>
Marco Antoniotti  <·······@cs.nyu.edu> wrote:
+---------------
| Alan Crowe wrote:
| > ····@unreal.uncom writes:
| > > it would have been much better to have used "true" instead of "t"
| > 
| > Don't forget t for thymine. All your DNA code wants to use
| > the letters A,C,G,T as variable names.
| 
| Nope.  My DNA handling code wants to use the symbol 't to
| denote thymine.  As such I can use it as much as I want.
+---------------

Indeed:

    > (let ((acgt (vector 'a 'c 'g 't)))
        (loop for i below 200 collect (aref acgt (random 4))))

    (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
     G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
     T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
     A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
     G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
     G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
     G G G T A T C G)
    > 

Hey, this DNA stuff is fun!!   ;-}  ;-}


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Marco Antoniotti
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <F9M7f.47$pa3.19928@typhoon.nyu.edu>
Rob Warnock wrote:
> Marco Antoniotti  <·······@cs.nyu.edu> wrote:
> +---------------
> | Alan Crowe wrote:
> | > ····@unreal.uncom writes:
> | > > it would have been much better to have used "true" instead of "t"
> | > 
> | > Don't forget t for thymine. All your DNA code wants to use
> | > the letters A,C,G,T as variable names.
> | 
> | Nope.  My DNA handling code wants to use the symbol 't to
> | denote thymine.  As such I can use it as much as I want.
> +---------------
> 
> Indeed:
> 
>     > (let ((acgt (vector 'a 'c 'g 't)))
>         (loop for i below 200 collect (aref acgt (random 4))))
> 
>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>      G G G T A T C G)
>     > 
> 
> Hey, this DNA stuff is fun!!   ;-}  ;-}

Yes.  Apart from the fact that in this case you really do not want to 
use a symbol, but probably a character.  A symbol is too heavy for this 
sort of thing. :}



Cheers
--
Marco
From: Ulrich Hobelmann
Subject: Re: "t" stands for the type true and for the boolean expression true   ..
Date: 
Message-ID: <3s9hmeFn030eU2@individual.net>
Marco Antoniotti wrote:
>> Indeed:
>>
>>     > (let ((acgt (vector 'a 'c 'g 't)))
>>         (loop for i below 200 collect (aref acgt (random 4))))
>>
>>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>>      G G G T A T C G)
>>     >
>> Hey, this DNA stuff is fun!!   ;-}  ;-}
> 
> Yes.  Apart from the fact that in this case you really do not want to 
> use a symbol, but probably a character.  A symbol is too heavy for this 
> sort of things. :}

Why?  Are symbols any heavier than characters (except for their GC 
footprint, but all symbols usually share one symbol instance, if I'm not 
mistaken)?

I'd guess that symbols are just pointers (typically 32bit), while 
characters would usually also be padded to 32 bits.

So in my mental model symbols only create overhead once (when read), but 
are lightweight objects afterwards, indeed one of the most basic 
constructs in Lisp.

-- 
The road to hell is paved with good intentions.
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <4fyqoz1xw.fsf@franz.com>
Ulrich Hobelmann <···········@web.de> writes:

> Marco Antoniotti wrote:
>>> Indeed:
>>>
>>>     > (let ((acgt (vector 'a 'c 'g 't)))
>>>         (loop for i below 200 collect (aref acgt (random 4))))
>>>
>>>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>>>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>>>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>>>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>>>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>>>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>>>      G G G T A T C G)
>>>     >
>>> Hey, this DNA stuff is fun!!   ;-}  ;-}
>> Yes.  Apart from the fact that in this case you really do not want
>> to use a symbol, but probably a character.  A symbol is too heavy
>> for this sort of things. :}
>
> Why?  Are symbols any heavier than characters (except for their GC
> footprint, but all symbols usually share one symbol instance, if I'm
> not mistaken)?
>
> I'd guess that symbols are just pointers (typically 32bit), while
> characters would usually also be padded to 32 bits.
>
> So in my mental model symbols only create overhead once (when read),
> but are lightweight objects afterwards, indeed one of the most basic
> constructs in Lisp.

Marco answered you correctly here, but he didn't say _how_ characters
might be optimized for space.

Most of the genome analysis projects I know about do indeed use
characters rather than symbols, and they store large strings of
DNA material into ... strings!

Strings are indeed the optimization medium used to maximize space
usage via characters.  When you store a symbol into a vector,
it must be at least (simple-array t (*)), which requires 4 bytes
per element in a 32-bit lisp and 8 bytes per element in a 64-bit
lisp.  When you store characters into a vector, you can also use
the simple-general-vector, but it is not the simplest - the simplest
you can go is (simple-array character (*)), which requires 1 or 2
bytes per element, in the lisps that I know of.  So you can get
anywhere from a 2x to 8x space differential by storing DNA sequences
as strings.
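
A sketch of the two representations being compared. The variable names are made up, and the actual bytes per element are implementation-dependent, so no sizes are claimed here.

```lisp
;; A general vector: each element is a full Lisp object reference.
(defparameter *symbol-bases*
  (make-array 8 :initial-contents '(g a t t a c a g)))

;; A vector specialized to characters, i.e. a string.
(defparameter *string-bases*
  (make-array 8 :element-type 'character
                :initial-contents "GATTACAG"))

(array-element-type *symbol-bases*)  ; => T
(stringp *string-bases*)             ; => true

;; What the implementation actually uses for the string's elements;
;; the answer varies between Lisps.
(array-element-type *string-bases*)
```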

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Christophe Rhodes
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <sq7jc0maeg.fsf@cam.ac.uk>
Duane Rettig <·····@franz.com> writes:

> Strings are indeed the optimization medium used to maximize space
> usage via characters.  When you store a symbol into a vector,
> it must be at least (simple-array t (*)), which requires 4 bytes
> per element in a 32-bit lisp and 8 bytes per element in a 64-bit
> lisp.  When you store characters into a vector, you can also use
> the simple-general-vector, but it is not the simplest - the simplest
> you can go is (simple-array character (*)), which requires 1 or 2
> bytes per element, in the lisps that I know of. 

For general characters, it's true that the simplest you can go is
(simple-array character (*)) [ but you may be interested to know that
in SBCL this requires 4 bytes per element, as it does in the general
case in CLISP, though in that implementation there is an extra level
of indirection which admits compaction ].  However...

> So you can get anywhere from a 2x to 8x space differential by
> storing DNA sequences as strings.

If you're interested in space, I'd recommend storing DNA sequences in
objects of type (simple-array (unsigned-byte 2) (*)), which would give
you an extra factor of four over the strings, assuming that your lisp
implementation supports that specialization.

If it doesn't, since #\A, #\C, #\G, #\T and even #\U are
standard-characters, I would use (simple-array standard-char (*)),
equivalent to simple-base-string, giving the lisp the information that
you don't need general characters to represent sequences of DNA bases.
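
A sketch of the two-bit representation, assuming the arbitrary coding A=0, C=1, G=2, T=3. ENCODE-DNA and DECODE-DNA are hypothetical helper names. If the implementation does not specialize arrays on (unsigned-byte 2), the code still works but the space saving is lost.

```lisp
(defun encode-dna (string)
  "Return a vector of 2-bit codes (A=0, C=1, G=2, T=3) for STRING."
  (let ((codes (make-array (length string)
                           :element-type '(unsigned-byte 2))))
    (dotimes (i (length string) codes)
      (let ((code (position (char-upcase (char string i)) "ACGT")))
        (unless code
          (error "Not a DNA base: ~s" (char string i)))
        (setf (aref codes i) code)))))

(defun decode-dna (codes)
  "Return the string of bases corresponding to the vector CODES."
  (map 'string (lambda (code) (char "ACGT" code)) codes))

(decode-dna (encode-dna "GATTACA"))  ; => "GATTACA"
```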

Christophe
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <4br1cyvk6.fsf@franz.com>
Christophe Rhodes <·····@cam.ac.uk> writes:

> Duane Rettig <·····@franz.com> writes:
>
>> Strings are indeed the optimization medium used to maximize space
>> usage via characters.  When you store a symbol into a vector,
>> it must be at least (simple-array t (*)), which requires 4 bytes
>> per element in a 32-bit lisp and 8 bytes per element in a 64-bit
>> lisp.  When you store characters into a vector, you can also use
>> the simple-general-vector, but it is not the simplest - the simplest
>> you can go is (simple-array character (*)), which requires 1 or 2
>> bytes per element, in the lisps that I know of. 
>
> For general characters, it's true that the simplest you can go is
> (simple-array character (*)) [ but you may be interested to know that
> in SBCL this requires 4 bytes per element, as it does in the general
> case in CLISP, though in that implementation there is an extra level
> of indirection which admits compaction ].  However...

Ouch.  No, I did not know that - I presume that this was to get all 21
bits of unicode into each element?  Or perhaps to get the old CLtL1
bits/fonts characteristics to stick?  I guess every lisp makes its
own decisions for tradeoffs...

>> So you can get anywhere from a 2x to 8x space differential by
>> storing DNA sequences as strings.
>
> If you're interested in space, I'd recommend storing DNA sequences in
> objects of type (simple-array (unsigned-byte 2) (*)), which would give
> you an extra factor of four over the strings, assuming that your lisp
> implementation supports that specialization.

Here we start to see another tradeoff start to become important;
speed.  Accessing sub-octet fields is not pretty, and it can be
downright slow, especially when dealing with multi-gigabyte databases.
And in Allegro CL, we perform some string operations one-word-at-a-time,
covering from two to eight characters per operation.  So the
programmer always needs to choose what tradeoff he is willing
to live with.  It seems like since 64-bits gives you enough potential
space that you'll not have to worry about running out soon, octets
would be an ideal tradeoff between reasonable space usage and speed.
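
To make the shift-and-mask work concrete, here is a sketch that packs four 2-bit codes into each octet by hand. PACK-BASES and PACKED-BASE-REF are hypothetical names, and this is not how any particular implementation stores (unsigned-byte 2) arrays; it only illustrates the extra index arithmetic and masking that each access needs compared with a plain AREF on octets.

```lisp
(defun pack-bases (codes)
  "Pack CODES, a sequence of integers in 0..3, four per octet."
  (let* ((n (length codes))
         (octets (make-array (ceiling n 4)
                             :element-type '(unsigned-byte 8)
                             :initial-element 0)))
    (dotimes (i n octets)
      (multiple-value-bind (index slot) (floor i 4)
        ;; Deposit the 2-bit code at bit position 2*SLOT of its octet.
        (setf (ldb (byte 2 (* 2 slot)) (aref octets index))
              (elt codes i))))))

(defun packed-base-ref (octets i)
  "Return the 2-bit code stored at position I of OCTETS."
  (multiple-value-bind (index slot) (floor i 4)
    ;; Every access pays for a division, a shift and a mask.
    (ldb (byte 2 (* 2 slot)) (aref octets index))))

(let ((packed (pack-bases '(0 1 2 3 3 2))))
  (loop for i below 6 collect (packed-base-ref packed i)))
;; => (0 1 2 3 3 2)
```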

> If it doesn't, since #\A, #\C, #\G, #\T and even #\U are
> standard-characters, I would use (simple-array standard-char (*)),
> equivalent to simple-base-string, giving the lisp the information that
> you don't need general characters to represent sequences of DNA bases.

Yes, this is what I was referring to.  Since Allegro CL only has one
subclass of string, (simple-array standard-char (*)) is equivalent
to (simple-array character (*)) - it is yet another speed tradeoff
where generic code (in the generic sense, not in the CLOS sense)
need discriminate on fewer string types.

From: Christophe Rhodes
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <sqwtk0ul55.fsf@cam.ac.uk>
Duane Rettig <·····@franz.com> writes:

> Christophe Rhodes <·····@cam.ac.uk> writes:
>
>> For general characters, it's true that the simplest you can go is
>> (simple-array character (*)) [ but you may be interested to know that
>> in SBCL this requires 4 bytes per element, as it does in the general
>> case in CLISP, though in that implementation there is an extra level
>> of indirection which admits compaction ].  However...
>
> Ouch.  No, I did not know that - I presume that this was to get all 21
> bits of unicode into each element?  Or perhaps to get the old CLtL1
> bits/fonts characteristics to stick?  I guess every lisp makes its
> own decisions for tradeoffs...

In both cases it's to support the full Unicode space, yes -- I'm
pretty sure that CLISP doesn't support bits/fonts in characters, and
I'm absolutely sure that SBCL doesn't, though there are three spare
bits that might be fun to play with someday...  The mitigation is, in
SBCL's case, that we also support simple-base-string as a distinct
octet-based representation; in clisp there's an extra pointer, with
possibility for compaction and memory localization on each GC.

(Nem con on the rest of your post :-)

Christophe
From: Bulent Murtezaoglu
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <87u0f4nras.fsf@p4.internal>
>>>>> "DR" == Duane Rettig <·····@franz.com> writes:
[...]
    DR> Most of the genome analysis projects I know about do indeed
    DR> use characters rather than symbols, and they store large
    DR> strings of DNA material into ... strings! [...]

Hmpf.  I assume, then, they use 64 bit machines with >4gig RAM?  It
seems the approach Paul Dietz outlined might have considerable
performance advantages.  Maybe the ability to address bytes directly 
as opposed to shifting and masking is a factor?  I am not familiar 
with the field though, perhaps someone would tell us why the seemingly 
obvious optimization is not used.  

cheers,

BM
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <47jc0yv83.fsf@franz.com>
Bulent Murtezaoglu <··@acm.org> writes:

>>>>>> "DR" == Duane Rettig <·····@franz.com> writes:
> [...]
>     DR> Most of the genome analysis projects I know about do indeed
>     DR> use characters rather than symbols, and they store large
>     DR> strings of DNA material into ... strings! [...]
>
> Hmpf.  I assume, then, they use 64 bit machines with >4gig RAM?

Yes.

> It
> seems the approach Paul Dietz outlined might have considerable
> performance advantages.  Maybe the ability to address bytes directly 
> as opposed to shifting and masking is a factor?  I am not familiar 
> with the field though, perhaps someone would tell us why the seemingly 
> obvious optimization is not used.  

Yes, see also my response to Christophe on this issue - it is desirable
to maximize both speed and space in this kind of problem.  Going
sub-octet certainly optimizes the size side, but it may de-optimize the
speed side in at least two ways (both the less efficient access of
shift-and-mask pre- or post- processing, and the degradation that
comes from not having built-in functions which work one natural word
at a time).

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: John Thingstad
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <op.szasoea5pqzri1@mjolner.upc.no>
On Wed, 26 Oct 2005 20:43:56 +0200, Duane Rettig <·····@franz.com> wrote:

> Bulent Murtezaoglu <··@acm.org> writes:
>
>>>>>>> "DR" == Duane Rettig <·····@franz.com> writes:
>> [...]
>>     DR> Most of the genome analysis projects I know about do indeed
>>     DR> use characters rather than symbols, and they store large
>>     DR> strings of DNA material into ... strings! [...]
>>
>> Hmpf.  I assume, then, they use 64 bit machines with >4gig RAM?
>
> Yes.
>
>> It
>> seems the approach Paul Dietz outlined might have considerable
>> performance advantages.  Maybe the ability to address bytes directly
>> as opposed to shifting and masking is a factor?  I am not familiar
>> with the field though, perhaps someone would tell us why the seemingly
>> obvious optimization is not used.
>
> Yes, see also my response to Christophe on this issue - it is desirable
> to maximize both speed and space in this kind of problem.  Going
> sub-octet certainly optimizes the size side, but it may de-optimize the
> speed size in at least two ways (both the less efficient access of
> shift-and-mask pre- or post- processing, and the degradation that
> comes from not having built-in functions which work one natural word
> at a time).
>

I am hardly an expert, but the 'obvious solution' to me is to store it in
a 64-bit unsigned integer array if you have a 64-bit computer. I would use
two bits to represent a base pair. Now to compare large sequences
(something they seem to do a lot), you can use integer comparison. This
compares 32 pairs in one operation. Considering you don't have to load and
store each symbol as well, the speedup should be dramatic. (50 times?)
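
A minimal sketch of the packing idea, assuming an arbitrary 2-bit code
(A=0, C=1, G=2, T=3) and lowest-bits-first layout. The names BASE-CODE,
PACK-BASES and PACKED-EQUAL-P are made up for illustration, and the
comparison only makes sense for two sequences with the same number of
bases that start on a word boundary:

```lisp
(defun base-code (ch)
  "Map a base character to a 2-bit code (the coding itself is an assumption)."
  (ecase (char-upcase ch)
    (#\A 0) (#\C 1) (#\G 2) (#\T 3)))

(defun pack-bases (string)
  "Pack STRING of A/C/G/T into a vector of 64-bit words, 32 bases per word."
  (let* ((n (length string))
         (words (make-array (ceiling n 32)
                            :element-type '(unsigned-byte 64)
                            :initial-element 0)))
    (dotimes (i n words)
      (multiple-value-bind (w k) (floor i 32)
        ;; Base I lives in bits 2k and 2k+1 of word W.
        (setf (ldb (byte 2 (* 2 k)) (aref words w))
              (base-code (char string i)))))))

(defun packed-equal-p (a b)
  "Compare two packed vectors one word (up to 32 bases) at a time.
Unused bits in the last word are zero, so the base counts must be
tracked separately by the caller."
  (and (= (length a) (length b))
       (every #'= a b)))
```

Each call to = inside PACKED-EQUAL-P covers up to 32 bases, which is the
source of the hoped-for speedup.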

Further, there is the size of the array. Stored in this manner, a 3.3 Gbp
sequence would fit in a gigabyte. In the event you had to compare two such
sequences you would need twice that.

With a list of 64 bit pointers to symbols you would need 100 Gb.
Imagine the time spent reading this amount of data from disk!

Even with 8-bit character strings (if your 64-bit system still supports
them!) it would take 3.3 Gb per sequence, or 6.6 Gb total, probably forcing
you to swap from disk.
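
The arithmetic behind such figures can be sketched as follows. This
assumes 3.3e9 bases, counts 1 GB as 10^9 bytes, and ignores array headers
and any per-object overhead; STORAGE-BYTES and *GENOME-SIZE* are
hypothetical names. The number for the symbol representation depends on
whether a vector of pointers or a list of cons cells is assumed, so both
are printed:

```lisp
(defun storage-bytes (n-bases bits-per-base)
  "Bytes needed for N-BASES elements of BITS-PER-BASE bits each, headers ignored."
  (ceiling (* n-bases bits-per-base) 8))

(defparameter *genome-size* 3300000000
  "Assumed sequence length in bases (the 3.3 figure used in this post).")

;; Print a rough size for each candidate representation.
(dolist (entry '(("2-bit packed" 2)
                 ("8-bit characters" 8)
                 ("64-bit pointers in a vector" 64)
                 ("cons cells of two 64-bit words" 128)))
  (format t "~32A ~6,2F GB~%"
          (first entry)
          (/ (storage-bytes *genome-size* (second entry)) 1d9)))
```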

I normally don't waste time on bit fiddling, but in this case the benefits
seem enormous. Extracting bits from an integer gives a trivial overhead
compared to having to load large chunks of data from disk during a compare
or similar operation.

I figure I must have missed something. But without seeing a complete list
of desired operations, average sequence size, etc., it is hard to see what.

Is there someone here with experience in (or knowledge of) bioinformatics
who could tell me what I missed?

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Christophe Rhodes
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <sqzmove21v.fsf@cam.ac.uk>
"John Thingstad" <··············@chello.no> writes:

> I am hardly an expert, but the 'obvious solution' to me is to store
> it in a 64 bit unsigned integer array if you have a 64 bit
> computer. I would use two bit's to represent a base-pair.  Now to
> compare large sequences (something they seem to do a lot.) you can
> use integer comparison.  This compares 32 pairs in one
> operation.. Considering you don't have to load and store each symbol
> as well the speedup should be dramatic. (50 times?)
>
>  [...]
>
> Is there someone here with experince (or knowlege of) with
> bioinformathics that could tell me what I missed?

IANAbioinformatician.

Integer comparison of the form you are advocating only works if the
two sequences to be compared start at the same base-pair offset modulo
32.  If they don't, you have to shift one entire sequence before you
can compare them using integer operations.
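
To make the alignment point concrete, here is a minimal sketch. It assumes
bases are packed 2 bits each, lowest bits first, into a vector of 64-bit
words; that layout and the name WORD-AT-BASE-OFFSET are assumptions for
illustration. Fetching 32 bases from an offset that is not a multiple of
32 takes two word reads, two shifts and an OR, instead of a single AREF:

```lisp
(defun word-at-base-offset (words offset)
  "Return the 64-bit window holding the 32 bases that start at base OFFSET.
Bases past the end of WORDS read as zero."
  (multiple-value-bind (w k) (floor offset 32)
    (let ((lo (ash (aref words w) (* -2 k)))
          ;; The next word supplies the top 2k bits when OFFSET is unaligned.
          (hi (if (and (plusp k) (> (length words) (1+ w)))
                  (ash (aref words (1+ w)) (- 64 (* 2 k)))
                  0)))
      (ldb (byte 64 0) (logior lo hi)))))
```

When both sequences have the same offset modulo 32, K is the same on both
sides and the words can be compared directly without this step.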

Christophe
From: Paul F. Dietz
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <x8ydnZi9tL6XT_3eRVn-pg@dls.net>
Christophe Rhodes wrote:

> Integer comparison of the form you are advocating only works if the
> two sequences to be compared start at the same base-pair offset modulo
> 32.  If they don't, you have to shift one entire sequence before you
> can compare them using integer operations.

If this sort of thing were important, you'd probably want to customize
your lisp implementation so operations on packed integer vectors
were more efficient.  For example, for comparisons, replicate the main
loop for each value of the offset, and dispatch once at the beginning.
IIUC optimized routines like memcpy do this now.

It might also help to have some way of expressing, in user code, what
the offset was, and have the compiler understand that.

It might also be nice if (array (member A T G C)) were recognized
as a specialized array type. :)
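
Short of that, a rough approximation in portable code is to ask for an
(unsigned-byte 2) array and translate at the edges. Whether the
implementation really packs four elements per octet is up to it; the names
*BASES*, MAKE-BASE-VECTOR and BASE-REF below are made up for this sketch:

```lisp
(defparameter *bases* #(a c g t)
  "Index in this vector is the 2-bit code for each base symbol.")

(defun make-base-vector (n)
  "Vector of N bases, initially all A; may be stored 2 bits per element."
  (make-array n :element-type '(unsigned-byte 2) :initial-element 0))

(defun base-ref (vector index)
  "Return the base symbol stored at INDEX."
  (svref *bases* (aref vector index)))

(defun (setf base-ref) (base vector index)
  "Store the symbol BASE, which must be one of A, C, G or T, at INDEX."
  (setf (aref vector index)
        (or (position base *bases*)
            (error "~S is not one of A, C, G, T" base)))
  base)
```

(upgraded-array-element-type '(unsigned-byte 2)) shows what a given
implementation actually does with the request.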

	Paul
From: John Thingstad
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <op.sza2kwjepqzri1@mjolner.upc.no>
On Thu, 27 Oct 2005 11:34:20 +0200, Christophe Rhodes <·····@cam.ac.uk>  
wrote:

> IANAbioinformatician.
>
> Integer comparison of the form you are advocating only works if the
> two sequences to be compared start at the same base-pair offset modulo
> 32.  If they don't, you have to shift one entire sequence before you
> can compare them using integer operations.
>
> Christophe

Yes, but masking the ends is a negligible cost if the sequence is long.
It is at most 32 bits.
That gives you a 16 symbol array of masks.
( ~0b1111110000000000 & value like type)
Not in lisp where you have to take into account 3 bits to represent type
(or 4),
but the elements are all of the same type so the comparison works.
By the way this is a waste of space in an array where all elements
are known to be of a type.
Add that integers are signed and subtract one byte (a nibble) so 28 bits.
Duane Rettig informs me that Allegro CL takes 4 bytes in 1 gulp while
comparing on a 64-bitter. So he gets 4 symbols. I get 14. Still a
significant improvement.


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: John Thingstad
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <op.sza389zvpqzri1@mjolner.upc.no>
On Thu, 27 Oct 2005 15:03:58 +0200, John Thingstad  
<··············@chello.no> wrote:

> On Thu, 27 Oct 2005 11:34:20 +0200, Christophe Rhodes <·····@cam.ac.uk>  
> wrote:
>
>> IANAbioinformatician.
>>
>> Integer comparison of the form you are advocating only works if the
>> two sequences to be compared start at the same base-pair offset modulo
>> 32.  If they don't, you have to shift one entire sequence before you
>> can compare them using integer operations.
>>
>> Christophe
>
> Yes, but making the ends is a negotiable cost is the sequence is long.
> It is at most 32 bits.
> That gives you a 16 symbol array of masks.
> ( ~0b1111110000000000 & value like type)
> Not in lisp where you have to take into account 3 bits to represent type  
> (or 4),
> but the elements are all of the same type so the comparison works.
> By the way this is a waste of space in a array where all elements
> are known to be of a type.
> Add that integers are signed and subtract one byte (a nibble) so 28 bits.
> Duane Retting informs me that Allegro CL takes 4 bytes in 1 gulp while  
> comparing
> on a 64 bit'er. So he get's 4 symbols. While I get 14. Still a  
> significant improvement.
>
>

I made a lot of stupid statements there, while getting the main point right.

1. 'it's at most 32 bits' -- read 64 bits
2. mask is 64 bit
3. subtract one byte -- bit of course
4. While I get 14 -- 30 actually


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <4br1b2es7.fsf@franz.com>
"John Thingstad" <··············@chello.no> writes:

> On Thu, 27 Oct 2005 15:03:58 +0200, John Thingstad
> <··············@chello.no> wrote:
>
>> On Thu, 27 Oct 2005 11:34:20 +0200, Christophe Rhodes
>> <·····@cam.ac.uk>  wrote:
>>
>>> IANAbioinformatician.
>>>
>>> Integer comparison of the form you are advocating only works if the
>>> two sequences to be compared start at the same base-pair offset modulo
>>> 32.  If they don't, you have to shift one entire sequence before you
>>> can compare them using integer operations.
>>>
>>> Christophe
>>
>> Yes, but making the ends is a negotiable cost is the sequence is long.
>> It is at most 32 bits.
>> That gives you a 16 symbol array of masks.
>> ( ~0b1111110000000000 & value like type)
>> Not in lisp where you have to take into account 3 bits to represent
>> type  (or 4),
>> but the elements are all of the same type so the comparison works.
>> By the way this is a waste of space in a array where all elements
>> are known to be of a type.
>> Add that integers are signed and subtract one byte (a nibble) so 28 bits.
>> Duane Retting informs me that Allegro CL takes 4 bytes in 1 gulp
>> while  comparing
>> on a 64 bit'er. So he get's 4 symbols. While I get 14. Still a
>> significant improvement.
>>
>>
>
> I make a lot of stupid statements here. While getting the right point.
>
> 1. 'it's at most 32 bits' -- read 64 bits
> 2. mask is 64 bit
> 3. subrtact one byte -- bit of course
> 4. While I get 14 -- 30 actually

5. '4 bytes in 1 gulp' -- '8 bytes in one gulp'

But in general, I'm not following your logic.  Are you thinking that
elements in a specialized array must be tagged?  What is the 14 (which
became 30)?  I can get 16/32 bit pairs out of a word on a 32/64-bit
lisp - all that's needed is to make it a ([un]signed-byte 32/64)
array.

And of course this whole subthread sprouted from my allusion to fast
comparisons, which is not the only factor in manipulating strings
of elements.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: John Thingstad
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <op.szbczccgpqzri1@mjolner.upc.no>
On Thu, 27 Oct 2005 16:52:24 +0200, Duane Rettig <·····@franz.com> wrote:

> "John Thingstad" <··············@chello.no> writes:
>
>> On Thu, 27 Oct 2005 15:03:58 +0200, John Thingstad
>> <··············@chello.no> wrote:
>>
>>> On Thu, 27 Oct 2005 11:34:20 +0200, Christophe Rhodes
>>> <·····@cam.ac.uk>  wrote:
>>>
>>>> IANAbioinformatician.
>>>>
>>>> Integer comparison of the form you are advocating only works if the
>>>> two sequences to be compared start at the same base-pair offset modulo
>>>> 32.  If they don't, you have to shift one entire sequence before you
>>>> can compare them using integer operations.
>>>>
>>>> Christophe
>>>
>>> Yes, but making the ends is a negotiable cost is the sequence is long.
>>> It is at most 32 bits.
>>> That gives you a 16 symbol array of masks.
>>> ( ~0b1111110000000000 & value like type)
>>> Not in lisp where you have to take into account 3 bits to represent
>>> type  (or 4),
>>> but the elements are all of the same type so the comparison works.
>>> By the way this is a waste of space in a array where all elements
>>> are known to be of a type.
>>> Add that integers are signed and subtract one byte (a nibble) so 28  
>>> bits.
>>> Duane Retting informs me that Allegro CL takes 4 bytes in 1 gulp
>>> while  comparing
>>> on a 64 bit'er. So he get's 4 symbols. While I get 14. Still a
>>> significant improvement.
>>>
>>>
>>
>> I make a lot of stupid statements here. While getting the right point.
>>
>> 1. 'it's at most 32 bits' -- read 64 bits
>> 2. mask is 64 bit
>> 3. subrtact one byte -- bit of course
>> 4. While I get 14 -- 30 actually
>
> 5. '4 bytes in 1 gulp' -- '8 bytes in one gulp'
>
> But in general, I'm not following your logic.  Are you thinking that
> elements in a specialized array must be tagged?  What is the 14 (which
> became 30)?  I can get 16/32 bit pairs out of a word on a 32/64-bit
> lisp - all that's needed is to make it a ([un]signed-byte 32/64)
> array.
>
> And of course this whole subthread sprouted from my allusion to fast
> comparisons, which is not the only factor in manipulating strings
> of elements.
>

Sorry about that.

Am I wrong that elements of a specialized array must be tagged?
If not, what happens when I assign an integer to a simple-array of integer?

My description was based on sequential code.
As Björn Lindberg points out later in the thread, this is not the case for
long sequences.
My assumption was that it would have been obvious if it were this simple,
so I had to be wrong. But I couldn't put my finger on what. Now I can.

I am not saying that 'my' compressed version is all lost yet,
because I would like to look at a typical application loop and
see what kind of operations it does first.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <47jby3lmy.fsf@franz.com>
"John Thingstad" <··············@chello.no> writes:

> On Thu, 27 Oct 2005 16:52:24 +0200, Duane Rettig <·····@franz.com> wrote:
>
>> But in general, I'm not following your logic.  Are you thinking that
>> elements in a specialized array must be tagged?  What is the 14 (which
>> became 30)?  I can get 16/32 bit pairs out of a word on a 32/64-bit
>> lisp - all that's needed is to make it a ([un]signed-byte 32/64)
>> array.
>>
>> And of course this whole subthread sprouted from my allusion to fast
>> comparisons, which is not the only factor in manipulating strings
>> of elements.
>>
>
> Sorry about that.
>
> Am I wrong that elements of a specialized array must be tagged?

Yes, you're wrong.  What purpose would there be in having specialized
arrays, if they had to have tagged elements in them?  Common Lisp
could have been _so_ much simpler if it weren't for specialized
arrays...

> If not what happens when I assign a simple-array of integer a integer?

It gets assigned.

Note below that the element of *a* is just a field of 64 bits, whereas
the element of *b* is just a Lisp value - a bignum, in this case.

CL-USER(1): (defvar *a* (make-array 1 :element-type '(unsigned-byte 64)))
*A*
CL-USER(2): (defvar *b* (make-array 1))
*B*
CL-USER(3): (setf (aref *a* 0) #xdeadbeef0badf00d)
16045690981293355021
CL-USER(4): (setf (aref *b* 0) #xdeadbeef0badf00d)
16045690981293355021
CL-USER(5): *a*
#(16045690981293355021)
CL-USER(6): :i *
A NEW simple (UNSIGNED-BYTE 64) vector (1) @ #x1000943812
   0-> The field #xdeadbeef0badf00d
CL-USER(7): *b*
#(16045690981293355021)
CL-USER(8): :i *
A NEW simple T vector (1) @ #x1000943d62
   0-> A bignum = 16045690981293355021
CL-USER(9): :i 0
A NEW bignum = 16045690981293355021 with 2 bigits @ #x1000944f92
   0-> The field #x0badf00d
   1-> The field #xdeadbeef
CL-USER(10): 
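
For readers without the Allegro inspector, a rough portable sketch of the
same comparison follows. ELEMENT-TYPES-DEMO is a hypothetical name, and
the first element type it returns is implementation-dependent (the
standard only promises a supertype of (unsigned-byte 64)):

```lisp
(defun element-types-demo ()
  "Store the same 64-bit value in a specialized and a general vector.
Returns both element types and whether the stored values compare equal."
  (let ((a (make-array 1 :element-type '(unsigned-byte 64) :initial-element 0))
        (b (make-array 1 :initial-element 0)))
    (setf (aref a 0) #xdeadbeef0badf00d
          (aref b 0) #xdeadbeef0badf00d)
    (values (array-element-type a)   ; some supertype of (unsigned-byte 64)
            (array-element-type b)   ; T, the default element type
            (= (aref a 0) (aref b 0)))))
```

The values read back are the same either way; only the storage behind
AREF differs, which is what the inspector output above makes visible.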

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Thomas Schilling
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3sf7trFnsvk4U1@news.dfncis.de>
Duane Rettig wrote:

> CL-USER(1): (defvar *a* (make-array 1 :element-type '(unsigned-byte 64)))
> *A*
> CL-USER(2): (defvar *b* (make-array 1))
> *B*
> CL-USER(3): (setf (aref *a* 0) #xdeadbeef0badf00d)
> 16045690981293355021
> CL-USER(4): (setf (aref *b* 0) #xdeadbeef0badf00d)
> 16045690981293355021
> CL-USER(5): *a*
> #(16045690981293355021)
> CL-USER(6): :i *
> A NEW simple (UNSIGNED-BYTE 64) vector (1) @ #x1000943812
>    0-> The field #xdeadbeef0badf00d
> CL-USER(7): *b*
> #(16045690981293355021)
> CL-USER(8): :i *
> A NEW simple T vector (1) @ #x1000943d62
>    0-> A bignum = 16045690981293355021
> CL-USER(9): :i 0
> A NEW bignum = 16045690981293355021 with 2 bigits @ #x1000944f92
>    0-> The field #x0badf00d
>    1-> The field #xdeadbeef

What does this "NEW" mean? Does Allegro support linear types?
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <4wtjxbga7.fsf@franz.com>
Thomas Schilling <······@yahoo.de> writes:

> Duane Rettig wrote:
>
>> CL-USER(9): :i 0
>> A NEW bignum = 16045690981293355021 with 2 bigits @ #x1000944f92
>>    0-> The field #x0badf00d
>>    1-> The field #xdeadbeef
>
> What does this "NEW" mean? Does Allegro support linear types?

Nah; Common Lisp doesn't do much aliasing, so there's not much need.
The NEW is new in version 7.0, displayed both by the inspector and by
describe, and it refers to the object's allocation status a la
sys:lispval-storage-type:

http://www.franz.com/support/documentation/7.0/doc/operators/system/lispval-storage-type.htm

If you waited for enough gcs, you'd probably see the description
change to TENURED.

As for linear types, I guess we could use such a system for something
I experimented with many years ago (kids, don't try this at home):

CL-USER(1): (setq x (excl::create-box 'single-float))
0.0
CL-USER(2): (describe x)
0.0 is a NEW SINGLE-FLOAT.
 It is a writable box.
 The hex representation is [#x00000000].
CL-USER(3): (setq y x)
0.0
CL-USER(4): (setf (excl::box-value x) (coerce pi 'single-float))
3.1415927
CL-USER(5): (describe x)
3.1415927 is a NEW SINGLE-FLOAT.
 It is a writable box.
 The hex representation is [#x40490fdb].
CL-USER(6): (describe y)
3.1415927 is a NEW SINGLE-FLOAT.
 It is a writable box.
 The hex representation is [#x40490fdb].
CL-USER(7): (eq x y)
T
CL-USER(8): (setq z (+ x 0))
3.1415927
CL-USER(9): (eq x z)
NIL
CL-USER(10): (eql x z)
T
CL-USER(11): (describe z)
3.1415927 is a NEW SINGLE-FLOAT.
 The hex representation is [#x40490fdb].
CL-USER(12): (setf (excl::box-value z) 1.0)
Error: 3.1415927 is not a single-float box
  [condition type: SIMPLE-ERROR]

Restart actions (select using :continue):
 0: Return to Top Level (an "abort" restart).
 1: Abort entirely from this (lisp) process.
[1] CL-USER(13): 

I've never exported or documented it because I don't want the
headache of dealing with people who have used it without being
careful about how the boxes might change out from under them...
If you ignore my warning and use this anyway, then don't come
cryin' to me when ...

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Thomas Schilling
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3sfjphFng6p3U1@news.dfncis.de>
Duane Rettig wrote:

> As for linear types, I guess we could use such a system for something
> I experimented with many years ago (kids, don't try this at home):

> CL-USER(1): (setq x (excl::create-box 'single-float))
[...]
> CL-USER(8): (setq z (+ x 0))

So you would reuse X's box if the type inferencer could prove that it's
unshared. But ...

> CL-USER(12): (setf (excl::box-value z) 1.0)
> Error: 3.1415927 is not a single-float box

.. can only happen on machines with more than 32 bits, doesn't it?

Or does Z use a different kind of box here? I guess you can't get
untagged values on the REPL, only within functions. Although the latter
would require special hints for the GC and probably gets more
complicated in the presence of multi-threading (how to handle registers
stored during context switch?). OTOH this doesn't apply to floating
point registers since they are (usually) stored in a different register
set. AFAIK 32bit Allegro (as of 6.2, not sure about 7.0) does quite some
(unnecessary) boxing.
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <44q71z1s2.fsf@franz.com>
Thomas Schilling <······@yahoo.de> writes:

> Duane Rettig wrote:
>
>> As for linear types, I guess we could use such a system for something
>> I experimented with many years ago (kids, don't try this at home):
>
>> CL-USER(1): (setq x (excl::create-box 'single-float))
> [...]
>> CL-USER(8): (setq z (+ x 0))
>
> So you would reuse X's box if the type inferencer could prove that it's
> unshared. But ...

Right.

>> CL-USER(12): (setf (excl::box-value z) 1.0)
>> Error: 3.1415927 is not a single-float box
>
> .. can only happen on machines with more than 32 bits, doesn't it?

No, it has nothing to do with word size.  The example I gave happened,
in fact, to be on a 32-bit machine.

> Or does Z use a different kind of box here?

Yes.  As I said, I didn't release this functionality.  One of the first
things I would do if I were to release it would be to rename it to what
it more appropriately represents - instead of "box" it would be called
a "writable-box" (you can even see traces of this newer thinking creeping
into the describe output).  The point is that a boxed number is normally
not mutable, and thus what separates this kind of box above from other
numbers (and also what makes it so dangerous) is that it is writable.

> I guess you can't get
> untagged values on the REPL, only within functions.

This is not a dichotomy - it's not REPL vs unboxed-within-a-function.
It's boxed-land vs unboxed-land. And though it is true that the REPL
lives within boxed-land, it is not true that unboxed-land is limited
to being within a function.

> Although the latter
> would require special hints for the GC and probably gets more
> complicated in the presence of multi-threading (how to handle registers
> stored during context switch?). OTOH this doesn't apply to floating
> point registers since they are (usually) stored in a different register
> set.

And, interestingly enough, unboxed float values are usually stored in
float registers, so this poses no problem for a gc.

> AFAIK 32bit Allegro (as of 6.2, not sure about 7.0) does quite some
> (unnecessary) boxing.

I disagree. All of the boxing that Allegro CL does is necessary,
otherwise we wouldn't be doing it.  If you really meant to say that
it does more boxing than you would have liked for it to do for your
application, then you could always become a supported customer and
send in a request to our support list; we have ways of eliminating
boxing, even between function calls, that might surprise you.


-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Thomas Schilling
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3sieg0Fo3js5U1@news.dfncis.de>
Duane Rettig wrote:

> This is not a dichotomy - it's not REPL vs unboxed-within-a-function.
> It's boxed-land vs unboxed-land. And though it is true that the REPL
> lives withing boxed-land, it is not true that unboxed-land is limited
> to being within a function.

I've been thinking about this - it shouldn't be too hard to kind of
"cluster" static functions, i.e. fully declare some function's types and
call it with untagged/unboxed parameters.  This would, however, mean
keeping track of all call sites in case this function needs to be
redefined.  But that's ok for release code, you can keep this
information separate (you don't even need to deliver it unless an
application user is allowed to alter the program's functions).  A simple
boxed-land stub should suffice (for APPLY and friends).  And of course
GC modifications will be necessary ...

I'm going to play around with such stuff eventually.  (Yes, as many, I
also have my toy lisp project - though not an interpreter.  I chose the
hard drugs, a compiler to ia32(!).  Nice toy, there're vast amounts of
papers and books about things that might be useful - SSA, ANF, CPS,
typing stuff, all sorts of optimizations, register allocation, ...
wiiide field.  But, no, I won't go into any more detail, it's just a toy
ATM.)

> And, intyerestingly enough, unboxed float values are usually stored in
> float registers, so this poses no problem for a gc.

Which may be harder when you have to use stack based floating point
units.  But I have to guess, I never did write code for those.

>>AFAIK 32bit Allegro (as of 6.2, not sure about 7.0) does quite some
>>(unnecessary) boxing.
>
> I disagree. All of the boxing that Allegro CL does is necessary,
> otherwise we wouldn't be doing it. If you really meant to say that
> it does more boxing than you would have liked for it to do for your
> application, then you could always become a supported customer and
> send in a request to our support list; we have ways of eliminating
> boxing, even between function calls, that might surprise you.

Well, it's just some experience of a fellow programmer trying to
optimize some bottleneck using type declarations but still getting quite
some amount of boxing (at least more than expected).  I can't remember
the details and I bailed out of the project quite some time ago.  But I
can witness your good customer support ;)
From: Duane Rettig
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <43bmjo8vz.fsf@franz.com>
Thomas Schilling <······@yahoo.de> writes:

> Duane Rettig wrote:
>
>> This is not a dichotomy - it's not REPL vs unboxed-within-a-function.
>> It's boxed-land vs unboxed-land. And though it is true that the REPL
>> lives withing boxed-land, it is not true that unboxed-land is limited
>> to being within a function.
>
> I've been thinking about this - it shouldn't be too hard to kind of
> "cluster" static functions, i.e. fully declare some function's types and
> call it with untagged/unboxed parameters.  This would, however, mean
> keeping track of all call sites in case this function needs to be
> redefined.  But that's ok for release code, you can keep this
> information separate (you don't even need to deliver it unless an
> application user is allowed to alter the program's functions).  A simple
> boxed-land stub should suffice (for APPLY and friends).  And of course
> GC modifications will be necessary ...

Yes, you've got the idea!

> I'm going to play around with such stuff eventually.  (Yes, as many, I
> also have my toy lisp project - though not an interpreter.  I chose the
> hard drugs, a compiler to ia32(!).  Nice toy, there're vast amounts of
> papers and books about things that might be useful - SSA, ANF, CPS,
> typing stuff, all sorts of optimizations, register allocation, ...
> wiiide field.  But, no, I won't go into any more detail, it's just a toy
> ATM.)
>
>> And, intyerestingly enough, unboxed float values are usually stored in
>> float registers, so this poses no problem for a gc.
>
> Which may be harder when you have to use stack based floating point
> units.  But I have to guess, I never did write code for those.

They are painful when your compiler architecture assumes randomly
accessible registers.  But not because of the gc; the gc still ignores
the floating point unit.

>>>AFAIK 32bit Allegro (as of 6.2, not sure about 7.0) does quite some
>>>(unnecessary) boxing.
>>
>> I disagree. All of the boxing that Allegro CL does is necessary,
>> otherwise we wouldn't be doing it. If you really meant to say that
>> it does more boxing than you would have liked for it to do for your
>> application, then you could always become a supported customer and
>> send in a request to our support list; we have ways of eliminating
>> boxing, even between function calls, that might surprise you.
>
> Well, it's just some experience of a fellow programmer trying to
> optimize some bootleneck using type declarations but still getting quite
> some amount of boxing (at least more than expected).  I can't remember
> the details and I bailed out of the project quite some time ago.  But I
> can witness your good customer support ;)

Thanks.  And as for getting feedback from the compiler itself about what
makes it feel good, I think 8.0 is going to have some good diagnostic
abilities, and will allow the savvy programmer to know exactly why his
program is still boxing in particular places.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Thomas A. Russ
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <ymi8xwdslwl.fsf@sevak.isi.edu>
"John Thingstad" <··············@chello.no> writes:

> Am I wrong that elements of a specialized array must be tagged?
> If not what happens when I assign a simple-array of integer a integer?

It is not necessary to have elements of specialized arrays be tagged.
What happens is that any tagged ("boxed") item gets unboxed.  That is,
its tag gets removed.

Now, in practice on stock hardware, this will not happen for FIXNUMs,
since they are already cleverly designed to fit into a single machine
word, so there would be no savings.  More significant are floating point
operations, where the desire to use stock floating point hardware and
existing standards such as IEEE 754 means that you can't use the FIXNUM
trick of taking bits away.  Floats, therefore, have to have additional
space for the tag (usually an entire word).

So, aside from the array header information, the actual memory layout of
a specialized array of floats in Lisp doesn't differ from that in
languages such as FORTRAN.

In some cases, the advantages of using such a native storage format are
large enough that it can (depending on implementation) be preferable to
pass single element arrays of double-float rather than double-float
arguments.  It really depends on the specific implementation and how much
optimization of function entry points is performed.
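
As a rough sketch of both points, assuming nothing beyond standard Common
Lisp (the function names are made up for illustration, and how much boxing
the declarations actually remove depends on the compiler):

```lisp
(defun sum-doubles (v)
  "Sum a specialized vector of double-floats."
  (declare (type (simple-array double-float (*)) v))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length v) sum)
      (incf sum (aref v i)))))

(defun make-double-cell (&optional (value 0d0))
  "A one-element double-float vector, usable as a mutable float cell."
  (make-array 1 :element-type 'double-float :initial-element value))

(defun add-into-cell (cell x)
  "Add X to the float held in CELL and return CELL.
Passing CELL instead of a bare float lets the result travel in
raw storage rather than as a freshly boxed number."
  (declare (type (simple-array double-float (1)) cell)
           (type double-float x))
  (incf (aref cell 0) x)
  cell)
```

A caller would keep one cell around and read (aref cell 0) at the end,
instead of receiving a new double-float from every call.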

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Björn Lindberg
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <9mpr7a7awqq.fsf@muvclx01.cadence.com>
"John Thingstad" <··············@chello.no> writes:

> On Wed, 26 Oct 2005 20:43:56 +0200, Duane Rettig <·····@franz.com> wrote:
> 
> > Bulent Murtezaoglu <··@acm.org> writes:
> >
> >>>>>>> "DR" == Duane Rettig <·····@franz.com> writes:
> >> [...]
> >>     DR> Most of the genome analysis projects I know about do indeed
> >>     DR> use characters rather than symbols, and they store large
> >>     DR> strings of DNA material into ... strings! [...]
> >>
> >> Hmpf.  I assume, then, they use 64 bit machines with >4gig RAM?
> >
> > Yes.
> >
> >> It
> >> seems the approach Paul Dietz outlined might have considerable
> >> performance advantages.  Maybe the ability to address bytes directly
> >> as opposed to shifting and masking is a factor?  I am not familiar
> >> with the field though, perhaps someone would tell us why the seemingly
> >> obvious optimization is not used.
> >
> > Yes, see also my response to Christophe on this issue - it is desirable
> > to maximize both speed and space in this kind of problem.  Going
> > sub-octet certainly optimizes the size side, but it may de-optimize the
> > speed size in at least two ways (both the less efficient access of
> > shift-and-mask pre- or post- processing, and the degradation that
> > comes from not having built-in functions which work one natural word
> > at a time).
> >
> 
> I am hardly an expert, but the 'obvious solution' to me is to store it
> in a 64-bit unsigned integer array if you have a 64-bit computer. I
> would use two bits to represent a base pair.
> Now to compare large sequences (something they seem to do a lot) you
> can use integer comparison.
> This compares 32 pairs in one operation. Considering you don't have
> to load and store each symbol as well, the speedup should be dramatic.
> (50 times?)
>
> Further, there is the size of the array. Stored in this manner, a
> 3.3 Gbp sequence would fit in a gigabyte. In the event you had to
> compare two such sequences you would need twice that.
>
> With a list of 64-bit pointers to symbols you would need 100 Gb.
> Imagine the time spent reading this amount of data from disk!
>
> Even with 8-bit character strings (if your 64-bit system still
> supports them!) it would take 3.3 Gb per sequence or 6.6 Gb total,
> probably forcing you to swap from disk.
>
> I normally don't waste time on bit fiddling, but in this case the
> benefits seem enormous. Extracting bits from an integer gives a
> trivial overhead compared to having to load large chunks of data from
> disk during a compare or similar operation.
>
> I figure I must have missed something. But without seeing a complete list
> of desired operations, average sequence size, etc., it is hard to see
> what.
>
> Is there someone here with experience in (or knowledge of)
> bioinformatics who could tell me what I missed?

The comparisons involved are rarely base by base. Sequence alignment
algorithms, for example, typically allow gaps to be inserted in one of
the sequences to align different parts of it to different parts of the
other sequence.


Björn
From: Björn Lindberg
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <9mpmzkvawe3.fsf@muvclx01.cadence.com>
·····@runa.se (Björn Lindberg) writes:

> > I am hardly an expert, but the 'obvious solution' to me is to store it
> > in a 64-bit unsigned integer array if you have a 64-bit computer. I
> > would use two bits to represent a base pair.
> > Now to compare large sequences (something they seem to do a lot) you
> > can use integer comparison.
> > This compares 32 pairs in one operation. Considering you don't have
> > to load and store each symbol as well, the speedup should be dramatic.
> > (50 times?)
> >
> > Further, there is the size of the array. Stored in this manner, a
> > 3.3 Gbp sequence would fit in a gigabyte. In the event you had to
> > compare two such sequences you would need twice that.
> >
> > With a list of 64-bit pointers to symbols you would need 100 Gb.
> > Imagine the time spent reading this amount of data from disk!
> >
> > Even with 8-bit character strings (if your 64-bit system still
> > supports them!) it would take 3.3 Gb per sequence or 6.6 Gb total,
> > probably forcing you to swap from disk.
> >
> > I normally don't waste time on bit fiddling, but in this case the
> > benefits seem enormous. Extracting bits from an integer gives a
> > trivial overhead compared to having to load large chunks of data from
> > disk during a compare or similar operation.
> >
> > I figure I must have missed something. But without seeing a complete list
> > of desired operations, average sequence size, etc., it is hard to see
> > what.
> >
> > Is there someone here with experience in (or knowledge of)
> > bioinformatics who could tell me what I missed?
>
> The comparisons involved are rarely base by base. Sequence alignment
> algorithms, for example, typically allow gaps to be inserted in one of
> the sequences to align different parts of it to different parts of the
> other sequence.

... meaning that I think a solution that makes it easy and efficient to
manipulate single bases wins, i.e. -- as Duane suggested -- strings of
characters.
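
To make the point about gaps concrete, here is a minimal sketch of a
gap-tolerant comparison on plain strings. It is ordinary Levenshtein edit
distance, chosen only because it is the simplest relative of the
alignment algorithms mentioned above; it is not claimed to be what any
real bioinformatics package does, and the name EDIT-DISTANCE is just a
label for this example.

```lisp
(defun edit-distance (a b)
  "Minimum number of single-base insertions, deletions and substitutions
needed to turn string A into string B."
  (let* ((n (length a))
         (m (length b))
         (prev (make-array (1+ m)))    ; previous row of the DP table
         (curr (make-array (1+ m))))   ; row currently being filled
    (dotimes (j (1+ m))
      (setf (aref prev j) j))
    (dotimes (i n)
      (setf (aref curr 0) (1+ i))
      (dotimes (j m)
        (setf (aref curr (1+ j))
              (min (1+ (aref prev (1+ j)))   ; gap opposite A's base
                   (1+ (aref curr j))        ; gap opposite B's base
                   (+ (aref prev j)          ; match or substitution
                      (if (char= (char a i) (char b j)) 0 1)))))
      (rotatef prev curr))
    (aref prev m)))
```

Every step reads single bases with CHAR, which is the kind of access that
is cheap on a string and needs shifting and masking on a packed word.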


Björn
From: John Thingstad
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <op.sza6hmvjpqzri1@mjolner.upc.no>
On Thu, 27 Oct 2005 16:04:52 +0200, Björn Lindberg <·····@runa.se> wrote:

> ·····@runa.se (Björn Lindberg) writes:
>
>> > I am hardly an expert, but the 'obvious solution' to me is to store it
>> > in a 64-bit unsigned integer array if you have a 64-bit computer. I
>> > would use two bits to represent a base pair.
>> > Now to compare large sequences (something they seem to do a lot) you
>> > can use integer comparison.
>> > This compares 32 pairs in one operation. Considering you don't have
>> > to load and store each symbol as well, the speedup should be dramatic.
>> > (50 times?)
>> >
>> > Further, there is the size of the array. Stored in this manner, a
>> > 3.3 Gbp sequence would fit in a gigabyte. In the event you had to
>> > compare two such sequences you would need twice that.
>> >
>> > With a list of 64-bit pointers to symbols you would need 100 Gb.
>> > Imagine the time spent reading this amount of data from disk!
>> >
>> > Even with 8-bit character strings (if your 64-bit system still
>> > supports them!) it would take 3.3 Gb per sequence or 6.6 Gb total,
>> > probably forcing you to swap from disk.
>> >
>> > I normally don't waste time on bit fiddling, but in this case the
>> > benefits seem enormous. Extracting bits from an integer gives a
>> > trivial overhead compared to having to load large chunks of data from
>> > disk during a compare or similar operation.
>> >
>> > I figure I must have missed something. But without seeing a complete list
>> > of desired operations, average sequence size, etc., it is hard to see
>> > what.
>> >
>> > Is there someone here with experience in (or knowledge of)
>> > bioinformatics who could tell me what I missed?
>>
>> The comparisons involved are rarely base by base. Sequence alignment
>> algorithms, for example, typically allow gaps to be inserted in one of
>> the sequences to align different parts of it to different parts of the
>> other sequence.
>
> ... meaning that I think a solution that makes it easy and efficient to
> manipulate single bases wins, i.e. -- as Duane suggested -- strings of
> characters.
>
>
> Björn

Yes. That is an entirely different problem.
Thanks for clarifying this for me.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Marco Antoniotti
Subject: DNA sequence coding (Re: "t" stands for the type true and for the boolean expression true ..)
Date: 
Message-ID: <tsM7f.49$pa3.19975@typhoon.nyu.edu>
Ulrich Hobelmann wrote:
> Marco Antoniotti wrote:
> 
>>> Indeed:
>>>
>>>     > (let ((acgt (vector 'a 'c 'g 't)))
>>>         (loop for i below 200 collect (aref acgt (random 4))))
>>>
>>>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>>>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>>>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>>>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>>>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>>>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>>>      G G G T A T C G)
>>>     >
>>> Hey, this DNA stuff is fun!!   ;-}  ;-}
>>
>>
>> Yes.  Apart from the fact that in this case you really do not want to 
>> use a symbol, but probably a character.  A symbol is too heavy for 
>> this sort of things. :}
> 
> 
> Why?  Are symbols any heavier than characters (except for their GC 
> footprint, but all symbols usually share one symbol instance, if I'm not 
> mistaken)?
> 
> I'd guess that symbols are just pointers (typically 32bit), while 
> characters would usually also be padded to 32 bits.
> 
> So in my mental model symbols only create overhead once (when read), but 
> are lightweight objects afterwards, indeed one of the most basic 
> constructs in Lisp.
> 

Well, yes.  But using characters gives you a chance that the underlying 
implementation will "optimize" them. A character may take less space. 
And if you consider the size of genomes (HS being at about 3.3Gbp - with 
1 base-pair, bp, being the measuring unit) you see why you want things 
compressed, no matter which language you use.

Cheers
--
Marco
From: Paul Dietz
Subject: Re: DNA sequence coding (Re: "t" stands for the type true and for the   boolean expression true ..)
Date: 
Message-ID: <djo7l7$5vv$1@avnika.corp.mot.com>
Marco Antoniotti wrote:

> Well, yes.  But using characters gives you a chance that the underlying 
> implementation will "optimize" them. A character may takes less space. 
> And if you consider the size of genomes (HS being at about 3.3Gbp - with 
> 1 base-pair, bp, being the meausring unit) you see why you want things 
> compressed, no matter which language you use.

One may want to go with nybble or bit pair arrays; that is, arrays
of element type (integer 0 15) or (integer 0 3).  Many CL
implementations have specialized arrays with these element types
(and many had bugs in them :).)
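
A minimal sketch of the bit-pair variant, assuming the bases are coded
0..3 in the order A, C, G, T (that order, and the names *BASES*,
ENCODE-DNA and DECODE-DNA, are choices made for this example only). The
type (unsigned-byte 2) is the same type as (integer 0 3); whether the
array is really packed two bits per element is up to the implementation,
and UPGRADED-ARRAY-ELEMENT-TYPE will tell you what you actually got.

```lisp
(defparameter *bases* "ACGT"
  "Position in this string is the 2-bit code of each base (an assumption).")

(defun encode-dna (string)
  "Return a vector of 2-bit codes for the bases in STRING."
  (let ((packed (make-array (length string)
                            :element-type '(unsigned-byte 2))))
    (dotimes (i (length string) packed)
      (setf (aref packed i)
            (or (position (char string i) *bases*)
                (error "Not a DNA base: ~S" (char string i)))))))

(defun decode-dna (packed)
  "Inverse of ENCODE-DNA: turn a vector of 2-bit codes back into a string."
  (map 'string (lambda (code) (char *bases* code)) packed))
```

For example, (upgraded-array-element-type '(unsigned-byte 2)) shows
whether a given Lisp stores such arrays compactly or falls back to a
wider element type.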

	Paul
From: Thomas A. Russ
Subject: Re: "t" stands for the type true and for the boolean expression true    ..
Date: 
Message-ID: <ymiacgvssxq.fsf@sevak.isi.edu>
Ulrich Hobelmann <···········@web.de> writes:

> Why?  Are symbols any heavier than characters (except for their GC 
> footprint, but all symbols usually share one symbol instance, if I'm not 
> mistaken)?
> 
> I'd guess that symbols are just pointers (typically 32bit), while 
> characters would usually also be padded to 32 bits.
> 
> So in my mental model symbols only create overhead once (when read), but 
> are lightweight objects afterwards, indeed one of the most basic 
> constructs in Lisp.

Well, if you are actually using them in lists, it probably doesn't make
any difference in size.  But you can very efficiently and conveniently
use strings to hold them instead:

"AGCTTTGCAGC..."

Of course, the benefit of using symbols is that you could, if desired,
attach other information to the symbol object.  If I were to use
symbols, though, I'd be tempted to use keywords.
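
A small sketch of what attaching information to keyword symbols could
look like, using property lists. The :COMPLEMENT indicator and the
function COMPLEMENT-STRAND are invented for this example. Keep in mind
that the property list of a keyword is global state shared by everything
in the image, so a private package of symbols may be the safer choice in
real code.

```lisp
;; Record each base's Watson-Crick partner on the keyword's property list.
(loop for (base partner) in '((:a :t) (:t :a) (:c :g) (:g :c))
      do (setf (get base :complement) partner))

(defun complement-strand (bases)
  "Map a list of base keywords to the list of their complements."
  (mapcar (lambda (base) (get base :complement)) bases))
```

Note that :T is an ordinary keyword here and has nothing to do with CL:T.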

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Kaz Kylheku
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130436209.215801.63870@o13g2000cwo.googlegroups.com>
Ulrich Hobelmann wrote:
> Marco Antoniotti wrote:
> >> Indeed:
> >>
> >>     > (let ((acgt (vector 'a 'c 'g 't)))
> >>         (loop for i below 200 collect (aref acgt (random 4))))
> >>
> >>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
> >>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
> >>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
> >>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
> >>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
> >>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
> >>      G G G T A T C G)
> >>     >
> >> Hey, this DNA stuff is fun!!   ;-}  ;-}
> >
> > Yes.  Apart from the fact that in this case you really do not want to
> > use a symbol, but probably a character.  A symbol is too heavy for this
> > sort of things. :}
>
> Why?  Are symbols any heavier than characters (except for their GC
> footprint, but all symbols usually share one symbol instance, if I'm not
> mistaken)?

It's not just that the symbol is heavy, but really, these four letters
don't *symbolize* anything on their own. What kind of useful
DNA-related semantics would you attach to these four symbols?

The use of symbols would begin at the level of some higher lexical
units recognized within the DNA. So maybe GTTCCGG could be a symbol.
From: Pascal Bourguignon
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <87wtjyv2te.fsf@thalassa.informatimago.com>
"Kaz Kylheku" <········@gmail.com> writes:
> It's not just that the symbol is heavy, but really, these four letters
> don't *symbolize* anything on their own. What kind of useful
> DNA-related semantics would you attach to these four symbols?

Why are you asking such a trick question?  The meaning is in the eyes
of the beholder!

(setf (get 'g :composition) '((H 5)(C 4)(N 5)(O 1)))
(setf (get 'g :absorbtion)  #2A((220 23000)(235 10500)(250 11000)
                                (265 10500)(290 14000)(300 50)(400 0)))

(setf (get 'c :composition) '((H 5)(C 2)(N 3)(O 1)))
(setf (get 'c :absorbtion)  #2A((220 6000)(250 3500)(300 4500)
                                (300 50)(400 0)))

(setf (get 'h :protons)  1)
(setf (get 'h :neutrons) 0)
(setf (get 'c :protons)  6)
(setf (get 'c :neutrons) 6)


> The use of symbols would begin at the level of some higher lexical
> units recognized within the DNA. So maybe GTTCCGG could be a symbol.

The meaning is in the eyes of the beholder!!! 
In animals,               CUA = leucine
but in yeast mitochondria, CUA = threonine.


For a programmer, this should be obvious!

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
In deep sleep hear sound,
Cat vomit hairball somewhere.
Will find in morning.
From: Rob Warnock
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <9b-dncZJV-_sBf3eRVn-gA@speakeasy.net>
Marco Antoniotti  <·······@cs.nyu.edu> wrote:
+---------------
| Rob Warnock wrote:
| > Marco Antoniotti  <·······@cs.nyu.edu> wrote:
| > +---------------
| > | Alan Crowe wrote:
| > | > Don't forget t for thymine. All your DNA code wants to use
| > | > the letters A,C,G,T as variable names.
| > | 
| > | Nope.  My DNA handling code wants to use the symbol 't to
| > | denote thymine.  As such I can use it as much as I want.
| > +---------------
...
| >     > (let ((acgt (vector 'a 'c 'g 't)))
| >         (loop for i below 200 collect (aref acgt (random 4))))
...
| > Hey, this DNA stuff is fun!!   ;-}  ;-}
| 
| Yes.  Apart from the fact that in this case you really do not want to 
| use a symbol, but probably a character.  A symbol is too heavy for this 
| sort of things. :}
+---------------

Then why did *you* say "My DNA handling code wants to use the
symbol 't to denote thymine"?!?

Or did I get the previous-posting nesting/quoting wrong?


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Marco Antoniotti
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <hg78f.50$pa3.18765@typhoon.nyu.edu>
Rob Warnock wrote:
> Marco Antoniotti  <·······@cs.nyu.edu> wrote:
> +---------------
> | Rob Warnock wrote:
> | > Marco Antoniotti  <·······@cs.nyu.edu> wrote:
> | > +---------------
> | > | Alan Crowe wrote:
> | > | > Don't forget t for thymine. All your DNA code wants to use
> | > | > the letters A,C,G,T as variable names.
> | > | 
> | > | Nope.  My DNA handling code wants to use the symbol 't to
> | > | denote thymine.  As such I can use it as much as I want.
> | > +---------------
> ...
> | >     > (let ((acgt (vector 'a 'c 'g 't)))
> | >         (loop for i below 200 collect (aref acgt (random 4))))
> ...
> | > Hey, this DNA stuff is fun!!   ;-}  ;-}
> | 
> | Yes.  Apart from the fact that in this case you really do not want to 
> | use a symbol, but probably a character.  A symbol is too heavy for this 
> | sort of things. :}
> +---------------
> 
> Then why did *you* say "My DNA handling code wants to use the
> symbol 't to denote thymine"?!?
> 
> Or did I get the previous-posting nesting/quoting wrong?


Nope.  I was just responding to the thread in the context of "T is bad 
and cannot be used as a symbol".  My second answer changed the context 
(at least for me) and started a whole subthread before I actually 
changed the subject header.  Sorry.  Don't you love Usenet? :)

Cheers
--
Marco
From: William Bland
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <pan.2005.10.26.17.12.27.684438@gmail.com>
On Wed, 26 Oct 2005 02:32:43 -0500, Rob Warnock wrote:
> 
>     > (let ((acgt (vector 'a 'c 'g 't)))
>         (loop for i below 200 collect (aref acgt (random 4))))
> 
>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>      G G G T A T C G)
>     > 
> 
> Hey, this DNA stuff is fun!!   ;-}  ;-}
> 

Excellent!  Now all we need is an eval that understands DNA:

> (eval-dna *my-genome*)

#<HUMAN>

> (inspect *)

The object is of type HUMAN.
0. EYES: brown
1. HAIR: black
etc...

Can't be too difficult ;-)

Cheers,
	Bill.
From: Thomas Schilling
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <3sfkm3FnrktiU1@news.dfncis.de>
William Bland wrote:
> Excellent!  Now all we need is an eval that understands DNA:

There is. (kind of)

"DNA Haskell is a special programming language that supports the
construction of DNA algorithms which should be executed in laboratories.
A DNA algorithm includes important information which is also included in
a "protocol". A protocol serves the purpose of a manual for a
researcher. This information refers to the order in which molecular
biological operations should be executed, in which tubes and with which
DNA material. Experimental studies on DNA Computing are prepared on the
basis of these algorithms and protocols. In computer science in order to
solve a mathematical problem, first an algorithm is developed, e.g. as a
computer program source, and then it is executed on a computer. A DNA
Computer works with DNA algorithms, which can be implemented in practice
in laboratories. At the moment practical studies on DNA Computing are
still very expensive. The execution of a protocol in a laboratory or its
simulation should be therefore prepared in such a way that underlying
algorithm is correct and there is no possibility of errors. The
algorithms constructed under DNA Haskell satisfy this goal. The
construction tool DNA Haskell uses the functional language Haskell and
hence it has advantages of a non-strict, typed and referentially
transparent language. The project is being implemented via different
student subprojects."

  <http://wwwtcs.inf.tu-dresden.de/molecules/index.php.en>
From: Joe Marshall
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <7jbz7p4b.fsf@alum.mit.edu>
····@rpw3.org (Rob Warnock) writes:

> Marco Antoniotti  <·······@cs.nyu.edu> wrote:
> +---------------
> | Alan Crowe wrote:
> | > ····@unreal.uncom writes:
> | > > it would have been much better to have used "true" instead of "t"
> | > 
> | > Don't forget t for thymine. All your DNA code wants to use
> | > the letters A,C,G,T as variable names.
> | 
> | Nope.  My DNA handling code wants to use the symbol 't to
> | denote thymine.  As such I can use it as much as I want.
> +---------------
>
> Indeed:
>
>     > (let ((acgt (vector 'a 'c 'g 't)))
>         (loop for i below 200 collect (aref acgt (random 4))))
>
>     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
>      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
>      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
>      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
>      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
>      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
>      G G G T A T C G)
>     > 
>
> Hey, this DNA stuff is fun!!   ;-}  ;-}

Keep this man away from the gene-splicing equipment.
From: Rob Warnock
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <8t2dne8XEshzB_3enZ2dnUVZ_smdnZ2d@speakeasy.net>
Joe Marshall  <·········@alum.mit.edu> wrote:
+---------------
| ····@rpw3.org (Rob Warnock) writes:
| >     > (let ((acgt (vector 'a 'c 'g 't)))
| >         (loop for i below 200 collect (aref acgt (random 4))))
| >     (G T T C C G G C C G A C T G A A T T C A T T G T A C A G G C A C
| >      G T G A G C C C G C G C A C G G G G C T G G G A T T T G G A T G
| >      T A T A T A A T G G A G T C G G A T A G C G A C G T C C G A C C
| >      A C A A A T C T C C G C G G C G A C C T T A G A G A C T A T A G
| >      G A C G G T G G A T T A T T C T A A T T T A C G A C C G C A G T
| >      G T T G C C A T C T G C A G C A C C A A T C C G A A A A A G G C
| >      G G G T A T C G)
| >     > 
| > Hey, this DNA stuff is fun!!   ;-}  ;-}
| 
| Keep this man away from the gene-splicing equipment.
+---------------

So what *is* the probability that a random 200 base-pair sequence
encodes something biologically active? (...or even a viable virus?!?)  ;-}

I know, I know, they actually come in base-pair *triplets* [that is,
DNA is really base-64], so a 200 base-pair sequence isn't even valid.
And it needs special "start" & "stop" triplets at the beginning and end. 
[And if it's a virus, it's probably RNA, not DNA.]

But that said, what's the probability that a random *well-formed*
207 base-pair sequence encodes something biologically active (or viable)?
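
The arithmetic behind those numbers, as a small sketch: three bases at
four choices each give 4^3 = 64 possible triplets, 207 is 69 triplets,
and 200 is not a multiple of three. The function name CODONS is just a
label for this example, and it only chops a string into triplets; it
knows nothing about start or stop signals.

```lisp
(defun codons (sequence)
  "Split the string SEQUENCE into a list of three-base substrings.
Signals an error when the length is not a multiple of three."
  (unless (zerop (mod (length sequence) 3))
    (error "Length ~D is not a multiple of 3." (length sequence)))
  (loop for start from 0 below (length sequence) by 3
        collect (subseq sequence start (+ start 3))))
```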


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Björn Lindberg
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <9mpk6g1czbe.fsf@muvclx01.cadence.com>
Alan Crowe <····@cawtech.freeserve.co.uk> writes:

> ····@unreal.uncom writes:
> > A practical limitation imposed by this arbitrary unfairness is having
> > to find other names for such common variables as T for temperature, T
> > for time, T for temporary, T for test, etc. 
> 
> I agree. I do this
> 
> ;; I hate getting caught out by the fact that CL:T is a constant
> ;; I want to use t for time
> (defconstant true 'cl:t)
> (shadow 't)
> ;; Of course, now I get caught out when (format t ...) doesn't work.
> ;; It needs to be (format true ...) so that format sees CL:T
> ;; Format doesn't accept CL-USER:T 
> 
> in http://alan.crowe.name/clx/3D-viewer/index.html
> 
> it would have been much better to have used "true" instead
> of "t"
> 
> Don't forget t for thymine. All your DNA code wants to use
> the letters A,C,G,T as variable names.

I don't think so; more likely it would be `n' for nucleotide. For naming
them you can use :a, :c, :g & :t. (Not picking a side in the topic at hand
here, but as CL is today, you have to make a tradeoff: Is it worth
shadowing T in a particular piece of code or not?)


Björn
From: Bill Atkins
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130352400.985353.25400@g43g2000cwa.googlegroups.com>
> Don't forget t for thymine. All your DNA code wants to use
> the letters A,C,G,T as variable names.

Hmm, I don't get it.

I can see why you'd want to use 't as a variable *value* while
processing DNA, but what good would t (or a, g, c, or u, for that
matter) do as a variable *name*?  Nothing really comes to mind.
Wouldn't you be better off describing what the variable actually does
(e.g. t-count, t-runs, etc.), since the name of a base pair doesn't
really signify anything by itself?

Bill
From: Barry Margolin
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <barmar-78C250.20183124102005@comcast.dca.giganews.com>
In article <··································@4ax.com>,
 ····@unreal.uncom wrote:

> A practical limitation imposed by this arbitrary unfairness is having
> to find other names for such common variables as T for temperature, T
> for time, T for temporary, T for test, etc.  CL fans complain about
> having to use LST instead of LIST in Scheme, but that's a minor
> complaint compared to hogging T.

I don't see a smiley, so I'm going to assume you're serious.

IMHO, you should use more meaningful variable names, not bemoan the fact 
that CL has usurped one of the one-letter variables.  About the only 
time it's close to reasonable to use one-letter variables is for 
throwaway indexes in loops, e.g.

(dotimes (i ...) ...)

This is so idiomatic that there's little to be gained by naming the 
variable something like thing-index.  But if you have a variable for 
temperature, why not call it TEMPERATURE?

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From:  (typep 'nil '(satisfies identity)) => ?
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130235475.449718.80500@g44g2000cwa.googlegroups.com>
After all the good and annoying contributions, I think I should reveal
the reasons for my unusual question. By the way, I don't think CL
should be fixed, but a user-friendly language should allow the
programmer to customize the language as much as possible ... and that
is the challenge for Lisp!!!

My challenge in lisp programming is to store Data as commands with
their result:
e.g.:
(print 444 t) => 444
(boundp '*terminal-io*) 0 => t
(subtypep 'symbol 't)  => t

By abstracting the meaning of this Data (replacing t with an arbitrary
placeholder) you may access the general syntax of lisp expressions.
e.g. ((subtypep 'symbol 't) t) =>
((subtypep `symbol `:key):key)

Now, filling in other objects instead of :key, you will in general get
new useful Data. But in the case of t you get just nonsense, because t is
the union of three different meanings. The meaning is context dependent
and cannot be determined with functions like 'eq or 'equal.

Distinct names for the symbols would provide a solution...

Thank you for your comments.
From: ··············@hotmail.com
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130265210.849189.323630@g14g2000cwa.googlegroups.com>
 (typep 'nil '(satisfies identity)) => ? wrote:
>
> My challenge in lisp programming is to store Data as commands with
> their result:
> e.g.:
> (print 444 t) => 444
> (boundp '*terminal-io*) 0 => t
> (subtypep 'symbol 't)  => t
>
> By abstracting the meaning of this Data (replacing t with an arbitrary
> placeholder) you may access the general syntax of lisp expressions.
> e.g. ((subtypep 'symbol 't) t) =>
> ((subtypep `symbol `:key):key)

  I don't quite understand what you are trying to do. (Including why
you use backquotes in the second line). The general SYNTAX of lisp
expressions is s-expressions, with various complications added by the
reader. Your "commands" are simply s-expressions being read in and
evaluated, to give a result. This is the SEMANTICS of Common Lisp,
which is defined by eval, including the bindings of variables and
functions in the environment.

  You cannot abstract out the semantics in this kind of symbolic way,
except in a relatively limited subset of self-evaluating objects and
"static" functions, or certain invariants required by the standard.

> Now, filling in other objects instead of :key, you will in general get
> new useful Data.

  How is it useful? You can catalog what types 'symbol is a subtype of,
but that is information that is encoded in subtypep. (Which is
constrained by the standard).

(subtypep 'symbol 'integer) --> NIL
(subtypep 'symbol 'atom) --> T

What is abstract here? #'(lambda (x) (subtypep 'symbol x))? #'subtypep?

> But in the case of t you get just nonsense, because t is
> the union of three different meanings. The meaning is context dependent
> and cannot be determined with functions like 'eq or 'equal.

This context is the whole point of having a programming language: if we
could determine the outcome of evaluations by static analysis, we
wouldn't need to run programs, just read them.

What functions are you trying to study? What do you make of such
functions as (get-universal-time) or (random 1.0d0) or (symbol-value
'*foo*) or even (cdr foo) ?

Your whole analysis seems to be suitable for a much different
evaluation strategy, like that of the lambda calculus or symbolic
logic, rather than Lisp.

> Distinct names for the symbols would provide a solution...

I think you are looking for a solution to a problem that doesn't exist,
but if you want "distinct names" look at #'gensym. That will give you
symbols guaranteed not to collide with any other usage in Common Lisp,
either by the standard, by implementation extensions, or even by user
code.
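
As a sketch of what that could look like for the example in the quoted
post: the function below replaces every occurrence of a symbol in a form
with a fresh uninterned symbol from GENSYM. The name ABSTRACT-SYMBOL and
the "PLACEHOLDER" prefix are made up here, and this only performs the
textual substitution; it does not address the semantic objections above.

```lisp
(defun abstract-symbol (form target)
  "Return two values: a copy of FORM in which every occurrence of the
symbol TARGET is replaced by one fresh uninterned symbol, and that symbol."
  (let ((placeholder (gensym "PLACEHOLDER")))
    (values (subst placeholder target form) placeholder)))
```

Calling (abstract-symbol '((subtypep 'symbol 't) t) 't) returns the form
with both occurrences of T replaced by the same placeholder, which is EQ
to no other symbol in the system.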
From: Barry Margolin
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <barmar-BAEB9A.22041825102005@comcast.dca.giganews.com>
In article <························@g14g2000cwa.googlegroups.com>,
 ···············@hotmail.com" <············@gmail.com> wrote:

> > Distinct names for the symbols would provide a solution...
> 
> I think you are looking for a solution to a problem that doesn't exist,
> but if you want "distinct names" look at #'gensym. That will give you
> symbols guaranteed not to collide with any other usage in Common Lisp,
> either by the standard, by implementation extensions, or even by user
> code.

I think what he's saying is that it's confusing that T is used for 
several different, unrelated things in Common Lisp.  It's the name of 
the "universe" type, it's the default "truth" return value of 
predicates, it's a stream designator for some I/O functions (standing 
for *STANDARD-OUTPUT* in FORMAT and for *TERMINAL-IO* elsewhere), etc.
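
A short sketch of three of those roles side by side. The function names
TRUTH-ROLE, TYPE-ROLE and STREAM-ROLE exist only for this illustration.

```lisp
;; Role 1: T is a constant that evaluates to itself, the canonical true value.
(defun truth-role ()
  (list (eq t 't) (not nil)))

;; Role 2: T names the type that contains every object, NIL included.
(defun type-role ()
  (list (typep nil 't) (typep 42 't) (subtypep 'symbol 't)))

;; Role 3: as FORMAT's destination, T means *STANDARD-OUTPUT*,
;; while NIL means "return the output as a string".
(defun stream-role ()
  (list (with-output-to-string (*standard-output*)
          (format t "~A" 42))
        (format nil "~A" 42)))
```

In each case it is the position of the argument, not the symbol itself,
that decides which meaning applies.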

There's no real logic to it, it's mostly historical.  Decades ago, when 
computers had little memory, there was value in overloading the same 
name.  Maclisp didn't have *STANDARD-OUTPUT*, it just had T; it was a 
convenient name for a placeholder when you didn't want to refer to a 
stream that you explicitly opened.

Use of T as the root of the type hierarchy, though, was a latecomer, 
since earlier Lisps didn't have as much type introspection.  In this 
case, I think it was used because T and NIL are considered "opposites", 
and NIL was the obvious name for the empty type.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From:  (typep 'nil '(satisfies identity)) => ?
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <1130340594.498737.222530@o13g2000cwo.googlegroups.com>
··············@hotmail.com schrieb:

> (typep 'nil '(satisfies identity)) => ? wrote:
> >
> > My challenge in lisp programming is to store Data as commands with
> > their result:
> > e.g.:
> > (print 444 t) => 444
> > (boundp '*terminal-io*) 0 => t
> > (subtypep 'symbol 't)  => t
> >
> > By abstracting the meaning of this Data (replacing t with an arbitrary
> > placeholder) you may access the general syntax of lisp expressions.
> > e.g. ((subtypep 'symbol 't) t) =>
> > ((subtypep `symbol `:key):key)
>
>   I don't quite understand what you are trying to do. (Including why
> you use backquotes in the second line). The general SYNTAX of lisp
> expressions is s-expressions, with various complications added by the
> reader. Your "commands" are simply s-expressions being read in and
> evaluated, to give a result. This is the SEMANTICS of Common Lisp,
> which is defined by eval, including the bindings of variables and
> functions in the environment.
>

You surely describe Lisp in a more appropriate way than I do. English
is my fourth language, and an Italian keyboard with MS Word is
quite tricky for `' and whatever ... .  In many points you are
simply right. The point of view makes the difference.
Lisp is a powerful language, especially for its clear syntax and the
unlimited possibility to combine and extend s-expressions and to modify
them at run time. So Lisp code is a way to express something.
It would be good if such expressions could be clear and not
open to misunderstanding (as with t or nil).

.. and you may abstract the data I described in the last posting in
whatever way you like, even predicates and so on. Normally I abstract with
"special" symbols that are known to be only placeholders.

And it is not the normal way Lisp is meant to be used ;-)))
From: Wade Humeniuk
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <6q57f.33074$yS6.27878@clgrps12>
  (typep 'nil '(satisfies identity)) => ? wrote:
> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.
> 
> Has anybody an idea how this could be fixed in common lisp?
> 

If you want to use T as a var name, (it ain't broken)

CL-USER 3 > (defpackage cl-user-1 (:use :common-lisp)
               (:shadow t))
#<PACKAGE CL-USER-1>

CL-USER 4 > (in-package :cl-user-1)
#<PACKAGE CL-USER-1>

CL-USER-1 11 > (defun test ()
               (let ((t 10))
                 (values t cl:t)))
TEST

CL-USER-1 12 > (test)
10
COMMON-LISP:T

CL-USER-1 13 >

Wade
From: Joe Marshall
Subject: Re: "t" stands for the type true and for the boolean expression true ..
Date: 
Message-ID: <fyqqeb1x.fsf@alum.mit.edu>
" (typep 'nil '(satisfies identity)) => ?" <··············@yahoo.it> writes:

> Hi, since 2 years I'm programming intensively in common lisp. I find it
> a truly dynamic and smart language. There is one thing about it, I
> really hate: "t" stands for the type true and for the boolean
> expression true.

This is a new one.

> Has anybody an idea how this could be fixed in common lisp?

I don't understand what you think the problem is!

There is no `type true'.