Re: question on sort

From: Barry Margolin
Subject: Re: question on sort
Date: Tue, 01 Aug 2000 00:00:00 +0000
Message-ID: <PoFh5.20$PL5.1602@burlma1-snr2>

In article <···························@news.uni-ulm.de>,
kp gores  <·····@sip.medizin.uni-ulm.de> wrote:
>hello,
>
>Hyperspec knows the functions sort, stable-sort: 
>
>   sort sequence predicate &key key => sorted-sequence
>
>   If sequence is a vector, the result is a vector that has the same
>   actual array element type as sequence. If sequence is a list, the
>   result is a list. 
>
>unfortunately they sort only sequences or vectors. I'd like to sort 
>arrays of whatever dimensions (my array is (* 6)).
>The key should be a element in the array (eg. accessor for column 2).
>
>does a fast and flexible implentation of sort/stable-sort for arbitrary 
>array exist somewhere on the net? 

You could create a sequence of row numbers:

(setq indices
      (loop for i below (array-dimension 0 old-array)
            collect i))

then sort these:

(setq sorted-indices
      (sort indices #'< :key (lambda (index) (aref old-array index 2)))

then finally fill in a new array using the sorted indexes:

(loop for i upfrom 0
      for sorted-index in sorted-indices
      do
  (dotimes (j array-dimension 1 old-array)
    (setf (aref new-array i j) (aref old-array sorted-index j))))

> BTW, 1) how do i declare the types of the columns in my array?

With the :ELEMENT-TYPE option to MAKE-ARRAY.

>      2) what is the correct form for (declare (type (list my-type))),
>         i.e a declarationn telling the compiler that the type is a list
>         of elements of type my-type

(defun list-of-my-type (list)
  (every #'(lambda (x) (typep x 'my-type))))

(deftype list-of-my-type ()
  '(and list (satisfies list-of-my-type)))

(declare (type list-of-my-type ...))

Such a declaration is probably not going to do much; this type is useful
with CHECK-TYPE, but types that involve SATISFIES generally can't do much
for optimization.

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Re: question on sort Christophe Rhodes
- Re: question on sort Raymond Toy
  - Re: question on sort Robert Monfera
    - Re: question on sort Kent M Pitman
      - Re: question on sort Robert Monfera
    - Re: question on sort Raymond Toy
      - Re: question on sort Barry Margolin
      - Re: question on sort Robert Monfera
- Re: question on sort Barry Margolin
- Re: question on sort Coby Beck

From: Christophe Rhodes
Subject: Re: question on sort
Date: Wed, 02 Aug 2000 00:00:00 +0000
Message-ID: <sq1z07ss72.fsf@lambda.jesus.cam.ac.uk>

kp gores <·····@sip.medizin.uni-ulm.de> writes:

> hm, i meant if it is possible for my array of dimensions (* 6) to 
> declare that column 0 is of type string, column 1 and 2 of type date 
> etc...

Um. Are you sure that you have the optimal data type for your
application? An array should be seen as a set of the same types of
things -- for example, a temperature field in a room could be
represented as a three-dimensional array of floats (having set up a
suitable grid for the indices). What you seem to want is an array of
structs, or even possibly a list of structs, since you're not sure how
big your first dimension is.

Or am I missing something?

Christophe

From: Raymond Toy
Subject: Re: question on sort
Date: Wed, 02 Aug 2000 00:00:00 +0000
Message-ID: <4nhf93fy8h.fsf@rtp.ericsson.se>

>>>>> "kp" == kp gores <·····@sip.medizin.uni-ulm.de> writes:

    kp> in another post i gave his example:
    kp> i want to put the long file listing of a directory into an array.
    kp> the array has the  dimensions (* 6):
    kp> i dont know how many files there will be (thats why i wrote * as first 
    kp> index), but for each file i need the six entries 
    kp> name, owner, group ,size, creation date, modification date etc..
    kp> each entry has a different type.

    kp> of course i could define an array with elements of a type file-info, eg. 
    kp> a defclass with the six slots i need.
    kp> but it was obvious to me to use an array.

    kp> i'll try to think about why an array of structs is better.

Because it states more cleary what your intent is?
Because the elements have names?
What's in entry 3?  Or was entry 5 what I wanted?
What happens if I decide to add a new entry?
What happens all the code if I rearrange the entries?

Pretty good reasons for me.

Ray

From: Robert Monfera
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <39895FAF.EE40B496@fisec.com>

Raymond Toy wrote:

>     kp> i'll try to think about why an array of structs is better.
> 
> Because it states more cleary what your intent is?
> Because the elements have names?
> What's in entry 3?  Or was entry 5 what I wanted?
> What happens if I decide to add a new entry?
> What happens all the code if I rearrange the entries?
> 
> Pretty good reasons for me.

Why could a heterogenous array _sometimes_ be better?

Because of better locality of data in memory (->cache hits)?
Because you may save ~24 type overhead bytes per row?
Because you don't have to garbage collect a huge array?
How fast can you access information?

OK, these reasons are all optimization-related, and there is a lot to
lose if one goes that way (no heterogenous objects, no class
dispatching, manual memory management...).

Robert

From: Kent M Pitman
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <sfw7l9ytrib.fsf@world.std.com>

Robert Monfera <·······@fisec.com> writes:

> OK, these reasons are all optimization-related, and there is a lot to
> lose if one goes that way (no heterogenous objects, no class
> dispatching, manual memory management...).

And no sorting support... :-)

From: Robert Monfera
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <398A022A.60288458@fisec.com>

Kent M Pitman wrote:
> 
> Robert Monfera <·······@fisec.com> writes:
> 
> > OK, these reasons are all optimization-related, and there is a lot to
> > lose if one goes that way (no heterogenous objects, no class
> > dispatching, manual memory management...).
> 
> And no sorting support... :-)

That's part of manual memory management.

From: Raymond Toy
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <4nr986eb3j.fsf@rtp.ericsson.se>

>>>>> "Robert" == Robert Monfera <·······@fisec.com> writes:

    Robert> Raymond Toy wrote:
    kp> i'll try to think about why an array of structs is better.
    >> 
    >> Because it states more cleary what your intent is?
    >> Because the elements have names?
    >> What's in entry 3?  Or was entry 5 what I wanted?
    >> What happens if I decide to add a new entry?
    >> What happens all the code if I rearrange the entries?
    >> 
    >> Pretty good reasons for me.

    Robert> Why could a heterogenous array _sometimes_ be better?

    Robert> Because of better locality of data in memory (->cache hits)?

How is locality improved?  If the array is heterogenous, doesn't that
mean each array element is now some boxed thing so it's just a pointer
to where the real data is stored?  When I access the real data, my
cache is blown.

Am I missing something obvious here?

    Robert> Because you may save ~24 type overhead bytes per row?

I'm sorry, I don't quite see this.

    Robert> Because you don't have to garbage collect a huge array?

I don't quite follow here.  

    Robert> How fast can you access information?

Well, of the Lisp implementations I know, multidimensional array
access is significantly slower than 1-D array access.  However, in this
case, I suppose you might make up for that by not having to access the
structure slots.

Ray

From: Barry Margolin
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <wcfi5.5$G87.448@burlma1-snr2>

In article <··············@rtp.ericsson.se>,
Raymond Toy  <···@rtp.ericsson.se> wrote:
>    Robert> Because of better locality of data in memory (->cache hits)?
>
>How is locality improved?  If the array is heterogenous, doesn't that
>mean each array element is now some boxed thing so it's just a pointer
>to where the real data is stored?  When I access the real data, my
>cache is blown.

That will be true of the structure slot contents as well.  Since it's the
same for both representations, it is irrelevant to the comparison.

However, if all the structures are created in a loop (which is reasonable
to expect if you create them at the same time as you would have allocated
the 2-D array), they'll probably be adjacent in memory, which should
provide similar locality (not quite the same, because you'll have to jump
back and forth between the vector of structure references and the
structures themselves, but unless you're really hard up for RAM they should
both stay in memory).

>Am I missing something obvious here?
>
>    Robert> Because you may save ~24 type overhead bytes per row?
>
>I'm sorry, I don't quite see this.

He's referring to the header data for each structure instance.  Although 24
bytes sounds excessive.  I wouldn't expect much more than a 4-byte pointer
to a structure or class definition object, which would be shared by all
instances of the same structure type or class.

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Robert Monfera
Subject: Re: question on sort
Date: Thu, 03 Aug 2000 00:00:00 +0000
Message-ID: <398A01C4.CA5B2AF7@fisec.com>

Raymond Toy wrote:

> How is locality improved?  If the array is heterogenous, doesn't that
> mean each array element is now some boxed thing so it's just a pointer
> to where the real data is stored?  When I access the real data, my
> cache is blown.
> 
> Am I missing something obvious here?

Certainly :-) We talk about a non-existent type of array where one
column may be a single-float, another one a fixnum etc.  Data are
unboxed.

>     Robert> Because you may save ~24 type overhead bytes per row?
> 
> I'm sorry, I don't quite see this.\

Try this:

(defclass loan ()
  ((amount :type double-float :initform 100.0d0)
   (rate :type single-float :initform 0.07d0)))

(defparameter *v* (make-array 1000000 :element-type 'loan))

(defun test (v)
  (loop for row from 0 to 999999
      do (setf (aref v row) (make-instance 'loan))))
        
>     Robert> Because you don't have to garbage collect a huge array?
> 
> I don't quite follow here.

Immediate data do not have to be GC'd, whether it's a vector of vectors
or the above mentioned non-existent type.

Robert

From: Barry Margolin
Subject: Re: question on sort
Date: Wed, 02 Aug 2000 00:00:00 +0000
Message-ID: <aEYh5.42$PL5.3213@burlma1-snr2>

In article <···························@news.uni-ulm.de>,
kp gores  <·····@sip.medizin.uni-ulm.de> wrote:
>probably not. it seems like nobody in this group looks at an array the 
>way i do :-(

You sound like a spreadsheet geek.  When all you have is a hammer,
everything looks like a nail.  In languages with that provide more
expressive data structures, arrays are generally only used for uniform
data.  Lisp certainly allows you to put arbitrary data in any array
element, but it's usually more appropriate to use structures for groups of
related items that aren't uniform.

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Coby Beck
Subject: Re: question on sort
Date: Wed, 02 Aug 2000 00:00:00 +0000
Message-ID: <gb2i5.18288$47.278301@news.bc.tac.net>

"kp gores" <·····@sip.medizin.uni-ulm.de> wrote in message
································@news.uni-ulm.de...
>
> in another post i gave his example:
> i want to put the long file listing of a directory into an array.
> the array has the  dimensions (* 6):
> i dont know how many files there will be (thats why i wrote * as first
> index), but for each file i need the six entries
> name, owner, group ,size, creation date, modification date etc..
> each entry has a different type.
>
> of course i could define an array with elements of a type file-info, eg.
> a defclass with the six slots i need.

THIS is the obvious approach!

> but it was obvious to me to use an array.

An array, yes, but not a multi-dimensional array.
>
> i'll try to think about why an array of structs is better.

because otherwise you are trying to put a square peg in a round hole!!

if it looks like a duck and quacks like a duck, model it after a duck  ;-)

Coby