sort :key

From: Jacek Generowicz
Subject: sort :key
Date: Tue, 24 Sep 2002 08:47:30 +0000
Message-ID: <tyfr8fjwz8d.fsf@pcitapi22.cern.ch>

The Hyperspec states that there is no guarantee on the number of times
the sort key will be called.

What can be expected in practice from different implementations ?


Thanks,

Re: sort :key Joe Marshall
- Re: sort :key Frank A. Adrian
  - Re: sort :key Joe Marshall
    - Re: sort :key Jacek Generowicz
    - Re: sort :key Frank A. Adrian
Re: sort :key Thomas A. Russ
- Re: sort :key Vassil Nikolov

From: Joe Marshall
Subject: Re: sort :key
Date: Tue, 24 Sep 2002 13:24:02 +0000
Message-ID: <r8fjzfkd.fsf@ccs.neu.edu>

Jacek Generowicz <················@cern.ch> writes:

> The Hyperspec states that there is no guarantee on the number of times
> the sort key will be called.
> 
> What can be expected in practice from different implementations ?

Nothing.  Different implementations are free to call the sort key
function any number of times and change the number of calls at any
time.

However, it is the case that the sort key function *must* be called at
least once for each element.  It would be a truly poor implementation
that used an O(n^2) sorting algorithm and called the key function on
each comparison (such an implementation would be legal, though).

From: Frank A. Adrian
Subject: Re: sort :key
Date: Wed, 25 Sep 2002 04:29:08 +0000
Message-ID: <oYak9.60$z26.115369@news.uswest.net>

Joe Marshall wrote:

> However, it is the case that the sort key function *must* be called at
> least once for each element.  It would be a truly poor implementation
> that used an O(n^2) sorting algorithm and called the key function on
> each comparison (such an implementation would be legal, though).

Actually, if the algorithm first verified that any elements were out of 
order before sorting, the minimum you'd need is n-1 calls (elt 0 vs. elt 1, 
elt1 vs. elt 2, etc.).  Most algorithms, though, will call the key function 
O(n log n) times in the worst case.

This is actually a pretty interesting topic.  Given that you have the actual 
number of elements stored in a dope structure for arrays, you could vary 
the algorithm used based on sequence size.  Do any of the Lisps out there 
do this?  And you'd probably want to use something like a merge sort for a 
list sequence.  What are the sorting pragmatics of the various Lisp systems 
out there?

faa

From: Joe Marshall
Subject: Re: sort :key
Date: Wed, 25 Sep 2002 13:29:46 +0000
Message-ID: <r8fii4dx.fsf@ccs.neu.edu>

"Frank A. Adrian" <·······@ancar.org> writes:

> Joe Marshall wrote:
> 
> > However, it is the case that the sort key function *must* be called at
> > least once for each element.  It would be a truly poor implementation
> > that used an O(n^2) sorting algorithm and called the key function on
> > each comparison (such an implementation would be legal, though).
> 
> Actually, if the algorithm first verified that any elements were out of 
> order before sorting, the minimum you'd need is n-1 calls (elt 0 vs. elt 1, 
> elt1 vs. elt 2, etc.).  Most algorithms, though, will call the key function 
> O(n log n) times in the worst case.

The *compare* function would only need to be called N-1 times, but the
*key* function would still need to be called N times.

> This is actually a pretty interesting topic.  Given that you have the actual 
> number of elements stored in a dope structure for arrays, you could vary 
> the algorithm used based on sequence size.  Do any of the Lisps out there 
> do this?

I have seen code that specialized for very small arrays (n <= 3).

From: Jacek Generowicz
Subject: Re: sort :key
Date: Wed, 25 Sep 2002 16:53:07 +0000
Message-ID: <tyfu1ket3ik.fsf@pcitapi22.cern.ch>

Joe Marshall <···@ccs.neu.edu> writes:

> "Frank A. Adrian" <·······@ancar.org> writes:
> 
> > This is actually a pretty interesting topic.  Given that you have
> > the actual number of elements stored in a dope structure for
> > arrays, you could vary the algorithm used based on sequence size.

I think it would be useful to be able to vary the algorithm used based
on cost of key function ... but I guess that this would be very hard
to do without a major hint from the programmer.

> >  Do any of the Lisps out there do this?

This is the sort of discussion I was hoping for ...

> I have seen code that specialized for very small arrays (n <= 3).

From: Frank A. Adrian
Subject: Re: sort :key
Date: Thu, 26 Sep 2002 05:03:05 +0000
Message-ID: <dywk9.153$b93.213191@news.uswest.net>

Joe Marshall wrote:
> The *compare* function would only need to be called N-1 times, but the
> *key* function would still need to be called N times.

Duh.  Mea culpa.

faa

From: Thomas A. Russ
Subject: Re: sort :key
Date: Fri, 27 Sep 2002 00:52:49 +0000
Message-ID: <ymiu1kcckym.fsf@sevak.isi.edu>

Jacek Generowicz <················@cern.ch> writes:

> 
> The Hyperspec states that there is no guarantee on the number of times
> the sort key will be called.
> 
> What can be expected in practice from different implementations ?

Between N and 2*N^2 times for a sequence of length N.

The lower bound would be if the system were designed with the assumption
that key functions were expensive to compute, and that therefore call
the key function once on each element, and then sort the key and
original sequences together.  

The high end is for small sorts where it doesn't really make sense to
spend the overhead to use one of the fancier O(n log n) sorting
algorithms.  (Also, quicksort in the worst case).  Also, a factor of 2
is added in because a single comparison will call the key twice (unless
the one pass trick is used).

Anyway, some empirical data, using lists:

ACL 5.1:
  1000 random elements

17386  first trial
17472  second trial

MCL 4.3.1:
  1000 random elements

9714  first trial
9712  second trial

For reference, 2 * N log N is 13815.511

> Thanks,

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu

From: Vassil Nikolov
Subject: Re: sort :key
Date: Sat, 28 Sep 2002 07:11:39 +0000
Message-ID: <kzbit0q7fmc.fsf@panix2.panix.com>

···@sevak.isi.edu (Thomas A. Russ) writes:

[...]
> The lower bound would be if the system were designed with the assumption
> that key functions were expensive to compute, and that therefore call
> the key function once on each element, and then sort the key and
> original sequences together.  

If memory space and memory management is very cheap, and calling the
test and the key functions is expensive, a fancy implementation might
tabulate the keys in a vector of length N, and the test results in an
NxN matrix...  (A sparse matrix if we expect an almost sorted
sequence.)  In fact, that might not be too fancy if sorting a `remote'
sequence.

[...]
> Anyway, some empirical data, using lists:
> 
> ACL 5.1:
>   1000 random elements
> 
> 17386  first trial
> 17472  second trial
> 
> MCL 4.3.1:
>   1000 random elements
> 
> 9714  first trial
> 9712  second trial

I guess it is not by chance that all these numbers are even, and it
appears pretty likely that if the number of calls to the test function
were counted, it would be half of each.  Determining the better
implementation is left as an exercise for the statistically-minded
reader.

---Vassil.