sorting...

From: David Bakhash
Subject: sorting...
Date: Mon, 14 Dec 1998 00:00:00 +0000
Message-ID: <cxjww3vp7s9.fsf@engc.bu.edu>

Just wanted to know...

suppose that you have a list.  You want to sort it.  suppose that you
knew ahead of time that the list would be much closer to being sorted
if you reversed it.  Would it make sense to call sort on the reversed
list rather than the list itself?  would people think that this would
save time, or take more time?

thanks,
dave

Re: sorting... Gareth McCaughan
Re: sorting... Jens Kilian
Re: sorting... Kent M Pitman
Re: sorting... Thomas A. Russ
Re: sorting... R. Toy
Re: sorting... Erik Naggum
- Re: sorting... Erik Naggum
  - Re: sorting... Marco Antoniotti
- Re: sorting... Pierre Mai
- Re: sorting... Bernhard Pfahringer
  - Re: sorting... Erik Naggum

From: Gareth McCaughan
Subject: Re: sorting...
Date: Mon, 14 Dec 1998 00:00:00 +0000
Message-ID: <86k8zuizca.fsf@g.pet.cam.ac.uk>

David Bakhash wrote:

> suppose that you have a list.  You want to sort it.  suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it.  Would it make sense to call sort on the reversed
> list rather than the list itself?  would people think that this would
> save time, or take more time?

Totally dependent on the sorting method used by your implementation.
I'd expect it usually to take more time. (I just tried a single
test case using CMU CL, an array of 10^5 fixna with each being
on the order of 5 places out of reversed order, and the element
type not being known at compile time. In this particular case,
not reversing first was on the order of 10% faster.)

-- 
Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.

From: Jens Kilian
Subject: Re: sorting...
Date: Mon, 14 Dec 1998 00:00:00 +0000
Message-ID: <sfn24qn5bg.fsf@bstde026.bbn.hp.com>

David Bakhash <·····@bu.edu> writes:
> suppose that you have a list.  You want to sort it.  suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it.  Would it make sense to call sort on the reversed
> list rather than the list itself?  would people think that this would
> save time, or take more time?

Depends on the internals of the sorting algorithm.  For example, some variants
of mergesort (the algorithm of choice for linked lists) wouldn't benefit.
The version I'm thinking of builds pre-sorted runs of elements and merges
them, like this:

	Original list:  (1 2 7 8 6 5 4 3)
	Pre-sorted:	(1 2 7 8) (3 4 5 6)
	Merged:		(1 2 3 4 5 6 7 8)

Pre-sorting and merging are both O(N); the number of merges is O(log M),
where M is the number of pre-sorted lists, which is always <= N/2.

Bye,
	Jens.
-- 
··········@acm.org                 phone:+49-7031-14-7698 (HP TELNET 778-7698)
  http://www.bawue.de/~jjk/          fax:+49-7031-14-7351
PGP:       06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

From: Kent M Pitman
Subject: Re: sorting...
Date: Mon, 14 Dec 1998 00:00:00 +0000
Message-ID: <sfwd85mhcqr.fsf@world.std.com>

David Bakhash <·····@bu.edu> writes:

> Just wanted to know...
> 
> suppose that you have a list.  You want to sort it.  suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it.  Would it make sense to call sort on the reversed
> list rather than the list itself?  would people think that this would
> save time, or take more time?

Well, reversing a list is a linear time operation.  If there's already
a linear factor involved, and it's hard to imagine a sort operation
faster than that unless the object to be sorted is maintained in a
format that knows it's pre-sorted (beyond the scope of any Lisp
datastructure or the predefined SORT function lisp offers, but not, I
suppose, beyond the scope of data structures and sort operators you
can define yourself).

So the question comes down to the one of whether being mostly (or even
completely) sorted is an aid to your sort function.  As I recall, some
sort functions are really fast if the list is almost sorted and others
are not.  It's possible, though I don't recall, that some are actually
slower if the list is almost or fully sorted.  But the point is, it
probably makes sense from a theoretical efficiency perspective to
reverse the list before sorting if (a) the sort algorithm is know to
be positively affected by having done the reverse, and (b) the list is
usually long enough that algorithmic efficiency (rather than constant
factors) dominates.  For the Lisp SORT function, I don't think you know
enough information to use either of these factoids, so I don't know what
to advise you--if performance isn't critical, the reverse is probably
not worth it.  If it is critical, I'd either hand-code a sort function
whose characteristics I knew or else I'd contact the specific vendors 
involved and get their assurance that it was doing what I thought it was.
Either way, I'd put in a BIG comment saying what I was doing. e.g.,

 ;; PERFORMANCE NOTE: The list X comes in mostly sorted in reverse.
 ;; If performance becomes a bottleneck here, it might be worth
 ;; reversing it before sorting it.
 (SETQ X (SORT X PRED))

or

 ;; KLUDGE ALERT: The list X comes in mostly sorted in reverse.
 ;; Because we're running in the ACME implementation, we happen to 
 ;; know SORT here will work faster if we first NREVERSE the list
 ;; here. If we move to other implementations, that assumption
 ;; should be reconsidered!
 (SETQ X (SORT (NREVERSE X) PRED))

If you're looking for the overriding meta-rules, I'd say it's that
they're these:

(1) performance tuning is a science.  there's a rule i think
    from poker that might be useful here.  they say that in
    every hand of poker there's a sucker, and if you don't know
    who it is, it's probably you.  so if you don't find yourself
    doing science to figure out your performance, you might
    consider not trying.  a lot of performance tuning by dabblers
    situations worse and obfuscates code to no good end.

(2) good, clean documentation is often enough.  and properly
    applied it will allow you to later dig out of small messes
    you may make.  failure to add such will leave you with
    unintelligible pieces of code that didn't make sense at all
    in the first place, require mystic mumbo jumbo to explain,
    and that you're afraid to change in the future.

From: Thomas A. Russ
Subject: Re: sorting...
Date: Mon, 14 Dec 1998 00:00:00 +0000
Message-ID: <ymipv9m60d5.fsf@sevak.isi.edu>

This really depends on the sorting algorithm used.  If you call the
built-in SORT function, then you don't really have a lot of control or
knowledge about exactly what algorithm they used.

There are algorithms that work particularly well on nearly sorted
input.  Straight insertion sort is one example -- but it is a very poor
choice unless the input is nearly sorted.

There are other algorithms that can work particularly poorly on
input that is in reverse sorted order and some that can be very poor on
sorted order input.  A Quicksort that uses the first element as the
partition element will have N^2 behavior with sorted input.

There are other algorithms that aren't affected by input order.




-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu

From: R. Toy
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <36773E1A.A4E970DE@mindspring.com>

David Bakhash wrote:
> 
> Just wanted to know...
> 
> suppose that you have a list.  You want to sort it.  suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it.  Would it make sense to call sort on the reversed
> list rather than the list itself?  would people think that this would
> save time, or take more time?
> 

How about this:  rather than reversing the list, complement the sorting
predicate.  So if you wanted the list in ascending order, sort the list
in descending order.  If the list must be in ascending order, then this
doesn't help.

If in doubt, always profile.

-- 
---------------------------------------------------------------------------
----> Raymond Toy	····@mindspring.com
                        http://www.mindspring.com/~rtoy

From: Erik Naggum
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <3122671946890928@naggum.no>

* David Bakhash <·····@bu.edu>
| suppose that you have a list.  You want to sort it.  suppose that you
| knew ahead of time that the list would be much closer to being sorted
| if you reversed it.  Would it make sense to call sort on the reversed
| list rather than the list itself?  would people think that this would
| save time, or take more time?

  this is all wrong as an approach to improving performance.  instead,
  study the available sort algorithms in the considerable literature and
  implement one that fits your data after you have discovered that the
  sorting time is significant to the rest of your work and worth fixing.

  I do (sort (cons <element> <sorted-list>) <test>) when adding one element
  at a time to a priority list.  profiling shows that it is irrelevant to
  the execution time of the program, although I'd bet that some form of
  programmer instinct would be triggered by this particular example and
  would try to rewrite it.

#:Erik
-- 
  man who cooks while hacking eats food that has died twice.

From: Erik Naggum
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <3122754335784936@naggum.no>

* Marco Antoniotti <·······@copernico.parades.rm.cnr.it>
| Ahem!

  :)

| Insertion in a priority queue should be O(lg n).  Most likely, you are
| doing at least O(n lg n) with ...

  the priority queues are supposed to be depleted very quickly, and have an
  average length of 1.2 elements before sorting.  when a particular queue
  gets stuck, it normally gets unstuck before the length reaches 10.  when
  a queue is stuck for a long time, which happens to about 1% of the queues
  over the course of a week, the cost of monitoring it dwarfes the cost of
  the insertions.  so the sorting time really is irrelevant.

  it's useful as an example of when a simple-minded algorithm is sufficient
  because other parts of the system profit more from optimization than this
  particular part.  a long queue length really is an important problem, and
  making the priority queue insert fast is irrelevant in the big picture.
  popping off the highest-priority element when the queue starts to drain,
  however, needs to be very fast and simple because in the event that the
  queue has gotten quite long, releasing it is a serious hit to the system
  all at once, compared to the very well distributed O(n lg n) costs of the
  insertions.

  I actually found it quite amusing that I was somewhat unnerved by this
  code and spent a lot of real time testing whether it was a bottle-neck,
  returning to it time and again to see if it could be fixed.  however, as
  is quite often the case with optimizations, my hunch was all wrong, and
  it was way more important to make deletion from the queue O(1).

| Here is the solution.  It is old code and I will be willing to include
| any changes and to change the copyright.

  thanks.  I appreciate it.

#:Erik
-- 
  man who cooks while hacking eats food that has died twice.

From: Marco Antoniotti
Subject: Re: sorting...
Date: Wed, 16 Dec 1998 00:00:00 +0000
Message-ID: <lwyao8jw4f.fsf@copernico.parades.rm.cnr.it>

Erik Naggum <····@naggum.no> writes:

> * Marco Antoniotti <·······@copernico.parades.rm.cnr.it>
> | Ahem!
> 
>   :)
> 
> | Insertion in a priority queue should be O(lg n).  Most likely, you are
> | doing at least O(n lg n) with ...
> 
>   the priority queues are supposed to be depleted very quickly, and have an
>   average length of 1.2 elements before sorting.  when a particular queue
>   gets stuck, it normally gets unstuck before the length reaches 10.  when
>   a queue is stuck for a long time, which happens to about 1% of the queues
>   over the course of a week, the cost of monitoring it dwarfes the cost of
>   the insertions.  so the sorting time really is irrelevant.

	...

>   is quite often the case with optimizations, my hunch was all wrong, and
>   it was way more important to make deletion from the queue O(1).

	...

Of course. This is what you really need to do. As a matter of fact,
is it worth paying the O(lg n) insertion/deletion price only if you
need a change-key operation.

> 
> | Here is the solution.  It is old code and I will be willing to include
> | any changes and to change the copyright.
> 
>   thanks.  I appreciate it.
> 

You are welcome.

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 10 03 17, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it

From: Pierre Mai
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <87k8zt6zan.fsf@orion.dent.isdn.cs.tu-berlin.de>

Marco Antoniotti <·······@copernico.parades.rm.cnr.it> writes:

> >   I do (sort (cons <element> <sorted-list>) <test>) when adding one element
> >   at a time to a priority list.  profiling shows that it is irrelevant to
> >   the execution time of the program, although I'd bet that some form of
> >   programmer instinct would be triggered by this particular example and
> >   would try to rewrite it.
> 
> Ahem! Insertion in a priority queue should be O(lg n). Most likely,
> you are doing at least O(n lg n) with
> 
> 	(sort (cons <element> <sorted-list>) <test>)

Whereby you have just lost Erik's bet ;) The point (I think) Erik was
trying to make is that without profiling, you won't know whether it is
even worth _thinking_ about performance-issues in a piece of code.
And since Erik's profiling has shown that sort is good enough for his
particular program, it doesn't make any sense implementing any form of
heap or other advanced data-structure.

The solution with sort is faster to code, safer, better to maintain,
easier to document, etc.  Switching to a more advanced algorithm
and/or data-structure is only worth it, if profiling and/or
anticipated usage patterns indicate that sort will be a bottle-neck
here. 

All of your other arguments are all true, and if one could disregard
implementation complexity, heaps (or splay-trees, or any of the other
100 solutions for priority-queues) would always be the better
solutions than sort.  But one can't disregard implementation
complexity, since the effort spent on one part of a project will
always detract from the effort available for other parts of a project.

Regs, Pierre.

PS: Re-use does of course alleviate the problem of implementation
complexity somewhat.  But re-use brings with it another can of
worms...

-- 
Pierre Mai <····@acm.org>               http://home.pages.de/~trillian/
  "One smaller motivation which, in part, stems from altruism is Microsoft-
   bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

From: Bernhard Pfahringer
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <755fqd$25ke$1@www.univie.ac.at>

In article <················@naggum.no>, Erik Naggum  <····@naggum.no> wrote:
>
>  I do (sort (cons <element> <sorted-list>) <test>) when adding one element
>  at a time to a priority list.  profiling shows that it is irrelevant to
>  the execution time of the program, although I'd bet that some form of
>  programmer instinct would be triggered by this particular example and
>  would try to rewrite it.
>
Yep, why not use:

 (merge 'list (list <element>) <sorted-list> <test>)

Not much of a rewrite, O(n), might not bite as early for larger priority lists.

Bernhard
-- 
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for  http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence          ········@ai.univie.ac.at

From: Erik Naggum
Subject: Re: sorting...
Date: Tue, 15 Dec 1998 00:00:00 +0000
Message-ID: <3122754465238243@naggum.no>

* ········@korb.ai.univie.ac.at (Bernhard Pfahringer)
| Yep, why not use: (merge 'list (list <element>) <sorted-list> <test>).
| Not much of a rewrite, O(n), might not bite as early for larger priority
| lists.

  that's actually very useful advice that doesn't cost anything to adopt.
  thank you.

#:Erik
-- 
  man who cooks while hacking eats food that has died twice.