From: Kenny
Subject: Top Ten List?
Date: 
Message-ID: <48d4848b$0$4901$607ed4bc@cv.net>
Is there a name for this?

Suppose you have a gigantic list and need the top N (by any arbitrary 
key returning a numeric evaluation).

One could sort the whole list descending and subseq 0 N it, but the one 
is sorting (- gigantic N) for no reason.

Is there a name for this?

kxo

From: lisp linux
Subject: Re: Top Ten List?
Date: 
Message-ID: <wfGdnf3dRJ1TAknVnZ2dnUVZ_hadnZ2d@comcast.com>
Kenny wrote:
> Is there a name for this?
> 
> Suppose you have a gigantic list and need the top N (by any arbitrary 
> key returning a numeric evaluation).
> 
> One could sort the whole list descending and subseq 0 N it, but the one 
> is sorting (- gigantic N) for no reason.
> 
> Is there a name for this?
> 
> kxo
'prioriy queue'
It's a common problem for search (as in full text search kind), you rank the results while walking 
the results, but want to retain only the top N
-Antony
From: Alex Mizrahi
Subject: Re: Top Ten List?
Date: 
Message-ID: <48d4cdc8$0$90272$14726298@news.sunsite.dk>
 ll> 'prioriy queue'
 ll> It's a common problem for search (as in full text search kind), you
 ll> rank the results while walking the results, but want to retain only the
 ll> top N -Antony

it's not the optimal way to do selection because priority queue management
has considerable overhead on each operation 
From: John Thingstad
Subject: Re: Top Ten List?
Date: 
Message-ID: <op.uhr0mapqut4oq5@pandora.alfanett.no>
P� Sat, 20 Sep 2008 12:17:35 +0200, skrev Alex Mizrahi  
<········@users.sourceforge.net>:

>  ll> 'prioriy queue'
>  ll> It's a common problem for search (as in full text search kind), you
>  ll> rank the results while walking the results, but want to retain only  
> the
>  ll> top N -Antony
>
> it's not the optimal way to do selection because priority queue  
> management
> has considerable overhead on each operation
>
>

I don't think it is as bad as you think. You only need to rank the top 10.  
If the score is higher than the lowest of the 10 then add it and boot the  
lowest value. After itererating through a few values the lowest value will  
change rather rarely.

--------------
John Thingstad
From: John Thingstad
Subject: Re: Top Ten List?
Date: 
Message-ID: <op.uhr4slf3ut4oq5@pandora.alfanett.no>
P� Sat, 20 Sep 2008 12:55:12 +0200, skrev John Thingstad  
<·······@online.no>:

>
> I don't think it is as bad as you think. You only need to rank the top  
> 10. If the score is higher than the lowest of the 10 then add it and  
> boot the lowest value. After itererating through a few values the lowest  
> value will change rather rarely.
>
> --------------
> John Thingstad

Something along the lines:

(defmacro aif (pred then-part &optional else-part)
   `(let ((it ,pred))
      (if it
        ,then-part
        ,else-part)))

(defun position-score (array number)
   (aif (position-if (lambda (current) (>= number current)) array) it 0))

(defun nshift-right (array index)
   (loop for current from (- (length array) 2) downto index
         do (setf (aref array (1+ current)) (aref array current)))
   array)

(defun add-score (array number)
   (let ((pos (position-score array number)))
     (nshift-right array pos)
     (setf (aref array pos) number)
     array))

(defun min-score (array)
   (aref array (1- (length array))))

(defun make-score-board (size)
   (make-array size :element-type 'fixnum :initial-element 0))

(defun test ()
   (let* ((score-board-size 10) (example-size 100)
          (test-data (make-array example-size :initial-contents
                                 (loop repeat example-size collect (random  
example-size))))
          (board (make-score-board score-board-size)))
     (loop for current across test-data
           when (> current (min-score board)) do (add-score board current))
     (values test-data board)))

CL-USER 78 > (test)
#(19 2 79 90 94 8 84 16 69 59 7 72 88 79 16 50 47 83 84 6 52 49 37 83 8 81  
71 36 41 81 64 45 81 64 21 64 88 36 51 39 41 57 78 1 49 92 59 1 73 74 92  
18 94 88 64 83 43 85 58 70 94 41 8 0 64 24 63 2 82 83 48 30 58 98 70 17 99  
21 91 98 30 68 11 75 79 27 81 2 19 27 90 18 34 87 71 15 50 45 40 65)
#(99 98 98 94 94 94 92 92 91 90)

--------------
John Thingstad
From: Pascal J. Bourguignon
Subject: Re: Top Ten List?
Date: 
Message-ID: <87myi3h0cc.fsf@hubble.informatimago.com>
Kenny <·········@gmail.com> writes:


> Is there a name for this?
>
> Suppose you have a gigantic list and need the top N (by any arbitrary
> key returning a numeric evaluation).
>
> One could sort the whole list descending and subseq 0 N it, but the
> one is sorting (- gigantic N) for no reason.
>
> Is there a name for this?

http://en.wikipedia.org/wiki/Selection_algorithm

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Debugging?  Klingons do not debug! Our software does not coddle the
weak."
From: Madhu
Subject: Re: Top Ten List?
Date: 
Message-ID: <m38wtnx97a.fsf@moon.robolove.meer.net>
* Kenny <························@cv.net> :
Wrote on Sat, 20 Sep 2008 01:05:15 -0400:

| Is there a name for this?

theres the precedent in UNIX for `head' to mean the first 10 lines of a
file.  
| Suppose you have a gigantic list and need the top N (by any arbitrary
| key returning a numeric evaluation).
|
| One could sort the whole list descending and subseq 0 N it, but the
| one is sorting (- gigantic N) for no reason.

Just keep a sorted list of size n. Implementation below

| Is there a name for this?

head is great.

(defun head (list &optional (predicate #'<) &key (key #'identity) (n 10))
  (let (clist (clength 0))
    (map nil (lambda (elem)
                 (let ((x (funcall key elem)))
                   (cond ((or (endp clist)
                              (or (null n) (< clength n))
                              (funcall predicate x
                                       (car (nthcdr (1- n) clist))))
                          (setq clist (insert-ordered elem clist predicate))
                          (cond ((< clength n) (incf clength))
                                (t (setq clist (nbutlast clist 1))))))))
         list)
    clist))


To reinforce the comp.lang.lisp "Give them exactly what they asked for
no matter how stupid the question" sterotype, here's an insert-ordered
implementation from my init-file to complete the job. [if you use it,
comments welcome]

(defun insert-ordered (item list &optional (predicate #'<)
                       &key key test test-not)
  "Insert ITEM in LIST. may modify LIST. LIST is assumed to be
sorted by PREDICATE. If TEST (or TEST-NOT) is non-NIL, it is used
to determine if ITEM is present (or absent) in LIST, in which
case it is not inserted.  KEY if non-NIL is applied to both ITEM
and elements of LIST before comparing them. However, unlike
section 17.2.1 of the spec, if both TEST and TEST-NOT are NIL, no
test is made."
  (if (endp list)
      (list item)
      (macrolet ((apply-key (key element)
                   `(if ,key
                        (funcall ,key ,element)
                        ,element)))
        (let ((item-key (apply-key key item))
              (obj-key (apply-key key (car list))))
          (cond ((cond (test (funcall test item-key obj-key))
                       (test-not (not (funcall test-not item-key obj-key))))
                 list)
                ((funcall predicate item-key obj-key) (cons item list))
                (t (loop for prev-cons = list then cons for cons on (cdr list)
                         do (setq obj-key (apply-key key (car cons)))
                         if (cond (test (funcall test item-key obj-key))
                                  (test-not
                                   (not (funcall test-not item-key obj-key))))
                         return list
                         else if (funcall predicate item-key obj-key)
                         do (loop-finish)
                         finally
                         (setf (cdr prev-cons) (cons item cons))
                         (return list))))))))
From: Madhu
Subject: Re: Top Ten List?
Date: 
Message-ID: <m34p4bx78l.fsf@moon.robolove.meer.net>
* Madhu <··············@moon.robolove.meer.net> :
Wrote on Sat, 20 Sep 2008 18:31:29 +0530:

| (defun head (list &optional (predicate #'<) &key (key #'identity) (n 10))
|   (let (clist (clength 0))
|     (map nil (lambda (elem)
|                  (let ((x (funcall key elem)))
|                    (cond ((or (endp clist)
|                               (or (null n) (< clength n))
|                               (funcall predicate x
|                                        (car (nthcdr (1- n) clist))))
|                           (setq clist (insert-ordered elem clist predicate))
|                           (cond ((< clength n) (incf clength))
|                                 (t (setq clist (nbutlast clist 1))))))))
|          list)
|     clist))
|

I noticed I didn't remove the key keyword argument completely before
posting. With it, it should've read

(defun head (list &optional (predicate #'<) &key (key #'identity) (n 10))
  (let (clist (clength 0))
    (map nil (lambda (elem)
		 (let ((x (funcall key elem)))
		   (cond ((or (endp clist)
			      (or (null n) (< clength n))
			      (funcall predicate x
				       (funcall key
						(car (nthcdr (1- n) clist)))))
			  (setq clist (insert-ordered elem clist predicate
						      :key key))
			  (cond ((< clength n) (incf clength))
				(t (setq clist (nbutlast clist 1))))))))
	 list)
    clist)) 

[A slightly "more clever" approach would be to use
 	(complement predicate)
 and chop CLIST with POP, while still doing the same "naive"
 INSERT-ORDERED on cons-represented list]

| To reinforce the comp.lang.lisp "Give them exactly what they asked for
| no matter how stupid the question" sterotype,| here's an insert-ordered
| implementation from my init-file to complete the job. [if you use it,
| comments welcome]

That sentence is messed up too, its not that the question is stupid but
the appropriateness of the correct answer
--
Madhu
From: Thomas A. Russ
Subject: Re: Top Ten List?
Date: 
Message-ID: <ymiabe0rwqi.fsf@blackcat.isi.edu>
Kenny <·········@gmail.com> writes:

> Is there a name for this?
> 
> Suppose you have a gigantic list and need the top N (by any arbitrary
> key returning a numeric evaluation).
> 
> One could sort the whole list descending and subseq 0 N it, but the one
> is sorting (- gigantic N) for no reason.

I suppose you could get some speed-up from doing this, although from an
informaiton theoretic viewpoint, your performance gain is somewht
limited.

In the case of sorting N items and then choosing the top K, you would
have an algorithm that performs in [O(n log n) + O(k)], and since we
assume  K << N, that means sorting dominates and you have O(n log n).

For doing a more limited selection algorithm, you still have to examine
all N items.  And you have to compare them against the current set of K
best items.  So a naive algorithm would require O(n * k) comparisons.
Assuming you have the K best items already sorted, you can improve on
that using binary search and end up with O(n log k).  This will provide
a bit of an improvement, since (log k) is less than (log n), but the
difference would be smaller because of the logarithmic term.

Now in practice, reducing the multiplier can still yield a reasonable
speed up.  For example, if choosing the top 10 out of 1,000,000
elemeents, the selection algorithm could be expected to be roughly 6
times faster.  On modern processors this would likely be an even greater
boost, since the relatively small number of selections that one is
always testing against would remain in the data cache, and the large
1,000,000 entity base would only need to be traversed once and then
accessed again.


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Raffael Cavallaro
Subject: Re: Top Ten List?
Date: 
Message-ID: <gb8qt1$fcp$1@aioe.org>
On 2008-09-22 12:06:29 -0400, ···@sevak.isi.edu (Thomas A. Russ) said:

> Now in practice, reducing the multiplier can still yield a reasonable
> speed up.  For example, if choosing the top 10 out of 1,000,000
> elemeents, the selection algorithm could be expected to be roughly 6
> times faster.  On modern processors this would likely be an even greater
> boost, since the relatively small number of selections that one is
> always testing against would remain in the data cache, and the large
> 1,000,000 entity base would only need to be traversed once and then
> accessed again.

For example, I get an order of magnitude speed up with both LispWorks 
and sbcl (both Mac Intel 32-bit) by caching the current top n instead 
of sorting the entire list.

CL-USER 15 > (let ((test-data
                    (loop repeat 1000000 collect (random 1000000))))
                   (time (naive-top-n 10 test-data #'>))
                   (time (top-n 10 test-data #'>)))
Timing the evaluation of (NAIVE-TOP-N 10 TEST-DATA (FUNCTION >))

User time    =        2.440
System time  =        0.008
Elapsed time =        2.499
Allocation   = 1772 bytes
0 Page faults
Timing the evaluation of (TOP-N 10 TEST-DATA (FUNCTION >))

User time    =        0.203
System time  =        0.001
Elapsed time =        0.222
Allocation   = 3424 bytes
0 Page faults
(999999 999998 999998 999997 999996 999996 999996 999996 999995 999995)

where:

(defun top-n (n some-list pred &key (key #'identity))
  (let* ((current-top-n
          (sort
           (loop repeat n
                 for elt in some-list
                 collect elt)
           (complement pred) :key key)))
    (when (< (length current-top-n) n)
      (return-from top-n (nreverse current-top-n)))
    (loop for elt in (nthcdr n some-list)
          when (funcall pred
                        (funcall key elt)
                        (funcall key (car current-top-n)))
          do (setf current-top-n
                   (sort (cons elt (cdr current-top-n))
                         (complement pred) :key key))
          finally (return (nreverse current-top-n)))))

(defun naive-top-n (n some-list pred &key (key #'identity))
  (subseq (sort some-list pred :key key) 0 n))
From: Timofei Shatrov
Subject: Re: Top Ten List?
Date: 
Message-ID: <48d8c3be.267378129@news.motzarella.org>
On Sat, 20 Sep 2008 01:05:15 -0400, Kenny <·········@gmail.com> tried to confuse
everyone with this message:

>Is there a name for this?

I thought this post would be about the top ten Lisp applications. Then I
remembered that we don't do applications.


-- 
|Don't believe this - you're not worthless              ,gr---------.ru
|It's us against millions and we can't take them all... |  ue     il   |
|But we can take them on!                               |     @ma      |
|                       (A Wilhelm Scream - The Rip)    |______________|