From: ·········@gmail.com
Subject: Optimize a sequence function
Date: 
Message-ID: <1139653063.441056.41980@g44g2000cwa.googlegroups.com>
Hello I would like a little help optimizing this function:

(defun get-set (set seq)
  (let* ((type (type-of set))
         (type (if (listp type) (car type) 'list)))
    (map type (lambda (x) (elt set x)) seq)))

I can use it like this:
(get-set #(a b c d e) '(0 1 2))
-> #(A B C)

or:
(get-set '(a b c d e) '(0 1 2))
-> (A B C)

or:
(get-set "abcde" '(0 1 2))
-> "abc"

However, in some algorithm I did, this function is the one where 90% of
time is spent.
If there is any help you can provide me to optimize it, I will thank
you.

From: Alan Manuel K. Gloria
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <1139665240.313016.323950@g43g2000cwa.googlegroups.com>
First, in your algorithm, are you using lists, vectors/arrays, or
strings?

I think on most implementations the fastest access is vectors/arrays,
possibly followed by strings and finally lists at the bottom.

If you're using lists my suggestion is that you *don't* optimize this
function and instead optimize the type of data you pass to it.  If you
need random access, use vectors/arrays - they have the advantage in
access time! (most of the time - some compilers I think are so smart
they convert lists to arrays)

If you *are* already using vectors/arrays then you might want to
rewrite the function to do vector access directly, possibly the elt is
the slowdown.
From: Alan Crowe
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <86k6c2t5gn.fsf@cawtech.freeserve.co.uk>
··········@gmail.com" <·········@gmail.com> writes:

> Hello I would like a little help optimizing this function:
> 
> (defun get-set (set seq)
>   (let* ((type (type-of set))
>          (type (if (listp type) (car type) 'list)))
>     (map type (lambda (x) (elt set x)) seq)))
> 
> I can use it like this:
> (get-set #(a b c d e) '(0 1 2))
> -> #(A B C)
> 
> or:
> (get-set '(a b c d e) '(0 1 2))
> -> (A B C)
> 
> or:
> (get-set "abcde" '(0 1 2))
> -> "abc"
> 
> However, in some algorithm I did, this function is the one where 90% of
> time is spent.
> If there is any help you can provide me to optimize it, I will thank
> you.

My guess would be that this is eating your computer time
budget specifically in the case when the input set is a
list.

I suggest that the diagnostic test to run is to use get-set
for copying lists. ie

(get-set '(a b c d e) '(0 1 2 3 4))

The use of elt with a list causes the computer to traverse
the list from the start to find the nth element over and
over again. I think you will find that when you use get-set
to copy a list the run time is proportional to the square of
the length of the list ie quadratic time

One way to tackle the problem is to coerce the list to be a
vector ie (coerce set 'vector). Then elt (which could be
just aref or svref) will run in constant time and get-set
will copy a list in linear time. This approach creates
garbage, COERCE is allocating storage for the vector which
is soon discarded. My guess is that allocating and
collecting a short lived vector is very cheap and will solve
your performance problems.

Alan Crowe
Edinburgh
Scotland
From: ·········@gmail.com
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <1139668039.930242.169190@z14g2000cwz.googlegroups.com>
I thought the same, however the speed difference when using lists and
vectors is not that big as I though it should be.

May be is type checking the slowdown ?

Then the solution is to use declare statements.

I posted in the group because it is not even obvious to me if this
function can be really optimized much.
From: Pascal Costanza
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <456f7cF57c5sU1@individual.net>
·········@gmail.com wrote:
> I thought the same, however the speed difference when using lists and
> vectors is not that big as I though it should be.
> 
> May be is type checking the slowdown ?
> 
> Then the solution is to use declare statements.

Note that in method definitions, the class on which a parameter is 
specialized is very likely also used as a type declaration when it can 
be proven that the respective variable is never changed (which should be 
almost always the case).

> I posted in the group because it is not even obvious to me if this
> function can be really optimized much.

The call of the map function also looks suspicious: It has to use a 
generic version because the first parameter is not a quoted form. If the 
first parameter were quoted, it could use some specialized code. 
Furthermore, with the map function the code has to create a closure 
which may add further costs.

Note that I am only speculating here, but I would probably try to 
replace this with an iterative version as well and do some benchmarks. 
In my experience, you can typically get a good idea what's going on when 
you compare the speed of different alternatives.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: ···············@yahoo.com
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <1139866460.168397.256170@g14g2000cwa.googlegroups.com>
Pascal Costanza wrote:
> Note that in method definitions, the class on which a parameter is
> specialized is very likely also used as a type declaration when it can
> be proven that the respective variable is never changed (which should be
> almost always the case).

Do you mean the compiler is likely to put in the type declaration for
you, or are you remarking that programmers should put it in themselves
(assuming the variable is never changed)?  I've often wished that the
compiler would do it.

(defmethod factor ((x integer))
  (declare (integer x)) ;; boring...
  (super-number-theory-factorization x))

(defmethod factor ((x polynomial))
  ...)
From: Pascal Costanza
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <4565iaF4hpbdU1@individual.net>
·········@gmail.com wrote:
> Hello I would like a little help optimizing this function:
> 
> (defun get-set (set seq)
>   (let* ((type (type-of set))
>          (type (if (listp type) (car type) 'list)))
>     (map type (lambda (x) (elt set x)) seq)))
> 
> I can use it like this:
> (get-set #(a b c d e) '(0 1 2))
> -> #(A B C)
> 
> or:
> (get-set '(a b c d e) '(0 1 2))
> -> (A B C)
> 
> or:
> (get-set "abcde" '(0 1 2))
> -> "abc"
> 
> However, in some algorithm I did, this function is the one where 90% of
> time is spent.
> If there is any help you can provide me to optimize it, I will thank
> you.

The elt function looks suspicious. It is defined both for lists and for 
vectors, but for lists it is in general a bad idea to use it, since it 
can only iterate through the list until it arrives at the element you 
want to access.

I would separate the two cases and handle them differently. A generic 
function is a good idea to use here:

(defgeneric get-set (set seq))

(defmethod get-set ((set vector) seq)
   (let* ((type (type-of set))
          (type (if (listp type) (car type) type)))
     (map type (lambda (x) (elt set x)) seq)))

(defmethod get-set ((set cons) seq)
   (loop while seq
         for element in set
         for counter from 0
         when (eql counter (car seq))
         collect element
         and do (setq seq (cdr seq))))

The second method assumes that seq is sorted.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Pascal Costanza
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <4565puF4hpbdU2@individual.net>
Pascal Costanza wrote:
> ·········@gmail.com wrote:
> 
>> Hello I would like a little help optimizing this function:
>>
>> (defun get-set (set seq)
>>   (let* ((type (type-of set))
>>          (type (if (listp type) (car type) 'list)))
>>     (map type (lambda (x) (elt set x)) seq)))
>>
>> I can use it like this:
>> (get-set #(a b c d e) '(0 1 2))
>> -> #(A B C)
>>
>> or:
>> (get-set '(a b c d e) '(0 1 2))
>> -> (A B C)
>>
>> or:
>> (get-set "abcde" '(0 1 2))
>> -> "abc"
>>
>> However, in some algorithm I did, this function is the one where 90% of
>> time is spent.
>> If there is any help you can provide me to optimize it, I will thank
>> you.
> 
> 
> The elt function looks suspicious. It is defined both for lists and for 
> vectors, but for lists it is in general a bad idea to use it, since it 
> can only iterate through the list until it arrives at the element you 
> want to access.
> 
> I would separate the two cases and handle them differently. A generic 
> function is a good idea to use here:
> 
> (defgeneric get-set (set seq))
> 
> (defmethod get-set ((set vector) seq)
>   (let* ((type (type-of set))
>          (type (if (listp type) (car type) type)))
>     (map type (lambda (x) (elt set x)) seq)))

Sorry, that should have been (aref set x) here.

> (defmethod get-set ((set cons) seq)
>   (loop while seq
>         for element in set
>         for counter from 0
>         when (eql counter (car seq))
>         collect element
>         and do (setq seq (cdr seq))))
> 
> The second method assumes that seq is sorted.

Maybe there's also a way to optimize the first method based on that 
assumption.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: ·········@gmail.com
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <1139667421.934205.271370@g44g2000cwa.googlegroups.com>
>> The second method assumes that seq is sorted.

>Maybe there's also a way to optimize the first method based on that
>assumption.

Your assumption is correct, seq is always sorted.

=)

The type of set is known before the call to get-set, it can even be
coded away because the calling function is a closure, so an optimized
version based in each type of sequence is very probably the way to go.

I like the elegance of the generic functions, I just never used them
before.
From: Wade Humeniuk
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <V6pHf.2125$Be4.1440@clgrps13>
·········@gmail.com wrote:

> 
> However, in some algorithm I did, this function is the one where 90% of
> time is spent.
> If there is any help you can provide me to optimize it, I will thank
> you.
> 

I am not sure you even need this function.  When you build the second
arg seq, how is it done?  In deciding which elts to get don't you
have to traverse the set to begin with?  (So instead of building a
seq with the indexes, why not the elts?)

Wade
From: Nicolay Giraldo
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <1139681363.522034.194590@g44g2000cwa.googlegroups.com>
Wade Humeniuk wrote:
> ·········@gmail.com wrote:
>
> >
> > However, in some algorithm I did, this function is the one where 90% of
> > time is spent.
> > If there is any help you can provide me to optimize it, I will thank
> > you.
> >
>
> I am not sure you even need this function.  When you build the second
> arg seq, how is it done?  In deciding which elts to get don't you
> have to traverse the set to begin with?  (So instead of building a
> seq with the indexes, why not the elts?)
>
> Wade

No, seq is generated by arithmetic manipulation. So instead of
traversing a big list of sequences I can say: give me the 749th
sequence of the set and I get just that, using very few memory, very
fast. Well, except for this function.   ^_^
From: Wade Humeniuk
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <vFvHf.2237$Be4.954@clgrps13>
Nicolay Giraldo wrote:

> 
> No, seq is generated by arithmetic manipulation. So instead of
> traversing a big list of sequences I can say: give me the 749th
> sequence of the set and I get just that, using very few memory, very
> fast. Well, except for this function.   ^_^
> 

As people have pointed out with the list problem, try this ..
I am assuming that seq is always a list.

(defun get-set (set seq)
   (etypecase set
     (list (loop with positions = (sort seq #'<)
                 for position from 0
                 for elt in set
                 while positions
                 when (= position (car positions)) collect elt
                 and do (pop positions)))
     (vector (let ((result (subseq set 0 (length seq))))
               (map-into result (lambda (x) (aref set x)) seq)))))

Wade
From: Pascal Bourguignon
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <87y80g1q1v.fsf@thalassa.informatimago.com>
Wade Humeniuk <··················@telus.net> writes:

> Nicolay Giraldo wrote:
>
>> No, seq is generated by arithmetic manipulation. So instead of
>> traversing a big list of sequences I can say: give me the 749th
>> sequence of the set and I get just that, using very few memory, very
>> fast. Well, except for this function.   ^_^
>> 
>
> As people have pointed out with the list problem, try this ..
> I am assuming that seq is always a list.
>
> (defun get-set (set seq)
>    (etypecase set
>      (list (loop with positions = (sort seq #'<)

;; sort is destructive -----------> (sort (copy-list seq) #'<)
;; once you've started making a copy, you could as well (coerce seq 'vector)...

>                  for position from 0
>                  for elt in set
>                  while positions
>                  when (= position (car positions)) collect elt
>                  and do (pop positions)))
>      (vector (let ((result (subseq set 0 (length seq))))
>                (map-into result (lambda (x) (aref set x)) seq)))))
>
> Wade
>
>

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

CAUTION: The mass of this product contains the energy equivalent of
85 million tons of TNT per net ounce of weight.
From: Wade Humeniuk
Subject: Re: Optimize a sequence function
Date: 
Message-ID: <iZLHf.3877$s36.2012@edtnps89>
Pascal Bourguignon wrote:
> Wade Humeniuk <··················@telus.net> writes:
> 
>> Nicolay Giraldo wrote:
>>
>>> No, seq is generated by arithmetic manipulation. So instead of
>>> traversing a big list of sequences I can say: give me the 749th
>>> sequence of the set and I get just that, using very few memory, very
>>> fast. Well, except for this function.   ^_^
>>>
>> As people have pointed out with the list problem, try this ..
>> I am assuming that seq is always a list.
>>
>> (defun get-set (set seq)
>>    (etypecase set
>>      (list (loop with positions = (sort seq #'<)
> 
> ;; sort is destructive -----------> (sort (copy-list seq) #'<)
> ;; once you've started making a copy, you could as well (coerce seq 'vector)...

Since it is not clear to me how Nicolay is using all this I would
suggest as a possible solution..

1) Do not use lists at all for your sets, also index seq should be vectors.
2) Specialize the set arrays (float, integer,...) and put some compiler
    optimizations.
and

3) If possible, do not do any vector construction, instead do something like

(defstruct vset
   vector
   index-mapping)

(defun vsetref (vset index)
   (aref (vset-vector vset)
         (aref (vset-index-mapping vset) index)))

(defun get-set (raw-vector raw-index-vector)
   (assert (and (vectorp raw-vector) (vectorp raw-index-vector)))
   (make-vset :vector raw-vector
              :index-mapping raw-index-vector))


CL-USER 1 > (get-set #(1 2 3 4 5) #(1 2 4))
#S(VSET VECTOR #(1 2 3 4 5) INDEX-MAPPING #(1 2 4))

CL-USER 2 > (setf v *)
#S(VSET VECTOR #(1 2 3 4 5) INDEX-MAPPING #(1 2 4))

CL-USER 3 > (vsetref v 2)
5

CL-USER 4 > (vsetref v 0)
2

CL-USER 5 >

Wade