'Tipping point' for list of integers versus vector?

From: jmckitrick
Subject: 'Tipping point' for list of integers versus vector?
Date: Mon, 06 Nov 2006 15:14:13 +0000
Message-ID: <1162826052.985817.29930@m73g2000cwd.googlegroups.com>

Has anyone discovered a 'tipping point' where a certain number of
floats/integers/other numbers as a list is noticeably slower than a
similar vector?

I realize this is a very general question, but I'm just looking for
general rules-of-thumb, perhaps an order-of-magnitude guideline for
when iterating/mapping a list of numbers begins to become noticeably
less efficient than a vector of the same.

If it's necessary to be more specific to get meaningful results, then I
could say I'm using small groups integers and floats with basic math
operations in SBCL.

Re: 'Tipping point' for list of integers versus vector? Geoffrey Summerhayes
- Re: 'Tipping point' for list of integers versus vector? jmckitrick
Re: 'Tipping point' for list of integers versus vector? ········@gmail.com
Re: 'Tipping point' for list of integers versus vector? Barry Margolin
- Re: 'Tipping point' for list of integers versus vector? grackle
  - Re: 'Tipping point' for list of integers versus vector? Juho Snellman
    - Re: 'Tipping point' for list of integers versus vector? grackle
      - Re: 'Tipping point' for list of integers versus vector? Juho Snellman

From: Geoffrey Summerhayes
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Mon, 06 Nov 2006 18:01:32 +0000
Message-ID: <1162836092.032622.17120@i42g2000cwa.googlegroups.com>

jmckitrick wrote:
> Has anyone discovered a 'tipping point' where a certain number of
> floats/integers/other numbers as a list is noticeably slower than a
> similar vector?
>
> I realize this is a very general question, but I'm just looking for
> general rules-of-thumb, perhaps an order-of-magnitude guideline for
> when iterating/mapping a list of numbers begins to become noticeably
> less efficient than a vector of the same.
>
> If it's necessary to be more specific to get meaningful results, then I
> could say I'm using small groups integers and floats with basic math
> operations in SBCL.

It's tempting to just say, "42" and be done with it.

And no, you have to be a lot more specific: What kind of architecture
will the code run on? What version of SBCL are you using? How big
are the integers and floats? What kind of operations are you going to
be using on the group? How do you plan on accessing entries in the
group? How often do you need to access entries in each specific way?
Is the group set in stone or will it change in size? How?...

Knowing the answers to these questions will help you choose the
data structures to use, each has its good points and its own
weaknesses. But when in doubt, test the alternatives yourself!

Don't forget the prime rule though, make the code work correctly first.
You may find that even if the code is not as efficent as it could be,
it
is still fast enough and small enough to allow you to move on to other
things.

http://groups.google.ca/groups/search?q=premature+optimization

-----
Geoff

From: jmckitrick
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Wed, 08 Nov 2006 00:04:10 +0000
Message-ID: <1162944250.373319.63100@h48g2000cwc.googlegroups.com>

On Nov 6, 1:01 pm, "Geoffrey Summerhayes" <·······@hotmail.com> wrote:
> And no, you have to be a lot more specific: What kind of architecture
> will the code run on? What version of SBCL are you using? How big
> are the integers and floats? What kind of operations are you going to
> be using on the group? How do you plan on accessing entries in the
> group? How often do you need to access entries in each specific way?
> Is the group set in stone or will it change in size? How?...

Not that it matters now, but under a hundred elements in the sequence,
mostly iterating with statistics calculations, with the sequence
unchanging in content and length.  Oh, and Intel/OSX, SBCL 0.9.18.

> Don't forget the prime rule though, make the code work correctly first.
> You may find that even if the code is not as efficent as it could be,
> it
> is still fast enough and small enough to allow you to move on to other
> things.

Yes, that's usually what I do.  Implementation was a list at first, and
I decided I wanted to become more adept with mapping vectors, so I
thought I'd try it.  At the scale I am using, the difference is
negligible, but I thought maybe someone had some 'rule of thumb'
guidelines they had discovered.  But I guess the question is still too
broad.

From: ········@gmail.com
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Mon, 06 Nov 2006 18:08:52 +0000
Message-ID: <1162836532.319779.84160@b28g2000cwb.googlegroups.com>

> Has anyone discovered a 'tipping point' where a certain number of
> floats/integers/other numbers as a list is noticeably slower than a
> similar vector?

I don't think that this question makes sense. Vectors are always faster
by SOME factor (except, perhaps, for the trival cases of 0 and 1
length), so whether they are "noticeably" faster depends on whether you
are doing a noticeable amount of access. The only tradeoff is in terms
of speed v. effort/complexity, and you have to decide for yourself your
subjective thresholds for e/c.

From: Barry Margolin
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Tue, 07 Nov 2006 01:32:34 +0000
Message-ID: <barmar-18A6F9.20323406112006@comcast.dca.giganews.com>

In article <·······················@m73g2000cwd.googlegroups.com>,
 "jmckitrick" <···········@yahoo.com> wrote:

> Has anyone discovered a 'tipping point' where a certain number of
> floats/integers/other numbers as a list is noticeably slower than a
> similar vector?
> 
> I realize this is a very general question, but I'm just looking for
> general rules-of-thumb, perhaps an order-of-magnitude guideline for
> when iterating/mapping a list of numbers begins to become noticeably
> less efficient than a vector of the same.
> 
> If it's necessary to be more specific to get meaningful results, then I
> could say I'm using small groups integers and floats with basic math
> operations in SBCL.

It depends on what you're doing with the collection.  For instance, the 
answer will be different if you're doing lots of random access, versus 
iterating through the collection sequentially.

Rather than try to find specific sizes where the performance breaks 
even, I recommend you make the data structure choice match the most 
common type of use.  Arrays are generally appropriate for random access, 
lists for sequential access, and lists are also usually better if you 
will be inserting/deleting entries frequently.

Worrying about the exact point where performance changes is likely to be 
premature micro-optimization.  If your application has performance 
problems, profile it and only worry about this issue if the bottleneck 
is actually in the accesses to the collection.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

From: grackle
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Tue, 07 Nov 2006 10:42:33 +0000
Message-ID: <1162896153.333577.57980@m73g2000cwd.googlegroups.com>

I'm playing around with SBCL using (map-into sum #'+ A B), where sum is
a vector, to compare performance when A and B are vectors vs.
performance when A and B are lists.  Lists seem to be faster than
vectors at small sizes, with a break-even point at about 60 items.  I
don't get it.  My received conventional wisdom says that pointer
chasing and larger data size should make lists at least a bit slower
for any benchmark that doesn't measure creation, insertion, or
deletion.

OTOH, I don't understand my own benchmark, because I have no idea why
it conses and creates garbage :(  An integer small enough to be
represented as a fixnum will be represented as a fixnum, right?

-David

;; locality-test-list is the equivalent function for lists.
;; In locality-test-list, sum remains a vector, but
;; "lists" contains lists instead of vectors.
(defun locality-test-vector (list-count cons-count)
  (let ((lists (make-array list-count :initial-element 0)))
    (dotimes (i list-count)
      (setf (aref lists i) (make-array cons-count :element-type 'fixnum
:initial-element 0)))
    (dotimes (i cons-count)
      (dotimes (j list-count)
	(setf (aref (aref lists j) i) (the fixnum i))))
    (gc :full t)
    (let ((sum (make-array cons-count)))
      (time
       (dotimes (foo 1000)
	 (dotimes (i (1- list-count))
	   (map-into sum (lambda (x y) (the fixnum (+ x y))) (aref lists i)
(aref lists (1+ i)))))))))


CL-USER> (locality-test-vector 10000 10)
Evaluation took:
  33.893 seconds of real time
  33.478092 seconds of user run time
  0.340021 seconds of system run time
  [Run times include 1.808 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  5,599,462,752 bytes consed.
NIL
CL-USER> (locality-test-list 10000 10)
Evaluation took:
  23.542 seconds of real time
  23.241451 seconds of user run time
  0.252016 seconds of system run time
  [Run times include 1.805 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  5,599,460,960 bytes consed.
NIL

From: Juho Snellman
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Tue, 07 Nov 2006 10:58:38 +0000
Message-ID: <slrnel0pmu.vip.jsnell@sbz-30.cs.Helsinki.FI>

grackle <···········@gmail.com> wrote:
> ;; locality-test-list is the equivalent function for lists.
> ;; In locality-test-list, sum remains a vector, but
> ;; "lists" contains lists instead of vectors.
> (defun locality-test-vector (list-count cons-count)

You should (declare (optimize speed)) here.

>   (let ((lists (make-array list-count :initial-element 0)))
>     (dotimes (i list-count)
>       (setf (aref lists i) (make-array cons-count :element-type 'fixnum
>:initial-element 0)))
>     (dotimes (i cons-count)
>       (dotimes (j list-count)
> 	(setf (aref (aref lists j) i) (the fixnum i))))
>     (gc :full t)
>     (let ((sum (make-array cons-count)))
>       (time
>        (dotimes (foo 1000)
> 	 (dotimes (i (1- list-count))
> 	   (map-into sum (lambda (x y) (the fixnum (+ x y))) (aref lists i)
> (aref lists (1+ i)))

Which would make the compiler tell you that MAP-INTO here can't be
open-coded, due to the types of the arrays being unknown. So:

           (map-into sum
                    (lambda (x y)
                       (the fixnum (+ x y)))
                    (the (simple-array fixnum) (aref lists i))
                    (the (simple-array fixnum) (aref lists (1+ i))))

> ))))))

-- 
Juho Snellman

From: grackle
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Tue, 07 Nov 2006 11:33:43 +0000
Message-ID: <1162899223.620857.136390@i42g2000cwa.googlegroups.com>

Juho Snellman wrote:
> Which would make the compiler tell you that MAP-INTO here can't be
> open-coded, due to the types of the arrays being unknown. So:
>
>            (map-into sum
>                     (lambda (x y)
>                        (the fixnum (+ x y)))
>                     (the (simple-array fixnum) (aref lists i))
>                     (the (simple-array fixnum) (aref lists (1+ i))))
>
> > ))))))

Thanks, these declarations eliminate consing and produce reasonable
performance.  I knew that type-checking consumed processor cycles, but
I didn't know it created garbage.

Now there is no contest between lists and vectors in my benchmark;
vectors win hands-down, even for sequences of one element.  I don't see
a way to make lists competitive with simple vectors.

-David

From: Juho Snellman
Subject: Re: 'Tipping point' for list of integers versus vector?
Date: Tue, 07 Nov 2006 15:51:05 +0000
Message-ID: <slrnel1ar9.7ru.jsnell@sbz-30.cs.Helsinki.FI>

grackle <···········@gmail.com> wrote:
> Juho Snellman wrote:
>> Which would make the compiler tell you that MAP-INTO here can't be
>> open-coded, due to the types of the arrays being unknown. So:
[...]
> Thanks, these declarations eliminate consing and produce reasonable
> performance.  I knew that type-checking consumed processor cycles, but
> I didn't know it created garbage.

It doesn't generally [*]. But the inlined and non-inlined versions of
MAP-INTO will actually use radically different implementation 
strategies. The non-inlined version is pretty hideous, and has
probably only survived since people who use MAP-INTO care about 
performance enough to declare their sequence types.

[*] There are cases in SBCL where it will happen, but the amortized
    cost of that consing will be insignificant.

-- 
Juho Snellman