I have found classes very useful for organizing complex computations, and
the with-slots macro is quite convenient for accessing slots. Recently I
started profiling, and realized that it is also slowing things down:
sometimes I only need to read slots, but there is still a function call
in there.
Here is some example code:
(defclass myclass ()
  ((s :initform 12d0)))

(defun benchmark (n)
  "Without slot access, so we can subtract this as the fixed time."
  (declare (fixnum n))
  (let ((sum 0d0))
    (declare (double-float sum))
    (dotimes (i n)
      (incf sum (+ 1.8d0 (random 5d0))))
    sum))

(defun foo1 (instance n)
  "Using with-slots."
  (declare (fixnum n))
  (with-slots (s) instance
    (let ((sum 0d0))
      (declare (double-float sum s))
      (dotimes (i n)
        (incf sum (+ s (random 5d0))))
      sum)))

(defun foo2 (instance n)
  "Using let."
  (declare (fixnum n))
  (let ((sum 0d0)
        (s (slot-value instance 's)))
    (declare (double-float sum s))
    (dotimes (i n)
      (incf sum (+ s (random 5d0))))
    sum))
Using SBCL,
(defparameter *n* 50000000)
(time (benchmark *n*)) ; 2.57s
(time (foo1 (make-instance 'myclass) *n*)) ; 3.78s
(time (foo2 (make-instance 'myclass) *n*)) ; 2.42s, clearly some
; measurement error in there
So I came up with a macro, which is really a trivial modification of
with-slots for cases when I only need to read slots:
(defmacro with-slots* (slots instance &body body)
  "Macro similar to with-slots, but with the slots made read-only."
  (let ((in (gensym)))
    `(let ((,in ,instance))
       (declare (ignorable ,in))
       (let ,(mapcar (lambda (slot-entry)
                       (let ((var-name
                              (if (symbolp slot-entry)
                                  slot-entry
                                  (car slot-entry)))
                             (slot-name
                              (if (symbolp slot-entry)
                                  slot-entry
                                  (cadr slot-entry))))
                         `(,var-name
                           (slot-value ,in ',slot-name))))
                     slots)
         ,@body))))
Replacing with-slots with with-slots* in places where it makes sense led
to a 40% improvement in the speed of my calculations.
Tamas
On 2008-10-27 09:10:57 -0400, Tamas K Papp <······@gmail.com> said:
>
> Replacing with-slots with with-slots* in places where it makes sense lead
> to a 40% improvement in the speed of my calculations.
I only find about a 15% speedup using your with-slots* under LispWorks
on a 32-bit Intel Mac. Maybe this is more of an SBCL-specific problem,
better addressed on the sbcl-devel mailing list?
In article <··············@mid.individual.net>,
Tamas K Papp <······@gmail.com> wrote:
> I have found classes very useful for organizing complex computations, and
> the with-slots macro is quite convenient for accessing slots. Recently I
> started profiling, and realized that it is also slowing things down:
> sometimes I only need to read slots, but there is still a function call
> in there.
>
That's no surprise, I'd say. Accessing a local variable
is very likely to be faster than accessing slot values. WITH-SLOTS
is just a shorter notation - shorter than repeated
SLOT-VALUE forms. But there is no caching...
Slot access to CLOS objects is usually a bit slower than
slot access to structures. For structures the Lisp
system can/will assume that the position of the slot
does not change. With CLOS slots one usually can't
assume that. There are some extensions to Common Lisp,
where one can declare fixed slot positions for CLOS
classes.
Access to variables should be faster than access to both CLOS slots
and structure slots.
--
http://lispm.dyndns.org/
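For comparison, here is a minimal sketch of the same loop over a structure with a typed slot. The names mystruct and foo-struct are made up for illustration and are not part of the original benchmark; no timings are claimed for it. Because the slot position and type of a structure are fixed, the slot is read on every iteration here instead of being copied into a local variable first.

```lisp
;; Hypothetical structure counterpart of MYCLASS (illustration only).
(defstruct mystruct
  (s 12d0 :type double-float))

(defun foo-struct (instance n)
  "Same loop as FOO1, but reading a typed structure slot on every iteration."
  (declare (fixnum n)
           (type mystruct instance))
  (let ((sum 0d0))
    (declare (double-float sum))
    (dotimes (i n)
      ;; MYSTRUCT-S is the reader generated by DEFSTRUCT.
      (incf sum (+ (mystruct-s instance) (random 5d0))))
    sum))
```

It can be timed the same way as the others, e.g. (time (foo-struct (make-mystruct) *n*)).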
On Oct 27, 1:44 pm, Rainer Joswig <······@lisp.de> wrote:
> SLOT access to CLOS object is usually a bit slower than
> slot access to structures. For structures the Lisp
> system can/will assume that the position of the slot
> does not change. With CLOS slots one usually can't
> assume that. There are some extensions to Common Lisp,
> where one can declare fixed slot positions for CLOS
> classes.
The MOP gives you standard-instance-access, not sure if that is the
extension you mean. Using it makes slot lookup wickedly fast:
(eval-when (:compile-toplevel :execute)
  (defun class-slot-ix (name class)
    (let* ((cls (find-class class))
           (slot (or (find name (closer-mop:class-slots cls)
                           :key #'closer-mop:slot-definition-name)
                     (error "Class ~A has no slot named ~A." cls name))))
      (closer-mop:slot-definition-location slot))))

(defun bar (instance n)
  (declare (fixnum n)
           (optimize (speed 3) (safety 0) (debug 0)))
  (symbol-macrolet ((s (closer-mop:standard-instance-access
                        instance #.(class-slot-ix 's 'myclass))))
    (let ((sum 0d0))
      (declare (double-float sum))
      (dotimes (i n)
        (incf sum (+ (the double-float s) (random 5d0))))
      sum)))
Results, all with the same optimize declaration:
(time (foo1 (make-instance 'myclass) *n*)) = 3.897  with-slots
(time (bar (make-instance 'myclass) *n*))  = 2.712  standard-instance-access
(time (foo2 (make-instance 'myclass) *n*)) = 2.395  let
- Willem
Willem Broekema wrote:
> The MOP gives you standard-instance-access, not sure if that is the
> extension you mean. Using it makes slot lookup wickedly fast:
Be careful here! If the instance comes from a subclass of MYCLASS, the slot
location might be different. (See COMPUTE-SLOTS for a way to make sure the
slots are in the order you need them to be.)
Cheers,
-- Nikodemus
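One way to act on this warning is to use the fixed location only when the instance is a direct instance of MYCLASS, and to fall back to SLOT-VALUE otherwise. The following is only a sketch under assumptions: closer-mop is loaded (as in the post above), and the names other, mysub, slot-location and read-s are hypothetical. It also assumes MYCLASS is not redefined after READ-S has been loaded, since the location is computed once.

```lisp
(defclass myclass ()
  ((s :initform 12d0)))

;; Hypothetical classes: MYSUB inherits S, possibly at another location.
(defclass other ()
  ((x :initform 0)))

(defclass mysub (other myclass)
  ())

(defun slot-location (slot-name class-name)
  "Return the location of SLOT-NAME in direct instances of CLASS-NAME."
  (let ((class (find-class class-name)))
    ;; CLASS-SLOTS is only defined for finalized classes.
    (closer-mop:ensure-finalized class)
    (closer-mop:slot-definition-location
     (or (find slot-name (closer-mop:class-slots class)
               :key #'closer-mop:slot-definition-name)
         (error "Class ~A has no slot named ~A." class slot-name)))))

(defun read-s (instance)
  "Read slot S, using the fixed location only for direct MYCLASS instances."
  (if (eq (class-of instance) (find-class 'myclass))
      (closer-mop:standard-instance-access
       instance
       (load-time-value (slot-location 's 'myclass)))
      ;; Subclass (or unrelated class): the location may differ.
      (slot-value instance 's)))
```

The class check costs something on every call, so whether this still beats plain SLOT-VALUE would have to be measured.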
John Thingstad wrote:
> On Mon, 27 Oct 2008 14:10:57 +0100, Tamas K Papp <······@gmail.com> wrote:
>
>> I have found classes very useful for organizing complex computations, and
>> the with-slots macro is quite convenient for accessing slots. Recently I
>> started profiling, and realized that it is also slowing things down:
>> sometimes I only need to read slots, but there is still a function call
>> in there.
>>
>
> Using with-slots is bad form anyhow as it breaks the abstraction created
> by accessors. What if you changed the type to a computed type?
> with-accessors is better in that respect, but there is no guarantee it
> will be faster. (Depends on the implementation.)
Slot access via accessors is likely to be faster than via slot-value.
Accessors should also be preferred due to potential
:before/:after/:around methods on the accessors. Access via slot-value
should be reserved for low-level access, for example as part of internal
implementation details for a specific class.
(Unless you really know what you're doing, of course, as always. ;)
Pascal
--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
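A small sketch of why accessors and SLOT-VALUE are not interchangeable once methods are defined on the accessor. The class account, the counter *reads* and the function count-accessor-reads are invented for this illustration.

```lisp
(defclass account ()
  ((balance :initform 100 :accessor balance)))

(defvar *reads* 0
  "Counts reads that went through the BALANCE accessor.")

;; A :before method on the reader, e.g. for logging or validation.
(defmethod balance :before ((a account))
  (incf *reads*))

(defun count-accessor-reads ()
  "Read the slot four different ways and return how many ran the :before method."
  (let ((a (make-instance 'account)))
    (setf *reads* 0)
    (balance a)                          ; accessor: runs the :before method
    (slot-value a 'balance)              ; direct access: bypasses it
    (with-slots (balance) a balance)     ; expands to SLOT-VALUE: bypasses it
    (with-accessors ((b balance)) a b)   ; expands to (BALANCE A): runs it
    *reads*))
```

Here (count-accessor-reads) returns 2: only the two reads that go through the generic function see the :before method.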
Tamas K Papp wrote:
> So I came up with a macro, which is really a trivial modification for
> with-slots for cases when I need only need to read slots:
>
> (defmacro with-slots* (slots instance &body body)
> "Macro similar to with-slots, but with the slots made read-only."
> (let ((in (gensym)))
> `(let ((,in ,instance))
> (declare (ignorable ,in))
> (let ,(mapcar (lambda (slot-entry)
> (let ((var-name
> (if (symbolp slot-entry)
> slot-entry
> (car slot-entry)))
> (slot-name
> (if (symbolp slot-entry)
> slot-entry
> (cadr slot-entry))))
> `(,var-name
> (slot-value ,in ',slot-name))))
> slots)
> ,@body))))
>
> Replacing with-slots with with-slots* in places where it makes sense lead
> to a 40% improvement in the speed of my calculations.
This macro has the disadvantage that you won't see potential side
effects on the slots anymore (and that's where your speedup comes from).
If you know that your code is (mostly) functional, then that's ok.
But also keep my comment about low-level slot-value vs high-level
accessors in mind.
Pascal
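The difference can be seen in a few lines. This is a sketch with made-up names (counter, bump, live-view, snapshot-view); the LET in snapshot-view stands in for what with-slots* expands to.

```lisp
(defclass counter ()
  ((n :initform 0)))

(defun bump (c)
  "Side effect on the slot, performed by some other function."
  (incf (slot-value c 'n)))

(defun live-view (c)
  "WITH-SLOTS: every reference to N is a fresh SLOT-VALUE call."
  (with-slots (n) c
    (bump c)
    n))

(defun snapshot-view (c)
  "LET over SLOT-VALUE: N is a copy taken before BUMP runs."
  (let ((n (slot-value c 'n)))
    (bump c)
    n))
```

On a fresh instance, live-view returns 1 while snapshot-view returns 0, even though the slot holds 1 in both cases afterwards.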
TKP> I have found classes very useful for organizing complex computations,
TKP> and the with-slots macro is quite convenient for accessing slots.
TKP> Recently I started profiling, and realized that it is also slowing
TKP> things down: sometimes I only need to read slots, but there is still a
TKP> function call in there.
is it surprising?
theoretically a "sufficiently smart compiler" could detect that you're only
reading slots and use local variables, avoiding expensive object slot lookups.
however, in the presence of multiprocessing that would have different
semantics, as the slot could be changed by another thread. so this becomes a
tradeoff -- either clean and transparent semantics or aggressive optimization
that could cause some wtf moments.
top speed was never a top priority of CL, so the choice to have transparent
semantics is understandable (and that is also less work to do :))
on the other hand, i can't say that a 2x slowdown of slot access is too big
an overhead..
On Mon, 27 Oct 2008 17:43:44 +0200, Alex Mizrahi wrote:
> TKP> I have found classes very useful for organizing complex computations,
> TKP> and the with-slots macro is quite convenient for accessing slots.
> TKP> Recently I started profiling, and realized that it is also slowing
> TKP> things down: sometimes I only need to read slots, but there is still a
> TKP> function call in there.
>
> is it surprising?
No, not really, once I looked under the hood.
> top speed was never a top priority of CL, so choice to have transparent
> semantics is understandable (and that is also less work to do :))
Still, CL is the fastest practical high-level language* I know of. I
found that with a bit of profiling and tweaking, CL programs can be
superfast.
Tamas
*Yes, I know, languages don't have speed, only implementations do. You
know what I mean.
Alex Mizrahi wrote:
> TKP> I have found classes very useful for organizing complex computations,
> TKP> and the with-slots macro is quite convenient for accessing slots.
> TKP> Recently I started profiling, and realized that it is also slowing
> TKP> things down: sometimes I only need to read slots, but there is still a
> TKP> function call in there.
>
> is it surprising?
>
> theoretically "sufficiently smart compiler" could detect that you're only
> reading slots and use local variables, avoiding expensive object slot lookups.
>
> however, in presence of multiprocessing that would have different semantics,
> as slot could be changed by other thread. so this becomes a tradeoff --
> either clean and transparent semantics or aggressive optimization that could
> cause some wtf moments.
You don't have to go as far as adding multithreading here. A (direct or
indirect) call to another function could already have a side effect on
the object in question.
Pascal
??>> however, in presence of multiprocessing that would have different
??>> semantics, as slot could be changed by other thread. so this becomes a
??>> tradeoff -- either clean and transparent semantics or aggressive
??>> optimization that could cause some wtf moments.
PC> You don't have to go as far as adding multithreading here. A (direct or
PC> indirect) call to another function could already have a side effect on
PC> the object in question.
couldn't "sufficiently smart compiler" detect that in such code snippet:
(dotimes (i n)
(incf sum (+ s (random 5d0))))
object instance is not going to be changed?
Alex Mizrahi wrote:
> ??>> however, in presence of multiprocessing that would have different
> ??>> semantics, as slot could be changed by other thread. so this becomes a
> ??>> tradeoff -- either clean and transparent semantics or aggressive
> ??>> optimization that could cause some wtf moments.
>
> PC> You don't have to go as far as adding multithreading here. A (direct or
> PC> indirect) call to another function could already have a side effect on
> PC> the object in question.
>
> couldn't "sufficiently smart compiler" detect that in such code snippet:
>
> (dotimes (i n)
> (incf sum (+ s (random 5d0))))
>
> object instance is not going to be changed?
>
>
Not unless the compiler somehow knows that the "random" function won't
change anything. What if random stores its seed value in a slot of a
global object? How would the compiler know that this is not the object
the slot "s" comes from?
I guess there is some relation to open-coding here: the compiler
(internally) knows more about the symbols in the "CL" package, and in
this case it knows that although random has state, that state cannot
normally be reached through the object that has the slot "s".
Alex Mizrahi wrote:
> ??>> however, in presence of multiprocessing that would have different
> ??>> semantics, as slot could be changed by other thread. so this becomes a
> ??>> tradeoff -- either clean and transparent semantics or aggressive
> ??>> optimization that could cause some wtf moments.
>
> PC> You don't have to go as far as adding multithreading here. A (direct or
> PC> indirect) call to another function could already have a side effect on
> PC> the object in question.
>
> couldn't "sufficiently smart compiler" detect that in such code snippet:
>
> (dotimes (i n)
> (incf sum (+ s (random 5d0))))
>
> object instance is not going to be changed?
Maybe. But if the efficiency in this case matters that much, you will
probably also notice this in your profiler, and might as well optimize
this by hand. ;)
Pascal
Tamas K Papp wrote:
> (defun foo1 (instance n)
> "Using with-slots."
> (declare (fixnum n))
> (with-slots (s) instance
> (let ((sum 0d0))
> (declare (double-float sum s))
> (dotimes (i n)
> (incf sum (+ s (random 5d0))))
> sum)))
>
> (defun foo2 (instance n)
> "Using let."
> (declare (fixnum n))
> (let ((sum 0d0)
> (s (slot-value instance 's)))
> (declare (double-float sum s))
> (dotimes (i n)
> (incf sum (+ s (random 5d0))))
> sum))
FOO1 and FOO2 are not doing quite the same thing: in FOO2 you have performed
manual loop-invariant optimization by lifting out reading the slot from the loop.
Now, granted, SBCL does not currently do loop code motion at all -- but even
if it did, the compiler could not do it in this case: calling SLOT-VALUE only
once would be an illegal program transformation. If INSTANCE was known to be a
structure instance, then the transformation would be legal.
In the case of a CLOS instance... there are a number of things to worry about:
(1) SLOT-VALUE-USING-CLASS, (2) slot non-existence and methods on SLOT-MISSING,
(3) slot unboundness and methods on SLOT-UNBOUND. There may be more, but these
at least come to mind. The point is that given these things the program can
behave differently with and without the transformation -- hence it cannot be
done. (That said, I do think that writing stuff that would be affected by a
transformation like that sounds highly bogus, and if the spec was a work in
progress I would probably vote in favor of explicitly allowing things like this.)
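To make point (3) concrete, here is a contrived sketch in which hoisting the read out of the loop changes observable behavior. The class lazy, the counter *unbound-calls* and both functions are invented for this illustration.

```lisp
(defclass lazy ()
  ((s)))                                 ; no initform: S starts out unbound

(defvar *unbound-calls* 0
  "Counts how often the SLOT-UNBOUND method below has run.")

;; Supplies a value for the unbound slot without ever binding it.
(defmethod slot-unbound (class (instance lazy) (slot-name (eql 's)))
  (declare (ignore class))
  (incf *unbound-calls*)
  12d0)

(defun read-in-loop (instance n)
  "Calls SLOT-VALUE on every iteration, as WITH-SLOTS would."
  (let ((sum 0d0))
    (dotimes (i n)
      (incf sum (slot-value instance 's)))
    sum))

(defun read-once (instance n)
  "The hand-hoisted version: SLOT-VALUE is called a single time."
  (let ((sum 0d0)
        (s (slot-value instance 's)))
    (dotimes (i n)
      (incf sum s))
    sum))
```

Both functions return the same sum, but with n = 3 the first one runs the SLOT-UNBOUND method three times and the second only once, so a compiler that hoisted the read on its own would change what the program does.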
Incidentally, you might want to have a look at
http://www.sbcl.org/manual/Slot-access.html#Slot-access, which gives some
hints on how to optimize slot access speed. Specifically,
(defmethod foo3 ((instance myclass) n)
  "Using with-slots in defmethod specialized on the instance."
  (declare (fixnum n))
  (with-slots (s) instance
    (let ((sum 0d0))
      (declare (double-float sum))
      (dotimes (i n)
        (incf sum (+ (the double-float s) (random 5d0))))
      sum)))
is much closer to FOO2 than FOO1, since the slot access becomes just two
memory indirections (well, approximately) thanks to permutation vector
optimizations.
(Using THE instead of DECLARE is a workaround for a bug in those selfsame
permutation vector optimizations -- the type information from the declaration
was lost during the optimization. Patch on sbcl-devel, going in CVS in a week
or so, after the 1.0.22 release.)
Cheers,
-- Nikodemus