Feedback (thoughts, prior art) wanted on templating system (long)

From: rif
Subject: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sun, 06 Mar 2005 15:34:16 +0000
Message-ID: <wj0acpgwzyf.fsf@five-percent-nation.mit.edu>

Hi.

I've been working on a little project, and I was hoping people here
could give me some feedback.  I'm especially interested in looking at
other systems with similar goals.

I do a lot of numerical scientific work.  It is important that my
programs be efficient.  I have found that CMUCL will give me
sufficient numerical efficiency, if I am careful about making
declarations.  The system I'm building is a first attempt towards
making it easier and more natural to write efficient programs.

My system is a form of "templating" system, that allows one to
automatically construct a parametrized family of efficient
implementations with as little work as possible.  For example, suppose
I need functions to scale arrays of numbers by a constant.  In my
current system, I can say:

(def-templated scale-array ((a :array) (s :scalar :tied-to a))
  (let* ((length (length a))
         (result (make-array length :element-type '(:elt-type a))))
    (declare (type fixnum length))
    (dotimes (i length)
      (setf (aref result i)
            (* s (aref a i))))
    result))

This says that a is an array parameter, and s is a scalar parameter
that is of the same type as a.  Evaluating def-templated creates a
generic function for scale-array, and it also creates a macro,
instantiate-scale-array, with the following macroexpansion:

CL-USER> (macroexpand-1 '(instantiate-scale-array double-float))
(LOCALLY
 (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 1)))
 (DEFMETHOD SCALE-ARRAY
            ((A #<BUILT-IN-CLASS KERNEL::SIMPLE-ARRAY-DOUBLE-FLOAT {281B11E5}>)
             (S DOUBLE-FLOAT))
            (DECLARE (TYPE (SIMPLE-ARRAY DOUBLE-FLOAT (*)) A)
             (TYPE DOUBLE-FLOAT S))
            (LET* ((LENGTH (LENGTH A))
                   (RESULT (MAKE-ARRAY LENGTH :ELEMENT-TYPE 'DOUBLE-FLOAT)))
              (DECLARE (TYPE FIXNUM LENGTH))
              (DOTIMES (I LENGTH) (SETF (AREF RESULT I) (* S (AREF A I))))
              RESULT)))

Note that the instantiation includes type declarations for the
defmethod parameters, even though the template didn't include them at
all --- since the types are known at instantation time, these type
declarations can be automatically generated.

The "unspecialized method", (scale-array (a t) (s t)), has additional
behavior.  It checks to see if a is of a type it knows how to
specialize on, and if so, automatically instantiates and runs the
specialized method.  With this feature, users don't have to explicitly
instantiate the template --- they just call the function, and if the
system can build a useful specialization, it will.

This system is targeting three different efficiency optimizations:

1.  Taking advantage of the fact that (at least on some
    implementations) numerical arrays are laid out "in place"
2.  Avoiding generic arithmetic.
3.  Inlining passed-in functions to avoid boxing/unboxing
    (not yet implemented, but I think it's possible)

The current system is not pretty code (it's a hacked up mess) and
chock full of cargo-cult macrology (thanks for all the help on my
macro question), but it does work.

A more full-fledged (future) system will let you template over
multiple functions simultaneously (to make efficient "data structures"
such as trees, priority queues, etc.), and will probably generate two
versions of each function --- a long-name version such
scale-array<double-float>, and a generic version that calls the
long-name version.  That way one can avoid the dispatching overhead
for inner loop functions as needed, but still keep genericity.

I'm not sure whether using CLOS is a good idea for this.  In
particular, for the next version of the system, I'm seriously
considering just using a hash table of implementations and dispatching
myself.  The reason for this is that it seems like there is a bit of
mismatch between CL's type system and CMUCL's class system.  For
instance, using CLOS, I can't seem to specialize on (simple-array
double-float (*)), instead I have to actually construct such an array
(at instantiation time), get it's class, and specialize on that (see
the expansion above).

Any thoughts and comments are welcome.

Cheers,

rif

Re: Feedback (thoughts, prior art) wanted on templating system (long) Wade Humeniuk
- Re: Feedback (thoughts, prior art) wanted on templating system (long) Wade Humeniuk
  - Re: Feedback (thoughts, prior art) wanted on templating system (long) rif
Re: Feedback (thoughts, prior art) wanted on templating system (long) Nicolas Neuss
- Re: Feedback (thoughts, prior art) wanted on templating system (long) rif
  - Re: Feedback (thoughts, prior art) wanted on templating system (long) Nicolas Neuss
Re: Feedback (thoughts, prior art) wanted on templating system (long) Rahul Jain
- Re: Feedback (thoughts, prior art) wanted on templating system (long) rif
  - Re: Feedback (thoughts, prior art) wanted on templating system (long) Joe Marshall

From: Wade Humeniuk
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sun, 06 Mar 2005 20:38:38 +0000
Message-ID: <iTJWd.6032$gJ3.1456@clgrps13>

Have you looked at compiler macros?

See DEFINE-COMPILER-MACRO in the Spec.

I suppose if you use compiler macros you can
define a number of straight functions like scale-array
and then define a compiler macro to handle expansion
based on the type of parameters at compile time.  I suppose
you would have to give your macro a hint with statments like
(the double-float scale-factor).  Perhaps you can then get rid
of any methods or reliance on CLOS.

Also should not the following inline version of scale-array
capture the type information of array and scale and perform the
proper type optimizations and any inherited compiler optimizations?

(declaim (inline scale-array))
(defun scale-array (array scale)
   (let* ((length (length array))
          (result (make-array length :element-type (array-element-type array))))
     (declare (type fixnum length))
     (dotimes (i length)
       (setf (aref result i)
             (* scale (aref array i))))
     result))

If the compiler had trouble with optimizing access to result, you
you define scale-array to take a result array, like

(define scale-array (array scale result) ...

and the compiler would then know the type of result from
higher level declarations.


Wade

From: Wade Humeniuk
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sun, 06 Mar 2005 20:52:04 +0000
Message-ID: <U3KWd.6125$gJ3.4621@clgrps13>

In LW the following code shows that the expected optimizations
happen:

(declaim (inline scale-array))
(defun scale-array (array scale result)
   (dotimes (i (length array))
     (declare (type fixnum i))
     (setf (aref result i) (* scale (aref array i))))
   result)

(defun test-function ()
   (declare (optimize (speed 3) (safety 0) (float 0)))
   (let ((array (make-array 128 :element-type 'double-float))
         (scale 1.2d0)
         (result (make-array 128 :element-type 'double-float)))
     (declare (type (simple-array double-float (*)) array result)
              (type double-float scale))
     (scale-array array scale result)))

CL-USER 1 > (disassemble 'test-function)
; Loading fasl file C:\Program 
Files\Xanalys\LispWorks\lib\4-4-0-0\load-on-demand\pcl\util\disass.fsl
; Loading fasl file C:\Program 
Files\Xanalys\LispWorks\lib\4-4-0-0\load-on-demand\pcl\memory\findptr.fsl
21548802:
        0:      55               push  ebp
        1:      89E5             move  ebp, esp
        3:      83EC14           sub   esp, 14
        6:      C7042445140000   move  [esp], 1445
       13:      33F6             xor   esi, esi
       15:      56               push  esi
       16:      56               push  esi
       17:      56               push  esi
       18:      56               push  esi
       19:      56               push  esi
       20:      56               push  esi
       21:      56               push  esi
       22:      56               push  esi
       23:      6800000400       push  40000
       28:      6800800000       push  8000
       33:      B503             moveb ch, 3
       35:      B800F2FFFF       move  eax, -E00
       40:      FF1560C21920     call  [2019C260]       ; SEQ::MAKE-UNFILLED-VECTOR
       46:      8945D4           move  [ebp-2C], eax
       49:      6800000400       push  40000
       54:      6800800000       push  8000
       59:      B503             moveb ch, 3
       61:      B800F2FFFF       move  eax, -E00
       66:      FF1560C21920     call  [2019C260]       ; SEQ::MAKE-UNFILLED-VECTOR
       72:      8945D8           move  [ebp-28], eax
       75:      8B75D4           move  esi, [ebp-2C]
       78:      8B760C           move  esi, [esi+C]
       81:      8975CC           move  [ebp-34], esi
       84:      C745E800000000   move  [ebp-18], 0
       91:      837DCC00         cmp   [ebp-34], 0
       95:      7E58             jle   L2
       97:      C745E400000000   move  [ebp-1C], 0
      104:      8B75CC           move  esi, [ebp-34]
      107:      81EE00010000     sub   esi, 100
      113:      8975D0           move  [ebp-30], esi
L1:  116:      8B75E8           move  esi, [ebp-18]
      119:      8975DC           move  [ebp-24], esi
      122:      8B7DDC           move  edi, [ebp-24]
      125:      C1FF05           sar   edi, 5
      128:      8B75D4           move  esi, [ebp-2C]
      131:      DD443E14         fldl  [esi+14+edi]
      135:      DD05D8F96921     fldl  [2169F9D8]       ; 1.2
      141:      DEC9             fmulp st(1), st
      143:      8B7DE8           move  edi, [ebp-18]
      146:      C1FF05           sar   edi, 5
      149:      8B75D8           move  esi, [ebp-28]
      152:      DD5C3E14         fstpl [esi+14+edi]
      156:      8B75E4           move  esi, [ebp-1C]
      159:      3B75D0           cmp   esi, [ebp-30]
      162:      7C1B             jl    L3
      164:      8B45E8           move  eax, [ebp-18]
      167:      E8BC2149FF       call  209DAA6A         ; #<function SYSTEM:1+$FIXNUM 
209DAA6A>
      172:      8945E8           move  [ebp-18], eax
      175:      8B7DE8           move  edi, [ebp-18]
      178:      8B5DCC           move  ebx, [ebp-34]
      181:      3BFB             cmp   edi, ebx
      183:      7CBB             jl    L1
L2:  185:      8B45D8           move  eax, [ebp-28]
      188:      FD               std
      189:      C9               leave
      190:      C3               ret
L3:  191:      8B75E4           move  esi, [ebp-1C]
      194:      8975E0           move  [ebp-20], esi
      197:      8B7DE0           move  edi, [ebp-20]
      200:      81C700010000     add   edi, 100
      206:      897DE4           move  [ebp-1C], edi
      209:      897DE8           move  [ebp-18], edi
      212:      EB9E             jmp   L1
NIL

CL-USER 2 >

Wade

From: rif
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sun, 06 Mar 2005 23:37:41 +0000
Message-ID: <wj0is44e46y.fsf@five-percent-nation.mit.edu>

Thanks for the feedback.  I see what you are saying.

I was familiar with DEFINE-COMPILER-MACRO.  It does some of what I
want, but not all.

Both DEFINE-COMPILER-MACRO and your inlining solution require that the
types be apparent at compile time.  The inlining solution additionally
implies a willingness to inline arbitrarily large functions, and
nestings thereof.  This also involves recompiling all systems that
depend on a library when the underlying library changes.

rif

From: Nicolas Neuss
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Wed, 09 Mar 2005 10:01:29 +0000
Message-ID: <87is41yw7a.fsf@ortler.iwr.uni-heidelberg.de>

rif <···@mit.edu> writes:

> I'm not sure whether using CLOS is a good idea for this.  In
> particular, for the next version of the system, I'm seriously
> considering just using a hash table of implementations and dispatching
> myself.  The reason for this is that it seems like there is a bit of
> mismatch between CL's type system and CMUCL's class system.  For
> instance, using CLOS, I can't seem to specialize on (simple-array
> double-float (*)), instead I have to actually construct such an array
> (at instantiation time), get it's class, and specialize on that (see
> the expansion above).

Because of these problems, I have generally chosen to create my own typed
CLOS classes in my PDE toolbox Femlisp, e.g.

* (class-of #m(1.0))
#<STANDARD-CLASS |(STANDARD-MATRIX DOUBLE-FLOAT)| {58AADCED}>

Then, automatic code generation and dispatch can be done for those CLOS
classes.

Nicolas.

From: rif
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Wed, 09 Mar 2005 15:37:07 +0000
Message-ID: <wj0u0nkespo.fsf@five-percent-nation.mit.edu>

Ah, interesting.  I see what you're saying.

I can see advantages to your suggestion.  Certainly, for matrices,
where you're almost certainly going to need a class to represent,
anyways.  The tradeoff is that you have to put a layer of objects
around everything, but that doesn't seem like an enormously large
price to pay.

Thanks for the tip.

Does Femlisp contain stuff pretty similar to what I was suggesting in
my post?  I should take a closer look...

Cheers,

rif

From: Nicolas Neuss
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Thu, 10 Mar 2005 09:18:36 +0000
Message-ID: <87wtsfyi37.fsf@ortler.iwr.uni-heidelberg.de>

rif <···@mit.edu> writes:

> Thanks for the tip.
>
> Does Femlisp contain stuff pretty similar to what I was suggesting in
> my post?  I should take a closer look...
>
> Cheers,
>
> rif

I think it does.  Especially, you should look in the directory
#p"femlisp:src;matlisp".

Nicolas.

From: Rahul Jain
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sat, 26 Mar 2005 22:54:20 +0000
Message-ID: <87eke2rpb7.fsf@nyct.net>

rif <···@mit.edu> writes:

> My system is a form of "templating" system, that allows one to
> automatically construct a parametrized family of efficient
> implementations with as little work as possible.  For example, suppose
> I need functions to scale arrays of numbers by a constant.  In my
> current system, I can say:

Have you looked at how all the numerical operators in CMUCL are
implemented? They work exactly how you want your operators to work. 
Since you seem to OK with sticking to CMUCL, this might be a perfect
solution for you.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist

From: rif
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Sat, 26 Mar 2005 23:49:45 +0000
Message-ID: <wj0ll8aklwm.fsf@five-percent-nation.mit.edu>

Rahul Jain <·····@nyct.net> writes:

> rif <···@mit.edu> writes:
> 
> > My system is a form of "templating" system, that allows one to
> > automatically construct a parametrized family of efficient
> > implementations with as little work as possible.  For example, suppose
> > I need functions to scale arrays of numbers by a constant.  In my
> > current system, I can say:
> 
> Have you looked at how all the numerical operators in CMUCL are
> implemented? They work exactly how you want your operators to work. 
> Since you seem to OK with sticking to CMUCL, this might be a perfect
> solution for you.
> 
> -- 
> Rahul Jain
> ·····@nyct.net
> Professional Software Developer, Amateur Quantum Mechanicist

I haven't looked.  I've always been a bit scared to look into the guts
of a CL implementation.  Perhaps this is the right time.  Thanks for
the pointer.

rif

From: Joe Marshall
Subject: Re: Feedback (thoughts, prior art) wanted on templating system (long)
Date: Tue, 29 Mar 2005 21:27:55 +0000
Message-ID: <acomb0ro.fsf@ccs.neu.edu>

rif <···@mit.edu> writes:

> I've always been a bit scared to look into the guts of a CL
> implementation. 

And when you stare persistently into a CL implementation, the CL
implementation also stares into you.