Re: (debunk-myth '(debunk-myth '(<-efficient gc malloc/free)))

From: Jeffrey Mark Siskind
Subject: Re: (debunk-myth '(debunk-myth '(<-efficient gc malloc/free)))
Date: Thu, 26 Oct 1995 00:00:00 +0000
Message-ID: <QOBI.95Oct26165240@qobi.ai>

In article <··········@mozo.cc.purdue.edu> ······@newton.math.purdue.edu (Brad Lucier) writes:

I ported to Gambit-C version 2.2 the two codes put forward as a test of
the cost of GC and the cost of functional versus imperative
programming styles. One thing that struck me was that the functional
style allowed one to separate the iterators, which could use fixnum
arithmetic, from the functions that were plugged into the iterators.
Thus, one could declare that certain blocks of code used only fixnums,
possibly speeding up both codes.

Both the imperative and functional versions do isomorphic iterations as part
of the E step and the M step. In the functional version those iterations
are done in the iteration procedures. In the imperative version those
iterations are done by DO loops in the E-STEP! and M-STEP! procedures. The
fixnum declarations that you added only affect those corresponding interations
in the functional version and not the imperative one. That might have skewed
the results. Why don't you repeat the experiment with equivalent fixnumisation
in both versions? I would do so myself but I don't know enough about Gambit-C.

Now one might argue that the advantage of the functional version is that it is
easy to package up such declarations in functional iteration procedures. But
in response one might argue that one can package up analogous declarations as
macros for the imperative version. I.e. replace DO with DO-TIMES.

In any case, each codes uses, in fact, a mixed style of programming.

The original benchmark used imperative programming for all of the auxiliary
procedures outside of the main EM algorithm, in both the imperative and
functional versions. I.e. DETERMINANT, INVERT-MATRIX, JACOBI, etc. all operate
imperatively internally in both benchmarks. This was meant as a control. The
only thing that varies between the imperative and functional versions is how
the EM algorithm itself is coded.

Jeff (home page http://www.cs.toronto.edu/~qobi)
--

Jeff (home page http://www.cs.toronto.edu/~qobi)

From: Brad Lucier
Subject: Re: (debunk-myth '(debunk-myth '(<-efficient gc malloc/free)))
Date: Fri, 27 Oct 1995 00:00:00 +0000
Message-ID: <46rdja$bu6@mozo.cc.purdue.edu>

Jeffrey Mark Siskind writes:

  > Both the imperative and functional versions do isomorphic iterations as
  > part of the E step and the M step. In the functional version those
  > iterations are done in the iteration procedures. In the imperative
  > version those iterations are done by DO loops in the E-STEP! and
  > M-STEP! procedures. The fixnum declarations that you added only affect
  > those corresponding interations in the functional version and not the
  > imperative one. That might have skewed the results. Why don't you
  > repeat the experiment with equivalent fixnumisation in both versions? I
  > would do so myself but I don't know enough about Gambit-C.

Quoting from the Gambit manual, "[Declare forms] can only appear where
a define form is acceptable."  I don't know how fine-grained type
declarations one can achieve with this restriction;  mixing flonum and
fixnum arithmetic may make type declarations more difficult than
explicit declarations of ##fixnum.+, ##flonum.+, etc.

I suggest that it would be better to define

(define (dot u v)
  (reduce-vector (lambda (x y) (+ x y))
                 (map-vector-two (lambda (x y) (* x y)) u v) 0.0))

rather than

(define (dot u v) (reduce-vector + (map-vector-two * u v) 0.0))

since "+" can take any number of arguments, of many different types,
etc.  The programmer's knowledge that you want to apply + to only two
flonum arguments inside map-vector-two is lost without a global
data-flow analysis. Writing explicitly that you expect it to take only
two arguments, with a flonum declaration around this code, leads to
better code in gambit.

I noticed that I had not, in fact, applied all possible arithmetic
declarations for either fun.scm or for.scm.  I did so, changed the
definition of dot, v+, v-, k*v as above, and redefined sum as suggested
above.  The results are listed at the end of this post.

  > The original benchmark used imperative programming for all of the
  > auxiliary procedures outside of the main EM algorithm, in both the
  > imperative and functional versions. I.e. DETERMINANT, INVERT-MATRIX,
  > JACOBI, etc. all operate imperatively internally in both benchmarks.
  > This was meant as a control. The only thing that varies between the
  > imperative and functional versions is how the EM algorithm itself is
  > coded.

Now this is interesting.  It turns out that determinant, invert-matrix,
and jacobi are *always* called with 1x1 matrices.  So they could be
replaced by obviously trivial procedures; it's not clear that in their
present form they take up much time at all, since basically none of
their loops are executed.  From looking at what the eigenvalues and
matrix inverses are used for in the code, it seems that all the
matrix-matrix ops and matrix-vector ops are also called with 1x1
matrices and 1-vectors (but I haven't checked).  If so, this is a
benchmark about whether a compiler/environment/? can get good
floating-point performance if you wrap each floating-point number in
one or two layers of vector wrappers, unwrap each flonum before each
operation, and rewrap each result after each operation.  I think a
better test should involve bigger vectors/matrices.

Brad Lucier     ······@math.purdue.edu

Results for non-R4RS code:

Functional without declarations:
1686.2 seconds, 271 garbage collections

Functional with some declarations:
786.4 seconds, 271 garbage collections

Functional with all (?) declarations plus rewrite of dot, v+, etc.:
642.0 seconds, 271 garbage collections

Imperative without declarations:
797.3 seconds, 181 garbage collections

Imperative with some declarations:
760.5 seconds, 181 garbage collections

Imperative with all (?) declarations:
701.1 seconds, 181 garbage collections