From: eric dahlman
Subject: Practical Question: Fast bit strings
Date: 
Message-ID: <tz4d77y1js8.fsf@sibelius.cs.colostate.edu>
Howdy all,

I have a practical question which I hope someone can help me with.  I
am interested in porting some code from C to lisp.  This is research
code and I think that I will be better able to do my work if I have a
lisp version of the algorithm -- I won't have that albatross of C
around my neck. ;-)

The problem is that I cannot afford a big hit in speed, a little hit
is O.K.  The majority of the work the C code is doing is flipping bits
in bit vectors, making copies and performing comparisons on the
resulting vectors.  In the back of my mind I remember that this is one
area where the performance of CL is a bit lacking.  But that is also
potentially stale info.  I believe that LispM's were particularly bad.
My Symbolics died so I don't have to worry about that any more, but I
am interested in ACL and CMUCL on sparcs and CMUCL on x86 if it
matters.

My first question is if this is a performance problem in reality or just
a hobgoblin running around in my mind?  If it is real, what is the
situation for the above implementations and architectures?

There is a fair amount of analysis which goes into reducing the
problem to this set of bit vectors and twiddling operations.  Would I
potentially be ahead if I left the bit twiddling to C and used an FFI
to perform it?  Just for reference, a vector can contain hundreds to
thousands of bits; when twiddling bits only a half dozen or so are
changed at a time and then the vector is copied. I know the copying may
get expensive but I need to replicate the original version if only to
serve as a basis before I address some of those possibilities.
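To make that concrete, the inner operation is roughly the following.  This
is only a hedged sketch of what I imagine the CL would look like;
COPY-AND-FLIP and all the names here are my own invention, not code from
the system:

```lisp
;; Roughly what the copy+twiddle step would look like in CL.
;; Illustrative sketch only; the names are invented.
(defun copy-and-flip (vec indices)
  "Copy VEC, a bit vector, then flip the few bits named by INDICES."
  (declare (type simple-bit-vector vec)
           (optimize (speed 3)))
  (let ((new (copy-seq vec)))
    (dolist (i indices new)
      (setf (sbit new i) (- 1 (sbit new i))))))
```

Comparing the resulting vectors would then just be EQUAL, which compares
bit vectors element by element.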

I know that I should benchmark this myself but I don't want to go to
the trouble of implementing all this if it is doomed from the start.


Thanks for the help,
-Eric

From: Frank A. Adrian
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <kjgY6.1420$Xy3.615137@news.uswest.net>
It would seem that, if you desired to move your code to a Lisp base to "be
better able to do your work", speed would not be that big of an issue.  I'm
fairly sure that copying all of those vectors in C must be a pain.  In fact,
if, as you say, "a vector can contain hundreds to thousands of bits, when
twiddling bits only a half dozen or so are changed at a time and then the
vector is copied", then you must be able to find a sparse representation of
the changes that would take less time to process (e.g., model your vector as
a delta structure holding a sparse representation of the changed bits with a
pointer to another vector model; as you do the ops, keep chaining them
together; when the set of accumulated changes grows large enough (or you
actually need a vector representation of the result), then create a new
vector out of it).  This approach would be much easier in Lisp.  It would
probably save time over the current algorithm in C because you wouldn't be
consing vectors all the time.  In short, if you're going to port your code,
look for ways of saving time by using the language to its best advantage,
don't just transcribe the algorithm.  Remember, you can always dump
intermediate results to make sure that the algorithm using the new data
model gives the same results as the original.
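A rough sketch of the delta structure I have in mind (untested, and every
name here is invented for illustration):

```lisp
;; A delta is a small set of flipped-bit indices chained to either
;; another delta or a real bit vector at the bottom.
(defstruct delta
  (base nil)              ; a SIMPLE-BIT-VECTOR or another DELTA
  (flips '() :type list)) ; indices of bits flipped relative to BASE

(defun realize (d)
  "Materialize a delta chain: walk down to the base vector, then
apply all accumulated flips to a fresh copy of it."
  (labels ((collect (d acc)
             (if (delta-p d)
                 (collect (delta-base d) (append (delta-flips d) acc))
                 (values d acc))))
    (multiple-value-bind (base flips) (collect d '())
      (let ((new (copy-seq base)))
        (dolist (i flips new)
          (setf (sbit new i) (- 1 (sbit new i))))))))
```

Chaining on a new delta costs only the handful of changed bits; you pay
for a full-length copy only when you call REALIZE.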

faa

"eric dahlman" <·······@cs.colostate.edu> wrote in message
····················@sibelius.cs.colostate.edu...
> Howdy all,
>
> I have a practical question which I hope someone can help me with.  I
> am interested in porting some code from C to lisp.  This is research
> code and I think that I will be better able to do my work if I have a
> lisp version of the algorithm -- I won't have that albatross of C
> around my neck. ;-)
>
> The problem is that I cannot afford a big hit in speed, a little hit
> is O.K.  The majority of the work the C code is doing is flipping bits
> in bit vectors, making copies and performing comparisons on the
> resulting vectors.  In the back of my mind I remember that this is one
> area where the performance of CL is a bit lacking.  But that is also
> potentially stale info.  I believe that LispM's were particularly bad.
> My Symbolics died so I don't have to worry about that any more, but I
> am interested in ACL and CMUCL on sparcs and CMUCL on x86 if it
> matters.
>
> My first question is if this is a performance problem in reality or just
> a hobgoblin running around in my mind?  If it is real, what is the
> situation for the above implementations and architectures?
>
> There is a fair amount of analysis which goes into reducing the
> problem to this set of bit vectors and twiddling operations.  Would I
> potentially be ahead if I left the bit twiddling to C and used an FFI
> to perform it?  Just for reference, a vector can contain hundreds to
> thousands of bits; when twiddling bits only a half dozen or so are
> changed at a time and then the vector is copied. I know the copying may
> get expensive but I need to replicate the original version if only to
> serve as a basis before I address some of those possibilities.
>
> I know that I should benchmark this myself but I don't want to go to
> the trouble of implementing all this if it is doomed from the start.
>
>
> Thanks for the help,
> -Eric
From: eric dahlman
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <tz4zob0sese.fsf@sibelius.cs.colostate.edu>
"Frank A. Adrian" <·······@uswest.net> writes:

> It would seem that, if you desired to move your code to a Lisp base to "be
> better able to do your work", speed would not be that big of an issue.  

Part of the problem is that it is not my code but someone else's.
Basically it is performing a search in a space of bit vectors; the
problem is that I want to modify the algorithm, but it is highly
optimized toward one search method with a specific allocation
pattern. I have ripped out all of this and tacked on a conservative
garbage collector to support my modifications.  At this point I
believe that I would be better served by just reimplementing the system
in lisp and working in that environment.  The work that I have done so
far has taken weeks and it would have been 2 days in CL.  (I am no
slouch as a C or CL programmer, so I would not attribute too much of
that to language familiarity, although I am happier in CL.)

> I'm fairly sure that copying all of those vectors in C must be a
> pain.

It is just a call to memcpy ;-) Actually this operation is one where I
was prepared to take a hit moving to lisp since I believe that on x86
machines glibc+gcc uses MMX instructions to do the copy, which I don't
think a lisp would do.

> In fact, if, as you say, "a vector can contain hundreds to thousands
> of bits, when twiddling bits only a half dozen or so are changed at
> a time and then the vector is copied", then you must be able to find
> a sparse representation of the changes that would take less time to
> process (e.g., model your vector as a delta structure holding a
> sparse representation of the changed bits with a pointer to another
> vector model; as you do the ops, keep chaining them together; when
> the set of accumulated changes grows large enough (or you actually
> need a vector representation of the result), then create a new
> vector out of it).  This approach would be much easier in Lisp.  It
> would probably save time over the current algorithm in C because you
> wouldn't be consing vectors all the time.

That was one of my big motivations for doing this.  Actually, there
are patterns in the "twiddling" so I was more interested in using this
technique to discover "macros" and null patterns which I could then
compile into new operations or avoidance rules.

> In short, if you're going to port your code, look for ways of saving
> time by using the language to its best advantage, don't just
> transcribe the algorithm.  Remember, you can always dump
> intermediate results to make sure that the algorithm using the new
> data model gives the same results as the original.

I am afraid that a necessary step is still going to be a transcription
of the original program as a base line case.  Unfortunately, a big part
of the reason for that is political and silly.  Suffice it to say that
if I can't get performance approaching that of C the work I am doing
will not even be looked at.  An argument like "I reimplemented this
algorithm in CL and it is 10 times slower than the original C version,
then I made this neat change and it is now only 2 times slower!"  just
won't fly, which is a pity since algorithmically the change was most
likely language independent.  I would need to back fit the change into
the original C implementation which is not very amenable to change.

Thanks,
-Eric
From: Frank A. Adrian
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <Y1sZ6.1708$1v2.375655@news.uswest.net>
"eric dahlman" <·······@cs.colostate.edu> wrote in message
····················@sibelius.cs.colostate.edu...
> I am afraid that a necessary step is still going to be a transcription
> of the original program as a base line case.  Unfortunately, a big part
> of the reason for that is political and silly.  Suffice it to say that
> if I can't get performance approaching that of C the work I am doing
> will not even be looked at.  An argument like "I reimplemented this
> algorithm in CL and it is 10 times slower than the original C version,
> then I made this neat change and it is now only 2 times slower!"  just
> won't fly, which is a pity since algorithmically the change was most
> likely language independent.  I would need to back fit the change into
> the original C implementation which is not very amenable to change.

Well, again, if this is really "research" and not just another game of
having researchers do advanced "development", you can show that your
results took fewer abstract operations rather than looking at stupid things
like CPU cycles and clock time that introduce stuff like OS choice and
compiler choice into the mix.  On the other hand, if wall clock time is such
a big issue, just find a faster C compiler - after all, you'd have the same
effect.

Sorry, but you touched a nerve here.  Research is being bastardized at an
amazing rate in this country.  Researchers should fight back.  It doesn't
matter if the Lisp version runs 100X slower.  If you're not looking into what
the language system is doing with your algorithm, it has NO bearing on the
actual research.

faa
From: Tim Bradshaw
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <ey3vgllfvku.fsf@cley.com>
* Frank A Adrian wrote:

> Sorry, but you touched a nerve here.  Research is being bastardized at an
> amazing rate in this country.  Researchers should fight back.  It doesn't
> matter if the Lisp version runs 100X slower.  If you're not looking into what
> the language system is doing with your algorithm, it has NO bearing on the
> actual research.

If the problem is to actually generate some plans to explore various
algorithms then running 100x slower can seriously matter - taking over
4 days to generate a plan rather than an hour can seriously crap all
over the work you're trying to do. (The AI planning systems I knew
often took hours of real time to generate complex plans, in quite
recent history: it's not a simple problem).

Of course there's no evidence Lisp will be 100x as slow as anything,
so this is somewhat silly.  However, it is *not* the case that just
because it's research you can use some incredibly slow system.

--tim
From: Bulent Murtezaoglu
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <87wv61e4je.fsf@nkapi.internal>
>>>>> "FA" == Frank A Adrian <·······@uswest.net> writes:

    FA> [...] Sorry, but you touched a nerve here.  Research is being
    FA> bastardized at an amazing rate in this country.  Researchers
    FA> should fight back.  It doesn't matter if the Lisp version runs
    FA> 100X slower.  If you're not looking into what the language
    FA> system is doing with your algorithm, it has NO bearing on the
    FA> actual research.

I am with Tim B. on this.  I cannot comment on the bastardization of
research since I am not up to date on the present state of affairs,
but there are solid arguments for needing efficiency in research code.
The obvious case would be robotics, but one maybe not-so-obvious case
is when you are coding to gain insights into the problem and the
problem exhibits some "knee" phenomenon such that interesting things
only happen after a certain problem size is reached (think of neural
nets and such).  High constants just kill your exploratory programming
efforts in the latter case because they cut down the number of
tweak-run-observe-think-tweak cycles you can get in during your work day.
So no, if the run time goes from 1 minute to 1 hour 40 minutes (100x),
it will have a bearing on your research.

While I agree that blind benchmarking of research code is of
dubious value, I cannot agree that wall-clock inefficiency is irrelevant.
Even if your end product is proofs of upper bounds of time complexity 
you might need to get your insights from exploration via actual coding 
and that's where you may need efficiency.  AI code tends to operate very
close to the limits of practical tractability and the last thing you need
is gratuitous high constant factors introduced by sloppiness in the 
language implementation.  With a decent compiler, Common Lisp avoids this
for the most part and it gives you the ability to go beyond what's 
possible with, say, C by opening doors to runtime compilation of your
domain language into efficient native code.  

I suspect we are missing your point here though.  Maybe you could restate?

cheers,

BM
From: Frank A. Adrian
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <rzyZ6.1048$JP5.545266@news.uswest.net>
"Bulent Murtezaoglu" <··@acm.org> wrote in message
···················@nkapi.internal...
> >>>>> "FA" == Frank A Adrian <·······@uswest.net> writes:
>
> I suspect we are missing your point here though.  Maybe you could restate?

Sorry... I was a bit unclear as to what I was upset about.  Quite often
these days, researchers are put into positions where they need to produce
software in the "flavor of the day" rather than what would be better for the
research.  Sometimes this is what the funder asks for, sometimes the
researcher does it in order to gain greater access to a wider variety of
funding sources, and sometimes it's just because - like most of us -
researchers are exposed to mainstream technology and, in an attempt to be "more
relevant" (a somewhat specious criterion for real research if I ever heard
one), force themselves to work within self-circumscribed technological
boundaries.

When you made the statement "I've spent a great deal of time on this so far
and I could do this in two days in CL" it appeared to me that you were
choosing C because of inertia and not because it was best for the research.
After all, coding in CL would allow you to advance the programming
underlying the research in a faster manner.  OTOH, Tim is right about the
non-viability of running a planning algorithm for four days (unless the plan
has a year-long timeframe :-) and I apologize if my use of the term
"bastardized" offended anyone or if it was implied that I was talking
specifically about your work.  However, I will not withdraw its use for the
cases I outlined above.  I also have a few other choice words I could use in
these cases, as well.

That being said, maybe you could get the best of both worlds by using C
primitives to allocate, access, and manipulate the low-level bitmaps, while
using the Lisp code to keep the deltas at a higher level.  This could still
possibly provide a good amount of speedup depending on the ratio of
high-level to primitive code.

I'd also still be very certain that Lisp *would* be a disaster in the speed
area before you write it off.  Even if the Lisp code takes 3X as long to
run, if you can speed up the processing by 4X, you've still gained 25%.  It
seems as if you're spinning your wheels in C land, so I'd be worried about
completion in C, also.  Finally, the abstraction capabilities of Lisp would
allow you to do an initial implementation in bit-vectors and, if necessary,
move them over to arrays of fixnums using the LOGxxx functions with little
change to the overlying code, again improving speed.
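For instance, the fixnum-array variant could look something like this.  A
sketch only; the word size and all the names here are placeholders, not
anything from an existing system:

```lisp
;; Bits packed into a vector of 32-bit words; flipping and reading a
;; bit become LOGXOR and LDB on the containing word.
(defconstant +bits-per-word+ 32)

(defun flip-bit (words i)
  "Flip bit I in WORDS, a word vector treated as one long bit string."
  (declare (type (simple-array (unsigned-byte 32) (*)) words))
  (multiple-value-bind (word bit) (floor i +bits-per-word+)
    (setf (aref words word)
          (logxor (aref words word) (ash 1 bit)))))

(defun bit-at (words i)
  "Read bit I out of WORDS."
  (multiple-value-bind (word bit) (floor i +bits-per-word+)
    (ldb (byte 1 bit) (aref words word))))
```

The point being that the code above the bit-string abstraction never has
to know which representation is underneath.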

I guess this all boils down to the issue that I can't really see why people
are always saying that Lisp is "slow", when they continue to write their C
and C++ behemoths whose memory management overhead and
construction/destruction requirements to avoid memory management hassles
take so much time.  I'm even more disheartened when they start to use
(shudder) Java.  This is especially egregious in the world of research,
where the investigator would be better off sharpening his tools and
remaining open to a wide variety of them.

faa
From: Tim Bradshaw
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <nkjpubshmx2.fsf@tfeb.org>
"Frank A. Adrian" <·······@uswest.net> writes:

> 
> Sorry... I was a bit unclear as to what I was upset about.  Quite often
> these days, researchers are put into positions where they need to produce
> software in the "flavor of the day" rather than what would be better for the
> research.  Sometimes this is what the funder asks for, sometimes the
> researcher does it in order to gain greater access to a wider variety of
> funding sources, and sometimes it's just because - like most of us -
> researchers are exposed to mainstream technology and, in an attempt to be "more
> relevant" (a somewhat specious criterion for real research if I ever heard
> one), force themselves to work within self-circumscribed technological
> boundaries.
> 

I think that, quite often, it's because the researcher has half an eye
on getting a job as a programmer later and wants to have a suitably
buzzword-full CV.

Very often, in my experience, it's also been because researchers,
being `scientists', simply believe any random lie they are told by
someone pushing a language or OS.  I wonder at the quality of the
research produced by people this gullible, but probably that's just me
being cynical.

--tim
From: Bulent Murtezaoglu
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <87g0cufbfz.fsf@nkapi.internal>
>>>>> "ed" == eric dahlman <·······@cs.colostate.edu> writes:
[...]
    ed> The problem is that I cannot afford a big hit in speed, a
    ed> little hit is O.K.  The majority of the work the C code is
    ed> doing is flipping bits in bit vectors, making copies and
    ed> performing comparisons on the resulting vectors.  

If these operations are truly the hot spot and the difference in
changing the constants means a productivity increase, you _might_ also
consider going down into assembler for this.  There are fast copy
routines using Intel MMX instructions floating around and you might
also get some help from MMX in zeroing chunks of memory etc.  Only
makes sense if the thing takes ages to run and a slight speedup means
getting two tweak-run cycles within a work day as opposed to just one.
Ditto for either coding with or making sure your compiler knows about
"good" instructions.  At least on the Intel platform I think 
shift-by-n-bits has speeded up since 386 by the introduction of a 
barrel shifter.  I am unsure if there's a special instruction to trigger 
it though.
    
    ed> In the back
    ed> of my mind I remember that this is one area where the
    ed> performance of CL is a bit lacking.  

Cmucl seems to do fine actually according to a recent thread.

[...]
    ed> I know that I should benchmark this myself but I don't want to
    ed> go to the trouble of implementing all this if it is doomed
    ed> from the start.

If you can come up with a bunch of critical functions, then we may be
able to help you better.  You can get at these by profiling the C code
if you are just going to translate.

cheers,

BM
From: eric dahlman
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <tz4vglose4k.fsf@sibelius.cs.colostate.edu>
Bulent Murtezaoglu <··@acm.org> writes:

> >>>>> "ed" == eric dahlman <·······@cs.colostate.edu> writes:
> [...]
>     ed> The problem is that I cannot afford a big hit in speed, a
>     ed> little hit is O.K.  The majority of the work the C code is
>     ed> doing is flipping bits in bit vectors, making copies and
>     ed> performing comparisons on the resulting vectors.  
> 
> If these operations are truly the hot spot and the difference in
> changing the constants means a productivity increase, you _might_ also
> consider going down into assembler for this.  There are fast copy
> routines using Intel MMX instructions floating around and you might
> also get some help from MMX in zeroing chunks of memory etc.  Only
> makes sense if the thing takes ages to run and a slight speedup means
> getting two tweak-run cycles within a work day as opposed to just one.
> Ditto for either coding with or making sure your compiler knows about
> "good" instructions.  At least on the Intel platform I think 
> shift-by-n-bits has sped up since the 386 by the introduction of a 
> barrel shifter.  I am unsure if there's a special instruction to trigger 
> it though.
>     
>     ed> In the back
>     ed> of my mind I remember that this is one area where the
>     ed> performance of CL is a bit lacking.  
> 
> Cmucl seems to do fine actually according to a recent thread.

Thanks, that is the kind of testimonial I was looking for.

> 
> [...]
>     ed> I know that I should benchmark this myself but I don't want to
>     ed> go to the trouble of implementing all this if it is doomed
>     ed> from the start.
> 
> If you can come up with a bunch of critical functions, then we may be
> able to help you better.  This you can get at by profiling the C code
> if you'll just translate.  

Honestly there is not that much to profile, the algorithm is searching
a space where the individual nodes are represented by these bit
vectors and the bit twiddling is generating the new bit vectors.  The
bottleneck has always been node generation, which is this
copy+twiddle operation.
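For what it's worth, in CL the copy and the twiddle can be fused into one
pass with BIT-XOR and a precomputed mask -- the following is just a
sketch of the idea, with invented names:

```lisp
;; Precompute a mask with 1s at the positions to flip, then BIT-XOR
;; into a fresh result vector: copy+twiddle becomes a single pass that
;; a good implementation does a word at a time.
(defun make-mask (length indices)
  "Build a bit vector of LENGTH with 1s at INDICES."
  (let ((m (make-array length :element-type 'bit :initial-element 0)))
    (dolist (i indices m)
      (setf (sbit m i) 1))))

(defun new-node (vec mask)
  "Generate a child node: VEC with MASK's bits flipped."
  (declare (type simple-bit-vector vec mask))
  ;; BIT-XOR's optional third argument receives the result, so no
  ;; separate copy step is needed.
  (bit-xor vec mask (make-array (length vec) :element-type 'bit)))
```

If the same few twiddle patterns recur, the masks can be built once and
reused across the whole search.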

There is some irony in this historically since this community
traditionally worked in CL but moved to C because "it was faster".
That is an unfair generalization, but almost all the current
state-of-the-art systems are in C.  I want to shift the scale back to the
good guys ;-)

> 
> cheers,
> 
> BM

Thanks,
-Eric

P.S.  Remember the UX400 you sent me many moons ago.  I could not get
it to power up in my old sparc (but that machine had backplane issues
anyway) I just got a "new" Sun3 and I am going to try to bring it up
this weekend.  If I get it running I'll send you a screen shot.  That
board has been a big factor in slowing down my dissertation and
preserving my sanity ;-) Thanks for the fun toy!!!
From: Bulent Murtezaoglu
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <87zob0dufn.fsf@nkapi.internal>
>>>>> "ED" == eric dahlman <·······@cs.colostate.edu> writes:
[...]
    ED> Honestly there is not that much to profile, the algorithm is
    ED> searching a space where the individual nodes are represented
    ED> by these bit vectors and the bit twiddling is generating the
    ED> new bit vectors.  The bottleneck has always been node
    ED> generation, which is this copy+twiddle operation.

It is also possible that the C coders assumed that bit-vectors would
be 'faster' without benchmarking.  If everything fits in main memory
and things don't spill out of L1 cache in inner loops, char or even
fixnum arrays might be faster.  A possible complication for char
arrays is that the compiler will need to shift your fixnums to
index into them (low order bits hold the tags in CMUCL), so you still pay
for the shift if you don't make the compiler use the natural machine
words.

    ED> There is some irony in this historically since this community
    ED> traditionally worked in CL but moved to C because "it was
    ED> faster".  That is an unfair generalization but almost all the
    ED> the current state of the art systems are in C.  I want to
    ED> shift the scale back to the good guys ;-)

What is the field?  I imagine it is AI, but what kind?  

    ED> ...  That board has been a big factor
    ED> in slowing down my dissertation and preserving my sanity ;-)
    ED> Thanks for the fun toy!!!

Oh lovely, add to my guilt!  You already overpaid for postage, and now
you're telling me I provided the distraction?  

cheers,

BM
From: Eric Dahlman
Subject: Re: Practical Question: Fast bit strings
Date: 
Message-ID: <tz4hex8cec3.fsf@sibelius.cs.colostate.edu>
Bulent Murtezaoglu <··@acm.org> writes:

[snip...]
> It is also possible that the C coders assumed that bit-vectors would
> be 'faster' without benchmarking.  If everything fits in main memory
> and things don't spill out of L1 cache in inner loops, char or even
> fixnum arrays might be faster.  A possible complication for char
> arrays is that the compiler will need to shift your fixnums to
> index into them (low order bits hold the tags in CMUCL), so you still pay
> for the shift if you don't make the compiler use the natural machine
> words.

I can see your point and I agree that this should really be profiled
to be done right.  I still feel that the compression achieved by using
bit vectors makes the C approach a winner in this case.  Once it is
implemented I can change the array type and find out ;-)

> 
>     ED> There is some irony in this historically since this community
>     ED> traditionally worked in CL but moved to C because "it was
>     ED> faster".  That is an unfair generalization but almost all the
>     ED> the current state of the art systems are in C.  I want to
>     ED> shift the scale back to the good guys ;-)
> 
> What is the field?  I imagine it is AI, but what kind?  

Yeah it is AI planning.  I tried to be somewhat evasive since I know
that many of my peers read and occasionally post to c.l.l and I did not
want to offend anyone.  Not that anyone should have reason to be
offended; they have all done wonderful work.  I just didn't want to
have to go to the effort of making sure that my remarks would not be
misconstrued later on when somebody does a search for foo+bar+baz on
google. 

> 
>     ED> ...  That board has been a big factor
>     ED> in slowing down my dissertation and preserving my sanity ;-)
>     ED> Thanks for the fun toy!!!
> 
> Oh lovely, add to my guilt!  You already overpaid for postage, and now
> you're telling me I provided the distraction?  

Well at least it kept me off the streets ;-)

Thanks,
-Eric