From: Joshua M Yelon
Subject: Re: Performance increase needed any ideas
Date: 
Message-ID: <C32LrA.9wq@news.cso.uiuc.edu>
·······@longs.LANCE.ColoState.Edu (Chris Miner) writes:

>I spent the last month writing a Canonical LR(1) parser generator in Lisp.
>It has code hooks for each rule of the grammar which are executed on
>reduction of the rule.  I am almost real happy with it.
>
>You see, it is the first lisp code I have written, so I thought the poor 
>performance was just my lousy lisp code writting ability.  I thought I
>would be able to review my terrifically inefficient code and get at
>least a 10x improvement.  I was wrong.  There doesn't seem to be much
>to improve.

I have read your other post, too, and from what little evidence I can
glean, my guess is that you have used linked lists to implement
everything.  That's natural - and also a bad idea.  If this is the
case, you may need to do a fair bit of restructuring to get the
program to work fast.

In general, I would advise choosing your data structures in the same way
you would choose them in C - pick the one that your computer-science
background tells you is the best.  So, for this problem:

Abstract Data Types needed:

LR0ITEM - and the operations that need to be fast:

    * insert into SET-OF-LR0ITEM.

    * find symbol after dot (for next-state computation).

    * compute successor (during next-state computation)

SET-OF-LR0ITEM - and the operations that need to be fast:

    * Membership test.

    * Union and intersection (for closure operations).

The set operations will be fastest if you represent the set as a
bitvector.  This is feasible, since the number of LR0ITEMS is a small,
finite number (every grammar rule, with a dot in every possible place -
not too big).

To use LR0ITEMs in bitvectors, every LR0ITEM needs to have an item
number - therefore, it must be possible to compute the item-number of
an LR0ITEM very rapidly.  This is possible if LR0ITEMs are structs
with a field ITEM-NUMBER.  As soon as you see the grammar, generate
all the LR0ITEMs, and assign the item numbers starting at zero at that
time.

To make the "successor" computation fast, add a field SUCCESSOR to
LR0ITEM, pointing to its successor (or NIL if its final).  Initialize
the successor field as you generate all the LR0ITEMs.

The "find symbol after dot" operation can also be implemented as a
field lookup, if you just store the information in another field of
the LR0ITEM, initializing it as you build the LR0ITEM.

Plus, since the algorithms that you are implementing are confusing,
you'll need to be able to print out data structures in a comprehensible
way.  The bitvectors in particular will be hard to read.  So, you can
define SET-OF-LR0ITEM to be a struct, with one field BITVEC.  That
will allow you to print them out in a more meaningful way (you can
control how a struct is printed).

Note, also, that as a general rule, it is "safe" to make every ADT
in your program a struct - you can always add a field that points
to whatever you want.

Hope this helps, and hope you don't have to kill yourself to restructure
the thing, if that's what you choose to do.

- Josh