From: akopa
Subject: A Proper Lexer
Date: 
Message-ID: <9ade14a3-b994-4c92-8547-7dd8106ecbc9@34g2000hsh.googlegroups.com>
Does any one have any advice on doing fast-ish character I/O for a Lex
like scanner.   I've read http://www.emmett.ca/~sabetts/slurp.html,
but I want to present single chars to the scanner.

I already figure I'll have to role my own regex engine since in order
to make a scanner that sees one character at a time I will need more
control over the matching algorithm than something like cl-regex or cl-
ppcre would give me.

I was a little surprised to find no equivalent of Lex available for
Common Lisp.  There is http://www.geocities.com/mparker762/clawk but
it is string oriented, not stream oriented.

Matt

From: Petter Gustad
Subject: Re: A Proper Lexer
Date: 
Message-ID: <874p4kw4de.fsf@pangea.home.gustad.com>
akopa <··················@gmail.com> writes:

> I was a little surprised to find no equivalent of Lex available for
> Common Lisp.  There is http://www.geocities.com/mparker762/clawk but
> it is string oriented, not stream oriented.


You could do buffered reads and then pass the strings to Parker's
lexer. Of course you need some clever buffering schemes unless your
files are small enough to read the entire file into memory.

Petter
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
From: Paul Tarvydas
Subject: Re: A Proper Lexer
Date: 
Message-ID: <gaha4r$dbq$1@aioe.org>
akopa wrote:

> Does any one have any advice on doing fast-ish character I/O for a Lex
> like scanner.   I've read http://www.emmett.ca/~sabetts/slurp.html,
> but I want to present single chars to the scanner.
> 
...
> 
> I was a little surprised to find no equivalent of Lex available for
> Common Lisp.  There is http://www.geocities.com/mparker762/clawk but
> it is string oriented, not stream oriented.
> 

There is little incentive to use Lex in lisp, because the lisp reader *is* a
lexer (tokenizer), hence, lisp is the language least likely to have a
Lex-like library.  I've built many small languages and just used what lisp
gives me, e.g. if you can design it so that the little language has a space
between each token, you don't have to write a lexer.  I suppose you could
get fancier by employing read macros, et al.

When I write languages that cannot follow the above simplification, I use my
home-grown CL versions of extended S/SL (syntax/semantic language) and my
experimental packrat parser (which describes the construction of tokens and
parses - google for "Bryan Ford Packrat Parser").  I can share these with
you, but they are not Lex-like (nor are they well-documented, but there is
always email :-).

By definition, lexing something a character at a time is, by definition, a
state machine. If you design it with a state machine macro - code executed
on entry, transition and exit - you will not tie yourself up in knots.  I
have a state machine macro which I could share.  

Lexing is often a simple enough job that you can just roll it by hand using
a few simple primitives (I think I can list them, if you're interested). 
Again, the state machine paradigm is the way to go.

As for speed, what are you doing?  I've stopped worrying about speed in
lexing and parsing - desktop machines are so fast these days that I seem to
get away with fully-backtracking parsers and a packrat parser with no
packratting and don't even notice.

And if you REALLY want Lex, then I know from direct experience that you can
code one up directly from the algorithms in the Dragon book (I did it about
30 years ago, but my version is lost, or on an 8.5" floppy :-).

It ain't exactly what you asked for, but contact me if any of the above is
of interest.

pt
From: Scott Burson
Subject: Re: A Proper Lexer
Date: 
Message-ID: <b51fcc15-92b9-4b1b-bc37-b90ee4f7d256@s20g2000prd.googlegroups.com>
On Sep 13, 2:08 pm, Paul Tarvydas <········@visualframeworksinc.com>
wrote:
> my version is lost, or on an 8.5" floppy :-).

You mean, of course, an 8" floppy :-)

-- Scott (from your friendly Historical Accuracy Police)
From: Paul Tarvydas
Subject: Re: A Proper Lexer
Date: 
Message-ID: <gahs6g$k89$1@aioe.org>
Scott Burson wrote:

> On Sep 13, 2:08 pm, Paul Tarvydas <········@visualframeworksinc.com>
> wrote:
>> my version is lost, or on an 8.5" floppy :-).
> 
> You mean, of course, an 8" floppy :-)
> 
> -- Scott (from your friendly Historical Accuracy Police)


Yeah, you're right.  It's been a while :-).
From: Rob Warnock
Subject: Re: A Proper Lexer
Date: 
Message-ID: <neudnbp4r7O54VHVnZ2dnUVZ_tLinZ2d@speakeasy.net>
Paul Tarvydas  <········@visualframeworksinc.com> wrote:
+---------------
| akopa wrote:
| > I was a little surprised to find no equivalent of Lex available for
| > Common Lisp.  There is http://www.geocities.com/mparker762/clawk but
| > it is string oriented, not stream oriented.
| 
| There is little incentive to use Lex in lisp, because the lisp reader *is*
| a lexer (tokenizer), hence, lisp is the language least likely to have a
| Lex-like library.  I've built many small languages and just used what lisp
| gives me, e.g. if you can design it so that the little language has a space
| between each token, you don't have to write a lexer.  I suppose you could
| get fancier by employing read macros, et al.
+---------------

In that regard, the OP might be interested in META-style parsers,
originally developed by D. V. Schorre circa 1962:

    http://en.wikipedia.org/wiki/META_II

     Schorre, D.V. "META II: A Syntax-Oriented Compiler Writing Language".
     Proc. 19'th Nat'l. Conf. of the ACM (Aug. 1964),D1.3-1-D1.3-11.

[I used Michael Green's implementation of Schorre's META II on the
DEC PDP-10 (part of Green's ALGOL-W compiler) in 1970 to write a tiny
BLISS compiler in a weekend.]

In 1991, Henry Baker did a very loose re-interpretation (or compilation)
of the META style into CL functions, macros, & readmacros to make META-
based parsers *very* concisely embeddable within a CL propram:

    http://home.pipeline.com/~hbaker1/Prag-Parse.html
    "Pragmatic Parsing in Common Lisp"

Jochen Schmidt typed in and reorganised Baker's code, and made it
available here:

    http://cclan.cvs.sourceforge.net/cclan/cclan/meta/

Another version by Fare Rideau is linked from here:

    http://www.cliki.net/meta

And, finally, yet a different version of Baker's code, assembled
by Henrik Motakef, is also available as a FreeBSD "port" under
the name "cl-meta".


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Volkan YAZICI
Subject: Re: A Proper Lexer
Date: 
Message-ID: <30da4d22-f269-4149-9d46-209d3256b6b3@y38g2000hsy.googlegroups.com>
On Sep 14, 5:51 am, ····@rpw3.org (Rob Warnock) wrote:
> In that regard, the OP might be interested in META-style parsers,
> originally developed by D. V. Schorre circa 1962:
>
>    http://en.wikipedia.org/wiki/META_II
>
>      Schorre, D.V. "META II: A Syntax-Oriented Compiler Writing Language".
>      Proc. 19'th Nat'l. Conf. of the ACM (Aug. 1964),D1.3-1-D1.3-11.
>
> [I used Michael Green's implementation of Schorre's META II on the
> DEC PDP-10 (part of Green's ALGOL-W compiler) in 1970 to write a tiny
> BLISS compiler in a weekend.]
>
> In 1991, Henry Baker did a very loose re-interpretation (or compilation)
> of the META style into CL functions, macros, & readmacros to make META-
> based parsers *very* concisely embeddable within a CL propram:
>
>    http://home.pipeline.com/~hbaker1/Prag-Parse.html
>     "Pragmatic Parsing in Common Lisp"
>
> Jochen Schmidt typed in and reorganised Baker's code, and made it
> available here:
>
>    http://cclan.cvs.sourceforge.net/cclan/cclan/meta/
>
> Another version by Fare Rideau is linked from here:
>
>    http://www.cliki.net/meta
>
> And, finally, yet a different version of Baker's code, assembled
> by Henrik Motakef, is also available as a FreeBSD "port" under
> the name "cl-meta".

And there is also meta-sexp[1], bundled with a proper documentation
and ease of use of s-expressions.


Regards.

[1] http://www.cliki.net/meta-sexp
From: Petter Gustad
Subject: Re: A Proper Lexer
Date: 
Message-ID: <8763oyogjh.fsf@pangea.home.gustad.com>
····@rpw3.org (Rob Warnock) writes:


> In 1991, Henry Baker did a very loose re-interpretation (or compilation)
> of the META style into CL functions, macros, & readmacros to make META-
> based parsers *very* concisely embeddable within a CL propram:
>
>     http://home.pipeline.com/~hbaker1/Prag-Parse.html
>     "Pragmatic Parsing in Common Lisp"

Meta is very elegant and simple. However, I find that it's a little
difficult to maintain parsers written using Meta. I like Zebu as a
parser generator, but unfortunately it's missing a decent lexer :-( If
there was a easy way to get Zebu's list-parser to accept tokens from
Parker's lexer it would have been a nice combination. Anybody done
this yet?

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
From: Chun Tian (binghe)
Subject: Re: A Proper Lexer
Date: 
Message-ID: <3f40774b-db06-438b-be16-05fb5d88829c@r15g2000prd.googlegroups.com>
I second this. In most case there's no need for regular expression
when doing lexer job.

Months ago I wrote an ASN.1->Lisp compiler [1], the lexer part is
whole implemented by CL readtable. ASN.1 token is quite complicated
but CL readtable is powerful enough.

--binghe

[1] http://sourceforge.net/project/showfiles.php?group_id=209101&package_id=285435

Paul Tarvydas wrote:
> akopa wrote:
>
> > Does any one have any advice on doing fast-ish character I/O for a Lex
> > like scanner.   I've read http://www.emmett.ca/~sabetts/slurp.html,
> > but I want to present single chars to the scanner.
> >
> ...
> >
> > I was a little surprised to find no equivalent of Lex available for
> > Common Lisp.  There is http://www.geocities.com/mparker762/clawk but
> > it is string oriented, not stream oriented.
> >
>
> There is little incentive to use Lex in lisp, because the lisp reader *is* a
> lexer (tokenizer), hence, lisp is the language least likely to have a
> Lex-like library.  I've built many small languages and just used what lisp
> gives me, e.g. if you can design it so that the little language has a space
> between each token, you don't have to write a lexer.  I suppose you could
> get fancier by employing read macros, et al.
>
> When I write languages that cannot follow the above simplification, I use my
> home-grown CL versions of extended S/SL (syntax/semantic language) and my
> experimental packrat parser (which describes the construction of tokens and
> parses - google for "Bryan Ford Packrat Parser").  I can share these with
> you, but they are not Lex-like (nor are they well-documented, but there is
> always email :-).
>
> By definition, lexing something a character at a time is, by definition, a
> state machine. If you design it with a state machine macro - code executed
> on entry, transition and exit - you will not tie yourself up in knots.  I
> have a state machine macro which I could share.
>
> Lexing is often a simple enough job that you can just roll it by hand using
> a few simple primitives (I think I can list them, if you're interested).
> Again, the state machine paradigm is the way to go.
>
> As for speed, what are you doing?  I've stopped worrying about speed in
> lexing and parsing - desktop machines are so fast these days that I seem to
> get away with fully-backtracking parsers and a packrat parser with no
> packratting and don't even notice.
>
> And if you REALLY want Lex, then I know from direct experience that you can
> code one up directly from the algorithms in the Dragon book (I did it about
> 30 years ago, but my version is lost, or on an 8.5" floppy :-).
>
> It ain't exactly what you asked for, but contact me if any of the above is
> of interest.
>
> pt
From: Matthew D Swank
Subject: Re: A Proper Lexer
Date: 
Message-ID: <1PZyk.4138$Dj1.158@newsfe01.iad>
On Sat, 13 Sep 2008 17:08:51 -0400, Paul Tarvydas wrote:

> akopa wrote:
> 
>> Does any one have any advice on doing fast-ish character I/O for a Lex
>> like scanner.   I've read http://www.emmett.ca/~sabetts/slurp.html, but
>> I want to present single chars to the scanner.
>> 
> ...
>> 
>> I was a little surprised to find no equivalent of Lex available for
>> Common Lisp.  There is http://www.geocities.com/mparker762/clawk but it
>> is string oriented, not stream oriented.
>> 
>> 

> When I write languages that cannot follow the above simplification, I
> use my home-grown CL versions of extended S/SL (syntax/semantic
> language) and my experimental packrat parser 

...

> If you design it with a state machine macro 

... 

> I have a state machine macro which I could share.
> 

Thank you, but the need is not so urgent.  Mostly I wanted to show 
someone that activities that traditionally require a preprocessor could 
be done in lisp with macros, and scanner and parser generators seemed 
like good examples. 

I am having so much fun with this paper: http://swtch.com/~rsc/regexp/
regexp1.html, that I'll probably throw something simple together using 
the SRE syntax from scsh. Most of what I usually want to lex is line 
oriented, and could be dealt with using cl-ppcre, but I got obsessed with 
the problem.

Tangents are more fun than real work anyway.

Matt