From: Andreas Hinze
Subject: Is there a lexical analyser avail ?
Date: 
Message-ID: <3D2EBEA8.E9D1EDC9@smi.de>
Hi all,

I'm looking for a lexical analyser in CL. I found CLAWK, but it
doesn't work with CMU CL due to a problem with LOOP (I tried it for the
first time tonight and don't have a precise description yet, sorry).
Has anyone got CLAWK running with CMU CL, or is there another lexer
available?

Thanks in advance
Best
AHz

From: Jock Cooper
Subject: Re: Is there a lexical analyser avail ?
Date: 
Message-ID: <m3n0swg7bq.fsf@jcooper02.sagepub.com>
Andreas Hinze <···@smi.de> writes:

> Hi all,
> 
> I'm looking for a lexical analyser in CL. I found CLAWK, but it
> doesn't work with CMU CL due to a problem with LOOP (I tried it for the
> first time tonight and don't have a precise description yet, sorry).
> Has anyone got CLAWK running with CMU CL, or is there another lexer
> available?
> 
> Thanks in advance
> Best
> AHz

Using the CL reader, you can put together a basic lexer without too much effort,
and I am still a relative beginner.
Below is some code I wrote to do some lexing.  It isn't exactly general-purpose
and isn't documented, but it may be sufficient for a lot of uses.
I have used it (with additional code not shown) to parse SQL and variously
formatted config files.
You are welcome to use it for any purpose.  The ITERATE package is required.
Apologies if line wrapping munges it.

;---------cut here
; ITERATE package must be USEd before loading
; entry points are lexical-tokenizer and make-tokens
; author: Jock Cooper

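;; Group the string-valued tokens by first character.  Returns an alist
;; ((char0 (char1-or-nil . token-symbol) ...) ...).  Sorting first makes a
;; one-char token such as "<" land last in its group, after "<=", so it can
;; become the OTHERWISE clause in %BUILD-READ-MACROS.
;; REMOVE-IF-NOT (not MEMBER-IF) keeps every string-valued token, wherever
;; it appears in the list.
(defun get-specials (tokens)
  (iter (with specials = (sort (copy-list (remove-if-not #'(lambda (val) (stringp (cdr val)))
                                                         tokens))
                               #'string-lessp :key #'cdr))
        (with result = '())
        (for (sym . str) in specials)
        (for char0 = (char str 0))
        (for char1 = (and (= 2 (length str)) (char str 1)))
        (for pair = (assoc char0 result :test #'char=))
        (if (null pair) (push (setf pair (cons char0 nil)) result))
        (for new = (cons char1 sym))
        (push new (cdr pair))
        (finally (return result))))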
	
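;; Build a PROGN form (EVALed by GET-READER) that installs one terminating
;; reader macro per first character.  Each macro peeks at the next char:
;; if it completes a two-char token it is consumed, otherwise the one-char
;; token in the OTHERWISE clause is returned.  If a two-char token such as
;; "<=" is defined without a one-char "<" token, a lone "<" matches no
;; clause and reads as NIL, which ends tokenizing early.
(defun %build-read-macros (specials)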
  `(progn
    ,@(iter (for special in specials)
	(for func-name = (gensym))
	(collect `(flet ((,func-name (stream char) (declare (ignore char))
			  (let ((next-ch (peek-char nil stream nil :eof t)))
			    (case next-ch ,@(loop for foo in (cdr special)
						  for ch = (car foo)
						 collect `(,(if ch ch 'otherwise)
							   ,@(if ch `((read-char stream nil :eof t)) nil)
							   ',(cdr foo)))))))
		   (set-macro-character ,(car special) #',func-name))))))

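;; Return a closure that yields the next raw token read from PSTRING on
;; each call, or NIL at end of input (a word that reads as NIL is
;; indistinguishable from end of input).  Calling it with an argument
;; pushes that token back so the next call returns it.  The custom
;; readtable is built on the first call.
(defun get-reader (pstring tokens)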
  (let (string
	(pos 0) (len 0)
	pushback
	my-readtable
	token
	)
    (unless (null pstring)
      (setq string pstring pos 0)
      (setq len (length string)))
    (if (null string)
	(error "You must pass a string to be parsed"))
    #'(lambda (&optional unget)
	(cond (unget (push unget pushback))
	      (pushback (pop pushback))
	      (t (let ((*readtable* (or my-readtable
					(progn
					  (let ((*readtable* (copy-readtable))
						(forms (%build-read-macros (get-specials tokens))))
					    (eval forms)
					    (set-syntax-from-char #\# #\a)  ; make pound sign a normal char
					    (set-syntax-from-char #\' #\")    ; make ' a string delimiter like "
					    (setq my-readtable *readtable*))))))
		   (unless (>= pos len) 
		     (multiple-value-setq (token pos) (read-from-string string nil nil :start pos))
		     token)))))))

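;; (make-tokens (foo (LE "<="))) => ((%FOO . FOO) (%LE . "<="))
;; The %-symbols are interned in the current package at macroexpansion time.
(defmacro make-tokens (list)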
 `',(loop for tok in list
	  for sym = (etypecase tok (symbol tok) (cons (car tok)))
	  for val = (etypecase tok (symbol tok) (cons (cadr tok)))
	   collect (cons (intern (concatenate 'string "%" (string-upcase sym))) val)))

;; True for a dotted pair such as (a . b).  Not used elsewhere in this file.
(defun dotpairp (val) (and (consp val) (not (listp (cdr val)))))
(deftype dotpair () '(satisfies dotpairp))

;; Entry point: returns a list of tokens.  Defined tokens come back as the
;; %-prefixed symbols from MAKE-TOKENS; anything else comes back as
;; (%ident . x), (%string . x) or (%number . x).
(defun lexical-tokenizer (pstring tokens)
  (let* ((reader (get-reader pstring tokens))
         (tokenizer #'(lambda (list)
                        (iter (for tok in list)
                              (let ((dotpair (rassoc tok tokens :test #'equal)))
                                ;; decide what to return for each raw token
                                (collect (cond (dotpair (car dotpair))
                                               ((characterp tok) (cons '%ident (string tok)))
                                               ((stringp tok) (cons '%string tok))
                                               ((numberp tok) (cons '%number tok)) ; other types could be added here
                                               ((and (symbolp tok)
                                                     (plusp (length (symbol-name tok)))
                                                     (char= (char (symbol-name tok) 0) #\%))
                                                tok)
                                               (t (cons '%ident tok))))))))
         (raw (iter (for tok = (funcall reader))
                    (if (null tok) (finish))
                    (collect tok))))
    (funcall tokenizer raw)))

(defparameter *teststring* "Here is a test3 string with test1 and test2, and a number (500) and a string \"with some symbols\" like 2 * 5 <= 15 ")

; The token list below is a list of symbols, plus (NAME "chars") pairs for
; punctuation and operators of one or two characters.
; Any chunk of text that does not match a token is returned as
; (%ident . whatever), (%number . whatever) or (%string . whatever),
; depending on its type.
; You must pass the tokens through MAKE-TOKENS (a macro).
; Note also that Lisp special characters should be defined as tokens, or the
; reader will complain about apparent misuse (such as a comma outside a backquote).
; That could probably also be handled in GET-READER, where the # sign is
; redefined to be a normal character.
(defun example-use ()
  (lexical-tokenizer *teststring*
		     (make-tokens (test1 test2 test3 (OPAREN "(") (CPAREN ")") (TIMES "*") (LE "<=") (COMMA ",")))))
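For comparison, here is a minimal sketch of the same reader-based idea that needs neither ITERATE nor EVAL. It builds the readtable directly with closures, and it uses a unique EOF marker so that a word reading as NIL does not stop the scan. The names MAKE-LEXER-READTABLE and SIMPLE-LEX, and the use of keywords as token values, are assumptions for illustration, not part of the code above. Like the code above, it only handles special tokens of one or two characters.

```lisp
;;; Sketch (hypothetical names): lexing with the CL reader, no ITERATE/EVAL.
;;; SPECIALS is an alist of ("chars" . TOKEN); each string is 1 or 2
;;; characters long, e.g. (("<" . :lt) ("<=" . :le) ("," . :comma)).

(defun make-lexer-readtable (specials)
  "Return a copy of the standard readtable in which the first character
of each special string is a terminating macro character."
  (let ((rt (copy-readtable nil))   ; copy of the standard readtable
        (groups '()))               ; entries of the form (CHAR . SPECIALS)
    ;; Group the special strings by their first character.
    (dolist (entry specials)
      (let* ((c (char (car entry) 0))
             (group (assoc c groups)))
        (if group
            (push entry (cdr group))
            (push (list c entry) groups))))
    ;; Install one reader macro per first character.
    (dolist (group groups rt)
      (let ((entries (cdr group)))  ; fresh binding captured by the closure
        (set-macro-character
         (car group)
         (lambda (stream char)
           ;; Peek at the next character to see whether it completes a
           ;; two-character token; if so, consume it.
           (let* ((next (peek-char nil stream nil nil t))
                  (two (and next
                            (find-if (lambda (e)
                                       (and (= (length (car e)) 2)
                                            (char= (char (car e) 1) next)))
                                     entries)))
                  (one (find 1 entries :key (lambda (e) (length (car e))))))
             (cond (two (read-char stream t nil t) (cdr two))
                   (one (cdr one))
                   (t (error "No token matches ~C followed by ~S" char next)))))
         nil                        ; NIL = terminating macro character
         rt)))))

(defun simple-lex (string specials)
  "Return the list of tokens read from STRING."
  (let ((*readtable* (make-lexer-readtable specials))
        (*read-eval* nil)
        (eof (list :eof))           ; unique marker: NIL is a legal token
        (pos 0)
        (tokens '()))
    (loop
      (multiple-value-bind (tok next)
          (read-from-string string nil eof :start pos)
        (when (eq tok eof)
          (return (nreverse tokens)))
        (push tok tokens)
        (setf pos next)))))
```

For example, lexing "a <= b < c, (42)" with the specials (("<" . :lt) ("<=" . :le) ("," . :comma) ("(" . :oparen) (")" . :cparen)) should give A, :LE, B, :LT, C, :COMMA, :OPAREN, 42, :CPAREN, with the plain words interned in the current package. One design difference: a character that starts only two-character tokens signals an error here when no match follows, instead of silently reading as NIL.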
From: Helmut Eller
Subject: Re: Is there a lexical analyser avail ?
Date: 
Message-ID: <m2u1n4kcld.fsf@xaital.online-marketwatch.com>
Andreas Hinze <···@smi.de> writes:

> I'm looking for a lexical analyser in CL. I found CLAWK, but it
> doesn't work with CMU CL due to a problem with LOOP (I tried it for the
> first time tonight and don't have a precise description yet, sorry).
> Has anyone got CLAWK running with CMU CL, or is there another lexer
> available?

I had the same problem.  IIRC these loops contained something like
"finally return ...".  I changed that to "finally (return ...)".  At
least the test cases passed then.

hth,
helmut.
From: Marco Antoniotti
Subject: Re: Is there a lexical analyser avail ?
Date: 
Message-ID: <y6cfzyoivze.fsf@octagon.mrl.nyu.edu>
Helmut Eller <········@stud3.tuwien.ac.at> writes:

> Andreas Hinze <···@smi.de> writes:
> 
> > I'm looking for a lexical analyser in CL. I found CLAWK, but it
> > doesn't work with CMU CL due to a problem with LOOP (I tried it for the
> > first time tonight and don't have a precise description yet, sorry).
> > Has anyone got CLAWK running with CMU CL, or is there another lexer
> > available?
> 
> I had the same problem.  IIRC these loops contained something like
> "finally return ...".  I changed that to "finally (return ...)".  At
> least the test cases passed then.

You did the right thing and fixed CLAWK to be conformant.

        finally return ...

is not ANSI conformant, as per the definition of LOOP: 'finally' and
'initially' must be followed by one or more <compound form>s.
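To make the difference concrete, here is a small sketch. The function SUM-LIST is a hypothetical example, not code from CLAWK. The commented-out form shows the bare "finally return" style that CLAWK's loops used; the live definition uses the conforming form, where FINALLY is followed by a compound form such as (return total).

```lisp
;; Non-conforming: FINALLY followed by the bare symbol RETURN
;; instead of a compound form (the style CLAWK's loops used).
;; (loop with total = 0
;;       for x in list
;;       do (incf total x)
;;       finally return total)

;; Conforming: FINALLY takes one or more compound forms.
(defun sum-list (list)
  "Hypothetical example: add up the numbers in LIST."
  (loop with total = 0
        for x in list
        do (incf total x)
        finally (return total)))
```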

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Michael Parker
Subject: Re: Is there a lexical analyser avail ?
Date: 
Message-ID: <3D315497.8040605@earthlink.net>
Helmut Eller wrote:
> Andreas Hinze <···@smi.de> writes:
> 
> 
>>I'm looking for a lexical analyser in CL. I found CLAWK, but it
>>doesn't work with CMU CL due to a problem with LOOP (I tried it for the
>>first time tonight and don't have a precise description yet, sorry).
>>Has anyone got CLAWK running with CMU CL, or is there another lexer
>>available?
> 
> 
> I had the same problem.  IIRC these loops contained something like
> "finally return ...".  I changed that to "finally (return ...)".  At
> least the test cases passed then.

This is correct.  One of the ACL people mentioned this to me a few days 
ago -- I just haven't gotten around to uploading the new version. 
Thanks for reminding me.  Later today, for sure.
From: Andreas Hinze
Subject: Re: Is there a lexical analyser avail ?
Date: 
Message-ID: <3D32A098.264B1256@smi.de>
Michael Parker wrote:
> 
> Helmut Eller wrote:
> > Andreas Hinze <···@smi.de> writes:
> >
> >
> >>I'm looking for a lexical analyser in CL. I found CLAWK, but it
> >>doesn't work with CMU CL due to a problem with LOOP (I tried it for the
> >>first time tonight and don't have a precise description yet, sorry).
> >>Has anyone got CLAWK running with CMU CL, or is there another lexer
> >>available?
> >
> >
> > I had the same problem.  IIRC these loops contained something like
> > "finally return ...".  I changed that to "finally (return ...)".  At
> > least the test cases passed then.
> 
> This is correct.  One of the ACL people mentioned this to me a few days
> ago -- I just haven't gotten around to uploading the new version.
> Thanks for reminding me.  Later today, for sure.

Hi all.
I will pick up the new copy today.
Thanks to all for the help.

Sincerely
AHz