Can someone clarify behaviour of nregex.cl

From: Thaddeus L Olczyk
Subject: Can someone clarify behaviour of nregex.cl
Date: Fri, 19 Jul 2002 14:28:00 +0000
Message-ID: <5s2gjuc3ft9qf6543cf7u76ih1b5cjvqsc@4ax.com>

From my initial experiments with nregex.lisp, I concluded that
executing the function generated by regex-compile could have two
kinds of results:

If there was a match, the *regex-groupings* variable contains the
number of subgroups plus one (for the whole regex). The *regex-groups*
variable would be a ten-element array containing begin-end pairs
for the whole match (in the first slot) and for the subgroups (in the
remaining slots). Slots that are not associated with a group are
set to 0.

If there was no match, the *regex-groupings* variable contains the
number 1. The *regex-groups* variable would be a ten-element array
containing 0 in every slot except the first, which would contain
(length-of-string-searched nil).

I based some code on this.
When that code started crashing, I examined nregex more closely, with
these results:

-----------------------doc.lisp-----------------------------------------
(load "nregex.cl")

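;; Reset nregex's result variables, run the compiled matcher on
;; EXPRESSION, and collect the first *regex-groupings* slots of
;; *regex-groups*. The matcher's own return value is discarded.
(defun match (reg-str expression)
  (setf *regex-groupings* 0)
  (setf *regex-groups* (make-array 10))
  (funcall (compile nil (regex-compile reg-str)) expression)
  (loop for i from 0 below *regex-groupings*
        collect (aref *regex-groups* i)))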

(defun one-test (reg-str expr)
  (let ((x (match reg-str expr)))
    (terpri)
    (princ "###")
    (princ expr)
    (princ " % ")
    (princ reg-str)
    (princ "------->")
    (princ x)
    (terpri)
    (princ "...")
    (princ *regex-groupings*)
    (princ " ")
    (princ *regex-groups*)
    (princ " ")))

(defun test ()
  (one-test "dess" "abcdefghijklmnop")
  (one-test "da" "abcdefghijklmnop")
  (one-test "qd" "abcdefghijklmnop")
  (one-test "qrst" "abcdefghijklmnop"))

(setf ext:*gc-verbose* nil)
(test)
(quit)
-------------------------------------doc.lisp---------------------------------------------------
When run, doc.lisp produced the following output (sans
compiler messages to stderr):

-------------------------------output-----------------------------
; Loading #p"/home/cmucl/bug_doc/doc.lisp".
;; Loading #p"/home/cmucl/bug_doc/nregex.cl".

###abcdefghijklmnop % dess------->((16 NIL))
...1 #((16 NIL) 0 0 0 0 0 0 0 0 0) 
###abcdefghijklmnop % da------->((16 NIL))
...1 #((16 NIL) 0 0 0 0 0 0 0 0 0) 
###abcdefghijklmnop % qd------->(0)
...1 #(0 0 0 0 0 0 0 0 0 0) 
###abcdefghijklmnop % qrst------->(0)
...1 #(0 0 0 0 0 0 0 0 0 0) 
------------------------------output-----------------------------

So it seems that nregex leaves two different kinds of results
when there is no match:
((length-of-string-searched nil)) if the first letter of the regex
is present in the searched string.
(0) if the first letter of the regex is not present.

So it appears that a new "special case" has come to the fore: the
first letter of the regex not being present in the searched string
(actually, it's probably more accurate to say that the special
case is when the first letter is present in the searched string).

The problem is that this is probably the most used code in the
application. It is literally run into the ground. So I need to have
a fairly good idea of how the routine works. If one special case
pops up, then why can't there be five more? Each such special case
is going to detract from other work in the project, and if it
happens at the wrong time, it's going to be hard to find.

Unfortunately, nregex doesn't come with complete documentation, so
can anyone give a complete explanation of the "package"?

From: Raymond Toy
Subject: Re: Can someone clarify behaviour of nregex.cl
Date: Fri, 19 Jul 2002 15:20:20 +0000
Message-ID: <4n1y9zkamj.fsf@edgedsp4.rtp.ericsson.se>

>>>>> "Thaddeus" == Thaddeus L Olczyk <······@interaccess.com> writes:

    Thaddeus> From initially playing with nregex.lisp,
    Thaddeus> I came to the conclusion that executing the function generated
    Thaddeus> by regex-compile could have two kinds of results:

I'm not sure, but I thought the function returns non-nil on a match,
and you then look in *regex-groupings* and *regex-groups* to find the
result.  At least that's how Maxima uses nregex.
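
A minimal sketch of that usage, assuming (unverified) that the compiled
matcher returns non-nil on a match and nil otherwise.  COLLECT-GROUPS
is a hypothetical helper, not part of nregex, and the two defvars are
only stand-ins so that the sketch loads without nregex.cl:

```lisp
;; Stand-ins so this sketch loads on its own. nregex.cl defines the real
;; variables; DEFVAR leaves an already-bound variable untouched.
(defvar *regex-groupings* 0)
(defvar *regex-groups* (make-array 10 :initial-element 0))

;; COLLECT-GROUPS is a hypothetical helper, not part of nregex.
;; MATCHER is a compiled matcher function of one string argument.
(defun collect-groups (matcher string)
  "Return the recorded groups if MATCHER reports success, else NIL."
  ;; Only trust *regex-groups* when the matcher itself reports success.
  (when (funcall matcher string)
    (loop for i from 0 below *regex-groupings*
          collect (aref *regex-groups* i))))

;; Assumed usage with nregex.cl loaded:
;; (collect-groups (compile nil (regex-compile "def")) "abcdefghijklmnop")
```

Under that assumption, a nil result means "no match" regardless of
whatever is left in *regex-groups* from the failed attempt.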

You may also want to look at the nregex in Maxima.  That version
fixes a few bugs present in the one you can find in the CMU AI
Archives.

    Thaddeus> The problem is that this is probably the most used code in the
    Thaddeus> application. It is literally run into the ground. So I need to have
    Thaddeus> a fairly good idea of how the routine works. If one special case
    Thaddeus> pops up, then why can't there be five more? Each such special case
    Thaddeus> is going to detract from other work in the project, and if it
    Thaddeus> happens at the wrong time, it's going to be hard to find.

You seem to be using CMUCL.  In that case, there's another regex
package by Sudhoy (?) that runs MUCH faster on CMUCL.  It's better
documented too and supports a more complete regex syntax, though it
has a few known bugs.  If you can't find it, I'll try to dig up my copy.

Ray