From: ab talebi
Subject: list formats
Date: 
Message-ID: <3c5131e5.168769835@news.uio.no>
I think the way I posed the question was too complicated in my earlier
postings (under the name "loop problem"), so I will try to simplify it:

we have this list:

(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
hui))"))


and want to convert it to this format:

(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
  "is" ((verb hji ska) (noun ako hui))
  "blue" ((adj hji hui)))
 ("strong" ((adj hji hui) (adv ly hui))
  "is" ((verb hji ska) (verb kos hji))
  "man" ((noun ako hui) (prop hui))))

I would appreciate any guidelines.
(For more details, if needed, please look at the postings under "loop
problem".)

tnx

ab talebi

From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6czo32l7jr.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> I think the way I posed the question was too complicated in latter
> postings (under the name loop problem) so I try to samplify it:

I have been following your postings for a while.  I believe a little
Socratic method would help here.

> 
> we have this list:
> 
> (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> hui))"))

Before we go on, please answer the following question.

What are the elements of this list?

Ciao

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <pFQ48.2655$tn2.58767@news4.ulv.nextra.no>
"Marco Antoniotti" <·······@cs.nyu.edu> wrote in message
····················@octagon.mrl.nyu.edu...
>
> ············@yahoo.com (ab talebi) writes:
>
> > I think the way I posed the question was too complicated in latter
> > postings (under the name loop problem) so I try to samplify it:
>
> I have been following your postings for a while.  I believe a little
> Socratic method would help here.
>
> >
> > we have this list:
> >
> > (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> > ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> > ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> > hui))"))
>
> Before we go on, please answer the following question.
>
> What are the elements of this list?
>
> Ciao


Thank you for your answer.

To answer your question first:

This list has two strings. The elements of the first string are: car, is and
blue

The elements of the second string are: strong, is and man

----------------------------

Please take a minute to read this while I explain the problem's origin.

This is an extract of my original corpus file (c:\corpus-all.txt):



LEXEME         vehicle

CLASSIFICATION    M1x

ARTNR           27405

DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))

=

LEXEME         people

ARTNR           27406

DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos
hji)) man ((noun ako hui) (prop hui))

DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska) (verb
kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko hui) (adv ly
hui))

=



The entries are each followed by an = sign, and I put them in a variable:

(setf corpus-all "c:\\corpus-all.txt")  ; a literal backslash must be doubled in a Lisp string



The very first thing I have to do is to make this file more lisp-friendly.
This is a task for "read-database". Let me first give you the code:



(defun read-entry (stream)
  (loop for line = (read-line stream nil nil)
        when (null line)
          return stream
        until (string-equal line "=")
        collect (multiple-value-bind (key position)
                    (let ((*read-eval* nil))
                      (read-from-string line))
                  ;; Skip multiple tabs, without running off the end of LINE.
                  (loop while (and (< position (length line))
                                   (eql #\Tab (char line position)))
                        do (incf position))
                  (list key (subseq line position)))))

(defun read-database (pathname)
  (with-open-file (stream pathname :direction :input)
    (loop for address = (read-entry stream)
          until (eq address stream)
          collect address)))



The problem I can not solve with this function is that it converts the
values into strings so we end up having:



(LEXEME "vehicle")

(CLASSIFICATION "M1x")

(ARTNR "27405")

(DEF "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))")



Whereas I would like to have:



((LEXEME (vehicle))

(CLASSIFICATION (M1x))

(ARTNR (27405))

(DEF (car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui)))))



I think this is the problem that I carry throughout the whole program.

Do you know what I can do to resolve this problem?



Then we go on .. The most important part of the corpus is the DEF-part so I
write a function that extracts the DEF part:



(defun find-category-values (category record)
  (loop for (current-category value) in record
        when (eq current-category category) collect value))

(defun extract (category value list)
  (if (null list)
      '()
      (let ((record (first list)))
        (if (equal (cadr (assoc 'lexeme record)) value)
            (cons (find-category-values category record)
                  (extract category value (rest list)))
            (extract category value (rest list))))))

(defun get-def-all (list)
  (cond ((null list) nil)
        (t (cons (find-category-values 'def (car list))
                 (get-def-all (rest list))))))

(defun get-def (x)
  (extract 'def x corpus-all))



(get-def "vehicle")

(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska)
(noun ako hui)) blue ((adj hji hui))"))



note that if we fix the "read-database" and "read-entry" we should be able
to call the get-def function like this:



(get-def 'vehicle)



If we can make it that far, then I think a lot of problems are already solved.
Once again, thanks for helping.





tnx



ab talebi
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6cg04ht0cf.fsf@octagon.mrl.nyu.edu>
Hi,

I am back ready to answer.

"ab talebi" <············@yahoo.com> writes:

> "Marco Antoniotti" <·······@cs.nyu.edu> wrote in message
> ····················@octagon.mrl.nyu.edu...
> >
> > ············@yahoo.com (ab talebi) writes:
> >
> > > I think the way I posed the question was too complicated in latter
> > > postings (under the name loop problem) so I try to samplify it:
> >
> > I have been following your postings for a while.  I believe a little
> > Socratic method would help here.
> >
> > >
> > > we have this list:
> > >
> > > (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> > > ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> > > ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> > > hui))"))
> >
> > Before we go on, please answer the following question.
> >
> > What are the elements of this list?
> >
> > Ciao
> 
> 
> Thank you for your answer.
> 
> To answer your question first:
> 
> This list has to strings. The elements of the first string are: car, is and
> blue
> 
> The elements of the second string are: strong, is and man

Wrong.  If you do not get this right, you are missing the whole point.

The list has one sublist (let's call it L1) that is

("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))"
"strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop hui))")

L1 contains two strings
S1:
"car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))"

and
S2:
"strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop hui))"

Now S1 and S2 are just strings.  You need to parse them. Both S1 and
S2 have the same structure: a `token' followed by a list (of
attributes).

Let's work bottom up.  Let's take one such string and `parse' it.  Now
let's decide what are the basic data structures we want to
use.

Suppose you want to group the `token' and the attribute list. You may
have something like

(defstruct token
   name
   attributes)

Now let's parse the definition strings.  We will return a list of
`tokens'. (Assumption: no token will appear in the string as `nil' or
`()').

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
	  for attrs = (read s nil nil)
	  while name
	  collect (make-token :name name :attributes attrs))))

As an example on S1

cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
(#S(TOKEN
      :NAME CAR
      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
 #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
 #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))


which is what you want.
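
If you would rather have the flat list format from your first post
(strings followed by attribute lists) than structures, a small
conversion over the parsed tokens should be enough.  The following is
only a sketch: `tokens->flat-list' is a name made up for this
illustration, and it assumes the default reader case, so the token
names come back upcased and are downcased again to get the strings.

```lisp
(defstruct token
  name
  attributes)

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (let ((*read-eval* nil))            ; do not evaluate #. forms from the file
      (loop for name = (read s nil nil)
            for attrs = (read s nil nil)
            while name
            collect (make-token :name name :attributes attrs)))))

;; Hypothetical helper: turn a list of TOKENs into the flat format
;; ("car" ((noun ...)) "is" ((verb ...)) ...).
(defun tokens->flat-list (tokens)
  (loop for tok in tokens
        collect (string-downcase (symbol-name (token-name tok)))
        collect (token-attributes tok)))
```

Calling `(tokens->flat-list (parse-def-string s1))' on one definition
string gives one sublist of the format you asked for; mapping it over
all the strings gives the whole list.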

Are you with me so far?

Ciao

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <3c5fa15f.1997661@news.uio.no>
On 04 Feb 2002 16:15:44 -0500, Marco Antoniotti <·······@cs.nyu.edu>
wrote:

<snip snip>


>
>(defun parse-def-string (the-definition)
>  (declare (type string the-definition))
>  (with-input-from-string (s the-definition)
>    (loop for name = (read s nil nil)
>	  for attrs = (read s nil nil)
>	  while name
>	  collect (make-token :name name :attributes attrs))))
>
>As an example on S1
>
>cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
>(#S(TOKEN
>      :NAME CAR
>      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
> #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
> #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))
>
>
>which is what you want.
>
>Are you with me so far?
>
>Ciao
>

I am following the course like a hungry dog following a bone!! please
go on ...


tnx 

ab talebi
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6c3d0gase2.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> On 04 Feb 2002 16:15:44 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> wrote:
> 
> <snip snip>
> 
> 
> >
> >(defun parse-def-string (the-definition)
> >  (declare (type string the-definition))
> >  (with-input-from-string (s the-definition)
> >    (loop for name = (read s nil nil)
> >	  for attrs = (read s nil nil)
> >	  while name
> >	  collect (make-token :name name :attributes attrs))))
> >
> >As an example on S1
> >
> >cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
> >(#S(TOKEN
> >      :NAME CAR
> >      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
> > #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
> > #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))
> >
> >
> >which is what you want.
> >
> >Are you with me so far?
> >
> >Ciao
> >
> 
> I am following the course like a hungry dog following a bone!! please
> go on ...

Ok.

Now where were we?  We know how to parse the strings which have the
format you describe.  The problem is that these strings are the result
of reading a file whose format does not seem to be all that precise.
I will extrapolate from your example.

You have a number of `definitions' in your file which seem to start
with the word `LEXEME' and end with a `=' (assuming a line oriented
format as you showed).  You have a number of options here.

1 - read the lines and then parse them,
2 - read and parse the lines as you go.

Let's choose 1 for clarity purposes.

Assume we have an open file (i.e. a `stream') where you have a bunch
of `LEXEME' definitions.  Let's write a function which will return a
list of strings (note: a LIST of STRINGS) comprising one such `LEXEME':

(defun read-lexeme-lines (instream)
  (declare (type stream instream))
  (loop for line = (read-line instream nil)
	while (and line
		   (string/= line "="
			     :start1 0 :end1 (min 1 (length line))
			     :start2 0 :end2 1))
	  when (string/= "" line)
	    collect line into lexeme-lines
	finally (return lexeme-lines)))
	

The `(min 1 (length line))' is a trick to keep track of empty lines,
even if they are discarded afterward.  The :start2 and :end2
parameters are not really needed, but that's ok.

Now, the above function reads *one* lexeme from an input stream. An
example of how this works is the following

==============================================================================
cl-prompt> (with-input-from-string (s "
LEXEME         vehicle

CLASSIFICATION    M1x

ARTNR           27405

DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))

=

LEXEME         people

ARTNR           27406

DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos
hji)) man ((noun ako hui) (prop hui))

DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska) (verb
kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko hui) (adv ly
hui))

=")
  (read-lexeme-lines s))
==============================================================================

Which yields a LIST of STRINGS.

("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
 "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
 "ska) (noun ako hui)) blue ((adj hji hui))")

This one list in particular contains 5 strings: the DEF entry is wrapped
onto a second line in the example input, so it arrives as two strings.
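
If DEF entries can really be wrapped like this in the file, the pieces
have to be glued back together before parsing, because half a DEF line
has unbalanced parentheses.  A minimal sketch follows, assuming that
every field line starts with one of the four keys seen so far and that
anything else is a continuation of the previous line.  The names
`*field-keys*', `field-line-p' and `join-continuation-lines' are
invented for this example.

```lisp
;; Assumption: these are the only field keys in the corpus.
(defparameter *field-keys* '("LEXEME" "CLASSIFICATION" "ARTNR" "DEF"))

(defun field-line-p (line)
  "True if LINE starts with one of the strings in *FIELD-KEYS*."
  (some (lambda (key)
          (and (>= (length line) (length key))
               (string= key line :end2 (length key))))
        *field-keys*))

(defun join-continuation-lines (lines)
  "Append every line that is not a field line to the line before it."
  (let ((result '()))
    (dolist (line lines (nreverse result))
      (if (or (field-line-p line) (null result))
          (push line result)
          (setf (first result)
                (concatenate 'string (first result) " " line))))))
```

Note that this simple prefix test would also treat a continuation line
that happens to begin with, say, "DEF" as a new field, so it is only
as good as the assumption about the file format.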

Shall I go on? :)

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <3c60f1ff.2826256@news.uio.no>
On 05 Feb 2002 09:57:09 -0500, Marco Antoniotti <·······@cs.nyu.edu>
wrote:

<snip snip>

>(defun read-lexeme-lines (instream)
>  (declare (type stream instream))
>  (loop for line = (read-line instream nil)
>	while (and line
>		   (string/= line "="
>			     :start1 0 :end1 (min 1 (length line))
>			     :start2 0 :end2 1))
>	  when (string/= "" line)
>	    collect line into lexeme-lines
>	finally (return lexeme-lines)))
>	
>
<snip>

>Which yields a LIST of STRINGS.
>
>("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
> "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
> "ska) (noun ako hui)) blue ((adj hji hui))")
>
>This one list in particular contains 4 substrings.
>
>Shall I go on? :)
>


yes, please do, but what I don't understand is why we have to have
strings in the first place ? I mean, given a corpus with this format:

LEXEME         vehicle
CLASSIFICATION    M1x
ARTNR           27405
DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
((verb hji ska) (noun ako hui)) blue ((adj hji hui))
=

(Your assumption is correct: we have a number of `definitions' in the
corpus that start with the word `LEXEME' and end with a `=', and we do
have a line-oriented format. The attributes LEXEME, CLASSIFICATION,
ARTNR and DEF are separated from their values by tabs.)


I'm not sure which format is the best one to parse the corpus
into. What we finally need is a bunch of functions that allow us to
extract these values:

vehicle, M1x, 27405, car (and the content of each parenthesis).
We need to know for 'is', for example, that it is both 'verb' and 'noun'.

Do you know what I mean?

tnx

ab talebi
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6cpu3i75jf.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> On 05 Feb 2002 09:57:09 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> wrote:
> 
> <snip snip>
> 
> >(defun read-lexeme-lines (instream)
> >  (declare (type stream instream))
> >  (loop for line = (read-line instream nil)
> >	while (and line
> >		   (string/= line "="
> >			     :start1 0 :end1 (min 1 (length line))
> >			     :start2 0 :end2 1))
> >	  when (string/= "" line)
> >	    collect line into lexeme-lines
> >	finally (return lexeme-lines)))
> >	
> >
> <snip>
> 
> >Which yields a LIST of STRINGS.
> >
> >("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
> > "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
> > "ska) (noun ako hui)) blue ((adj hji hui))")
> >
> >This one list in particular contains 4 substrings.
> >
> >Shall I go on? :)
> >
> 
> 
> yes, please do, but what I don't understand is why we have to have
> strings in the first place ? I mean, given a corpus with this format:
> 
> LEXEME         vehicle
> CLASSIFICATION    M1x
> ARTNR           27405
> DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
> ((verb hji ska) (noun ako hui)) blue ((adj hji hui))
> =
> 
> (your assumption is correct we have a number of `definitions' in the
> corpus that start with the word `LEXEME' and end with a `='  we do
> have a line oriented format. The attributes, LEXEME CLASSIFICATION
> ARTNR and DEF are seperated from their values by tab)

I am going one step at a time.  The function I wrote nicely collects
the strings that comprise one definition.  Now, and only now, we
proceed to actually "parse" them.

> I'm not sure about what format i the best one to parse the corpus
> into, what we finaly need is a bunch of functions that allow us to
> extract these values:
> 
> vehicle, M1x 27405 car (and the containt of each parantes,)
> we need to know for 'is' f.ex. that it is both 'verb' and 'noun' 
> 
> do you know what I mean?

Yes.  But remember my first post.  I am trying to use a Socratic
method to make you understand how to carry on this task.
Incidentally, the corpus file format sucks, but that is another
problem.
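
To make the goal concrete: once a DEF string has been parsed into
tokens, the kind of lookup you describe could be as small as the
sketch below.  It assumes the `token' structure from my earlier post
and that the category is always the first element of each attribute
list; `token-categories' is just a name I picked for the example.

```lisp
(defstruct token
  name
  attributes)

;; Hypothetical lookup: which categories does the token NAME have?
(defun token-categories (name tokens)
  "Return the first element of each attribute list of the token called NAME."
  (let ((tok (find name tokens :key #'token-name)))
    (when tok
      (mapcar #'first (token-attributes tok)))))
```

For the token IS with attributes ((VERB HJI SKA) (NOUN AKO HUI)) this
collects VERB and NOUN, and it returns NIL when no token of that name
is in the list.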

Let's proceed to "parse" the list of strings into a more useful
object.  Remember.  We have a LIST of STRINGs which comprise ONE
definition.

First of all, let's define what a `corpus-definition' is

(defstruct corpus-definition
  lexeme
  classification
  artnr
  def)

Now we can write a function which will take the LIST of STRINGS which
we know is a definition from the file and which returns one such
`corpus-definition'.

(defun build-corpus-definition (definition-string-list)
  (let ((c-def (make-corpus-definition))) ; An empty definition.
    (dolist (ds definition-string-list c-def)
      (multiple-value-bind (op pos)
	  (read-from-string ds nil)       ; The semantics of this call
                                          ; is important.
	(cond ((string-equal (symbol-name op) "LEXEME")
	       (setf (corpus-definition-lexeme c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "CLASSIFICATION")
	       (setf (corpus-definition-classification c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "ARTNR")
	       (setf (corpus-definition-artnr c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "DEF")
	       ;; Note the next one!!!
	       (push
		(parse-def-string (subseq ds pos))
		(corpus-definition-defs c-def))))
	))))

Try this and let me know what you get.

Let's summarize.

Now you know:

1 - how to read one definition from the file in a list of strings.
2 - how to parse the DEF field into a list of `token's
3 - how to build a `corpus-definition' from a list of strings
    comprising a definition in the file.
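
Step 3 leans on the fact that READ-FROM-STRING returns two values: the
object it read and the position in the string where it stopped.  The
second value is what lets us continue reading the rest of the line.
Here is that idea in isolation, as a sketch; `split-field-line' is a
made-up helper and is not part of the code above.

```lisp
;; Hypothetical helper: read the key and the first object after it
;; from one field line such as "ARTNR           27405".
(defun split-field-line (line)
  (let ((*read-eval* nil))              ; never evaluate #. forms from data
    (multiple-value-bind (key pos)
        (read-from-string line nil)     ; POS is where reading stopped
      (values key
              (read-from-string line nil nil :start pos)))))
```

With the default reader settings the key comes back as a symbol and a
value like 27405 comes back as an integer, not as a string.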

What do you need to do next? :)

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <3c62931b.22744231@news.uio.no>
On 06 Feb 2002 14:51:16 -0500, Marco Antoniotti <·······@cs.nyu.edu>
wrote:

<snip>

>(defstruct corpus-definition
>  lexeme
>  classification
>  artnr
>  def)
>
>Now we can write a function which will take the LIST of STRINGS which
>we know is a definition from the file and which returns one such
>`corpus-definition'.
>
>(defun build-corpus-definition (definition-string-list)
>  (let ((c-def (make-corpus-definition))) ; An empty definition.
>    (dolist (ds definition-string-list c-def)
>      (multiple-value-bind (op pos)
>	  (read-from-string ds nil)       ; The semantics of this call
>                                          ; is important.
>	(cond ((string-equal (symbol-name op) "LEXEME")
>	       (setf (corpus-definition-lexeme c-def)
>		     (read-from-string ds nil nil :start pos)))
>	      ((string-equal (symbol-name op) "CLASSIFICATION")
>	       (setf (corpus-definition-classification c-def)
>		     (read-from-string ds nil nil :start pos)))
>	      ((string-equal (symbol-name op) "ARTNR")
>	       (setf (corpus-definition-artnr c-def)
>		     (read-from-string ds nil nil :start pos)))
>	      ((string-equal (symbol-name op) "DEF")
>	       ;; Note the next one!!!
>	       (push
>		(parse-def-string (subseq ds pos))
>		(corpus-definition-defs c-def))))
>	))))
>
>Try this and let me know what you get.
>

I think the lesson is going very well, and I actually understand
something :) But to ask this question first: by "read-lexeme-lines" you
actually mean "read-corpus", right?

here is my summary of what I understand so far:
#|
we are working bottom up.
we group the `token' and the attribute list.
|#

(defstruct token
   name
   attributes)

#|
we take one string from the DEF's attrib and parse it.
We will return a list of `tokens'. (Assumption: no token will appear
in the string as `nil').
#|

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
	  for attrs = (read s nil nil)
	  while name
	  collect (make-token :name name :attributes attrs))))

#| 
input:
(parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks
hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")

output:
(#S(TOKEN
      :NAME CAR
      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
 #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
 #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))

;; ==============================

#|
We write a function which will return *a list of strings* comprising
one `LEXEME' (one entry).
The function gets the whole corpus as argument but returns only *one*
lexeme from the corpus.
|#

(defun read-lexeme-lines (instream)
  (declare (type stream instream))
  (loop for line = (read-line instream nil)
	while (and line
		   (string/= line "="
			     :start1 0 :end1 (min 1 (length line))
			     :start2 0 :end2 1))
	  when (string/= "" line)
	    collect line into lexeme-lines
	finally (return lexeme-lines)))

#| 
input:
(with-input-from-string (s "
LEXEME         vehicle
CLASSIFICATION    M1x
ARTNR           27405
DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
((verb hji ska) (noun ako hui)) blue ((adj hji hui))
=
LEXEME         people
ARTNR           27406
DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb
kos hji)) man ((noun ako hui) (prop hui))
DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska)
(verb kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko
hui) (adv ly hui))
=")
  (read-lexeme-lines s))

output:

("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
|#

;; ===============================

#|
We have a LIST of STRINGs which comprise ONE definition.
We will now "parse" this into a more useful object.
We define what a `corpus-definition' is:
|#

(defstruct corpus-definition
  lexeme
  classification
  artnr
  def)

#|
Now we can write a function which will take the LIST of STRINGS which
we know is a definition from the file and which returns one such
`corpus-definition'.
|#

(defun build-corpus-definition (definition-string-list)
  (let ((c-def (make-corpus-definition))) ; An empty definition.
    (dolist (ds definition-string-list c-def)
      (multiple-value-bind (op pos)
	  (read-from-string ds nil)       ; The semantics of this call
                                          ; is important.
	(cond ((string-equal (symbol-name op) "LEXEME")
	       (setf (corpus-definition-lexeme c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "CLASSIFICATION")
	       (setf (corpus-definition-classification c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "ARTNR")
	       (setf (corpus-definition-artnr c-def)
		     (read-from-string ds nil nil :start pos)))
	      ((string-equal (symbol-name op) "DEF")
	       ;; Note the next one!!!
	       (push
		(parse-def-string (subseq ds pos))
		(corpus-definition-defs c-def))))
	))))


but I get an error message:
Error: Undefined function BUILD-CORPUS-DEFINITION called with
arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6c7kppux8l.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> On 06 Feb 2002 14:51:16 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> wrote:
> 
> <snip>
> 
> >(defstruct corpus-definition
> >  lexeme
> >  classification
> >  artnr
> >  def)
> >
> >Now we can write a function which will take the LIST of STRINGS which
> >we know is a definition from the file and which returns one such
> >`corpus-definition'.
> >
> >(defun build-corpus-definition (definition-string-list)
> >  (let ((c-def (make-corpus-definition))) ; An empty definition.
> >    (dolist (ds definition-string-list c-def)
> >      (multiple-value-bind (op pos)
> >	  (read-from-string ds nil)       ; The semantics of this call
> >                                          ; is important.
> >	(cond ((string-equal (symbol-name op) "LEXEME")
> >	       (setf (corpus-definition-lexeme c-def)
> >		     (read-from-string ds nil nil :start pos)))
> >	      ((string-equal (symbol-name op) "CLASSIFICATION")
> >	       (setf (corpus-definition-classification c-def)
> >		     (read-from-string ds nil nil :start pos)))
> >	      ((string-equal (symbol-name op) "ARTNR")
> >	       (setf (corpus-definition-artnr c-def)
> >		     (read-from-string ds nil nil :start pos)))
> >	      ((string-equal (symbol-name op) "DEF")
> >	       ;; Note the next one!!!
> >	       (push
> >		(parse-def-string (subseq ds pos))
> >		(corpus-definition-defs c-def))))
> >	))))
> >
> >Try this and let me know what you get.
> >
> 
> I think the lesson is going very well, and I actually understand some
> thing :) but to ask this question first: by "read-lexeme-lines" you
> actually mean "read-corpus"  right?

No.  I mean what I write.  I am not reading the corpus.  I am reading
the lines comprising a single definition.

The question is now:  how do you read an entire corpus (i.e. a file)?
That is: write a function that does that.

	(defun read-corpus (corpus-file-name) ...)

The function will return a list of `corpus-definition's (which are
defined below).

> (defun build-corpus-definition (definition-string-list)
	...
> 	))))
> 
> 
> but I get an error message:
> Error: Undefined function BUILD-CORPUS-DEFINITION called with
> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).
> 

... and the error reads as ... ?  :)  What does the error message mean?

-- 
Marco Socrate Antoniotti=====================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <3c638be7.3430793@news.uio.no>
On 07 Feb 2002 10:28:58 -0500, Marco Antoniotti <·······@cs.nyu.edu>
wrote:


>> >
>> >Try this and let me know what you get.
>> >
>> 
>> I think the lesson is going very well, and I actually understand some
>> thing :) but to ask this question first: by "read-lexeme-lines" you
>> actually mean "read-corpus"  right?
>
>No.  I mean what I write.  I am not reading the corpus.  I am reading
>the lines comprising a single definition.
>
>The question is now:  how do you read an entire corpus (i.e. a file)?
>That is: write a function that does that.
>
>	(defun read-corpus (corpus-file-name) ...)

something like:

(setf corpus (read-corpus "c:\corpus.txt"))

(defun read-corpus (pathname)
  (with-open-file (stream pathname :direction :input)
    (loop
	for corpus-definiton = (build-corpus-definition stream)
	until (eq corpus-definiton stream)
	collect corpus-definiton)))

>
>The function will return a list of `corpus-definition's (which are
>defined below).
>
>> (defun build-corpus-definition (definition-string-list)
>	...
>> 	))))
>> 
>> 
>> but I get an error message:
>> Error: Undefined function BUILD-CORPUS-DEFINITION called with
>> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
>> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
>> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).
>> 
>
>... and the error reads as ... ?  :)  What does the error message mean?
>
>-- 
Oh, it was just a missing parenthesis error, which I fixed. Now when I
compile the file it says:
The following functions are undefined:
CORPUS-DEFINITION-DEFS which is referenced by BUILD-CORPUS-DEFINITION
(SETF CORPUS-DEFINITION-DEFS) which is referenced by
BUILD-CORPUS-DEFINITION

and this is the output I get:

CL-USER 1 > (build-corpus-definition '("LEXEME         vehicle"
"CLASSIFICATION    M1x" "ARTNR           27405" "DEF     car ((noun
automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun
ako hui)) blue ((adj hji hui))"))

Error: Undefined function CORPUS-DEFINITION-DEFS called with arguments
(#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
DEF NIL)).
  1 (continue) Try invoking CORPUS-DEFINITION-DEFS again.
  2 Return some values from the call to CORPUS-DEFINITION-DEFS.
  3 Try invoking something other than CORPUS-DEFINITION-DEFS with the
same arguments.
  4 Set the symbol-function of CORPUS-DEFINITION-DEFS to another
function.
  5 (abort) Return to level 0.
  6 Return to top loop level 0.

Type :b for backtrace, :c <option number> to proceed,  or :? for other
options

CL-USER 2 : 1 > 

but if I go without the DEF part it works just fine except I get NIL
as DEF of course

CL-USER 2 : 1 > (build-corpus-definition '("LEXEME         vehicle"
"CLASSIFICATION    M1x" "ARTNR           27405"))

#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405 DEF
NIL)
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6cofj0owkr.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> On 07 Feb 2002 10:28:58 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> wrote:
> 
> 
> >> >
> >> >Try this and let me know what you get.
> >> >
> >> 
> >> I think the lesson is going very well, and I actually understand some
> >> thing :) but to ask this question first: by "read-lexeme-lines" you
> >> actually mean "read-corpus"  right?
> >
> >No.  I mean what I write.  I am not reading the corpus.  I am reading
> >the lines comprising a single definition.
> >
> >The question is now:  how do you read an entire corpus (i.e. a file)?
> >That is: write a function that does that.
> >
> >	(defun read-corpus (corpus-file-name) ...)
> 
> something like:
> 
> (setf corpus (read-corpus "c:\corpus.txt"))
> 
> (defun read-corpus (pathname)
>   (with-open-file (stream pathname :direction :input)
>     (loop
> 	for corpus-definiton = (build-corpus-definition stream)
> 	until (eq corpus-definiton stream)
> 	collect corpus-definiton)))

Almost.

First of all, it is better to use one of the "definition forms" of CL.

	(defvar *the-corpus*)  ; Note the `*' convention.

	(setf *the-corpus* (read-corpus "/where/the/corpus/is/corpus.txt"))
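
A small sketch of what the definition form buys you, and of one trap in the
pathname from the quoted call.  The NIL initial value, the docstring and the
sample values are assumptions added for illustration only.

```lisp
;; DEFVAR proclaims the variable special; the *earmuffs* are only a naming
;; convention that makes such variables easy to spot.
(defvar *the-corpus* nil
  "List of corpus definitions (initial value NIL is an assumption).")

(setf *the-corpus* '(first-load))

;; Evaluating DEFVAR again leaves an already-bound variable untouched, so
;; reloading the file does not throw away a corpus that was already read.
(defvar *the-corpus* '(reloaded))
*the-corpus*                                ; still (FIRST-LOAD)

;; Backslash is the escape character inside a Lisp string, so a Windows
;; pathname needs it doubled; a single one silently disappears.
(string= "c:\corpus.txt" "c:corpus.txt")    ; true: the backslash is gone
(length "c:\\corpus.txt")                   ; 13: the backslash is kept
```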

Now let's see why your definition is wrong (meanwhile, we'll fix
problems in my code :) ).

The first question is: what does `build-corpus-definition' return?
The second question is: what is its input?

You are calling `build-corpus-definition' with a `stream' (a variable
named `stream' whose type is `stream').
Moreover, given your test in the 'UNTIL' clause, you seem to assume
that `build-corpus-definition' returns a stream.  Is this correct?






> >The function will return a list of `corpus-definition's (which are
> >defined below).
> >
> >> (defun build-corpus-definition (definition-string-list)
> >	...
> >> 	))))
> >> 
> >> 
> >> but I get an error message:
> >> Error: Undefined function BUILD-CORPUS-DEFINITION called with
> >> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
> >> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
> >> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).
> >> 
> >
> >... and the error reads as ... ?  :)  What does the error message mean?
> >
> >-- 
> oh, it was just a missing parenthesis error which I fixed. now when I
> compile the file it says:
> The following functions are undefined:
> CORPUS-DEFINITION-DEFS which is referenced by BUILD-CORPUS-DEFINITION
> (SETF CORPUS-DEFINITION-DEFS) which is referenced by
> BUILD-CORPUS-DEFINITION
> 
> and this is the output I get:
> 
> CL-USER 1 > (build-corpus-definition '("LEXEME         vehicle"
> "CLASSIFICATION    M1x" "ARTNR           27405" "DEF     car ((noun
> automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun
> ako hui)) blue ((adj hji hui))"))
> 
> Error: Undefined function CORPUS-DEFINITION-DEFS called with arguments
> (#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
> DEF NIL)).
>   1 (continue) Try invoking CORPUS-DEFINITION-DEFS again.
>   2 Return some values from the call to CORPUS-DEFINITION-DEFS.
>   3 Try invoking something other than CORPUS-DEFINITION-DEFS with the
> same arguments.
>   4 Set the symbol-function of CORPUS-DEFINITION-DEFS to another
> function.
>   5 (abort) Return to level 0.
>   6 Return to top loop level 0.
> 
> Type :b for backtrace, :c <option number> to proceed,  or :? for other
> options
> 
> CL-USER 2 : 1 > 
> 
> but if I go without the DEF part it works just fine except I get NIL
> as DEF of course
> 
> CL-USER 2 : 1 > (build-corpus-definition '("LEXEME         vehicle"
> "CLASSIFICATION    M1x" "ARTNR           27405"))
> 
> #S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405 DEF
> NIL)
> 

Thanks.  Very nice error output.  Now what this tells me is that there
is a mishap in the `corpus-definition' definition.  As a matter of
fact, there was a missing `s'.

Here is the correct definition

(defstruct corpus-definition
  lexeme
  classification
  artnr
  defs)
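
As a sketch of why the missing `s' produced that particular error: DEFSTRUCT
builds each accessor name from the structure name plus the slot name, so a
slot called DEF yields CORPUS-DEFINITION-DEF, while the code was calling
CORPUS-DEFINITION-DEFS.  The definition is repeated here only so the snippet
runs on its own; *EXAMPLE* and its values are made up for illustration.

```lisp
(defstruct corpus-definition
  lexeme
  classification
  artnr
  defs)   ; slot DEFS => accessor CORPUS-DEFINITION-DEFS

;; Hypothetical sample object, mirroring the values in the error output.
(defparameter *example*
  (make-corpus-definition :lexeme 'vehicle :classification 'm1x :artnr 27405))

(corpus-definition-defs *example*)                        ; unset slots start as NIL
(setf (corpus-definition-defs *example*) '(placeholder))  ; the SETF form works too
(corpus-definition-defs *example*)                        ; now (PLACEHOLDER)

;; In a fresh image no accessor exists for a slot that was never defined.
(fboundp 'corpus-definition-def)
```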


Now. Back to basics.  How do you write `read-corpus'? :)

Cheers


From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <3c64fa7a.6465767@news.online.no>
On 08 Feb 2002 09:52:04 -0500, Marco Antoniotti <·······@cs.nyu.edu>
wrote:

>> 
>> something like:
>> 
>> (setf corpus (read-corpus "c:\corpus.txt"))
>> 
>> (defun read-corpus (pathname)
>>   (with-open-file (stream pathname :direction :input)
>>     (loop
>> 	for corpus-definiton = (build-corpus-definition stream)
>> 	until (eq corpus-definiton stream)
>> 	collect corpus-definiton)))
>
>Almost.
>
>First of all, it is better to use one of the "definition forms" of CL.
>
>	(defvar *the-corpus*)  ; Note the `*' convention.
>
>	(setf *the-corpus* (read-corpus "/where/the/corpus/is/corpus.txt"))
>
>Now let's see why your definition is wrong (meanwhile, we'll fix
>problems in my code :) ).
>
>The first question is: what does `build-corpus-definition' return?

it returns the parsed list of strings

#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
DEFS ((#S(TOKEN NAME CAR ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN
VEHICLE SKS HUI))) #S(TOKEN NAME IS ATTRIBUTES ((VERB HJI SKA) (NOUN
AKO HUI))) #S(TOKEN NAME BLUE ATTRIBUTES ((ADJ HJI HUI))))))

>The second question is: what is its input?

its input is a list of strings

("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")

>You are calling `build-corpus-definition' with a `stream' (a variable
>named `stream' whose type is `stream').

I should call it with a list, that is with a list of strings or the
empty list; nil

>Moreover, given your test in the 'UNTIL' clause, you seem to assume
>that `build-corpus-definition' returns a stream.  Is this correct?

nope, it returns a #S plus a list

<snip> <snip>

>Here is the correct definition
>
>(defstruct corpus-definition
>  lexeme
>  classification
>  artnr
>  defs)
>

and it works just fine (of course):

>Now. Back to basics.  How do you write `read-corpus'? :)
>
>Cheers
>

how about something like this :


 (setf *corpus* (read-corpus "c:\corpus.txt"))
 
 how about we make a local variable, build a list from corpus.txt, and
give that list to build-corpus-definition? however, I'm very uncertain
about how to write the code

(defun read-corpus (pathname)
   (with-open-file (the-new-variable :direction :input)
     (loop
      for corpus-definiton = (build-corpus-definition (list
the-new-varable))
      until (eq corpus-definiton the-new-variable)
      collect corpus-definiton)))

???

ab talebi
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6c1yfs6qd3.fsf@octagon.mrl.nyu.edu>
············@yahoo.com (ab talebi) writes:

> On 08 Feb 2002 09:52:04 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> wrote:
> 
> >> 
> >> something like:
> >> 
> >> (setf corpus (read-corpus "c:\corpus.txt"))
> >> 
> >> (defun read-corpus (pathname)
> >>   (with-open-file (stream pathname :direction :input)
> >>     (loop
> >> 	for corpus-definiton = (build-corpus-definition stream)
> >> 	until (eq corpus-definiton stream)
> >> 	collect corpus-definiton)))
> >
> >Almost.
> >
> >First of all, it is better to use one of the "definition forms" of CL.
> >
> >	(defvar *the-corpus*)  ; Note the `*' convention.
> >
> >	(setf *the-corpus* (read-corpus "/where/the/corpus/is/corpus.txt"))
> >
> >Now let's see why your definition is wrong (meanwhile, we'll fix
> >problems in my code :) ).
> >
> >The first question is: what does `build-corpus-definition' return?
> 
> it returns the parsed list of strings
> 
> #S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
> DEFS ((#S(TOKEN NAME CAR ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN
> VEHICLE SKS HUI))) #S(TOKEN NAME IS ATTRIBUTES ((VERB HJI SKA) (NOUN
> AKO HUI))) #S(TOKEN NAME BLUE ATTRIBUTES ((ADJ HJI HUI))))))

Good.  But what is the "parsed (list of strings)" and what are the
strings in the list?


> 
> >The second question is: what is its input?
> 
> its input is a list of strings

.... and where do you get this list of strings from?

> >You are calling `build-corpus-definition' with a `stream' (a variable
> >named `stream' whose type is `stream').
> 
> I should call it with a list, that is with a list of strings or the
> empty list; nil

Good.  Now, will any list of strings work?

> >Moreover, given your test in the 'UNTIL' clause, you seem to assume
> >that `build-corpus-definition' returns a stream.  Is this correct?
> 
> nope, it returns a #S plus a list

No. It returns an object of type CORPUS-DEFINITION which happens to
print out as

	#S(CORPUS-DEFINITION .....)

You work in linguistics or NLP.  There is a difference between a
printed representation of a word (the English text) and its phonetic
representation (which is something else altogether - *especially* in
English :) ).
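
The distinction can be checked at the REPL.  This is only a sketch: the
structure definition is repeated so the snippet is self-contained, and the
sample lexeme is made up.

```lisp
(defstruct corpus-definition lexeme classification artnr defs)

(let ((object (make-corpus-definition :lexeme 'vehicle)))
  (list (type-of object)                       ; the object's type
        (corpus-definition-p object)           ; predicate generated by DEFSTRUCT
        (stringp object)                       ; the object itself is not text...
        (stringp (prin1-to-string object))))   ; ...but its printed form is
;; => (CORPUS-DEFINITION T NIL T)
```

The #S(...) text is what the printer writes when asked to show the object;
the accessors work on the object, not on that text.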

> 
> <snip> <snip>
> 
> >Here is the correct definition
> >
> >(defstruct corpus-definition
> >  lexeme
> >  classification
> >  artnr
> >  defs)
> >
> 
> and it works just fine (of course):
> 
> >Now. Back to basics.  How do you write `read-corpus'? :)
> >
> >Cheers
> >
> 
> how about something like this :
> 
> 
>  (setf *corpus* (read-corpus "c:\corpus.txt"))
>  
>  how about we make a local variable and make a list of the corpus.txt
> which we put in the list and give that list to build-corpus-definition
> however i'm very uncertain about how to write the code

Sure you are.  It looks like you do not ask yourself the questions you
ask here.

What does WITH-OPEN-FILE do?  What does `the-new-variable' get bound to?
Are you using all the things we defined?  What are you testing with
your EQ test?

> 
> (defun read-corpus (pathname)
>    (with-open-file (the-new-variable :direction :input)
>      (loop
>       for corpus-definiton = (build-corpus-definition (list
> the-new-varable))
>       until (eq corpus-definiton the-new-variable)
>       collect corpus-definiton)))
> 
> ???

Cheers

-- 
Marco Socrate Antoniotti =====================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6celjwovxp.fsf@octagon.mrl.nyu.edu>
BTW. `read-corpus' is 6 or 7 lines away :)
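
One possible shape for it, offered as a hedged sketch rather than the answer
intended in this thread.  It ASSUMES that definitions in the corpus file are
separated by blank lines, which the thread does not confirm.
BLANK-LINE-P, READ-DEFINITION-LINES and READ-DEFINITIONS are hypothetical
helpers (the second stands in for the `read-lexeme-lines' mentioned earlier),
and the DEFINITION-BUILDER argument stands in for `build-corpus-definition'.

```lisp
(defun blank-line-p (line)
  "True if LINE holds nothing but spaces or tabs (hypothetical helper)."
  (string= "" (string-trim '(#\Space #\Tab) line)))

(defun read-definition-lines (stream)
  "Return the next run of non-blank lines from STREAM as a list of strings,
or NIL at end of file.  ASSUMPTION: blank lines separate definitions."
  (let ((first-line (loop for line = (read-line stream nil nil)
                          while (and line (blank-line-p line)) ; skip blank lines
                          finally (return line))))
    (when first-line
      (cons first-line
            (loop for line = (read-line stream nil nil)
                  while (and line (not (blank-line-p line)))
                  collect line)))))

(defun read-definitions (stream definition-builder)
  "Collect one built definition per group of lines until STREAM is exhausted."
  (loop for lines = (read-definition-lines stream)
        while lines     ; a list of strings or NIL, never the stream itself
        collect (funcall definition-builder lines)))

(defun read-corpus (corpus-file-name &optional (definition-builder #'identity))
  "Read the whole file; pass #'build-corpus-definition as the second argument
to get structures instead of raw lists of lines."
  (with-open-file (stream corpus-file-name :direction :input)
    (read-definitions stream definition-builder)))
```

The point of the split is that only READ-DEFINITION-LINES touches the
stream; the builder receives a list of strings, exactly as it did when it
was tested by hand above.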

Cheers

From: Coby Beck
Subject: Re: list formats
Date: 
Message-ID: <30W88.39947$2x2.2422939@news3.calgary.shaw.ca>
"ab talebi" <············@yahoo.com> wrote in message
·····················@news.uio.no...
> On 07 Feb 2002 10:28:58 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> >
> >The question is now:  how do you read an entire corpus (i.e. a file)?
> >That is: write a function that does that.
> >
> > (defun read-corpus (corpus-file-name) ...)
>
> something like:
>
> (setf corpus (read-corpus "c:\corpus.txt"))
>
> (defun read-corpus (pathname)
>   (with-open-file (stream pathname :direction :input)
>     (loop
>           for corpus-definiton = (build-corpus-definition stream)
>           until (eq corpus-definiton stream)
>           collect corpus-definiton)))

I would strongly suggest that a user of build-corpus-definition would not
normally expect the stream to be returned under any circumstances.  I would
expect a definition or nil.  If you _really_want_ the option of not
handling eof errors and/or specifying what is returned in the event of a
handled eof error, then I would recommend mimicking the read functions'
conventions (i.e. (build-corpus-definition stream nil stream) would do what
you are wanting, (build-corpus-definition stream nil) would give definitions
until eof and then nil, and the way you have it now would throw an eof error).
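
For reference, this is how the convention looks with the standard READ-LINE,
whose optional arguments are EOF-ERROR-P and EOF-VALUE.  READ-ALL-LINES is a
hypothetical name used only for this sketch.

```lisp
(defun read-all-lines (stream)
  "Collect every line of STREAM, using STREAM itself as the EOF marker."
  (loop for line = (read-line stream nil stream) ; eof-error-p NIL, eof-value STREAM
        until (eq line stream)                   ; the marker cannot be a real line
        collect line))

(with-input-from-string (in (format nil "LEXEME vehicle~%ARTNR 27405~%"))
  (read-all-lines in))
;; => ("LEXEME vehicle" "ARTNR 27405")

;; With eof-error-p NIL and no eof-value, NIL comes back at end of file.
(with-input-from-string (in "")
  (read-line in nil))

;; With the defaults, hitting end of file signals END-OF-FILE instead.
(handler-case (with-input-from-string (in "")
                (read-line in))
  (end-of-file () :eof))
```

Passing the stream as its own eof-value is a common idiom because no line
read from the stream can ever be EQ to the stream object.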

A better name might be read-corpus-definition.

--
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6cy9i05bjo.fsf@octagon.mrl.nyu.edu>
"Coby Beck" <·····@mercury.bc.ca> writes:

> "ab talebi" <············@yahoo.com> wrote in message
> ·····················@news.uio.no...
> > On 07 Feb 2002 10:28:58 -0500, Marco Antoniotti <·······@cs.nyu.edu>
> > >
> > >The question is now:  how do you read an entire corpus (i.e. a file)?
> > >That is: write a function that does that.
> > >
> > > (defun read-corpus (corpus-file-name) ...)
> >
> > something like:
> >
> > (setf corpus (read-corpus "c:\corpus.txt"))
> >
> > (defun read-corpus (pathname)
> >   (with-open-file (stream pathname :direction :input)
> >     (loop
> >           for corpus-definiton = (build-corpus-definition stream)
> >           until (eq corpus-definiton stream)
> >           collect corpus-definiton)))
> 
> I would strongly suggest that a user of build-corpus-definition would not
> normally expect the stream being returned in any circumstance.  I would
> expect a definition or a nil.  If you _really_want_ the option of not
> handling eof errors and/or specifying what is returned in the event of a
> handled eof error then i would recommend mimicking the read functions'
> conventions.

...except that I think that Ab Talebi is not yet up to speed on end of
file treatment.

> (ie (build-corpus-definition stream nil stream) would do what
> you are wanting, (build-corpus-definition stream nil) would give definitions
> until eof then a nil and the way you have it now would throw an eof error)
> 
> A better name might be read-corpus-definition.

You have not been following the whole thread.  Believe
me. BUILD-CORPUS-DEFINITION is the right name. One thing Ab Talebi
still needs to work on is that BUILD-CORPUS-DEFINITION does not do any
`reading' from the stream. :)

Cheers

From: Coby Beck
Subject: Re: list formats
Date: 
Message-ID: <0I%98.14649$Cg5.937393@news1.calgary.shaw.ca>
"Marco Antoniotti" <·······@cs.nyu.edu> wrote in message
····················@octagon.mrl.nyu.edu...
>
> "Coby Beck" <·····@mercury.bc.ca> writes:
>
[...]
> > (ie (build-corpus-definition stream nil stream) would do what
> > you are wanting, (build-corpus-definition stream nil) would give
> > definitions until eof then a nil and the way you have it now would
> > throw an eof error)
> >
> > A better name might be read-corpus-definition.
>
> You have not been following the whole thread.  Believe
> me. BUILD-CORPUS-DEFINITION is the right name. One thing Ab Talebi
> still needs to work on is that BUILD-CORPUS-DEFINITION does not do any
> `reading' from the stream. :)
>

No, I've only been peeking in every now and then...I'll butt out and let you
get back to the lessons ;-)

--
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")
From: Marco Antoniotti
Subject: Re: list formats
Date: 
Message-ID: <y6c3d06vi6w.fsf@octagon.mrl.nyu.edu>
"Coby Beck" <·····@mercury.bc.ca> writes:

> "Marco Antoniotti" <·······@cs.nyu.edu> wrote in message
> ····················@octagon.mrl.nyu.edu...
> >
> > "Coby Beck" <·····@mercury.bc.ca> writes:
> >
> [...]
> > > (ie (build-corpus-definition stream nil stream) would do what
> > > you are wanting, (build-corpus-definition stream nil) would give
> > > definitions until eof then a nil and the way you have it now would
> > > throw an eof error)
> > >
> > > A better name might be read-corpus-definition.
> >
> > You have not been following the whole thread.  Believe
> > me. BUILD-CORPUS-DEFINITION is the right name. One thing Ab Talebi
> > still needs to work on is that BUILD-CORPUS-DEFINITION does not do any
> > `reading' from the stream. :)
> >
> 
> No, I've only been peeking in every now and then...I'll butt out and let you
> get back to the lessons ;-)

I apologize.  I did not mean to be rude.

Cheers

From: Barry Margolin
Subject: Re: list formats
Date: 
Message-ID: <0Vf48.45$qg4.151346@burlma1-snr2>
In article <··················@news.uio.no>,
ab talebi <············@yahoo.com> wrote:
>I think the way I posed the question was too complicated in latter
>postings (under the name loop problem) so I try to samplify it:
>
>we have this list:
>
>(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
>ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
>ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
>hui))"))
>
>
>and want to convert it to this format:
>
>(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
>  "is" ((verb hji ska) (noun ako hui))
>  "blue" ((adj hji hui)))
> ("strong" ((adj hji hui) (adv ly hui))
>  "is" ((verb hji ska) (verb kos hji))
>  "man" ((noun ako hui) (prop hui))))

Here's my solution.  It's untested and doesn't do any error checking.

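(defun parse-my-list (list)
  "Parse every definition string held in the first element of LIST."
  (loop for string in (car list)
        collect (parse-my-string string)))

(defun parse-my-string (string)
  "Parse a string of repeating \"<name> <list-of-attributes>\" pairs."
  (loop with end = (length string)
        with last-end
        for start = 0 then last-end
        while (< start end)
        ;; First the word, which runs up to the next space.  SPACE-POS is
        ;; NIL if no space follows; that case is not handled here.
        collect (let* ((space-pos (position #\space string :start start))
                       (word (subseq string start space-pos)))
                  (setq start (1+ space-pos))
                  word)
        ;; Then the attribute list, which the Lisp reader can parse directly.
        collect (multiple-value-bind (list end-pos)
                    (read-from-string string t nil :start start)
                  ;; Skip over whitespace after the list.  At the end of the
                  ;; string POSITION returns NIL, so fall back to END.
                  (setq last-end
                        (or (position #\space string
                                      :start end-pos :test #'char/=)
                            end))
                  list)))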

-- 
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: ab talebi
Subject: Re: list formats
Date: 
Message-ID: <4LQ48.2660$tn2.58920@news4.ulv.nextra.no>
"Barry Margolin" <······@genuity.net> wrote in message
························@burlma1-snr2...
> In article <··················@news.uio.no>,
> ab talebi <············@yahoo.com> wrote:
> >I think the way I posed the question was too complicated in latter
> >postings (under the name loop problem) so I try to samplify it:
> >
> >we have this list:
> >
> >(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> >ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> >ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> >hui))"))
> >
> >
> >and want to convert it to this format:
> >
> >(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
> >  "is" ((verb hji ska) (noun ako hui))
> >  "blue" ((adj hji hui)))
> > ("strong" ((adj hji hui) (adv ly hui))
> >  "is" ((verb hji ska) (verb kos hji))
> >  "man" ((noun ako hui) (prop hui))))
>
> Here's my solution.  It's untested and doesn't do any error checking.
>
> (defun parse-my-list (list)
>   (loop for string in (car list)
>         collect (parse-my-string string)))
>
> (defun parse-my-string (string)
>   "Parse a string of repeating "<name> <list-of-attributes> ..."
>   (loop with end = (length string)
>         with last-end
>         for start = 0 then last-end
>         while (start < end)
>         collect (let* ((space-pos (position #\space string :start start))
>                        (word (subseq string start space-pos)))
>                   (setq start (1+ space-pos))
>                   word)
>         collect (multiple-value-bind (list end-pos)
>                       (read-from-string string t nil :start start)
>                   ;; skip over whitespace after the list
>                   (setq last-end
>                         (position #\space string :start end-pos :test #'char/=))
>                   list)))
>

unfortunately it doesn't work, among other things because
while (start < end)
doesn't make sense. I tried changing it to while (< start end) but then
(setq start (1+ space-pos)) expects a number which we don't have, so I
don't know....

tnx

ab talebi
From: Barry Margolin
Subject: Re: list formats
Date: 
Message-ID: <Kvi58.9$1F6.239221@burlma1-snr2>
In article <····················@news4.ulv.nextra.no>,
ab talebi <············@yahoo.com> wrote:
>unfortunately it doesn't work, among other things because
>while (start < end)
>doesn't make sense. I tried changing it to while (< start end) but then
>(setq start (1+ space-pos)) expects a number which we don't have, so I
>don't know....

If it can't find a space, POSITION will return NIL, so you'll have to handle
this case.
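
A minimal sketch of one way to handle it.  SPLIT-FIRST-WORD is a hypothetical
helper, not part of the function posted earlier; it relies on SUBSEQ treating
a NIL end as "up to the end of the string".

```lisp
;; POSITION returns an index when the character is found and NIL otherwise,
;; and NIL cannot be passed to 1+.
(position #\space "blue ((adj hji hui))")   ; => 4
(position #\space "blue")                   ; => NIL

(defun split-first-word (string &key (start 0))
  "Return the word beginning at START and, as a second value, the index
where the rest of STRING begins (the length of STRING if nothing follows)."
  (let ((space-pos (position #\space string :start start)))
    (values (subseq string start space-pos)              ; NIL end = to the end
            (if space-pos (1+ space-pos) (length string)))))

(split-first-word "is ((verb hji ska))")    ; => "is", 3
(split-first-word "blue")                   ; => "blue", 4
```

Whether a word with no attribute list should be accepted or treated as an
error depends on the corpus format; the helper only makes the NIL case
explicit so that the caller can decide.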

Consider my function the starting point, and debug it further until you get
something that works, just as you would have to do if you had written it
yourself.
