From: ab talebi
Subject: seperating words
Date: 
Message-ID: <3bcc522c.4256841@news.uio.no>
Hi
I need a Lisp function that reads a text like the one below from a given file
(lexicon.txt) and extracts the definitions separated by commas:

animal: we have cats, dogs, monkey and other animals
food: differet types like rice, beans, potato and some other
car: we have mercedes, opel, mazda plus other cars

and gives me this result file (result.txt)

(animal cats dogs monkey)
(food rice beans potato)
(car mercedes opel mazda)

tnx
ab talebi

From: Kent M Pitman
Subject: Re: seperating words
Date: 
Message-ID: <sfwd73neecl.fsf@world.std.com>
············@yahoo.com (ab talebi) writes:

> Hi
> I need a lisp function that reads a text like from a given filename
> (lexicon.txt) and extracts the definitions seperated by commas
> 
> animal: we have cats, dogs, monkey and other animals
> food: differet types like rice, beans, potato and some other
> car: we have mercedes, opel, mazda plus other cars
> 
> and gives me this result file (result.txt)
> 
> (animal cats dogs monkey)
> (food rice beans potato)
> (car mercedes opel mazda)

There is no function that does this, but you can easily write one using
the functions READ-LINE, POSITION, CHAR, and SUBSEQ.

You can probably also write one using READ-CHAR, CONS, COERCE, and EQL.
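
As a minimal sketch of how the first set of functions fits together (not a
complete solution), the helpers below use READ-LINE to walk a stream line by
line and POSITION with SUBSEQ to cut a line at its first colon. The names
SPLIT-AT-COLON and MAP-LINES are made up for this illustration.

```lisp
;; Sketch only: SPLIT-AT-COLON and MAP-LINES are hypothetical helper names.
(defun split-at-colon (line)
  "Return two values: the text before the first colon and the text after it.
If LINE has no colon, return LINE and NIL."
  (let ((colon (position #\: line)))
    (if colon
        (values (subseq line 0 colon)
                (subseq line (1+ colon)))
        (values line nil))))

(defun map-lines (function stream)
  "Call FUNCTION on each line of STREAM, collecting the primary results in order."
  (loop for line = (read-line stream nil nil) ; NIL at end of file
        while line
        collect (funcall function line)))
```

Picking the example words out of the text after the colon is left open here;
the same POSITION and SUBSEQ calls can be repeated with #\, as the delimiter.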
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcd3b9b.64023392@news.uio.no>
On Tue, 16 Oct 2001 16:31:22 GMT, Kent M Pitman <······@world.std.com>
wrote:

>············@yahoo.com (ab talebi) writes:
>
>> Hi
>> I need a lisp function that reads a text like from a given filename
>> (lexicon.txt) and extracts the definitions seperated by commas
>> 
>> animal: we have cats, dogs, monkey and other animals
>> food: differet types like rice, beans, potato and some other
>> car: we have mercedes, opel, mazda plus other cars
>> 
>> and gives me this result file (result.txt)
>> 
>> (animal cats dogs monkey)
>> (food rice beans potato)
>> (car mercedes opel mazda)
>
>There is no function that does this, but you can easily write one using
>the functions READ-LINE, POSITION, CHAR, and SUBSEQ.
>
>You can probably also write one using READ-CHAR, CONS, COERCE, and EQL.
>

thanks, I'm not very experienced really, could someone help me a
little with that?

tnx
ab talebi
From: Kent M Pitman
Subject: Re: seperating words
Date: 
Message-ID: <sfw3d4ir5sl.fsf@world.std.com>
············@yahoo.com (ab talebi) writes:

> 
> On Tue, 16 Oct 2001 16:31:22 GMT, Kent M Pitman <······@world.std.com>
> wrote:
> 
> >············@yahoo.com (ab talebi) writes:
> >
> >> Hi
> >> I need a lisp function that reads a text like from a given filename
> >> (lexicon.txt) and extracts the definitions seperated by commas
> >> 
> >> animal: we have cats, dogs, monkey and other animals
> >> food: differet types like rice, beans, potato and some other
> >> car: we have mercedes, opel, mazda plus other cars
> >> 
> >> and gives me this result file (result.txt)
> >> 
> >> (animal cats dogs monkey)
> >> (food rice beans potato)
> >> (car mercedes opel mazda)
> >
> >There is no function that does this, but you can easily write one using
> >the functions READ-LINE, POSITION, CHAR, and SUBSEQ.
> >
> >You can probably also write one using READ-CHAR, CONS, COERCE, and EQL.
> >
> 
> thanks, I'm not very experienced really, could someone help me a
> little with that?

Is this for homework?  It looks like a homework problem.

What book are you using to learn?
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcd4e94.68881028@news.uio.no>
On Wed, 17 Oct 2001 09:10:02 GMT, Kent M Pitman <······@world.std.com>
wrote:

>············@yahoo.com (ab talebi) writes:
>
>> 
>> On Tue, 16 Oct 2001 16:31:22 GMT, Kent M Pitman <······@world.std.com>
>> wrote:
>> 
>> >············@yahoo.com (ab talebi) writes:
>> >
>> >> Hi
>> >> I need a lisp function that reads a text like from a given filename
>> >> (lexicon.txt) and extracts the definitions seperated by commas
>> >> 
>> >> animal: we have cats, dogs, monkey and other animals
>> >> food: differet types like rice, beans, potato and some other
>> >> car: we have mercedes, opel, mazda plus other cars
>> >> 
>> >> and gives me this result file (result.txt)
>> >> 
>> >> (animal cats dogs monkey)
>> >> (food rice beans potato)
>> >> (car mercedes opel mazda)
>
>What book are you using to learn?

I don't have any Lisp books and I need just this one program. Therefore
I asked other, more experienced Lispers.

tnx
ab talebi
From: Janis Dzerins
Subject: Re: seperating words
Date: 
Message-ID: <87lmiaocae.fsf@asaka.latnet.lv>
············@yahoo.com (ab talebi) writes:

> On Tue, 16 Oct 2001 16:31:22 GMT, Kent M Pitman <······@world.std.com>
> wrote:
> 
> >············@yahoo.com (ab talebi) writes:
> >
> >> Hi
> >> I need a lisp function that reads a text like from a given filename
> >> (lexicon.txt) and extracts the definitions seperated by commas
> >> 
> >> animal: we have cats, dogs, monkey and other animals
> >> food: differet types like rice, beans, potato and some other
> >> car: we have mercedes, opel, mazda plus other cars
> >> 
> >> and gives me this result file (result.txt)
> >> 
> >> (animal cats dogs monkey)
> >> (food rice beans potato)
> >> (car mercedes opel mazda)
> >
> >There is no function that does this, but you can easily write one using
> >the functions READ-LINE, POSITION, CHAR, and SUBSEQ.
> >
> >You can probably also write one using READ-CHAR, CONS, COERCE, and EQL.
> >
> 
> thanks, I'm not very experienced really, could someone help me a
> little with that?

Ok, where are you stuck? (i.e. show us the code, tell us what you
expect it to do and what it does.)

-- 
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcd7082.77568538@news.uio.no>
On 17 Oct 2001 12:18:01 +0300, Janis Dzerins <·····@latnet.lv> wrote:

>············@yahoo.com (ab talebi) writes:
>
>> On Tue, 16 Oct 2001 16:31:22 GMT, Kent M Pitman <······@world.std.com>
>> wrote:
>> 
>> >············@yahoo.com (ab talebi) writes:
>> >
>> >> Hi
>> >> I need a lisp function that reads a text like from a given filename
>> >> (lexicon.txt) and extracts the definitions seperated by commas
>> >> 
>> >> animal: we have cats, dogs, monkey and other animals
>> >> food: differet types like rice, beans, potato and some other
>> >> car: we have mercedes, opel, mazda plus other cars
>> >> 
>> >> and gives me this result file (result.txt)
>> >> 
>> >> (animal cats dogs monkey)
>> >> (food rice beans potato)
>> >> (car mercedes opel mazda)
<....>
>
>Ok, where are you stuck? (i.e. show us the code, tell us what you
>expect it to do and what it does.)

Dear Janis, actually I'm not a Lisper at all; all I need is this one
tiny program, so I haven't the foggiest idea about it. That is why
I asked Lispers if they could be so kind as to help me with it,
but I understand that it's too much to ask for. Now can you help me?

ab talebi
From: Wade Humeniuk
Subject: Re: seperating words
Date: 
Message-ID: <9qk46k$fng$1@news3.cadvision.com>
> Dear Janis, actually I'm not a Lisp-er at all, al I need is this one
> tiny program so therefor I haven't the foggiest idea about it,
> therefor I asked Lispers if they could be so kind and help me with it,
> but I understand that it's too much to ask for. Now can you help me?

There is concern in this group, and rightly so, that you are a student
learning Lisp.  Giving you the answer to an assignment is cheating.  A
student is meant to learn the material, give it a good try, and present their
work, whether correct or incorrect, so that they (and the teacher) can
actually tell whether they are learning.  Most people here are concerned
with people learning Lisp and not interested in someone going through the
motions.

Since you have been asked multiple times if you are a student and whether
this is an assignment and you have refused to answer, I can only assume you
are a student and that you are attempting to cheat.

If you want someplace to start, get a programming book, preferably a Lisp
one, and spend the hours/days necessary to understand how to start
programming and how to use Lisp.  If you have trouble finding a book, go ask
your teacher; I am sure he or she will be more than helpful.  Sometimes one has
to do remedial work first before one can move ahead.  There is no virtue in
cheating; you are just hurting yourself.

Wade
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <ey3wv1uifpe.fsf@cley.com>
* ab talebi wrote:
> Dear Janis, actually I'm not a Lisp-er at all, al I need is this one
> tiny program so therefor I haven't the foggiest idea about it,
> therefor I asked Lispers if they could be so kind and help me with it,
> but I understand that it's too much to ask for. Now can you help me?

What we are trying to establish is whether this is a homework problem.
It looks like one because of the nature of the problem, the time
of year, and the fact that you've already failed to answer one direct
question as to whether it is.  If it is, then people are generally
willing to help if you've obviously made some serious effort to
understand the problem.  People are *not* generally willing to do your
homework for you, for reasons which should be clear.

(I'd like to take this opportunity to encourage something a bit like
the comp.arch approach to homework problems, where people give insane,
superficially plausible, and often very amusing answers.  I can't
think of an exact equivalent here - I think in this case an
amusing approach would be either to provide an implementation of a
teco-like text-processing system in CL, in which the word-splitting
problem was then written, or my approach which is to write the obvious
solution and then convert all iteration to tail calls, so you end up
with:

(labels ((l (x)
           .... (l (1+ x))))
  (l 0))

then further convert things so you end up with

((lambda (l)
   (funcall l l 0))
 (lambda (l x)
   ...
   (funcall l l (1+ x))))

- it's essential when doing this to use one-letter variable names and
reuse the same name in as many places as you can.  I feel the end
result is sufficiently inscrutable that, although it solves the
problem, if it was handed in it would be obvious that something odd
had gone on.)
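
To make the two steps concrete, here is a small sketch on a made-up task,
counting the commas in a string. All three function names are hypothetical.
Note that Common Lisp does not require implementations to eliminate tail
calls, so the recursive forms may exhaust the stack on very long strings.

```lisp
;; Step 0: the obvious iterative version.
(defun count-commas-loop (s)
  (loop for c across s count (char= c #\,)))

;; Step 1: the iteration rewritten as a local tail-recursive function.
;; X is the current index, N the number of commas seen so far.
(defun count-commas-labels (s)
  (labels ((l (x n)
             (cond ((= x (length s)) n)
                   ((char= (char s x) #\,) (l (1+ x) (1+ n)))
                   (t (l (1+ x) n)))))
    (l 0 0)))

;; Step 2: LABELS removed; the anonymous function receives itself
;; as its first argument and calls itself through FUNCALL.
(defun count-commas-lambda (s)
  ((lambda (l)
     (funcall l l 0 0))
   (lambda (l x n)
     (cond ((= x (length s)) n)
           ((char= (char s x) #\,) (funcall l l (1+ x) (1+ n)))
           (t (funcall l l (1+ x) n))))))
```

All three return the same count for the same string; only the amount of
scrutiny they survive differs.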

--tim
From: Thomas F. Burdick
Subject: Re: seperating words
Date: 
Message-ID: <xcv7ktu5b9c.fsf@famine.OCF.Berkeley.EDU>
Tim Bradshaw <···@cley.com> writes:

> or my approach which is to write the obvious solution and then
> convert all iteration to tail calls, so you end up with:

Even better is to write up a somewhat foolishly repetitive solution,
then obfuscate.

> 
> (labels ((l (x)
>            .... (l (1+ x))))
>   (l 0))
> 
> then further convert things so you end up with
> 
> ((lambda (l)
>    (funcall l l 0))
>  (lambda (l x)
>    ...
>    (funcall l l (1+ x))))
> 
> - it's essential when doing this to use one-letter variable names and
> reuse the same name in as many places as you can.  I feel the end
> result is sufficiently inscrutable that, although it solves the
> problem, if it was handed in it would be obvious that something odd
> had gone on.)

Ooo, I disagree about the one-letter variables.  Every variable should
be either lambda, l, l1, ll, ll1, lll, l1l, etc.  Of course, this
technique works even better with scheme, but oh well :)

(I tried doing this for a friend as a joke, "can you help me with my
homework?" "sure, here's the answer...", he fixed a bug in it and
turned it in ... sigh ... on the other hand, he probably understood
his answer better than half his class who just sort of messed with
stuff until it "worked")


-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: ···@itasoftware.com
Subject: Re: seperating words
Date: 
Message-ID: <zo6q5ay6.fsf@itasoftware.com>
···@famine.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> Ooo, I disagree about the one-letter variables.  Every variable should
> be either lambda, l, l1, ll, ll1, lll, l1l, etc.  Of course, this
> technique works even better with scheme, but oh well :)

Variables?  Why not just express the result in S and K combinators?
From: Gabe Garza
Subject: Re: seperating words
Date: 
Message-ID: <pu7l27wq.fsf@kynopolis.org>
Tim Bradshaw <···@cley.com> writes:

> - it's essential when doing this to use one-letter variable names and
> reuse the same name in as many places as you can.  I feel the end
> result is sufficiently inscrutable that, although it solves the
> problem, if it was handed in it would be obvious that something odd
> had gone on.)

   Isn't this a legitimate use of GENTEMP? Also, don't forget that
LET and LET* can be replaced by LAMBDA. :)

(defun do-homework (input-file output-file)
  ((lambda (t269 t268)
     (with-open-file (file input-file :direction :input)
        ((lambda (t280 t279 t278 t277 t276)
          (handler-case
           (loop (push (funcall t280 t278 t276 t277 t279) t268))
           (end-of-file ()
             (with-open-file (*standard-output* output-file :direction :output)
               (print
                (mapcar
                 (lambda (list)
                   (mapcar
                    (lambda (string)
                      (intern (string-upcase string)))
                    list))
                 t268))))))
         (lambda (t273 t270 t272 t274)
           ((lambda (t281)
             ((lambda (t282) (cons t281 t282))
              ((lambda (t283)
                 (loop
                  ((lambda (t284)
                     (push t284 t283)
                     (when t269 (setf t269 nil) (return)))
                   (funcall t274 t270 t272)))
                 t283)
               nil)))
           (funcall t273 t270)))
        (lambda (t270 t272)
          ((lambda (t285)
             (loop (unless (funcall t272 #\Space) (return)))
             (loop
              ((lambda (t286)
                 (cond ((char= t286 #\,) (setf t269 nil) (return t285))
                       ((char= t286 #\Linefeed) (setf t269 t) (return t285))
                       (t (vector-push-extend t286 t285))))
               (read-char file))))
           (funcall t270)))
        (lambda (t270)
          ((lambda (t287)
             (loop
              ((lambda (t288)
                 (if (char= t288 #\:)
                     (return t287)
                     (vector-push-extend t288 t287)))
               (read-char file))))
           (funcall t270)))
        (lambda (t271)
          (when (char= (peek-char nil file) t271) (read-char file)))
        (lambda ()
          (make-array 0
           :element-type 'character :adjustable t :fill-pointer t)))))
   nil nil))

Usage:

file.txt:
animal: bear, cat, dog, goat
car: ford, chevy, toyota
good: lisp, bassoons
bad: C++

[EOF]

[6]> (do-homework "file.txt" "output.txt")
((BAD C++) (GOOD BASSOONS LISP) (CAR TOYOTA CHEVY FORD) 
 (ANIMAL GOAT DOG CAT BEAR))


 
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3be00536.11387377@news.uio.no>
On Wed, 17 Oct 2001 22:55:49 GMT, Gabe Garza <·······@ix.netcom.com>
wrote:

<...> (I couldn't put the code here or it would say: more included
text than added text)
>
>Usage:
>
>file.txt:
>animal: bear, cat, dog, goat
>car: ford, chevy, toyota
>good: lisp, bassoons
>bad: C++
>
>[EOF]
>
>[6]> (do-homework "file.txt" "output.txt")
>((BAD C++) (GOOD BASSOONS LISP) (CAR TOYOTA CHEVY FORD) 
> (ANIMAL GOAT DOG CAT BEAR))
>

The goal is for output.txt to keep the same order as the input; there
is no need to reverse the list. I tried (reverse (do-homework
"file.txt" "output.txt")) and it gives me the lists in the right order,
but the elements within the lists are still reversed, as you can see. E.g.
(good bassoons lisp) where it should be (good lisp bassoons).
How can we fix this problem?
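
For reference, the reversal comes from PUSH, which adds each new item to the
front of a list, so anything accumulated that way comes out newest-first. A
minimal sketch of the usual idiom follows; COLLECT-IN-ORDER is a hypothetical
name and is not part of the posted code.

```lisp
;; Sketch only: COLLECT-IN-ORDER is a hypothetical name.
(defun collect-in-order (items)
  "Accumulate ITEMS with PUSH, then reverse once so the input order is kept."
  (let ((acc '()))
    (dolist (item items)
      (push item acc))        ; ACC now holds the items newest-first
    (nreverse acc)))          ; one reversal restores the original order
```

In the posted DO-HOMEWORK, both accumulators appear to be built with PUSH
(T283 for the words of a line, T268 for the lines), so each would need to be
reversed once before it is used, not just the final result.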

ab talebi
From: Paul Foley
Subject: Re: seperating words
Date: 
Message-ID: <m2elo2nt0q.fsf@mycroft.actrix.gen.nz>
On 17 Oct 2001 14:00:45 +0100, Tim Bradshaw wrote:

> ((lambda (l)
>    (funcall l l 0))
>  (lambda (l x)
>    ...
>    (funcall l l (1+ x))))

> - it's essential when doing this to use one-letter variable names and
> reuse the same name in as many places as you can.

For maximal effect, instead of one-letter variable names reuse the
names FUNCALL, LAMBDA and whatever other functions you use in the main
body as variables.  Also, write in all-caps, leave out whitespace
where you can, and reimplement built in functions in the same form, as
much as you can stand.  E.g., instead of (length some-list), write

(FUNCALL(LAMBDA(LAMBDA)(FUNCALL((LAMBDA(LAMBDA)((LAMBDA(FUNCALL)(FUNCALL
 LAMBDA(LAMBDA(LAMBDA NULL)(FUNCALL(FUNCALL FUNCALL FUNCALL)LAMBDA NULL))))
 (LAMBDA(FUNCALL)(FUNCALL LAMBDA(LAMBDA(LAMBDA NULL)(FUNCALL(FUNCALL FUNCALL
 FUNCALL)LAMBDA NULL))))))(LAMBDA(FUNCALL)(LAMBDA(LAMBDA NULL)(IF(NULL LAMBDA)
 NULL(FUNCALL FUNCALL(CDR LAMBDA)(1+ NULL))))))LAMBDA 0)) some-list)

-- 
The power of accurate observation is commonly called cynicism by those
who have not got it.
                                                    -- George Bernard Shaw
(setq reply-to
  (concatenate 'string "Paul Foley " "<mycroft" '(··@) "actrix.gen.nz>"))
From: Martin Simmons
Subject: Re: seperating words
Date: 
Message-ID: <1003345958.181440@itn>
"Paul Foley" <·······@actrix.gen.nz> wrote in message
···················@mycroft.actrix.gen.nz...
> For maximal effect, instead of one-letter variable names reuse the
> names FUNCALL, LAMBDA and whatever other functions you use in the main
> body as variables.  Also, write in all-caps, leave out whitespace
> where you can, and reimplement built in functions in the same form, as
> much as you can stand.  E.g., instead of (length some-list), write
>
> (FUNCALL(LAMBDA(LAMBDA)(FUNCALL((LAMBDA(LAMBDA)((LAMBDA(FUNCALL)(FUNCALL
>  LAMBDA(LAMBDA(LAMBDA NULL)(FUNCALL(FUNCALL FUNCALL FUNCALL)LAMBDA NULL))))
>  (LAMBDA(FUNCALL)(FUNCALL LAMBDA(LAMBDA(LAMBDA NULL)(FUNCALL(FUNCALL FUNCALL
>  FUNCALL)LAMBDA NULL))))))(LAMBDA(FUNCALL)(LAMBDA(LAMBDA NULL)(IF(NULL LAMBDA)
>  NULL(FUNCALL FUNCALL(CDR LAMBDA)(1+ NULL))))))LAMBDA 0)) some-list)

With inappropriate use of backquote, you can avoid those pesky spaces :-)

#.(LET((L'FUNCALL)(F'LAMBDA)(Z'NULL))`(,L(,F(,F)(,L((,F(,F)((,F(,L)(,L,F(,F(,F,Z
)(,L(,L,L,L),F,Z))))(,F(,L)(,L,F(,F(,F,Z)(,L(,L,L,L),F,Z))))))(,F(,L)(,F(,F,Z)(I
F(,Z,F),Z(,L,L(CDR,F)(1+,Z)))))),F 0)) some-list))

--
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: Janis Dzerins
Subject: Re: seperating words
Date: 
Message-ID: <87g08io13s.fsf@asaka.latnet.lv>
············@yahoo.com (ab talebi) writes:

> On 17 Oct 2001 12:18:01 +0300, Janis Dzerins <·····@latnet.lv> wrote:
> 
> >> >············@yahoo.com (ab talebi) writes:
> >> >
> >> >> Hi
> >> >> I need a lisp function that reads a text like from a given
> >> >> filename (lexicon.txt) and extracts the definitions seperated
> >> >> by commas
> >> >> 
> >> >> animal: we have cats, dogs, monkey and other animals
> >> >> food: differet types like rice, beans, potato and some other
> >> >> car: we have mercedes, opel, mazda plus other cars
> >> >> 
> >> >> and gives me this result file (result.txt)
> >> >> 
> >> >> (animal cats dogs monkey)
> >> >> (food rice beans potato)
> >> >> (car mercedes opel mazda)
> <....>
> >
> >Ok, where are you stuck? (i.e. show us the code, tell us what you
> >expect it to do and what it does.)
> 
> Dear Janis, actually I'm not a Lisp-er at all, al I need is this one
> tiny program so therefor I haven't the foggiest idea about it,
> therefor I asked Lispers if they could be so kind and help me with
> it, but I understand that it's too much to ask for. Now can you help
> me?

Yes, I could try to help you. But I don't want to do your work for you.

If you can convince me or anybody else in this forum that doing this
is worthwhile, you will get help.

Now, to help you convince me, I have the following questions:

  Why do the function(s) have to be in Lisp?

  Why do you need this function at all (i.e. is this some kind of
  homework, for self education purposes, for programming language
  comparison, etc)?

  What should the program do exactly (should it extract words
  separated by commas after the colon, extract some selected nouns or
  something else I don't see at first glance)?

-- 
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcd8b0a.84361884@news.uio.no>
On 17 Oct 2001 16:19:35 +0300, Janis Dzerins <·····@latnet.lv> wrote:

>············@yahoo.com (ab talebi) writes:
>
>> On 17 Oct 2001 12:18:01 +0300, Janis Dzerins <·····@latnet.lv> wrote:
>> 
>> >> >············@yahoo.com (ab talebi) writes:
>> >> >
>> >> >> Hi
>> >> >> I need a lisp function that reads a text like from a given
>> >> >> filename (lexicon.txt) and extracts the definitions seperated
>> >> >> by commas
>> >> >> 
>> >> >> animal: we have cats, dogs, monkey and other animals
>> >> >> food: differet types like rice, beans, potato and some other
>> >> >> car: we have mercedes, opel, mazda plus other cars
>> >> >> 
>> >> >> and gives me this result file (result.txt)
>> >> >> 
>> >> >> (animal cats dogs monkey)
>> >> >> (food rice beans potato)
>> >> >> (car mercedes opel mazda)
>> <....>
>> >
<....>
>If you can convince me or anybody else in this forum that doing this
>is worthwhile, you will get help.

Whether it's worth your time, only you can decide. I'll try my best to
convince you :) (my wife tells me that I'm not so good at convincing
people, but let me give it a shot anyway) ...

>Now, to help you convince me, I have following questions:
>
>  Why the function(s) have to be in Lisp?

This is an effort to read through large text data and extract
information intelligently, and I was told that Lisp is the best language
for handling that.

>  Why you need this function at all (i.e. is this some kind of
>  homework, for self education purposes, for programming language
>  comparison, etc)?

I need this function to read and extract any interesting information
from any given text. Actually, I was hoping that if I could get help with
this question, I would be able to see the pattern and the approach to
the problem, so I could apply it to extracting other information from
other texts. So you can be sure that you are not doing my homework for
me, because I will have to change and apply the code I get from you
(if I get it) to some other material and other matter.

>  What should the program do exactly (should it extract words
>  seperated by comma after the colon, extract some selected nouns or
>  something else I don't see at first glance)?

we have a large corpus of data in a file called corpus.txt that looks
like this:

animal: we have cats, dogs, monkey and other animals
food: differet types like rice, beans, potato and some other
car: we have mercedes, opel, mazda plus other cars

The "main-words" are always the first word, followed by a colon, and
examples of those "main-words" are separated by commas. We would like
to intelligently read through that corpus.txt and put the "main-words"
together with their examples, so we get a file called, for example,
result.txt that contains:

(animal cats dogs monkey)
(food rice beans potato)
(car mercedes opel mazda)


Now, Janis, what do you say? Is it worth your time?

tnx
ab talebi
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <ey3d73mi8mp.fsf@cley.com>
* ab talebi wrote:
> we have a large corpus of data in a file called corpus.txt that looks
> like this:

> animal: we have cats, dogs, monkey and other animals
> food: differet types like rice, beans, potato and some other
> car: we have mercedes, opel, mazda plus other cars

If the file really looks like this then you probably have bigger
problems than you are implying here.  For instance take the second
line: you need to get rice, beans, potato out of this.  Looking for
words with a trailing comma doesn't help here because potato doesn't
have a trailing comma.  You need some kind of parsing to find the
interesting words - doing the tokenisation is a small fraction of this
problem.  Lisp is a good language for approaching this kind of
problem, but I'm afraid you are going to have to learn it to do this,
because while tokenising is pretty simple, writing a (heuristic,
probably) parser is going to require you to understand the language a
reasonable amount.

(As a hint, you almost certainly want to *keep* information about the
punctuation, because I expect it's going to make your life easier later
on.  So even the tokeniser needs to do special-purpose stuff: you
might want to end up with ("we" "have" "cats" :comma "dogs" :comma
"monkey" "and" "other" "animals").)

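A tokeniser that keeps the commas can be sketched roughly as follows. This is
an illustration under simplifying assumptions: TOKENIZE is a hypothetical
name, words are separated only by spaces, and the comma is the only
punctuation that is recognised.

```lisp
;; Sketch only: TOKENIZE is a hypothetical name; only spaces and commas
;; are treated specially.
(defun tokenize (text)
  "Split TEXT into a list of word strings, with the keyword :COMMA
in place of each comma."
  (let ((tokens '())
        (start nil))                    ; index where the current word began
    (flet ((end-word (end)
             ;; Close the word in progress, if there is one.
             (when start
               (push (subseq text start end) tokens)
               (setf start nil))))
      (dotimes (i (length text))
        (let ((c (char text i)))
          (cond ((char= c #\,)
                 (end-word i)
                 (push :comma tokens))
                ((char= c #\Space)
                 (end-word i))
                ((null start)
                 (setf start i)))))     ; first character of a new word
      (end-word (length text)))
    (nreverse tokens)))                 ; PUSH built the list backwards
```

A later pass can then look at what sits next to each :COMMA instead of
re-scanning the raw text.
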

--tim
From: Kenny Tilton
Subject: Re: seperating words
Date: 
Message-ID: <3BCDA7EE.A8D985C@nyc.rr.com>
Tim Bradshaw wrote:
>  Looking for
> words with a trailing comma doesn't help here because potato doesn't
> have a trailing comma. 

ah, but the spec said "extracts the definitions separated by commas" and
indeed none of the examples had a specific after "and". ie, there was no
case of "language: lisp, scheme and java", so it's kinda cute, the job
really is to pick out any word appearing on either side of a comma.
unless i'm wrong.

tt
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcdaef2.4366062@news.uio.no>
On Wed, 17 Oct 2001 15:45:59 GMT, Kenny Tilton <·······@nyc.rr.com>
wrote:

>
>
>Tim Bradshaw wrote:
>>  Looking for
>> words with a trailing comma doesn't help here because potato doesn't
>> have a trailing comma. 
>
>ah, but the spec said "extracts the definitions seperated by commas" and
>indeed none of the examples had a specific after "and". ie, there was no
>case of "language: lisp, scheme and java", so its kinda cute, the job
>really is to pick out any word appearing on either side of a comma.
>unless i'm wrong.
>

You are right: the job is to pick out any word that appears before a
comma and (since we do not have a comma after the last one) the word
on either side of the last comma.
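
Stated as code, that rule might look like the sketch below. It assumes input
as clean as the three sample lines: one colon after the main word, spaces
between words, and no other punctuation. All function names are hypothetical,
and the results are strings, not symbols.

```lisp
;; Sketch only: all function names here are hypothetical.
(defun split-on (delimiter text)
  "Return the substrings of TEXT that lie between occurrences of DELIMITER."
  (loop with start = 0
        for pos = (position delimiter text :start start)
        collect (subseq text start pos)   ; POS is NIL for the last piece
        while pos
        do (setf start (1+ pos))))

(defun words (text)
  "Return the space-separated words of TEXT, ignoring empty pieces."
  (remove "" (split-on #\Space text) :test #'string=))

(defun extract-examples (line)
  "Return the main word of LINE followed by its examples: the last word
before each comma, plus the first word after the final comma.
Return NIL if LINE has no colon."
  (let ((colon (position #\: line)))
    (when colon
      (let* ((segments (split-on #\, (subseq line (1+ colon))))
             (before (butlast segments))        ; segments that end at a comma
             (after (first (last segments))))   ; text after the final comma
        (cons (subseq line 0 colon)
              (append
               (loop for segment in before
                     for w = (words segment)
                     when w collect (first (last w)))
               (when before                     ; no comma at all: no examples
                 (let ((w (words after)))
                   (when w (list (first w)))))))))))

(defun extract-file (input output)
  "Read INPUT line by line and write one parenthesized list per line to OUTPUT."
  (with-open-file (in input)
    (with-open-file (out output :direction :output :if-exists :supersede)
      (loop for line = (read-line in nil nil)
            while line
            do (let ((entry (extract-examples line)))
                 (when entry
                   (format out "(~{~a~^ ~})~%" entry)))))))
```

With the sample file, (extract-file "corpus.txt" "result.txt") should write
the three lists in input order. Real text with exceptions to the comma rule
would need more than this.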

ab talebi
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <ey38zeai7qb.fsf@cley.com>
* Kenny Tilton wrote:

> ah, but the spec said "extracts the definitions seperated by commas" and
> indeed none of the examples had a specific after "and". ie, there was no
> case of "language: lisp, scheme and java", so its kinda cute, the job
> really is to pick out any word appearing on either side of a comma.
> unless i'm wrong.

Well, if the input is that clean, I think we know it's homework, don't
we? If this was a real-world problem then it would have all sorts of
random exceptions to this (at least in my experience of this kind of
stuff, there's really a lot of grut in supposedly-clean NL input).

--tim
From: Kenny Tilton
Subject: Re: seperating words
Date: 
Message-ID: <3BCDAD05.A041CF71@nyc.rr.com>
Tim Bradshaw wrote:
> 
> * Kenny Tilton wrote:
> 
> > ah, but the spec said "extracts the definitions seperated by commas" and
> > indeed none of the examples had a specific after "and".
> 
> Well, if the input is that clean, I think we know it's homework, don't
> we? 

oh yeah, that to me was the biggest tip-off that it was homework

kenny
clinisys
From: Janis Dzerins
Subject: Re: seperating words
Date: 
Message-ID: <877ktunwvk.fsf@asaka.latnet.lv>
············@yahoo.com (ab talebi) writes:

> this is an afford to read throught large data-texts and extract
> information inteligently, and as I was told Lisp is the best
> Language because to handle it.

Depends on what degree of intelligence you need. For simple regular
expression matching it might not be best, but for general pattern
matching it is really good (as is Prolog, but I have no experience
with it).

> I need this fuction to read and extract any interesting information
> from any given text. actually I was hoping if I could get help with
> this question, I will be able to see the pattern and the approach to
> the problem so I could apply it for extracting other information
> from other texts. So you can be sure that you are not doing my
> homework for me, because I will have to change and apply the code I
> get from you (if I get it) to some other material and other matter.

You mean -- I show you how to build intelligent information extractors
and you go write a pile of them and earn an even bigger pile of money
using them? :) (Not a bad thing in itself -- just not the slightest bit
better than the homework situation.)

> we have a large corpus of data in a file called corpus.txt that
> looks like this:
> 
> animal: we have cats, dogs, monkey and other animals
> food: differet types like rice, beans, potato and some other
> car: we have mercedes, opel, mazda plus other cars
> 
> The "main-words" are always the first word followed by a collon and
> examples of thoes "main-words" are seperated by commas. we would
> like to intelligently read through that corpus.txt and put the
> "main-words" toghether with their examples so we get a file called
> for example result.txt that contains:
> 
> (animal cats dogs monkey)
> (food rice beans potato)
> (car mercedes opel mazda)

Ok. The only problem is -- how do you know which are the "example
words"? Is this the kind of intelligence that needs to be implemented?

Who writes the corpus.txt file and why is there so much noise in
there? Can you influence the format of the corpus.txt file so that it
looks like this:

animal: cats, dogs, monkey
food: rice, beans, potato
car: mercedes, opel, mazda

(In this case the task does not need any intelligence at all.)

What's the point of including "other animals" in "animal" category and
"other cars" in "car" category?

Are the examples specified using only the "we have ... and other
<main-word>s" or "different types like ... and some other"
forms? (In this case the task does not require any intelligence either.)

> Now, Janis, what do you say? Is it worthwhile your time?

The problem might turn out interesting and solvable. I just think that
if you are going to write a general enough "intelligent information
extractor" (i.e. not covered by the two cases I mentioned above) you
should start learning Common Lisp, and Peter Norvig's "Paradigms of
Artificial Intelligence Programming: Case Studies in Common Lisp"
(also known as PAIP) would be just what you need.

Let's see what you really need.

-- 
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcda4b1.1740712@news.uio.no>
On 17 Oct 2001 17:50:55 +0300, Janis Dzerins <·····@latnet.lv> wrote:

>············@yahoo.com (ab talebi) writes:
>
<...>
>
>> we have a large corpus of data in a file called corpus.txt that
>> looks like this:
>> 
>> animal: we have cats, dogs, monkey and other animals
>> food: differet types like rice, beans, potato and some other
>> car: we have mercedes, opel, mazda plus other cars
>> 
>> The "main-words" are always the first word followed by a collon and
>> examples of thoes "main-words" are seperated by commas. we would
>> like to intelligently read through that corpus.txt and put the
>> "main-words" toghether with their examples so we get a file called
>> for example result.txt that contains:
>> 
>> (animal cats dogs monkey)
>> (food rice beans potato)
>> (car mercedes opel mazda)
>
>Ok. The only problem is -- how do you know which are the "example
>words"? Is this the kind of intelligence that needs to be implemented?

Well, maybe intelligence is not the word, but "example words" are
separated by commas (they have a comma after them, except the last one).
If the last one is a bit difficult, we can just say that "example words"
are separated by commas.

>Who writes the corpus.txt file and why there is so much noise in
>there? Can you influence the format of corpus.txt file so that it
>looks like this:
>
>animal: cats, dogs, monkey
>food: rice, beans, potato
>car: mercedes, opel, mazda
>
>(In this case the task does not need any intelligence at all.)

no, corpus.txt cannot be changed. I agree that if we could change
the corpus to something like that, it would be no problem to solve at
all

>What's the point of including "other animals" in "animal" category and
>"other cars" in "car" category?

there is no point, and they should not be included. It is just that
corpus.txt looks like this:

animal: we have cats, dogs, monkey and other animals
food: differet types like rice, beans, potato and some other
car: we have mercedes, opel, mazda plus other cars

and "other animals" in "animal" category and "other cars" in "car"
category are not separated by commas

>Are the examples specified using only "we have ... and other
><main-word>s" or "different types like ... and some other"
>forms? (In this case task does not require any intelligence as well.)
>

no, again I agree that if it were like this it would be very easy.
I hope this clarifies things a bit. Can you see what you can do?

tnx
ab talebi
From: Janis Dzerins
Subject: Re: seperating words
Date: 
Message-ID: <873d4ins8k.fsf@asaka.latnet.lv>
············@yahoo.com (ab talebi) writes:

> On 17 Oct 2001 17:50:55 +0300, Janis Dzerins <·····@latnet.lv> wrote:
> 
> >Ok. The only problem is -- how do you know which are the "example
> >words"? Is this the kind of intelligence that needs to be
> >implemented?
> 
> well, may be intelligence is not the word, but "example words" are
> seperated by commas (they have a comma after them except the last
> one) but if the last one is a bit difficult we can say that "example
> words" are seperated by commas

Well -- if this is all that's needed then it is too simple to be a
real problem. Are you sure this is not [a part of] your homework?

> >Are the examples specified using only "we have ... and other
> ><main-word>s" or "different types like ... and some other" forms?
> >(In this case task does not require any intelligence as well.)
> 
> no, again I agree that if it would be like this it would be very
> easy.  I hope this was clarifing a bit.

If only the words separated by commas must be extracted then the
problem is just as easy. (Very easy, as you say :)

> Can you see what you can do?

First of all -- you can use your favorite programming language to do
the task since it is all about finding colons, commas and spaces and
substring copying.

And I can tell you to go to
http://www.xanalys.com/software_tools/reference/HyperSpec/FrontMatter/index.html
and look up the entries for the POSITION and SUBSEQ functions, although
this won't help if you don't know how to define a function in Lisp (some
tutorials and introductory texts are available on the net, though).
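
As a rough illustration of the "finding colons and substring copying"
part, here is a minimal sketch using only POSITION and SUBSEQ (plus
STRING-LEFT-TRIM to drop the space after the colon). The name
SPLIT-AT-COLON is made up for this example; it is not a standard
function.

```lisp
;; Hypothetical helper (not a standard function): split "head: rest"
;; at the first colon.
(defun split-at-colon (line)
  "Return two values: the text before the first colon in LINE and the
text after it with leading spaces removed.  If LINE contains no colon,
return LINE and NIL."
  (let ((colon (position #\: line)))
    (if colon
        (values (subseq line 0 colon)
                (string-left-trim " " (subseq line (1+ colon))))
        (values line nil))))
```

Called on "animal: we have cats, dogs, monkey and other animals" this
should give "animal" as the first value and the rest of the line as the
second.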

-- 
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcdb78e.6570823@news.uio.no>
On 17 Oct 2001 19:31:07 +0300, Janis Dzerins <·····@latnet.lv> wrote:

>············@yahoo.com (ab talebi) writes:
>
>> On 17 Oct 2001 17:50:55 +0300, Janis Dzerins <·····@latnet.lv> wrote:
>> 
>> >Ok. The only problem is -- how do you know which are the "example
>> >words"? Is this the kind of intelligence that needs to be
>> >implemented?
>> 
>> well, may be intelligence is not the word, but "example words" are
>> seperated by commas (they have a comma after them except the last
>> one) but if the last one is a bit difficult we can say that "example
>> words" are seperated by commas

you see, the words are separated by commas except the last one, which
has no comma after it, so it will be a tricky problem. Can you think of
any solution for that?

ab talebi
From: Kent M Pitman
Subject: Re: seperating words
Date: 
Message-ID: <sfwpu7m2o61.fsf@world.std.com>
············@yahoo.com (ab talebi) writes:

> 
> On 17 Oct 2001 19:31:07 +0300, Janis Dzerins <·····@latnet.lv> wrote:
> 
> >············@yahoo.com (ab talebi) writes:
> >
> >> On 17 Oct 2001 17:50:55 +0300, Janis Dzerins <·····@latnet.lv> wrote:
> >> 
> >> >Ok. The only problem is -- how do you know which are the "example
> >> >words"? Is this the kind of intelligence that needs to be
> >> >implemented?
> >> 
> >> well, may be intelligence is not the word, but "example words" are
> >> seperated by commas (they have a comma after them except the last
> >> one) but if the last one is a bit difficult we can say that "example
> >> words" are seperated by commas
> 
> you see the words are seperated by commas except the last one that has
> no comma after it so it will be a tricky problem. Can you think of any
> solution for that?

Yes.  The operator OR.
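
One possible reading of this hint (an assumption; the post does not
spell it out): a word ends at the next comma OR, when there is no
further comma, at the end of the string. The helper name FIELD-END is
invented for this sketch.

```lisp
;; Hypothetical helper: find where the field that begins at START ends.
(defun field-end (string start)
  "Return the index of the first comma at or after START, or the length
of STRING when no comma follows."
  ;; POSITION returns NIL when nothing is found, so OR falls through
  ;; to the second form.
  (or (position #\, string :start start)
      (length string)))
```

With this, the last field needs no special case: (subseq string start
(field-end string start)) works for every field.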
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3bcdbc3e.7770922@news.uio.no>
On Wed, 17 Oct 2001 17:04:38 GMT, Kent M Pitman <······@world.std.com>
wrote:

>············@yahoo.com (ab talebi) writes:
>
>> 
>> On 17 Oct 2001 19:31:07 +0300, Janis Dzerins <·····@latnet.lv> wrote:
>> 
>> >············@yahoo.com (ab talebi) writes:
>> >
>> >> On 17 Oct 2001 17:50:55 +0300, Janis Dzerins <·····@latnet.lv> wrote:
>> >> 
>> >> >Ok. The only problem is -- how do you know which are the "example
>> >> >words"? Is this the kind of intelligence that needs to be
>> >> >implemented?
>> >> 
>> >> well, may be intelligence is not the word, but "example words" are
>> >> seperated by commas (they have a comma after them except the last
>> >> one) but if the last one is a bit difficult we can say that "example
>> >> words" are seperated by commas
>> 
>> you see the words are seperated by commas except the last one that has
>> no comma after it so it will be a tricky problem. Can you think of any
>> solution for that?
>
>Yes.  The operator OR.

but how do we know if it's the last comma?
From: V. R. Puckett
Subject: Re: seperating words
Date: 
Message-ID: <b32669e9lbo.fsf@w4pphx2t.us.nortel.com>
············@yahoo.com (ab talebi) writes:
> but how do we know if it's the last comma?

A conceptually simple, but far from optimal, approach to your problem
would be to divide your string up into substrings.  Take this string,
for example:

    "deal with the first, second, third, fourth items"

What if you divided this string into substrings wherever there was a
comma?  That would give you four substrings in this case:

    1.  "deal with the first"
    2.  "second"
    3.  "third"
    4.  "fourth items"

Strings 2 and 3 are in the correct form.  Strings 1 and 4 are not.
For string 1, you only want the last word in the string.  For string
4, you only want the first word.

Based on this example, you can generalize your approach into an
algorithm:

    Make a list of substrings that split the original string
    wherever it has a comma.

    If there are two or more substrings...
        In the first substring, chomp off everything except the last word.
        In the last substring, chomp off everything except the first word.

This should be relatively easy to code.  What to do if there is just
one substring after the comma split is left as an exercise for the
reader.
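
The steps above might be sketched as follows. All four function names
are invented for this example, and the single-substring case simply
returns NIL here rather than guessing at the exercise's answer.

```lisp
;; Hypothetical helpers sketching the comma-split approach.

(defun split-on-commas (string)
  "Return the list of substrings of STRING that lie between commas."
  (loop with start = 0
        for comma = (position #\, string :start start)
        collect (subseq string start comma) ; COMMA = NIL means "to the end"
        while comma
        do (setf start (1+ comma))))

(defun first-word (string)
  "Return the first space-delimited word of STRING."
  (let* ((s (string-trim " " string))
         (space (position #\Space s)))
    (subseq s 0 space)))

(defun last-word (string)
  "Return the last space-delimited word of STRING."
  (let* ((s (string-trim " " string))
         (space (position #\Space s :from-end t)))
    (subseq s (if space (1+ space) 0))))

(defun comma-words (string)
  "Split STRING at commas.  With two or more pieces, keep the last word
of the first piece, the middle pieces trimmed of spaces, and the first
word of the last piece."
  (let ((pieces (split-on-commas string)))
    (if (rest pieces)
        (append (list (last-word (first pieces)))
                (mapcar (lambda (p) (string-trim " " p))
                        (butlast (rest pieces)))
                (list (first-word (car (last pieces)))))
        ;; Only one piece: left undecided in this sketch.
        nil)))
```

For the example string, (comma-words "deal with the first, second,
third, fourth items") should return ("first" "second" "third"
"fourth"). Middle pieces are kept whole, so a multi-word middle piece
stays multi-word.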


-- 
V. Z. Puckett                              replace "sendnospam" with "puckett"
From: Edward Fagan
Subject: Re: seperating words
Date: 
Message-ID: <ey3sncigl7e.fsf@dom.ain>
* ab talebi wrote:

> but how do we know if it's the last comma?

Well you can compare its position with that of the last comma in the
string.  Something like this function will tell you the position of
the last comma in a string.

(defun find-last-comma (s)
  (loop for n from 0 below (length s)
      if (char= (aref s n) #\,)
      do
	(if (and (loop for n downfrom (1- (length s)) to 0
		     if (char= (aref n s) #\,)
		     return n
		     finally (return nil))
		 (= n (loop for n downfrom (1- (length s)) to 0
			  if (char= (aref s n) #\,)
			  return n
			  finally (return nil))))
	    (return-from find-last-coma n))
      finally (return nil)))
From: Janis Dzerins
Subject: Re: seperating words
Date: 
Message-ID: <87y9m9mk36.fsf@asaka.latnet.lv>
Edward Fagan <··@dom.ain> writes:

> * ab talebi wrote:
> 
> > but how do we know if it's the last comma?
> 
> Well you can compare its position with that of the last comma in the
> string.  Something like this function will tell you the position of
> the last comma in a string.
> 
> (defun find-last-comma (s)
>   (loop for n from 0 below (length s)
>       if (char= (aref s n) #\,)
>       do
> 	(if (and (loop for n downfrom (1- (length s)) to 0
> 		     if (char= (aref n s) #\,)
> 		     return n
> 		     finally (return nil))
> 		 (= n (loop for n downfrom (1- (length s)) to 0
> 			  if (char= (aref s n) #\,)
> 			  return n
> 			  finally (return nil))))
> 	    (return-from find-last-coma n))
>       finally (return nil)))

How about (position #\, string :from-end t)? Which one would you like
to maintain?
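
For comparison, a small sketch of the POSITION version wrapped in
functions. The names LAST-COMMA-POSITION and LAST-COMMA-P are made up
for this example; POSITION and its :FROM-END keyword are standard.

```lisp
;; Hypothetical wrappers around the standard POSITION function.

(defun last-comma-position (string)
  "Return the index of the last comma in STRING, or NIL if there is none."
  (position #\, string :from-end t))

(defun last-comma-p (string index)
  "Return true if INDEX is the position of the last comma in STRING."
  (eql index (last-comma-position string)))
```

(last-comma-position "a, b, c") returns 4, and (last-comma-position
"none") returns NIL.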

-- 
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <ey33d4hgvch.fsf@cley.com>
* Janis Dzerins wrote:
> How about (position #\, string :from-end t)? Which one would you like
> to maintain?

Well, I can't really speak for Edward, but I rather suspect that those
inner loops in his solution were meant to be some kind of pointer to
a better answer than his.  Perhaps he doesn't know about POSITION and
does it with descending LOOPs like that.

--tim
From: Martin Simmons
Subject: Re: seperating words
Date: 
Message-ID: <1003432140.197155@itn.cam.harlequin.co.uk>
"Edward Fagan" <··@dom.ain> wrote in message ····················@dom.ain...
> * ab talebi wrote:
> 
> > but how do we know if it's the last comma?
> 
> Well you can compare its position with that of the last comma in the
> string.  Something like this function will tell you the position of
> the last comma in a string.

That is a work of genius :-)
-- 
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3be0034c.10897094@news.uio.no>
On 17 Oct 2001 19:44:53 +0100, Edward Fagan <··@dom.ain> wrote:

>* ab talebi wrote:
>
>> but how do we know if it's the last comma?
>
>Well you can compare its position with that of the last comma in the
>string.  Something like this function will tell you the position of
>the last comma in a string.
>
>(defun find-last-comma (s)
>  (loop for n from 0 below (length s)
>      if (char= (aref s n) #\,)
>      do
>	(if (and (loop for n downfrom (1- (length s)) to 0
>		     if (char= (aref n s) #\,)
>		     return n
>		     finally (return nil))
>		 (= n (loop for n downfrom (1- (length s)) to 0
>			  if (char= (aref s n) #\,)
>			  return n
>			  finally (return nil))))
>	    (return-from find-last-coma n))
>      finally (return nil)))

this doesn't work because in the second-to-last line you say
(return-from find-last-coma, which is missing one m

so the function will not work as it is, but my question is: how is it
used? I tried with:
(find-last-comma '(hello, world))
and I got:
Error: A comma appears outside the scope of a backquote (or there are
too many commas).
From: Martti Halminen
Subject: Re: seperating words
Date: 
Message-ID: <3BE039C1.5E0DF7AA@kolumbus.fi>
ab talebi wrote:
>
> >Well you can compare its position with that of the last comma in the
> >string.  Something like this function will tell you the position of
> >the last comma in a string.
> >
> >(defun find-last-comma (s)
> >  (loop for n from 0 below (length s)
> >      if (char= (aref s n) #\,)
> >      do
> >       (if (and (loop for n downfrom (1- (length s)) to 0
> >                    if (char= (aref n s) #\,)
> >                    return n
> >                    finally (return nil))
> >                (= n (loop for n downfrom (1- (length s)) to 0
> >                         if (char= (aref s n) #\,)
> >                         return n
> >                         finally (return nil))))
> >           (return-from find-last-coma n))
> >      finally (return nil)))
> 
> this doensn't work because in the line before the last line  you say
> (return-from find-last-coma, here it misses one m
> 
> so the function will not work as it is, but my question is what is the
> usage? I tried with:
> (find-last-comma '(hello, world))
> and I got:
> Error: A comma appears outside the scope of a backquote (or there are
> too many commas).

This would work rather better if you used it as its writer suggested and
gave it a string instead of a (malformed) list.

So try to teach yourself the difference between lists and strings. A
look at sequences might prove useful, too...
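
A small sketch of the difference, with made-up variable names. A string
is written in double quotes and may contain commas freely; inside a
quoted list the reader treats a comma as unquote syntax, which is only
valid within a backquote, hence the error above. Both strings and lists
are sequences, so functions such as LENGTH and POSITION accept either.

```lisp
;; Hypothetical variables, for illustration only.
(defparameter *as-string* "hello, world")  ; a string of 12 characters
(defparameter *as-list* '(hello world))    ; a list of 2 symbols

(stringp *as-string*)        ; true
(listp *as-list*)            ; true

;; Sequence functions work on both kinds of object.
(length *as-string*)         ; 12
(length *as-list*)           ; 2
(position #\, *as-string*)   ; 5, the index of the comma character
(position 'world *as-list*)  ; 1, the index of the symbol WORLD
```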

--
From: ab talebi
Subject: Re: seperating words
Date: 
Message-ID: <3be12079.83913352@news.uio.no>
<...>
>> >(defun find-last-comma (s)
>> >  (loop for n from 0 below (length s)
>> >      if (char= (aref s n) #\,)
>> >      do
>> >       (if (and (loop for n downfrom (1- (length s)) to 0
>> >                    if (char= (aref n s) #\,)
>> >                    return n
>> >                    finally (return nil))
>> >                (= n (loop for n downfrom (1- (length s)) to 0
>> >                         if (char= (aref s n) #\,)
>> >                         return n
>> >                         finally (return nil))))
>> >           (return-from find-last-coma n))
>> >      finally (return nil)))
>> 
>This would work rather better if you used it as its writer suggested and
>gave it a string instead of a (malformed) list.
>

you mean like this:
(find-last-comma "hello, world")

Error: In a call to AREF: 11 is not of type ARRAY.
From: Martti Halminen
Subject: Re: seperating words
Date: 
Message-ID: <3BE1D1C9.45E9DA3C@kolumbus.fi>
ab talebi wrote:

> >> >(defun find-last-comma (s)
> >> >  (loop for n from 0 below (length s)
> >> >      if (char= (aref s n) #\,)
> >> >      do
> >> >       (if (and (loop for n downfrom (1- (length s)) to 0
> >> >                    if (char= (aref n s) #\,)
> >> >                    return n
> >> >                    finally (return nil))
> >> >                (= n (loop for n downfrom (1- (length s)) to 0
> >> >                         if (char= (aref s n) #\,)
> >> >                         return n
> >> >                         finally (return nil))))
> >> >           (return-from find-last-coma n))
> >> >      finally (return nil)))

> >This would work rather better if you used it as its writer suggested and
> >gave it a string instead of a (malformed) list.

> you mean like this:
> (find-last-comma "hello, world")

> Error: In a call to AREF: 11 is not of type ARRAY.


Oh well, seems to have a few bugs; the original writer possibly didn't
test this, either. Try (aref s n) instead of (aref n s), and add an m to
"coma".
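
With just those two changes applied (and nothing else altered apart
from indentation), the posted function would look like this. It is
still far more roundabout than it needs to be.

```lisp
;; The function as posted, with (aref n s) changed to (aref s n) and
;; the RETURN-FROM name spelled FIND-LAST-COMMA.
(defun find-last-comma (s)
  (loop for n from 0 below (length s)
        if (char= (aref s n) #\,)
          do (if (and (loop for n downfrom (1- (length s)) to 0
                            if (char= (aref s n) #\,)
                              return n
                            finally (return nil))
                      ;; Is the comma at the outer N also the last comma?
                      (= n (loop for n downfrom (1- (length s)) to 0
                                 if (char= (aref s n) #\,)
                                   return n
                                 finally (return nil))))
                 (return-from find-last-comma n))
        finally (return nil)))
```

(find-last-comma "hello, world") should now return 5, the same answer
as (position #\, "hello, world" :from-end t).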

--
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <fbc0f5d1.0111020150.a7f5cc5@posting.google.com>
Martti Halminen <···············@kolumbus.fi> wrote in message news:<·················@kolumbus.fi>...

> Oh well, seems to have a few bugs; the original writer possibly didn't
> test this, either. Try (aref s n) instead of (aref n s), and add an m to
> "coma".

I have a suspicion that the original writer tested it and then
introduced a couple of bugs to make life more interesting.

--tim
From: Tim Bradshaw
Subject: Re: seperating words
Date: 
Message-ID: <fbc0f5d1.0111010724.11b6eb76@posting.google.com>
> 
> this doensn't work because in the line before the last line  you say
> (return-from find-last-coma, here it misses one m

And that's not the only bug, although it might be the only one the
compiler picks up.  I have this feeling that the idea is you're meant
to look at the code and understand it, and work out how it does what
it does, and how, maybe, you could do it a bit more simply.

--tim