Hi all, I'm trying to write a little program that searches through a
text for a specific word and then outputs the sentence containing the
word as well as the sentence before and the sentence after; for
example the little text below (stored in "c:/target.txt"):
I wandered lonely as a cloud
That floats on high over vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.
William Wordsworth
Now if I search for the word daffodils I should get as output:
> When all at once I saw a crowd, A host, of golden daffodils; Beside the lake, beneath the trees,
In a rather "lame" attempt I tried to write this:
(with-open-file (s-in "c:/target.txt" :direction :input)
  (do ((line1 (read-line s-in nil nil) (read-line s-in nil nil))
       (line2 (read-line s-in nil nil) (read-line s-in nil nil))
       (line3 (read-line s-in nil nil) (read-line s-in nil nil)))
      ((not line1))
    (setf cp1 (search "daffodils" line1 :test #'char-equal))
    (if (not (null cp1))
        (format t "~A ~A ~A~%"
                line1 line2 line3))
    (setf cp2 (search "daffodils" line2 :test #'char-equal))
    (if (not (null cp2))
        (format t "~A ~A ~A~%"
                line1 line2 line3)
        (setf cp3 (search "daffodils" line3 :test #'char-equal)))
    (if (not (null cp3))
        (format t "~A ~A ~A~%"
                line1 line2 line3))))
But the output is:
> A host, of golden daffodils; Beside the lake, beneath the trees, Fluttering and dancing in the breeze.
So it's outputting 3 sentences alright, but not the one before and the
one after (understandably because the function just stores 3 sentences
at a time).
Can someone help me modify this a little to get what I need (knowing
that the aim afterwards is to go through a large text where the word
can occur many times). Thanks
Something like this:
(defun search-text-file (&optional (filename "c:/text.txt")
                                   (text "daffodils"))
  (with-open-file (s filename :direction :input)
    (let (prev-line curr-line next-line)
      (loop
        (shiftf prev-line curr-line
                next-line (read-line s nil nil))
        (when (search text curr-line :test #'char-equal)
          (format t "~A ~A ~A~&"
                  prev-line curr-line next-line))
        (when (null next-line)
          (return))))))
On Oct 15, 3:48 pm, Francogrex <······@grex.org> wrote:
> Hi all, I'm trying to write a little program that searches through a
> text for a specific word and then outputs the sentence containing the
> word as well as the sentence before and the sentence after; for
> example the little text below (stored in "c:/target.txt"):
[ .. Snip .. ]
> In a rather "lame" attempt I tried to write this:
>
[.. Snipped code ..]
Not a bad first attempt.
> Can someone help me modify this a little to get what I need (knowing
> that the aim afterwards is to go through a large text where the word
> can occur many times). Thanks
Here's hoping this isn't homework, but since a first attempt was
posted... Here's a tail-recursive solution. Might be easier
iteratively, but this was just the first thing that came to mind.
(defun parse-lines (filename search-word)
  (with-open-file (s filename)
    (labels ((parse-next-line (prev-line prev-mark)
               (let* ((line (read-line s)) ; error on eof
                      (mark (search search-word line)))
                 (when (and prev-line mark (not prev-mark))
                   (print prev-line))
                 (when (or mark prev-mark)
                   (print line))
                 (parse-next-line line mark))))
      (ignore-errors
        (parse-next-line nil nil)))))
Core idea behind this solution:
- read each line one at a time, tracking the previous line read as well
- if I have "the mark" and the previous line didn't, print the previous
  line
- if I have "the mark" or the previous line did, print this line
- parse the next line with the new line update
- error on eof, but we don't care since we're just outputting text
HTH,
Jeff M.
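The core idea above can also be sketched iteratively. This is only a sketch under stated assumptions, not the code from the post: the name PRINT-CONTEXT-LINES, the stream arguments, and the extra PREV-PRINTED flag (which keeps a line from being printed twice when two matches are two lines apart) are hypothetical additions. Reading with (read-line in nil nil) returns NIL at end of file, so no error handling is needed, and a plain loop does not depend on tail-call optimization, which the Common Lisp standard does not guarantee.

```lisp
(defun print-context-lines (in word &optional (out *standard-output*))
  "Write to OUT every line of stream IN that contains WORD,
together with the line before it and the line after it."
  (let ((prev-line nil)       ; the line read just before LINE
        (prev-mark nil)       ; did PREV-LINE contain WORD?
        (prev-printed nil))   ; has PREV-LINE already been written?
    (loop for line = (read-line in nil nil)   ; NIL at end of file
          while line
          do (let ((mark (search word line :test #'char-equal)))
               ;; A match pulls in the previous line, unless it is already out.
               (when (and mark prev-line (not prev-printed))
                 (write-line prev-line out))
               ;; Print this line if it matches or if it follows a match.
               (let ((print-this (or mark prev-mark)))
                 (when print-this
                   (write-line line out))
                 (setf prev-line line
                       prev-mark mark
                       prev-printed print-this))))))

;; Possible use, assuming the file from the original question exists:
;; (with-open-file (in "c:/target.txt")
;;   (print-context-lines in "daffodils"))
```

Taking a stream rather than a file name means the same function can be tried on a string with WITH-INPUT-FROM-STRING.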
On Oct 15, 1:48 pm, Francogrex <······@grex.org> wrote:
> Hi all, I'm trying to write a little program that searches through a
> text for a specific word and then outputs the sentence containing the
> word as well as the sentence before and the sentence after; for
> example the little text below (stored in "c:/target.txt"):
>
> I wandered lonely as a cloud
> That floats on high over vales and hills,
> When all at once I saw a crowd,
> A host, of golden daffodils;
> Beside the lake, beneath the trees,
> Fluttering and dancing in the breeze.
> William Wordsworth
>
> Now if I search for the word daffodils I should get as output:
>
> > When all at once I saw a crowd, A host, of golden daffodils; Beside the lake, beneath the trees,
This is much easier in Emacs Lisp.
Here's code that does what you want, quickly written in 11 minutes.
(Writing this post took more than 11 minutes.)
(defun my-search (file word)
  "search `file' and return adjacent lines containing the `word'.
`file' is a file path. `word' is a string."
  (interactive)
  (let (neighborLines p1 p2)
    (find-file file)
    (while (search-forward word nil t)
      (save-excursion
        (move-beginning-of-line 1)
        (previous-line)
        (setq p1 (point))
        (next-line 2)
        (move-end-of-line 1)
        (setq p2 (point))
        (setq neighborLines (buffer-substring-no-properties p1 p2))
        (print neighborLines)))))
(my-search "xx.el" "daffodils")
As you can see, the code is geared toward text processing, so you
don't have to deal with nitty-gritty details. (For example, it has
primitives that deal with lines, sentences, and syntax, and it handles file
encoding, backup, access, permissions, etc. all automatically.)
Basically, the elisp system is the best text processing language, even
more powerful than Perl.
See also:
• Text Processing: Emacs Lisp vs Perl
http://xahlee.org/emacs/elisp_text_processing_lang.html
• Generate a Web Links Report with Emacs Lisp
http://xahlee.org/emacs/elisp_link_report.html
Xah
∑ http://xahlee.org/
☄
On Wed, 15 Oct 2008 16:27:32 -0700 (PDT) ·······@gmail.com" <······@gmail.com> wrote:
[...]
xc> Basically, elisp system is the best text processing language, even
xc> more powerful than Perl.
For this particular problem, "grep -C 1 daffodils textfile" would be the
best solution. It would be significantly faster than ELisp or Perl,
especially for large files.
The equivalent in Perl, which would work on any size file (like the grep
solution):
perl -n -e '@out[0,1,2] = ($out[1], $out[2], $_); print @out if $out[1] =~ m/daffodils/;' FILE1 FILE2 ...
I strongly disagree ELisp is the best text processing language. It's
generally OK for small files, but really too slow and memory-intensive
for more demanding tasks. Perl is generally a good compromise between
memory usage, speed, ease of development, and features for any file size.
Ted
Ted Zlatanov <···@lifelogs.com> wrote:
>On Wed, 15 Oct 2008 16:27:32 -0700 (PDT) ·······@gmail.com" <······@gmail.com> wrote:
>
>xc> On Oct 15, 1:48 pm, Francogrex <······@grex.org> wrote:
>>> Hi all, I'm trying to write a little program that searches through a
>>> text for a specific word and then outputs the sentence containing the
>>> word as well as the sentence before and the sentence after; for
>>> example the little text below (stored in "c:/target.txt"):
>>>
>>> I wandered lonely as a cloud
>>> That floats on high over vales and hills,
>>> When all at once I saw a crowd,
>>> A host, of golden daffodils;
>>> Beside the lake, beneath the trees,
>>> Fluttering and dancing in the breeze.
>>> William Wordsworth
>>>
>>> Now if I search for the word daffodils I should get as output:
>>>
>>> > When all at once I saw a crowd, A host, of golden daffodils; Beside the lake, beneath the trees,
You seem to have an uncommon definition of 'sentence'. To me the whole
poem is one single sentence.
>xc> this is much easier in emacs lisp.
[...]
>xc> as you can see, the code is geared toward text processing, so you
>xc> don't have to deal with nitty-gritty details. (for example, it has
>xc> primitives that deals with lines, sentences, syntax, and handles file
>xc> encoding, backup, access, permissions etc all automatically)
>
>xc> Basically, elisp system is the best text processing language, even
>xc> more powerful than Perl.
Oh, sorry for replying; I didn't see that this was just another bait from
our second-favourite troll.
jue
On Oct 16, 6:17 am, Ted Zlatanov <····@lifelogs.com> wrote:
> On Wed, 15 Oct 2008 16:27:32 -0700 (PDT) ·······@gmail.com" <······@gmail.com> wrote:
> [...]
> For this particular problem, "grep -C 1 daffodils textfile" would be the
> best solution. It would be significantly faster than ELisp or Perl,
> especially for large files.
>
> The equivalent in Perl, which would work on any size file (like the grep
> solution):
>
> perl -n -e '@out[0,1,2] = ($out[1], $out[2], $_); print @out if $out[1] =~ m/daffodils/;' FILE1 FILE2 ...
Nice code.
> I strongly disagree ELisp is the best text processing language.
> It's generally OK for small files, but really too slow and
> memory-intensive for more demanding tasks. Perl is generally a good
> compromise between memory usage, speed, ease of development, and
> features for any file size.
Thanks. In my experience, I feel similarly.
To summarize, I think that for small-scale tasks, elisp is far more
powerful and flexible for text processing (mostly due to its buffer
datatype, the associated functions (e.g. the point), and the few
thousand functions designed to manipulate text).
For larger-scale tasks, where a file is too large to be read into
memory comfortably, I think elisp is basically not usable.
Xah
∑ http://xahlee.org/
☄
>>>>> "xahlee" == xahlee <······@gmail.com> writes:
xahlee> this is much easier in emacs lisp.
[...]
xahlee> (my-search "xx.el" "daffodils")
Hi,
doesn't work in general: try a word in the first line, e.g. "cloud".
What about two matches in one line? Should the program write the same output
twice?
I still don't know from the original example if we're talking about "lines" or
"sentences".
xahlee> as you can see, [...]
Why the crossposting?
Toto
PS: With emacs, you could use "grep" ;-)
--
Contact information and PGP key at
http://www.withouthat.org/~toto/contact.html
In six days the Lord created the heavens and the earth and all the
wonders therein. There are some of us who feel that He might have
taken just a little more time.
Friedman, Kinky (1993), A case of Lone Star. New York (Wings Books),
356
On Oct 16, 8:31 am, Thorsten Bonow <··············@post.rwth-aachen.de> wrote:
> I still don't know from the original example if we're talking about "lines" or
> "sentences".
Hi, the original idea was lines but would also be very good if it can
extract the sentences around the word of interest. Or even maybe in a
further development it would extract X (5, 10 etc...) number of words
to the left and to the right of the word of interest...
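The "X words to the left and to the right" variant could be sketched roughly as below. This is a minimal illustration, not code from the thread: the names SPLIT-WORDS and WORDS-IN-CONTEXT are hypothetical, words are assumed to be separated only by spaces, tabs and newlines, and a word counts as a match when it contains the search string (so "daffodils;" matches "daffodils").

```lisp
(defun split-words (text)
  "Split TEXT on spaces, tabs and newlines; return a list of non-empty words."
  (let ((words '())
        (start nil))                    ; index where the current word began
    (dotimes (i (length text))
      (let ((whitespace-p (member (char text i) '(#\Space #\Tab #\Newline))))
        (cond ((and whitespace-p start) ; a word just ended
               (push (subseq text start i) words)
               (setf start nil))
              ((and (not whitespace-p) (not start)) ; a word just began
               (setf start i)))))
    (when start                         ; text ended in the middle of a word
      (push (subseq text start) words))
    (nreverse words)))

(defun words-in-context (text word n)
  "Return one list per occurrence of WORD in TEXT: up to N words before
the matching word, the matching word itself, and up to N words after it."
  (let* ((words (coerce (split-words text) 'vector))
         (total (length words)))
    (loop for i from 0 below total
          when (search word (aref words i) :test #'char-equal)
            collect (coerce (subseq words
                                    (max 0 (- i n))
                                    (min total (+ i n 1)))
                            'list))))

;; (words-in-context "A host, of golden daffodils; Beside the lake," "daffodils" 2)
;; => (("of" "golden" "daffodils;" "Beside" "the"))
```

This version holds the whole text in memory as a vector of words, which is fine for small inputs; for a large file the same window could be kept while reading line by line.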
Thorsten Bonow <··············@post.rwth-aachen.de> wrote:
> doesn't work in general: try a word in the first line, e.g. "cloud".
>
> What about two matches in one line? Should the program write the same output
> twice?
That example is a quick 10-minute hack to demo elisp on the given
problem, as indicated in my previous post. Another 5 minutes would
fix the problems you mention.
The point is that, for text processing tasks, elisp lets you do more
than an equal amount of time and experience with other languages such as
Common Lisp or Perl.
(See http://xahlee.org/emacs/elisp_text_processing_lang.html )
> Why the crossposting?
Because i think it is relevant. See:
Cross-posting & Language Factions
http://xahlee.org/Netiquette_dir/cross-post.html
Xah
∑ http://xahlee.org/
☄
Francogrex <······@grex.org> writes:
> Hi all, I'm trying to write a little program that searches through a
> text for a specific word and then outputs the sentence containing the
> word as well as the sentence before and the sentence after; for
> example the little text below (stored in "c:/target.txt"):
>
> I wandered lonely as a cloud
> That floats on high over vales and hills,
> When all at once I saw a crowd,
> A host, of golden daffodils;
> Beside the lake, beneath the trees,
> Fluttering and dancing in the breeze.
> William Wordsworth
>
>
> In a rather "lame" attempt I tried to write this:
OK. Let's do this with minimal changes to your original code.
The fundamental problem, which was noted by others and also yourself, is
that you are reading three lines at a time. You need to go down to one
line at a time and rotate the lines through your variables.
It's also considered not cool to set global variables, but in fact we
don't really need them. The only really tricky part of the changes
below is the code to figure out the loop termination conditions.
> (with-open-file (s-in "c:/target.txt" :direction :input)
> (do ((line1 (read-line s-in nil nil) (read-line s-in nil nil))
replace with (line1 nil line2)
> (line2 (read-line s-in nil nil) (read-line s-in nil nil))
replace with (line2 (read-line s-in nil nil) line3)
> (line3 (read-line s-in nil nil) (read-line s-in nil nil)))
leave as is
> ((not line1))
replace with ((not line2))
> (setf cp1 (search "daffodils" line1 :test #'char-equal))
> (if (not (null cp1))
> (format t "~A ~A ~A~%"
> line1 line2 line3))
These lines above are a bit excessively complicated. First off, we
don't really need a variable named CP1 at all. Second of all, we only
want to test LINE2. Third, (NOT (NULL ...)) is redundant and can be
eliminated. Fourth, the formatting would look better with more newlines
in it. Fifth, many people don't like using IF with no ELSE clause, but
prefer to use WHEN instead. So replace the four lines above with the
simpler:
(when (search "daffodils" line2 :test #'char-equal)
  (format t "~A~%~A~%~A~2%" line1 line2 line3))
and then delete all of this from here vvvvv
> (setf cp2 (search "daffodils" line2 :test #'char-equal))
> (if (not (null cp2))
> (format t "~A ~A ~A~%"
> line1 line2 line3)
> (setf cp3 (search "daffodils" line3 :test #'char-equal)))
> (if (not (null cp3))
> (format t "~A ~A ~A~%"
> line1 line2 line3))
to here ^^^^^^^^^^^^, but keep the closing parentheses
))
So, the final, consolidated routine would look like:
(defvar *source*
"I wandered lonely as a cloud
That floats on high over vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.
William Wordsworth")
(with-input-from-string (s-in *source*)
  (do ((line1 nil line2)
       (line2 (read-line s-in nil nil) line3)
       (line3 (read-line s-in nil nil) (read-line s-in nil nil)))
      ((not line2))
    (when (search "daffodils" line2 :test #'char-equal)
      (format t "~A~%~A~%~A~%~%"
              line1 line2 line3))))
For your next assignment, you should turn this into a function that
takes both a stream argument for the input and the word to search for.
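One possible shape for that function is sketched below. It is only an illustration: the name PRINT-CONTEXT and the use of the ~@[...~] directive are assumptions, not part of the post above. The DO loop is the same three-variable rotation; the conditional directive prints a neighbouring line only when it is non-NIL, so a match on the first or last line does not print the text "NIL".

```lisp
(defun print-context (s-in word)
  "Print each line of stream S-IN that contains WORD, preceded by the
line before it and followed by the line after it, when those exist."
  (do ((line1 nil line2)                       ; previous line
       (line2 (read-line s-in nil nil) line3)  ; line being tested
       (line3 (read-line s-in nil nil) (read-line s-in nil nil))) ; next line
      ((not line2))                            ; stop when the lines run out
    (when (search word line2 :test #'char-equal)
      ;; ~@[...~] processes its clause only if the argument is non-NIL.
      (format t "~@[~A~%~]~A~%~@[~A~%~]~%" line1 line2 line3))))

;; Possible use with the *source* string defined above:
;; (with-input-from-string (s-in *source*)
;;   (print-context s-in "daffodils"))
```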
--
Thomas A. Russ, USC/Information Sciences Institute
On Oct 16, 2:26 am, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> OK. Let's do this with minimal changes to your original code.
> The fundamental problem, which was noted by others and also yourself, is
> that you are reading three lines at a time. You need to go down to one
> lines at a time and rotate the lines through your variables. <snipped for space>.
Thanks everyone for very good suggestions, they work great. The idea
is to slowly build a simple textmining application; once I'm there (if
ever) I'll post the code online to share with all.
On 15.10.2008 Francogrex <······@grex.org> wrote:
> Hi all, I'm trying to write a little program that searches through a
> text for a specific word and then outputs the sentence containing the
> word as well as the sentence before and the sentence after
Presuming you meant "line" instead of "sentence", this seems to be
an ideal candidate for using ITERATE, which allows to express the
task in a very idiomatic and clear way:
(defun search-in-file (file text)
  (with-open-file (f file)
    (iter (for line3 = (read-line f nil))
          (for line2 previous line3)
          (for line1 previous line2)
          (when (search text line2)
            (collect (list line1 line2 line3)))
          (unless (or line1 line2 line3)
            (finish)))))
(search-in-file "x.txt" "daffodils")
==> (("When all at once I saw a crowd," "A host, of golden daffodils;"
      "Beside the lake, beneath the trees,"))
And it works reasonably for words occurring in the first and last lines, too
(returning NIL where appropriate):
(search-in-file "x.txt" "cloud")
==> ((NIL "I wandered lonely as a cloud"
      "That floats on high over vales and hills,"))
(search-in-file "x.txt" "Wordsworth")
==> (("Fluttering and dancing in the breeze." "William Wordsworth" NIL))
--
Daniel 'Nathell' Janus, ······@nathell.korpus.pl, http://korpus.pl/~nathell
- Why should a person keep silent?
- To hear the melodies of the people around them.
[Rok diabła]
On Oct 15, 11:53 pm, Daniel Janus <············@nathell.korpus.pl>
wrote:
> Presuming you meant "line" instead of "sentence", this seems to be
> an ideal candidate for using ITERATE, which allows to express the
> task in a very idiomatic and clear way:
> [...]
(defun search-in-stream (stream text)
  (loop for line1 = nil then line2
        for line2 = nil then line3
        for line3 = (read-line stream nil)
        when (search text line2) collect (list line1 line2 line3)
        while line3))
(defun test ()
  (with-input-from-string (stream "I wandered lonely as a cloud
That floats on high over vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.
William Wordsworth")
    (search-in-stream stream "daffodils")))
From: William James
Subject: Re: Textmining with Lisp
Date:
Message-ID: <gd70vk$rd4$1@aioe.org>
······@corporate-world.lisp.de wrote:
> (defun search-in-stream (stream text)
> (loop for line1 = nil then line2
> for line2 = nil then line3
> for line3 = (read-line stream nil)
> when (search text line2) collect (list line1 line2 line3)
> while line3))
>
> (defun test ()
> (with-input-from-string (stream "I wandered lonely as a cloud
> That floats on high over vales and hills,
> When all at once I saw a crowd,
> A host, of golden daffodils;
> Beside the lake, beneath the trees,
> Fluttering and dancing in the breeze.
> William Wordsworth")
> (search-in-stream stream "daffodils")))
Instead of line1, line2, line3, I'll use a list named lines.
Ruby:
def scan_text source, sought
  coll, lines = [], [ nil, nil, nil ]
  begin lines.push( source.gets ).shift
    coll << lines.dup if lines[1] =~ sought
  end while lines.last
  coll
end
p scan_text( DATA, /Poe/ )
__END__
There open fanes and gaping graves
Yawn level with the luminous waves;
But not the riches there that lie
In each idol's diamond eye --
Not the gaily-jewelled dead
Tempt the waters from their bed;
For no ripples curl, alas!
Along that wilderness of glass --
No swellings tell that winds may be
Upon some far-off happier sea --
No heavings hint that winds have been
On seas less hideously serene.
Poe
--- output ---
[["On seas less hideously serene.\n", "Poe\n", nil]]
In article <············@aioe.org>,
"William James" <·········@yahoo.com> wrote:
> Instead of line1, line2, line3, I'll use a list named lines.
>
> Ruby:
>
> def scan_text source, sought
>   coll, lines = [], [ nil, nil, nil ]
>   begin lines.push( source.gets ).shift
>     coll << lines.dup if lines[1] =~ sought
>   end while lines.last
>   coll
> end
>
> p scan_text( DATA, /Poe/ )
Sure, why not.
Well, actually I'd use functional abstraction,
an arbitrarily large context, and a vector to hold the context.
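(defun filter-lines-with-context (stream filter context-length)
  ;; FILTER is called with the middle line and the whole context vector;
  ;; a copy of the vector is collected whenever it returns true.
  (check-type context-length (integer 1 *))
  (loop with context = (make-array (1+ (* 2 context-length)) :initial-element nil)
        with middle-index = context-length
        with last-index = (* 2 context-length)
        for line = (read-line stream nil nil)
        ;; After end of file, keep shifting in NILs until the lines still
        ;; waiting behind the middle slot have been tested as well.
        while (or line (find-if #'identity context :start (1+ middle-index)))
        ;; Shift the window left by one and append the new line (or NIL).
        do (replace context context :start2 1)
           (setf (aref context last-index) line)
        when (funcall filter (aref context middle-index) context)
          collect (copy-seq context)))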
(defun test (text string)
  (with-input-from-string (stream text)
    (filter-lines-with-context
     stream
     (lambda (line context)
       (declare (ignore context))
       (search string line))
     1)))
(test "I wandered lonely as a cloud
That floats on high over vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.
William Wordsworth"
"daffodils")
>
> __END__
> There open fanes and gaping graves
> Yawn level with the luminous waves;
> But not the riches there that lie
> In each idol's diamond eye --
> Not the gaily-jewelled dead
> Tempt the waters from their bed;
> For no ripples curl, alas!
> Along that wilderness of glass --
> No swellings tell that winds may be
> Upon some far-off happier sea --
> No heavings hint that winds have been
> On seas less hideously serene.
> Poe
>
>
> --- output ---
> [["On seas less hideously serene.\n", "Poe\n", nil]]
--
http://lispm.dyndns.org/
From: William James
Subject: Re: Textmining with Lisp
Date:
Message-ID: <gd60sp$pnc$1@aioe.org>
Francogrex wrote:
> Hi all, I'm trying to write a little program that searches through a
> text for a specific word and then outputs the sentence containing the
> word as well as the sentence before and the sentence after; for
> example the little text below (stored in "c:/target.txt"):
>
> I wandered lonely as a cloud
> That floats on high over vales and hills,
> When all at once I saw a crowd,
> A host, of golden daffodils;
> Beside the lake, beneath the trees,
> Fluttering and dancing in the breeze.
> William Wordsworth
>
> Now if I search for the word daffodils I should get as output:
> > When all at once I saw a crowd, A host, of golden daffodils;Beside
> > the lake, beneath the trees,
>
> In a rather "lame" attempt I tried to write this:
> (with-open-file (s-in "c:/target.txt" :direction :input)
> (do ((line1 (read-line s-in nil nil) (read-line s-in nil nil))
> (line2 (read-line s-in nil nil) (read-line s-in nil nil))
> (line3 (read-line s-in nil nil) (read-line s-in nil nil)))
> ((not line1))
> (setf cp1 (search "daffodils" line1 :test #'char-equal))
> (if (not (null cp1))
> (format t "~A ~A ~A~%"
> line1 line2 line3))
> (setf cp2 (search "daffodils" line2 :test #'char-equal))
> (if (not (null cp2))
> (format t "~A ~A ~A~%"
> line1 line2 line3)
> (setf cp3 (search "daffodils" line3 :test #'char-equal)))
> (if (not (null cp3))
> (format t "~A ~A ~A~%"
> line1 line2 line3))))
> But the output is:
> > A host, of golden daffodils; Beside the lake, beneath the trees,
> > Fluttering and dancing in the breeze.
>
> So it's outputting 3 sentences alright, but not the one before and the
> one after (understandably because the function just stores 3 sentences
> at a time).
Ruby:

def search_file filename, sought
  puts IO.read(filename).
    scan( /^.*\n?.*#{ Regexp.escape( sought ) }.*\n?.*$/ ).
    join("\n----\n")
end

search_file "data1", " and "

I wandered lonely as a cloud
That floats on high over vales and hills,
When all at once I saw a crowd,
----
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.
William Wordsworth

search_file "data1", "cloud"

I wandered lonely as a cloud
That floats on high over vales and hills,

search_file "data1", "Words"

Fluttering and dancing in the breeze.
William Wordsworth
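
One thing to keep in mind with the regexp version: scan resumes after the end of the previous match, so hits on neighbouring lines get folded into a single group. A small sketch of that, assuming the poem sits in a string rather than in "data1" (POEM and context_groups are names made up for this illustration, they are not part of the code above):

```ruby
# POEM and context_groups are illustrative names, not from the thread.
POEM = <<~TEXT
  I wandered lonely as a cloud
  That floats on high over vales and hills,
  When all at once I saw a crowd,
  A host, of golden daffodils;
  Beside the lake, beneath the trees,
  Fluttering and dancing in the breeze.
  William Wordsworth
TEXT

# Same regular expression as in search_file, applied to a string
# instead of the contents of a file.
def context_groups(text, sought)
  text.scan(/^.*\n?.*#{Regexp.escape(sought)}.*\n?.*$/)
end

groups = context_groups(POEM, " the ")
# Lines 5 and 6 both contain " the ", yet only one group comes back,
# because the first match already consumed both of them.
puts groups.size
puts groups
```

Whether that folding is wanted depends on the use; if every hit should come with its own neighbours, the lines have to be indexed rather than scanned.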
def search_file_overlapping filename, sought, a = []
  last = ( lines = IO.readlines(filename) ).size - 1
  lines.each_with_index{|str,i|
    if str.index( sought )
      a << lines[ [i-1,0].max .. [i+1,last].min ]
    end }
  a
end

p search_file_overlapping( "data1", " the " )
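
The clamped slice in the function above is not tied to one line of context. A rough sketch of the same idea with the amount of context as a parameter, working on an array of lines instead of a file (POEM_LINES, search_with_context and its context argument are hypothetical names used only for illustration):

```ruby
# POEM_LINES and search_with_context are illustrative names, not from the thread.
POEM_LINES = [
  "I wandered lonely as a cloud",
  "That floats on high over vales and hills,",
  "When all at once I saw a crowd,",
  "A host, of golden daffodils;",
  "Beside the lake, beneath the trees,",
  "Fluttering and dancing in the breeze.",
  "William Wordsworth"
].freeze

# For every line containing `sought`, return that line together with up to
# `context` lines before and after it, clamped to the bounds of the array.
def search_with_context(lines, sought, context = 1)
  last = lines.size - 1
  lines.each_index.select { |i| lines[i].include?(sought) }.map do |i|
    first = [i - context, 0].max
    stop  = [i + context, last].min
    lines[first..stop]
  end
end

p search_with_context(POEM_LINES, "daffodils", 2)
```

With a context of 2 the single hit on "daffodils" comes back with two lines on each side; hits on neighbouring lines still produce separate, overlapping groups.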
[["A host, of golden daffodils;\n",
"Beside the lake, beneath the trees,\n",
"Fluttering and dancing in the breeze.\n"],
["Beside the lake, beneath the trees,\n",
"Fluttering and dancing in the breeze.\n",
"William Wordsworth\n"]]
--