Reading and processing text

From: andrea
Subject: Reading and processing text
Date: Wed, 13 Dec 2006 00:14:39 +0000
Message-ID: <1165968878.939784.62880@j72g2000cwa.googlegroups.com>

Hello everyone, I need to write a Lisp program that takes a text,
justifies it and writes it to another file.

It shouldn't be too difficult, but I found a bunch of little and very
annoying problems.

Which data type? I'm trying to use strings and putting them inside
lists. Every paragraph should be a list of lines, and the final text
should be a list of paragraphs.
This is the function that reads the whole file, for example:

(defun read-src (stream list)
  "takes an open stream and a list"
  (let ((word (read-line stream nil 'eof)))
    (cond
     ((eq word 'eof) list)
     (t (read-src stream (append list (list word)))))))

The problem with strings is that I don't have some of the useful
functions I have with lists, and I find myself writing something
like this:

(defun space-count (string)
  "count spaces in a string"
  (let ((counter 0))
    (dotimes (i (length string))
      (if
	  (char-equal (char string i) #\ )
	  (setq counter (1+ counter))))
    counter))

Just to count spaces in a string, which I think is pretty ugly!

Then I have to reformat the text, trying to justify it without splitting
words. In my implementation I would have to move words from one list to
the other, which is very uncomfortable and not really clever.

I think I should just put everything in a big structure (list/string)
and work on that.

It could be useful to use one big string for each paragraph and put
them in a list, but how do I create this while I read the input?
I would need to modify this:

  (let ((word (read-line stream nil 'eof)))
    (cond
     ((eq word 'eof) list)

to have two base cases, one for the end of the file and one for when I
find ".\n", but I really don't understand how...

And last question is about dividing words, I have a set of rules (based
on the type of char I find)

;; this is also pretty ugly
(setq vocali_d '(#\a #\e #\i #\o #\u #\j #\y))
(setq vocali (append vocali_d (mapcar #'char-upcase vocali_d)))
(setq cons_d '(#\b #\c #\d #\f #\g #\h #\k #\l #\m #\n #\p #\q #\r #\s
#\t #\v #\x #\w #\z))
(setq cons (append cons_d (mapcar #'char-upcase cons_d)))
(setq spec_cons '(#\l #\m #\n #\r))

And with char-member
(defun char-member (char set)
  "check whether a given character belongs to a set"
  (cond
   ((null set) nil)
   ((char-equal (car set) char) T)
   (T (char-member char (cdr set)))))

it checks whether I match one of the rules, but I have a few problems...
If I have for example
"mamma", the first syllable "ma" matches, BUT following the rules the next
"m" should go with the first syllable!
I don't like to use setq or setf, but in this case I can't find
anything other than brute assignment.

Sorry for the length of the post, I hope someone can help me, thank
you!

Re: Reading and processing text Vebjorn Ljosa
- Re: Reading and processing text andrea
  - Re: Reading and processing text Luigi Panzeri
Re: Reading and processing text Greg Buchholz
- Re: Reading and processing text Bill Atkins
  - Re: Reading and processing text Thomas A. Russ
Re: Reading and processing text Zach Beane
- Re: Reading and processing text Greg Buchholz
  - Re: Reading and processing text Zach Beane
    - Re: Reading and processing text GP lisper
- Re: Reading and processing text William James
  - Re: Reading and processing text Bill Atkins
  - Re: Reading and processing text Sacha
  - Re: Reading and processing text Thomas A. Russ
  - Re: Reading and processing text André Thieme
Re: Reading and processing text Andreas Thiele
Re: Reading and processing text Andreas Thiele
Re: Reading and processing text Sidney Markowitz
- Re: Reading and processing text andrea

From: Vebjorn Ljosa
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 00:23:59 +0000
Message-ID: <1165969439.029475.108560@80g2000cwy.googlegroups.com>

andrea wrote:
> Hello everyones, I need to write a lisp program that takes a text,
> justifies it and writes it to another file.
>
> Which data type? I'm trying to use strings and putting them inside
> lists. Every paragraph should be a list of lines, and the final text
> should be a list of paragraphs.

Sounds like a homework assignment, but here is a hint:
Have you thought of making each paragraph a list of words?
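
To make the hint a little more concrete, here is a minimal sketch of
turning one paragraph (held as a string) into a list of words. The name
split-words and the default separator set are assumptions made up for
this illustration; nothing in the assignment prescribes them.

```lisp
;; Hypothetical helper, not from the original post: split STRING into
;; words, treating any run of SEPARATORS as a single boundary.
(defun split-words (string &optional (separators '(#\Space #\Tab #\Newline)))
  "Return the words of STRING as a list of fresh strings."
  (flet ((separator-p (char) (member char separators)))
    (loop for start = (position-if-not #'separator-p string)
            then (position-if-not #'separator-p string :start end)
          for end = (and start (position-if #'separator-p string :start start))
          while start
          collect (subseq string start end)
          while end)))
```

For example, (split-words "  the quick  brown fox ") returns
("the" "quick" "brown" "fox"); leading, trailing and repeated
separators are ignored.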

Thanks,
Vebjorn

From: andrea
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 00:32:03 +0000
Message-ID: <1165969923.813981.245910@n67g2000cwd.googlegroups.com>

On 13 Dec, 01:23, "Vebjorn Ljosa" <·······@ljosa.com> wrote:
> andrea wrote:
> > Hello everyones, I need to write a lisp program that takes a text,
> > justifies it and writes it to another file.
>
> > Which data type? I'm trying to use strings and putting them inside
> > lists. Every paragraph should be a list of lines, and the final text
> > should be a list of paragraphs.
>
> Sounds like a homework assignment, but here is a hint:
> Have you thought of making each paragraph a list of words?
>
> Thanks,
> Vebjorn

Yes, it's homework, but of course I don't want the code for it, I just
need some ideas to solve my problems.

Each paragraph could be a list of words, but how can I do that directly
in the read-src function using recursion?
Thanks again

From: Luigi Panzeri
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 13:40:50 +0000
Message-ID: <87d56nap7h.fsf@matley.muppetslab.org>

"andrea" <········@gmail.com> writes:

> On 13 Dec, 01:23, "Vebjorn Ljosa" <·······@ljosa.com> wrote:
>> andrea wrote:
>> > Hello everyones, I need to write a lisp program that takes a text,
>> > justifies it and writes it to another file.
>>
>> > Which data type? I'm trying to use strings and putting them inside
>> > lists. Every paragraph should be a list of lines, and the final text
>> > should be a list of paragraphs.
>>
>> Sounds like a homework assignment, but here is a hint:
>> Have you thought of making each paragraph a list of words?
>>
>> Thanks,
>> Vebjorn

Before choosing the data type you want to play with, choose an
algorithm. I suppose that by "justify" you mean something like
fill-paragraph (the Emacs command).

So you want to divide your input into paragraphs, remove all newlines
and extra spaces from each paragraph, put one space after each word (or
unbreakable item like "Hi!") and insert newlines so that no line
exceeds a fill-column argument.

So a solution can be something like:

(defun justify-stream (&optional (fill-column 70) (input-stream *standard-input*) (output-stream *standard-output*))
  (loop 
     for paragraph = (read-paragraph input-stream 'eof)
     until (eq paragraph 'eof)
     do
       (princ (justify-paragraph paragraph fill-column) output-stream)
       (princ #\Newline output-stream)))

(defun justify-paragraph (paragraph fill-column)
  (let ((raw-paragraph (remove #\Newline paragraph)))
    (with-output-to-paragraph (paragraph)
      (let ((current-fill 0))
	(with-input-from-paragraph (input raw-paragraph)
	  (loop
	     for unbreakable = (read-unbreakable input 'eop)
	     until (eq unbreakable 'eop)

	     for length = (length unbreakable)

	     when (> (+ current-fill length) fill-column)
	     do (progn (princ #\Newline paragraph)
		       (setf current-fill 0))

	     do
	     (progn
	       (unless (zerop current-fill)
		 (princ #\Space paragraph)
		 (incf current-fill))
	       (princ unbreakable paragraph) 
	       (incf current-fill length))))))))

Ok, this way you can now choose the data type you prefer for
paragraphs and words (string, list, fixed array or CLOS object),
provided that you define read-paragraph, read-unbreakable,
with-input-from-paragraph, remove, princ, etc. for it.

So if for example we choose the string as data type, a possible
implementation can be (I assumed that a paragraph ends when a "." is
reached):

(defun read-paragraph (stream eof-value)
  "Read text from input-stream until #\."
  (let ((ret
	 (with-output-to-string (out)
	   (loop 
	      for char = (read-char stream nil nil)
	      when char do (princ char out)
	      until (member char '(#\. nil))))))
    (if (> (length ret) 0)
	ret
	eof-value)))

(defun read-unbreakable (stream eof-value)
  "Read text from stream until #\Space or #\Tab"
  (let ((ret
	 (with-output-to-string (out)
	   (loop
	      for char = (read-char stream nil nil)
	      when char do (princ char out)
	      until (member char '(nil #\Space #\Tab))))))
    (if (> (length ret) 0)
	;; Drop the trailing separator, whether it was a space or a tab.
	(string-trim '(#\Space #\Tab) ret)
	eof-value)))

(defun justify-paragraph (paragraph fill-column)
  (let ((raw-paragraph (remove #\Newline paragraph)))
    (with-output-to-string (paragraph)
      (let ((current-fill 0))
	(with-input-from-string (input raw-paragraph)
	  (loop
	     for unbreakable = (read-unbreakable input 'eop)
	     until (eq unbreakable 'eop)

	     for length = (length unbreakable)

	     when (> (+ current-fill length) 
			fill-column)
	     do (progn (princ #\Newline paragraph)
		       (setf current-fill 0))

	     do
	     (progn
	       (unless (zerop current-fill)
		 (princ #\Space paragraph)
		 (incf current-fill))
	       (princ unbreakable paragraph) 
	       (incf current-fill length))))))))


>
> Each paragraph could be a list of word, but how can I do it directly in
> the read-src function using recursion??
> Thanks again
>

If you don't like the style of looping over chars and collecting them,
you can always transform it into a recursive function with an
accumulator:

(defun read-unbreakable (stream eof-value &optional (acc ""))
  (let ((char (read-char stream nil nil))
	(separators '(#\Space #\Tab)))
    (cond 
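      ;; At end of input, still return a pending word instead of losing it.
      ((null char) (if (zerop (length acc)) eof-value acc))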
      ((member char separators) acc)
      (t (read-unbreakable stream eof-value (format nil "~a~c" acc char))))))


If your problem were not homework, you should consider not reinventing
the wheel and look for libraries (see cl-user.net) that can help you
with parsing, splitting, matching regular expressions and so on.

From: Greg Buchholz
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 00:54:16 +0000
Message-ID: <1165971256.187736.202670@j72g2000cwa.googlegroups.com>

andrea wrote:
> The problem is that with strings is that I don't have some of the
> useful function I have with lists, and I find myself to write something
> like that
>
> (defun space-count (string)
>   "count spaces in a string"
>   (let ((counter 0))
>     (dotimes (i (length string))
>       (if
> 	  (char-equal (char string i) #\ )
> 	  (setq counter (1+ counter))))
>     counter))
>
> Just to count spaces in a string, which I think it's pretty ugly!

(defun space-count (string)
  (reduce (lambda (acc c) (if (char-equal c #\ ) (1+ acc) acc))
          string :initial-value 0))

From: Bill Atkins
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 04:45:49 +0000
Message-ID: <m23b7kh08y.fsf@bertrand.local>

"Greg Buchholz" <················@yahoo.com> writes:

> andrea wrote:
>> The problem is that with strings is that I don't have some of the
>> useful function I have with lists, and I find myself to write something
>> like that
>>
>> (defun space-count (string)
>>   "count spaces in a string"
>>   (let ((counter 0))
>>     (dotimes (i (length string))
>>       (if
>> 	  (char-equal (char string i) #\ )
>> 	  (setq counter (1+ counter))))
>>     counter))
>>
>> Just to count spaces in a string, which I think it's pretty ugly!
>
> (defun space-count (string)
>   (reduce (lambda (acc c) (if (char-equal c #\ ) (1+ acc) acc))
>           string :initial-value 0))

What's wrong with:

 (count #\space string)

?

From: Thomas A. Russ
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 18:07:28 +0000
Message-ID: <ymi7iwvllen.fsf@sevak.isi.edu>

Bill Atkins <······@rpi.edu> writes:

> "Greg Buchholz" <················@yahoo.com> writes:
> 
> > andrea wrote:
> >> The problem is that with strings is that I don't have some of the
> >> useful function I have with lists, and I find myself to write something
> >> like that
> >>
> >> (defun space-count (string)
> >>   "count spaces in a string"
> >>   (let ((counter 0))
> >>     (dotimes (i (length string))
> >>       (if
> >> 	  (char-equal (char string i) #\ )
> >> 	  (setq counter (1+ counter))))
> >>     counter))
> >>
> >> Just to count spaces in a string, which I think it's pretty ugly!
> >
> > (defun space-count (string)
> >   (reduce (lambda (acc c) (if (char-equal c #\ ) (1+ acc) acc))
> >           string :initial-value 0))
> 
> What's wrong with:
> 
>  (count #\space string)
> ?

Precisely.  The OP should remember that strings are just one special
type of SEQUENCE (as are lists) and that all of the sequence functions
in Common Lisp will also work on strings.
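
A small sketch of what that means in practice. The function name
sequence-functions-on-string is made up for this example; every
function it calls is a standard sequence function applied to a string.

```lisp
;; Hypothetical demo function: each call below is an ordinary sequence
;; function from the standard, used on a string instead of a list.
(defun sequence-functions-on-string (line)
  (list (count #\Space line)      ; how many spaces
        (position #\q line)       ; index of the first #\q, or NIL
        (find #\z line)           ; the character itself, or NIL
        (search "brown" line)     ; index where a substring starts
        (subseq line 4 9)         ; a slice, as a fresh string
        (remove #\Space line)))   ; a copy without spaces
```

Called with "the quick brown fox" this returns
(3 4 NIL 10 "quick" "thequickbrownfox").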

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Zach Beane
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 01:24:27 +0000
Message-ID: <m3ejr4a8qc.fsf@unnamed.xach.com>

"andrea" <········@gmail.com> writes:

> The problem is that with strings is that I don't have some of the
> useful function I have with lists, and I find myself to write something
> like that
> 
> (defun space-count (string)
>   "count spaces in a string"
>   (let ((counter 0))
>     (dotimes (i (length string))
>       (if
> 	  (char-equal (char string i) #\ )
> 	  (setq counter (1+ counter))))
>     counter))
> 
> Just to count spaces in a string, which I think it's pretty ugly!

This /is/ ugly, and so is Greg's suggestion. Use this:

   (count #\Space string) 
 
> Then I have to reformat the text trying to justify it without splitting
> words, in my implementation I should move words from a list to the
> other, and it's very unconfortable and not really clever.

Must you justify, or just line-wrap?
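
To make the difference concrete: line-wrapping only decides where
lines break, while full justification also pads the gaps between words
so every line reaches the right margin. A minimal sketch of the padding
step for a single line, assuming the words are already available as a
list of strings (justify-line is a name invented for this example):

```lisp
;; Hypothetical helper: spread the spare columns over the gaps between
;; words, giving the leftmost gaps one extra space when the padding
;; does not divide evenly.
(defun justify-line (words width)
  "Join WORDS (a list of strings) into a line padded out to WIDTH columns.
Falls back to single spaces when there is one word or no room to pad."
  (let* ((gaps (1- (length words)))
         (letters (reduce #'+ words :key #'length))
         (padding (- width letters)))
    (if (or (< gaps 1) (< padding gaps))
        (format nil "~{~A~^ ~}" words)
        (multiple-value-bind (each extra) (floor padding gaps)
          (with-output-to-string (out)
            (loop for (word . more) on words
                  for gap from 0
                  do (write-string word out)
                     (when more
                       (loop repeat (+ each (if (< gap extra) 1 0))
                             do (write-char #\Space out)))))))))
```

For instance, (justify-line '("LISP" "was" "flourishing") 23) gives
"LISP   was  flourishing", which is 23 characters wide. The last line
of a paragraph would normally be left unpadded.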

Here's a function I used for http://wigflip.com/minifesto/ . I don't
think it fits exactly what you want, and it ain't the prettiest code
ever, but perhaps it can give you some ideas.

   (defun paginate (text font &key
                    (height *default-height*) (width *default-width*)
                    (left-margin 2) (right-margin 2)
                    (top-margin 2) (bottom-margin 2))
     "Return TEXT split up into sets of lines and pages, wrapped on word
   boundaries."
     (let ((pages '())
           (page '())
           (start 0)
           (mark nil)
           (i 0)
           (x left-margin)
           (page-line-count 0)
           (lines-per-page (truncate (- height top-margin bottom-margin)
                                     (+ (ymax font) 2))))
       (when (zerop lines-per-page)
         (error "Insufficient vertical space for this font"))
       (labels ((save-page ()
                  (push (nreverse page) pages)
                  (setf page '()))
                (save-line (end next-i)
                  (when (= page-line-count lines-per-page)
                    (save-page)
                    (setf page-line-count 0))
                  (incf page-line-count)
                  (push (subseq text start end) page)
                  ;; reset everything, (maybe) advancing i
                  (setf i     next-i
                        start i
                        mark  nil
                        x     left-margin))
                (mark ()
                  (setf mark i))
                (mark-char ()
                  (aref text mark)))
         (loop
          (when (>= i (length text))
            (save-line i i)
            (save-page)
            (return (nreverse pages)))
          (let* ((char (char text i))
                 (glyph (find-glyph char font))
                 (char-width (bitmap-width glyph)))
            ;; Ok, what to do with this next character?
            (when (eql char #\Space)
              ;; Spaces are marked early, so even spaces that technically
              ;; exceed the line width are marked as break candidates, since
              ;; spaces are invisible
              (mark))
            (cond ((eql char #\Newline)
                   ;; break at the newline
                   (save-line i (1+ i)))
                  ((< (+ x char-width right-margin) width)
                   ;; Within the line, so no need to break
                   (when (eql char #\-)
                     ;; Dashes are marked late, so only dashes that fit into
                     ;; the current line are marked as break candidates
                     (mark))
                   (incf i)
                   (incf x (advance-width glyph)))
                  ((= x left-margin)
                   ;; Uh oh, out of room and we're already at the left margin?
                   (error "No space for input character ~D (~S) in output"
                          i char))
                  ((null mark)
                   ;; Beyond the line, but no breakable point
                   (save-line i i))
                  ((eql (mark-char) #\Space)
                   ;; Beyond the line, break at marked space
                   (save-line mark (1+ mark)))
                  ((eql (mark-char) #\-)
                   ;; Beyond the line, break after marked dash
                   (save-line (1+ mark) (1+ mark)))
                  (t
                   (error "Unexpected state: mark = ~D, mark-char = ~S"
                          mark (mark-char)))))))))

Here's example output (the page is very small):

   (with-font (font *font-file* 'minifont)
     (paginate (format nil "Hello, sir, here's new-style text ~
                            for you to justify and wrap into ~
                            lines and pages. I hope you enjoy it.")
                       font))
   => (("Hello, sir, here's new-" "style text for you to" 
        "justify and wrap into" "lines and pages. I" 
        "hope you enjoy it."))

Zach

From: Greg Buchholz
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 04:38:32 +0000
Message-ID: <1165984712.517843.161960@j44g2000cwa.googlegroups.com>

Zach Beane wrote:
> Use this:
>
>    (count #\Space string)
>

    Nice.  Is there a tool available to search for functions based on
the parameters and return type?  Using terms from the hyperspec...

    http://www.lisp.org/HyperSpec/Body/fun_countcm_c_count-if-not.html

... you might search for something like "a function which takes an
object and a proper sequence, and returns an integer".  Hoogle is an
example of a similar tool from the Haskell world...

    http://haskell.org/hoogle/

Thanks,

Greg Buchholz

From: Zach Beane
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 12:22:01 +0000
Message-ID: <m3ac1s9eae.fsf@unnamed.xach.com>

"Greg Buchholz" <················@yahoo.com> writes:

> Zach Beane wrote:
> > Use this:
> >
> >    (count #\Space string)
> >
> 
>     Nice.  Is there a tool available to search for functions based on
> the parameters and return type? 

The hyperspec is organized by concept. Functions useful for strings
will be in either the strings dictionary or the sequences dictionary.
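
This does not search by argument or return type, only by name, but it
is built in: APROPOS and APROPOS-LIST are standard functions you can
call at the REPL. A minimal sketch of the same idea restricted to the
exported symbols of the COMMON-LISP package (cl-symbols-matching is a
name made up for this example):

```lisp
;; Hypothetical helper: collect the external symbols of the COMMON-LISP
;; package whose names contain FRAGMENT, sorted alphabetically.
(defun cl-symbols-matching (fragment)
  (let ((needle (string-upcase fragment))
        (matches '()))
    (do-external-symbols (symbol :common-lisp)
      (when (search needle (symbol-name symbol))
        (push symbol matches)))
    (sort matches #'string< :key #'symbol-name)))
```

(cl-symbols-matching "count") includes COUNT, COUNT-IF and
COUNT-IF-NOT among its results.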

Zach

From: GP lisper
Subject: Re: Reading and processing text
Date: Sat, 16 Dec 2006 00:16:44 +0000
Message-ID: <slrneo6enc.o9q.spambait@phoenix.clouddancer.com>

On 13 Dec 2006 07:22:01 -0500, <····@xach.com> wrote:
> "Greg Buchholz" <················@yahoo.com> writes:
>> Zach Beane wrote:
>> >
>> >    (count #\Space string)
>> 
>>     Nice.  Is there a tool available to search for functions based on
>> the parameters and return type? 
>
> The hyperspec is organized by concept. Functions useful for strings
> will be in either the strings dictionary or the sequences dictionary.

That's Chapter 16.2 and 17.3

-- 
Common Lisp: You have it, you have them all.
Reply-To: email is ignored.


From: William James
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 04:48:57 +0000
Message-ID: <1165985337.467554.223500@73g2000cwn.googlegroups.com>

Zach Beane wrote:
> "andrea" <········@gmail.com> writes:
>
> > The problem is that with strings is that I don't have some of the
> > useful function I have with lists, and I find myself to write something
> > like that
> >
> > (defun space-count (string)
> >   "count spaces in a string"
> >   (let ((counter 0))
> >     (dotimes (i (length string))
> >       (if
> > 	  (char-equal (char string i) #\ )
> > 	  (setq counter (1+ counter))))
> >     counter))
> >
> > Just to count spaces in a string, which I think it's pretty ugly!
>
> This /is/ ugly, and so is Greg's suggestion. Use this:
>
>    (count #\Space string)
>
> > Then I have to reformat the text trying to justify it without splitting
> > words, in my implementation I should move words from a list to the
> > other, and it's very unconfortable and not really clever.
>
> Must you justify, or just line-wrap?
>
> Here's a function I used for http://wigflip.com/minifesto/ . I don't
> think it fits exactly what you want, and it ain't the prettiest code
> ever, but perhaps it can give you some ideas.
>
>    (defun paginate (text font &key
>                     (height *default-height*) (width *default-width*)
>                     (left-margin 2) (right-margin 2)
>                     (top-margin 2) (bottom-margin 2))
>      "Return TEXT split up into sets of lines and pages, wrapped on word
>    boundaries."
>      (let ((pages '())
>            (page '())
>            (start 0)
>            (mark nil)
>            (i 0)
>            (x left-margin)
>            (page-line-count 0)
>            (lines-per-page (truncate (- height top-margin bottom-margin)
>                                      (+ (ymax font) 2))))
>        (when (zerop lines-per-page)
>          (error "Insufficient vertical space for this font"))
>        (labels ((save-page ()
>                   (push (nreverse page) pages)
>                   (setf page '()))
>                 (save-line (end next-i)
>                   (when (= page-line-count lines-per-page)
>                     (save-page)
>                     (setf page-line-count 0))
>                   (incf page-line-count)
>                   (push (subseq text start end) page)
>                   ;; reset everything, (maybe) advancing i
>                   (setf i     next-i
>                         start i
>                         mark  nil
>                         x     left-margin))
>                 (mark ()
>                   (setf mark i))
>                 (mark-char ()
>                   (aref text mark)))
>          (loop
>           (when (>= i (length text))
>             (save-line i i)
>             (save-page)
>             (return (nreverse pages)))
>           (let* ((char (char text i))
>                  (glyph (find-glyph char font))
>                  (char-width (bitmap-width glyph)))
>             ;; Ok, what to do with this next character?
>             (when (eql char #\Space)
>               ;; Spaces are marked early, so even spaces that technically
>               ;; exceed the line width are marked as break candidates, since
>               ;; spaces are invisible
>               (mark))
>             (cond ((eql char #\Newline)
>                    ;; break at the newline
>                    (save-line i (1+ i)))
>                   ((< (+ x char-width right-margin) width)
>                    ;; Within the line, so no need to break
>                    (when (eql char #\-)
>                      ;; Dashes are marked late, so only dashes that fit into
>                      ;; the current line are marked as break candidates
>                      (mark))
>                    (incf i)
>                    (incf x (advance-width glyph)))
>                   ((= x left-margin)
>                    ;; Uh oh, out of room and we're already at the left margin?
>                    (error "No space for input character ~D (~S) in output"
>                           i char))
>                   ((null mark)
>                    ;; Beyond the line, but no breakable point
>                    (save-line i i))
>                   ((eql (mark-char) #\Space)
>                    ;; Beyond the line, break at marked space
>                    (save-line mark (1+ mark)))
>                   ((eql (mark-char) #\-)
>                    ;; Beyond the line, break after marked dash
>                    (save-line (1+ mark) (1+ mark)))
>                   (t
>                    (error "Unexpected state: mark = ~D, mark-char = ~S"
>                           mark (mark-char)))))))))
>
> Here's example output (the page is very small):
>
>    (with-font (font *font-file* 'minifont)
>      (paginate (format nil "Hello, sir, here's new-style text ~
>                             for you to justify and wrap into ~
>                             lines and pages. I hope you enjoy it.")
>                        font))
>    => (("Hello, sir, here's new-" "style text for you to"
>         "justify and wrap into" "lines and pages. I"
>         "hope you enjoy it."))
>
> Zach

Using Commune Lisp is a very painful route.  It's easier to use a
modern, high-level language such as MatzLisp (Ruby).

line_width = 65

puts gets(nil).  # Read the whole file.
  split( /(?:\n[ \t]*)+\n/ ).  # Break it into paragraphs.
  map{|par|
    par.gsub("\n", " ").
    # Word wrap.
    scan(/\S.{0,#{line_width-2}}\S(?=\s|$)|\S+/) << "\n" }


If the text file contains this ...


For some: if it's not Common Lisp or Scheme it
must be bad. They should study the history of
LISP. Before Common Lisp came along LISP was
flourishing in many different implementations and
feeding on many different concepts in Computer
Science.

Common Lisp came along to create a standard by committee, a
kitchen sink of features and concepts, liked and used by
some and critiqued by others. Here a qote from [Brooks and
Gabriel 1984, 'A Critique of Common Lisp']:

Every decision of the committee can be
locally rationalized as the right thing.
We believe that the sum of these
decisions, however, has produced
something greater than its parts; an
unwieldy, overweight beast, with
significant costs (especially on other
than micro-codable personal Lisp
engines) in compiler size and speed, in
runtime performance, in programmer
overhead needed to produce efficient
programs, and in intellectual overload
for a programmer wishing to be a
proficient COMMON LISP programmer.


... then the output will be:


For some: if it's not Common Lisp or Scheme it must be bad. They
should study the history of LISP. Before Common Lisp came along
LISP was flourishing in many different implementations and
feeding on many different concepts in Computer Science.

Common Lisp came along to create a standard by committee, a
kitchen sink of features and concepts, liked and used by some and
critiqued by others. Here a qote from [Brooks and Gabriel 1984,
'A Critique of Common Lisp']:

Every decision of the committee can be locally rationalized as
the right thing. We believe that the sum of these decisions,
however, has produced something greater than its parts; an
unwieldy, overweight beast, with significant costs (especially on
other than micro-codable personal Lisp engines) in compiler size
and speed, in runtime performance, in programmer overhead needed
to produce efficient programs, and in intellectual overload for a
programmer wishing to be a proficient COMMON LISP programmer.

From: Bill Atkins
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 05:11:28 +0000
Message-ID: <m2wt4w8jnj.fsf@bertrand.local>

"William James" <·········@yahoo.com> writes:

>
> Using Commune Lisp is a very painful route.  It's easier to use a
> modern, high-level language such as MatzLisp (Ruby).
>
> line_width = 65
>
> puts gets(nil).  # Read the whole file.
>   split( /(?:\n[ \t]*)+\n/ ).  # Break it into paragraphs.
>   map{|par|
>     par.gsub("\n", " ").
>     # Word wrap.
>     scan(/\S.{0,#{line_width-2}}\S(?=\s|$)|\S+/) << "\n" }
>
>
> If the text file contains this ...
>
>
> For some: if it's not Common Lisp or Scheme it
> must be bad. They should study the history of
> LISP. Before Common Lisp came along LISP was
> flourishing in many different implementations and
> feeding on many different concepts in Computer
> Science.
>
> Common Lisp came along to create a standard by committee, a
> kitchen sink of features and concepts, liked and used by
> some and critiqued by others. Here a qote from [Brooks and
> Gabriel 1984, 'A Critique of Common Lisp']:
>
> Every decision of the committee can be
> locally rationalized as the right thing.
> We believe that the sum of these
> decisions, however, has produced
> something greater than its parts; an
> unwieldy, overweight beast, with
> significant costs (especially on other
> than micro-codable personal Lisp
> engines) in compiler size and speed, in
> runtime performance, in programmer
> overhead needed to produce efficient
> programs, and in intellectual overload
> for a programmer wishing to be a
> proficient COMMON LISP programmer.
>
>
> ... then the output will be:
>
>
> For some: if it's not Common Lisp or Scheme it must be bad. They
> should study the history of LISP. Before Common Lisp came along
> LISP was flourishing in many different implementations and
> feeding on many different concepts in Computer Science.
>
> Common Lisp came along to create a standard by committee, a
> kitchen sink of features and concepts, liked and used by some and
> critiqued by others. Here a qote from [Brooks and Gabriel 1984,
> 'A Critique of Common Lisp']:
>
> Every decision of the committee can be locally rationalized as
> the right thing. We believe that the sum of these decisions,
> however, has produced something greater than its parts; an
> unwieldy, overweight beast, with significant costs (especially on
> other than micro-codable personal Lisp engines) in compiler size
> and speed, in runtime performance, in programmer overhead needed
> to produce efficient programs, and in intellectual overload for a
> programmer wishing to be a proficient COMMON LISP programmer.

Troll.

From: Sacha
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 06:06:51 +0000
Message-ID: <%LMfh.234127$ad4.5192758@phobos.telenet-ops.be>

William James wrote:

> line_width = 65
> 
> puts gets(nil).  # Read the whole file.
>   split( /(?:\n[ \t]*)+\n/ ).  # Break it into paragraphs.
>   map{|par|
>     par.gsub("\n", " ").
>     # Word wrap.
>     scan(/\S.{0,#{line_width-2}}\S(?=\s|$)|\S+/) << "\n" }
> 

You'd better fix this adsl filter.
As you can see, your lisp code came out totally
screwed with line noise.

Sacha

From: Thomas A. Russ
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 18:12:21 +0000
Message-ID: <ymi3b7jll6i.fsf@sevak.isi.edu>

"William James" <·········@yahoo.com> writes:

> 
> Using Commune Lisp is a very painful route.  It's easier to use a
> modern, high-level language such as MatzLisp (Ruby).

Oh, please.  Just because there is a built-in library for regular
expressions doesn't mean you couldn't do something very similar in
Common Lisp using one of the non-built-in libraries for regular
expressions.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: André Thieme
Subject: Re: Reading and processing text
Date: Thu, 14 Dec 2006 16:58:19 +0000
Message-ID: <elrvrc$pib$1@registered.motzarella.org>

William James wrote:

Nice to have the name of a famous psychologist.


> Using Commune Lisp is a very painful route.  It's easier to use a
> modern, high-level language such as MatzLisp (Ruby).

This is a typical scripting task. It is obvious that a scripting
language can solve tasks in its specific domain very well.
Your code is so short because you used regex, which probably isn't
allowed for this homework.


> line_width = 65
> 
> puts gets(nil).  # Read the whole file.
>   split( /(?:\n[ \t]*)+\n/ ).  # Break it into paragraphs.
>   map{|par|
>     par.gsub("\n", " ").
>     # Word wrap.
>     scan(/\S.{0,#{line_width-2}}\S(?=\s|$)|\S+/) << "\n" }


You could do this in Lisp as well. Just use cl-ppcre.
Ruby's "scan" can be expressed with all-matches-as-strings in Lisp
(scan-to-strings returns only the first match).
For gsub you would use regex-replace-all in Lisp.
To split the text into paragraphs a Lisper could use split.
Reading a whole file in one step is solved well in scripting
languages, as this is a typical task for scripting.
Here a Lisper would have to use a tool for it from his repertoire
or write this 10-liner.
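
A minimal sketch of such a helper, using only standard Common Lisp.
The name slurp-stream is made up for this illustration and does not
come from any library:

```lisp
(defun slurp-stream (stream)
  "Return everything that can still be read from STREAM as one string."
  (with-output-to-string (out)
    ;; READ-CHAR returns NIL at end of file, which ends the loop.
    (loop for c = (read-char stream nil nil)
          while c
          do (write-char c out))))
```

It can be tried at the REPL on a string stream, for example
(with-input-from-string (s "some text") (slurp-stream s)), or on a
file opened with with-open-file.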

So in the end one could have a solution similar to yours,
but a bit longer than the Ruby version.
Lisp also has no built-in tool that is as easy to use as your
#{line_width-2}.


> ... then the output will be:
> 
> 
> For some: if it's not Common Lisp or Scheme it must be bad. They
> should study the history of LISP. Before Common Lisp came along
> LISP was flourishing in many different implementations and
> feeding on many different concepts in Computer Science.
> 
> Common Lisp came along to create a standard by committee, a
> kitchen sink of features and concepts, liked and used by some and
> critiqued by others. Here a qote from [Brooks and Gabriel 1984,
> 'A Critique of Common Lisp']:
> 
> Every decision of the committee can be locally rationalized as
> the right thing. We believe that the sum of these decisions,
> however, has produced something greater than its parts; an
> unwieldy, overweight beast, with significant costs (especially on
> other than micro-codable personal Lisp engines) in compiler size
> and speed, in runtime performance, in programmer overhead needed
> to produce efficient programs, and in intellectual overload for a
> programmer wishing to be a proficient COMMON LISP programmer.

And depending on what "justified text" means this output could be
the correct one:


For some: if it's not Common Lisp or Scheme it must be bad.  They
should study the history of LISP.  Before Common Lisp came  along
LISP  was  flourishing  in  many  different  implementations  and
feeding   on  many  different  concepts   in   Computer  Science.

Common Lisp  came  along  to  create  a standard by committee,  a
kitchen sink of features and concepts, liked and used by some and
critiqued  by others. Here  a qote from [Brooks and Gabriel 1984,
'A Critique of Common Lisp']:

Every decision  of the committee can  be locally rationalized  as
the right thing.  We  believe that  the sum  of  these decisions,
however,  has  produced  something greater  than  its  parts;  an
unwieldy, overweight beast, with significant costs (especially on
other than micro-codable personal Lisp engines) in compiler  size
and speed, in runtime performance, in programmer overhead  needed
to produce efficient programs, and in intellectual overload for a
programmer wishing to be a proficient COMMON LISP programmer.


André
--

From: Andreas Thiele
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 11:35:26 +0000
Message-ID: <elooi5$4o7$00$1@news.t-online.com>

"andrea" <········@gmail.com> wrote in message ····························@j72g2000cwa.googlegroups.com...
> Hello everyones, I need to write a lisp program that takes a text,
> justifies it and writes it to another file.
> ...

Hi Andrea,

a hint:

(setq counter (1+ counter))

can be written as

(incf counter).


I don't know what defines a paragraph. I guess two consecutive #\Newline characters. I think you are primarily interested in individual words.

Here is how I'd tackle the problem (hope my news client doesn't scramble line breaks :)):

(defun read-input (stream)
  "First step: rough sketch"
  )

(defun tst ()
  (with-input-from-string (s "This is a test
in a few lines.

Two Newlines marking a paragraph.")
    (read-input s)))

With the above two I can do interactive testing. Now I do refinement:

(defun read-input (stream)
  "Second step: Gain confidence by simply returning input as string.
I choose a simple imperative approach."
  (loop for c = (read-char stream nil nil) 
        with result = "" do
        (unless c (return result))
        (setf result (concatenate 'string result (string c)))))

(defun read-input (stream)
  "Third step: Writing a parser returning a list of words"
  (loop for c = (read-char stream nil nil) 
        with result = nil
        with word = "" do
        (unless c 
          (unless (string= word "")
            (push word result))
          (return (nreverse result)))
        (if (find c '(#\Space #\Newline))
            (progn
              (unless (string= word "")
                (push word result))
              (setf word ""))
          (setf word (concatenate 'string word (string c))))))

(defun read-input (stream)
  "Step 4: Refining the approach; we return a list of strings (denoting words) or symbols.
The keyword symbol :par denotes a paragraph break."
  (loop for c = (read-char stream nil nil)
        with one-newline = nil
        with result = nil
        with word = "" do
        (unless c 
          (unless (string= word "")
            (push word result))
          (return (nreverse result)))
        (cond ((find c '(#\Space #\Newline))
               (unless (string= word "")
                 (push word result)
                 (setf word ""))
               (when (eql c #\Newline)
                 (if one-newline
                     (push :par result)
                   (setf one-newline t))))
              (t            
               (setf one-newline nil
                     word (concatenate 'string word (string c)))))))

Final test output is:

CL-USER 14 > (tst)
("This" "is" "a" "test" "in" "a" "few" "lines." :PAR "Two" "Newlines" "marking" "a" "paragraph.")

Perhaps punctuation (full stop, comma, etc.) should be represented separately, possibly using :dot, :comma, etc.
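
A rough sketch of that idea, as a post-processing step over the list returned by read-input. The table *punctuation-tokens* and the function split-trailing-punctuation are hypothetical names invented for this example, and only a single trailing punctuation character is handled:

```lisp
(defparameter *punctuation-tokens*
  '((#\. . :dot) (#\, . :comma) (#\; . :semicolon) (#\: . :colon))
  "Hypothetical mapping from punctuation characters to keyword tokens.")

(defun split-trailing-punctuation (item)
  "Return a fresh list holding ITEM, with a known trailing punctuation
character split off as a keyword.  Non-strings such as :par pass through."
  (let* ((len (if (stringp item) (length item) 0))
         (token (and (plusp len)
                     (cdr (assoc (char item (1- len))
                                 *punctuation-tokens*)))))
    (cond ((null token) (list item))          ; nothing to split off
          ((= len 1) (list token))            ; the item was only punctuation
          (t (list (subseq item 0 (1- len)) token)))))
```

Applied with (mapcan #'split-trailing-punctuation words), the item "lines." would become the two items "lines" and :dot, while :par is left alone.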

Andreas

From: Andreas Thiele
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 11:52:51 +0000
Message-ID: <elopiq$1d2$01$1@news.t-online.com>

"andrea" <········@gmail.com> wrote in message ····························@j72g2000cwa.googlegroups.com...
> ...
> The problem is that with strings is that I don't have some of the
> useful function I have with lists ...

Strings are sequences; lists are also sequences. All sequence functions can be applied to strings as well as to lists.
You should browse the CLHS for `sequences'.
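
For instance, the hand-written space counter from the original post could plausibly shrink to one call to the standard sequence function count. A sketch, reusing the name space-count from that post; first-space is a made-up name added to show position on a string:

```lisp
(defun space-count (string)
  "Count the spaces in STRING with the generic sequence function COUNT."
  (count #\Space string))

(defun first-space (string)
  "Return the index of the first space in STRING, or NIL if there is none."
  (position #\Space string))
```

Other sequence functions such as find, remove, subseq and search accept strings in the same way.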

Andreas

From: Sidney Markowitz
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 04:23:01 +0000
Message-ID: <457f802a$0$69039$742ec2ed@news.sonic.net>

andrea wrote, On 13/12/06 1:14 PM:
> Hello everyones, I need to write a lisp program that takes a text,
> justifies it and writes it to another file.

I would take a different approach than some of the other posters, and I
think this will lead you to learning more Lisp without me doing the
homework for you.

One way of dealing with this is by writing one function at a time that
you can get working independently of the others, playing with them
interactively in the REPL (Read/Eval/Print Loop), then hook them
together to make your program.

Since you are justifying words, you are doing it one paragraph at a
time. So your input file is really a collection of paragraphs. You can
write a function that reads a file stream and returns a list of
paragraphs. You already did that with read-src, but keep in mind that
you could do some things differently if necessary. You could concatenate
the strings with a space between them so a paragraph is one continuous
string of words, eliminating having to deal with separate lines. You
could have the function return at the end of a paragraph so you don't
have to read the entire file at once. The next time you call the
function you pass it the same stream and it just keeps on reading. But
notice that you can decide to make those changes later if they prove useful.
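
As an illustration of the paragraph-at-a-time variant, here is a
minimal sketch. It assumes that paragraphs are separated by blank
lines, which the original post does not state, and the names
blank-line-p and read-paragraph are invented for this example:

```lisp
(defun blank-line-p (line)
  "True when LINE is empty or holds only spaces and tabs."
  (every (lambda (c) (member c '(#\Space #\Tab))) line))

(defun read-paragraph (stream)
  "Read lines up to the next blank line or end of file and return them
joined by single spaces.  Return NIL once STREAM is exhausted."
  (let ((lines '()))
    (loop for line = (read-line stream nil nil)
          do (cond ((null line) (return))        ; end of file
                   ((blank-line-p line)
                    ;; A blank line ends the paragraph; blank lines
                    ;; before the first text line are skipped.
                    (when lines (return)))
                   (t (push (string-trim '(#\Space #\Tab) line) lines))))
    (when lines
      (format nil "~{~A~^ ~}" (nreverse lines)))))
```

Calling read-paragraph repeatedly on the same stream yields one
paragraph string per call and NIL at the end, so the caller never
needs the whole file in memory.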

The next function that will be useful will convert a paragraph into a
list of words as case-sensitive symbols. That is a parsing problem. When
you encounter a parsing problem in Lisp, one of the first things to ask
yourself is whether you can use the parser that is already built in to
Lisp to save yourself a lot of work. If a paragraph is a string, you
could use read-from-string to read symbols out of it. There are some
complications with that: To preserve case you need to make a copy of the
readtable and use readtable-case properly. If your input contains
special characters you have to do the proper things to make sure that
reader macros are disabled in the readtable you are using for the
read-from-string call. But you can start by getting it to work with
input that consists of only  words with alphabetic characters separated
by spaces and newlines, and then refine it. If it gets too complicated
to do using read-from-string and a custom readtable, you can rewrite the
function to read one character at a time and do your own parsing. You
can test this function by calling it with a string argument, separate
from your file-to-paragraphs function, then have the output of one be
passed to the other.
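
A sketch of that first, simple version, assuming the input holds only
alphabetic words separated by whitespace. The function name
paragraph-symbols is hypothetical; the symbols are interned in
whatever package is current when it runs:

```lisp
(defun paragraph-symbols (paragraph)
  "Read PARAGRAPH (a string of alphabetic words) into a list of symbols,
keeping the case of each word."
  (let ((*readtable* (copy-readtable nil)) ; fresh copy of the standard readtable
        (*read-eval* nil))                 ; never evaluate #. forms from input
    (setf (readtable-case *readtable*) :preserve)
    (with-input-from-string (in paragraph)
      ;; The stream object itself serves as a unique end-of-file marker.
      (loop for item = (read in nil in)
            until (eq item in)
            collect item))))
```

Input containing commas, quotes, parentheses or digits would still
trip over the standard reader syntax, which is the complication
described above.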

Then you can write a function that given a list of symbols in a
paragraph outputs them in justified lines. If you are assuming
fixed-width fonts, that should be fairly easy using (length (string
Symbol)) on each word. Remember that (string Symbol) converts a symbol
to a string. Again you can test this in the REPL by calling it with a
list of symbols, then once it works hook it in to the output of the
function that converted a paragraph to a list of words.
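
One possible shape for this step, as a sketch only: fill lines
greedily, then spread the spare spaces over the gaps from the left.
All three function names are invented here, a fixed-width font is
assumed, and the last line of a paragraph is left ragged, which is
one reading of "justified" among several:

```lisp
(defun fill-lines (words width)
  "Greedily group WORDS (strings) into lists that fit within WIDTH
columns when joined by single spaces."
  (let ((lines '()) (current '()) (used 0))
    (dolist (word words)
      (let ((len (length word)))
        (cond ((null current)
               (setf current (list word) used len))
              ((<= (+ used 1 len) width)
               (push word current)
               (incf used (1+ len)))
              (t
               (push (nreverse current) lines)
               (setf current (list word) used len)))))
    (when current (push (nreverse current) lines))
    (nreverse lines)))

(defun justify-line (words width)
  "Join WORDS into a string WIDTH columns wide, giving leftover spaces
to the leftmost gaps first."
  (let* ((gaps (1- (length words)))
         (spaces (- width (reduce #'+ words :key #'length))))
    (if (or (< gaps 1) (< spaces gaps))
        (format nil "~{~A~^ ~}" words)  ; one word or overlong line: leave as is
        (multiple-value-bind (base extra) (floor spaces gaps)
          (with-output-to-string (out)
            (loop for word in words
                  for i from 0
                  do (write-string word out)
                     (when (< i gaps)
                       (loop repeat (+ base (if (< i extra) 1 0))
                             do (write-char #\Space out)))))))))

(defun justify-paragraph (words width)
  "Return the lines of a paragraph, all but the last one justified.
WORDS may be strings or symbols."
  (let ((lines (fill-lines (mapcar #'string words) width)))
    (append (mapcar (lambda (line) (justify-line line width))
                    (butlast lines))
            (mapcar (lambda (line) (format nil "~{~A~^ ~}" line))
                    (last lines)))))
```

Each function can be tested on its own at the REPL with a short list
of words before the pieces are hooked together.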

> And last question is about dividing words, I have a set of rules (based
> on the type of char I find)

I'm afraid that I don't understand your question here. You showed how to
define a char-member function that determines if a character is a vowel
or a consonant, but you never said what the rules are.


-- 
    Sidney Markowitz
    http://www.sidney.com

From: andrea
Subject: Re: Reading and processing text
Date: Wed, 13 Dec 2006 10:20:39 +0000
Message-ID: <1166005239.151230.88590@f1g2000cwa.googlegroups.com>

On 13 Dec, 05:23, Sidney Markowitz <······@sidney.com> wrote:
> andrea wrote, On 13/12/06 1:14 PM:
>
> > Hello everyones, I need to write a lisp program that takes a text,
> > justifies it and writes it to another file.
>
> One way of dealing with this is by writing one function at a time that
> you can get working independently of the others, playing with them
> interactively in the REPL (Read/Eval/Print Loop), then hook them
> together to make your program.

Yes, this is normally what I try to do. I use SBCL with Emacs and SLIME;
one buffer holds the code and the other the REPL...

> The next function that will be useful will convert a paragraph into a
> list of words as case-sensitive symbols. That is a parsing problem. When
> you encounter a parsing problem in Lisp, one of the first things to ask
> yourself is whether you can use the parser that is already built in to
> Lisp to save yourself a lot of work. If a paragraph is a string, you
> could use read-from-string to read symbols out of it. There are some
> complications with that: To preserve case you need to make a copy of the
> readtable and use readtable-case properly. If your input contains
> special characters you have to do the proper things to make sure that
> reader macros are disabled in the readtable you are using for the
> read-from-string call. But you can start by getting it to work with
> input that consists of only  words with alphabetic characters separated
> by spaces and newlines, and then refine it. If it gets too complicated
> to do using read-from-string and a custom readtable, you can rewrite the
> function to read one character at a time and do your own parsing. You
> can test this function by calling it with a string argument, separate
> from your file to paragraphs function, then have the output of one be
> passed to the other.
>
The input should be just ASCII text, so it shouldn't be a problem. I'm
trying read-from-string...

> Then you can write a function that given a list of symbols in a
> paragraph outputs them in justified lines. If you are assuming
> fixed-width fonts, that should be fairly easy using (length (string
> Symbol)) on each word. Remember that (string Symbol) converts a symbol
> to a string. Again you can test this in the REPL by calling it with a
> list of symbols, then once it works hook it in to the output of the
> function that converted a paragraph to a list of words.
>
> > And last question is about dividing words, I have a set of rules (based
> > on the type of char I find)
>
> I'm afraid that I don't understand your question here. You showed how to
> define a char-member function that determines if a character is a vowel
> or a consonant, but you never said what the rules are.

Yes, I know. The rules are pretty annoying to explain; they are
something like:
1 vowel in the first char, 1 consonant in the second, 1 vowel in the third
==> the syllable is the first vowel.

I'm trying to find a way to write as little code as possible, because I even
have to check the length of the remaining string to analyze.
And the other problem is that there are some rules like

2 equal consonants ==> the first goes with the previous syllable, the second
with the next.
If I write a recursive function with as few side effects as I can, this can
be a pain in the ass...

Could I define macros for rules?
Would an iterative function be better for this dirty work of rule
matching?
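
One way to approach these questions without macros, as a sketch only:
keep the rules in a data table and let one function do the matching,
so the check on the remaining length lives in a single place. Every
name here (vowel-p, char-class, *rules*, match-rule) is hypothetical,
the vowel set is an assumption, and only the first rule described
above is encoded; the double-consonant rule would need a richer
pattern than character classes:

```lisp
(defun vowel-p (char)
  "True when CHAR is one of a, e, i, o, u in either case."
  (find char "aeiou" :test #'char-equal))

(defun char-class (char)
  "Classify CHAR as :v (vowel) or :c (anything else)."
  (if (vowel-p char) :v :c))

(defparameter *rules*
  '(((:v :c :v) . 1))
  "Hypothetical rule table: a pattern of character classes paired with
the length of the syllable to cut when the pattern matches.")

(defun match-rule (string start)
  "Return the cut length of the first rule whose pattern matches STRING
at index START, or NIL when no rule applies."
  (loop for (pattern . cut) in *rules*
        when (and (<= (+ start (length pattern)) (length string))
                  (every (lambda (class char)
                           (eq class (char-class char)))
                         pattern
                         (subseq string start
                                 (+ start (length pattern)))))
          return cut))
```

An iterative driver could then call match-rule at the current
position, cut off the returned number of characters, and advance.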

Thanks again