Newbie Help.

From: todd
Subject: Newbie Help.
Date: Mon, 17 Sep 2001 16:34:26 +0000
Message-ID: <66adff33.0109170834.28cff3c6@posting.google.com>

I am learning Lisp as a precursor to studying AI.  I am a pretty
experienced developer (5+ yrs: Java, C++, Perl, VB...), and have
decided that, as a learning project, I will create a program to read
in an HTML file and spit out an HTML file that is XML compliant (close
image tags, quotes around attributes...).

My first step is to read each element in and print it out (w/ a
recursive function).

Now I need to store the elements I am reading into memory and am
having trouble deciphering how to do this.  I am able to hold
individual characters, but how do I accumulate them?

(DO 
        ((sChar (READ-CHAR oInpStream NIL) (READ-CHAR oInpStream NIL)))
        ((EQ sChar NIL))
        (COND ...whole bunch of stuff...))

among the ...whole bunch of stuff... I need to accumulate the data. 
How do I do this -- I can't find it anywhere!

Thx

Todd

Re: Newbie Help. Tim Moore
- Re: Newbie Help. Tim Bradshaw
  - Re: Newbie Help. Janis Dzerins
    - Re: Newbie Help. Tim Bradshaw
      - Re: Newbie Help. Janis Dzerins
      - Re: Newbie Help. Duane Rettig
        - Re: Newbie Help. Janis Dzerins
          - Re: Newbie Help. Duane Rettig
  - Re: Newbie Help. Tim Moore
    - Re: Newbie Help. Tim Bradshaw
      - Re: Newbie Help. Tim Moore
Re: Newbie Help. Tim Bradshaw
- Re: Newbie Help. James A. Crippen
  - Re: Newbie Help. Juliusz Chroboczek
Re: Newbie Help. JP Massar
- Re: Newbie Help. todd
  - Re: Newbie Help. Matthieu Villeneuve
    - Re: Newbie Help. todd

From: Tim Moore
Subject: Re: Newbie Help.
Date: Mon, 17 Sep 2001 17:55:43 +0000
Message-ID: <9o5dev$58g$0@216.39.145.192>

I tend to use adjustable arrays when I need to accumulate strings:

(let ((return-val (make-array 1 
			      :element-type 'character
			      :fill-pointer 0
			      :adjustable t)))
  (do ((schar (read-char stream nil) (read-char stream nil)))
      ((null schar)
       return-val)
    (vector-push-extend schar return-val)))
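
For anyone who wants to try this at the REPL, here is a minimal,
self-contained sketch of the same idea.  The function name
READ-ALL-CHARS and the use of WITH-INPUT-FROM-STRING as a stand-in for
a file stream are assumptions made for illustration; they are not part
of the code above.

```lisp
;; Hypothetical helper: accumulate every character from STREAM into a string.
(defun read-all-chars (stream)
  (let ((result (make-array 16
                            :element-type 'character
                            :fill-pointer 0    ; logically empty to start with
                            :adjustable t)))   ; allowed to grow on demand
    (do ((ch (read-char stream nil) (read-char stream nil)))
        ((null ch) result)                     ; READ-CHAR returns NIL at EOF
      (vector-push-extend ch result))))

;; A string stream stands in for the file here.
(with-input-from-string (in "<img src=foo>")
  (read-all-chars in))
```

The result is a string with a fill pointer, so the usual string
functions (STRING=, LENGTH, SUBSEQ) work on it directly.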

If you're slurping an entire file into a string, other approaches may be
more efficient and/or easier to program.  Check out READ-LINE and
READ-SEQUENCE.  Another example:

(with-open-file (s "foo")
  (let ((slurped-file (make-array (file-length s) :element-type 'character)))
    (read-sequence slurped-file s)
    slurped-file))

An aside to language lawyers: how is file length defined for non-binary
files?  The HyperSpec seems to be sparse on the issue.

Tim

From: Tim Bradshaw
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 08:56:49 +0000
Message-ID: <ey3y9nc512m.fsf@cley.com>

* Tim Moore wrote:
> (with-open-file (s "foo")
>   (let ((slurped-file (make-array (file-length s) :element-type 'character)))
>     (read-sequence slurped-file s)
>     slurped-file))

> An aside to language lawyers: how is file length defined for non-binary
> files?  The HyperSpec seems to be sparse on the issue.

I think in such a way that this does not work.  It's fairly hard to
see how this could work efficiently in the presence of encodings.  For
something like UTF-8 I think characters can be one to four (?) octets
long, so FILE-LENGTH would have to actually go through the whole file
to find out how many characters long it was.  A similar thing happens
on DOS where the line-end conversions bite you - you need to count the
lines to know how long the file is.

Presumably all this works OK if you read (unsigned-byte 8)s or some
suitably binary type.
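
A minimal sketch of that binary variant, assuming the file fits in
memory.  SLURP-OCTETS is a hypothetical name, and turning the octets
into characters afterwards would need an implementation-specific
decoder, which is not shown.

```lisp
;; Hypothetical helper: read a whole file as raw octets.
(defun slurp-octets (path)
  (with-open-file (s path :element-type '(unsigned-byte 8))
    ;; For a binary stream FILE-LENGTH is measured in stream elements,
    ;; here octets, so the buffer is exactly the right size.
    (let ((octets (make-array (file-length s)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets s)
      octets)))
```

Because the stream's element type is (unsigned-byte 8), no encoding or
line-end conversion gets between the length and the data.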

--tim

From: Janis Dzerins
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 10:29:37 +0000
Message-ID: <87ofo8eqr2.fsf@asaka.latnet.lv>

Tim Bradshaw <···@cley.com> writes:

> * Tim Moore wrote:
> > (with-open-file (s "foo")
> >   (let ((slurped-file (make-array (file-length s) :element-type 'character)))
> >     (read-sequence slurped-file s)
> >     slurped-file))
> 
> > An aside to language lawyers: how is file length defined for
> > non-binary files?  The HyperSpec seems to be sparse on the issue.
> 
> I think in such a way that this does not work.  It's fairly hard to
> see how this could work efficiently in the presence of encodings.
> For something like UTF-8 I think characters can be one to four (?)
> octets

It's one to six. man utf8.

-- 
Janis Dzerins

  If million people say a stupid thing it's still a stupid thing.

From: Tim Bradshaw
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 10:53:54 +0000
Message-ID: <ey3iteg4vnh.fsf@cley.com>

* Janis Dzerins wrote:

> It's one to six. man utf8.

you're making the dangerous assumption that I'm working on a system,
which supports all this internationalisation stuff.  My main problem
is converting to and from the native BAUDOT that my system uses to
this new-fangled 7bit (7 bits?  who could need so many characters?)
ASCII for things like news postings.

Of course, BAUDOT has variable length encoding too, which is how I
know about this.

Incidentally I'm very annoyed that none of the Lisp vendors support
BAUDOT as an external format.

--tim

From: Janis Dzerins
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 11:58:41 +0000
Message-ID: <87k7ywemmm.fsf@asaka.latnet.lv>

Tim Bradshaw <···@cley.com> writes:

> * Janis Dzerins wrote:
> 
> > It's one to six. man utf8.
> 
> you're making the dangerous assumption that I'm working on a system,
> which supports all this internationalisation stuff.

I don't make any assumptions. I just mentioned the source of the
information I posted (and, to save bandwidth on
not-very-on-topic posts, I tried to keep it as short as possible
without losing the meaning, in which I see I succeeded).

> My main problem is converting to and from the native BAUDOT that my
> system uses to this new-fangled 7bit (7 bits?  who could need so
> many characters?)  ASCII for things like news postings.
> 
> Of course, BAUDOT has variable length encoding too, which is how I
> know about this.
> 
> Incidentally I'm very annoyed that none of the Lisp vendors support
> BAUDOT as an external format.

[On-topic] humour is what I like in your posts.

-- 
Janis Dzerins

  If million people say a stupid thing it's still a stupid thing.

From: Duane Rettig
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 08:20:54 +0000
Message-ID: <4pu8nzj4p.fsf@beta.franz.com>

Tim Bradshaw <···@cley.com> writes:

> * Janis Dzerins wrote:
> 
> > It's one to six. man utf8.
> 
> you're making the dangerous assumption that I'm working on a system,
> which supports all this internationalisation stuff.  My main problem
> is converting to and from the native BAUDOT that my system uses to
> this new-fangled 7bit (7 bits?  who could need so many characters?)
> ASCII for things like news postings.
> 
> Of course, BAUDOT has variable length encoding too, which is how I
> know about this.
> 
> Incidentally I'm very annoyed that none of the Lisp vendors support
> BAUDOT as an external format.

No, but we do give you tools to define your own external-format:

http://www.franz.com/support/documentation/6.0/doc/iacl.htm#defining-efs-3

Once you've created your BAUDOT external-format, be sure to send
it back to us, and we'll be happy to include it on our website.
But you'll have to maintain it, since we don't have any teletypes
to test it with...

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: Janis Dzerins
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 10:06:00 +0000
Message-ID: <87d74ncx6f.fsf@asaka.latnet.lv>

Duane Rettig <·····@franz.com> writes:

> Tim Bradshaw <···@cley.com> writes:
> 
> > * Janis Dzerins wrote:
> > 
> > > It's one to six. man utf8.
> > 
> > you're making the dangerous assumption that I'm working on a system,
> > which supports all this internationalisation stuff.  My main problem
> > is converting to and from the native BAUDOT that my system uses to
> > this new-fangled 7bit (7 bits?  who could need so many characters?)
> > ASCII for things like news postings.
> > 
> > Of course, BAUDOT has variable length encoding too, which is how I
> > know about this.
> > 
> > Incidentally I'm very annoyed that none of the Lisp vendors support
> > BAUDOT as an external format.
> 
> No, but we do give you tools to define your own external-format:
> 
> http://www.franz.com/support/documentation/6.0/doc/iacl.htm#defining-efs-3
> 
> Once you've created your BAUDOT external-format, be sure to send
> it back to us, and we'll be happy to include it on our website.
> But you'll have to maintain it, since we don't have any teletypes
> to test it with...

As far as I know now, BAUDOT uses 5 bit bytes. From iacl
documentation, I found out that external-formats are
octet(s)->character, character->octet(s) transformation.

Is the 5-bit-byte external-format doable without [much] hacking?
(I'm just curious and would be glad to get a short answer.)

-- 
Janis Dzerins

  If million people say a stupid thing it's still a stupid thing.

From: Duane Rettig
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 18:11:22 +0000
Message-ID: <4sndjqcdx.fsf@beta.franz.com>

Janis Dzerins <·····@latnet.lv> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Tim Bradshaw <···@cley.com> writes:
> > 
> > > * Janis Dzerins wrote:
> > > 
> > > > It's one to six. man utf8.
> > > 
> > > you're making the dangerous assumption that I'm working on a system,
> > > which supports all this internationalisation stuff.  My main problem
> > > is converting to and from the native BAUDOT that my system uses to
> > > this new-fangled 7bit (7 bits?  who could need so many characters?)
> > > ASCII for things like news postings.
> > > 
> > > Of course, BAUDOT has variable length encoding too, which is how I
> > > know about this.
> > > 
> > > Incidentally I'm very annoyed that none of the Lisp vendors support
> > > BAUDOT as an external format.
> > 
> > No, but we do give you tools to define your own external-format:
> > 
> > http://www.franz.com/support/documentation/6.0/doc/iacl.htm#defining-efs-3
> > 
> > Once you've created your BAUDOT external-format, be sure to send
> > it back to us, and we'll be happy to include it on our website.
> > But you'll have to maintain it, since we don't have any teletypes
> > to test it with...
> 
> As far as I know now, BAUDOT uses 5 bit bytes. From iacl
> documentation, I found out that external-formats are
> octet(s)->character, character->octet(s) transformation.
> 
> Is the 5-bit-byte external-format doable without [much] hacking?
> (I'm just curious and would be glad to get a short answer.)

Heh, heh, I was being facetious, because I read an implied smiley in
Tim's first paragraph (sorry if that was wrong, Tim).

However, in all seriousness, it is true that the octet-oriented nature
of our implementation makes it hard to do such a sub-octet
external-format, and I would thus not recommend it.

If I were seriously considering reading and writing baudot characters,
though, I would consider doing it with an encapsulated stream.
Creating encapsulations is hard to see in our current simple-streams
documentation, but I am working on a couple of examples for the 6.1
release.  And even if I can't get them done before the release, they
should become available on our website when I do get the chance to
complete them.  However, I would less likely consider a baudot
encoding as an example, and would prefer a more widely-used encoding
such as the Base64 mime encoding instead as an example.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: Tim Moore
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 18:44:20 +0000
Message-ID: <9o84m4$e80$0@216.39.145.192>

On 18 Sep 2001, Tim Bradshaw wrote:

> * Tim Moore wrote:
> > (with-open-file (s "foo")
> >   (let ((slurped-file (make-array (file-length s) :element-type 'character)))
> >     (read-sequence slurped-file s)
> >     slurped-file))
> 
> > An aside to language lawyers: how is file length defined for non-binary
> > files?  The HyperSpec seems to be sparse on the issue.
> 
> I think in such a way that this does not work.  It's fairly hard to
> see how this could work efficiently in the presence of encodings.  For

Yeah, I realize that it would be onerous to make FILE-LENGTH return the
number of characters that would actually be read.  However, for character
encodings that I can think of the number of bytes in the file will be
greater than or equal to the number of bytes read.  So this should "work",
though it might be more useful to do something like:

(with-open-file (s "foo")
  (let ((slurped-file (make-array (file-length s) 
				  :element-type 'character
				  :fill-pointer t)))
    (setf (fill-pointer slurped-file) (read-sequence slurped-file s)) 
    slurped-file))

Tim

From: Tim Bradshaw
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 08:43:56 +0000
Message-ID: <ey366af4lkj.fsf@cley.com>

* Tim Moore wrote:
> Yeah, I realize that it would be onerous to make FILE-LENGTH return the
> number of characters that would actually be read.  However, for character
> encodings that I can think of the number of bytes in the file will be
> greater than or equal to the number of bytes read.  

I was wondering about this the other week.  Is it always true?  As a,
possibly silly, counterexample would something like `gzipped-x' be a
reasonable encoding for a file?

-tim

From: Tim Moore
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 17:28:21 +0000
Message-ID: <9oakjl$ih1$0@216.39.145.192>

On 19 Sep 2001, Tim Bradshaw wrote:

> * Tim Moore wrote:
> > Yeah, I realize that it would be onerous to make FILE-LENGTH return the
> > number of characters that would actually be read.  However, for character
> > encodings that I can think of the number of bytes in the file will be
> > greater than or equal to the number of bytes read.  
                                           ^^^^^
I meant characters, in case it wasn't clear.
> 
> I was wondering about this the other week.  Is it always true?  As a,
> possibly silly, counterexample would something like `gzipped-x' be a
> reasonable encoding for a file?
> 
> -tim

For some definition of reasonable :)  Another counterexample, also
somewhat exotic, is reading a Unicode file with pre-composed characters
that are decoded into constituents by the stream machinery.

Perhaps a better all-purpose slurper is:

(defun slurp-stream (s)
  ;; Read S in 4096-character chunks, copying each chunk to a string stream.
  (let ((scratch (make-array 4096 :element-type 'character)))
    (with-output-to-string (string-stream nil :element-type 'character)
      (loop 
          for chars-read = (read-sequence scratch s)  ; characters, not bytes
          while (> chars-read 0)
          do (write-sequence scratch string-stream :end chars-read)))))
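
To see the chunking at work, here is a sketch of a variant with a
configurable buffer size.  SLURP-IN-CHUNKS and its BUFFER-SIZE
parameter are hypothetical names introduced only for this
illustration; the technique is the same READ-SEQUENCE /
WRITE-SEQUENCE loop.

```lisp
;; Hypothetical variant of the slurper with a configurable buffer size.
(defun slurp-in-chunks (stream &optional (buffer-size 4096))
  (let ((scratch (make-string buffer-size)))
    (with-output-to-string (out)
      (loop for n = (read-sequence scratch stream) ; index of first unfilled slot
            while (plusp n)
            ;; Copy only the part of the buffer that was just filled.
            do (write-sequence scratch out :end n)))))

;; Eleven characters through a four-character buffer: three passes
;; through the loop body, the last one only partly filling the buffer.
(with-input-from-string (in "hello world")
  (slurp-in-chunks in 4))
```

The :END argument matters on the final pass, when the buffer still
holds stale characters from the previous chunk past position N.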

Tim

From: Tim Bradshaw
Subject: Re: Newbie Help.
Date: Mon, 17 Sep 2001 17:50:03 +0000
Message-ID: <ey37kux671w.fsf@cley.com>

* tbrown 01923 wrote:

> (DO 
>         ((sChar (READ-CHAR oInpStream NIL) (READ-CHAR oInpStream NIL)))
>         ((EQ sChar NIL))
>         (COND ...whole bunch of stuff...))

(loop for schar = (read-char in nil)
      while schar
      collect schar)

for instance.

Or if you want to collect into a string:

(loop with buf = (make-array 64 :element-type 'character 
                                :adjustable t
                                :fill-pointer 0)
      for schar = (read-char in nil)
      while schar
      do ...
         (vector-push-extend schar buf)
      finally (return buf))
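
Wrapped up as functions, the two loops can be tried directly at the
REPL.  This is only a sketch: the names CHARS-AS-LIST and
CHARS-AS-STRING are made up for the example, and the elided "..." work
is simply left out.

```lisp
;; Collect the characters into a list, in reading order.
(defun chars-as-list (in)
  (loop for ch = (read-char in nil)
        while ch
        collect ch))

;; Collect them into an adjustable string instead.
(defun chars-as-string (in)
  (loop with buf = (make-array 64 :element-type 'character
                                  :adjustable t
                                  :fill-pointer 0)
        for ch = (read-char in nil)
        while ch
        do (vector-push-extend ch buf)
        finally (return buf)))

(with-input-from-string (in "abc")
  (chars-as-list in))
```

COLLECT builds the list front to back, so no reversing is needed
afterwards.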

Of course using LOOP means you will go to hell, although not as fast
as using wEird Capitalisation will.  Quiche eaters do this
with obscure recursive tricks that mean you don't need to reverse the
list.  Real Men do it with TAGBODY, GO and RPLACD (and their code is
all upper case of course).

--tim

From: James A. Crippen
Subject: Re: Newbie Help.
Date: Sat, 22 Sep 2001 07:07:26 +0000
Message-ID: <m33d5fr9e9.fsf@kappa.unlambda.com>

Tim Bradshaw <···@cley.com> writes:

> Of course using LOOP means you will go to hell, although not as fast
> as using wEird Capitalisation will.

Or using _ instead of, say -.

(let ((*flame* t))

I once wrote an elisp hack to demunge an elisp file obviously written
by some C luser, filled with stuff like

(defun 
 foo_bar(x y)
    let (
      (
        a (+ quux_baz 2)
      )
      ...
    )
)

and such similar nonsense.  I wish I'd actually kept it around as a
hook in the save-buffer function for any Lisp buffer.  Would have been
useful.

What I don't get about people who write that sort of thing is exactly
why they don't pick up on the fact that Lisp has a fairly rigid
indentation style when all the examples they ever see in megabytes of
source code are formatted the *exact* *same* *way*?  All C programmers
just want to have their very own indentation style that enhances their
'uniqness'.  I think there's something similar in this behavior to
that of the people who obsess over how many chromed blinky gadgets
they can attach to their big noisy trucks.

The most atrocious 'uniq' C formatting that I've seen so far (that is
actually used by someone in daily life) entails indenting relative to
the width of the last keyword starting a code block, with closing braces
indented at the same level so you don't notice them in the line noise
that is C.  The programmer who insists upon this also insists on
commenting every brace in detail with the line that it matches, on the
presumption that you can't look back three lines to figure out where
the brace went.  E.g.,

int foo(int bar) {
    short quux;
    char *fnord;

    if (bar) {
       callSomeRandomFunctionThatIsHardToTypeBecauseOfAllTheCaps(bar);
       }
      else {
           callSomeOtherRandomFunctionWithANameThatIsTooLong(bar);
           } /* if (bar) */
    } /* function int foo(int bar) */

The commenting reaction I think has something to do with both poor
editor support for brace matching (the programmer in question uses
some random editor that reminds him of his favorite Mac IDE) as well
as the inherent unreadability of C syntax because of all the silly
characters that clutter it.  (I'm not apologizing for Lisp's hairy
macro quoting though...)

)

> Quiche eaters do this
> with obscure recursive tricks that mean you don't need to reverse the
> list.  Real Men do it with TAGBODY, GO and RPLACD (and their code is
> all upper case of course).

Don't be silly.  Real Men don't use such wimpy 'Common' Lisp crap like
'TAGBODY'.  Real men use MACLISP and do things like

  ...
  (PROG (SETQ LOOP T)
        LOOP
        ;; IMAGINE 5K LOC HERE
        (COND (LOOP (GO LOOP))))

dammit.

And Real Men don't need wimpy things like formatting:

  DEFINE ((
  (MEMBER(LAMBDA(A X)(COND((NULL X)F)
       ((EQ A(CAR X))T)(T(MEMBER A(CDR X))))))
  (UNION(LAMBDA(X Y)(COND((NULL X)Y)((MEMBER
       (CAR X)Y)(UNION(CDR X)Y))(T(CONS(CAR X)
       (UNION(CDR X)Y))))))))

If it was hard to write then it should be hard to read.  (Thanks to
McCarthy et alia for the above example.)

'james

-- 
James A. Crippen <·····@unlambda.com> ,-./-.  Anchorage, Alaska,
Lambda Unlimited: Recursion 'R' Us   |  |/  | USA, 61.2069 N, 149.766 W,
Y = \f.(\x.f(xx)) (\x.f(xx))         |  |\  | Earth, Sol System,
Y(F) = F(Y(F))                        \_,-_/  Milky Way.

From: Juliusz Chroboczek
Subject: Re: Newbie Help.
Date: Sat, 22 Sep 2001 16:27:37 +0000
Message-ID: <87y9n7dxuo.fsf@pps.jussieu.fr>

James A Crippen:

JC> And Real Men don't need wimpy things like formatting:

[...]

JC> (Thanks to McCarthy et alia for the above example.)

But doesn't McCarthy carefully format his M-expressions?

                                        Juliusz

From: JP Massar
Subject: Re: Newbie Help.
Date: Mon, 17 Sep 2001 19:05:15 +0000
Message-ID: <3ba64911.157629064@news>

On 17 Sep 2001 09:34:26 -0700, ············@hotmail.com (todd) wrote:
 
>
>Now I need to store the elements I am reading into memory and am
>having trouble deciphering how to do this.  I am able to hold
>individual characters, but how do I accumulate them?
>
>(DO 
>        ((sChar (READ-CHAR oInpStream NIL) (READ-CHAR oInpStream NIL)))
>        ((EQ sChar NIL))
>        (COND ...whole bunch of stuff...))
>
>among the ...whole bunch of stuff... I need to accumulate the data. 
>How do I do this -- I can't find it anywhere!
>
 
Take a look at creating adjustable arrays and using the
vector-push-extend function.
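
As a rough sketch of how that fits into a DO loop with a COND in the
body: the example below collects the text of every <...> tag it sees.
MAKE-BUFFER and COLLECT-TAGS are hypothetical names, and the tag
handling is deliberately naive (no comments, no quoted ">"), so treat
it as an illustration of accumulating characters, not as an HTML
parser.

```lisp
;; Hypothetical helper: return a fresh, empty, growable string.
(defun make-buffer ()
  (make-array 16 :element-type 'character :adjustable t :fill-pointer 0))

;; Hypothetical example: collect the text of every <...> tag in STREAM.
(defun collect-tags (stream)
  (let ((tags '())       ; finished tags, most recent first
        (current nil))   ; buffer for the tag being read, NIL outside a tag
    (do ((ch (read-char stream nil) (read-char stream nil)))
        ((null ch) (nreverse tags))
      (cond ((char= ch #\<)                  ; a tag starts: new buffer
             (setf current (make-buffer))
             (vector-push-extend ch current))
            ((null current))                 ; outside a tag: skip the character
            (t                               ; inside a tag: accumulate
             (vector-push-extend ch current)
             (when (char= ch #\>)            ; the tag ends: save it
               (push current tags)
               (setf current nil)))))))

(with-input-from-string (in "<p>Hi <img src=a.gif></p>")
  (collect-tags in))
```

Each tag gets its own buffer, so the strings pushed onto TAGS are
never modified after they have been saved.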

From: todd
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 14:26:10 +0000
Message-ID: <66adff33.0109180626.19e8739f@posting.google.com>

Thx all -- and I will watch my capitalisation (coding standard
habits).  Is there a Lisp coding style FAQ?

From: Matthieu Villeneuve
Subject: Re: Newbie Help.
Date: Tue, 18 Sep 2001 16:46:09 +0000
Message-ID: <3BA77A4F.1D742653@tumbleweed.com>

todd wrote:
> 
> Thx all -- and I will watch my capitalisation (coding standard
> habits).  Is there a Lisp coding style FAQ?

You may want to take a look here:
http://www.landfield.com/faqs/lisp-faq/
Section 1-3 contains very valuable advice regarding coding style.

--Matthieu

From: todd
Subject: Re: Newbie Help.
Date: Wed, 19 Sep 2001 15:27:14 +0000
Message-ID: <66adff33.0109190727.4e177243@posting.google.com>

thx again.
