From: Mark  Watson
Subject: Q: easy to use, efficient code for reading ZIPed or GZIPed text files?
Date: 
Message-ID: <1187192725.974648.101260@50g2000hsm.googlegroups.com>
I have been looking through the packages on the cliki compression
page:

http://www.cliki.net/Compression

Lots of good code to compress and decompress, but I am also looking
for a wrapper that makes simple operations easy (as in Ruby), such as
reading a GZIPed text file and passing my own function to process each
line. Or something like (with-zip-file ...).

Something that works with the iterate package would be especially
good, but I am looking for something that wraps reading compressed
text files in a line or two of code (because this is a common
operation for me).

Anyway, if anyone has a nice wrapper, please share it :-)

Thanks,
Mark

PS. I thought of asking this group because I saw a great snippet
posted by Luís Oliveira that I now use frequently:

   (iter (for line in-file file using #'read-line)
      ;; process 'line'
   )

I find that I learn a lot from reading other people's code - fun way
to learn.
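
The callback-style line processing asked for above can be sketched in
portable Common Lisp. This is only an illustration: MAP-LINES and
DO-LINES are hypothetical names, not taken from iterate or from any
package on the cliki page, and they work on any character input stream
rather than on compressed files specifically.

```lisp
;; Hypothetical helpers, not from the thread or from any library.
(defun map-lines (fn stream)
  "Call FN on each line of the character STREAM; return the number of lines."
  (loop for line = (read-line stream nil)
        while line
        do (funcall fn line)
        count line))

(defmacro do-lines ((var stream) &body body)
  "Run BODY once per line of STREAM, with VAR bound to the current line."
  `(map-lines (lambda (,var) ,@body) ,stream))

;; Example: only one line is held in memory at a time.
(with-input-from-string (in (format nil "alpha~%beta~%"))
  (do-lines (line in)
    (format t "~a has ~d characters~%" line (length line))))
```

Because the helpers only need a character stream, they should work
unchanged on top of whatever decompressing stream ends up being used.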

From: Mark  Watson
Subject: Re: Q: easy to use, efficient code for reading ZIPed or GZIPed text files?
Date: 
Message-ID: <1187193245.966956.227280@g4g2000hsf.googlegroups.com>
Also, the text files that I process are often very large, so using
streams and not reading everything into memory is best.

Off topic, but for Franz Lisp users: I have found that reading into a
buffer with excl:read-line-into helps a lot. It cuts way down on
consing when processing large files.
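
The idea of reusing one buffer instead of allocating a fresh string per
line can be illustrated portably. The sketch below is not
excl:read-line-into and does not try to match its interface;
READ-LINE-INTO-BUFFER is a hypothetical name, and reading character by
character is likely slower than an implementation-specific routine.

```lisp
;; Hypothetical portable helper; NOT the Allegro CL function.
(defun read-line-into-buffer (buffer stream)
  "Read one line from STREAM into BUFFER, an adjustable string with a
fill pointer. Returns BUFFER, or NIL at end of file when nothing was read."
  (setf (fill-pointer buffer) 0)
  (loop for ch = (read-char stream nil nil)
        do (cond ((null ch)
                  ;; End of file: report a final unterminated line, if any.
                  (return (if (plusp (fill-pointer buffer)) buffer nil)))
                 ((char= ch #\Newline)
                  (return buffer))
                 (t
                  (vector-push-extend ch buffer)))))

;; Example: the same BUFFER object is reused for every line.
(let ((buffer (make-array 256 :element-type 'character
                              :adjustable t :fill-pointer 0)))
  (with-input-from-string (in (format nil "one~%two~%"))
    (loop while (read-line-into-buffer buffer in)
          do (format t "~a (~d chars)~%" buffer (length buffer)))))
```

The buffer contents are overwritten on each call, so anything that must
outlive the current line has to be copied (for example with COPY-SEQ).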

From: David Lichteblau
Subject: Re: Q: easy to use, efficient code for reading ZIPed or GZIPed text files?
Date: 
Message-ID: <slrnfc6bmu.au9.usenet-2006@radon.home.lichteblau.com>
On 2007-08-15, Mark Watson <···········@gmail.com> wrote:
> Lots of good code to compress and decompress, but I am also looking
> for a wrapper that will let me do simply (as in Ruby) operations like
> read a GZIPed text file, passing my own function to process each line.
> Or something like (with-zip-file ...)
>
> Something that works with the iterate package would be especially
> good, but I am looking for something that wraps reading compressed
> text files in a line or two of code (because this is a common
> operation for me).

Put the following code into the package Franz' inflate routines are in.
(For example using the ZIP package, which includes inflate.cl.)

Use like this:

(with-open-file (s "passwd.gz" :element-type '(unsigned-byte 8))
  (zip::skip-gzip-header s)
  (let ((r (flexi-streams:make-flexi-stream (zip::make-inflate-stream s))))
    (loop for line = (read-line r nil)
	  while line
	  do (print line))))

"root:x:0:0:root:/root:/bin/bash"                                               
"daemon:x:1:1:daemon:/usr/sbin:/bin/sh"                                         
"bin:x:2:2:bin:/bin:/bin/sh"                                                    
"sys:x:3:3:sys:/dev:/bin/sh"                                                    
...


;;;; (c) David Lichteblau, X11-style license

(in-package :zip)

(defclass inflate-stream
    (trivial-gray-stream-mixin fundamental-binary-input-stream)
  ((current-stream :initform nil)
   (current-vectors :initform nil)
   (br :initarg :br)
   (buffer :initarg :buffer)
   (end :initform 0)))

(defmethod initialize-instance :after ((stream inflate-stream) &key)
  (refill stream))

(defun make-inflate-stream (source)
  (let ((buf (make-array (* 32 1024) :element-type '(unsigned-byte 8))))
    (make-instance 'inflate-stream :br (new-bit-reader source) :buffer buf)))

(defun refill (stream)
  (with-slots (current-stream current-vectors br buffer end) stream
    (unless current-vectors
      (unless end
	(setf current-stream nil)
	(return-from refill nil))
      (setf current-vectors
	    (nreverse
	     (let ((vectors '()))
	       (flet ((op (buf end)
			(push (subseq buf 0 end) vectors)))
		 (setq end (process-deflate-block br #'op buffer end)))
	       vectors))))
    (setf current-stream
	  (flexi-streams:make-in-memory-input-stream (pop current-vectors)))))

(defmethod stream-element-type ((stream inflate-stream))
  '(unsigned-byte 8))

(defmethod stream-read-byte ((stream inflate-stream))
  (with-slots (current-stream) stream
    (or (read-byte current-stream nil)
	(if (refill stream)
	    (read-byte current-stream nil :eof)
	    :eof))))

(defmethod stream-listen ((stream inflate-stream))
  (with-slots (current-stream) stream
    (or (listen current-stream)
	(if (refill stream)
	    (listen current-stream)
	    nil))))

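(defmethod stream-read-sequence
    ((stream inflate-stream) sequence start end &key)
  (with-slots (current-stream) stream
    ;; READ-SEQUENCE takes the sequence first, then the stream.
    (let ((index (read-sequence sequence
                                current-stream
                                :start start
                                :end end)))
      ;; Keep pulling decompressed chunks until SEQUENCE is full or
      ;; the input is exhausted.
      (loop while (and (< index end) (refill stream)) do
            (setf index (read-sequence sequence
                                       current-stream
                                       :start index
                                       :end end)))
      ;; Return the index of the first element that was not updated.
      index)))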
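
The usage pattern shown at the top of this post could be wrapped up so
that processing a compressed text file takes a line or two. The sketch
below is one possible shape, not part of the ZIP package or
flexi-streams: MAP-DECODED-LINES is a hypothetical name, and the
decompression step is passed in as a function so the helper itself
depends only on standard Common Lisp.

```lisp
;; MAP-DECODED-LINES is a hypothetical name, not part of ZIP or flexi-streams.
(defun map-decoded-lines (fn path decoder)
  "Open PATH as an octet stream, call DECODER on it to obtain a character
stream, then call FN on each line. Returns the number of lines processed."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((chars (funcall decoder in)))
      (loop for line = (read-line chars nil)
            while line
            do (funcall fn line)
            count line))))

#|
;; Usage with the code from this post; assumes the ZIP package (with
;; MAKE-INFLATE-STREAM defined as above) and flexi-streams are loaded.
(map-decoded-lines
 #'print
 "passwd.gz"
 (lambda (in)
   (zip::skip-gzip-header in)
   (flexi-streams:make-flexi-stream (zip::make-inflate-stream in))))
|#
```

Lines are read one at a time from the decoded stream, so the whole file
is never held in memory by this helper.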
From: Mark  Watson
Subject: Re: Q: easy to use, efficient code for reading ZIPed or GZIPed text files?
Date: 
Message-ID: <1187205580.025327.129690@d55g2000hsg.googlegroups.com>
Thanks David!

This is helpful, and a time saver for me.

Best regards,
Mark
