From: dpapathanasiou
Subject: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <1135278735.067845.165200@g49g2000cwa.googlegroups.com>
In experimenting with issues related to my prior post
(http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/8efbcc1a4f2c4f97)
I discovered that if I use:

(make-string-input-stream s)

where s is a string containing more than one sequence of "<? ... ?>"
then whatever reads the resulting stream hangs, as if it's waiting for
more input!

If s contains just one sequence bounded by "<? ... ?>", or, if there
are no such sequences in s, then there is no hanging, i.e. anything
reading the resulting stream knows where it starts and stops, and does
not hang waiting for more input.

So, is there anything about the character sequence "<? ... ?><? ... ?>"
(or perhaps it's this combination " ... ?><? ... ") which fools the
stream reader?

From: Barry Margolin
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <barmar-149B12.14562022122005@comcast.dca.giganews.com>
In article <························@g49g2000cwa.googlegroups.com>,
 "dpapathanasiou" <···················@gmail.com> wrote:

> In experimenting with issues related to my prior post
> (http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/8efbcc1a4f2c4f97)
> I discovered that if I use:
> 
> (make-string-input-stream s)
> 
> where s is a string containing more than one sequence of "<? ... ?>"
> then whatever reads the resulting stream hangs, as if it's waiting for
> more input!
> 
> If s contains just one sequence bounded by "<? ... ?>", or, if there
> are no such sequences in s, then there is no hanging, i.e. anything
> reading the resulting stream knows where it starts and stops, and does
> not hang waiting for more input.
> 
> So, is there anything about the character sequence "<? ... ?><? ... ?>"
> (or perhaps it's this combination " ... ?><? ... ") which fools the
> stream reader?

It shouldn't if you're using the default readtable.  Those are both 
ordinary constituent characters to the Lisp reader.

Can you give a simple example that shows the hanging?
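
For instance, a minimal sketch in portable Common Lisp (the helper name
READ-ALL-TOKENS is made up for illustration, and the input is a cut-down
stand-in for the real data) suggests that, with the standard readtable,
such text is read as plain symbols and the reader stops cleanly at end
of stream:

```lisp
;; Hypothetical helper, not from the original post: read every token
;; from STRING with the standard readtable and return the symbol names.
;; It assumes each token reads as a symbol, which holds for this input.
(defun read-all-tokens (string)
  (with-standard-io-syntax
    (with-input-from-string (in string)
      (loop for token = (read in nil :eof)
            until (eq token :eof)
            collect (symbol-name token)))))

(read-all-tokens "<?a ?><?b ?>")  ; => ("<?A" "?><?B" "?>")
```

The characters <, ? and > simply become part of symbol names, and READ
returns the EOF marker once the string is exhausted.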

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: dpapathanasiou
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <1135285954.871137.175880@g49g2000cwa.googlegroups.com>
Here's a quick test I did earlier: I have 2 xml files "hang.xml" and
"no-hang.xml" (they're both rss feeds).

$ diff hang.xml no-hang.xml
2d1
< <?xml-stylesheet href="/rss.xsl" type="text/xsl" media="screen"?>

i.e. they're the same RSS xml file, except that "hang.xml" has an
additional header (the xml-stylesheet line) as its second line.

Then, I passed both files to my Lisp function (here's part of my cmucl
session):

* (time (parse-xml-from-file "no-hang.xml"))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

; Evaluation took:
;   0.01 seconds of real time
;   0.01 seconds of user run time
;   0.0 seconds of system run time
;   24,384,796 CPU cycles
;   0 page faults and
;   17,528 bytes consed.
;
(("rss") "
[ output continues ]

* (time (parse-xml-from-file "hang.xml"))
[ prompt never returns ]

The function (parse-xml-from-file) is defined in my previous post
(http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/8efbcc1a4f2c4f97).

And that's what led me to think it was that combination of characters
in the stream causing the hang.

But just to be sure, I wrote a simpler set of stream reading functions,
inspired by this article (http://www.emmett.ca/~sabetts/slurp.html):

(defun slurp-stream4 (stream)
  (let ((seq (make-string (file-length stream))))
    (read-sequence seq stream)
    seq))

(defun my-cat (file)
  (with-open-file (stream file :direction :input)
    (format t "~A~%" (slurp-stream4 stream))))

This time, though, both streams were read without any hanging at all:

* (time (my-cat "no-hang.xml"))
; Evaluation took:
;   0.0 seconds of real time
;   0.0 seconds of user run time
;   0.0 seconds of system run time
;   1,392,676 CPU cycles
;   0 page faults and
;   3,968 bytes consed.
;
NIL

* (time (my-cat "hang.xml"))
; Evaluation took:
;   0.0 seconds of real time
;   0.0 seconds of user run time
;   0.0 seconds of system run time
;   1,417,548 CPU cycles
;   0 page faults and
;   4,056 bytes consed.
;
NIL

So you're right, creating streams from files or strings containing
multiple "<? ... ?>" sequences has nothing to do with it.

Unfortunately, it looks like the problem is in CMUCL's ext:run-program
extension and the way it sends stdin to the child process ("elements"
binary) during execution.

Running  "/usr/local/bin/utils/elements < hang.xml" and
"/usr/local/bin/utils/elements < no-hang.xml" from the command line,
both work equally well, so ext:run-program must be doing something
strange when it hits that combination of characters.

So it's back to the drawing board.

Thanks, though, for helping me eliminate stream creation as a cause.
From: Barry Margolin
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <barmar-17CD12.17174922122005@comcast.dca.giganews.com>
In article <························@g49g2000cwa.googlegroups.com>,
 "dpapathanasiou" <···················@gmail.com> wrote:

> Here's a quick test I did earlier: I have 2 xml files "hang.xml" and
> "no-hang.xml" (they're both rss feeds).
> 
> $ diff hang.xml no-hang.xml
> 2d1
> < <?xml-stylesheet href="/rss.xsl" type="text/xsl" media="screen"?>

Your previous message said that the <? ... ?> stuff was in the string 
that you were passing to the Lisp reader.  But now you say it's in the 
XML file, which expat is supposed to translate into Lisp s-expressions.  
So by the time Lisp sees it, these sequences should be gone.

That's why I wanted to see a simpler example that just shows Lisp 
hanging in READ, without any of the expat stuff.

One thing I noticed in the other posting was that you call OPEN but 
never call CLOSE on the stream it returns (in fact, you never even 
assign the stream to a variable).  However, I don't think that should 
cause a problem like this.

> But just to be sure, I wrote a simpler set of stream reading functions,
> inspired by this article (http://www.emmett.ca/~sabetts/slurp.html):
> 
> (defun slurp-stream4 (stream)
>   (let ((seq (make-string (file-length stream))))
>     (read-sequence seq stream)
>     seq))
> 
> (defun my-cat (file)
>   (with-open-file (stream file :direction :input)
>     (format t "~A~%" (slurp-stream4 stream))))

Now you're using READ-SEQUENCE, but your PARSE-XML-FROM-FILE function 
uses READ.  READ-SEQUENCE just reads characters or raw bytes without 
parsing, while READ is the S-expression parser, so you're comparing 
apples and oranges.
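
To make the difference concrete, here is a minimal sketch in portable
Common Lisp. The function names are illustrative and not taken from
either posting, and string streams stand in for the file and pipe
streams discussed above:

```lisp
;; Illustrative helpers; the names are not from the original posts.

(defun read-one-form (string)
  "Parse the first S-expression in STRING with READ."
  (with-input-from-string (in string)
    (read in)))

(defun slurp-characters (string)
  "Copy every character of STRING via READ-SEQUENCE, with no parsing."
  (with-input-from-string (in string)
    (let* ((buffer (make-string (length string)))
           (end (read-sequence buffer in)))
      (subseq buffer 0 end))))

(read-one-form "(a b) (c")     ; => (A B), stops after the first form
(slurp-characters "(a b) (c")  ; => "(a b) (c", unbalanced paren and all
```

READ needs a complete form: given only "(c" from a string stream it
signals END-OF-FILE, whereas on a pipe whose writer has neither sent the
closing parenthesis nor closed its end, it would presumably block
waiting for more input. READ-SEQUENCE has no notion of balanced forms.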

> Unfortunately, it looks like the problem is in CMUCL's ext:run-program
> extension and the way it sends stdin to the child process ("elements"
> binary) during execution.

You can run "ps" to see if the child process is still running.  If it's 
completed and just waiting for your function to call EXT:PROCESS-WAIT, 
it should show up as a zombie, with the name <defunct>.

> 
> Running  "/usr/local/bin/utils/elements < hang.xml" and
> "/usr/local/bin/utils/elements < no-hang.xml" from the command line,
> both work equally well, so ext:run-program must be doing something
> strange when it hits that combination of characters.

EXT:RUN-PROGRAM shouldn't be looking at the file at all, it should 
redirect the process's stdin to the file exactly the same way that the 
shell does.

From: dpapathanasiou
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <1135292874.703274.272250@g47g2000cwa.googlegroups.com>
Hi Barry,

Thanks for your feedback, and sorry if some of my examples were a bit
confused (some clarification follows below).

Fortunately, though, I have discovered the problem: since the expat
wrapper binary ("elements") reads from stdin and writes to stdout, I
have to specify that CMUCL's ext:run-program executes my "elements"
subprocess under a PTY
(http://article.gmane.org/gmane.lisp.cmucl.general/1553/match=ext+process+output).

Rewriting (parse-xml-from-file) to use :pty solved the problem.

Now for some clarification on what I had been doing before that (if
you're interested):

> Your previous message said that the <? ... ?> stuff was in the string
> that you were passing to the Lisp reader.  But now you say it's in the
> XML file, which expat is supposed to translate into Lisp s-expressions.
> So by the time Lisp sees it, these sequences should be gone.
>
> That's why I wanted to see a simpler example that just shows Lisp
> hanging in READ, without any of the expat stuff.

All the hung cases, whether I used read or read-sequence, were reading
the results of ext:run-program (stupidly, I admit, I never thought to
try a simple read not involving expat or ext:run-program).
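
Such a check, with no child process involved, might look like the
sketch below. The function name is made up for illustration, and the
input is a shortened stand-in for the RSS data:

```lisp
;; Hypothetical check, not part of the original code: count the
;; characters READ-CHAR delivers before the string stream reports EOF.
(defun count-chars-until-eof (string)
  (let ((in (make-string-input-stream string)))
    (loop for ch = (read-char in nil :eof)
          until (eq ch :eof)
          count t)))

(count-chars-until-eof "<?a?><?b?>")  ; => 10, the length of the string
```

If the count equals the string's length, the string stream itself
terminates normally no matter how many "<? ... ?>" sequences it holds.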

So in addition to the experiments I described earlier, which were files
passed to (parse-xml-from-file), I had written a slightly different
version of that function to accept input from a stream instead:

(defun parse-xml-from-stream (stream)
  "Parse an xml stream using a C wrapper to the expat xml parsing
library and return the contents as a list (S-expression) or an error
string
(an idea by Pierre Mai -- see:
http://www.pmsf.de/resources/lisp/expat.html for more)."
  (let* ((process (ext:run-program *xml-expat-elements* nil
                                   :input stream :output :stream :wait nil))
         (output (ext:process-output process))
         (result
          (unwind-protect
               (read output)
            (ext:process-wait process)
            (ext:process-close process))))
    (if (zerop (ext:process-exit-code process))
        result
        (format nil "Error: Could Not Parse XML"))))

Then, I defined two simple strings to simulate the rss data:

(defvar *hang-data*
"<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href='/rss.xsl' type='text/xsl' media='screen'?>
<rss>
<channel>
<title>RSS Feeds</title>
<link>http://www.somewhere.com/rss.xml</link>
<description>A description of this RSS Feed.</description>
<item>
<title>First Item</title>
<link>http://www.somewhere.com/item1.xml</link>
<description>Blah, Blah, Blah</description>
<pubDate>Tue, 25 Oct 2005 15:47:29 GMT</pubDate>
</item>
<item>
<title>Second Item</title>
<link>http://www.somewhere.com/item2.xml</link>
<description>More blah blah</description>
<pubDate>Mon, 28 Nov 2005 17:55:42 GMT</pubDate>
</item>
</channel>
</rss>")

(defvar *no-hang-data*
"<?xml version='1.0' encoding='utf-8'?>
<rss>
<channel>
<title>RSS Feeds</title>
<link>http://www.somewhere.com/rss.xml</link>
<description>A description of this RSS Feed.</description>
<item>
<title>First Item</title>
<link>http://www.somewhere.com/item1.xml</link>
<description>Blah, Blah, Blah</description>
<pubDate>Tue, 25 Oct 2005 15:47:29 GMT</pubDate>
</item>
<item>
<title>Second Item</title>
<link>http://www.somewhere.com/item2.xml</link>
<description>More blah blah</description>
<pubDate>Mon, 28 Nov 2005 17:55:42 GMT</pubDate>
</item>
</channel>
</rss>")

Running:

(parse-xml-from-stream (make-string-input-stream *hang-data*))

always hung without returning the prompt (and yes, a ps indicated that
the child process for the "elements" binary had zombied).

But, running:

(parse-xml-from-stream (make-string-input-stream *no-hang-data*))

returned immediately, with an S expression of the parsed results, as
expected.

So, not knowing about the PTY issue with CMUCL's ext:run-program
extension, that's why I posted originally: the only tangible difference
between all those experiments, whether using read or read-sequence was
the second xml header line in the two samples.
From: Barry Margolin
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <barmar-3789CD.19401122122005@comcast.dca.giganews.com>
In article <························@g47g2000cwa.googlegroups.com>,
 "dpapathanasiou" <···················@gmail.com> wrote:

> Hi Barry,
> 
> Thanks for your feedback, and sorry if some of my examples were a bit
> confused (some clarification follows below).
> 
> Fortunately, though, I have discovered the problem: since the expat
> wrapper binary ("elements") reads from stdin and writes to stdout, I
> have to specify that CMUCL's ext:run-program executes my "elements"
> subprocess under a PTY
> (http://article.gmane.org/gmane.lisp.cmucl.general/1553/match=ext+process+output).
> 
> Rewriting (parse-xml-from-file) to use :pty solved the problem.

This sounds like the issue is with stdio buffering in the expat program, 
not anything specific to CMUCL.  You would probably see the same issue 
in a C program using popen().

By default, stdio fully buffers output to a pipe, while output to a 
terminal is line-buffered.

From: dpapathanasiou
Subject: Re: Signaling End of Stream: Special Character Constraints?
Date: 
Message-ID: <1135342286.147339.163100@g14g2000cwa.googlegroups.com>
> This sounds like the issue is with stdio buffering in the expat program,
> not anything specific to CMUCL.  You would probably see the same issue
> in a C program using popen().

Actually, I thought so too, initially: I played with different
combinations of buffer size and xml file size, but no matter what I did
to the C wrapper, it always worked properly when invoked from the
command line.

Only when invoked via ext:run-program did it sometimes hang, and
the only thing all those hangs had in common was the second xml header
line.

I saw someone else running CMUCL's ext:run-program experienced the same
phenomenon: replace "identify" with "expat" and "length of string
argument" with "second xml header sequence" in his post, and he's
describing my problem:
http://thread.gmane.org/gmane.lisp.cmucl.devel/3571

Also, just like he experienced, if I replace :output :stream with
:output t, ext:run-program *never* hangs, regardless of the input.

> By default, stdio fully buffers output to a pipe, while output to a
> terminal is line-buffered.

Thanks for this clarification: I need to look a little deeper into what
the expat wrapper and ext:run-program function are doing in terms of
I/O interaction before I can be sure that I've got this resolved.