Problem with CLISP and APACHE

From: Andreas Hinze
Subject: Problem with CLISP and APACHE
Date: Mon, 30 Dec 2002 15:06:45 +0000
Message-ID: <3E106105.A7730F25@snafu.de>

Hi all
I'm playing around with CLISP as a replacement for Perl in CGI applications.
It works very well, but now I'm running into a problem that I can't figure out.

Look at the two CGI scripts below. They differ only in one line containing
a comment and a &uuml; character (\"u for TeX users).

umlaut1.cgi:
  #!/usr/bin/clisp
  (load "html")
  (setq *character-replace-function* nil)
  (format t "Content-type: text/html~%~%" )
  (format t "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">~%")

  (output-html (html
	        (head (title "Looking for the bug"))
	        (body
	         (h1 "Ein &uuml;"))))

umlaut2.cgi:
  #!/usr/bin/clisp
  (load "html")
  (setq *character-replace-function* nil)
  (format t "Content-type: text/html~%~%" )
  (format t "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">~%")
  ;ü
  (output-html (html
	        (head (title "Looking for the bug"))
	        (body
	         (h1 "Ein &uuml;"))))

Calling the first script (umlaut1.cgi) works fine. Calling the second script
(umlaut2.cgi) produces no visible output in the browser.
When I look at the page source in the browser, I find that only the first line
was transferred, that is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

In the Apache error log I find the line

*** - invalid byte #xFC in CHARSET:ASCII conversion

every time I call umlaut2.cgi.

A quick grep over the CLISP sources suggests that this is a CLISP error message.
However, when I run both CGI scripts from the command line, with
  clisp umlaut1.cgi
  clisp umlaut2.cgi
they output the same string (which is nice, working HTML if you ignore the first
Content-type line; that one is only for Apache). Moreover, I can't see any kind of
error message from CLISP.

And after all, the critical character is inside a comment!

Can anyone please give me a hint about what's going on here?

The motivation for having these characters in the file is that my functions can provide
an automatic translation of localized characters (here ISO-8859-1) to HTML. But if I have
these special characters in a string (not only in a comment), I see the behavior described
above. I narrowed the problem down to the situation above, where the character in question
is no longer in a string but only in a comment, so that my translator doesn't see it. However,
the problem remains.

Please reply to the newsgroup. My email is currently out of order.

Thanks in advance
AHz

P.S: Some technical details:

  I'm using CLISP 2.30, built with ./makemake --with-gettext -with-dynamic-ffi >Makefile

  *features* is
  (:CLOS :LOOP :COMPILER :CLISP :ANSI-CL :COMMON-LISP :LISP=CL :INTERPRETER :SOCKETS 
   :GENERIC-STREAMS :LOGICAL-PATHNAMES :SCREEN :FFI :UNICODE :BASE-CHAR=CHARACTER :PC386 :UNIX)

  The server is Apache/1.3.23 (Unix) on SuSE Linux 8.0.

Re: Problem with CLISP and APACHE Frank A. Adrian
Re: Problem with CLISP and APACHE Kaz Kylheku
Re: Problem with CLISP and APACHE Eric Marsden
Re: Problem with CLISP and APACHE Daniel Barlow
Re: Problem with CLISP and APACHE Pascal Bourguignon
Re: Problem with CLISP and APACHE Andreas Hinze
- Re: Problem with CLISP and APACHE Matthew Danish
  - Re: Problem with CLISP and APACHE Thomas F. Burdick
    - Re: Problem with CLISP and APACHE Andreas Hinze

From: Frank A. Adrian
Subject: Re: Problem with CLISP and APACHE
Date: Mon, 30 Dec 2002 15:47:56 +0000
Message-ID: <MUZP9.7$TN.19610@news.uswest.net>

Andreas Hinze wrote:

>   ;ü

The character after the semicolon is not a standard character. As such, an
implementation is allowed to handle it as it wishes. In this case, the
implementation seems to choose to flag non-ASCII characters as invalid.
Talk to the CLisp implementors. It's up to them whether or not their
implementation supports the full range of ISO-8859-1 characters (or any
other character set, for that matter).

faa

From: Kaz Kylheku
Subject: Re: Problem with CLISP and APACHE
Date: Mon, 30 Dec 2002 19:54:46 +0000
Message-ID: <cf333042.0212301154.1da20a2b@posting.google.com>

Andreas Hinze <···@snafu.de> wrote in message news:<·················@snafu.de>...
> In the Apache Error log i found a line
> 
> *** - invalid byte #xFC in CHARSET:ASCII conversion
> 
> every time i call umlaut2.cgi. 

Your LANG environment variable is not set, so your text streams are
restricted to the 7-bit US-ASCII character set.

> A short grep over the CLISP source shows that it seems to be a CLISP error message.
> However, when i run both CGI scripts from the command line, with
>   clisp umlaut1.cgi
>   clisp umlaut2.cgi
> they output the same string (which is nice & working HTML if you ignore the first Content-type

That's because your LANG environment variable is set in that context.

From: Eric Marsden
Subject: Re: Problem with CLISP and APACHE
Date: Mon, 30 Dec 2002 15:46:20 +0000
Message-ID: <wzi7kdrldbn.fsf@melbourne.laas.fr>

>>>>> "ah" == Andreas Hinze <···@snafu.de> writes:

  ah> Look at the two CGI scripts below. They only differ in one line
  ah> containing a comment and a &uuml; char (\"u for tex users).

  ah> In the Apache Error log i found a line
  ah> 
  ah> *** - invalid byte #xFC in CHARSET:ASCII conversion
  ah> 
  ah> every time i call umlaut2.cgi.

Please read the CLISP FAQ at <URL:http://clisp.cons.org/faq.html>.

You need either to get Apache to invoke CLISP with appropriate values
for the locale-related environment variables LC_ALL, LC_CTYPE, LANG,
or to invoke CLISP with an appropriate value for the -E commandline
option.
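To see which of those variables wins when several are set, it may help to recall the POSIX lookup order for the character-type category: LC_ALL if it is set and non-empty, otherwise LC_CTYPE, otherwise LANG, otherwise the implementation's default locale (usually C/POSIX). The function below is a hypothetical helper that only mirrors that order in portable sh; it does not query CLISP, and it assumes that CLISP picks its default encoding from the locale resolved this way.

```shell
#!/bin/sh
# effective_ctype: hypothetical helper, not part of CLISP or Apache.
# Prints the locale name POSIX uses for the LC_CTYPE category:
# LC_ALL overrides LC_CTYPE, which overrides LANG; a variable that is
# set but empty counts as unset. With none of them set it prints "C",
# the usual default (an assumption: POSIX leaves the default
# implementation-defined).
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' C
  fi
}
```

Empty values are skipped on purpose, because POSIX treats a null locale variable like an unset one. If none of the three reaches the CGI process, the helper falls through to C, which fits the CHARSET:ASCII in the error message.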

-- 
Eric Marsden                          <URL:http://www.laas.fr/~emarsden/>

From: Daniel Barlow
Subject: Re: Problem with CLISP and APACHE
Date: Mon, 30 Dec 2002 18:57:17 +0000
Message-ID: <874r8v72sy.fsf@noetbook.telent.net>

Andreas Hinze <···@snafu.de> writes:

> Look at the two CGI scripts below. They only differ in one line containing
> a comment and a &uuml; char (\"u for tex users).
[...]
> Calling the first script (umlaut1.cgi) works fine. Calling the second script
> (umlaut2.cgi) doesn't provide any visible output in the browser.
[...]
> In the Apache Error log i found a line
>
> *** - invalid byte #xFC in CHARSET:ASCII conversion
>
> every time i call umlaut2.cgi. 
[...]
> However, when i run both CGI scripts from the command line, with
>   clisp umlaut1.cgi
>   clisp umlaut2.cgi
> they output the same string (which is nice & working HTML if you ignore the first Content-type

If it works from the shell but not from CGI, the first place to check
is the shell environment: specifically, you probably have some kind of
LANG or LC_* variable set to indicate what character set you're using,
but apache (which was started by init) doesn't.
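One way to reproduce the CGI situation from an interactive shell is to run the script with an almost empty environment, so that a LANG or LC_* set in your login shell no longer leaks in. The helper below is hypothetical (not part of CLISP or Apache) and uses only the standard `env -i` utility; the PATH value is an assumption you may need to adapt.

```shell
#!/bin/sh
# run_with_sparse_env: hypothetical helper for reproducing CGI-style
# failures from a login shell. It runs the given command with an empty
# environment except for PATH, plus any NAME=VALUE pairs placed before
# the command, so locale variables such as LANG or LC_ALL are present
# only if you pass them explicitly.
run_with_sparse_env() {
  env -i PATH=/usr/bin:/bin "$@"
}
```

For example, `run_with_sparse_env /usr/bin/clisp umlaut2.cgi` should then fail the same way as under Apache, while `run_with_sparse_env LANG=de_DE.ISO-8859-1 /usr/bin/clisp umlaut2.cgi` should not, provided a locale of that (illustrative) name is installed; `locale -a` lists the available ones.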

I try to avoid POSIX locales as far as possible, so I see the error
message in the shell too:

:; set|grep LC
MAILCHECK=60
:; set|grep LANG
LANG=C
:; od f.cgi 
0000000 020040 176073 000012
0000005
:; clisp f.cgi 

*** - invalid byte #xFC in CHARSET:ASCII conversion
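Dan's `od` dump uses od's default format, octal 16-bit words, which on a little-endian PC print the low byte first: 020040 is two spaces (0x20 0x20), 176073 is 0xFC3B, i.e. `;` (0x3B) followed by 0xFC, and 000012 is the newline, five bytes in total. `od -An -tx1 f.cgi` shows the same bytes in file order as `20 20 3b fc 0a`. To catch such bytes in a script before Apache does, a sketch along these lines may help; `find_non_ascii` is a hypothetical name, and the sketch relies on grep treating every byte as one character under LC_ALL=C.

```shell
#!/bin/sh
# find_non_ascii: hypothetical helper, not part of CLISP or Apache.
# Prints every line of the given files that contains a byte which is
# neither printable ASCII nor whitespace, such as 0xFC (u-umlaut in
# ISO-8859-1), prefixed by its line number (and by the file name when
# several files are given). Exit status follows grep: 0 if such a line
# was found, 1 if none was, greater than 1 on error.
# LC_ALL=C makes grep treat every byte as a single character, so the
# bracket expression is matched byte by byte.
find_non_ascii() {
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

`printf '  ;\374\n' > f.cgi` recreates Dan's five-byte file, and `find_non_ascii f.cgi` then reports its first line.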


-dan

-- 

   http://www.cliki.net/ - Link farm for free CL-on-Unix resources

From: Pascal Bourguignon
Subject: Re: Problem with CLISP and APACHE
Date: Tue, 31 Dec 2002 12:10:41 +0000
Message-ID: <87ptrifkxq.fsf@thalassa.informatimago.com>

Andreas Hinze <···@snafu.de> writes:

> Hi all
> I'm playing around with CLISP as a replacement for PERL in CGI applications.
> It works very well but now i'm running into a problem that i can't figure out.
> 
> Look at the two CGI scripts below. They only differ in one line containing
> a comment and a &uuml; char (\"u for tex users).
> 
> umlaut1.cgi:
>   #!/usr/bin/clisp
>   (load "html")
>   (setq *character-replace-function* nil)
>   (format t "Content-type: text/html~%~%" )
>   (format t "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">~%")
> 
>   (output-html (html
> 	        (head (title "Looking for the bug"))
> 	        (body
> 	         (h1 "Ein &uuml;"))))
> 
> umlaut2.cgi:
>   #!/usr/bin/clisp
>   (load "html")
>   (setq *character-replace-function* nil)
>   (format t "Content-type: text/html~%~%" )
>   (format t "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">~%")
>   ;ü
>   (output-html (html
> 	        (head (title "Looking for the bug"))
> 	        (body
> 	         (h1 "Ein &uuml;"))))
> 
> Calling the first script (umlaut1.cgi) works fine. Calling the second script
> (umlaut2.cgi) doesn't provide any visible output in the browser.
> When i look into the browser source i find, that only the first line is
> transfered to the browser (that is:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> 
> In the Apache Error log i found a line
> 
> *** - invalid byte #xFC in CHARSET:ASCII conversion

So it seems that the html functions convert &uuml; into a ü before
outputting the HTML code...

Apache is saying that the document returned by the CGI was declared to be
ASCII (remember, 7-bit), but it contained a #xFC character that is not
ASCII.

Try adding this meta in the header section:

  <meta http-equiv="Content-Type"  content="text/html; charset=iso-8859-1">

Or use (format t "Content-type: text/html; charset=iso-8859-1~%~%" )

Then the document would be correct, Apache should not complain, and
neither should the browsers.


-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
There is a fault in reality. Do not adjust your minds. -- Salman Rushdie

From: Andreas Hinze
Subject: Re: Problem with CLISP and APACHE
Date: Tue, 31 Dec 2002 14:24:58 +0000
Message-ID: <3E11A8BA.884652C8@snafu.de>

Hi all,
thanks for the help. With your hints I found out what the problem is.
In fact it was related to the LANG variable in the environment. It is set
in my environment but not in the environment that Apache uses. So
when Apache called CLISP, my scripts ran without the correct language
definition.
I changed the first line of my scripts to "#!/usr/bin/clisp -E ISO8859-1"
and now everything works fine.

Thanks again and have a happy new year.

Sincerely
AHz

From: Matthew Danish
Subject: Re: Problem with CLISP and APACHE
Date: Tue, 31 Dec 2002 15:39:32 +0000
Message-ID: <20021231103932.A12928@lain.cheme.cmu.edu>

On Tue, Dec 31, 2002 at 03:24:58PM +0100, Andreas Hinze wrote:
> I changed the headline of my scripts to "#!/usr/bin/clisp -E ISO8859-1"

I recall from somewhere that having more than one argument to the
program on your #! line is not supported everywhere.  Just something to
keep in mind if it breaks unexpectedly.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."

From: Thomas F. Burdick
Subject: Re: Problem with CLISP and APACHE
Date: Wed, 01 Jan 2003 20:13:59 +0000
Message-ID: <xcv65t8iq60.fsf@conquest.OCF.Berkeley.EDU>

Matthew Danish <·······@andrew.cmu.edu> writes:

> On Tue, Dec 31, 2002 at 03:24:58PM +0100, Andreas Hinze wrote:
> > I changed the headline of my scripts to "#!/usr/bin/clisp -E ISO8859-1"
> 
> I recall from somewhere that having more than one argument to the
> program on your #! line is not supported everywhere.  Just something to
> keep in mind if it breaks unexpectedly.

As someone who has used such systems, I'm smitten with one particular
feature of CLISP: use hard-space characters, and clisp will parse them
as normal spaces, allowing you to pass as many command-line options as
you can stuff in 255 characters.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Andreas Hinze
Subject: Re: Problem with CLISP and APACHE
Date: Mon, 06 Jan 2003 13:03:07 +0000
Message-ID: <3E197E8B.F4BB1859@smi.de>

Hi all,
thanks for the hints. So far everything works very well.
I had never realized that having more than one argument after #! could be a problem.
However, it is only needed for the automatic charset conversion of my tool,
which is only a minor feature.

Sincerely
AHz

"Thomas F. Burdick" wrote:
> 
> Matthew Danish <·······@andrew.cmu.edu> writes:
> 
> > On Tue, Dec 31, 2002 at 03:24:58PM +0100, Andreas Hinze wrote:
> > > I changed the headline of my scripts to "#!/usr/bin/clisp -E ISO8859-1"
> >
> > I recall from somewhere that having more than one argument to the
> > program on your #! line is not supported everywhere.  Just something to
> > keep in mind if it breaks unexpectedly.
> 
> As someone who has used such systems, I'm smitten with one particular
> feature of CLISP: use hard-space characters, and clisp will parse them
> as normal spaces, allowing you to pass as many command-line options as
> you can stuff in 255 characters.