code critique

From: Daniel Leidisch
Subject: code critique
Date: Wed, 30 May 2007 09:15:15 +0000
Message-ID: <f3jfb3$1n8$00$1@news.t-online.com>

For educational purposes (resp. fun) I wrote some code for the cgi
GET method. It works like this:

> (get-parameter 'username) => "John"

Since I'm learning Lisp on my own, I would highly appreciate any tips
concerning my code. What could be done better, what is already ok?

Regards,

dhl


(defun convert-hex-encoded-chars (string)
  "Converts %HEX encoded chars found in string."
  (if (cl-ppcre:scan "%.." string)
      (cl-ppcre:register-groups-bind
          (before match after) ("([^%]*)(%..)(.*)" string)
        (concatenate 'string before
                     (string (code-char
                              (parse-integer (subseq match 1) :radix 16)))
                     (convert-hex-encoded-chars after)))
      string))

(defun split-query-string (query-string)
  "Splits a query-string and returns a list of variable=value assignments."
  (cl-ppcre:split "&" (substitute #\Space #\+
                                  (convert-hex-encoded-chars query-string))))

(defun get-parameters (&optional
                       (query-string
                        (osicat:environment-variable "QUERY_STRING")))
  "Returns an alist of (variable . value) pairs for a cgi query-string, which
may be given as an optional parameter. Otherwise, the environment-variable
QUERY_STRING is used."
  (loop for variable in (split-query-string query-string)
     collect (cl-ppcre:register-groups-bind
                 (key value) ("(.*)=(.*)" variable)
               (cons (intern (string-upcase key))
                     value))))

(defparameter *parameters* (get-parameters)
  "Alist of QUERY_STRING variables.")

(defun get-parameter (key)
  "Returns the value of a QUERY_STRING variable named key."
  (cdr (assoc key *parameters*)))

Re: code critique Richard M Kreuter
- Re: code critique Daniel Leidisch
  - Re: code critique Vassil Nikolov
  - Re: code critique Richard M Kreuter
    - Re: code critique Pascal Bourguignon
      - Re: code critique Richard M Kreuter
        Re: code critique Pascal Bourguignon
        Re: code critique Richard M Kreuter
      - Re: code critique Edi Weitz

From: Richard M Kreuter
Subject: Re: code critique
Date: Wed, 30 May 2007 13:24:27 +0000
Message-ID: <87r6oyigvo.fsf@tan-ru.localdomain>

Daniel Leidisch <····@leidisch.net> writes:

> Since I'm learning Lisp on my own, I would highly appreciate any tips
> concerning my code. What could be done better, what is already ok?
>
> (defun convert-hex-encoded-chars (string)
>   "Converts %HEX encoded chars found in string."
>   (if (cl-ppcre:scan "%.." string)
>       (cl-ppcre:register-groups-bind
>           (before match after) ("([^%]*)(%..)(.*)" string)
>         (concatenate 'string before
>                      (string (code-char
>                               (parse-integer (subseq match 1) :radix 16)))
>                      (convert-hex-encoded-chars after)))
>       string))

This is fine; the following are just a few suggestions:

* it may be more efficient to use a string-output-stream than to
  repeatedly construct intermediate strings.  See
  WITH-OUTPUT-TO-STRING.

* A portability caveat: ANSI Common Lisp does not require characters
  to be encoded in ASCII or any other encoding, and so CODE-CHAR may
  not do what you expect here.

* While it's evident when you look at the code for a few seconds (if
  one knows regular expressions), neither the name of the function nor
  the docstring actually says which way this function "converts"; it
  could be "convert /from/", or "convert /to/".  Why not name it
  something like DECODE-HEX-ENCODED-CHARACTERS, or
  DECODE-URLENCODED-STRING?

* Note that you don't really need regular expressions here: you can go
  over characters in the string fairly easily with LOOP or DOTIMES, or
  by constructing a string-input-stream and looping with READ-CHAR,
  decoding and accumulating the first two characters after a percent
  sign, and accumulating other characters verbatim.  (This isn't a
  claim that regular expressions are to be avoided: rather, that if
  you're looking for exercises, then since you demonstrate facility
  with regular expressions already, you might like to try writing this
  routine using only standard CL routines.)

> (defun split-query-string (query-string)
>   "Splits a query-string and returns a list of variable=value assignments."
>   (cl-ppcre:split "&" (substitute #\Space #\+
>                                   (convert-hex-encoded-chars query-string))))

If you split after decoding, what happens in case one of the decoded
variables or values contains an ampersand?  For example:

(split-query-string "foo=b%26r")

> (defun get-parameters (&optional
>                        (query-string
>                         (osicat:environment-variable "QUERY_STRING")))
>   "Returns an alist of (variable . value) pairs for a cgi query-string, which
> may be given as an optional parameter. Otherwise, the environment-variable
> QUERY_STRING is used."
>   (loop for variable in (split-query-string query-string)
>      collect (cl-ppcre:register-groups-bind
>                  (key value) ("(.*)=(.*)" variable)
>                (cons (intern (string-upcase key))
>                      value))))

Likewise, if you split variables from values after decoding, what
happens in case one of the decoded variables or values contains an
equal sign?  For example:

(get-parameters "foo=b%3Dr")

--
RmK

From: Daniel Leidisch
Subject: Re: code critique
Date: Wed, 30 May 2007 15:13:27 +0000
Message-ID: <f3k4an$ugm$01$1@news.t-online.com>

In article <··············@tan-ru.localdomain> you wrote:
> Daniel Leidisch <····@leidisch.net> writes:
>
>> Since I'm learning Lisp on my own, I would highly appreciate any tips
>> concerning my code. What could be done better, what is already ok?
>>
>> (defun convert-hex-encoded-chars (string)
>>   "Converts %HEX encoded chars found in string."
>>   (if (cl-ppcre:scan "%.." string)
>>       (cl-ppcre:register-groups-bind
>>           (before match after) ("([^%]*)(%..)(.*)" string)
>>         (concatenate 'string before
>>                      (string (code-char
>>                               (parse-integer (subseq match 1) :radix
>>                               16)))
>>                      (convert-hex-encoded-chars after)))
>>       string))
>
> This is fine; the following are just a few suggestions:
>
> * it may be more efficient to use a string-output-stream than to
>  repeatedly construct intermediate strings.  See
>  WITH-OUTPUT-TO-STRING.

Although performance was not my foremost concern, that seems to be
a good tip. I'll keep it in mind, and maybe I'll change the code
according to your suggestion.

> * A portability caveat: ANSI Common Lisp does not require characters
>  to be encoded in ASCII or any other encoding, and so CODE-CHAR may
>  not do what you expect here.

Do you have any suggestions as to how to implement this
more portably?  After a quick (apropos 'ascii) I recognized
that all fbound symbols are in the SB-IMPL package (sbcl is my
cl-implementation of choice), so I assume that there is no standard
way to do it. The first thing that came to my mind was to add an
ascii table lookup myself. Is there a better solution?

> * While it's evident when you look at the code for a few seconds (if
>  one knows regular expressions), neither the name of the function nor
>  the docstring actually says which way this function "converts"; it
>  could be "convert /from/", or "convert /to/".  Why not name it
>  something like DECODE-HEX-ENCODED-CHARACTERS, or
>  DECODE-URLENCODED-STRING?

You're right. I like the names you proposed, so they are used now.

> * Note that you don't really need regular expressions here: you can go
>  over characters in the string fairly easily with LOOP or DOTIMES, or
>  by constructing a string-input-stream and looping with READ-CHAR,
>  decoding and accumulating the first two characters after a percent
>  sign, and accumulating other characters verbatim.  (This isn't a
>  claim that regular expressions are to be avoided: rather, that if
>  you're looking for exercises, then since you demonstrate facility
>  with regular expressions already, you might like to try writing this
>  routine using only standard CL routines.)

That would make an interesting exercise, indeed. I think I'll give
it a try.

>> (defun split-query-string (query-string)
>>   "Splits a query-string and returns a list of variable=value
>>   assignments."
>>   (cl-ppcre:split "&" (substitute #\Space #\+
>>                                   (convert-hex-encoded-chars
>>                                   query-string))))
>
> If you split after decoding, what happens in case one of the decoded
> variables or values contains an ampersand?  For example:
>
> (split-query-string "foo=b%26r")
>
>> (defun get-parameters (&optional
>>                        (query-string
>>                         (osicat:environment-variable
>>                         "QUERY_STRING")))
>>   "Returns an alist of (variable . value) pairs for a cgi
>>   query-string, which
>> may be given as an optional parameter. Otherwise, the
>> environment-variable
>> QUERY_STRING is used."
>>   (loop for variable in (split-query-string query-string)
>>      collect (cl-ppcre:register-groups-bind
>>                  (key value) ("(.*)=(.*)" variable)
>>                (cons (intern (string-upcase key))
>>                      value))))
>
> Likewise, if you split variables from values after decoding, what
> happens in case one of the decoded variables or values contains an
> equal sign?  For example:
> (get-parameters "foo=b%3Dr")

Cripes, that's awkward. Of course you're right. In my defence: To
me the major problem was implementing convert-hex-encoded-chars,
so I seem to have become sloppy after it worked. Lack of discipline,
I guess. Now it works!

Yet again, thank you very much for your help -- it is appreciated.


Regards,

dhl

>
> --
> RmK

Here is a quick overhaul, I hope that there are no major slips of
the pen:

(defun decode-urlencoded-string (string)
  "Decodes an urlencoded string: Occurrences of %HEX encoding are
transformed to the represented characters and + is substituted by
#\Space."
  (substitute #\Space #\+
              (decode-hex-encoded-chars string)))

(defun decode-hex-encoded-chars (string)
  "Occurrences of %HEX in stringencoding are transformed to the
represented characters."
  (if (cl-ppcre:scan "%.." string)
      (cl-ppcre:register-groups-bind
          (before match after) ("([^%]*)(%..)(.*)" string)
        (concatenate 'string before
                     (string (code-char
                              (parse-integer (subseq match 1) :radix 16)))
                     (decode-hex-encoded-chars after)))
      string))

(defun get-parameters (&optional
                       (query-string
                        (osicat:environment-variable "QUERY_STRING")))
  "Returns an alist of (variable . value) pairs for a cgi query-string,
which may be given as an optional parameter. Otherwise, the environment-
variable QUERY_STRING is used."
  (loop for variable in (cl-ppcre:split "&" query-string)
     collect (cl-ppcre:register-groups-bind
                 (key value) ("(.*)=(.*)" variable)
               (cons (intern (string-upcase (decode-urlencoded-string key)))
                     (decode-urlencoded-string value)))))

(defparameter *parameters* (get-parameters)
  "Alist of QUERY_STRING variables.")

(defun get-parameter (key)
  "Returns the value of a QUERY_STRING variable named key."
  (cdr (assoc key *parameters*)))

From: Vassil Nikolov
Subject: Re: code critique
Date: Thu, 31 May 2007 02:46:22 +0000
Message-ID: <kaps4h7ls1.fsf@localhost.localdomain>

On Wed, 30 May 2007 17:13:27 +0200, Daniel Leidisch <····@leidisch.net> said:
| In article <··············@tan-ru.localdomain> you wrote:
|| ...
|| * A portability caveat: ANSI Common Lisp does not require characters
|| to be encoded in ASCII or any other encoding, and so CODE-CHAR may
|| not do what you expect here.

| Do you have any suggestions as to how to implement this
| more portably?  After a quick (apropos 'ascii) I recognized
| that all fbound symbols are in the SB-IMPL package (sbcl is my
| cl-implementation of choice), so I assume that there is no standard
| way to do it. The first thing that came to my mind was to add an
| ascii table lookup myself. Is there a better solution?

  Not really.  Either you or a library writer needs to provide the
  mapping for URL-encoded characters.  (Imagine that your program is
  running in EBCDIC land; how can it have "native" knowledge that,
  say, #x20 (rather than #x40) is the code for #\Space?)

  ---Vassil.


-- 
The truly good code is the obviously correct code.

From: Richard M Kreuter
Subject: Re: code critique
Date: Thu, 31 May 2007 03:25:02 +0000
Message-ID: <87k5upisj5.fsf@tan-ru.localdomain>

Daniel Leidisch <····@leidisch.net> writes:
> In article <··············@tan-ru.localdomain> you wrote:

>> * A portability caveat: ANSI Common Lisp does not require
>> characters to be encoded in ASCII or any other encoding, and so
>> CODE-CHAR may not do what you expect here.
>
> Do you have any suggestions as to how to implement this
> more portably?

Well, this was mostly a fussy "technically this isn't portable" note;
all current implementations do, I think, use a superset of ASCII for
their character repertoires.

In case you are interested in being fussy: a conforming ANSI CL
implementation is only required to support the 96 characters in the
standard-character repertoire, and one of those characters, Newline,
need not map to a single ASCII character.  You can get the standard
character for an ASCII code like this:

(defun ascii-code-char (code)
  (cond ((< 31 code 127)
         (char #.(format nil ··@{~A~}"
                         " !\"#$%&’()*+,-./0123456789:;<=>·@"
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
                         "abcdefghijklmnopqrstuvwxyz{|}~")
	       (- code 32)))
        (t ;Here you might handle Newline and the semistandard characters
         (error "Can't find an ASCII character for code ~D" code))))

For the other characters in the ASCII character set, you'll need to
resort to implementation-dependent details.  For Unicode, RFC 3986
defines the URL encoding of a Unicode code point as the URL encoding
of the octets in the UTF-8 encoding of the code point, so if you care
to support this, you'll probably have make your decoder a bit more
complex.  (A quick check leaves me less than confident that all
browsers actually encode Unicode code points according to the RFC,
though.)

--
RmK

From: Pascal Bourguignon
Subject: Re: code critique
Date: Thu, 31 May 2007 12:17:22 +0000
Message-ID: <87tzttdw6l.fsf@thalassa.lan.informatimago.com>

Richard M Kreuter <·······@progn.net> writes:

> Daniel Leidisch <····@leidisch.net> writes:
>> In article <··············@tan-ru.localdomain> you wrote:
>
>>> * A portability caveat: ANSI Common Lisp does not require
>>> characters to be encoded in ASCII or any other encoding, and so
>>> CODE-CHAR may not do what you expect here.
>>
>> Do you have any suggestions as to how to implement this
>> more portably?
>
> Well, this was mostly a fussy "technically this isn't portable" note;
> all current implementations do, I think, use a superset of ASCII for
> their character repertoires.
>
> In case you are interested in being fussy: a conforming ANSI CL
> implementation is only required to support the 96 characters in the
> standard-character repertoire, and one of those characters, Newline,
> need not map to a single ASCII character.  You can get the standard
> character for an ASCII code like this:
>
> (defun ascii-code-char (code)
>   (cond ((< 31 code 127)
>          (char #.(format nil ··@{~A~}"
>                          " !\"#$%&’()*+,-./0123456789:;<=>·@"
>                          "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
>                          "abcdefghijklmnopqrstuvwxyz{|}~")
> 	       (- code 32)))
>         (t ;Here you might handle Newline and the semistandard characters
>          (error "Can't find an ASCII character for code ~D" code))))
>
> For the other characters in the ASCII character set, you'll need to
> resort to implementation-dependent details.  

There are NO other character in the ASCII character set.
There are a few control codes in the ASCII codes.



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.

From: Richard M Kreuter
Subject: Re: code critique
Date: Thu, 31 May 2007 15:58:05 +0000
Message-ID: <87fy5dhto2.fsf@tan-ru.localdomain>

Pascal Bourguignon <···@informatimago.com> writes:

>> For the other characters in the ASCII character set, you'll need to
>> resort to implementation-dependent details.
>
> There are NO other character in the ASCII character set.
> There are a few control codes in the ASCII codes.

While one could define the term "character" in ways that would make
your sentences either true or interesting, ANSI Common Lisp clearly
classifies as characters a handful of entities that you're
discounting: Backspace, Tab, Linefeed, Page, Return, and Rubout.

--
RmK

From: Pascal Bourguignon
Subject: Re: code critique
Date: Thu, 31 May 2007 16:40:32 +0000
Message-ID: <876469djzz.fsf@thalassa.lan.informatimago.com>

Richard M Kreuter <·······@progn.net> writes:

> Pascal Bourguignon <···@informatimago.com> writes:
>
>>> For the other characters in the ASCII character set, you'll need to
>>> resort to implementation-dependent details.
>>
>> There are NO other character in the ASCII character set.
>> There are a few control codes in the ASCII codes.
>
> While one could define the term "character" in ways that would make
> your sentences either true or interesting, ANSI Common Lisp clearly
> classifies as characters a handful of entities that you're
> discounting: Backspace, Tab, Linefeed, Page, Return, and Rubout.

Yes, and interestingly, there's not a 1-1 correspondance between some
of these "characters" and the ASCII codes used to represent them,
since the #\newline "character" can correspond to a sequence of TWO
ASCII codes: #(13 10).

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

NOTE: The most fundamental particles in this product are held
together by a "gluing" force about which little is currently known
and whose adhesive power can therefore not be permanently
guaranteed.

From: Richard M Kreuter
Subject: Re: code critique
Date: Thu, 31 May 2007 17:52:34 +0000
Message-ID: <87bqg0j2xp.fsf@tan-ru.localdomain>

Pascal Bourguignon <···@informatimago.com> writes:
> Richard M Kreuter <·······@progn.net> writes:
>> Pascal Bourguignon <···@informatimago.com> writes:
>>
>>> There are NO other character in the ASCII character set.  There
>>> are a few control codes in the ASCII codes.
>>
>> ANSI Common Lisp clearly classifies as characters a handful of
>> entities that you're discounting: Backspace, Tab, Linefeed, Page,
>> Return, and Rubout.
>
> Yes, and interestingly, there's not a 1-1 correspondance between some
> of these "characters" and the ASCII codes used to represent them,
> since the #\newline "character" can correspond to a sequence of TWO
> ASCII codes: #(13 10).

This is precisely why I left Newline out of that list.  Backspace,
Tab, Linefeed, Page, Return, and Rubout, intuitively map bijectively
to ASCII's bs, ht, nl, np, cr, and del, respectively.

From: Edi Weitz
Subject: Re: code critique
Date: Thu, 31 May 2007 12:34:32 +0000
Message-ID: <uvee92muf.fsf@agharta.de>

On Thu, 31 May 2007 14:17:22 +0200, Pascal Bourguignon <···@informatimago.com> wrote:

> Richard M Kreuter <·······@progn.net> writes:
>
>> For the other characters in the ASCII character set, you'll need to
>> resort to implementation-dependent details.
>
> There are NO other character in the ASCII character set.

Says who?  The abstract on the ANSI website[1] for example talks about
"128 characters (control characters and graphics characters such as
letters, digits, and symbols) with their coded representation."  There
are lots of other references which call all of these 128 thingies
"characters."  In fact, you yourself just said "ASCII /character/
set," didn't you?

[1] http://webstore.ansi.org/ansidocstore/product.asp?sku=ANSI+INCITS+4-1986+(R2002)

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")