From: tichy
Subject: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <dnefij$kj8$1@nemesis.news.tpi.pl>
Hi.

What is preferable ?

1.
[.....]
(format result
         (lambda (stream)
           (write-string (get (funcall entity-source char)
                              (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
                         stream)))
[.....]

2.
[.....]
(loop for c across (get (funcall entity-source char)
                         (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
       do (vector-push-extend c result))
[.....]

3. any other ideas ?

TIA, Szymon.

From: Pascal Bourguignon
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <87hd9h5d18.fsf@thalassa.informatimago.com>
tichy <···········@o2.pl> writes:

> Hi.
>
> What is preferable ?
>
> 1.
> [.....]
> (format result
>          (lambda (stream)
>            (write-string (get (funcall entity-source char)
>                               (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>                          stream)))
> [.....]
>
> 2.
> [.....]
> (loop for c across (get (funcall entity-source char)
>                          (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>        do (vector-push-extend c result))
> [.....]
>
> 3. any other ideas ?
>
> TIA, Szymon.


The second one.  The first doesn't even work!

[74]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
#()
[75]> (loop for c across #(a 1 b "Hi" c) do (vector-push-extend c result))
NIL
[76]> result
#(A 1 B "Hi" C)


[78]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
#()
[79]> (format result
           (lambda (stream)
             (write-string  #(a 1 b "Hi" c)
                            stream)))

*** - The destination argument #() is invalid (not NIL or T or a stream or a
      string).
The following restarts are available:
ABORT          :R1      ABORT
Break 1 [80]> :q
[81]> 


Perhaps more context would be useful don't you think?

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Specifications are for the weak and timid!"
From: tichy
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <dng487$uo$1@nemesis.news.tpi.pl>
Pascal Bourguignon wrote:
> tichy <···········@o2.pl> writes:
>> 1.
>> (format result
>>          (lambda (stream)
>>            (write-string (get (funcall entity-source char)
>>                               (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>>                          stream)))
>> 2.
>> (loop for c across (get (funcall entity-source char)
>>                          (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>>        do (vector-push-extend c result))
>> 3. any other ideas ?
.....
> The second one.  The first doesn't even work!
> 
> [74]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
> #()
> [75]> (loop for c across #(a 1 b "Hi" c) do (vector-push-extend c result))
> NIL
> [76]> result
> #(A 1 B "Hi" C)
> 
> 
> [78]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
> #()
> [79]> (format result
>            (lambda (stream)
>              (write-string  #(a 1 b "Hi" c)
>                             stream)))
> 
> *** - The destination argument #() is invalid (not NIL or T or a stream or a
>       string).
> The following restarts are available:
> ABORT          :R1      ABORT
> Break 1 [80]> :q
> [81]> 

(get ...) always returns string or NIL (you found a bug without looking at the
rest of the code :) I just fixed this.

> Perhaps more context would be useful don't you think?

I wrote code to do substitution in strings: character -> (x)html entity.

Code I asked about is used in function 'subst-char-to-entity'
(it is located near the end of the code and marked by comment: #| * here * |#).

File with predefined entities is about 500 lines long, it can be
downloaded from: http://jogger.pl/custom/sb/files/entities.lisp


Regards, Szymon.

;;; --------------------------------------------------------

;; code not (well) tested yet, examples:
;; comment are welcomed.

#|

CL-USER> (subst-char-to-entity "<foo> & <bar>") ; default charset is :base-xml = <>&"'

"&lt;foo&gt; &amp; &lt;bar&gt;"

CL-USER> (subst-char-to-entity "foo & b � ar <" :char-set '(:base-xml "�"))

"foo &amp; b &euro; ar &lt;"

CL-USER> (subst-char-to-entity "foo &euro; b � ar <" :char-set :all-standard :omit "euro")

"foo &euro; b &euro; ar &lt;"

CL-USER> (subst-char-to-entity "< &#38; >" :omit :numeric)

"&lt; &#38; &gt;"

CL-USER> (subst-char-to-entity "foo &euro; b � ar <" :char-set :all-standard)

"foo &amp;euro; b &euro; ar &lt;"

CL-USER> (subst-char-to-entity "foo &euro; b � ar <" :char-set :all-standard :omit :all-standard)

"foo &euro; b &euro; ar &lt;"

CL-USER> (subst-char-to-entity "foo & b � ar <" :char-set '(#\&))

"foo &amp; b � ar <"

|#

;; :char-set keyword parameter can be:
;;   :base-xml
;;   :all-standard
;;   :special
;;   :symbol
;;   :8859-1
;;   string
;;   character
;; or sequence of that keywords (above) and/or strings and/or characters.

;; :omit keyword parameter can be:
;;   :all, :all-named, :hex, :dec, :numeric, :base-xml, :all-standard, :special,
;;   :symbol, :8859-1, string
;; or LIST of that keywords and/or strings.


(defun memq (item list) (member item list :test #'eq))

(defun strconc (&rest strings)
   (apply #'concatenate 'string strings))

(let ((h (make-hash-table)))
   (do-external-symbols (s :ent-all)
                        (setf (gethash (get s :char) h) s))
   (defun char-to-entity-refresh ()
     (clrhash h)
     (do-external-symbols (s :ent-all)
                          (setf (gethash (get s :char) h) s)))
   (defun char-to-entity (char)
     (gethash char h)))


(defun entityp (string start &aux (ent-begin start)) ; returns (shared) substring or nil.
   (when (< 1 (- (length string) start))
     (let ((result (load-time-value (make-array 0 :adjustable t :element-type 'character))))
       ;; I'm not sure about second argument for load-time-value... can it be T here ?
       (labels ((in-range-p (char range &aux (code (char-code char)))
                  (loop for x in range
                        thereis (etypecase x
                                  (cons    (and (>= code (car x)) (<= code (cadr x))))
                                  (integer (= code x)))))
                (name-start-char-p (char)
                  (in-range-p char '(#x3A (#x41 #x5A) #x5F (#x61 #x7A) (#xC0 #xD6) (#xD8 #xF6)
                                     (#xF8 #x2FF) (#x370 #x37D) (#x37F #x1FFF) (#x200C #x200D)
                                     (#x2070 #x218F) (#x2C00 #x2FEF) (#x3001 #xD7FF) (#xF900 #xFDCF)
                                     (#xFDF0 #xFFFD) (#x10000 #xEFFFF))))
                (name-char-p (char)
                  (or (name-start-char-p char)
                      (in-range-p char '(#x2D #x2E (#x30 #x39) #xB7 (#x0300 #x036F) (#x203F #x2040))))))
         (let* ((first-char (char string start))
                (second-char (char string (1+ start)))
                (pred
                 (cond ((and (eql first-char #\#)
                             (char-equal second-char #\x)
                             (array-in-bounds-p string (+ 2 start))
                             (digit-char-p (char string (+ 2 start)) 16))
                        (incf start)
                        (lambda (char) (digit-char-p char 16)))
                       ((and (eql first-char #\#) (digit-char-p second-char 10))
                        (incf start)
                        (lambda (char) (digit-char-p char 10)))
                       ((name-start-char-p first-char)
                        (when (eql second-char #\;)
                          (return-from entityp
                            (adjust-array result 1
                                          :displaced-to string
                                          :displaced-index-offset start)))
                        #'name-char-p))))
           (when pred
             (loop for index from (1+ start) below (length string)
                   for char = (char string index)
                   do (cond ((eql char #\;)
                             (return-from entityp
                               (adjust-array result (- index ent-begin)
                                             :displaced-to string
                                             :displaced-index-offset ent-begin)))
                            ((not (funcall pred char))
                             (return-from entityp))))))))))


(defun subst-char-to-entity
   (string
    &key omit (char-set :base-xml) (entity-source #'char-to-entity) (type :named)
    &aux user-ents packages in-char-set-p (omit-keywords '(:all :all-named :hex :dec :numeric :base-xml)))
   (if char-set
       (progn
         (cond ((stringp omit)
                (setq user-ents (list omit)
                      omit nil))
               ((keywordp omit)
                (setq omit (list omit))))
         (when omit
           (setq omit
                 (mapcan (lambda (x)
                           (cond ((stringp x) (push x user-ents) nil)
                                 ((memq x omit-keywords) (list x))
                                 ((keywordp x)
                                  (push (case x
                                          (:all-standard (find-package :ent-all))
                                          (:special      (find-package :ent-special))
                                          (:symbol       (find-package :ent-symbol))
                                          (:8859-1       (find-package :ent-8859-1)))
                                        packages)
                                  nil)
                                 (t (error "wrong object in omit: ~A" x))))
                         omit)))
         (setq in-char-set-p
               (labels ((helper (package &aux (h (make-hash-table)))
                          (do-external-symbols (s package) (setf (gethash (get s :char) h) s))
                          (lambda (char) (gethash char h)))
                        (package-to-charset (kw)
                          (ecase kw
                            (:base-xml     (lambda (char) (find char "&'<>\"")))
                            (:all-standard (helper :ent-all))
                            (:special      (helper :ent-special))
                            (:symbol       (helper :ent-symbol))
                            (:8859-1       (helper :ent-8859-1))))
                        (string-to-charset (string &aux (char-set (coerce string 'simple-string)))
                          (lambda (char) (find char char-set)))
                        (character-to-charset (character)
                          (lambda (char) (char= char character))))
                 (etypecase char-set
                   (keyword    (package-to-charset char-set))
                   (string     (string-to-charset char-set))
                   (character  (character-to-charset char-set))
                   (sequence   (let ((char-set
                                      (map 'list
                                           (lambda (x)
                                             (etypecase x
                                               (string (string-to-charset x))
                                               (character (character-to-charset x))
                                               (keyword (package-to-charset x))))
                                           char-set)))
                                 (lambda (char)
                                   (some (lambda (f) (funcall f char)) char-set)))))))
         (let ((result (make-array (floor (* 1.2 (length string))) :fill-pointer 0 :adjustable t :element-type 'character))
               (last 0))
           (loop for index from 0 below (length string)
                 for char = (char string index) do
                 (when (funcall in-char-set-p char)
                   (loop for i from last below index do (vector-push-extend (char string i) result))
                   (if (and (or omit user-ents packages)
                            (eql char #\&)
                            (let* ((entity-name (entityp string (1+ index)))
                                   (entity-type
                                    (and entity-name
                                         (if (eql (char entity-name 0) #\#)
                                             (if (char-equal (char entity-name 1) #\x)
                                                 :hex
                                               :dec)
                                           :named))))
                              (and entity-name
                                   (flet ((entity= (str) (string= entity-name str)))
                                     (or (memq :all omit)
                                         (and (memq :all-named omit) (eq entity-type :named))
                                         (and (memq   :numeric omit) (or (eq entity-type :hex)
                                                                         (eq entity-type :dec)))
                                         (and (memq       :hex omit) (eq entity-type :hex))
                                         (and (memq       :dec omit) (eq entity-type :dec))
                                         (and (memq  :base-xml omit)
                                              (some #'entity= '("amp" "apos" "gt" "lt" "quot")))
                                         (and user-ents
                                              (some #'entity= user-ents))
                                         (and packages
                                              (some (lambda (pac) (find-symbol entity-name pac)) packages)))))))
                       (vector-push-extend #\& result)
                     (let ((entity (get (funcall entity-source char)
                                        (getf '(:named :&name :hex :hex-str :dec :dec-str) type))))
                       (unless entity
                         (setq entity
                               (format nil "&#~[~D~;x~X~];" (ecase type (:dec 0) (:hex 1)) (char-code char))))
                       (loop for c across entity do (vector-push-extend c result)))) #| * here * |#
                   (setq last (1+ index)))
                 finally (loop for i from last below (length string)
                               do (vector-push-extend (char string i) result)))
           result))
     string))

;;; -------------- end.
From: Thomas A. Russ
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <ymimzj6nqfo.fsf@sevak.isi.edu>
tichy <···········@o2.pl> writes:
> > Perhaps more context would be useful don't you think?
> 
> I wrote code to do substitution in strings: character -> (x)html entity.
> 
> Code I asked about is used in function 'subst-char-to-entity'
> (it is located near the end of the code and marked by comment: #| * here * |#).

Well, I would favor a stream-based approach.  That would allow doing the
substitution on the fly as you write the (x)html output directly to the
file or network stream.  And if you needed to get a string out, you
could use a string output stream (i.e., WITH-OUTPUT-TO-STRING) to get a
string version.

I generally prefer to use streams when I need to make substitutions in
strings where the substituted value is a different length than what it
substitutes for.  Entity name expansion is one of those cases.

When I've written functions like this in the past, I keep two pointers
into the original string, and use them with WRITE-STRING to copy the
verbatim parts.  When a substitution is called for, you update write the
unchanged part, then the substitution, and then update the pointers.

An untested framework for doing this:

(defun write-string-with-substitutions (string stream substitution-table)
  (let ((max (length string))
        (start 0)
	(end 0)
        sub)
    (loop while (< end max)
          do (setq sub (gethash substitution-table (char string end)))
             (cond (sub
                     (write-string string stream :start start :end  end)
                     (write-string sub stream)
                     (incf end)
                     (setq start end))
                   (t
                     (incf end))))))

Then just use a hashtable of character substitutions.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: tichy
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <dnlaum$2dk$1@nemesis.news.tpi.pl>
Thomas A. Russ wrote:

 > [ ... nice advices ... ]

 > An untested framework for doing this:
 >
 > (defun write-string-with-substitutions (string stream substitution-table)
 >   (let ((max (length string))
 >         (start 0)
 > 	(end 0)
 >         sub)
 >     (loop while (< end max)
 >           do (setq sub (gethash substitution-table (char string end)))
 >              (cond (sub
 >                      (write-string string stream :start start :end  end)
 >                      (write-string sub stream)
 >                      (incf end)
 >                      (setq start end))
 >                    (t
 >                      (incf end))))))
 >
 > Then just use a hashtable of character substitutions.

 > [ ... ]

Many thanks for the ideas. I modified a little your code -- with
FORMAT it can work with both: streams and strings.

Regards, Szymon.

code:

(defparameter *substitution-fn*
   (lambda (char)
     (cdr (assoc char '((#\& . "&amp;") (#\' . "&apos;") (#\" . "&quot;") (#\< . "&lt;") (#\> . "&gt;"))))))


(defun write-string-with-substitutions (string &optional output substitution-fn)
   (assert (or (null output)
	      (streamp output)
	      (and (stringp output)
		   (array-has-fill-pointer-p output))))
   (unless output
     (setq output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0)))
   (unless substitution-fn
     (setq substitution-fn *substitution-fn*))
   (loop with max = (length string) and (start end) of-type fixnum
	while (< end max)
	do (let ((sub (funcall substitution-fn (char string end))))
	     (cond (sub
		    (format output
			    (lambda (stream)
			      (write-string string stream :start start :end end)
			      (write-string sub stream)))
		    (incf end)
		    (setq start end))
		   (t
		    (incf end)))))
   output)


CL-USER> (write-string-with-substitutions "<foo><bar>")
          "&lt;foo&gt;&lt;bar&gt;"

CL-USER> (defparameter *test-stream* (make-string-output-stream))
          *TEST-STREAM*

CL-USER> (write-string-with-substitutions "<foo><bar>" *test-stream*)
          #<SB-IMPL::STRING-OUTPUT-STREAM {B92D5F9}>

CL-USER> (get-output-stream-string *)
          "&lt;foo&gt;&lt;bar&gt;"

;; SBCL does not like this last example (below) -- I think it should accept base
;; and standard characters... but it does not:
;;
;; 'The value "" is not of type (AND (VECTOR CHARACTER) (SATISFIES ARRAY-HAS-FILL-POINTER-P))'
;;
;; CLISP accepts this.

CL-USER> (write-string-with-substitutions "<foo><bar>" (make-array 30 :element-type 'standard-char :fill-pointer 0))
          "&lt;foo&gt;&lt;bar&gt;"
From: Thomas A. Russ
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date: 
Message-ID: <ymifyowop3i.fsf@sevak.isi.edu>
tichy <···········@o2.pl> writes:

> 
> Thomas A. Russ wrote:
> 
>  > [ ... nice advices ... ]

I did just realize that this doesn't handle the end case properly.  If
the last character is not a substitution, you lose the last bit of
code.  Fortunately, this is easy to fix by adding the following loop
clause:


>  > An untested framework for doing this:
>  >
>  > (defun write-string-with-substitutions (string stream substitution-table)
>  >   (let ((max (length string))
>  >         (start 0)
>  > 	(end 0)
>  >         sub)
>  >     (loop while (< end max)
>  >           do (setq sub (gethash substitution-table (char string end)))
>  >              (cond (sub
>  >                      (write-string string stream :start start :end  end)
>  >                      (write-string sub stream)
>  >                      (incf end)
>  >                      (setq start end))
>  >                    (t
>  >                      (incf end)))

                  finally (write-string string stream :start start :end  end)

                          )))
>  >
>  > Then just use a hashtable of character substitutions.






> 
>  > [ ... ]
> 
> Many thanks for the ideas. I modified a little your code -- with
> FORMAT it can work with both: streams and strings.
> 
> Regards, Szymon.
> 
> code:
> 
> (defparameter *substitution-fn*
>    (lambda (char)
>      (cdr (assoc char '((#\& . "&amp;") (#\' . "&apos;") (#\" . "&quot;") (#\< . "&lt;") (#\> . "&gt;"))))))
> 
> 

Note:  I find it clearer to put default values for optional arguments
into the lambda list itself, but with a large initializer I would be
tempted to either write a helping function or maybe to use your method.

(defun write-string-with-substitutions 
    (string 
     &optional (output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0))
               (substitution-fn *substitution-fn*))

> (defun write-string-with-substitutions (string &optional output substitution-fn)
>    (assert (or (null output)
> 	      (streamp output)
> 	      (and (stringp output)
> 		   (array-has-fill-pointer-p output))))
>    (unless output
>      (setq output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0)))
>    (unless substitution-fn
>      (setq substitution-fn *substitution-fn*))
>    (loop with max = (length string) and (start end) of-type fixnum
> 	while (< end max)
> 	do (let ((sub (funcall substitution-fn (char string end))))
> 	     (cond (sub
> 		    (format output
> 			    (lambda (stream)
> 			      (write-string string stream :start start :end end)
> 			      (write-string sub stream)))
> 		    (incf end)
> 		    (setq start end))
> 		   (t
> 		    (incf end)))))
>    output)

-- 
Thomas A. Russ,  USC/Information Sciences Institute