tichy <···········@o2.pl> writes:
> Hi.
>
> What is preferable ?
>
> 1.
> [.....]
> (format result
> (lambda (stream)
> (write-string (get (funcall entity-source char)
> (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
> stream)))
> [.....]
>
> 2.
> [.....]
> (loop for c across (get (funcall entity-source char)
> (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
> do (vector-push-extend c result))
> [.....]
>
> 3. any other ideas ?
>
> TIA, Szymon.
The second one. The first doesn't even work!
[74]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
#()
[75]> (loop for c across #(a 1 b "Hi" c) do (vector-push-extend c result))
NIL
[76]> result
#(A 1 B "Hi" C)
[78]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
#()
[79]> (format result
(lambda (stream)
(write-string #(a 1 b "Hi" c)
stream)))
*** - The destination argument #() is invalid (not NIL or T or a stream or a
string).
The following restarts are available:
ABORT :R1 ABORT
Break 1 [80]> :q
[81]>
Perhaps more context would be useful don't you think?
--
__Pascal Bourguignon__ http://www.informatimago.com/
"Specifications are for the weak and timid!"
Pascal Bourguignon wrote:
> tichy <···········@o2.pl> writes:
>> 1.
>> (format result
>> (lambda (stream)
>> (write-string (get (funcall entity-source char)
>> (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>> stream)))
>> 2.
>> (loop for c across (get (funcall entity-source char)
>> (getf '(:named :&name :hex :hex-str :dec :dec-str) type))
>> do (vector-push-extend c result))
>> 3. any other ideas ?
.....
> The second one. The first doesn't even work!
>
> [74]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
> #()
> [75]> (loop for c across #(a 1 b "Hi" c) do (vector-push-extend c result))
> NIL
> [76]> result
> #(A 1 B "Hi" C)
>
>
> [78]> (setf result (make-array 0 :adjustable t :fill-pointer 0))
> #()
> [79]> (format result
> (lambda (stream)
> (write-string #(a 1 b "Hi" c)
> stream)))
>
> *** - The destination argument #() is invalid (not NIL or T or a stream or a
> string).
> The following restarts are available:
> ABORT :R1 ABORT
> Break 1 [80]> :q
> [81]>
(get ...) always returns string or NIL (you found a bug without looking at the
rest of the code :) I just fixed this.
> Perhaps more context would be useful don't you think?
I wrote code to do substitution in strings: character -> (x)html entity.
Code I asked about is used in function 'subst-char-to-entity'
(it is located near the end of the code and marked by comment: #| * here * |#).
File with predefined entities is about 500 lines long, it can be
downloaded from: http://jogger.pl/custom/sb/files/entities.lisp
Regards, Szymon.
;;; --------------------------------------------------------
;; code not (well) tested yet, examples:
;; comment are welcomed.
#|
CL-USER> (subst-char-to-entity "<foo> & <bar>") ; default charset is :base-xml = <>&"'
"<foo> & <bar>"
CL-USER> (subst-char-to-entity "foo & b � ar <" :char-set '(:base-xml "�"))
"foo & b € ar <"
CL-USER> (subst-char-to-entity "foo € b � ar <" :char-set :all-standard :omit "euro")
"foo € b € ar <"
CL-USER> (subst-char-to-entity "< & >" :omit :numeric)
"< & >"
CL-USER> (subst-char-to-entity "foo € b � ar <" :char-set :all-standard)
"foo &euro; b € ar <"
CL-USER> (subst-char-to-entity "foo € b � ar <" :char-set :all-standard :omit :all-standard)
"foo € b € ar <"
CL-USER> (subst-char-to-entity "foo & b � ar <" :char-set '(#\&))
"foo & b � ar <"
|#
;; :char-set keyword parameter can be:
;; :base-xml
;; :all-standard
;; :special
;; :symbol
;; :8859-1
;; string
;; character
;; or sequence of that keywords (above) and/or strings and/or characters.
;; :omit keyword parameter can be:
;; :all, :all-named, :hex, :dec, :numeric, :base-xml, :all-standard, :special,
;; :symbol, :8859-1, string
;; or LIST of that keywords and/or strings.
(defun memq (item list) (member item list :test #'eq))
(defun strconc (&rest strings)
(apply #'concatenate 'string strings))
(let ((h (make-hash-table)))
(do-external-symbols (s :ent-all)
(setf (gethash (get s :char) h) s))
(defun char-to-entity-refresh ()
(clrhash h)
(do-external-symbols (s :ent-all)
(setf (gethash (get s :char) h) s)))
(defun char-to-entity (char)
(gethash char h)))
(defun entityp (string start &aux (ent-begin start)) ; returns (shared) substring or nil.
(when (< 1 (- (length string) start))
(let ((result (load-time-value (make-array 0 :adjustable t :element-type 'character))))
;; I'm not sure about second argument for load-time-value... can it be T here ?
(labels ((in-range-p (char range &aux (code (char-code char)))
(loop for x in range
thereis (etypecase x
(cons (and (>= code (car x)) (<= code (cadr x))))
(integer (= code x)))))
(name-start-char-p (char)
(in-range-p char '(#x3A (#x41 #x5A) #x5F (#x61 #x7A) (#xC0 #xD6) (#xD8 #xF6)
(#xF8 #x2FF) (#x370 #x37D) (#x37F #x1FFF) (#x200C #x200D)
(#x2070 #x218F) (#x2C00 #x2FEF) (#x3001 #xD7FF) (#xF900 #xFDCF)
(#xFDF0 #xFFFD) (#x10000 #xEFFFF))))
(name-char-p (char)
(or (name-start-char-p char)
(in-range-p char '(#x2D #x2E (#x30 #x39) #xB7 (#x0300 #x036F) (#x203F #x2040))))))
(let* ((first-char (char string start))
(second-char (char string (1+ start)))
(pred
(cond ((and (eql first-char #\#)
(char-equal second-char #\x)
(array-in-bounds-p string (+ 2 start))
(digit-char-p (char string (+ 2 start)) 16))
(incf start)
(lambda (char) (digit-char-p char 16)))
((and (eql first-char #\#) (digit-char-p second-char 10))
(incf start)
(lambda (char) (digit-char-p char 10)))
((name-start-char-p first-char)
(when (eql second-char #\;)
(return-from entityp
(adjust-array result 1
:displaced-to string
:displaced-index-offset start)))
#'name-char-p))))
(when pred
(loop for index from (1+ start) below (length string)
for char = (char string index)
do (cond ((eql char #\;)
(return-from entityp
(adjust-array result (- index ent-begin)
:displaced-to string
:displaced-index-offset ent-begin)))
((not (funcall pred char))
(return-from entityp))))))))))
(defun subst-char-to-entity
(string
&key omit (char-set :base-xml) (entity-source #'char-to-entity) (type :named)
&aux user-ents packages in-char-set-p (omit-keywords '(:all :all-named :hex :dec :numeric :base-xml)))
(if char-set
(progn
(cond ((stringp omit)
(setq user-ents (list omit)
omit nil))
((keywordp omit)
(setq omit (list omit))))
(when omit
(setq omit
(mapcan (lambda (x)
(cond ((stringp x) (push x user-ents) nil)
((memq x omit-keywords) (list x))
((keywordp x)
(push (case x
(:all-standard (find-package :ent-all))
(:special (find-package :ent-special))
(:symbol (find-package :ent-symbol))
(:8859-1 (find-package :ent-8859-1)))
packages)
nil)
(t (error "wrong object in omit: ~A" x))))
omit)))
(setq in-char-set-p
(labels ((helper (package &aux (h (make-hash-table)))
(do-external-symbols (s package) (setf (gethash (get s :char) h) s))
(lambda (char) (gethash char h)))
(package-to-charset (kw)
(ecase kw
(:base-xml (lambda (char) (find char "&'<>\"")))
(:all-standard (helper :ent-all))
(:special (helper :ent-special))
(:symbol (helper :ent-symbol))
(:8859-1 (helper :ent-8859-1))))
(string-to-charset (string &aux (char-set (coerce string 'simple-string)))
(lambda (char) (find char char-set)))
(character-to-charset (character)
(lambda (char) (char= char character))))
(etypecase char-set
(keyword (package-to-charset char-set))
(string (string-to-charset char-set))
(character (character-to-charset char-set))
(sequence (let ((char-set
(map 'list
(lambda (x)
(etypecase x
(string (string-to-charset x))
(character (character-to-charset x))
(keyword (package-to-charset x))))
char-set)))
(lambda (char)
(some (lambda (f) (funcall f char)) char-set)))))))
(let ((result (make-array (floor (* 1.2 (length string))) :fill-pointer 0 :adjustable t :element-type 'character))
(last 0))
(loop for index from 0 below (length string)
for char = (char string index) do
(when (funcall in-char-set-p char)
(loop for i from last below index do (vector-push-extend (char string i) result))
(if (and (or omit user-ents packages)
(eql char #\&)
(let* ((entity-name (entityp string (1+ index)))
(entity-type
(and entity-name
(if (eql (char entity-name 0) #\#)
(if (char-equal (char entity-name 1) #\x)
:hex
:dec)
:named))))
(and entity-name
(flet ((entity= (str) (string= entity-name str)))
(or (memq :all omit)
(and (memq :all-named omit) (eq entity-type :named))
(and (memq :numeric omit) (or (eq entity-type :hex)
(eq entity-type :dec)))
(and (memq :hex omit) (eq entity-type :hex))
(and (memq :dec omit) (eq entity-type :dec))
(and (memq :base-xml omit)
(some #'entity= '("amp" "apos" "gt" "lt" "quot")))
(and user-ents
(some #'entity= user-ents))
(and packages
(some (lambda (pac) (find-symbol entity-name pac)) packages)))))))
(vector-push-extend #\& result)
(let ((entity (get (funcall entity-source char)
(getf '(:named :&name :hex :hex-str :dec :dec-str) type))))
(unless entity
(setq entity
(format nil "&#~[~D~;x~X~];" (ecase type (:dec 0) (:hex 1)) (char-code char))))
(loop for c across entity do (vector-push-extend c result)))) #| * here * |#
(setq last (1+ index)))
finally (loop for i from last below (length string)
do (vector-push-extend (char string i) result)))
result))
string))
;;; -------------- end.
From: Thomas A. Russ
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date:
Message-ID: <ymimzj6nqfo.fsf@sevak.isi.edu>
tichy <···········@o2.pl> writes:
> > Perhaps more context would be useful don't you think?
>
> I wrote code to do substitution in strings: character -> (x)html entity.
>
> Code I asked about is used in function 'subst-char-to-entity'
> (it is located near the end of the code and marked by comment: #| * here * |#).
Well, I would favor a stream-based approach. That would allow doing the
substitution on the fly as you write the (x)html output directly to the
file or network stream. And if you needed to get a string out, you
could use a string output stream (i.e., WITH-OUTPUT-TO-STRING) to get a
string version.
I generally prefer to use streams when I need to make substitutions in
strings where the substituted value is a different length than what it
substitutes for. Entity name expansion is one of those cases.
When I've written functions like this in the past, I keep two pointers
into the original string, and use them with WRITE-STRING to copy the
verbatim parts. When a substitution is called for, you update write the
unchanged part, then the substitution, and then update the pointers.
An untested framework for doing this:
(defun write-string-with-substitutions (string stream substitution-table)
(let ((max (length string))
(start 0)
(end 0)
sub)
(loop while (< end max)
do (setq sub (gethash substitution-table (char string end)))
(cond (sub
(write-string string stream :start start :end end)
(write-string sub stream)
(incf end)
(setq start end))
(t
(incf end))))))
Then just use a hashtable of character substitutions.
--
Thomas A. Russ, USC/Information Sciences Institute
Thomas A. Russ wrote:
> [ ... nice advices ... ]
> An untested framework for doing this:
>
> (defun write-string-with-substitutions (string stream substitution-table)
> (let ((max (length string))
> (start 0)
> (end 0)
> sub)
> (loop while (< end max)
> do (setq sub (gethash substitution-table (char string end)))
> (cond (sub
> (write-string string stream :start start :end end)
> (write-string sub stream)
> (incf end)
> (setq start end))
> (t
> (incf end))))))
>
> Then just use a hashtable of character substitutions.
> [ ... ]
Many thanks for the ideas. I modified a little your code -- with
FORMAT it can work with both: streams and strings.
Regards, Szymon.
code:
(defparameter *substitution-fn*
(lambda (char)
(cdr (assoc char '((#\& . "&") (#\' . "'") (#\" . """) (#\< . "<") (#\> . ">"))))))
(defun write-string-with-substitutions (string &optional output substitution-fn)
(assert (or (null output)
(streamp output)
(and (stringp output)
(array-has-fill-pointer-p output))))
(unless output
(setq output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0)))
(unless substitution-fn
(setq substitution-fn *substitution-fn*))
(loop with max = (length string) and (start end) of-type fixnum
while (< end max)
do (let ((sub (funcall substitution-fn (char string end))))
(cond (sub
(format output
(lambda (stream)
(write-string string stream :start start :end end)
(write-string sub stream)))
(incf end)
(setq start end))
(t
(incf end)))))
output)
CL-USER> (write-string-with-substitutions "<foo><bar>")
"<foo><bar>"
CL-USER> (defparameter *test-stream* (make-string-output-stream))
*TEST-STREAM*
CL-USER> (write-string-with-substitutions "<foo><bar>" *test-stream*)
#<SB-IMPL::STRING-OUTPUT-STREAM {B92D5F9}>
CL-USER> (get-output-stream-string *)
"<foo><bar>"
;; SBCL does not like this last example (below) -- I think it should accept base
;; and standard characters... but it does not:
;;
;; 'The value "" is not of type (AND (VECTOR CHARACTER) (SATISFIES ARRAY-HAS-FILL-POINTER-P))'
;;
;; CLISP accepts this.
CL-USER> (write-string-with-substitutions "<foo><bar>" (make-array 30 :element-type 'standard-char :fill-pointer 0))
"<foo><bar>"
From: Thomas A. Russ
Subject: Re: style question: FORMAT vs VECTOR-PUSH-EXTEND
Date:
Message-ID: <ymifyowop3i.fsf@sevak.isi.edu>
tichy <···········@o2.pl> writes:
>
> Thomas A. Russ wrote:
>
> > [ ... nice advices ... ]
I did just realize that this doesn't handle the end case properly. If
the last character is not a substitution, you lose the last bit of
code. Fortunately, this is easy to fix by adding the following loop
clause:
> > An untested framework for doing this:
> >
> > (defun write-string-with-substitutions (string stream substitution-table)
> > (let ((max (length string))
> > (start 0)
> > (end 0)
> > sub)
> > (loop while (< end max)
> > do (setq sub (gethash substitution-table (char string end)))
> > (cond (sub
> > (write-string string stream :start start :end end)
> > (write-string sub stream)
> > (incf end)
> > (setq start end))
> > (t
> > (incf end)))
finally (write-string string stream :start start :end end)
)))
> >
> > Then just use a hashtable of character substitutions.
>
> > [ ... ]
>
> Many thanks for the ideas. I modified a little your code -- with
> FORMAT it can work with both: streams and strings.
>
> Regards, Szymon.
>
> code:
>
> (defparameter *substitution-fn*
> (lambda (char)
> (cdr (assoc char '((#\& . "&") (#\' . "'") (#\" . """) (#\< . "<") (#\> . ">"))))))
>
>
Note: I find it clearer to put default values for optional arguments
into the lambda list itself, but with a large initializer I would be
tempted to either write a helping function or maybe to use your method.
(defun write-string-with-substitutions
(string
&optional (output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0))
(substitution-fn *substitution-fn*))
> (defun write-string-with-substitutions (string &optional output substitution-fn)
> (assert (or (null output)
> (streamp output)
> (and (stringp output)
> (array-has-fill-pointer-p output))))
> (unless output
> (setq output (make-array 0 :element-type 'character :adjustable t :fill-pointer 0)))
> (unless substitution-fn
> (setq substitution-fn *substitution-fn*))
> (loop with max = (length string) and (start end) of-type fixnum
> while (< end max)
> do (let ((sub (funcall substitution-fn (char string end))))
> (cond (sub
> (format output
> (lambda (stream)
> (write-string string stream :start start :end end)
> (write-string sub stream)))
> (incf end)
> (setq start end))
> (t
> (incf end)))))
> output)
--
Thomas A. Russ, USC/Information Sciences Institute