I'm trying to "un-escape" an "html-encoded" string, and
thought that I could use substitute as in:
> (substitute #\& "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
Unfortunately, this doesn't seem to work.
Nor does:
> (substitute "&" "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
as both return the sequence, unchanged.
The CLHS entry for
substitute newitem olditem sequence
doesn't seem to say anything about the "type compatibility" between
newitem, olditem and sequence, only that the first two should be
"objects" and the third one a "proper sequence".
Did I miss some magic builtin CL function, or should I roll my own?
BTW: I had a look at split-sequence, and it seems that this couldn't
work either, because it uses position internally, and
> (position "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
returns nil.
Many thanks
--
JFB
verec wrote:
> I'm trying to "un-escape" an "html-encoded" string, and
> thought that I could use substitute as in:
>
> > (substitute #\& "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
>
> unfortunately, this doesn't seem to work.
>
> Nor does:
>
> > (substitute "&" "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
>
> as both return the sequence, unchanged.
>
> The CLHS entry for
>
> substitute newitem olditem sequence
The sequences in question are strings.
SUBSTITUTE finds and replaces individual elements of a sequence.
The element of a string, the item that SUBSTITUTE compares against,
is a character, not a substring.
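To illustrate the point, here is a small sketch of what SUBSTITUTE does with a character item versus a string item. The sample strings are made up for the example.

```lisp
;; SUBSTITUTE compares OLDITEM against each element of the sequence.
;; The elements of a string are characters, so a character OLDITEM can match:
(substitute #\- #\Space "lamb and elephant")
;; => "lamb-and-elephant"

;; A string OLDITEM is never EQL to a character element, so nothing
;; matches and the result has the same contents as the input:
(substitute #\& "&amp;" "lamb &amp; elephant")
;; => "lamb &amp; elephant"
```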
> Did I miss some magic builtin CL function, or should I roll my own?
SEARCH, REPLACE
On 2006-01-27 19:40:02 +0000, "Kaz Kylheku" <········@gmail.com> said:
>> Did I miss some magic builtin CL function, or should I roll my own?
>
> SEARCH, REPLACE
Sounds like "roll-my-own" time then, obviously using primitives
like search and replace.
Many thanks
--
JFB
verec wrote:
> On 2006-01-27 19:40:02 +0000, "Kaz Kylheku" <········@gmail.com> said:
>
> >> Did I miss some magic builtin CL function, or should I roll my own?
> >
> > SEARCH, REPLACE
>
> Sounds like "roll-my-own" time then, obviously using primitives
> like search and replace.
>
> Many thanks
> --
> JFB
Isn't there some HTML or XML library out there with escape and
unescape?
Also, regex maybe?
CL-PPCRE probably has some slick way of doing this.
I'm looking at the web page, and it looks like the REGEX-REPLACE-ALL is
the thing.
verec wrote:
> I'm trying to "un-escape" an "html-encoded" string, and
> thought that I could use substitute as in:
>
>> (substitute #\& "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
>
> unfortunately, this doesn't seem to work.
>
Use cl-ppcre
http://www.cliki.net/CL-PPCRE
CL-USER 4 > (cl-ppcre:regex-replace-all "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;" "&")
"lamb & elephant & &quot;squale&quot;"
CL-USER 5 >
Wade
On 2006-01-27 19:58:58 +0000, Wade Humeniuk
<··················@telus.net> said:
> verec wrote:
>> I'm trying to "un-escape" an "html-encoded" string, and
>> thought that I could use substitute as in:
>>
>>> (substitute #\& "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;")
>>
>> unfortunately, this doesn't seem to work.
>>
>
> Use cl-ppcre
>
> http://www.cliki.net/CL-PPCRE
>
> CL-USER 4 > (cl-ppcre:regex-replace-all "&amp;" "lamb &amp; elephant &amp; &quot;squale&quot;" "&")
> "lamb & elephant & &quot;squale&quot;"
Thanks for the pointer.
In the meantime, I had come up with:
(defmacro while (test &body body)
  `(do ()
       ((not ,test))
     ,@body))
(defun search-replace (what with where)
  (let ((pos nil)
        (what-len (length what)))
    (while (setf pos (search what where))
      (setf where
            (with-output-to-string (out)
              (write-string (subseq where 0 pos) out)
              (write-string with out)
              (write-string (subseq where (+ pos what-len)) out))))
    where))
CL-USER 46 > (search-replace "&amp;" "&" "lamb &amp; elephant &amp; &quot;squale&quot;")
"lamb & elephant & &quot;squale&quot;"
CL-USER 47 > (search-replace "&quot;" "\"" "lamb &amp; elephant &amp; &quot;squale&quot;")
"lamb &amp; elephant &amp; \"squale\""
which seems to work, but I am not too happy with it because
it generates three temporary strings for each occurrence of
the searched-for string.
I'm having a look at CL-PPCRE now.
Many thanks
--
JFB
verec wrote:
> (defun search-replace (what with where)
>   (let ((pos nil)
>         (what-len (length what)))
>     (while (setf pos (search what where))
>       (setf where
>             (with-output-to-string (out)
>               (write-string (subseq where 0 pos) out)
>               (write-string with out)
>               (write-string (subseq where (+ pos what-len)) out))))
>     where))
This idiom is somewhat awkward with the SETF:
(let ((var nil))
  (while (setf var (compute-var))
    (expr)))
Idea:
(loop for var = (compute-var)
      while var
      do (expr))
> CL-USER 46 > (search-replace "&amp;" "&" "lamb &amp; elephant &amp; &quot;squale&quot;")
> "lamb & elephant & &quot;squale&quot;"
>
> CL-USER 47 > (search-replace "&quot;" "\"" "lamb &amp; elephant &amp; &quot;squale&quot;")
> "lamb &amp; elephant &amp; \"squale\""
>
> which seems to work, but that I am not too happy with because
Does it really work? Ha ha!
What if you try to replace "foo" with "foo" in a string that contains
"foo"? Because your algorithm re-scans the string from the beginning,
it will never terminate. It will keep finding "foo" and replacing it
with "foo".
What's worse, if you replace "foo" with "foos", it will keep growing
the string until memory runs out: "...foo..." -> "...foos..." ->
"...fooss..." and so forth.
Tant pis!
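One way to avoid the rescanning problem, sketched here under the assumption that matches should be non-overlapping and taken left to right, is to resume each SEARCH just past the previous match in the original string, so replacement text is never searched again. The name REPLACE-ALL-FROM is hypothetical, not a standard function.

```lisp
(defun replace-all-from (what with where)
  "Hypothetical helper: replace every occurrence of WHAT in WHERE by WITH,
never rescanning text that has already been emitted."
  (if (zerop (length what))
      where                        ; an empty pattern would match forever
      (with-output-to-string (out)
        (loop with what-len = (length what)
              ;; START is where the next search begins in the ORIGINAL string.
              for start = 0 then (+ pos what-len)
              for pos = (search what where :start2 start)
              ;; Copy the untouched piece; :END NIL means "to the end".
              do (write-string where out :start start :end pos)
              while pos
              do (write-string with out)))))
```

With this version, (replace-all-from "foo" "foos" "a foo b foo") returns "a foos b foos" and terminates, because the search only ever advances through the input.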
> it generates three temporary strings for each occurrence of
> the searched for string.
It also generates an entire string-stream object to build one of those
strings.
What you can do is make two passes over the input string. In the first
pass, calculate exactly how many characters are required to do the
replacement. Then make a string which is exactly that big using
MAKE-STRING.
Then, in a second pass, copy pieces of the original string, and the
replacement string, into that buffer using REPLACE.
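The two-pass approach described above could look roughly like the following. This is a minimal sketch, not a tuned implementation: the name REPLACE-ALL-TWO-PASS is hypothetical, matches are assumed to be non-overlapping, and an empty search string simply returns a copy of the input.

```lisp
(defun replace-all-two-pass (what with where)
  "Hypothetical helper: replace each occurrence of WHAT in WHERE by WITH,
allocating the result string exactly once."
  (let ((what-len (length what))
        (with-len (length with)))
    (if (zerop what-len)
        (copy-seq where)
        ;; Pass 1: count non-overlapping matches to size the result.
        (let* ((n-matches (loop for pos = (search what where)
                                  then (search what where
                                               :start2 (+ pos what-len))
                                while pos
                                count t))
               (result (make-string (+ (length where)
                                       (* n-matches (- with-len what-len)))))
               (src 0)              ; read position in WHERE
               (dst 0))             ; write position in RESULT
          ;; Pass 2: copy untouched pieces and replacements with REPLACE.
          (loop for pos = (search what where :start2 src)
                for end = (or pos (length where))
                do (replace result where :start1 dst :start2 src :end2 end)
                   (incf dst (- end src))
                while pos
                do (replace result with :start1 dst)
                   (incf dst with-len)
                   (setf src (+ pos what-len)))
          result))))
```

The only allocation is the single MAKE-STRING call, at the cost of running SEARCH over the input twice.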
On 2006-01-27 22:22:57 +0000, "Kaz Kylheku" <········@gmail.com> said:
> This idiom is somewhat awkward with the SETF:
>
> (let ((var nil))
>   (while (setf var (compute-var))
>     (expr)))
I'm more than happy to correct anything that is non-idiomatic
or inefficient, but what is it, precisely, that you find
awkward? Using the value returned by setf? Or using it as
an argument to while?
> (loop for var = (compute-var)
>       while var
>       do (expr))
Noted.
>> which seems to work, but that I am not too happy with because
>
> Does it really work? Ha ha!
I am aware of those cases. One more reason to look elsewhere :-)
> What you can do is make two passes over the input string. In the first
> pass, calculate exactly how many characters are required to do the
> replacement. Then make a string which is exactly that big using
> MAKE-STRING.
>
> Then, in a second pass, copy pieces of the original string, and the
> replacement string, into that buffer using REPLACE.
Good idea. The reason I didn't use replace is that it cannot
expand or shrink the sequence. But in the design you suggest, that
is not needed, because the buffer already has the right size.
I'll give it a try.
[Right now, I'm trying to tell CL-PPCRE to replace &#41 with #\A,
i.e. &#xy with (code-char (from-base-16 xy))]
Many thanks
--
JFB
On Sat, 28 Jan 2006 11:45:37 +0000, verec <·····@mac.com> wrote:
> [Right now, I'm trying to tell CL-PPCRE to replace &#41 with #\A, i.e.
> &#xy with (code-char (from-base-16 xy))]
Assuming you forgot the semicolon and any number of hex digits (not
only two) is OK, it'd look more or less like this (untested):
(defun foo (string)
  (ppcre:regex-replace-all "&#([0-9a-fA-F]+);"
                           string
                           (lambda (whole-match hex-string)
                             (declare (ignore whole-match))
                             (string
                              (code-char
                               (parse-integer hex-string :radix 16))))
                           :simple-calls t))
If you're very much concerned about micro-efficiency you should use
REGEX-REPLACE-ALL without the SIMPLE-CALLS argument. The closure will
be more complicated then, of course.
HTH,
Edi.
--
Lisp is not dead, it just smells funny.
Real email: (replace (subseq ·········@agharta.de" 5) "edi")
On Sat, 28 Jan 2006 23:57:42 +0000, verec <·····@mac.com> wrote:
> (defun simple-replace (text)
>   (dolist (x '(("&amp;" . "&") ("&lt;" . "<") ("&gt;" . ">")
>                ("&quot;" . "\"")))
>     (setf text (ppcre:regex-replace-all (car x) text (cdr x))))
>   text)
I'd suggest something like the following:
(defun simple-replace (text)
  (ppcre:regex-replace-all "&(amp|lt|gt|quot);"
                           text
                           #'find-simple-replacement))
(defun find-simple-replacement (target-string start end
                                match-start match-end
                                reg-starts reg-ends)
  (declare (ignore start end match-end reg-starts reg-ends))
  (case (char target-string (1+ match-start))
    (#\a "&")
    (#\l "<")
    (#\g ">")
    (#\q "\"")))
In your implementation CL-PPCRE has to compile four regular
expressions at run time for each call of SIMPLE-REPLACE. In my
version it will compile one regular expression once at compile time
and that's it. See the compiler macros in `api.lisp' for details, or
trace the function PPCRE:CREATE-SCANNER for both versions.
Cheers,
Edi.
--
Lisp is not dead, it just smells funny.
Real email: (replace (subseq ·········@agharta.de" 5) "edi")
On 2006-01-29 00:16:08 +0000, Edi Weitz <········@agharta.de> said:
> I'd suggest something like the following:
>
> (defun simple-replace (text)
>   (ppcre:regex-replace-all "&(amp|lt|gt|quot);"
>                            text
>                            #'find-simple-replacement))
>
> (defun find-simple-replacement (target-string start end
>                                 match-start match-end
>                                 reg-starts reg-ends)
>   (declare (ignore start end match-end reg-starts reg-ends))
>   (case (char target-string (1+ match-start))
>     (#\a "&")
>     (#\l "<")
>     (#\g ">")
>     (#\q "\"")))
>
> In your implementation CL-PPCRE has to compile four regular
> expressions at run time for each call of SIMPLE-REPLACE. In my
> version it will compile one regular expression once at compile time
> and that's it. See the compiler macros in `api.lisp' for details, or
> trace the function PPCRE:CREATE-SCANNER for both versions.
What can I say?
Thanks again, Edi. Works like a charm.
--
JFB