What would be the standard way of removing a
bunch of characters (with limited wildcard characters)
from a string ?
E.g: remove all occurrences of "#\ESC[?m" from a string.
"#\ESC[?m" might resolve into "#\ESC[4m", "#\ESC[m" etc.
The wildcard character should expand into either 0
or 1 char. Has anyone already done this ?
Could I use the read macro by defining macro-dispatch chars ?
Or should I use remove-if #'(huge-lambda-function)
(I'm trying to remove all VT100 escape character
sequences from strings received from a VT100 terminal...)
Example:
(remove-esc-seq "#\ESC[4mSystem V#\ESC[m") should print
--> "System V"
Cheers, Bela
In article <······················@ifi.unizh.ch> ···@ifi.unizh.ch (Bela Ban) writes:
>What would be the standard way of removing a
>bunch of characters (with limited wildcard characters)
>from a string ?
>E.g: remove all occurrences of "#\ESC[?m" from a string.
>"#\ESC[?m" might resolve into "#\ESC[4m", "#\ESC[m" etc.
>The wildcard character should expand into either 0
>or 1 char. Has anyone already done this ?
Note that the syntax of ANSI escape sequences doesn't limit you to one
character between the "ESC [" and the final character. Consider the control
sequence for cursor motion: ESC [ row ; column H. The full syntax, in Unix
extended regular expression syntax, is:
\E[ -/]*([0-Z\\-~]|\[[0-?]*[ -/]*[@-~])
(The metacharacters in this syntax are: [...], which specifies sets and
ranges (using <char1>-<char2> syntax); (...|...), which specifies
alternatives; *, which specifies 0 or more repetitions; and \, which either
quotes the following metacharacter or turns into a control character
indicated by the following letter (\E is ESC).)
The English translation of that is:
An escape sequence is of the form
ESC (27) I...I F, where the I are intermediate codes between SP (32) and / (47),
and F is the final code between 0 (48) and ~ (126).
A control sequence is an escape sequence using final code [ (91):
ESC [ P...P I...I F, where the P are parameter codes between 0 (48) and ? (63),
the I are intermediate codes as above, and F is the final code between
@ (64) and ~ (126).
The whole point was to allow implementations that don't support a
particular operation to be able to parse the escape sequence enough to
ignore it.
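The code ranges described above can be written down as simple predicates. What follows is only a sketch: the function names are made up for illustration, and it assumes CHAR-CODE returns ASCII-compatible codes, which the standard does not guarantee.

```lisp
;; Hypothetical helper names; the ranges are the ones given in the text above.
;; Assumes CHAR-CODE yields ASCII-compatible codes.
(defun intermediate-code-p (char)
  (<= 32 (char-code char) 47))          ; SP through /

(defun parameter-code-p (char)
  (<= 48 (char-code char) 63))          ; 0 through ?

(defun control-final-code-p (char)
  (<= 64 (char-code char) 126))         ; @ through ~

(defun classify-codes (string)
  "Label each character of STRING (the part after ESC [) with its class."
  (map 'list
       #'(lambda (char)
           (cond ((intermediate-code-p char) :intermediate)
                 ((parameter-code-p char) :parameter)
                 ((control-final-code-p char) :final)
                 (t :other)))
       string))
```

For the cursor-motion sequence ESC [ 12 ; 40 H, calling (classify-codes "12;40H") labels the five characters "12;40" as :PARAMETER and the closing H as :FINAL, which is all a terminal needs to know to skip the sequence.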
>Could I use the read macro by defining macro-dispatch chars ?
Read macros aren't processed when reading the characters inside a string.
You could use READ-FROM-STRING, but that will end up trying to parse the
string as Lisp object representations. It would be better to write your
own routine that iterates over the characters.
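As a small illustration of why the reader is the wrong tool here, the sketch below hands plain terminal text to READ-FROM-STRING. The function name is hypothetical, and the result assumes the standard readtable (which upcases symbol names).

```lisp
;; Hypothetical helper: return only the first object the reader finds in TEXT.
;; Assumes the standard readtable is in effect.
(defun first-object-read (text)
  (values (read-from-string text)))

;; (first-object-read "System V") yields a symbol named "SYSTEM",
;; not the text itself, and the " V" part is simply left unread.
```

The reader tokenizes the text as Lisp source, so the original characters are gone before any filtering could happen.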
>Or should I use remove-if #'(huge-lambda-function)
That won't work, because it passes each character to the test function, so
the context is lost. Here's some code that does what you want.
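(defun remove-escape-sequences (string)
  ;; The result is never longer than the input, so a fill-pointer vector
  ;; of the same length always has room for VECTOR-PUSH.
  (let ((new-string (make-array (length string)
                                :element-type 'character
                                :fill-pointer 0)))
    (loop for input-index below (length string)
          for input-char = (char string input-index)
          if (char= input-char (code-char 27)) ; ESC
            do (setq input-index (end-of-escape-sequence string input-index))
          else do (vector-push input-char new-string))
    new-string))

(defun end-of-escape-sequence (string start-index)
  ;; Returns the index of the last character of the sequence that starts
  ;; with the ESC at START-INDEX.  A sequence cut off by the end of STRING
  ;; is treated as extending to the last character.
  (let* ((last-index (1- (length string)))
         (final-char-index
           ;; final codes of an escape sequence run from 0 to ~
           (position-if #'(lambda (char) (char<= #\0 char #\~))
                        string :start (1+ start-index))))
    (cond ((null final-char-index) last-index)
          ((char= (char string final-char-index) #\[)
           ;; it's a complex control sequence
           (or (position-if #'(lambda (char) (char<= #\@ char #\~))
                            string :start (1+ final-char-index))
               last-index))
          ;; it's a single-character escape sequence
          (t final-char-index))))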
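For contrast, here is a sketch of what goes wrong with the REMOVE-IF idea. The names NAIVE-STRIP and *SAMPLE* are made up for this illustration; the sample string is the one from the question, built with a real ESC character.

```lisp
;; Hypothetical example: a per-character test cannot tell an "m" that ends
;; an escape sequence from an "m" in ordinary text.
(defun naive-strip (string)
  ;; Drops every ESC, [, 4 and m, wherever it occurs.
  (remove-if #'(lambda (char)
                 (find char (list (code-char 27) #\[ #\4 #\m)))
             string))

;; ESC [ 4 m System V ESC [ m, with ESC as a single character.
(defparameter *sample*
  (format nil "~C[4mSystem V~C[m" (code-char 27) (code-char 27)))

;; (naive-strip *sample*) returns "Syste V": the m in "System" is lost too.
```

The predicate sees one character at a time, so it has no way to know whether it is inside a sequence; that is the context an index-based loop keeps.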
>(I'm trying to remove all VT100 escape character
>sequences from strings received from a VT100 terminal...)
>Example:
>(remove-esc-seq "#\ESC[4mSystem V#\ESC[m") should print
>--> "System V"
Does the string really contain the five characters "#\ESC" as you've shown,
or does it actually contain the ESC character as a single element? My
code assumes the latter, since a VT100 terminal doesn't send Lisp printed
representations.
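The difference is easy to check at the listener. The sketch below builds both variants; the variable names are hypothetical. Note that when the example is typed as a Lisp string literal, the backslash merely quotes the following E, so the literal ends up holding the four characters # E S C rather than an ESC.

```lisp
;; Hypothetical names, for illustration only.
(defparameter *esc* (string (code-char 27)))

;; What a terminal would send: ESC is a single element of the string.
(defparameter *real*
  (concatenate 'string *esc* "[4mSystem V" *esc* "[m"))

;; The example as typed in a string literal: no ESC character at all,
;; because \E inside a string is just the letter E.
(defparameter *typed* "#\ESC[4mSystem V#\ESC[m")

;; (length *real*)                 => 15
;; (length *typed*)                => 21
;; (find (code-char 27) *typed*)   => NIL
```

Only the first form contains anything an ESC-based parser will react to; the second passes through such a routine unchanged.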
--
Barry Margolin
System Manager, Thinking Machines Corp.
······@think.com {uunet,harvard}!think!barmar