From: Bela Ban
Subject: Remove wildcarded substrs from strs
Date: 
Message-ID: <1992Feb28.111713.29043@ifi.unizh.ch>
What would be the standard way of removing a
bunch of characters (with limited wildcard characters)
from a string ? 
E.g: remove all occurrences of "#\ESC[?m" from a string.
"#\ESC[?m" might resolve into "#\ESC[4m", "#\ESC[m" etc.
The wildcard character should expand into either 0
or 1 char. Has anyone already done this ? 
Could I use the read macro by defining macro-dispatch chars ?
Or should I use remove-if #'(huge-lambda-function)
(I'm trying to remove all VT100 escape character 
sequences from strings received from a VT100 terminal...)
Example:
(remove-esc-seq "#\ESC[4mSystem V#\ESC[m") should print
--> "System V" 
Cheers, Bela
From: Barry Margolin
Subject: Re: Remove wildcarded substrs from strs
Date: 
Message-ID: <kqti6pINNpa5@early-bird.think.com>
In article <1992Feb28.111713.29043@ifi.unizh.ch> ···@ifi.unizh.ch (Bela Ban) writes:
>What would be the standard way of removing a
>bunch of characters (with limited wildcard characters)
>from a string ? 
>E.g: remove all occurrences of "#\ESC[?m" from a string.
>"#\ESC[?m" might resolve into "#\ESC[4m", "#\ESC[m" etc.
>The wildcard character should expand into either 0
>or 1 char. Has anyone already done this ? 

Note that the syntax of ANSI escape sequences doesn't limit you to one
character between the "ESC [" and the final character.  Consider the control
sequence for cursor motion: ESC [ row ; column H.  The full syntax, in Unix
extended regular expression syntax, is: 

	\E[ -/]*([0-Z\\-~]|\[[0-?]*[ -/]*[@-~])

(The metacharacters in this syntax are: [...], which specifies sets and
ranges (using <char1>-<char2> syntax); (...|...), which specifies
alternatives; *, which specifies 0 or more repetitions; and \, which either
quotes the following metacharacter or turns into a control character
indicated by the following letter (\E is ESC).)

The English translation of that is:

An escape sequence is of the form
 ESC (27) I...I F, where each I is an intermediate code between SP (32) and
 / (47), and F is the final code between 0 (48) and ~ (126).
A control sequence is an escape sequence with final code [ (91), continued as
 ESC [ P...P I...I F, where each P is a parameter code between 0 (48) and
 ? (63), I is as above, and F is the final code between @ (64) and ~ (126).

The whole point of this syntax is that an implementation that doesn't
support a particular operation can still parse the escape sequence well
enough to ignore it.
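
The grammar can be written down almost directly as a recognizer.  What
follows is only a sketch: the names INTERMEDIATE-CHAR-P, PARAMETER-CHAR-P,
SKIP-WHILE and ESCAPE-SEQUENCE-END are made up for illustration, and it
assumes that CHAR-CODE returns ASCII codes, which the standard does not
promise but current implementations provide.

```lisp
;; Character classes from the grammar, expressed on character codes.
(defun intermediate-char-p (char)
  "True for intermediate codes, SP (32) through / (47)."
  (<= 32 (char-code char) 47))

(defun parameter-char-p (char)
  "True for parameter codes, 0 (48) through ? (63)."
  (<= 48 (char-code char) 63))

(defun skip-while (predicate string start)
  "Index of the first character at or after START that fails PREDICATE,
or the length of STRING if every remaining character satisfies it."
  (or (position-if-not predicate string :start start)
      (length string)))

(defun escape-sequence-end (string start)
  "STRING must hold ESC at START.  Return the index just past the escape
or control sequence that begins there, or NIL if it is malformed or cut off."
  (let* ((len (length string))
         (i (skip-while #'intermediate-char-p string (1+ start))))
    (cond ((>= i len) nil)
          ;; Control sequence: ESC [ P...P I...I F
          ((char= (char string i) #\[)
           (let* ((j (skip-while #'parameter-char-p string (1+ i)))
                  (k (skip-while #'intermediate-char-p string j)))
             (if (and (< k len)
                      (<= 64 (char-code (char string k)) 126))
                 (1+ k)
                 nil)))
          ;; Plain escape sequence: ESC I...I F
          ((<= 48 (char-code (char string i)) 126) (1+ i))
          (t nil))))
```

For instance, (escape-sequence-end (format nil "~C[4mSystem V" (code-char 27)) 0)
returns 4, the index of the "S".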

>Could I use the read macro by defining macro-dispatch chars ?

Read macros aren't processed when reading the characters inside a string.
You could use READ-FROM-STRING, but that will end up trying to parse the
string as Lisp object representations.  It would be better to write your
own routine that iterates over the characters.

>Or should I use remove-if #'(huge-lambda-function)

That won't work, because REMOVE-IF passes each character to the test
function in isolation, so the test can't tell whether the character is part
of an escape sequence.  Here's some code that does what you want.

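(defun remove-escape-sequences (string)
  ;; The result is never longer than the input, so one allocation suffices.
  (let ((new-string (make-array (length string) :element-type 'character :fill-pointer 0)))
    (loop for input-index below (length string)
          for input-char = (char string input-index)
          if (char= input-char (code-char 27)) ; ESC
            ;; Jump to the last character of the sequence; LOOP steps past it.
            do (setq input-index (end-of-escape-sequence string input-index))
          else do (vector-push input-char new-string))
    new-string))

(defun end-of-escape-sequence (string start-index)
  ;; Returns the index of the last character of the sequence that starts at
  ;; START-INDEX, or the last index of STRING if the sequence is cut off.
  (flet ((final-index (lowest-final start)
           (or (position-if #'(lambda (char) (char<= lowest-final char #\~))
                            string :start start)
               (1- (length string)))))
    ;; The final code of an escape sequence lies between 0 and ~.
    (let ((final-char-index (final-index #\0 start-index)))
      (if (char= (char string final-char-index) #\[)
          ;; it's a complex control sequence; its final code lies between @ and ~
          (final-index #\@ (1+ final-char-index))
          ;; it's a single-character escape sequence
          final-char-index))))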
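
The remark about lost context can be seen from the other side: a
per-character test does work once it carries state from one character to the
next.  The sketch below is not the routine above but a small state machine
under a made-up name, STRIP-ESCAPE-SEQUENCES.  It assumes ASCII character
codes, and it is deliberately lenient: inside a sequence it drops everything
up to the next final code without checking the order of parameter and
intermediate codes.

```lisp
(defun strip-escape-sequences (string)
  "Return a copy of STRING without ESC-introduced sequences.
Hypothetical helper for illustration only."
  (let ((state :text))
    (with-output-to-string (out)
      (loop for char across string
            for code = (char-code char)
            do (ecase state
                 ;; Ordinary text: copy it, unless it is ESC.
                 (:text
                  (if (= code 27)
                      (setq state :escape)
                      (write-char char out)))
                 ;; After ESC: [ opens a control sequence, any other
                 ;; code from 0 (48) through ~ (126) ends the sequence.
                 (:escape
                  (cond ((= code 91) (setq state :control))
                        ((<= 48 code 126) (setq state :text))))
                 ;; After ESC [: a code from @ (64) through ~ (126) ends it.
                 (:control
                  (when (<= 64 code 126)
                    (setq state :text))))))))
```

With a real ESC character in the input,
(strip-escape-sequences (format nil "~C[4mSystem V~C[m" (code-char 27) (code-char 27)))
returns "System V".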


>(I'm trying to remove all VT100 escape character 
>sequences from strings received from a VT100 terminal...)
>Example:
>(remove-esc-seq "#\ESC[4mSystem V#\ESC[m") should print
>--> "System V" 

Does the string really contain the five characters "#\ESC" as you've shown,
or does it actually contain the ESC character as a single element?  My
code assumes the latter, since a VT100 terminal doesn't send Lisp printed
representations.
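
To make the difference concrete, here is a small sketch that builds both
kinds of string.  The variable names are made up for illustration.

```lisp
(defparameter *esc* (code-char 27))

;; One ESC character followed by "[4m": four characters in total.
(defparameter *real-sequence* (format nil "~C[4m" *esc*))

;; The printed representation spelled out, # \ E S C [ 4 m: eight characters.
;; The backslash has to be doubled inside a Lisp string literal.
(defparameter *spelled-out* "#\\ESC[4m")

(length *real-sequence*)              ; => 4
(length *spelled-out*)                ; => 8
(char-code (char *real-sequence* 0))  ; => 27
(char *spelled-out* 0)                ; => #\#
```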
-- 
Barry Margolin
System Manager, Thinking Machines Corp.

······@think.com          {uunet,harvard}!think!barmar