From: Will Deakin
Subject: complement(ary stuff)
Date: 
Message-ID: <3B288E35.4090808@pindar.com>
There is something that has been puzzling me about {position, 
position-if} and search.

First, if I want to find the leftmost position of any of a 
number of characters in a string, I can use position-if:
 > (position-if #'(lambda (x) (or (char= #\space x)
                                  (char= #\s x)))
                "this is a test, isn't it")
3

And if I then want to find the leftmost position that is *not* 
one of these characters, I can use complement:
 > (position-if (complement #'(lambda (x) (or (char= #\space x)
                                          (char= #\h x))))
                            "this is a test, isn't it")
0
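
For comparison, the same two lookups can be written with position-if-not
rather than complement. This is only a sketch: the helper names
position-of-any and position-of-none are made up for illustration and are
not standard functions.

```lisp
;; Hypothetical helpers (not part of Common Lisp).
(defun position-of-any (chars string)
  "Leftmost position in STRING of any character in CHARS, or NIL."
  (position-if (lambda (c) (find c chars)) string))

(defun position-of-none (chars string)
  "Leftmost position in STRING of a character not in CHARS, or NIL."
  (position-if-not (lambda (c) (find c chars)) string))

(position-of-any " s" "this is a test, isn't it")   ; => 3
(position-of-none " h" "this is a test, isn't it")  ; => 0
```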

If I then want to find the leftmost position of a sequence 
within a sequence, I can use search:
 > (search ", " "this is a test, isn't it")
14

However, my question is: in the absence of `search-if', how do 
you search for the leftmost position of the complement 
of a sequence or of a number of sequences?

A possible solution for the search and complement is:
 > (search ", "
           "this is a test, isn't it"
           :test #'(lambda (x y)
                     (string= x y)))
14

and
 > (search ", " "this is a test, isn't it"
           :test (complement #'(lambda (x y)
                                 (string= x y))))
0

Is this the right way to go? And how can you extend this to 
search for a number of sequences?

For example:
 > (search ", " "this is a test, isn't it"
           :test #'(lambda (x y)
                     (or (string= x y)
                         (string= "s " x))))

14

rather than 3 which is what I would expect....

Any thoughts would be greatly appreciated.


<bad joke>
ps: A bloke walks into an empty pub. To his surprise he hears a 
high pitch voice saying "You're looking very smart today."
Puzzled, and as he realises he's out of tabs he wanders over to 
the cigarette machine. Putting in his money, he hears another 
voice saying "BASTARD!" and no cigarettes appear.

Extremely perplexed, he walks up to the bar on which there is a 
bell and a bowl of mints. He rings the bell; after a short while 
the landlord appears, and so the man asks for a pint and an 
explanation.

"I am very sorry," says the landlord, "the mints are 
complimentary, but the cigarette machine is out of order."
</bad joke>

From: Kent M Pitman
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <sfwn17b9p52.fsf@world.std.com>
Will Deakin <········@pindar.com> writes:

> However, my question is: in the absence of `search-if', how do 
> you search for the leftmost position of the complement 
> of a sequence or of a number of sequences?
> 
> A possible solution for the search and complement is:
>  > (search ", "
>            "this is a test, isn't it"
>            :test #'(lambda (x y)
>                      (string= x y)))
> 14

Um, I looked at the standard and it's quite ambiguously worded on this
point; sigh.  17.2.1 makes this clear, but you may have to struggle to
see why.  (For one thing, it reminds you that the default for :TEST is
#'EQL.)

Anyway, my reading (and my experience) is that the :test argument is
supposed to be an elementwise-test, not a subsequence-test.  That is, 
it's there so you can pass #'char-equal to get a case-insensitive search.

If you think about it, this has to be.  Your :test as supplied above can't
work efficiently because you'd have to cons a new substring x and y for
every position in the string just to do the test.
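
A small sketch of the elementwise reading follows. The helper name
test-sees-only-characters-p is invented for this example; it only records
what kind of arguments SEARCH hands to the :test function.

```lisp
;; :test compares one element of each sequence at a time, so a
;; case-insensitive search only needs a character predicate.
(search "TEST" "this is a test, isn't it" :test #'char-equal)  ; => 10

;; Hypothetical helper: true if SEARCH only ever passes characters to :test.
(defun test-sees-only-characters-p (pattern string)
  (let ((only-characters t))
    (search pattern string
            :test (lambda (x y)
                    (unless (and (characterp x) (characterp y))
                      (setf only-characters nil))
                    (char= x y)))
    only-characters))

(test-sees-only-characters-p ", " "this is a test, isn't it")  ; => T
```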
 
> and
>  > (search ", " "this is a test, isn't it"
>            :test (complement #'(lambda (x y)
>                                  (string= x y))))
> 0

Since you've got the wrong notion of the test to use, you also have the wrong
notion of an appropriate complement here.

> Is this the right way to go? And how can you extend this to 
> search for a number of sequences?

I have no idea what you want to search for, but this seems a bizarre way to
go about it.  The problem with inverse searches is that there are so many
matches.  "th" is the result of searching for the string-complement of
", ".  That is, the first two-letter sequence in "this is a test" that is
not ", " is "th".

> For example:
>  > (search ", " "this is a test, isn't it"
>            :test #'(lambda (x y)
>                      (or (string= x y)
>                          (string= "s " x))))

Perhaps you want to write a search-min macro such that you can do:

 (search-min (search ", " "this is a test, isn't it")
             (search "s " "this is a test, isn't it"))

or else perhaps you should write something which allows you to take a
list of strings or a regexp and find the first place in the string that
matches the spec.  Logical complement is not going to get you there, 
though.
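
One possible shape for this, as a sketch only: since the arguments are just
positions or NIL, an ordinary function seems to be enough and no macro is
needed. The definition of search-min below and the helper search-any are
assumptions made for illustration, not existing functions.

```lisp
;; Hypothetical: smallest non-NIL position among the arguments.
(defun search-min (&rest positions)
  "Return the smallest non-NIL element of POSITIONS, or NIL if there is none."
  (let ((found (remove nil positions)))
    (when found
      (reduce #'min found))))

;; Hypothetical: leftmost match of any of several patterns.
(defun search-any (patterns string)
  "Return the leftmost position in STRING where any of PATTERNS matches."
  (apply #'search-min
         (mapcar (lambda (pattern) (search pattern string)) patterns)))

(search-min (search ", " "this is a test, isn't it")
            (search "s " "this is a test, isn't it"))   ; => 3
(search-any '(", " "s ") "this is a test, isn't it")    ; => 3
```

Because search-any goes through apply, it suits short lists of patterns; a
long list would be better folded with reduce.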
From: Will Deakin
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <3B28D3C7.5070101@pindar.com>
Kent M Pitman wrote:

> Um, I looked at the standard and it's quite ambiguously worded on this
> point; sigh.
Hmmmm. I don't have the standard but having looked in the 
HyperSpec, CLtL2, the Paul Graham books and PAIP I was still 
little wiser -- this would suggest either that it is not only the 
standard that is ambiguous or that I am easily confused ;)

[...elided some excellent clarifying points...]

 > Since you've got the wrong notion of the test to use, you also 
 > have the wrong notion of an appropriate complement here.
Yes, this is my problem and lies within the nature of the test...

I also agree that the logical complement stuff will also not 
help. Time for a rethink. Anyway, thanks for your help.

:)w
From: Barry Margolin
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <FH6W6.10$4O4.266@burlma1-snr2>
In article <················@pindar.com>,
Will Deakin  <········@pindar.com> wrote:
>I also agree that the logical complement stuff will also not 
>help. Time for a rethink. Anyway, thanks for your help.

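(defun search-nonmatch (substring string)
  (let ((ss-len (length substring))
        (s-len (length string)))
    (and (<= ss-len s-len)
         ;; 1+ so that the last possible start position is tried too
         (dotimes (i (1+ (- s-len ss-len)))
           ;; string/= returns a mismatch index (non-NIL) when they differ
           (when (string/= substring string :start2 i :end2 (+ i ss-len))
             (return i))))))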

As Kent suggested, this function doesn't seem particularly useful.  Unless
the substring only contains multiple copies of the first character of the
string, and the string begins with at least that many copies of the
character, this will always return either 0 or 1.  E.g.

(search-nonmatch "abc" "abcdef") => 1   ; "bcd" doesn't match "abc"
(search-nonmatch "bcd" "abcdef") => 0   ; "abc" doesn't match "bcd"
(search-nonmatch "xyz" "abcdef") => 0   ; "abc" doesn't match "xyz"
(search-nonmatch "aaa" "aaaaab") => 3   ; "aab" doesn't match "aaa"

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Will Deakin
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <3B29CDCA.2060406@pindar.com>
[...elided an excellent description of a search-complement...]

Barry Margolin wrote:

> As Kent suggested, this function doesn't seem particularly useful.  
I agree. I'll try to explain what I was thinking about, which was 
something that would do:

(search "bc" "abcdef") => 1
(search-complement "bc" "bcbcdef") => 4

After the first complement match, the search-complement 
would skip the matched stuff and restart.

In the above example, the search-complement would start at 
position 0, match "bc" and restart at 2, match the second "bc" and 
restart, and finally fail to match at 4.

A lot of my confusion came from drawing an analogy with 
position-if-not, which does restart one field width along, 
where the width happens to be 1.

Anyway, I hope this may go some way to explain why I thought what 
I did -- and I would like to thank you for your help.
From: Barry Margolin
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <7vpW6.40$4O4.272@burlma1-snr2>
In article <················@pindar.com>,
Will Deakin  <········@pindar.com> wrote:
>[...elided an excellent description of a search-complement...]
>
>Barry Margolin wrote:
>
>> As Kent suggested, this function doesn't seem particularly useful.  
>I agree. I'll try to explain what I was thinking about, which was 
>something that would do:
>
>(search "bc" "abcdef") => 1
>(search-complement "bc" "bcbcdef") => 4
>
>After the first complement match, the search-complement 
>would skip the matched stuff and restart.

OK, that makes a bit more sense.  If you take my function and have it
increment the index by ss-len instead of by 1 each time, I think it will do
what you want.  It's not clear to me, though, how you want:

(search-complement "bc" "abcbcdef")

to work.  Should it return 0 or 5 (find the first "bc", then skip over all
the "bc" sequences)?  If the latter, just use (search substring string) to
initialize the index instead of using 0.
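
A minimal sketch of that change, under some assumptions: the name
search-complement comes from the examples above, but the :start keyword and
the handling of a short tail (a final field narrower than the substring is
simply not examined) are choices made for this illustration.

```lisp
;; Sketch: like search-nonmatch, but stepping one field width at a time.
(defun search-complement (substring string &key (start 0))
  "Return the start of the first SUBSTRING-wide field of STRING, counting
from START, that is not equal to SUBSTRING; NIL if every full field matches."
  (let ((ss-len (length substring))
        (s-len (length string)))
    (when (plusp ss-len)            ; a zero-width field would never advance
      (loop for i from start to (- s-len ss-len) by ss-len
            ;; string/= returns a mismatch index (non-NIL) when they differ
            when (string/= substring string :start2 i :end2 (+ i ss-len))
              return i))))

(search-complement "bc" "bcbcdef")    ; => 4
(search-complement "bc" "abcbcdef")   ; => 0
;; The "skip to the first match" reading, starting from the SEARCH result:
(search-complement "bc" "abcbcdef"
                   :start (search "bc" "abcbcdef"))  ; => 5
```

Note that SEARCH returns NIL when the substring does not occur at all, so
the last form would need a check before passing its result as :start.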


From: Will Deakin
Subject: Re: complement(ary stuff)
Date: 
Message-ID: <3B320212.5000209@pindar.com>
Sorry about the delay...

Barry Margolin wrote:

> (search-complement "bc" "abcbcdef")
> 
> to work.  Should it return 0 or 5 (find the first "bc", then skip over all
> the "bc" sequences)?  
This should produce 0.

> If the latter, just use (search substring string) to initialize the index
> instead of using 0.
Good tip.

Cheers,

Will