From: Geoff Summerhayes
Subject: bit problem
Date: 
Message-ID: <MlaN7.51952$kb.4091251@news1.calgary.shaw.ca>
I came across this problem in another NG.
(comp.games.development.programming.algorithms)

(from original post)
----------------------------
// Input example
1110100
0000000
0111011
1000000
1111111
1111011
1111011

An algorithm should return the following set :
// Output (no sort)
1000000
0110000
0000100
0001011

For the last string, 0000011 means that among all of the input's bit strings
the last two bits are always equal.
----------------------------

What I came up with:

(defun bit-problem (list)
  (remove-duplicates
   (loop for index from 0 below (length (car list)) ; assume equal-length
vectors
         collecting
         (reduce #'bit-and
                 (mapcar #'(lambda(bit-vector)
                             (if (= 0 (sbit bit-vector index))
                                 (bit-not bit-vector)
                               bit-vector))
                         list))) :test #'equal))

>> (bit-problem '(#*1110100 #*0000000 #*0111011 #*1000000 #*1111111 #*1111011
#*1111011))
(#*1000000 #*0110000 #*0000100 #*0001011)

Comments? (style,algorithm,etc.) This was banged together over about 15 minutes
in LW's listener,
I'm sure it can be improved, the current code looks too expensive for actual
use.

-----------------
Geoff

From: Vladimir Zolotykh
Subject: Re: bit problem
Date: 
Message-ID: <3C054293.5803683C@eurocom.od.ua>
Geoff Summerhayes wrote:

> An algorithm should return the following set :

I've interested what this algorithm might be used for.
From: Gareth McCaughan
Subject: Re: bit problem
Date: 
Message-ID: <slrna0b1rl.24q6.Gareth.McCaughan@g.local>
Geoff Summerhayes wrote:

> I came across this problem in another NG.
> (comp.games.development.programming.algorithms)

[SNIP]

If I've understood correctly, the idea is to build
equivalence classes of bit positions under the relation
"always contain the same bit in the specified bitstrings".

(defun equivalence-classes (bit-vectors)
  (flet ((column (j) (map 'bit-vector (lambda (v) (sbit v j)) bit-vectors))
         (make-bits (n) (make-array n :element-type 'bit :initial-element 0)))
    (let* ((n (length (first bit-vectors)))
           (column-hash (make-hash-table :test #'equal)))
      (loop for i upfrom 0 below n do
        (let* ((column (column i))
               (bits (or (gethash column column-hash) (make-bits n))))
          (setf (sbit bits i) 1)
          (setf (gethash column column-hash) bits)))
      (loop for row being each hash-value of column-hash collect row))))

On a very artificial example (20 random bit-vectors of length 10000;
Athlon/1GHz; CMUCL 18c; FreeBSD), the code above takes 1.5 seconds
and conses 14Mb; the BIT-PROBLEM function in Geoff's article takes...
well, I gave up waiting after about 15 minutes of CPU time and hundreds
of Mb consed. The consing is because it's making huge numbers of new
bit-vectors. The CPU time is mostly the fault of REMOVE-DUPLICATES,
which I think is using the obvious and horrible quadratic algorithm --
on a list of almost 10k bit vectors, each 10k bits long. That's going
to take a while.

Now, as I said, that's a very artificial example, and if there are
real applications for this thing then their input data probably don't
look even slightly like the data I constructed. :-) It's possible
that the overhead of setting up and using a hash table is a big
deal for small applications.

The performance of this code is surprisingly stable under
small perturbations. It doesn't get much slower if you
make COLUMN build lists instead of bit vectors. It doesn't
get much faster if you take the trouble to avoid storing
BITS back in the hash table when it's already there. At
least, not with my test data, where that's a rare occurrence. :-)
In applications where there were lots of repeated columns, that
might become much more worth while.

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc
From: Geoff Summerhayes
Subject: Re: bit problem
Date: 
Message-ID: <jmTN7.86620$6b.5470286@news2.calgary.shaw.ca>
"Gareth McCaughan" <················@pobox.com> wrote in message
·····································@g.local...
> Geoff Summerhayes wrote:
>
> > I came across this problem in another NG.
> > (comp.games.development.programming.algorithms)
>
<snip>
>
> On a very artificial example (20 random bit-vectors of length 10000;
> Athlon/1GHz; CMUCL 18c; FreeBSD), the code above takes 1.5 seconds
> and conses 14Mb; the BIT-PROBLEM function in Geoff's article takes...
> well, I gave up waiting after about 15 minutes of CPU time and hundreds
> of Mb consed. The consing is because it's making huge numbers of new
> bit-vectors. The CPU time is mostly the fault of REMOVE-DUPLICATES,
> which I think is using the obvious and horrible quadratic algorithm --
> on a list of almost 10k bit vectors, each 10k bits long. That's going
> to take a while.
>
> Now, as I said, that's a very artificial example, and if there are
> real applications for this thing then their input data probably don't
> look even slightly like the data I constructed. :-) It's possible
> that the overhead of setting up and using a hash table is a big
> deal for small applications.
<snip>

A very nice approach, thank you. After giving it some thought I wrote
this incremental approach:

(defun incremental-equivalence(length)
  (let ((accumulator
             (list(make-array length :element-type 'bit :initial-element 1))))
    (values
      #'(lambda (bit-vector)
           (let ((length (length accumulator)))
             (dotimes (index length)
               (let* ((test-val (nth index accumulator))
                        (value1 (bit-and bit-vector test-val))
                        (value2 (bit-andc1 bit-vector test-val)))
                 (or (equal test-val value1)
                       (equal test-val value2)
                       (progn
                          (setf (nth index accumulator) value1)
                          (setf accumulator (cons value2 accumulator))))))))
      #'(lambda () (copy-list accumulator)))))

with (for testing)

(defun equivalence-classes(list)
  (multiple-value-bind (add view)
      (incremental-equivalence (length (first list)))
    (progn
      (dolist (x list) (funcall add x))
      (funcall view))))

Using your approach for testing(WinNT4-sp6,PII/1G ,LW Personal):

               incremental      hash
Seconds:          31.0             3.7
                  31.0             3.5
                  30.4             3.6
                  29.5             3.7
                  30.5             3.5

Standard
Allocation(bytes):
               9561528        14805664
               9487576        14787832
               9399160      4309777520
               9081720        14811240
               9419992      4310265712

Fixlen
Allocation(bytes):
                  6622          345752
                  6479          345290
                  6391          345851
                  6182          345884
                  6413          345191

I'm not sure this is a fair test for the incremental, it's not really designed
to get all the data at once.
Be that as it may, any suggestions for improvements?

---------
Geoff