how to deal with various representations of matrices ?

From: Thibault Langlois
Subject: how to deal with various representations of matrices ?
Date: Mon, 23 Jun 2008 15:05:22 +0000
Message-ID: <f13848e7-08ba-4a11-b11f-5f5e5726ac73@e39g2000hsf.googlegroups.com>

Hi, I often have to manipulate numerical data (in programs in in the
REPL) that happens to be in various formats (2D arrays, lists of
vectors, vector of vectors, list of lists  etc...). Like many I wrote
a bunch of trivial utilities to:
- compute some stats
- find max and min in columns
- extract parts of the matrix etc...

My problem is as data may be in various formats I have to take this
into account. I see 3 ways:

1. write functions to convert data from each format to a canonical
representation (2D arrays). And code functions only for this
representation. This has the advantage of not duplicating code but the
translation process may be consuming time and space if I have a large
amount of data.

2. use method to specialize on each data format. In order to do this I
have to create a class "list-of-vectors" (for example) in order to be
able to specialize on it. This not very convenient especially when
working with the REPL because I have to use make-instance very often.

3. use a method that specialize on a symbol ex:
(defmethod mean-columns ((data-format (eql :list-of-vectors))
data) ...

This may be a very newbie dilemma but, anyway, is there more clever
ways to deal with this issue ?

Thibault

Re: how to deal with various representations of matrices ? Alberto Riva
Re: how to deal with various representations of matrices ? Pascal J. Bourguignon
- Re: how to deal with various representations of matrices ? Thibault Langlois
  - Re: how to deal with various representations of matrices ? Pascal J. Bourguignon

From: Alberto Riva
Subject: Re: how to deal with various representations of matrices ?
Date: Mon, 23 Jun 2008 16:42:16 +0000
Message-ID: <g3ojp8$io4c$1@usenet.osg.ufl.edu>

Thibault Langlois wrote:
> Hi, I often have to manipulate numerical data (in programs in in the
> REPL) that happens to be in various formats (2D arrays, lists of
> vectors, vector of vectors, list of lists  etc...). Like many I wrote
> a bunch of trivial utilities to:
> - compute some stats
> - find max and min in columns
> - extract parts of the matrix etc...
> 
> My problem is as data may be in various formats I have to take this
> into account. I see 3 ways:
> 
> 1. write functions to convert data from each format to a canonical
> representation (2D arrays). And code functions only for this
> representation. This has the advantage of not duplicating code but the
> translation process may be consuming time and space if I have a large
> amount of data.
> 
> 2. use method to specialize on each data format. In order to do this I
> have to create a class "list-of-vectors" (for example) in order to be
> able to specialize on it. This not very convenient especially when
> working with the REPL because I have to use make-instance very often.
> 
> 3. use a method that specialize on a symbol ex:
> (defmethod mean-columns ((data-format (eql :list-of-vectors))
> data) ...

I would use solution #2. It may seem unnecessarily complex at first, but 
in the long run it's going to save you a lot of time and trouble 
(especially if you'll ever have to extend the set of formats you work 
with). The make-instance problem can easily be solved with a macro or a 
reader macro. How are you creating your matrices in the first place? It 
may be even possible to write a single matrix-creating function that 
creates an instance of the right class automatically depending on the 
nature of the data you supply to it.

Alberto

From: Pascal J. Bourguignon
Subject: Re: how to deal with various representations of matrices ?
Date: Mon, 23 Jun 2008 18:24:59 +0000
Message-ID: <87r6aom2kk.fsf@hubble.informatimago.com>

Thibault Langlois <·················@gmail.com> writes:

> Hi, I often have to manipulate numerical data (in programs in in the
> REPL) that happens to be in various formats (2D arrays, lists of
> vectors, vector of vectors, list of lists  etc...). Like many I wrote
> a bunch of trivial utilities to:
> - compute some stats
> - find max and min in columns
> - extract parts of the matrix etc...
>
> My problem is as data may be in various formats I have to take this
> into account.

Watch Lecture-4b from:
http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/

>  I see 3 ways:
>
> 1. write functions to convert data from each format to a canonical
> representation (2D arrays). And code functions only for this
> representation. This has the advantage of not duplicating code but the
> translation process may be consuming time and space if I have a large
> amount of data.
>
> 2. use method to specialize on each data format. In order to do this I
> have to create a class "list-of-vectors" (for example) in order to be
> able to specialize on it. This not very convenient especially when
> working with the REPL because I have to use make-instance very often.
>
> 3. use a method that specialize on a symbol ex:
> (defmethod mean-columns ((data-format (eql :list-of-vectors))
> data) ...
>
> This may be a very newbie dilemma but, anyway, is there more clever
> ways to deal with this issue ?

Another way to deal with that would be to use an adaptor pattern.  And
since you seem to say that you have more than three representations,
indeed it would be nice to choose one intermediary representation, so
you have to write only 2N adaptors instead of N�-N.

Now, the nice thing is that once you've wrote the 2N adaptors, you can
compose then and compile the N�-N optimized direct adaptors.

But another consideration is that data structures should match the
algorithms (and vice-versa).  If you have an algorithm that needs
direct access to the slots of your sequences, it would be much better
to use a vector than an adaptor over a list.  This would mean
converting the data into the data structure adapted to each algorithm.
If you hide well your representations as indicated in the Lecture-4b,
you could convert from one representation to the other automatically.

(let ((seq (make-instance 'smart-sequence)))
  (insert item seq :at-end)
  (insert item seq :at-end)
  (insert item seq :at-end) ; oh oh! we're just appending items to the end
                            ; perhaps I should switch to a reverse list representation
  (insert item seq :at-end) ; Yay! O(1)
  (insert item seq :at-end)
  ... (assume 10000 more)
  ;; assume a call to sort
  (ref item 5000)              
  (ref item 2500)           ; oh oh! looks like random access
  (ref item 7500)           ; let's switch to a vector representation
  (ref item 1250)           ; Yay! O(1)
  (ref item 3750)
  ... (assume 10000 more)
  )

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"You cannot really appreciate Dilbert unless you read it in the
original Klingon"

From: Thibault Langlois
Subject: Re: how to deal with various representations of matrices ?
Date: Tue, 24 Jun 2008 21:47:06 +0000
Message-ID: <65d25d06-5407-40a9-acb6-c8c7be04220b@i76g2000hsf.googlegroups.com>

On Jun 23, 7:24 pm, ····@informatimago.com (Pascal J. Bourguignon)
wrote:
> Thibault Langlois <·················@gmail.com> writes:
> > Hi, I often have to manipulate numerical data (in programs in in the
> > REPL) that happens to be in various formats (2D arrays, lists of
> > vectors, vector of vectors, list of lists  etc...). Like many I wrote
> > a bunch of trivial utilities to:
> > - compute some stats
> > - find max and min in columns
> > - extract parts of the matrix etc...
>
> > My problem is as data may be in various formats I have to take this
> > into account.
>
> Watch Lecture-4b from:http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/
>
>
>
> >  I see 3 ways:
>
> > 1. write functions to convert data from each format to a canonical
> > representation (2D arrays). And code functions only for this
> > representation. This has the advantage of not duplicating code but the
> > translation process may be consuming time and space if I have a large
> > amount of data.
>
> > 2. use method to specialize on each data format. In order to do this I
> > have to create a class "list-of-vectors" (for example) in order to be
> > able to specialize on it. This not very convenient especially when
> > working with the REPL because I have to use make-instance very often.
>
> > 3. use a method that specialize on a symbol ex:
> > (defmethod mean-columns ((data-format (eql :list-of-vectors))
> > data) ...
>
> > This may be a very newbie dilemma but, anyway, is there more clever
> > ways to deal with this issue ?
>
> Another way to deal with that would be to use an adaptor pattern.  And
> since you seem to say that you have more than three representations,
> indeed it would be nice to choose one intermediary representation, so
> you have to write only 2N adaptors instead of N²-N.
>
> Now, the nice thing is that once you've wrote the 2N adaptors, you can
> compose then and compile the N²-N optimized direct adaptors.
>
> But another consideration is that data structures should match the
> algorithms (and vice-versa).  If you have an algorithm that needs
> direct access to the slots of your sequences, it would be much better
> to use a vector than an adaptor over a list.  This would mean
> converting the data into the data structure adapted to each algorithm.
> If you hide well your representations as indicated in the Lecture-4b,
> you could convert from one representation to the other automatically.
>
> (let ((seq (make-instance 'smart-sequence)))
>   (insert item seq :at-end)
>   (insert item seq :at-end)
>   (insert item seq :at-end) ; oh oh! we're just appending items to the end
>                             ; perhaps I should switch to a reverse list representation
>   (insert item seq :at-end) ; Yay! O(1)
>   (insert item seq :at-end)
>   ... (assume 10000 more)
>   ;; assume a call to sort
>   (ref item 5000)              
>   (ref item 2500)           ; oh oh! looks like random access
>   (ref item 7500)           ; let's switch to a vector representation
>   (ref item 1250)           ; Yay! O(1)
>   (ref item 3750)
>   ... (assume 10000 more)
>   )
>

Thank you for the link and the tips.
I watched the lecture. Very interesting. For me, many of the concepts
that are taught are build-in with CLOS. Anyway the lecture was very
eloquent. It seems clear that, as Alberto suggests, that the full CLOS-
based approach is the best. I suppose the adapters you talk about
should be implemented as change-class methods. However the switch
between representations may be space and time consuming.

(defclass list-of-vectors ()
  ((data :accessor data :initarg :data)))

(defclass 2d-array ()
  ((data :accessor data :initarg :data)))


(defmethod sort/column ((a list-of-vectors) col pred)
  (setf (data a) (sort (data a) pred
                       :key #'(lambda (v) (svref v col)))))

(defmethod sort/column ((a 2d-array) col pred)
  ;; let's be lazy
  (setq a (change-class a 'list-of-vectors))
  (sort/column a col pred))

(defmethod update-instance-for-different-class :after ((previous 2d-
array) (current list-of-vectors)
                                                       &key &allow-
other-keys)
  (setf (data current)
        (loop
           for i below (array-dimension (data previous) 0)
           collect (coerce (loop
                              for j below (array-dimension (data
previous) 1)
                              collect (aref (data previous) i j))
                           'vector))))

(defun test ()
  (let ((m1 (make-instance 'list-of-vectors :data (list (vector 1 2 3)
(vector 4 5 6))))
        (m2 (make-instance '2d-array
                           :data (make-array '(2 3)
                                             :initial-contents '((1 2
3) (4 5 6))))))
    (format t "m1: ~A ~A~%" m1 (data m1))
    (format t "m2: ~A ~A~%" m2 (data m2))
    (sort/column m1 1 #'>)
    (sort/column m2 1 #'>)
    (format t "m1: ~A ~A~%" m1 (data m1))
    (format t "m2: ~A ~A~%" m2 (data m2))))

CL-USER> (test)
m1: #<LIST-OF-VECTORS {C0A4D31}> (#(1 2 3) #(4 5 6))
m2: #<2D-ARRAY {C0A4DB9}> #2A((1 2 3) (4 5 6))
m1: #<LIST-OF-VECTORS {C0A4D31}> (#(4 5 6) #(1 2 3))
m2: #<LIST-OF-VECTORS {C0A4DB9}> (#(4 5 6) #(1 2 3))




> --
> __Pascal Bourguignon__                    http://www.informatimago.com/
>
> "You cannot really appreciate Dilbert unless you read it in the
> original Klingon"

From: Pascal J. Bourguignon
Subject: Re: how to deal with various representations of matrices ?
Date: Tue, 24 Jun 2008 23:28:20 +0000
Message-ID: <87lk0ul8ff.fsf@hubble.informatimago.com>

Thibault Langlois <·················@gmail.com> writes:
> On Jun 23, 7:24�pm, ····@informatimago.com (Pascal J. Bourguignon)
> wrote:
> I watched the lecture. Very interesting. For me, many of the concepts
> that are taught are build-in with CLOS. 

Indeed.

> Anyway the lecture was very
> eloquent. It seems clear that, as Alberto suggests, that the full CLOS-
> based approach is the best. 

> I suppose the adapters you talk about
> should be implemented as change-class methods. 

Not exactly.  CHANGE-CLASS could be used for the version that switch
representation, as you've done.

What I meant by adaptor, is a set of routines that provide the same
API as the other representation, while accessing the given
representation.  In fact, it's the same as what is described in the
lecture, only that the interface layer is added after the fact, to
adapt an existing representation to an existing API.

Assuming you have a matrix class, with generic accessorssuch as
(defclass matrix (abstract-matrix) ...))
(row m i)    -> vector
(column m j) -> vector
(cell m i j) -> element
...

and you want to pass a matrix represented as a list of rows
(represented as lists too) to code using this matrix class, then you
could write an adaptor:

;; untested pseudo-code ;-)

(defclass list-matrix-adaptor (abstract-matrix)
   ((data :type list #| of list |# :initarg :data)))

(defmethod dimensions ((m list-matrix-adaptor))
  (list (length (slot-value m 'data))
        (length (nth 0 (slot-value m 'data)))))

(defmethod row ((m list-matrix-adaptor) i)
  (coerce (elt (slot-value m 'data) i) 'vector))

(defmethod (setf row) (value (m list-matrix-adaptor) i)
  (map-into (elt (slot-value m 'data) i) (function identity) value))

(defmethod column ((m list-matrix-adaptor) j)
  (coerce (mapcar (lambda (row) (elt row j))
                  (slot-value m 'data)) 'vector))

(defmethod (setf column) (value (m list-matrix-adaptor) j)
  (map nil (lambda (row value) (setf (elt row j) value))
           (slot-value m 'data) value)
  value)

(defmethod cell ((m list-matrix-adaptor) i j)
  (elt (elt (slot-value m 'data) i) j))

(defmethod (setf cell) (value (m list-matrix-adaptor) i j)
  (setf (elt (elt (slot-value m 'data) i) j) value))

so the methods defined on abstract-matrix that use these accessors can
now work on lists of lists, thru this adaptor..

(defmethod += ((a abstract-matrix) (b abstract-matrix))
   (assert (equal (dimensions a) (dimensions b)))
   (destructuring-bind (nrows ncolumns) (dimensions a)
      (loop
          :for i :from 0 :below nrows
          :do (loop
                 :for j :from 0 :below nrows
                 :do (incf (cell a i j) (cell b i j)))))
   a)

However, applied on a list-matrix-adaptor, += now has a complexity of
O(R�C�) instead of O(RC), because (cell list-matrix-adaptor i j) is
O(RC) instead of O(1).

Another limitation of this adaptor approach is that most CL functions
are not generic function.  We couldn't write an adaptor to map vectors
to lists, because we cannot define methods on cl:first and cl:rest for
vectors.  This is only practical with generic functions, and when
there is a superclass we can derive from, so we can pass adaptor
objects instead of the natural subclasses to the working code.

> However the switch
> between representations may be space and time consuming.

Indeed, but if it allows you to get O(N) instead of O(N�), it may be
worth the pain. 

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

There is no worse tyranny than to force a man to pay for what he does not
want merely because you think it would be good for him. -- Robert Heinlein