Of money and mouths

From: Joe Marshall
Subject: Of money and mouths
Date: Fri, 24 Oct 2003 14:27:40 +0000
Message-ID: <smliu2w3.fsf@ccs.neu.edu>

Alexander Schmolck suggested that I try my hand at converting a few
hundred lines of Python code into CL.  He emailed me a Python
interface to Matlab.

Matlab is a package produced by Mathworks.  It provides a number of
numeric analysis tools that are largely applicable in engineering.
Matlab manipulates matrices of floating-point numbers.  It is not a
symbolic math package like Macsyma or Mathematica.  While Matlab is a
commercial product, GNU Octave (www.octave.org) is a GPL product that
strives to be functionally compatible with Matlab.

Unfortunately, GNU Octave was designed on Unix.  There is no native
Windows port of GNU Octave, but there is a cygwin port.  However, the
FAQ notes that there are several `issues' and highly recommends using
the Linux version.  I'm not sure how to interface Lisp with a process
running under cygwin in the same address space.  I *could* run Octave
in its own address space and interact via pipes, but in my experience,
this is unreliable.

I admit that I have punted.  It was looking like a hairier task than I
have time for.  However, I have studied the code, and I'd like to
compare and contrast it with some Scheme code I have been working on
that does something somewhat analogous.

First, the Python.

Between Matlab and Python is a C++ shim layer that supports opening
and closing of Matlab sessions, marshaling of matrices between Matlab
and Python, and a mechanism for sending strings to the Matlab REPL.
It is fairly unremarkable.

The interesting part of the code is written in Python.  The MlabWrap
class does the bulk of the work, but there is a helper class
MlabObjectProxy that acts as a stand-in for Matlab objects that cannot
be marshaled.

Two techniques are primarily used to effect the interface.  Matlab is
driven by calling the Matlab shim EVAL function with a literal string.
Proxy objects are `integrated' into the Python environment by
implementing certain methods that are known to the Python execution
engine (e.g., apparently if one defines the `__getitem__' and
`__setitem__' methods in a class then instances of that class can be
used in an expression like `foo[3]'.  Similarly, if the `__getattr__'
and `__setattr__' methods are defined, the `foo.bar' syntax will use
them).
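
To make the second technique concrete, here is a minimal sketch of a
class that opts into both syntaxes.  The class name `Proxy' and its two
backing dictionaries are hypothetical and are not taken from the
MlabWrap code; only the four special method names come from Python
itself.

```python
class Proxy:
    """Hypothetical stand-in that keeps fields and items in plain dicts."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the backing stores.
        object.__setattr__(self, "_fields", {})
        object.__setattr__(self, "_items", {})

    def __getitem__(self, index):
        # Called for proxy[index].
        return self._items[index]

    def __setitem__(self, index, value):
        # Called for proxy[index] = value.
        self._items[index] = value

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Called for every proxy.name = value.
        self._fields[name] = value


foo = Proxy()
foo[3] = "three"   # routed through __setitem__
foo.bar = 42       # routed through __setattr__
```

Note the asymmetry: `__getattr__' is only a fallback, while
`__setattr__' intercepts every assignment, which is why the constructor
has to go around it.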

One troublesome issue is that parts of this interface are pure
kludge.  I'm not attributing this to Python, but rather to the very
loose coupling between Python and Matlab.  For instance, to determine
the type of a Matlab object, the following statements are executed:

    mlabraw.eval(self._session, "TMP_CLS__ = class(%s);" % varname)
    res_type = mlabraw.get(self._session, "TMP_CLS__")
    mlabraw.eval(self._session, "clear TMP_CLS__;")
    return res_type

A variable TMP_CLS__ is assigned the type, its value is fetched, and
then the variable is cleared.  Because the temporary lives in the
shared Matlab workspace, this package can obviously only be used in a
single-threaded mode.  I don't see a mechanism for keeping the Matlab
REPL in sync with the Python model.
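
The hazard is the fixed temporary name: two callers sharing one session
would overwrite each other's TMP_CLS__.  The sketch below imitates the
assign/fetch/clear sequence against a hypothetical in-memory
`FakeSession' (no Matlab involved) and serializes it with a lock.  The
lock is one conventional remedy, not something the package does.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a Matlab session: one shared workspace."""

    def __init__(self):
        self.workspace = {}


_session_lock = threading.Lock()


def class_of(session, varname):
    # Assign, fetch, clear.  The three steps must not interleave with
    # another caller that uses the same temporary name.
    with _session_lock:
        value = session.workspace[varname]
        session.workspace["TMP_CLS__"] = type(value).__name__
        result = session.workspace["TMP_CLS__"]
        del session.workspace["TMP_CLS__"]
    return result
```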

The proxy class models *all* Matlab objects that cannot be marshaled.
The reflection mechanism in Python is used to tailor the specific
instances of the proxy class to react differently from other
instances.  Let me illustrate:

    sct = mlab._do("struct('type',{'big','little'},'color','red','x',{3 4})")

This statement creates a proxy object and assigns it to SCT.  At this
point, the expression `sct[1].x' will evaluate to 4.  Apparently,
the object is searched like a lisp property list and then the
resulting value is extracted.  I don't understand why this isn't
`(sct.x)[1]', though.  Regardless, suppose that I now evaluate:

    bct = mlab._do("struct('type',{'BIG','little'},'color','red')")

The expression `bct[1].x' is an error because there is no field named
`x' in the object `bct'.

It appears to be the case, however, that Python would perceive both
bct and sct to be instances of the MlabObjectProxy class.
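
A small sketch of that situation, using a hypothetical `StructProxy'
class rather than the real MlabObjectProxy: the set of fields is
per-instance data, so two objects of the very same class answer
differently to `.x'.

```python
class StructProxy:
    """Hypothetical proxy: field names belong to the instance, not the class."""

    def __init__(self, **fields):
        # Store directly in __dict__ so normal lookup finds _fields.
        self.__dict__["_fields"] = dict(fields)

    def __getattr__(self, name):
        # Reached only when normal lookup fails; consult this
        # instance's own field table.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(f"no field named {name!r}")


sct = StructProxy(type="little", color="red", x=4)
bct = StructProxy(type="little", color="red")

same_class = type(sct) is type(bct)   # both are plain StructProxy
```

Here `sct.x' succeeds while `bct.x' raises AttributeError, yet the
class of the two objects is identical.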


Now the Scheme.

I needed a mechanism by which I could manipulate .NET objects from
MzScheme.  This is similar to the above project, but there are some
interesting differences:

    1) The .NET runtime engine can be embedded within another
       process and can communicate via COM.  Rather than converting
       everything to strings for interaction, primitive types such as
       integers and strings can be marshaled through the interop
       layer directly.  More complex types are represented directly
       as opaque `foreign objects' in Scheme.

    2) It is highly desirable to model and integrate the .NET type
       system within Scheme so that there exists a Scheme class
       hierarchy in one-to-one correspondence with the .NET class
       hierarchy *in addition* to the native Scheme class system.

    3) It is highly *undesirable* to *wed* the two type systems:  .NET
       has a Java-like model of single inheritance + interfaces +
       special non-typed `value' classes.

As part of its macro expansion phase, MzScheme implicitly wraps
variables with a `#%top' macro.  The default binding for `#%top' is
simply a no-op, but if you re-define it to a macro function, you can
intercept all the variable references in the code.  This allows us two
very useful tricks:

  1) Variables of the form ':foo' are macroexpanded to (quote :foo)
     thus providing us with keywords a la Common Lisp.

  2) Variables with embedded dots are transformed:

        .foo$      -->  (javadot-instance-field-getter 'foo)
        (setf (.foo$ x) y)  --> (javadot-instance-field-setter 'foo)
        .foo       --> (javadot-find-generic 'foo)
        foo.       --> (javadot-find-constructor 'foo)
        Foo.class  --> (javadot-find-class 'foo)
        Foo.bar    --> (javadot-find-static-method 'foo 'bar)
        Foo.bar$   --> (javadot-find-static-field 'foo 'bar)

     This allows us to manipulate .NET objects with a syntax that will
     be familiar to users of both .NET and Scheme:

     (setf (.DashStyle$ pen) System.Drawing.Drawing2D.DashStyle.Dot$)
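
The table above can be sketched as a small classifier.  This is written
in Python purely for illustration and is not the actual `#%top' macro.
It assumes that a qualified name is split at its last dot and that case
is preserved; neither is stated above, and the `setf' form is not
covered.

```python
def expand_symbol(name):
    """Return the expansion of a variable name as a tuple (a sketch)."""
    if name.startswith(":"):
        return ("quote", name)                      # keyword
    if "." not in name:
        return name                                 # ordinary variable
    if name.startswith("."):
        if name.endswith("$"):
            return ("javadot-instance-field-getter", name[1:-1])
        return ("javadot-find-generic", name[1:])
    if name.endswith("."):
        return ("javadot-find-constructor", name[:-1])
    # Assumption: everything before the last dot names the class.
    owner, member = name.rsplit(".", 1)
    if member == "class":
        return ("javadot-find-class", owner)
    if member.endswith("$"):
        return ("javadot-find-static-field", owner, member[:-1])
    return ("javadot-find-static-method", owner, member)
```

Under those assumptions, `System.Drawing.Drawing2D.DashStyle.Dot$'
classifies as a static field `Dot' of the class
`System.Drawing.Drawing2D.DashStyle'.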


In order to model the type system, I used Eli Barzilay's excellent
Swindle package.  This provides a CLOS-style object system with a
CLOS-style MOP (in addition to a number of clever utilities that make
the Scheme/Common Lisp transition easier:  extended lambda lists --
&optional &rest and &key args, setf, defsubst, non-hygienic macros).

I define a System.RuntimeType metaclass that will represent instances of
.NET type descriptors.  From this I derive those instances, but
because they are also derived from class Class, they are full-fledged
class objects as well.  The system bootstraps itself by finding the
.NET root metatype, then using the .NET reflection mechanism to
instantiate methods and generic functions specialized on the .NET type
model.  Whenever an object is returned from .NET its type is queried
and an instance of the appropriate wrapper is created.  If the type
has not been seen before, a metatype is instantiated and the new
methods brought in.
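
The `create the wrapper class the first time its type is seen' step can
be sketched in Python with the three-argument form of `type'.  This is
only an analogy to the Swindle implementation: the names
`ForeignObject', `wrapper_class_for' and `wrap' are hypothetical, and
the caller supplies the base type name that the real system would
obtain through .NET reflection.

```python
_wrapper_classes = {}   # foreign type name -> wrapper class


class ForeignObject:
    """Hypothetical root for wrappers around opaque foreign handles."""

    def __init__(self, handle):
        self.handle = handle


def wrapper_class_for(type_name, base_name=None):
    """Return the wrapper class for a foreign type, creating it on first use."""
    cls = _wrapper_classes.get(type_name)
    if cls is None:
        # Make sure the base wrapper exists before deriving from it.
        base = wrapper_class_for(base_name) if base_name else ForeignObject
        cls = type(type_name, (base,), {})
        _wrapper_classes[type_name] = cls
    return cls


def wrap(handle, type_name, base_name=None):
    """Wrap a foreign handle in an instance of the matching class."""
    return wrapper_class_for(type_name, base_name)(handle)


pen = wrap(object(), "System.Drawing.Pen", "System.Object")
```

Because the cache is consulted first, every later object of the same
foreign type gets the same wrapper class, and the wrapper hierarchy
mirrors the foreign one.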

Since .NET overloads functions in a way incompatible with CLOS, I
derived an overloadable generic function metaclass from the standard
generic function metaclass.
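
One source of the mismatch is that .NET permits overloads of a name that
differ in argument count, whereas the methods of a CLOS generic function
must have congruent lambda lists.  The following Python sketch shows the
general idea of an overloadable function that selects on argument count
and then on argument types.  The `OverloadedFunction' class is
hypothetical, uses simple first-match selection rather than
most-specific-first ordering, and makes no claim about how the Swindle
metaclass actually works.

```python
class OverloadedFunction:
    """Hypothetical dispatcher: picks a method by arity, then by types."""

    def __init__(self, name):
        self.name = name
        self._methods = []   # list of (parameter types, implementation)

    def register(self, *types):
        def decorator(func):
            self._methods.append((types, func))
            return func
        return decorator

    def __call__(self, *args):
        # First registered overload whose arity and types match wins.
        for types, func in self._methods:
            if len(types) == len(args) and all(
                isinstance(arg, t) for arg, t in zip(args, types)
            ):
                return func(*args)
        raise TypeError(f"no applicable overload of {self.name}")


write = OverloadedFunction("Write")


@write.register(str)
def _(text):
    return f"text:{text}"


@write.register(str, int)
def _(text, count):
    return f"text:{text} x{count}"
```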

The upshot of all this is that I can write code like this:

(let ((type-builder (System.Reflection.TypeBuilder.)))

  (.DefineType type-builder *dynamic-module* "NewType")

  ;; Create a method called Main and assemble code into it.
  (let* ((method-builder (.DefineMethod type-builder "Main"))
         (il-generator   (.GetILGenerator method-builder)))

      (.Emit il-generator "Nop")
      (.Emit il-generator "Nop")
      (.Emit il-generator "Ret")
      (.CreateType type-builder)
      (.SetEntryPoint *dynamic-assembly* *dynamic-module* "NewType" "Main")
      (.Save *dynamic-assembly* "mymodule.dll")))

This code uses the .NET reflection mechanism to build a .NET assembly
on the fly.

As you can see, .NET objects are fully integrated into the Scheme
system along with a syntax that bears resemblance to the `standard'
.NET syntax.

----

What I did with Scheme and .NET is considerably `weightier' and/or
`hairier' than the Python/Matlab interface, but on the other hand it
is attempting to accomplish more.  The goal was to merge two
completely different computation models to simulate as closely as
possible a Scheme hosted within the .NET framework.  The macro
system allows me to extend the standard Scheme syntax to accommodate
the popular `dotted' notation that is in vogue these days.

The Python/Matlab interface is certainly `simpler' and/or `easier to
grok', but its goal is to enable Python to invoke Matlab functions and
get the return values.  It automatically marshals vectors and scalars,
but it still keeps Matlab at arm's length.  Vectors and proxy objects
do not retain their `identity' across the interface (mutation to a
subarray of a proxied object only modifies the proxy's representation,
not the Matlab object it is representing).  For numeric calculation
this is likely not an issue.
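
The loss of identity is what one would expect from marshaling by value.
A minimal sketch, with a hypothetical dictionary standing in for the
Matlab workspace and a hypothetical `fetch' function standing in for the
marshaling step:

```python
import copy

# Hypothetical stand-in for the remote Matlab workspace.
matlab_side = {"m": [[1, 2], [3, 4]]}


def fetch(name):
    # Marshaling by value: the caller receives an independent copy.
    return copy.deepcopy(matlab_side[name])


local = fetch("m")
local[0][0] = 99   # changes only the local copy
```

After the assignment the local matrix holds 99 in its first cell while
the value held on the other side is still 1.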

The Python/Matlab interface makes no effort to import any syntax from
Matlab.  This isn't an issue because Matlab has no interesting syntax
to import.


The Swindle object system, based on CLOS, is a very mature and
sophisticated object model.  By using the MOP, I am able to reflect
and instantiate the entire .NET class hierarchy (including the
inheritance model and method dispatching) with four specialized
metaclasses (the rest is automatically bootstrapped).

The Python object system is message based.  Python provides access to
the method `dictionary' associated with an object, so it is possible
to modify the set of messages handled by an object at runtime.  This
makes it quite flexible.  The influence of Smalltalk can clearly be
seen in Python.
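
A short illustration of changing the set of messages at runtime.  The
`Greeter' class is made up for the example.  Note that a class's
`__dict__' is exposed as a read-only view, so the new entry is added by
ordinary attribute assignment on the class.

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()   # created before the class is modified


def goodbye(self):
    return "goodbye"


# Adding an entry to the class makes existing instances respond to it.
Greeter.goodbye = goodbye
```

After the assignment, `g.goodbye()' works even though `g' was created
before the method existed.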

I won't draw any conclusions at this point; I'll let the languages
speak for themselves.