Data structure abstraction

From: Jerome Baum
Subject: Data structure abstraction
Date: Sun, 28 Dec 2008 17:41:25 +0000
Message-ID: <u7i5k19kq.fsf@jeromebaum.com>

Hey all,


First of all, sorry for the cross-posting. I felt this was
appropriate in both groups. Followup-To is set to c.l.l in any case.

I really didn't know what to name this post. I am looking for a
little bit of advice on an abstraction I want to make.

Basically, different programming languages may or may not have the
concept of namespaces, some may call these modules (see
Erlang). Some have classes and objects, others only have objects,
others are not object-oriented at all. Again see Erlang for an
exotic choice -- objects ("records") are vectors.

What I am looking for is a way to generalize away from this. If a
language absolutely doesn't have something (e.g. objects) then of
course there isn't much of a way to make OO appear in that
language short of implementing it *ergh*, but otherwise I would
like to have a general way of addressing, for example,
functions, even taken across different languages.

For comparison, I am thinking something "similar" to CL pathnames
may be a nice solution, but then again I'm not really sure.
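
A minimal Python sketch of what such a pathname-like address could look
like. Everything here is an assumption made for illustration: the names
FunctionAddress and parse_address, the three components, and the
"language:a/b/name" string syntax are hypothetical and not taken from
any existing library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FunctionAddress:
    """Hypothetical pathname-like address of a function in some language."""
    language: str                # e.g. "erlang", "tcl", "java"
    containers: Tuple[str, ...]  # namespaces/modules/packages/classes, outermost first
    name: str                    # the function or method name

    def render(self) -> str:
        """Render as one string; the separator syntax is an arbitrary choice."""
        return f"{self.language}:{'/'.join(self.containers + (self.name,))}"


def parse_address(text: str) -> FunctionAddress:
    """Parse the string form produced by FunctionAddress.render()."""
    language, _, path = text.partition(":")
    if not language or not path:
        raise ValueError(f"not a function address: {text!r}")
    *containers, name = path.split("/")
    return FunctionAddress(language, tuple(containers), name)


addr = FunctionAddress("erlang", ("lists",), "reverse")
print(addr.render())
```

Treating namespaces, modules, packages and classes uniformly as
"containers" is one possible design choice; a real design would also
need an escaping rule for names that contain the separator.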

I thought of some components which would have to be considered --
I am sure there are many I forgot.

* Namespaces
* Modules
* Packages
* Classes
* Objects
* Arrays (Lists _and_ Hashes / Assoc. Arrays)
* Vectors
* Strings
* Characters
* Integers / Floats / Doubles / Reals
* Complex (might be some kind of vector?)

Of course many of these can be put in the same class, but I would
prefer some advice _before_ doing this the wrong way.

As to what I am trying to achieve with this, I am thinking of a
cross-language API. Something like python-on-lisp but generalized
to "any" programming languages.

Well then, that was another long post... -- I'd be thankful for
comments.


- Jerome


From: Stanisław Halik
Subject: Re: Data structure abstraction
Date: Sun, 28 Dec 2008 19:47:39 +0000
Message-ID: <gj8l4r$21j7$2@opal.icpnet.pl>

In comp.lang.lisp Jerome Baum <····@jeromebaum.com> wrote:

> Basically, different programming languages may or may not have the
> concept of namespaces [...] Again see Erlang for an exotic choice --
> objects ("records") are vectors.

Usually in CLOS the slot values are stored in a vector too :)

> [...] but otherwise I would like to have a general way of addressing,
> for example, functions, even taken across different languages.

You mean like FFI, RPC or like cross-language serialization? There's
already `cl-serializer' that might prove useful.

-- 
You only have power over people so long as you don’t take everything
away from them. But when you’ve robbed a man of everything he’s no longer
in your power — he’s free again. -- Aleksandr Isayevich Solzhenitsyn

From: Jerome Baum
Subject: Re: Data structure abstraction
Date: Sun, 28 Dec 2008 19:55:17 +0000
Message-ID: <uhc4oyt0a.fsf@jeromebaum.com>

Stanisław Halik <··············@tehran.lain.pl> writes:

>> [...] but otherwise I would like to have a general way of addressing,
>> for example, functions, even taken across different languages.
>
> You mean like FFI, RPC or like cross-language serialization?

More like FFI, not RPC or cross-language serialization, though of
course serialization falls into this domain.

So say I want to call a Tcl function from Lisp and then want to
use the output from that to call an Erlang or Java
function. There would have to be some kind of unified interface
to abstract away issues of not-exactly-the-same datatypes,
e.g. differences between records, CLOS/Java objects and Tcl
<things>.

Another consideration is that of lambdas. I would hope that it is
possible to transform lambdas from lisp relatively "easily" into
other languages supporting such constructs (whatever they may be
called in the target language).

In order for all of that to be transparent, there must be a very
solid abstraction as a base to the "API."
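
One way to picture such a base layer, as a rough Python sketch: native
values are converted into tagged, language-neutral values before they
cross a language boundary. The tag names and the function to_neutral
are hypothetical, and the set of types covered is only an assumption
about what a first version might handle.

```python
from typing import Any, Tuple

# A neutral value is a (tag, payload) pair; the tag names are hypothetical.
Tagged = Tuple[str, Any]


def to_neutral(value: Any) -> Tagged:
    """Convert a native Python value into a tagged, language-neutral form."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return ("boolean", value)
    if isinstance(value, int):
        return ("integer", value)
    if isinstance(value, float):
        return ("float", value)
    if isinstance(value, str):
        return ("string", value)
    if isinstance(value, (list, tuple)):
        return ("sequence", [to_neutral(item) for item in value])
    if isinstance(value, dict):
        return ("mapping", [(to_neutral(k), to_neutral(v)) for k, v in value.items()])
    raise TypeError(f"no neutral representation for {type(value).__name__}")


print(to_neutral({"a": [1, 2.5]}))
```

Each target language would then need its own converter from the tagged
form back into native values; that half is not shown here.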

Hope that clarifies things a bit.

- Jerome

From: Stanisław Halik
Subject: Re: Data structure abstraction
Date: Mon, 29 Dec 2008 13:45:55 +0000
Message-ID: <gjakaj$1jv3$1@opal.icpnet.pl>

thus spoke Jerome Baum <·····@jeromebaum.com>:

> Another consideration is that of lambdas. I would hope that it is
> possible to transform lambdas from lisp relatively "easily" into
> other languages supporting such constructs (whatever they may be
> called in the target language).

No.

Unless lambdas are to be written in a lowest common denominator form
with a sparse library. IMO not worth it.

Putting multiple languages in the same address space might prove
challenging. Not so much with Tcl, as it's practically designed for
embeddability, but linking with Erlang could be hard.


From: Jerome Baum
Subject: Re: Data structure abstraction
Date: Mon, 29 Dec 2008 13:55:18 +0000
Message-ID: <uwsdjhyrd.fsf@jeromebaum.com>

Stanisław Halik <··············@tehran.lain.pl> writes:

> Putting multiple languages in the same address space might prove
> challenging. Not so much with Tcl, as it's practically designed for
> embeddability, but linking with Erlang could be hard.

I think Erlang will actually be lowest on the list of goals, but
I did want to show that such languages must also be considered in
the design of the API -- it is an API and should be designed with
extreme care! But yes, totally agree with you there.

From: Chris Barts
Subject: Re: Data structure abstraction
Date: Mon, 29 Dec 2008 03:13:26 +0000
Message-ID: <878wpzn06h.fsf@chbarts.motzarella.org>

Jerome Baum <····@jeromebaum.com> writes:

> What I am looking for is a way to generalize away from this. If a
> language absolutely doesn't have something (e.g. objects) then of
> course there isn't much of a way to make OO appear in that
> language short of implementing it *ergh*, but otherwise I would
> like to have a general way of addressing, for example,
> functions, even taken across different languages.

There is another direction to this as well: Some low- or middle-level
languages have concepts that don't map to Lisp or Erlang or
similar. Pointers are the most obvious. I'm sure someone else can come
up with more. (In addition, antique dialects of FORTRAN have functions
with multiple entry points and the heartbreak of the COMMON
block. Trying to implement either of those in Lisp is far beyond my
tolerance for insanity. The ALTER verb from COBOL is similarly crazy.)

>
> For comparison, I am thinking something "similar" to CL pathnames
> may be a nice solution, but then again I'm not really sure.

The full complexity of CL pathnames has outlived most of its
usefulness in terms of real-world OSes, so they might be a bad
specific example. Couch this in terms of sharing code across languages
on the same VM and you're golden. ;)

>
> I thought of some components which would have to be considered --
> I am sure there are many I forgot.
>
> * Namespaces
> * Modules
> * Packages

How are these all not the same thing?

> * Classes
> * Objects

Saying 'classes' and 'objects' is easy, but how do you map between
Java's OO and CLOS? It might be better to decompose OO into its
constituent concepts and find a way to map each of them. This link
might help you do that: <http://www.paulgraham.com/reesoo.html>

> * Arrays (Lists _and_ Hashes / Assoc. Arrays)
> * Vectors

You conflate lists, hashes, and arrays, but you break vectors out into
their own space. This doesn't make much sense: Lists and hashes are
very different from arrays, as lists and hashes optimize for insertion
but arrays optimize for indexing. In addition, hashes optimize for a
complex form of indexing neither lists nor arrays support. I'm also
not sure why vectors deserve their own identity here distinct from
arrays.

> * Strings
> * Characters

You must distinguish characters from octets, which means you must care
about encoding schemes (Latin-1, UTF-8, UCS-2LE, etc.) and things like
collation and (in Unicode-related schemes) normalization and combining
forms.

> * Integers / Floats / Doubles / Reals
> * Complex (might be some kind of vector?)

How about quaternions and octonions? What about overflow vs. bignums?
What about implementing a numeric tower? Is a bitfield conceptually an
integer with a fixed size or a vector type with bitwise arithmetic
defined on it? Every language seems to go about this somewhat
differently, and a lot of high-level languages ignore bitfields
completely, making implementing encryption software more difficult
than it needs to be.

> As to what I am trying to achieve with this, I am thinking of a
> cross-language API. Something like python-on-lisp but generalized
> to "any" programming languages.

Just ignore C and C++. "Any" programming language should be restricted
to "any programming language normally implemented on a GC'd
runtime". That is probably my biggest piece of advice.

From: Jerome Baum
Subject: Re: Data structure abstraction
Date: Mon, 29 Dec 2008 13:52:26 +0000
Message-ID: <u1vvrjdgl.fsf@jeromebaum.com>

Chris Barts <··············@gmail.com> writes:

> Jerome Baum <····@jeromebaum.com> writes:
>> * Namespaces
>> * Modules
>> * Packages
>
> How are these all not the same thing?

I said somewhere below that I didn't want to put things together
before somebody else just takes a look at the list in general
(but see below on arrays).

>> * Classes
>> * Objects
>
> Saying 'classes' and 'objects' is easy, but how do you map between
> Java's OO and CLOS? It might be better to decompose OO into its
> constituent concepts and find a way to map each of them. This link
> might help you do that: <http://www.paulgraham.com/reesoo.html>

Very good point!

>> * Arrays (Lists _and_ Hashes / Assoc. Arrays)
>> * Vectors
>
> You conflate lists, hashes, and arrays, but you break vectors out into
> their own space.

That was not my intention; rather, I wanted to make the
distinction. (I was even thinking of putting an _xor_, but then
there would have been more confusion; in hindsight I see that
even _xunion_ wouldn't have helped.) I simply should have said:

* Lists
* Hashes / Assoc. Arrays

Got too lazy with my typing there.

As for the vectors, there is a distinction made (in other
languages) with regards to dynamic length, see Erlang (though
they call these tuples). But then I think that is also most
common anyway, so one would need to add another type called
"dynamic array" or so.

>> * Strings
>> * Characters
>
> You must distinguish characters from octets, which means you must care
> about encoding schemes (Latin-1, UTF-8, UCS-2LE, etc.) and things like
> collation and (in Unicode-related schemes) normalization and combining
> forms.

Again, very good point!

>> * Integers / Floats / Doubles / Reals
>> * Complex (might be some kind of vector?)
>
> How about quaternions and octonions? What about overflow vs. bignums?
> What about implementing a numeric tower? Is a bitfield conceptually an
> integer with a fixed size or a vector type with bitwise arithmetic
> defined on it? Every language seems to go about this somewhat
> differently, and a lot of high-level languages ignore bitfields
> completely, making implementing encryption software more difficult
> than it needs to be.

I was actually hoping to abstract away a lot of this
madness. Isn't that the purpose of an abstraction, after all?
Though I totally agree that bitfields should be in there.

I guess we would have to just pick one with regards to overflow
vs. bignums -- I would go for bignums as they seem more natural.

>> As to what I am trying to achieve with this, I am thinking of a
>> cross-language API. Something like python-on-lisp but generalized
>> to "any" programming languages.
>
> Just ignore C and C++. "Any" programming language should be restricted
> to "any programming language normally implemented on a GC'd
> runtime". That is probably my biggest piece of advice.

I'll keep this in mind (though I might end up implementing some
kind of subset of the functionality for C/C++ interop, but that
is of course last on the list).

Well then, thanks for your input. You showed me some difficult
issues in this -- and that was exactly the reason I went here
first instead of just coding away.

As for grouping things, would you for example say we can just
assume float and double to be the same (simply a decimal and then
have the docs say that lowest precision should be assumed)? Or
should we go for an integer tuple (which is inefficient)? Then
there is always the possibility of mimicking IEEE to some extent
(integer plus integer exponent, which as it turns out is also an
integer tuple).
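
As a rough illustration of the "integer plus integer exponent" idea, a
Python float can be split exactly into such a pair and rebuilt from it.
This is only a sketch under assumptions: the function names
float_to_pair and pair_to_float are made up, a base-2 exponent is
assumed, and the floats are assumed to be IEEE 754 doubles with 53
significant bits (true for CPython on common platforms).

```python
import math
from typing import Tuple


def float_to_pair(x: float) -> Tuple[int, int]:
    """Split a finite float into (mantissa, exponent), x == mantissa * 2**exponent."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("only finite floats can be represented this way")
    # frexp gives x == fraction * 2**exponent with 0.5 <= |fraction| < 1 (or 0).
    fraction, exponent = math.frexp(x)
    # Scaling by 2**53 is exact because a double carries 53 significant bits.
    mantissa = int(fraction * (1 << 53))
    return mantissa, exponent - 53


def pair_to_float(mantissa: int, exponent: int) -> float:
    """Rebuild the float from the integer pair."""
    return math.ldexp(mantissa, exponent)


pair = float_to_pair(0.1)
print(pair, pair_to_float(*pair) == 0.1)
```

The pair only has to be produced and consumed at the language boundary;
no arithmetic needs to be performed on it.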

Best wishes anyhow,

- Jerome

From: Chris Barts
Subject: Re: Data structure abstraction
Date: Tue, 30 Dec 2008 18:01:31 +0000
Message-ID: <871vvpsft0.fsf@chbarts.motzarella.org>

Jerome Baum <·····@jeromebaum.com> writes:

> Chris Barts <··············@gmail.com> writes:
>
>> Jerome Baum <····@jeromebaum.com> writes:
>>> * Namespaces
>>> * Modules
>>> * Packages
>>
>> How are these all not the same thing?
>
> I said somewhere below that I didn't want to put things together
> before somebody else just takes a look at the list in general
> (but see below on arrays).

OK, that makes sense. And looking at it the next day, I think modules
and packages both have some notion of inheritance whereas pure
namespaces don't tend to.

>
>>> * Classes
>>> * Objects
>>
>> Saying 'classes' and 'objects' is easy, but how do you map between
>> Java's OO and CLOS? It might be better to decompose OO into its
>> constituent concepts and find a way to map each of them. This link
>> might help you do that: <http://www.paulgraham.com/reesoo.html>
>
> Very good point!

I'm just sick of people talking about OO like it's one concept. It
isn't and it never has been, and the set of concepts it implicitly
bundles together has changed radically over the 40+ years the notion
has been in existence. (The concept is that old (Simula-67), but not
the name (which was invented in the 1980s (?) by someone associated
with Smalltalk.))

>
>>> * Arrays (Lists _and_ Hashes / Assoc. Arrays)
>>> * Vectors
>>
>> You conflate lists, hashes, and arrays, but you break vectors out into
>> their own space.
>
> That was not my intention, rather I wanted to make the
> distinction (I was even thinking of putting an _xor_ but then
> there would have been more confusion, in hindsight I see that
> even _xunion_ wouldn't have helped, I simply should have said:
>
> * Lists
> * Hashes / Assoc. Arrays
>
> Got too lazy with my typing there.

OK, that makes a lot of sense.

>
> As for the vectors, there is a distinction made (in other
> languages) with regards to dynamic length, see Erlang (though
> they call these tuples). But then I think that is also most
> common anyway, so one would need to add another type called
> "dynamic array" or so.

Sure. I wasn't aware of that implication.

>
>>> * Strings
>>> * Characters
>>
>> You must distinguish characters from octets, which means you must care
>> about encoding schemes (Latin-1, UTF-8, UCS-2LE, etc.) and things like
>> collation and (in Unicode-related schemes) normalization and combining
>> forms.
>
> Again, very good point!

I suppose most of what I was talking about above can be done in
libraries. The main thing the type ontology has to do is distinguish
characters from octets because a single character can be an arbitrary
number of octets long (especially in UTF-16 with codepoints outside of
the BMP (if you didn't understand that, you should look at a more
specialized Unicode reference)).
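
A short Python sketch of the character/octet distinction, using only
the standard library. The helper name octet_lengths is made up for this
example; the two sample characters are arbitrary choices, one inside
the BMP and one outside it.

```python
import unicodedata
from typing import Dict, Iterable


def octet_lengths(text: str, encodings: Iterable[str]) -> Dict[str, int]:
    """Return how many octets `text` occupies in each named encoding."""
    return {name: len(text.encode(name)) for name in encodings}


e_acute = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, inside the BMP
g_clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

# One character each, but the octet count depends on the encoding.
print(len(e_acute), octet_lengths(e_acute, ["latin-1", "utf-8", "utf-16-le"]))
print(len(g_clef), octet_lengths(g_clef, ["utf-8", "utf-16-le"]))

# Normalization: the same visible character as one or two code points.
decomposed = unicodedata.normalize("NFD", e_acute)
print(len(decomposed), unicodedata.normalize("NFC", decomposed) == e_acute)
```

The G clef takes four octets in UTF-16 because it is encoded as a
surrogate pair, which is the case mentioned above.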

>
>>> * Integers / Floats / Doubles / Reals
>>> * Complex (might be some kind of vector?)
>>
>> How about quaternions and octonions? What about overflow vs. bignums?
>> What about implementing a numeric tower? Is a bitfield conceptually an
>> integer with a fixed size or a vector type with bitwise arithmetic
>> defined on it? Every language seems to go about this somewhat
>> differently, and a lot of high-level languages ignore bitfields
>> completely, making implementing encryption software more difficult
>> than it needs to be.
>
> I was actually hoping to abstract away a lot of this
> madness. Isn't that the purpose of an abstraction, after all?

That's the purpose of using an abstraction, not creating one. Creators
get to have /fun/. ;)

> Though I totally agree that bitfields should be in there.

Good. Everyone attempting to implement DES using this library will
thank you.

> I guess we would have to just pick one with regards to overflow
> vs. bignums -- I would go for bignums as they seem more natural.

That's good for 99.9% of the users, but the rest will want some way to
specify fixed-width integral types with defined overflow
behavior. (Checksum algorithms really need this, for example.)
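
To illustrate why defined overflow matters, here is a Python sketch
that emulates 32-bit wraparound on top of bignums. It uses the 32-bit
FNV-1a hash as a stand-in for a checksum, since that algorithm relies
on multiplication modulo 2**32; the helper names wrap_u32 and wrap_s32
are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def wrap_u32(n: int) -> int:
    """Reduce an arbitrary-precision integer to an unsigned 32-bit value."""
    return n & MASK32


def wrap_s32(n: int) -> int:
    """Reduce an arbitrary-precision integer to a signed 32-bit value."""
    n &= MASK32
    return n - (1 << 32) if n & (1 << 31) else n


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash; each multiplication must wrap at 32 bits."""
    value = 0x811C9DC5  # FNV offset basis
    for byte in data:
        value ^= byte
        value = wrap_u32(value * 0x01000193)  # FNV prime, result masked to 32 bits
    return value


print(hex(fnv1a_32(b"a")))
```

Without the explicit wrap_u32 call the intermediate value would simply
keep growing as a bignum, which is exactly the behavior a fixed-width
type is meant to rule out.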

> As for grouping things, would you for example say we can just
> assume float and double to be the same (simply a decimal and then
> have the docs say that lowest precision should be assumed)?

This should make sense for most uses. Anyone doing heavy numerical
analysis work will likely be using the FFI to access code written in
Fortran or assembly language.

> Or should we go for an integer tuple (which is inefficient)? Then
> there is always the possibility of mimicking IEEE to some extent
> (integer plus integer exponent, which as it turns out is also an
> integer tuple).

I think this is too picky, and likely to be slow enough people will
notice and complain. IBM-PC users in the 1980s learned that doing
floating point math in software is a method of last resort.

From: Jerome Baum
Subject: Re: Data structure abstraction
Date: Wed, 31 Dec 2008 15:19:55 +0000
Message-ID: <ud4f8l6ck.fsf@jeromebaum.com>

Chris Barts <··············@gmail.com> writes:

> OK, that makes sense. And looking at it the next day, I think modules
> and packages both have some notion of inheritance whereas pure
> namespaces don't tend to.

However there is also a need to unify these in some way (you are
totally right that these *should be* the same thing). After all,
users of different languages shouldn't need to care if another
language uses another wording for its modules or whatever. This
definitely must be abstracted away!

> I'm just sick of people talking about OO like it's one concept. It
> isn't and it never has been, and the set of concepts it implicitly
> bundles together has changed radically over the 40+ years the notion
> has been in existence. (The concept is that old (Simula-67), but not
> the name (which was invented in the 1980s (?) by someone associated
> with Smalltalk.))

Do you have a suggestion as to solving this problem? (See below
for a longer comment.)

>> I guess we would have to just pick one with regards to overflow
>> vs. bignums -- I would go for bignums as they seem more natural.
>
> That's good for 99.9% of the users, but the rest will want some way to
> specify fixed-width integral types with defined overflow
> behavior. (Checksum algorithms really need this, for example.)

I think bignums would be the way to go anyway, for one very good
reason -- an actual checksum algorithm shouldn't really be
calling other languages' routines, so it wouldn't care about
overflow or not. After all, this is meant to be a form of
communicating between languages -- if overflow is required
anywhere in this process, one or two lines of code should
suffice, or even just a conversion to a native overflowing
datatype.

>> Or should we go for an integer tuple (which is inefficient)? Then
>> there is always the possibility of mimicking IEEE to some extent
>> (integer plus integer exponent, which as it turns out is also an
>> integer tuple).
>
> I think this is too picky, and likely to be slow enough people will
> notice and complain. IBM-PC users in the 1980s learned that doing
> floating point math in software is a method of last resort.

Again, we don't want to actually be doing math in this software
-- just converting numbers into formats for different languages.

As for the OO above, that really seems a difficult one. After
all, there are so many different view-points and they must
somehow all be integrated into a single unified method of
communication. I really cannot think of a good way to do this.

Anyhow, a happy new year to everybody!

From: Chris Barts
Subject: Re: Data structure abstraction
Date: Thu, 01 Jan 2009 11:58:01 +0000
Message-ID: <87vdszfdbq.fsf@chbarts.motzarella.org>

Jerome Baum <·····@jeromebaum.com> writes:

> Chris Barts <··············@gmail.com> writes:
>
>> OK, that makes sense. And looking at it the next day, I think modules
>> and packages both have some notion of inheritance whereas pure
>> namespaces don't tend to.
>
> However there is also a need to unify these in some way (you are
> totally right that these *should be* the same thing). After all,
> users of different languages shouldn't need to care if another
> language uses another wording for its modules or whatever. This
> definitely must be abstracted away!

I suppose I slightly misconstrued your goals. By all means, your way
makes sense.

>
>> I'm just sick of people talking about OO like it's one concept. It
>> isn't and it never has been, and the set of concepts it implicitly
>> bundles together has changed radically over the 40+ years the notion
>> has been in existence. (The concept is that old (Simula-67), but not
>> the name (which was invented in the 1980s (?) by someone associated
>> with Smalltalk.))
>
> Do you have a suggestion as to solving this problem? (See below
> for a longer comment.)

That problem is a social one, and the best way around it is to promote
languages with a more livable view of OO than C++ and Java take. The
best way is to promote Common Lisp, which rather explicitly gives
programmers the whole smorgasbord and does not insist in whiny,
petulant tones that they *must* use *all* of the options laid out
every single time. ;)

>
>>> I guess we would have to just pick one with regards to overflow
>>> vs. bignums -- I would go for bignums as they seem more natural.
>>
>> That's good for 99.9% of the users, but the rest will want some way to
>> specify fixed-width integral types with defined overflow
>> behavior. (Checksum algorithms really need this, for example.)
>
> I think bignums would be the way to go anyway, for one very good
> reason -- an actual checksum algorithm shouldn't really be
> calling other languages' routines, so it wouldn't care about
> overflow or not. After all, this is meant to be a form of
> communicating between languages -- if overflow is required
> anywhere in this process, one or two lines of code should
> suffice, or even just a conversion to a native overflowing
> datatype.

Right. I did misconstrue the scope of what you wanted to do. This goes
for the comment below, as well, so I'll snip it.

> As for the OO above, that really seems a difficult one. After
> all, there are so many different view-points and they must
> somehow all be integrated into a single unified method of
> communication. I really cannot think of a good way to do this.

I think the best way for you to live with this is to duck the
problem. Make the system as low-level as possible wrt OO: Implement
things for simple ideas and let the users construct what they need at
the time. /Them/ going from Javascript to Java for /their/ project is
hard enough; it would be impossible for you to solve the problem for
/everyone/.

From: Jerome Baum
Subject: Re: Data structure abstraction
Date: Thu, 01 Jan 2009 16:25:18 +0000
Message-ID: <uhc4jkn81.fsf@jeromebaum.com>

Chris Barts <··············@gmail.com> writes:

> I think the best way for you to live with this is to duck the
> problem. Make the system as low-level as possible wrt OO: Implement
> things for simple ideas and let the users construct what they need at
> the time. /Them/ going from Javascript to Java for /their/ project is
> hard enough; it would be impossible for you to solve the problem for
> /everyone/.

Now that's insightful. You get a +1 in my score-file.

Happy new year and thanks again!


- Jerome