Unified data access again

From: Max Ischenko
Subject: Unified data access again
Date: Tue, 05 Mar 2002 07:16:35 +0000
Message-ID: <jcr16a.9r2.ln@cvs>

AFAIU each kind of data type has it's own accessor function like
GETHASH, AREF, ELT. Is there a unifying one, like SETF, that hides impl.
details?

A quote from Successfull Lisp tutorial:

: If you need to pick out one element (or a range of elements) from a
: sequence, you can use ELT (to pick out one element) or SUBSEQ (to pick
: out a range of elements). But don't use these unless you're really
: sure you can't narrow down the sequence type to a vector or list;
: there are more specific (hence more efficient) accessors for the less
: general types.

Isn't it bad that you have to know specific kind of sequence to be
able to efficiently operate upon it?

Can I (should I) simulate e.g. Pythonic style
a[index] = value
where A can be a string, a list, a hash or any user-defined type that provides
__getitem__ protocol?

Re: Unified data access again Kent M Pitman
- Re: Unified data access again Max Ischenko

From: Kent M Pitman
Subject: Re: Unified data access again
Date: Tue, 05 Mar 2002 08:31:58 +0000
Message-ID: <sfwvgcbct4x.fsf@shell01.TheWorld.com>

Max Ischenko <···@malva.com.ua> writes:

> AFAIU each kind of data type has it's own accessor function like
> GETHASH, AREF, ELT. Is there a unifying one, like SETF, that hides impl.
> details?

It would be hard to hide the fact that AREF takes multiple dimensions
while GETHASH and ELT do not.

ELT is not like the others you listed because it already hides the 
difference between NTH and 1-argument AREF.  But the fact is that ELT,
already being present, should show you why it's a bad idea to have it.
A typical application with ELT might be:

 (defun list-elements (sequence)
   (loop for i below (length sequence)
         collect (elt sequence i)))

But the fact is that there this is almost never any good use and this is
one of the bad ones.  It's O(n^2) where n is the length of the sequence
because you are hiding the important fact that lists are only linearly
accessible by using an abstract accessor that purports to be able to do
indexed access when it really cannot (in any algorithmically efficient
sense) and has to simulate it.

> A quote from Successfull Lisp tutorial:
> 
> : If you need to pick out one element (or a range of elements) from a
> : sequence, you can use ELT (to pick out one element) or SUBSEQ (to pick
> : out a range of elements). But don't use these unless you're really
> : sure you can't narrow down the sequence type to a vector or list;
> : there are more specific (hence more efficient) accessors for the less
> : general types.
> 
> Isn't it bad that you have to know specific kind of sequence to be
> able to efficiently operate upon it?

No.  It's necessary if you're going to work at the level of "mere access".

In the case of MAP, it is possible to abstract this over lists and vectors,
and MAP does that.  It's arguably a weakness that n-dimensional vectors can't
be mapped over using the same operation.

But it's not clear what it would mean to map over a hash table.  I guess you
it would be sort of like
  (defmethod my-map (fn (table hash-table))
    (maphash #'(lambda (key val) 
                 (declare (ignore key))
                 (funcall fn val))
             table))

Btw, nothing keeps you from either making your own functions (like MY-MAP)
or your own package such that you can shadow existing functions (like
having a MY-WORKSPACE:MAP that is different than CL:MAP), and making them
generic in just this way.

> Can I (should I) simulate e.g. Pythonic style
> a[index] = value
> where A can be a string, a list, a hash or any user-defined type that provides
> __getitem__ protocol?

As above, I mostly don't think abstracting at this level is wise.  Though it
is certainly possible. I don't mind a mapping protocol.

I also personally don't like wiring builtins to user-defined functionality 
of such a general nature.  This too easily leaves open the door to a[x] 
meaning something that is not O(1), when I think the eye is cued by the
primitive-looking syntax to think O(1).  I prefer to leave a[x] attached to
either an unchangeable set of meanings or a set of meanings that is 
extensible only under much more rigid circumstances than you're describing.

This starts to get to the difference between the way I use the terms
"overloading" and "genericity".  I think overloading means to attach 
random quasi-syntactically-related meanings to the same syntax, while
genericity attaches a much higher burden to actually relate the meanings.
"foo"+"bar"=>"foobar" is overloading of "+".   Whereas, making + work
for both floats and integers is not overloading.

Then again, there are issues that are a gray area.  Consider that LENGTH
could very easily run into problems being recursively defind (for lists)
and directly defined (for vectors).  The case of (LENGTH '(A . "foo"))
is the interesting case in question for people who are fans of a generic
"length" operation; I personally don't like LENGTH being generic because
it's easy to accidentally construct cases where people want things like
(LENGTH '(A . "foo")) => 4, whereas I want it to signal an error.  Only an
extremely careful definition of LENGTH will avoid this kind of confusion
creeping in, and I don't think we are very practiced at describing the 
true requirements of a generic such that adding to it is very reliable.

We do a bad job even with PRINC, which does one kind of action on things like
pathnames and error objects, and a very different kind of action on 
symbols.  In that case, it's PRINT-OBJECT that is the generic, but the
issues are the same, and we defined PRINT-OBJECT fairly badly.  Most uses of
PRINC are not recursive, so mostly one gets by.  But if you do:
 (progn
   (handler-case this-is-unbound
     (error (c) (princ (list c c))))
   t)
You'll see the kind of mess that results from not having effectively tolerated
overloading instead of having encouraged genericity in the definition of
PRINT-OBJECT.

From: Max Ischenko
Subject: Re: Unified data access again
Date: Tue, 05 Mar 2002 10:38:24 +0000
Message-ID: <07726a.kj3.ln@cvs>

 Kent M Pitman wrote:

> It would be hard to hide the fact that AREF takes multiple dimensions
> while GETHASH and ELT do not.

[  skip  ]

Thank you very much for your comments!


> As above, I mostly don't think abstracting at this level is wise.
> Though it is certainly possible. I don't mind a mapping protocol.

> I also personally don't like wiring builtins to user-defined
> functionality of such a general nature. This too easily leaves open
> the door to a[x] meaning something that is not O(1), when I think the
> eye is cued by the primitive-looking syntax to think O(1). I prefer
> to leave a[x] attached to either an unchangeable set of meanings
> or a set of meanings that is extensible only under much more rigid
> circumstances than you're describing.

[ skip ]

> You'll see the kind of mess that results from not having effectively
> tolerated overloading instead of having encouraged genericity in the
> definition of PRINT-OBJECT.

Not sure if I agree with this, but wait, I'm still a Lisp newbie!