From: Paul Wilson
Subject: Re: Object IDs are good ( was: Object IDs are bad )
Date: 
Message-ID: <5k6656$2p9@roar.cs.utexas.edu>
In article <··············@mocha.CS.Princeton.EDU>,
Matthias Blume <·····@mocha.cs.princeton.edu> wrote:
>In article <··········@roar.cs.utexas.edu> ······@cs.utexas.edu (Paul Wilson) writes:
>
>   Hmmm... I dunno about electrons, and I'm not sure it's relevant,
>   but I'd say that for most purposes 1 *does* have an identity.
>
>   [ fun stuff about subatomic particles omitted ]
>
>Anyway, as irrelevant as it may be, we should remember not to make the
>mistake of sticking too closely to "intuition" when making models.
>
>   It's 1, you know, the number 1.  The reason why you can't copy the number
>   1 and distinguish between the copies is that it makes no sense
>   mathematically.  But that doesn't mean there isn't a single,
>   distinguished conceptual object 1.
>
>In the sense that it is different from 2, yes.  But 1 is always just 1
>-- conceptually. Isn't that exactly what I said?  

Mostly---I was mostly agreeing with you on this.

>terms, however, we can "distinguish" between "different" copies of 1:
>The 1 in register r5 as opposed to the 1 at memory location 0x338A0,
>and so forth.  Still, we tend to think of 1 as just 1 -- we think of 1
>is a _value_, not an _object_.

That's not the way I view it.  I think that in a language like Scheme,
the most useful conceptualization is that every value is just a pointer
to an object, i.e., is an identifier of an object.  Values are not
objects, but are always pointers to objects.  The objects may be
mutable (like pairs and vectors) or immutable, like numbers and
symbols.  Still, you can uniformly view them as objects.  For
some objects, it may be interesting to compare object identities,
and for some it may not.  Often it's context-dependent---it depends
on what invariants your program maintains, and what those invariants
mean in terms of the application's semantics.

>   >   - Things that _actually_ look the same in each and every
>   >     respect_are_ the same.
>
>   No.  This is bad philosopy.  Things that are indistinguishable
>   given a certain subjective point of view may become distinguishable
>   when you learn more about them.
>
>Lennart Augustsson already gave a perfectly fine answer to this.

I didn't think so.  It's possible I'm missing the point, but what
I take to be the point seems wrong.  A couple of other people
followed up with examples that appear to undermine Lennart's answer.

(The basic idea is that two objects my be distinguished by properties
that aren't detectable from looking at the objects---they my not
differ in what they hold pointers to, but in what holds pointers
to them.  And this can be very useful in indexing structures.)

The point here is that even if things are objectively distingishable
on the basis of external data, in clean modular code you often
don't have that external data where it's needed.

>   I think this is simplistic.  I may be able to distinguish between
>   two identical twins based on their social security numbers, and
>   that may be my only means of knowing they're two separate people,
>   at some point in time.
>
>But the two of them probably remember their SSNs, so they are actually
>different (because their memories are different).

Sure, but I don't think that's relevant.  (BTW, some people actually don't
have social security numbers.  *Lots* of people if you include non-U.S.
citizens. :-) )

The real issue is whether it's a hassle to maintain object identity
when you want it.  Sometimes you do want it, which is why even
in pure, lazy FP people do in fact sometimes tag things with unique
identifiers, essentially faking pointers.

And that's why the US does have social security numbers---to uniquely
identify people for certain purposes WITHOUT having to examine a whole
lot of information about them.  (It could be a lot---the set of
property equalities sufficient to reliably distinguish people may
not be tiny.  You can imagine two Jim Smiths ending up roommates,
and even having been born in the same city and having fathers
named Jim Smith.  You don't want to ask them a series of 
otherwise-irrelevant questions just to determine that they are
or are not the same person.  

SSN's are a cobbled up kind of pointer, which was cobbled up precisely 
because pointers are convenient for many tasks.  To a first approximation,
each American human has a unique identifier, just so that different
applications don't have to come up with new and different tagging
schemes.  (Of course, SSN's raise serious privacy questions, especially
when they start being used by everybody instead of just the Social
Security Administration.)

>if they don't
>remember it, then they must have it tattooed somewhere -- otherwise
>how would one know which of the SSNs goes with which person.)

Doesn't matter.  I don't want to have to pull my objects' clothes off
and strip-search and mind-read them.  That's why either tag
objects with unique ID's or let the language do it for us, depending
on the language.

Letting the language do it for us does make it easier---we don't have to
ensure that we tattoo our objects and modify our code to check tattoos.
We just pick between different equality predicates, or just code most
of our stuff in ways that don't need equality predicates at all.
It's nice if they're there when you need them.

>   >   - From a denotational point of view object identity doesn't matter
>   >     for immutable things.
>
>   It depends on the semantics you need.  For example, if I'm representing
>   two individuals who are parents of a third individual, and I don't
>   really know anything else about them, I do not want to make the mistake
>   of thinking that they're the same person---one may be the father,
>   and the other the mother, so that they're entirely different
>   individuals.
>
>Well, they are like fermions then -- swapping them around changes a
>sign somewhere -- otherwise the situation is the same. :)

Right, but in practical terms it means that you need pointers, either
language-provided or hand-cobbled.  You don't want to have to actually
swap things around and observe the side effects, if a pointer comparison
can tell you whether swapping them is a no-op.

>   >   - Exactly _because_ there is no notion of identity, and because
>   >     things that look the same _are_ the same, it is possible for the
>   >     compiler to make copies of things (or represent them differently
>   >     at different times during execution).  This is of great value for
>   >     optimizing compilers.
>
>   I think this has much more to do with immutability than with object
>   identity (or the lack of it).
>
>Mutability immediately gives you object identity. But even without
>mutability, if you are required (by the language) to maintain object
>identity, then you can't make copies (unless you can prove that the
>program in question never asks about object identity).
>
>In Scheme this comes up in the case of procedure objects, which are 
>required to compare "eq?" to "themselves".  This is (IMO) a major nuisance 
>for an optimizing compiler, because many otherwise legal beta-reductions
>become potentially illegal because of that.

You might want to elaborate on this.  It's not clear to me exactly
what circumstances you're worried about.

>   I think the issue here is not so much that you have an object identity
>   when you don't want one, but that you've got to know whether you want
>   one, for the semantics you're imposing on your data structures.
>   If an eq? (object identity) test isn't want you want, _don't_use_it_.
>
>Well, but the compiler cannot (in general) take advantage of this,
>unless your type system somehow tracks the fact that there are no eq?
>comparisons for a given object.  Without a type system you loose,
>because then you need some sort of global analysis, which defeats
>things like separate compilation.

This is starting to change the ground of the discussion.  (But I asked
for it.)  I was mostly questioning the conceptual "rightness" of the 
"immutable iff no-identity-test" claim.  I also questioned whether
the performance losses due to having pointers is significant.

(I should have covered my ass there and said "compared to the costs of
faking pointers when you need them", but then that may degenerate
into an argument about the "right" programming style, and how often
you actually end up having to do what.)

If we're going to have a serious discussion of performance impact,
we need to go into more detailed explanations.

I still think that in practical terms, immutability is different
from not having object identity, and the former has more significant
impact on performance than the latter.  (I'm happy to be argued out
of either view, but I haven't seen a convincing argument yet.  Sorry
if I missed the boat, but I *thought* I understood what people were
saying.)

>-Matthias

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      

From: Graham Matthews
Subject: Re: Object IDs are good ( was: Object IDs are bad )
Date: 
Message-ID: <3366A67D.27D7@maths.anu.edu.au>
Paul Wilson wrote:
> The real issue is whether it's a hassle to maintain object identity
> when you want it.  Sometimes you do want it, which is why even
> in pure, lazy FP people do in fact sometimes tag things with unique
> identifiers, essentially faking pointers.

I think you have missed the point here Paul. I agree that sometimes
people using pure FP languages do tags things with unique identifiers.
This gives a notion of identity. But this is not "faking pointers"!
Pointers are a much "bigger" concept than object identity -- there are
other operations you can associate with pointers. Most of those other
operations have awful semantics and screw up optimisation,
parallelisation, and provability properties. Pointers do give object
identity, but unfortunately they also give a lot more junk that you
don't want. So the people tagging things in pure FP languages, are not
"faking pointers" -- they are simply programming object identity.
Tagging is a sharp scalpel for object identity, pointers are a blunt
saw.

Paul
> Right, but in practical terms it means that you need pointers, either
> language-provided or hand-cobbled.

No you don't want pointers. You might want object identity at times (but
that can easily be programmed and put into a library), but you certainly
don't want pointers.

graham
-- 
        He'll be riding that fast train to Memphis
                    On that sweet R & B
            Till the green grass way down home
                     Grows up over me
From: Paul Wilson
Subject: Object ID's vs. "pointers" vs. storage addresses
Date: 
Message-ID: <5k7rb0$6bh@roar.cs.utexas.edu>
In article <·············@maths.anu.edu.au>,
Graham Matthews  <···············@maths.anu.edu.au> wrote:
>Paul Wilson wrote:
>> The real issue is whether it's a hassle to maintain object identity
>> when you want it.  Sometimes you do want it, which is why even
>> in pure, lazy FP people do in fact sometimes tag things with unique
>> identifiers, essentially faking pointers.
>
>I think you have missed the point here Paul. I agree that sometimes
>people using pure FP languages do tags things with unique identifiers.
>This gives a notion of identity. But this is not "faking pointers"!
>Pointers are a much "bigger" concept than object identity -- there are
>other operations you can associate with pointers. 

I think we may be miscommunicating here.  When I say "pointer", I mean
an identifying reference to an object, through which you may be able
to access the object.  I do NOT mean virtual addresses that can
be compared arithmetically, cast to integers and back, etc.

This is the traditional sense of "pointer" used in data structures
textbooks.  Roughly, I mean pointers in the sense of Pascal, not
the weird and extended sense of C or C++.  Even to a C programmer,
Pascal's pointers are recognizably pointers, even though they're
not castable to raw addresses.

(And even in C and C++, doing weird things with pointers is often illegal,
despite the fact that most compilers and architectures let you do it.
For example, in C the result of pointer arithmetic across distinct 
objects is undefined, partly because C code is supposed to be usable
on segmented architectures.  You're only supposed to do pointer
arithmetic within things like arrays.  Unfortunately, many
programmers do it anyway and their programs are officially
"unportable".  And in C++, doing pointer arithmetic on pointers
to class instances is a big no-no in general, because of the
way pointers may actually point into the interiors of objects so that 
they can have multiple virtual function tables that implement different
interfaces.  Luckily, messing with this kind of unportability often
breaks code right away, because virtual function dispatching stops
working!)

>Most of those other
>operations have awful semantics and screw up optimisation,
>parallelisation, and provability properties. Pointers do give object
>identity, but unfortunately they also give a lot more junk that you
>don't want. So the people tagging things in pure FP languages, are not
>"faking pointers" -- they are simply programming object identity.
>Tagging is a sharp scalpel for object identity, pointers are a blunt
>saw.

I suspect this is based on a culture-specific interpretation of the
word "pointer."  (I'm not sure whether this is an American/non-American
difference, or an FP/traditional language difference, or what.  I
suspect it's more the latter, but some of each. Sorry for the confusion.)

In the terminology I'm used to, I think we are talking about
pointers here.  If we were to make a distinction between object
identity and pointers, I'd be inclined to say that an object
identity is *just* a tag, not a direct reference to the object
that has that tag.  (For example, I may know someone's social
security number, and be able to compare it to another, but just
having the social security number doesn't give me direct access
to that person.)  When you add a tag to an object so that a
value gives you both identity and "direct" reference (at the
language level), I'd say you have a pointer.

And on principle, I'd say that we shouldn't let langauges like C
ruin the meaning of a perfecty good CS term like "pointer".  :-)

One reason is that most programmers *basic* intutions about 
pointers seem to be in line with my usage---being able to cast
and do pointer arithmetic is extra stuff.

So if you try to explain a language like Scheme or Smalltalk, and
say it doesn't have pointers, most programmers will get very confused.
If you say that "every value is a pointer to an object" and "we don't
need pointer syntax because every value is a pointer," it's easier
to get across the basic idea.

Even in teaching pure lazy FP, that's the approach I use.  I just
tell them that everything's a pointer, but you can't do pointer
comparisions, and the compiler may fiddle the representations sometimes,
just like any optimizing compiler is allowed to do.

>Paul
>> Right, but in practical terms it means that you need pointers, either
>> language-provided or hand-cobbled.
>
>No you don't want pointers. You might want object identity at times (but
>that can easily be programmed and put into a library), but you certainly
>don't want pointers.
>
>graham
-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Stefan Monnier
Subject: Re: Object IDs are good ( was: Object IDs are bad )
Date: 
Message-ID: <5lafmgxjx2.fsf@tequila.systemsz.cs.yale.edu>
Graham Matthews <···············@maths.anu.edu.au> writes:
> No you don't want pointers. You might want object identity at times (but
> that can easily be programmed and put into a library), but you certainly
> don't want pointers.

The thing is that you generally have pointers in the implementation, so if you
have such a "common" implementation, it makes more sense to use them to
get object-ID for free than to implement object-ID manually.

If you really want the best of both worlds, you need some way to tell the
compiler that you want object-ID semantics on objects X and Y, so he can either
use pointers to get identities or use somthing else if pointers are somehow
inconvenient.


        Stefan