From: Robert Maas, http://tinyurl.com/uh3t
Subject: I might write an essay about "binders" and bindings
Date: 
Message-ID: <REM-2009jan09-005@Yahoo.Com>
In recent months I was brainstorming new ideas for efficient
storage management using reference counts. I eventually
came to two decisions:
- Reference count can be applicable even to algorithms that use
   circular structures (pointer loops), providing that all such
   pointer loops are *temporary*, broken up before exit from any
   tight loop that needed them for efficiency.
- Reference count can be efficient and can handle time-critical
   algorithms for multi-processing, providing that we introduce a
   way of counting them that ignores local duplicate copies of
   pointers and instead increment/decrement reference counts only
   when passing a pointer outside the local scope or totally
   discarding all local copies of a singly-counted pointer.

Today I'm looking at this essay on "linear (single reference) objects":
 <http://www.pipeline.com/~hbaker1/Use1Var.html>
I'm concerned that the author seems to insist that there is only
one reference to an object, where in fact it's essential for most
algorithms and for most computers to have multiple local transient
references even where there's only one *counted* reference.
Accordingly, I believe this topic should be unified with my
reference-count ideas.

I'm considering writing an essay on these topics, introducing the
term "binders" (analogous to "folders" in window-based filesystems)
which contain "places" (in the Common Lisp SETF sense) which
contain bindings to objects (first class citizens). Examples of
"binders" would be function/method frames (including parameters
passed to them), local blocks, and objects with slots. If I took
the time to write this all up in a unified well-explained way,
explaining both the general case of efficient reference-count
systems and the special case of "once-referenced" (actually
once-*counted*-referenced) objects, would anybody be interested in
reading it? Or would it be a waste of my time to write it up?

From: Kaz Kylheku
Subject: Re: I might write an essay about "binders" and bindings
Date: 
Message-ID: <20090115042521.758@gmail.com>
On 2009-01-09, Robert Maas, http://tinyurl.com/uh3t
<·············@teh.intarweb.org> wrote:
> In recent months I was brainstorming new ideas for efficient
> storage management using reference counts. I eventually
> came to two decisions:
> - Reference count can be applicable even to algorithms that use
>    circular structures (pointer loops), providing that all such
>    pointer loops are *temporary*, broken up before exit from any
>    tight loop that needed them for efficiency.

I.e. you /can/ have cycles and refcounting if you know where all the cycles are
and you break them before they leak out and cause trouble.  Brilliant!

Did you know you can also have storage that is not known to managed memory at
all? It's simple: compute its lifetime yourself and free it explicitly.

One more substantial way to set up cycles under refcounting, without worrying
about having to break the cycles, is to use weak pointers for the
backreferences. Weak pointers can be navigated in the same way as the regular
ones, but don't contribute a reference count to their target.

But you haven't reinvented that wheel yet, I suppose.

> - Reference count can be efficient and can handle time-critical
>    algorithms for multi-processing, providing that we introduce a
>    way of counting them that ignores local duplicate copies of
>    pointers and instead increment/decrement reference counts only
>    when passing a pointer outside the local scope or totally
>    discarding all local copies of a singly-counted pointer.

Nothing new. Done all the time in C and C++ programs that use refcounts.  You
don't always bump up the refcount when calling a tree of helper functions. The
top-level holds the one and only ref that is needed.

In C++ what you can do is pass your smart pointers by reference wherever
possible. Then no copy construction takes place.

  refcounted_pointer<widget> global_ref;

  // no increment/decrement of refcount on entry/exit
  // into function
  
  static void helper(refcounted_pointer<widget> &widget)
  {
  }

  // Even at module boundaries
  void module_entry(refcounted_pointer<widget> &widget)
  {
     helper(widget);

     // but this assignment drops the refcount on
     // the object that is currently stored in global_ref,
     // and increments the refcount on the one going in.
     // (provided they aren't the same object).
     global_ref = widget;
  }

I suspect you are reinventing wheels.

> Today I'm looking at this essay on "linear (single reference) objects":
> <http://www.pipeline.com/~hbaker1/Use1Var.html>
> I'm concerned that the author seems to insist that there is only
> one reference to an object, where in fact it's essential for most
> algorithms and for most computers to have multiple local transient
> references even where there's only one *counted* reference.

Multiple local transient references are effectively one reference.  The reason
is that these references are managed by the compiled code of the function,
which is an opaque entity. While the function is activated, it needs to hold on
to the object it is working with. It doesn't matter if it has five copies of
that object spread across three machine registers and two stack locations, or
that it just has one.

If you're using garbage collection, the collector will scan all these areas.
If the object is found in any one register or stack location, then it's marked
as live. If it's found in two or more locations, that makes no difference.

Expressing the idea that ``this function activation is working with this
object, so it can't be reclaimed'' requires only one reference, which
can be shared with the entire function call tree rooted at that function.

> Accordingly, I believe this topic should be unified with my
> reference-count ideas.

You may have independently arrived at these ideas, which is very good, but they
are far from new, let alone original.
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: I might write an essay about "binders" and bindings
Date: 
Message-ID: <REM-2009jan11-004@Yahoo.Com>
> > In recent months I was brainstorming new ideas for efficient
> > storage management using reference counts. I eventually
> > came to two decisions:
> > - Reference count can be applicable even to algorithms that use
> >    circular structures (pointer loops), providing that all such
> >    pointer loops are *temporary*, broken up before exit from any
> >    tight loop that needed them for efficiency.
> From: Kaz Kylheku <········@gmail.com>
> I.e. you /can/ have cycles and refcounting if you know where all
> the cycles are and you break them before they leak out and cause
> trouble.  Brilliant!

I'll take that as a compliment.

> Do you know you can also have storage that is not known to
> managed memory, too! It's simple. All you have to do is compute its
> lifetime yourself and free it explicitly.

Yes. But if you have pointers from non-managed memory to managed
memory, you have a crisis that can't be easily resolved. By
comparison, having pointers from a pointer-loop/cycle to ordinary
loop-free managed memory works just fine.

> One more substantial way to set up cycles under refcounting,
> without worrying about having to break the cycles, is to use weak
> pointers for the backreferences. Weak pointers can be navigated in
> the same way as the regular ones, but don't contribute a reference
> count to their target.

I don't see how that can possibly work under all use-cases when
using reference-count memory-management. Consider this example:
 (setq ptr1 (cons (make-big-structure1) nil))
 (setq ptr2 (cons (make-big-structure2) nil))
 (rplacd ptr1 ptr2)
 (rplacd-weak ptr2 ptr1)
 (setq ptr ptr1)
 (setq ptr1 NIL)
 (setq ptr2 NIL)
Now ptr contains the only entry into a loop as you prescribed.
Now suppose the programmer has his/her application do a typical
sort of thing when using a circular list:
 (setq ptr (cdr ptr))
Now suddenly the reference count to one of the cells in the loop
goes to zero, and the currently-pointed cell has a weak reference
to that zero-count cell, so at any moment that cell, and the entire
structure in its CAR might be recycled for some other use, while
the current entry cell still has a weak reference to it. Disaster!
By comparison, with a mark-and-sweep system, the weak reference is
cleanly set to a null pointer because its referent no longer
exists, so the application gets unexpected behaviour but not
disaster. (Of course, if you know you're running under a
mark-and-sweep system, you don't bother with the weak reference in
the first place, you use a normal RPLACD, and everything works
correctly.)

My proposed (unwind-protect (regular code) (sever-back-reference))
works correctly under either reference-count or mark-and-sweep.

Another point: One of the main reasons for trying reference-count
storage management is because it's easier to program than
mark-and-sweep storage management. I've been especially thinking
lately of implementing Lisp inside Forth, where the stack nature of
Forth makes it trivial to know how many references there are at any
time, and in the light of the essay on single-reference data
objects, I see that Forth makes it trivial to deal with
*transferring* references from caller to callee without any hassle,
because it really isn't the caller or callee that owns the
reference, it's the *stack* that directly owns the reference, and
the *implication* as to whether caller or callee owns part of the
stack (hence indirectly owns the reference) switches back and forth
trivially at call/return with zero overhead.

In that context of super-low-cost Lisp implementation in Forth,
having weak pointers is an extra complication/cost that just isn't
within the scope of what is reasonable to implement.

Of course you may argue that implementing an exception system with
unwind-protect is virtually impossible to include within the usual
Forth model, hence
  (unwind-protect (main-code) (sever-back-reference))
can't be done, hence my trick for dealing with pointer loops
isn't easy to code, hence it's difficult to assure correct
code without memory leak. But without any kind of non-local
exit from code anyway, there won't be any need for unwind-protect,
you can just write the Forth equivalent of
  (prog1 (main-code) (sever-back-reference))
and there should be no problem. This kind of application is easy to
implement in Forth via the SWAP primitive of course:
- Given START + END of list on stack;
- RPLACD to create link from end back to start;
- stack now has START pointer at top, with END pointer deeper under it;
- main algorithm uses START pointer, which might traverse around
   list in arbitrary way;
- RETURN value is left at top of stack, with END pointer just under
   it where it always was;
- SWAP, so END pointer at top and RETURN value under it;
- RPLACD-NIL to break the cycle, discarding END pointer;
- RETURN value at top of stack.

> > - Reference count can be efficient and can handle time-critical
> >    algorithms for multi-processing, providing that we introduce a
> >    way of counting them that ignores local duplicate copies of
> >    pointers and instead increment/decrement reference counts only
> >    when passing a pointer outside the local scope or totally
> >    discarding all local copies of a singly-counted pointer.
> Nothing new. Done all the time in C and C++ programs that use
> refcounts.  You don't always bump up the refcount when calling a
> tree of helper functions. The top-level holds the one and only ref
> that is needed.

OK, so presumably that's what the author really meant (or should
have meant) in the article about single-reference objects, and it
was just a matter of sloppy wording by that author that invoked my
gripe/correction.

Now there's a slight distinction between what you are saying and
what I was saying: If an actual *function* is called, then perhaps
it really is necessary to bump the reference count upon entry and
unbump it upon return, to account for two simultaneous references
(from caller and callee, unless strict usage of single-reference
per the essay is maintained), because in theory it's possible
during an exception to set up a permanent reference which is a copy
from one of the functions but not the other, so it doesn't work
that only one of them "counts". But I'm not totally sure it can't
be made to work correctly anyway in all cases.

> In C++ what you can do is pass your smart pointers by reference
> wherever possible. Then no copy construction takes place.

As I understand it, you are saying the calling function contains a
local variable that contains a pointer to a heap object that is
reference-counted, and you pass the *location* of that local
variable to the called function, so the called function can
actually *modify* the local variable right within the calling
function (unless the reference parameter is declared CONST), right?
Most of the time you wouldn't want that to happen, so it would be
good to declare it CONST most of the time, right?

>  refcounted_pointer<widget> global_ref;
>  // no increment/decrement of refcount on entry/exit
>  // into function
>  static void helper(refcounted_pointer<widget> &widget)
>  {
>  }

Presumably a CONST declaration should be added for most cases, right?
(I.e. you didn't happen to think of it in contriving this example,
 rather than that you deliberately avoided CONST declaration?)

What if you have a debugger that allows interrupting a called
function right in the middle of activity, and extracting it from
the called function, and embedding it into some *other* function,
perhaps a modified version of the original function, which is
initialized as if already in the middle of running. Can you be sure
the reference count won't get screwed up in all cases?

> I suspect you are reinventing wheels.

That's better than getting my wheel rejected as *wrong*.
Posting an article with my "new" idea that isn't so new, and
quickly getting feedback as to its oldness, is a lot better than
getting no feedback for years, and wasting years implementing my
re-invention, only to try to sell my years of work and then learn
it's not needed because somebody else already did it.
Also if I do go ahead with a project to write a simple version of
Lisp inside Forth using these ideas, knowing it's not a new idea,
rather it's a well-accepted idea, gives me confidence that my new
implementation ought to work just fine, but also gives me guidance
not to invest too much work on the idea. Thanks for the feedback!

> > Today I'm looking at this essay on "linear (single reference) objects":
> > <http://www.pipeline.com/~hbaker1/Use1Var.html>
> > I'm concerned that the author seems to insist that there is only
> > one reference to an object, where in fact it's essential for most
> > algorithms and for most computers to have multiple local transient
> > references even where there's only one *counted* reference.
> Multiple local transient references are effectively one
> reference.  The reason is that these references are managed by the
> compiled code of the function, which is an opaque entity. While the
> function is activated, it needs to hold on to the object it is
> working with. It doesn't matter if it has five copies of that
> object spread across three machine registers and two stack
> locations, or that it just has one.

(See above where I jumped ahead and already responded to this.
 Summary: Indeed that author was merely sloppy in wording, and my
 "correction" was actually a DWIM of what the other author really
 meant, if you are correct here.)
So this gives me additional confidence that my thoughts on this
topic are correct, even if not totally *new* to the state-of-art.

As for my idea of Lisp-in-Forth, I think explicit single-reference
objects are too much trouble to distinguish from ordinary
reference-counted objects, but I like that author's description of
explicitly *transferring* references from one place to another
(caller-to-callee on call, callee-to-caller on return,
function-to-objectSlot, etc.), whereby per his model the
single-reference quality is maintained, and likewise per my model
the reference count doesn't need to be adjusted. A combination of
duplicate uncounted *local* references, and explicit
no-adjustment-needed reference *transfer*, taking full advantage of
Forth's stack model to make this easy to code in a foolproof way,
should make the Lisp-in-Forth system very efficient as well as easy
to debug to get working correctly.

> Expressing the idea that ``this function activation is working
> with this object, so it can't be reclaimed'' requires only one
> reference, which can be shared with the entire function call tree
> rooted at that function.

(Still not sanguine about that working correctly in context of
 debuggers and non-local transfers such as exceptions, but maybe
 you're correct.)

> > Accordingly, I believe this topic should be unified with my
> > reference-count ideas.
> You may have independently arrived at these ideas, which is very
> good, but they are far from new, let alone original.

Actually my ideas *are* original, from lots of personal thought,
with not a clue from anything I saw posted by anyone else. Truly
independent multiple invention of same idea, multi-original. It's
only upon reading your followup that I see that my analysis is
re-invention of something others already knew about.

Anyway, do you know of any online essay that unifies these ideas
clearly, the way that Kent's essay on "intention" nicely unifies
some ideas about intention as distinct from implementation, the way
that Darwin's book on evolution by natural selection unifies his
and Wallace's and other people's ideas about that topic, etc.? I.e.
even though my ideas are re-invention, would it still be useful for
me to write an essay nicely organizing these ideas, or is that
essay already written and available so I should just read that
essay and offer any minor corrections to it?
From: ··········@gmail.com
Subject: Re: I might write an essay about "binders" and bindings
Date: 
Message-ID: <b7cdcfb3-64b5-42ab-b532-367d833cbbe4@w1g2000prm.googlegroups.com>
On Jan 9, 2:29 pm, Kaz Kylheku <········@gmail.com> wrote:
> On 2009-01-09, Robert Maas, http://tinyurl.com/uh3t
>
> <·············@teh.intarweb.org> wrote:
> > In recent months I was brainstorming new ideas for efficient
> > storage management using reference counts. I eventually
> > came to two decisions:
> > - Reference count can be applicable even to algorithms that use
> >    circular structures (pointer loops), providing that all such
> >    pointer loops are *temporary*, broken up before exit from any
> >    tight loop that needed them for efficiency.
>
> I.e. you /can/ have cycles and refcounting if you know where all the cycles are
> and you break them before they leak out and cause trouble.  Brilliant!

Please assure that your theory is resistant to errors in the "tight
loop."  The loops must be broken even if an exception is raised in
the inner temporary scope.

John
From: Slobodan Blazeski
Subject: Re: I might write an essay about "binders" and bindings
Date: 
Message-ID: <e69a2004-c8ad-4847-b627-a3add117090c@z28g2000prd.googlegroups.com>
On Jan 9, 10:10 pm, ·············@teh.intarweb.org (Robert Maas,
http://tinyurl.com/uh3t) wrote:
> In recent months I was brainstorming new ideas for efficient
> storage management using reference counts. I eventually
> came to two decisions:
> - Reference count can be applicable even to algorithms that use
>    circular structures (pointer loops), providing that all such
>    pointer loops are *temporary*, broken up before exit from any
>    tight loop that needed them for efficiency.
And who will make that breaking, programmer or the implementation?
bobi
> - Reference count can be efficient and can handle time-critical
>    algorithms for multi-processing, providing that we introduce a
>    way of counting them that ignores local duplicate copies of
>    pointers and instead increment/decrement reference counts only
>    when passing a pointer outside the local scope or totally
>    discarding all local copies of a singly-counted pointer.
>
> Today I'm looking at this essay on "linear (single reference) objects":
>  <http://www.pipeline.com/~hbaker1/Use1Var.html>
> I'm concerned that the author seems to insist that there is only
> one reference to an object, where in fact it's essential for most
> algorithms and for most computers to have multiple local transient
> references even where there's only one *counted* reference.
> Accordingly, I believe this topic should be unified with my
> reference-count ideas.
>
> I'm considering writing an essay on these topics, introducing the
> term "binders" (analogous to "folders" in window-based filesystems)
> which contain "places" (in the Common Lisp SETF sense) which
> contain bindings to objects (first class citizens). Examples of
> "binders" would be function/method frames (including parameters
> passed to them), local blocks, and objects with slots. If I took
> the time to write this all up in a unified well-explained way,
> explaining both the general case of efficient reference-count
> systems and the special case of "once-referenced" (actually
> once-*counted*-referenced) objects, would anybody be interested in
> reading it? Or would it be a waste of my time to write it up?
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: I might write an essay about "binders" and bindings
Date: 
Message-ID: <REM-2009mar07-006@Yahoo.Com>
> > In recent months I was brainstorming new ideas for efficient
> > storage management using reference counts. I eventually
> > came to two decisions:
> > - Reference count can be applicable even to algorithms that use
> >    circular structures (pointer loops), providing that all such
> >    pointer loops are *temporary*, broken up before exit from any
> >    tight loop that needed them for efficiency.
> From: Slobodan Blazeski <·················@gmail.com>
> And who will make that breaking, programmer or the implementation?

The super-advanced "software hacker", far beyond ordinary
application programmer, who writes that super-efficient tool that
requires temporary circular structure, because the "just do it"
utility written by the ordinary "programmer/analyst" ran too slow
in a high-production environment. The ordinary "programmer/analyst"
need never bother to make circular structures; just use a
self-balancing binary search tree or other medium-efficient data
structures and trust they'll be fast enough for most practical
purposes. Note the "software hacker" will probably be writing some
of the tight loop in machine language (using LAP or SysLisp syntax)
rather than ordinary Lisp function calls.

And some "ordinary hackers" might make temporary circular structures
just because they make some algorithms cleaner-looking than other
methods such as SBBSTs do.