Progress report

From: Erann Gat
Subject: Progress report
Date: Sun, 28 Mar 2004 08:03:15 +0000
Message-ID: <gNOSPAMat-2803040003150001@192.168.1.52>

In case anyone is wondering, here's a quick report on the current state of
my MySQL-based object store.

So far (after about two weeks or so of evenings and weekends) I can do the
following:

? (setf db (make-object-store "mcl"))  ; Connect to database MCL
#<DB mcl>                        ; A database object
? (query db "describe foo")      ; Reqular SQL query
766: describe foo                ; Debugging output (can be switched off)
(#("x" "int(11)" "" "PRI" "0" "") #("y" "float" "YES" "" NIL "") #("z"
"text" "YES" "" NIL ""))
? (ref db 'foo)           ; REF is a generic function.
#<DB-TABLE foo>           ; When applied to a database the result is a table
? (ref (ref db 'foo) 1)   ; When applied to a table...
767: select * from foo where x=1
#<DB-ITEM foo 1 { x 1 y 2.3 z "biff" }>   ; The result is a DB-ITEM (i.e. a row)

; (Notice that the system figures out on its own which column is the
; primary index)

? (ref (ref (ref db 'foo) 1) 'z)    ; When REF is applied to a DB-ITEM...
768: select * from foo where x=1
"biff"                              ; ... the result is the stored value.
? @db.foo                           ; If you get tired of typing REF there's
#<DB-TABLE foo>                     ;       a shorthand syntax
? @db.foo.1
887: select * from foo where x=1
#<DB-ITEM foo 1 { x 1 y 2.3 z "biff" }>
? 
? @db.foo.1.z
769: select * from foo where x=1
"biff"
? (setf i 1)
1
? @······@i.z                       ; The syntax includes evaluated indices
770: select * from foo where x=1
"biff"

; Aside: since @ just expands into REF calls, and REF is a generic function
; we can use this syntax for many other purposes.  Examples:
? (define-class c1 x y z)
#<STANDARD-CLASS C1>
? (setf c (make-c1))
#<C1 #xC37526>
? (setf x 'y)
Y
? (setf @c.x 123)
123
? (setf @·@x 456)
456
? (describe c)
#<C1 #xC36F16>
Class: #<STANDARD-CLASS C1>
Wrapper: #<CCL::CLASS-WRAPPER C1 #xC36EEE>
Instance slots
X: 123
Y: 456
Z: NIL
? 

; Then there's a completely separate facility that allows you to store
; and retrieve Lisp objects:

? (setf x '(a b c d))
(A B C D)
? (setf y (cdr x))
(B C D)
? (store db y)
771: insert into object(type) values(8)
772: insert into object(type) values(8)
773: insert into object(type) values(8)
774: replace CONS(id,hash,CAR,CDR) values(1323,8944188,1129,5)
775: replace CONS(id,hash,CAR,CDR) values(1322,243841764,18,1323)
776: replace CONS(id,hash,CAR,CDR) values(1321,109657136,15,1322)
1321
? (store db x)
777: insert into object(type) values(8)
778: replace CONS(id,hash,CAR,CDR) values(1324,243941056,11,1321)
1324
? (setf x nil)
NIL
? (setf y nil)
NIL
? t
T
? (gc)  ; Clear the caches (which are implemented as weak hash tables)
NIL
? (restore db 1321)
875: select type from object where id=1321
876: select car,cdr from cons where id=1321
877: select type from object where id=1322
878: select car,cdr from cons where id=1322
879: select type from object where id=1323
880: select car,cdr from cons where id=1323
881: select type from object where id=5
882: select pkg.value,name.value
 from symbol left join string as pkg on symbol.package=pkg.id, string as name
 where symbol.name=name.id  and symbol.id=5
(B C D)
? (restore db 1324)
883: select type from object where id=1324
884: select car,cdr from cons where id=1324
(A B C D)
? (eq (cdr *) **)     ; Object identity is preserved
T
? (restore db 1324)
(A B C D)             ; Note no queries -- object is cached
?

Note that the system does not do unnecessary queries (except for query 882
which is a bug -- it's retrieving the symbol NIL, which isn't being cached
properly.)

All this in a grand total of only about 500 lines of code.  The object
store proper is only about 250 lines.  (The @ syntax reader is 15 lines,
an afterthought really.)  The rest is FFI, MySQL interface code, and
miscellaneous utilities from my personal standard library.  It's not
complete; I expect it could double in size before it's done.

It turns out I didn't even need any fancy tricks like a STORABLE mixin or
a before method on slot-value.  SXHASH was all I needed to manage the
cache.

E.

Re: Progress report Rahul Jain
- Re: Progress report Erann Gat
  - Re: Progress report Erann Gat
    - Re: Progress report Rahul Jain
  - Re: Progress report Rahul Jain

From: Rahul Jain
Subject: Re: Progress report
Date: Sun, 28 Mar 2004 17:03:21 +0000
Message-ID: <87zna0oqee.fsf@nyct.net>

·········@flownet.com (Erann Gat) writes:

> It turns out I didn't even need any fancy tricks like a STORABLE mixin or
> a before method on slot-value.  SXHASH was all I needed to manage the
> cache.

That's because you require an explicit STORE operation on the whole
object after mutation instead of extending (SETF S-V-U-C) to do the
STORE implicitly on the mutated slot-value.

But now that I think about it, there's really no difference in the
programmer effort involved. The only "problem" is that the persistence
layer needs to _recursively_ check _all_ slot-values to see if they've
changed.

Have you looked at (un)CommonSQL, btw? It's not quite the same, in that
it's not intended to be a proper object store, but rather a way of
building instances based on relational data -- slots can be defined to
be joins, e.g.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist

From: Erann Gat
Subject: Re: Progress report
Date: Sun, 28 Mar 2004 19:25:18 +0000
Message-ID: <gNOSPAMat-2803041125180001@192.168.1.52>

In article <··············@nyct.net>, Rahul Jain <·····@nyct.net> wrote:

> ·········@flownet.com (Erann Gat) writes:
> 
> > It turns out I didn't even need any fancy tricks like a STORABLE mixin or
> > a before method on slot-value.  SXHASH was all I needed to manage the
> > cache.
> 
> That's because you require an explicit STORE operation on the whole
> object after mutation instead of extending (SETF S-V-U-C) to do the
> STORE implicitly on the mutated slot-value.
> 
> But now that I think about it, there's really no difference in the
> programmer effort involved.

Yes, that's the conclusion I came to.  The programmer has to tell the
system somehow what they do and do not want stored.

> The only "problem" is that the persistence
> layer needs to _recursively_ check _all_ slot-values to see if they've
> changed.

No, this is the really cool thing.  Once you've stored an object, as long
as that object hasn't be GC'd it's in the cache along with its sxhash the
last time it was stored.  So to update everything in the DB you just walk
through the cache and compare the cached stored sxhash with the current
actual sxhash and if they differ you need to re-store the object.  So the
programmer only has to STORE things once.  Thereafter they can just call
SYNC and everything that is STOREd that needs to be updated will be.

I didn't demonstrate this because I hadn't implemented it, but it's trivial:

(define-method (sync (db odb hash-cache))
  (loop for o being the hash-keys of hash-cache do
        (unless (= (ref hash-cache o) (sxhash o)) (store db o))))

and now I can demonstrate it:

? (sync db)
NIL
? x
(A B C D)
? (setf (car x) 'z)
Z
? (sync db)
1898: replace CONS(id,hash,CAR,CDR) values(1324,378158984,1319,1321)
NIL
? 

That took me literally less than three minutes.

(I just realized that sxhash doesn't quite do the right thing.  It will
generally cause you to store more stuff than is really necessary.  But
it's easy enough to write a new hash function that does the Right Thing.)

> Have you looked at (un)CommonSQL, btw? It's not quite the same, in that
> it's not intended to be a proper object store, but rather a way of
> building instances based on relational data -- slots can be defined to
> be joins, e.g.

Yes, I have.  It's much more complex and sophisticated than what I'm
doing.  My project is designed to be lightweight.  Also, (un)CommonSQL
doesn't work on MCL.

E.

From: Erann Gat
Subject: Re: Progress report
Date: Sun, 28 Mar 2004 19:51:27 +0000
Message-ID: <gNOSPAMat-2803041151270001@192.168.1.52>

In article <··························@192.168.1.52>,
·········@flownet.com (Erann Gat) wrote:

> 
> (I just realized that sxhash doesn't quite do the right thing.  It will
> generally cause you to store more stuff than is really necessary.  But
> it's easy enough to write a new hash function that does the Right Thing.)

I just further realized that sxhash doesn't do the right thing in the
other direction too: it will occasionally cause you to miss storing a
change, which is a much more serious problem.  Using a wider hash can make
the probability of losing a change vanishingly small, but cannot entirely
eliminate it.  I may have to rethink this whole approach.

E.

From: Rahul Jain
Subject: Re: Progress report
Date: Mon, 29 Mar 2004 01:33:09 +0000
Message-ID: <87isgomo8a.fsf@nyct.net>

·········@flownet.com (Erann Gat) writes:

> I just further realized that sxhash doesn't do the right thing in the
> other direction too

OK, you already addressed my objection. Never mind. :)

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist

From: Rahul Jain
Subject: Re: Progress report
Date: Mon, 29 Mar 2004 01:32:09 +0000
Message-ID: <87n060mo9y.fsf@nyct.net>

·········@flownet.com (Erann Gat) writes:

>> The only "problem" is that the persistence
>> layer needs to _recursively_ check _all_ slot-values to see if they've
>> changed.
>
> No, this is the really cool thing.  Once you've stored an object, as long
> as that object hasn't be GC'd it's in the cache along with its sxhash the
> last time it was stored.  So to update everything in the DB you just walk
> through the cache and compare the cached stored sxhash with the current
> actual sxhash and if they differ you need to re-store the object.

Is that really enough? How do you know when the sxhash will change? How
do you know there won't be any hash collisions? In CMUCL the sxhash for
_all_ instances at least used to be 42. Your method would never see a
change in any instance's data, ever, in such a system.

Properly taking a hash code (and accepting the penalty of not updating
the database quickly) would involve recursing through the whole
reference graph rooted at the object being updated and computing a hash
code based on all values found in that traversal. This is exactly what I
said above.

> So the programmer only has to STORE things once. Thereafter they can
> just call SYNC and everything that is STOREd that needs to be updated
> will be.

That would require walking the entire set of cached objects, which is
even worse (at least if it's not amortized really well) than recursively
walking just one part of the reference graph. In fact, it's bound to be
worse unless there is a whole lot of reference sharing or repeated
updating without STOREing of the same "root" persisted object.

> (I just realized that sxhash doesn't quite do the right thing.  It will
> generally cause you to store more stuff than is really necessary.  But
> it's easy enough to write a new hash function that does the Right Thing.)

See above for my objections to this statement.

>> Have you looked at (un)CommonSQL, btw? It's not quite the same, in that
>> it's not intended to be a proper object store, but rather a way of
>> building instances based on relational data -- slots can be defined to
>> be joins, e.g.
>
> Yes, I have.  It's much more complex and sophisticated than what I'm
> doing.  My project is designed to be lightweight.

OK, that's what I expected.

> Also, (un)CommonSQL doesn't work on MCL.

Probably just a matter of referring it to the right package to find the
MOP symbols.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist