From: Ladvánszky Károly
Subject: Question about data save/load
Date: 
Message-ID: <f3706271f05c782c0f85e3f00a906e75@news.meganetnews.com>
Are there Lisp mechanisms/extensions to support the saving/loading of
arbitrary data structures like the pickle facility in Python?

Thanks for any help,

Károly

From: Nikodemus Siivola
Subject: Re: Question about data save/load
Date: 
Message-ID: <blu35j$hgl$1@nyytiset.pp.htv.fi>
"Ladvánszky Károly" <··@bb.cc> wrote:

> Are there Lisp mechanisms/extensions to support the saving/loading of
> arbitrary data structures like the pickle facility in Python?

Search Google Groups for "bindump".

 -- Nikodemus
From: Tim Daly Jr.
Subject: Re: Question about data save/load
Date: 
Message-ID: <87n0cd2zlw.fsf@tenkan.org>
"Ladvánszky Károly" <··@bb.cc> writes:

> Are there Lisp mechanisms/extensions to support the saving/loading of
> arbitrary data structures like the pickle facility in Python?

Often you can simply use READ and PRINT.  For example, you can write
an array of 10 elements to a file like this:

     (with-open-file (f "/tmp/foo" :direction :output
                        :if-does-not-exist :create)
       (print (make-array 10) f))
     
and you can read it back like this:

     (with-open-file (f "/tmp/foo" :direction :input)
       (read f))

That works for most things.  One bonus of this approach is that you
can type data directly into the REPL:

     * (type-of #(0 0 0 0 0 0 0 0 0 0))
     (SIMPLE-VECTOR 10)
     *

There are tons of hooks to enable you to extend the reader and
printer, in case they don't do what you need.  One way to find out
more is to look up things like readtable, dispatching macro character,
print-object generic function, and the :print-function keyword, in a
Common Lisp reference.  (See http://www.cliki.net/CLtL2 and
http://www.cliki.net/CLHS, for example.)
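As a minimal sketch of extending the printer this way: you can give a
CLOS class a PRINT-OBJECT method that emits a form the reader can
reconstruct.  (The POINT class and its accessors here are made up for
illustration; the #. trick requires *READ-EVAL* to be true when
reading the data back.)

```lisp
;; A made-up class for illustration.
(defclass point ()
  ((x :initarg :x :accessor point-x)
   (y :initarg :y :accessor point-y)))

;; Print instances as a read-time constructor form, so that READ
;; (with *READ-EVAL* true) rebuilds an equivalent instance.
(defmethod print-object ((p point) stream)
  (format stream "#.(make-instance 'point :x ~s :y ~s)"
          (point-x p) (point-y p)))
```

After this, PRINTing a POINT to a file and READing it back yields a
fresh instance with the same slot values, just like the array example
above.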

-Tim


-- 
Man must shape his tools lest they shape him.
                -- Arthur R. Miller
From: Simon András
Subject: Re: Question about data save/load
Date: 
Message-ID: <vcdu16l1j6e.fsf@tarski.math.bme.hu>
Search for save-object, make-load-form, &c in groups.google. E.g.
the thread

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=3D47C29B.27679B36%40smi.de&rnum=12&prev=/&frame=on

from last summer might be helpful. 

Andras

"Ladvánszky Károly" <··@bb.cc> writes:

> Are there Lisp mechanisms/extensions to support the saving/loading of
> arbitrary data structures like the pickle facility in Python?
> 
> Thanks for any help,
> 
> Károly
From: Markus Fix
Subject: Re: Question about data save/load
Date: 
Message-ID: <3F8298B2.2020108@bookfix.com>
Ladvánszky Károly wrote:
> Are there Lisp mechanisms/extensions to support the saving/loading of
> arbitrary data structures like the pickle facility in Python?

You might want to have a look at "Common Lisp Prevalence"
by Sven Van Caekenberghe:

<http://homepage.mac.com/svc/prevalence/readme.html>

Nice project and good code.

-fix

-- 
------- Markus Fix http://www.bookfix.com/ --------
Don't fight forces, use them. -R.Buckminster Fuller
From: Will Hartung
Subject: Prevalence (was: Re: Question about data save/load)
Date: 
Message-ID: <blvnar$hatb3$1@ID-197644.news.uni-berlin.de>
"Markus Fix" <···@bookfix.com> wrote in message
·····················@bookfix.com...
> Ladvánszky Károly wrote:
> > Are there Lisp mechanisms/extensions to support the saving/loading of
> > arbitrary data structures like the pickle facility in Python?
>
> You might want to have a look at "Common Lisp Prevalence"
> by Sven Van Caekenberghe:
>
> <http://homepage.mac.com/svc/prevalence/readme.html>
>
> Nice project and good code.

I have to question the utility of this mechanism, particularly with CL.

I mean it's novel and all, but...I dunno, I have "issues" with it.

"Like we care about your Issues, Will!"

Yea, but this is USENET, so...

With basic Lisp structures we almost get this "for free", at least the
load/save part with the reader/writer. We don't get the unit of work
details, but a transaction log shouldn't be painfully difficult,
particularly an application specific one.
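As a rough sketch of what such an application-specific transaction log
could look like (the names *LOG-PATH*, LOG-TRANSACTION, and REPLAY-LOG
are made up here, and using EVAL on logged forms assumes you trust the
log file):

```lisp
(defvar *log-path* "/tmp/app-transactions.log")

(defun log-transaction (form)
  "Append FORM to the log, then evaluate it against current state."
  (with-open-file (f *log-path* :direction :output
                     :if-exists :append
                     :if-does-not-exist :create)
    (print form f))
  (eval form))

(defun replay-log ()
  "Rebuild state after a crash by re-evaluating every logged form."
  (with-open-file (f *log-path* :direction :input)
    (loop for form = (read f nil 'eof)
          until (eq form 'eof)
          do (eval form))))
```

The reader/writer do the serialization for free; all the log adds is
an ordered record of the mutations to replay on restart.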

If we punt over to the binary versions that (most?) implementations support,
then we have a quicker and vastly more efficient loading and saving
mechanism, still needing the transaction log.

I believe, for example, that Franz can essentially mmap in large static
structures, making the initial load very quick, and initial traversals the
price of the page swap-in. This almost reminds me of DJB's cdb system, which
is essentially a static persistent hash map that builds very quickly, and is
designed to be "fast enough" to simply rebuild when you want to update it.

But combine the transaction log and an outside "rebuild" utility with the
fast-loading static binary FASL, and you get a pretty efficient system,
almost "for free". Of course, you can always simply snapshot your image as
well.

I guess my biggest concern is simply the limitations of the system. It
doesn't appear to scale at all. Of course, if your system has no need, and
will never need, to scale, this is a non-issue.

Having done the "read the huge serializable datastructure" thing in Java,
the default generic implementation takes forever with any large quantity of
objects. The concept of loading "hundreds of megabytes" of these things
sends shivers up my spine. Loading hundreds of MB isn't so hard, but most of
these serialization schemes are object based, not page based. So, hundreds
of MB of objects usually means one or two orders of magnitude more actual
objects. So, loading and saving 1-10 billion objects takes a bloody long
time. Period. Nothing exacerbates grumpy customers like a server crash.
Nothing rubs salt into that wound more than having the server take forever
and a day to start up.

Also, the larger the system, the longer the load time, the greater the price
of failure. As the system gets larger, the restart time gets that much
longer. By the same token, the shutdown time takes that much longer as well.
Pity the poor soul whose large system fails in the middle of shutdown (like,
with the disk filling up). That soul gets to reload the old, rerun the log,
and then save it again.

Any system relying on this layer is also essentially doomed should it try to
scale. Since scaling isn't practical, you need either a new persistence
layer, or you need to adapt this one. But either way, even if you could get
rid of the code dependencies (say, by creating an API-equivalent scalable
persistence layer), your performance dynamics are gone. Your base
performance is predicated on having the system in RAM, and a scalable system
won't be doing that. Perhaps a lot of it will be in RAM, but not all. So
when you try to scale, you need to reevaluate your system performance.

Throw in the multi-user "oh we'll just lock the entire DB, but it's in RAM
so it's fast" contention issues and I just swoon thinking of the
ramifications of such a system. Locks are locks, busy locks with long lines
are worse.

Of course, there is value in this system, particularly for other languages,
and smaller applications. But I think that with reasonable use of CL
facilities, both implementation dependent and portable versions, there is
less value for this system here.

I guess to me something like this would be better suited if implemented as a
layer on top of the more general CL image facilities. That gives us
hopefully higher performance object loads and saves, true "native" object
support, as well as a crafty transaction system to recover the tree if
necessary.

Regards,

Will Hartung
(·····@msoft.com)
From: Markus Fix
Subject: Re: Prevalence (was: Re: Question about data save/load)
Date: 
Message-ID: <3F85200F.6070708@bookfix.com>
Will Hartung wrote:
> "Markus Fix" <···@bookfix.com> wrote in message
> ·····················@bookfix.com...
> 
>>Ladvánszky Károly wrote:
>>
>>>Are there Lisp mechanisms/extensions to support the saving/loading of
>>>arbitrary data structures like the pickle facility in Python?
>>
>>You might want to have a look at "Common Lisp Prevalence"
>>by Sven Van Caekenberghe:
>>
>><http://homepage.mac.com/svc/prevalence/readme.html>
>>
>>Nice project and good code.
> 
> 
> I have to question the utility of this mechanism, particularly with CL.
> 
> I mean it's novel and all, but...I dunno, I have "issues" with it.
> 
> "Like we care about your Issues, Will!"
> 
> Yea, but this is USENET, so...
> 
> With basic Lisp structures we almost get this "for free", at least the
> load/save part with the reader/writer. We don't get the unit of work
> details, but a transaction log shouldn't be painfully difficult,
> particularly an application specific one.
>

I agree with you that it's much easier to implement a prevalence
mechanism in Lisp, especially compared to Java, where the Prevayler
project started initially. Prevalence makes a lot of sense for
small web based projects, small meaning that the footprint of your
database has to fit into available RAM. It's also important
to understand that your insert performance is horrible, as you
usually sync after each commit.

> If we punt over to the binary versions that (most?) implementations support,
> then we have a quicker and vastly more efficient loading and saving
> mechanism, still needing the transaction log.

Very true and it shows just how flexible and performant Lisp
can be compared to Java.

> I believe, for example, that Franz can essentially mmap in large static
> structures, making the initial load very quick, and initial traversals the
> price of the page swap in. This almost reminds me of DJBs cdb system, which
> is essentially a static persistent hash map the builds very quickly, and
> designed to be "fast enough" to simply rebuild when you want to update it.
> 
> But combining the transaction log with an outside "rebuild" utility with the
> fast loading static binary FASL, and you get a pretty efficient system,
> almost "for free". Of course, you can always simply snapshot your image as
> well.
> 

I've been thinking along those lines too.

> I guess my biggest concern is simply the limitations of the system. It
> doesn't appear to scale at all. Of course, if your system has no need, and
> will never need, to scale, this is a non-issue.
> 

It shouldn't be too hard to get simple clustering between
prevalence applications implemented. The hardest part is
making sure all your object IDs are unique.
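One common way around the uniqueness problem, sketched below, is to
partition the ID space by node: give each cluster node a distinct
small integer and pack it into the high bits of every ID. (The names
*NODE-ID*, *LOCAL-COUNTER*, and NEXT-OBJECT-ID, and the 48-bit split,
are made-up illustrations, not part of any prevalence library.)

```lisp
(defvar *node-id* 0)        ; distinct per node, assigned at deploy time
(defvar *local-counter* 0)  ; monotonically increasing on this node

(defun next-object-id ()
  "Combine the node id with a local counter so that IDs generated on
different nodes can never collide: the node id occupies the high bits."
  (logior (ash *node-id* 48) (incf *local-counter*)))
```

Bignums make this painless in Lisp; no coordination between nodes is
needed beyond assigning the node ids once.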

> Having done the "read the huge serializable datastructure" thing in Java,
> the default generic implementation takes forever with any large quantity of
> objects. The concept of loading "hundreds of megabytes" of these things
> sends shivers up my spine. Loading hundreds of MB isn't so hard, but most
> of these serialization schemes are object based, not page based. So,
> hundreds of MB of objects usually means one or two orders of magnitude
> more actual objects. So, loading and saving 1-10 billion objects takes a
> bloody long time. Period. Nothing exacerbates grumpy customers like a
> server crash. Nothing rubs salt into that wound more than having the
> server take forever and a day to start up.
> 

It does take quite a while to restart after a crash, if you only have
the transaction logs and no recent snapshot. You cannot delay object
creation until a certain datum is needed. All objects are recreated at
startup time.

> Also, the larger the system, the longer the load time, the greater the price
> of failure. As the system gets larger, the restart time gets that much
> longer. By the same token, the shutdown time takes that much longer as well.
> Pity the poor soul whose large system fails in the middle of shutdown (like,
> with the disk filling up). That soul gets to reload the old, rerun the log,
> and then save it again.
> 

That's why large production systems would probably use clustering.
Still, the failure case is a pain.


> Any system relying on this layer is also essentially doomed should it try to
> scale. Since scaling isn't practical, you need either a new persistence
> layer, or you need to adapt this one. But either way, even if you could get
> rid of the code dependencies (say by creating an api equivalent scalable
> persistence layer), your performance dynamics are gone. Your base
> performance is prefaced by having the system in RAM, and scalable system
> won't be doing that. Perhaps a lot of it will be in RAM, but not all. So
> when you try to scale, you need to reevaluate your system performance.
> 

Prevalence, as a simple form of orthogonal persistence, breaks down
as soon as your data set exceeds available RAM. Klaus Wuestefeld, who
developed Prevayler for Java, states:

"For many systems it is already feasible to keep all business objects in 
RAM."

As part of my day job we developed a system for www.otop.de, which
provides a marketplace for the German health care industry. The overall 
footprint of the system is small (below 500MB) and the performance gain 
of getting rid of EJB and database was huge. The code is almost 
simplistic now and we can extend and evolve our data model without doing 
database dumps.

If the system were using Lisp we could even do that (evolve the data
structure) while the application was running. Alas, that's not possible
with Java in its current state.


> Throw in the multi-user "oh we'll just lock the entire DB, but it's in RAM
> so it's fast" contention issues and I just swoon thinking of the
> ramifications of such a system. Locks are locks, busy locks with long lines
> are worse.
> 

Agree.

> Of course, there is value in this system, particularly for other languages,
> and smaller applications. But I think that with reasonable use of CL
> facilities, both implementation dependent and portable versions, there is
> less value for this system here.
> 

It could be evolved into something that makes developing web 
applications a joy. It's quite addictive to just evolve
your data structure according to your daily needs without worrying
about migration code and object/table mappings.



> I guess to me something like this would be better suited if implemented as a
> layer on top of the more general CL image facilities. That gives us
> hopefully higher performance object loads and saves, true "native" object
> support, as well as a crafty transaction system to recover the tree if its
> necessary.

Exactly what I've been thinking. The original Prevayler implementation
is quite constrained by the barriers the Java language creates. In 
particular the 'kindergarten' object system and the terrible performance
of the serialization and marshalling of objects are painful.

With CL most of these constraints could be overcome.

-fix


-- 
------- Markus Fix http://www.bookfix.com/ --------
Don't fight forces, use them. -R.Buckminster Fuller