;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Persistence in Common Lisp (C) Arthur Lemmens, 2004 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Saving objects and loading them back later * Scenarios - user wants to save/load different workspaces - safeguard against program crashes - too much data to keep in memory. - transporting data 1. PRINT and READ 2. MAKE-LOAD-FORM and LOAD 3. SAVE-IMAGE 4. Serializing and deserializing 5. Logging changes 6. Using relational databases 7. Full-blown persistence in Lisp ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 1. PRINT and READ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Lisp is designed with read/write-consistency in mind REPL (read-eval-print loop). Result of PRINT can be fed to READ. Poor man's persistence for free. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 1. PRINT and READ: an example ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defstruct world ;; WORLD contains everything that we want to save. contents) (defun save-world (world pathname) (with-open-file (stream pathname :direction :output) (with-standard-io-syntax (let ((*print-circle* t) (*package* (find-package :common-lisp))) (print world stream)))) pathname) (defun load-world (pathname) (with-open-file (stream pathname :direction :input) (with-standard-io-syntax (let ((*package* (find-package :common-lisp))) (read stream))))) ;;; (defstruct node ;; Example of a circular object in the world. left right parent) (defun build-world () (let ((contents ;; A list of things that can be printed and read back (list ;; Numbers 42 1/3 3.14 ;; Circular lists (let ((circle (cons nil nil))) (setf (car circle) circle (cdr circle) circle)) ;; Pathnames #P"d:/arthur/my-file.txt" ;; Symbols 'some-symbol 'lw:current-pathname ;; Circular structures (let ((root (make-node))) (setf (node-left root) (make-node :left "X" :right "Y" :parent root)) root) ;; Simple arrays (including strings and bit-vectors) #((1 2 3) (a b c)) "Hello world" #*1001))) (make-world :contents contents))) ;;; ;; Building and modifying the world (defparameter *world* (build-world)) ;; Saving the world (save-world *world* "world.lsp") ;; Restoring the world (setq *world* (load-world "world.lsp")) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 1. PRINT and READ: pitfalls ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; - Pitfall: reader/printer control variables PRINT is controlled by *PRINT-READABLY*, *PRINT-ARRAY*, *PRINT-CIRCLE*, *PRINT-BASE*, *READ-EVAL* and other variables. READ is controlled by *PACKAGE*, *READ-BASE*, *READ-DEFAULT-FLOAT-FORMAT*, *READ-EVAL*, *READTABLE* and others. Solution: Use WITH-STANDARD-IO-SYNTAX - Pitfall: circular objects Solution: Bind *PRINT-CIRCLE* to true Example: #1=(a b c . #1#) - Pitfall: The binding of *PACKAGE* WITH-STANDARD-IO-SYNTAX binds *PACKAGE* to the CL-USER package. This is not portable. Solution: Bind *PACKAGE* to the COMMON-LISP package. - Pitfall: READ can be dangerous Example: (read-from-string "#.(delete-file #P\"important-file)") Solution: bind *READ-EVAL* to false (but ....) - Pitfall: printing arrays Information about fill-pointer, element-type, adjustability and displacement is lost. Solutions: (1) don't worry, be happy (2) write your own printer or serializer - Pitfall: printing other objects The spec says: "For example, hash tables, readtables, packages, streams, and functions might not print readably." Solutions: see above ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 1. PRINT and READ: using PRINT-OBJECT ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Using characters, strings, symbols, pathnames, numbers, arrays, lists and structures can go a long way, but... ... what if I want to use CLOS objects? Problem: Instances of classes with metaclass STANDARD-CLASS (i.e. classes defined with DEFCLASS) are not printed readably. Solution: define PRINT-OBJECT methods Example: (defclass point () ((x :initarg :x) (y :initarg :y) (z :initarg :z))) (defmethod print-object ((point point) stream) (if *print-readably* (with-slots (x y z) point (format stream "#.(MAKE-INSTANCE 'POINT :X ~S :Y ~S :Z ~S)" x y z)) (call-next-method))) (let ((*print-readably* t)) (print (make-instance 'point :x 10 :y 20 :z 0))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 2. MAKE-LOAD-FORM and LOAD ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Using read-time evaluation can go a long way, but ... Advantages: - object circularity and initialization can be handled nicely - cleaner - can be faster Disadvantages: - Fast version needs implementation-specific form dumper Lispworks: WITH-OUTPUT-TO-FASL-FILE, DUMP-FORMS-TO-FILE - circularity check is expensive - uses lots of memory (pig in snake) - Data is not portable for fast version. Background: (defclass point () ((x :initarg :x) (y :initarg :y) (z :initarg :z))) (defun test (point) (let ((origin #.(make-instance 'point :x 0 :y 0 :z 0))) (point-distance point origin))) Doesn't compile: the origin can't be dumped to a fasl-file without a MAKE-LOAD-FORM method. So: (defmethod make-load-form ((point point) &optional environment) (declare (ignore environment)) `(make-instance 'point :x ,(slot-value point 'x) :y ,(slot-value point 'y) :z ,(slot-value point 'z))) Now points can be dumped to a fasl-file that can be LOADed. We can hijack that mechanism if our implementation supports something like DUMP-FORMS-TO-FILE. (Portable alternative: write the forms to a Lisp file and compile it.) Notes: - second value of MAKE-LOAD-FORM lets you handle object initialization (e.g. to restore circular references) - MAKE-LOAD-FORM-SAVING-SLOTS is handy for simple cases. (defmethod make-load-form ((point point) &optional environment) (make-load-form-saving-slots point '(x y z))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 3. SAVE-IMAGE ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; - implementation-specific - data is not portable - saves everything - means program exit in some implementations Variant: dumping memory - low level - can also be done incrementally - paging and virtual memory systems - TexasStore (Paul Wilson) - operating system support ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 4. Serializing/deserializing ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Advantages: - can be lots faster and use lots less memory - when done carefully, the data can be portable across Lisp implementations Existing work: - CL-STORE (by Sean Ross, on common-lisp.net) - SAVE-OBJECT (Kerry V. Koitzsch and Kevin Thompson) - unpublished library by Arthur Lemmens (assumes non-circularity as (overridable) default) - chapter 4 of Sonya Keene's "Object-Oriented Programming in CL" - more? Basic idea: - serialize an object (to a stream) by: - writing a marker code (usually a small integer) that represents the (internal) object type to the stream - serializing the contents of the object to the stream - deserialize an object (from a stream) by: - reading the marker - dispatching to the corresponding function to deserialize the contents Dealing with object circularity: - we need to do this ourselves now - when writing: - use some kind of table to register all serialized objects that may be referenced more than once - don't write objects that have already been serialized; write out a special +reference+ marker and then some kind of object ID instead - when reading: - ..... - in general this can be very expensive Some pitfalls: - no portable way to determine the slots and structure type of arbitrary structures - need the MOP (semi-portable) to determine slots and class of instances - most-significant or least-significant byte first? - implementation-defined character attributes - versioning: what should happen when the program evolves but it needs to deal with old data? - ..... ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 3. Serializing/deserializing: code fragment 1 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defclass serializer () ((shared-objects :initform (make-hash-table) :reader serializer-shared-objects :documentation "A mapping of objects to/from numbers.") (struct-registry :initform (make-hash-table) :reader serializer-struct-registry :documentation "A mapping of struct names/numbers to struct-info.") (symbol-registry :initform (make-hash-table) :reader serializer-symbol-registry :documentation "Mapping symbols to/from numbers.") (stream :initarg :stream :reader serializer-stream :documentation "An (unsigned-byte 8) stream."))) (defgeneric serialize (object serializer) (:documentation "Writes a serialized version of an object to the stream in a serializer.")) (defgeneric deserialize-contents (marker serializer) (:documentation "Deserialize the contents for an object that starts with MARKER.")) (defun serialize-byte (byte serializer) "Writes an unsigned-byte to a stream or array." (write-byte byte (serializer-stream serializer))) (defun deserialize-byte (serializer) (read-byte (serializer-stream serializer))) (defun serialize-marker (marker serializer) (serialize-byte marker serializer)) (defun deserialize (serializer) (loop for marker = (deserialize-byte serializer) until (not (eql marker +ignore+)) finally return (deserialize-contents marker serializer))) (defun serialize-list (list serializer) "Serializes a proper list by first serializing its length and then all the elements of the list." (serialize (length list) serializer) (dolist (elt list) (serialize elt serializer))) (defun deserialize-list (serializer) (let ((length (deserialize serializer))) (loop repeat length collect (deserialize serializer)))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 3. Serializing/deserializing: code fragment 2 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (defmethod serialize ((symbol symbol) stream) (let ((symbol-nr (ensure-symbol-registry-nr symbol stream))) (serialize-marker +symbol-reference+ stream) (serialize symbol-nr stream))) (defun ensure-symbol-registry-nr (symbol stream) (let ((registry (serializer-symbol-registry stream))) (or (gethash symbol registry) (progn (let ((number (hash-table-count registry))) (setf (gethash symbol registry) number) ;; (cond ((keywordp symbol) (serialize-marker +keyword-definition+ stream) (serialize number stream) (serialize (symbol-name symbol) stream)) ((null (symbol-package symbol)) (serialize-marker +uninterned-symbol-definition+ stream) (serialize number stream) (serialize (symbol-name symbol) stream)) (t (serialize-marker +symbol-definition+ stream) (serialize number stream) (serialize (package-name (symbol-package symbol)) stream) (serialize (symbol-name symbol) stream))) ;; number))))) (defmethod deserialize-contents ((marker (eql +symbol-reference+)) stream) (let ((number (deserialize stream))) (or (gethash number (serializer-symbol-registry stream)) (cerror "Use NIL instead." "Can't find symbol with reference number ~D." number) nil))) (defmethod deserialize-contents ((marker (eql +keyword-definition+)) stream) (let ((number (deserialize stream))) (setf (gethash number (serializer-symbol-registry stream)) (intern (deserialize stream) (find-package :keyword)))) (deserialize stream)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 5. Logging changes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Serializing can go a long way, but ... ... what if my program (or computer) crashes? Solution: log all relevant changes immediately Basic assumption: all data fits in memory Existing work: Prevalence (Sven van Caekenberghe, uses XML) unpublished library by Joe Marshall unpublished library by Arthur Lemmens Can be built on top of serializer (or write a file that can be LOADed directly). Question: how can we detect changes? - you can't: program carefully - CLOS and MOP help: you can detect creation of instances and modifications of slot values. For slot changes: write a metaclass PERSISTENT-CLASS and specialize (SETF SLOT-VALUE-USING-CLASS) for classes with this metaclass. For instance creation: write an :AFTER method on INITIALIZE-INSTANCE. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 5. Logging changes: two code fragments ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Defining a new metaclass. (defclass database-class (standard-class) ;; This is a metaclass. ((persistent-slots :initform '() :accessor class-persistent-slots))) (defclass persistent-slot-mixin () ;; This is part of the slot-definition class ;; for classes with metaclass DATABASE-CLASS. ((persistence :initarg :persistence :initform t :reader slot-persistence :documentation "T for persistent slots, NIL for transient slots.") (inverse :initarg :inverse :initform nil :reader slot-inverse :documentation "Name for an automatically created inverse function.") (test :initarg :test :initform #'equalp :reader slot-test :documentation "Equality predicate used by inverse and unique.") (unique :initarg :unique :initform nil :type boolean :reader slot-unique))) ;; Specializing (SETF SLOT-VALUE-USING-CLASS). (defmethod (setf slot-value-using-class) :around (new-value (class database-class) object slot-name) ;; In Lispworks, SLOT-DEF is just the slot-name, not the effective ;; slot definition (which it should be, according to AMOP). (if (and *log-slot-changes-p* (slot-boundp object 'database)) (let* ((slot-def (find slot-name (class-slots class) :key #'slot-definition-name))) (prog1 (call-next-method) ;; Register the change (when (slot-persistence slot-def) (save-slot-value object slot-name new-value (database-object-database object))))) (call-next-method))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 6. Interfacing to relational databases ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Logging changes can go a long way, but ... ... what if my data doesn't fit in memory? ... my data must be compatible with the rest of the world? Existing stuff: CommonSQL (Lispworks) CLSQL (Kevin Rosenberg) MaiSQL (Pierre Mai) SQL-ODBC (Paul Meurer) SQL interface by Franz Elephant more? - Functional and object-oriented mappings. - Need some way to describe the mapping between Lisp classes and database tables. - Simple datatypes (fixnums, floats, timestamps, booleans, strings) can use a simple mapping. - Homogeneous lists (containing objects of only one type) can map to two tables (one for the container objects and one for the contained objects). - Lisp doesn't have a (LIST-OF ...) type. - More complicated datatypes can map to strings or blobs (using PRINT/READ or serialize techniques). - Deleting objects - Transaction handling "WITH-TRANSACTION executes a body of code and then does a commit, unless the body failed in which case it does a rollback." ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 6. Interfacing to relational databases: a code fragment ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; From a conference registration system. ;; Library: Paul Meurer's SQL-ODBC extended with ;; an object-oriented interface (similar to CommonSQL) ;; by Arthur Lemmens. ;; Note the mapping of: ;; - Lisp slot names to database columns ;; - specification of key columns ;; - lists to database joins ;; (both one-to-many and one-to-one). (def-view-class conference () ((code :initarg :code :type (string 50) :reader code :db-kind :key :column "ConferenceCode") (title :initarg :title :type (string 100) :reader title) (date :initarg :date :type timestamp :reader conference-date) (registrations :reader registrations :db-kind :join :db-info (:join-class registration :home-key code :foreign-key conference-code :set t))) (:base-table Conferences)) (def-view-class registration () ((id :initarg :id :reader id :type integer :db-kind :key :column "RegistrationID" :db-type "LONG") (conference-code :column "Conference" :type (string 50)) (person-id :reader person-id :type integer :column "Person" :db-type "LONG") (person :db-kind :join :reader person :db-info (:join-class person :home-key person-id :foreign-key id :set nil)))) (def-view-class person () ((id :initarg :id :reader id :type integer :db-kind :key :column "PersonID" :db-type "LONG") (last-name :initarg :last-name :initform "" :type (string 50) :column "Name" :reader last-name)) (:base-table PERSONS)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 6. Interfacing to relational databases: code fragment 2 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Reading the value of a join slot. ;;; (defmethod slot-value-using-class :around ((class standard-db-class) object slot-def) #+lispworks ;; In Lispworks, SLOT-DEF is just the slot-name, not the effective ;; slot definition (which it should be, according to AMOP). (setq slot-def (find slot-def (class-slots class) :key #'slot-definition-name)) ;; When it's an unbound join slot, compute the value. (when (and (eq (db-slot-kind slot-def) :join) (not (slot-boundp object (slot-definition-name slot-def)))) (setf (slot-value object (slot-definition-name slot-def)) (load-join-slot object slot-def))) ;; (call-next-method)) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 6. Interfacing to relational databases: code fragment 3 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; Extending LOOP (or ITERATE) for queries. ;; From CommonSQL manual. (loop for (name salary) being each record in [select [ename] [sal] :from [emp]] initially (format t "~&~20A~10D" 'name 'salary) when (and salary (> salary 2750)) count salary into salaries and sum salary into total and do (format t "~&~20A~10D" name salary) else do (format t "~&~20A~10D" name "N/A") finally (format t "~2&Av Salary: ~10D" (/ total salaries))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 7. Full-blown persistence in Lisp ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Existing work: - PLOB (Persistent Lisp OBjects) Platforms: Windows/NT, Linux, Solaris, ... Author: Heiko Kirschke License: BSD-style, POSTORE backend from the University of St. Andrews has unclear license - AllegroStore Author: Franz Platforms: many platforms Uses ObjectStore as backend - Statice Platform: Open Genera See article by Ralf Moeller - PCLOS Author: Andreas Paepcke (at Hewlet Packard??) - Wood (William's Object Oriented Database) Platform: MCL Author: Bill St. Clair "A Wood file is called a persistent heap. A persistent heap has one distinguished root object. All objects in the heap must be accessible from the root. " - Elcaro (name?) License: will be Open Source (LLGPL) Author: Jans Aasman - more? MOP: also specialize SLOT-VALUE-USING-CLASS. Implementation: - do what database people do: give each object an ID, build indices (e.g. btrees) to map object id's to disk addresses - use serializer Interesting questions: caching, schema evolution, garbage collection, persistent lists/symbols,....