Ok, this is a question you must get a lot, but from my archive searches on
Google there aren't any answers that satisfy, or they raise more questions.
Reading and writing a file in Lisp (on Windows) I get right with :io and
:overwrite, but it seems limited to inserting new data, and I need to
change data (please don't tell me to write it to a temporary file and then
rename). I must add that I have only played with the string-type 'reads'
and 'writes'.
Maybe I am just showing my ignorance on the subject of I/O in general (it is
a dark area for me; I have been spoiled but also restricted by databases),
but surely if you can add you must be able to replace. I can imagine that if
Lisp only reads from disk at the time of a 'read', and only the requested
length (or does Lisp load the whole file into a memory stream on open and
then rewrite the whole file when you close it?), then replacing a data
segment with one of differing length might be a problem. But then what trick
are Lisp or the operating system employing to do the inserts that seem to be
possible (is it fragmenting the file?).
Lastly, writing to a temp file is not an option; the files are just too big.
I would really appreciate any advice or comments, or a reference to a site
or text.
"Harag" <·········@psychedelic.co.za> writes:
> Ok, this is a question you must get a lot, but from my archive searches
> on Google there aren't any answers that satisfy, or they raise more
> questions.
>
> Reading and writing a file in Lisp (on Windows) I get right with :io and
> :overwrite, but it seems limited to inserting new data, and I need to
> change data (please don't tell me to write it to a temporary file and
> then rename). I must add that I have only played with the string-type
> 'reads' and 'writes'.
>
> Maybe I am just showing my ignorance on the subject of I/O in general (it
> is a dark area for me; I have been spoiled but also restricted by
> databases), but surely if you can add you must be able to replace. I can
> imagine that if Lisp only reads from disk at the time of a 'read', and
> only the requested length (or does Lisp load the whole file into a memory
> stream on open and then rewrite the whole file when you close it?), then
> replacing a data segment with one of differing length might be a
> problem. But then what trick are Lisp or the operating system employing
> to do the inserts that seem to be possible (is it fragmenting the file?).
>
> Lastly, writing to a temp file is not an option; the files are just too
> big.
>
> I would really appreciate any advice or comments, or a reference to a
> site or text.
As previously indicated, you can use file-position to move inside the
file and overwrite parts of it.
Of course, it may pose problems if you must write bigger or smaller
data.
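For instance, here is a minimal sketch of in-place overwriting with
FILE-POSITION; the file name and contents are made up for illustration:

```lisp
;; Sketch: patch characters in the middle of a file with FILE-POSITION,
;; without rewriting the whole file.  First create a small demo file,
;; then overwrite five characters in place.
(with-open-file (s "demo.txt" :direction :output :if-exists :supersede)
  (write-string "0123456789abcdefghij" s))

(with-open-file (s "demo.txt" :direction :io
                              :if-exists :overwrite
                              :element-type 'character)
  (file-position s 10)           ; seek to the 11th character
  (write-sequence "ABCDE" s))    ; "abcde" becomes "ABCDE"

;; demo.txt now contains "0123456789ABCDEfghij"; its length is unchanged.
```

Note that on Windows text streams FILE-POSITION is not guaranteed to be a
byte offset once the file contains newlines; for byte-exact positioning,
open with :element-type '(unsigned-byte 8).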
Unix-like OSes provide only a stream-of-bytes metaphor for files.
Older OSes provided integrated access methods with various kinds of
indexes, fixed- or variable-length records, etc.
You can implement something similar on top of the stream of bytes.
For example, a file containing the following fixed-length (42 bytes)
space-filled lines:
+------------------------------------------+
| 1 Hao Wang, logicien americain. |
| 2 |
| 3 L'algorithme en question a été |
| 4 publié en 1960 dans l'IBM Journal, |
| 5 article intitule "Toward Mechanical |
| 6 Mathematics",avec des variantes et une |
| 7 extension au calcul des prédicats. |
| 8 Il s'agit ici du "premier |
| 9 programme" de Wang, systeme "P". |
+------------------------------------------+
could actually contain the following records:
header: | first-rec: 004 | last-rec: 010 | free-list: 005 | rec-size: 42 |
000: | next: 002 | prev: 004 | 2 |
001: | next: 009 | prev: NIL |..........................................|
002: | next: 003 | prev: 000 | 3 L'algorithme en question a été |
003: | next: 006 | prev: 002 | 4 publié en 1960 dans l'IBM Journal, |
004: | next: 000 | prev: NIL | 1 Hao Wang, logicien americain. |
005: | next: 001 | prev: NIL |..........................................|
006: | next: 011 | prev: 003 | 5 article intitule "Toward Mechanical |
007: | next: 008 | prev: 011 | 7 extension au calcul des prédicats. |
008: | next: 010 | prev: 007 | 8 Il s'agit ici du "premier |
009: | next: NIL | prev: NIL |..........................................|
010: | next: NIL | prev: 008 | 9 programme" de Wang, systeme "P". |
011: | next: 007 | prev: 006 | 6 Mathematics",avec des variantes et une |
Then it's really easy to delete a record or insert a new record: just
seek and rewrite the current, next, previous, and header records.
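Concretely, locating record N in such a file is simple arithmetic. The
header and link sizes below are assumptions for illustration, not the
exact layout above:

```lisp
;; Sketch of the seek arithmetic for fixed-size records: a 64-byte
;; header, then records made of 8 bytes of links (next + prev, 4 bytes
;; each) followed by a 42-byte space-padded payload.  All sizes here
;; are arbitrary choices for the example.
(defconstant +header-size+  64)
(defconstant +link-size+     8)
(defconstant +payload-size+ 42)
(defconstant +record-size+  (+ +link-size+ +payload-size+))

(defun record-offset (n)
  "Byte offset of record N in the file."
  (+ +header-size+ (* n +record-size+)))

(defun write-record-payload (stream n payload)
  "Overwrite the payload of record N in place, space-padding to 42 bytes.
STREAM must be open with :direction :io and :element-type (unsigned-byte 8)."
  (let ((bytes (map 'vector #'char-code (format nil "~42A" payload))))
    (file-position stream (+ (record-offset n) +link-size+))
    (write-sequence bytes stream)))
```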
--
__Pascal_Bourguignon__
http://www.informatimago.com/
> header: | first-rec: 004 | last-rec: 010 | free-list: 005 | rec-size: 42 |
> 000: | next: 002 | prev: 004 | 2 |
> 001: | next: 009 | prev: NIL |..........................................|
> 002: | next: 003 | prev: 000 | 3 L'algorithme en question a été |
> 003: | next: 006 | prev: 002 | 4 publié en 1960 dans l'IBM Journal, |
> 004: | next: 000 | prev: NIL | 1 Hao Wang, logicien americain. |
> 005: | next: 001 | prev: NIL |..........................................|
> 006: | next: 011 | prev: 003 | 5 article intitule "Toward Mechanical |
> 007: | next: 008 | prev: 011 | 7 extension au calcul des prédicats. |
> 008: | next: 010 | prev: 007 | 8 Il s'agit ici du "premier |
> 009: | next: NIL | prev: NIL |..........................................|
> 010: | next: NIL | prev: 008 | 9 programme" de Wang, systeme "P". |
> 011: | next: 007 | prev: 006 | 6 Mathematics",avec des variantes et une |
>
> Then it's really easy to delete a record or insert a new record: just
> seek and rewrite the current, next, previous, and header records.
Pascal, that's great, thanks. I have been looking at more complicated
formats like HDF5, but the sheer scope of implementing it had me backing
off. This will do nicely for the little project I am working on now; it's
even better than the XML hack I was attempting. Now I can concentrate on
getting the app finished and learning more about Lisp other than I/O stuff.
"Harag" <·········@psychedelic.co.za> writes:
> Ok, after complaining about complicated structures I went ahead and
> slapped this monster together to handle a mini db/tree with multiple
> levels.
>
> (I did not mash this because it's complicated enough; I just did 2 records.)
>
> 000: | next reference: NIL | prev ref: NIL | parent struc: NIL | prev ref: NIL | 1 |
> 001: | next keyword: NIL | prev ref: NIL | parent struc: 000 | prev keyword: NIL | identity |
> 002: | next summary: 003 | prev ref: NIL | parent struc: 000 | prev
>
> Do you think this can work, Pascal?
Now you're getting more specific.
The principle behind being able to delete or insert data "in the middle
of a file" is to cut the file into records of fixed size and implement a
higher-level data structure on this set of records. Some records may
be "deleted" (garbage collection may be implemented with a free list).
The next question is that of the access mode. For sequential access,
a simple linked or doubly-linked list is enough. If you need to
access records based on keys, you can add an index layer. Here, you
may consider using one of the db libraries provided on Unix (db, dbm,
gdbm), through UFFI. They essentially implement a hash table, a map (key
-> value).
You may consider each record payload as a row in a database. The
longer the record, the less impact the overhead of the record header
(previous/next pointers) has. But the longer the record, the more
space is lost when it's deleted and not reused, or when it's not fully
used (variable-length records may be implemented as fixed-length
records of maximum size containing a length field).
You could have in each one of the records all your fields:
| ref | keyword | author | auth-inits | book | date publ | summary-------- |
(showing only the payload; the previous-record and next-record fields
belong to the lower level).
Of course, as in the databases, you may want to define several
tables and do joins, but then you're reimplementing a database.
An intermediate technique is to use a record type field; it's like
having several tables stored in one file. If the record sizes are
vastly different, it's not economical space-wise (note the fillers).
| rec-type='author' | ref | author | auth-inits | filler |
| rec-type='book' | ref | book | date publ | filler |
| rec-type='summary' | ref | index | summary --- line --- |
The structure you're describing is a degenerate case of this, using
only one field per record.
If you use a dbm, of course you can use complex keys, like the
concatenation of major key, minor key, and record type, but you could
not do a "select" on part of a key returning several "rows", so you
would have to enumerate the keys to gather all the data. On the other
hand gdbm, unlike the old dbm, can handle any size of key or data, so
you could store big records with all the data related to an entity
(and then you'd have to manage the content of your records, inserting
and deleting fields, but you can do that in RAM).
Otherwise, what would be the reason not to use a real RDBMS? MySQL or
PostgreSQL are quite nice.
--
__Pascal_Bourguignon__
http://www.informatimago.com/
OK, that's a mouthful and it will take me a moment to work through it all
... thanks for the input, Pascal; it is really appreciated.
There are a couple of reasons for not using a DB like MySQL (which I use a
lot):
1. This is a Lisp learning exercise and I don't have all the paraphernalia
(packages/libs and stuff) to access a db from Lisp yet, so I thought it
would be a good time to learn this file stuff. I have been programming for
close on 10 years and have done no more with I/O than write out a log file
sequentially (and replace the whole thing next time). I was started in the
DB world and never could be bothered with the file stuff until recently
(see next point).
2. I have a little gripe about relational DBMSes as implemented currently
(if you're interested, I have summed it up, more or less, here:
http://www.psychedelic.co.za/newrelational/newrelational.htm), so I thought
I would try my hand at one or two concepts and see what it's all about.
I never had a tertiary education (or any education, for that matter) in
programming, so I ask a lot of stupid questions and reinvent the wheel a
lot when it comes to the basic/low-level stuff, but I'm learning more and
more every day, or at least I hope so. This is why I really appreciate your
input.
Harag writes:
> 1. This is a Lisp learning exercise and I don't have all the paraphernalia
It would be great if you could take this survey:
The Road to Lisp Survey
http://alu.cliki.net/The%20Road%20to%20Lisp%20Survey
Paolo
--
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
"Harag" <·········@psychedelic.co.za> writes:
> 2. I have a little gripe about relational DBMSes as implemented currently
> (if you're interested, I have summed it up, more or less, here:
> http://www.psychedelic.co.za/newrelational/newrelational.htm), so I thought
> I would try my hand at one or two concepts and see what it's all about.
It sounds like you're wanting an OODB, an object store.
There are a few hits on OODB and Lisp:
http://www.google.com/search?as_q=OODB+Lisp
--
__Pascal_Bourguignon__
http://www.informatimago.com/
> It sounds like you're wanting an OODB, an object store.
I don't know anything about OODBs ... is that what I'm attempting at
http://www.psychedelic.co.za/newrelational/attempt.htm (this is draft stuff
so don't laugh too much ;o})?
> If you use a dbm, of course you can use complex keys, like the
> concatenation of major key, minor key, and record type, but you could
> not do a "select" on part of a key returning several "rows", so you
> would have to enumerate the keys to gather all the data.
So what you are saying is that, as in my example where each field occupies
a row, using dbm I would not be able to do a type of select because a row
does not make up a whole record?
> On the other hand gdbm, unlike the old dbm, can handle any size of key
> or data, so you could store big records with all the data related to an
> entity (and then you'd have to manage the content of your records,
> inserting and deleting fields, but you can do that in RAM).
OK, what you are implying here is that dbm had limitations on the size of
key fields and gdbm does not. Thus I could use my structure and dump as
much metadata with each field as I would like? The select would still be
out of the question, though?
"Harag" <·········@psychedelic.co.za> writes:
> > If you use a dbm, of course you can use complex keys, like the
> > concatenation of major key, minor key, and record type, but you could
> > not do a "select" on part of a key returning several "rows", so you
> > would have to enumerate the keys to gather all the data.
>
> So what you are saying is that, as in my example where each field occupies
> a row, using dbm I would not be able to do a type of select because a row
> does not make up a whole record?
Indeed. The usage pattern of a dbm is that of the hash table, a mere map
(key -> value). In that case, what you want to do is identify your
"objects" with a key, and use it to find the object. You can skip
from one to the other, keeping the keys as references, or you can
enumerate the keys present in the db, as you do with a hash table.
But doing a "select", that is, filtering the objects on a given
criterion, is not done automatically; you have to read the keys and
values and select the ones you're interested in yourself.
> > On the other hand gdbm, unlike the old dbm, can handle any size of
> > key or data, so you could store big records with all the data related
> > to an entity (and then you'd have to manage the content of your
> > records, inserting and deleting fields, but you can do that in RAM).
>
> OK, what you are implying here is that dbm had limitations on the size of
> key fields and gdbm does not. Thus I could use my structure and dump as
> much metadata with each field as I would like? The select would still be
> out of the question, though?
Yes.
There is a technique to have several "indexes" and non-unique keys.
Keep in one db file a mapping (id -> object), and create as many index
files as you want; they are other db files containing a mapping (key
-> (list of ids)). Then, given a set of keys, you first search the
index db files, maybe intersect the resulting lists of ids, and
finally retrieve the objects from the data file.
Here is a prototype in Lisp (not including the I/O aspects) using hash
tables:
(setq data (make-hash-table))

(setf (gethash 1 data)
      '(:author "Wood, Derick" :year 1993
        :title "Data Structures, Algorithms, and Performance"
        :isbn "0-201-52148-2"))
(setf (gethash 2 data)
      '(:author "Celko, Joe" :year 2000
        :title "SQL for smarties: advanced SQL programming /2nd ed"
        :isbn "1-55860-576-2"))
(setf (gethash 3 data)
      '(:author "Celko, Joe" :year 1999
        :title "Data & Databases: conseps in practice"
        :isbn "1-55860-432-4"))
So you can retrieve an object using the identifier:
[14]> (gethash 2 data)
(:AUTHOR "Celko, Joe" :YEAR 2000 :TITLE
"SQL for smarties: advanced SQL programming /2nd ed" :ISBN "1-55860-576-2") ;
T
Now you can build indexes:
(defun make-index (db keysf &key (test (function eql)))
  "Returns a hash table mapping the keys returned by the KEYSF function
applied to the values of the DB hash table to the keys of the DB hash."
  (let ((index (make-hash-table :test test)))
    (maphash (lambda (id value)
               (mapc (lambda (key) (pushnew id (gethash key index)))
                     (funcall keysf value)))
             db)
    index))
[16]> (setq year-index (make-index data (lambda (obj) (list (getf obj :year)))))
#S(HASH-TABLE EQL (1993 1) (2000 2) (1999 3))

[29]> (setq title-keywords-index
        (make-index data
                    (lambda (obj)
                      (delete ""
                              (mapcar (lambda (word)
                                        (string-trim ",.:/&" word))
                                      (split-string (getf obj :title) " "))
                              :test (function string=)))
                    :test (function equalp)))
#S(HASH-TABLE EQUALP ("Performance" 1) ("and" 1) ("Algorithms" 1)
   ("Structures" 1) ("ed" 2) ("2nd" 2) ("programming" 2) ("advanced" 2)
   ("smarties" 2) ("for" 2) ("SQL" 2) ("practice" 3) ("in" 3)
   ("conseps" 3) ("Databases" 3) ("Data" 1 3))
(Note the two IDs for "Data".) The same for the author:
[30]> (setq author-keywords-index
        (make-index data
                    (lambda (obj)
                      (delete ""
                              (mapcar (lambda (word)
                                        (string-trim ",.:/&" word))
                                      (split-string (getf obj :author) " "))
                              :test (function string=)))
                    :test (function equalp)))
#S(HASH-TABLE EQUALP ("Derick" 1) ("Wood" 1) ("Joe" 2 3) ("Celko" 2 3))
#S(HASH-TABLE EQUALP ("Derick" 1) ("Wood" 1) ("Joe" 2 3) ("Celko" 2 3))
(SPLIT-STRING is not CL; see emacs' split-string for an example
implementation.)
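A minimal sketch of such a SPLIT-STRING, splitting on any character of a
separator string (not the full emacs semantics; empty substrings are kept,
which is fine here since the caller DELETEs ""):

```lisp
(defun split-string (string &optional (separator " "))
  "Split STRING on any character appearing in SEPARATOR.
Returns a list of substrings; empty substrings are included."
  (loop :with result := '()
        :with start := 0
        :for i :from 0 :below (length string)
        :when (find (char string i) separator)
          :do (push (subseq string start i) result)
              (setf start (1+ i))
        :finally (push (subseq string start) result)
                 (return (nreverse result))))
```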
Then you can find your objects by different indexes:
[33]> (gethash "Celko" author-keywords-index)
(2 3) ;
T
[34]> (gethash 1999 year-index)
(3) ;
T
[38]> (intersection (gethash "Celko" author-keywords-index)
(gethash 1999 year-index))
(3)
[39]> (mapcar (lambda (id) (gethash id data))
(gethash "Celko" author-keywords-index))
((:AUTHOR "Celko, Joe" :YEAR 2000
:TITLE "SQL for smarties: advanced SQL programming /2nd ed"
:ISBN "1-55860-576-2")
(:AUTHOR "Celko, Joe" :YEAR 1999
:TITLE "Data & Databases: conseps in practice"
:ISBN "1-55860-432-4"))
[40]> (mapcar (lambda (id) (gethash id data))
(gethash "Data" title-keywords-index))
((:AUTHOR "Wood, Derick" :YEAR 1993
:TITLE "Data Structures, Algorithms, and Performance"
:ISBN "0-201-52148-2")
(:AUTHOR "Celko, Joe" :YEAR 1999
:TITLE "Data & Databases: conseps in practice"
:ISBN "1-55860-432-4"))
Since you're interested in the I/O aspects in Lisp, you could
implement an equivalent of gdbm in Lisp. It could have the same
high-level kind of API as the hash tables.
Read the chapter about the Streams
http://www.lisp.org/HyperSpec/Body/chap-21.html
(read-sequence, write-sequence, file-position)
[ use `(array unsigned-byte (,blocksize)) as buffer ].
Add a layer to allocate, read, write, and free blocks.
On this layer, implement the storage of records (see PRINT-OBJECT;
some types of Lisp data may be hard or impossible to store or read
back automatically).
If you want to allow variable-sized keys and values like gdbm, you
should take care in the allocation of the records and their references,
since this would break the 1-1 mapping between the records and the
blocks. This means that you may need a map (record-number ->
(block-number ...)) at this layer.
And finally, add a layer for the key -> value mapping. That could be
done by merely storing the hash table mapping (key-record-number ->
value-record-number); then you would load the whole hash table into
Lisp core to access the file, or you may prefer to keep the mapping in
the file.
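The block layer at the bottom of that stack could start out as small as
this (a sketch; the block size is an arbitrary assumption):

```lisp
;; Sketch of the lowest layer: fixed-size blocks addressed by number,
;; read and written with READ-SEQUENCE / WRITE-SEQUENCE.  The block
;; size is an arbitrary choice for illustration.
(defconstant +block-size+ 512)

(defun read-block (stream block-number)
  "Return the contents of block BLOCK-NUMBER as a byte vector."
  (let ((buffer (make-array +block-size+
                            :element-type '(unsigned-byte 8))))
    (file-position stream (* block-number +block-size+))
    (read-sequence buffer stream)
    buffer))

(defun write-block (stream block-number buffer)
  "Overwrite block BLOCK-NUMBER with BUFFER, a +BLOCK-SIZE+ byte vector."
  (file-position stream (* block-number +block-size+))
  (write-sequence buffer stream))
```

These assume the file is open with :direction :io and
:element-type '(unsigned-byte 8); the allocate/free bookkeeping would sit
on top.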
--
__Pascal_Bourguignon__
http://www.informatimago.com/
> Since you're interested in the I/O aspects in Lisp, you could
> implement an equivalent of gdbm in Lisp. It could have the same
> high-level kind of API as the hash tables.
OK, maybe that's an idea. I'll put some stuff together and slap it on my
website and then only put up change notices on this newsgroup, so you or
anybody else (if anybody is interested) can follow the progress ... else I
might just get accused of abusing this newsgroup ... (lol)
Well, Pascal, the hack now exists (very primitive and needs more testing)
... associative-gdbm or something ;o}
... in true tradition (actually laziness) the documentation is a bit on the
thin side ... to follow one day:
http://www.psychedelic.co.za/newrelational/lacsap.htm
"Harag" <·········@psychedelic.co.za> writes:
> Well, Pascal, the hack now exists (very primitive and needs more testing)
> ... associative-gdbm or something ;o}
>
> ... in true tradition (actually laziness) the documentation is a bit on
> the thin side ... to follow one day:
>
> http://www.psychedelic.co.za/newrelational/lacsap.htm
Indeed. You should either use more explicit names than:
(defvar *lid* 1)
(defvar *lrid* 1)
or add documentation strings:
(defvar *lid* 1 "Largest Integer Designating.") ;; for example...
(defvar *lrid* 1 "Lowest Rotund Ignominious Decimal.")
The same for the functions: instead of writing a comment _before_ (I
hate this), put the documentation string conveniently placed just
_after_ the defun signature:
(defun print-vpos (vpos stream)
  "This struct is used to store details about an entity's value's
physical position."
  (format stream "(~A ~A \"~A\")"
          ...))

instead of:

;; This struct is used to store details about an entity's value's
;; physical position.
(defun print-vpos (vpos stream)
  (format stream "(~A ~A \"~A\")"
          ...))
Oops! That comment did not correspond to the adjacent object!
DEFSTRUCT accepts a documentation string too.
I would rather write:
(equal parent nil) ---> (null parent)
(equal rid 0)      ---> (= 0 rid) or: (zerop rid)
(defun make-tindex (e-db)
  (let ((index (make-hash-table :test (function string=))))
This is not Common-Lisp:
[28]> (load "lacsap")
*** - MAKE-HASH-TABLE: illegal :TEST argument #<SYSTEM-FUNCTION STRING=>
Use (function equal) instead, in hash tables.
So, you're building an index named "entity.db" of a data file named
"value.db". Unfortunately, since you're not using fixed size records
in the data file, you're needing functions such as:
(defun defrag-e-values (db newpath)
  (maphash (lambda (key entity)
             (set-e-val entity (get-e-sval entity) newpath))
           ;; OOPS! set-e-val is not defined; it's set-e-value
           db)
  newpath)
You've got a funny little toy here. But what are you trying
to accomplish? This looks a little unfocused.
It looks like you're trying to manipulate in-memory a data structure
that would be implemented only on disk.
In memory, we don't usually refer to objects with reference numbers
through hash tables. (At worst, we could do it with indices into an array.)
If you want to have "entity" objects: when you create a new entity,
you get it, and when you insert the entity into a "database", you still
keep the entity to manipulate it further. It's much easier to do so in
memory than to have to retrieve the object from the "database"
every time.
Then, the mapping between the objects and their reference number is
done only by the database access layer.
(setf pid (add-e (create-entity "author") *db*))
(setf id (add-e (create-entity "surname" :parent (gethash pid *db*)) *db*))
(set-e-value (gethash id *db*) "Celcko")
(setf id (add-e (create-entity "names" :parent (gethash pid *db*)) *db*))
(set-e-value (gethash id *db*) "Joe")
(setf id (add-e (create-entity "book" :parent (gethash pid *db*)) *db*))
(set-e-value (gethash id *db*) "SQL for smarties: advanced SQL programming /2nd ed")
(setf id (add-e (create-entity "year" :parent (gethash id *db*)) *db*))
(set-e-value (gethash id *db*) "2000")
(setf id (add-e (create-entity "book" :parent (gethash pid *db*)) *db*))
(set-e-value (gethash id *db*) "Data & Databases: conseps in practice")
(setf id (add-e (create-entity "year" :parent (gethash id *db*)) *db*))
(set-e-value (gethash id *db*) "1999")
could be rewritten as:
(let ((author (create-entity "author")))
  (add-e author *db*)
  (let ((surname (create-entity "surname" :parent author)))
    (add-e surname *db*)
    (set-e-value surname "Celcko"))
  (let ((name (create-entity "name" :parent author)))
    (add-e name *db*)
    (set-e-value name "Joe"))
  (let ((book (create-entity "book" :parent author)))
    (add-e book *db*)
    (set-e-value book "SQL for smarties: advanced SQL programming /2nd ed")
    (let ((year (create-entity "year" :parent book)))
      (add-e year *db*)
      (set-e-value year "2000")))
  (let ((book (create-entity "book" :parent author)))
    (add-e book *db*)
    (set-e-value book "Data & Databases: conseps in practice")
    (let ((year (create-entity "year" :parent book)))
      (add-e year *db*)
      (set-e-value year "1999"))))
Then you could simplify it in two directions, depending on your
needs. Either note the repetition of the same form:

(let ((ENTITY (create-entity ENTITY-NAME)))
  (add-e ENTITY *db*)
  (OTHER-STUFF-WITH-ENTITY-AS-PARENT))
which leads to:
(defmacro with-gensyms (syms &body body)
  "Replaces given symbols with gensyms.  Useful for creating macros.
This version by Paul Graham in On Lisp."
  `(let ,(mapcar #'(lambda (s) `(,s (gensym))) syms) ,@body))

(defvar *with-entity-parent* nil)

(defmacro with-entity ((entity-name &optional entity-value) &body body)
  (with-gensyms (entity)
    `(let ((,entity (create-entity ,entity-name
                                   :parent *with-entity-parent*)))
       (add-e ,entity *db*)
       ,@(when entity-value (list `(set-e-value ,entity ,entity-value)))
       (let ((*with-entity-parent* ,entity))
         ,@body))))
and then:
(with-entity ("author")
  (with-entity ("surname" "Wood"))
  (with-entity ("name" "Derick"))
  (with-entity ("book" "SQL for smarties: advanced SQL programming /2nd ed")
    (with-entity ("year" "2000")))
  (with-entity ("book" "Data & Databases: conseps in practice")
    (with-entity ("year" "1999"))))
But you could just as well define a function that parses a tree
describing your entity:

(make-entity-from-description
 '(:author nil
   (:surname "Wood")
   (:name "Derick")
   (:book "SQL for smarties: advanced SQL programming /2nd ed"
    (:year "2000"))
   (:book "Data & Databases: conseps in practice"
    (:year "1999"))))
For example:
(defun make-entity-from-description (description &optional parent)
  (let ((entity (create-entity (string (first description)) :parent parent)))
    (add-e entity *db*)
    (when (second description) (set-e-value entity (second description)))
    (map nil (lambda (subentity)
               (make-entity-from-description subentity entity))
         (cddr description))
    entity))
You should read Peter Seibel's book, in particular this chapter:
http://www.gigamonkeys.com/book/practical-a-simple-database.html
--
__Pascal_Bourguignon__
http://www.informatimago.com/
>
> So, you're building an index named "entity.db" of a data file named
> "value.db". Unfortunately, since you're not using fixed-size records
> in the data file, you're needing functions such as:
>
> (defun defrag-e-values (db newpath)
>   (maphash (lambda (key entity)
>              (set-e-val entity (get-e-sval entity) newpath))
>            ;; OOPS! set-e-val is not defined; it's set-e-value
>            db)
>   newpath)
My apologies here; I was renaming functions and did not do enough testing
before "publishing" ...
>
>
> You've got a funny little toy here. But what are you trying
> to accomplish? This looks a little unfocused.
>
> It looks like you're trying to manipulate in-memory a data structure
> that would be implemented only on disk.
Yes, any "reasonable"-size DB in this format will have to be implemented on
disk, with only sub-"trees" in memory to manipulate. For now I was just
using the hash table to try out some ideas.
Maybe a couple of examples of what I'm aiming at in the end will help to
explain where I'm trying to go.
A typical personell record:
Personnel Table:
Personnel Number : 001
Surname : Snot
Name : Piet
Initials : P
Sex : Male
Personnel Contacts:
Personnel Number : 001
Type : Home
Phone : 123456789
Personnel Number : 001
Type : Mobile
Phone : 987654321
Personnel Dependants:
Personnel Number : 001
Relation : Spouse
Surname : Snot
Name : Sannie
Initials : S
Sex : Female
Personnel Number : 001
Relation : Child
Surname : Snot
Name : Pietie
Initials : P
Sex : Male
Personnel Salary:
Personnel Number : 001
Year : 2000
Annual : $10
Personnel Number : 001
Year : 2001
Annual : $9
Personnel Number : 001
Year : 2002
Annual : $8 (he lives in a third-world country ;o})
Personnel Table:
Personnel Number : 002
Surname : Snot
Name : Sannie
Initials : S
Sex : Female
Personnel Contacts:
Personnel Number : 002
Type : Home
Phone : 123456789
Personnel Number : 002
Type : Mobile
Phone : 987654321
Personnel Dependants:
Personnel Number : 002
Relation : Spouse
Surname : Snot
Name : Piet
Initials : P
Sex : Male
Personnel Number : 002
Relation : Child
Surname : Snot
Name : Pietie
Initials : P
Sex : Male
----OR-----
Personnel:
Number : 001
Surname : Snot
Name : Piet
Initials : P
Sex : Male
Contacts:
Home Phone : 123456789
Mobile : 987654321
Dependants:
Relation : Spouse
Surname : Snot
Name : Sannie
Initials : S
Sex : Female
Relation : Child
Surname : Snot
Name : Pietie
Initials : P
Sex : Male
Personnel Salary:
Year : 2000
Annual : $10
Year : 2001
Annual : $9
Year : 2002
Annual : $8
Personnel:
Number : 002
Surname : Snot
Name : Sannie
Initials : S
Sex : Female
Contacts:
Home Phone : 123456789
Mobile : 987654321
Dependants:
Relation : Spouse
Surname : Snot
Name : Piet
Initials : P
Sex : Male
Relation : Child
Surname : Snot
Name : Pietie
Initials : P
Sex : Male
---OR--- (and this is what I am aiming at)
EID = ENTITY (INSTANCE) ID
PID = PARENT ENTITY ID
Personnel: (EID 1 PID 0)
  Number : 001 (EID 2 PID 1)
  Person (EID 3 PID 1)
    Surname : Snot (EID 4 PID 3)
    Name : Piet (EID 5 PID 3)
    Initials : P (EID 6 PID 3)
    Sex : Male (EID 7 PID 3)
  Contacts (EID 8 PID 1)
    Home Phone : 123456789 (EID 9 PID 8)
    Mobile : 987654321 (EID 10 PID 8)
  Dependants (EID 11 PID 1)
    Relation : Spouse (EID 12 PID 11)
      Person (EID 13 PID 12)
        Surname : Snot (EID 14 PID 13)
        Name : Sannie (EID 15 PID 13)
        Initials : S (EID 16 PID 13)
        Sex : Female (EID 17 PID 13)
    Relation : Child (EID 18 PID 11)
      Person (EID 19 PID 18)
        Surname : Snot (EID 20 PID 19)
        Name : Pietie (EID 21 PID 19)
        Initials : P (EID 22 PID 19)
        Sex : Male (EID 23 PID 19)
  Personnel Salary (EID 24 PID 1)
    Year : 2000 (EID 25 PID 24)
      Annual : $10 (EID 26 PID 25)
    Year : 2001 (EID 27 PID 24)
      Annual : $9 (EID 28 PID 27)
    Year : 2002 (EID 29 PID 24)
      Annual : $8 (EID 30 PID 29)

Personnel (EID 31 PID 0)
  Number : (EID 32 PID 31)
  Person (EID 13 PID 31)
    Surname : Snot (EID 14 PID 13)
    Name : Sannie (EID 15 PID 13)
    Initials : S (EID 16 PID 13)
    Sex : Female (EID 17 PID 13)
  Contacts: (EID 8 PID 31)
    Home Phone : 123456789 (EID 9 PID 8)
    Mobile : 987654321 (EID 10 PID 8)
  Dependants: (EID 33 PID 31)
    Relation : Spouse (EID 34 PID 33)
      Person (EID 3 PID 34)
        Surname : Snot (EID 4 PID 3)
        Name : Piet (EID 5 PID 3)
        Initials : P (EID 6 PID 3)
        Sex : Male (EID 7 PID 3)
    Relation : Child (EID 35 PID 33)
      Person (EID 19 PID 35)
        Surname : Snot (EID 20 PID 19)
        Name : Pietie (EID 21 PID 19)
        Initials : P (EID 22 PID 19)
        Sex : Male (EID 23 PID 19)
Note that the indentation indicates the RELATIONSHIP between the various
data pieces, or ENTITIES.
What do I get for all this trouble?
1. Although I show the entities' data more than once, the actual data will
only be stored once for each EID.
2. I'm hoping to set up heuristics that USE the EID, PID, and ENTITY TYPE
(like Person) to infer new "data / information", like:
   PIET and SANNIE are a COUPLE
   PIET and SANNIE are PIETIE's parents
   etc.
*NOTE* This heuristics bit is still very vague in my head!!
3. If I get an EID I can roll up and down its structure without knowing
about tables and their field structures and their relationships up front. I
find this structure much more intuitive to work with, personally, and I
want to give it a go EVEN though I suspect that speed/processing time will
be slower/more intensive.
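For what it's worth, rolling up and down such an EID/PID structure can be
sketched with two small helpers over a hash table; each EID maps here to a
list (PID NAME VALUE), which is an assumed layout for illustration, not
the actual file format:

```lisp
;; Sketch: navigating an EID/PID tree held in a hash table.
;; Each entry maps an EID to a list (PID NAME VALUE); PID 0 means root.
(defun children-of (eid db)
  "Return the EIDs whose parent is EID (in unspecified order)."
  (let ((kids '()))
    (maphash (lambda (id entry)
               (when (eql (first entry) eid)
                 (push id kids)))
             db)
    (nreverse kids)))

(defun path-to-root (eid db)
  "Return the list of EIDs from EID up to the root."
  (loop :for id := eid :then (first (gethash id db))
        :while (and id (gethash id db))
        :collect id))
```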
>
> In memory, we don't usually refer to objects with reference numbers
> through hash tables. (At worst, we could do it with indices into an array.)
The reason I'm sticking to the hash table is that after some use there
might be huge gaps in my numbering ranges (EIDs), and then they would not
map naturally to an array.
>
> If you want to have "entity" objects: when you create a new entity,
> you get it, and when you insert the entity into a "database", you still
> keep the entity to manipulate it further. It's much easier to do so in
> memory than to have to retrieve the object from the "database"
> every time.
>
Yes, I agree it needs work, and I will come back to you on this issue once
I fully understand the path to the macro at the end of your code ;o} ...
(after 2 weeks of Lisp, Harag has still not tried his hand at a macro).
Thanks for the link to Peter Seibel's chapter. I also started out with
property lists but changed to structures, for some forgotten reason,
somewhere in the "growth" (can't call it development) of the code.
There are still a lot of areas that need fine scrutiny, but I am trying my
hand at Paul Graham's idea of using Lisp as a specification tool.
"Pascal Bourguignon" <····@thalassa.informatimago.com> wrote in message
···················@thalassa.informatimago.com...
> "Harag" <·········@psychedelic.co.za> writes:
>
> > > So, you're building an index named "entity.db" of a data file named
> > > "value.db". Unfortunately, since you're not using fixed size records
> > > in the data file, you're needing functions such as:
> > >
> > > (defun defrag-e-values (db newpath)
> > > (maphash (lambda (key entity)
> > > (set-e-val entity (get-e-sval entity) newpath))
> > > ;;; OOPS! set-e-val is not defined, it's set-e-value
> > > db)
> > > newpath)
> >
> > My apologies here. I was renaming functions and did not do enough
> > testing before "publishing"...
>
> Never mind. The point was about the record size. With fixed record
> sizes, you would not need so much to "defrag" the file, because you
> could easily reuse empty ("deleted") records.
No argument there, but if I remember correctly you said gdbm had variable
length keys and values, not that I have ever seen gdbm or used it, so I was
kind of aiming for something like that.
> > [...]
> > 1. Although I show the entities' data more than once, the actual data
> > will only be stored once for the EID.
>
> Usually, entities are considered as a whole. Some function may access
> only some fields of the entities it processes, but most often we need
> to handle all the attributes of an entity at once. So, unless you
> have some very big fields (let's say: a hires picture of the person,
> recordings of his speeches, etc, in which case we would rather
> consider the big items as individual entities related to the
> person), it would be much more efficient to keep all the fields of an
> entity in the same record and load and save them at once.
>
> Which does not prevent you from adding relations, be they hierarchical or
> not, but not necessarily embedded. It may be worthwhile to do it in
> separate files ("tables").
>
All true in a simple static world.
Consider the following scenario (I'm going for the abstract here because I
can't think of a "real world" scenario right now).
A
  B
  C
    K
    L
    M
E
  F
  D
  G
The resulting records (in their own tables) would be:
A-BC
C-KLM
E-FDG
***The first letter being the key and the rest the attributes.
If I decided to change the "structure" after some time, or for some
obscure reason, to the following
A
  B
  L
  M
  C
    K
E
  F
  D
  G
or had to accommodate new attributes like this
A
  B
  C
    K
    L
    M
    X
    Y
E
  F
  D
  Z
  G
then I think my physical schema should work better.
------------
What am I getting at:
1. Programming and data structure flexibility
In my schema I would just change a relationship or add a
relationship and not have to bother about the physical storage at all,
whereas for records I would have to go and move data about.
2. Catering for temporal and/or varying data
I have no argument with the fact that in my schema I pay in
performance for flexibility. Whether the flexibility is really needed is
arguable. What comes to mind is having to pivot records (I'm thinking
SQL syntax here not programmatically) to work with a "whole entity" because
the "attributes" are variable or optional. Surely in those cases at least,
"records" and "tables" hinder more than they help.
A typical "real life" scenario would be a payroll. You have many
small "entities" like salary and other pay codes (on average between 100 and
250 numerical entities alone) that should actually be attributes of a
personnel member, but because they vary in number (and sometimes shape)
they cannot or should not be made attributes. Each little entity being its
own record now kind of levels the playing field when it comes to storing the
physical data, does it not?
The other problem in this scenario is that a lot of the data in a
payroll is temporal (is that the right word?) in nature. An example would be
the hours worked for February by a contract worker: very important this
month, but they won't feature in calculations next month. This type of
attribute/entity you would not want to tie down in the personnel record even
if you could, because it would be a waste. Further, you can't abandon the
February hours completely either, because of reporting issues, so next month
you have to create a new record for the hours, and so on.
------------
I don't know if all that made any sense to you at all, a lot of it is gut
feel, having had to work on such databases for some years ;o}
"Harag" <·········@psychedelic.co.za> writes:
> No argument there, but if I remember correctly you said gdbm had variable
> length keys and values, not that I have ever seen gdbm or used it, so I was
> kind of aiming for something like that.
Yes, but the variable-sized is managed inside fixed-sized.
Look for example at how malloc is usually implemented, with "buckets",
arrays of blocks of the same size. When you want to allocate x bytes, you
get the address of a memory block that is actually 2^int(1+log2(x-1))
bytes long (when x is small enough; when it's much greater than a page size,
it's simply allocated rounding up to the next page border).
Actually, it's a compromise between managing each block sized to the
byte and having only one-size-fits-all. If you allocate exactly the
number of bytes needed, when it's freed, it may be hard to find a
second request for the exact same number of bytes. That is, given a
request for y bytes, it's harder to find a free block with a matching
size, and to find an algorithm that will minimize the lost space.
When you have one size for all, of course you lose a lot more on each
request, with the gain that the algorithm is utterly simple: just give
the first free block (but of course, then there is a maximum over
which you can't allocate).
So you have to choose between the complexity of managing the free
space and the space lost on average on each allocation. Of course, if
you don't delete items (or rarely), you can forget the freed space,
and count on a defragment feature to clean up the garbage. It all
depends on what your requirements are.
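To make the rounding concrete, here is that bucket-size arithmetic as a
tiny sketch (just the size computation, not an allocator):

```lisp
(defun bucket-size (x)
  "Smallest power of two >= X, i.e. the bucket a request for X bytes
lands in. Sketch of the rounding arithmetic only, not a real allocator."
  (if (<= x 1)
      1
      ;; INTEGER-LENGTH of (1- x) is the number of bits needed,
      ;; which gives the exponent of the next power of two >= x.
      (expt 2 (integer-length (1- x)))))

;; (bucket-size 5) => 8, (bucket-size 8) => 8, (bucket-size 9) => 16
```

A request for 5 bytes wastes 3; in exchange, freeing and reusing blocks
only ever has to match a handful of distinct sizes.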
--
__Pascal_Bourguignon__
http://www.informatimago.com/
"Pascal Bourguignon" <····@thalassa.informatimago.com> wrote in message
···················@thalassa.informatimago.com...
>
> "Harag" <·········@psychedelic.co.za> writes:
> > No argument there, but if I remember correctly you said gdbm had variable
> > length keys and values, not that I have ever seen gdbm or used it, so I
> > was kind of aiming for something like that.
>
> Yes, but the variable-sized is managed inside fixed-sized.
>
Right, I'm with you there. In the "record" scenario, are you storing the data
in adjacent blocks or is there some "trick" to have it spread?
"Harag" <·········@psychedelic.co.za> writes:
> "Pascal Bourguignon" <····@thalassa.informatimago.com> wrote in message
> ···················@thalassa.informatimago.com...
> >
> > "Harag" <·········@psychedelic.co.za> writes:
> > > No argument there, but if I remember correctly you said gdbm had variable
> > > length keys and values, not that I have ever seen gdbm or used it, so I
> > > was kind of aiming for something like that.
> >
> > Yes, but the variable-sized is managed inside fixed-sized.
> >
>
> Right, I'm with you there. In the "record" scenario, are you storing the data
> in adjacent blocks or is there some "trick" to have it spread?
Of course, it works better in the cases where there is an upper bound
on the size of your variable-sized stuff, and when most of your stuff
is about the same size. In these conditions you can easily set one
fixed record size and not lose too much when the record is not
completely filled.
Otherwise, either you segment your data into subrecords, if that
would help the upper bound and the deviation of the sizes, or you have
to manage the storage yourself, and you can effectively spread big data
over several blocks, not necessarily adjacent, like it's done for files
in filesystems.
--
__Pascal_Bourguignon__
http://www.informatimago.com/
On 02 Nov 2003 17:06:26 +0100, Pascal Bourguignon wrote:
> access records based on keys, you can add an index layer. Here, you
> may consider using one of the db libraries provided in unix (db, dbm,
> gdbm), thru UFFI. They essentially implement a hash table, a map (key
> -> value).
FWIW, I have a Berkeley DB interface at
http://users.actrix.co.nz/mycroft/cl.html
--
Cogito ergo I'm right and you're wrong. -- Blair Houghton
(setq reply-to
(concatenate 'string "Paul Foley " "<mycroft" '(··@) "actrix.gen.nz>"))
Paul Foley <···@below.invalid> writes:
> On 02 Nov 2003 17:06:26 +0100, Pascal Bourguignon wrote:
>
> > access records based on keys, you can add an index layer. Here, you
> > may consider using one of the db libraries provided in unix (db, dbm,
> > gdbm), thru UFFI. They essentially implement a hash table, a map (key
> > -> value).
>
> FWIW, I have a Berkeley DB interface at
> http://users.actrix.co.nz/mycroft/cl.html
Yep. I should have referred the OP to http://www.cliki.org/
where it's referenced.
--
__Pascal_Bourguignon__
http://www.informatimago.com/
Hi Harag,
> Ok this is a question you must get a lot, but from my archive searches on
> google there aren't any answers that satisfy, or they cause more
> questions.
>
> Reading and writing to a file in lisp (on windows) I get right with :io
> and :overwrite but it seems to be limited to inserting new data and I
> need to change data (please don't tell me to write it to a temporary file
> and then rename). I must add that I have only played with the string
> type 'reads' and 'writes'.
Sounds like you should be using :if-exists :supersede instead of
:if-exists :overwrite. Check out the documentation for OPEN in the
HyperSpec.
Regards,
Adam
That's where I started, but :supersede destroys the data; see below:
HyperSpec === The existing file is superseded; that is, a new file with the
same name as the old one is created. If possible, the implementation should
not destroy the old file until the new stream is closed.
CL-USER 45 : 1 > (setf path (make-pathname :name "eish2.iesh"))
#P"eish2.iesh"
CL-USER 46 : 1 > (setf str (open path :direction :io :if-exists :supersede))
#<STREAM::LATIN-1-FILE-STREAM C:\Documents and Settings\Phil\eish2.iesh>
CL-USER 47 : 1 > (read-line str)
Error: End of file while reading stream #<STREAM::LATIN-1-FILE-STREAM
C:\Documents and Settings\Phil\eish2.iesh>.
1 (abort) Return to level 1.
CL-USER 48 : 2 > (close str)
T
When I check my file after this it's empty. Am I doing something stupid??
"Adam Warner" <······@consulting.net.nz> wrote in message
··································@consulting.net.nz...
> Hi Harag,
>
> > Ok this is a question you must get a lot, but from my archive searches on
> > google there aren't any answers that satisfy, or they cause more
> > questions.
> >
> > Reading and writing to a file in lisp (on windows) I get right with :io
> > and :overwrite but it seems to be limited to inserting new data and I
> > need to change data (please don't tell me to write it to a temporary file
> > and then rename). I must add that I have only played with the string
> > type 'reads' and 'writes'.
>
> Sounds like you should be using :if-exists :supersede instead of
> :if-exists :overwrite. Check out the documentation for OPEN in the
> HyperSpec.
>
> Regards,
> Adam
Hi Harag,
> That's where I started, but supersede destroys the data; see below:
Yes :supersede will replace the original file. So I misunderstood your
intentions.
> Hyperspec === The existing file is superseded; that is, a new file with the
> same name as the old one is created. If possible, the implementation should
> not destroy the old file until the new stream is closed.
>
> CL-USER 45 : 1 > (setf path (make-pathname :name "eish2.iesh"))
> #P"eish2.iesh"
>
> CL-USER 46 : 1 > (setf str (open path :direction :io :if-exists :supersede))
> #<STREAM::LATIN-1-FILE-STREAM C:\Documents and Settings\Phil\eish2.iesh>
>
> CL-USER 47 : 1 > (read-line str)
> Error: End of file while reading stream #<STREAM::LATIN-1-FILE-STREAM
> C:\Documents and Settings\Phil\eish2.iesh>.
> 1 (abort) Return to level 1.
>
> CL-USER 48 : 2 > (close str)
> T
>
> When I check my file after this its empty. Am I doing something stupid??
(Top posting)
I think I understand the problem, and no, you will not be able to insert
data. Let's create a file to operate upon:
(with-open-file (stream "test" :direction :output :if-exists :supersede)
(format stream "~S ~S ~S~%" 'these 'are 'symbols))
The file "test" contains these characters:
these are symbols
Now let's modify test, first using overwrite:
(with-open-file (stream "test" :direction :io :if-exists :overwrite)
(prin1 'a-new-symbol stream))
This works as expected, overwriting the start of the file:
a-new-symbolmbols
Then append:
(with-open-file (stream "test" :direction :io :if-exists :append)
(prin1 'a-new-symbol stream))
Again we have the expected result:
a-new-symbolmbols
a-new-symbol
Let's try some bidirectional operations on the original file:
(with-open-file (stream "test" :direction :io :if-exists :overwrite)
(print (read stream)) (prin1 'insert stream))
The symbol `these' was read and then the text `insert' written to
the stream.
The resulting output is(!!!):
these are symbols
insert
Which seems very odd to me as the file pointer should have been
positioned after `these'. Everyone, are CMUCL (and SBCL) broken?
Here's the expected result that I get with CLISP:
these insertmbols
There appears to be no way (at least portably) to insert data into the
stream. If you want to do that it appears you will have to have one file
that you :input from and one file that you :output to.
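For completeness, here is roughly what that two-file version could look
like (a sketch only; INSERT-INTO-FILE is a made-up name, and it streams
character by character, so the whole file never has to fit in memory even
though your files are big):

```lisp
(defun insert-into-file (in-path out-path position text)
  "Copy IN-PATH to OUT-PATH, inserting TEXT at character POSITION.
Sketch of the two-stream approach; no error handling, character
streams only."
  (with-open-file (in in-path :direction :input)
    (with-open-file (out out-path :direction :output
                                  :if-exists :supersede)
      ;; Copy the first POSITION characters unchanged.
      (loop repeat position
            for ch = (read-char in nil nil)
            while ch do (write-char ch out))
      ;; Splice in the new text.
      (write-string text out)
      ;; Copy the remainder of the file.
      (loop for ch = (read-char in nil nil)
            while ch do (write-char ch out)))))
```

I realise you asked not to be told to use a temporary file, but as far as
I can tell this is the only portable way to grow a file in the middle.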
Regards,
Adam
Thanks for your help Adam.
The result in LispWorks is the same as in CLISP. My original file had white
spaces and newlines in it that were hiding the overwriting; doing a simpler
example, converting your example to read-line and write-line, showed up my
error. The other thing to notice/remember is that when it overwrites,
'newline' counts for 2 characters ...;o}
Is there any way to reset the position of the 'read' on the stream? I see
you can check the position with file-position.
Regards
Harag
"Harag" <·········@psychedelic.co.za> writes:
> Is there any way to reset the position of the 'read' on the stream? I see
> you can check the position with file-position.
From the hyperspec:
| Function FILE-POSITION
|
| Syntax:
| file-position stream => position
| file-position stream position-spec => success-p
| Description:
|
| Returns or changes the current position within a stream.
|
| When position-spec is not supplied, file-position returns the current
| file position in the stream, or nil if this cannot be determined.
|
| When position-spec is supplied, the file position in stream is set to
| that file position (if possible). file-position returns true if the
| repositioning is performed successfully, or false if it is not.
Ta.
Missed that last part in the reading; my lisp 'bible' (the book I use) is not
as clear on the issue as the spec ;o}
"Henrik Motakef" <············@henrik-motakef.de> wrote in message
···················@pokey.internal.henrik-motakef.de...
> "Harag" <·········@psychedelic.co.za> writes:
>
> > Is there any way to reset the position of the 'read' on the stream? I
> > see you can check the position with file-position.
>
> From the hyperspec:
>
> | Function FILE-POSITION
> |
> | Syntax:
> | file-position stream => position
> | file-position stream position-spec => success-p
>
> | Description:
> |
> | Returns or changes the current position within a stream.
> |
> | When position-spec is not supplied, file-position returns the current
> | file position in the stream, or nil if this cannot be determined.
> |
> | When position-spec is supplied, the file position in stream is set to
> | that file position (if possible). file-position returns true if the
> | repositioning is performed successfully, or false if it is not.
"Harag" <·········@psychedelic.co.za> writes:
> Is there any way to reset the position of the 'read' on the stream?
> I see you can check the position with file-position.
If you give file-position a second argument, it will try to set the
file pointer to that position.
--
Frode Vatvedt Fjeld
Hi Harag,
> Thanks for your help Adam.
>
> The result in LispWorks is the same as in CLISP. My original file had white
> spaces and newlines in it that were hiding the overwriting; doing a
> simpler example, converting your example to read-line and write-line,
> showed up my error. The other thing to notice/remember is that when it
> overwrites, 'newline' counts for 2 characters ...;o}
>
> Is there any way to reset the position of the 'read' on the stream? I
> see you can check the position with file-position.
Usually (file-position <stream name> <integer position>). Refer again to
the HyperSpec definition of file-position and file position designator
(you'll see you can also use the symbols :start and :end).
If "test" contains the string 0123456789 then observe these results:
(with-open-file (stream "test" :direction :io :if-exists :overwrite)
(print (read-char stream))
(file-position stream 5)
(print (read-char stream))
(write-char #\a stream)
(file-position stream 6)
(print (read-char stream)))
Output is:
#\0
#\5
#\a
I read a character from the stream and printed it out. I then set the file
position to 5 and read+printed the character at that position. This
incremented the file position by one so when I wrote the character #\a to
the stream it was at file position 6 and the file position incremented
again to 7. I then reset the file position to 6 so the newly written
character would be read again.
Regards,
Adam
PS: You may have noticed I always use the WITH-OPEN-FILE macro. I've never
had a reason to use OPEN explicitly.
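PPS: Tying this back to Pascal's fixed-size-record point: with a known
record width you can update any record in place via FILE-POSITION. A
sketch (the 16-character width and the function names are just for
illustration, and WRITE-RECORD doesn't guard against strings longer than
the width):

```lisp
(defparameter *record-width* 16)  ; illustrative fixed record width

(defun write-record (stream index string)
  "Overwrite record INDEX in place, padding STRING to *RECORD-WIDTH*
with trailing spaces."
  (file-position stream (* index *record-width*))
  (write-string (format nil "~Va" *record-width* string) stream))

(defun read-record (stream index)
  "Read record INDEX and strip its trailing padding."
  (file-position stream (* index *record-width*))
  (let ((buf (make-string *record-width*)))
    (read-sequence buf stream)
    (string-right-trim " " buf)))
```

Open the file with :direction :io :if-exists :overwrite so the existing
contents are preserved while you seek around and rewrite records.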
> (with-open-file (stream "test" :direction :output :if-exists :supersede)
> (format stream "~S ~S ~S~%" 'these 'are 'symbols))
>
> The file "test" contains these characters:
> these are symbols
>
> Now let's modify test, first using overwrite:
>
> (with-open-file (stream "test" :direction :io :if-exists :overwrite)
> (prin1 'a-new-symbol stream))
>
> This works as expected, overwriting the start of the file:
> a-new-symbolmbols
>
> Then append:
> (with-open-file (stream "test" :direction :io :if-exists :append)
> (prin1 'a-new-symbol stream))
>
> Again we have the expected result:
> a-new-symbolmbols
> a-new-symbol
>
> Let's try some bidirectional operations on the original file:
> (with-open-file (stream "test" :direction :io :if-exists :overwrite)
> (print (read stream)) (prin1 'insert stream))
>
> The symbol `these' was read and then the text `insert' written to
> the stream.
>
> The resulting output is(!!!):
> these are symbols
> insert
>
> Which seems very odd to me as the file pointer should have been
> positioned after `these'.
This is now confirmed fixed in CMUCL CVS (compare revisions 1.72 and 1.74
of code/fd-stream.lisp).
Regards,
Adam