I have developed some simple tools for working with hierarchical data
that basically operate on nested association lists.
For example, rs-ref is function that operates on a record-set*, returns a
list of record sets, and accepts :key and :test arguments for filtering.
A sample query might look like:
(rs-ref
(as-record-set (root
(record (sub-type1 (tagged-val 1))
(sub-type2 (tagged-val 2)))
(record (sub-type1 (tagged-val 3))
(sub-type2 (tagged-val 4)))))
'(record sub-type1)
:key (lambda (rs) (rs-member rs 'tagged-val))
:test (lambda (val) (> val 2)))
=>
(#<[TREE-NODE|
:TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>)
The implementation is not terribly optimized. Usually I read some semi-
structured file into a record-set, then write out some appropriately
massaged query data. It is usually I/O bound, and if I'm consing a lot I
can provide a context for memoization:
(with-rs-cache
<lots of queries and output>)
However, I am starting to do more ad hoc joins on data from different
sources, and it would be nice if I could make this more general and
efficient.
Any ideas?
Matt
*A record-set is anything that implements the record-set protocol, but
currently I have implemented only one set of concrete types.
* Matthew D Swank <···················@newsfe01.iad> :
Wrote on Fri, 07 Nov 2008 16:30:56 GMT:
| I have developed some simple tools for working with hierarchical data
| that basically operate on nested association lists.
|
| For example, rs-ref is function that operates on a record-set*, returns a
| list of record sets, and accepts :key and :test arguments for filtering.
|
| A sample query might look like:
| (rs-ref
| (as-record-set (root
| (record (sub-type1 (tagged-val 1))
| (sub-type2 (tagged-val 2)))
| (record (sub-type1 (tagged-val 3))
| (sub-type2 (tagged-val 4)))))
| '(record sub-type1)
| :key (lambda (rs) (rs-member rs 'tagged-val))
| :test (lambda (val) (> val 2)))
|
| =>
| (#<[TREE-NODE|
| :TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>)
|
| The implementation is not terribly optimized. Usually I read some semi-
| structured file into a record-set, then write out some appropriately
| massaged query data. It is usually I/O bound, and if I'm consing a lot I
| can provide a context for memoization:
|
| However, I am starting to do more ad hoc joins on data from different
| sources, and it would be nice if I could make this more general and
| efficient.
|
| Any ideas?
Perhaps an existing optimized pattern matcher would work. Here's how the
above example might look in Sunil Mishra's Meta-Circular Pattern
Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/>
;; Without using :constraints matching,
(defpackage "MCPAT-USER"
(:use "CL" "MCPAT"))
(in-package "MCPAT-USER")
(defvar *sexp* '(root
(record
(sub-type1 (tagged-val 1))
(sub-type2 (tagged-val 2)))
(record
(sub-type1 (tagged-val 3))
(sub-type2 (tagged-val 4)))))
(in-ruleset :MSWANK-RS encoded-ruleset)
(clear-ruleset :MSWANK-RS)
(defrule root-matcher-42
((root . ?records))
`(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records)))
(defrule record-matcher-42
((record . ?subtypes))
(mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes)))
(defrule subtype-matcher-42
((subtype (?subtype (tagged-val ?val))))
(when (and (eq ?subtype 'sub-type1)
(> ?val 2))
`((:tree-leaf tagged-val ,?val))))
(solve *sexp*)
=> (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3)))
--
Madhu
On Sun, 09 Nov 2008 00:18:39 +0530, Madhu wrote:
> * Matthew D Swank <···················@newsfe01.iad> : Wrote on Fri, 07
> Nov 2008 16:30:56 GMT:
>
> | I have developed some simple tools for working with hierarchical data
> | that basically operate on nested association lists.
...
> | However, I am starting to do more ad hoc joins on data from different
> | sources, and it would be nice if I could make this more general and |
> efficient.
> |
> | Any ideas?
>
> Perhaps an existing optimized pattern matcher would work. Here's how the
> above example might look in Sunil Mishra's Meta-Circular Pattern
> Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/>
>
> ;; Without using :constraints matching,
>
> (defpackage "MCPAT-USER"
> (:use "CL" "MCPAT"))
> (in-package "MCPAT-USER")
>
> (defvar *sexp* '(root
> (record
> (sub-type1 (tagged-val 1))
> (sub-type2 (tagged-val 2)))
> (record
> (sub-type1 (tagged-val 3))
> (sub-type2 (tagged-val 4)))))
>
>
> (in-ruleset :MSWANK-RS encoded-ruleset) (clear-ruleset :MSWANK-RS)
>
> (defrule root-matcher-42
> ((root . ?records))
> `(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records)))
>
> (defrule record-matcher-42
> ((record . ?subtypes))
> (mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes)))
>
> (defrule subtype-matcher-42
> ((subtype (?subtype (tagged-val ?val))))
> (when (and (eq ?subtype 'sub-type1)
> (> ?val 2))
> `((:tree-leaf tagged-val ,?val))))
>
> (solve *sexp*)
>
> => (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3)))
Interesting, if a bit verbose. It's not immediately evident to me how
solve based one more than one tree. I suppose we're getting in to logic-
programming/db-query-engine land.
If look at it as a representation problem, there are things I like about
nested lists.
1. functional construction
2. preservation of order
3. preservation of multiplicity
However, naive construction of result sets requires lots of intermediate
consing, and joins require linear search.
Using the complete cached representation as _the_ representation would
speed queries up, but not joins. And it would be very expensive (I
suppose this is like the power set?)
Ex.
(root
(record (sub-type1 (tagged-val 1))
(sub-type2 (tagged-val 2)))
(record (sub-type1 (tagged-val 3))
(sub-type2 (tagged-val 4))))
becomes (as a map):
{ nil,
((root
(record (sub-type1 (tagged-val 1))
(sub-type2 (tagged-val 2)))
(record (sub-type1 (tagged-val 3))
(sub-type2 (tagged-val 4)))))
(record),
((record (sub-type1 (tagged-val 1))
(sub-type2 (tagged-val 2)))
(record (sub-type1 (tagged-val 3))
(sub-type2 (tagged-val 4))))
(record sub-type1),
((sub-type1 (tagged-val 1))
(sub-type1 (tagged-val 3)))
... etc
}
I know a lot of this is obvious, but I am thinking out loud and it's
being caught by a net.
I want to be fast lazy, and functional; is that so hard? :)
Matt
--
question = ( to ) ? be : ! be;
-- Wm. Shakespeare