(Not)XQuery

From: Matthew D Swank
Subject: (Not)XQuery
Date: Fri, 07 Nov 2008 16:30:56 +0000
Message-ID: <43_Qk.5154$J21.2343@newsfe01.iad>

I have developed some simple tools for working with hierarchical data 
that basically operate on nested association lists. 

For example, rs-ref is function that operates on a record-set*, returns a 
list of record sets, and accepts :key and :test arguments for filtering.

A sample query might look like:
          (rs-ref 
            (as-record-set (root
                            (record (sub-type1 (tagged-val 1))
                                    (sub-type2 (tagged-val 2)))
                            (record (sub-type1 (tagged-val 3))
                                    (sub-type2 (tagged-val 4)))))
            '(record sub-type1)
            :key (lambda (rs) (rs-member rs 'tagged-val))
            :test (lambda (val) (> val 2)))

=>
(#<[TREE-NODE|
    :TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>)

The implementation is not terribly optimized.  Usually I read some semi-
structured file into a record-set, then write out some appropriately 
massaged query data.  It is usually I/O bound, and if I'm consing a lot I 
can provide a context for memoization:

(with-rs-cache
  <lots of queries and output>)

However, I am starting to do more ad hoc joins on data from different 
sources, and it would be nice if I could make this more general and 
efficient.

Any ideas?

Matt 

*A record-set is anything that implements the record-set protocol, but 
currently I have implemented only one set of concrete types.

Re: (Not)XQuery Madhu
- Re: (Not)XQuery Matthew D Swank

From: Madhu
Subject: Re: (Not)XQuery
Date: Sat, 08 Nov 2008 18:48:39 +0000
Message-ID: <m3wsfeuk8o.fsf@moon.robolove.meer.net>

* Matthew D Swank <···················@newsfe01.iad> :
Wrote on Fri, 07 Nov 2008 16:30:56 GMT:

| I have developed some simple tools for working with hierarchical data
| that basically operate on nested association lists.
|
| For example, rs-ref is function that operates on a record-set*, returns a
| list of record sets, and accepts :key and :test arguments for filtering.
|
| A sample query might look like:
|           (rs-ref
|             (as-record-set (root
|                             (record (sub-type1 (tagged-val 1))
|                                     (sub-type2 (tagged-val 2)))
|                             (record (sub-type1 (tagged-val 3))
|                                     (sub-type2 (tagged-val 4)))))
|             '(record sub-type1)
|             :key (lambda (rs) (rs-member rs 'tagged-val))
|             :test (lambda (val) (> val 2)))
|
| =>
| (#<[TREE-NODE|
|     :TYPE SUB-TYPE1 :RECORDS (#<[TREE-LEAF|:TYPE TAGGED-VAL :VALUE 3]>)]>)
|
| The implementation is not terribly optimized.  Usually I read some semi-
| structured file into a record-set, then write out some appropriately
| massaged query data.  It is usually I/O bound, and if I'm consing a lot I
| can provide a context for memoization:
|
| However, I am starting to do more ad hoc joins on data from different
| sources, and it would be nice if I could make this more general and
| efficient.
|
| Any ideas?

Perhaps an existing optimized pattern matcher would work. Here's how the
above example might look in Sunil Mishra's Meta-Circular Pattern
Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/>

;; Without using :constraints matching, 

(defpackage "MCPAT-USER"
  (:use "CL" "MCPAT"))
(in-package "MCPAT-USER")

(defvar *sexp* '(root
                 (record
                  (sub-type1 (tagged-val 1))
                  (sub-type2 (tagged-val 2)))
                 (record
                  (sub-type1 (tagged-val 3))
                  (sub-type2 (tagged-val 4)))))


(in-ruleset :MSWANK-RS encoded-ruleset)
(clear-ruleset :MSWANK-RS)

(defrule root-matcher-42
    ((root . ?records))
  `(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records)))

(defrule record-matcher-42
    ((record . ?subtypes))
  (mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes)))

(defrule subtype-matcher-42
    ((subtype (?subtype (tagged-val ?val))))
  (when (and (eq ?subtype 'sub-type1)
             (> ?val 2))
    `((:tree-leaf tagged-val ,?val))))

(solve *sexp*)

=> (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3)))

--
Madhu

From: Matthew D Swank
Subject: Re: (Not)XQuery
Date: Sun, 09 Nov 2008 00:16:23 +0000
Message-ID: <rZpRk.542$2l5.461@newsfe01.iad>

On Sun, 09 Nov 2008 00:18:39 +0530, Madhu wrote:

> * Matthew D Swank <···················@newsfe01.iad> : Wrote on Fri, 07
> Nov 2008 16:30:56 GMT:
> 
> | I have developed some simple tools for working with hierarchical data
> | that basically operate on nested association lists. 
...
> | However, I am starting to do more ad hoc joins on data from different
> | sources, and it would be nice if I could make this more general and |
> efficient.
> |
> | Any ideas?
> 
> Perhaps an existing optimized pattern matcher would work. Here's how the
> above example might look in Sunil Mishra's Meta-Circular Pattern
> Matcher, MCPat <URL:http://www.sfmishras.com/smishra/mcpat/>
> 
> ;; Without using :constraints matching,
> 
> (defpackage "MCPAT-USER"
>   (:use "CL" "MCPAT"))
> (in-package "MCPAT-USER")
> 
> (defvar *sexp* '(root
>                  (record
>                   (sub-type1 (tagged-val 1))
>                   (sub-type2 (tagged-val 2)))
>                  (record
>                   (sub-type1 (tagged-val 3))
>                   (sub-type2 (tagged-val 4)))))
> 
> 
> (in-ruleset :MSWANK-RS encoded-ruleset) (clear-ruleset :MSWANK-RS)
> 
> (defrule root-matcher-42
>     ((root . ?records))
>   `(:tree-node :type sub-type1 :records ,(mapcan 'solve ?records)))
> 
> (defrule record-matcher-42
>     ((record . ?subtypes))
>   (mapcan (lambda (subtype) (solve `(subtype ,subtype))) ?subtypes)))
> 
> (defrule subtype-matcher-42
>     ((subtype (?subtype (tagged-val ?val))))
>   (when (and (eq ?subtype 'sub-type1)
>              (> ?val 2))
>     `((:tree-leaf tagged-val ,?val))))
> 
> (solve *sexp*)
> 
> => (:TREE-NODE :TYPE SUB-TYPE1 :RECORDS ((:TREE-LEAF TAGGED-VAL 3)))

Interesting, if a bit verbose.  It's not immediately evident to me how 
solve based one more than one tree.  I suppose we're getting in to logic-
programming/db-query-engine land.

If look at it as a representation problem, there are things I like about 
nested lists.  
	1. functional construction
	2. preservation of order
	3. preservation of multiplicity

However, naive construction of result sets requires lots of intermediate 
consing, and joins require linear search. 

Using the complete cached representation as _the_ representation would 
speed queries up, but not joins. And it would be very expensive (I 
suppose this is like the power set?)
Ex.
(root
 (record (sub-type1 (tagged-val 1))
         (sub-type2 (tagged-val 2)))
 (record (sub-type1 (tagged-val 3))
         (sub-type2 (tagged-val 4))))      

becomes (as a map):

{ nil, 
  ((root
   (record (sub-type1 (tagged-val 1))
           (sub-type2 (tagged-val 2)))
   (record (sub-type1 (tagged-val 3))
           (sub-type2 (tagged-val 4)))))

  (record),
  ((record (sub-type1 (tagged-val 1))
           (sub-type2 (tagged-val 2)))
   (record (sub-type1 (tagged-val 3))
           (sub-type2 (tagged-val 4))))

  (record sub-type1),
  ((sub-type1 (tagged-val 1))
   (sub-type1 (tagged-val 3)))

... etc

} 

I know a lot of this is obvious, but I am thinking out loud and it's 
being caught by a net.

I want to be fast lazy, and functional; is that so hard? :)

Matt

-- 
question = ( to ) ? be : ! be;
		-- Wm. Shakespeare