A broad question on data driven programming style

From: AJ
Subject: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 01:26:29 +0000
Message-ID: <57238c85-0f77-4c8a-8dba-f091ebd1f21c@w4g2000prd.googlegroups.com>

For my current project, I implemented an ETL framework using Java. I
tried to use a data-driven programming style as much as possible. Two
ways to store the "process driving" data and rules were immediately
apparent: one was property files, the other relational db tables. We
used a tag to denote each source of data and how the data from that
source should be staged and transformed.

To give some pertinent background info, we relied heavily on an RDBMS:
(i) to store the actual data
(ii) to store the business rules and "process driver" data
(iii) to even carry out the transformations on data. My Java code
would query the db table to get the rules and then dynamically
construct SQL queries to be executed on the db to manipulate/transform
business data. I know people these days don't like using a database
system for more than just storing data, but it has saved me a lot of
time and work, since the database could easily do to the data what I
would otherwise have to write loads of classes for. I did try to ensure
that people did not use stored procedures and such unless they were
absolutely necessary for complex transformations. Mostly it is a system
working off dynamically constructed SQL, with data never leaving the
database after being staged and before being output. We manage the
complexity on the db using indexes and such. It works fine for what we
need to do.
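
A minimal sketch of the approach described above: a rule row read from
the db drives the construction of an INSERT ... SELECT statement, so the
data stays inside the database. The classes TransformRule and SqlBuilder,
the source_tag column, and every table and column name are hypothetical
and are not taken from the actual framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical rule row: how one tagged source moves from staging to target. */
final class TransformRule {
    final String sourceTag;
    final String stagingTable;
    final String targetTable;
    // target column -> SQL expression over staging columns (insertion order kept)
    final Map<String, String> columnExpressions = new LinkedHashMap<>();

    TransformRule(String sourceTag, String stagingTable, String targetTable) {
        this.sourceTag = sourceTag;
        this.stagingTable = stagingTable;
        this.targetTable = targetTable;
    }

    TransformRule map(String targetColumn, String expression) {
        columnExpressions.put(targetColumn, expression);
        return this;
    }
}

final class SqlBuilder {
    /** Builds an INSERT ... SELECT so the rows never leave the database. */
    static String buildInsertSelect(TransformRule rule) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner expressions = new StringJoiner(", ");
        for (Map.Entry<String, String> e : rule.columnExpressions.entrySet()) {
            columns.add(e.getKey());
            expressions.add(e.getValue());
        }
        // The source tag stays a bind parameter (?) instead of being spliced in.
        return "INSERT INTO " + rule.targetTable + " (" + columns + ")"
             + " SELECT " + expressions
             + " FROM " + rule.stagingTable
             + " WHERE source_tag = ?";
    }
}
```

Table and column names are concatenated directly, which is only
reasonable if the rule tables are trusted; values such as the source tag
would be bound through a PreparedStatement.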

Though property files are fine, I preferred db tables because I could
query or alter the properties a lot faster. And I prefer SQL to
regexps. But db tables have their limitations as well. For instance,
when I wanted to store properties for the output process, I had to
create 4 or more tables: one for storing general output information
like output type, names of files, etc.; one to store properties for
"delimited files"; one for "fixed length files"; one for each XML type
of output; and so on. These tables are now tied together by a sort of
primary key relationship, and the rules/properties to carry out one
output process are scattered.

I wonder if I made the right design decisions and would like to know
if I am missing something. I am looking for suggestions for
improvement.

My two questions:
1) What are your comments on using dynamically constructed SQL to do
data transformations rather than having Java code do the work?
2) How else can I store properties, rules and "process driving" data?
I would like to see if I can get around the limitations of db tables.
XML comes to mind. But using a db table seems to have far less
overhead and to be a lot cleaner. I would love to use s-expressions, but
alas, we're not using common lisp.

I would love to hear how people here have dealt with storing rules
and properties for ETL.


From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 03:53:38 +0000
Message-ID: <rem-2008apr26-013@yahoo.com>

> From: AJ <········@gmail.com>
> For my current project, I implemented an ETL framework using Java.

Why are you posting this to comp.lang.lisp, instead of to
comp.lang.java.programmer, or even to comp.programming?
Do you believe Lisp programmers are more intelligent, and thus more
likely to be able to answer your question, than Java programmers?

> I would love to use s-expressions, but alas, we're not using common lisp.

It's trivial to write a parser (in Java) for s-expressions, if the
atomic types of data you intend to parse are very limited, for
example only strings, integers, and symbols.

  /* Returns a nested structure of Vector objects, containing nested
     Vector objects and/or objects of type String, Integer, and MySymbol.
     In the trivial case (no parens), it just returns a String, Integer,
     or MySymbol object not inside any Vector object. */
  public static Object parseSexpr(String str) {
    Object tokens = tokenizer(str);          // flat sequence of tokens
    Object result = collectTokens(tokens);   // stack-based nesting pass
    return result;
  }

Class MySymbol would use a static hashtable, consulted by the
constructor, to canonicalize all parsed symbol names. Note that you
probably don't need full-fledged Lisp-style symbols with value cell
and function cell and property list. Something like a keyword
symbol which has nothing except a print name should suffice for
your needs.
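
A minimal sketch of such a symbol class. A Java constructor always
yields a fresh object, so this version swaps in a static factory method
(here called intern, a name chosen for illustration) to consult the
table; the private constructor keeps callers from bypassing it.

```java
import java.util.HashMap;
import java.util.Map;

/** Keyword-like symbol: nothing but a print name, canonicalized via a static table. */
final class MySymbol {
    private static final Map<String, MySymbol> TABLE = new HashMap<>();
    private final String name;

    private MySymbol(String name) {
        this.name = name;
    }

    /** Returns the one shared MySymbol for this name, creating it on first use. */
    static synchronized MySymbol intern(String name) {
        MySymbol sym = TABLE.get(name);
        if (sym == null) {
            sym = new MySymbol(name);
            TABLE.put(name, sym);
        }
        return sym;
    }

    String name() {
        return name;
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Because every name maps to a single object, two symbols can be compared
with == instead of a string comparison.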

tokenizer(str) would skip whitespace, then dispatch on the next character:
- digit -> parse integer
- alphanumeric -> parse symbol, call constructor for MySymbol
- quote mark -> parse string
In each case, add the resulting object onto a Vector or something like
that, all in a loop that exits when it runs into end-of-string.
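
A minimal sketch of that loop. Several things here are assumptions
beyond the description above: the class name SexprTokenizer, the OPEN
and CLOSE marker objects for parentheses (the dispatch list does not
mention them), the tiny Sym stand-in for MySymbol, and the use of List
rather than Vector. Strings have no escape handling.

```java
import java.util.ArrayList;
import java.util.List;

final class SexprTokenizer {
    /** Stand-in for the MySymbol class: a print name and nothing else. */
    static final class Sym {
        final String name;
        Sym(String name) { this.name = name; }
    }

    /** Paren markers; an assumption, since the dispatch list omits them. */
    static final Object OPEN = new Object();
    static final Object CLOSE = new Object();

    static List<Object> tokenize(String str) {
        List<Object> tokens = new ArrayList<>();
        int i = 0;
        while (i < str.length()) {
            char c = str.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // skip whitespace
            } else if (c == '(') {
                tokens.add(OPEN);
                i++;
            } else if (c == ')') {
                tokens.add(CLOSE);
                i++;
            } else if (Character.isDigit(c)) {         // digit -> integer
                int start = i;
                while (i < str.length() && Character.isDigit(str.charAt(i))) i++;
                tokens.add(Integer.valueOf(str.substring(start, i)));
            } else if (Character.isLetter(c)) {        // letter -> symbol
                int start = i;
                while (i < str.length() && Character.isLetterOrDigit(str.charAt(i))) i++;
                tokens.add(new Sym(str.substring(start, i)));
            } else if (c == '"') {                     // quote mark -> string
                int end = str.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string at " + i);
                tokens.add(str.substring(i + 1, end));
                i = end + 1;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }
}
```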

You can probably write the whole thing in one workday, the
MySymbol class and the tokenizer and the stack-based token-collector.
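
A minimal sketch of the stack-based token-collector, assuming the
tokenizer emits marker objects for the parentheses. The class name
TokenCollector and the OPEN and CLOSE markers are hypothetical, and
ArrayList is used here in place of Vector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TokenCollector {
    /** Hypothetical markers the tokenizer would emit for "(" and ")". */
    static final Object OPEN = new Object();
    static final Object CLOSE = new Object();

    /** Folds a flat token list into nested lists, or a bare atom if there are no parens. */
    static Object collect(List<Object> tokens) {
        Deque<List<Object>> stack = new ArrayDeque<>();
        List<Object> top = new ArrayList<>();        // holds the top-level forms
        for (Object tok : tokens) {
            if (tok == OPEN) {
                stack.push(top);                     // remember the enclosing list
                top = new ArrayList<>();             // start collecting a new one
            } else if (tok == CLOSE) {
                if (stack.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                List<Object> finished = top;
                top = stack.pop();
                top.add(finished);                   // nest it in its parent
            } else {
                top.add(tok);                        // atom: String, Integer, or symbol
            }
        }
        if (!stack.isEmpty()) throw new IllegalArgumentException("missing ')'");
        if (top.size() != 1) throw new IllegalArgumentException("expected exactly one form");
        return top.get(0);
    }
}
```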


The inverse operation, *generating* s-expression output, given an
internal structure of nested Vector objects, is by comparison
trivial, assuming you don't need the result prettyprinted.
Use class StringBuffer to accumulate all the parts within a Vector,
with parens around the outside and spaces between pieces.
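
A minimal sketch of that output step. The class name SexprWriter is
hypothetical, and StringBuilder is swapped in for StringBuffer since no
synchronization is needed here. The code accepts any java.util.List,
which includes Vector, and assumes symbols print as their name through
toString().

```java
import java.util.List;

final class SexprWriter {
    /** Renders a nested List/atom structure as a single-line s-expression. */
    static String write(Object node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    private static void append(Object node, StringBuilder out) {
        if (node instanceof List) {
            out.append('(');
            boolean first = true;
            for (Object child : (List<?>) node) {
                if (!first) out.append(' ');         // spaces between pieces
                append(child, out);
                first = false;
            }
            out.append(')');
        } else if (node instanceof String) {
            out.append('"').append(node).append('"');   // no escaping in this sketch
        } else {
            out.append(node);                        // Integer, or a symbol's print name
        }
    }
}
```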

From: AJ
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 06:44:43 +0000
Message-ID: <116e4010-f575-465f-875b-f97459b7baeb@u12g2000prd.googlegroups.com>

> Why are you posting this to comp.lang.lisp, instead of to
> comp.lang.java.programmer, or even to comp.programming?
> Do you believe Lisp programmers are more intelligent, and thus more
> likely to be able to answer your question, than Java programmers?

Pretty much. I'd rather have no responses than noise and headache. And
besides, I consider cll my stamping ground though I post only
occasionally.

> > I would love to use s-expressions, but alas, we're not using common lisp.
>
> It's trivial to write a parser (in Java) for s-expressions, if the
> atomic types of data you intend to parse are very limited, for
> example only strings, integers, and symbols.

Thanks for your suggestions and ideas. I think they'd work well, but
I cannot yet see an advantage to using s-exps with Java.

If I were to use CL, I'd have plenty of built-in tools to work with
s-exps. I could perhaps use the car of the s-exp to denote the
transformation, and have a function with the same name in Lisp.
My data s-exps would then actually be code as well.

I could use reflection and such in Java, but I'd rather not.  I am
quite content to keep my Java code straightforward and simple. Every
time I wanted to do something fancy with Java, there was pain.
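
For comparison, a minimal sketch of how the "car names the
transformation" idea could look in plain Java without reflection, using
a lookup table of handlers keyed by the head of a parsed form. The class
RuleDispatcher and the rule names "upper" and "concat" are hypothetical
examples, not part of the framework discussed here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RuleDispatcher {
    // head name -> handler that turns the remaining elements into a SQL expression
    private final Map<String, Function<List<Object>, String>> handlers = new HashMap<>();

    RuleDispatcher() {
        // Both rule names are made up for illustration.
        handlers.put("upper", args -> "UPPER(" + args.get(0) + ")");
        handlers.put("concat", args -> args.get(0) + " || " + args.get(1));
    }

    /** Looks up the handler named by the first element and applies it to the rest. */
    String apply(List<Object> form) {
        String head = String.valueOf(form.get(0));
        Function<List<Object>, String> handler = handlers.get(head);
        if (handler == null) throw new IllegalArgumentException("unknown rule: " + head);
        return handler.apply(form.subList(1, form.size()));
    }
}
```

The table has to be filled in by hand, which is the price of avoiding
reflection; in Lisp the function namespace plays that role for free.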

I actually cannot use s-exps because I'm not the only person using the
framework I implemented. They'd be so alien to my team-mates.

From: Kaz Kylheku
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 17:11:37 +0000
Message-ID: <708e0ae6-5304-44a3-a0c8-b7bb5395c6c0@w1g2000prd.googlegroups.com>

On Apr 26, 11:44 pm, AJ <········@gmail.com> wrote:
> I actually cannot use s-exps because I'm not the only person using the
> framework I implemented. They'd be so alien to my team-mates.

If a simple sequence of space-delimited tokens placed between
parentheses is ``alien'' to your coworkers, maybe they should contact
their mother ship to beam them up and take them to their home planet.

From: Scott Burson
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 18:06:52 +0000
Message-ID: <9b85af6b-1eef-4a27-942f-9c28037109c5@b5g2000pri.googlegroups.com>

On Apr 27, 10:11 am, Kaz Kylheku <········@gmail.com> wrote:
> On Apr 26, 11:44 pm, AJ <········@gmail.com> wrote:
>
> > I actually cannot use s-exps because I'm not the only person using the
> > framework I implemented. They'd be so alien to my team-mates.
>
> If a simple sequence of space-delimited tokens placed between
> parentheses is ``alien'' to your coworkers, maybe they should contact
> their mother ship to beam them up and take them to their home planet.

Ha!  Indeed :)

Or, point them at Slava Akhmechet's excellent intro to s-expressions
for Java/XML users:

http://www.defmacro.org/ramblings/lisp.html

-- Scott

From: Alex Mizrahi
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 16:40:42 +0000
Message-ID: <4814ac8d$0$90266$14726298@news.sunsite.dk>

 A> 2) How else can I store properties, rules and "process driving" data?
 A> I would like to see if I can get around the limitations of db tables.
 A> XML comes to mind. But using a db table seems to have far less
 A> overhead and to be a lot cleaner. I would love to use s-expressions, but
 A> alas, we're not using common lisp.

You can run some Scheme or even CL implementations on the JVM, to do
some parts of the work or even the whole project.
So "I'm using Java" is not an excuse anymore.

From: AJ
Subject: Re: A broad question on data driven programming style
Date: Sun, 27 Apr 2008 21:57:42 +0000
Message-ID: <da234a36-47b2-403e-8e28-aca081ed904b@v26g2000prm.googlegroups.com>

Hahaha.
No retreat, no surrender. Just use Common Lisp (TM).

Thanks for the pointers guys. I'll try to push in some s-exps. Lemme
see what comes out of it.