Choice of data structures

From: Jon Boone
Subject: Choice of data structures
Date: Sun, 26 Sep 2004 16:59:21 +0000
Message-ID: <BD7C6DA9.371A8%ipmonger@comcast.net>

Folks,

  I'm looking for some insight into choosing appropriate data structures for
a program that I'm working on.  Right now it's in the "design" stage where I
am working out the different parts of the program and how they will work
together.

  The fundamental data that I will work with is a series of variables that
are entered on a periodic basis (perhaps as often as once a day).  Not all
variables will receive data daily.

  There are fundamentally two things I will use the data for:

  1.  Displaying textual/graphical output to the user to provide a means for
      them to easily determine the data trends.  Add on modules will color
      code the data outputs to indicate important analytical properties.

      In typical usage, I imagine that not all variables will be graphed
      simultaneously, although I intend to support that.

  2.  Perform statistical testing on the data to highlight correlations that
      that exist (and that the user might not think to display graphically).

  If I were programming this in C (or perl), I'd create a hash to store each
variable, keyed by the date that the data was collected.  So, for 15
variables, I'd have 15 different hashes, all keyed the same.

  If I were trying to do this in an OOP style (with Java or C++), I'd
probably use some generic sequence data type (and implement it as a hash),
but otherwise still use 15 different sequences for 15 different variables.

  I believe that Common Lisp can support either of these styles without a
problem.  My question is more oriented toward soliciting ideas that I have
not thought of in terms of ordering the data.

  Is there a better way to order the data in general [programming language
independent]?

  Is there a more powerful way to order the data that is reasonable to do in
Lisp but that I haven't thought of?


--jon

Re: Choice of data structures Rahul Jain
- Re: Choice of data structures Jon Boone
Re: Choice of data structures Mark McConnell
- Re: Choice of data structures Jon Boone
  - Re: Choice of data structures Will Hartung
    - Re: Choice of data structures Jon Boone
      - Re: Choice of data structures Ron Garret

From: Rahul Jain
Subject: Re: Choice of data structures
Date: Sun, 26 Sep 2004 20:22:47 +0000
Message-ID: <874qlkwzjc.fsf@nyct.net>

Jon Boone <········@comcast.net> writes:

>   If I were programming this in C (or perl), I'd create a hash to store each
> variable, keyed by the date that the data was collected.  So, for 15
> variables, I'd have 15 different hashes, all keyed the same.
>
>   If I were trying to do this in an OOP style (with Java or C++), I'd
> probably use some generic sequence data type (and implement it as a hash),
> but otherwise still use 15 different sequences for 15 different variables.
>
>   I believe that Common Lisp can support either of these styles without a
> problem.  My question is more oriented toward soliciting ideas that I have
> not thought of in terms of ordering the data.
>
>   Is there a better way to order the data in general [programming language
> independent]?

Why use hashes when what you really want is a vector of objects, each
with 15 different slots? Maybe CL makes this easier by allowing slots
to be unbound.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist

From: Jon Boone
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 00:10:52 +0000
Message-ID: <BD7CD2CC.37242%ipmonger@comcast.net>

On 2004-09-26 16:22, in article ··············@nyct.net, "Rahul Jain"
<·····@nyct.net> wrote:

> Jon Boone <········@comcast.net> writes:
> 
>>   If I were programming this in C (or perl), I'd create a hash to store each
>> variable, keyed by the date that the data was collected.  So, for 15
>> variables, I'd have 15 different hashes, all keyed the same.
>> 
>>   If I were trying to do this in an OOP style (with Java or C++), I'd
>> probably use some generic sequence data type (and implement it as a hash),
>> but otherwise still use 15 different sequences for 15 different variables.
>> 
>>   I believe that Common Lisp can support either of these styles without a
>> problem.  My question is more oriented toward soliciting ideas that I have
>> not thought of in terms of ordering the data.
>> 
>>   Is there a better way to order the data in general [programming language
>> independent]?
> 
> Why use hashes when what you really want is a vector of objects, each
> with 15 different slots? Maybe CL makes this easier by allowing slots
> to be unbound.

  I had thought of the idea of modeling the data samples as a "state of the
world" object which stored all relevant/available state for that time
instance and then keeping an collection of "SotW" objects.

  The main reason I didn't go with this idea was that analysis of the data
components would tend to happen for sequences of a particular variable x or
for sequences of two variables, x and y.  This made me question how the SotW
model was actually used in the program.

  It turns out that it's only used in two places:  data collection (which
would happen one SotW object at a time) and data presentation to the user
(which would happen in sequence for some range of SotW objects.)  But the
actual data crunching doesn't work with SotW per se -- just individual or
pairs of variables.

  Is this data driven design?  Or is it just premature optimization?

--jon

From: Mark McConnell
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 11:31:03 +0000
Message-ID: <d3aed052.0409270331.3dba1ade@posting.google.com>

Jon Boone <········@comcast.net> wrote in message news:<·······················@comcast.net>...
> Folks,
> 
>   I'm looking for some insight into choosing appropriate data structures for
> a program that I'm working on.  Right now it's in the "design" stage where I
> am working out the different parts of the program and how they will work
> together.
> 
>   The fundamental data that I will work with is a series of variables that
> are entered on a periodic basis (perhaps as often as once a day).  Not all
> variables will receive data daily.
> 
>   There are fundamentally two things I will use the data for:
> 
>   1.  Displaying textual/graphical output to the user to provide a means for
>       them to easily determine the data trends.  Add on modules will color
>       code the data outputs to indicate important analytical properties.
> 
>       In typical usage, I imagine that not all variables will be graphed
>       simultaneously, although I intend to support that.
> 
>   2.  Perform statistical testing on the data to highlight correlations that
>       that exist (and that the user might not think to display graphically).
> 
>   If I were programming this in C (or perl), I'd create a hash to store each
> variable, keyed by the date that the data was collected.  So, for 15
> variables, I'd have 15 different hashes, all keyed the same.
> 
>   If I were trying to do this in an OOP style (with Java or C++), I'd
> probably use some generic sequence data type (and implement it as a hash),
> but otherwise still use 15 different sequences for 15 different variables.

I'm wondering if you need the hashing.  If you need to do some
random-access operation, then sure, hashing is necessary.  But
random-access operations are not mentioned directly in 1 and 2.

You could store the time in each piece of data, then use binary search
to find ranges of data defined by time.  Put the data into the generic
sequences (Java's ArrayList, Lisp's adjustable-array) in the order it
comes in, i.e., sorted by time.  To find all the data from Tuesday
9/21 to Thursday 9/23, use a binary search to find 9/21, then another
binary search (starting from the 9/21 point) to find 9/23.

From: Jon Boone
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 12:44:03 +0000
Message-ID: <BD7D8353.37308%ipmonger@comcast.net>

On 2004-09-27 07:31, in article
····························@posting.google.com, "Mark McConnell"
<···············@yahoo.com> wrote:

> I'm wondering if you need the hashing.  If you need to do some
> random-access operation, then sure, hashing is necessary.  But
> random-access operations are not mentioned directly in 1 and 2.
> 
> You could store the time in each piece of data, then use binary search
> to find ranges of data defined by time.  Put the data into the generic
> sequences (Java's ArrayList, Lisp's adjustable-array) in the order it
> comes in, i.e., sorted by time.  To find all the data from Tuesday
> 9/21 to Thursday 9/23, use a binary search to find 9/21, then another
> binary search (starting from the 9/21 point) to find 9/23.

  I was planning on using hashes to make out-of-order data insertion easier.
Since things would be keyed by the data-collection-date, I could just insert
the data according to the hash (not really caring where it ended up).  When
I wanted to pull things out in sequence, I'd use the sorted keys to pull the
data in sequence.

  I've never used adjustable-arrays.  I'll spend some time looking at them
and see if I can figure out how to work out the out-of-order data insertion.

--jon

From: Will Hartung
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 18:39:15 +0000
Message-ID: <2rr50hF1cgh65U1@uni-berlin.de>

"Jon Boone" <········@comcast.net> wrote in message
····························@comcast.net...
>   I was planning on using hashes to make out-of-order data insertion
easier.
> Since things would be keyed by the data-collection-date, I could just
insert
> the data according to the hash (not really caring where it ended up).
When
> I wanted to pull things out in sequence, I'd use the sorted keys to pull
the
> data in sequence.

Sounds like you want a Tree to me, not a hash. Then you don't have to sort
anything. There are certainly RB-Tree implementations floating around in the
assorted libraries.

Regards,

Will Hartung
(·····@msoft.com)

From: Jon Boone
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 20:18:53 +0000
Message-ID: <BD7DEDED.373C4%ipmonger@comcast.net>

On 2004-09-27 14:39, in article ···············@uni-berlin.de, "Will
Hartung" <·····@msoft.com> wrote:

> Sounds like you want a Tree to me, not a hash. Then you don't have to sort
> anything. There are certainly RB-Tree implementations floating around in the
> assorted libraries.

  I'll think about this too.  I generally don't use trees for data that
isn't structured hierarchically.

--jon

From: Ron Garret
Subject: Re: Choice of data structures
Date: Mon, 27 Sep 2004 20:25:48 +0000
Message-ID: <rNOSPAMon-12EF9A.13254827092004@nntp1.jpl.nasa.gov>

In article <·······················@comcast.net>,
 Jon Boone <········@comcast.net> wrote:

> On 2004-09-27 14:39, in article ···············@uni-berlin.de, "Will
> Hartung" <·····@msoft.com> wrote:
> 
> > Sounds like you want a Tree to me, not a hash. Then you don't have to sort
> > anything. There are certainly RB-Tree implementations floating around in the
> > assorted libraries.
> 
>   I'll think about this too.  I generally don't use trees for data that
> isn't structured hierarchically.

You should rethink this.  Trees can be very handy in those instances 
when you can (or have to) define a total order on your keys.

rg