From: Joel Reymont
Subject: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120928734.327138.200390@o13g2000cwo.googlegroups.com>
Folks,

Suppose I have a stock price time series of a million or more entries.
I'm looking to backtest a trading strategy over all these price points.
My strategy would only need, say, 400 points at a time so I would need
to have a sliding window where the first point is always dropped as I
move the window forward.

What is the most efficient way of implementing this with Lisp? Would
the series package help?

I'm being as non-technical as possible on purpose and I'm avoiding the
issue of how to store all the data.

Basically, I'm evaluating different technologies for implementing a
short-term futures trading platform. The original post is here:
http://groups-beta.google.com/group/uptick/browse_thread/thread/bf733a1f3f3a2e38

I was looking at http://kx.com/papers/kdb-tickdemo.php which tells you
that

"Kdb achieves this kind of efficiency because of its design. It is
fully vertically partitioned and replaces tables by ordered arrays. For
this reason, moving averages, correlations and other such queries can
be done directly in the database."

I'm wondering if it's possible to achieve with Lisp the kind of time
series analysis performance that K achieves with KDB. Apparently,
simple table scans on over one billion trades (e.g. select max price
from trades) can be done in less than a second with K.
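(For scale: a plain in-memory scan in CL is a single pass over a
specialized vector. A sketch, nothing tuned, with a hypothetical
MAX-PRICE name:)

```lisp
;; Sketch of "select max price from trades" over an in-memory
;; vector of double-floats.
(defun max-price (prices)
  "Return the largest element of PRICES, a non-empty vector of doubles."
  (declare (type (simple-array double-float (*)) prices)
           (optimize (speed 3)))
  (let ((m (aref prices 0)))
    (declare (type double-float m))
    (dotimes (i (length prices) m)
      (setf m (max m (aref prices i))))))

;; (max-price (make-array 5 :element-type 'double-float
;;                          :initial-contents '(1.0d0 3.5d0 2.0d0 9.25d0 4.0d0)))
;; => 9.25d0
```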

    Thanks, Joel

From: GP lisper
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120931105.90d55513b964e566e01e54b2fa6ed9c6@teranews>
On 9 Jul 2005 10:05:34 -0700, <······@gmail.com> wrote:
>
> Suppose I have a stock price time series of a million or more entries.
> I'm looking to backtest a trading strategy over all these price points.
> My strategy would only need, say, 400 points at a time so I would need

trading day : 390 minutes

'seconds per point' : (/ (* 60 390) 1000000)

time (seconds) for 400 points : (* (/ (* 60 390) 1000000) 400)
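Evaluated at the REPL (assuming, as above, that the million points
spanned a single 390-minute session):

```lisp
;; seconds per point, if 1,000,000 points covered one 390-minute day
(float (/ (* 60 390) 1000000))         ; => 0.0234
;; wall-clock span of a 400-point window under that assumption
(float (* (/ (* 60 390) 1000000) 400)) ; => 9.36
```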


-- 
LOOP :: a Domain Specific Language.
From: Eric Lavigne
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120932443.504951.253860@g47g2000cwa.googlegroups.com>
>> Suppose I have a stock price time series of a million or more entries.
>> I'm looking to backtest a trading strategy over all these price points.
>> My strategy would only need, say, 400 points at a time so I would need
>
>trading day : 390 minutes
>
>'seconds per point' : (/ (* 60 390) 1000000)
>
>time (seconds) for 400 points : (* (/ (* 60 390) 1000000) 400)

I suspect that his 1,000,000 points represent more than just one
trading day. History is available for many previous days. He can use
however many points he feels are necessary for statistical certainty,
pulling data from previous trading days if necessary. In fact, I
wouldn't be very comfortable with an experiment that was based only on
one day of data.
From: Joel Reymont
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120932906.159960.86600@g49g2000cwa.googlegroups.com>
Eric Lavigne wrote:
>
> I suspect that his 1,000,000 points represent more than just one
> trading day.

Right. Think a couple of years' worth of ticks, where a tick might
happen a few times per second.

I suppose a lazy-load database needs to be implemented for this, since
multiple strategies could be using different chunks of this price
history at the same or different times.

I would hate to load 1,000,000 price quotes into memory a few times and
create copies of the sliding window as I move forward one quote at a
time. 

    Joel
From: Greg Menke
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <m3u0j3bs69.fsf@athena.pienet>
"Joel Reymont" <······@gmail.com> writes:

> Eric Lavigne wrote:
> >
> > I suspect that his 1,000,000 points represent more than just one
> > trading day.
> 
> Right. Think a couple of years worth of ticks, where a tick might
> happen a few times per second.
> 
> I suppose a lazy-load database needs to be implemented for this since
> multiple strategies could be using different chunks of this price
> history at the same or different time.
> 
> I would hate to load 1,000,000 price quotes into memory a few times and
> create copies of the sliding window as I move forward one quote at a
> time. 
> 
>     Joel

Why bother?  Just load 10, 20, 200 megs of data or whatever and let 'er
rip.  VM is designed to help you do this kind of thing; that's what it's
there for.  Just set up your SQL query, load in the data and have at it.

If you're writing a best-of-breed system or whatever, then you are in a
position to dictate a pile of core memory.  If you get to a position
where you have 4 gigs in the machine and VM is thrashing, then it's time
to reconsider your approach.

In other words, don't worry too much about pre-optimizing your solution.

Gregm
From: GP lisper
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120977902.769f70f73f3301fa7688e923d379aed4@teranews>
On 9 Jul 2005 11:07:23 -0700, <············@gmail.com> wrote:
>
> pulling data from previous trading days if necessary. In fact, I
> wouldn't be very comfortable with an experiment that was based only on
> one day of data.

Then you would be poorer.


-- 
LOOP :: a Domain Specific Language.
From: Eric Lavigne
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120930460.404924.203930@f14g2000cwb.googlegroups.com>
>Suppose I have a stock price time series of a million or more entries.
>I'm looking to backtest a trading strategy over all these price points.
>
>...
>
>Apparently,
>simple table scans on over one billion trades (e.g. select max price
>from trades) can be done in less than a second with K.

I assume you will spend more than a second thinking of a new trading
strategy and describing it with sufficient precision for your program
to be able to test it. If it takes several minutes for you to do your
part, is it so important for the program to do its part in under a
second?

Write your program. Use it. If it's too slow, you can always change its
scanning method later.
From: Joel Reymont
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120930830.726851.20910@g49g2000cwa.googlegroups.com>
Eric,

It won't quite work as you suggest. Your approach would normally be the
one I would follow, but in this case I _am_ looking for top
performance. I want to create best-of-breed software, or get as
close as I can. My system would handle price ticks in real time and,
according to traders, milliseconds are important these days.

    Joel
From: Eric Lavigne
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120932169.348019.217940@z14g2000cwz.googlegroups.com>
>It won't quite work as you suggest. Your approach would normally be the
>one that I would follow but in this case I _am_ looking for top
>performance. I want to create best-of-the-breed software or get as
>close as I can. My system would handle price ticks in real-time and
>according to traders milliseconds are important these days.

We are still talking about backtesting a trading strategy, right? Every
millisecond, you want to go back and retest the same strategy over the
last million entries because the new information you received this
minute might have an effect on the result?

Backtesting a strategy requires a lot of data and could take some time,
but it doesn't need to be done quickly or often. "Real-time" has no
meaning when it comes to backtesting.

Employing the strategy requires only 400 points and will be blazingly
fast; in fact, it will be hard to find a slow way to do this. Speed is
important for this part, but you get the speed for free.

I probably just misunderstood your problem, but to me there just
doesn't seem to be a problem.
From: Joel Reymont
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120932572.635511.18630@g43g2000cwa.googlegroups.com>
Eric,

Backtesting the strategy requires a window of 400 data points, sliding
over a time series of, say, 1 million data points. My question is how
to implement this sliding window in the most efficient manner, e.g.
with the least work for the garbage collector.

The real-time bit will require keeping the last 400 data points
around, always adding the most recent point to the front and dropping
the last point.

I discovered the Series package at SourceForge and I'm wondering if
it's the best way to go.
From: Pascal Bourguignon
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <87fyunls2r.fsf@thalassa.informatimago.com>
"Joel Reymont" <······@gmail.com> writes:

> Eric,
>
> Backtesting the strategy requires a window of 400 data points, sliding
> over a time series of, say, 1 million data points. My question is how
> to implement this sliding window in the most efficient manner, e.g.
> with the least work for the garbage collector.

Keep the data in a 1000000-entry vector.

;; Slide a 400-element displaced window over DATA without copying:
;; ADJUST-ARRAY just moves the displacement offset.
(loop with window = (make-array '(400) :displaced-to data
                                       :displaced-index-offset 0
                                       :adjustable t)
      for n to (- (array-dimension data 0) (array-dimension window 0))
      do (adjust-array window (array-dimensions window)
                       :displaced-to (array-displacement window)
                       :displaced-index-offset n)
         (backtest window))

> The real-time bit is will require to keep the last 400 data points
> around, always adding the most recent point to the front and dropping
> the last point.

Depends on the kind of accesses you'll make on that data.  If mostly
serial accesses, then you could keep just a list:

(defstruct hlist head tail)
(defun hlist (&rest items) (make-hlist :head items :tail (last items)))
(defun enqueue (hlist item)
   (if (null (hlist-head hlist))
       (setf (hlist-head hlist) (cons item nil)
             (hlist-tail hlist) (hlist-head hlist))
       (setf (cdr (hlist-tail hlist)) (cons item nil)
             (hlist-tail hlist)  (cdr (hlist-tail hlist)))))
(defun dequeue (hlist)
   (prog1 (car (hlist-head hlist))
          (setf (hlist-head hlist) (cdr (hlist-head hlist)))
          (when (null (hlist-head hlist))
             (setf (hlist-tail hlist) nil))))

(defun shift (hlist new-item)
    (enqueue hlist new-item)
    (dequeue hlist))



> I discovered the Series package at SourceForge and I'm wondering if
> it's the best way to go.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
In deep sleep hear sound,
Cat vomit hairball somewhere.
Will find in morning.
From: Joel Reymont
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120936395.958178.161610@g43g2000cwa.googlegroups.com>
Pascal,

Would both of the approaches above require a lot of memory thrashing and
garbage collection since the sliding window has to be created and then
discarded at each data point?

    Thanks, Joel
From: Pascal Bourguignon
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <87br5blngo.fsf@thalassa.informatimago.com>
"Joel Reymont" <······@gmail.com> writes:
> Would both of the approaches above require a lot of memory trashing and
> garbage collection since the sliding window has to be created and then
> discarded at each data point?

You've not read my lisp very closely, have you?

1M data points is peanuts for today's memories.
Modifying an adjustable displaced array doesn't cost anything spacewise.
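A toy REPL illustration of the point, with a 10-element vector standing
in for the million-point tick data:

```lisp
;; Sliding the displaced window is just a change of offset;
;; no elements are copied and nothing is consed per step.
(let* ((data (make-array 10 :initial-contents '(0 1 2 3 4 5 6 7 8 9)))
       (window (make-array 4 :displaced-to data
                             :displaced-index-offset 0
                             :adjustable t)))
  (adjust-array window 4 :displaced-to data :displaced-index-offset 3)
  (coerce window 'list)) ; => (3 4 5 6)
```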

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay
From: Joel Reymont
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120942081.511416.280970@o13g2000cwo.googlegroups.com>
Pascal Bourguignon wrote:
>
> You've not read my lisp very closely, have you?
>
> 1M data points is peanuts for today memories.
> Modifying an adjustable displaced array doesn't cost anything spacewise.

Mea culpa, I see it now!
From: Joe Marshall
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <wtnzhjwh.fsf@comcast.net>
"Joel Reymont" <······@gmail.com> writes:

> Folks,
>
> Suppose I have a stock price time series of a million or more entries.
> I'm looking to backtest a trading strategy over all these price points.
> My strategy would only need, say, 400 points at a time so I would need
> to have a sliding window where the first point is always dropped as I
> move the window forward.
>
> What is the most efficient way of implementing this with Lisp? 

You want a circular buffer.

> Would the series package help?

Not particularly.
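A minimal sketch of such a circular buffer (hypothetical names, shown
with a small capacity; use 400 in the real case).  Pushing past
capacity overwrites the oldest tick in place, so there is no per-tick
consing at all:

```lisp
(defstruct (ring (:constructor %make-ring))
  data        ; backing simple-vector
  (start 0)   ; index of the oldest element
  (count 0))  ; number of elements currently stored

(defun make-ring (capacity)
  (%make-ring :data (make-array capacity :initial-element nil)))

(defun ring-push (ring item)
  "Append ITEM; once the ring is full, overwrite the oldest element."
  (let* ((data (ring-data ring))
         (cap  (length data)))
    (cond ((< (ring-count ring) cap)
           (setf (aref data (mod (+ (ring-start ring) (ring-count ring)) cap))
                 item)
           (incf (ring-count ring)))
          (t
           (setf (aref data (ring-start ring)) item
                 (ring-start ring) (mod (1+ (ring-start ring)) cap))))
    ring))

(defun ring-ref (ring i)
  "I = 0 is the oldest element in the window, I = COUNT-1 the newest."
  (aref (ring-data ring)
        (mod (+ (ring-start ring) i) (length (ring-data ring)))))
```

Feeding five ticks into a 3-slot ring leaves the last three, oldest
first: after pushing 1..5, (ring-ref r 0) is 3 and (ring-ref r 2) is 5.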

-- 
~jrm
From: Fernando
Subject: Re: CL vs. K/KDB or efficient analysis of financial time series
Date: 
Message-ID: <1120992881.794619.177170@g14g2000cwa.googlegroups.com>
Joel Reymont wrote:

> What is the most efficient way of implementing this with Lisp? Would
> the series package help?

No. The series package has nothing to do with time series. It's an
abstraction around loops.