Dang - did you ever get stumped

From: rusty craine
Subject: Dang - did you ever get stumped
Date: Thu, 12 Nov 1998 00:00:00 +0000
Message-ID: <72enhd$gu8$1@excalibur.flash.net>

We are getting stream data from a analog to digital converter.  The range of
the gathered binary numbers is from 0 to 255 (decimal).  If the data was
graphed it would be a wave form of random amplitude and period.  Our goal is
to plot three lines on the screen 1) maximum amplitue 2) average amplitude
3) minimum amplitued.  The lisp program converts incoming data to lisp
integer and pushes 2500 bytes onto a list processes that list, loops back
for the next 2500 bytes.  At first glance, processing the data to represent
three lines on a screen seemed routine...hmmmm.

We have made several attempts, all fairly unsuccessful.  Our first attempt
was kind of a linear regression  with weighted influence analysis taking an
arbitrary number of bytes at a time.  This was fast enough to keep up but
did not give an effective visualization of the input.  Out next attempt was
to "crawl" the list comparatively, locating the actual points and plotting
those.  That mutated into several algorithms.  The most successful was a
matrix of lists, "pre-sorting"  the input into seven list (or any number).
Each list has a nill or value pushed onto it for each  byte read.  We then
crawl the two outside list looking for values.  If a position is nill we
"nth" down to the next lower list till we find a value and print that
derived location on the screen.  Sounded real slow but really was our most
successful attempt.

We are now back to the drawing board.  After looking at several hours of
data, we have discovered that the period of the wave form is never more that
100 or fewer than 10 bytes.  So we are experimenting with much smaller list.
This makes direct analysis of the list much faster.  Still seems like there
is a better approach I'm missing.

The good news is the program is being used to evaluated certain aspects of
frogs nervous system responses.  As long as a frog can't hire a lawyer we
can experiment with the program.  Come up with a good idea and will send you
some frog legs from our failed attemps

Rusty


--
Email address  is ········@flash.net.  Under some situation the "a" and the
"l" in flash are reversed in the header.
Rusty Craine

Re: Dang - did you ever get stumped Raffael Cavallaro
- Re: Dang - did you ever get stumped rusty craine
Re: Dang - did you ever get stumped Gareth McCaughan
- Re: Dang - did you ever get stumped rusty craine
Re: Dang - did you ever get stumped rusty craine
- Re: Dang - did you ever get stumped Gareth McCaughan

From: Raffael Cavallaro
Subject: Re: Dang - did you ever get stumped
Date: Thu, 12 Nov 1998 00:00:00 +0000
Message-ID: <raffael-1211981311400001@raffaele.ne.mediaone.net>

In article <············@excalibur.flash.net>, "rusty craine"
<········@flash.net> wrote:

>This was fast enough to keep up but
>did not give an effective visualization of the input.

I think it would help if you let us know what would constitute "effective
visualization." Are the moving average, max and min not good enough, or
are you just having trouble implementing these?

If it's a max, min and moving average you want, don't you need to decide
on the relevant period (i.e., a 100 byte moving average? a 25 byte moving
average). This would probably depend on the input data rate (i.e., how
many bytes per second) and any observed periodicity in the data (such as
that you mentioned).

Be careful when your moving average gets to the beginning of a new 2500
byte input list or you could get a temporal discontinuity in your disply.
You should straddle two input lists by caching the last n-1 elements of
the previous list, where n is the period of your moving average.

Hope this helps save some frogs. ; )

Raf

-- 
Raffael Cavallaro

From: rusty craine
Subject: Re: Dang - did you ever get stumped
Date: Thu, 12 Nov 1998 00:00:00 +0000
Message-ID: <72fet6$n24$1@excalibur.flash.net>

Raffael Cavallaro wrote in message ...
>In article <············@excalibur.flash.net>, "rusty craine"
><········@flash.net> wrote:
>
>>This was fast enough to keep up but
>>did not give an effective visualization of the input.
>
>I think it would help if you let us know what would constitute "effective
>visualization." Are the moving average, max and min not good enough, or
>are you just having trouble implementing these?
>
Yep I see that is kinda confusing.  Our regressive algorithm didn't react
fast enough to the data.  So what we really came up with was a smoothing of
the min and max. Thus what appeared on the screen was of no use to the
researchers.  In the subsequent attempts where we search the lists,  we
actually locate the next max in the one of the three top list.  Using this
as a starting location to serach the bottom three lists for the next
minimum.  The average is a calculation of the two.

Today we have identified the actual range of numbers we are getting is never
below 50 or over 190.  So each of the seven list will get a range of 20
numbers.  For example the bottom list (lst1) gets numbers ranging from 50 to
70 or 0 for each byte input and so on over the spread of list and number
range.

Here is our rationale to date.  We are in the process of discovering the
minimum number that can be a maxium amplitude and the maximum number that
can be a mimium  amplitude  (geeez).  Any reading falling on the inside of
those two ranges (I really don't want to type "the range of the maximum
maximun to the minimum maximum or maximum minimum to the minimum mimimum").
will not be collected on the front end thus want have to be processed.  It
looks like we can discard about 40 to 50% of all incoming bytes.   The data
input rate is fairly slow 9.5K/sec.  If we can discard half or so of in
coming data, cut down the number of list we are processing, I may have
bothered you for nothing.  Sorry, panic and deadlines make for desperate
pleas.

There was a very good suggestion sent to me pointing out that using arrays
would be fast.  The problem is the release of lisp we are using doesn't have
a "machine native" array, thus arrays have not worked out well for me in the
past.

ya'll come on down for frog legs (how do they say that in french, got to
sound better)
rusty

From: Gareth McCaughan
Subject: Re: Dang - did you ever get stumped
Date: Thu, 12 Nov 1998 00:00:00 +0000
Message-ID: <86vhkk35y9.fsf@g.pet.cam.ac.uk>

Rusty Craine wrote:

> We are getting stream data from a analog to digital converter.  The range of
> the gathered binary numbers is from 0 to 255 (decimal).  If the data was
> graphed it would be a wave form of random amplitude and period.  Our goal is
> to plot three lines on the screen 1) maximum amplitue 2) average amplitude
> 3) minimum amplitued.

Do you really mean "amplitude" (i.e., distance from lowest point to
highest point on the wave), or do you mean that you want to plot the
lowest, highest and average *values* in your data stream?

From your description of what you've tried, it sounds more like
the second. I'll assume it is; if I'm wrong and that makes nonsense
of everything I say, tell me so...

First off, a suggestion which may be nonsense. Rather than plotting
just min/max/avg, I think it might be good to plot some other measure
of range (quartiles, perhaps), because that is less likely to get
disturbed by occasional rogue data points.

Here's what I'd have.

1. An array "count" of size 256. count[x] contains the number of
   x's you have had in the last N data points.

2. A queue of length N. The head of the queue is the data point
   that arrived N ticks ago: the one you need to throw out when
   a new point arrives.

3. Variables holding

     - the average of the last N data points (better: the sum)
     - the quartiles (minimum, lower quartile, median, upper quartile,
       maximum) of the last N data points

Except that with each of those "quartile" variables I'd store a
little extra information to make updating them much faster.
Specifically: imagine that your last N data points are arranged
in order, smallest to largest, and numbered from 1 to N; and
that they're distributed into 256 boxes 0..255. (Of course actually
all the points in each box have the same value. But they have
separate numbers anyway.) The median, for instance, is defined
to be point number (N+1)/2. So, as well as remembering what
value the median point has, we also remember where it is within
its box: is it the smallest item in the box? The largest? The 23rd?

Now, every time a new data point x arrives, here's what you do.

1. Look at the head of the queue; call it y. What we're doing is
   replacing y with x. We don't know where x,y are within their
   boxes: let's arbitrarily say each is in the middle of its box.

2. Updating the average is easy. Just add (x-y)/N. Or, if you're
   keeping the sum rather than the average as I suggest, just add
   x-y.

3. Updating the "count" array is also easy. Increment count[x]
   and decrement count[y].

4. What about the quartiles? Well, what's happening is that all
   the data points between x and y are being shifted along one in
   their ordering, but you still want e.g. the lower quartile to
   be point number (N+3)/4. So, nudge each quartile that falls
   in the range between x and y along one in the right direction.
   If it falls out of its box, search linearly through the boxes
   in the appropriate direction until you find a non-empty box;
   then that quartile is the smallest or the largest thing in
   that box.

Getting this just right is tricky (well, it took me a while...).
I've included some code at the end of this article. It's not
terribly good code. I don't guarantee there are no mistakes in
it. You're welcome to do with it as you please.

On my box (P166ish), running CMU CL, I get the following:
  | * (dsr-time 1000 1000000)
  | [GC threshold exceeded with 694,784 bytes in use.  Commencing GC.]
  | [GC completed with 632,832 bytes retained and 61,952 bytes freed.]
  | [GC will next occur when at least 2,632,832 bytes are in use.]
  | 
  | Timing a null loop...
  | Evaluation took:
  |   2.23 seconds of real time
  |   2.224129 seconds of user run time
  |   2.9000002e-5 seconds of system run time
  |   0 page faults and
  |   0 bytes consed.
  | [GC threshold exceeded with 643,584 bytes in use.  Commencing GC.]
  | [GC completed with 637,952 bytes retained and 5,632 bytes freed.]
  | [GC will next occur when at least 2,637,952 bytes are in use.]
  | Timing the real thing...
  | Evaluation took:
  |   5.64 seconds of real time
  |   5.597803 seconds of user run time
  |   2.01e-4 seconds of system run time
  |   0 page faults and
  |   0 bytes consed.
plus some boring output you don't want to see.

In other words, with a 1000-sample window (not that this figure
makes much difference) each update operation takes about 3.4
microseconds and doesn't cons at all.

I seem to remember you're using muLisp. This is all Common Lisp;
I wouldn't expect it to be hard to translate into any civilised
dialect of Lisp. I don't know how civilised muLisp is. :-)

---------------------------- code begins ----------------------------
;;; Rusty Craine's data stream.

(declaim (optimize (speed 3)))

;;;--------------------------------------------------------------------
;;; A bin/bin-offset pair, for when we need finer resolution
;;; than just the bin.
(defstruct (quartile
            (:constructor make-quartile (value offset)))
  (value  0 :type (integer 0 255))
  (offset 0 :type (integer 0 10000)))

;;;--------------------------------------------------------------------
;;; Our main data structure.
;;; NB that MAKE-DSR doesn't produce a usable DSR. Use NEW-DSR,
;;; defined below.
(defstruct (data-stream-record
            (:conc-name dsr-)
            (:constructor make-dsr))

  ;; The number of items being remembered.
  (n     100
         :type (integer 1 10000))

  ;; count[x] is the number of items with value x.
  (count (make-array '(256) :initial-element 0
                            :element-type '(integer 0 10000))
         :type (simple-array (integer 0 10000) (256)))

  ;; A queue containing the last N items, in order.
  ;; It's implemented as an array of size N, plus a pointer;
  ;; the pointer points to the item we're about to remove and,
  ;; of course, to the space in which we'll put its replacement.
  (queue (make-array '(0) :initial-element 0
                          :element-type '(integer 0 255))
         :type (simple-array (integer 0 255) (*)))

  ;; The pointer for the array
  (queue-pos 0 :type (integer 0 100000))

  ;; The sum of all the items so far.
  (sum 0 :type (integer 0 2550000))

  ;; Quartiles. Each is represented by its value and its
  ;; offset within the bin.
  (quartiles
    (make-array '(5)
      :element-type 'quartile
      :initial-contents (loop for i fixnum from 0 to 4
                              collect (make-quartile 0 0)))
    :type (simple-array quartile (5)))
)

;;;--------------------------------------------------------------------
;;; You need to call this every time a new data point arrives.
;;; It updates all the relevant stuff.
(defun update-dsr (dsr datum)
  "Update DSR to accommodate a new DATUM."
  (declare (type data-stream-record dsr))
  (declare (type (integer 0 255) datum))

  (let ((count (dsr-count dsr))
        (dead-datum 0))		; this value of 0 is bogus

    (declare (type (integer 0 255) dead-datum))

    ;; Replace old value with new in queue, and get old value
    (let ((p (dsr-queue-pos dsr)))
      (shiftf dead-datum (aref (dsr-queue dsr) p) datum)
      (setf (dsr-queue-pos dsr)
            (1- (if (zerop p) (dsr-n dsr) p))))

    ;; Needn't do anything if old=new
    (when (= datum dead-datum)
      (return-from update-dsr))

    ;; Update trivial stuff: counts and sum
    (incf (aref count datum))
    (decf (aref count dead-datum))
    (incf (dsr-sum dsr) (- datum dead-datum))

    ;; Fix up quartiles.
    ;; This is fiddly. Conceptually, a new or old item is always assumed
    ;; to be right at the start of its bin. The best way to understand
    ;; the precise conditions under which a quartile is moved is to
    ;; scribble with pencil and paper for a while...
    (let ((smaller datum)
          (larger dead-datum)
          (new<old (< datum dead-datum)))
      ;; I don't understand why CMUCL needs this:
      (declare (type (integer 0 255) smaller larger))
      (unless new<old (rotatef smaller larger))
      (loop for q across (dsr-quartiles dsr) doing
        (if new<old
          (when (< smaller (quartile-value q) (1+ larger))
            ;; Move quartile at q down one.
            (if (zerop (quartile-offset q))
              ;; We need to search downwards.
              (loop for i downfrom (1- (quartile-value q))
                    while (zerop (aref count i))
                    finally (setf (quartile-value q) i
                                  (quartile-offset q) (1- (aref count i))))
              (decf (quartile-offset q))))
          (when (and (<= smaller (quartile-value q) larger)
                     (or (< smaller (quartile-value q))
                         (>= (quartile-offset q)
                            (aref count (quartile-value q)))))
            ;; Move quartile at q up one.
            (when (>= (incf (quartile-offset q))
                      (aref count (quartile-value q)))
              ;; We need to search upwards.
              (loop for i upfrom (1+ (quartile-value q))
                    while (zerop (aref count i))
                    finally (setf (quartile-value q) i
                                  (quartile-offset q) 0))))
)) ) ))

;;;--------------------------------------------------------------------
;;; Make a new DSR with all its data points at 0.
(defun new-dsr (&optional (n 100))
  (declare (type (integer 1 10000) n))
  (let ((dsr (make-dsr :n n)))
    (setf (dsr-queue dsr) (make-array (list n) :initial-element 0
                                               :element-type '(integer 0 255)))
    (setf (aref (dsr-count dsr) 0) n)
    (dotimes (i 5)
      (setf (aref (dsr-quartiles dsr) i)
            (make-quartile 0 (floor (* (1- n) i) 4))))
    dsr))

;;;--------------------------------------------------------------------
;;; Check that the quartiles of a DSR are right.
(defun check-dsr (dsr n)
  (declare (type (integer 1 10000) n))
  (let* ((qq (dsr-quartiles dsr))
         (ok t)
         (sq (sort (concatenate '(simple-array (integer 0 255) (*))
                                (dsr-queue dsr)) #'<)))
    (declare (type (simple-array (integer 0 255) (*)) sq))    
    (loop for i fixnum from 0 to 4 doing
      (let ((q (aref qq i))
            (j (floor (the fixnum (* (1- n) i)) 4)))
        (let ((val (quartile-value q))
              (ofs (quartile-offset q)))
          (unless (and (= (aref sq j) val)
                       (= (aref sq (- j ofs)) val)
                       (or (= j ofs)
                           (/= (aref sq (- j ofs 1)) val)))
            (format t "Quartile ~A at (~A,~A) should be ~:R at ~A~%"
                      i val ofs ofs (aref sq j))
            (setq ok nil)) )) )
    (unless ok
      (format t "Sorted list is:~%~A~%" sq))
    ok))

;;;--------------------------------------------------------------------
;;; Make a DSR and exercise it for a bit, testing it all the time.
(defun dsr-test (&optional (n 1000))
  (declare (type (integer 1 10000) n))
  (let ((d (new-dsr n)))
    (dotimes (i (* 2 n))
      (update-dsr d (random 256))
      (unless (check-dsr d n) (return-from dsr-test (values d 'ZOG))))
    d))

;;;--------------------------------------------------------------------
;;; Make a DSR and then time some operations on it.
(defun dsr-time (&optional (size 1000) (n 10000))
  (declare (type fixnum n))
  (let ((d (new-dsr size)))
    (gc)
    (format t "~&Timing a null loop...~&")
    ;; Let's hope this doesn't get optimised away.
    (time (dotimes (i n) (random 256)))
    (gc)
    (format t "~&Timing the real thing...~&")
    (time (dotimes (i n) (update-dsr d (random 256))))
    d))
----------------------------- code ends -----------------------------

-- 
Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.

From: rusty craine
Subject: Re: Dang - did you ever get stumped
Date: Thu, 12 Nov 1998 00:00:00 +0000
Message-ID: <72fsbo$qb7$1@excalibur.flash.net>

Gareth McCaughan wrote in message <··············@g.pet.cam.ac.uk>...
>Rusty Craine wrote:
>
>> We are getting stream data from a analog to digital converter.  The range
of
>> the gathered binary numbers is from 0 to 255 (decimal).  If the data was
>> graphed it would be a wave form of random amplitude and period.  Our goal
is
>> to plot three lines on the screen 1) maximum amplitue 2) average
amplitude
>> 3) minimum amplitued.
>
>Do you really mean "amplitude" (i.e., distance from lowest point to
>highest point on the wave), or do you mean that you want to plot the
>lowest, highest and average *values* in your data stream?
>
>From your description of what you've tried, it sounds more like
>the second. I'll assume it is; if I'm wrong and that makes nonsense
>of everything I say, tell me so...
>
>First off, a suggestion which may be nonsense. Rather than plotting
>just min/max/avg, I think it might be good to plot some other measure
>of range (quartiles, perhaps), because that is less likely to get
>disturbed by occasional rogue data points.
>
>Here's what I'd have.
>
>1. An array "count" of size 256. count[x] contains the number of
>   x's you have had in the last N data points.
>
>2. A queue of length N. The head of the queue is the data point
>   that arrived N ticks ago: the one you need to throw out when
>   a new point arrives.
>
>3. Variables holding
>
>     - the average of the last N data points (better: the sum)
>     - the quartiles (minimum, lower quartile, median, upper quartile,
>       maximum) of the last N data points
>
Gareth - Indeed you had a much better idea.  Not only that but it really
gets more at the heart of what the doctors had in mind and needed  when they
made the request.  It won't  be very hard to get running in muLisp.  Thanks
for your help.  I forwarded your posting to their department head and in
doing that extracted a promise from her that if they published their work on
this they would give you an acknowledgement for your algorithm.  Geeez that
is the best I could do they were fresh out of those little two seater BMW's
for honoraria.  :)  (heee heee I wish)

Doing it much better now with a little help from bright folks
rusty


>>Gareth McCaughan       Dept. of Pure Mathematics & Mathematical
Statistics,
>·····@dpmms.cam.ac.uk  Cambridge University, England.

From: rusty craine
Subject: Re: Dang - did you ever get stumped
Date: Sun, 15 Nov 1998 00:00:00 +0000
Message-ID: <72mson$2q$1@excalibur.flash.net>

rusty craine wrote in message <············@excalibur.flash.net>...
>We are getting stream data from a analog to digital converter.  The range
of
>the gathered binary numbers is from 0 to 255 (decimal).  If the data was
>graphed it would be a wave form of random amplitude and period.

Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.  wrote>
>First off, a suggestion which may be nonsense. Rather than plotting
>just min/max/avg, I think it might be good to plot some other measure
>of range (quartiles, perhaps), because that is less likely to get
>disturbed by occasional rogue data points.

>Here's what I'd have.....
[snip]

Just a note to followup.  We worked most of the weekend to get  Gareth's
algorithm up and running.  Geeez it is very efficient; in a test we ran
speeds up to 14.5K and the program was buzzing right along  with no
problems.

The group I was working with kept spinning off ideas that we could use
Gareth's idea for.  One of the group named it the McCaughan
Quartiles...guess that name will stick around here.  I'll bet this is not
the last time we use it.

I must say one good way to prompt lisp is bright folks sharing good ideas
with the guys in the trenches.  This group does lisp a valuable service.

Thanks
Rusty

From: Gareth McCaughan
Subject: Re: Dang - did you ever get stumped
Date: Tue, 17 Nov 1998 00:00:00 +0000
Message-ID: <86af1qwp01.fsf@g.pet.cam.ac.uk>

Rusty Craine wrote:

> Just a note to followup.  We worked most of the weekend to get  Gareth's
> algorithm up and running.  Geeez it is very efficient; in a test we ran
> speeds up to 14.5K and the program was buzzing right along  with no
> problems.
> 
> The group I was working with kept spinning off ideas that we could use
> Gareth's idea for.  One of the group named it the McCaughan
> Quartiles...guess that name will stick around here.  I'll bet this is not
> the last time we use it.

*urgh* :-). Just call 'em quartiles. I'm sure I'm not the first
person ever to think of this. (Incidentally, I hope it's obvious
that the technique isn't restricted to quartiles. You might find,
say, the 10th and 90th percentiles useful if what you want to
know is the extremes of the oscillations.)

Oh, and have you thought about Fourier transforms? If your data
stream is usually roughly sinusoidal, with the frequencies changing
all the time, doing an FFT on the data (maybe 1024 points at a
time) would give a good idea of both frequencies and magnitudes.
I know there are some folks on c.l.l who have FFT code in Lisp,
because they've been discussing using is as a benchmark.

-- 
Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.