Hi, I have just started looking into series, making a small system for
analyzing debug-logs, and I have come up with a few questions:
The first thing I'm doing is reading lines from an access.log with
scan-file and parsing these into clos-objects:
(defun make-access-series (path)
  (choose-if (complement #'null)
             (map-fn t #'access-parse
                     (scan-file path #'read-line))))
access-parse parses a line, does some simple filtering (to avoid
making unnecessary objects) and returns an access-log object.
To test this, I typically do something like:
(subseries (make-access-series #P"log:access.log") 0 2)
Giving me the first two parsed access-log objects.
1. Series are supposed to be lazy, which I thought meant that in this
example only the first two lines would need to be read (given that
the access-parse function accepts these), but running this code
leads to heavy disk activity. Have I misunderstood "lazy", or is
this implementation-dependent? (I'm using series from CLOCC on
clisp)
2. Once the parsing is debugged, the next step is to include this in
higher-order functions. This means both a development phase, where
I only need a small set of data (the same data over and over
will do nicely), and doing the full analysis, where I'll probably
want to run the same set of data through different aggregations.
In both these cases the reading and parsing really only needs to be
done once, and I wonder: how can I add caching to a given stage in
this analysis, and upon re-read of the series, read from the cache
instead of re-doing the read-parse? If this was a shell-script, I
would do:
sed parselog.sed < logfile | tee log.tmp1 | analysis-1
and later on:
analysis-2 < log.tmp1
In other words: how can I make "tee" with a series?
(I imagine the CL solution can be a lot smarter than tee, having
implicit caching without the caller knowing, and automatically
continuing to read the original series once past the length of the
cache)
3. Some log-files have entries spanning several lines. Is it possible
to make a function collecting a number of entries from an
input-series and having the result returned as a new series? Like
chunk, but the number of entries in each chunk is not known in
advance.
regards, rolf rander
--
http://www.pvv.org/~rolfn/
"He who takes jest only as jest and seriousness only seriously has
in fact grasped both of them badly" -- Piet Hein
>>>>> "Rolf" == Rolf Rander Næss <··········@pvv.org> writes:
Rolf> Hi, I have just started looking into series, making a small system for
Rolf> analyzing debug-logs, and I have come up with a few questions:
Rolf> The first thing I'm doing is reading lines from an access.log with
Rolf> scan-file and parsing these into clos-objects:
Rolf> (defun make-access-series (path)
Rolf>   (choose-if (complement #'null)
Rolf>              (map-fn t #'access-parse
Rolf>                      (scan-file path #'read-line))))
Rolf> access-parse parses a line, does some simple filtering (to avoid
Rolf> making unnecessary objects) and returns an access-log object.
Rolf> To test this, I typically do something like:
Rolf> (subseries (make-access-series #P"log:access.log") 0 2)
Don't you need to declare make-access-series to be a series function?
I think if you don't, make-access-series will read every single line
and create a series (from choose-if). This then gets passed to
subseries.
If you declare make-access-series to be a series function, then series
can optimize it and then only the first 2 lines will be read.
Ray
··········@pvv.org (Rolf Rander Næss) writes:
>(I'm using series from CLOCC on clisp)
No, of course not. Series comes from: http://series.sourceforge.net/
rolf rander
Rolf Rander Næss wrote:
> Hi, I have just started looking into series, making a small system for
> analyzing debug-logs, and I have come up with a few questions:
>
> The first thing I'm doing is reading lines from an access.log with
> scan-file and parsing these into clos-objects:
>
> (defun make-access-series (path)
>   (choose-if (complement #'null)
>              (map-fn t #'access-parse
>                      (scan-file path #'read-line))))
I think you want
(defun make-access-series (path)
  (declare (optimizable-series-function))
  (choose-if (complement #'null)
             (map-fn t #'access-parse
                     (scan-file path #'read-line))))
> access-parse parses a line, does some simple filtering (to avoid
> making unnecessary objects) and returns an access-log object.
>
> To test this, I typically do something like:
>
> (subseries (make-access-series #P"log:access.log") 0 2)
>
> Giving me the first two parsed access-log objects.
>
> 1. Series are supposed to be lazy, which I thought meant that in this
> example only the first two lines would need to be read (given that
> the access-parse function accepts these), but running this code
> leads to heavy disk activity. Have I misunderstood "lazy", or is
> this implementation-dependent? (I'm using series from CLOCC on
> clisp)
Well it should read lines until two non-nil objects have been returned
by ACCESS-PARSE. I'm not sure why your hard drive would have a hard time
with that...
> 2. Once the parsing is debugged, the next step is to include this in
> higher-order functions. This means both a development phase, where
> I only need a small set of data (the same data over and over
> will do nicely), and doing the full analysis, where I'll probably
> want to run the same set of data through different aggregations.
> In both these cases the reading and parsing really only needs to be
> done once, and I wonder: how can I add caching to a given stage in
> this analysis, and upon re-read of the series, read from the cache
> instead of re-doing the read-parse? If this was a shell-script, I
> would do:
> sed parselog.sed < logfile | tee log.tmp1 | analysis-1
> and later on:
> analysis-2 < log.tmp1
> In other words: how can I make "tee" with a series?
> (I imagine the CL solution can be a lot smarter than tee, having
> implicit caching without the caller knowing, and automatically
> continuing to read the original series once past the length of the
> cache)
I'm not exactly sure what you want here. If you're talking about storing
a series for later examination, you can't do that without losing all the
optimization Series provides. You can collect it into a data structure
or file though, and then later scan that to get a series again.
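A minimal sketch of that collect-then-scan approach, assuming MAKE-ACCESS-SERIES from earlier in the thread; the *ACCESS-CACHE* variable and CACHED-ACCESS-SERIES wrapper are hypothetical names, and the SERIES package is assumed to be loaded with its symbols imported:

```lisp
;; Hypothetical caching wrapper: materialize the expensive series once,
;; then hand out a fresh series over the cached list on every call.
;; Assumes the SERIES package is loaded and its symbols imported.
(defvar *access-cache* nil)

(defun cached-access-series (path)
  (unless *access-cache*
    ;; COLLECT forces the whole series into an ordinary list...
    (setf *access-cache*
          (collect 'list (make-access-series path))))
  ;; ...and SCAN turns the cached list back into a series.
  (scan *access-cache*))
```

Note that this trades away laziness (the whole file is read and parsed on first use), which matches the caveat above about losing the optimization.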
> 3. Some log-files have entries spanning several lines. Is it possible
> to make a function collecting a number of entries from an
> input-series and having the result returned as a new series? Like
> chunk, but the number of entries in each chunk is not known in
> advance.
You can use PRODUCING to create off-line transducers. I think that's
what you want.
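A rough, untested sketch of such an off-line transducer with PRODUCING, grouping a series of lines into a series of multi-line entries. ENTRY-START-P is a hypothetical predicate (here: a new entry starts in column 0), SCAN-ENTRIES is an invented name, and the restrictions the manual places on PRODUCING bodies apply:

```lisp
;; Sketch: turn a series of lines into a series of entries, where each
;; entry is a list of lines. Assumes the SERIES package is loaded.
(defun entry-start-p (line)
  ;; Hypothetical: a line beginning in column 0 starts a new entry.
  (and (plusp (length line))
       (not (member (char line 0) '(#\Space #\Tab)))))

(defun scan-entries (lines)
  (declare (optimizable-series-function))
  (producing (entries) ((lines lines) (chunk '()) (line nil))
    (loop
      (tagbody
        (setq line (next-in lines
                            ;; Input exhausted: flush the last entry.
                            (when chunk
                              (next-out entries (nreverse chunk)))
                            (terminate-producing)))
        ;; A new entry begins: emit the accumulated one.
        (when (and chunk (entry-start-p line))
          (next-out entries (nreverse chunk))
          (setq chunk '()))
        (setq chunk (cons line chunk))))))
```

The multiple NEXT-OUT calls here may well defeat optimization, but the expression should still run under the lazy fallback implementation.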
You may want to read
AIM-1082 - Optimization of Series Expressions: Part I: User's Manual for
the Series Macro Package
ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1082.pdf
for more information on how to use Series.
Regards,
Dirk Gerrits
Dirk Gerrits <····@dirkgerrits.com> writes:
> Rolf Rander Næss wrote:
>> Hi, I have just started looking into series, making a small system for
>> analyzing debug-logs, and I have come up with a few questions:
>> The first thing I'm doing is reading lines from an access.log with
>> scan-file and parsing these into clos-objects:
>> (defun make-access-series (path)
>>   (choose-if (complement #'null)
>>              (map-fn t #'access-parse
>>                      (scan-file path #'read-line))))
>
> I think you want
>
> (defun make-access-series (path)
>   (declare (optimizable-series-function))
>   (choose-if (complement #'null)
>              (map-fn t #'access-parse
>                      (scan-file path #'read-line))))
This is the same advice Raymond Toy gave me, I think. This alone didn't
make any difference, but when I also wrapped the subseries call in
an optimizable-series-function like this:

(defun access-series-first-2 (path)
  (declare (optimizable-series-function 1))
  (subseries (make-access-series path) 0 2))

it made a huge difference.
>> 2. Once the parsing is debugged, the next step is to include this in
>> higher-order functions. This means both a development phase, where
>> I only need a small set of data (the same data over and over
>> will do nicely), and doing the full analysis, where I'll probably
>> want to run the same set of data through different aggregations.
>> In both these cases the reading and parsing really only needs to be
>> done once, and I wonder: how can I add caching to a given stage in
>> this analysis, and upon re-read of the series, read from the cache
>> instead of re-doing the read-parse? If this was a shell-script, I
>> would do:
>> sed parselog.sed < logfile | tee log.tmp1 | analysis-1
>> and later on:
>> analysis-2 < log.tmp1
>> In other words: how can I make "tee" with a series?
>> (I imagine the CL solution can be a lot smarter than tee, having
>> implicit caching without the caller knowing, and automatically
>> continuing to read the original series once past the length of the
>> cache)
>
> I'm not exactly sure what you want here. If you're talking about
> storing a series for later examination, you can't do that without
> losing all the optimization Series provides. You can collect it into a
> data-structure or file though, and then later scan that to get a
> series again.
Collecting results is what I want, but transparent to the syntax.
Something like:
(subseries
 (light-analysis-1
  (caching-series *series-cache*
                  (heavy-seriesmaking-function)))
 0 20)
which would store the first n results from the
heavy-seriesmaking-function (enough to compute subseries 0 to 20) in
the *series-cache*, such that a later call like:
(subseries
 (light-analysis-2
  (caching-series *series-cache*
                  (heavy-seriesmaking-function)))
 0 40)
would not need to re-calculate the heavy-seriesmaking-function for the
first 20 entries. (This would ofcourse mandate storing some kind of
state for the heavy-seriesmaking-function).
While developing and debugging the analysis-functions, I would like to
operate on just a subseries, and avoid re-reading data from file each
time. (In the code-test-debug-cycle, anything more than 10s is "too
long")
Anyway, reading the responses to my question, studying the series-doc
and looking at the macroexpansions of the function-definitions above,
it seems my caching-idea isn't really feasible, and it is possibly the
wrong way to approach series.
> AIM-1082 - Optimization of Series Expressions: Part I: User's Manual
> for the Series Macro Package
>
> ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1082.pdf
Yes, thank you.
rolf rander
From: Rahul Jain
Subject: Re: using SERIES for log-parsing
Date:
Message-ID: <87pta65ipy.fsf@nyct.net>
··········@pvv.org (Rolf Rander Næss) writes:
> Anyway, reading the responses to my question, studying the series-doc
> and looking at the macroexpansions of the function-definitions above,
> it seems my caching-idea isn't really feasible, and it is possibly the
> wrong way to approach series.
Not at all. Just bind the series to a local variable and pass it along
in both cases where you need to use it. You'll get compiler warnings
when you are creating a traversal which can't be compiled to a fast
loop.
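This suggestion might be sketched as below; TWO-ANALYSES and ENTRY-BYTES are hypothetical names (an accessor on the parsed access-log objects), and the SERIES package is assumed to be loaded. Within one expression, the bound series can feed several consumers while the file is still traversed only once:

```lisp
;; Hypothetical sketch: one pass over the log feeds two aggregations.
;; ENTRY-BYTES is an assumed accessor on the access-log objects.
(defun two-analyses (path)
  (let ((entries (make-access-series path)))
    (values
     (collect-length entries)                                 ; analysis 1
     (collect-sum (map-fn 'integer #'entry-bytes entries))))) ; analysis 2
```

If the traversal cannot be compiled into a single loop, the compiler warnings mentioned above will say so, and the lazy fallback evaluation is used instead.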
--
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist