(In)Efficient snmpwalk parsing...

From: mawhin
Subject: (In)Efficient snmpwalk parsing...
Date: Thu, 11 Dec 2008 22:50:22 +0000
Message-ID: <6046899c-aa9c-47ce-8316-1e093328121c@i24g2000prf.googlegroups.com>

Hi.

I'm trying to understand (a) why my code conses so much, and (b) how I
can improve it. I'm a relative Lisp newb here (about 6 weeks), and
while I can do stuff with it, it's not always pretty.

Right: I've written some lisp which does stuff with switch forwarding
tables, and gets them from files I've got on disk. The files are the
output of snmpwalk, and lines look like:
 SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008

The '4008' is a port number, and the last six dot-separated numbers
represent a MAC address, with the octets encoded in decimal. Bastards.

So, I want to extract the port number (easy) and the MAC address
(doable, but not so easy). And because I'm doing a helluva lot of
equality comparisons, I've decided it's more efficient to represent
MAC addresses internally as integers instead of strings. So:

I want to convert

SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008

to

(4008 1957141788), the latter part of which is the integer value of
0.0.116.167.157.28. Honest.
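
To make that arithmetic concrete, here is a small worked sketch (it is
not part of the code below). It treats the six octets as base-256
digits, most significant first:

```lisp
;; Each octet is one base-256 digit, most significant first.
;; The two leading 0 octets contribute nothing, so four terms remain.
(+ (* 116 (expt 256 3))
   (* 167 (expt 256 2))
   (* 157 256)
   28)
;; => 1957141788, which is #x74A79D1C
```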

Here's the code I've got so far:

(defun fix-snmp-fdb-line-a (line)
  "Fixes up a line of output of snmpwalking a switch's fdb. Returns a
list of a port number and an integer MAC address."
  ;; Side-effects: None
  ;; Reads-globals: None
  (declare (type (simple-string) line))
  (let ((spline (string-split #\Space line)))
    (list
     (parse-integer (fourth spline))
     (parse-integer (format nil "~{~2,'0X~}"
                            (mapcar #'parse-integer
                                    (last (string-split #\. (first spline))
                                          6)))
                    :radix 16))))

Which works good. But when I profile, it's consing a lot:
  seconds  |   consed   |  calls |  sec/call  |  name
     0.149 | 54,712,008 |  3,877 |   0.000038 | FIX-SNMP-FDB-LINE-A

That's on the order of 14k per call (54,712,008 / 3,877), which seems
excessive. Can anyone explain (a) why I shouldn't care, or (b) what
I'm missing here?

Ta.

Mart

From: Alessio Stalla
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Thu, 11 Dec 2008 23:03:17 +0000
Message-ID: <e1caac55-55d4-4fe1-92b3-d00f2e5e0713@g1g2000pra.googlegroups.com>

The usual suspect: have you compiled your function?
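
A quick way to check, sketched with a hypothetical stand-in function
(SQUARE is not from the original post); COMPILED-FUNCTION-P and COMPILE
are standard Common Lisp:

```lisp
;; SQUARE is a hypothetical stand-in for the function being profiled.
(defun square (x) (* x x))

;; COMPILED-FUNCTION-P reports whether the current definition is compiled.
;; Some implementations (SBCL, for one) compile every DEFUN, so this may
;; already print T before the explicit COMPILE call below.
(print (compiled-function-p #'square))

;; COMPILE installs a compiled definition if the function was interpreted.
(compile 'square)
(print (compiled-function-p #'square))
```

If the first form prints NIL in your implementation, the profile above
was probably measuring interpreted code.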

Your code does cons some temporary objects, but whether that is a
problem or not depends on the typical use you'll make of your
function.

That said, if you really need to optimize it:
string-split is not standard CL; have you written it yourself? Is it
efficient? Using compiled regular expressions (CL-PPCRE) instead of
string-split might help...

hth,
Alessio

From: mawhin
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Thu, 11 Dec 2008 23:17:32 +0000
Message-ID: <af158269-fffc-442b-87d8-723e07338c69@b38g2000prf.googlegroups.com>

On Dec 11, 11:03 pm, Alessio Stalla <·············@gmail.com> wrote:
> string-split is not standard CL; have you written it yourself? is it
> efficient? maybe using compiled regular expressions (CL-PPCRE) instead
> of string-split might help...
>
String-split I, erm, found on the internet somewhere. Ahem. Shame,
etc. But it seems pretty efficient.
  seconds  |   consed   |  calls |  sec/call  |  name
     0.016 |  3,661,928 | 11,920 |   0.000001 | STRING-SPLIT

I'll look at CL-PPCRE. Thanks.

Mart

From: Rainer Joswig
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Thu, 11 Dec 2008 23:14:52 +0000
Message-ID: <joswig-C65916.00145112122008@news-europe.giganews.com>

A few remarks. You could probably use one of the parsing tools
(like Edi Weitz's regexp library, CL-PPCRE) to parse the string.

a) Reading a line is already expensive.
   Sometimes it is useful to use a special line reader
   which uses a string buffer and does not allocate
   a new string for every line.

b) A special function to convert a bunch of numbers to
   a MAC address in integer format might be a useful
   tool. Note that you can shift integers in Lisp (see ASH).

c) A function like PARSE-INTEGER can parse directly
   from the line. PARSE-INTEGER takes :start
   and :end arguments. So for a handwritten function
   you could loop over the string and hand over
   the start and end position of each number
   to PARSE-INTEGER. That gets rid of intermediate
   strings. Then call the convert function from b).

d) POSITION can find the position of a character in a string.
   It can also search backwards (:from-end t).

So as an example for a handwritten function, you
could efficiently extract the port like this:

(let ((line "SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008"))
   (parse-integer line :start (+ 2 (position #\: line :from-end t))))
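
The same idea can be stretched to the MAC address. The following is
only a sketch under assumptions: MAC-FROM-LINE is a made-up name, and
the line is assumed to be well formed, with at least six dots before
the first space. It combines POSITION, the :start/:end arguments of
PARSE-INTEGER, and ASH, and creates no intermediate strings:

```lisp
(defun mac-from-line (line)
  "Return the integer value of the six dot-separated octets that end at
the first space in LINE. Assumes LINE is well formed, as in the sample."
  (let* ((end (position #\Space line))   ; index just past the last octet
         (start end))
    ;; Step backwards over six dots to find where the MAC begins.
    (loop repeat 6
          do (setf start (position #\. line :end start :from-end t)))
    ;; START is now the dot just before the first octet.
    (loop with mac = 0
          for octet-start = (1+ start) then (1+ octet-end)
          for octet-end = (or (position #\. line :start octet-start :end end)
                              end)
          ;; Shift what we have by one octet (8 bits) and add the next one.
          do (setf mac (+ (ash mac 8)
                          (parse-integer line :start octet-start
                                              :end octet-end)))
          until (= octet-end end)
          finally (return mac))))

(mac-from-line
 "SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008")
;; => 1957141788
```

Searching backwards keeps each POSITION call short, since the octets
sit at the end of the OID.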

-- 
http://lispm.dyndns.org/

From: mawhin
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Thu, 11 Dec 2008 23:43:09 +0000
Message-ID: <6a48bf50-2da4-4366-adfe-e13ac027324e@k1g2000prb.googlegroups.com>

On Dec 11, 11:14 pm, Rainer Joswig <······@lisp.de> wrote:
> So as an example for a handwritten function you
> could efficiently extract the port like this:
>
> (let ((line "SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008"))
>    (parse-integer line :start (+ 2 (position #\: line :from-end t))))

Yes. Getting the port, as I said, was the easy bit...

Nonetheless, thanks. Re-reading your advice has made me realise I'm
wasting a lot by first converting to hex and then back, whereas what I
should be doing is summing n1 + (n2 * 16^2) + (n3 * 16^4) .... Or some
such. Thanks. Off to do some more head-scratching now...

From: mawhin
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Fri, 12 Dec 2008 14:35:37 +0000
Message-ID: <43a317c3-29a8-4b6f-ac26-8b6a00715939@k36g2000pri.googlegroups.com>

Goodness. The new version:

(defun octet-list-to-integer (octets &optional (pwr 0))
  "Takes a list of 0 or more octets as decimal strings (0 <= n < 256),
least significant first, and returns one integer."
  ;; f() = 0
  ;; f(n0) = (n0 * 16^0) + f()
  ;; f(n0, n1) = (n0 * 16^0) + (n1 * 16^2) + f()
  ;; In general the octet at position x contributes nx * 16^(2x).
  (cond (octets
         (+ (* (parse-integer (first octets)) (expt 16 pwr))
            (octet-list-to-integer (rest octets) (+ 2 pwr))))
        (t
         0)))

(defun fix-snmp-fdb-line-c (line)
  ;; Side-effects: None
  ;; Reads-globals: None
  "Fixes up a line of output of snmpwalking a switch's fdb. Returns a
list of a port number and an integer MAC address."
  (list
   (parse-integer line :start (+ 2 (position #\: line :from-end t)))
   (multiple-value-bind (m l)
       ;; Six dot-separated numbers followed by a space. The dots are
       ;; written as \\. so that the regexp matches them literally.
       (cl-ppcre:scan-to-strings
        "([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+) "
        line)
     (declare (ignore m))
     (octet-list-to-integer (nreverse (coerce l 'list))))))


Sorry about the regexp, but in general this is a boatload faster, and
conses about a tenth as much. I'm sure I'm still doing some things
wrong (like the multiple-value-bind with the unused first variable),
but thank you. I've learned a bit, and got a little way closer to lisp
nirvana.

Mart

From: Thomas A. Russ
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Fri, 12 Dec 2008 17:46:27 +0000
Message-ID: <ymi4p1970cs.fsf@blackcat.isi.edu>

mawhin <···············@gmail.com> writes:

> > Nonetheless, thanks. Re-reading your advice has made me realise I'm
> > wasting a lot by first converting to hex and then back, whereas what I
> > should be doing is summing n1 + (n2 * 16^2) * (n3 * 16^3) .... Or some
> > such. Thanks. Off to do some more head-scratching now...
> 
> Goodness. The new version:
> 
> (defun octet-list-to-integer (octets &optional (pwr 0))
>   "takes a list of 0 or more integer octets (0 >= n > 256), returns
> one integer."
>   ;; f() = 0
>   ;; f(n1) = (n1 * 16^0) + f(0)
>   ;; f(n2, n1) = (n2 * 16^2) + f(n1)
>   ;; f(nx, nx-1, ... n) = (nx * 16^(2x)) + f(nx-1)
>   (cond (octets
>          (+ (* (parse-integer (first octets)) (expt 16 pwr))
>             (octet-list-to-integer (rest octets) (+ 2 pwr))))
>         (t
>          0)))

This is a good first attempt, but with a little factoring you can make
it more efficient still.  And then by doing some fancy arithmetic
hacking, you can speed that up even more.

1)  Factoring.  Each octet is two hex digits, so your formula for the
result is basically:

    n0 + n1*16^2 + n2*16^4 + n3*16^6 + ...

Now, with some factoring we can eliminate the need to actually raise
anything to a power.  Since 16^2 = 256, a factored version of the
equation would look like

    n0 + 256*(n1 + 256*(n2 + 256*(n3 + ...)))

where we have unrolled the exponentiation into repeated multiplication.
This will generally be a more efficient operation than exponentiation.

To make this truly efficient, you have to arrange to process the octets
starting with the most significant byte first.  Happily for you, that is
the order in which you encounter the octets in your input.  This will
actually work better as a loop rather than a recursion.  The opposite
will be true if you encounter the bytes starting with the low byte
first.  So your loop would look something like this:

   (loop with result = 0
         for octet in octets
         do (setq result (+ (parse-integer octet) (* 256 result)))
         finally (return result))

where you accumulate the results, always multiplying the previous result
by 256.

2)  Once we have this structure, we can be even cleverer by exploiting
the fact that Common Lisp integers behave as binary numbers under
shifting.  That allows us to replace multiplication by 256 with a shift
of the integer bits by 8 positions.  In other words, instead of
multiplying, we just do a more efficient bit shift:

   (setq result (+ (parse-integer octet) (ash result 8)))
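
Wrapped up as a complete function, the accumulate-and-shift loop might
look like the sketch below. OCTET-STRINGS-TO-INTEGER is a made-up name,
and the input is assumed to be a list of decimal strings with the most
significant octet first. Each octet covers 8 bits (values 0 to 255), so
the shift is by 8:

```lisp
(defun octet-strings-to-integer (octets)
  "Combine OCTETS, a list of decimal strings each in the range 0..255,
most significant first, into a single integer."
  (loop with result = 0
        for octet in octets
        ;; Make room for one more octet (8 bits), then add it in.
        do (setq result (+ (parse-integer octet) (ash result 8)))
        finally (return result)))

(octet-strings-to-integer '("0" "0" "116" "167" "157" "28"))
;; => 1957141788
```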

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Thomas A. Russ
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Fri, 12 Dec 2008 17:30:37 +0000
Message-ID: <ymi8wql7136.fsf@blackcat.isi.edu>

mawhin <···············@gmail.com> writes:

> Hi.
> 
> Here's the code I've got so far:
> 
> (defun fix-snmp-fdb-line-a (line)
> ;; Side-effects: None
> ;; Reads-globals: None
>   (declare (type (simple-string) line))
>   " Fixes up a line of output of snmpwalking a switch's fdb. Returns a
> list
>     of a port number and an integer MAC address."
>   (let ((spline (string-split #\Space line)))
>     (list
>      (parse-integer (fourth spline))
>      (parse-integer (format nil "~{~2,'0X~}"
>              (mapcar #'parse-integer
>                      (last (string-split #\. (first spline))
> 6))) :radix 16))))
> 
> Which works good. But when I profile, it's consing a lot:
>   seconds  |   consed   |  calls |  sec/call  |  name
> 0.149 | 54,712,008 |  3,877 |   0.000038 | FIX-SNMP-FDB-LINE-A
> 
> on the order, I think, of 10k per call? Which seems excessive. Can
> anyone explain(a) why I shouldn't care, or (b) what I'm missing here?

As for why it conses so much:  There is a lot of string creation going
on here.  STRING-SPLIT has to create a new string for each of the
substrings that it returns.  Also, the FORMAT call has to produce a
new string.  Format strings also often get interpreted at run time, so
there is the possibility that some additional consing goes on in there.

(a)  Why you shouldn't care.  I'll start with another question:  Is your
current function fast enough for your purposes?  If so, don't worry
about the speed, since it's adequate.  Move on to other more interesting
challenges.

(b)  What you can do about it.  The main thing would be to not go the
"easy" route of using STRING-SPLIT.  Instead, you would need to descend
one level of abstraction and find the split points yourself.  You can
then call PARSE-INTEGER on the original string, passing the :START and
:END keywords to just operate on the appropriate part of the string.

For a simple example of that, consider the following for reading the
port number:

  (parse-integer line :start (position #\Space line :from-end t))

But in pursuing a solution like this you want to also avoid repeatedly
scanning the entire input line, or else you will just move the source of
slow performance from consing to algorithmic complexity.

The most efficient solution would be to write your own small parser that
iterates through the string and accumulates what you want along the
way.  But that would be a bit tedious.
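
For what it is worth, such a parser need not be long. The following is
a sketch only, under stated assumptions: SCAN-FDB-LINE is a made-up
name, the line has the shape shown in the original post, and there is
no trailing whitespace after the port number. It visits each character
once and allocates nothing except the result list:

```lisp
(defun scan-fdb-line (line)
  "Scan LINE once and return a list of the port number and the integer
MAC address. Assumes the snmpwalk format from the original post and no
trailing whitespace."
  (let ((mac 0)       ; low 48 bits of the dotted numbers seen so far
        (number 0)    ; value of the digit run currently being read
        (in-oid t))   ; true until the first space ends the OID
    (loop for ch across line
          for digit = (digit-char-p ch)
          do (cond (digit
                    (setf number (+ (* 10 number) digit)))
                   ((and in-oid (or (char= ch #\.) (char= ch #\Space)))
                    ;; End of one OID component: shift it in as the low
                    ;; octet and keep only the last six octets (48 bits).
                    (setf mac (ldb (byte 48 0) (+ (ash mac 8) number))
                          number 0)
                    (when (char= ch #\Space)
                      (setf in-oid nil)))
                   (t
                    ;; Any other character ends a digit run we don't need.
                    (setf number 0))))
    ;; After the loop, NUMBER holds the trailing digits: the port.
    (list number mac)))

(scan-fdb-line
 "SNMPv2-SMI::mib-2.17.4.3.1.2.0.0.116.167.157.28 = INTEGER: 4008")
;; => (4008 1957141788)
```

Masking to 48 bits means earlier OID components simply fall off the
top, so the function never has to know where the MAC starts.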

So unless this really is too slow, I would probably be inclined to leave
things as they are.  Or perhaps to use a CL-PPCRE regular expression
for the parsing instead.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Peder O. Klingenberg
Subject: Re: (In)Efficient snmpwalk parsing...
Date: Fri, 12 Dec 2008 20:02:31 +0000
Message-ID: <ks63lpmaaw.fsf@beto.netfonds.no>

···@sevak.isi.edu (Thomas A. Russ) writes:

> (a)  Why you shouldn't care.  I'll start with another question:  Is your
> current function fast enough for your purposes?  If so, don't worry
> about the speed, since it's adequate.  Move on to other more interesting
> challenges.
>
> (b)  What you can do about it.  The main thing would be to not go the
> "easy" route of using SPLIT-STRING.  Instead, you would need to descend
> one level of abstraction and find the split points yourself.  You can
> then call PARSE-INTEGER on the original string, passing the :START and
> :END keywords to just operate on the appropriate part of the string.

This reminds me of a problem I encountered with one of our
applications for collecting stock market pricing info.  It was written
without much consideration for efficiency.  For instance, parsing
incoming data used several nested SUBSEQs, only to call e.g.
parse-integer on the final piece of string and then throw all the
intermediate strings away.

Volumes of market data have increased rather a lot over the last few
years, and all of a sudden this application had a problem keeping up,
and our customers were noticing delays.

So I started profiling, running through a complete day of input data
and watching for hot spots.  Unsurprisingly, the SUBSEQ-based parsing
was a major one.  I rewrote the entire parser module to store incoming
data items in a single string buffer, passing around start/end
pointers instead of generating subsequences.
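
As an illustration of the single-buffer idea (a sketch, not the code
from that application; MAKE-LINE-BUFFER and READ-LINE-INTO are made-up
names), a line reader can refill one adjustable string instead of
allocating a fresh string per line the way READ-LINE does:

```lisp
(defun make-line-buffer (&optional (size 128))
  "Create a reusable, growable string buffer."
  (make-array size :element-type 'character
                   :adjustable t :fill-pointer 0))

(defun read-line-into (buffer stream)
  "Read one line from STREAM into BUFFER, reusing its storage.
Returns BUFFER, or NIL at end of file when nothing was read."
  (setf (fill-pointer buffer) 0)
  (loop for ch = (read-char stream nil nil)
        do (cond ((null ch)
                  ;; End of file: report a final unterminated line, if any.
                  (return (if (plusp (fill-pointer buffer)) buffer nil)))
                 ((char= ch #\Newline)
                  (return buffer))
                 (t
                  (vector-push-extend ch buffer)))))

;; Usage: parse each line in place with :start, no SUBSEQ needed.
(with-input-from-string (in (format nil "a.1 = INTEGER: 7~%b.2 = INTEGER: 42~%"))
  (let ((buf (make-line-buffer)))
    (loop while (read-line-into buf in)
          collect (parse-integer buf :start (1+ (position #\: buf))))))
;; => (7 42)
```

One caveat: the buffer is not a SIMPLE-STRING, so functions that
declare their argument as one cannot take it directly.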

Along with other tweaks, I was able to achieve a 10-12x speed
increase.  Feeling really pleased with myself, I checked in the code
and started to build the application, when I noticed something
strange.  The build script had a bug, and the application had been
running off interpreted code for years!

...Peder...
-- 
I wish a new life awaited _me_ in some off-world colony.