statistical vs deterministic profilers

From: Thibault Langlois
Subject: statistical vs deterministic profilers
Date: Tue, 24 Apr 2007 08:35:31 +0000
Message-ID: <1177403730.911751.168830@r35g2000prh.googlegroups.com>

Hello,
It seems that statistical profilers are preferred over deterministic
ones. LispWorks has a statistical profiler (I don't know about
Allegro), while some free/open-source implementations have a
deterministic profiler (for example CMUCL). The SBCL manual says:

SBCL includes both a deterministic profiler, that can collect
statistics on individual functions, and a more "modern" statistical
profiler.

I am wondering what the advantages of a statistical profiler are over
a deterministic one.
I remember that when I used the LispWorks profiler I had to run it
several times to get a good idea of the location of the hot spots. By
contrast, I can run a deterministic profiler only once and I know
exactly how many times a function is called, how much time it takes to
compute, how much memory is used and so on.

Am I missing something?

Thibault Langlois

From: Duane Rettig
Subject: Re: statistical vs deterministic profilers
Date: Tue, 24 Apr 2007 14:54:52 +0000
Message-ID: <o0lkghq0ub.fsf@gemini.franz.com>

Thibault Langlois <·················@gmail.com> writes:

> Hello,
> It seems that statistical profilers are preferred over deterministic
> ones. LispWorks has a statistical profiler (I don't know about
> Allegro), while some free/open-source implementations have a
> deterministic profiler (for example CMUCL). The SBCL manual says:
>
> SBCL includes both a deterministic profiler, that can collect
> statistics on individual functions, and a more "modern" statistical
> profiler.
>
> I am wondering what the advantages of a statistical profiler are over
> a deterministic one.
> I remember that when I used the LispWorks profiler I had to run it
> several times to get a good idea of the location of the hot spots. By
> contrast, I can run a deterministic profiler only once and I know
> exactly how many times a function is called, how much time it takes to
> compute, how much memory is used and so on.
>
> Am I missing something?
>
> Thibault Langlois

Juho has already answered for SBCL, and I won't bother repeating the
advantages he gave in his post, except to say that Allegro CL also has
what you're calling a "statistical" profiler.  It is also called a
"sampling profiler".  A deterministic profiler is driven by
pre-instrumented code, which makes it count how many times each
function was entered (function entry is usually the only place such a
profiler instruments).  The sampling profiler instead uses a timer to
take a snapshot of the stack at a regular interval, and that snapshot
includes the machine's current program counter.  This allows
down-to-the-instruction profiler hits, and in Allegro CL the function
prof:disassemble-profile allows you to see those hits: it prints an
indented disassembly of the function, annotating each instruction
that was interrupted with both its number of hits and that number as
a percentage of the total hits in the same function.  If you know your
statistics, you understand that the more samples you have, the more
accurate the data, so if you run the program enough times, you can
actually get a feel for where the pipeline blockages are; the hits
tend to get bunched up where the instructions have been stalled for
one reason or another.

Juho also mentioned being able to profile functions other than Lisp
functions, and this is also true in Allegro CL; they show up as
strings in the prof:show-flat-profile or prof:show-call-graph
output.  Like disassemble, prof:disassemble-profile accepts strings
as input and interprets them as object-file entry points to
disassemble, provided there are sampling hits within that function.
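
As a rough illustration of the timer-driven sampling described above,
here is a minimal Python sketch.  It is not how Allegro CL or SBCL
implement their profilers: it assumes a background thread in place of
a timer interrupt, and it records source lines rather than machine
instructions.  The names sample_stack, profile and busy are made up
for the example.

```python
import sys
import threading
import time
from collections import Counter

def sample_stack(target_thread_id, hits, stop, interval=0.001):
    """Periodically record where the target thread is executing."""
    while not stop.is_set():
        # sys._current_frames() maps thread ids to their topmost frame.
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            hits[(frame.f_code.co_name, frame.f_lineno)] += 1
        time.sleep(interval)

def profile(fn, *args):
    """Run fn(*args) while a sampler thread watches the calling thread."""
    hits = Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stack,
        args=(threading.get_ident(), hits, stop),
        daemon=True,
    )
    sampler.start()
    try:
        result = fn(*args)
    finally:
        stop.set()
        sampler.join()
    return result, hits

def busy(n):
    # Hypothetical workload: the loop body should collect most hits.
    total = 0
    for i in range(n):
        total += i * i
    return total

result, hits = profile(busy, 2_000_000)
total_hits = sum(hits.values())
for (name, line), count in hits.most_common(5):
    # Hits per location, and their share of all samples taken.
    print(f"{name}:{line}  {count} hits  {100 * count / total_hits:.1f}%")
```

The exact hit counts differ from run to run, which is the statistical
part: a longer run, or several runs added together, gives a steadier
picture of where the time goes.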

In addition, Allegro CL does have the equivalent of a "deterministic"
profiler (we call it call-counting), and it has the additional feature
of not requiring any instrumentation in order to get call-count
statistics on almost every Lisp function, named or not.  Additionally,
this call-counting mechanism can be run at the same time as the
sampling profiler, and you can get information in your reports that
includes both sampling information and call-count information.  Juho
is right on this one also; the call-count mechanism takes time, and
thus skews the sampling result.  But if you figure this into your
result, the counting mechanism can be used with the sampling mechanism
to your advantage - the counting mechanism itself just usually shows
up at the top of the flat profile.
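
To make the idea of counting calls without editing or wrapping the
counted functions concrete, here is a small Python sketch that uses
the interpreter's sys.setprofile hook.  It is only an analogy: how
Allegro CL's call-counting works is not described in this thread, and
count_calls and fib are hypothetical names.

```python
import sys
from collections import Counter

def count_calls(fn, *args, **kwargs):
    """Run fn and count every Python-level function call made meanwhile."""
    counts = Counter()

    def on_event(frame, event, arg):
        # "call" fires on entry to any Python function; the counted
        # functions themselves are left untouched.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(on_event)
    try:
        result = fn(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return result, counts

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result, counts = count_calls(fib, 10)
print(result, counts["fib"])  # prints "55 177"
```

The counts are exact and repeatable, but the hook runs on every call,
so any timing or sampling done while it is active includes that extra
cost.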

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Juho Snellman
Subject: Re: statistical vs deterministic profilers
Date: Tue, 24 Apr 2007 09:58:33 +0000
Message-ID: <slrnf2rl69.6g7.jsnell@sbz-30.cs.Helsinki.FI>

Thibault Langlois <·················@gmail.com> wrote:
> I am wondering what the advantages of a statistical profiler are over
> a deterministic one.
> I remember that when I used the LispWorks profiler I had to run it
> several times to get a good idea of the location of the hot spots. By
> contrast, I can run a deterministic profiler only once and I know
> exactly how many times a function is called, how much time it takes to
> compute, how much memory is used and so on.
>
> Am I missing something?

The SBCL statistical profiler has at least the following advantages
over the deterministic one:

  * Has lower overhead, and thus can produce more accurate results;
    this is especially important with small functions, where the
    profiling overhead can easily be as large as the time spent doing
    real work.
  * Shows what the hotspots are inside a function, not just which
    functions are hotspots
  * Collects call graph data
  * Profiles anonymous / local functions
  * Profiles everything, including the standard CL library functions,
    not just the functions in your application that you suspect might
    be important
  * Has a Slime interface for browsing call graphs (though it's not
    included with the standard Slime distribution)

Now, some of these advantages aren't really fundamental: someone could
implement a deterministic profiler with less overhead, or one that's
capable of collecting call graph data, or even one that can handle
anonymous functions. But it's pretty hard to improve one point without
making the situation with the others worse. For example, adding call
graph support would increase the profiling overhead.
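
The overhead point for small functions can be seen with a toy
instrumenting profiler.  The Python sketch below is not SBCL's
deterministic profiler; instrument, tiny and run are hypothetical
names, and the wrapper simply counts calls and accumulates elapsed
time per function.

```python
import time
from functools import wraps

def instrument(fn, stats):
    """Wrap fn so each call updates stats[name] = [calls, seconds]."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = stats.setdefault(fn.__name__, [0, 0.0])
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper

def tiny(x):
    # Deliberately trivial: the bookkeeping costs more than the work.
    return x + 1

def run(f, n):
    """Wall-clock time of calling f n times."""
    start = time.perf_counter()
    for i in range(n):
        f(i)
    return time.perf_counter() - start

stats = {}
plain = run(tiny, 200_000)
wrapped = run(instrument(tiny, stats), 200_000)
print(f"plain: {plain:.3f}s  instrumented: {wrapped:.3f}s")
print(f"calls: {stats['tiny'][0]}  charged to tiny: {stats['tiny'][1]:.3f}s")
```

The call count comes out exact, but the instrumented run usually
takes noticeably longer than the plain one, and some of the timer's
own cost is charged to tiny.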

-- 
Juho Snellman