From: Rainer Joswig
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <joswig-EA3D2D.22115404122008@news-europe.giganews.com>
In article 
<····································@s20g2000yqh.googlegroups.com>,
 Francogrex <······@grex.org> wrote:

> This below is an abstract of a seemingly very interesting article from
> Ross Ihaka the developer of the very popular and powerful software for
> statistical computing (R). He's going "back to the future" and using
> lisp to improve things. It's a pity that we can't have the full text
> article, it would be a great read.
> 
> Back to the Future: Lisp as a Base for a Statistical Computing System
> Ross Ihaka  and Duncan Temple Lang
> Abstract
> The application of cutting-edge statistical methodology is limited by
> the capabilities of the systems in which it is implemented. In
> particular, the limitations of R mean that applications developed
> there do not scale to the larger problems of interest in practice. We
> identify some of the limitations of the computational model of the R
> language that reduces its effectiveness for dealing with large data
> efficiently in the modern era.
> We propose developing an R-like language on top of a Lisp-based engine
> for statistical computing that provides a paradigm for modern
> challenges and which leverages the work of a wider community. At its
> simplest, this provides a convenient, high-level language with support
> for compiling code to machine instructions for very significant
> improvements in computational performance. But we also propose to
> provide a framework which supports more computationally intensive
> approaches for dealing with large datasets and position ourselves for
> dealing with future directions in high-performance computing.
> We discuss some of the trade-offs and describe our efforts to
> realizing this approach. More abstractly, we feel that it is important
> that our community explore more ambitious, experimental and risky
> research to explore computational innovation for modern data analyses.

I could read that full article here:

http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21

-- 
http://lispm.dyndns.org/

From: ·············@gmail.com
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <e0208f05-3673-4355-b748-f4e3cb28d48b@v5g2000prm.googlegroups.com>
Ahh!  Beat me to it, I found it as well.  It's nice that the full
article is readable this way.

This is an AMAZING



On Dec 4, 2:11 pm, Rainer Joswig <······@lisp.de> wrote:
>
> I could read that full article here:
>
> http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21
>
> -- http://lispm.dyndns.org/
From: ·············@gmail.com
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <02e039b7-5899-4e59-812e-e75997d5f1a6@y1g2000pra.googlegroups.com>
Oops, finger fumble made me submit too early.

This is an AMAZING paper.  It discusses the performance as well as the
expressive advantages of using Common Lisp as the base of a custom DSL
instead of writing one's own interpreter.  Over the past several years
I've also been doing something very similar, although with MATLAB
substituted for R and engineering simulations substituted for statistics.

I often lie awake in the middle of the night weighing the pros and
cons of a CL-based approach like the one in this article and in my work
vs. using something like SciPy (the scientific & numerical substrate
built on top of Python -- a very nice package if you like Python, btw).
But the expressive advantages that CL has over Python, the raw speed
advantages, and, even better, the continuum of optimization mentioned
at the end of section 4 convince me that CL is still the right approach
for something like this.  It's even better that you don't ever need to
drop into C except in the rarest of specialized cases.
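
To make that continuum concrete, here is a small sketch of my own (not
code from the paper): the same function written naively and then with
type and optimize declarations, which is usually all it takes for SBCL
and similar compilers to emit tight machine code.

;; Naive version: works on any numeric sequence, no declarations.
(defun sum-squares (v)
  (reduce #'+ (map 'vector (lambda (x) (* x x)) v)))

;; Tuned version: same algorithm, but the declarations let the
;; compiler open-code double-float arithmetic.
(defun sum-squares-fast (v)
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3) (safety 0)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length v) acc)
      (incf acc (* (aref v i) (aref v i))))))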

In my work, I've developed a library for doing discrete-event
simulations using the DEVS formalism.  Normally, this involves
subclassing the base model class in the framework and then
specializing the small number of methods necessary for the simulation
behavior.  This year, I developed a very expressive macro language
that allowed me to express models directly in a very compact notation,
without it looking like normal subclassed framework usage.  For
example, defining a block that implements 'plus' functionality takes
roughly 30 lines of defclass and defmethod forms.  With the macro, it's
6 lines.  And it's still all blindingly fast, because this compact
notation is expanded at macro-expansion time.  In Python, if you were
to do something like that, you'd have to go to an external parser or
suffer some sort of run-time hit.
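
Just to give the flavor, here is a rough hypothetical sketch of the
general shape (DEFINE-BLOCK, ATOMIC-MODEL, and COMPUTE-OUTPUTS below
are invented stand-ins, not my real API):

;; ATOMIC-MODEL stands in for the framework's base model class.
(defclass atomic-model () ())

(defmacro define-block (name (&rest inputs) (&rest outputs) &body behavior)
  "Expand a compact block description into a subclass of ATOMIC-MODEL
plus a COMPUTE-OUTPUTS method that runs BEHAVIOR over its slots."
  `(progn
     (defclass ,name (atomic-model)
       ,(mapcar (lambda (slot) `(,slot :accessor ,slot :initform 0))
                (append inputs outputs)))
     (defmethod compute-outputs ((model ,name))
       (with-slots (,@inputs ,@outputs) model
         ,@behavior))))

;; A 'plus' block in a handful of lines; the defclass/defmethod
;; boilerplate is generated at macro-expansion time, so there is no
;; run-time penalty.
(define-block plus-block (a b) (sum)
  (setf sum (+ a b)))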

These reasons and the reasons mentioned in the article make CL my
secret weapon.

Summary:  for those interested in numerical computing with CL, as well
as those wanting additional evidence/fodder for advocating CL use, this
is a great article.

Glenn

> I could read that full article here:
>
> http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21
>
> -- http://lispm.dyndns.org/
From: Mirko
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <181723c7-b069-4f93-9ee3-d00091dd25b6@j11g2000yqg.googlegroups.com>
On Dec 4, 4:54 pm, ·············@gmail.com wrote:
> Summary:  for those interested in numerical computing with CL, as well
> as those wanting additional evidence/fodder for advocating CL use, this
> is a great article.

There are a couple of interesting numerical projects going on:

 - The NLISP project aims to implement MATLAB's (and IDL's) vector
functionality.
 - GSLL is a library that links Lisp with the GNU Scientific Library.

I have used both of these on SBCL with good success and have had a good
rapport with their authors.

Mirko
From: Tamas K Papp
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <6prgbfF9hui4U1@mid.individual.net>
On Thu, 04 Dec 2008 22:11:54 +0100, Rainer Joswig wrote:

> I could read that full article here:
> 
> http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21

Thanks for the link, it was an interesting read.

I used R before coming to CL.  I still remember the painful hours of 
debugging C code that I had to write to speed things up.  I had to do 
that often, since not a lot of the stuff I write vectorizes nicely (e.g. 
MCMC methods).
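
As a concrete illustration of the kind of inherently sequential loop I
mean, here is a toy random-walk Metropolis sampler (my own throwaway
sketch, not from any library): each draw depends on the previous one,
so there is nothing to vectorize.

(defun random-walk-metropolis (log-density start n &key (scale 1d0))
  "Toy random-walk Metropolis sampler over LOG-DENSITY; illustrative only."
  (let* ((chain (make-array n :element-type 'double-float))
         (x (coerce start 'double-float))
         (log-fx (funcall log-density x)))
    (dotimes (i n chain)
      (let* ((proposal (+ x (* scale (- (random 2d0) 1d0))))
             (log-fp (funcall log-density proposal)))
        ;; Accept with probability min(1, f(proposal)/f(x)).
        (when (< (random 1d0) (exp (min 0d0 (- log-fp log-fx))))
          (setf x proposal
                log-fx log-fp))
        (setf (aref chain i) x)))))

;; e.g. (random-walk-metropolis (lambda (x) (* -0.5d0 x x)) 0d0 10000)
;; draws from a standard normal (up to the usual burn-in caveats).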

Common Lisp has been a blessing; it is extremely fast.  When I started 
using it, I stressed about optimizing my code, but I only do that very 
rarely now.  It is fast enough most of the time without extra tweaking, 
and when it isn't, I just profile and optimize the bottlenecks.
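
On SBCL, for instance, that workflow is only a couple of forms.  A
rough sketch, where FIT-MODEL and DATA are made-up stand-ins for
whatever is suspected of being slow:

(sb-profile:profile fit-model)    ; instrument the suspect function
(fit-model data)                  ; run the workload as usual
(sb-profile:report)               ; per-function call counts and timings
(sb-profile:unprofile fit-model)

Then I add declarations only where the report says it matters.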

I feel that the authors of the article are moving in the right direction, 
but they are also trying to sugar-coat Lisp for R users with thin syntax 
layers and semantic extensions.  I see no purpose in doing that; CL is 
here, and people can start programming in it today if they want to.

The only thing I miss in CL is some libraries.  But the language is quite 
flexible, so they are easy to develop in most cases.  The libraries I 
miss at the moment include the following:

- a nice, robust multivariate optimization/rootfinding library, with one 
set of methods based on csolve and another on trust-region methods,

- a B-spline library (GSLL works for some things but lacks important features),

- common mathematical functions (distributions, gamma, etc.).
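
As a tiny example of the last item, here is a quick standard-normal CDF
using the Abramowitz & Stegun 26.2.17 polynomial approximation
(absolute error below about 7.5e-8); a throwaway sketch of mine, not
part of any existing library:

(defun standard-normal-cdf (x)
  "Standard normal CDF via Abramowitz & Stegun 26.2.17."
  (let* ((z (abs x))
         (u (/ 1d0 (+ 1d0 (* 0.2316419d0 z))))
         (pdf (/ (exp (* -0.5d0 z z)) (sqrt (* 2d0 pi))))
         ;; Horner evaluation of b1*u + b2*u^2 + ... + b5*u^5,
         ;; coefficients listed from b5 down to b1.
         (poly (reduce (lambda (acc b) (* u (+ b acc)))
                       '(1.330274429d0 -1.821255978d0 1.781477937d0
                         -0.356563782d0 0.319381530d0)
                       :initial-value 0d0))
         (upper-tail (* pdf poly)))
    (if (minusp x)
        upper-tail
        (- 1d0 upper-tail))))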

I know GSLL is quite good, but interfacing with foreign code is still 
clunky, especially when I want to have a Lisp function called by foreign 
code.  During the summer, I plan to develop some libraries for the above.
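
For the Lisp-called-from-C direction, CFFI's defcallback at least makes
the mechanics short; a minimal sketch (SQUARE-CB is a made-up example,
and a real C-side API would dictate the actual signature):

;; Expose a Lisp function to foreign code expecting
;; double (*f)(double, void *).
(cffi:defcallback square-cb :double ((x :double) (params :pointer))
  (declare (ignore params))
  (* x x))

;; (cffi:callback square-cb) returns the foreign pointer to pass to C.

The clunky part is mostly what surrounds it: managing the data behind
PARAMS and handling errors across the boundary.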

Tamas
From: Raymond Toy
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <sxdskp2tw1w.fsf@rtp.ericsson.se>
>>>>> "Tamas" == Tamas K Papp <Tamas> writes:

    Tamas> - a nice robust multivariate optimization/rootfinding library, with a set 
    Tamas> of methods based on csolve, the other on trust region methods,

I don't know if this satisfies your requirement for nice and robust, but
I have a translation of DONLP2 for multivariate optimization.  The
translation was done by f2cl, but it seems to work, and most of the
DONLP2 examples pass.

    Tamas> - common mathematical functions (distributions, gamma, etc)

Maxima has implementations of some of these in Lisp.  Clocc has some
code for this.

    Tamas> I know GSLL is quite good, but interfacing with foreign code is still 
    Tamas> clunky, especially when I want to have a Lisp function called by foreign 
    Tamas> code.  During the summer, I plan to develop some libraries for the above.

I have also wanted to do this, but I have never really gotten very
far.

Ray
From: John "Z-Bo" Zabroski
Subject: Re: Back to the Future: Lisp as a Base for a Statistical Computing System
Date: 
Message-ID: <d72c8cfe-eb91-4b19-9f8c-1a5b73e1bc57@j11g2000yqg.googlegroups.com>
On Dec 4, 4:11 pm, Rainer Joswig <······@lisp.de> wrote:
> I could read that full article here:
>
> http://books.google.com/books?id=8Cf16JkKz30C&pg=PA21&lpg=PA21


I would like to see the plots package's heuristics preserved, while
making it easier to prototype new plots beyond the base ones.  In short,
a new statistics language should use multiple paradigms to design a
better graphics package than R's present one.  Even though it uses
Cleveland's excellent graphing heuristics, the design feels very much
like it was drafted by someone with no formal background in program
design.

Arguably, Lisp by itself offers no raw materials for visual
programming.  Lisp is not a visual environment.  However, Lisp could
be used to prototype a textual command language for shaping data into
graphical form, as well as to reason about a metaobject protocol for
such data shaping and report generation.
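
Purely as a hypothetical sketch of what such a textual command language
might look like in Lisp (DEFPLOT and the layer keywords below are
invented for illustration, not any existing package):

;; The macro just normalizes a declarative plot description into a
;; data structure that a rendering backend could interpret.
(defmacro defplot (name &body layers)
  `(defparameter ,name
     (list :plot ',name
           :layers (list ,@(mapcar (lambda (layer) `(list ,@layer))
                                   layers)))))

;; A scatter plot of residuals against fitted values, with a smoother.
(defplot residual-plot
  (:points :x 'fitted :y 'residuals)
  (:smooth :method :loess :span 0.75))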