From: Alok
Subject: lisp implementations and scaling requirements
Date: 
Message-ID: <1162673380.787115.126030@h54g2000cwb.googlegroups.com>
Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
that "Lisp is unsuitable for Google because implementations don't cope
with their scary scaling requirements."

True enough, just to start up SBCL on my machine takes 26M (python
4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M), which is not good if
you want thousands of processes (doing simple things) running
simultaneously.

So I would like to invite comments from people who know about this: is
it possible for a Lisp implementation to provide its runtime as a
shared library?

Also how well supported is multi-threading in Lisp (SBCL)?
--
Alok

From: Javier
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162675241.743619.237290@h54g2000cwb.googlegroups.com>
Alok wrote:
> Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> that "Lisp is unsuitable for Google because implementations don't cope
> with their scary scaling requirements.".
>
> True enough, just to start up SBCL on my machine takes up 26M (python
> 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> you want thousands of processes (doing simple things) running
> simultaneously.

C and C++ use even less, because they share the same libraries while
still having separate memory segments for each process.

> So I would like to invite comments from people who know about this, is
> it possible to create a runtime shared library as a Lisp
> implementation?

In SBCL this is not really possible. Every application must start a
separate SBCL process.
You can simulate multiple processes using threads, but it is not very
safe, and it has a lot of side effects (such as all the applications
sharing the same memory space).

> Also how well supported is multi-threading in Lisp (SBCL)?

It's only well supported under Linux.
From: Bill Atkins
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <m2d582rdb2.fsf@weedle-24.dynamic.rpi.edu>
"Javier" <·······@gmail.com> writes:

>> So I would like to invite comments from people who know about this, is
>> it possible to create a runtime shared library as a Lisp
>> implementation?
>
> In SBCL this is not really possible. Every application must open a
> separate process of SBCL itself.

Huh?  There's no reason that, for example, the compiler and the
functions that make up the runtime couldn't be moved into a shared
library and dynamically linked into a Lisp process.
From: Javier
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162691418.062981.263950@m7g2000cwm.googlegroups.com>
Bill Atkins wrote:
> "Javier" <·······@gmail.com> writes:
>
> >> So I would like to invite comments from people who know about this, is
> >> it possible to create a runtime shared library as a Lisp
> >> implementation?
> >
> > In SBCL this is not really possible. Every application must open a
> > separate process of SBCL itself.
>
> Huh?  There's no reason that, for example, the compiler and the
> functions that make up the runtime couldn't be moved into a shared
> library and dynamically linked into a Lisp process.

What the original poster asked is whether there could be a shared
library containing the runtime (and indeed the compiler) that could be
shared among different SBCL processes. My response is that every SBCL
process must have its own in-memory copy of these in order to function.
For example, if you open two separate shells and start SBCL in each
one, each process loads its own copy of the runtime; they do not share
it in memory.
So if you open 20 SBCL processes, you end up using at least 520MB of
memory.
From: Pascal Bourguignon
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <87slgyy2dl.fsf@thalassa.informatimago.com>
"Javier" <·······@gmail.com> writes:

> Bill Atkins wrote:
>> "Javier" <·······@gmail.com> writes:
>>
>> >> So I would like to invite comments from people who know about this, is
>> >> it possible to create a runtime shared library as a Lisp
>> >> implementation?
>> >
>> > In SBCL this is not really possible. Every application must open a
>> > separate process of SBCL itself.
>>
>> Huh?  There's no reason that, for example, the compiler and the
>> functions that make up the runtime couldn't be moved into a shared
>> library and dynamically linked into a Lisp process.
>
> What the original poster asked is if there could be a shared library
> with the runtime (and indeed the compiler) which could be shared along
> different SBCL processes. My response is that every SBCL process must
> have its own copy in memory of such libraries in order to function.
> For example, if you open two separate shells, and open SBCL on each
> one, each process loads the runtime, they do not share the same in
> memory.
> So if you open 20 SBCL processes, you end up using at least 520Mb of
> memory.

Perhaps that's how sbcl works (I don't know), it could work differently.

The problem is that each lisp image does its own garbage collection.
Theoretically, there's no difference between built-in functions and
user-defined functions: they could be garbage collected as well, and in
the case of a copying garbage collector, copied during the run.  In
that case, of course, two processes will need their own copies of the
whole image.

However, we could have a "generation zero" of built-in objects that
would live in the text section instead of the heap, and that would be
shared like any other unix text.  There's no need for a shared library;
on modern unices, program text is automatically shared.

Is that not how sbcl works?


In the case of clisp, there are 2MB of program and 2MB of image.  Of
course, the 2MB of program text are shared between all clisp processes
running at the same time.  Only the 2MB of image are (gc) managed
independently by each process.



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Klingon function calls do not have "parameters" -- they have
"arguments" and they ALWAYS WIN THEM."
From: Juho Snellman
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <slrnekqt90.s06.jsnell@sbz-30.cs.Helsinki.FI>
Pascal Bourguignon <···@informatimago.com> wrote:
> "Javier" <·······@gmail.com> writes:
>> What the original poster asked is if there could be a shared library
>> with the runtime (and indeed the compiler) which could be shared along
>> different SBCL processes. My response is that every SBCL process must
>> have its own copy in memory of such libraries in order to function.
>> For example, if you open two separate shells, and open SBCL on each
>> one, each process loads the runtime, they do not share the same in
>> memory.
>> So if you open 20 SBCL processes, you end up using at least 520Mb of
>> memory.
>
> Perhaps that's how sbcl works (I don't know), it could work differently.
[...]
> However, we could have a "generation zero" of built-in objects, that
> would live in the text section instead of the heap, and that would be
> shared as any other unix text.  There's no need for a shared library,
> on modern unices, program texts are automatically shared.
>
> Is it not how sbcl works?

Yes, essentially[*] it is. "Javier" should already be aware of this,
since he has been corrected on this before. I don't know why he
persists in spreading misinformation.

[*] Actually the "built-in objects" also live on the heap, but in
    pages whose contents the GC will treat specially. Since the core
    is mapped from a file as private copy-on-write memory, all of the
    pages from the core file are shared until a process writes to
    them. Most pages are never written to, and thus remain shared for
    the lifetime of the processes.

-- 
Juho Snellman
From: Lars Rune Nøstdal
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <pan.2006.11.04.22.46.31.34048@gmail.com>
On Sat, 04 Nov 2006 13:20:41 -0800, Javier wrote:
>
> You can simulate multiple processes using threads, but it is not very
> safe, and have a lot of side effects (like all the applications sharing
> the same memory space).

I'm not sure I understand you; threads do not share all memory. I find the
way SBCL handles this quite intuitive:


cl-user> (defmacro in-thread (&body body)
           `(sb-thread:make-thread (lambda () ,@body)))
in-thread
cl-user> (let ((x 0))
           (defparameter *shared* -1)
           (in-thread
             (let ((x 1))
               (sleep 2) ;; `x' in thread below is setf'ed to 2 after this.
               (assert (= x 1))
               (format t "thread1, x: ~A, *shared*: ~A~%" x (incf *shared*))))
           (in-thread
             (let ((x "blah"))
               (sleep 1)
               (setf x 2)
               (format t "thread2, x: ~A, *shared*: ~A~%" x (incf *shared*))))
           (sleep 3) ;; `x' in other threads are bound to 1 then setf'ed to 2 after this.
           (format t "thread0, x: ~A, *shared*: ~A~%" x (incf *shared*))
           (in-thread (setf x 3))
           (sleep 0.5) ;; To avoid race between the setf in thread above and format below.
           (format t "thread0, x: ~A, *shared*: ~A~%" x (incf *shared*)))
thread2, x: 2, *shared*: 0
thread1, x: 1, *shared*: 1
thread0, x: 0, *shared*: 2
thread0, x: 3, *shared*: 3
nil


..and being able to share some data between threads makes IPC easy.

-- 
Lars Rune Nøstdal
http://lars.nostdal.org/
From: Javier
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162682369.955541.287900@h54g2000cwb.googlegroups.com>
Lars Rune Nøstdal wrote:
> On Sat, 04 Nov 2006 13:20:41 -0800, Javier wrote:
> >
> > You can simulate multiple processes using threads, but it is not very
> > safe, and have a lot of side effects (like all the applications sharing
> > the same memory space).
>
> I'm not sure I understand you; threads do not share all memory. I find the
> way SBCL handles this quite intuitive:

I don't mean all the memory, but the same memory space. For example, if
a thread exhausts the computer's memory or produces a segmentation
fault, the remaining threads must be shut down as well.
But this is not the only side effect. There may be some namespace
collisions when working in common packages (for example, two processes
defining the same name for a function in CL-USER).

> ..and being able to share some data between threads makes it easy to IPC.

Yes. But we are talking here about multiple processes, not threads.
Imagine two different applications: one is an HTTP server, the other
is a desktop application. It is not convenient to run both in the same
process because they can interfere with each other. This means that you
need to load SBCL twice. This would not be a problem if SBCL were able
to share the same memory for its core and libraries across different
processes, like C does.
From: Lars Rune Nøstdal
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <pan.2006.11.05.00.09.33.204791@gmail.com>
On Sat, 04 Nov 2006 15:19:30 -0800, Javier wrote:

> 
> Lars Rune Nøstdal wrote:
>> On Sat, 04 Nov 2006 13:20:41 -0800, Javier wrote:
>> >
>> > You can simulate multiple processes using threads, but it is not very
>> > safe, and have a lot of side effects (like all the applications sharing
>> > the same memory space).
>>
>> I'm not sure I understand you; threads do not share all memory. I find the
>> way SBCL handles this quite intuitive:
> 
> I don't mean all the memory, but the same memory space. For example, if
> a thread exaust the computer memory or produce a segmentation fault,
> the remaining threads must be closed.

Lisp doesn't SEGFAULT. :) Total memory-exhaustion with no control or
checking will bring any system down regardless of language. It is not
something one lets happen.

> But this is not the only side effect. There may be some name space
> collisions when working on common packages (for example, two processes
> define the same name for a function in CL-USER).
> 
>> ..and being able to share some data between threads makes it easy to IPC.
> 
> Yes. But we are talking here about multiple processes, not threads.
> Imagine two different applications, one is an http server, the other
> one is a desktop application. It is not convenient to run both in the
> same process because they can interfere each other. This means that you
> need to load SBCL two times. This would not be a problem if SBCL would
> be able to share the same memory for its core and libraries for
> different processes, like C does.

OK, I thought this was about running multiple instances or sessions of one
application - which is usually what one does on a server and is what
scaling means in that context.

-- 
Lars Rune Nøstdal
http://lars.nostdal.org/
From: Javier
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162690553.854902.251440@f16g2000cwb.googlegroups.com>
Lars Rune Nøstdal wrote:
> On Sat, 04 Nov 2006 15:19:30 -0800, Javier wrote:
>
> >
> > Lars Rune Nøstdal wrote:
> >> On Sat, 04 Nov 2006 13:20:41 -0800, Javier wrote:
> >> >
> >> > You can simulate multiple processes using threads, but it is not very
> >> > safe, and have a lot of side effects (like all the applications sharing
> >> > the same memory space).
> >>
> >> I'm not sure I understand you; threads do not share all memory. I find the
> >> way SBCL handles this quite intuitive:
> >
> > I don't mean all the memory, but the same memory space. For example, if
> > a thread exaust the computer memory or produce a segmentation fault,
> > the remaining threads must be closed.
>
> Lisp doesn't SEGFAULT. :) Total memory-exhaustion with no control or
> checking will bring any system down regardless of language. It is not
> something one lets happen.

Not always; there is the case where you can kill an application before
it eats all the available memory.
And yes, Lisp doesn't segfault, at least in theory. :) (I have managed
to see SBCL crash once or twice elsewhere.)

>
> > But this is not the only side effect. There may be some name space
> > collisions when working on common packages (for example, two processes
> > define the same name for a function in CL-USER).
> >
> >> ..and being able to share some data between threads makes it easy to IPC.
> >
> > Yes. But we are talking here about multiple processes, not threads.
> > Imagine two different applications, one is an http server, the other
> > one is a desktop application. It is not convenient to run both in the
> > same process because they can interfere each other. This means that you
> > need to load SBCL two times. This would not be a problem if SBCL would
> > be able to share the same memory for its core and libraries for
> > different processes, like C does.
>
> OK, I thought this was about running multiple instances or sessions of one
> application - which is usually what one does on a server and is what
> scaling means in that context.

Yes, in that context I agree with you.
From: ······@earthlink.net
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162679229.610132.27290@f16g2000cwb.googlegroups.com>
Alok wrote:
> True enough, just to start up SBCL on my machine takes up 26M (python
> 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> you want thousands of processes (doing simple things) running
> simultaneously.

Interestingly enough, Google uses thousands of machines to run its
thousands of processes.

As to the relative startup size, Greenspun's Tenth Rule is true, even
at Google.  Me - I don't care how big "hello world" is, because that's
not my application.

-andy
From: Alok
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162729695.844345.300110@m7g2000cwm.googlegroups.com>
······@earthlink.net wrote:
> Alok wrote:
> > True enough, just to start up SBCL on my machine takes up 26M (python
> > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> > you want thousands of processes (doing simple things) running
> > simultaneously.
>
> Interestingly enough, Google uses thousands of machines to run its
> thousands of processes.

If only everyone had cheap access to the resources that Google now
commands. But even Google must be running thousands of small processes
throughout the day on a typical Linux server? After all, running small
processes, each designed to do the one thing it does best, is the
UNIX philosophy.

> As to the relative startup size, Greenspun's Tenth Law is true, even at
> Google.  Me - I don't care how big "hello world" is because that's not
> my
> application.
> 
> -andy
From: ······@earthlink.net
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162828262.102277.318020@k70g2000cwa.googlegroups.com>
Alok wrote:
> ······@earthlink.net wrote:
> > Alok wrote:
> > > True enough, just to start up SBCL on my machine takes up 26M (python
> > > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> > > you want thousands of processes (doing simple things) running
> > > simultaneously.
> >
> > Interestingly enough, Google uses thousands of machines to run its
> > thousands of processes.
>
> If only everyone had cheap access to resources that Google now
> commands. But even Google must be running thousands of small processes
> throughout the day on a typical Linux server?

"Must"?  On the performance-critical servers, it's quite unlikely.

After all, a search engine isn't built by piping unix utilities.

> After all, running small
> processes each designed to do the one thing that it does best, is the
> UNIX philosophy.

Google has no interest in conforming to "the UNIX philosophy".  Do
you really think that they'll conform when they think that there is a
better way?

Note that the canonical example of "the UNIX philosophy", spell, isn't
actually implemented according to said philosophy.

-andy
From: Rob Thorpe
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162833194.114195.164390@b28g2000cwb.googlegroups.com>
······@earthlink.net wrote:
> Alok wrote:
> > ······@earthlink.net wrote:
> > > Alok wrote:
> > > > True enough, just to start up SBCL on my machine takes up 26M (python
> > > > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> > > > you want thousands of processes (doing simple things) running
> > > > simultaneously.
> > >
> > > Interestingly enough, Google uses thousands of machines to run its
> > > thousands of processes.
> >
> > If only everyone had cheap access to resources that Google now
> > commands. But even Google must be running thousands of small processes
> > throughout the day on a typical Linux server?
>
> "Must"?  On the performance-critical servers, it's quite unlikely.
>
> After all, a search engine isn't built by piping unix utilities.
>
> > After all, running small
> > processes each designed to do the one thing that it does best, is the
> > UNIX philosophy.
>
> Google has no interest in conforming to "the UNIX philosophy".  Do
> you really think that they'll conform when they think that there is a
> better way?

You're right.  One of my friends worked for a search engine company,
not google, one of the others.  He described to me how the system he
worked on worked, most search engines are not much different.

The system is separated into three banks of machines.  One serves the
results that the user sees from the store/database.  Another set
load-balances, directing the traffic to unoccupied search machines.
Another crawls the web, updating the links in the store.  In some
systems there is further separation: the machines accessing the
database are separate from those performing html serving.

All the machines involved use large processes that run more or less
continuously.  Starting new processes is relatively expensive and so
avoided.  A machine might still reasonably run a few thousand
incidental processes each day, but not for important tasks or tasks
that require much runtime or memory.  ~3000 process starts/stops over
the course of a day would not hurt performance much.  The system that
was described to me didn't do this, though.  It is very poor design to
initiate a process for every transaction in a system where transaction
speed is important.

> Note that the canonical example of "the UNIX philosophy", spell, isn't
> actually implemented according to said philosophy.

Is spell the canonical example?

The problem with this aspect of the Unix philosophy is that it was
something that Unix did, and does, relatively well compared to other
systems.  You can construct a program made of smaller interlocking
programs connected by pipes.  This was often used as an example of good
design by Unix advocates and became synonymous with "the Unix
philosophy".

It's never been something that Unix programmers have been religious
about, though.  Loads of stuff is built to be modular by use of
libraries rather than separate programs.  X doesn't do it, neither do
the libraries that use X; they use whatever modularity is suitable, or
try to.
From: Kaz Kylheku
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162684088.886048.52730@h54g2000cwb.googlegroups.com>
Alok wrote:
> Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> that "Lisp is unsuitable for Google because implementations don't cope
> with their scary scaling requirements.".

Trolling lamer, don't you have anything better to do than digging
through four year old blogs?

This is not Daniel Barlow's point. Rather, he is citing Norvig's
comments from the International Lisp Conference, which he was in the
middle of attending when he wrote that blog.

> True enough, just to start up SBCL on my machine takes up 26M (python

True enough, you are guessing, and poorly at that, at what Norvig might
have meant by whatever he said at that conference. To have any hope of
understanding, you have to download the proceedings of that conference
and read the relevant paper.

I have that paper. Here are some quotes:

``It used to be if you could double the speed of your developers that
that was a really great thing, and it didn't matter much about the
hardware cost. That's not true at Google. If our code were to run 2x
more slowly, it would cost us tens of millions of dollars.''

...

``The first question for this audience is, why isn't Lisp everyone's
favorite language? Why doesn't everyone use it for everything? Why
isn't it used at Google?  Here are some of the reasons I've extracted
from my own experience:

- Memory Management. I mentioned Eugene Charniak earlier who had great
success with training a parser on a very large set of examples.  He
said he switched to C++ because of memory management issues.  His
feeling was that the operations he was doing weren't that complicated
--- you can write down the mathematical expressions in a page or so,
but what he really cared about was manipulating these giant matrices
and making that efficient. He felt you have better control of that in
C++. I'm not convinced that's true, but in order to duplicate the
performance in Lisp, you have to throw away the abstraction
capabilities of Lisp, manage the memory yourself, and allocate arrays
of words or whatever and work on them directly.
...
- Cultural Bias. [Oh boy, here we go -KK] The Lisp community has become
somewhat isolated in two ways.  [... snip BS ...] If you want to do
web-type stuff with HTTP, SMTP, FTP, XML and all those packages, you
can go out and download it instantly from a canonical site for Perl or
Python. ''

[ Then he talks a lot about how the various advantages of Lisp have
crept into other languages in the last ten years. See Daniel Barlow's
own response to that in the blog you were citing: nothing has all those
advantages rolled into one tool. ]

Finally, here is the kicker:

``I look at Google as having all of the advantages of Lisp without
using Lisp.  Why?  Well, for one thing we have all the right people,
which is critical.  We're also very fortunate to have risen at a time
when many others have slipped. Nobody else is hiring so we get many of
the good people.  That's worked out very well for us.  Our people have
backgrounds in Lisp, Smalltalk, Dylan and Python, and C++.  Our goal is
to always be running and able to make incremental changes.  What was
important to Paul Graham is also important to us, but we do things in a
slightly different way.  Rather than having one listener to which you
compile updated definitions of your Lisp functions, we have 10,000
servers and we take them down one at time and upgrade them.  So we're
always running but we're doing it at a different level and at a
different granularity.  With C++, you can't augment a running program,
but we don't have to augment a running program; we have to augment a
running system of servers, not a single program.  As for interactive
top level, well in some sense all web servers are that: a web server
takes a request and returns it.  And for things like macros for HTML,
we just go out and implement it in a custom way for what we need, as
opposed to taking a general-purpose approach.''

> 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ).
> Which is not good if you want thousands of processes (doing simple things) running
> simultaneously.

Firstly, thousands of processes on one machine is a very poor idea if
you want to provide scalability.  Google's scalability comes from
thousands of /machines/. There isn't anything in Lisp implementations
that gets in the way of writing distributed applications.

Lisp is a programming language. You know, one of those things that maps
high level constructs onto the processor.

Secondly, please inform yourself about how memory mapping of
executables and libraries works in a virtual memory OS. In general,
multiple instances of an executable or shared library do not occupy
additional memory, at least not the read-only text sections which hold
code and some static data.

Basically, if we ignore the silly crap about Python and Perl having
nice libraries, Norvig makes these points, which we can take at
face value:

- Things have to run fast at Google. Execution time is money.

Lisp may have a disadvantage with regard to C++ here, but not a huge
one. And it is certainly ahead of things which don't even have native
compilers, like Python and Perl. So why did he even bother bringing
these up?

- Control over memory management is important. (But note that Norvig
observes himself that he's not convinced that C++ offers better control
than Lisp!)

- The capability of Lisp to update code in a running image and macros
is not needed at Google, where you do the same thing at a different
granularity (perhaps replacing an entire executable program written in
C++).

This doesn't mean that Lisp is unsuitable; the ability to update code
in a running image isn't a hindrance to anything, it's just not needed.
This is a justification why C++ is used, not why Lisp isn't used.

- Lisp macros for things like HTML generation are not needed. Google
is satisfied with hacking up a custom way of doing it.

Again, a rationalization for C++. Nothing against Lisp.
From: Alok
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162728334.804149.202630@i42g2000cwa.googlegroups.com>
"Kaz Kylheku" <········@gmail.com> writes:

> Alok wrote:
> > Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> > that "Lisp is unsuitable for Google because implementations don't cope
> > with their scary scaling requirements.".
>
> Trolling lamer, don't you have anything better to do than digging
> through four year old blogs?

I am considering learning lisp and am curious about how suitable it
would be for my different needs. So, if you can point me to better
places where I can get more information on how to run thousands of
small Lisp processes, then you are more than welcome.

And, before you dismiss anyone as a lame troll, consider giving some
positive feedback first. It's good manners.

>
> This is not Daniel Barlow's point. Rather, he is citing Norvig's
> comments from the International Lisp Conference which was in the
> middle of attending when he wrote that blog.

Read my post carefully: I never wrote that D. Barlow states this
point, only that he noted it.

> > True enough, just to start up SBCL on my machine takes up 26M (python
>
> True enough, you are guessing, and poorly at that, at what Norvig might
> have meant by whatever he said at that conference. To have any hope of
> understanding, you have to download the proceedings of that conference
> and read the relevant paper.

It would be more helpful if you could share the link to the original
paper. I tried to find it, but could not. I was guessing that the
process size may have been the scaling problem, but you are quoting
that it was execution speed. In either case, I am more interested in
what the scaling problems are. So, now that you cite the quote, it
does shed some light on it.

>
> I have that paper. Here are some quotes:
>
> ``It used to be if you could double the speed of your developers that
> that was a really great thing, and it didn't matter much about the
> hardware cost. That's not true at Google. If our code were to run 2x
> more slowly, it would cost us tens of millions of dollars.''
>
> ...
>
> ``The first question for this audience is, why isn't Lisp everyone's
> favorite language? Why doesn't everyone use it for everything? Why
> isn't it used at Google?  Here are some of the reasons I've extracted
> from my own experience:
>
> - Memory Management. I mentioned Eugene Charniak earlier who had great
> success with training a parser on a very large set of examples.  He
> said he switched to C++ because of memory management issues.  His
> feeling was that the operations he was doing weren't that complicated
> --- you can write down the mathematical expressions in a page or so,
> but what he really cared about was manipulating these giant matrices
> and making that efficient. He felt you have better control of that in
> C++. I'm not convinced that's true, but in order to duplicate the
> performance in Lisp, you have to throw away the abstraction
> capabilities of Lisp, manage the memory yourself, and allocate arrays
> of words or whatever and work on them directly.
> ...
> - Cultural Bias. [Oh boy, here we go -KK] The Lisp community has become
> somewhat isolated in two ways.  [... snip BS ...] If you want to do
> web-type stuff with HTTP, SMTP, FTP, XML and all those packages, you
> can go out and download it instantly from a canonical site for Perl or
> Python. ''
>
> [ Then he talks a lot about how the various advantages of Lisp have
> crept into other languages in the last ten years. See Daniel Barlow's
> own response to that in the blog you were citing: nothing has all those
> advantages rolled into one tool. ]
>
> Finally, here is the kicker:
>
> ``I look at Google as having all of the advantages of Lisp without
> using Lisp.  Why?  Well, for one thing we have all the right people,
> which is critical.  We're also very fortunate to have risen at a time
> when many others have slipped. Nobody else is hiring so we get many of
> the good people.  That's worked out very well for us.  Our people have
> backgrounds in Lisp, Smalltalk, Dylan and Python, and C++.  Our goal is
> to always be running and able to make incremental changes.  What was
> important to Paul Graham is also important to us, but we do things in a
> slightly different way.  Rather than having one listener to which you
> compile updated definitions of your Lisp functions, we have 10,000
> servers and we take them down one at time and upgrade them.  So we're
> always running but we're doing it at a different level and at a
> different granularity.  With C++, you can't augment a running program,
> but we don't have to augment a running program; we have to augment a
> running system of servers, not a single program.  As for an interactive
> top level, well, in some sense all web servers are that: a web server
> takes a request and returns it.  And for things like macros for HTML,
> we just go out and implement it in a custom way for what we need, as
> opposed to taking a general-purpose approach.''
>
> > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ).
> > Which is not good if you want thousands of processes (doing simple things) running
> > simultaneously.
>
> Firstly, thousands of processes on one machine is a very poor idea if
> you want to provide scalability.

Look, I work on several UNIX servers where several thousand processes
run on each server for a few seconds throughout the day. And I am
curious whether it would be a good idea to move them to Lisp.

> Google's scalability comes from
> thousands of /machines/. There isn't anything in Lisp implementations
> that gets in the way of writing distributed applications.
>
> Lisp is a programming language. You know, one of those things that maps
> high level constructs onto the processor.
>
> Secondly, please inform yourself about how memory mapping of
> executables and libraries works in a virtual memory OS. In general,
> multiple instances of an executable or shared library do not occupy
> additional memory, at least not the read-only text sections which hold
> code and some static data.

As far as I am aware, the memory size reported by the 'prstat' command
for a process on Solaris or the 'top' command on Linux includes all of
the process's shared object memory mappings.

> Basically, if we ignore the silly crap about Python and Perl having
> nice libraries, Norvig makes these two points, which we can take at
> face value:
>
> - Things have to run fast at Google. Execution time is money.
>
> Lisp may have a disadvantage with regard to C++ here, but not a huge
> one. And it is certainly ahead of things which don't even have native
> compilers, like Python and Perl. So why did he even bother bringing
> these up?
>
> - Control over memory management is important. (But note that Norvig
> observes himself that he's not convinced that C++ offers better control
> than Lisp!)
>
> - The capability of Lisp to update code in a running image and macros
> is not needed at Google, where you do the same thing at a different
> granularity (perhaps replacing an entire executable program written in
> C++).
>
> This doesn't mean that Lisp is unsuitable; the ability to update code
> in a running image isn't a hindrance to anything, it's just not needed.
> This is a justification why C++ is used, not why Lisp isn't used.
>
> - Lisp macros for things like HTML generation are not needed. Google
> is satisfied with hacking up a custom way of doing it.
> 
> Again, a rationalization for C++. Nothing against Lisp.
From: Alex Mizrahi
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <454dbb37$0$49199$14726298@news.sunsite.dk>
(message (Hello 'Kaz)
(you :wrote  :on '(4 Nov 2006 15:48:08 -0800))
(

 KK> - Control over memory management is important. (But note that Norvig
 KK> observes himself that he's not convinced that C++ offers better control
 KK> than Lisp!)

It is interesting that Google uses Java -- Ron Garret told us. I don't
know if it's widely used on production servers, or just for
prototyping, but Java has almost the same runtime properties as Common
Lisp (except that Java has threads working right :), so the reasons for
not using Lisp are most likely non-technical.

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"People who lust for the Feel of keys on their fingertips (c) Inity") 
From: Lars Rune Nøstdal
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <pan.2006.11.04.21.36.57.636017@gmail.com>
On Sat, 04 Nov 2006 12:49:40 -0800, Alok wrote:

> Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> that "Lisp is unsuitable for Google because implementations don't cope
> with their scary scaling requirements.".
> 
> True enough, just to start up SBCL on my machine takes up 26M (python
> 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> you want thousands of processes (doing simple things) running
> simultaneously.

How well something scales is not the same as its one-time startup
cost. There is not a 26M increase in memory consumption for each new
step or "upscale".

> So I would like to invite comments from people who know about this, is
> it possible to create a runtime shared library as a Lisp
> implementation?
> 
> Also how well supported is multi-threading in Lisp (SBCL)?

SBCL has good support for this under Linux:
  http://www.sbcl.org/manual/Threading.html#Threading

But if you try to create thousands of threads it breaks the stack (or
something like that); there might be an option in SBCL to adjust this.
However, I think just having a couple of worker threads (number of CPUs
+ 1?) and listening for events instead of blocking would scale better
anyway.
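For concreteness, the worker-pool idea might be sketched like this with
SBCL's sb-thread API (mutexes and waitqueues); the queue and job names
here are purely illustrative, not a real library:

```lisp
;; A minimal worker-pool sketch, assuming SBCL's sb-thread API.
;; *QUEUE*, WORKER and SUBMIT are illustrative names.
(defvar *queue* '())
(defvar *lock* (sb-thread:make-mutex :name "queue lock"))
(defvar *waitqueue* (sb-thread:make-waitqueue))

(defun worker ()
  "Block on the waitqueue until a job arrives, then run it."
  (loop
    (let ((job (sb-thread:with-mutex (*lock*)
                 (loop until *queue*
                       do (sb-thread:condition-wait *waitqueue* *lock*))
                 (pop *queue*))))
      (funcall job))))

(defun submit (job)
  "Push a thunk onto the queue and wake one worker."
  (sb-thread:with-mutex (*lock*)
    (push job *queue*)
    (sb-thread:condition-notify *waitqueue*)))

;; Start a handful of workers instead of thousands of threads:
(dotimes (i 4)
  (sb-thread:make-thread #'worker :name (format nil "worker-~d" i)))
```

A fixed pool like this keeps the thread count (and thus the total stack
usage) bounded no matter how many jobs are submitted.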

-- 
Lars Rune Nøstdal
http://lars.nostdal.org/
From: Alok
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <1162729028.253847.107420@e3g2000cwe.googlegroups.com>
Lars Rune Nøstdal wrote:
> On Sat, 04 Nov 2006 12:49:40 -0800, Alok wrote:
>
> > Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> > that "Lisp is unsuitable for Google because implementations don't cope
> > with their scary scaling requirements.".
> >
> > True enough, just to start up SBCL on my machine takes up 26M (python
> > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> > you want thousands of processes (doing simple things) running
> > simultaneously.
>
> How well something scales at something is not the same as its
> one-time cost of startup. There is not a 26M increase in memory
> consumption for each new step or "upscale".

It is if you consider that each new step is to run in a different
process, for whatever reason. A typical UNIX machine runs several
thousand short-lived processes (like 'sed' commands in small shell
scripts). It wouldn't be a good idea to port those small individual
shell scripts to equivalent small Lisp programs using CL-PPCRE, would
it?
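For concreteness, a rough CL-PPCRE equivalent of `sed 's/foo/bar/g'`
might look like this (a sketch only; it assumes CL-PPCRE has been
loaded, and the pattern and replacement are placeholders):

```lisp
;; Filter standard input, replacing every match, sed-style.
;; Assumes CL-PPCRE is already loaded (e.g. via ASDF).
(loop for line = (read-line *standard-input* nil)
      while line
      do (write-line (cl-ppcre:regex-replace-all "foo" line "bar")))
```

The program itself is tiny; the point is that the image startup cost
dwarfs it when the job only lives for a fraction of a second.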

>
> > So I would like to invite comments from people who know about this, is
> > it possible to create a runtime shared library as a Lisp
> > implementation?
> >
> > Also how well supported is multi-threading in Lisp (SBCL)?
>
> SBCL has good support for this under Linux:
>   http://www.sbcl.org/manual/Threading.html#Threading
>
> But if you try to create thousands of threads it breaks the stack (or
> something like that); there might be an option in SBCL to adjust this.
> However, I think just having a couple of worker-threads (number of CPUs +
> 1?) and listen for events instead of blocking would scale better anyways.

Can you please elaborate on this "and listen for events instead of
blocking"? Wouldn't listening for events imply blocking until an
event is received?

Alok
From: Lars Rune Nøstdal
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <pan.2006.11.05.20.54.06.641730@gmail.com>
On Sun, 05 Nov 2006 04:17:08 -0800, Alok wrote:

> Lars Rune Nøstdal wrote:
>> On Sat, 04 Nov 2006 12:49:40 -0800, Alok wrote:
>>
>> > Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
>> > that "Lisp is unsuitable for Google because implementations don't cope
>> > with their scary scaling requirements.".
>> >
>> > True enough, just to start up SBCL on my machine takes up 26M (python
>> > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
>> > you want thousands of processes (doing simple things) running
>> > simultaneously.
>>
>> How well something scales at something is not the same as its
>> one-time cost of startup. There is not a 26M increase in memory
>> consumption for each new step or "upscale".
> 
> It is if you consider that each new step is to run in a different
> process, for whatever reason. A typical UNIX machine runs several
> thousands of process of very small execution time (like 'sed' commands
> in small shell scripts). It wouldn't be a good idea to port those small
> individual shell scripts to run in equivalent small Lisp implementation
> programs using CL-PPCRE, would it?
> 
>>
>> > So I would like to invite comments from people who know about this, is
>> > it possible to create a runtime shared library as a Lisp
>> > implementation?
>> >
>> > Also how well supported is multi-threading in Lisp (SBCL)?
>>
>> SBCL has good support for this under Linux:
>>   http://www.sbcl.org/manual/Threading.html#Threading
>>
>> But if you try to create thousands of threads it breaks the stack (or
>> something like that); there might be an option in SBCL to adjust this.
>> However, I think just having a couple of worker-threads (number of CPUs +
>> 1?) and listen for events instead of blocking would scale better anyways.
> 
> Can you please elaborate more on this "and listen for events instead of
> blocking ". Wouldn't listening for events imply blocking, until an
> event is received?

Yeah, but you can check more than one "channel" with only one listener.

  http://en.wikipedia.org/wiki/Event_driven
  http://www.kegel.com/c10k.html

...there is no need to have one dedicated listener, or one blocking
thread, for each "channel" or client.
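In SBCL terms, one way to sketch this is with the SERVE-EVENT facility
(sb-sys:add-fd-handler / sb-sys:serve-event): a single loop watches any
number of file descriptors and only wakes up when one of them is ready.
The fds themselves are assumed to come from elsewhere, e.g.
sb-bsd-sockets:

```lisp
;; One listener multiplexing several "channels" via SERVE-EVENT.
;; WATCH and EVENT-LOOP are illustrative names; the handler here
;; just reports readiness instead of doing real I/O.
(defun watch (fd)
  (sb-sys:add-fd-handler fd :input
                         (lambda (fd)
                           (format t "fd ~a is readable~%" fd))))

(defun event-loop (fds)
  (mapc #'watch fds)
  ;; Blocks until any watched fd has input, runs its handler,
  ;; then waits again -- one thread serving all channels.
  (loop (sb-sys:serve-event)))
```

So the loop does block, but on *all* the channels at once, which is the
distinction being made above.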

-- 
Lars Rune Nøstdal
http://lars.nostdal.org/
From: Lars Rune Nøstdal
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <pan.2006.11.05.21.09.56.320831@gmail.com>
On Sun, 05 Nov 2006 21:54:07 +0100, Lars Rune Nøstdal wrote:

> On Sun, 05 Nov 2006 04:17:08 -0800, Alok wrote:
> 
>> Lars Rune Nøstdal wrote:
>>> On Sat, 04 Nov 2006 12:49:40 -0800, Alok wrote:
>>>
>>> > Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
>>> > that "Lisp is unsuitable for Google because implementations don't cope
>>> > with their scary scaling requirements.".
>>> >
>>> > True enough, just to start up SBCL on my machine takes up 26M (python
>>> > 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
>>> > you want thousands of processes (doing simple things) running
>>> > simultaneously.
>>>
>>> How well something scales at something is not the same as its
>>> one-time cost of startup. There is not a 26M increase in memory
>>> consumption for each new step or "upscale".
>> 
>> It is if you consider that each new step is to run in a different
>> process, for whatever reason. A typical UNIX machine runs several
>> thousands of process of very small execution time (like 'sed' commands
>> in small shell scripts). It wouldn't be a good idea to port those small
>> individual shell scripts to run in equivalent small Lisp implementation
>> programs using CL-PPCRE, would it?
>> 
>>>
>>> > So I would like to invite comments from people who know about this, is
>>> > it possible to create a runtime shared library as a Lisp
>>> > implementation?
>>> >
>>> > Also how well supported is multi-threading in Lisp (SBCL)?
>>>
>>> SBCL has good support for this under Linux:
>>>   http://www.sbcl.org/manual/Threading.html#Threading
>>>
>>> But if you try to create thousands of threads it breaks the stack (or
>>> something like that); there might be an option in SBCL to adjust this.
>>> However, I think just having a couple of worker-threads (number of CPUs +
>>> 1?) and listen for events instead of blocking would scale better anyways.
>> 
>> Can you please elaborate more on this "and listen for events instead of
>> blocking ". Wouldn't listening for events imply blocking, until an
>> event is received?
> 
> Yeah, but you can check more than one "channel" with only one listener.
> 
>   http://en.wikipedia.org/wiki/Event_driven
>   http://www.kegel.com/c10k.html
> 
> ..there is no need to have one dedicated listener for each "channel" or
> client - or have one thread blocking for each "channel" or client.

Has anyone tried this by the way?

  http://common-lisp.net/project/nio/

I wonder how they solve the signal problem mentioned in "What about
signals?", here:

  http://everything2.com/index.pl?node_id=1669061

If I understand correctly, SBCL uses signals when conditions are
signalled, and this might screw up the epoll stuff.

-- 
Lars Rune Nøstdal
http://lars.nostdal.org/
From: Douglas Crosher
Subject: Re: lisp implementations and scaling requirements
Date: 
Message-ID: <454E842B.4010304@scieneer.com>
Alok wrote:
> Daniel Barlow noted a point here -> http://ww.telent.net/diary/2002/11/
> that "Lisp is unsuitable for Google because implementations don't cope
> with their scary scaling requirements.".
> 
> True enough, just to start up SBCL on my machine takes up 26M (python
> 4M, perl 3M, ruby 2.5M, C++ 1.2M, and C 0.9M ). Which is not good if
> you want thousands of processes (doing simple things) running
> simultaneously.
> 
> So I would like to invite comments from people who know about this, is
> it possible to create a runtime shared library as a Lisp
> implementation?
> 
> Also how well supported is multi-threading in Lisp (SBCL)?

If you need to run a web site with modest scalability then you might
consider the Scieneer CL implementation, which includes an HTTP server
that can comfortably manage many tens of thousands of simultaneous
connections and can scale on symmetric multi-processor systems.

Further, a PortableAllegroServe compatibility layer is available,
built on the Scieneer CL http server, which may offer a path to port
your application to a more scalable implementation.  Contact me for more
information, or see the SCL web site: http://www.scieneer.com/scl/

Regards
Douglas Crosher