From: Xah Lee
Subject: multi-core software
Date: 
Message-ID: <ee09c02c-41d4-4a4a-ac0f-7275cd17547a@s12g2000yqi.googlegroups.com>
Of interest:

• Why Must Software Be Rewritten For Multi-Core Processors?
  http://xahlee.org/UnixResource_dir/writ/multi-core_software.html

plain text version follows.

--------------------------------------------------
Why Must Software Be Rewritten For Multi-Core Processors?

Xah Lee, 2009-06-04

I had a revelation today, namely, that it is necessary to rewrite
software to use multiple processors in order to benefit from them.

This may sound stupid, but it was a revelation to me. For the past
decade, the question has been on my mind of why software should need
to be rewritten to take advantage of multiple processors. In my mind,
software is at some fundamental level just algorithms, and algorithms
have nothing to do with hardware implementation aspects such as the
number of processors. I always felt that all the talk about the need
or difficulty of rewriting software for multi-processor (or
multi-core, these days) machines must be the product of the idiocy of
industrial imperative coding monkeys. In particular, the way some
languages such as Java deal with it, e.g. the concept of threads,
seems to me extremely stupid. In my mind, there should be a layer
between the software and the hardware, such as the operating system
or the compiler, that automatically handles or compiles the software
so that it FULLY uses the multiple processors when present. In short,
I thought that an algorithm really has nothing to do with hardware
details.

I never really thought hard about this issue, but today, since I got
a quad-core PC, I looked into the issue, thought about it, and
realized the answer. The gist is that an algorithm fundamentally
means manipulating some hardware; in fact, an algorithm is a
step-by-step instruction for operating some form of hardware, even if
the hardware is abstract or virtual. For example, take the simplest
case of 1+1. It is an algorithm, but where is the hardware? You may
say it's an abstract concept, or a mathematical model. What you call
1+1 depends on the context, but in our context, those numbers are the
hardware. To see this, take a more complex example: listing primes by
sieve. Again, it comes down to “what is a number”? Here, numbers can
be stones, or an arrangement of beads on an abacus; it's hardware! As
another example, take sorting. To begin with, you have to have
something to sort; that's hardware.

Another way to see this is that, for a given computing problem, there
are an infinite number of algorithms that achieve the same thing.
Some will be better, requiring fewer steps or less storage. All these
are concrete manipulation issues, and the thing being manipulated we
ultimately have to call hardware. So, when the hardware changes, say
from one CPU to multiple CPUs, there's no way for the algorithm to
magically change and adapt to the changed hardware. If you need an
algorithm that is tailored to the new hardware, you need a new
algorithm.
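
To make the rewrite concrete, here is a minimal Java sketch (purely
illustrative; the thread count and chunking are arbitrary choices) of
the same summation problem written once for one processor and once
restructured into independent chunks for several:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SumRewrite {
    // Old algorithm: one processor, one loop.
    static long sumSequential(long[] a) {
        long total = 0;
        for (long x : a) total += x;
        return total;
    }

    // New algorithm: the same problem, restructured into independent
    // chunks so that separate cores can each work on a piece.
    static long sumParallel(final long[] a, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int chunk = (a.length + nThreads - 1) / nThreads;
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(a.length, from + chunk);
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long s = 0;
                    for (int i = from; i < to; i++) s += a[i];
                    return s;
                }
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();  // combine the partial sums
        pool.shutdown();
        return total;
    }
}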

One might think that there could be an algorithm Omega that takes as
input the old hardware O, the new hardware N, and an algorithm A, and
outputs an algorithm B, such that B has the same behavior as A but
N+B performs better than O+A. This is like asking for Strong AI.

One thing we often forget is that algorithms ultimately translate to
manipulating hardware. In a modern digital computer, that means
software algorithms ultimately become machine instructions in the
CPU, which manipulate the 1s and 0s in registers, or voltages in
transistors.

From a more mundane point of view, an automatic system for making
software work on multiple processors is a problem of breaking a
problem into discrete units (so that they can be computed in
parallel). The problem of finding an algorithm is entirely different
from the problem of breaking a problem into distinct units. Breaking
a problem into distinct units is an entirely new branch of
mathematics. For example, take factoring. Factoring is a well-defined
mathematical problem. There are millions of algorithms for it, and
each class has different properties such as the number of steps or
storage units required. However, if we look at these algorithms from
the point of view of distinct units, that is a new perspective on the
classification of algorithms. Software in general is too ill-defined,
fuzzy, and complex; the software we use on personal computers, such
as email, browsers, and games, doesn't even have mathematical models.
It doesn't even have mathematical models of its inputs and outputs.
To talk about an automatic system for unitizing software would be
more like an AI fantasy. Roughly, the term that describes this area
of research is parallel computing.

In the Wikipedia article, it talks about types of parallelism:
bit-level parallelism, instruction-level parallelism, data
parallelism, and task parallelism. It also discusses hardware
classes: multicore, symmetric multiprocessing, distributed computing,
cluster, grid. The subjects mentioned there that come closest to this
essay are “data parallelism” and “task parallelism”. However, neither
is as high-level as what I discuss here. The issue discussed here
would be something like “algorithmic parallelism”. Indeed, Wikipedia
mentions “Automatic parallelization”, which is exactly what I'm
talking about here. Quote:

    Automatic parallelization of a sequential program by a compiler is
the holy grail of parallel computing. Despite decades of work by
compiler researchers, automatic parallelization has had only limited
success.[40]

    Mainstream parallel programming languages remain either explicitly
parallel or (at best) partially implicit, in which a programmer gives
the compiler directives for parallelization. A few fully implicit
parallel programming languages exist — SISAL, Parallel Haskell, and
(for FPGAs) Mitrion-C.

It says “A few fully implicit parallel programming languages exist”.
If you are a comp lang nutcase, don't get carried away by what those
words might seem to mean.

(Note: Wikipedia has a dedicated article on the subject: Automatic
parallelization)

  Xah
∑ http://xahlee.org/

☄

From: Roedy Green
Subject: Re: multi-core software
Date: 
Message-ID: <eo4g2513iu265prer2oamnq2k7eo8137rg@4ax.com>
On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee <······@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>• Why Must Software Be Rewritten For Multi-Core Processors?

Threads have been part of Java since Day 1.  Using threads complicates
your code, but even with a single-core processor, they can improve
performance, particularly if you are doing something like combing
multiple websites.

The nice thing about Java is that whether you are on a single-core
processor or a 256-CPU machine (we got to run our Athena Integer Java
spreadsheet engine on such a beast) does not concern your code.

You just have to make sure your threads don't interfere with each
other, and Java/the OS handle exploiting all the CPUs available.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80
From: Kaz Kylheku
Subject: Re: multi-core software
Date: 
Message-ID: <20090616103324.200@gmail.com>
["Followup-To:" header set to comp.lang.lisp.]
On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
> On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee <······@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>>• Why Must Software Be Rewritten For Multi-Core Processors?
>
> Threads have been part of Java since Day 1.

Unfortunately, not sane threads designed by people who actually understand
multithreading.

> The nice thing about Java is whether you are on a single core
> processor or a 256 CPU machine (We got to run our Athena Integer Java
> spreadsheet engine on such a beast), does not concern your code.

You are dreaming if you think that there are any circumstances (other than
circumstances in which performance doesn't matter) in which you don't have to
concern yourself about the difference between a uniprocessor and a 256 CPU
machine.
From: Thomas A. Russ
Subject: Re: multi-core software
Date: 
Message-ID: <ymiprdj4u14.fsf@blackcat.isi.edu>
Kaz Kylheku <········@gmail.com> writes:
> ["Followup-To:" header set to comp.lang.lisp.]

Expanded to include comp.lang.java.programmer, since I'm pretty sure
that's where Roedy Green reads.

> On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
> 
> > The nice thing about Java is whether you are on a single core
> > processor or a 256 CPU machine (We got to run our Athena Integer Java
> > spreadsheet engine on such a beast), does not concern your code.
> 
> You are dreaming if you think that there are any circumstances (other than
> circumstances in which performance doesn't matter) in which you don't have to
> concern yourself about the difference between a uniprocessor and a 256 CPU
> machine.

I concur.  You would generally do the program design and use of threads
differently depending on whether you expected to be using a multi-core
parallel processing system versus a uniprocessor.

For one thing, spawning a large number of threads to do computation,
or subdividing one big computation into smaller chunks to do in
separate threads, usually incurs significant overhead that only makes
sense if you can make up for it by having parallel execution speed
things up.

For example, suppose that I had four matrix multiplications that I
needed to do in order to further my computation.  I could choose
either to do them sequentially, or else to spawn three additional
"helper" threads so that each multiplication happens on its own
thread.

But spawning the additional threads has its own overhead in thread
creation, the need to synchronize on the completion of the three
helpers, and arranging a more complicated means of getting the
information into and out of the helper threads.  That is overhead
that would be pure waste if running on a uniprocessor, since the
overhead buys you nothing, and in fact imposes the additional cost of
having the processor switch threads to simulate the multi-threading.

So that sort of program design would only make sense if you knew that
you would have multiple processors available to handle the helper
threads so that the administrative overhead ends up being offset by the
parallel execution of the four big multiplications.
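
A rough Java sketch of that trade-off (illustrative only; the Matrix
type and its multiply method are assumed here, with a and b holding
the four operand pairs):

static Matrix[] multiplyFour(final Matrix[] a, final Matrix[] b)
        throws InterruptedException {
    final Matrix[] results = new Matrix[4];
    Thread[] helpers = new Thread[3];
    for (int i = 0; i < 3; i++) {
        final int k = i;
        helpers[i] = new Thread(new Runnable() {
            public void run() { results[k] = a[k].multiply(b[k]); }
        });
        helpers[i].start();              // thread-creation overhead paid here
    }
    results[3] = a[3].multiply(b[3]);    // the main thread handles the fourth pair
    for (Thread t : helpers) t.join();   // completion-synchronization overhead
    // On a uniprocessor the extra threads buy nothing; with four cores the
    // multiplications overlap and can repay the overhead.
    return results;
}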


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <268e2a81-8020-47c1-8006-0a4b9582405e@f16g2000vbf.googlegroups.com>
On Jun 4, 3:35 pm, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> Kaz Kylheku <········@gmail.com> writes:
> > ["Followup-To:" header set to comp.lang.lisp.]
>
> Expanded to include comp.lang.java.programmer, since I'm pretty sure
> that's where Roedy Green reads.
>
> > On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
>
> > > The nice thing about Java is whether you are on a single core
> > > processor or a 256 CPU machine (We got to run our Athena Integer Java
> > > spreadsheet engine on such a beast), does not concern your code.
>
> > You are dreaming if you think that there are any circumstances (other than
> > circumstances in which performance doesn't matter) in which you don't have to
> > concern yourself about the difference between a uniprocessor and a 256 CPU
> > machine.
>
> I concur.  You would generally do the program design and use of threads
> differently depending on whether you expected to be using a multi-core
> parallel processing system versus a uniprocessor.

I also feel the need to weigh in and agree. Anyone who has done any
kind of serious parallel processing (note: I chose that term
specifically over "multi-threading") knows this. There are many, many
aspects of parallel processing that go FAR beyond a Thread class, a
couple different "Lock" mechanisms, and some functions like start,
stop, pause, resume, and join.

The most common assumptions made by people who like to talk about
multi-threading are that all the processors being utilized are the
same (e.g. dual core+) and that they all share unified memory (a la a
single desktop PC). While that certainly does happen, most of the time
the processors aren't even in the same location, let alone the same
or sharing memory.

Another misconception is that you dispatch work to speed up a
particular operation, when usually that isn't the case. Yes,
occasionally you happen to have a great problem that lends itself to a
map/reduce solution, or you have 1,000,000 data sets to process and
you simply send 1,000 to each of your 1,000 available hardware
threads. But that (in my experience) is a rare occurrence. Typically,
you are just dispatching work to something else so that you can
continue on with other things that are just as important. Going
faster may just be a fringe benefit.

To assume that a programmer doesn't need to even think about the
problem and that the language/OS will just take care of it for you
demonstrates gross inexperience with the problem domain. It would be
akin to saying "Java has support for OpenGL, so making a 3D game must
be trivial."

Jeff M.
From: John B. Matthews
Subject: Re: multi-core software
Date: 
Message-ID: <nospam-2B2A8D.20473204062009@news.aioe.org>
In article 
<····································@f16g2000vbf.googlegroups.com>,
 "Jeff M." <·······@gmail.com> wrote:

> On Jun 4, 3:35 pm, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> > Kaz Kylheku <········@gmail.com> writes:
> > > ["Followup-To:" header set to comp.lang.lisp.]
> >
> > Expanded to include comp.lang.java.programmer, since I'm pretty 
> > sure that's where Roedy Green reads.
> >
> > > On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
> >
> > > > The nice thing about Java is whether you are on a single core 
> > > > processor or a 256 CPU machine (We got to run our Athena 
> > > > Integer Java spreadsheet engine on such a beast), does not 
> > > > concern your code.
> >
> > > You are dreaming if you think that there are any circumstances 
> > > (other than circumstances in which performance doesn't matter) in 
> > > which you don't have to concern yourself about the difference 
> > > between a uniprocessor and a 256 CPU machine.
> >
> > I concur.  You would generally do the program design and use of 
> > threads differently depending on whether you expected to be using a 
> > multi-core parallel processing system versus a uniprocessor.
> 
> I also feel the need to weigh in and agree. Anyone who has done any 
> kind of serious parallel processing (note: I chose that term 
> specifically over "multi-threading") knows this. There are many, many 
> aspects of parallel processing that go FAR beyond a Thread class, a 
> couple different "Lock" mechanisms, and some functions like start, 
> stop, pause, resume, and join.

Indeed, the 3rd edition of the Java Language Specification [1] revised 
the threading model, having previously deprecated several of the 
methods mentioned [2]. About the same time, several task-oriented 
utilities were added to the standard library [3].

The results, while not ground-breaking, can dramatically improve the 
responsiveness of GUI applications, for example, where rendering is 
often confined to a single thread of execution.
 
[1]<http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
[2]<http://java.sun.com/javase/6/docs/api/java/lang/Thread.html>
[3]<http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html>
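
A minimal sketch of that pattern using SwingWorker (illustrative; the
label and the compute() stand-in are hypothetical GUI pieces):

import javax.swing.*;

class ResponsiveUi {
    // Keep the long computation off the event dispatch thread so the
    // GUI stays responsive; deliver the result back on the EDT when done.
    static void startLongTask(final JLabel label) {
        new SwingWorker<String, Void>() {
            @Override protected String doInBackground() {
                return compute();             // runs on a background worker thread
            }
            @Override protected void done() {
                try {
                    label.setText(get());     // runs back on the event dispatch thread
                } catch (Exception e) {
                    label.setText("failed: " + e);
                }
            }
        }.execute();
    }

    static String compute() { return "result"; }  // stand-in for the real work
}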

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
From: Roedy Green
Subject: Re: multi-core software
Date: 
Message-ID: <gn4h2517bi7vd3u9s28vpgjacqelm8lm10@4ax.com>
On 04 Jun 2009 13:35:03 -0700, ···@sevak.isi.edu (Thomas A. Russ)
wrote, quoted or indirectly quoted someone who said :

>For one thing, spawning a large number of threads to do computation or
>subdividing one big computation into smaller chunks to do in separate
>threads usually incurs some often significant overhead that only makes
>sense if you can make up for the overhead by having parallel execution
>speed things up.

Obviously you have to tweak the number of threads to account for how
many CPUs you have and how much RAM you have (and how slow/wide your
Internet connection is).  I have proposed that it might be possible to
make apps self-tweaking. See http://mindprod.com/jgloss/tweakable.html
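
In Java the basic form of that tweak is small (a sketch; the pool is
then fed whatever tasks the app has):

int cores = Runtime.getRuntime().availableProcessors();
java.util.concurrent.ExecutorService pool =
        java.util.concurrent.Executors.newFixedThreadPool(cores);
// the pool size now tracks the hardware the program happens to run on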


Is there anything else you have to change in your code to deal with
the number of cores available?  
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80
From: Paul Wallich
Subject: Re: multi-core software
Date: 
Message-ID: <h0bbmg$ab8$1@reader1.panix.com>
Roedy Green wrote:
> On 04 Jun 2009 13:35:03 -0700, ···@sevak.isi.edu (Thomas A. Russ)
> wrote, quoted or indirectly quoted someone who said :
> 
>> For one thing, spawning a large number of threads to do computation or
>> subdividing one big computation into smaller chunks to do in separate
>> threads usually incurs some often significant overhead that only makes
>> sense if you can make up for the overhead by having parallel execution
>> speed things up.
> 
> Obviously you have to tweak the number of threads to account for how
> many CPUs you have and how much RAM you have (and how slow/wide your
> Internet connection is)  I have proposed that it might be possible to
> make apps self-tweaking. See http://mindprod.com/jgloss/tweakable.html
>  
> Is there anything else you have to change in your code to deal with
> the number of cores available?  

Yes. Unless your project is embarrassingly, embarrassingly parallel, the 
optimal communications pattern may change according to the number of 
cores. Even the entire partitioning strategy might change. And not only 
with the number of cores but with the interaction between the number of 
cores and the well- or ill-behaved nature of the particular data set.
From: Scott Burson
Subject: Re: multi-core software
Date: 
Message-ID: <a659f469-0c5d-4010-bcc3-15193909fa7a@p4g2000vba.googlegroups.com>
On Jun 4, 12:37 pm, Kaz Kylheku <········@gmail.com> wrote:
> ["Followup-To:" header set to comp.lang.lisp.]
> On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
> > Threads have been part of Java since Day 1.
>
> Unfortunately, not sane threads designed by people who actually understand
> multithreading.

What's wrong with them?  -- It's a sincere question.  I haven't
written much Java, but I've written a little, and used the threads --
I don't recall it being any worse than Pthreads or whatever.

-- Scott
From: Alessio Stalla
Subject: Re: multi-core software
Date: 
Message-ID: <af5b5d5c-b2b0-4ceb-932f-592f195b7c69@r33g2000yqn.googlegroups.com>
On Jun 4, 11:28 pm, Scott Burson <········@gmail.com> wrote:
> On Jun 4, 12:37 pm, Kaz Kylheku <········@gmail.com> wrote:
>
> > ["Followup-To:" header set to comp.lang.lisp.]
> > On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
> > > Threads have been part of Java since Day 1.
>
> > Unfortunately, not sane threads designed by people who actually understand
> > multithreading.
>
> What's wrong with them?  -- It's a sincere question.  I haven't
> written much Java, but I've written a little, and used the threads --
> I don't recall it being any worse than Pthreads or whatever.
>
> -- Scott

Yes, I too believe threads in Java are not that bad. Nothing
exceptional, either. And there are also little goodies like thread-
local variables - sure, we take them for granted in Lisp (special
variables rebound in each thread), but I don't think C + pthreads has
them.

Alessio
From: Dimiter "malkia" Stanev
Subject: Re: multi-core software
Date: 
Message-ID: <4A283FFE.3000207@mac.com>
Alessio Stalla wrote:
> On Jun 4, 11:28 pm, Scott Burson <········@gmail.com> wrote:
>> On Jun 4, 12:37 pm, Kaz Kylheku <········@gmail.com> wrote:
>>
>>> ["Followup-To:" header set to comp.lang.lisp.]
>>> On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
>>>> Threads have been part of Java since Day 1.
>>> Unfortunately, not sane threads designed by people who actually understand
>>> multithreading.
>> What's wrong with them?  -- It's a sincere question.  I haven't
>> written much Java, but I've written a little, and used the threads --
>> I don't recall it being any worse than Pthreads or whatever.
>>
>> -- Scott
> 
> Yes, I too believe threads in Java are not that bad. Nothing
> exceptional, either. And there are also little goodies like thread-
> local variables - sure we take them for granted in Lisp (special
> variables rebound in each thread), but I don't think C + phtreads has
> them.
> 
> Alessio

Visual Studio has them (thread-locals and such).... But I think 
what's missing in general is fork() on Windows systems - I still think 
there is no need for threads (embedded devices/consoles excluded) - just 
fork() yourself and start talking to the other you - no sharing is better.
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <adidncvaMKq1BbTXnZ2dnUVZ8rhi4p2d@brightview.co.uk>
Dimiter "malkia" Stanev wrote:
> Alessio Stalla wrote:
>> Yes, I too believe threads in Java are not that bad. Nothing
>> exceptional, either. And there are also little goodies like thread-
>> local variables - sure we take them for granted in Lisp (special
>> variables rebound in each thread), but I don't think C + phtreads has
>> them.
> 
> Visual Studio has them (threadlocalities and such).... But I think
> what's missing in general is fork() on Windows systems - I still think
> there is no need for threads (embedded devices/consoles excluded) - just
> fork() yourself and start talking to other you - no sharing is better.

The constant time taken to fork is many orders of magnitude slower than the
time taken to schedule a work item on a thread-based wait-free
work-stealing concurrent queue implementation (e.g. Cilk or Microsoft's
TPL). Fork also means separate processes, which means either communication
via serialization, which is O(n) instead of O(1) reference passing, or manual
memory management.
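
For contrast, a work-item handoff within one process looks like this in
Java (a sketch; the work-stealing detail of Cilk/TPL is not shown, only
the absence of fork and serialization):

import java.util.concurrent.*;

class WorkItemSketch {
    static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Hand an in-memory object to an already-running pooled thread:
    // no new process, no separate address space, no serialization.
    static int squareOnPool(final int n) throws Exception {
        Future<Integer> f = POOL.submit(new Callable<Integer>() {
            public Integer call() { return n * n; }   // placeholder work item
        });
        return f.get();
    }
}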

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Thomas Munro
Subject: Re: multi-core software
Date: 
Message-ID: <a1a86e99-cd95-4b56-90ab-a66081b4c1e6@l28g2000vba.googlegroups.com>
On Jun 4, 10:33 pm, Alessio Stalla <·············@gmail.com> wrote:
> Yes, I too believe threads in Java are not that bad. Nothing
> exceptional, either. And there are also little goodies like thread-
> local variables - sure we take them for granted in Lisp (special
> variables rebound in each thread), but I don't think C + phtreads has
> them.

POSIX does have thread local storage (but it requires slightly more
work than a plain old variable):
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_setspecific.html

Same goes for Java:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/ThreadLocal.html
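
A tiny sketch of the Java flavor (illustrative; the per-thread counter
is invented for the example):

class TlsSketch {
    // Each thread that touches COUNTER sees its own independently
    // initialized copy.
    static final ThreadLocal<Integer> COUNTER = new ThreadLocal<Integer>() {
        @Override protected Integer initialValue() { return 0; }
    };

    static void bump() {
        COUNTER.set(COUNTER.get() + 1);   // no locking; the value is thread-private
    }
}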

Non-standard but common extensions (GCC, IBM, Intel, Sun, Microsoft)
allow for plain-old-variable syntax:
http://gcc.gnu.org/onlinedocs/gcc-4.2.4/gcc/Thread_002dLocal.html
http://publib.boulder.ibm.com/infocenter/comphelp/v9v111/index.jsp?topic=/com.ibm.xlcpp9.aix.doc/language_ref/thread.htm
http://software.intel.com/en-us/articles/performance-tools-for-software-developers-thread-local-variables-and-initialization/
http://docs.sun.com/app/docs/doc/819-5265/bjabr?a=view
http://msdn.microsoft.com/en-us/library/ms859913.aspx
From: toby
Subject: Re: multi-core software
Date: 
Message-ID: <aa31d24f-cf30-4444-a04b-c79e89693924@o18g2000yqi.googlegroups.com>
On Jun 4, 3:37 pm, Kaz Kylheku <········@gmail.com> wrote:
> ["Followup-To:" header set to comp.lang.lisp.]
> On 2009-06-04, Roedy Green <···········@mindprod.com.invalid> wrote:
>
> > On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee <······@gmail.com>
> > wrote, quoted or indirectly quoted someone who said :
>
> >>• Why Must Software Be Rewritten For Multi-Core Processors?
>
> > Threads have been part of Java since Day 1.
>
> Unfortunately, not sane threads designed by people who actually understand
> multithreading.

So have you discovered Erlang yet?

>
> > The nice thing about Java is whether you are on a single core
> > processor or a 256 CPU machine (We got to run our Athena Integer Java
> > spreadsheet engine on such a beast), does not concern your code.
>
> You are dreaming if you think that there are any circumstances (other than
> circumstances in which performance doesn't matter) in which you don't have to
> concern yourself about the difference between a uniprocessor and a 256 CPU
> machine.
From: André Thieme
Subject: Re: multi-core software
Date: 
Message-ID: <h09bjv$58a$1@news.eternal-september.org>
Kaz Kylheku schrieb:

> You are dreaming if you think that there are any circumstances (other than
> circumstances in which performance doesn't matter) in which you don't have to
> concern yourself about the difference between a uniprocessor and a 256 CPU
> machine.

What about 1 core vs. 2 cores?
And 2 cores vs. 256 cores?


André
-- 
Lisp is not dead. It’s just the URL that has changed:
http://clojure.org/
From: Rob Warnock
Subject: Re: multi-core software
Date: 
Message-ID: <up6dna1aq--N6rXXnZ2dnUVZ_hqdnZ2d@speakeasy.net>
André Thieme  <·······························@justmail.de> wrote:
+---------------
| Kaz Kylheku schrieb:
| > You are dreaming if you think that there are any circumstances
| > (other than circumstances in which performance doesn't matter)
| > in which you don't have to concern yourself about the difference
| > between a uniprocessor and a 256 CPU machine.
| 
| What about 1 core vs. 2 cores?
+---------------

That's the biggest difference, but...

+---------------
| And 2 cores vs. 256 cores?
+---------------

As they say in the military, "Quantity has a quality all its own."  ;-}

Even Dekker's Algorithm might be a reasonable way to do mutual
exclusion with only 2 cores[1], but *no* single-lock solution is
going to work well for 256 cores. Even with hardware support [such
as LL/SC, say], if nothing else the "write invalidate" traffic from
the lock location is going to create a hot spot in the memory system.

Back at SGI, it was often the case that an app that worked very well
on a 64-CPU ccNUMA system fell over when the CPU count was raised to 256.
Ditto a working 256-CPU app when expanded to 1024 CPUs.

Amdahl's Law <http://en.wikipedia.org/wiki/Amdahl%27s_Law> always
gets you, eventually...  ;-}  ;-}


-Rob

[1] O.k., so nobody is going to *seriously* consider using Dekker's
    Algorithm for user-mode programs these days, though I *have* used
    it successfully for simple interlocks between a single-CPU host
    computer and a "smart" DMA controller. [*Much* more efficient
    than interrupting the host, actually.]
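
For reference, a two-thread Dekker sketch in Java (illustrative; it
leans on volatile for the ordering the textbook algorithm assumes, and
it busy-waits):

class Dekker {
    private volatile boolean wants0, wants1;  // intent flags, one per thread
    private volatile int turn;                // who must yield on conflict

    void lock(int me) {                       // me is 0 or 1
        int other = 1 - me;
        setWants(me, true);
        while (wants(other)) {
            if (turn == other) {
                setWants(me, false);          // back off while it is not our turn
                while (turn == other) { /* spin */ }
                setWants(me, true);
            }
        }
    }

    void unlock(int me) {
        turn = 1 - me;                        // hand the turn to the other thread
        setWants(me, false);
    }

    private void setWants(int id, boolean v) { if (id == 0) wants0 = v; else wants1 = v; }
    private boolean wants(int id)            { return id == 0 ? wants0 : wants1; }
}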

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Seamus MacRae
Subject: Re: multi-core software
Date: 
Message-ID: <h0bjm7$hq9$1@news.eternal-september.org>
Kaz Kylheku wrote:
[snip]

I see you're no longer confining yourself to the thread from hell, and 
are now randomly crashing other threads in comp.lang.java.programmer to 
badmouth Java and crosspost into cll.

> Unfortunately, not sane threads designed by people who actually understand
> multithreading.

Then let me enlighten you! We have had some nice concurrency support 
over here in Java-land since Java 5, and especially Java 6, what with 
java.util.concurrent, non-blocking I/O in NIO, and 
SwingWorker/SwingUtilities.
From: Vend
Subject: Re: multi-core software
Date: 
Message-ID: <500f8756-fb11-4ac9-b30c-bc6170bab4ce@j20g2000vbp.googlegroups.com>
On Jun 4, 8:35 pm, Roedy Green <···········@mindprod.com.invalid>
wrote:
> On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee <······@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
> >• Why Must Software Be Rewritten For Multi-Core Processors?
>
> Threads have been part of Java since Day 1.  Using threads complicates
> your code, but even with a single core processor, they can improve
> performance, particularly if you are doing something like combing
> multiple websites.
>
> The nice thing about Java is whether you are on a single core
> processor or a 256 CPU machine (We got to run our Athena Integer Java
> spreadsheet engine on such a beast), does not concern your code.
>
> You just have to make sure your threads don't interfere with each
> other, and Java/the OS, handle exploiting all the CPUs available.

You need to decompose your problem into 256 independent tasks.

It can be trivial for some problems and difficult or perhaps
impossible for others.
From: Kaz Kylheku
Subject: Re: multi-core software
Date: 
Message-ID: <20090617091529.242@gmail.com>
On 2009-06-05, Vend <······@virgilio.it> wrote:
> On Jun 4, 8:35 pm, Roedy Green <···········@mindprod.com.invalid>
> wrote:
>> On Thu, 4 Jun 2009 09:46:44 -0700 (PDT), Xah Lee <······@gmail.com>
>> wrote, quoted or indirectly quoted someone who said :
>>
>> >• Why Must Software Be Rewritten For Multi-Core Processors?
>>
>> Threads have been part of Java since Day 1.  Using threads complicates
>> your code, but even with a single core processor, they can improve
>> performance, particularly if you are doing something like combing
>> multiple websites.
>>
>> The nice thing about Java is whether you are on a single core
>> processor or a 256 CPU machine (We got to run our Athena Integer Java
>> spreadsheet engine on such a beast), does not concern your code.
>>
>> You just have to make sure your threads don't interfere with each
>> other, and Java/the OS, handle exploiting all the CPUs available.
>
> You need to decompose your problem in 256 independent tasks.
>
> It can be trivial for some problems and difficult or perhaps
> impossible for some others.

Even for problems where it appears trivial, there can be hidden
issues, like false cache coherency communication where no actual
sharing is taking place. Or locks that appear to have low contention and
negligible performance impact on ``only'' 8 processors suddenly turn into
bottlenecks. Then there is NUMA. A given address in memory may be
RAM attached to the processor accessing it, or to another processor,
with very different access costs.
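
A Java sketch of the false-sharing case (a hypothetical micro-benchmark;
the 64-byte cache-line assumption and the iteration count are made up):

class FalseSharingSketch {
    // Two logically independent counters that land on the same cache line...
    static final long[] shared = new long[2];
    // ...versus the same counters spaced a cache line apart (8 longs = 64 bytes).
    static final long[] padded = new long[16];

    static Thread bumper(final long[] slots, final int slot) {
        return new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 100000000; i++) slots[slot]++;  // no logical sharing
            }
        });
    }
    // Running bumper(shared, 0) and bumper(shared, 1) together is typically
    // much slower than bumper(padded, 0) and bumper(padded, 8), purely
    // because the unshared counters still ping-pong one cache line between
    // the cores.
}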
From: Roedy Green
Subject: Re: multi-core software
Date: 
Message-ID: <6baj251aj9jb23ga5b1s0aqng59t66v9rt@4ax.com>
On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
<········@gmail.com> wrote, quoted or indirectly quoted someone who
said :

>Even for problems where it appears trivial, there can be hidden
>issues, like false cache coherency communication where no actual
>sharing is taking place. Or locks that appear to have low contention and
>negligible performance impact on ``only'' 8 processors suddenly turn into
>bottlenecks. Then there is NUMA. A given address in memory may be
>RAM attached to the processor accessing it, or to another processor,
>with very different access costs.

Could what you are saying be summed up by saying, "The more threads
you have, the more important it is to keep your threads independent,
sharing as little data as possible"?
-- 
Roedy Green Canadian Mind Products
http://mindprod.com

Never discourage anyone... who continually makes progress, no matter how slow.
~ Plato 428 BC died: 348 BC at age: 80
From: Vend
Subject: Re: multi-core software
Date: 
Message-ID: <1489cda9-cb03-4159-8cfe-021895da251f@j18g2000yql.googlegroups.com>
On Jun 6, 1:26 am, Roedy Green <···········@mindprod.com.invalid>
wrote:
> On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
> <········@gmail.com> wrote, quoted or indirectly quoted someone who
> said :
>
> >Even for problems where it appears trivial, there can be hidden
> >issues, like false cache coherency communication where no actual
> >sharing is taking place. Or locks that appear to have low contention and
> >negligible performance impact on ``only'' 8 processors suddenly turn into
> >bottlenecks. Then there is NUMA. A given address in memory may be
> >RAM attached to the processor accessing it, or to another processor,
> >with very different access costs.
>
> Could what you are saying be summed up by saying, "The more threads
> you have the more important it is to keep your threads independent,
> sharing as little data as possible."

Besides technical issues such as cache conflicts and synchronization
latencies, there are more theoretical issues of task decomposability.
It seems it is not always feasible to decompose an algorithm into
subprograms that can be executed in parallel.
From: Raymond Wiker
Subject: Re: multi-core software
Date: 
Message-ID: <m2fxedd9gl.fsf@RAWMBP.local>
Roedy Green <···········@mindprod.com.invalid> writes:

> On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
> <········@gmail.com> wrote, quoted or indirectly quoted someone who
> said :
>
>>Even for problems where it appears trivial, there can be hidden
>>issues, like false cache coherency communication where no actual
>>sharing is taking place. Or locks that appear to have low contention and
>>negligible performance impact on ``only'' 8 processors suddenly turn into
>>bottlenecks. Then there is NUMA. A given address in memory may be
>>RAM attached to the processor accessing it, or to another processor,
>>with very different access costs.
>
> Could what you are saying be summed up by saying, "The more threads
> you have the more important it is to keep your threads independent,
> sharing as little data as possible."

	Absolutely... not a new observation, either, as it follows
directly from Amdahl's law. 
From: George Neuner
Subject: Re: multi-core software
Date: 
Message-ID: <1fhl25d01jaecem232smgaf1isgu527e1u@4ax.com>
On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green
<···········@mindprod.com.invalid> wrote:

>On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
><········@gmail.com> wrote, quoted or indirectly quoted someone who
>said :
>
>>Even for problems where it appears trivial, there can be hidden
>>issues, like false cache coherency communication where no actual
>>sharing is taking place. Or locks that appear to have low contention and
>>negligible performance impact on ``only'' 8 processors suddenly turn into
>>bottlenecks. Then there is NUMA. A given address in memory may be
>>RAM attached to the processor accessing it, or to another processor,
>>with very different access costs.
>
>Could what you are saying be summed up by saying, "The more threads
>you have the more important it is to keep your threads independent,
>sharing as little data as possible."

And therein lies the problem of leveraging many cores.  There is a lot
of potential parallelism in programs (even in Java :) that is lost
because it is too fine a grain for threads.  Even the lightest weight
user space ("green") threads need a few hundred instructions, minimum,
to amortize the cost of context switching.

Add to that the fact that programmers have shown themselves, on
average, to be remarkably bad at figuring out what _should_ be done in
parallel - as opposed to what _can_ be done - and you've got a clear
indicator that threads, as we know them, are not scalable except under
a limited set of conditions. 

George
From: Paul Rubin
Subject: Re: multi-core software
Date: 
Message-ID: <7xskicoikw.fsf@ruckus.brouhaha.com>
George Neuner <········@comcast.net> writes:
> Even the lightest weight
> user space ("green") threads need a few hundred instructions, minimum,
> to amortize the cost of context switching.

I thought the definition of green threads was that multiplexing them
doesn't require context switches.
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <bc191dba-a1bf-4187-8035-8cd9dd552d93@n21g2000vba.googlegroups.com>
On Jun 6, 9:58 pm, Paul Rubin <·············@NOSPAM.invalid> wrote:
> George Neuner <········@comcast.net> writes:
> > Even the lightest weight
> > user space ("green") threads need a few hundred instructions, minimum,
> > to amortize the cost of context switching.
>
> I thought the definition of green threads was that multiplexing them
> doesn't require context switches.

There's always a context switch. It's just whether or not you are
switching in/out a virtual stack and registers for the context or the
hardware stack/registers.

Jeff M.
From: Paul Rubin
Subject: Re: multi-core software
Date: 
Message-ID: <7x8wk4351n.fsf@ruckus.brouhaha.com>
"Jeff M." <·······@gmail.com> writes:
> > > Even the lightest weight
> > > user space ("green") threads need a few hundred instructions, minimum,
> > > to amortize the cost of context switching....
> There's always a context switch. It's just whether or not you are
> switching in/out a virtual stack and registers for the context or the
> hardware stack/registers.

I don't see the hundreds of instructions in that case.  

http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&lang=ghc&id=3

shows GHC doing 50 million lightweight thread switches in 8.47
seconds, passing a token around a thread ring.  Almost all of that is
probably spent acquiring and releasing the token's lock as the token
is passed from one thread to another.  That simply doesn't leave time
for hundreds of instructions per switch.
From: Scott David Daniels
Subject: Re: multi-core software
Date: 
Message-ID: <boudnanh3ZYrM7bXnZ2dnUVZ_oSdnZ2d@pdx.net>
Paul Rubin wrote:
> "Jeff M." <·······@gmail.com> writes:
>>>> Even the lightest weight
>>>> user space ("green") threads need a few hundred instructions, minimum,
>>>> to amortize the cost of context switching....
>> There's always a context switch. It's just whether or not you are
>> switching in/out a virtual stack and registers for the context or the
>> hardware stack/registers.
> 
> I don't see the hundreds of instructions in that case.  
> 
> http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&lang=ghc&id=3
> 
> shows GHC doing 50 million lightweight thread switches in 8.47
> seconds, passing a token around a thread ring.  Almost all of that is
> probably spent acquiring and releasing the token's lock as the token
> is passed from one thread to another.  That simply doesn't leave time
> for hundreds of instructions per switch.
Remember, he said, "to amortize the cost of the context switch," not to
perform the switch itself. I stutter-stepped there myself.  One not
completely obvious place is in blowing the instruction (and likely the
data) cache.  I suspect it is tough to measure when that cost applies,
but I expect he's likely right, except on artificial benchmarks, and
the nub of the problem is not on the benchmarks.  There is something
to be said for the good old days when you looked up the instruction
timings that you used in a little document for your machine, and could
know the cost of any loop.  We are faster now, but part of the cost of
that speed is that timing is a black art.  Perhaps modern operating
systems need the system call that was implemented in the Stanford AI
Lab's operating system -- phase of the moon.

--Scott David Daniels
·············@Acm.Org
From: Lew
Subject: Re: multi-core software
Date: 
Message-ID: <h0gll0$e8j$1@news.albasani.net>
Scott David Daniels wrote:
> the nub of the problem is not on the benchmarks.  There is something
> to be said for the good old daays when you looked up the instruction
> timings that you used in a little document for your machine, and could
> know the cost of any loop.  We are faster now, but part of the cost of
> that speed is that timing is a black art.  

Those good old days never existed.  Those manuals never accounted for things 
that affected timing even then, like memory latency or refresh time.  SRAM 
cache made things worse, since the published timings never mentioned 
cache-miss delays.  Though memory cache might seem a recent innovation, it's 
been around a while.  It would be challenging to find any published timing 
since the commercialization of computers that would actually tell the cost of 
any loop.

Things got worse when chips like the '86 family acquired multiple instructions 
for doing loops, still worse when pre-fetch pipelines became deeper and wider, 
absolutely Dark Art due to multi-level memory caches becoming universal, and 
throw-your-hands-up-and-leave-for-the-corner-bar with multiprocessor NUMA 
systems.  OSes and high-level languages complicate the matter - you never know 
how much time slice you'll get or how your source got compiled or optimized by 
run-time.

So the good old days are a matter of degree and self-deception - it was easier 
to fool ourselves then that we could at least guess timings proportionately if 
not absolutely, but things definitely get more unpredictable over evolution.

-- 
Lew
From: Scott David Daniels
Subject: Re: multi-core software
Date: 
Message-ID: <ysCdnXhGhZX_QrbXnZ2dnUVZ_o-dnZ2d@pdx.net>
Lew wrote:
> Scott David Daniels wrote:
>> the nub of the problem is not on the benchmarks.  There is something
>> to be said for the good old daays when you looked up the instruction
>> timings that you used in a little document for your machine, and could
>> know the cost of any loop.  We are faster now, but part of the cost of
>> that speed is that timing is a black art.  
> 
> Those good old days never existed.  Those manuals never accounted for 
> things that affected timing even then, like memory latency or refresh 
> time.

Well, as Gilbert and Sullivan wrote:
      - What, never?
      - No, never!
      - What, "Never"?
      - Well, hardly ever.
Look up the LGP-30.  It was quite predictable.  It has been a while.

--Scott David Daniels
·············@Acm.Org
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <HeednTZTUv-ZlLHXnZ2dnUVZ8t1i4p2d@brightview.co.uk>
Scott David Daniels wrote:
> Lew wrote:
>> Scott David Daniels wrote:
>>> the nub of the problem is not on the benchmarks.  There is something
>>> to be said for the good old daays when you looked up the instruction
>>> timings that you used in a little document for your machine, and could
>>> know the cost of any loop.  We are faster now, but part of the cost of
>>> that speed is that timing is a black art.
>> 
>> Those good old days never existed.  Those manuals never accounted for
>> things that affected timing even then, like memory latency or refresh
>> time.
> 
> Well, as Gilbert and Sullivan wrote:
>       - What, never?
>       - No, never!
>       - What, "Never"?
>       - Well, hardly ever.
> Look up the LGP-30.  It was quite predictable.  It has been a while.

Same for early ARMs.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Christopher C. Stacy
Subject: Re: multi-core software
Date: 
Message-ID: <yzlk53n374n.fsf@news.dtpq.com>
Lew <·····@lewscanon.com> writes:
> Those good old days never existed.  Those manuals never accounted for
> things that affected timing even then, like memory latency or refresh time.

A lot of Z80 code was written that takes advantage of precise knowledge
of the timing of instructions.
From: Ken T.
Subject: Re: multi-core software
Date: 
Message-ID: <4a2c3aff$0$11556$ec3e2dad@news.usenetmonster.com>
On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:

> So the good old days are a matter of degree and self-deception - it was
> easier to fool ourselves then that we could at least guess timings
> proportionately if not absolutely, but things definitely get more
> unpredictable over evolution.

As I recall I could get exact timings on my 6502 based Commodore 64.  The 
issues you speak of simply weren't issues. 

-- 
Ken T.
http://www.electricsenator.net

  Never underestimate the power of stupid people in large groups.
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <aNqdnVazYL8J97HXnZ2dneKdnZydnZ2d@brightview.co.uk>
Ken T. wrote:
> On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:
>> So the good old days are a matter of degree and self-deception - it was
>> easier to fool ourselves then that we could at least guess timings
>> proportionately if not absolutely, but things definitely get more
>> unpredictable over evolution.
> 
> As I recall I could get exact timings on my 6502 based Commodore 64.  The
> issues you speak of simply weren't issues.

Let's not forget Elite for the 6502 exploiting predictable performance in
order to switch graphics modes partway down the vsync!

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Ken T.
Subject: Re: multi-core software
Date: 
Message-ID: <4a2c9364$0$11539$ec3e2dad@news.usenetmonster.com>
On Mon, 08 Jun 2009 02:39:40 +0100, Jon Harrop wrote:

> Ken T. wrote:
>> On Sun, 07 Jun 2009 11:16:46 -0400, Lew wrote:
>>> So the good old days are a matter of degree and self-deception - it
>>> was easier to fool ourselves then that we could at least guess timings
>>> proportionately if not absolutely, but things definitely get more
>>> unpredictable over evolution.
>> 
>> As I recall I could get exact timings on my 6502 based Commodore 64. 
>> The issues you speak of simply weren't issues.
> 
> Let's not forget Elite for the 6502 exploiting predictable performance
> in order to switch graphics modes partway down the vsync!

That actually didn't require predictable timing.  You could tell the 
video chip to send you an interrupt when it got to a given scan line.  I 
used this myself.  

Elite was cool though.  I wasted many hours playing that game. 

-- 
Ken T.
http://www.electricsenator.net

  Duct tape is like the force.  It has a light side, and a dark side,
  and it holds the universe together ...
        -- Carl Zwanzig
From: ········@mpi-sws.org
Subject: Re: multi-core software
Date: 
Message-ID: <c152dfa7-68c8-4fb7-9d45-4c08c936c243@e21g2000yqb.googlegroups.com>
On Jun 8, 6:28 am, "Ken T." <·······@home.com> wrote:
>
> > Let's not forget Elite for the 6502 exploiting predictable performance
> > in order to switch graphics modes partway down the vsync!
>
> That actually didn't require predictable timing.  You could tell the
> video chip to send you an interrupt when it got to a given scan line.  I
> used this myself.  

I don't know what Elite did, but I know for sure that it was a common
trick on the Atari ST to switch color palettes or graphics modes at a
fixed point *in each single scan line* to get more colors, or to
display graphics in the screen borders. That required "synchronous
programming", i.e. counting the clock cycles of machine instructions
such that at every point in the program you knew exactly where the
electron beam would be.

The Atari ST had an M68000 at exactly 8 MHz, which made this
possible. There were no caches in those days, and clock cycles were
entirely predictable.
From: Paul Wallich
Subject: Re: multi-core software
Date: 
Message-ID: <h0j7g1$pcn$1@reader1.panix.com>
········@mpi-sws.org wrote:
> On Jun 8, 6:28 am, "Ken T." <·······@home.com> wrote:
>>> Let's not forget Elite for the 6502 exploiting predictable performance
>>> in order to switch graphics modes partway down the vsync!
>> That actually didn't require predictable timing.  You could tell the
>> video chip to send you an interrupt when it got to a given scan line.  I
>> used this myself.  
> 
> I don't know what Elite did, but I know for sure that it was a common
> trick on the Atari ST to switch color palettes or graphics mode at a
> fixed point *in each single scan line* to get more colors, or display
> graphics on the screen borders. That required "synchronous
> programming", i.e. counting clock cycles of machine instructions such
> that for every point in the program you knew exactly where the
> electron ray would be.
> 
> The Atari ST had an M68000 with exactly 8 MHz, which made this
> possible. There were no caches in those times, and clock cycles were
> entirely predictable.

The usual trick for these machines was an exact multiple of the NTSC 
"color clock", which was approx 3.58 MHz. The 8-bit Atari video games 
and home computers all used this technique, as did the C-64/128. 
68000-based machines (such as the ST and the Amiga) could not only 
exploit that synchrony, they could also (this was in the days before the 
memory wall) exploit the fact that a 680x0 typically accessed memory only 
once every 4 clock cycles, doing DMA from the same memory when the CPU 
wasn't using it. High display resolutions would lock the processor out 
of RAM except during blanking intervals. (Talk about contention and hot 
spots.)

Figuring out how to reuse resources most effectively was pretty much the 
same as the register-allocation problem for compilers, and was sometimes 
solved using the same kinds of graph-coloring algorithms...
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <9869d28a-1b7e-4190-a69f-ffb68585c794@t11g2000vbc.googlegroups.com>
On Jun 7, 1:56 am, Paul Rubin <·············@NOSPAM.invalid> wrote:
> "Jeff M." <·······@gmail.com> writes:
> > > > Even the lightest weight
> > > > user space ("green") threads need a few hundred instructions, minimum,
> > > > to amortize the cost of context switching....
> > There's always a context switch. It's just whether or not you are
> > switching in/out a virtual stack and registers for the context or the
> > hardware stack/registers.
>
> I don't see the hundreds of instructions in that case.  
>
> http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&...
>
> shows GHC doing 50 million lightweight thread switches in 8.47
> seconds, passing a token around a thread ring.  Almost all of that is
> probably spent acquiring and releasing the token's lock as the token
> is passed from one thread to another.  That simply doesn't leave time
> for hundreds of instructions per switch.

Who said there has to be? Sample code below (just to get the point
across):

struct context {
   vir_reg pc, sp, bp;  /* ...plus whatever other virtual registers get saved */
   object* stack;

   // ...

   context* next;
};

struct vm {
   context* active_context;
};

void switch_context(vm* v)
{
   // maybe GC v->active_context before switching

   v->active_context = v->active_context->next;
}

Also, there aren't "hundreds of instructions" with multiplexing,
either. It's all done in hardware. Take a look at the disassembly of
any application that uses native threads on a platform that supports
preemption. You won't see any instructions anywhere in the program
that perform a context switch. If you did, that would be absolutely
horrible. Imagine if the compiler did something like this:

while(1)
{
  // whatever
}

do_context_switch_here();

That would suck. ;-)

That's not to imply that there isn't a cost; there's always a cost.
The example above just goes to show that, for green threads, the cost
[of the switch] can be reduced to a single pointer assignment.

Jeff M.
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <gqCdnetPkKBRRLbXnZ2dnUVZ8mkAAAAA@brightview.co.uk>
Paul Rubin wrote:
> "Jeff M." <·······@gmail.com> writes:
>> > > Even the lightest weight
>> > > user space ("green") threads need a few hundred instructions,
>> > > minimum, to amortize the cost of context switching....
>> There's always a context switch. It's just whether or not you are
>> switching in/out a virtual stack and registers for the context or the
>> hardware stack/registers.
> 
> I don't see the hundreds of instructions in that case.
> 
>
http://shootout.alioth.debian.org/u32q/benchmark.php?test=threadring&lang=ghc&id=3
> 
> shows GHC doing 50 million lightweight thread switches in 8.47
> seconds, passing a token around a thread ring.  Almost all of that is
> probably spent acquiring and releasing the token's lock as the token
> is passed from one thread to another.  That simply doesn't leave time
> for hundreds of instructions per switch.

And Haskell is not exactly fast...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: John Thingstad
Subject: Re: multi-core software
Date: 
Message-ID: <op.uu4epioyut4oq5@pandora>
On Sat, 06 Jun 2009 21:46:51 +0200, George Neuner  
<········@comcast.net> wrote:

> On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green
>
> Add to that the fact that programmers have shown themselves, on
> average, to be remarkably bad at figuring out what _should_ be done in
> parallel - as opposed to what _can_ be done - and you've got a clear
> indicator that threads, as we know them, are not scalable except under
> a limited set of conditions.
>
> George

I find the dataflow model of concurrency in Oz to be interesting and to  
address many of the issues you just mentioned.
See in particular 'Dataflow variables and declarative concurrency' and  
onward.
http://en.wikipedia.org/wiki/Oz_(programming_language)


---------------------
John Thingstad
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <gqCdnehPkKATRLbXnZ2dnUVZ8mmdnZ2d@brightview.co.uk>
George Neuner wrote:
> On Fri, 05 Jun 2009 16:26:37 -0700, Roedy Green
> <···········@mindprod.com.invalid> wrote:
>>On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
>><········@gmail.com> wrote, quoted or indirectly quoted someone who
>>said :
>>>Even for problems where it appears trivial, there can be hidden
>>>issues, like false cache coherency communication where no actual
>>>sharing is taking place. Or locks that appear to have low contention and
>>>negligible performance impact on ``only'' 8 processors suddenly turn into
>>>bottlenecks. Then there is NUMA. A given address in memory may be
>>>RAM attached to the processor accessing it, or to another processor,
>>>with very different access costs.
>>
>>Could what you are saying be summed up by saying, "The more threads
>>you have the more important it is to keep your threads independent,
>>sharing as little data as possible."
> 
> And therein lies the problem of leveraging many cores.  There is a lot
> of potential parallelism in programs (even in Java :) that is lost
> because it is too fine a grain for threads.

That will always be true so it conveys no useful information to the
practitioner.

> Even the lightest weight 
> user space ("green") threads need a few hundred instructions, minimum,
> to amortize the cost of context switching.

Work items in Cilk are much faster than that.

> Add to that the fact that programmers have shown themselves, on
> average, to be remarkably bad at figuring out what _should_ be done in
> parallel - as opposed to what _can_ be done - and you've got a clear
> indicator that threads, as we know them, are not scalable except under
> a limited set of conditions.

Parallelism is inherently not scalable. I see no merit in speculating about
the ramifications of "average" programmers' alleged inabilities.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <FoOdndKEFoG7RbbXnZ2dnUVZ8nli4p2d@brightview.co.uk>
Roedy Green wrote:
> On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
> <········@gmail.com> wrote, quoted or indirectly quoted someone who
> said :
>>Even for problems where it appears trivial, there can be hidden
>>issues, like false cache coherency communication where no actual
>>sharing is taking place. Or locks that appear to have low contention and
>>negligible performance impact on ``only'' 8 processors suddenly turn into
>>bottlenecks. Then there is NUMA. A given address in memory may be
>>RAM attached to the processor accessing it, or to another processor,
>>with very different access costs.
> 
> Could what you are saying be summed up by saying, "The more threads
> you have the more important it is to keep your threads independent,
> sharing as little data as possible."

I see no problem with mutable shared state.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <F6RWl.29760$Db2.22929@edtnps83>
Jon Harrop wrote:
> Roedy Green wrote:
>> On Fri, 5 Jun 2009 18:15:00 +0000 (UTC), Kaz Kylheku
>> <········@gmail.com> wrote, quoted or indirectly quoted someone who
>> said :
>>> Even for problems where it appears trivial, there can be hidden
>>> issues, like false cache coherency communication where no actual
>>> sharing is taking place. Or locks that appear to have low contention and
>>> negligible performance impact on ``only'' 8 processors suddenly turn into
>>> bottlenecks. Then there is NUMA. A given address in memory may be
>>> RAM attached to the processor accessing it, or to another processor,
>>> with very different access costs.
>> Could what you are saying be summed up by saying, "The more threads
>> you have the more important it is to keep your threads independent,
>> sharing as little data as possible."
> 
> I see no problem with mutable shared state.
> 
In which case, Jon, you're in a small minority.

AHS
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <HeednTdTUv8DlbHXnZ2dnUVZ8t1i4p2d@brightview.co.uk>
Arved Sandstrom wrote:
> Jon Harrop wrote:
>> I see no problem with mutable shared state.
>
> In which case, Jon, you're in a small minority.

No. Most programmers still care about performance and performance means
mutable state.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Patricia Shanahan
Subject: Re: multi-core software
Date: 
Message-ID: <hcadnZUNZt3otrHXnZ2dnUVZ_gydnZ2d@earthlink.com>
Jon Harrop wrote:
> Arved Sandstrom wrote:
>> Jon Harrop wrote:
>>> I see no problem with mutable shared state.
>> In which case, Jon, you're in a small minority.
> 
> No. Most programmers still care about performance and performance means
> mutable state.
> 

I don't see why that would affect whether one thinks there are problems.

In my opinion, shared mutable state has a lot of problems. It is also
sometimes the best design for performance reasons.

Patricia
From: Lew
Subject: Re: multi-core software
Date: 
Message-ID: <h0hcda$e9j$1@news.albasani.net>
Jon Harrop wrote:
>>>> I see no problem with mutable shared state.
>>> In which case, Jon, you're in a small minority.

Patricia Shanahan wrote:
> In my opinion, shared mutable state has a lot of problems. It is also
> sometimes the best design for performance reasons.

As Dr. Jon pointed out upthread, one can write decent code with mutable shared 
state.  It is also true that mutable state presents a lot of problems - 
potential problems, ones that can be solved, but not ones that can be solved 
thoughtlessly.  On the flip side, one can write a tremendous amount of 
effective multi-threaded code involving shared mutable state with attention to 
a few rules of thumb, like always synchronize access and don't use different 
monitors to do so.
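
A minimal Java sketch (not from Lew's post; the Counter class here is invented
for illustration) of those rules of thumb - every access to the shared value is
synchronized, and always on the same monitor:

    // Shared mutable state done by the rules: one monitor (the Counter
    // instance) guards both reads and writes, so updates are atomic and
    // visible to every thread.
    final class Counter {
        private long value;

        synchronized void increment() {
            value++;
        }

        synchronized long get() {
            return value;
        }
    }

Dropping the synchronized from get(), or locking a different object in one of
the methods, is exactly the kind of thoughtless mistake the paragraph above
warns against.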

Unlike some environments (e.g., database management systems), Java's tools to 
manage concurrency are explicit and low level.  The programmer's job is to 
make sure those tools are used correctly to avoid problems.  As long as they 
do that, then there is no special problem with shared mutable state.

There is, however, a cost.  Certain things must happen slower when you share 
mutable state, than when you share immutable state or don't share state. 
Certain things must happen when you share mutable state, regardless of speed, 
because without them your code doesn't work.  For some reason, concurrent 
programming is an area often not well understood by a significant percentage 
of workaday programmers.  When problems do arise, they tend to be 
probabilistic in nature and vary widely with system characteristics like 
attempted load.

So the meeting ground is, yes, concurrent mutable state can present problems 
if not properly managed.  Properly managing such is not necessarily a huge 
burden, but it must be borne.  When done properly, shared mutable state will 
not present problems in production.

-- 
Lew
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <du2dnU74mNphqrHXnZ2dnUVZ8tOdnZ2d@brightview.co.uk>
Lew wrote:
> As Dr. Jon pointed out upthread, one can write decent code with mutable
> shared
> state.  It is also true that mutable state presents a lot of problems -
> potential problems, ones that can be solved, but not ones that can be
> solved
> thoughtlessly.  On the flip side, one can write a tremendous amount of
> effective multi-threaded code involving shared mutable state with
> attention to a few rules of thumb, like always synchronize access and
> don't use different monitors to do so.
> 
> Unlike some environments (e.g., database management systems), Java's tools
> to
> manage concurrency are explicit and low level.  The programmer's job is to
> make sure those tools are used correctly to avoid problems.  As long as
> they do that, then there is no special problem with shared mutable state.
> 
> There is, however, a cost.  Certain things must happen slower when you
> share mutable state, than when you share immutable state or don't share
> state. Certain things must happen when you share mutable state, regardless
> of speed,
> because without them your code doesn't work.  For some reason, concurrent
> programming is an area often not well understood by a significant
> percentage
> of workaday programmers.  When problems do arise, they tend to be
> probabilistic in nature and vary widely with system characteristics like
> attempted load.
> 
> So the meeting ground is, yes, concurrent mutable state can present
> problems
> if not properly managed.  Properly managing such is not necessarily a huge
> burden, but it must be borne.  When done properly, shared mutable state
> will not present problems in production.

I agree entirely but my statements were about parallelism and not
concurrency. Parallel and concurrent programming have wildly different
characteristics and solutions. I don't believe shared mutable state is
overly problematic in the context of parallelism. Indeed, I think it is
usually the best solution in that context.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Lew
Subject: Re: multi-core software
Date: 
Message-ID: <h0hdg4$f9n$2@news.albasani.net>
Jon Harrop wrote:
> I agree entirely but my statements were about parallelism and not
> concurrency. Parallel and concurrent programming have wildly different
> characteristics and solutions. I don't believe shared mutable state is
> overly problematic in the context of parallelism. Indeed, I think it is
> usually the best solution in that context.

Interesting distinction.  Would it be fair to compare concurrent programming 
to the bricks used to build the parallel program's edifice?

-- 
Lew
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <KXXWl.29794$Db2.7653@edtnps83>
Lew wrote:
> Jon Harrop wrote:
>> I agree entirely but my statements were about parallelism and not
>> concurrency. Parallel and concurrent programming have wildly different
>> characteristics and solutions. I don't believe shared mutable state is
>> overly problematic in the context of parallelism. Indeed, I think it is
>> usually the best solution in that context.
> 
> Interesting distinction.  Would it be fair to compare concurrent 
> programming to the bricks used to build the parallel program's edifice?

Way too much of a fine distinction. While they are in fact different, 
the point of concurrent programming is to structure programs as a group 
of computations, which can be executed in parallel (however that might 
actually be done depending on how many processors there are). Parallel 
computing means carrying out many computations simultaneously. The two 
definitions are intertwined. And they are *not* wildly different.

If you talk about shared mutable state, it is not as easy to use as Dr 
Harrop seems to think it is. Maybe in his experience it has been, but in 
general it's no trivial thing to manage. Lew, you probably summarized it 
best a few posts upstream.

AHS
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <7KudnYRCR5KtzrHXnZ2dnUVZ8kOdnZ2d@brightview.co.uk>
Arved Sandstrom wrote:
> Lew wrote:
>> Interesting distinction.  Would it be fair to compare concurrent
>> programming to the bricks used to build the parallel program's edifice?
> 
> Way too much of a fine distinction. While they are in fact different,
> the point of concurrent programming is to structure programs as a group
> of computations, which can be executed in parallel (however that might
> actually be done depending on how many processors there are).

No. Concurrent programming is about interleaving computations in order to
reduce latency. Nothing to do with parallelism.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <Q%ZWl.31002$PH1.5003@edtnps82>
Jon Harrop wrote:
> Arved Sandstrom wrote:
>> Lew wrote:
>>> Interesting distinction.  Would it be fair to compare concurrent
>>> programming to the bricks used to build the parallel program's edifice?
>> Way too much of a fine distinction. While they are in fact different,
>> the point of concurrent programming is to structure programs as a group
>> of computations, which can be executed in parallel (however that might
>> actually be done depending on how many processors there are).
> 
> No. Concurrent programming is about interleaving computations in order to
> reduce latency. Nothing to do with parallelism.

Jon, I do concurrent programming all the time, as do most of my peers. 
Way down on the list of why we do it is the reduction of latency.

AHS
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <K5Odnb_xe7vN5LHXnZ2dnUVZ8g5i4p2d@brightview.co.uk>
Arved Sandstrom wrote:
> Jon Harrop wrote:
>> Arved Sandstrom wrote:
>>> Lew wrote:
>>>> Interesting distinction.  Would it be fair to compare concurrent
>>>> programming to the bricks used to build the parallel program's edifice?
>>> Way too much of a fine distinction. While they are in fact different,
>>> the point of concurrent programming is to structure programs as a group
>>> of computations, which can be executed in parallel (however that might
>>> actually be done depending on how many processors there are).
>> 
>> No. Concurrent programming is about interleaving computations in order to
>> reduce latency. Nothing to do with parallelism.
> 
> Jon, I do concurrent programming all the time, as do most of my peers.
> Way down on the list of why we do it is the reduction of latency.

What is higher on the list?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <EAEXl.31230$PH1.29043@edtnps82>
Jon Harrop wrote:
> Arved Sandstrom wrote:
>> Jon Harrop wrote:
>>> Arved Sandstrom wrote:
>>>> Lew wrote:
>>>>> Interesting distinction.  Would it be fair to compare concurrent
>>>>> programming to the bricks used to build the parallel program's edifice?
>>>> Way too much of a fine distinction. While they are in fact different,
>>>> the point of concurrent programming is to structure programs as a group
>>>> of computations, which can be executed in parallel (however that might
>>>> actually be done depending on how many processors there are).
>>> No. Concurrent programming is about interleaving computations in order to
>>> reduce latency. Nothing to do with parallelism.
>> Jon, I do concurrent programming all the time, as do most of my peers.
>> Way down on the list of why we do it is the reduction of latency.
> 
> What is higher on the list?

Correctness.

I'm not being facetious. I write applications that run on application 
servers, and from time to time I have had to write various special 
purpose servers. This kind of programming is all about managing 
concurrent execution of computations. The overarching concern is 
reliability and correct function. For many corporate situations, even 
with hundreds of users, the actual load at any instant is low enough 
that the various servers involved are nowhere close to being stressed 
out - performance is a secondary issue.

AHS
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <c1637ed4-9239-4b10-a1c0-847787e3419c@q37g2000vbi.googlegroups.com>
On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
> Jon Harrop wrote:
> >
> > Arved Sandstrom wrote:
> >>
> >> Jon, I do concurrent programming all the time, as do most of my peers.
> >> Way down on the list of why we do it is the reduction of latency.
>
> > What is higher on the list?
>
> Correctness.
>

IMO, that response is a bit of a cop-out. Correctness is _always_ most
important, no matter what application you are creating; without it,
you don't have a job and the company you work for goes out of
business.

But, assuming that your program works and does what it's supposed to,
I agree with Jon that performance needs to be right near the top of
the list of concerns. Why? Performance isn't about looking good as a
programmer, or having fun making a function run in 15 cycles instead
of 24, or coming up with some neat bit packing scheme so that your app
now only uses 20K instead of 200K. Performance is - pure and simple -
about one thing only: money.

Programs that use more memory require more money for the hardware of
every user. Programs that run slower eat more time per day. If you
have 100,000 users, all doing an operation once per day that takes 20
seconds, being able to shave 5 seconds off that saves 5.78 man-days of
work. Hell, for some applications, that 20 seconds is just startup
time spent at a splash screen. Just imagine if every Google search
took even 5 seconds to resolve, how much time would be wasted every
day around the world - ignoring the fact that Google wouldn't exist if
that were the case ;-). Obviously Google engineers work incredibly
hard every day to ensure correct results, but performance better be
right up there at the top of the list as well.

Jeff M.
From: Matthias Blume
Subject: Re: multi-core software
Date: 
Message-ID: <m1vdn4unuc.fsf@hana.uchicago.edu>
"Jeff M." <·······@gmail.com> writes:

> On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>> Jon Harrop wrote:
>> >
>> > Arved Sandstrom wrote:
>> >>
>> >> Jon, I do concurrent programming all the time, as do most of my peers.
>> >> Way down on the list of why we do it is the reduction of latency.
>>
>> > What is higher on the list?
>>
>> Correctness.
>>
>
> IMO, that response is a bit of a cop-out. Correctness is _always_ most
> important, no matter what application you are creating; without it,
> you don't have a job and the company you work for goes out of
> business.
>
> But, assuming that your program works and does what it's supposed to,
> I agree with Jon that performance needs to be right near the top of
> the list of concerns. Why? Performance isn't about looking good as a
> programmer, or having fun making a function run in 15 cycles instead
> of 24, or coming up with some neat bit packing scheme so that your app
> now only uses 20K instead of 200K. Performance is - pure and simple -
> about one thing only: money.

Programmer time is vastly more expensive than CPU time, so the
money argument often leads to slow ("low performance") solutions as long
as they are "good enough" because developing a faster solution would
mean spending more valuable programmer time at a cost that cannot
be recovered over the life cycle of the product in question.

That being said, there are plenty of situations where performance
obviously does matter a great deal -- as you correctly pointed out.
(It all depends on the above mentioned "product in question" and the
nature of its life cycle.)

Matthias
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <NNGdnaos5LzPlq3XnZ2dnUVZ8iSdnZ2d@brightview.co.uk>
Matthias Blume wrote:
> "Jeff M." <·······@gmail.com> writes:
>> But, assuming that your program works and does what it's supposed to,
>> I agree with Jon that performance needs to be right near the top of
>> the list of concerns. Why? Performance isn't about looking good as a
>> programmer, or having fun making a function run in 15 cycles instead
>> of 24, or coming up with some neat bit packing scheme so that your app
>> now only uses 20K instead of 200K. Performance is - pure and simple -
>> about one thing only: money.
> 
> Programmer time is vastly more expensive than CPU time, so the
> money argument often leads to slow ("low performance") solutions as long
> as they are "good enough" because developing a faster solution would
> mean spending more valuable programmer time at a cost that cannot
> be recovered over the life cycle of the product in question.

In the context of commercial software, the money to fund developers to
improve performance comes from the huge marketing budget because
performance is usually more about marketing than anything else.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Dimiter "malkia" Stanev
Subject: Re: multi-core software
Date: 
Message-ID: <h0otsg$cgb$1@malkia.motzarella.org>
Jeff M. wrote:
> On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>> Jon Harrop wrote:
>>> Arved Sandstrom wrote:
>>>> Jon, I do concurrent programming all the time, as do most of my peers.
>>>> Way down on the list of why we do it is the reduction of latency.
>>> What is higher on the list?
>> Correctness.
>>
> 
> IMO, that response is a bit of a cop-out. Correctness is _always_ most
> important, no matter what application you are creating; without it,
> you don't have a job and the company you work for goes out of
> business.

PC / video games definitely fall outside that correctness requirement. As long 
as the game does not crash your XBOX/PS3/whatever for a certain amount of time, 
and behaves well until then, it's fine.

Bugs are already part of the "genre".

In reality you can't ship on time, there are always BUGS :)

The most important thing in games (at least for a large percentage of them) is 
speed of graphics - fluid 60 fps, or stable 30 fps.

> 
> But, assuming that your program works and does what it's supposed to,
> I agree with Jon that performance needs to be right near the top of
> the list of concerns. Why? Performance isn't about looking good as a
> programmer, or having fun making a function run in 15 cycles instead
> of 24, or coming up with some neat bit packing scheme so that your app
> now only uses 20K instead of 200K. Performance is - pure and simple -
> about one thing only: money.
> 
> Programs that use more memory require more money for the hardware of
> every user. Programs that run slower eat more time per day. If you
> have 100,000 users, all doing an operation once per day that takes 20
> seconds, being able to shave 5 seconds off that saves 5.78 man-days of
> work. Hell, for some applications, that 20 seconds is just startup
> time spent at a splash screen. Just imagine if every Google search
> took even 5 seconds to resolve, how much time would be wasted every
> day around the world - ignoring the fact that Google wouldn't exist if
> that were the case ;-). Obviously Google engineers work incredibly
> hard every day to ensure correct results, but performance better be
> right up there at the top of the list as well.
> 
> Jeff M.
From: Seamus MacRae
Subject: Re: multi-core software
Date: 
Message-ID: <h0ornb$tj7$10@news.eternal-september.org>
Jeff M. wrote:
> On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>> Jon Harrop wrote:
>>> Arved Sandstrom wrote:
>>>> Jon, I do concurrent programming all the time, as do most of my peers.
>>>> Way down on the list of why we do it is the reduction of latency.
>>> What is higher on the list?
>> Correctness.
> 
> IMO, that response is a bit of a cop-out. Correctness is _always_ most
> important, no matter what application you are creating; without it,
> you don't have a job and the company you work for goes out of
> business.

And when, exactly, did Microsoft go out of business? I hadn't heard it 
mentioned in the news. :)
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <ee83960b-fe51-4e4a-b900-40a15883728d@j12g2000vbl.googlegroups.com>
On Jun 10, 12:49 pm, Seamus MacRae <··········@live.ca.invalid> wrote:
> Jeff M. wrote:
> > On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
> >> Jon Harrop wrote:
> >>> Arved Sandstrom wrote:
> >>>> Jon, I do concurrent programming all the time, as do most of my peers.
> >>>> Way down on the list of why we do it is the reduction of latency.
> >>> What is higher on the list?
> >> Correctness.
>
> > IMO, that response is a bit of a cop-out. Correctness is _always_ most
> > important, no matter what application you are creating; without it,
> > you don't have a job and the company you work for goes out of
> > business.
>
> And when, exactly, did Microsoft go out of business? I hadn't heard it
> mentioned in the news. :)

Touche. :)

Jeff M.
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <QTXXl.31455$PH1.19095@edtnps82>
Jeff M. wrote:
> On Jun 9, 9:08 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>> Jon Harrop wrote:
>>> Arved Sandstrom wrote:
>>>> Jon, I do concurrent programming all the time, as do most of my peers.
>>>> Way down on the list of why we do it is the reduction of latency.
>>> What is higher on the list?
>> Correctness.
>>
> 
> IMO, that response is a bit of a cop-out. Correctness is _always_ most
> important, no matter what application you are creating; without it,
> you don't have a job and the company you work for goes out of
> business.
> 
> But, assuming that your program works and does what it's supposed to,
> I agree with Jon that performance needs to be right near the top of
> the list of concerns. Why? Performance isn't about looking good as a
> programmer, or having fun making a function run in 15 cycles instead
> of 24, or coming up with some neat bit packing scheme so that your app
> now only uses 20K instead of 200K. Performance is - pure and simple -
> about one thing only: money.
> 
> Programs that use more memory require more money for the hardware of
> every user. Programs that run slower eat more time per day. If you
> have 100,000 users, all doing an operation once per day that takes 20
> seconds, being able to shave 5 seconds off that saves 5.78 man-days of
> work. Hell, for some applications, that 20 seconds is just startup
> time spent at a splash screen. Just imagine if every Google search
> took even 5 seconds to resolve, how much time would be wasted every
> day around the world - ignoring the fact that Google wouldn't exist if
> that were the case ;-). Obviously Google engineers work incredibly
> hard every day to ensure correct results, but performance better be
> right up there at the top of the list as well.
> 
> Jeff M.

Point taken, but I primarily work on internal government and corporate 
applications. Might be hundreds or thousands of users at any given time, 
but not typically tens or hundreds of thousands. Hundreds or thousands 
of users translates to at least an order of magnitude fewer 
"simultaneous" users. Ops people that I talk to who monitor apps I've 
worked on, or similar apps, rarely report cases where the application 
itself is presenting a sluggish aspect to users because it's stressing 
out processors, or consuming gargantuan memory.

Typically when latency problems come up, it's because the app has to 
talk to other servers - databases, LDAP, print servers etc. In other 
words, the network is coming into play. In fact when we do work on 
performance, it's typically about minimizing calls to the database or 
other services.

AHS
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <09udndwb56fpWLLXnZ2dnUVZ8hSdnZ2d@brightview.co.uk>
Arved Sandstrom wrote:
> Jon Harrop wrote:
>> Arved Sandstrom wrote:
>>> Jon Harrop wrote:
>>>> No. Concurrent programming is about interleaving computations in order
>>>> to reduce latency. Nothing to do with parallelism.
>>>
>>> Jon, I do concurrent programming all the time, as do most of my peers.
>>> Way down on the list of why we do it is the reduction of latency.
>> 
>> What is higher on the list?
> 
> Correctness.
> 
> I'm not being facetious. I write applications that run on application
> servers, and from time to time I have had to write various special
> purpose servers. This kind of programming is all about managing
> concurrent execution of computations. The overarching concern is 
> reliability and correct function. For many corporate situations, even
> with hundreds of users, the actual load at any instant is low enough
> that the various servers involved are nowhere close to being stressed
> out - performance is a secondary issue.

In other words, without concurrency the latency would be so high that you
would consider the program to be wrong. However you cut it, the real reason
is latency.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Paul Rubin
Subject: Re: multi-core software
Date: 
Message-ID: <7xmy8gazbg.fsf@ruckus.brouhaha.com>
Jon Harrop <···@ffconsultancy.com> writes:
> > I'm not being facetious. I write applications that run on application
> > servers, and from time to time I have had to write various special
> > purpose servers. This kind of programming is all about managing
> > concurrent execution of computations. The overarching concern is 
> > reliability and correct function. For many corporate situations, even
> > with hundreds of users, the actual load at any instant is low enough
> > that the various servers involved are nowhere close to being stressed
> > out - performance is a secondary issue.
> 
> In other words, without concurrency the latency would be so high
> that you would consider the program to be wrong. However you cut it,
> the real reason is latency.

I don't think that follows, if there is two-way communication and
dependency between the servers, combined with lack of control over
when any particular server decides to initiate an outgoing request.
Stuff may have to happen concurrently to avoid complete deadlock.
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <oJXXl.31454$PH1.6096@edtnps82>
Jon Harrop wrote:
> Arved Sandstrom wrote:
>> Jon Harrop wrote:
>>> Arved Sandstrom wrote:
>>>> Jon Harrop wrote:
>>>>> No. Concurrent programming is about interleaving computations in order
>>>>> to reduce latency. Nothing to do with parallelism.
>>>> Jon, I do concurrent programming all the time, as do most of my peers.
>>>> Way down on the list of why we do it is the reduction of latency.
>>> What is higher on the list?
>> Correctness.
>>
>> I'm not being facetious. I write applications that run on application
>> servers, and from time to time I have had to write various special
>> purpose servers. This kind of programming is all about managing
>> concurrent execution of computations. The overarching concern is 
>> reliability and correct function. For many corporate situations, even
>> with hundreds of users, the actual load at any instant is low enough
>> that the various servers involved are nowhere close to being stressed
>> out - performance is a secondary issue.
> 
> In other words, without concurrency the latency would be so high that you
> would consider the program to be wrong. However you cut it, the real reason
> is latency.

For a certain group of applications and user loads I would concede that 
point, yes.

For quite a few other situations, you could queue up user requests and 
execute them in order, finishing each before proceeding to the next, and 
users wouldn't even notice. I wrote a J2SE server a few months ago, to 
solve a very specific problem associated with an application running on 
a J2EE server, that could handle dozens of users per second using this 
strategy. I didn't write it that way, because doing so would be perverse, but 
I could have.
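
A rough Java sketch of that queue-and-serialize strategy (a hypothetical
illustration, not Arved's actual server):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // All requests are funnelled through a single worker thread, so each
    // one finishes before the next begins - no shared-state headaches,
    // and under light load the users never notice the serialization.
    class SerialServer {
        private final ExecutorService queue = Executors.newSingleThreadExecutor();

        void submit(Runnable request) {
            queue.submit(request);
        }
    }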

AHS
From: Piet van Oostrum
Subject: Re: multi-core software
Date: 
Message-ID: <m2bpozazs9.fsf@cs.uu.nl>
By the way, there is a series of articles about concurrency on ACM Queue
which may be interesting for those participating in or just following
this discussion:

http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency

Here is one introductory paragraph from one of the articles:

    Parallel programming poses many new challenges to the developer, one of
    which is synchronizing concurrent access to shared memory by multiple
    threads. Programmers have traditionally used locks for synchronization,
    but lock-based synchronization has well-known pitfalls. Simplistic
    coarse-grained locking does not scale well, while more sophisticated
    fine-grained locking risks introducing deadlocks and data races.
    Furthermore, scalable libraries written using fine-grained locks cannot
    be easily composed in a way that retains scalability and avoids deadlock
    and data races. 
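
To make the fine-grained pitfall concrete, here is a minimal Java sketch (not
from the article) of two per-object locks deadlocking when composed naively:

    // transfer(a, b) racing against transfer(b, a): each thread takes its
    // own lock first and then waits forever for the other - the classic
    // deadlock that fine-grained locking risks unless a global lock
    // ordering is imposed.
    class Account {
        private long balance;

        void transfer(Account to, long amount) {
            synchronized (this) {
                synchronized (to) {
                    this.balance -= amount;
                    to.balance += amount;
                }
            }
        }
    }
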
-- 
Piet van Oostrum <····@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: ····@vanoostrum.org
From: Seamus MacRae
Subject: Re: multi-core software
Date: 
Message-ID: <h0jjka$tvg$12@news.eternal-september.org>
Piet van Oostrum wrote:
> By the way, there is a series of articles about concurrency on ACM Queue
> which may be interesting for those participating in or just following
> this discussion:
> 
> http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency
> 
> Here is one introductory paragraph from one of the articles:
> 
>     Parallel programming poses many new challenges to the developer, one of
>     which is synchronizing concurrent access to shared memory by multiple
>     threads. Programmers have traditionally used locks for synchronization,
>     but lock-based synchronization has well-known pitfalls. Simplistic
>     coarse-grained locking does not scale well, while more sophisticated
>     fine-grained locking risks introducing deadlocks and data races.
>     Furthermore, scalable libraries written using fine-grained locks cannot
>     be easily composed in a way that retains scalability and avoids deadlock
>     and data races. 

Is that the one about transactional memory?
From: Piet van Oostrum
Subject: Re: multi-core software
Date: 
Message-ID: <m2tz2q9cb7.fsf@cs.uu.nl>
>>>>> Seamus MacRae <··········@live.ca.invalid> (SM) wrote:

>SM> Piet van Oostrum wrote:
>>> By the way, there is a series of articles about concurrency on ACM Queue
>>> which may be interesting for those participating in or just following
>>> this discussion:
>>> 
>>> http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=theme_list&filter=Concurrency&page_title=Concurrency
>>> 
>>> Here is one introductory paragraph from one of the articles:
>>> 
>>> Parallel programming poses many new challenges to the developer, one of
>>> which is synchronizing concurrent access to shared memory by multiple
>>> threads. Programmers have traditionally used locks for synchronization,
>>> but lock-based synchronization has well-known pitfalls. Simplistic
>>> coarse-grained locking does not scale well, while more sophisticated
>>> fine-grained locking risks introducing deadlocks and data races.
>>> Furthermore, scalable libraries written using fine-grained locks cannot
>>> be easily composed in a way that retains scalability and avoids deadlock
>>> and data races. 

>SM> Is that the one about transactional memory?

Yes
-- 
Piet van Oostrum <····@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: ····@vanoostrum.org
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <JsSdnbTgAYDCxbHXnZ2dnUVZ8uOdnZ2d@brightview.co.uk>
Lew wrote:
> Jon Harrop wrote:
>> I agree entirely but my statements were about parallelism and not
>> concurrency. Parallel and concurrent programming have wildly different
>> characteristics and solutions. I don't believe shared mutable state is
>> overly problematic in the context of parallelism. Indeed, I think it is
>> usually the best solution in that context.
> 
> Interesting distinction.  Would it be fair to compare concurrent
> programming to the bricks used to build the parallel program's edifice?

Concurrent programming certainly underpins the foundations of almost all
parallel programs. Not least at the level of the OS scheduling the threads
that run the parallel programs. However, that knowledge is probably more
confusing than helpful here.

In essence, concurrent programming is concerned with reducing latency (e.g.
evading blocking) by interleaving computations whereas parallel programming
is concerned with maximizing throughput by performing computations at the
same time.

Historically, concurrency has been of general interest on single core
machines in the context of operating systems and IO and has become more
important recently due to the ubiquity of web programming. Parallelism was
once only important to computational scientists programming shared-memory
supercomputers and enterprise developers programming distributed-memory
clusters but the advent of multicore machines on the desktop and in the
games console has pushed parallelism into the limelight for ordinary
developers when performance is important.

Solutions for concurrent and parallel programming are also wildly different.
Concurrent programming typically schedules work items that are expected to
block on a shared queue for a pool of dozens or even hundreds of threads.
Parallel programming typically schedules work items that are expected to
not block on wait-free work-stealing queues for a pool of "n" threads
where "n" is the number of cores. Solutions for concurrent programming
(such as the .NET thread pool and asynchronous workflows in F#) can be used
as a poor man's solution for parallel programming but the overheads are
huge because they were not designed for this purpose so performance is much
worse than necessary. Solutions for parallel programming (e.g. Cilk, the
Task Parallel Library) are virtually useless for concurrent programming
because you quickly end up with all "n" threads blocked and the whole
program stalls.
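
A rough Java illustration of that split, using standard library pools (the
class names and pool sizes are illustrative assumptions, not Dr Harrop's code;
ForkJoinPool comes from a later JDK than this thread):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ForkJoinPool;

    class Pools {
        // Concurrency: many threads, because work items are expected to
        // block on sockets, disks or other servers.
        static final ExecutorService ioPool = Executors.newFixedThreadPool(200);

        // Parallelism: "n" threads for "n" cores, work-stealing deques,
        // work items that are expected never to block.
        static final ForkJoinPool cpuPool =
            new ForkJoinPool(Runtime.getRuntime().availableProcessors());
    }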

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Patricia Shanahan
Subject: Re: multi-core software
Date: 
Message-ID: <0KGdnabuMMt8xrHXnZ2dnUVZ_jBi4p2d@earthlink.com>
Jon Harrop wrote:
...
> Historically, concurrency has been of general interest on single core
> machines in the context of operating systems and IO and has become more
> important recently due to the ubiquity of web programming. Parallelism was
> once only important to computational scientists programming shared-memory
> supercomputers and enterprise developers programming distributed-memory
> clusters but the advent of multicore machines on the desktop and in the
> games console has pushed parallelism into the limelight for ordinary
> developers when performance is important.
...

Parallelism has also been important, for a long time, to multiprocessor
operating system developers. I got my first exposure to parallel
programming, in the 1980's, working on NCR's VRX operating system.

Patricia
From: Joshua Cranmer
Subject: Re: multi-core software
Date: 
Message-ID: <h0h4rk$h1o$1@news-int2.gatech.edu>
Jon Harrop wrote:
> No. Most programmers still care about performance and performance means
> mutable state.

[ Citation needed ].

Most programmers I've met could care less about performance.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <rMKdnYWoZ82Nr7HXnZ2dnUVZ8nednZ2d@brightview.co.uk>
Joshua Cranmer wrote:
> Jon Harrop wrote:
>> No. Most programmers still care about performance and performance means
>> mutable state.
> 
> [ Citation needed ].
> 
> Most programmers I've met could care less about performance.

Then they have no need for parallelism in the first place.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Arved Sandstrom
Subject: Re: multi-core software
Date: 
Message-ID: <8hVWl.29780$Db2.10968@edtnps83>
Jon Harrop wrote:
> Arved Sandstrom wrote:
>> Jon Harrop wrote:
>>> I see no problem with mutable shared state.
>> In which case, Jon, you're in a small minority.
> 
> No. Most programmers still care about performance and performance means
> mutable state.

Quite apart from performance and mutable state, I believe we were 
talking about mutable _shared_ state. And this is something that gets a 
_lot_ of people into trouble.

AHS
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <6c5eab9d-a35d-43df-b45c-d2559c7b2623@f10g2000vbf.googlegroups.com>
On Jun 7, 3:19 pm, Arved Sandstrom <·······@hotmail.com> wrote:
> Jon Harrop wrote:
> > Arved Sandstrom wrote:
> >> Jon Harrop wrote:
> >>> I see no problem with mutable shared state.
> >> In which case, Jon, you're in a small minority.
>
> > No. Most programmers still care about performance and performance means
> > mutable state.
>
> Quite apart from performance and mutable state, I believe we were
> talking about mutable _shared_ state. And this is something that gets a
> _lot_ of people into trouble.
>

Mutable shared state gets _bad_ (err.. perhaps "inexperienced" would
be a better adjective) programmers - who don't know what they are
doing - in trouble. There are many problem domains that either benefit
greatly from mutable shared states or can't [easily] be done without
them. Unified memory management being an obvious example... there are
many more. Unshared state has its place. Immutable state has its
place. Shared immutable state has its place. Shared mutable state has
its place.

Jeff M.
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <w6SdnRNd9oe2qLHXnZ2dnUVZ8kZi4p2d@brightview.co.uk>
Jeff M. wrote:
> On Jun 7, 3:19 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>> Jon Harrop wrote:
>> > Arved Sandstrom wrote:
>> >> Jon Harrop wrote:
>> >>> I see no problem with mutable shared state.
>> >> In which case, Jon, you're in a small minority.
>>
>> > No. Most programmers still care about performance and performance means
>> > mutable state.
>>
>> Quite apart from performance and mutable state, I believe we were
>> talking about mutable _shared_ state. And this is something that gets a
>> _lot_ of people into trouble.
> 
> Mutable shared state gets _bad_ (err.. perhaps "inexperienced" would
> be a better adjective) programmers - who don't know what they are
> doing - in trouble. There are many problem domains that either benefit
> greatly from mutable shared states or can't [easily] be done without
> them. Unified memory management being an obvious example... there are
> many more. Unshared state has its place. Immutable state has its
> place. Shared immutable state has its place. Shared mutable state has
> its place.

Exactly. I don't believe that shared mutable state is any harder to do
correctly than the next solution and it is almost always the most efficient
solution and the sole purpose of writing parallel programs to leverage
multicores is performance.

A bad developer can screw up anything. I see no reason to think that shared
mutable state is any more fragile than the next thing a bad developer can
screw up.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: George Neuner
Subject: Re: multi-core software
Date: 
Message-ID: <prpo259pncbpjk0dmapnapde186k2m4m5l@4ax.com>
On Sun, 07 Jun 2009 22:51:48 +0100, Jon Harrop <···@ffconsultancy.com>
wrote:

>Jeff M. wrote:
>> On Jun 7, 3:19 pm, Arved Sandstrom <·······@hotmail.com> wrote:
>>> Jon Harrop wrote:
>>> > Arved Sandstrom wrote:
>>> >> Jon Harrop wrote:
>>> >>> I see no problem with mutable shared state.
>>> >> In which case, Jon, you're in a small minority.
>>>
>>> > No. Most programmers still care about performance and performance means
>>> > mutable state.
>>>
>>> Quite apart from performance and mutable state, I believe we were
>>> talking about mutable _shared_ state. And this is something that gets a
>>> _lot_ of people into trouble.
>> 
>> Mutable shared state gets _bad_ (err.. perhaps "inexperienced" would
>> be a better adjective) programmers - who don't know what they are
>> doing - in trouble. There are many problem domains that either benefit
>> greatly from mutable shared states or can't [easily] be done without
>> them. Unified memory management being an obvious example... there are
>> many more. Unshared state has its place. Immutable state has its
>> place. Shared immutable state has its place. Shared mutable state has
>> its place.
>
>Exactly. I don't believe that shared mutable state is any harder to do
>correctly than the next solution

I agree.


>and it is almost always the most efficient solution

But here I don't.  As the number of processors rises, mutable shared
state becomes a bottleneck.  Whether it can outperform unshared state
depends on the level of contention.

Using lock free data structures doesn't reduce the level of
contention, it simply reduces the average servicing latency.
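
For instance, compare these two Java loops (an illustrative sketch, not
George's code): both are lock free in the sense of never blocking, but only the
second avoids the contention he is describing:

    import java.util.concurrent.atomic.AtomicLong;

    class ContentionDemo {
        static final AtomicLong shared = new AtomicLong();

        // Lock-free but shared: every core fights over the same cache
        // line, so throughput degrades as processors are added.
        static void sharedCount(int n) {
            for (int i = 0; i < n; i++) shared.incrementAndGet();
        }

        // Unshared: each thread counts privately; the totals are combined
        // once at the end, so the hot loop has no contention at all.
        static long privateCount(int n) {
            long local = 0;
            for (int i = 0; i < n; i++) local++;
            return local;
        }
    }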


>and the sole purpose of writing parallel programs to leverage
>multicores is performance.
>
>A bad developer can screw up anything. I see no reason to think that shared
>mutable state is any more fragile than the next thing a bad developer can
>screw up.

The problem is not 'bad' developers, but rather 'average' developers -
a group which, unfortunately, shares many traits with bad developers.
The average level of competency in engineering software systems is far
too low.

George
From: Jeff M.
Subject: Re: multi-core software
Date: 
Message-ID: <ceeeacbc-6c43-4364-a2f7-6a3dfbbbd2d2@o36g2000vbi.googlegroups.com>
On Jun 7, 9:40 pm, George Neuner <········@comcast.net> wrote:
>
> >A bad developer can screw up anything. I see no reason to think that shared
> >mutable state is any more fragile than the next thing a bad developer can
> >screw up.
>
> The problem is not 'bad' developers, but rather 'average' developers -
> a group which, unfortunately, shares many traits with bad developers.
> The average level of competency in engineering software systems is far
> too low.
>

While there are many, many average programmers out there, and I agree
they share many traits with bad programmers, I tend to lean towards
inexperience causing problems in this domain.

To be fair, many things are like this. And even experienced engineers
can have a tendency to trivialize something they think should "just
work". Sadly, the devil is always in the details. What makes parallel
programming more tricky than most domains is that when done wrong, the
results aren't just a piece of software that happens to use 100 MB of
RAM where 200 KB would have been just fine or causes inordinate
amounts of GC; the results are race conditions, deadlocks, a program
pegging CPU usage (on all available nodes), or worse.

Your "average" programmer has no more knowledge of what parallel
programming is than your average PC user: do many things at once. "I'm
making a raytracer... just make each ray its own process and
everything 'just works!'". And, I think we can all agree that just
isn't the case. Again, the devil is in the details.

A problem domain where parallels may be drawn quite easily (imo) is
networking. You've either done it or you're a novice, and I don't care
how many books you've read or trivial 2-player pong games you've made.
In the real world packets drop, machines die, security matters, players
cheat, and bandwidth costs money.

Jeff M.
From: Madhu
Subject: Re: multi-core software
Date: 
Message-ID: <m3tz2r9p7s.fsf@moon.robolove.meer.net>
rant on!

* "Jeff M."
|> The problem is not 'bad' developers, but rather 'average' developers -
|> a group which, unfortunately, shares many traits with bad developers.
|> The average level of competency in engineering software systems is far
|> too low.
|
| While there are many, many average programmers out there, and I agree
| they share many traits with bad programmers, I tend to lean towards
| inexperience causing problems in this domain.
|
| To be fair, many things are like this. And even experienced engineers
| can have a tendency to trivialize something they think should "just
| work". Sadly, the devil is always in the details. What makes parallel
| programming more tricky than most domains is that when done wrong, the
| results aren't just a piece of software that happens to use 100 MB of
| RAM where 200 KB would have been just fine or causes inordinate
| amounts of GC; the results are race conditions, deadlocks, a program
| pegging CPU usage (on all available nodes), or worse.

[...]

| A problem domain where parallels may be drawn quite easily (imo) is
| networking. You've either done it or you're a novice, and I don't care
| how many books you've read or trivial 2-player pong games you've made.
| In the real world packets drop, machines die, security matters, players
| cheat, and bandwidth costs money.

I think you hit the nub here.  Bandwidth costs money which means profits
for some at the expense of others.  This is precisely the reason for the
huge amounts of investment in those programmers and middleware
solutions that we see today which use _more_ bandwidth where less would
have sufficed. i.e. to introduce inefficiencies into the medium.  [I can
justify this claim in the web context].  The mechanisms are in place to
make money in *efforts to wipe out the inefficiencies* after they have
been introduced at a cost!

The 200KB-100MB example above has played itself out exceptionally well
in the hardware market, to the point, I think, that today no one
would be comfortable developing on a Java stack with less than 2GB RAM,
where the equivalent task (stripped of the middleware stack) could be
accomplished in 256MB (say, for the sake of argument).

The splurge in concurrency funding today runs along the same lines
--- the problem it is solving is how to support the hardware market for
the next so many years.  The problems that the software solves will be
no different from what we have been solving for the past few
decades, and which have been solved well.  The only difference is in who
gets to do the solving (a new developer market), and in the platforms and
infrastructure to be sold, on which the solutions are implemented.


Someone's take on the economic crisis was that it had been created by
lack of actual goods or services.  I can see a parallel between that and
the direction in which computing/computer science, especially this
concurrent "research" is being driven today.

The claim was ``Of the nearly three trillion dollars traded every day,
only about two per cent is spent on providing actual goods and
services''.  But merely getting such a trading system going will be
profitable for a very long time, well until it collapses, if it
collapses at all.


Parallel algorithms theory has shown us what sorts of "basic
computation" can and cannot be parallelized.  I Look at what can
actually be parallelized as "the actual goods and services", essential
in solving the programming problem at hand. This is not much.

However, what we can do is create more "useless work" that *can* be
parallelized and make it part and parcel of the program/delivery
infrastructure.  This would create the bulk of the "trillion dollars
traded every day", and it will keep all cores occupied.  The way this is
being done today is that we are introducing constructs into our languages
so that we lose control and hand it over to the platform to do as it sees
fit.  We may have to write our programs in a different, "more
inefficient" way, but we get "automatic scaling" advantages.

Now the antagonist-apologist will chime in with "this is precisely what
Common Lisp did when it gave up malloc for a garbage-collected
programming model".  But that is not a fair analogy and can be refuted
separately.

--
Madhu
From: Raffael Cavallaro
Subject: Re: multi-core software
Date: 
Message-ID: <h0jfbe$f1n$1@news.eternal-september.org>
On 2009-06-08 03:39:11 -0400, Madhu <·······@meer.net> said:

> Parallel algorithms theory has shown us what sorts of "basic
> computation" can and cannot be parallelized.  I Look at what can
> actually be parallelized as "the actual goods and services", essential
> in solving the programming problem at hand. This is not much.

I agree with much of what you've written. However, I'd point out that 
simply having that capacity widely available (i.e., languages or 
language/OS extensions that allow straightforward data and task 
parallelization) might very well lead people to discover new 
applications that simply weren't feasible without 64 2.5 GHz cores and 
an equivalent amount of parallel horsepower on the graphics card 
available on the average user's machine (in the hypothetical near 
future). IOW, we can't guarantee it, but it's quite likely that if you 
build it, they will come, and figure out creative new uses for it.
-- 
Raffael Cavallaro
From: André Thieme
Subject: Re: multi-core software
Date: 
Message-ID: <h0ukbd$k7l$1@news.eternal-september.org>
Raffael Cavallaro schrieb:

> I agree with much of what you've written. However, I'd point out that
> simply having that capacity widely available (i.e., languages or 
> language/OS extensions that allow straightforward data and task 
> parallelization) might very well lead people to discover new 
> applications that simply weren't feasible without 64 2.5 GHz cores
> and an equivalent amount of parallel horsepower on the graphics card
> available on the average user's machine (in the hypothetical near 
> future). IOW, we can't guarantee it, but it's quite likely that if
> you build it, they will come, and figure out creative new uses for
> it.

Yes, that sounds plausible. The applications we have seen so far may
sooner or later be called "classical applications". They solve all the
problems for typical work.
Those programs will still be improved and such, but I think that more
and more intelligent software will begin to appear. And for that kind of
algorithms it seems very likely that parallelization can be very beneficial.
The human brain proves how excellent it can be done.


André
-- 
Lisp is not dead. It's just the URL that has changed:
http://clojure.org/
From: Madhu
Subject: Re: multi-core software
Date: 
Message-ID: <m3ocst0x8h.fsf@moon.robolove.meer.net>
* André Thieme <············@news.eternal-september.org> :
Wrote on Sat, 13 Jun 2009 00:20:26 +0200:

[I had killfiled you but that was a different email address.  GNUS
killfile does not seem to work with your UTF-8 encoded FROM field.  Any
suggestions on how to get that to work will be appreciated]

| Yes, that sounds plausible. The applications we have seen so far may
| sooner or later be called "classical applications". They solve all the
| problems for typical work.

Your speculation is exactly the kind of thing that one of those actively
funded to change the software landscape to introduce inefficiency will
say.

| Those programs will still be improved and such, but I think that more
| and more intelligent software will begin to appear. And for that kind
| of algorithms it seems very likely that parallelization can be very
| beneficial.

While there are legitimate arguments, these are the very concrete
unprovable wishy-washy statements which are used as an excuse to
introduce unwarranted inefficiency in the short term.

| The human brain proves how excellent it can be done.

This adds no information.  

--
Madhu
From: Pascal J. Bourguignon
Subject: Re: multi-core software
Date: 
Message-ID: <87r5xo3l7n.fsf@galatea.local>
Madhu <·······@meer.net> writes:

> * André Thieme <············@news.eternal-september.org> :
> Wrote on Sat, 13 Jun 2009 00:20:26 +0200:
>
> [I had killfiled you but that was a different email address.  GNUS
> killfile does not seem to work with your UTF-8 encoded FROM field.  Any
> suggestions on how to get that to work will be appreciated]
>
> | Yes, that sounds plausible. The applications we have seen so far may
> | sooner or later be called "classical applications". They solve all the
> | problems for typical work.
>
> Your speculation is exactly the kind of thing that one of those actively
> funded to change the software landscape to introduce inefficiency will
> say.
>
> | Those programs will still be improved and such, but I think that more
> | and more intelligent software will begin to appear. And for that kind
> | of algorithms it seems very likely that parallelization can be very
> | beneficial.
>
> While there are legitimate arguments, these are the very concrete
> unprovable wishy-washy statements which are used as an excuse to
> introduce unwarranted inefficiency in the short term.

However, I assume if we were a small pack settled in a small
artificial space habitat, with scarce resources, we would probably
write only essential programs, aimed directly at the direst needs.

   (Isn't that what we lispers actually do?  People complain all the
    time about lack of libraries or fancy lisp applications with
    chrome and sugar, "user friendly" GUIs, etc.  We don't have the
    "resources" to do useless lisp programs.)

However, we aren't in that situation.  We are more than 9 billion
humans, most of them able to program.  We have a lot of computers and
a lot of electricity.  We even have enough food and water to feed
everybody, including all the potential programmers (notwithstanding
the deficiencies in logistics, and other artificial impediments such
as corrupt politicians and wars).  Once the essential programs are
written (they are), and enough computer power to run them is available
and allocated (it has been for a long time),  why shouldn't we have
some fun?  Including playing this big game of "who will get the
biggest bank account" score?  Yes, there's absolutely no difference
between Lindens and Dollars.

However, you might feel, as I do, that we're not in so good a
situation.  After all, we only have food reserves for a few weeks or
months, we only have energy reserves for a few months or a few years
(although we may count on being able to get more for a few decades).
And we wouldn't be able to deflect a meteorite, and we can only pray
we'll be able to fight all the plagues to come.  We should probably be
working more toward long term survival of humanity and civilisation,
including spreading self sustaining colonies all over the Solar
system,  and spend less time in front of the TV or watching our scores
on the web sites of our "banks".


-- 
__Pascal Bourguignon__
From: Nicolas Neuss
Subject: Re: multi-core software
Date: 
Message-ID: <87my8ck1n8.fsf@ma-patru.mathematik.uni-karlsruhe.de>
···@informatimago.com (Pascal J. Bourguignon) writes:

> However, we aren't in that situation.  We are more than 9 billion
> humans, most of them able to program.

Luckily, according to http://en.wikipedia.org/wiki/World_population it is
only 7 billions.  Bad enough, but not that bad.

Nicolas
From: Pascal J. Bourguignon
Subject: Re: multi-core software
Date: 
Message-ID: <87my8c2sgz.fsf@galatea.local>
Nicolas Neuss <········@math.uni-karlsruhe.de> writes:

> ···@informatimago.com (Pascal J. Bourguignon) writes:
>
>> However, we aren't in that situation.  We are more than 9 billion
>> humans, most of them able to program.
>
> Luckily, according to http://en.wikipedia.org/wiki/World_population it is
> only 7 billions.  Bad enough, but not that bad.

Oh, only 7 billions?  Colleagues at work have been producing a lot of
new babies in May (strangely all within a few weeks), so I have been
misled :-)


-- 
__Pascal Bourguignon__
From: Madhu
Subject: Re: multi-core software
Date: 
Message-ID: <m3k53g1zvz.fsf@moon.robolove.meer.net>
* (Pascal J. Bourguignon) <··············@galatea.local> :
Wrote on Sat, 13 Jun 2009 04:45:48 +0200:

| However, you might feel, as I do, that we're not in so good a
| situation.  After all, we only have food reserves for a few weeks or
| months, we only have energy reserves for a few months or a few years
| (although we may count on being able to get more for a few decades).
| And we wouldn't be able to deflect a meteorite, and we can only pray
| we'll be able to fight all the plagues to come.  We should probably be
| working more toward long term survival of humanity and civilisation,
| including spreading self sustaining colonies all over the Solar
| system, and spend less time in front of the TV or watching our scores
| on the web sites of our "banks".

Yes, but I still think the reality of what you're talking about is a
mixed bag: it's a good situation for some, and a bad situation for many
more.  That is to be expected --- it can be no other way.  Myself, I've
stopped playing that game for a few years now, and am only interested in
it to the extent of pointing out the incongruities; and pointing out (in
this thread) the cases where the long term benefits are eradicated to
make way for short term goals --- in particular how the gamepoints are
accruing to the people playing the game for effecting the very negative
effects we (two) wish to avoid.

No doubt all this is made possible from surpluses in some departments;
to a large extent, in this case, Mr. Moore and his law have financed the
law of increasing returns for a long time --- resulting in a well
channeled flow of money, which has created the institution of "IT
spending" over the past two decades.  The efforts which are underway are to
sustain the growth of these institutions, rather than advance towards
utopia, and it is a desperate effort because there is no underlying
fundamental gain to back it up.  This will be very hard. Most of the
theologies point out that the only thing that effortlessly maintains such
growth rates is evil, and it is called "evil" because it brings on
attendant suffering.

--
Madhu
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <idadnZngX8SEHLHXnZ2dnUVZ8m5i4p2d@brightview.co.uk>
George Neuner wrote:
> On Sun, 07 Jun 2009 22:51:48 +0100, Jon Harrop <···@ffconsultancy.com>
> wrote:
>>and it is almost always the most efficient solution
> 
> But here I don't.  As the number of processors rises, mutable shared
> state becomes a bottleneck.  Whether it can outperform unshared state
> depends on the level of contention.

Only if you keep writing to it.

> Using lock free data structures doesn't reduce the level of
> contention, it simply reduces the average servicing latency.

The lowest common denominator is preallocated write-once shared mutable
state. That should be at least as fast as immutable shared state in all
cases.
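
A minimal Java sketch of that "preallocated write-once" pattern (the
array, the task split and heavyComputation are illustrative assumptions,
not anything posted in this thread): each worker writes exactly once to
its own slot of a shared array, so there is no contention and no locking,
and the join supplies the happens-before needed to read the results.

import java.util.stream.IntStream;

public class WriteOnceShared {
    public static void main(String[] args) throws InterruptedException {
        final int tasks = Runtime.getRuntime().availableProcessors();
        final double[] results = new double[tasks]; // preallocated shared state

        Thread[] workers = new Thread[tasks];
        for (int t = 0; t < tasks; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> {
                // Write exactly once, to a slot no other thread touches:
                // shared, mutable, yet contention-free.
                results[slot] = heavyComputation(slot);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join(); // join gives the happens-before

        System.out.println("sum = " + IntStream.range(0, tasks)
                                               .mapToDouble(i -> results[i])
                                               .sum());
    }

    // Stand-in for real per-task work.
    static double heavyComputation(int seed) {
        double x = seed;
        for (int i = 0; i < 10_000_000; i++) x = Math.sqrt(x + i);
        return x;
    }
}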

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <dbKdndLR-epEsrHXnZ2dnUVZ8s-dnZ2d@brightview.co.uk>
Arved Sandstrom wrote:
> Jon Harrop wrote:
>> Arved Sandstrom wrote:
>>> Jon Harrop wrote:
>>>> I see no problem with mutable shared state.
>>>
>>> In which case, Jon, you're in a small minority.
>> 
>> No. Most programmers still care about performance and performance means
>> mutable state.
> 
> Quite apart from performance and mutable state, I believe we were
> talking about mutable _shared_ state. And this is something that gets a
> _lot_ of people into trouble.

Nonsense. Scientists have been writing parallel programs for decades using
shared state extensively without whining about it. Databases are mutable
shared state but millions of database programmers solve real problems every
day without whining about it.

Use your common sense and you can write efficient parallel programs today
with little difficulty. I do.
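
For contrast, a small Java sketch (the counter names are made up for
illustration) of both sides of this exchange: a plain shared counter
loses updates under parallel increments, while the same shared mutable
state handled with a little care (here an AtomicLong) stays correct.

import java.util.concurrent.atomic.AtomicLong;

public class SharedCounter {
    static long plain = 0;                   // unsynchronized shared state
    static final AtomicLong atomic = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                plain++;                     // read-modify-write: updates can be lost
                atomic.incrementAndGet();    // atomic read-modify-write: never lost
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();

        // 'plain' typically comes up short of 2,000,000;
        // 'atomic' is always exactly 2,000,000.
        System.out.println("plain  = " + plain);
        System.out.println("atomic = " + atomic.get());
    }
}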

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: toby
Subject: Re: multi-core software
Date: 
Message-ID: <ec8bfa8b-011d-47ca-b525-8b617535226e@e24g2000vbe.googlegroups.com>
On Jun 7, 2:41 pm, Jon Harrop <····@ffconsultancy.com> wrote:
> Arved Sandstrom wrote:
> > Jon Harrop wrote:
> >> I see no problem with mutable shared state.
>
> > In which case, Jon, you're in a small minority.
>
> No. Most programmers still care about performance

Frequently when they shouldn't.

> and performance means
> mutable state.

Hm, not sure Erlangers would wholly agree.

>
> --
> Dr Jon D Harrop, Flying Frog Consultancy Ltd.http://www.ffconsultancy.com/?u
From: George Neuner
Subject: Re: multi-core software
Date: 
Message-ID: <f5gt25h2802l5dvki1gjav545kb7h66kp5@4ax.com>
On Tue, 9 Jun 2009 10:47:11 -0700 (PDT), toby
<····@telegraphics.com.au> wrote:

>On Jun 7, 2:41 pm, Jon Harrop <····@ffconsultancy.com> wrote:
>> Arved Sandstrom wrote:
>> > Jon Harrop wrote:
>>
>> performance means mutable state.
>
>Hm, not sure Erlangers would wholly agree.

Erlang uses quite a bit of mutable state behind the scenes ... the
programmers just don't see it.

George
From: Dimiter "malkia" Stanev
Subject: Re: multi-core software
Date: 
Message-ID: <h0mk6n$lj9$1@malkia.motzarella.org>
> Erlang uses quite a bit of mutable state behind the scenes ... the
> programmers just don't see it.
> 
> George

Heh... "The CPUs use quite a bit of mutable state behind the scenes ... 
the programmers just don't see it."

Actually, with CPUs programmers see it more than in, say, Erlang (that's 
why you need to use fences/barriers/locked access here and there).
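
A tiny Java illustration of the kind of fence Stanev means (the field
names are invented for the example): without the volatile keyword the
writer's update may stay invisible to the spinning reader; the volatile
write/read pair inserts the barrier that publishes it.

public class Visibility {
    // 'volatile' is what inserts the needed fences: the write to 'ready'
    // is published, and everything written before it (payload) becomes
    // visible to whoever reads 'ready' afterwards.
    static volatile boolean ready = false;
    static int payload = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the writer publishes */ }
            // The volatile read of 'ready' makes 'payload' visible here.
            System.out.println("payload = " + payload);
        });
        reader.start();

        payload = 42;   // ordinary write ...
        ready = true;   // ... published by the volatile write
        reader.join();
    }
}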
From: Jon Harrop
Subject: Re: multi-core software
Date: 
Message-ID: <S4-dndxpCeRHMrPXnZ2dnUVZ8tSdnZ2d@brightview.co.uk>
toby wrote:
> On Jun 7, 2:41 pm, Jon Harrop <····@ffconsultancy.com> wrote:
>> Arved Sandstrom wrote:
>> > Jon Harrop wrote:
>> >> I see no problem with mutable shared state.
>>
>> > In which case, Jon, you're in a small minority.
>>
>> No. Most programmers still care about performance
> 
> Frequently when they shouldn't.

I disagree. A lot of software is still far too slow because the programmers
failed to pay attention to performance. Blogspot in Firefox is one
example: it can barely keep up with my typing!

>> and performance means mutable state.
> 
> Hm, not sure Erlangers would wholly agree.

Erlang is about concurrency. This thread is about parallelism.
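
One way to see the distinction, sketched in Java (the numbers and task
names are illustrative, not from the thread): parallelism splits a single
computation across cores purely for speed, while concurrency structures a
program as independent tasks that would still be correct if they were
interleaved on one core.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.LongStream;

public class ParVsConc {
    public static void main(String[] args) {
        // Parallelism: one computation, split across cores for throughput.
        long sumOfSquares = LongStream.rangeClosed(1, 1_000_000)
                                      .parallel()
                                      .map(n -> n * n)
                                      .sum();
        System.out.println("sum of squares = " + sumOfSquares);

        // Concurrency: independent tasks that each make progress on their
        // own; correct even if the pool ran them one at a time.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("handling request A"));
        pool.submit(() -> System.out.println("handling request B"));
        pool.shutdown();
    }
}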

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
From: Scott Burson
Subject: Re: multi-core software
Date: 
Message-ID: <ef72f2b4-f569-4e26-b52c-3899a203a1e4@n4g2000vba.googlegroups.com>
On Jun 4, 9:46 am, Xah Lee <······@gmail.com> wrote:

[Xah Lee's article quoted in full; no actual reply]
From: Scott Burson
Subject: Re: multi-core software
Date: 
Message-ID: <9d55c2b9-c65d-4236-aeb4-053eaab2d727@37g2000yqp.googlegroups.com>
On Jun 5, 2:24 pm, Scott Burson <········@gmail.com> wrote:

[quoted text with no actual reply]

Good grief, how did that happen?  Musta hit "send" instead of
"discard".  Sorry...

-- Scott