From: Rick Jones on 30 Apr 2010 14:15

My waggish comment for the hour, drawing inspiration from technology, magic, and well-rigged demos: A sufficiently short attention span is indistinguishable from parallel thought.

rick jones
--
I don't interest myself in "why". I think more often in terms of "when", sometimes "where"; always "how much." - Joubert
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
From: MitchAlsup on 30 Apr 2010 14:36

On Apr 30, 6:39 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
> > That being said, MOST of the problem IS only that people are very
> > reluctant to change. We could parallelise ten or a hundred times
> > as many tasks as we do before we hit the really intractable cases.
>
> I'm curious what sort of problems these are? My day-to-day tasks are:
> 1) Compiling (parallel)
> 2) Linking (serial)
> 3) Running a Tcl interpreter (serial)
> 4) Simulating microarchitectures (serial, but I might be able to run
> multiple simulations at once, given enough RAM).

I am going to argue that linking is DAG structured, which is better than serial but not fully parallel. But I digress...

I believe that simulations of microarchitectures are inherently parallel. It's just that you have to program these in an HDL (¿Verilog anyone?) in order to gain access to the inherent parallelism of the problem. Written in this way, you can throw a whole CPU farm at the problem (ask me how I know) and get designer-level feedback on how well the algorithms have been implemented. On the other hand, one can also write microarchitecture simulations as a series of pipeline stages and have each pipeline stage operate on a different node in a CPU farm.

What I believe you are asserting is that it is currently faster to simulate a microarchitecture as a serial application than as a parallel problem/application. Thus, one can examine MORE applications/benchmarks by throwing a serial application at a CPU farm than by throwing an HDL description at a CPU farm. You cannot argue that you can get more detailed observations of each and every little nuance of the purported implementation in the serial microarchitecture simulation.

So while the problem is inherently parallel, all reasonable parallel implementations provide less value to the microarchitect than do the serial versions.
It's not that it can't be done in parallel; it's that it is not cost effective (at that level of observation).

Mitch
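Mitch's contrast between the two styles can be made concrete with a toy serial, cycle-driven simulator written as a series of pipeline stages. This is only an illustrative sketch: the three-stage pipeline, the string "instructions", and the latch representation are hypothetical, not any real machine's. The point it shows is the serial model's advantage Mitch describes: every latch is visible every cycle, so a full trace falls out for free.

```python
# Toy serial, cycle-driven pipeline simulator (hypothetical 3-stage machine).
# Each cycle, stages are evaluated back-to-front so every stage reads the
# value its predecessor latched on the PREVIOUS cycle; None is a bubble.

def simulate(instructions, cycles):
    fetch = decode = execute = None   # inter-stage latches
    retired = []
    trace = []                        # per-cycle observability, the serial win
    it = iter(instructions)
    for cycle in range(cycles):
        if execute is not None:       # retire whatever finished last cycle
            retired.append(execute)
        execute = decode              # advance the pipeline one stage
        decode = fetch
        fetch = next(it, None)        # bubble once instructions run out
        trace.append((cycle, fetch, decode, execute))
    return retired, trace

retired, trace = simulate(["add", "mul", "ld"], cycles=6)
print(retired)   # -> ['add', 'mul', 'ld']
```

A farm-parallel version would put each stage (or each benchmark run) on its own node, but then reconstructing this kind of cycle-exact trace across nodes is exactly the observability cost Mitch is pointing at.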
From: Paul A. Clayton on 30 Apr 2010 17:51

On Apr 30, 2:36 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
[snip]
> I am going to argue that linking is DAG structured, which is better
> than serial but not fully parallel. But I digress...

Shouldn't (re)linking be very parallel? I.e., would not information from previous linking allow rather aggressive speculation, with a low probability of the speculation being incorrect? I am guessing that in the majority of cases, a significant amount of information could be available from previous linking/compilation.

I also wonder how much compilation work could be hoisted into the edit-through-repository-commit stages. (Using information from previous compilations might also help with phase-ordering problems.) I am guessing that processing power is underutilized during editing, even with more feature-rich development environments.

Paul A. Clayton
a highly ignorant technophile who is highly blessed by the knowledge and wisdom of so many comp.arch posters
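Paul's speculative-relink idea can be sketched as a toy: resolve each reference against the symbol table cached from the previous link, and fall back to full resolution only where the guess no longer validates. Everything here is a made-up illustration (the object/symbol representation, the `link` function) rather than how any real linker works:

```python
# Toy symbol resolution with speculation from a previous link's table.
# objects: {object_name: (set_of_defined_symbols, set_of_referenced_symbols)}
# cached:  {symbol: defining_object} from the previous link, used as a guess.

def link(objects, cached=None):
    defs = {}                              # current ground truth: sym -> object
    for obj, (defined, _refs) in objects.items():
        for sym in defined:
            defs[sym] = obj
    table, hits = {}, 0
    for obj, (_defined, refs) in objects.items():
        for sym in refs:
            if cached and cached.get(sym) == defs.get(sym):
                table[sym] = cached[sym]   # speculation validated: cheap path
                hits += 1
            else:
                table[sym] = defs[sym]     # full (slow) resolution path
    return table, hits

objs = {"a.o": ({"foo"}, {"bar"}), "b.o": ({"bar"}, {"foo"})}
cold_table, cold_hits = link(objs)             # first link: nothing cached
warm_table, warm_hits = link(objs, cold_table) # relink: everything speculates
```

Since most edits touch few objects, most cached entries validate on a relink, and each validated reference is independent of the others, which is where the parallelism Paul conjectures would come from.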
From: nedbrek on 2 May 2010 07:36

Hello all,

<nmm1(a)cam.ac.uk> wrote in message news:hreg32$6s1$1(a)soup.linux.pwf.cam.ac.uk...
> In article <hrec1m$jse$1(a)news.eternal-september.org>,
> nedbrek <nedbrek(a)yahoo.com> wrote:
>>My day-to-day tasks are:
>>1) Compiling (parallel)
>>2) Linking (serial)
>>3) Running a Tcl interpreter (serial)
>>4) Simulating microarchitectures (serial, but I might be able to run
>>multiple simulations at once, given enough RAM).
>>
>>I'm particularly interested in parallel linking.
>
> Linking is fairly simply parallelisable, in the same way that most
> such transformations are - i.e. more in theory than practice. The
> only problem is when you have to do a large amount of the work of
> one part to work out what other tasks that part implies.

Interesting. I was going to ask if there were any projects working on parallel linking; then I remembered Google :)
http://en.wikipedia.org/wiki/Gold_%28linker%29

Released as beta in 2008; there doesn't seem to be much news since then. Sounds pretty cool.

Thanks,
Ned
From: nedbrek on 2 May 2010 07:45
Hello all,

"MitchAlsup" <MitchAlsup(a)aol.com> wrote in message news:84651094-5177-4949-b6f9-48189a2b5a28(a)k29g2000yqh.googlegroups.com...
On Apr 30, 6:39 am, "nedbrek" <nedb...(a)yahoo.com> wrote:
>> I'm curious what sort of problems these are? My day-to-day tasks are:
>> 4) Simulating microarchitectures (serial, but I might be able to run
>> multiple simulations at once, given enough RAM).
>
> What I believe you are asserting is that it is currently faster to
> simulate a microarchitecture as a serial application than it is to
> simulate a microarchitecture as a parallel problem/application. Thus,
> one can examine MORE applications/benchmarks by throwing a serial
> application at a CPU farm than by throwing an HDL description at a CPU
> farm. You cannot argue that you can get more detailed observations of
> each and every little nuance of the purported implementation in the
> serial microarchitecture simulation.
>
> So while the problem is inherently parallel, all reasonable parallel
> implementations provide less value to the microarchitect than do the
> serial versions. It's not that it can't be done in parallel,
> it's that it is not cost effective (at that level of observation).

I wouldn't say that the serial version is faster to run, but rather faster to _develop_. My first models were written in C, then C++. My latest is in D (just starting!). The progression here is in developer productivity. An HDL would be a big step backwards. I want to examine a lot of crazy ideas as quickly as possible, not fight with implementation details.

Also, there is debugging. If I have a bug 1e7 cycles in, I don't think I can hook into a CPU farm for debugging... I need to run it on my local desktop, which is (more) serial (less parallel).

Any study is going to be turning a lot of knobs (sweep L1 size and latency, ROB size, scheduler size, port bindings, etc.). There is a lot of process-level parallelism there, but it chews up memory size and bandwidth.

Ned
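The knob-sweep workload Ned describes is the classic process-level-parallel case: every (L1 size, ROB size, ...) point is an independent serial simulation. A minimal sketch of farming such a sweep out with Python's standard `multiprocessing` module; the `simulate` function here is a stand-in returning a fabricated score, not a real simulator:

```python
# Sketch of a knob-sweep run as process-level parallelism.
# simulate() is a placeholder where the real serial simulator would go.
from itertools import product
from multiprocessing import Pool

def simulate(config):
    l1_kb, rob = config
    # Fabricated "IPC" just so the sketch returns something deterministic.
    return (l1_kb, rob, 1.0 + 0.001 * l1_kb + 0.0005 * rob)

def run_sweep():
    # Cross product of knob settings: 3 L1 sizes x 4 ROB sizes = 12 runs,
    # each dispatched to a worker process.
    sweep = list(product([32, 64, 128], [64, 128, 192, 256]))
    with Pool() as pool:
        return pool.map(simulate, sweep)

if __name__ == "__main__":
    results = run_sweep()
    print(len(results), "runs")   # -> 12 runs
```

This gets throughput across configurations, but, as Ned notes, each worker carries a full copy of the simulated machine state, so memory capacity and bandwidth, not CPU count, tend to cap how wide the sweep can go.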