Processors stall on OLTP workloads about half the time--almost no matter what you do [Computer Architecture]


From: Quadibloc on 30 Apr 2010 10:58

On Apr 30, 5:57 am, Anne & Lynn Wheeler <l...(a)garlic.com> wrote:

> from above (2006) article:
>
> is that the price per MIPS today is approximately six times higher than
> the $165 per MIPS that the traditional technology/price decline link
> would have produced

Part of this is the cost of RAS features (Reliability, Availability,
Serviceability: others have substituted Scalability or Security for
the last one), and part a hidden charge for access to IBM's quality
software.

With the Nehalem-EX, the clock is ticking on part of that.

HP owns OpenVMS, a decent mainframe-quality operating system. It
should really look into giving IBM some competition.

John Savard

From: MitchAlsup on 30 Apr 2010 12:46

On Apr 28, 2:36 pm, George Neuner <gneun...(a)comcast.net> wrote:
> What remains mostly is research into ways of recognizing repetitious
> patterns of data access in linked data structures (lists, trees,
> graphs, tries, etc.) and automatically prefetching data in advance of
> its use. I haven't followed this research too closely, but my
> impression is that it remains a hard problem.

There is a pattern-recognizing prefetcher in GreyHound (Opteron Rev-G)
and later that can lock onto non-sequential and non-monotonic access
patterns. One of the bad SpecFP benchmarks had an access pattern that
looked something like this (in cache line addresses):

loop:
prefetch[n+1]'address = prefetch[n]'address+4
prefetch[n+2]'address = prefetch[n+1]'address+4
prefetch[n+3]'address = prefetch[n+2]'address-1
repeat at loop/break on crossing of physical page boundary

That is, the loop covers 3 cache lines, two having step sizes of +4
(or was it +3) and the next a step size of -1. My DRAM controller
locks onto this non-linear stride and prefetches the lines at high
efficiency. Up to 7 (or was it 8) different strides could be
'followed' if found in a repetitive situation. {However, the credit is
not due to me, but to another engineer who discovered a means to
encode these non-linear strides in an easy-to-access table.}

It's easy to see a compiler figuring this out, too.
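A minimal sketch of the general idea, assuming a simple delta-pattern detector rather than the table encoding credited above (whose details are not given here). It records the deltas between successive cache-line addresses, looks for the shortest repeating period of up to 8 strides, and replays that period to predict the next lines, stopping at a page boundary. The 64-lines-per-page figure (4 KiB pages, 64-byte lines) and the names `find_period` and `predict` are assumptions for illustration.

```python
LINES_PER_PAGE = 64  # assumption: 4 KiB pages holding 64-byte cache lines
MAX_PERIOD = 8       # assumption: follow repeating patterns of up to 8 strides


def find_period(deltas, max_period=MAX_PERIOD):
    """Return the shortest period p with which the whole delta history
    repeats (seen at least twice), or None if no such period exists."""
    for p in range(1, max_period + 1):
        if len(deltas) < 2 * p:
            break  # not enough history to confirm this or any longer period
        if all(deltas[i] == deltas[i - p] for i in range(p, len(deltas))):
            return p
    return None


def predict(lines, depth):
    """Predict up to `depth` future cache-line addresses from the observed
    ones by replaying the detected delta pattern within the current page."""
    deltas = [b - a for a, b in zip(lines, lines[1:])]
    p = find_period(deltas)
    if p is None:
        return []
    pattern = deltas[-p:]  # the last full cycle; pattern[0] is the next delta
    page = lines[-1] // LINES_PER_PAGE
    addr, out = lines[-1], []
    for i in range(depth):
        addr += pattern[i % p]
        if addr // LINES_PER_PAGE != page:
            break  # stop on crossing a physical page boundary
        out.append(addr)
    return out
```

For the +4, +4, -1 loop, `predict([0, 4, 8, 7, 11, 15, 14], 4)` yields `[18, 22, 21, 25]`; the same pattern observed at the end of a page yields nothing, because the next line would cross into the following page.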

Mitch

From: MitchAlsup on 30 Apr 2010 14:43

On Apr 30, 1:05 pm, George Neuner <gneun...(a)comcast.net> wrote:
> On Fri, 30 Apr 2010 09:46:47 -0700 (PDT), MitchAlsup
> Yes. The example seems to be a list traversal, although I'm not sure
> what the negative offset represents - possibly a pointer to node data
> in a spined list.

Agreed that this is probably some kind of list traversal. The negative
number is an aspect of how the list was built.

<snip>
> I see prefetching as desirable for something like a map function where
> a) the entire list is traversed, and b) there will (typically) be some
> nontrivial computation per node. But many list algorithms involve
> only simple processing per node and, on average, only half the list
> will be traversed. It doesn't make sense to me to prefetch a bunch
> of nodes that may never be touched ... that's just cache pollution.

What you failed to see is that the DRAM prefetcher places the
prefetched line in a DRAM read buffer and pollutes no cache in the
system. If no demand request for the line arrives, it is silently
discarded. If a demand or interior request arrives, the line is
transferred back without any DRAM latency. You still incur the latency
of getting out to the DRAM controller and back (6-8 ns), but save the
DRAM access latency (24-60 ns). And you don't pollute any of the caches!
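A toy model of the read-buffer behavior described above, offered as a hedged sketch: the class name `DramReadBuffer`, the 8-entry capacity, the oldest-first discard policy, the rule that a hit consumes the entry, and the specific latencies (7 ns round trip and 40 ns DRAM access, picked from inside the quoted 6-8 ns and 24-60 ns ranges) are all assumptions. What it shows is that a prefetched line lives only in a buffer at the memory controller, so a hit costs the round trip alone and an unused prefetch is dropped without touching any CPU cache.

```python
from collections import OrderedDict

CONTROLLER_RTT_NS = 7  # assumption: within the quoted 6-8 ns round trip
DRAM_ACCESS_NS = 40    # assumption: within the quoted 24-60 ns DRAM access


class DramReadBuffer:
    """Hypothetical model: prefetched lines wait in a small buffer at the
    memory controller instead of being pushed into any CPU cache."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> present, oldest first

    def prefetch(self, line):
        if line in self.lines:
            return
        if len(self.lines) >= self.capacity:
            # Silently discard the oldest prefetch that nobody asked for.
            self.lines.popitem(last=False)
        self.lines[line] = True

    def demand(self, line):
        """Return the latency in ns seen by a demand read for `line`."""
        if self.lines.pop(line, None) is not None:
            return CONTROLLER_RTT_NS  # served from the buffer: no DRAM access
        return CONTROLLER_RTT_NS + DRAM_ACCESS_NS
```

With these assumed numbers a buffered hit costs 7 ns instead of 47 ns, and a list node that is prefetched but never visited simply ages out of the buffer.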

Mitch

From: MitchAlsup on 1 May 2010 14:54

On Apr 30, 2:55 pm, George Neuner <gneun...(a)comcast.net> wrote:
> On Fri, 30 Apr 2010 11:43:22 -0700 (PDT), MitchAlsup
> <MitchAl...(a)aol.com> wrote:
> >On Apr 30, 1:05 pm, George Neuner <gneun...(a)comcast.net> wrote:
>
> >> I see prefetching as desirable for something like a map function where
> >> a) the entire list is traversed, and b) there will (typically) be some
> >> nontrivial computation per node. But many list algorithms involve
> >> only simple processing per node and, on average, only half the list
> >> will be traversed. It doesn't make sense to me to prefetch a bunch
> >> of nodes that may never be touched ... that's just cache pollution.
>
> >What you failed to see is that the DRAM prefetcher places the
> >prefetched line in a DRAM read buffer and pollutes no cache in the
> >system. If no demand request for the line arrives, it is silently
> >discarded. If a demand or interior request arrives, the line is
> >transferred back without any DRAM latency. You still incur the latency
> >of getting out to the DRAM controller and back (6-8 ns), but save the
> >DRAM access latency (24-60 ns). And you don't pollute any of the caches!
>
> Which is better than fetching into multiple levels of cache, but still
> has the effect of tying up resources: the memory controller, the chips
> responding to the read, the read buffer line, etc. - unavailability of
> any or all of which might delay some other computation (depending on
> the architecture).

No resources usable by memory requests from the coherent or incoherent
requestors were (or are) tied up by this prefetcher. Nor were any
cycles on the DRAM buses taken that could have been used by requests
waiting for DRAM accesses. This prefetcher watched for periods when no
requests were present and banks were already open, and used those
'free' cycles.

The only downside is that when prefetches were made, the closing of
DRAM pages might be delayed while the prefetch plays out.
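A minimal sketch of the "free cycle" policy described above, under stated assumptions: one request issues per cycle, demands always take priority, and a prefetch issues only in a cycle with no demand waiting and only if its target row is already open in its bank. The `Req` type and the `schedule` function are hypothetical, not the actual controller logic.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Req:
    bank: int
    row: int
    is_prefetch: bool = False


def schedule(demands, prefetches, open_rows, cycles):
    """Issue at most one request per cycle; demands always win, and a
    prefetch may only use an idle cycle whose target row is already open."""
    demands = deque(demands)
    prefetches = list(prefetches)
    open_rows = dict(open_rows)  # bank -> currently open row (local copy)
    issued = []
    for _ in range(cycles):
        if demands:
            req = demands.popleft()
            open_rows[req.bank] = req.row  # a demand opens (or reuses) its row
            issued.append(req)
            continue
        # Idle cycle: look for a prefetch that hits an already-open row.
        idx = next((i for i, p in enumerate(prefetches)
                    if open_rows.get(p.bank) == p.row), None)
        if idx is None:
            issued.append(None)  # nothing cheap to do; leave the cycle unused
        else:
            issued.append(prefetches.pop(idx))
    return issued
```

This toy model never closes rows, so it does not capture the one downside mentioned above, where a pending prefetch can delay closing a DRAM page.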

Mitch

From: Quadibloc on 2 May 2010 22:36

On May 2, 6:04 pm, Del Cecchi <delcec...(a)gmail.com> wrote:
> Quadibloc wrote:

> > HP owns OpenVMS, a decent mainframe-quality operating system. It
> > should really look into giving IBM some competition.

> Why should HP try to reintroduce VMS into the market place? Do you
> really think that this is a financially beneficial or viable action?

It's true that I can't be certain this would be a sensible thing to
do. But I do think that there is a need for more operating systems
that are reliable and offer the security that real mainframe operating
systems do. Microsoft Windows doesn't cut it. Neither does Linux. Even
commercial versions of Unix, while they serve their intended purposes
better than Linux can as a substitute for them, are still derived from
what began as an extremely minimalist operating system.

OpenVMS might well not be viable as part of an attempt by HP to
compete directly with IBM's mainframe offerings. But the market has
plenty of other niches where HP could target a system running a port of
OpenVMS. They could, for example, make it an alternative to Windows
Server.

John Savard
