From: Anne & Lynn Wheeler on 30 Apr 2010 13:32

Quadibloc <jsavard(a)ecn.ab.ca> writes:
> Part of this is the cost of RAS features (Reliability, Availability,
> Serviceability: others have substituted Scalability or Security for
> the last one), and part a hidden charge for access to IBM's quality
> software.

re:
http://www.garlic.com/~lynn/2010i.html#0 Processors stall on OLTP workloads about half the time--almost no matter what you do

the "Financial Matters: Mainframe Processor Pricing History" article
http://www.zjournal.com/index.cfm?section=article&aid=346
was tracking mainframe (mip) pricing during the 70s, 80s, and early part of the 90s when there were (similar) clone mainframes ... and the (ibm) mainframe price/mip system pricing curve changed after clone mainframes left the market in the 90s (i.e. the comment was that if the 70s, 80s, & 90s curve had continued up thru the date of the article, a mainframe selling for $18m would instead have been selling for $3m ... aka mainframe-to-mainframe pricing). some number of complaints in the ibm-main mainframe mailing list are that (regardless of high mainframe hardware pricing), mainframe software pricing is dominating costs.

A 25+ yr old RAS story: the product manager for the 3090 mainframe tracked me down after 3090s had been in customer shops for a year. There is a mainframe industry reporting service that collects customer mainframe EREP reports and publishes regular monthly summaries (at the time including the various clone vendors). The problem was that the 3090 was designed to have something like an aggregate 3-5 "channel errors" per annum in total across all installed machines. The reporting service turned up closer to 20 total "channel errors" that had occurred in aggregate across all installed 3090s.

I had done an operating system driver for HYPERChannel ... allowing mainframe controllers and devices at remote locations, using HYPERChannel as a form of mainframe channel extension (for internal installations). In some cases, when I had an unrecoverable error, I would reflect an emulated "channel check", which would result in the various recovery and retry operations by the standard operating system RAS. I then tried to get the HYPERChannel driver released to customers, but various corporate factions objected. As a result, the HYPERChannel vendor effectively had to do a re-implementation. In any case, the 15 "extra" 3090 channel errors (aggregate across all installed 3090s for the first year) were from some HYPERChannel installations (reflecting emulated channel checks). So I did some research and selected emulated IFCC (interface control check) to be substituted in place of CC (channel check) ... it turns out that IFCC follows an effectively identical path thru error recovery as CC (but wouldn't show up as a channel error in the industry reports). The point is that there doesn't seem to be anything similar in other markets (i.e. industry monthly error/RAS reports across all customer installed machines).

as an aside ... when we were doing ha/cmp in the early 90s
http://www.garlic.com/~lynn/subtopic.html#hacmp
I was asked to write a section for the corporate continuous availability strategy document. The section got pulled because both Rochester (as/400) and POK (mainframe) complained (that they couldn't meet the availability criteria in my section).
I had coined the terms disaster survivability and geographic survivability when out marketing ha/cmp
http://www.garlic.com/~lynn/submain.html#available
that was separate/independent from the work involving cluster scaleup in ha/cmp ... aka the project started out as ha/6000 ... but I changed the name to ha/cmp to also reflect the work on cluster scaleup. when the cluster scaleup part of the effort was transferred and we were told that we couldn't work on anything with more than four processors, they didn't bother to change the product name.

recent thread in this n.g. on the cluster scaleup subject:
http://www.garlic.com/~lynn/2010.html#6 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2010.html#31 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2010.html#41 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2010.html#44 Larrabee delayed: anyone know what's happening?
http://www.garlic.com/~lynn/2010f.html#50 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#52 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#55 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#56 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#57 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#58 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#60 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#61 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#63 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#64 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010f.html#70 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010g.html#4 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010g.html#8 Handling multicore CPUs; what the competition is thinking
http://www.garlic.com/~lynn/2010g.html#48 Handling multicore CPUs; what the competition is thinking

--
42yrs virtualization experience (since Jan68), online at home since Mar1970
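[A minimal sketch of the IFCC-for-CC reflection described in the post above -- purely illustrative, with made-up names (fake_csw, reflect_link_failure) standing in for the real channel status word layout and the actual HYPERChannel driver, neither of which is shown in the post:]

#include <stdint.h>

/* Illustrative only: the point is that an unrecoverable link error is
 * reflected to the operating system's standard I/O error recovery as an
 * interface control check (IFCC) rather than a channel check (CC).
 * Both take effectively the same retry path, but only CC is counted as
 * a "channel error" in the EREP-based industry reports. */

#define STATUS_CHANNEL_CHECK  0x0002u   /* CC   */
#define STATUS_IFC_CHECK      0x0004u   /* IFCC */

struct fake_csw {                /* stand-in for the channel status word */
    uint16_t unit_status;
    uint16_t channel_status;
};

/* Called by the hypothetical channel-extension driver when the remote
 * link fails unrecoverably; builds the status handed to the standard
 * operating system RAS/retry machinery. */
void reflect_link_failure(struct fake_csw *csw)
{
    csw->unit_status    = 0;
    csw->channel_status = STATUS_IFC_CHECK;  /* IFCC, not STATUS_CHANNEL_CHECK */
}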
From: George Neuner on 30 Apr 2010 14:05

On Fri, 30 Apr 2010 09:46:47 -0700 (PDT), MitchAlsup <MitchAlsup(a)aol.com> wrote:

>On Apr 28, 2:36 pm, George Neuner <gneun...(a)comcast.net> wrote:
>> What remains mostly is research into ways of recognizing repetitious
>> patterns of data access in linked data structures (lists, trees,
>> graphs, tries, etc.) and automatically prefetching data in advance of
>> its use. I haven't followed this research too closely, but my
>> impression is that it remains a hard problem.
>
>There is a pattern-recognizing prefetcher in GreyHound (Opteron Rev-G)
>and later that can lock onto non-sequential and non-monotonic access
>patterns. One of the bad SpecFP benchmarks had an access pattern that
>looked something like (in cache line addresses):
>
>loop:
>prefetch[n+1]'address = prefetch[n]'address+4
>prefetch[n+2]'address = prefetch[n+1]'address+4
>prefetch[n+3]'address = prefetch[n+2]'address-1
>repeat at loop/break on crossing of physical page boundary
>
>That is, the loop concerns 3 cache lines, two having step sizes of +4
>(or was it +3) and the next has a step size of -1. My DRAM controller
>locks onto this non-linear stride and prefetches the lines at high
>efficiency. Here up to 7 (or was it 8) different strides could be
>'followed' if found in a repetitive situation. {However the credit is
>not due to me, but to another engineer who discovered a means to
>encode these non-linear strides in an easy-to-access table.}
>
>It's easy to see a compiler figuring this out also.
>
>Mitch

Yes. The example seems to be a list traversal, although I'm not sure what the negative offset represents - possibly a pointer to node data in a spined list.

As I mentioned to Robert, linked list following is one area where I know there has been success (another is in search trees). The issue with lists is whether to bother prefetching at all, because many list algorithms do little computation per node and prefetch would need to keep several nodes ahead to avoid stall ... which IMO doesn't seem reasonable for most cases.

I see prefetching as desirable for something like a map function where a) the entire list is traversed, and b) there will (typically) be some nontrivial computation per node. But many list algorithms involve only simple processing per node and, on average, only half the list will be traversed. It doesn't make sense to me to prefetch a bunch of nodes that may never be touched ... that's just cache pollution.

George
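[A minimal sketch of the map-style traversal being discussed, using GCC/Clang's __builtin_prefetch to stay one node ahead; the node layout and function names are invented for illustration, not taken from any of the posts:]

#include <stddef.h>

struct node {
    struct node *next;
    double value;
};

/* Map fn over a singly linked list, prefetching the next node while the
 * current one is being processed.  This only hides latency when fn does
 * enough work per node; keeping several nodes ahead would mean chasing
 * the same dependent next pointers, which is exactly the hard part. */
void list_map(struct node *head, double (*fn)(double))
{
    for (struct node *p = head; p != NULL; p = p->next) {
        if (p->next != NULL)
            __builtin_prefetch(p->next, 0, 1);  /* read, low temporal locality */
        p->value = fn(p->value);                /* nontrivial per-node work */
    }
}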
From: George Neuner on 30 Apr 2010 15:55

On Fri, 30 Apr 2010 11:43:22 -0700 (PDT), MitchAlsup <MitchAlsup(a)aol.com> wrote:

>On Apr 30, 1:05 pm, George Neuner <gneun...(a)comcast.net> wrote:
>
>> I see prefetching as desirable for something like a map function where
>> a) the entire list is traversed, and b) there will (typically) be some
>> nontrivial computation per node. But many list algorithms involve
>> only simple processing per node and, on average, only half the list
>> will be traversed. It doesn't make sense to me to prefetch a bunch
>> of nodes that may never be touched ... that's just cache pollution.
>
>What you failed to see is that the DRAM prefetcher places the
>prefetched line in a DRAM read buffer and pollutes no cache in the
>system. If no demand request for the line arrives, it is silently
>discarded. If a demand or interior request arrives, the line is
>transferred back without any DRAM latency. You still incur the latency
>of getting out to the DRAM controller and back (6-8ns), but save the
>DRAM access latency (24-60ns). And you don't pollute any of the caches!

Which is better than fetching into multiple levels of cache, but it still has the effect of tying up resources: the memory controller, the chips responding to the read, the read buffer line, etc. - unavailability of any or all of which might delay some other computation (depending on the architecture). There isn't any free lunch.

George
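[A back-of-the-envelope model of that tradeoff, plugging in the latency figures quoted above; the hit rates are arbitrary, just to show the range:]

#include <stdio.h>

/* Rough model of average demand latency with a DRAM-side read buffer:
 * a hit still pays the trip to the controller, a miss pays the full
 * DRAM access as well.  Numbers are the illustrative figures from the
 * post, not measurements. */
int main(void)
{
    double controller_ns = 7.0;    /* ~6-8 ns to the DRAM controller and back */
    double dram_ns       = 40.0;   /* ~24-60 ns DRAM access */

    for (double hit = 0.0; hit <= 1.0; hit += 0.25) {
        double avg = controller_ns + (1.0 - hit) * dram_ns;
        printf("read-buffer hit rate %.2f -> avg demand latency %5.1f ns\n",
               hit, avg);
    }
    return 0;
}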
From: Anne & Lynn Wheeler on 30 Apr 2010 17:07

Quadibloc <jsavard(a)ecn.ab.ca> writes:
> HP owns OpenVMS, a decent mainframe-quality operating system. It
> should really look into giving IBM some competition.

re:
http://www.garlic.com/~lynn/2010i.html#0 Processors stall on OLTP workloads about half the time--almost no matter what you do
http://www.garlic.com/~lynn/2010i.html#2 Processors stall on OLTP workloads about half the time--almost no matter what you do

from this post (in the ibm-main mailing list)
http://www.garlic.com/~lynn/2010i.html#1 25 reasons why hardware is still hot at IBM

IBM's Unix poaching slows in Q1
http://www.theregister.co.uk/2010/04/29/ibm_unix_takeouts/

from above:

In November 2008, HP was perfectly happy to crow that it had converted more than 250 IBM mainframe shops to Integrity machines in the prior two years - which prompted IBM to retaliate about the 5,000 HP and Sun takeouts it had done in the prior four years.

.... snip ...

Integrity (Itanium2) servers
http://en.wikipedia.org/wiki/HP_Integrity_Servers
http://h20341.www2.hp.com/integrity/us/en/systems/integrity-systems-overview.html

--
42yrs virtualization experience (since Jan68), online at home since Mar1970
From: Morten Reistad on 1 May 2010 07:43
In article <m3wrvops2r.fsf(a)garlic.com>, Anne & Lynn Wheeler <lynn(a)garlic.com> wrote:

>Quadibloc <jsavard(a)ecn.ab.ca> writes:
>> HP owns OpenVMS, a decent mainframe-quality operating system. It
>> should really look into giving IBM some competition.

Nowadays it is not so much about the OS itself. It is about scaling the application and the database.

>from above:
>
>In November 2008, HP was perfectly happy to crow that it had converted
>more than 250 IBM mainframe shops to Integrity machines in the prior two
>years - which prompted IBM to retaliate about the 5,000 HP and Sun
>takeouts it had done in the prior four years.

HP sells good hardware (finally), but IBM has found a very profitable "niche": helping all the successful not-quite-Google, but still very aggressively growing, internet operations deliver and scale their systems. They will take good care of their wallets, too.

--
mrr