From: jacko on 14 Jul 2010 14:35

On 14 July, 17:59, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jul 14, 5:37 am, "Piotr Wyderski"
> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
> > Not only, Sparcs have much worse memory disambiguation mechanisms.
> > Even such a simple loop shows stunning differences:
> >
> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
> >
> > Investigation at the assembler level has shown that Sparc is good
> > (i.e. Xeon-like) at loads and stores, but when there is a long
> > store-load dependence, the performance loss is a total disaster, to
> > put it mildly. Xeons do not exhibit this kind of problem.
>
> x86 grew up in an environment where stores were reloaded in a rather
> short amount of time (arguments get pushed, then accessed a couple of
> cycles later in the called subroutine). Thus, there are mechanisms to
> forward store data to loads when the addresses match even when the
> store instruction has not retired. This normally goes under the
> moniker of Store-to-Load Forwarding (STLF).
>
> Mitch

Stack cache. http://nibz.googlecode.com

It's probably some throwback to shared memory models. Cache coherence
is for fools who can't run a miniature network.

It's like a sick joke where the kids with cash pick the stupid ideas
and make fabrications.
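For reference, a minimal sketch of how the quoted loop could be timed. Because the counter is volatile, the compiler has to perform every read and write of it, so each iteration's load depends on the previous iteration's store; that is the store-load dependence being discussed. The function names `spin_volatile` and `time_spin` are made up for illustration, and the counter is declared outside the `for` only so it can be returned. No timings are claimed here; results will depend on the machine and compiler.

```c
#include <assert.h>
#include <time.h>

/* Runs the quoted loop for `limit` iterations and returns the final
   counter value. The counter is volatile, so the compiler may not keep
   it in a register across iterations or delete the loop. */
unsigned spin_volatile(unsigned limit)
{
    volatile unsigned i;
    for (i = 0; i != limit; ++i)
        ;                         /* empty body, as in the original */
    return i;
}

/* Hypothetical helper: CPU seconds consumed by one run of the loop,
   measured with the standard clock() function. */
double time_spin(unsigned limit)
{
    clock_t start = clock();
    unsigned final = spin_volatile(limit);
    clock_t stop = clock();
    assert(final == limit);       /* the loop only exits at i == limit */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_spin(1000000000u)` on each machine under test would reproduce the comparison described in the quote.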
From: Morten Reistad on 15 Jul 2010 05:34

In article <96ec88ed-b6a4-4e7e-a5d0-43b98ba34ae2(a)w12g2000yqj.googlegroups.com>,
jacko <jackokring(a)gmail.com> wrote:
>On 14 July, 17:59, MitchAlsup <MitchAl...(a)aol.com> wrote:
>> On Jul 14, 5:37 am, "Piotr Wyderski"
>> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
>> > Not only, Sparcs have much worse memory disambiguation mechanisms.
>> > Even such a simple loop shows stunning differences:
>> >
>> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
>> >
>> > Investigation at the assembler level has shown that Sparc is good
>> > (i.e. Xeon-like) at loads and stores, but when there is a long
>> > store-load dependence, the performance loss is a total disaster, to
>> > put it mildly. Xeons do not exhibit this kind of problems.
>>
>> x86 grew up in an environment where stores were reloaded in a rather
>> short amount of time {arguments get pushed, then accessed a couple
>> cycles later in the called subroutine.) Thus, there are mechanisms to
>> forward store data to loads when the addresses match even when the
>> store instruction has not retired. This normally goes under the
>> monicer of Store-to-Load forwarding (STLF).
>>
>> Mitch
>
>Stack cache. http;//nibz.googlecode.com
>
>It's probably some throw back to shared memory models. Cache coherance
>is for fools who cant run a miniture network.
>
>It's like a sick joke where the kids with cash pick the stupid ideas
>and make fabrications.

All of these tests handle billions and trillions of pretty lightweight
IP packets, either as web servers, RTP reflectors, RTP bridges, DNS
servers or database servers. There is state kept between packets.
Typical processing is high hundreds to low thousands of instructions
per packet. This does not allow for many memory accesses to take place.
You cannot identify the session before you have done a bit of
identification on the packet, and by then there is a thread on a
processor in the cluster that has read the packet and is acting on it.

When the state needs to be inspected and updated, the handful of bytes
very likely reside in the cache of another processor (on a 32-way
machine the odds are close to 31/32). Doing an effective cross-cache
snoop of that cache then becomes essential for performance.

Almost all the UDP stuff on the important internet servers behaves like
this. This has been verified with the new instrumentation in Linux
2.6.31.

These servers are the effective performance limit for a number of
huge web sites.

-- mrr
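The 31/32 figure above follows from simple counting. A small sketch, assuming (this is not stated in the post) that an incoming packet is equally likely to be picked up by any of the N processors and that the session state was last touched by exactly one of them. The function name `remote_state_probability` is hypothetical.

```c
#include <assert.h>

/* Probability that the session state lives in some OTHER processor's
   cache, under the assumption of uniformly random packet placement:
   (N - 1) of the N processors are "not this one". */
double remote_state_probability(int ncpu)
{
    assert(ncpu > 0);             /* at least one processor */
    return (double)(ncpu - 1) / (double)ncpu;
}
```

For `ncpu = 32` this gives 31/32, so under that assumption nearly every state lookup has to cross caches, which is why the cost of the snoop dominates.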
From: jacko on 15 Jul 2010 11:17

Is

http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4b9f67406c6852dd/844cab7cd4b9ab52#844cab7cd4b9ab52

important for solving the GC problem? Ref FIFOO
From: George Neuner on 15 Jul 2010 11:48

On Thu, 15 Jul 2010 08:17:44 -0700 (PDT), jacko <jackokring(a)gmail.com> wrote:
>Is
>
>http://groups.google.com/group/comp.lang.forth/browse_thread/thread/4b9f67406c6852dd/844cab7cd4b9ab52#844cab7cd4b9ab52
>
>important for solving the GC problem? Ref FIFOO

There isn't enough detail in that conversation to figure out what
problem the structure is trying to solve. What exactly are you asking?

George
From: jacko on 15 Jul 2010 12:05
On Jul 14, 5:59 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> On Jul 14, 5:37 am, "Piotr Wyderski"
> <piotr.wyder...(a)mothers.against.spam.gmail.com> wrote:
> > Not only, Sparcs have much worse memory disambiguation mechanisms.
> > Even such a simple loop shows stunning differences:
> >
> >     for(volatile unsigned int i = 0; i != 1000000000; ++i);
> >
> > Investigation at the assembler level has shown that Sparc is good
> > (i.e. Xeon-like) at loads and stores, but when there is a long
> > store-load dependence, the performance loss is a total disaster, to
> > put it mildly. Xeons do not exhibit this kind of problem.
>
> x86 grew up in an environment where stores were reloaded in a rather
> short amount of time (arguments get pushed, then accessed a couple of
> cycles later in the called subroutine). Thus, there are mechanisms to
> forward store data to loads when the addresses match even when the
> store instruction has not retired. This normally goes under the
> moniker of Store-to-Load Forwarding (STLF).
>
> Mitch

Yes, a forced write-through signal? Cache invalidates: a write must
invalidate any other core's cache line, but if no invalidate happens,
then there's no immediate pressure to write back. If an invalidate does
happen, then there may be pressure to write back. If an invalidate
occurs while a write-back is queued, then a write-displacement
write-through must occur immediately. Am I missing something?

Cheers Jacko
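The rules in the post above can be sketched as a toy write-invalidate model. This is only an illustration under loud assumptions: one cache line of two words, two cores, an MSI-like three-state scheme, and a dirty copy flushed to memory the moment another core needs the line. All names (`toy_system`, `toy_read`, `toy_write`, `forced_writebacks`) are invented here and do not describe any real processor's protocol.

```c
#include <assert.h>
#include <string.h>

#define NCORES 2                  /* assumed core count */
#define NWORDS 2                  /* words in the one modelled line */

enum line_state { LINE_INVALID, LINE_CLEAN, LINE_DIRTY };

struct toy_line {
    enum line_state state;
    unsigned word[NWORDS];
};

struct toy_system {
    struct toy_line core[NCORES]; /* each core's copy of the line */
    unsigned memory[NWORDS];      /* backing store */
    unsigned forced_writebacks;   /* write-backs forced by another core */
};

static void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);      /* LINE_INVALID is 0; memory zeroed */
}

/* Another core needs the line: any dirty copy must reach memory now. */
static void flush_remote_dirty(struct toy_system *s, int who)
{
    for (int c = 0; c < NCORES; c++) {
        if (c != who && s->core[c].state == LINE_DIRTY) {
            memcpy(s->memory, s->core[c].word, sizeof s->memory);
            s->core[c].state = LINE_CLEAN;
            s->forced_writebacks++;
        }
    }
}

/* Miss handling: flush any remote dirty copy, then load from memory. */
static void fill_if_invalid(struct toy_system *s, int who)
{
    if (s->core[who].state == LINE_INVALID) {
        flush_remote_dirty(s, who);
        memcpy(s->core[who].word, s->memory, sizeof s->memory);
        s->core[who].state = LINE_CLEAN;
    }
}

static unsigned toy_read(struct toy_system *s, int who, int idx)
{
    assert(who >= 0 && who < NCORES && idx >= 0 && idx < NWORDS);
    fill_if_invalid(s, who);
    return s->core[who].word[idx];
}

static void toy_write(struct toy_system *s, int who, int idx, unsigned value)
{
    assert(who >= 0 && who < NCORES && idx >= 0 && idx < NWORDS);
    fill_if_invalid(s, who);      /* own the current contents first */
    for (int c = 0; c < NCORES; c++)
        if (c != who)             /* a write invalidates all other copies */
            s->core[c].state = LINE_INVALID;
    s->core[who].word[idx] = value;
    s->core[who].state = LINE_DIRTY;  /* write-back deferred */
}
```

In this model a core that writes a line nobody else touches never writes back (`forced_writebacks` stays at zero and memory stays stale), while the first access from another core forces the dirty data out immediately, which matches the "no invalidate, no pressure" reasoning in the post.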