From: Niels Jørgen Kruse on 27 Jul 2010 09:16

<http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>

Rather impressive. It is not many years ago that Z-systems lagged POWER
badly.

Briefly:

5.2 GHz
OoO, hints at POWER4 like 5 instruction grouping
up to 3 instructions decode per clock
up to 5 instruction dispatch to functional units per clock
128 KB L1D, 64 KB L1I
1.5 MB L2
24 MB L3 per 4 cores
up to 768 MB L4
256 byte line sizes at all levels.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

From: Andy Glew on 27 Jul 2010 09:25

On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> Rather impressive. It is not many years ago that Z-systems lagged POWER
> badly.
>
> Briefly:
>
> 5.2 GHz
> OoO, hints at POWER4 like 5 instruction grouping
> up to 3 instructions decode per clock
> up to 5 instruction dispatch to functional units per clock
> 128 KB L1D, 64 KB L1I
> 1.5 MB L2
> 24 MB L3 per 4 cores
> up to 768 MB L4
> 256 byte line sizes at all levels.

256 *BYTE*?

2048 bits?

Line sizes 4X the typical 64B line size of x86?

These aren't cache lines. They are disk blocks.

Won't make Robert Myers happy.

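An aside on the arithmetic in the post above, as a minimal sketch rather than anything taken from the redbook. It checks the 2048-bit and 4X figures and estimates how much of each fetched line is actually used under purely random access. The 8-byte operand size and the function names are assumptions made for illustration only.

```python
# Line sizes as quoted in the thread.
LINE_BYTES_Z = 256    # reported line size at all cache levels
LINE_BYTES_X86 = 64   # "typical" x86 line size cited in the post

# Assumption: each random access touches one 8-byte (64-bit) operand
# and nothing else in the line is reused before eviction.
WORD_BYTES = 8


def line_bits(line_bytes: int) -> int:
    """Return the line size in bits."""
    return line_bytes * 8


def useful_fraction(line_bytes: int, word_bytes: int = WORD_BYTES) -> float:
    """Fraction of a fetched line that is used by one isolated access."""
    return word_bytes / line_bytes


print(line_bits(LINE_BYTES_Z))             # 2048
print(LINE_BYTES_Z // LINE_BYTES_X86)      # 4
print(useful_fraction(LINE_BYTES_Z))       # 0.03125
print(useful_fraction(LINE_BYTES_X86))     # 0.125
```

Under these assumptions an isolated access uses about 3% of a 256-byte line versus 12.5% of a 64-byte line, which is one way to read the "disk blocks" remark. Workloads with spatial locality would see a very different picture.
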
From: Terje Mathisen on 27 Jul 2010 11:08

Andy Glew wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>> 24 MB L3 per 4 cores
>> up to 768 MB L4
>> 256 byte line sizes at all levels.
>
> 256 *BYTE*?

Yes, that one rather screamed at me as well.

>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines. They are disk blocks.

Yes. So what?

I (and Nick, and you afair) have talked for years about how current CPUs
are just like mainframes of old:

new      old
DISK  -> TAPE : Sequential access only
RAM   -> DISK : HW-controlled, block-based transfer
CACHE -> RAM  : Actual random access, but blocks are still faster

>
> Won't make Robert Myers happy.
>
768 MB of L4 means your problem size is limited to a little less than
that, otherwise random access is out.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Niels Jørgen Kruse on 27 Jul 2010 12:37

Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:

> Andy Glew wrote:
> > On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
> >> 24 MB L3 per 4 cores
> >> up to 768 MB L4
> >> 256 byte line sizes at all levels.
> >
> > 256 *BYTE*?
>
> Yes, that one rather screamed at me as well.

Another surprising thing I spotted browsing through the redbook is the
claim of single cycle L1D access. That must be array access only, so
there are at least address generation and format cycles before and
after. Still, 3 cycle loads from a 128 KB L1D at 5.2 GHz must show up on
the power budget.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

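To put the timing remark above in absolute terms, here is a rough sketch that converts cycle counts to nanoseconds at the quoted 5.2 GHz clock and counts the 256-byte lines in a 128 KB L1D. The 3-cycle load figure is the poster's estimate, not a redbook number, and the function names are hypothetical.

```python
CLOCK_HZ = 5.2e9          # clock frequency quoted in the thread
L1D_BYTES = 128 * 1024    # 128 KB L1D as quoted
LINE_BYTES = 256          # line size as quoted


def cycles_to_ns(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles / clock_hz * 1e9


def lines_in_cache(cache_bytes: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of lines a cache of the given size holds."""
    return cache_bytes // line_bytes


print(round(cycles_to_ns(1), 3))   # 0.192 (one cycle, array access only)
print(round(cycles_to_ns(3), 3))   # 0.577 (poster's 3-cycle load estimate)
print(lines_in_cache(L1D_BYTES))   # 512
```

So the estimated load-to-use time is a little under 0.6 ns, and the L1D holds only 512 lines despite its 128 KB capacity.
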
From: Robert Myers on 27 Jul 2010 13:25

On Jul 27, 9:25 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> On 7/27/2010 6:16 AM, Niels Jørgen Kruse wrote:
>
> > <http://www.redbooks.ibm.com/redpieces/pdfs/sg247833.pdf>
>
> > Rather impressive. It is not many years ago that Z-systems lagged POWER
> > badly.
>
> > Briefly:
>
> > 5.2 GHz
> > OoO, hints at POWER4 like 5 instruction grouping
> > up to 3 instructions decode per clock
> > up to 5 instruction dispatch to functional units per clock
> > 128 KB L1D, 64 KB L1I
> > 1.5 MB L2
> > 24 MB L3 per 4 cores
> > up to 768 MB L4
> > 256 byte line sizes at all levels.
>
> 256 *BYTE*?
>
> 2048 bits?
>
> Line sizes 4X the typical 64B line size of x86?
>
> These aren't cache lines. They are disk blocks.
>
> Won't make Robert Myers happy.

So much for my hopes for Blue Waters--unless IBM has some other tricks
up its sleeve, which wouldn't surprise me.

Robert.