From: Anton Ertl on 16 Dec 2009 11:23

We recently got a new server that uses the Xeon 3460 (2.8GHz Lynnfield, like a Core i7-860). We use it with an Intel S3420GPLC server board. The other components relevant for power consumption are one 2GB DDR3 ECC DIMM (the working configuration will have two 4GB DIMMs, but that only adds a few watts), two spinning 3.5" SATA hard disks, an idle DVD-ROM, and a Corsair CMPSU-400CX 80+ power supply.

Turbo Boost was enabled, but apparently did not work in our setup (Debian Lenny, Linux 2.6.26): our LaTeX benchmark ran at the same speed at all loads up to 4. We also left SMT ("hyperthreading") enabled.

The power consumption of the whole box at different loads (generated with "yes >/dev/null") is:

  load      0    1    2    3    4    5    6    7    8
  2800MHz  53W  80W  96W  118W 138W 145W 149W 151W 155W

It's interesting that using SMT contexts (at loads 5-8) costs additional power; not as much as running a core, but a measurable amount. The other remarkable thing is that the idle power is quite low compared to other single-socket servers we have: 103W for a system with a Xeon 3070 (2.66GHz, Core 2 Duo-like), and 83W for a system with an Athlon 64 X2 4400+.

How does SMT affect performance? We varied the number of running "yes" processes and measured our LaTeX benchmark <http://www.complang.tuwien.ac.at/anton/latex-bench/>. When we started the LaTeX benchmark concurrently with 0 or 3 yes processes, we saw user (and real) times of around 0.484s. When we ran it concurrently with 4 or 7 yes processes, we saw user (and real) times of around 0.756s. I.e., we get a slowdown by a factor <1.6, whereas without SMT we would have seen a real-time slowdown by a factor of 2 (but no change in user time). So SMT gives a significant benefit in this setup. On the Atom the same setup resulted in a slowdown by a factor >2.3 <2009Sep8.131554(a)mips.complang.tuwien.ac.at>, so there SMT is a disadvantage.
As usual, this is just a single data point and your setup will be different, so YMMV.

- anton
--
M. Anton Ertl                      Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at  Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
From: Bengt Larsson on 19 Dec 2009 16:49

anton(a)mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>How does SMT affect performance? We varied the number of running
>"yes" processes and measured our LaTeX benchmark
><http://www.complang.tuwien.ac.at/anton/latex-bench/>. When we
>started the LaTeX benchmark concurrently with 0 or 3 yes processes, we
>saw user (and real) times of around 0.484s. When we ran it
>concurrently with 4 or 7 yes processes, we saw user (and real) times
>of around 0.756s. I.e., we get a slowdown by a factor <1.6, whereas
>without SMT we would have seen a real-time slowdown by a factor of 2
>(but no change in user time). So SMT gives a significant benefit in
>this setup. On the Atom the same setup resulted in a slowdown by a
>factor >2.3 <2009Sep8.131554(a)mips.complang.tuwien.ac.at>, so there SMT
>is a disadvantage.

I don't think "yes >/dev/null" is a good way to test SMT. It's a process that entirely hogs the CPU, but only on the integer side. A normal process accesses memory and the second-level cache now and then.

I have an Atom, and I tested with a parallel make (of an editor, mg2a, in C). With all the files in memory, the make takes 14.4 seconds; with make -j (make -j 3 or 4 seems the most efficient) it takes 10.7 seconds. That is an improvement of 30-35 percent.

In fact, even for an integer-hogging process you will get different results depending on how many execution units can be used in parallel, or which execution units. SMT has its uses, but it makes measurement hard.

I love that Atom has SMT, though. That was a brilliant decision. For example, when watching YouTube, there are two main threads using CPU, and 2-3 small ones, so the multithreading really should help. Unfortunately I can't turn off the SMT, so I can't test that. (This is on an Acer Aspire One.)
From: Bengt Larsson on 19 Dec 2009 17:19

Bengt Larsson <bengtl8.net(a)telia.NOSPAMcom> wrote:
>I have an Atom, and I tested with a parallel make (of an editor,
>mg2a, in C). With all the files in memory, the make takes 14.4
>seconds; with make -j (make -j 3 or 4 seems the most efficient) it
>takes 10.7 seconds. That is an improvement of 30-35 percent.

Actually that is a bit stupid, since it improves beyond 2 threads. With two threads, I get 11.3 seconds, an improvement of 27%.
From: Bengt Larsson on 19 Dec 2009 17:38

Here is a fun test:

    int i;
    double f = 0.0;
    for (i = 0; i < limit; i++) {
        f += 1.0;
    }

Improvement in throughput with two threads = 100%! The floating-point add should take 5 cycles on the Atom, and indeed the loop runs very close to 5 cycles per iteration. This also illustrates the "limited slip" in the Atom, where integer instructions can bypass long-running floating-point ones.
From: nedbrek on 20 Dec 2009 12:04

Hello all,

"Bengt Larsson" <bengtl8.net(a)telia.NOSPAMcom> wrote in message news:a9kqi5tp99eana4uoc2r9d0l998gpuu21g(a)4ax.com...
> Bengt Larsson <bengtl8.net(a)telia.NOSPAMcom> wrote:
>
>>I have an Atom, and I tested with a parallel make (of an editor,
>>mg2a, in C). With all the files in memory, the make takes 14.4
>>seconds; with make -j (make -j 3 or 4 seems the most efficient) it
>>takes 10.7 seconds. That is an improvement of 30-35 percent.
>
> Actually that is a bit stupid, since it improves beyond 2 threads.
> With two threads, I get 11.3 seconds, an improvement of 27%.

I usually do a "make -j N", where N = cores * 1.5 or 2. Compiling often gets stuck on disk (even if the source is in memory, and the final output is in memory [ramdisk?], are all the temporary outputs in memory? What about statically linked libs?).

Ned