From: Scott Lurndal on 22 Mar 2010 14:59

"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
> I can't reply to this post with quoting turned off. I always
> reply point for point, but with quoting turned off it would
> be too difficult to see who said what. Is there any way that
> you can repost your reply with quoting turned on?

Learn to use a better news client. There is no such thing as quoting
being "turned off", other than the perfectly legitimate quoting that
David had already provided in his post (hint: the Usenet RFCs allow
one or more leading '>' symbols to denote quoting).

scott
From: Scott Lurndal on 22 Mar 2010 15:00

David Schwartz <davids(a)webmaster.com> writes:
> On Mar 21, 9:29 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>
>> It seems you may have missed this point: machine A and
>> machine C are given to have identical processors, and the
>> ONLY difference between them is that machine C has much
>> faster access to RAM than machine A.
>
> You have previously said: "The new machine's CPU is only 11% faster
> than the prior machine."

And that seems to have been based purely on clock speed. Of course
that doesn't include microarchitectural and superscalar improvements.

scott
From: Peter Olcott on 22 Mar 2010 15:25

"Scott Lurndal" <scott(a)slp53.sl.home> wrote in message
news:c7Ppn.5$xs7.4(a)news.usenetserver.com...
> David Schwartz <davids(a)webmaster.com> writes:
>> On Mar 21, 9:29 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>>
>>> It seems you may have missed this point: machine A and
>>> machine C are given to have identical processors, and the
>>> ONLY difference between them is that machine C has much
>>> faster access to RAM than machine A.
>>
>> You have previously said: "The new machine's CPU is only 11% faster
>> than the prior machine."
>
> And that seems to have been based purely on clock speed. Of course
> that doesn't include microarchitectural and superscalar improvements.
>
> scott

Oh right, I forgot about these sorts of things. Basically more
instructions per clock cycle.
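(To put a number on that, with an IPC figure that is purely an
assumption for illustration, not something measured in this thread:
if the new machine's clock is 1.11x the old one and its core also
retires, say, 1.3x as many instructions per clock, the effective
CPU-bound speedup is roughly

    1.11 (clock) * 1.30 (IPC) = 1.44

i.e. about 44% faster overall, even though the clock alone is only
11% faster.)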
From: Chris Friesen on 22 Mar 2010 16:11

On 03/21/2010 10:18 PM, David Schwartz wrote:
> Now imagine you turn it into two threads, one doing this:
> X, X+1, X+2, X+3
> and one doing this:
> Y, Y+1, Y+2, Y+3
>
> Now, the prefetcher (still seeing only one read ahead) will see the
> read for X+1 when it processes the read for X. The net result will be
> that the two threads will run about twice as fast with the same memory
> hardware, even though they are purely memory limited.

I was under the impression that the hardware prefetcher was independent
of threads of execution, in which case this wouldn't make any
difference. Are you aware of CPUs which tie the prefetcher to execution
context?

Also, you are probably aware of this, but for the benefit of other
readers: generally, on modern processors the prefetcher can track
several prefetch streams simultaneously.

Chris
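(A minimal C sketch of the two-thread pattern David describes above;
the array size, names, and pthreads framing are illustrative
assumptions, not anything from the thread. Each thread streams
sequentially through its own half of one large array, so each core's
prefetcher sees a single contiguous stream of reads.)

/* Sketch only: names and sizes are invented for illustration.
 * Two threads each stream through their own half of one large
 * array, so each core's prefetcher sees a single contiguous
 * stream of reads.  Compile with: cc -O2 -pthread split.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N ((size_t)1 << 24)     /* 16M longs, ~128 MB: beats the cache */

static long *data;

struct range { size_t lo, hi; long sum; };

static void *scan(void *arg)
{
    struct range *r = arg;
    long sum = 0;
    for (size_t i = r->lo; i < r->hi; i++)
        sum += data[i];         /* sequential reads: prefetcher-friendly */
    r->sum = sum;
    return NULL;
}

int main(void)
{
    data = malloc(N * sizeof *data);
    if (!data) return 1;
    for (size_t i = 0; i < N; i++)
        data[i] = (long)i;

    struct range r[2] = { { 0, N / 2, 0 }, { N / 2, N, 0 } };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, scan, &r[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("sum = %ld\n", r[0].sum + r[1].sum);
    free(data);
    return 0;
}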
From: David Schwartz on 22 Mar 2010 16:42
On Mar 22, 1:11 pm, Chris Friesen <cbf...(a)mail.usask.ca> wrote:
> I was under the impression that the hardware prefetcher was independent
> of threads of execution, in which case this wouldn't make any
> difference. Are you aware of CPUs which tie the prefetcher to execution
> context?

The prefetcher is a per-core construct and only sees the flow of
memory operations on that particular core. Two cores means two
prefetchers, each seeing half of the operations.

> Also, you are probably aware of this, but for the benefit of other
> readers: generally, on modern processors the prefetcher can track
> several prefetch streams simultaneously.

Right. The example was a huge oversimplification. More likely, there
will be a small number of expensive memory operations interleaved with
a large number of cheap (from cache) memory operations. The issue is
how often the prefetcher will be able to merge fetches in the
instruction stream that could be merged.

It's easy to see in an artificial example like this (where the Fn are
fast operations):

X, F1, F2, F3, X+1, F4, F5, F6

The prefetcher might not see the 'X+1' when it processes the 'X'. But
if two cores wind up with one doing:

X, F2, X+1, F5, ...

the prefetcher is more likely to merge the X and X+1 fetches. When the
prefetcher issues the fetch for X, a window opens up for the duration
of that fetch during which a fetch for X+1 is much less expensive than
it would ordinarily be. Whether the prefetcher sees that fetch in that
window or not will depend on how many instructions, and how many
fetches, are between the fetch for X and the fetch for X+1. (And other
factors.)

DS
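(Again a sketch only, with invented constants: one read from a huge
array is interleaved with several cache-hitting reads, mimicking the
X, F1, F2, F3, X+1, ... pattern above. Splitting the outer loop across
two threads, as in the earlier sketch, halves the number of cheap
operations standing between fetches to adjacent data on each core.)

/* Sketch only; constants and names are assumptions for illustration.
 * The big[] reads play the role of X, X+1, ...; the small[] reads,
 * which always hit in cache, play the role of F1..F8 and separate
 * the expensive fetches in the instruction stream. */
#define _POSIX_C_SOURCE 199309L      /* for clock_gettime */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BIG   ((size_t)1 << 25)  /* 32M longs, ~256 MB: defeats the cache */
#define SMALL ((size_t)1 << 10)  /* 1K longs, 8 KB: always cache-resident */

int main(void)
{
    long *big = malloc(BIG * sizeof *big);
    static long small[SMALL];
    if (!big) return 1;
    for (size_t i = 0; i < BIG; i++)   big[i] = (long)i;
    for (size_t i = 0; i < SMALL; i++) small[i] = (long)i;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    long sum = 0;
    for (size_t i = 0; i < BIG; i++) {
        sum += big[i];                           /* the X, X+1, ... reads */
        for (int k = 0; k < 8; k++)              /* the cheap F1..F8 reads */
            sum += small[(i + k) & (SMALL - 1)];
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("sum=%ld  elapsed=%.3fs\n", sum, secs);
    free(big);
    return 0;
}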