Prev: Learn about proxy sites and how to use them to open blocked sites unlimited downloads from RapidShare and megaupload and increase the speed of the Internet with new sites for free
Next: Why can't I catch a SIGBUS on my Mac (when strlen(NULL) is used)?
From: Peter Olcott on 5 Apr 2010 11:49 "Ersek, Laszlo" <lacos(a)caesar.elte.hu> wrote in message news:Pine.LNX.4.64.1004051406400.9882(a)login01.caesar.elte.hu... > On Mon, 5 Apr 2010, Paul wrote: > >> Sorry to interrupt the thread. You had a question about >> variability in performance of a small test program. (Your >> "memory bandwidth" test...) >> >> Maybe you've figured it out already. It isn't a problem >> with TLB performance. The notion of a random walk is >> wrong. > >> In other words, for about the first (roughly) 4143 steps, >> the program is doing a random walk. Then, the array walk >> gets stuck in a repeating, 23 step loop. > >> In any event, the random walk is not random. It is >> getting stuck in a (short) loop. The TLB might still be a >> factor, but you'll need a better test design, to evaluate >> that. > > Would you comment on the code herein? > > http://groups.google.com/group/comp.os.linux.development.apps/msg/aaf49b54b2501740?dmode=source > > Thanks, > lacos uint32 GetRandom(uint32 size) { return (rand() * (RAND_MAX + 1) + rand()) % size; } The only thing that I care about is making this function produce sequences of values that are more widely distributed. I want to force a cache busting algorithm on memory accesses.
From: Ersek, Laszlo on 5 Apr 2010 14:13 On Mon, 5 Apr 2010, Peter Olcott wrote: > "Paul" <nospam(a)needed.com> wrote in message > news:hpc817$9kn$1(a)news.eternal-september.org... >> Sorry to interrupt the thread. You had a question about variability in >> performance of a small test program. (Your "memory bandwidth" test...) >> >> Maybe you've figured it out already. It isn't a problem with TLB >> performance. The notion of a random walk is wrong. >> >> Your array was initialized with this. >> >> Random = rand() * rand(); >> Random %= size; >> Data[N] = Random; >> >> And tested for a billion cycles with this. >> >> for (uint32 N = 0; N < Max; N++) >> num = Data[num]; >> >> (In my previous posting, I'd missed the fact, that you were doing a >> constant number of steps in this loop, and varying the array size. In >> which case, the time to execute this walk should be constant, as long >> as the time to do the 1 billion memory accesses was the same.) > > No. The time should (and does) generally increase because of decreasing > cache spatial locality of reference. The random generator was already > modified: > > uint32 GetRandom(uint32 size) { > return (rand() * (RAND_MAX + 1) + rand()) % size; > } > > This is still not random enough but was not worth the time to fix on > this quick and dirty little proof of concept program. Simply run the > test ten times on every memory size and ignore the fast times. I'm looking at the source in <http://groups.google.ca/group/comp.os.linux.development.apps/msg/0d869538b2dc4d39?dmode=source> as if Initialize() would call the GetRandom() function above to populate Data[N]. IMO the problem is not the quality of the PRNG. If you were simply accessing array elements at offsets returned by the PRNG, the emergent pattern might perfectly well suit your needs. You are, however, placing "random pointers" at increasing offsets in Initialize(), then traverse this linked list in Process(). No matter how good and uniformly distributed the PRNG is, this may (and as Nicolas has shown, *will*) result in a short cycle somewhere. Suppose m, n, k are pairwise distinct, and x[m] == k and x[n] == k. The chain starting at x[k] will lead to either m or n (supposing it doesn't lead to an even shorter loop). Whichever comes first, the other one won't be reached anymore. Supposing n is the one reached no more, any node of any path leading to n isn't reached either. Perhaps picking the worst performing one of ten subsequent test runs could be shown to be "good enough", but I'm not confident. lacos
From: Peter Olcott on 5 Apr 2010 14:49 "Ersek, Laszlo" <lacos(a)caesar.elte.hu> wrote in message news:Pine.LNX.4.64.1004051935200.27152(a)login01.caesar.elte.hu... > On Mon, 5 Apr 2010, Peter Olcott wrote: > >> "Paul" <nospam(a)needed.com> wrote in message >> news:hpc817$9kn$1(a)news.eternal-september.org... > >>> Sorry to interrupt the thread. You had a question about >>> variability in performance of a small test program. >>> (Your "memory bandwidth" test...) >>> >>> Maybe you've figured it out already. It isn't a problem >>> with TLB performance. The notion of a random walk is >>> wrong. >>> >>> Your array was initialized with this. >>> >>> Random = rand() * rand(); >>> Random %= size; >>> Data[N] = Random; >>> >>> And tested for a billion cycles with this. >>> >>> for (uint32 N = 0; N < Max; N++) >>> num = Data[num]; >>> >>> (In my previous posting, I'd missed the fact, that you >>> were doing a constant number of steps in this loop, and >>> varying the array size. In which case, the time to >>> execute this walk should be constant, as long as the >>> time to do the 1 billion memory accesses was the same.) >> >> No. The time should (and does) generally increase because >> of decreasing cache spatial locality of reference. The >> random generator was already modified: >> >> uint32 GetRandom(uint32 size) { >> return (rand() * (RAND_MAX + 1) + rand()) % size; >> } >> >> This is still not random enough but was not worth the >> time to fix on this quick and dirty little proof of >> concept program. Simply run the test ten times on every >> memory size and ignore the fast times. > > I'm looking at the source in > <http://groups.google.ca/group/comp.os.linux.development.apps/msg/0d869538b2dc4d39?dmode=source> > as if Initialize() would call the GetRandom() function > above to populate Data[N]. > > IMO the problem is not the quality of the PRNG. If you > were simply accessing array elements at offsets returned > by the PRNG, the emergent pattern might perfectly well > suit your needs. You are, however, placing "random > pointers" at increasing offsets in Initialize(), then > traverse this linked list in Process(). No matter how good > and uniformly distributed the PRNG is, this may (and as > Nicolas has shown, *will*) result in a short cycle > somewhere. Not if you force every access to be at (least size * 0.25) distance from the prior access. > > Suppose m, n, k are pairwise distinct, and x[m] == k and > x[n] == k. The chain starting at x[k] will lead to either > m or n (supposing it doesn't lead to an even shorter > loop). Whichever comes first, the other one won't be > reached anymore. Supposing n is the one reached no more, > any node of any path leading to n isn't reached either. > > Perhaps picking the worst performing one of ten subsequent > test runs could be shown to be "good enough", but I'm not > confident. > > lacos
From: Nicolas George on 5 Apr 2010 15:09 "Peter Olcott" wrote in message <j-qdnXbpHJorrSfWnZ2dnUVZ_jWdnZ2d(a)giganews.com>: > Not if you force every access to be at (least size * 0.25) > distance from the prior access. Distance has nothing to do with it. You can cycle between the first and last element of the array, that makes size - epsilon, but it still is a very small cycle.
From: Peter Olcott on 5 Apr 2010 15:43 "Nicolas George" <nicolas$george(a)salle-s.org> wrote in message news:4bba35de$0$29842$426a74cc(a)news.free.fr... > "Peter Olcott" wrote in message > <j-qdnXbpHJorrSfWnZ2dnUVZ_jWdnZ2d(a)giganews.com>: >> Not if you force every access to be at (least size * >> 0.25) >> distance from the prior access. > > Distance has nothing to do with it. You can cycle between > the first and last > element of the array, that makes size - epsilon, but it > still is a very > small cycle. I am trying to find algorithm that makes cache as ineffective as possible. I want to get an accurate measure of the worst case performance of my deterministic finite automaton. The only thing that I can think of is to make sure that each memory access is more than max cache size away from the prior one. This should eliminate spatial locality of reference.
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Prev: Learn about proxy sites and how to use them to open blocked sites unlimited downloads from RapidShare and megaupload and increase the speed of the Internet with new sites for free Next: Why can't I catch a SIGBUS on my Mac (when strlen(NULL) is used)? |