Prev: Lolling at programmers, how many ways are there to create a bitmask ? ;) :)
Next: A naive conjecture about computer architecture and artificial intelligence
From: MitchAlsup on 6 Jul 2010 22:31

On Jul 6, 6:31 pm, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:
> Neither does well on pointer chasing
>   loop
>      ptr = ptr->next
> or on hash table chasing
>   loop
>      h = hash(h+data[i])

The DRAM controller on my last design actually did data structures like this with amazing clairvoyance. It turns out that the benchmark built, then traversed, a very large data structure. After setup, the data structure did not change "all that much" and left a clear pattern of +1,+1,-3 <repeat> that our DRAM prefetcher latched onto, eliminating the DRAM latency. One still paid the on-die latencies of the memory controller and DRAM controller, however.

Call Indirect and Jump Indirect tables, with and without hashing, are generally pretty good given sufficient area to contain the footprint and a little cleverness from the designers tuning the hash. Here, I think, it is much more worthwhile for the software people to choose reasonable data structures to contain the indirect tables, and then, over time, let the HW guys stumble over things to optimize. So, without SW doing anything special, all of a sudden it just gets faster. AND it loses none of its obviousness or RAS. Not so with the "special creative bent of the moment" stuff.

Stride 1 and Stride N stuff is so 1970. If you can't get these right, you might consider playing golf for a living..... Gather/Scatter is where it is at today--yet I see so few that even make a quality attempt here.

Mitch
From: Andy 'Krazy' Glew on 7 Jul 2010 11:56

On 7/6/2010 5:49 PM, jacko wrote:
> JMP (PC+(SP++))
>
> Is this an instruction worth having? SP is return stack pointer, for
> forth people this would be RP.

Get the syntax right. I think you mean

  JMP (PC+(*SP++))

As in

  tmp := load M[SP]
  sp += sizeof(tmp)
  tmp := PC + tmp
  jmp (tmp)
From: MitchAlsup on 7 Jul 2010 12:01

On Jul 6, 7:49 pm, jacko <jackokr...(a)gmail.com> wrote:
> JMP (PC+(SP++))
>
> Is this an instruction worth having? SP is return stack pointer, for
> forth people this would be RP.

In the PDP-11 one had access to:

  CALL (sp)+

which would use the SP to access the TOS (and pop it off), then jump to that location, and push the return PC back onto the stack. They used to call this a co-routine, and one could build some amazing protocol machines using this technique. In effect, the state of the other side of the protocol was kept in the return address instead of a memory variable. So, when you called it back, you ended up at the code that was appropriate for the state at hand.

Mitch
From: Andy 'Krazy' Glew on 7 Jul 2010 12:12

On 7/6/2010 7:31 PM, MitchAlsup wrote:
> On Jul 6, 6:31 pm, Andy 'Krazy' Glew<ag-n...(a)patten-glew.net> wrote:
>> Neither does well on pointer chasing
>>   loop
>>      ptr = ptr->next
>> or on hash table chasing
>>   loop
>>      h = hash(h+data[i])
>
> The DRAM controller on my last design actually did data structures like
> this with amazing clairvoyance. It turns out that the benchmark built,
> then traversed, a very large data structure. After setup, the data
> structure did not change "all that much" and left a clear pattern of
> +1,+1,-3 <repeat> that our DRAM prefetcher latched onto and eliminated
> the DRAM latency. One still paid the on-die latencies of the memory
> controller and DRAM controller, however.

Yes, Mitch. But the benchmark in question was contrived. Run the program in the benchmark for a while, and you did not get this neat pattern. The linked list became randomized.

I.e. your algorithm did not work as well in real life as it did in the benchmark, even for the program in the benchmark.

However: your point is valid (becomes valid) when you have a garbage collector that tends to reorganize data so that certain pointer chases are allocated sequentially.
From: Ira Baxter on 8 Jul 2010 11:11
"Andy 'Krazy' Glew" <ag-news(a)patten-glew.net> wrote in message news:4C34A662.9080505(a)patten-glew.net...
> On 7/7/2010 12:30 AM, Terje Mathisen wrote:
>> Andy 'Krazy' Glew wrote:
>
> I meant to add: my recommendation in the P6 code generator's guide was to
> issue
>
>   CALL (reg) // indirect
>   CALL constant address of most likely method
>
> This way, even if the indirect prediction did not work (P6 didn't have a
> fancy indirect predictor, just a BTB that predicted that the indirect
> branch went the same place as it did last time), you would at least have
> branched to whatever the compiler thought was the most likely method. So
> likely some prefetch would have worked.
>
> You can think of this as a hammock, convergent code, with the divergence
> occupying one instruction.
>
> Unfortunately, it required the called function to adjust the return
> address, which did not fly ABI-wise.

I've often wanted to control where the return location of a subroutine was without doing a jump. A way to do this:

  MOV rn, =returnlocation
  JMP subroutine
  ....
subroutine:
  PUSH rn
  ....
  RET

(One can also just leave the return address in the register if the called procedure is a small leaf, but I'm interested in the PUSH-on-stack case.) One additional nice advantage is that it is easy to build in-line parameter lists.

If you can do this, the double call above is possible without return address adjustment :-}

  MOV rn, =returnlocation
  JMP (reg)
  JMP constantaddress
returnlocation:
  ...

This raises register pressure some, but that's probably OK. But it has a worse defect on modern machines. The hidden return stack used by branch predictors doesn't see a return address being PUSHed, just some data. So when the called routine returns, you get a mispredict because the shadow return stack is out of sync.

To solve this problem, I've wished I had a PUSHRETURN <operand> to signal that a return address is being pushed, so as to help the shadow return stack predictor. (On the x86 one could easily add this with a prefix byte on push instructions.)

Why isn't this a good idea, and why haven't we seen it in x86 machines? I think I saw something similar for the 370 architecture. Is it a patent issue? Or just not worth the silicon according to somebody's study?

Ira Baxter
Semantic Designs