From: nedbrek on 28 Jun 2010 10:14

Hello all,

"Andy 'Krazy' Glew" <ag-news(a)patten-glew.net> wrote in message
news:4C2627FD.9030100(a)patten-glew.net...
> On 6/25/2010 5:51 PM, mac wrote:
> > The Pin interface, http://www.pintool.org/, may be a good start.

I meant to post a status update on my search for x86 performance
simulators...

I looked at PTLsim (and a related project, MARSS). I then realized that
every simulator is just:

    while(1)
    {
        commit()
        exe()
        schedule()
        dispatch()
        fetch()
    }

The interesting points are really in the architectural model, the memory
system, and the system emulation. These are the hard part, and the part you
must be most familiar with (to understand the impact on your other ideas).

Thus, there is really no point in trying to reuse an existing infrastructure
(since you need total knowledge, you must rewrite it to understand it).

So, I started on my own arch model, with intentions of developing the system
model...

In the meantime, I'd like to do some simple studies. In this case, I think
a lighter weight system would be good.

Looking at Pin, I think I can throw together a DFA-like simulator pretty
quickly...

I should be back soon...

Ned
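As an illustration of why the loop in the post above calls the stages back to front (what the reply below calls the "reverse pipeline" model), here is a minimal sketch in C. It assumes one latch between each pair of the five stages and one item per stage per cycle; `Latch`, `NLATCH`, and `reverse_cycle` are hypothetical names, not taken from PTLsim, MARSS, or any other simulator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: four latches sit between the five stages
   fetch | dispatch | schedule | exe | commit. */
#define NLATCH 4

typedef struct {
    bool valid;  /* true when the latch holds an item */
    int  id;     /* identifier of the item in flight  */
} Latch;

/* Simulate one cycle, evaluating the stages back to front.
   Returns the id committed this cycle, or -1 if nothing committed. */
int reverse_cycle(Latch latch[NLATCH], int fetch_id)
{
    int committed = -1;

    assert(fetch_id >= 0);  /* -1 is reserved for "nothing committed" */

    /* commit(): drain the last latch first. */
    if (latch[NLATCH - 1].valid) {
        committed = latch[NLATCH - 1].id;
        latch[NLATCH - 1].valid = false;
    }

    /* exe(), schedule(), dispatch(): because the downstream latch was
       already emptied this cycle, each item advances exactly one stage. */
    for (int s = NLATCH - 1; s > 0; s--) {
        if (latch[s - 1].valid && !latch[s].valid) {
            latch[s] = latch[s - 1];
            latch[s - 1].valid = false;
        }
    }

    /* fetch(): inject a new item if the first latch is free. */
    if (!latch[0].valid) {
        latch[0].valid = true;
        latch[0].id = fetch_id;
    }
    return committed;
}
```

Calling `reverse_cycle` once per simulated cycle moves every item forward by one stage per call, so in this sketch no item can pass through a stage in zero cycles.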
From: Andy 'Krazy' Glew on 28 Jun 2010 09:53

On 6/28/2010 7:14 AM, nedbrek wrote:
> I then realized that
> every simulator is just:
> while(1)
> {
>     commit()
>     exe()
>     schedule()
>     dispatch()
>     fetch()
> }

Not quite.

What you have above is the so-called "reverse pipeline" model.
Particularly if every iteration of the outer loop corresponds to a cycle.

If so, then such a simulator cannot model pipelines that have 0 cycles
through any such pipestage.

Now, while at the moment we tend to assume that traditional RISC 5-stage
pipelines are the shortest pipelines likely, some of us (me, at least) like
being able to model eliminating the schedule pipestage, etc.

To accomplish this, we connect the pipestages by queues or buffers (not
necessarily in order), and timestamp queue entries with the earliest
possible time that an entry can be consumed.

This leads to

    for every cycle
        fetch(q1)
        dispatch(q1,q2)
        schedule(q2,q3)
        exe(q3,q4)
        commit(q4)

or

    for every cycle
        while cycle not done
            fetch(q1)
            dispatch(q1,q2)
            schedule(q2,q3)
            exe(q3,q4)
            commit(q4)

and, in general, the pipeline network is represented by a datastructure, not
by code, allowing arbitrary order of evaluation of pipestages. The better
simulators sort the pipestages for efficient evaluation.

> The interesting points are really in the architectural model, the memory
> system, and the system emulation. These are the hard part, and the part you
> must be most familiar with (to understand the impact on your other ideas).
>
> Thus, there is really no point in trying to reuse an existing infrastructure
> (since you need total knowledge, you must rewrite it to understand it).
>
> So, I started on my own arch model, with intentions of developing the system
> model...
>
> In the meantime, I'd like to do some simple studies. In this case, I think
> a lighter weight system would be good.
>
> Looking at Pin, I think I can throw together a DFA-like simulator pretty
> quickly...
>
> I should be back soon...

Amen.

You don't want a simulator. You want a library of simulator components, and a toolbox of different simulator frameworks.
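
The timestamped-queue scheme described in this post could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code of any real simulator: the queues here are plain in-order FIFOs (the post notes they need not be), every stage is the same generic "move ready entries and add a latency" step, and all names (`Entry`, `Queue`, `stage_step`, `run_cycle`, `QCAP`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 16  /* hypothetical queue capacity */

/* An in-flight item, stamped with the earliest cycle at which the
   consuming stage may take it. */
typedef struct {
    int  id;
    long ready_cycle;
} Entry;

/* Fixed-capacity FIFO connecting two pipestages (in-order for simplicity). */
typedef struct {
    Entry  slot[QCAP];
    size_t head, count;
} Queue;

bool q_push(Queue *q, int id, long ready_cycle)
{
    if (q->count == QCAP) return false;
    size_t tail = (q->head + q->count) % QCAP;
    q->slot[tail].id = id;
    q->slot[tail].ready_cycle = ready_cycle;
    q->count++;
    return true;
}

/* Pop the head entry only if its timestamp has been reached. */
bool q_pop_ready(Queue *q, long now, Entry *out)
{
    if (q->count == 0 || q->slot[q->head].ready_cycle > now) return false;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

/* Generic stage: move every ready entry from `in` to `out`, stamping it
   with now + latency. A latency of 0 models an eliminated pipestage. */
int stage_step(Queue *in, Queue *out, long now, long latency)
{
    int moved = 0;
    Entry e;
    while (out->count < QCAP && q_pop_ready(in, now, &e)) {
        bool pushed = q_push(out, e.id, now + latency);
        assert(pushed);  /* space was checked in the loop condition */
        (void)pushed;
        moved++;
    }
    return moved;
}

/* "While cycle not done": re-evaluate the stages until nothing moves, so
   zero-latency entries can cross several stages within one cycle.
   q[] must hold nstages + 1 queues; q[nstages] is the final sink. */
void run_cycle(Queue q[], const long latency[], int nstages, long now)
{
    int moved;
    do {
        moved = 0;
        for (int s = 0; s < nstages; s++)
            moved += stage_step(&q[s], &q[s + 1], now, latency[s]);
    } while (moved > 0);
}
```

Because `run_cycle` repeats until no stage makes progress, the result of a cycle does not depend on the order in which the stages are visited in this sketch; a different order only changes how many passes are needed.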
From: MitchAlsup on 28 Jun 2010 11:55

On Jun 28, 8:53 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:
> and, in general, the pipeline network is represented by a datastructure, not by code, allowing arbitrary order of
> evaluation of pipestages. The better simulators sort the pipestages for efficient evaluation.

I advocate actively pursuing the random ordering of pipestage
evaluation. This randomization exposes microarchitectural race
conditions.

Thus one might put the pipe stages in an array, and then randomize the
array before each clock cycle, such as:

    /* assumed signature: each stage is a function taking a CPU and a stage index */
    typedef void (*PipeStage)( struct Cpu *cpu, int stage );

    PipeStage pipestages[] =
    { fetch, decode, stations, execute, cache, writeback, update };

    #define NUMSTAGES (sizeof pipestages / sizeof pipestages[0])

    PipeStage random[ NUMSTAGES ];

    memcpy( random, pipestages, sizeof pipestages );  /* arrays cannot be assigned in C */
    while( FOREVER )
    {
        randomize( random, NUMSTAGES );
        for( cpu = 0; cpu < CPUs; cpu++ )
            for( stage = 0; stage < NUMSTAGES; stage++ )
                random[stage]( &CPU[cpu], stage );
    }

randomize can start out as the null randomizer and be advanced when
the rest of the simulator is ready. Just swapping two elements at a
time is completely sufficient to stumble upon these microarchitectural
race conditions as long as you do not backtrack, and you have a good
random number generator. Sometimes, you will want a nonrandom number
generator to direct the randomization goals, and you should be sure to
test the pipeline in the straight forward and straight backward
directions.

{Don't forget to also randomize the order of the memory hierarchy and
southbridge components, and any other sub-system in different clock
domains.}

Mitch
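
The `randomize` step mentioned in the post above is not shown there. One possible shape for it, as a sketch only: the code below permutes an array of stage indices rather than function pointers, and uses a small xorshift32 generator so that a failing order can be replayed from its seed. The names `xorshift32`, `randomize_null`, `randomize_swap2`, and `reverse_order` are hypothetical, and the choice of generator is an assumption rather than anything the post specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic generator (xorshift32). The state must be nonzero.
   Chosen here only so that a run can be reproduced from its seed. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Null randomizer: leaves the evaluation order untouched. */
void randomize_null(size_t order[], size_t n, uint32_t *rng)
{
    (void)order; (void)n; (void)rng;
}

/* Swap two randomly chosen slots. Called once per cycle, the order
   drifts through many permutations over a long run. */
void randomize_swap2(size_t order[], size_t n, uint32_t *rng)
{
    if (n < 2) return;
    size_t a = xorshift32(rng) % n;
    size_t b = xorshift32(rng) % n;
    size_t tmp = order[a];
    order[a] = order[b];
    order[b] = tmp;
}

/* Nonrandom helper: reverse the order, for the "straight backward" run. */
void reverse_order(size_t order[], size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        size_t tmp = order[i];
        order[i] = order[n - 1 - i];
        order[n - 1 - i] = tmp;
    }
}
```

A simulator loop would then call the stage at `order[i]` for `i = 0 .. n-1` each cycle. Starting with `randomize_null` and switching to `randomize_swap2` later matches the post's suggestion of beginning with the null randomizer.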
From: nedbrek on 29 Jun 2010 07:11

Hello all,

"MitchAlsup" <MitchAlsup(a)aol.com> wrote in message
news:77b7636d-0ed3-4758-8ff8-d9beb1965c18(a)c33g2000yqm.googlegroups.com...
> On Jun 28, 8:53 am, Andy 'Krazy' Glew <ag-n...(a)patten-glew.net> wrote:
>> and, in general, the pipeline network is represented by a datastructure,
>> not by code, allowing arbitrary order of evaluation of pipestages. The
>> better simulators sort the pipestages for efficient evaluation.
>
> I advocate actively pursuing the random ordering of pipestage
> evaluation. This randomization exposes microarchitectural race
> conditions.

That's an interesting approach. I feel it's too close to the RUU (unless I
am misunderstanding). I don't like having timestamps, except for debugging.
I am of the camp "let there be a software structure for each hardware
structure, and code for logic" (although parts of the memory and i/o system
often devolve into timestamped queues, due to the enormous latencies).

For IPFsim, we had a nice infrastructure (using factories) to instantiate
scheduler and execution frameworks. It supported in-order (for our McKinley
comparisons), P3, P4, and HSW.

Most of the debugging I've done is through testing the extremities of knobs
(open everything up and graph the performance, look for outliers), stepping
through code, and looking at execution traces (Ed Grochowski wrote a nice
tool for visualizing them, called Pipedream - it was this tool which helped
convert him to the out-of-order faith).

If there was one crazy new idea I'd want, it's the ability to run time
backwards. I can't count the number of times I was tracking down a bug, and
stepped one cycle too far!

Ned
From: Muzaffer Kal on 29 Jun 2010 10:27
On Tue, 29 Jun 2010 06:11:52 -0500, "nedbrek" <nedbrek(a)yahoo.com> wrote:
>
>If there was one crazy new idea I'd want, it's the ability to run time
>backwards. I can't count the number of times I was tracking down a bug, and
>stepped one cycle too far!

Isn't this as easy as keeping the last N cycle/instruction states and
reloading one of them?
--
Muzaffer Kal
DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com
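
The "keep the last N states and reload" idea could look something like the ring buffer below. This is a sketch under a strong assumption: that the whole simulator state fits in one plainly copyable struct. `SimState`, `History`, `HISTORY`, `history_save`, and `history_rewind` are hypothetical names, and a real simulator with a large memory model would likely need something cheaper than full copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY 8  /* hypothetical N: number of snapshots kept */

/* Hypothetical stand-in for the complete simulator state. */
typedef struct {
    long cycle;
    int  pc;
    int  regs[4];
} SimState;

/* Ring buffer holding the most recent HISTORY snapshots. */
typedef struct {
    SimState saved[HISTORY];
    size_t   next;   /* slot the next snapshot will be written to */
    size_t   count;  /* number of valid snapshots, at most HISTORY */
} History;

/* Record a snapshot, overwriting the oldest one once the buffer is full. */
void history_save(History *h, const SimState *s)
{
    assert(h != NULL && s != NULL);
    h->saved[h->next] = *s;
    h->next = (h->next + 1) % HISTORY;
    if (h->count < HISTORY) h->count++;
}

/* Reload the snapshot taken `back` saves ago (0 = most recent) into *s.
   Snapshots newer than the restored one are discarded.
   Returns false if the history does not reach that far back. */
bool history_rewind(History *h, size_t back, SimState *s)
{
    assert(h != NULL && s != NULL);
    if (back >= h->count) return false;
    size_t idx = (h->next + HISTORY - 1 - back) % HISTORY;
    *s = h->saved[idx];
    h->next = (idx + 1) % HISTORY;
    h->count -= back;
    return true;
}
```

Calling `history_save` at the end of every simulated cycle and `history_rewind(&h, 1, &state)` from the debugger would then step back one cycle, as long as no more than `HISTORY` cycles have passed since the point of interest.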