From: Andy Glew "newsgroup at on 6 Aug 2010 01:31

On 8/5/2010 8:18 AM, Paul A. Clayton wrote:
> On Aug 5, 9:20 am, Andy Glew<"newsgroup at comp-arch.net"> wrote:
> [snip]
> A tiny future file (e.g., 16 bits of three registers) integrated
> with an adder (or two) could be very low latency.
>
> Interestingly, most recent x86 processors do have a limited
> set of front-end registers--the segment registers--(with
> dedicated adders to create a single immediate), though a
> segment register update stalls the pipeline rather than
> allowing forward progress on independent operations.
>
>
> Paul A. Clayton
> just a technophile

Plus...

loop counter prediction

stack pointer tracking

This latter, http://www.intel.com/assets/pdf/manual/248966.pdf:

2.1.2.5 Stack Pointer Tracker

The Intel 64 and IA-32 architectures have several commonly used instructions for parameter passing and procedure entry and exit: PUSH, POP, CALL, LEAVE and RET. These instructions implicitly update the stack pointer register (RSP), maintaining a combined control and parameter stack without software intervention. These instructions are typically implemented by several μops in previous microarchitectures.

The Stack Pointer Tracker moves all these implicit RSP updates to logic contained in the decoders themselves. The feature provides the following benefits:

• Improves decode bandwidth, as PUSH, POP and RET are single μop instructions in Intel Core microarchitecture.
• Conserves execution bandwidth as the RSP updates do not compete for execution resources.
• Improves parallelism in the out of order execution engine as the implicit serial dependencies between μops are removed.
• Improves power efficiency as the RSP updates are carried out on small, dedicated hardware.

Only thing is, these are somewhat ad-hoc solutions. Not a generic future file.

By the way, the segment registers are not really implemented as a future file. They are implemented as a non-renamed or differently renamed register file, inside the AGU. Not a future file at all. Entirely read after schedule.
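
As a rough illustration of the stack pointer tracking idea quoted above, here is a toy sketch in Python. It is not Intel's implementation. It assumes the decoder keeps a small pending offset (called `delta` here) relative to the RSP value held by the out-of-order engine, and emits a synchronizing add only when an instruction reads RSP explicitly. The class name, the `READ_RSP` pseudo-instruction, and the uop strings are all hypothetical.

```python
# Toy model of a decoder-side stack pointer tracker (illustrative only).
# Assumption: the decoder holds a pending signed offset ("delta") relative
# to the RSP value in the out-of-order engine, and folds it into RSP only
# when an instruction reads RSP explicitly.

WORD = 8  # bytes pushed or popped per operation in 64-bit mode


class StackPointerTracker:
    def __init__(self):
        self.delta = 0  # pending RSP adjustment known to the decoder

    def decode(self, op):
        """Return the list of uops emitted for one instruction."""
        if op == "PUSH":
            self.delta -= WORD
            # The store address carries the offset as an immediate, so no
            # separate "sub rsp, 8" uop is issued.
            return [f"store [rsp{self.delta:+d}]"]
        if op == "POP":
            uop = f"load [rsp{self.delta:+d}]"
            self.delta += WORD
            return [uop]
        if op == "READ_RSP":
            # Hypothetical stand-in for any instruction that reads RSP
            # explicitly: synchronize first if an offset is pending.
            uops = []
            if self.delta != 0:
                uops.append(f"add rsp, {self.delta:+d}")
                self.delta = 0
            uops.append("use rsp")
            return uops
        raise ValueError(f"unknown op: {op}")


def decode_all(ops):
    """Decode a sequence of instructions with a fresh tracker."""
    tracker = StackPointerTracker()
    return [uop for op in ops for uop in tracker.decode(op)]
```

In this model each PUSH or POP decodes to a single memory uop, and the RSP arithmetic never reaches the execution units unless a synchronizing add is needed.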
From: Paul A. Clayton on 6 Aug 2010 14:49

On Aug 6, 1:31 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> On 8/5/2010 8:18 AM, Paul A. Clayton wrote:
[snip]
> > Interestingly, most recent x86 processors do have a limited
> > set of front-end registers--the segment registers--(with
> > dedicated adders to create a single immediate), though a
[snip]
> Plus...
>
> loop counter prediction

Yet I do not think such predictors actually take advantage of
counter knowledge to resolve rather than just predict
branches.

> stack pointer tracking

In a way this is like a non-speculative stride-based value
predictor.

> By the way, the segment registers are not really implemented as a future
> file. They are implemented as a non-renamed or differently renamed
> register file, inside the AGU. Not a future file at all. Entirely read
> after schedule.

Hmm. I thought the immediate was added in the front end (to reduce
the number of sources in the AGU). So can the AGUs actually use
four inputs--segment base, immediate, base register, index register?

Paul A. Clayton
just a technophile
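
For readers unfamiliar with the comparison made above, here is a minimal sketch of a generic stride-based value predictor. It is a textbook-style illustration, not a description of any shipping processor, and the class and method names are made up for this example. The predictor guesses that the next value equals the last value plus the last observed stride, which for a run of PUSHes would be a constant -8.

```python
class StridePredictor:
    """Predict the next value as last value + last observed stride."""

    def __init__(self):
        self.last = None  # most recent actual value seen
        self.stride = 0   # difference between the two most recent values

    def predict(self):
        """Return the predicted next value, or None before any update."""
        if self.last is None:
            return None
        return self.last + self.stride

    def update(self, actual):
        """Train on the value that was actually produced."""
        if self.last is not None:
            self.stride = actual - self.last
        self.last = actual
```

The difference from stack pointer tracking is that a predictor like this guesses and must be verified later, whereas the tracker knows the stride from the opcode, which is why the post calls it non-speculative.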
From: Norbert Juffa on 7 Aug 2010 06:56

"Paul A. Clayton" <paaronclayton(a)embarqmail.com> wrote in message news:be8676ab-555f-49fd-9550-7d3f8c983c22(a)o19g2000yqb.googlegroups.com...
> On Aug 6, 1:31 am, Andy Glew <"newsgroup at comp-arch.net"> wrote:
> > On 8/5/2010 8:18 AM, Paul A. Clayton wrote:
> [snip]
> > > Interestingly, most recent x86 processors do have a limited
> > > set of front-end registers--the segment registers--(with
> > > dedicated adders to create a single immediate), though a
[...]
> > By the way, the segment registers are not really implemented as a future
> > file. They are implemented as a non-renamed or differently renamed
> > register file, inside the AGU. Not a future file at all. Entirely read
> > after schedule.
>
> Hmm. I thought the immediate was added in the front end (to reduce
> the number of sources in the AGU). So can the AGUs actually use
> four inputs--segment base, immediate, base register, index register?

As I recall it, AMD's K6-family used a 4-input adder, since 16-bit operating systems using non-zero segment bases were still in common use during most of the lifetime of that processor family. The Athlon could only handle a segment base of zero at full speed; an extra cycle of delay was incurred in case of a non-zero segment base. By the time Athlon shipped, 32-bit operating systems had become common, and from what I remember they all used a flat address space (i.e. segment base of zero).

-- Norbert
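
To make the four inputs concrete, here is a small Python sketch of the address sum being discussed, assuming 32-bit addressing. The function names are hypothetical. The latency function encodes only the behaviour recalled above (one extra cycle for a non-zero segment base); the one-cycle baseline is an assumption made for illustration, not a measured figure.

```python
MASK32 = 0xFFFFFFFF  # addresses wrap at 32 bits in this sketch


def linear_address(seg_base, base, index, scale, disp):
    """Four-input add: segment base + base + index*scale + displacement."""
    return (seg_base + base + index * scale + disp) & MASK32


def agu_latency(seg_base, base_cycles=1, penalty_cycles=1):
    """Hypothetical latency model: extra cycle(s) if the segment base is non-zero."""
    return base_cycles + (penalty_cycles if seg_base != 0 else 0)
```

With a flat address space `seg_base` is always zero, so the sum reduces to three inputs and the penalty path in this model is never taken.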
From: Andy Glew "newsgroup at on 8 Aug 2010 14:48

On 8/6/2010 11:49 AM, Paul A. Clayton wrote:
> On Aug 6, 1:31 am, Andy Glew<"newsgroup at comp-arch.net"> wrote:
>> On 8/5/2010 8:18 AM, Paul A. Clayton wrote:
> [snip]
>>> Interestingly, most recent x86 processors do have a limited
>>> set of front-end registers--the segment registers--(with
>>> dedicated adders to create a single immediate), though a
> [snip]
>> Plus...
>>
>> loop counter prediction
>
> Yet I do not think such predictors actually take advantage of
> counter knowledge to resolve rather than just predict
> branches.
>
>> stack pointer tracking
>
> In a way this is like a non-speculative stride-based value
> predictor.
>
>> By the way, the segment registers are not really implemented as a future
>> file. They are implemented as a non-renamed or differently renamed
>> register file, inside the AGU. Not a future file at all. Entirely read
>> after schedule.
>
> Hmm. I thought the immediate was added in the front end (to reduce
> the number of sources in the AGU). So can the AGUs actually use
> four inputs--segment base, immediate, base register, index register?

The original P6, and some subsequent P6es, did the full 4 input add. In a single uop.

Certain subsequent Intel machines split them up into two uops.

By the way: adding in the segment base was not a problem, since that was a semi-renamed resource in the AGU. Multi-input adders are easy. The problem was getting the three non-segment components out of the OOO machine: basereg, indexreg, and immediate constant.

I believe that AMD took an extra cycle if the segbase was nonzero.