Prev: x86 instruction set usage-difference between windows 95 andwindows xp ?
Next: Probably a range/overflow checking bug in Delphi 2007 compiler.
From: Andy 'Krazy' Glew on 25 May 2010 03:16 http://semipublic.comp-arch.net/wiki/Transitive_Closure_Propagation == AMD Patent Application == See patent application http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=transitive.TTL.&s2=Advanced.AS.&OS=TTL/transitive+AND+AN/Advanced&RS=TTL/transitive+AND+AN/Advanced United States Patent Application 20080028193 Dhodapkar; Ashutosh S. Transitive suppression of instruction replay Filed July 31, 2006 == Other Work == Mike Haertel and I realized that replay was a dynamically unstable system as soon as we were introduced to the concept of replay, by Sager at Upton at the beginning of Willamette 1994-1996. The concept of transitive closure to suppress replay was the obvious next step. We explored concepts such as bitmasks attached to older instructions indicating what younger instructions might need to be canceled on a replay, and vice versa. Complete transitive closure is best suited to bitmask schedulers, because of the high fan-out. Incomplete transitive closure is another possibility: rather than indicating all children, or all ancestors, one might indicate, for example, the next N descendants. Indicating children can be done by iterating through the scheduler, whereas indicating ancestors tends to be best suited to CAMing. Instantaneous transitive closure cancellation is not necessary. It is only necessary that the cancellation wavefront propagate faster than the true execution wavefront. As mentioned above, listing N children or other descendants, compared to a smaller number M of immediate children. Particularly suited to schedulers of limited fan-out. Indeed, one might have a fixed size list of N children, and use any excess N-M for grandchildren, etc. In case the number of immediate descendants is too small, overflow lists, possibly with a different, flattened, dependency structure for cancellations. Some have observed that not making any special efforts to propagate the cancellation wave faster is usually okay: so long as the cancellation wave does not take cache misses. We might arrange for all of the cancellation wave operations to be unit latency, lower than true computation. It may be desirable to place the cancellation propagation in a different scheduler - aka a recovery scheduler. Hybrid systems, with bitmask transitive closure in an inner scheduler and tag oriented iterative cancellation in an outer scheduler, is a possibility. My (Glew's) multistar paper microarchitecture, http://docs.google.com/fileview?id=0B5qTWL2s3LcQZDIyZDVmN2EtYjY4MC00YjU2LWE4ZGMtYzk2MmU4M2U2NDQ5&hl=en, designed August 2004, disclosed in patent applications United States Patent Application 20070083735 Glew; Andrew Forsyth, Hierarchical processor Filed: August 29, 2005 describe replay schedulers and anti-schedulers with partial transitive closure. This was not claimed as novel in this patent application because transitive closure was already understood to be well-known. |