From: "Andy "Krazy" Glew" on 20 Feb 2010 13:55

[Skybuck posting about doing GPGPU computations, using SW error detection and correction]

>>> I have 0% experience with error correcting codes I am afraid ! ;) :)

mpm wrote:
>> Have you considered paying a consultant?

Don't discourage the kid! If Skybuck follows through on this, he's on the way to becoming an expert. Perhaps not in the academic theory, but perhaps on how to do SW ECC on GPU hardware.

I have more fun reading Skybuck's posts. Hackers rule!

For Skybuck:

a) Try googling "residue error detection". I haven't vetted these papers - they give an idea, but they are probably not the papers that talk best about how to use residues for error detection in ALUs and other computations.

Scholarly articles for residue error detection
  Redundant residue number systems for error detection ... - Etzel - Cited by 56
  Detection and tracking of point features - Tomasi - Cited by 851
  Concurrent error detection using watchdog processors - ... - Mahmood - Cited by 315

1. An Algorithm for Scaling and Single Residue Error Correction in ...
   by CC Su - 1990 - Cited by 11
   {10} R. W. Watson, "Error detection and correction and other residue-interacting operations in a redundant residue number system," Ph.D. dissertation, ...
   portal.acm.org/citation.cfm?id=101793.101802

a') The basic idea is, for every, say, 32 bit floating point computation, compute a, say, 3 bit residue, and check the residues.

b) CRCs are not so good for checking ALU integrity. CRCs are okay for checking stored data, e.g. in memory. You might also use a CRC if you do the computation twice, calculating a CRC at various intermediate points, and compare the CRCs as opposed to comparing all computation results.
(I've used exactly this in simulators, comparing a hash of the simple in-order and full out-of-order simulations.)

I.e. CRCs are for error detection when you do the computation twice, or for storage. Residues are the best known method for doing a computation once, and detecting if that single computation has an error.

c) You talked about using xyz, leaving w for ECC.

On the GPU I am most familiar with, xyzw are corresponding 32 bit fields of a 128 bit wide SIMD vector. The instruction set is SIMD, so you tend to want to do all ops on all elements of a vector, although you can mask.

Doing residue or other ECC this way would be inefficient. You really want to do the ECC in a separate computation, in a separate vector. E.g. a 4x32 vector of FP, and a 4x4 vector of residues. Or, even better, a vector of 32 32-bit FP data items (8x128 bits), with a 128 bit vector that is 32 4-bit residues. If you can do 32-wide SIMD, this would have the least overhead for residues.

However, if your computation is not this uniform...

ATI) On ATI's VLIW pipeline you may not lose so much by doing 3x32 + a residue. However, the residue ops in SW may require more instructions than the FP.

NVDA) If the xyz and w are really different scalars within a single thread of a warp, and you can get a full warp width of converged control flow, great. But even here if, for example, you did 32 bit computations, you might still be able to do eight 4-bit residue computations in the same 32 bits. Or, rather, within a thread in a warp you might be able to do eight 32 bit computations, and then a single set of 4-bit wide residue computations packed eight to a 32 bit number.
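
To make a') and the packing idea under NVDA) concrete, here is a minimal sketch in Python. It is an illustration only, not GPU code and not taken from any of the papers above. Assumptions: integer operands instead of floating point (residue checking of FP is more involved), a check modulus of 15 so that each residue fits in 4 bits, and arbitrary-precision integers so there is no wraparound to correct for. All function names are hypothetical.

```python
MOD = 15  # assumed check modulus; residues 0..14 fit in 4 bits


def residue(x: int) -> int:
    """Residue of a non-negative integer under the check modulus."""
    return x % MOD


def checked_add(a: int, ra: int, b: int, rb: int, fault: int = 0):
    """Add a and b, and independently predict the residue of the sum.

    `fault` is XORed into the main result to simulate a faulty ALU;
    the residue path is assumed to be fault-free in this sketch.
    """
    s = (a + b) ^ fault
    rs = (ra + rb) % MOD
    return s, rs


def checked_mul(a: int, ra: int, b: int, rb: int, fault: int = 0):
    """Multiply a and b, and independently predict the residue of the product."""
    p = (a * b) ^ fault
    rp = (ra * rb) % MOD
    return p, rp


def is_consistent(value: int, predicted_residue: int) -> bool:
    """True if the main result agrees with the separately computed residue."""
    return value % MOD == predicted_residue


def pack_residues(residues):
    """Pack up to eight 4-bit residues into one 32-bit word, lowest index first."""
    if len(residues) > 8:
        raise ValueError("at most eight 4-bit residues fit in 32 bits")
    word = 0
    for i, r in enumerate(residues):
        word |= (r & 0xF) << (4 * i)
    return word


def unpack_residues(word: int, count: int = 8):
    """Inverse of pack_residues."""
    return [(word >> (4 * i)) & 0xF for i in range(count)]
```

The computation is done once. The check compares the result's residue against a residue computed separately from the operands' residues. A single flipped bit changes the result by plus or minus a power of two, and no power of two is a multiple of 15, so this modulus always catches a single-bit flip in the main result. On fixed-width hardware the wraparound on overflow would need extra handling that this sketch leaves out.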