From: Skybuck Flying on 11 Feb 2010 14:42

Hello,

Soon I will attempt some GPGPU/shader development. I have already done some tests, and so far the GPU and its memory seem to be working just fine (and fast ;)). No bit errors so far. However, the thought of bit errors creeping in is a bit scary...

The hardware is older, from 2006: an NVIDIA 7900 GTX with 512 MB of RAM. (OpenGL and Cg shaders will be used for development.)

It seems this GPU works with 4 floating point fields per register, either 16 bit or 32 bit. Maybe it's always 32 bit? I am not sure... probably not, because 16 bit performance is twice as high.

I plan on using 3x16 bits, so that needs a "vector" register with 4x16 bits. This means only 48 bits are used, for example the .x, .y and .z components, which leaves .w to be used for something else.

It could be interesting to make two modes of operation for the final product:

First mode, which uses 3x16 bits in registers and memory.
Second mode, which uses 4x16 bits in registers and memory.

This would make the second mode slightly less memory efficient.

I have a feeling that the GPU is very fast and has plenty of processing power, so I might get away with adding some extra "data integrity checks" for the .x, .y, .z components and storing them in the .w component.

So the idea is basically to apply a 16 bit "data integrity check" to 48 bits of data, using Shader Model 3.0 instructions/Cg for now.

I wonder what a good data integrity checking algorithm would be to detect bit errors in this situation?

The algorithm should not use too many branches... hopefully just one branch to compare results?! ;)
It shouldn't use too many memory lookups... that would be bad for performance?! ;)

The 3x16 bits could be unpacked into 6x8 bit quantities, which are then stored in 32 bit floating point registers to do further calculations on.
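The unpacking step described above can be sketched with nothing but division, floor, multiplication and subtraction, which is the kind of arithmetic a float-only shader has to work with. This is a minimal illustration in Python rather than Cg. The function names (`to_float32`, `unpack_word`, `pack_word`, `unpack_texel`) are hypothetical, and it assumes the 16 bit values arrive as exact integers held in 32 bit floats.

```python
import math
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def unpack_word(word):
    """Split a 16-bit value into (high byte, low byte) using only
    division, floor, multiplication and subtraction."""
    hi = float(math.floor(word / 256.0))
    lo = word - hi * 256.0
    return hi, lo

def pack_word(hi, lo):
    """Recombine two 8-bit values into one 16-bit value."""
    return hi * 256.0 + lo

def unpack_texel(x, y, z):
    """Unpack three 16-bit values (.x, .y, .z) into six 8-bit values,
    all held as floats."""
    out = []
    for word in (x, y, z):
        out.extend(unpack_word(float(word)))
    return out

# Example: three 16-bit words become six bytes.
six_bytes = unpack_texel(0x1234, 0xABCD, 0xFFFF)
```

A 32 bit float has a 24 bit significand, so every integer up to 2^24 is represented exactly and the steps above lose nothing for 8 bit or 16 bit operands. The first integer that does not survive is 2^24 + 1, which `to_float32` rounds back to 2^24.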
If the algorithm works in floating point and limits itself to 8 to 16 bits of precision, then that should work just fine. Algorithms can also be done with integers; I can convert them to 16 bit floating point ;) I could even convert to 64 bit floating point, but that would require 64 bit software math and would slow things down, so better not to do that.

The following operations are definitely available for such an algorithm:

32 bit floating point addition, can be used as 8 bit or 16 bit integer addition (max 24 bit)
32 bit floating point subtraction, can be used as 8 bit or 16 bit integer subtraction (max 24 bit)
32 bit floating point multiplication, can be used as 8 bit or 16 bit integer multiplication (max 24 bit)
32 bit floating point division, can be used as 8 bit or 16 bit integer division (max 24 bit)

and of course special graphics instructions:

dot operations
interpolation operations

(But I have never used them and don't quite understand them, on the GPU at least, but I am willing to learn ;))

Some ideas in my head for now:

1. A simple, weak "checksum" where everything is summed together... seems like a very bad algorithm, since bit flips might go undetected.
2. A CRC32? But the algorithm I have requires a large table and thus memory lookups... doesn't seem too smart... and CRC32 is like a large division? Maybe overkill for just 48 bits of data?
3. I can vaguely remember something about parity? Is that the same as a checksum or different? I think that's different... parity counts the bits set and then stores that? Doesn't seem so strong?
4. I can vaguely remember an error correcting code which could correct 1 bit error? By using two parities or so? One vertical, one horizontal?
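Idea 4 is usually called two-dimensional parity. As a rough sketch, not a recommendation: if the 48 bits are laid out as 6 rows of 8 bits (the six bytes mentioned earlier), one parity bit per row plus one per column gives 6 + 8 = 14 check bits, which fits in the spare 16 bit component. The layout and the names `encode_2d_parity` and `check_2d_parity` are assumptions made for this illustration, and the code is Python with integer bit operations, not Cg.

```python
def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1

def encode_2d_parity(rows):
    """rows: six 8-bit integers. Returns a 14-bit check word:
    bits 13..8 hold one parity bit per row (rows[0] in bit 13),
    bits 7..0 hold one parity bit per column."""
    row_bits = 0
    col_bits = 0
    for r in rows:
        row_bits = (row_bits << 1) | parity(r)
        col_bits ^= r  # XOR of all rows gives the per-column parity
    return (row_bits << 8) | col_bits

def check_2d_parity(rows, check):
    """Compare data against its check word.
    Returns (status, rows) where status is one of
    'ok', 'corrected', 'check_bit_error' or 'uncorrectable'."""
    syndrome = encode_2d_parity(rows) ^ check
    if syndrome == 0:
        return "ok", list(rows)
    row_syn = syndrome >> 8
    col_syn = syndrome & 0xFF
    if bin(row_syn).count("1") == 1 and bin(col_syn).count("1") == 1:
        # Exactly one row and one column disagree: flip the bit where they cross.
        row_index = 5 - (row_syn.bit_length() - 1)
        fixed = list(rows)
        fixed[row_index] ^= col_syn
        return "corrected", fixed
    if bin(syndrome).count("1") == 1:
        # Only one parity bit disagrees: assume that check bit itself flipped.
        return "check_bit_error", list(rows)
    return "uncorrectable", list(rows)

rows = [0x12, 0x34, 0xAB, 0xCD, 0xFF, 0x00]
check = encode_2d_parity(rows)
damaged = list(rows)
damaged[2] ^= 0x10                      # flip one bit in the third byte
status, repaired = check_2d_parity(damaged, check)
```

This scheme locates and repairs any single flipped bit. Two or more flips are not reliably handled: some are reported as uncorrectable, but others can be misdiagnosed, for example one data bit plus its own row parity bit looks like a lone column check-bit error.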
So I ask you software programmers/developers, hardware designers and algorithm designers out there the following question:

What kind of error detection algorithms, or maybe even error correction algorithms, are out there that you think would be suited to this special situation?

Also, maybe you can design something specially for this situation?

(The algorithm could later also be applied to smaller amounts of data, like only 16 bits, or 32 bits, or maybe even just 8 bits, but stored in 16 bits... that would be nice.)

Bye,
Skybuck.
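One standard family that addresses the error correction part of this question is a Hamming code extended with an overall parity bit (often called SEC-DED: single error correction, double error detection). For 48 data bits it needs 6 Hamming check bits plus 1 parity bit, so 7 of the 16 spare bits. The sketch below is an illustration in Python, not Cg. The names `secded_encode` and `secded_decode` and the bit layout are assumptions chosen for this example, not something proposed in the thread.

```python
# Codeword positions 1..54; the powers of two are reserved for check bits,
# leaving exactly 48 positions for data bits.
DATA_POSITIONS = [p for p in range(1, 55) if p & (p - 1)]

def _parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1

def _hamming(data):
    """XOR together the positions of all set data bits (a 6-bit value)."""
    s = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            s ^= pos
    return s

def secded_encode(data):
    """data: 48-bit integer. Returns a 7-bit check word:
    bits 5..0 are the Hamming check bits, bit 6 is the overall parity."""
    h = _hamming(data)
    return ((_parity(data) ^ _parity(h)) << 6) | h

def secded_decode(data, check):
    """Returns (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    h_stored = check & 0x3F
    syndrome = _hamming(data) ^ h_stored
    parity_ok = (_parity(data) ^ _parity(h_stored) ^ ((check >> 6) & 1)) == 0
    if syndrome == 0 and parity_ok:
        return "ok", data
    if parity_ok:
        # Nonzero syndrome but even overall parity: two bits flipped.
        return "uncorrectable", data
    if syndrome == 0 or syndrome & (syndrome - 1) == 0:
        # The flipped bit is one of the check bits; the data is intact.
        return "corrected", data
    if syndrome in DATA_POSITIONS:
        # The syndrome names the position of the flipped data bit.
        return "corrected", data ^ (1 << DATA_POSITIONS.index(syndrome))
    return "uncorrectable", data

word = 0x1234ABCDFFFF
check = secded_encode(word)
status, repaired = secded_decode(word ^ (1 << 20), check)  # one flipped bit
```

Compared with row and column parity, this uses fewer check bits and also flags every two-bit error instead of risking a wrong repair. The price is more bit manipulation per word, which matters on hardware without native integer operations.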
From: Skybuck Flying on 11 Feb 2010 14:53

Here is an interesting thread about error detection on GPUs:

http://gpgpu.org/forums/viewtopic.php?p=18648&sid=c7bf701c2deed980c0a8745f15a630b8

The first guy says: "run twice, see if same results..."

I think this is dangerous and wasteful. If a bit is truly damaged, the same bit error might simply occur twice. It's also wasting resources big time ;)

Another guy says: people do weird things to their systems, like overclocking or not cooling properly.

Another guy says: memory chips might become warmer over time and might start producing bit errors.

So I am thinking: it's now winter, the computer is cool, thus no bit errors... but what happens in the summer when it's fricking hot? Maybe bit errors will creep in...

I will probably not be running my PC intensively during the summer, but the users of my "future" products might be... it's good to protect them from possible bit errors, methinks ;)

So I kinda like this idea of adding some bit error detection capabilities, just in case! ;) Then the user can decide whether he wants it or not by choosing the mode.

So I hope that protecting the bits like that will detect most, if not all, problems with GPU/memory corruption?

Bye,
Skybuck.
From: Skybuck Flying on 11 Feb 2010 14:57

Also, I just realized something...

CRC32 is too big: it requires 32 bits. There are only 16 bits available for storing an integrity code.

Also, it seems CRC32 does 1 memory lookup per byte, and 3x2 bytes = 6 bytes, which would mean 6 memory lookups... which is a bit much for my taste. It might be acceptable for a first version, but I would rather have something better: something that doesn't require a memory lookup and which fits in 16 bits... that would be nice! ;)

Bye,
Skybuck.
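A CRC does not need a lookup table: the table only speeds up a bit-by-bit shift-and-XOR loop, and a 16 bit CRC fits the spare component. The sketch below shows the table-free form for the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF), chosen here only as a common example. It is Python, not Cg. The helper name `protect` is hypothetical, and since Shader Model 3.0 hardware has no native integer bitwise operations, the shifts and XORs would have to be emulated with float arithmetic there.

```python
def crc16_ccitt(data, crc=0xFFFF):
    """Bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF),
    computed one bit at a time without any lookup table."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def protect(x, y, z):
    """Return the 16-bit check value for three 16-bit words (.x, .y, .z),
    which could then be stored in the spare .w component."""
    payload = bytes([x >> 8, x & 0xFF, y >> 8, y & 0xFF, z >> 8, z & 0xFF])
    return crc16_ccitt(payload)

w = protect(0x1234, 0xABCD, 0xFFFF)
# Verification is a single comparison: recompute and compare with the stored value.
intact = protect(0x1234, 0xABCD, 0xFFFF) == w
damaged = protect(0x1234, 0xABCD ^ 0x0400, 0xFFFF) == w
```

Six bytes cost 48 loop iterations instead of 6 table lookups, so this trades memory accesses for arithmetic. Like any 16 bit CRC of this form it detects every single-bit error and every error burst up to 16 bits long, and checking needs only the one branch that compares the two values.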
From: Dave -Turner on 12 Feb 2010 06:57

so use crc16
From: Skybuck Flying on 12 Feb 2010 11:13

"Dave -Turner" <admin(a)127.0.0.1> wrote in message news:suOdnXyGC-Hf3-jWnZ2dnUVZ7v-dnZ2d(a)westnet.com.au...
> so use crc16

I will try...

However, I still need an error correcting code for mode 3 (see new posting)...

Any ideas? I have 0% experience with error correcting codes, I am afraid! ;)

Bye,
Skybuck.