From: Skybuck Flying on 12 Feb 2010 11:12

> For me it could be interesting to make two modes of operations for the
> final product:
>
> First mode which uses 3x16 bits in registers and memory.
> Second mode which uses 4x16 bits in registers and memory.

I want to expand on these modes.

There would be three modes:

1. "Casual/Normal mode" without error checking
2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
3. "Cry like a baby mode/Error correcting mode" with ??? error correction.

Mode 1 would give maximum speed.
Mode 2 would be paranoid mode to detect if hardware fails.
Mode 3 would allow people with damaged hardware to still do some decent calculations.

Also mode 3 would allow the program/product to simply continue as good as possible, while mode 2 would break off which is kinda harsh but ok that's what mode 3 is for if you want to continue as good as possible ;) :)

The remaining question is therefore:

What error correcting code to use for mode 3 ?

Bye,
Skybuck.
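As a rough illustration of mode 2, here is a minimal sketch that stores three 16-bit data words plus a CRC-16 in the fourth word. The post only says "crc16", so the variant used here (CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF) is an assumption, and the names `crc16_ccitt`, `pack_checked` and `verify` are made up for this example.

```python
import struct


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). Assumed variant."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pack_checked(w0: int, w1: int, w2: int) -> tuple:
    """Return the 4x16-bit layout: three data words followed by their CRC."""
    payload = struct.pack(">3H", w0, w1, w2)
    return (w0, w1, w2, crc16_ccitt(payload))


def verify(words: tuple) -> bool:
    """Recompute the CRC over the data words and compare with the stored one."""
    w0, w1, w2, stored = words
    return crc16_ccitt(struct.pack(">3H", w0, w1, w2)) == stored


words = pack_checked(0x1234, 0x5678, 0x9ABC)
damaged = (words[0] ^ 0x0004,) + words[1:]  # flip one bit in the first word
print(verify(words), verify(damaged))
```

A CRC only detects the damage; it does not say which bit is wrong, which is why mode 3 needs a different code.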
From: rickman on 12 Feb 2010 16:19

On Feb 12, 11:12 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com> wrote:
> > For me it could be interesting to make two modes of operations for the
> > final product:
> >
> > First mode which uses 3x16 bits in registers and memory.
> > Second mode which uses 4x16 bits in registers and memory.
>
> I want to expand on these modes.
>
> There would be three modes:
>
> 1. "Casual/Normal mode" without error checking
> 2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
> 3. "Cry like a baby mode/Error correcting mode" with ??? error correction.
>
> Mode 1 would give maximum speed.
> Mode 2 would be paranoid mode to detect if hardware fails.
> Mode 3 would allow people with damaged hardware to still do some decent
> calculations.
>
> Also mode 3 would allow the program/product to simply continue as good as
> possible, while mode 2 would break off which is kinda harsh but ok that's
> what mode 3 is for if you want to continue as good as possible ;) :)
>
> The remaining question is therefore:
>
> What error correcting code to use for mode 3 ?
>
> Bye,
> Skybuck.

A Hamming code will allow you to correct 1-bit errors and, when extended with an overall parity bit, detect all 2-bit errors. It has been a long time since I worked with that, and in fact it was for DRAM back in the days when DRAM was not as reliable as it is now (and much smaller). I don't recall the exact number of bits used, but I think 8 bits or fewer were used to protect a 32-bit word. The calculations are easily done in hardware, but I don't think it is quite so easy to do fast in software.

Each bit of the code is a parity computed over about half the bits of the data. Each code bit looks at the data in a different way, e.g. bit 0 looks at each odd bit, bit 1 looks at every other pair of bits, etc. For 48 bits of data, I think you need a 6-bit code word for single-error correction.
A single-bit error causes the recomputed code to differ from the stored code in some combination of bits. XORing the two gives a value that points to the bad bit, which can then be corrected; this works for errors in the Hamming code bits as well as in the data.

To detect two-bit errors, an extra check bit is added that sums parity over the entire word, indicating whether the number of errors is even or odd. This lets you detect an even number of errors, which can then be flagged without correction. Attempting a correction when there is more than one error mostly produces even more bit errors.

This does not use many gates in hardware and can be very fast. In the machine I worked on it was done on the fly on every memory operation. Doing it in software would require a number of operations, since the bits have to be shifted around to get the different parity combinations.

On the other hand, parity on memory is pretty much extinct. Why do you think you need to run on-the-fly memory checks rather than just an extensive power-up memory self-test?

Rick
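The scheme described above can be sketched in software roughly as follows. This is a textbook extended Hamming layout (check bits at positions 1, 2, 4, 8, 16, 32, overall parity at position 0), written for clarity rather than speed; the bit layout and the names `encode` and `decode` are assumptions for illustration, not something taken from the DRAM hardware mentioned in the post. With 48 data bits it needs 6 + 1 check bits, so the 55-bit result fits in 4x16 bits.

```python
DATA_BITS = 48
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32]
CODE_LEN = DATA_BITS + len(PARITY_POSITIONS)  # positions 1..54
# Data bits occupy every position that is not a power of two.
DATA_POSITIONS = [p for p in range(1, CODE_LEN + 1) if p & (p - 1)]


def _syndrome(code: int) -> int:
    """XOR together the positions (1..54) of all set bits."""
    s = 0
    for pos in range(1, CODE_LEN + 1):
        if (code >> pos) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """Spread 48 data bits over the code word and fill in the check bits."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << pos
    s = _syndrome(code)
    for p in PARITY_POSITIONS:  # set check bits so the syndrome becomes 0
        if s & p:
            code |= 1 << p
    if bin(code).count("1") & 1:  # bit 0: make the overall parity even
        code |= 1
    return code


def decode(code: int):
    """Return (data, status); data is None if the word cannot be repaired."""
    s = _syndrome(code)
    odd = bin(code).count("1") & 1
    status = "ok"
    if odd:  # odd number of flipped bits: assume one and repair it
        if s > CODE_LEN:
            return None, "uncorrectable"
        code ^= 1 << s  # s == 0 means the overall parity bit itself
        status = "corrected"
    elif s:  # even parity but nonzero syndrome: two bits flipped
        return None, "double"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            data |= 1 << i
    return data, status


word = encode(0x123456789ABC)
print(decode(word ^ (1 << 17)))  # one flipped bit
print(decode(word ^ 0b110))      # two flipped bits
```

The per-bit loops show why a plain software version is slow; a faster version would precompute one 64-bit mask per check bit and take the parity of `code & mask`.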
From: Skybuck Flying on 12 Feb 2010 19:40

The document I linked to in another sub thread had something extra/interesting:

It suggested using a number of extra bits to do a parity check on the memory address ?! To check/test that the data did indeed come from the correct memory cell.

Because if the bits on the address wires screw up, it would fetch from the wrong memory address, but the result, the memory/data, would still pass the tests with flying colors ;) :)

So extra bits would be needed to protect against that.

I haven't completely read the document yet... so I am a bit fuzzy on how to do that... it probably computes a parity bit over the address bits... and then simply stores that parity bit with the data... then when it's fetched the cpu could check it... against the address it has in its cpu registers.

For me, I have 16 bits available... The Hamming code would need 6 to 7 bits or so... that means plenty of bits are left to do something else...

I was thinking... maybe do some extra Hamming code on maybe the reversed bits or so ?

Or maybe even better...: use those extra bits to do a full Hamming code on the address bits and store that as well... it's possible methinks... very interesting idea, since this would protect against memory corruption and address line corruption.

I wonder... is there any other corruption ?

The extra memory checks are just sanity/paranoia checking... it's very important that the results can be trusted :)

One document mentioned that 3 out of 300 GPUs had problems, but these were "beta" products.

Also this is the first time I will be using GPUs and their memory, so I want to be sure that nothing weird is going on... now or in the future... it's also a "feel good thing"... "user doesn't have to worry about it"... and users with bad hardware are protected as well and might be able to get some results even with bad hardware ;)

And finally it's always interesting to learn new programming techniques ;) :) and make products more reliable if wanted :)

I can add it to my toolbelt of skills ! ;) :)

Bye,
Skybuck.
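To make the address-parity idea concrete, here is a small sketch under simple assumptions: memory is modelled as a Python dict, a faulty address line is simulated by passing a different `actual_addr` than the one requested, and the names `parity`, `store` and `load` are invented for this example. It is not taken from the document mentioned in the post.

```python
def parity(x: int) -> int:
    """Even/odd parity of the set bits in x (0 or 1)."""
    return bin(x).count("1") & 1


def store(mem: dict, addr: int, data: int) -> None:
    """Store the data together with one parity bit computed over its address."""
    mem[addr] = (data, parity(addr))


def load(mem: dict, requested_addr: int, actual_addr: int):
    """Fetch from actual_addr (what the wires selected) and check the stored
    address parity against the address the CPU meant to read."""
    data, tag = mem[actual_addr]
    return data, tag == parity(requested_addr)


mem = {}
for addr, value in ((0x10, 111), (0x11, 222), (0x13, 333)):
    store(mem, addr, value)

print(load(mem, 0x10, 0x10))  # correct cell
print(load(mem, 0x10, 0x11))  # one address bit wrong
print(load(mem, 0x10, 0x13))  # two address bits wrong
```

A single parity bit only catches an odd number of wrong address bits, so the last case slips through. That gap is what a full Hamming code over the address bits would close.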