From: Skybuck Flying on
> For me it could be interesting to make two modes of operations for the
> final product:
>
> First mode which uses 3x16 bits in registers and memory.
> Second mode which uses 4x16 bits in registers and memory.

I want to expand on these modes.

There would be three modes:

1. "Casual/Normal mode" without error checking
2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
3. "Cry like a baby mode/Error correcting mode" with ??? error correction.

Mode 1 would give maximum speed.
Mode 2 would be paranoid mode to detect if hardware fails.
Mode 3 would allow people with damaged hardware to still do some decent
calculations.

Also, mode 3 would allow the program/product to simply continue as well as
possible, while mode 2 would break off, which is kinda harsh, but OK, that's
what mode 3 is for if you want to continue as well as possible ;) :)
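
As a rough illustration of mode 2, here is a minimal sketch of the 4x16
layout: three 16-bit data words plus one 16-bit CRC word. The post only says
"crc16", so the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value
0xFFFF) is an assumption, and the function names are made up for this
example.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (assumed variant: poly 0x1021, init 0xFFFF)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pack_checked(words):
    """Append a CRC word to three 16-bit data words (the 4x16 layout)."""
    payload = b"".join(w.to_bytes(2, "big") for w in words)
    return list(words) + [crc16_ccitt(payload)]


def verify_checked(words4):
    """Mode 2 behaviour: return the data words, or raise on a CRC mismatch."""
    *data, stored = words4
    payload = b"".join(w.to_bytes(2, "big") for w in data)
    if crc16_ccitt(payload) != stored:
        raise ValueError("CRC mismatch: memory or transfer error detected")
    return data
```

A CRC only detects the error, which is why this mode has to break off: it
cannot say which bit went bad.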

The remaining question is therefore:

Which error-correcting code should be used for mode 3 ?

Bye,
Skybuck.


From: rickman on
On Feb 12, 11:12 am, "Skybuck Flying" <IntoTheFut...(a)hotmail.com>
wrote:
> > For me it could be interesting to make two modes of operations for the
> > final product:
>
> > First mode which uses 3x16 bits in registers and memory.
> > Second mode which uses 4x16 bits in registers and memory.
>
> I want to expand on these modes.
>
> There would be three modes:
>
> 1. "Casual/Normal mode" without error checking
> 2. "Paranoid/Strong Error Checking mode" with crc16 error checking.
> 3. "Cry like a baby mode/Error correcting mode" with ??? error correction..
>
> Mode 1 would give maximum speed.
> Mode 2 would be paranoid mode to detect if hardware fails.
> Mode 3 would allow people with damaged hardware to still do some decent
> calculations.
>
> Also mode 3 would allow the program/product to simply continue as good as
> possible, while mode 2 would break off which is kinda harsh but ok that's
> what mode 3 is for if you want to continue as good as possible ;) :)
>
> The remaining question is therefore:
>
> What error correcting code to use for mode 3 ?
>
> Bye,
>   Skybuck.

A Hamming code will allow you to correct 1-bit errors and, with one extra
parity bit, detect 2-bit errors. It has been a long time since I worked
with that, and in fact it was for DRAM back in the days when DRAM was not
as reliable as it is now (and much smaller). I don't recall the exact
number of bits used, but I think 8 bits or fewer were used to protect a
32-bit word. The calculations are easily done in hardware, but I don't
think it is quite so easy to do fast in software. Each bit of the code is
a parity computed over about half the bits of the data. Each bit looks at
the data in a different way, e.g. bit 0 looks at every odd bit, bit 1
looks at every other pair of bits, etc. For 48 bits of data, I think you
need a 6-bit code word. A single-bit error will cause the recomputed code
to differ from the stored code in some combination of bits; XORing the two
gives a value that points to the bad bit, which can then be corrected,
including errors in the Hamming code itself. To detect two-bit errors, an
extra check bit is added that sums parity over the entire word, indicating
whether the number of errors is even or odd. This lets you detect even
numbers of errors, which can then be flagged without correction.
Attempting correction when there is more than one error mostly produces
even more bit errors.
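
The scheme described above can be sketched in software roughly as follows.
This is the textbook layout (check bits at the power-of-two positions of a
54-bit codeword, plus one overall parity bit), not necessarily what the DRAM
controller mentioned above did. All names are made up for the example.

```python
# Codeword positions 1..54: powers of two hold check bits, the rest hold data.
DATA_POSITIONS = [p for p in range(1, 55) if p & (p - 1) != 0]  # 48 entries


def _parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def _position_xor(data):
    """XOR together the codeword positions of every set data bit."""
    s = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            s ^= pos
    return s


def encode(data):
    """Return 7 check bits: 6 Hamming bits plus overall parity in bit 6."""
    check = _position_xor(data)
    overall = _parity(data) ^ _parity(check)
    return check | (overall << 6)


def decode(data, check7):
    """Return (status, data), correcting a single flipped bit if possible."""
    check = check7 & 0x3F
    syndrome = _position_xor(data) ^ check
    parity_ok = (_parity(data) ^ _parity(check)) == ((check7 >> 6) & 1)
    if syndrome == 0 and parity_ok:
        return "ok", data
    if parity_ok:
        # Non-zero syndrome but even overall parity: two bits flipped.
        return "double", data
    if syndrome & (syndrome - 1) == 0:
        # Zero or a power of two: a check/parity bit flipped, data is intact.
        return "corrected", data
    if syndrome in DATA_POSITIONS:
        return "corrected", data ^ (1 << DATA_POSITIONS.index(syndrome))
    return "uncorrectable", data  # three or more errors
```

With 48 data bits this needs 6 + 1 = 7 check bits, so it fits easily in a
spare 16-bit word. The per-bit loop is the slow part in software; a real
implementation would more likely use precomputed masks and a popcount.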

This does not use many gates in hardware and can be very fast. In the
machine I worked on it was done on the fly on every memory operation.
To do it in software would require a number of operations since it
requires the bits to be shifted around to get the different parity
combinations.

On the other hand, parity on memory is pretty much extinct. Why do
you think you need to run on-the-fly memory checks rather than just an
extensive memory self-test at power-up?

Rick


From: Skybuck Flying on
The document I linked to in another sub thread had something
extra/interesting:

It suggested using a number of extra bits to do a parity check on the
memory address ?! To check/test whether the data indeed came from the
correct memory cell.

Because if the bits on the address wires got corrupted, it would fetch from
the wrong memory address, but the result, the memory/data, would still
pass the tests with flying colors ;) :)

So extra bits would be needed to protect against that. I haven't completely
read the document yet... so I am a bit fuzzy on how to do that... it probably
computes a parity bit over the address bits... and then simply stores that
parity bit with the data... then when it's fetched the cpu could check
that... against the address it has in its registers.
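
A toy sketch of that guess, assuming one parity bit per cell: the parity of
the address is stored next to the value, and on a fetch it is compared
against the address the requester asked for. The `CheckedMemory` class and
the `line_fault` parameter are invented here to simulate flipped address
wires; they do not come from the linked document.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


class CheckedMemory:
    """Toy memory that stores one address-parity bit next to each value."""

    def __init__(self):
        self.cells = {}

    def store(self, address, value):
        self.cells[address] = (value, parity(address))

    def load(self, address, line_fault=0):
        # line_fault simulates address wires flipping on the way to memory.
        physical = address ^ line_fault
        value, tag = self.cells[physical]
        # Compare the stored tag with the address the requester asked for.
        if tag != parity(address):
            raise ValueError("address parity mismatch: wrong cell fetched")
        return value
```

A single flipped address line changes the parity and is caught. Two flipped
lines leave the parity unchanged and slip through, which is the argument for
spending more than one bit on the address.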

For me I have 16 bits available... It would need 6 to 7 bits or so... that
means plenty of bits are left to do something else... I was thinking...
maybe do some extra hamming code on maybe the reversed bits or so ?

Or maybe even better...: use those extra bits to do a full hamming code on
the address bits and store that in it as well... it's possible me thinks...
very interesting idea, since this would protect against memory corruption
and address lines corruption.
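
A quick back-of-the-envelope check of whether both codes fit in the spare
16 bits. The 48 data bits come from the 3x16 layout; the 32-bit address
width is an assumption, since the thread does not state it. Only the address
check bits need storing, because the address itself is already known to
whoever does the fetch.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


DATA_BITS = 48      # 3 x 16-bit words, from the thread
ADDRESS_BITS = 32   # assumption: address width is not given in the thread
SPARE_BITS = 16     # the fourth 16-bit word

# +1 for the overall parity bit that adds double-error detection on the data.
data_cost = hamming_check_bits(DATA_BITS) + 1
addr_cost = hamming_check_bits(ADDRESS_BITS)
remaining = SPARE_BITS - data_cost - addr_cost

print(data_cost, addr_cost, remaining)  # prints: 7 6 3
```

Under these assumptions both codes fit, with 3 bits left over.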

I wonder... is there any other corruption ?

The extra memory checks are just sanity/paranoia checking... it's very
important that the results can be trusted :) One document mentioned 3 out of
300 GPUs had problems, but these were "beta" products.

Also this is the first time I will be using GPUs and their memory so I want
to be sure that nothing weird is going on... now or in the future... it's
also a "feel good thing"... "user doesn't have to worry about it"... and
users with bad hardware are protected as well and might be able to get some
results even with bad hardware ;)

And finally it's always interesting to learn new programming techniques ;)
:) and make products more reliable if wanted :) I can add it to my toolbelt
of skills ! ;) :)

Bye,
Skybuck.