Subject: Re: DRAM accommodations
From: blockedofcourse (at) *nospam* foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Date: 17 Sep 2024, 15:57:39
Organization: A noiseless patient Spider
Message-ID: <vcc5d8$3igif$1@dont-email.me>
User-Agent : Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
On 9/17/2024 6:47 AM, Chris Jones wrote:
> On 6/09/2024 8:54 am, Don Y wrote:
>> Given the high rate of memory errors in DRAM, what steps
>> are folks taking to mitigate the effects of these?
>>
>> Or, is ignorance truly bliss? <frown>
>
> Do we know whether DRAM chips implement ECC internally?
Some do (some processors implement ECC on internal data pathways!).
But, I've never seen any details of the mechanism(s) employed and
it's not likely that manufacturers would be eager to release those
details (competitive advantage, leaks information about how good their
process is, how close to their technological capacity they are
operating, etc.).
> It seems an obvious thing for them to do. Of course it wouldn't help with
> bad solder joints on the DIMM, but it would help with many kinds of faults
> on the chip.
It also won't help with errors in transfers between the CPU and the memory
device, subtle timing errors in the implementation, etc.
But, you have to remember: ECC isn't a panacea.
- It doesn't correct *all* errors (e.g., classic SECDED corrects only
single-bit errors)
- It can MIScorrect errors (e.g., three flipped bits can look like one)
- It doesn't DETECT all errors (e.g., SECDED is only guaranteed to detect
up to TWO bit errors; for k-bit data there are 2^k valid code words, and
a corruption that turns the stored code word into any of the OTHER
2^k - 1 of them looks perfectly "correct" yet is an UNDETECTABLE error),
etc.
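To make the SECDED points concrete, here is a minimal sketch (Python for
brevity, not something you'd run on a target) of an extended Hamming (8,4)
code: 4 data bits in an 8-bit code word. The bit layout and the names
encode/decode/count_undetectable are my own illustrative choices; real DIMM
ECC uses much wider words (e.g., 64 data bits plus 8 check bits), and any
on-die scheme a vendor uses is undisclosed.

```python
# Extended Hamming (8,4) SECDED sketch: bit 0 holds overall parity,
# Hamming positions 1, 2, 4 hold check bits, positions 3, 5, 6, 7 hold data.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(data: int) -> int:
    """Encode a 4-bit value into an 8-bit SECDED code word."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1            # overall (even) parity
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                # XOR of the positions of set bits
    overall = sum(bits) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: ASSUME exactly one bit flipped and flip it back.
        # Syndrome 0 here means the overall parity bit itself was hit.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: two bits flipped.
        return "uncorrectable", None
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, data


def count_undetectable(data: int) -> int:
    """Count error patterns that turn encode(data) into another valid word."""
    stored = encode(data)
    count = 0
    for mask in range(1, 256):
        status, value = decode(stored ^ mask)
        if status == "ok" and value != data:
            count += 1
    return count


stored = encode(0b1011)
print(decode(stored ^ 0b00000100))  # ('corrected', 11)  one flip: fixed
print(decode(stored ^ 0b00000110))  # ('uncorrectable', None)  two: flagged
print(decode(stored ^ 0b00001110))  # ('corrected', 10)  three: MIScorrected
print(count_undetectable(0b1011))   # 15, i.e. 2**4 - 1
```

Note that the three-bit case comes back flagged as "corrected", with wrong
data, and that 15 of the 255 nonzero error patterns sail through as "ok".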
There is also often a cost to the ECC operation in terms of time,
power consumption, etc.
And, if you hide the functioning of the ECC inside the memory device,
then the application has no way of gauging how well the memory is
performing with/without the ECC functionality! You never know if the
ECC is only occasionally fixing stored data OR if it is fixing EVERY
access! (in the latter case, one should be wary of the number of
mistakes it is possibly making as well as the number of undetectable
errors that are slipping past it!)
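If the ECC does report its activity (Linux's EDAC subsystem, for instance,
exposes corrected and uncorrected error counts from supported memory
controllers), the application can at least keep score. A minimal sketch,
assuming some hypothetical source hands you a per-access status of "ok",
"corrected" or "uncorrectable"; EccMonitor and its alarm threshold are
illustrative, not any real API:

```python
from dataclasses import dataclass


@dataclass
class EccMonitor:
    """Hypothetical bookkeeping for ECC events reported per access."""
    accesses: int = 0
    corrected: int = 0
    uncorrectable: int = 0
    max_corrected_rate: float = 1e-6  # ASSUMED alarm threshold; tune per product

    def record(self, status: str) -> None:
        """Tally one access and its ECC outcome."""
        self.accesses += 1
        if status == "corrected":
            self.corrected += 1
        elif status == "uncorrectable":
            self.uncorrectable += 1
        elif status != "ok":
            raise ValueError(f"unknown ECC status: {status!r}")

    def corrected_rate(self) -> float:
        """Fraction of accesses on which the ECC had to fix something."""
        return self.corrected / self.accesses if self.accesses else 0.0

    def healthy(self) -> bool:
        # Any uncorrectable error, or corrections on more than the allowed
        # fraction of accesses, means the ECC is being leaned on too hard
        # (and is then more likely to be miscorrecting or missing errors).
        return (self.uncorrectable == 0
                and self.corrected_rate() <= self.max_corrected_rate)


monitor = EccMonitor(max_corrected_rate=0.01)
for status in ["ok"] * 98 + ["corrected"] * 2:
    monitor.record(status)
print(monitor.corrected_rate(), monitor.healthy())  # 0.02 False
```

The threshold is the real design decision: ECC that fixes something on
nearly every access is exactly the case above where miscorrections and
undetected errors are likely riding along unseen.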
Needless to say, there is a lot of research into alternative ECC
schemes that try to address different aspects of DRAM faults and
failures. But, naively expecting DRAM to store what you write to
it is a fairy tale. So, you should have, in place, a strategy to
address those likely failures in your product design (or, just
blame it on "the software" :> )
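As one example of such a strategy (my own illustration, not something
prescribed above): periodically re-check data that should never change
against a stored CRC, and keep critical variables in triplicate with a
bitwise majority vote on every read. The names crc_ok and TripleStored are
hypothetical, Python stands in for what would be C on the target, and ideally
the three copies would live in physically separate rows or devices so that
one fault can't hit all of them.

```python
import zlib


def crc_ok(region: bytes, expected_crc: int) -> bool:
    """Re-check a region that should never change (code, constant tables)."""
    return zlib.crc32(region) == expected_crc


class TripleStored:
    """Keep three copies of a critical word; repair by bitwise majority vote."""

    def __init__(self, value: int):
        self.copies = [value, value, value]

    def read(self) -> int:
        a, b, c = self.copies
        voted = (a & b) | (a & c) | (b & c)  # each bit: at least 2 of 3 agree
        self.copies = [voted, voted, voted]  # scrub: rewrite all copies
        return voted

    def write(self, value: int) -> None:
        self.copies = [value, value, value]


table = bytearray(b"calibration constants")
expected = zlib.crc32(table)
table[3] ^= 0x01                 # simulate a single flipped bit
print(crc_ok(table, expected))   # False

word = TripleStored(0x1234)
word.copies[1] ^= 0x0400         # one copy takes a hit
print(hex(word.read()))          # 0x1234
```

The CRC only tells you *that* something changed (reload it from flash, log
it, reset); the vote actually repairs a single bad copy, at the cost of
three times the storage and a slower read.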