Subject : Re: Efficiency of in-order vs. OoO
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 10. Mar 2024, 20:31:10
Organisation : Rocksolid Light
Message-ID : <e8c9669118816848755cdd643e330295@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
User-Agent : Rocksolid Light
EricP wrote:
MitchAlsup1 wrote:
Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
Paul A. Clayton wrote:
>
For memory reads, the late failure generated by an uncorrectable
ECC error would probably have to be handled differently or there
would probably be little opportunity to exploit out-of-order
retirement. It might not be entirely unreasonable to treat such as
a fatal thread error that is asynchronous.
>
What about memory stores where the ECC check on the delivered data
fails? This seems just as fatal as a LD with an ECC failure.
As most stores are posted, the data stored needs to be 'poisoned'
so that any subsequent use of the data (e.g. a load) will report
a fault.
Storing the bad <arriving> ECC should take care of that.
I don't think that will always work. Assuming we are using a
72-bit SECDED ECC (64 data bits plus 8 check bits) and a cache line
is read with a double error, then if the ST overwrites an 8-byte
aligned value it will generate a new valid ECC and correct the error.
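The full-overwrite case above can be illustrated with a small sketch. It assumes a textbook (72,64) extended Hamming code; the bit layout and the names `encode`, `decode`, and `DATA_POS` are illustrative choices, not taken from the thread or from any particular memory controller. It shows a double error being detected, and then an aligned 8-byte store replacing both the data and the check bits so the error disappears.

```python
# Sketch of a (72,64) extended Hamming SECDED code. The bit layout is an
# assumption for illustration, not any particular controller's code.

# Codeword bit 0 is the overall parity bit; bits 1..71 form Hamming(71,64)
# with check bits at the power-of-two positions and data bits elsewhere.
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]  # 64 data positions


def syndrome(word: int) -> int:
    """XOR of the positions (1..71) of all set bits; 0 for a valid codeword."""
    syn = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syn ^= pos
    return syn


def extract(word: int) -> int:
    """Gather the 64 data bits back out of a 72-bit codeword."""
    return sum(((word >> pos) & 1) << i for i, pos in enumerate(DATA_POS))


def encode(data: int) -> int:
    """Build a 72-bit codeword from 64 data bits."""
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
    syn = syndrome(word)
    for k in range(7):                 # set check bits so the syndrome is 0
        if (syn >> k) & 1:
            word |= 1 << (1 << k)
    if bin(word).count("1") & 1:       # overall parity: make popcount even
        word |= 1
    return word


def decode(word: int):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    syn = syndrome(word)
    parity_odd = bin(word).count("1") & 1
    if syn == 0 and not parity_odd:
        return "ok", extract(word)
    if parity_odd and syn <= 71:       # single-bit error at position syn
        return "corrected", extract(word ^ (1 << syn))
    return "uncorrectable", extract(word)  # double error: data untrustworthy


old = 0x0123456789ABCDEF
stored = encode(old) ^ (1 << 5) ^ (1 << 40)   # inject a double-bit error
print(decode(stored)[0])

# An aligned 8-byte ST supplies all 64 data bits, so fresh check bits are
# computed from the new data alone and the old error is gone.
new = 0xDEADBEEFCAFEF00D
stored = encode(new)
print(decode(stored)[0])
```

The point of the sketch is only that the new check bits depend on nothing from the old doubleword, which is why the full-width store is the easy case.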
For my scenario to transpire, the cache line written back would have
had to be read in the L1/L2 cache with correct ECC (which accompanies
the line to the DRAM controller), and the whole line would be written
into DRAM with the original ECC.
However, if the ST is less than 8 bytes or misaligned, it won't know which
of the 8 bytes was invalid, so it can't tell if the bad data was overwritten.
If it keeps the old ECC as an error indicator, that code might actually be
correct for the new data. If it generates a new valid ECC then it loses
track of the fact that the data MAY be invalid.
Even uncacheable DRAM is accessed line-at-a-time.
In this second case of partial overwrite I think it has to generate a
new invalid ECC for the new 8 byte data indicating a double error.
It knows which DoubleWords contain bad ECC ...
When the modified line is written back to DRAM it retains the
double error ECC.
Straight from the CPU cache.
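
The store and write-back rules discussed in this exchange can be summarized in a rough behavioral sketch. Everything here is an assumption made for illustration: a 64-byte line of eight doublewords, and a per-doubleword `poisoned` flag standing in for "this doubleword carries a double-error ECC pattern". The names `CacheLine`, `PoisonFault`, `store`, `load`, and `writeback` are hypothetical. A store that covers a whole aligned doubleword clears the mark, a partial or misaligned store keeps it, a load of a marked doubleword faults, and write-back carries the marks along unchanged.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64   # assumed line size
DW_BYTES = 8      # one ECC word covers 8 data bytes


class PoisonFault(Exception):
    """Raised when a load touches a doubleword marked as bad."""


@dataclass
class CacheLine:
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))
    # One flag per doubleword, modelling a stored double-error ECC pattern.
    poisoned: list = field(default_factory=lambda: [False] * (LINE_BYTES // DW_BYTES))

    def _dws(self, offset: int, size: int):
        """Indices of the doublewords touched by [offset, offset + size)."""
        end = offset + size
        if not (0 <= offset < end <= LINE_BYTES):
            raise ValueError("access outside the line")
        return range(offset // DW_BYTES, (end - 1) // DW_BYTES + 1)

    def store(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        touched = self._dws(offset, len(payload))
        self.data[offset:end] = payload
        for dw in touched:
            lo, hi = dw * DW_BYTES, (dw + 1) * DW_BYTES
            if offset <= lo and end >= hi:
                # All 8 bytes replaced: fresh, valid check bits.
                self.poisoned[dw] = False
            # Otherwise the store is partial: some old bytes survive, so
            # the bad marking is kept rather than regenerating valid ECC.

    def load(self, offset: int, size: int) -> bytes:
        for dw in self._dws(offset, size):
            if self.poisoned[dw]:
                raise PoisonFault(f"doubleword {dw} is poisoned")
        return bytes(self.data[offset:offset + size])

    def writeback(self):
        """The line goes to DRAM with its per-doubleword marking intact."""
        return bytes(self.data), list(self.poisoned)


line = CacheLine()
line.poisoned[2] = True                 # line arrived with a double error
line.store(16, b"\x11\x22\x33\x44")     # 4-byte ST: marking must survive
try:
    line.load(16, 8)
except PoisonFault as fault:
    print("fault:", fault)
line.store(16, bytes(8))                # aligned 8-byte ST: marking cleared
print(line.load(16, 8) == bytes(8))
```

This models only the bookkeeping, not how real hardware encodes the marking in the check bits.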