Subject: Re: Efficiency of in-order vs. OoO
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 13 Mar 2024, 16:36:50
Organization: Rocksolid Light
Message-ID: <f62c31b943bb5f89ed14f4cc359762bb@www.novabbs.org>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
User-Agent: Rocksolid Light
EricP wrote:
Scott Lurndal wrote:
EricP <ThatWouldBeTelling@thevillage.com> writes:
Scott Lurndal wrote:
EricP <ThatWouldBeTelling@thevillage.com> writes:
EricP wrote:
As most stores are posted, the data stored needs to be 'poisoned'
so that any subsequent use of the data (e.g. a load) will report
a fault.
Storing the bad <arriving> ECC should take care of that.
I don't think that will always work. Assuming we are using a
72-bit SECDED ECC and a cache line is read with a double error,
then if the ST overwrites an 8 byte aligned value it will generate
a new valid ECC and correct the error.
>
However, if the ST is less than 8 bytes or misaligned, it won't know which
of the 8 bytes was invalid, so it can't tell if the bad data was overwritten.
If it keeps the old ECC as an error indicator, that code might actually be
correct for the new data. If it generates a new valid ECC then it loses
track of the fact that the data MAY be invalid.
>
In this second case of partial overwrite I think it has to generate a
new invalid ECC for the new 8 byte data indicating a double error.
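
To make the two store cases concrete, here is a minimal sketch of a textbook extended-Hamming (72,64) SECDED code with a store path that poisons on partial overwrite. Everything named here is an assumption for illustration: the bit layout, the helper names (encode, decode, store), and the POISON_FLIP pattern are hypothetical and do not describe any particular memory controller. A misaligned store that crosses an 8-byte boundary would be handled as two partial stores.

```python
CHECK_POS = [1, 2, 4, 8, 16, 32, 64]                       # Hamming check bits
DATA_POS = [p for p in range(1, 72) if p not in CHECK_POS]  # 64 data positions
POISON_FLIP = (1 << 1) | (1 << 2)  # hypothetical: flip two check bits


def syndrome(word: int) -> int:
    """XOR of the positions (1..71) of all set bits; 0 for a clean codeword."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def extract(word: int) -> int:
    """Pull the 64 data bits back out of a 72-bit codeword."""
    return sum(((word >> pos) & 1) << i for i, pos in enumerate(DATA_POS))


def encode(data: int) -> int:
    """Build a valid 72-bit codeword; bit 0 is the overall parity bit."""
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
    syn = syndrome(word)
    for c in CHECK_POS:
        if syn & c:
            word |= 1 << c
    if bin(word).count("1") & 1:
        word |= 1
    return word


def decode(word: int):
    """Return (status, data); status is ok, corrected, double or uncorrectable."""
    syn = syndrome(word)
    odd = bin(word).count("1") & 1
    if odd:                          # odd overall parity: single-bit error
        if syn > 71:
            return "uncorrectable", extract(word)
        return "corrected", extract(word ^ (1 << syn))
    if syn:                          # even parity, nonzero syndrome
        return "double", extract(word)
    return "ok", extract(word)


def store(word: int, offset: int, new_bytes: bytes) -> int:
    """Apply a store of len(new_bytes) at byte offset within one 8-byte word."""
    assert 0 <= offset and offset + len(new_bytes) <= 8
    if offset == 0 and len(new_bytes) == 8:
        # Full overwrite: every old byte is replaced, so a fresh ECC is safe.
        return encode(int.from_bytes(new_bytes, "little"))
    status, old = decode(word)
    buf = bytearray(old.to_bytes(8, "little"))
    buf[offset:offset + len(new_bytes)] = new_bytes
    merged = encode(int.from_bytes(buf, "little"))
    if status in ("double", "uncorrectable"):
        # Partial overwrite of bad data: keep it marked as a double error.
        merged ^= POISON_FLIP
    return merged


good = encode(0x1122334455667788)
bad = good ^ (1 << 3) ^ (1 << 5)                 # inject a two-bit error
print(decode(bad)[0])                            # double
print(decode(store(bad, 0, bytes(8)))[0])        # ok
print(decode(store(bad, 4, b"\xaa\xbb"))[0])     # double
```

Flipping two check bits of a freshly generated codeword leaves the overall parity even and the syndrome nonzero, which is exactly what the decoder reports as a double error, so the poison survives later partial stores and is cleared only by a full 8-byte overwrite.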
>
When the modified line is written back to DRAM it retains the
double error ECC.
And if the page is out swapped and recycled we lose track of
the error indicator on that 8-byte value.
If it was properly poisoned, the access by the DMA engine will
cause a RAS error to be signalled and the DMA aborted.
And the OS does what with the page and its data?
This could happen long after the owner process terminated,
maybe part of a lazy file cache write back.
>
The only option for the OS might be to log the error and just reset
the ECC to valid for the current data so the IO can complete.
No, the I/O must be aborted. RAS 101 - do not propagate
poisoned data.
Consider a page being written out where the last cache line in the page
has a bad ECC. What command does one send the disk to indicate "forget
all that data I just sent you"?
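
The two positions can be put side by side in a toy model of a streaming write-out. This is only a sketch under stated assumptions: 64-byte lines, a 4096-byte page, a bytearray standing in for the disk, and hypothetical names (CacheLine, PoisonError, dma_write, and the policies "abort" and "scrub"). None of it describes a real DMA engine or OS interface.

```python
from dataclasses import dataclass

LINE_BYTES = 64      # assumed cache-line size
PAGE_BYTES = 4096    # assumed page size


@dataclass
class CacheLine:
    data: bytes
    poisoned: bool = False


class PoisonError(Exception):
    """Raised when the transfer is aborted on a poisoned line."""

    def __init__(self, line_index: int, bytes_sent: int):
        super().__init__(
            f"poison at line {line_index} after {bytes_sent} bytes sent")
        self.line_index = line_index
        self.bytes_sent = bytes_sent


def dma_write(page, device: bytearray, policy: str = "abort", log=None) -> int:
    """Stream a page to the device line by line; return bytes written."""
    if policy not in ("abort", "scrub"):
        raise ValueError(f"unknown policy {policy!r}")
    sent = 0
    for i, line in enumerate(page):
        if line.poisoned:
            if policy == "abort":
                # Do not propagate poison: stop, but earlier lines are gone.
                raise PoisonError(i, sent)
            # "scrub": log the error, accept the current data as valid.
            if log is not None:
                log.append(i)
            line.poisoned = False
        device.extend(line.data)
        sent += len(line.data)
    return sent


def make_page():
    """A page of zeroed lines whose last line is poisoned."""
    page = [CacheLine(bytes(LINE_BYTES))
            for _ in range(PAGE_BYTES // LINE_BYTES)]
    page[-1].poisoned = True
    return page


disk = bytearray()
try:
    dma_write(make_page(), disk, policy="abort")
except PoisonError as err:
    print(err)       # the abort arrives after most of the page has been sent

disk2, errors = bytearray(), []
print(dma_write(make_page(), disk2, policy="scrub", log=errors), errors)
```

With the poison in the last line, the abort policy has already handed 63 of the 64 lines to the device by the time it stops, which is the situation the question above is pointing at. The scrub policy completes the write and leaves only a log entry behind.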
Perhaps, but tossing a whole block from an I/O expands the size of
the problem by a factor of thousands.
If that was one byte wrong in a text file then I think most people
would want it written, as opposed to tossing out their work.
If that was one byte wrong in a file system metadata block then
there is no good answer. Many of the metadata blocks are in linked lists
or B+ trees, so not writing the block could corrupt a whole file system,
and writing the block could also cause corruption, though hopefully less likely.
So you are damned if you do fix the ECC and write the block,
and damned if you don't. But "do" seems less damning.