Subject: Re: Cost of handling misaligned access
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 03 Feb 2025, 17:43:25
Organization: Rocksolid Light
Message-ID: <bc9384630f7a308e2a780950fb8e1089@www.novabbs.org>
User-Agent: Rocksolid Light
On Mon, 3 Feb 2025 15:15:54 +0000, EricP wrote:
MitchAlsup1 wrote:
On Sun, 2 Feb 2025 16:45:19 +0000, EricP wrote:
>
As you can see in the article below, the cost of NOT handling misaligned
accesses in hardware is quite high in cpu clocks.
>
To my eye, the incremental cost of adding hardware support for
misaligned accesses to the AGU and cache data path should be quite
low. The alignment shifter is basically the same: assuming a 64-byte
cache line, a LD still has to shift any of the 64 bytes into
position 0, and the reverse for a ST.
>
A handful of gates to detect misalignedness and recognize the line and
page crossing misalignments.
>
The alignment shifters are twice as big.
>
Oh, right, twice the muxes and wires but the critical path length
should be the same - whatever a 64:1 mux is (3 gate delays?).
1 more gate of delay to double the shifter width.
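The gate-delay arithmetic, under the rough assumption that one level of 4:1 (or 2:1) multiplexing costs about one gate delay, which matches the "3 gate delays?" estimate:

```latex
\begin{aligned}
64{:}1 \text{ mux} &: \; 64 = 4^3 \;\Rightarrow\; 3 \text{ levels of } 4{:}1 \;\approx\; 3 \text{ gate delays} \\
128{:}1 \text{ mux} &: \; 128 = 2 \cdot 4^3 \;\Rightarrow\; 3 \text{ levels of } 4{:}1 + 1 \text{ level of } 2{:}1 \;\approx\; 4 \text{ gate delays}
\end{aligned}
```

In general, doubling the number of mux inputs adds one 2:1 level, independent of the original width.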
So the larger aligner for misaligned shouldn't slow down the whole cache
and penalize the normal aligned case.
Tag :: TLB comparison takes longer than shifting.
Now, while I accept these costs, I accept that others may not. I accept
these costs because of the performance issues when I don't.
>
The incremental cost is in a sequencer in the AGU for handling cache
line and possibly virtual page straddles, and a small byte shifter to
left shift the high-order bytes. The AGU sequencer needs to know
whether the straddle also crosses a page boundary. If not, it
increments the 6-bit physical line number within the 4 kB physical
frame; if so, it increments the virtual page number, does a second
TLB lookup, and accesses the first line of the new page.
(Slightly more if multiple page sizes are supported, but same idea.)
For a load, the AGU merges the low and high fragments and forwards
the result.
>
I don't think there are line-straddle consequences for coherence
because there are no ordering guarantees for misaligned accesses.
>
Generally stated as:: Misaligned accesses cannot be considered ATOMIC.
>
That too (I thought of that after hitting send).
What I was thinking of was: are there any coherence ordering issues if
in order to take advantage of the cache's access pipeline,
When you don't cross a cache line, you CAN make them ATOMIC.
When you cross a page boundary, you realistically cannot always*.
It all depends on where you want to draw the line.
Supporting misaligned access serves everyone.
Supporting misaligned ATOMICs serves no one.
(*) Consider the case where the second LD takes a miss in the TLB
and we have a 100+ cycle table walk. You could do something like
take a microfault and rerun after reloading the TLB--but this, then,
opens up a side channel because the TLB was updated before the causing
instruction retires. And since ATOMIC falls into the category of "do
it right is better than do it fast" you should not.
Consider another case where ATOMIC crosses a line boundary, and the
second line ends up with an ECC error ?!?
There are so many side cases to consider that taking the whole lot
of them and saying "no" is simply best. There is a good case for
misaligned support; there is no such case for misaligned ATOMICs.
the AGU issues both accesses at once, low fragment first, high second,
and the cache has hit-under-miss, and the low fragment misses while
the high fragment hits, as the effect would be the equivalent of a
LD-LD or ST-ST bypass.
>
I don't immediately see a problem, but if there were, then the AGU
would have to do each fragment one after the other, which would
double the access latency for misaligned loads.