The incremental cost is a sequencer in the AGU to handle cache line
(and possibly virtual page) straddles, plus a small byte shifter to
left-shift the high-order bytes. The AGU sequencer needs to know whether
the access also straddles a page boundary. If not, it increments the
6-bit physical line number within the 4 kB physical frame. If so, it
increments the virtual page number, does a second TLB lookup, and
accesses the first line of that page.
(Slightly more if multiple page sizes are supported, but same idea.)
For a load, the AGU merges the low and high fragments and forwards
the result.
The hardware cost appears trivial, especially within an OoO core.
So there doesn't appear to be any reason to not handle this.
Am I missing something?
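
To make the sequencing above concrete, here is a rough software model in C.
It is only a sketch under stated assumptions: 64-byte lines, 4 kB pages,
little-endian loads of at most 8 bytes, and a made-up 4-entry page table.
All names (model_load, tlb_translate, line_read, phys_mem, page_table) are
hypothetical and describe no real core.

```c
#include <stdint.h>

#define LINE_BYTES 64u    /* assumed cache line size */
#define PAGE_BYTES 4096u  /* assumed (single) page size */
#define NPAGES     4u     /* toy address space: 4 pages */

/* Toy physical memory and a made-up VPN -> PFN mapping. */
static uint8_t phys_mem[NPAGES * PAGE_BYTES];
static const uint32_t page_table[NPAGES] = {2, 0, 3, 1};
static int tlb_lookups;   /* counts translations, for illustration */

/* Stand-in for a TLB lookup; vaddr must lie inside the toy address space. */
static uint64_t tlb_translate(uint64_t vaddr) {
    tlb_lookups++;
    return (uint64_t)page_table[vaddr / PAGE_BYTES] * PAGE_BYTES
           + vaddr % PAGE_BYTES;
}

/* Read n bytes that all lie within one line, little-endian. */
static uint64_t line_read(uint64_t paddr, unsigned n) {
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)phys_mem[paddr + i] << (8 * i);
    return v;
}

/* Model of a load of `size` bytes (1..8) at any alignment. */
uint64_t model_load(uint64_t vaddr, unsigned size) {
    uint64_t paddr = tlb_translate(vaddr);
    unsigned line_off = (unsigned)(paddr % LINE_BYTES);

    if (line_off + size <= LINE_BYTES)
        return line_read(paddr, size);        /* no straddle: one access */

    unsigned lo_bytes = LINE_BYTES - line_off; /* bytes in the first line */
    unsigned hi_bytes = size - lo_bytes;       /* bytes in the second line */
    uint64_t lo = line_read(paddr, lo_bytes);

    uint64_t hi_paddr;
    if (paddr % PAGE_BYTES + lo_bytes < PAGE_BYTES) {
        /* Line straddle only: next line in the same physical frame. */
        hi_paddr = paddr + lo_bytes;
    } else {
        /* Page straddle: next virtual page, second TLB lookup, first line. */
        hi_paddr = tlb_translate((vaddr / PAGE_BYTES + 1) * PAGE_BYTES);
    }
    uint64_t hi = line_read(hi_paddr, hi_bytes);

    /* The "byte shifter": left-shift the high fragment, then merge. */
    return lo | (hi << (8 * lo_bytes));
}
```

In this model the page-straddle case costs one extra translation and one
extra line access, and the merge is a shift and an OR.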
https://old.chipsandcheese.com/2025/01/26/inside-sifives-p550-microarchitecture/...
"This terrible unaligned access behavior is atypical even for low power
cores. Arm's Cortex A75 only takes 15 cycles in the worst case of
dependent accesses that are both misaligned.

Digging deeper with performance counters reveals executing each unaligned
load instruction results in ~505 executed instructions. P550 almost
certainly doesn't have hardware support for unaligned accesses.
Rather, it's likely raising a fault and letting an operating system
handler emulate it in software."
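
For comparison, the data-movement part of a software emulation can be
sketched in C as below. This is an assumption about what such a handler
might do at its core, not the actual P550 or OS code, and the function name
emulate_unaligned_load is hypothetical. A real handler would also have to
save state, decode the faulting instruction and write the destination
register; none of that is modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian value of `size` bytes (1..8) using only
 * single-byte loads, which are aligned by construction. */
uint64_t emulate_unaligned_load(const volatile uint8_t *addr, size_t size) {
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)addr[i] << (8 * i);
    return value;
}
```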