Subject: Re: unaligned load/store (was: Re: Keeping other stuff with addresses)
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 22 Dec 2024, 11:33:01
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2024Dec22.113301@mips.complang.tuwien.ac.at>
User-Agent : xrn 10.11
Jonathan Thornburg <jonathan@gold.bkis-orchard.net> writes:
> And some cases are even harder
> (e.g., misaligned writes crossing L1 D-cache line boundaries where the
> two lines are owned by different CPUs in a cache-coherent multiprocessor)
> and might need a millicode trap.
Made me look up "millicode". Anything at all might need a millicode
trap on implementations that use millicode, but I don't see any
particular issue here that would make millicode particularly relevant.
> And some cases may require going all the
> way up to the OS (e.g., misaligned writes that cross virtual-memory-page
> boundaries where one page is ok but the other is non-resident).
Again, sure, if you access a page that is not present, the hardware
traps to the OS to make that page present, but that's also the case
without unaligned accesses; and with software emulation of unaligned
accesses, as on Alpha with, e.g., UAC_NOPRINT, every unaligned access
traps to the OS. Is this better?
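
To make the two boundary cases above concrete, here is a minimal sketch of the condition under which an access touches two cache lines or two pages. The function name crosses() is hypothetical, and the 64-byte line and 4096-byte page sizes used in the usage note are common values assumed for illustration, not properties of any particular machine discussed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the bytes [addr, addr + size) span more than one
   block of `block` bytes.  Assumes `block` is a power of two and
   size > 0.  (Hypothetical helper, for illustration only.) */
bool crosses(uintptr_t addr, size_t size, size_t block)
{
    /* Offset of the first byte within its block, plus the access
       size, exceeds the block size exactly when the last byte
       lands in the following block. */
    return (addr & (block - 1)) + size > block;
}
```

With these assumed sizes, crosses(60, 8, 64) is true (an 8-byte access at offset 60 spills into the next line), while crosses(1, 4, 64) is false: that access is unaligned, but stays within one line, so neither of the hard cases arises.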
> And, because of the traps and their overheads (which will likely differ
> significantly across different implementations of the same architecture,
> e.g., different multiprocessor cache-coherency protocols), any code that
> actually *uses* unaligned accesses -- especially unaligned writes -- isn't
> performance-portable unless the actual dynamic frequency of unaligned
> operations is very low.
Possible, but hardly relevant. E.g., I am interested in such things
and I was completely unaware of the penalties of unaligned stores
until I measured them:
<http://al.howardknight.net/?ID=143135464800>
<https://www.complang.tuwien.ac.at/anton/unaligned-stores/>. I expect
that even among performance-conscious programmers, only a small
minority knows more than to avoid them, when it's cheaply possible.
Maybe some (probably more than are aware of the actual costs) think that
they should avoid them at all costs, and then use, e.g., bytewise
approaches for hashing strings rather than the on-average faster approaches
that fetch string data as wide as is practical and hash that.
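
As an illustration of the two styles of string hashing just mentioned, here is a minimal sketch in C. hash_bytewise() is 64-bit FNV-1a; hash_wide() and load64() are hypothetical names, and the mixing step in hash_wide() is an assumption chosen for brevity, not a vetted hash function and not the code behind the measurements linked above. The wide loads go through memcpy(), which is defined behavior in C for any address and which compilers typically turn into a single load instruction on architectures that support unaligned accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytewise approach (64-bit FNV-1a): one byte load and one multiply
   per input byte; never performs an unaligned access. */
uint64_t hash_bytewise(const char *s, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Load 8 bytes from an arbitrarily aligned address. */
static uint64_t load64(const char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Wide approach: consume 8 bytes per step regardless of alignment.
   The mixing is an illustrative assumption; the result depends on
   byte order, so it differs between little- and big-endian hosts. */
uint64_t hash_wide(const char *s, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ len;
    while (len >= 8) {
        h = (h ^ load64(s)) * 0x100000001b3ULL;
        h ^= h >> 32;
        s += 8;
        len -= 8;
    }
    if (len > 0) {
        /* Copy the 1..7 remaining bytes into a zeroed word so that
           no byte past the end of the string is read. */
        uint64_t tail = 0;
        memcpy(&tail, s, len);
        h = (h ^ tail) * 0x100000001b3ULL;
        h ^= h >> 32;
    }
    return h;
}
```

The wide variant does one load and one multiply per 8 bytes instead of per byte. It gives the same result for the same bytes at any address, and the cost of the loads that happen to be unaligned is what has to be weighed against the eightfold reduction in steps.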
> So yes, allowing unaligned access does help "dusty deck" Fortran code...
> but it comes at a significant cost.
It's not just dusty deck Fortran code.
And the cost for not supporting unaligned accesses is higher. There's
a reason why all surviving general-purpose architectures support
unaligned accesses.
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>