Subject: unaligned load/store (was: Re: Keeping other stuff with addresses)
From: jonathan (at) *nospam* gold.bkis-orchard.net (Jonathan Thornburg)
Newsgroups: comp.arch
Date: 22 Dec 2024, 00:22:35
Message-ID : <lsp0tqFs7aoU1@mid.individual.net>
MitchAlsup1 <mitchalsup@aol.com> wrote:
> FORTRAN COMMON blocks require misaligned accesses to double precision
> data.
>
> R E Q U I R E in that it is neither optional nor wise to emulate with
> exceptions. It is just barely tolerable using LD/ST Left/Right
> instructions out of the compiler.
>
> I, personally, went through enough PAIN with misalignment, that over
> time my mood swung from "aligned only" to "completely misaligned"::
> a) because there is no performant* SW workaround
> b) it is SO easy to fix in HW.
> c) once fixed in HW, any SW burden is so small as to be barely
>    measurable.

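To make the "no performant SW workaround" point concrete: without LD/ST Left/Right style instructions (e.g. MIPS LDL/LDR), a compiler can fall back on two aligned loads, two shifts and an OR for one misaligned 64-bit load. The following is a minimal C sketch of that fallback, not what any particular compiler emits. The helper name `load_u64_misaligned` is hypothetical; memory is modeled as an array of aligned 64-bit words with little-endian byte numbering, and the word after the one containing `off` must be readable whenever `off` is not a multiple of 8.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: return the 64-bit little-endian value whose first
   byte is at byte offset `off` within `mem`, using only aligned 64-bit
   loads. Assumes mem[off / 8 + 1] exists when off % 8 != 0. */
static uint64_t load_u64_misaligned(const uint64_t *mem, size_t off)
{
    size_t word = off / 8;                     /* first aligned word touched */
    unsigned shift = (unsigned)(off % 8) * 8;  /* bit offset inside that word */
    uint64_t lo = mem[word];                   /* first aligned load */
    if (shift == 0)
        return lo;                             /* aligned: one load suffices */
    uint64_t hi = mem[word + 1];               /* second aligned load */
    /* Low bytes come from the top of `lo`, high bytes from the bottom of `hi`. */
    return (lo >> shift) | (hi << (64 - shift));
}
```

With hardware support the same access is a single load; the extra instructions, plus a second load that may land in another cache line or page, are the software cost under discussion.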
I'm not so sure (b) is true. Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode). And some cases are even harder
(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent multiprocessor)
and might need a millicode trap. And some cases may require going all the
way up to the OS (e.g., misaligned writes that cross virtual-memory-page
boundaries where one page is ok but the other is non-resident).
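Which of these cases an access falls into depends on which boundaries it straddles. A minimal C sketch of that classification, assuming power-of-two sizes such as 64-byte L1 lines and 4 KiB pages (illustrative values, not tied to any real implementation). The enum and function names are hypothetical, and the sketch ignores the load/store distinction and the coherence case where the two lines are owned by different CPUs.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical categories mirroring the cases discussed above. */
enum access_kind {
    ACCESS_ALIGNED,       /* naturally aligned */
    ACCESS_WITHIN_LINE,   /* misaligned, but inside one cache line */
    ACCESS_CROSSES_LINE,  /* spans two cache lines within one page */
    ACCESS_CROSSES_PAGE   /* spans two virtual-memory pages */
};

/* size, line_size and page_size must be powers of two with
   size <= line_size <= page_size (assumed, checked below). */
static enum access_kind classify_access(uint64_t addr, uint64_t size,
                                        uint64_t line_size, uint64_t page_size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    assert((line_size & (line_size - 1)) == 0 && size <= line_size);
    assert((page_size & (page_size - 1)) == 0 && line_size <= page_size);

    uint64_t last = addr + size - 1;           /* address of the last byte */
    if ((addr & (size - 1)) == 0)
        return ACCESS_ALIGNED;                 /* can never cross a line */
    if (addr / page_size != last / page_size)
        return ACCESS_CROSSES_PAGE;            /* implies a line crossing too */
    if (addr / line_size != last / line_size)
        return ACCESS_CROSSES_LINE;
    return ACCESS_WITHIN_LINE;
}
```

For example, with 64-byte lines and 4 KiB pages, an 8-byte access at 0x1003 stays within one line, one at 0x103C crosses a line boundary, and one at 0x1FFC crosses a page boundary.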
So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
don't cost an extra gate delay or two on some critical path
* extra complexity and debugging time in hardware and in system software
(think about writing and *debugging* and *verifying* microcode/millicode
trap handlers for all those messy write-crossing-cache/page-boundary
cases, especially their interactions with multiprocessor cache coherency)
* this extra effort means a longer design time and/or greater design cost,
  and hence (so long as the state of the art of competing systems is still
  steadily improving with time) a net lower price/performance relative to
  competing systems

And, because of the traps and their overheads (which will likely differ
significantly across different implementations of the same architecture,
e.g., different multiprocessor cache-coherency protocols), any code that
actually *uses* unaligned accesses -- especially unaligned writes -- isn't
performance-portable unless the actual dynamic frequency of unaligned
operations is very low.
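A simple back-of-the-envelope model (not a measurement) shows why: if a fraction f of memory accesses is unaligned and each of those costs c_trap instead of the usual c_fast, the average cost per access is

```latex
\bar{c} = (1 - f)\,c_{\mathrm{fast}} + f\,c_{\mathrm{trap}}
```

With purely hypothetical numbers c_fast = 1 cycle, c_trap = 1000 cycles and f = 0.01, that gives 0.99 + 10 = 10.99 cycles, roughly 11 times the aligned cost; and because c_trap differs between implementations, so does the slowdown.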
So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.
-- 
-- "Jonathan Thornburg [remove -color to reply]" <jt.bhbkis@gmail-pink.com>
   on the west coast of Canada
   "the stock market can remain irrational a lot longer than you can remain solvent"
   or (probably the correct original wording)
   "markets can remain irrational a lot longer than you and I can remain solvent"
   -- A. Gary Shilling (often misattributed to John Maynard Keynes)