MitchAlsup1 wrote:
Although I cannot name a given thing where your argument permanently
On Mon, 17 Feb 2025 9:37:57 +0000, Terje Mathisen wrote:
Marcus wrote:
On 2025-02-03, Anton Ertl wrote:
BGB <cr88192@gmail.com> writes:
On 2/2/2025 10:45 AM, EricP wrote:
"Digging deeper with performance counters reveals executing each
unaligned load instruction results in ~505 executed instructions. P550
almost certainly doesn’t have hardware support for unaligned accesses.
Rather, it’s likely raising a fault and letting an operating system
handler emulate it in software."
>
An emulation fault, or something similarly nasty...
>
>
At that point, even turning any potentially unaligned load or store into
a runtime call is likely to be a lot cheaper.
There are lots of potentially unaligned loads and stores. There are
very few actually unaligned loads and stores: On Linux-Alpha every
unaligned access is logged by default, and the number of
unaligned-access entries in the logs of our machines was relatively
small (on average a few per day). So trapping actual unaligned
accesses was faster than replacing potential unaligned accesses with
code sequences that synthesize the unaligned access from aligned
accesses.
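
For readers who have not seen such a sequence, here is a minimal sketch of what "synthesizing the unaligned access from aligned accesses" can look like for a 32-bit load. It is not the Alpha code sequence and not anything a particular compiler emits; the helper name load_u32_via_aligned is made up for illustration. It assumes a little-endian target, and it assumes the enclosing aligned words belong to storage that may legally be read as uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: load a 32-bit value from any byte address using
   only aligned 32-bit loads. Assumes little-endian byte order and that
   both enclosing aligned words are readable as uint32_t. */
static uint32_t load_u32_via_aligned(const void *p)
{
    assert(p != 0);

    uintptr_t addr = (uintptr_t)p;
    uintptr_t mis = addr & 3u;                    /* misalignment in bytes */
    unsigned shift = (unsigned)mis * 8u;          /* misalignment in bits */
    const uint32_t *base = (const uint32_t *)(addr - mis);

    uint32_t lo = base[0];                        /* aligned word holding the first byte */
    if (shift == 0)
        return lo;                                /* already aligned: one load suffices */

    uint32_t hi = base[1];                        /* next aligned word holds the rest */
    return (lo >> shift) | (hi << (32u - shift)); /* stitch the two halves together */
}
```

Even in this simple form, every potentially unaligned load turns into two loads, a few shifts, and a branch, which is the cost being weighed above against trapping on the rare access that is actually unaligned.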
If you compile regular C/C++ code that does not intentionally do any
nasty stuff, you will typically have zero unaligned loads or stores.
>
My machine still does not support unaligned accesses in hardware (it's
on the todo list), and it can run an awful lot of software without
problems.
>
The problem arises when the programmer *deliberately* does unaligned
loads and stores in order to improve performance. Or rather, if the
programmer knows that the hardware supports unaligned loads and stores,
he/she can use that to write faster code in some special cases.
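
One common shape of such deliberate code, as a hedged sketch: comparing two byte strings eight bytes at a time, whatever their alignment. The function names are invented for this example. The memcpy idiom is the portable way to spell a possibly unaligned load in C; whether it becomes a single load instruction depends on the compiler and on what the target hardware supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable spelling of a possibly unaligned 64-bit load. */
static uint64_t load_u64_unaligned(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical example: length of the common prefix of a and b (each n
   bytes long), compared eight bytes at a time regardless of alignment. */
static size_t common_prefix_len(const unsigned char *a,
                                const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);

    size_t i = 0;
    /* Wide steps: no alignment check on a + i or b + i. */
    while (i + 8 <= n && load_u64_unaligned(a + i) == load_u64_unaligned(b + i))
        i += 8;
    /* Finish, and locate the exact mismatch, one byte at a time. */
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

On hardware with cheap unaligned loads the wide loop does most of the work; on hardware that traps, each of those loads could cost hundreds of instructions, which is the situation described at the top of the thread.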
No, the real problem is when a compiler wants to auto-vectorize any code
working with 1/2/4/8 byte items: All of a sudden the alignment
requirement goes from the item stride to the vector register stride
(16/32/64 bytes).
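
To make the jump in alignment requirement concrete, here is a small sketch of the bookkeeping a vectorizer would need if it insisted on aligned vector accesses: how many leading elements must be handled in scalar code before the pointer reaches a vector boundary. The function name and parameters are assumptions for illustration, not any compiler's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading elements to process in scalar
   code before p reaches a vec_bytes boundary. Returns n when no boundary
   is reachable within n elements. The pointer is never dereferenced. */
static size_t peel_count(const void *p, size_t elem_size,
                         size_t vec_bytes, size_t n)
{
    assert(elem_size != 0 && vec_bytes % elem_size == 0);
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0); /* power of two */

    uintptr_t mis = (uintptr_t)p & (vec_bytes - 1);  /* offset inside the vector line */
    if (mis == 0)
        return 0;                                    /* already vector aligned */
    if (mis % elem_size != 0)
        return n;            /* items not even item-aligned: peeling never helps */

    size_t peel = (vec_bytes - mis) / elem_size;
    return peel < n ? peel : n;
}
```

A 4-byte float array that is perfectly fine for scalar code can thus need up to seven peeled iterations for a 32-byte vector, and an array that is not item-aligned can never be fixed by peeling at all.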
If you provide misaligned access to SIMD registers, why not provide
misaligned access to all memory references !?!
>
I made this argument several times in my career.
The only way this can work is to have the compiler control _all_
allocations to make sure they are properly aligned, including code in
libraries, or the compiler will be forced to use vector load/store
operations which do allow unaligned access.
Either the entire environment has to be "air tight" or the HW
provides misaligned access at low cost. {{Good luck on the air
tight thing...}}
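
A hedged sketch of why "air tight" is hard, using C11 aligned_alloc as a stand-in for an allocator that guarantees vector alignment. VEC_BYTES and the function name are assumptions for the example; 32 bytes is just one of the widths mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_BYTES 32u   /* assumed vector register width in bytes */

/* Hypothetical allocator wrapper: returns storage for count floats whose
   start is aligned to VEC_BYTES, or NULL on failure. */
static float *alloc_vec_aligned_floats(size_t count)
{
    assert(count <= (SIZE_MAX - VEC_BYTES) / sizeof(float)); /* no overflow below */

    size_t bytes = count * sizeof(float);
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    bytes = (bytes + VEC_BYTES - 1) / VEC_BYTES * VEC_BYTES;
    if (bytes == 0)
        bytes = VEC_BYTES;
    return aligned_alloc(VEC_BYTES, bytes);
}
```

The start of the block is vector aligned, but the guarantee is gone as soon as a caller passes an interior pointer such as v + 1 to a library routine. The callee sees an item-aligned pointer only, so it must either check alignment at run time or use loads that tolerate misalignment.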
This is just one of many details where we've agreed for a decade or two
(three?). On some of them you persuaded me that you were right; I don't
remember any obvious examples of the opposite, but most we figured out
independently. :-)
Terje
>
The messages displayed come from Usenet.