Marcus wrote:
If you provide misaligned access to SIMD registers, why not provide
On 2025-02-03, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
> On 2/2/2025 10:45 AM, EricP wrote:
> Digging deeper with performance counters reveals executing each unaligned
load instruction results in ~505 executed instructions. P550 almost
certainly doesn’t have hardware support for unaligned accesses.
Rather, it’s likely raising a fault and letting an operating system
handler emulate it in software."
>
An emulation fault, or something similarly nasty...
>
>
At that point, even turning any potentially unaligned load or store into
a runtime call is likely to be a lot cheaper.
There are lots of potentially unaligned loads and stores. There are
very few actually unaligned loads and stores: On Linux-Alpha every
unaligned access is logged by default, and the number of
unaligned-access entries in the logs of our machines was relatively
small (on average a few per day). So trapping actual unaligned
accesses was faster than replacing potential unaligned accesses with
code sequences that synthesize the unaligned access from aligned
accesses.
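
To make "synthesize the unaligned access from aligned accesses" concrete, here is a minimal sketch in C. The helper name load32_from_aligned and its parameters are made up for illustration, and it assumes byte k of the data sits in bits 8k..8k+7 of each 64-bit word (little-endian layout). It is not the sequence any particular compiler emits, only the general shape of one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value starting at an arbitrary byte
   offset, touching memory only through aligned 64-bit words.
   Assumes little-endian layout of bytes within each word. */
uint32_t load32_from_aligned(const uint64_t *words, size_t nwords,
                             size_t byte_off)
{
    size_t idx = byte_off / 8;                     /* word holding first byte */
    unsigned shift = (unsigned)(byte_off % 8) * 8; /* bit offset in that word */

    assert(idx < nwords);
    uint64_t v = words[idx] >> shift;

    if (shift > 32) {
        /* The value straddles two words: fetch the rest from the next one. */
        assert(idx + 1 < nwords);
        v |= words[idx + 1] << (64 - shift);
    }
    return (uint32_t)v;
}
```

Every potentially unaligned access pays for the index arithmetic, the shift and the straddle check, even when the address turns out to be aligned. That fixed cost is what gets weighed against an occasional trap.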
If you compile regular C/C++ code that does not intentionally do any
nasty stuff, you will typically have zero unaligned loads or stores.
>
My machine still does not support unaligned accesses in hardware (it's
on the todo list), and it can run an awful lot of software without
problems.
>
The problem arises when the programmer *deliberately* does unaligned
loads and stores in order to improve performance. Or rather, if the
programmer knows that the hardware supports unaligned loads and stores,
he/she can use that to write faster code in some special cases.
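
One example of such a special case, as a sketch only: comparing two byte ranges eight bytes at a time, with the tail handled by one final load that overlaps bytes already checked. The names load_u64 and bytes_equal are hypothetical. The memcpy idiom is the portable way to express an unaligned load in C; on hardware that supports unaligned access, compilers commonly reduce it to a single load instruction, but that depends on the target and is an assumption here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable unaligned 64-bit load: p may have any alignment. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: returns 1 if the n bytes at a and b are equal. */
int bytes_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);

    if (n < 8) {                       /* too short for a wide load */
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return 0;
        return 1;
    }

    /* Full 8-byte chunks, at whatever alignment a and b happen to have. */
    for (size_t i = 0; i + 8 <= n; i += 8)
        if (load_u64(a + i) != load_u64(b + i))
            return 0;

    /* Tail: one load covering the last 8 bytes, overlapping the chunk
       before it, instead of a byte-by-byte loop. */
    return load_u64(a + n - 8) == load_u64(b + n - 8);
}
```

The overlapping tail load is almost never aligned, so on a machine that traps on unaligned access this trick would be much slower than the byte loop it replaces.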
No, the real problem is when a compiler wants to auto-vectorize any code
working with 1/2/4/8 byte items: All of a sudden the alignment
requirement went from the item stride to the vector register stride
(16/32/64 bytes).
The only way this can work is to have the compiler control _all_
allocations to make sure they are properly aligned, including code in
libraries, or the compiler will be forced to use vector load/store
operations which do allow unaligned access.
Either the entire environment has to be "air tight" or the HW
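
A small sketch of why item alignment is not enough. The helpers is_aligned and alloc_vec_aligned are hypothetical names. aligned_alloc is the C11 library call, and 64 bytes is just one of the vector widths mentioned above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is aligned to `align` bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate n 32-bit items on a vector-width boundary.
   The size is rounded up to a multiple of the alignment, which is the
   portable way to call aligned_alloc. */
int32_t *alloc_vec_aligned(size_t n, size_t vec_bytes)
{
    size_t bytes = n * sizeof(int32_t);
    size_t rounded = (bytes + vec_bytes - 1) / vec_bytes * vec_bytes;
    return aligned_alloc(vec_bytes, rounded);
}
```

Even with a buffer allocated this way, a caller can legally pass `buf + 1` to a library routine. That pointer is correctly aligned for a 4-byte item but not for a 16-, 32- or 64-byte vector, so the routine cannot assume vector alignment unless every caller cooperates.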
Terje