Subject : Re: Cost of handling misaligned access
From : tkoenig (at) *nospam* netcologne.de (Thomas Koenig)
Newsgroups : comp.arch
Date : 03. Feb 2025, 20:41:10
Organization : A noiseless patient Spider
Message-ID : <vnr64m$1e7sb$1@dont-email.me>
References : 1 2 3 4
User-Agent : slrn/1.0.3 (Linux)
EricP <ThatWouldBeTelling@thevillage.com> wrote:
> That is fine for code that is being actively maintained and where backward
> data structure compatibility is not required (like those inside a kernel).
>
> However for x86 there were a few billion lines of legacy code that likely
> assumed 2-byte alignment, or followed the fp64 aligned to 32-bits advice,
> and a C language that mandates structs be laid out in memory exactly as
> specified (no automatic struct optimization). Also I seem to recall some
> amount of squawking about SIMD when it required naturally aligned buffers.
> As SIMD no longer requires alignment, presumably code no longer does so.

Looking at Intel's optimization manual, they state in
"15.6 DATA ALIGNMENT FOR INTEL® AVX"
"Assembly/Compiler Coding Rule 65. (H impact, M generality) Align
data to 32-byte boundary when possible. Prefer store alignment
over load alignment."
and further down, about AVX-512,
"18.23.1 Align Data to 64 Bytes"
"Aligning data to vector length is recommended. For best results,
when using Intel AVX-512 instructions, align data to 64 bytes.
When doing a 64-byte Intel AVX-512 unaligned load/store, every
load/store is a cache-line split, since the cache-line is 64
bytes. This is double the cache line split rate of Intel AVX2
code that uses 32-byte registers. A high cache-line split rate in
memory-intensive code can cause poor performance."
This sounds reasonable, and is good advice if you want to go
down the SIMD lane.
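
As a minimal sketch of the 64-byte advice quoted above, the following
allocates a buffer on a 64-byte boundary with C11 aligned_alloc. The
helper names alloc_floats_64 and is_aligned_64 are made up for this
illustration, and the sketch assumes a hosted C11 library that provides
aligned_alloc (glibc does; overflow of n * sizeof(float) is not checked).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n floats on a 64-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 64. */
static float *alloc_floats_64(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + 63) / 64 * 64;
    return aligned_alloc(64, rounded);
}

/* Hypothetical helper: nonzero if p sits on a 64-byte boundary. */
static int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

A buffer obtained this way starts at the beginning of a cache line on
a machine with 64-byte lines, so a full-width vector load or store at
offset 0, 64, 128, ... does not straddle two lines. The result is
released with plain free().
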

> Also in going from 32 to 64 bits, data structures that contain pointers
> now could find those 8-byte pointers aligned on 4-byte boundaries.

This is mandated by the relevant ABI, and ABIs usually mandate
alignment on natural boundaries.
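
To illustrate the ABI point, here is a small sketch with a made-up
struct node holding an int followed by a pointer. Unless packing
pragmas or attributes are used, the compiler inserts padding so that
the pointer member lands on a multiple of its own alignment. On the
x86-64 System V ABI one would expect an offset of 8 and a struct size
of 16; other ABIs may differ, so treat those numbers as an assumption
about that one target.

```c
#include <stddef.h>

/* Hypothetical record: a 4-byte int followed by a pointer. */
struct node {
    int   id;
    void *next;
};

/* Offset of the pointer member. Without packing, this is always a
   multiple of _Alignof(void *); any gap after `id` is padding. */
static size_t next_offset(void)
{
    return offsetof(struct node, next);
}
```
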

> While the Linux kernel may not use many misaligned values,
> I'd guess there is a lot of application code that does.

Unless it is generating external binary data (a _very_ bad idea,
XDR was developed for a reason), there is no big reason to use
unaligned data, unless somebody is playing fast and loose
with C pointer types, and that is a bad idea anyway.
Alternatively, a compiler could use it to implement something like
memcpy or memmove when it knows that unaligned accesses are safe.
But it would be really interesting to have access to a system
where unaligned accesses trap, in order to find (and fix) ABI
issues and some undefined behavior on the C side.
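
For the pointer-type and memcpy remarks above, a minimal sketch of
reading a 32-bit value from a possibly misaligned address without
casting the pointer. The name load_u32 is hypothetical. Dereferencing
a misaligned uint32_t pointer is undefined behavior in C, whereas
memcpy has no alignment requirement; compilers commonly reduce the
call to a single load on targets where unaligned accesses are safe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint32_t from any byte address.
   The copy goes through memcpy, so no aligned pointer is ever
   dereferenced; byte order is that of the host. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a system where misaligned accesses trap, code written this way
keeps working, while a cast such as *(const uint32_t *)p would fault
whenever p is not suitably aligned.
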