Ironically^2 is that vVM allows each implementation to decide on how ...

On 2/3/2025 1:41 PM, Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> wrote:
That is fine for code that is being actively maintained and backward
data structure compatibility is not required (like those inside a
kernel).
>
However, for x86 there were a few billion lines of legacy code that likely
assumed 2-byte alignment, or followed the "fp64 aligned to 32 bits" advice,
and a C language that mandates structs be laid out in memory exactly as
specified (no automatic struct optimization). Also, I seem to recall some
amount of squawking about SIMD when it required naturally aligned buffers.
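
To make the struct-layout point concrete, here is a minimal C sketch. The struct names and members (legacy_rec, reordered_rec) are hypothetical, invented for illustration. C guarantees that members appear in declaration order, so the compiler may insert padding but may not reorder them; any size saving has to come from the programmer reordering by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical legacy record: member order is fixed by the declaration. */
struct legacy_rec {
    char   tag;    /* first member, always at offset 0 */
    double value;  /* typically preceded by padding for alignment */
    short  count;  /* typically followed by tail padding */
};

/* Same members, reordered by hand from largest to smallest. */
struct reordered_rec {
    double value;
    short  count;
    char   tag;
};

/* Guaranteed by the C standard: no padding before the first member. */
static_assert(offsetof(struct legacy_rec, tag) == 0,
              "first member must sit at offset 0");

/* Print where this particular compiler and ABI placed each member. */
void print_layout(void)
{
    printf("legacy_rec:    tag@%zu value@%zu count@%zu size=%zu\n",
           offsetof(struct legacy_rec, tag),
           offsetof(struct legacy_rec, value),
           offsetof(struct legacy_rec, count),
           sizeof(struct legacy_rec));
    printf("reordered_rec: value@%zu count@%zu tag@%zu size=%zu\n",
           offsetof(struct reordered_rec, value),
           offsetof(struct reordered_rec, count),
           offsetof(struct reordered_rec, tag),
           sizeof(struct reordered_rec));
}
```

Calling print_layout() from main shows the actual offsets. The exact numbers depend on the ABI (for example, a 32-bit x86 ABI that aligns double to 4 bytes gives different padding than x86-64), which is the compatibility hazard described above.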
As SIMD no longer requires alignment, presumably code no longer aligns its buffers.
Looking at Intel's optimization manual, they state in
"15.6 DATA ALIGNMENT FOR INTEL® AVX"
>
"Assembly/Compiler Coding Rule 65. (H impact, M generality) Align
data to 32-byte boundary when possible. Prefer store alignment
over load alignment."
>
and further down, about AVX-512,
>
"18.23.1 Align Data to 64 Bytes"
>
"Aligning data to vector length is recommended. For best results,
when using Intel AVX-512 instructions, align data to 64 bytes.
>
When doing a 64-byte Intel AVX-512 unaligned load/store, every
load/store is a cache-line split, since the cache-line is 64
bytes. This is double the cache line split rate of Intel AVX2
code that uses 32-byte registers. A high cache-line split rate in
memory-intensive code can cause poor performance."
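
The arithmetic behind that claim can be checked with a small C sketch. It assumes a 64-byte cache line, as the quoted text states. The helper names crosses_line and count_splits are made up for this illustration and are not from Intel's manual.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* cache-line size in bytes, per the quoted text */

/* Returns 1 if an access of `width` bytes (width >= 1) starting at
   byte address `addr` touches two cache lines, 0 otherwise. */
int crosses_line(uintptr_t addr, size_t width)
{
    return (addr / CACHE_LINE) != ((addr + width - 1) / CACHE_LINE);
}

/* Counts how many of `n` back-to-back accesses of `width` bytes,
   starting at `start`, are cache-line splits. */
size_t count_splits(uintptr_t start, size_t width, size_t n)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        splits += (size_t)crosses_line(start + i * width, width);
    return splits;
}
```

With a buffer that starts 16 bytes past a line boundary, count_splits(16, 64, 8) reports that all 8 of the 64-byte accesses split, while count_splits(16, 32, 8) reports 4 of the 8 32-byte accesses. That is the doubling of the per-access split rate the manual describes. With start at 0, neither width splits at all.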
>
This sounds reasonable, and it is good advice if you want to go
down the SIMD lane.
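
For anyone wanting to follow the alignment advice, one possible sketch in C11 uses aligned_alloc. The helper names alloc_floats_aligned and is_aligned are hypothetical. The sketch assumes a C library that provides aligned_alloc (glibc does; MSVC's runtime does not) and an alignment that is a power of two, such as 32 or 64.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `n` floats on an `alignment`-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up. Overflow of n * sizeof(float) is not
   checked in this sketch. Release the result with free(). */
float *alloc_floats_aligned(size_t n, size_t alignment)
{
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    if (rounded == 0)
        rounded = alignment;  /* avoid a zero-size request */
    return aligned_alloc(alignment, rounded);
}

/* Returns 1 if pointer `p` sits on an `alignment`-byte boundary. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A buffer obtained with alloc_floats_aligned(1000, 64) starts on a cache-line boundary, so full-width 64-byte loads and stores that walk it from the start never split a line. Whether that pays off in practice depends on the workload, as the manual's "when possible" wording suggests.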
>
This is, ironically, a place where SIMD via ganged registers has an
advantage over SIMD via large monolithic registers.