On 2/3/2025 1:41 PM, Thomas Koenig wrote:
EricP <ThatWouldBeTelling@thevillage.com> schrieb:
That is fine for code that is being actively maintained and where backward
data structure compatibility is not required (like those inside a kernel).
However for x86 there were a few billion lines of legacy code that likely
assumed 2-byte alignment, or followed the advice to align fp64 to 32 bits,
and a C language that mandates structs be laid out in memory exactly as
specified (no automatic struct optimization). Also I seem to recall some
amount of squawking about SIMD when it required naturally aligned buffers.
As SIMD no longer requires alignment, presumably code no longer does so.
Looking at Intel's optimization manual, they state in
"15.6 DATA ALIGNMENT FOR INTEL® AVX"
"Assembly/Compiler Coding Rule 65. (H impact, M generality) Align
data to 32-byte boundary when possible. Prefer store alignment
over load alignment."
and further down, about AVX-512,
"18.23.1 Align Data to 64 Bytes"
"Aligning data to vector length is recommended. For best results,
when using Intel AVX-512 instructions, align data to 64 bytes.
When doing a 64-byte Intel AVX-512 unaligned load/store, every
load/store is a cache-line split, since the cache-line is 64
bytes. This is double the cache line split rate of Intel AVX2
code that uses 32-byte registers. A high cache-line split rate in
memory-intensive code can cause poor performance."
This sounds reasonable, and good advice if you want to go
down SIMD lane.
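FWIW, a minimal C11 sketch of what that advice amounts to in practice (my own example, assuming aligned_alloc is available and the size is kept a multiple of the alignment):

  #include <stdlib.h>

  int main(void) {
      /* request a 64-byte aligned buffer so full-width AVX-512
         loads/stores never straddle a cache line; C11 aligned_alloc
         wants the size to be a multiple of the alignment */
      float *buf = aligned_alloc(64, 1024 * sizeof(float));
      if (!buf) return 1;
      /* ... fill and process buf with aligned vector ops ... */
      free(buf);
      return 0;
  }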
This is, ironically, a place where SIMD via ganged registers has an advantage over SIMD via large monolithic registers.
With ganged registers, one can load/store them piecewise as needed, and use unaligned loads/stores (while the larger forms can still actively require natural alignment).
Though, granted, large monolithic registers are a more popular option vs ganged registers.
And, with monolithic registers, you can make the registers larger without either effectively halving the number of longer registers, or needing to double the number of shorter registers.
But, this comes at the cost that the high-order bits of the registers are essentially wasted for code operating on narrower vectors.
Say, if one has:
64x 64-bit vectors (group of 1);
32x 128-bit vectors (group of 2);
16x 256-bit vectors (group of 4);
8x 512-bit vectors (group of 8).
If one wanted a 1024-bit vector, there is a choice to make (see the sketch below):
Live with only 4 such vectors;
Expand the size of the register file to 128x 64-bit registers;
Live with asymmetric wonk, where parts of the register space are only accessible at larger sizes.
...
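A throwaway C sketch of that arithmetic, just to make the halving explicit (my own toy illustration, not tied to any specific ISA):

  #include <stdio.h>

  int main(void) {
      /* assume a pool of 64x 64-bit base registers; a W-bit vector is
         a gang of W/64 adjacent registers, so every doubling of the
         width halves the number of distinct vectors */
      int base_regs = 64;
      for (int width = 64; width <= 1024; width *= 2) {
          int group = width / 64;
          int count = base_regs / group;
          printf("%4d-bit: group of %2d, %2d vectors\n",
                 width, group, count);
      }
      return 0;
  }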
Though, with monolithic registers, each doubling of the register size also effectively mandates either a whole new set of instructions to deal with the larger size, or some other way to encode or specify the size (or, "who knows, it is whatever it is, software can figure it out"...).
This is less true of ganged registers.
Say, if the CPU supported it, one could add:
PADDX4.F //256-bit Binary32 ADD
PSUBX4.F //256-bit Binary32 SUB
PMULX4.F //256-bit Binary32 MUL
...
While leaving everything else the same as before.
The addition of wider load/store operations would be optional:
Don't have 256-bit Ld/St? Use 128-bit Ld/St.
Need fully unaligned access? Use 64-bit Ld/St's.
...
And also making it easy for narrower implementations to simply crack the instructions into 128-bit vector operations internally (which may actually be implemented as two 64-bit vector ops running in parallel).
But, say, the pipeline could be designed internally around 64-bit vector ops, with a 4-wide machine able to do 256-bit vector operations mostly by supporting a 64-bit vector operation on each lane.
And, you can more easily "pretend" in the compiler to have whichever vector size you want. Code asks for 256-bit vectors but the target only has 128-bit? Just fake it using 128-bit ops.
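This is roughly the trick a compiler can pull on x86 as well; sketched here with SSE intrinsics (my own example, assuming a target with 128-bit vectors being asked to provide a 256-bit float add):

  #include <xmmintrin.h>

  /* "256-bit" float add faked as two 128-bit SSE adds; unaligned
     loads/stores so the caller need not care about alignment */
  static void add_8f(float *dst, const float *a, const float *b) {
      __m128 lo = _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b));
      __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));
      _mm_storeu_ps(dst,     lo);
      _mm_storeu_ps(dst + 4, hi);
  }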
But, granted, most ISAs aren't doing SIMD this way.
...
Also in going from 32 to 64 bits, data structures that contain pointers
now could find those 8-byte pointers aligned on 4-byte boundaries.
This is mandated by the relevant ABI, and ABIs usually mandate
alignment on natural boundaries.
While the Linux kernel may not use many misaligned values,
I'd guess there is a lot of application code that does.
Unless it is generating external binary data (a _very_ bad idea,
XDR was developed for a reason), there is no big reason to use
unaligned data, unless somebody is playing fast and loose
with C pointer types, and that is a bad idea anyway.
Often needed for speed, though.
Alternatively, a compiler could use it to implement something like
memcpy or memmove when it knows that unaligned accesses are safe.
Basically required unless you want them to be slow.
The aligned-only versions will almost invariably be slower, potentially significantly slower.
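For reference, the usual portable idiom here (nothing exotic; just the pattern that compilers turn into a single unaligned load on targets where that is legal and cheap):

  #include <stdint.h>
  #include <string.h>

  /* read a 64-bit value from a possibly misaligned address without
     UB; compiles to one plain load where unaligned access is cheap,
     and to a byte-wise sequence on strict-alignment targets */
  static inline uint64_t load_u64(const void *p) {
      uint64_t v;
      memcpy(&v, p, sizeof v);
      return v;
  }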
But it would be really interesting to have access to a system
where unaligned accesses trap, in order to find (and fix) ABI
issues and some undefined behavior on the C side.
It may make sense to add some form of categorical separations:
Pointers that may be unaligned;
Pointers that must be aligned.
Trapping on unaligned being a reasonable option for the latter case.
Really needs to be per-pointer or per-access though, and not a global flag (which makes it kind of useless).
Some compilers have __aligned and __unaligned keywords.
Something like "[[aligned]]" and "[[unaligned]]" could also make sense, with the default likely depending on type and implementation...
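For what already exists: one can approximate the per-pointer idea in GCC/Clang today with a packed wrapper struct, something like the sketch below (the "[[aligned]]"/"[[unaligned]]" spellings above remain hypothetical):

  #include <stdint.h>

  /* the packed struct has alignment 1, so accesses through it are
     allowed to be misaligned and the compiler emits unaligned-safe
     code (or byte loads on strict-alignment targets) */
  typedef struct { uint32_t v; } __attribute__((packed)) u32_una;

  static inline uint32_t read_u32_una(const void *p) {
      return ((const u32_una *)p)->v;
  }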