Newsportal USENET - Re: Cost of handling misaligned access

Subject: Re: Cost of handling misaligned access
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 06. Feb 2025, 11:30:49

Organization : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2025Feb6.113049@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8
User-Agent : xrn 10.11

Terje Mathisen <terje.mathisen@tmsw.no> writes:

>Anton Ertl wrote:
>> If you have ever learned about vectorization, it's easy to see that
>> the inner loop can be vectorized. And obviously auto-vectorization
>> has worked in this case, not particularly amazing to me.
>
>I have some (30 years?) experience with auto-vectorization, usually I've
>been (very?) disappointed.

I have often been disappointed, too.

>As I wrote this was the best I have ever
>seen, and the resulting code actually performed extremely close to
>theoretical speed of light, i.e. 3 clock cycles for each 3 avx instruction.

What theory is behind that? Take the inner loop (from my version
that uses unsigned, not unsigned long) as compiled by clang-14.0.6 -O3
-mavx2:

50:   c5 f5 db 34 0a       vpand    (%rdx,%rcx,1),%ymm1,%ymm6
55:   c5 f5 db 7c 0a 20    vpand    0x20(%rdx,%rcx,1),%ymm1,%ymm7
5b:   c5 75 db 44 0a 40    vpand    0x40(%rdx,%rcx,1),%ymm1,%ymm8
61:   c5 75 db 4c 0a 60    vpand    0x60(%rdx,%rcx,1),%ymm1,%ymm9
67:   c5 cd 76 f2          vpcmpeqd %ymm2,%ymm6,%ymm6
6b:   c5 fd fa c6          vpsubd   %ymm6,%ymm0,%ymm0
6f:   c5 c5 76 f2          vpcmpeqd %ymm2,%ymm7,%ymm6
73:   c5 e5 fa de          vpsubd   %ymm6,%ymm3,%ymm3
77:   c5 bd 76 f2          vpcmpeqd %ymm2,%ymm8,%ymm6
7b:   c5 dd fa e6          vpsubd   %ymm6,%ymm4,%ymm4
7f:   c5 b5 76 f2          vpcmpeqd %ymm2,%ymm9,%ymm6
83:   c5 d5 fa ee          vpsubd   %ymm6,%ymm5,%ymm5
87:   48 83 e9 80          sub      $0xffffffffffffff80,%rcx
8b:   48 39 c8             cmp      %rcx,%rax
8e:   75 c0                jne      50 <inner+0x50>

I see that clang uses 4 ymm accumulators (ymm0, ymm3, ymm4, ymm5), so
the recurrences are only 1-cycle recurrences for the 4 vpsubd
instructions and the sub instruction. So, with enough resources, a
CPU core could perform 1 iteration per cycle (and with hardware
reassociation, even faster, but apart from adding constants in Alder
Lake ff., we are not there yet). But current CPUs do not have that
many resources. If we want to determine the maximum speed given
resource limits, we have to look at the concrete CPU model.
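
The effect of the four accumulators can be sketched in scalar C. This
is only an illustration: the C source is not shown in this posting, so
the loop body (count the words whose masked value equals a key) is an
assumption read off the vpand/vpcmpeqd/vpsubd sequence, and the
function names are made up. Each scalar variable stands in for one ymm
accumulator.

```c
#include <stddef.h>

/* Single accumulator: every add depends on the previous one, so the
   loop carries one dependency chain through all n additions. */
unsigned count_matches_1acc(const unsigned *p, size_t n,
                            unsigned mask, unsigned key)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; i++)
        count += (p[i] & mask) == key;
    return count;
}

/* Four independent accumulators, in the spirit of ymm0, ymm3, ymm4 and
   ymm5 in the listing: each chain receives only one add per iteration
   of the unrolled loop. */
unsigned count_matches_4acc(const unsigned *p, size_t n,
                            unsigned mask, unsigned key)
{
    unsigned c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c0 += (p[i]     & mask) == key;
        c1 += (p[i + 1] & mask) == key;
        c2 += (p[i + 2] & mask) == key;
        c3 += (p[i + 3] & mask) == key;
    }
    for (; i < n; i++)            /* leftover elements */
        c0 += (p[i] & mask) == key;
    return c0 + c1 + c2 + c3;     /* reduction after the loop */
}
```

Both functions return the same count; the second one only changes the
shape of the dependency graph, at the price of a final reduction after
the loop.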

BTW, an alternative would be to do some of the summation already
within each iteration, but still with only a one-cycle recurrence. It
could have looked like this:

vpand (%rdx,%rcx,1),%ymm1,%ymm6
vpand 0x20(%rdx,%rcx,1),%ymm1,%ymm7
vpand 0x40(%rdx,%rcx,1),%ymm1,%ymm8
vpand 0x60(%rdx,%rcx,1),%ymm1,%ymm9
vpcmpeqd %ymm2,%ymm6,%ymm6
vpcmpeqd %ymm2,%ymm7,%ymm7
vpaddd %ymm6,%ymm7,%ymm7
vpcmpeqd %ymm2,%ymm8,%ymm8
vpcmpeqd %ymm2,%ymm9,%ymm9
vpaddd %ymm8,%ymm9,%ymm9
vpaddd %ymm7,%ymm9,%ymm9
vpsubd %ymm9,%ymm0,%ymm0
sub $0xffffffffffffff80,%rcx
cmp %rcx,%rax
jne 50 <inner+0x50>

Here the SIMD recurrence uses ymm0. This saves having to perform the
summing up of SIMD accumulators after the inner loop. gcc uses
something like that in one of the codes I looked at.
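
In scalar C the same idea might look as follows. Again this is a
sketch under assumptions: the loop body (count the words whose masked
value equals a key) is inferred from the instructions, not taken from
the actual source, and the function name is hypothetical. The four
compare results are summed in a small tree that does not depend on the
accumulator, so the only loop-carried operation is the final add.

```c
#include <stddef.h>

/* One accumulator, but the partial sums inside an iteration are
   independent of it: only "count += ..." is on the recurrence. */
unsigned count_matches_tree(const unsigned *p, size_t n,
                            unsigned mask, unsigned key)
{
    unsigned count = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        unsigned m0 = (p[i]     & mask) == key;
        unsigned m1 = (p[i + 1] & mask) == key;
        unsigned m2 = (p[i + 2] & mask) == key;
        unsigned m3 = (p[i + 3] & mask) == key;
        unsigned s01 = m0 + m1;   /* corresponds to the first vpaddd */
        unsigned s23 = m2 + m3;   /* corresponds to the second vpaddd */
        count += s01 + s23;       /* single loop-carried add */
    }
    for (; i < n; i++)            /* leftover elements */
        count += (p[i] & mask) == key;
    return count;                 /* no reduction across accumulators */
}
```

In the SIMD version the compare results are all-ones (-1) rather than
1, which is why the listing ends the chain with vpsubd instead of
vpaddd.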

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
