Re: Cost of handling misaligned access

Michael S wrote:

On Wed, 5 Feb 2025 18:10:03 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:

Anton Ertl wrote:
EricP <ThatWouldBeTelling@thevillage.com> writes:

As SIMD no longer requires alignment, presumably code no longer
does so.
>
Yes, if you use AVX/AVX2, you don't encounter this particular Intel
stupidity.
>
Recently, on the last day (Dec 25th) of Advent of Code, I had a
problem which lent itself to using 32-bit bitmaps: The task was to
check which locks were compatible with which keys, so I ended up with
code like this:
>
    let mut part1 = 0;
    for l in li..keylocks.len() {
        let lock = keylocks[l];
        for k in 0..li {
            let sum = lock & keylocks[k];
            if sum == 0 {
                part1 += 1;
            }
        }
    }
>
Telling the rust compiler to target my AVX2-capable laptop CPU (an
Intel i7), I got code that simply amazed me: The compiler unrolled
the inner loop by 32, ANDing 4 x 8 keys by 8 copies of the current
lock into 4 AVX registers (vpand), then comparing with a zeroed
register (vpcmpeqd) (generating -1/0 results) before subtracting
(vpsubd) those from 4 accumulators.
>
This resulted in just 12 instructions to handle 32 tests.
>
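For readers who want to see the lane-wise pattern described above without reading the assembly, here is a minimal sketch in plain Rust. It is not the compiler's output: an 8-element `u32` array stands in for one 256-bit AVX2 register, and only a single accumulator is modelled instead of four. The function names `count_scalar` and `count_lanes` are made up for this illustration, and the layout (keys in `0..li`, locks in `li..`) is taken from the quoted loop.

```rust
/// Scalar reference: count (lock, key) pairs whose bitmaps share no set bit.
/// Entries `0..li` are keys, entries `li..` are locks, as in the quoted loop.
fn count_scalar(keylocks: &[u32], li: usize) -> u32 {
    let mut part1 = 0;
    for &lock in &keylocks[li..] {
        for &key in &keylocks[..li] {
            if lock & key == 0 {
                part1 += 1;
            }
        }
    }
    part1
}

/// Lane-wise emulation of the vectorized inner loop.
/// Eight u32 lanes model one 256-bit register (an illustration only).
fn count_lanes(keylocks: &[u32], li: usize) -> u32 {
    const LANES: usize = 8;
    let keys = &keylocks[..li];
    let mut total = 0u32;
    for &lock in &keylocks[li..] {
        let mut acc = [0u32; LANES];
        let mut chunks = keys.chunks_exact(LANES);
        for chunk in &mut chunks {
            for lane in 0..LANES {
                // vpand: AND the broadcast lock with one key per lane.
                let and = lock & chunk[lane];
                // vpcmpeqd against zero: all-ones (-1) on match, else 0.
                let mask = if and == 0 { u32::MAX } else { 0 };
                // vpsubd: subtracting -1 adds 1 to the lane's counter.
                acc[lane] = acc[lane].wrapping_sub(mask);
            }
        }
        // Keys that do not fill a whole 8-lane chunk are handled one by one.
        for &key in chunks.remainder() {
            if lock & key == 0 {
                total += 1;
            }
        }
        // Horizontal sum of the per-lane counters.
        total += acc.iter().sum::<u32>();
    }
    total
}
```

Both functions should return the same count for any input; the second one only makes the and/compare/subtract sequence explicit.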
That sounds suboptimal.
By unrolling the outer loop by 2 or 3 you can greatly reduce the number of
memory accesses per comparison. The speedup would depend on the specific
microarchitecture, but I would guess that at least a 1.2x speedup is
possible here. Especially so when the data is not aligned.
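To make the suggestion concrete, here is a rough sketch of the outer loop unrolled by 2, written as scalar Rust. Each key is loaded once and tested against two locks, which is where the reduction in memory accesses per comparison would come from. The function name `count_outer_unrolled` is invented for this example, and whether it actually gains anything after auto-vectorization depends on the compiler and the CPU; nothing here has been measured.

```rust
/// Count compatible (lock, key) pairs, processing two locks per pass over
/// the keys. Entries `0..li` are keys, entries `li..` are locks.
fn count_outer_unrolled(keylocks: &[u32], li: usize) -> u32 {
    let (keys, locks) = keylocks.split_at(li);
    let mut count = 0u32;
    let mut pairs = locks.chunks_exact(2);
    for pair in &mut pairs {
        let (lock_a, lock_b) = (pair[0], pair[1]);
        for &key in keys {
            // One load of `key` serves two comparisons.
            count += ((lock_a & key) == 0) as u32;
            count += ((lock_b & key) == 0) as u32;
        }
    }
    // An odd number of locks leaves one lock for a plain pass.
    for &lock in pairs.remainder() {
        for &key in keys {
            count += ((lock & key) == 0) as u32;
        }
    }
    count
}
```

Unrolling by 3 follows the same shape with `chunks_exact(3)` and a third comparison per key.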

Anton already replied: as he wrote, the total loop overhead is just three instructions, all of which can (& will?) overlap with the AVX instructions.
Due to the combined AVX and 4x unroll, the original scalar code is already unrolled 32x, so the loop overhead can mostly be ignored.
If the CPU has enough resources to run more than one 32-byte AVX instruction per cycle, then the same code will allow all four copies to run at the same time, but the timing I see on my laptop (93 ps) corresponds closely to one AVX op/cycle.
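
The arithmetic behind that estimate can be sketched as follows. Two things are assumptions and not stated in the post: that the 93 ps figure is the time per lock/key test, and that the core runs at roughly 4.0 GHz. The helper name `ps_per_test` is made up for this illustration.

```rust
/// Time per test in picoseconds, given the number of AVX ops issued per
/// test, the number of AVX ops the core completes per cycle, and the clock
/// frequency in GHz (the frequency is an assumed input, not a measured one).
fn ps_per_test(ops_per_test: f64, ops_per_cycle: f64, clock_ghz: f64) -> f64 {
    let cycles_per_test = ops_per_test / ops_per_cycle;
    // One cycle lasts 1000 / clock_ghz picoseconds.
    cycles_per_test * 1000.0 / clock_ghz
}
```

With 12 AVX instructions per 32 tests, `ps_per_test(12.0 / 32.0, 1.0, 4.0)` gives 93.75 ps, while two ops per cycle would give about 47 ps under the same assumed clock.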
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
