Michael S <already5chosen@yahoo.com> writes:
>On Thu, 06 Feb 2025 10:59:39 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>Michael S <already5chosen@yahoo.com> writes:
>>>>This resulted in just 12 instructions to handle 32 tests.
>>>That sounds suboptimal.
>>>By unrolling the outer loop by 2 or 3 you can greatly reduce the
>>>number of memory accesses per comparison.
>The point of my proposal is not reduction of loop overhead and not
>reduction of the # of x86 instructions (in fact, with my proposal
>the # of x86 instructions is increased), but reduction of the # of uOps
>due to reuse of loaded values.
>The theory behind it is that, most typically, in code with very high
>IPC like the one above, the main bottleneck is the # of uOps that
>flow through the rename stage.
>
>Not counting loop overhead, the original 1x4 inner loop consists of 12
>instructions, 16 uOps. Suppose we replace it by a 2x2 inner loop that
>does the same amount of work. The new inner loop contains only RISC-like
>instructions - 14 instructions, 14 uOps.
>With a 3x2 inner loop there are 20 instructions, 20 uOps, and 1.5x more
>work done per iteration.
I completely missed the "outer" in your response. Yes, looking at the
original loop again:
let mut part1 = 0;
for l in li..keylocks.len() {
let lock = keylocks[l];
for k in 0..li {
let sum = lock & keylocks[k];
if sum == 0 {
part1 += 1;
}
}
}
you can reuse the keylocks[k] value by unrolling the outer loop.
E.g., if you unroll the outer loop by a factor of 4 (and the inner
loop only as far as needed for the SIMD width), you can use almost the
same code as clang produces, but you load keylocks[0..li] a quarter as
often. If the bottleneck of the inner-loop-only-optimized variant is
bandwidth to the memory subsystem, which is likely (it seems that
Terje Mathisen worked with up to 62500 values, i.e., 250KB, i.e., L2),
there may be quite a bit of speedup.
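That 4x outer unrolling could be sketched like this (hypothetical Rust,
not code from the thread; the function name count_fits and the remainder
handling are mine):

```rust
// Outer loop unrolled by 4: each keylocks[k] is loaded once and tested
// against four outer elements, so keylocks[0..li] is traversed a
// quarter as often as in the original 1-wide outer loop.
fn count_fits(keylocks: &[u32], li: usize) -> u32 {
    let mut part1 = 0;
    let n = keylocks.len();
    let mut l = li;
    while l + 4 <= n {
        let (a, b, c, d) =
            (keylocks[l], keylocks[l + 1], keylocks[l + 2], keylocks[l + 3]);
        for k in 0..li {
            let key = keylocks[k]; // loaded once, reused 4 times
            part1 += (a & key == 0) as u32;
            part1 += (b & key == 0) as u32;
            part1 += (c & key == 0) as u32;
            part1 += (d & key == 0) as u32;
        }
        l += 4;
    }
    // remaining 0..3 outer iterations, as in the original loop
    for ll in l..n {
        let lock = keylocks[ll];
        for k in 0..li {
            if lock & keylocks[k] == 0 {
                part1 += 1;
            }
        }
    }
    part1
}
```

The inner loop body is still what the compiler would vectorize; only the
traffic over keylocks[0..li] changes.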
E.g., for Zen5 the bandwidth from L2 is reported to be 32 bytes/cycle,
which would limit the original code to at least 4 cycles/iteration
(3.75 IPC), possibly more due to misalignment-handling overhead, and
using AVX-512 would not help. With reuse of the loaded values, the
limit would probably be execution resources instead, and AVX-512 would
see quite a bit of speedup over AVX2.
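The arithmetic behind those figures can be spelled out; note that the
3-instruction loop-overhead figure below is my assumption (to make the
12 test instructions come out at 3.75 IPC), not something stated in the
thread:

```rust
// Back-of-the-envelope check of the L2-bandwidth limit.
// Returns (load-bound cycles per inner-loop iteration, implied IPC).
fn l2_limit(tests_per_iter: u32, bytes_per_test: u32, l2_bytes_per_cycle: u32) -> (u32, f64) {
    // 32 tests * 4 bytes/key = 128 bytes streamed from L2 per iteration
    let cycles = tests_per_iter * bytes_per_test / l2_bytes_per_cycle;
    // 12 test instructions + an assumed 3 of loop overhead in those cycles
    let ipc = (12.0 + 3.0) / cycles as f64;
    (cycles, ipc)
}
```

With 32 tests/iteration, 4 bytes per loaded key, and 32 bytes/cycle from
L2, this gives 4 cycles/iteration and 3.75 IPC.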
>Another factor that can contribute to a speedup is the increased number
>of iterations in the inner loop - from 1..7 iterations in the original
>to 1..15 in both of my above-mentioned variants.
Yes. I actually don't see a reason to unroll the inner loop more than
needed for the SIMD instructions at hand, unless the number of
outer-loop iterations is too small. If you want more unrolling,
unroll the outer loop more.
>Yet another possibility is to follow the "work harder, not smarter"
>principle, i.e., process the whole square rather than just the relevant
>triangle.
I don't see a triangle in the code above. There may be an additional
outer loop involved that varies li from 0 to keylocks.len() or
something, but the code as presented processes a square.
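For contrast, a triangular traversal would look like this hypothetical
sketch, comparing every unordered pair within one array; the quoted code
instead walks the full li x (len - li) rectangle of key/lock pairs:

```rust
// Triangle: each unordered pair (i, j) with i < j is visited once,
// giving n*(n-1)/2 inner-body executions instead of a full n*n square.
fn count_pairs_triangle(vals: &[u32]) -> u32 {
    let mut count = 0;
    for i in 0..vals.len() {
        for j in (i + 1)..vals.len() {
            if vals[i] & vals[j] == 0 {
                count += 1;
            }
        }
    }
    count
}
```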
- anton