Re: Cost of handling misaligned access

Subject: Re: Cost of handling misaligned access
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 05. Feb 2025, 20:26:18
Organization: A noiseless patient Spider
Message-ID: <vo0e0r$2h20b$1@dont-email.me>
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Anton Ertl wrote:
Terje Mathisen <terje.mathisen@tmsw.no> writes:
         for k in 0..li {
             let sum = lock & keylocks[k];
             if sum == 0 {
                 part1 += 1;
             }
         }
 Does Rust only have this roundabout way to express this sequentially?
In Forth I would express that scalarly as
 ( part1 ) li 0 do
   keylocks i th @ lock and 0= - loop
 ["-" because 0= produces all-bits-set (-1) for true]
 or in C as
 for (k=0; k<li; k++)
   part1 += (lock & keylocks[k])==0;
I could have written it as
   part1 += ((lock & keylocks[k]) == 0) as u32;
I.e. just like C, except that all casts have to be explicit; here the boolean result of the '==' test needs to be converted to a u32.
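
To make the equivalence concrete, here is a minimal sketch of both forms side by side. The function names and the use of keylocks.len() in place of li are my assumptions for illustration; only the loop bodies come from the snippets above.

```rust
/// Branching form, as in the loop originally posted.
/// (Function name and slice parameter are illustrative assumptions.)
fn count_branch(lock: u32, keylocks: &[u32]) -> u32 {
    let mut part1 = 0;
    for k in 0..keylocks.len() {
        let sum = lock & keylocks[k];
        if sum == 0 {
            part1 += 1;
        }
    }
    part1
}

/// Branchless form: `bool as u32` yields 1 for true and 0 for false,
/// so the comparison result can be added directly, just like in C.
fn count_cast(lock: u32, keylocks: &[u32]) -> u32 {
    let mut part1 = 0u32;
    for k in 0..keylocks.len() {
        part1 += ((lock & keylocks[k]) == 0) as u32;
    }
    part1
}
```

Both functions should return the same count for any input; which one the optimizer prefers is a separate question.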

 which I find much easier to follow.  I also expected 0..li to include
li (based on, I guess, the use of .. in Pascal and its descendants), but
the net tells me that it does not (starting with 0 was the hint that
made me check my expectations).
:-)
It is similar to "for (k=0;k<li;k++) {}", so the exclusive right limit feels natural.
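
A small sketch of the two range forms, in case it helps: 0..li excludes li, while 0..=li is the inclusive, Pascal-like form. The helper names are made up for this example.

```rust
/// Number of iterations of `for k in 0..li`: the right limit is excluded.
fn exclusive_count(li: usize) -> usize {
    (0..li).count()
}

/// Number of iterations of `for k in 0..=li`: the right limit is included.
fn inclusive_count(li: usize) -> usize {
    (0..=li).count()
}

/// Last index visited by the exclusive range, if any.
fn exclusive_last(li: usize) -> Option<usize> {
    (0..li).last()
}
```

So with li = 4 the exclusive form visits 0, 1, 2, 3, matching the C loop with k<li.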

 
Telling the Rust compiler to target my AVX2-capable laptop CPU (an Intel
i7)
 I find it deplorable that even knowledgeable people use marketing
labels like "i7" which do not tell anything technical (and very little
non-technical) rather than specifying the full model number (e.g., Core
i7-1270P) or the design (e.g., Alder Lake).  But in the present case
"AVX2-capable CPU" is enough information.
 
I got code that simply amazed me: The compiler unrolled the inner
loop by 32, ANDing 4 x 8 keys by 8 copies of the current lock into 4 AVX
registers (vpand), then comparing with a zeroed register (vpcmpeqd)
(generating -1/0 results) before subtracting (vpsubd) those from 4
accumulators.
 If you have ever learned about vectorization, it's easy to see that
the inner loop can be vectorized.  And obviously auto-vectorization
has worked in this case, not particularly amazing to me.
I have some (30 years?) experience with auto-vectorization, and usually I've been (very?) disappointed. As I wrote, this was the best I have ever seen, and the resulting code actually performed extremely close to the theoretical speed of light, i.e. 3 clock cycles for each 3 AVX instructions.
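
For readers who want to see the loop shape spelled out, the vpand / vpcmpeqd / vpsubd sequence can be sketched by hand with Rust's std::arch intrinsics. This is only a sketch under stated assumptions, not the compiler's actual output: it uses a single accumulator instead of the four from the 32-way unroll, it falls back to scalar code when AVX2 is not available, and all function names are invented for illustration.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: count keys that share no set bit with `lock`.
fn count_fits_scalar(lock: u32, keys: &[u32]) -> u32 {
    keys.iter().map(|&k| ((lock & k) == 0) as u32).sum()
}

/// Hand-written AVX2 version, 8 keys per iteration (hypothetical helper).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_fits_avx2(lock: u32, keys: &[u32]) -> u32 {
    let locks = _mm256_set1_epi32(lock as i32); // 8 copies of the lock
    let zero = _mm256_setzero_si256();
    let mut acc = _mm256_setzero_si256();
    let chunks = keys.chunks_exact(8);
    let tail = chunks.remainder();
    for c in chunks {
        // Unaligned load of 8 keys.
        let k = unsafe { _mm256_loadu_si256(c.as_ptr() as *const __m256i) };
        // vpand, then vpcmpeqd against zero: each lane becomes -1 or 0.
        let hit = _mm256_cmpeq_epi32(_mm256_and_si256(locks, k), zero);
        // vpsubd: subtracting -1 adds 1 to the lane's counter.
        acc = _mm256_sub_epi32(acc, hit);
    }
    // Horizontal sum of the 8 lane counters, plus the scalar tail.
    let mut lanes = [0u32; 8];
    unsafe { _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc) };
    lanes.iter().sum::<u32>() + count_fits_scalar(lock, tail)
}

/// Dispatch at run time: AVX2 if the CPU has it, scalar otherwise.
fn count_fits(lock: u32, keys: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just checked.
            return unsafe { count_fits_avx2(lock, keys) };
        }
    }
    count_fits_scalar(lock, keys)
}
```

The subtraction of the compare mask is the same trick as the "-" in the Forth version quoted earlier, where true is all-bits-set.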
[snip]

clang is somewhat better:
 For the avx2 case, 70 lines and 250 bytes.
For the x86-64-v4 case, 111 lines and 435 bytes.
Rustc sits on top of the same LLVM infrastructure as clang, and even with that 32-way unroll the code was quite compact. I did not count, but your 70 lines seem to be in the ballpark.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
