On Sat, 25 Jan 2025 14:37:47 +0000, Farley Flud wrote:
> Feast thine bloodshot, jaundiced eyeballs on absolutely perfect
> AVX-512 assembly code:
I cannot resist giving the NASM dump of the assembled code
(in PIC form of course).
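For comparison, position-independent data access in 64-bit NASM is normally written with RIP-relative operands. In the listing, the bracketed fields after `49B8-` and `49B9-` are 64-bit absolute relocations, and `488B1C25[...]` carries a 32-bit absolute displacement. A minimal sketch of the relocatable form, reusing the listing's own labels and assuming nothing else about the build, might look like this:

```nasm
        ; Sketch only: RIP-relative versions of the listing's address loads.
        ; "rel" forces RIP-relative addressing for one operand; a single
        ; "default rel" at the top of the file would make it the default.
        lea     r8,  [rel data_in]      ; address of source, computed from RIP
        lea     r9,  [rel data_out]     ; address of destination
        mov     rbx, qword [rel stride] ; lanes per vector
        mov     rax, qword [rel N]      ; total element count
```

With these forms the assembler emits 32-bit displacements relative to the instruction pointer, so the code no longer depends on where the image is loaded.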
Feast thine jaundiced eyeballs below.
Note the data in hexadecimal which reads "DEADBEEF..."
That is a common device but I also added "DEAFCAFE..."
These allow me to easily discern where things are.
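The marker values are written byte-reversed in the source because x86 stores a dword low byte first. A small sketch of the equivalence (the `db` lines are only there for illustration and are not part of the program):

```nasm
        dd 0xefbeadde                   ; stored in memory as DE AD BE EF
        db 0xDE, 0xAD, 0xBE, 0xEF       ; the same four bytes, spelled out

        dd 0xfecaafde                   ; stored in memory as DE AF CA FE
        db 0xDE, 0xAF, 0xCA, 0xFE       ; the same four bytes, spelled out
```

That is why the hex column of the listing reads "DEADBEEF" and "DEAFCAFE" left to right even though the operands look scrambled.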
Also note the "90" at line 36. By default NASM pads alignment with byte
"90", which is the NOP instruction. In a data segment I should change that
padding to all zeros, but here it does no harm.
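NASM's `align` accepts an optional second argument naming what to pad with, so zero padding can be requested directly. A minimal sketch of the data segment header with that change, otherwise identical to the listing:

```nasm
        segment .data
        align 64
N:      dq 37
stride: dq 16
        align 64, db 0          ; pad with zero bytes instead of NOP (0x90)
data_in: dd 16 dup (0xefbeadde)
```

In `.bss` the listing already uses `alignb`, which reserves space rather than emitting bytes, so nothing needs to change there.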
Assembly language is the ultimate (and only) language. Anyone who
does not embrace assembly is a phony and a fraud and deserves to
be ostracized, if not worse.
=================================================
1 BITS 64
2
3 segment .text
4 global _start
5
6 _start:
7 00000000 49B8- mov r8, data_in
7 00000002 [4000000000000000]
8 0000000A 49B9- mov r9, data_out
8 0000000C [0000000000000000]
9 00000014 488B1C25[08000000] mov rbx, qword [stride]
10 0000001C 4831D2 xor rdx, rdx
11 0000001F 488B0425[00000000] mov rax, qword [N]
12 00000027 48F7F3 div rbx ; rax = quotient, rdx = remainder
13 load:
14 0000002A 62D17D486F08 vmovdqa32 zmm1, zword [r8]
15 00000030 62D17D487F09 vmovdqa32 zword [r9], zmm1
16 00000036 4983C040 add r8, 64 ; increment data pointers
17 0000003A 4983C140 add r9, 64
18 0000003E 48FFC8 dec rax
19 00000041 75E7 jnz load
20 00000043 4D31DB xor r11, r11 ; load mask, i.e. only rdx to load and process
21 00000046 49C7C2FFFFFFFF mov r10, -1
22 0000004D 4889D1 mov rcx, rdx
23 00000050 4D0FA5D3 shld r11, r10, cl
24 00000054 C4C1FB92CB kmovq k1, r11;
25 00000059 62D17DC96F08 vmovdqa32 zmm1{k1}{z}, zword [r8]
26 0000005F 62D17D497F09 vmovdqa32 zword [r9]{k1}, zmm1 ; masked store: data_out holds only 37 dwords
27 exit:
28 00000065 31FF xor edi,edi
29 00000067 B83C000000 mov eax,60
30 0000006C 0F05 syscall
31
32 segment .data
33 align 64
34 00000000 2500000000000000 N: dq 37
35 00000008 1000000000000000 stride: dq 16
36 00000010 90<rep 30h> align 64
37 00000040 DEADBEEFDEADBEEFDE- data_in: dd 16 dup (0xefbeadde)
37 00000049 ADBEEFDEADBEEFDEAD-
37 00000052 BEEFDEADBEEFDEADBE-
37 0000005B EFDEADBEEFDEADBEEF-
37 00000064 DEADBEEFDEADBEEFDE-
37 0000006D ADBEEFDEADBEEFDEAD-
37 00000076 BEEFDEADBEEFDEADBE-
37 0000007F EF
38 00000080 DEAFCAFEDEAFCAFEDE- dd 16 dup (0xfecaafde)
38 00000089 AFCAFEDEAFCAFEDEAF-
38 00000092 CAFEDEAFCAFEDEAFCA-
38 0000009B FEDEAFCAFEDEAFCAFE-
38 000000A4 DEAFCAFEDEAFCAFEDE-
38 000000AD AFCAFEDEAFCAFEDEAF-
38 000000B6 CAFEDEAFCAFEDEAFCA-
38 000000BF FE
39 000000C0 DEADBEEFDEADBEEFDE- dd 5 dup (0xefbeadde)
39 000000C9 ADBEEFDEADBEEFDEAD-
39 000000D2 BEEF
40
41 segment .bss
42 alignb 64
43 00000000 <res 94h> data_out: resd 37
=====================================================================
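The copy loop and tail can also be sketched a little more defensively. The version below is only an illustration under stated assumptions: `rax` and `rdx` hold the quotient and remainder from the `div` in the listing, `r8` and `r9` point at 64-byte-aligned source and destination, and the labels `copy_full` and `copy_tail` are hypothetical names. It skips the full-block loop when the quotient is zero, builds the mask with `kmovw` (which needs only AVX512F, whereas `kmovq` needs AVX512BW), and masks the final store so that only the remaining lanes are written.

```nasm
        test    rax, rax
        jz      copy_tail               ; fewer than 16 dwords: no full blocks
copy_full:
        vmovdqa32 zmm1, zword [r8]
        vmovdqa32 zword [r9], zmm1
        add     r8, 64                  ; advance both pointers by one vector
        add     r9, 64
        dec     rax
        jnz     copy_full
copy_tail:
        mov     ecx, edx                ; remainder is 0..15, so it fits in cl
        mov     eax, 1
        shl     eax, cl                 ; eax = 1 << remainder
        dec     eax                     ; low "remainder" bits set, 0 if none
        kmovw   k1, eax                 ; 16 mask bits cover 16 dword lanes
        vmovdqa32 zmm1{k1}{z}, zword [r8]   ; masked-out lanes are not read
        vmovdqa32 zword [r9]{k1}, zmm1      ; masked-out lanes are not written
```

For N = 37 and a stride of 16 this runs the loop twice and ends with a mask of 0x1F, the same five lanes that the `shld` sequence in the listing selects.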
-- Gentoo: The Fastest GNU/Linux Hands Down