Re: GNU/Linux Greatness: AVX 512 Assembly

Subject: Re: GNU/Linux Greatness: AVX 512 Assembly
From: fflud (at) *nospam* gnu.rocks (Farley Flud)
Newsgroups: comp.os.linux.advocacy
Date: 25 Jan 2025, 20:19:06
Organization: UsenetExpress - www.usenetexpress.com
Message-ID: <181e05acefef6fd8$206981$891815$802601b3@news.usenetexpress.com>
On Sat, 25 Jan 2025 14:37:47 +0000, Farley Flud wrote:

 
> Feast thine bloodshot, jaundiced eyeballs on absolutely perfect
> AVX-512 assembly code:
 

I cannot resist giving the NASM listing of the assembled code.
(Strictly speaking it is not PIC: it uses absolute rather than
RIP-relative addresses.)

Feast thy jaundiced eyeballs below.

Note the hexadecimal data, which reads "DEADBEEF..."

That is a common device, but I also added "DEAFCAFE..."

These markers make it easy to see where each block of data lies.

Also note the "90" at line 36.  NASM's align pads with byte 0x90,
which is the NOP instruction.  I should change that padding to zeros
(align 64, db 0), but here it does no harm, since it is never executed
or read.

Assembly language is the ultimate (and only) language.  Anyone who
does not embrace assembly is a phony and a fraud and deserves to
be ostracized, if not worse.

=================================================

     1                                  BITS 64
     2                                 
     3                                  segment .text
     4                                  global _start
     5                                 
     6                                  _start:
     7 00000000 49B8-                   mov r8, data_in
     7 00000002 [4000000000000000]
     8 0000000A 49B9-                   mov r9, data_out
     8 0000000C [0000000000000000]
     9 00000014 488B1C25[08000000]      mov rbx, qword [stride]
    10 0000001C 4831D2                  xor rdx, rdx
    11 0000001F 488B0425[00000000]      mov rax, qword [N]
    12 00000027 48F7F3                  div rbx ; rax = quotient, rdx = remainder
    13                                  load:
    14 0000002A 62D17D486F08            vmovdqa32 zmm1, zword [r8]
    15 00000030 62D17D487F09            vmovdqa32 zword [r9], zmm1
    16 00000036 4983C040                add r8, 64 ; increment data pointers
    17 0000003A 4983C140                add r9, 64
    18 0000003E 48FFC8                  dec rax ; assumes rax >= 1 on entry (N >= stride), else dec wraps past zero
    19 00000041 75E7                    jnz load
    20 00000043 4D31DB                  xor r11, r11 ; build tail mask: low rdx bits set, one per remaining element
    21 00000046 49C7C2FFFFFFFF          mov r10, -1
    22 0000004D 4889D1                  mov rcx, rdx
    23 00000050 4D0FA5D3                shld r11, r10, cl 
    24 00000054 C4C1FB92CB              kmovq k1, r11
    25 00000059 62D17DC96F08            vmovdqa32 zmm1{k1}{z}, zword [r8]
    26 0000005F 62D17D497F09            vmovdqa32 zword [r9]{k1}, zmm1 ; masked store: only the rdx tail elements
    27                                  exit:
    28 00000065 31FF                    xor edi,edi
    29 00000067 B83C000000              mov eax,60
    30 0000006C 0F05                    syscall
    31                                 
    32                                  segment .data
    33                                  align 64
    34 00000000 2500000000000000        N: dq 37
    35 00000008 1000000000000000        stride: dq 16
    36 00000010 90<rep 30h>             align 64
    37 00000040 DEADBEEFDEADBEEFDE-     data_in: dd 16 dup (0xefbeadde)
    37 00000049 ADBEEFDEADBEEFDEAD-
    37 00000052 BEEFDEADBEEFDEADBE-
    37 0000005B EFDEADBEEFDEADBEEF-
    37 00000064 DEADBEEFDEADBEEFDE-
    37 0000006D ADBEEFDEADBEEFDEAD-
    37 00000076 BEEFDEADBEEFDEADBE-
    37 0000007F EF                
    38 00000080 DEAFCAFEDEAFCAFEDE-     dd 16 dup (0xfecaafde)
    38 00000089 AFCAFEDEAFCAFEDEAF-
    38 00000092 CAFEDEAFCAFEDEAFCA-
    38 0000009B FEDEAFCAFEDEAFCAFE-
    38 000000A4 DEAFCAFEDEAFCAFEDE-
    38 000000AD AFCAFEDEAFCAFEDEAF-
    38 000000B6 CAFEDEAFCAFEDEAFCA-
    38 000000BF FE                
    39 000000C0 DEADBEEFDEADBEEFDE-     dd 5 dup (0xefbeadde)
    39 000000C9 ADBEEFDEADBEEFDEAD-
    39 000000D2 BEEF              
    40                                 
    41                                  segment .bss
    42                                  alignb 64
    43 00000000 <res 94h>               data_out: resd 37

=====================================================================





--
Gentoo: The Fastest GNU/Linux Hands Down
