GNU/Linux Greatness: AVX 512 Assembly

Subject: GNU/Linux Greatness: AVX 512 Assembly
From: fflud (at) *nospam* gnu.rocks (Farley Flud)
Newsgroups: comp.os.linux.advocacy
Date: 25. Jan 2025, 15:37:47
Organization: UsenetExpress - www.usenetexpress.com
Message-ID: <181df652f994e0cb$34540$2484$802601b3@news.usenetexpress.com>

Assembly language programming is both extremely simple and
extremely fun.

Yes, simple.  A CPU is a stupid beast and can only perform
very simple tasks.

Yes, fun.  There is much enjoyment to be had in using these
simple CPU tasks, like Lego, to construct complex functionality.

AVX-512 is currently the way to go with assembly programming.
AVX-512 operates on 512 bits at a time: 8 doubles, 16 floats,
8 long ints (64-bit), 16 ints (32-bit), or 64 chars (uint8_t)
simultaneously.
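
Those lane counts are just 64 bytes divided by the element size.
A tiny C sketch of the arithmetic (the names ZMM_BYTES and zmm_lanes
are mine, made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Width of one 512-bit (ZMM) register in bytes: 512 / 8. */
enum { ZMM_BYTES = 64 };

/* Number of elements of the given size that fit in one ZMM register. */
size_t zmm_lanes(size_t elem_size)
{
    return ZMM_BYTES / elem_size;
}
```

For example, zmm_lanes(sizeof(uint32_t)) gives the 16 lanes that the
program below steps through per iteration.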

With GNU/Linux, AVX-512 is totally at your command.

What follows is a very basic program that essentially does
nothing.  It merely uses AVX-512 assembly to read a data
block of arbitrary length and then write that block back
into different memory.

Its purpose is to illustrate how to step through memory
at a given stride to read all the data.  Since not every data
block is a multiple of 512 bits, the code shows how to deal with
any trailing elements.

For the sake of illustration the following assembly code
reads/writes 37 unsigned 32-bit integers.  These will fill 2 AVX-512
registers (16 uints each) with 5 uints left over.  Those final 5 are
handled with masking.
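
The split and the mask can be checked with plain integer arithmetic.
What follows is a rough C model, not part of the program: block_split,
split_block and shld64 are hypothetical names, and shld64 only imitates
what the 64-bit `shld` instruction does with a count in CL.

```c
#include <stdint.h>

/* Hypothetical helper type: n elements split into full vectors + tail. */
typedef struct {
    uint64_t full;   /* number of complete registers */
    uint64_t tail;   /* leftover elements, 0 .. lanes-1 */
    uint64_t mask;   /* low `tail` bits set */
} block_split;

/* Assumes 1 <= lanes <= 64, so tail < 64 and the shift is well defined. */
block_split split_block(uint64_t n, uint64_t lanes)
{
    block_split s;
    s.full = n / lanes;
    s.tail = n % lanes;
    s.mask = (UINT64_C(1) << s.tail) - 1;
    return s;
}

/* Model of `shld dst, src, cl` on 64-bit operands: shift dst left by cl
   and fill the vacated low bits from the top bits of src. */
uint64_t shld64(uint64_t dst, uint64_t src, unsigned cl)
{
    cl &= 63;                      /* the CPU masks the count to 6 bits */
    if (cl == 0)
        return dst;                /* a zero count leaves dst unchanged */
    return (dst << cl) | (src >> (64 - cl));
}
```

With n = 37 and lanes = 16 this gives full = 2, tail = 5 and
mask = 0x1f, and shld64(0, UINT64_MAX, 5) produces the same 0x1f.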

But any data block, up to 2^64 bytes (whew!), can be handled with
this simple code.

This program is written in NASM assembly.  NASM is the fucking
best assembler on planet Earth, hands down.

As I indicated, this program does essentially nothing.  There is
no output.  To view the "results" use the GDB debugger or, better,
the front end DDD.  With DDD one can step through the code to watch
the action unfold.

Feast thine bloodshot, jaundiced eyeballs on absolutely perfect
AVX-512 assembly code:

==================================
Begin AVX-512 NASM Assembly
==================================

BITS 64

segment .text
global _start

_start:
        mov     r8, data_in             ; source pointer
        mov     r9, data_out            ; destination pointer
        mov     rbx, qword [stride]
        xor     rdx, rdx
        mov     rax, qword [N]
        div     rbx                     ; rax = quotient, rdx = remainder
        test    rax, rax                ; no full register to copy?
        jz      tail                    ; then go straight to the tail
load:
        vmovdqa32 zmm1, zword [r8]      ; copy one full 64-byte block
        vmovdqa32 zword [r9], zmm1
        add     r8, 64                  ; increment data pointers
        add     r9, 64
        dec     rax
        jnz     load
tail:
        xor     r11, r11                ; load mask, i.e. only rdx left over to load
        mov     r10, -1
        mov     rcx, rdx
        shld    r11, r10, cl            ; r11 = low rdx bits set
        kmovq   k1, r11                 ; kmovq requires AVX512BW
        vmovdqa32 zmm1{k1}{z}, zword [r8]  ; masked load, unused lanes zeroed
        vmovdqa32 zword [r9]{k1}, zmm1  ; masked store, stays inside data_out
exit:
        xor     edi, edi
        mov     eax, 60                 ; exit system call
        syscall

segment .data
align 64
N:       dq 37                          ; length of block, in 32-bit elements
stride:  dq 16                          ; elements per 512-bit register
align 64
data_in: dd 16 dup (0xefbeadde)         ; dummy data
         dd 16 dup (0xfecaafde)
         dd 5 dup (0xefbeadde)

segment .bss
alignb 64
data_out: resd 37

==================================
End AVX-512 NASM Assembly
==================================
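
For anyone without AVX-512 hardware, here is a rough scalar model in C
of what the masked instructions do per 32-bit lane.  It is only a
sketch: load_maskz, store_mask and copy_u32 are names I made up, and
the loops stand in for single vector instructions.  A zero-masking load
clears the lanes whose mask bit is 0, and a write-masked store leaves
the corresponding destination lanes untouched.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 16 };  /* 32-bit lanes in one 512-bit register */

/* Zero-masking load: lanes whose mask bit is clear become 0 and the
   corresponding source elements are never read. */
void load_maskz(uint32_t reg[LANES], const uint32_t *src, uint16_t mask)
{
    for (int i = 0; i < LANES; i++)
        reg[i] = ((mask >> i) & 1) ? src[i] : 0;
}

/* Write-masked store: lanes whose mask bit is clear are left untouched. */
void store_mask(uint32_t *dst, const uint32_t reg[LANES], uint16_t mask)
{
    for (int i = 0; i < LANES; i++)
        if ((mask >> i) & 1)
            dst[i] = reg[i];
}

/* Copy n elements: full registers first, then one masked tail. */
void copy_u32(uint32_t *dst, const uint32_t *src, size_t n)
{
    uint32_t reg[LANES];
    size_t full = n / LANES;
    size_t tail = n % LANES;

    for (size_t v = 0; v < full; v++) {
        load_maskz(reg, src, 0xFFFF);   /* all 16 lanes active */
        store_mask(dst, reg, 0xFFFF);
        src += LANES;
        dst += LANES;
    }
    if (tail != 0) {
        uint16_t mask = (uint16_t)((1u << tail) - 1);
        load_maskz(reg, src, mask);
        store_mask(dst, reg, mask);
    }
}
```

Calling copy_u32(out, in, 37) touches exactly 37 elements of out, so
anything stored after them is left alone.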



--
Gentoo: The Fastest GNU/Linux Hands Down

