Re: Stack vs stackless operation

Subject: Re: Stack vs stackless operation
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Date: 26 Feb 2025, 18:46:13

Other headers

Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Feb26.184613@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

mhx@iae.nl (mhx) writes:
: :=: ( a b -- ) \ exchange values among two variables
  OVER @ >R DUP @ ROT ! R> SWAP ! ;

Another variant:

: exchange ( addr1 addr2 -- )
  dup @ rot !@ swap ! ;

This uses the primitive

'!@' ( u1 a-addr -- u2 ) gforth-experimental "store-fetch"
   load U2 from A_ADDR, and store U1 there, as atomic operation
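
For readers whose system lacks !@, its stack effect can be imitated in
standard Forth. The following is only a sketch under stated assumptions:
the names STORE-FETCH, EXCHANGE-SKETCH, W1 and W2 are made up for this
illustration, and the stand-in is NOT atomic, so it shows the data flow
of the primitive and nothing about its cost.

```forth
\ Non-atomic stand-in for Gforth's !@ (hypothetical name STORE-FETCH)
: store-fetch ( u1 a-addr -- u2 )
  dup @ >r    \ remember the old contents u2
  !           \ store u1 at a-addr
  r> ;        \ hand back the old contents

\ The EXCHANGE variant from above, written against the stand-in
: exchange-sketch ( addr1 addr2 -- )
  dup @ rot store-fetch swap ! ;

variable w1  variable w2
1 w1 !  2 w2 !
w1 w2 exchange-sketch
w1 @ . w2 @ .   \ prints 2 1
```

With the real primitive the definition body stays the same; only
STORE-FETCH is replaced by !@.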

I worry that the atomic part will result in it being slower than the
versions that do not use !@. Let's measure that:

: exchange ( addr1 addr2 -- )
  over @ swap !@ swap ! ;

: :=: ( addr1 addr2 -- )
  OVER @ >R DUP @   ROT !   R> SWAP ! ;

: bench-exchange ( addr1 addr2 -- addr1 addr2 )
  100000000 0 do 2dup exchange loop ;

: bench-:=: ( addr1 addr2 -- addr1 addr2 )
  100000000 0 do 2dup :=: loop ;

variable v1
variable v2

1 v1 !
2 v2 !

Measurement with
perf stat -e cycles -e instructions gforth-fast xxxx.fs -e "v1 v2 bench-exchange bye"
perf stat -e cycles -e instructions gforth-fast xxxx.fs -e "v1 v2 bench-:=: bye"

Results on a Zen4:

     exchange            :=:
  877_054_156    812_761_422  cycles
3_708_692_329  3_908_642_117  instructions

So the !@ variant is indeed slower, but only a little (about 0.64
cycles per execution of these words); however, I would have expected
either a big slowdown (from latency when dealing with the memory
subsystem, broadcasting to other cores, etc.) or none at all.
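
The per-execution figures come from dividing the differences between
the two runs by the 100000000 loop iterations. A small sketch of that
arithmetic, assuming 64-bit cells (the names ITERATIONS and
PER-ITERATION*100 are made up for illustration; results are in
hundredths):

```forth
\ Per-iteration differences from the perf numbers above.
100000000 constant iterations

: per-iteration*100 ( total-diff -- n )
  100 iterations */ ;   \ */ keeps a double-cell intermediate

\ cycles, exchange minus :=:
877054156 812761422 - per-iteration*100 .     \ prints 64, i.e. about 0.64

\ instructions, :=: minus exchange
3908642117 3708692329 - per-iteration*100 .   \ prints 199 (1.9995 truncated), i.e. about 2
```

So EXCHANGE executes about 2 fewer instructions per iteration than :=:
and still takes slightly more cycles.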

And here's the code:
see-code exchange                     see-code :=:
$7EFDC12A06A8 over 1->2               $7FBD6B6A06A8 over 1->2
7EFDC0DEA3B0:   mov    r15,$08[r10]   7FBD6B26B3B0:   mov    r15,$08[r10]
$7EFDC12A06B0 @ 2->2                  $7FBD6B6A06B0 @ 2->2
7EFDC0DEA3B4:   mov    r15,[r15]      7FBD6B26B3B4:   mov    r15,[r15]
$7EFDC12A06B8 swap 2->1               $7FBD6B6A06B8 >r 2->1
7EFDC0DEA3B7:   mov    [r10],r15      7FBD6B26B3B7:   mov    -$08[r14],r15
7EFDC0DEA3BA:   sub    r10,$08        7FBD6B26B3BB:   sub    r14,$08
$7EFDC12A06C0 !@ 1->1                 $7FBD6B6A06C0 dup 1->2
7EFDC0DEA3BE:   mov    rax,$08[r10]   7FBD6B26B3BF:   mov    r15,r13
7EFDC0DEA3C2:   add    r10,$08        $7FBD6B6A06C8 @ 2->2
7EFDC0DEA3C6:   xchg   $00[r13],rax   7FBD6B26B3C2:   mov    r15,[r15]
7EFDC0DEA3CA:   mov    r13,rax        $7FBD6B6A06D0 rot 2->3
$7EFDC12A06C8 swap 1->2               7FBD6B26B3C5:   mov    r9,$08[r10]
7EFDC0DEA3CD:   mov    r15,$08[r10]   7FBD6B26B3C9:   add    r10,$08
7EFDC0DEA3D1:   add    r10,$08        $7FBD6B6A06D8 ! 3->1
$7EFDC12A06D0 ! 2->0                  7FBD6B26B3CD:   mov    [r9],r15
7EFDC0DEA3D5:   mov    [r15],r13      $7FBD6B6A06E0 r> 1->2
$7EFDC12A06D8 ;s 0->1                 7FBD6B26B3D0:   mov    r15,[r14]
7EFDC0DEA3D8:   mov    r13,$08[r10]   7FBD6B26B3D3:   add    r14,$08
7EFDC0DEA3DC:   add    r10,$08        $7FBD6B6A06E8 swap 2->3
7EFDC0DEA3E0:   mov    rbx,[r14]      7FBD6B26B3D7:   add    r10,$08
7EFDC0DEA3E3:   add    r14,$08        7FBD6B26B3DB:   mov    r9,r13
7EFDC0DEA3E7:   mov    rax,[rbx]      7FBD6B26B3DE:   mov    r13,[r10]
7EFDC0DEA3EA:   jmp    eax            $7FBD6B6A06F0 ! 3->1
                                      7FBD6B26B3E1:   mov    [r9],r15
                                      $7FBD6B6A06F8 ;s 1->1
                                      7FBD6B26B3E4:   mov    rbx,[r14]
                                      7FBD6B26B3E7:   add    r14,$08
                                      7FBD6B26B3EB:   mov    rax,[rbx]
                                      7FBD6B26B3EE:   jmp    eax

The difference looks bigger than it is: the :=: listing has lines for
4 additional primitives (which have no influence on performance) and 2
additional instructions, resulting in a 6-line difference.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
   New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
