anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>mhx@iae.nl (mhx) writes:
>>: :=: ( a b -- ) \ exchange values among two variables
>>    OVER @ >R DUP @ ROT ! R> SWAP ! ;
>
>Another variant:
>
>: exchange ( addr1 addr2 -- )
>    dup @ rot !@ swap ! ;
>
>This uses the primitive
>
>'!@' ( u1 a-addr -- u2 ) gforth-experimental "store-fetch"
>    load U2 from A_ADDR, and store U1 there, as atomic operation
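
For readers on a system that lacks !@, its stack effect can be sketched in standard Forth. This is only an illustration under stated assumptions: the sketch is not atomic, and the names store-fetch, exchange-sketch, va and vb are made up here; they are not Gforth words.

```forth
\ Non-atomic sketch of the stack effect of !@
\ (hypothetical name, not part of Gforth).
: store-fetch ( u1 a-addr -- u2 )
    dup @ >r    \ keep u2, the old contents of a-addr, on the return stack
    !           \ store u1 at a-addr
    r> ;        \ return the old contents

\ The quoted EXCHANGE, written with the sketch instead of !@
: exchange-sketch ( addr1 addr2 -- )
    dup @ rot store-fetch swap ! ;

variable va  variable vb
1 va !  2 vb !
va vb exchange-sketch
va @ . vb @ .   \ prints 2 1
```

Because the fetch and the store are separate here, another thread could modify the cell between them; that window is what the atomic primitive closes.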
I worry that the atomic part will result in it being slower than the
versions that do not use !@. Let's measure that:
: exchange ( addr1 addr2 -- )
    over @ swap !@ swap ! ;

: :=: ( addr1 addr2 -- )
    OVER @ >R DUP @ ROT ! R> SWAP ! ;

\ The benchmark words leave both addresses on the stack
: bench-exchange ( addr1 addr2 -- addr1 addr2 )
    100000000 0 do 2dup exchange loop ;

: bench-:=: ( addr1 addr2 -- addr1 addr2 )
    100000000 0 do 2dup :=: loop ;

variable v1
variable v2
1 v1 !
2 v2 !
Measurement with

  perf stat -e cycles -e instructions gforth-fast xxxx.fs -e "v1 v2 bench-exchange bye"
  perf stat -e cycles -e instructions gforth-fast xxxx.fs -e "v1 v2 bench-:=: bye"
Results on a Zen4:
     exchange            :=:
  877_054_156    812_761_422  cycles
3_708_692_329  3_908_642_117  instructions
So the !@ variant is indeed slower, but only a little (about 0.64
cycles per execution of these words); however, I would expect either
a big slowdown (from latency when dealing with the memory subsystem,
broadcasting to other cores, etc.) or none at all.
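
The per-iteration figures follow from dividing the differences of the counts above by the 100000000 loop iterations. A small sketch of that arithmetic, assuming 64-bit cells; the names iterations and per-iteration. are invented for this illustration:

```forth
\ Per-iteration differences from the perf counts above.
\ Assumes 64-bit cells; the word names are hypothetical.
100000000 constant iterations

: per-iteration. ( n -- )   \ print n/iterations, truncated to 2 decimals
    100 iterations */       \ n/iterations in hundredths
    s>d <# # # [char] . hold #s #> type space ;

877054156 812761422 -   per-iteration.   \ extra cycles of exchange: 0.64
3908642117 3708692329 - per-iteration.   \ extra instructions of :=: : 1.99
```

So exchange takes about 0.64 more cycles per iteration even though it executes about 2 fewer instructions.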
And here's the code:
see-code exchange                see-code :=:
$7EFDC12A06A8 over 1->2          $7FBD6B6A06A8 over 1->2
7EFDC0DEA3B0: mov r15,$08[r10]   7FBD6B26B3B0: mov r15,$08[r10]
$7EFDC12A06B0 @ 2->2             $7FBD6B6A06B0 @ 2->2
7EFDC0DEA3B4: mov r15,[r15]      7FBD6B26B3B4: mov r15,[r15]
$7EFDC12A06B8 swap 2->1          $7FBD6B6A06B8 >r 2->1
7EFDC0DEA3B7: mov [r10],r15      7FBD6B26B3B7: mov -$08[r14],r15
7EFDC0DEA3BA: sub r10,$08        7FBD6B26B3BB: sub r14,$08
$7EFDC12A06C0 !@ 1->1            $7FBD6B6A06C0 dup 1->2
7EFDC0DEA3BE: mov rax,$08[r10]   7FBD6B26B3BF: mov r15,r13
7EFDC0DEA3C2: add r10,$08        $7FBD6B6A06C8 @ 2->2
7EFDC0DEA3C6: xchg $00[r13],rax  7FBD6B26B3C2: mov r15,[r15]
7EFDC0DEA3CA: mov r13,rax        $7FBD6B6A06D0 rot 2->3
$7EFDC12A06C8 swap 1->2          7FBD6B26B3C5: mov r9,$08[r10]
7EFDC0DEA3CD: mov r15,$08[r10]   7FBD6B26B3C9: add r10,$08
7EFDC0DEA3D1: add r10,$08        $7FBD6B6A06D8 ! 3->1
$7EFDC12A06D0 ! 2->0             7FBD6B26B3CD: mov [r9],r15
7EFDC0DEA3D5: mov [r15],r13      $7FBD6B6A06E0 r> 1->2
$7EFDC12A06D8 ;s 0->1            7FBD6B26B3D0: mov r15,[r14]
7EFDC0DEA3D8: mov r13,$08[r10]   7FBD6B26B3D3: add r14,$08
7EFDC0DEA3DC: add r10,$08        $7FBD6B6A06E8 swap 2->3
7EFDC0DEA3E0: mov rbx,[r14]      7FBD6B26B3D7: add r10,$08
7EFDC0DEA3E3: add r14,$08        7FBD6B26B3DB: mov r9,r13
7EFDC0DEA3E7: mov rax,[rbx]      7FBD6B26B3DE: mov r13,[r10]
7EFDC0DEA3EA: jmp eax            $7FBD6B6A06F0 ! 3->1
                                 7FBD6B26B3E1: mov [r9],r15
                                 $7FBD6B6A06F8 ;s 1->1
                                 7FBD6B26B3E4: mov rbx,[r14]
                                 7FBD6B26B3E7: add r14,$08
                                 7FBD6B26B3EB: mov rax,[rbx]
                                 7FBD6B26B3EE: jmp eax
The difference looks bigger than it is: There are lines for 4
additional primitives (no influence on performance) and 2 additional
instructions, resulting in a 6-line difference.
- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/