Newsportal USENET - Performance benefits of primitive-centric code (was: Actually... )

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> As for performance, here is what I measure on gforth-itc:
>
>  sieve bubble matrix   fib   fft
>  0.173  0.187  0.142 0.253 0.085 compile,
>  0.164  0.191  0.134 0.242 0.088 opt-compile,
>
> There is quite a bit of variation between the runs on the Zen4 machine
> where I measured this.

That's not particularly impressive, but this primitive-centric code is
a stepping stone for a number of further changes which overall produce
a very good speedup. I demonstrate this with the following sequence
of invocations:

gforth-itc onebench.fs
#let's add primitive-centric code
gforth-itc -e "' opt-compile, is compile," onebench.fs
#now switch to direct-threaded code:
gforth --no-dynamic --ss-number=0 onebench.fs
#now allow dynamic superinstructions with replication:
gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
#switch to benchmarking engine (less precision in error reporting):
gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
#switch on static stack caching with three registers:
gforth-fast --ss-number=0 --opt-ip-updates=0 onebench.fs
#optimize away most IP updates:
gforth-fast --ss-number=0 onebench.fs
#enable static superinstructions:
gforth-fast onebench.fs
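The first step above (`opt-compile,`) switches to primitive-centric code. Here is how I understand the primitive-centric code paper. With classic threaded code, `compile,` lays down a reference to each word, and at run time the word's code field decides what to do: enter a colon definition, push a constant, and so on. Primitive-centric code instead compiles every such reference into a primitive with an inline argument, e.g. `call` for a colon definition and `lit` for a constant, so every cell the inner interpreter executes is a primitive. The later steps in the sequence build on that uniformity.

The following Python sketch is a toy model of that contrast, not Gforth's implementation. The word set, the tuple encoding and the functions `run_word_centric`, `compile_primitive_centric` and `run_primitive_centric` are hypothetical.

```python
# Toy model, not Gforth's implementation: word-centric vs primitive-centric
# threaded code for a tiny stack machine. All word names are hypothetical.

# Each word has a kind (what its "code field" would do) and a payload.
WORDS = {
    "+":      ("prim", None),
    "*":      ("prim", None),
    "dup":    ("prim", None),
    "exit":   ("prim", None),
    "ten":    ("constant", 10),                  # like: 10 constant ten
    "square": ("colon", ["dup", "*", "exit"]),   # like: : square dup * ;
}

# Primitive implementations operating on a Python list used as the data stack.
PRIMS = {
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "*":   lambda s: s.append(s.pop() * s.pop()),
    "dup": lambda s: s.append(s[-1]),
}

def run_word_centric(thread):
    """Every cell names a word; the word's kind is inspected at run time."""
    stack, rstack, code, ip = [], [], thread, 0
    while ip < len(code):
        name = code[ip]
        ip += 1
        kind, payload = WORDS[name]        # stands in for the code-field dispatch
        if kind == "constant":
            stack.append(payload)
        elif kind == "colon":
            rstack.append((code, ip))      # save the return address
            code, ip = payload, 0
        elif name == "exit":
            code, ip = rstack.pop()
        else:
            PRIMS[name](stack)
    return stack

def compile_primitive_centric(thread):
    """Turn each word reference into a primitive with an inline argument.

    Recursive colon definitions are not supported by this toy compiler.
    """
    out = []
    for name in thread:
        kind, payload = WORDS[name]
        if kind == "constant":
            out.append(("lit", payload))                              # lit <value>
        elif kind == "colon":
            out.append(("call", compile_primitive_centric(payload)))  # call <body>
        elif name == "exit":
            out.append(("exit", None))
        else:
            out.append(("prim", name))
    return out

def run_primitive_centric(code):
    """Every cell is a primitive (with optional inline argument)."""
    stack, rstack, ip = [], [], 0
    while ip < len(code):
        op, arg = code[ip]
        ip += 1
        if op == "lit":
            stack.append(arg)
        elif op == "call":
            rstack.append((code, ip))      # save the return address
            code, ip = arg, 0
        elif op == "exit":
            code, ip = rstack.pop()
        else:
            PRIMS[arg](stack)
    return stack

program = ["ten", "square", "ten", "+"]    # ten square ten +
print(run_word_centric(program))                                # [110]
print(run_primitive_centric(compile_primitive_centric(program)))  # [110]
```

Both interpreters compute the same result. The difference is where the decision "what kind of word is this?" is made. The word-centric version makes it at run time on every execution. The primitive-centric version makes it once, at compile time.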

The results on a 5GHz Zen4 are (smaller is better):

 sieve bubble matrix   fib   fft
 0.173  0.184  0.142 0.247 0.085 gforth-itc
 0.163  0.190  0.134 0.238 0.089 let's add primitive-centric code
 0.164  0.187  0.130 0.246 0.085 now switch to direct-threaded code
 0.084  0.128  0.051 0.105 0.030 +dynamic superinstructions with replication
 0.053  0.061  0.032 0.049 0.018 switch to benchmarking engine
 0.053  0.059  0.031 0.042 0.015 +static stack caching with three registers
 0.020  0.021  0.011 0.027 0.013 +optimize away most IP updates
 0.020  0.021  0.011 0.027 0.012 +enable static superinstructions

As you can see, the overall effect of these changes is quite big. From the first line to the last, the run times drop by a factor of about 7 (fft) to 13 (matrix).
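To put numbers on "quite big", here is a small Python calculation over the results table. It computes three things:

* the per-benchmark ratio of the gforth-itc line to the last line;
* the geometric mean of those ratios;
* the geometric-mean speedup contributed by each step relative to the previous line.

The timings are copied from the table. The helper names (`geomean`, `speedups`, `steps`) are my own.

```python
import math

BENCHMARKS = ["sieve", "bubble", "matrix", "fib", "fft"]

# (label, times) rows copied from the results table; smaller is better.
ROWS = [
    ("gforth-itc",                                  [0.173, 0.184, 0.142, 0.247, 0.085]),
    ("let's add primitive-centric code",            [0.163, 0.190, 0.134, 0.238, 0.089]),
    ("now switch to direct-threaded code",          [0.164, 0.187, 0.130, 0.246, 0.085]),
    ("+dynamic superinstructions with replication", [0.084, 0.128, 0.051, 0.105, 0.030]),
    ("switch to benchmarking engine",               [0.053, 0.061, 0.032, 0.049, 0.018]),
    ("+static stack caching with three registers",  [0.053, 0.059, 0.031, 0.042, 0.015]),
    ("+optimize away most IP updates",              [0.020, 0.021, 0.011, 0.027, 0.013]),
    ("+enable static superinstructions",            [0.020, 0.021, 0.011, 0.027, 0.012]),
]

def geomean(xs):
    """Geometric mean, the usual way to average speedup ratios."""
    xs = list(xs)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def speedups(before, after):
    """Per-benchmark speedup factor (old time / new time)."""
    return [b / a for b, a in zip(before, after)]

# Overall speedup: first line vs. last line.
total = dict(zip(BENCHMARKS, speedups(ROWS[0][1], ROWS[-1][1])))
total_geomean = geomean(total.values())

# Speedup of each step relative to the line before it.
steps = [(label, geomean(speedups(prev, cur)))
         for (_, prev), (label, cur) in zip(ROWS, ROWS[1:])]

if __name__ == "__main__":
    for name, s in total.items():
        print(f"{name:7s} {s:5.1f}x")
    print(f"geomean {total_geomean:5.1f}x")
    for label, s in steps:
        print(f"{s:5.2f}x  {label}")
```

With these numbers, the overall geometric-mean speedup comes out at about 9x. The largest single step is dynamic superinstructions with replication, at about 2.2x. It is closely followed by optimizing away IP updates (about 2.1x) and switching to the benchmarking engine (about 1.8x).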

You may wonder what these funny words all mean. Here's a list of
papers about these topics:

primitive-centric code:
https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

dynamic superinstructions with replication:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz

static stack caching:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg05.ps.gz

IP update optimization:
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14

static superinstructions:
https://www.complang.tuwien.ac.at/papers/ertl+02.ps.gz
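Dynamic superinstructions with replication are one of the biggest steps in the results table, so here is a toy Python model of the idea. It reflects my reading of the paper listed above. For a straight-line run of primitives, the interpreter builds one routine that performs all of them (in Gforth, by copying the primitives' native code). It then dispatches once per run instead of once per primitive. "Replication" means each occurrence gets its own copy instead of sharing one.

In this sketch, Python closures stand in for copied native code and only straight-line code is handled. The names `run_plain`, `make_superinstruction` and `run_fused` are hypothetical.

```python
# Toy model of dynamic superinstructions (not Gforth's implementation).
# Straight-line code only; closures stand in for copied native code.

PRIMS = {
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "*":   lambda s: s.append(s.pop() * s.pop()),
    "dup": lambda s: s.append(s[-1]),
}

def run_plain(code):
    """Dispatch once per VM instruction; return (stack, dispatch count)."""
    stack, dispatches = [], 0
    for op, arg in code:
        dispatches += 1
        if op == "lit":
            stack.append(arg)
        else:
            PRIMS[arg](stack)
    return stack, dispatches

def make_superinstruction(code):
    """Fuse a straight-line sequence into one callable.

    Every call builds a fresh list of steps, so each occurrence of a
    sequence gets its own copy (the "replication" part) instead of
    sharing one fused routine.
    """
    steps = []
    for op, arg in code:
        if op == "lit":
            steps.append(lambda s, v=arg: s.append(v))  # bind the literal now
        else:
            steps.append(PRIMS[arg])

    def superinstruction(stack):
        for step in steps:
            step(stack)
    return superinstruction

def run_fused(blocks):
    """Dispatch once per superinstruction; return (stack, dispatch count)."""
    stack, dispatches = [], 0
    for superinstruction in blocks:
        dispatches += 1
        superinstruction(stack)
    return stack, dispatches

code = [("lit", 3), ("prim", "dup"), ("prim", "*"), ("lit", 4), ("prim", "+")]
print(run_plain(code))                            # ([13], 5)
print(run_fused([make_superinstruction(code)]))   # ([13], 1)
```

Both versions compute 3*3+4, but the fused version needs one dispatch instead of five. In a real interpreter, each dispatch is an indirect branch, which is where the savings come from.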

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

Date	Subject	#	Author
11 Jun 25	Actually... why not?	13	LIT
11 Jun 25	Re: Actually... why not?	12	Anton Ertl
12 Jun 25	Re: Actually... why not?	11	LIT
12 Jun 25	Re: Actually... why not?	10	Anton Ertl
12 Jun 25	Re: Actually... why not?	4	LIT
12 Jun 25	Re: Actually... why not?	1	LIT
12 Jun 25	Re: Actually... why not?	2	Anton Ertl
12 Jun 25	Re: Actually... why not?	1	LIT
12 Jun 25	Performance benefits of primitive-centric code (was: Actually... )	5	Anton Ertl
13 Jun 25	Re: Performance benefits of primitive-centric code	4	minforth
13 Jun 25	Re: Performance benefits of primitive-centric code	2	Paul Rubin
13 Jun 25	Re: Performance benefits of primitive-centric code	1	Anton Ertl
13 Jun 25	Re: Performance benefits of primitive-centric code	1	Anton Ertl