Subject : Performance benefits of primitive-centric code (was: Actually... )
From : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups : comp.lang.forth
Date : 12. Jun 2025, 22:01:46
Organization : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2025Jun12.230146@mips.complang.tuwien.ac.at>
References : 1 2 3 4
User-Agent : xrn 10.11
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>As for performance, here is what I measure on gforth-itc:
>
>sieve bubble matrix   fib   fft  compile,
>0.173  0.187  0.142 0.253 0.085  ,
>0.164  0.191  0.134 0.242 0.088  opt-compile,
>
>There is quite a bit of variation between the runs on the Zen4 machine
>where I measured this.

That's not particularly impressive, but this primitive-centric code is
a stepping stone for a number of further changes which overall produce
a very good speedup. I demonstrate this with the following sequence
of invocations:
gforth-itc onebench.fs
#let's add primitive-centric code
gforth-itc -e "' opt-compile, is compile," onebench.fs
#now switch to direct-threaded code:
gforth --no-dynamic --ss-number=0 onebench.fs
#now allow dynamic superinstructions with replication:
gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
#switch to benchmarking engine (less precision in error reporting):
gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
#switch on static stack caching with three registers:
gforth-fast --ss-number=0 --opt-ip-updates=0 onebench.fs
#optimize away most IP updates:
gforth-fast --ss-number=0 onebench.fs
#enable static superinstructions:
gforth-fast onebench.fs
The results on a 5GHz Zen4 are (smaller is better):
 sieve bubble matrix   fib   fft
 0.173  0.184  0.142 0.247 0.085 gforth-itc
 0.163  0.190  0.134 0.238 0.089 let's add primitive-centric code
 0.164  0.187  0.130 0.246 0.085 now switch to direct-threaded code
 0.084  0.128  0.051 0.105 0.030 +dynamic superinstructions with replication
 0.053  0.061  0.032 0.049 0.018 switch to benchmarking engine
 0.053  0.059  0.031 0.042 0.015 +static stack caching with three registers
 0.020  0.021  0.011 0.027 0.013 +optimize away most IP updates
 0.020  0.021  0.011 0.027 0.012 +enable static superinstructions
As you can see, the overall effect of these changes is quite big.
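
To put a number on "quite big", the speedup factors can be computed directly from the table. The following Python sketch is only an illustration: the timings are copied from the table above, while the names (BENCHMARKS, TIMINGS, speedups, overall) are made up for this example and are not part of Gforth or of the benchmark scripts.

```python
# Timings copied from the table above (smaller is better).
BENCHMARKS = ["sieve", "bubble", "matrix", "fib", "fft"]

# One (label, timings) pair per row, in the order the changes were applied.
TIMINGS = [
    ("gforth-itc", [0.173, 0.184, 0.142, 0.247, 0.085]),
    ("primitive-centric code", [0.163, 0.190, 0.134, 0.238, 0.089]),
    ("direct-threaded code", [0.164, 0.187, 0.130, 0.246, 0.085]),
    ("+dynamic superinstructions with replication",
     [0.084, 0.128, 0.051, 0.105, 0.030]),
    ("benchmarking engine", [0.053, 0.061, 0.032, 0.049, 0.018]),
    ("+static stack caching with three registers",
     [0.053, 0.059, 0.031, 0.042, 0.015]),
    ("+optimize away most IP updates", [0.020, 0.021, 0.011, 0.027, 0.013]),
    ("+static superinstructions", [0.020, 0.021, 0.011, 0.027, 0.012]),
]


def speedups(before, after):
    """Return the before/after ratio per benchmark (above 1 means faster)."""
    return [b / a for b, a in zip(before, after)]


def overall():
    """Speedup of the last row over the first row, keyed by benchmark."""
    first = TIMINGS[0][1]
    last = TIMINGS[-1][1]
    return dict(zip(BENCHMARKS, speedups(first, last)))


if __name__ == "__main__":
    print("overall speedup, first row -> last row:")
    for name, factor in overall().items():
        print(f"  {name:7s} {factor:5.1f}x")

    print("speedup of each row over the previous row:")
    for (_, prev), (label, cur) in zip(TIMINGS, TIMINGS[1:]):
        cells = " ".join(f"{s:5.2f}" for s in speedups(prev, cur))
        print(f"  {cells}  {label}")
```

With these numbers the overall factor lies between about 7 (fft) and about 13 (matrix). The per-row ratios are close to 1 for the first two steps, which is in line with the run-to-run variation on this machine. The largest single steps are dynamic superinstructions with replication and the IP update optimization.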
You may wonder what these funny words all mean. Here's a list of
papers about these topics:
primitive-centric code:
https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

dynamic superinstructions with replication:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz

static stack caching:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg05.ps.gz

IP update optimization:
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14

Static superinstructions:
https://www.complang.tuwien.ac.at/papers/ertl+02.ps.gz

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/