Performance benefits of primitive-centric code (was: Actually... )

From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Date: 12 Jun 2025, 22:01:46
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Jun12.230146@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>As for performance, here is what I measure on gforth-itc:
>
>sieve bubble matrix   fib   fft compile,
>0.173  0.187  0.142 0.253 0.085 ,
>0.164  0.191  0.134 0.242 0.088 opt-compile,
>
>There is quite a bit of variation between the runs on the Zen4 machine
>where I measured this.

That's not particularly impressive, but this primitive-centric code is
a stepping stone for a number of further changes which overall produce
a very good speedup.  I demonstrate this with the following sequence
of invocations:

gforth-itc onebench.fs
#let's add primitive-centric code
gforth-itc -e "' opt-compile, is compile," onebench.fs
#now switch to direct-threaded code:
gforth --no-dynamic --ss-number=0 onebench.fs
#now allow dynamic superinstructions with replication:
gforth --ss-number=0 --opt-ip-updates=0 onebench.fs
#switch to benchmarking engine (less precision in error reporting):
gforth-fast --ss-number=0 --ss-states=1 --opt-ip-updates=0 onebench.fs
#switch on static stack caching with three registers:
gforth-fast --ss-number=0  --opt-ip-updates=0 onebench.fs
#optimize away most IP updates:
gforth-fast --ss-number=0  onebench.fs
#enable static superinstructions:
gforth-fast onebench.fs

The results on a 5GHz Zen4 are (smaller is better):

 sieve bubble matrix   fib   fft
 0.173  0.184  0.142 0.247 0.085 gforth-itc
 0.163  0.190  0.134 0.238 0.089 let's add primitive-centric code
 0.164  0.187  0.130 0.246 0.085 now switch to direct-threaded code
 0.084  0.128  0.051 0.105 0.030 +dynamic superinstructions with replication
 0.053  0.061  0.032 0.049 0.018 switch to benchmarking engine
 0.053  0.059  0.031 0.042 0.015 +static stack caching with three registers
 0.020  0.021  0.011 0.027 0.013 +optimize away most IP updates
 0.020  0.021  0.011 0.027 0.012 +enable static superinstructions

As you can see, the overall effect of these changes is quite big.
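
To put a number on "quite big", here is a minimal sketch that divides the gforth-itc row by the final gforth-fast row of the table above.  The variable names are made up for illustration, and the times are simply copied from the table, so the run-to-run variation mentioned earlier applies to the resulting factors as well.

```python
# Run times in seconds, copied from the table above
# (5GHz Zen4, smaller is better).
benchmarks = ["sieve", "bubble", "matrix", "fib", "fft"]
baseline = [0.173, 0.184, 0.142, 0.247, 0.085]  # first row: gforth-itc
final = [0.020, 0.021, 0.011, 0.027, 0.012]     # last row: gforth-fast, all steps enabled

# Speedup factor per benchmark = baseline time / final time.
speedups = {name: b / f for name, b, f in zip(benchmarks, baseline, final)}

for name, factor in speedups.items():
    print(f"{name:>6}: {factor:.2f}x")
```

With these numbers the factors range from about 7 (fft) to about 13 (matrix).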

You may wonder what these funny words all mean.  Here's a list of
papers about these topics:

primitive-centric code:
https://www.complang.tuwien.ac.at/papers/ertl02.ps.gz

dynamic superinstructions with replication:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz

static stack caching:
https://www.complang.tuwien.ac.at/papers/ertl%26gregg05.ps.gz

IP update optimization:
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.14

Static superinstructions:
https://www.complang.tuwien.ac.at/papers/ertl+02.ps.gz

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
