mitchalsup@aol.com (MitchAlsup1) writes:
>The point is that the cost of not getting allocated into a register
>is vastly lower--the count of instructions remains 1 while the
>latency increases. That increase in latency does not hurt those
>use once/seldom variables.
Latency is not the issue in modern high-performance AMD64 cores, which
have zero-cycle store-to-load forwarding
<http://www.complang.tuwien.ac.at/anton/memdep/>.
>
And yet, putting variables in registers gives a significant speedup.
Measured on a Rocket Lake (numbers are times in seconds):
>
sieve  bubble  matrix  fib    fft
0.075  0.070   0.036   0.049  0.017  TOS in reg, RP in reg, IP in reg
0.100  0.149   0.054   0.106  0.037  TOS in mem, RP in mem, IP write-through to mem
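To make the two table rows concrete, here is a minimal sketch in C of what "in reg" versus "in mem" can mean for an interpreter loop. This is not Gforth's engine: the toy opcodes, the names run_reg, run_mem and struct vm_state, and the use of volatile to force every access through memory are all assumptions made for illustration. Note also that the sketch keeps IP fully in memory in the second variant, which is not the same as the "IP write-through to mem" configuration measured above.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack VM; not Gforth's instruction set. */
enum { OP_LIT, OP_ADD, OP_HALT };

/* Variant 1: TOS, SP and IP are plain locals, so the compiler is free
   to keep all three in machine registers. */
long run_reg(const long *code)
{
    long stack[64];
    long *sp = stack;        /* next free slot below the cached TOS */
    long tos = 0;            /* top of stack cached in a local */
    const long *ip = code;   /* VM instruction pointer */

    for (;;) {
        switch (*ip++) {
        case OP_LIT:  *sp++ = tos; tos = *ip++; break;
        case OP_ADD:  tos += *--sp;             break;
        case OP_HALT: return tos;
        default:      return -1;               /* unknown opcode */
        }
    }
}

/* Variant 2: the same VM state lives in a volatile struct, so every
   read and write of TOS, SP and IP becomes a real load or store.
   (volatile is only a stand-in for "not allocated to a register".) */
struct vm_state { long tos; long *sp; const long *ip; };

long run_mem(const long *code)
{
    long stack[64];
    volatile struct vm_state vm;
    vm.tos = 0;
    vm.sp = stack;
    vm.ip = code;

    for (;;) {
        const long *ip = vm.ip;   /* load IP */
        long op = *ip;
        vm.ip = ip + 1;           /* store IP */
        switch (op) {
        case OP_LIT: {
            long *sp = vm.sp;     /* load SP */
            *sp = vm.tos;         /* load TOS, spill it to the stack */
            vm.sp = sp + 1;       /* store SP */
            ip = vm.ip;           /* load IP again */
            vm.tos = *ip;         /* store new TOS */
            vm.ip = ip + 1;       /* store IP */
            break;
        }
        case OP_ADD: {
            long *sp = vm.sp - 1; /* load SP */
            vm.sp = sp;           /* store SP */
            vm.tos = vm.tos + *sp;/* load and store TOS */
            break;
        }
        case OP_HALT: return vm.tos;
        default:      return -1;  /* unknown opcode */
        }
    }
}
```

Both functions compute the same result for the same program; for the sequence OP_LIT 2, OP_LIT 3, OP_ADD, OP_HALT each returns 5. The difference is only in how many loads and stores each VM instruction costs, which is the effect the table measures.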
>
In the first line, I used gforth-fast and tried to disable all
optimizations except those that keep certain variables in registers:
>
gforth-fast --ss-states=1 --ss-number=31 --opt-ip-updates=0 onebench.fs
>
I could not reduce the number of static superinstructions below 31 and
still get a result; I will have to investigate why, but that probably
does not make much of a difference for several of these benchmarks.