Subject: Re: Making Lemonade (Floating-point format changes)
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 14 May 2024, 06:35:53
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024May14.073553@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11
mitchalsup@aol.com (MitchAlsup1) writes:
>I recall that MIPS could emulate a TLB table walk in something like
>19 cycles. That is:: a few cycles to get there, a hash table access,
>a check, a TLB install, and a few cycles to get back.
Which MIPS? R2000? R10000? Something else? Was this an inverted page
table?
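The refill sequence quoted above (trap, one hash-table access, a tag check, a TLB install, return) can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not the handler of any particular MIPS: it assumes a hashed page table keyed by (ASID, virtual page number), 4 KiB pages, and a small fully associative TLB with round-robin replacement. All names (`HashedPageTable`, `SoftTLB`, `tlb_refill`, `translate`) are hypothetical.

```python
# Hypothetical model of a software-refilled TLB: on a miss, hash the page,
# probe a hash table, check the tag, install the entry, and return.
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class HashedPageTable:
    """Hypothetical hashed page table keyed by (asid, vpn)."""

    def __init__(self, nbuckets=1024):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, asid, vpn):
        return self.buckets[(vpn ^ (asid << 7)) % len(self.buckets)]

    def map(self, asid, vpn, pfn):
        self._bucket(asid, vpn).append((asid, vpn, pfn))

    def lookup(self, asid, vpn):
        # One hash-table access, then a tag check per entry in the bucket.
        for e_asid, e_vpn, pfn in self._bucket(asid, vpn):
            if e_asid == asid and e_vpn == vpn:
                return pfn
        return None


class SoftTLB:
    """Tiny fully associative TLB with round-robin replacement (assumption)."""

    def __init__(self, nslots=64):
        self.slots = [None] * nslots  # each slot: (asid, vpn, pfn) or None
        self.victim = 0

    def probe(self, asid, vpn):
        for slot in self.slots:
            if slot is not None and slot[0] == asid and slot[1] == vpn:
                return slot[2]
        return None

    def install(self, asid, vpn, pfn):
        self.slots[self.victim] = (asid, vpn, pfn)
        self.victim = (self.victim + 1) % len(self.slots)


def tlb_refill(tlb, hpt, asid, vpn):
    """The 'trap handler': hash-table access, check, TLB install."""
    pfn = hpt.lookup(asid, vpn)
    if pfn is None:
        raise LookupError(f"page fault: asid={asid} vpn={vpn:#x}")
    tlb.install(asid, vpn, pfn)
    return pfn


def translate(tlb, hpt, asid, vaddr):
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    pfn = tlb.probe(asid, vpn)
    if pfn is None:  # TLB miss: in hardware this is where the trap happens
        pfn = tlb_refill(tlb, hpt, asid, vpn)
    return (pfn << PAGE_SHIFT) | offset
```

For example, with `hpt.map(1, 0x12345, 0x777)` and `tlb = SoftTLB()`, `translate(tlb, hpt, 1, 0x12345abc)` misses, refills and returns `0x777abc`; translating another address in the same page then hits in `tlb` without calling `tlb_refill`.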
>On an x86 this would be at least 200 cycles just getting there and back.
Which x86? 8086? 80186? 80286? These (maybe the 8088 and V20, too)
are the only implementations that deserve to be called x86. If you
mean some IA-32 or AMD64 implementations, which ones?
Anyway, let's see how this works on the U74 (a RISC-V implementation
that apparently handles unaligned loads by trapping); here is a
10M-iteration loop whose payload performs one load per iteration:
[fedora-starfive:~/nfstmp/gforth-riscv:104544] perf stat -e instructions -e cycles gforth-fast -e ': foo 10000000 0 do @ loop ; 0 value x here aligned to x x x ! x foo drop bye'

 Performance counter stats for 'gforth-fast -e : foo 10000000 0 do @ loop ; 0 value x here aligned to x x x ! x foo drop bye':

       223805151      instructions:u     #    0.70  insn per cycle
       318131306      cycles:u

     0.352533487 seconds time elapsed

     0.257061000 seconds user
     0.064265000 seconds sys

[fedora-starfive:~/nfstmp/gforth-riscv:104545] perf stat -e instructions -e cycles gforth-fast -e ': foo 10000000 0 do @ loop ; 0 value x here aligned 1+ to x x x ! x foo drop bye'

 Performance counter stats for 'gforth-fast -e : foo 10000000 0 do @ loop ; 0 value x here aligned 1+ to x x x ! x foo drop bye':

      5329494415      instructions:u     #    0.75  insn per cycle
      7149481783      cycles:u

     7.183239751 seconds time elapsed

     7.082298000 seconds user
     0.070121000 seconds sys
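The per-load overhead is the difference between the two runs divided by the 10,000,000 loop iterations. A small Python sketch of that arithmetic, using the user-mode counts above; it assumes that gforth startup is the same in both runs and that the single misaligned store done by `x x !` in the second run is negligible, so the whole difference is attributed to the loads.

```python
ITERATIONS = 10_000_000  # loop count in the Forth word foo

# User-mode perf stat counts from the two runs above.
ALIGNED = {"instructions": 223_805_151, "cycles": 318_131_306}
UNALIGNED = {"instructions": 5_329_494_415, "cycles": 7_149_481_783}


def extra_per_load(event):
    """Extra events per load, attributing the whole difference to the loads."""
    return (UNALIGNED[event] - ALIGNED[event]) / ITERATIONS


for event in ("instructions", "cycles"):
    print(f"{event}: {extra_per_load(event):.1f} extra per load")
# instructions: 510.6 extra per load
# cycles: 683.1 extra per load
```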
So the unaligned-access handling results in 511 additional instructions
per load compared to an aligned access (so it obviously does the
handling using some kind of trapping). Each unaligned access costs
683 additional cycles.
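To make "handling using some kind of trapping" concrete: on a misaligned load the core traps, and a software handler (on RISC-V typically in M-mode firmware or the kernel; the post does not say which is used here) decodes the faulting instruction, reads the bytes one at a time, assembles the value, writes the destination register and skips the instruction. The following Python sketch models only that core step for 32-bit RISC-V integer loads; the function names are hypothetical, and a real handler must also save and restore registers, fetch the instruction from memory, handle compressed (16-bit) loads and access faults, none of which is modelled here.

```python
MASK64 = (1 << 64) - 1


def load_bytewise(memory, addr, size, signed):
    """Rebuild a little-endian load of `size` bytes from single-byte reads,
    which can never be misaligned."""
    value = 0
    for i in range(size):
        value |= memory[addr + i] << (8 * i)
    if signed and value & (1 << (8 * size - 1)):
        value -= 1 << (8 * size)  # sign-extend like lh/lw/ld
    return value & MASK64


def emulate_misaligned_load(regs, memory, pc, insn):
    """Hypothetical handler for a trapped 32-bit RISC-V integer load
    (opcode 0x03). Updates regs in place and returns the new pc."""
    if insn & 0x7F != 0x03:
        raise ValueError("not an integer load")
    rd = (insn >> 7) & 0x1F
    funct3 = (insn >> 12) & 0x7  # lb/lh/lw/ld = 0..3, lbu/lhu/lwu = 4..6
    if funct3 == 7:
        raise ValueError("reserved load encoding")
    rs1 = (insn >> 15) & 0x1F
    imm = insn >> 20  # 12-bit immediate, sign-extended below
    if imm & 0x800:
        imm -= 0x1000
    addr = (regs[rs1] + imm) & MASK64
    size = 1 << (funct3 & 0x3)
    value = load_bytewise(memory, addr, size, signed=funct3 < 4)
    if rd != 0:  # x0 stays hard-wired to zero
        regs[rd] = value
    return pc + 4  # skip the emulated instruction
```

This mirrors the Forth benchmark, where the cell at the misaligned address x holds x itself: with `memory[9:17] = (9).to_bytes(8, "little")` and `regs[10] = 9`, emulating `ld a0, 0(a0)` (encoding `0x00053503`) leaves 9 in `a0` and returns `pc + 4`.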
So better use the unspecified MIPS, right? However, if the
unspecified MIPS is an R2000, 19 cycles on a 12.5MHz R2000 cost
1.52us, whereas 683 cycles on a 1000MHz U74 cost 0.683us (and I have
heard that in the VisionFive V2 the U74 runs at 1500MHz).
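The comparison is plain cycles-to-time conversion: at f MHz a core executes f cycles per microsecond, so t = cycles / f. A minimal Python sketch; the 19-cycle figure is the recalled MIPS number quoted above, the 12.5MHz R2000 is the hypothetical from the text, and the 1500MHz line uses the hearsay VisionFive V2 clock rather than a measurement.

```python
def duration_us(cycles, clock_mhz):
    """Time taken by `cycles` clock cycles at `clock_mhz`, in microseconds."""
    return cycles / clock_mhz  # 1 MHz = 1 cycle per microsecond


for label, cycles, mhz in [
    ("R2000 TLB refill", 19, 12.5),
    ("U74 unaligned load", 683, 1000),
    ("U74 unaligned load", 683, 1500),
]:
    print(f"{label}: {cycles} cycles at {mhz} MHz = {duration_us(cycles, mhz):.3f} us")
# R2000 TLB refill: 19 cycles at 12.5 MHz = 1.520 us
# U74 unaligned load: 683 cycles at 1000 MHz = 0.683 us
# U74 unaligned load: 683 cycles at 1500 MHz = 0.455 us
```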
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>