MitchAlsup1 wrote:
The LD-OP-ST machine would have this built into the pipeline--
On Tue, 20 Aug 2024 7:01:49 +0000, Anton Ertl wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
On Mon, 19 Aug 2024 18:52:39 +0000, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
The thing is that once you go down the GBOoO route, your lack of
registers "namable in ASM" ceases to be a performance degrader. With
renaming one can have R7 in use 40 times in a 100 instruction deep
execution window.
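A minimal Python sketch of the renaming point above. Everything named here is an assumption for illustration, not something from the thread: the `rename` helper, the 16 architectural registers, and an unbounded supply of physical registers (a real core has a finite pool and recycles registers at retirement). Each write to R7 is given a fresh physical register, so 40 writes to R7 in one window occupy 40 distinct physical registers.

```python
from itertools import count


def rename(instrs, num_arch=16):
    """Rename (dest, src1, src2) architectural register triples.

    Hypothetical model: physical registers 0..num_arch-1 hold the initial
    architectural state, and every destination gets a fresh physical register.
    """
    fresh = count(num_arch)                     # next unused physical register
    table = {r: r for r in range(num_arch)}     # arch register -> current physical
    renamed = []
    for dest, src1, src2 in instrs:
        p1, p2 = table[src1], table[src2]       # read sources before remapping dest
        pd = next(fresh)
        table[dest] = pd                        # later readers of dest see pd
        renamed.append((pd, p1, p2))
    return renamed


# 40 independent writes to R7 (R7 = R1 op R2) inside one execution window.
window = [(7, 1, 2)] * 40
renamed = rename(window)
distinct_r7 = {pd for pd, _, _ in renamed}
print(len(distinct_r7))
```

The name R7 appears 40 times, but the hardware sees 40 separate destinations, which is why the architectural register count stops being the limit on parallelism in this model.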
If this were true we would have 16 or even 8 visible registers, and all
would be fine. x86 does mostly fine with 16.
And yet Intel went to 32 SIMD registers with AVX-512 (which admittedly
was first developed for an in-order microarchitecture) and are now
going to 32 GPRs with APX (no in-order excuse here). And IIRC the
announcement of APX says something about 10% fewer memory accesses or
somesuch.
Careful, here::
>
x86 has LD-OPs and LD-OP-STs which makes the 16 register file feel more
like it has 20-22 registers.
Your feeling is strong (as shown by your repeatedly ignoring the
counterevidence), but wrong:
>
LD-OPs and LD-OP-STs as on AMD64 and PDP-11 make the 16 registers
equivalent to 17 registers on a load/store architecture:
>
Let's call the 17th register r16:
>
On a load-store architecture you replace "LD-OP dest,src" with:
>
ld r16=src
op dest,dest,r16
>
On a load-store architecture you replace "LD-OP-ST dest,src" with:
>
ld r16=dest
op r16,r16,src
st dest=r16
>
For a VAX-like three-memory-argument instruction you need two extra
registers, r16 and r17:
>
"mem1 = mem2 op mem3" becomes:
>
ld r16=mem2
ld r17=mem3
op r16,r16,r17
st mem1=r16
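The three lowerings can be checked with a small Python sketch. The tuple encoding, the `run` interpreter, the `lower_*` helpers and the choice of addition as the "op" are all assumptions made for illustration. In this sketch the three-memory-operand case stores r16, the register that receives the op result.

```python
def run(program, regs, mem):
    """Execute load/store-architecture instructions on dict-based state.

    Hypothetical encoding: ('ld', rd, addr), ('st', addr, rs), ('op', rd, ra, rb).
    The 'op' is modelled as addition.
    """
    for ins in program:
        kind = ins[0]
        if kind == "ld":
            _, rd, addr = ins
            regs[rd] = mem[addr]
        elif kind == "st":
            _, addr, rs = ins
            mem[addr] = regs[rs]
        elif kind == "op":
            _, rd, ra, rb = ins
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown instruction: {ins!r}")
    return regs, mem


def lower_ld_op(dest, src_addr):
    """LD-OP dest,[src]: one scratch register (r16)."""
    return [("ld", "r16", src_addr), ("op", dest, dest, "r16")]


def lower_ld_op_st(dest_addr, src):
    """LD-OP-ST [dest],src: still only one scratch register (r16)."""
    return [("ld", "r16", dest_addr), ("op", "r16", "r16", src),
            ("st", dest_addr, "r16")]


def lower_mem3(mem1, mem2, mem3):
    """mem1 = mem2 op mem3: two scratch registers (r16, r17)."""
    return [("ld", "r16", mem2), ("ld", "r17", mem3),
            ("op", "r16", "r16", "r17"), ("st", mem1, "r16")]


regs = {"r1": 5, "r2": 3}
mem = {"a": 10, "b": 20, "c": 0}
run(lower_ld_op("r1", "a"), regs, mem)      # r1 = r1 + mem[a]
run(lower_ld_op_st("b", "r2"), regs, mem)   # mem[b] = mem[b] + r2
run(lower_mem3("c", "a", "b"), regs, mem)   # mem[c] = mem[a] + mem[b]
print(regs["r1"], mem["b"], mem["c"])
```

The only registers touched beyond the original set are r16 and r17, which is the substance of the "16 registers plus memory operands is equivalent to 17 registers" argument.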
>
- anton
>
That is not what I am talking about::
>
i = i + 1;
as
ADD [&i],#1
>
1 instruction = 1 add, 1 LD and 1 ST. And
>
i = i + j;
as
ADD Ri,[&j]
>
In neither case is an extra register needed, and you may have
several of these in a local sequence of code. ...
On an in-order pipeline you need someplace to stash the temp value.
If you want, call it a special in-flight pseudo-register that only
exists for forwarding; it is still an identifier for a value that
is outside the architectural register set.
In the LD-OP-ST microarchitecture there would be some buffer
>
I think it might need two registers if you can have two such
instructions in the pipeline back-to-back as there could be
multiple temp values in-flight at once
>
ADD [&i],#1
ADD [&j],#1
>
could have &i doing its store while &j is doing its load.
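A rough Python sketch of that overlap, under a timing model that is purely an assumption here: each LD-OP-ST spends one cycle each in LD, OP and ST, and its temp value is live from the LD cycle through the ST cycle. The hypothetical `max_live_temps` helper counts how many temps are live at once for a given set of issue cycles.

```python
def max_live_temps(issue_cycles, ld_to_st=2):
    """Peak number of in-flight temp values.

    Assumed model: an instruction issued at cycle c does LD at c and ST at
    c + ld_to_st, and holds its temp for that whole inclusive interval.
    """
    intervals = [(c, c + ld_to_st) for c in issue_cycles]
    last = max(end for _, end in intervals)
    return max(
        sum(start <= t <= end for start, end in intervals)
        for t in range(last + 1)
    )


print(max_live_temps([0, 1]))  # back-to-back issue
print(max_live_temps([0, 2]))  # &i storing in the same cycle &j loads
print(max_live_temps([0, 3]))  # fully separated
```

In this model, two temps are live both for back-to-back issue and for the store-while-load case, and only one when the instructions are fully separated.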
>
On OoO, if the reservation stations are valueless, you need a real
physical register to stash the temp value as there is no guarantee
the OP part of the uOp will launch just when the LD part finishes
doing its thing and forwards the value.
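A toy Python sketch of that last point. The `run_ld_op` function, the `p_temp` tag and the single-cycle bypass window are hypothetical, and addition stands in for the OP. With valueless reservation stations the waiting OP holds only a tag, so unless it launches in the same cycle the LD result is forwarded, it has to read the value back from a physical register.

```python
def run_ld_op(mem_value, reg_value, ld_done_cycle, op_launch_cycle):
    """Model the LD and OP halves of one LD-OP uOp on an OoO core.

    Assumption: the bypass network delivers the LD result only in the cycle
    the LD completes; any later OP launch must read the physical register file.
    """
    if op_launch_cycle < ld_done_cycle:
        raise ValueError("OP cannot launch before its LD result exists")

    prf = {}                      # physical register file: tag -> value
    temp_tag = "p_temp"           # hypothetical tag held by the valueless station
    prf[temp_tag] = mem_value     # LD writes its result at completion

    if op_launch_cycle == ld_done_cycle:
        operand, source = mem_value, "bypass"       # caught on the forwarding path
    else:
        operand, source = prf[temp_tag], "prf"      # forwarding missed: read PRF
    return reg_value + operand, source


print(run_ld_op(10, 5, ld_done_cycle=3, op_launch_cycle=3))
print(run_ld_op(10, 5, ld_done_cycle=3, op_launch_cycle=7))
```

Both calls compute the same sum. The second one only works because the temp was parked in a physical register while the OP waited.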