MitchAlsup1 <mitchalsup@aol.com> wrote:
On Tue, 20 Aug 2024 23:08:03 +0000, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Tue, 20 Aug 2024 17:40:50 +0000, Michael S wrote:
On Tue, 20 Aug 2024 16:40:06 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
and you may have
several of these in a local sequence of code. ...
No, you cannot have several. It's always one, then another, then yet
another, and so on. Each one can reuse the same temporary register.
The point is that the cost of not getting allocated into a register
is vastly lower--the count of instructions remains 1 while the
latency increases. That increase in latency does not hurt
variables that are used once or seldom.
In the examples cited, the lack of register allocation triples
the instruction count due to the lack of LD-OP and LD-OP-ST. The
register count I stated is how many registers a
non-LD-OP machine would need to break even on the instruction count.
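
The tripling described above can be illustrated with a toy model. This is only a sketch: the `lower` helper and the mnemonics are hypothetical, and it assumes each memory-resident update that a LD-OP-ST machine handles in one instruction costs a separate load, operation, and store on a load/store machine.

```python
def lower(update, has_ld_op_st):
    """Lower `mem[addr] op= reg` into a list of pseudo-instructions.

    `update` is an (op, addr, reg) tuple. Mnemonics are illustrative only.
    """
    op, addr, reg = update
    if has_ld_op_st:
        # One instruction reads memory, operates, and writes back.
        return [f"{op} [{addr}], {reg}"]
    # Load/store machine: three instructions through a temporary register.
    # The same temporary (t0) can be reused by every update in turn.
    return [f"LD t0, [{addr}]", f"{op} t0, t0, {reg}", f"ST t0, [{addr}]"]


# Two seldom-used variables that were not allocated to registers.
updates = [("ADD", "counter_a", "r1"), ("ADD", "counter_b", "r2")]
fused = [insn for u in updates for insn in lower(u, True)]
split = [insn for u in updates for insn in lower(u, False)]
print(len(fused), len(split))
```

The model ignores latency entirely; it only counts instructions, which is the quantity being compared in the post.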
LD-OP-ST is a bridge too far for me.
LD-OP and OP-ST are fine with me and have benefits.
If you put cache write at or after register file write in the
pipeline, LD-OP-ST basically falls out for free and you can
move the intermediate values from where they are produced
to where they are consumed with forwarding.
LD-OP-ST mostly fits only when the operation is an add to memory.
42-bit opcodes work: you only need one in four RISC opcodes to merge into a
LD-OP or OP-ST for code density to be the same, and generally you will do
better.
The two leftover bits can be ignored, or be a template indicator, so you
can pack in a LD-OP-ST, or 31-bit RISC ops.
Or go heads and tails packing.
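
The break-even claim above can be checked with a little arithmetic. The sketch below assumes three 42-bit opcodes share a 128-bit bundle; that container size is not stated in the post, but it is consistent with the "two leftover bits" mentioned. The names `ops_per_bundle` and `wide_to_risc_size_ratio` are made up for illustration.

```python
from fractions import Fraction

RISC_BITS = 32     # conventional fixed-width RISC instruction
WIDE_BITS = 42     # opcode width proposed in the post
BUNDLE_BITS = 128  # ASSUMPTION: container holding the 42-bit opcodes

ops_per_bundle = BUNDLE_BITS // WIDE_BITS                 # whole opcodes per bundle
leftover_bits = BUNDLE_BITS - ops_per_bundle * WIDE_BITS  # bits left unused


def wide_to_risc_size_ratio(merge_fraction):
    """Code size of the 42-bit format relative to 32-bit RISC code.

    merge_fraction is the share of RISC instructions that disappear by
    merging into a neighbouring LD-OP or OP-ST.
    """
    wide_ops_per_risc_op = 1 - Fraction(merge_fraction)
    bits_per_wide_op = Fraction(BUNDLE_BITS, ops_per_bundle)
    return wide_ops_per_risc_op * bits_per_wide_op / RISC_BITS


print(ops_per_bundle, leftover_bits)
print(wide_to_risc_size_ratio(Fraction(1, 4)))
print(wide_to_risc_size_ratio(Fraction(1, 3)))
```

Under this assumption, merging one instruction in four gives a ratio of exactly 1, and any higher merge rate makes the 42-bit encoding denser than the 32-bit one.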
But you have not built such, you built an improved RISC…
I spent 7 years doing x86-64.....so much for not having.....
It is that episode that cemented for me the value of
[Rbase+Rindex<<scale+Displacement] and the utility of LD-OPs
and LD-OP-STs. Then I took that and made a better RISC ISA.
That RISC ISA did not have LD-OP-STs for OpCode
encoding reasons, not for pipelining reasons.
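
For readers unfamiliar with the addressing mode named above, here is a minimal sketch of the address calculation. The function name, the 64-bit wraparound, and the restriction of `scale` to shift amounts 0 through 3 are assumptions modelled on x86-64 practice, not something the post specifies.

```python
MASK64 = (1 << 64) - 1  # ASSUMPTION: addresses wrap at 64 bits


def effective_address(rbase, rindex, scale, displacement):
    """Compute Rbase + (Rindex << scale) + Displacement.

    scale is a shift amount: 0..3 selects an element size of 1, 2, 4 or
    8 bytes. displacement may be negative.
    """
    if scale not in range(4):
        raise ValueError("scale must be 0, 1, 2 or 3")
    # Explicit parentheses matter: in Python (and C) `+` binds tighter
    # than `<<`, so the bracket notation cannot be typed in literally.
    return (rbase + (rindex << scale) + displacement) & MASK64


# Element 5 of an array of 8-byte values located 16 bytes past 0x1000.
print(hex(effective_address(0x1000, 5, 3, 16)))
```

A single memory-operand instruction using this mode replaces the shift and the two adds that a machine with only register-plus-displacement addressing would need to issue separately.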
I assume OP-ST has issues with the value getting stuck if the address is
slow to resolve. With a register the value can just spill to the
register backing file. And because of this you create a hidden register
name for the value.
Athlon and Opteron had value capturing reservation stations.
K9 had value-free RSs. It caused little headache because
while we did not give it a named physical register, we did
give it a physical register for the intermediates. SW can only
read/write named PRs, which get their names through
logical-to-physical register renaming.
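
The distinction between named and unnamed physical registers can be sketched with a toy renamer. This is not a description of any AMD design: the `Renamer` class, its method names, and the free-list policy are all hypothetical, and the sketch never returns registers to the free list.

```python
class Renamer:
    """Toy logical-to-physical register renamer (illustrative only)."""

    def __init__(self, n_physical, logical_names):
        self.free = list(range(n_physical))  # free physical registers
        self.map = {}                        # logical name -> physical reg
        for name in logical_names:
            self.map[name] = self.free.pop(0)

    def rename_dest(self, logical):
        """Allocate a physical register that software can name."""
        pr = self.free.pop(0)
        self.map[logical] = pr
        return pr

    def alloc_hidden(self):
        """Allocate a physical register for an intermediate value.

        It never enters self.map, so no logical name reaches it.
        """
        return self.free.pop(0)


r = Renamer(8, ["r1", "r2"])
# A LD-OP such as `ADD r1, r1, [mem]`: the loaded value is an intermediate.
loaded = r.alloc_hidden()
result = r.rename_dest("r1")
print(loaded, result, r.map)
```

The loaded value occupies a physical register like any other result, but since it has no entry in the map, later instructions cannot read or write it.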
You have information on how many hidden registers are in flight on
average and worst case, so I believe your numbers.
I have not looked to see whether compilers generate LD-OP and OP-ST; at one
point Intel was discouraging such code.
Partially because AMD performed "relatively" better on LD-OPs and
LD-OP-STs than Intel at that time. Where "relatively" means
significantly above the noise level but "not all that much".