Subject: Re: Reservation stations [was Continuations]
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 21. Jul 2024, 18:03:08
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Jul21.190308@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent: xrn 10.11
EricP <ThatWouldBeTelling@thevillage.com> writes:
> This is where I saw a benefit to using valued reservation stations vs
> valueless ones - when a uArch has multiple similar FUs, each with its own
> bank of RS that is scheduled for that FU.
>
> Example of horizontal scaling of similar FU each with its own RS bank.
> https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2024/07/cheese_oryon_diagram_revised.png
>
> With valueless RS, each RS stores only the source register number of
> its operands and each FU has to be able to read all its operands
> when a uOp launches (begins execution).
If the operation was waiting in the RS, at least one operand comes in
through a forwarding path (at least if the uarch has those).
> This means the number of
> PRF read ports scales according to the total number of FU operands.
> (One could do read port sharing but then you have to schedule that too
> and could have contention.)
Yes, but ARM A64 seems to have been designed with that in mind.
> Also if an FU is unused on any cycle then
> all its (expensive) operand read ports are unused.
>
> Using the above Oryon as an example, with valueless RS, to launch
> all 14 FU with 3 operands all at once needs 42 read ports.
ARM A64's stp instruction needs up to 4 register reads. But I think
Oryon only supports two stores per cycle, so that raises the number to
44. It also has a split register file, so for the integer register
file 6*3+2*4+2*2=30 ports would be needed, and for the vector register
file 3*4=12 ports would be needed. I think, though, that the number
actually needed is much smaller, and various techniques are used to
make do with far fewer register ports.
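
As a back-of-the-envelope check of the worst-case numbers above, here is a small Python sketch. The (count, operands) pairs are read straight off the figures quoted in this post; the function and variable names are made up for illustration, and nothing here claims to describe how Oryon actually provisions its ports.

```python
# Worst-case PRF read ports with valueless RS: every FU reads all of its
# operands from the register file in the same cycle.
# Counts come from the figures in this post; names are illustrative only.

def read_ports(units):
    """Sum count * operands_per_op over a list of (count, operands) pairs."""
    return sum(count * operands for count, operands in units)

# Uniform estimate: 14 FUs with 3 operands each.
uniform = read_ports([(14, 3)])

# Two stores per cycle that may need 4 register reads (stp) instead of 3.
with_stp = uniform + 2 * (4 - 3)

# Split register files, using the per-file terms given in the text.
integer_file = read_ports([(6, 3), (2, 4), (2, 2)])
vector_file = read_ports([(3, 4)])

print(uniform, with_stp, integer_file, vector_file)
```

These are upper bounds for the case where every operand comes from the register file; operands arriving over a forwarding path would not need a read port.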
> With valued RS, to Dispatch 6 wide with 3 operands needs 18 read ports,
> and the read ports are potentially usable for all dispatches.
Note that Oryon has 8-way rename/dispatch (I wonder how that works
with, e.g., 8 ldp instructions that each can write three registers).
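
The dispatch-side arithmetic can be sketched the same way. The 8-wide read-port figure assumes the same 3 operands per uOp as the 6-wide case, and the ldp line assumes the renamer would have to handle every destination register in one cycle; both are my assumptions for illustration, not numbers from the Oryon material.

```python
# Read ports needed at dispatch with valued RS: operands are read once,
# when the uOp is dispatched into its reservation station.
# Names are illustrative only.

def dispatch_read_ports(width, operands_per_uop=3):
    """Read ports needed to dispatch `width` uOps per cycle."""
    return width * operands_per_uop

valued_6_wide = dispatch_read_ports(6)  # the 6-wide case quoted above
valued_8_wide = dispatch_read_ports(8)  # assumption: 8-wide, still 3 operands

# 8 ldp instructions per cycle, each able to write three registers.
ldp_dest_regs = 8 * 3

print(valued_6_wide, valued_8_wide, ldp_dest_regs)
```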
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>