Subject : Re: Non-pipelined FDIV/SQRT
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 18 Jul 2024, 20:54:05
Other headers
Organization : Rocksolid Light
Message-ID : <d28e91ba200634ccd05603133cc838bc@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Rocksolid Light
On Thu, 18 Jul 2024 16:56:46 +0000, EricP wrote:
Stefan Monnier wrote:
If the FP multiplier is a 4-stage pipeline, and FDIV is iterating using
the multiplier, can the pipeline get a mix of multiple operations going
at once? FDIV for both Newton–Raphson and Goldschmidt iterates serially
so each can only use one of the 4 pipeline slots.
>
Something I've been wondering for a while, indeed.
IOW, is there enough parallelism inside the FDIV/SQRT "microcode" to
keep the FMAC fully busy (my naive understanding is that there isn't)?
If not, do current CPUs make the FMAC available for other operations
while an FDIV/SQRT is in progress? If not, how hard would it be?
In an SRT FDIV unit: no, absolutely not.
In a Goldschmidt FDIV unit: 3 slots out of 20.
Neither of these uses microcode. Sequencing: yes; microcode: no.
>
>
Stefan
>
And if they can't mix then to what extent can the end of one op,
as it drains from the pipeline, overlap with the start of the next?
By the pipeline depth of the function unit minus 1.
For example: FDIV = 20 cycles unpipelined, FMUL = 4 cycles pipelined;
the multiplier tree is in cycle 2 of the 4 stages. The sequencer knows
the last cycle in which FDIV uses the multiplier (17) and enables its
reservation station to spit out an FMUL on cycle 15, which arrives at
the FU on cycle 17 after forwarding, so the FMUL takes cycles
18-19-20-21 and is done in its normal 4 cycles.
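
The cycle arithmetic in that example can be checked with a small sketch. The function name and the 2-cycle forwarding delay are assumptions for illustration (the delay is only inferred from "issued on 15, arrives on 17"); this is not a model of any particular machine.

```python
def fmul_after_fdiv(last_mul_cycle, forward_delay, fmul_depth, fdiv_latency):
    """Schedule one FMUL behind an FDIV that shares the multiplier.

    last_mul_cycle: last cycle in which FDIV uses the multiplier (17 in the post)
    forward_delay:  cycles from reservation station to FU (assumed to be 2)
    """
    issue = last_mul_cycle - forward_delay       # reservation station fires
    arrive = issue + forward_delay               # operands reach the FU
    exec_cycles = list(range(arrive + 1, arrive + 1 + fmul_depth))
    # Cycles in which the FMUL executes while the FDIV is still finishing.
    overlap = sum(1 for c in exec_cycles if c <= fdiv_latency)
    return issue, arrive, exec_cycles, overlap

issue, arrive, exec_cycles, overlap = fmul_after_fdiv(17, 2, 4, 20)
print(issue, arrive, exec_cycles, overlap)
```

With the numbers from the post, the overlap comes out to 3 cycles, i.e. the pipeline depth minus 1.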
Obviously FMUL can pipeline with FMUL but can the next FMUL overlap
with the end of a prior FDIV? An EXP?
The "busy" time of an FU is most often:
busy = latency - (pipeline_depth - 1);
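
As a quick check of that rule of thumb against the numbers used earlier in this post (FDIV 20 cycles, FMUL 4 cycles, 4-stage unit), a minimal sketch; the function name is made up for illustration:

```python
def busy_cycles(latency, pipeline_depth):
    """Cycles during which the FU cannot accept a new operation."""
    return latency - (pipeline_depth - 1)

fdiv_busy = busy_cycles(latency=20, pipeline_depth=4)
fmul_busy = busy_cycles(latency=4, pipeline_depth=4)
print(fdiv_busy, fmul_busy)
```

FDIV keeps the unit busy for 17 cycles, matching the last multiplier cycle in the example above, while a fully pipelined FMUL keeps it busy for only 1.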
>
I was thinking about reservation station schedulers and wondering
what they might have to optimize.
The interesting part is spitting out an instruction and then routing
it to a non-busy FU of appropriate capability.
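
A toy sketch of that routing decision, under loud assumptions: the FunctionUnit class, the route function, the unit names, and the two-unit configuration are all hypothetical, and busy time is modeled with the rule of thumb busy = latency - (pipeline_depth - 1). Real schedulers also deal with forwarding delays, ports, and wakeup timing, none of which is modeled here.

```python
from dataclasses import dataclass

@dataclass
class FunctionUnit:
    name: str
    ops: frozenset       # operations this unit is capable of
    latency: dict        # op -> total latency in cycles
    depth: int           # pipeline depth of the unit
    busy_until: int = 0  # first cycle at which a new op can be accepted

    def accept(self, op, cycle):
        # Unit is busy for latency - (depth - 1) cycles starting at `cycle`.
        self.busy_until = cycle + self.latency[op] - (self.depth - 1)

def route(op, cycle, units):
    """Send `op` to the first capable, non-busy unit; return it, or None."""
    for fu in units:
        if op in fu.ops and fu.busy_until <= cycle:
            fu.accept(op, cycle)
            return fu
    return None

# Hypothetical machine: one unit does FMUL and FDIV, the other FMUL only.
units = [
    FunctionUnit("fmac0", frozenset({"FMUL", "FDIV"}), {"FMUL": 4, "FDIV": 20}, 4),
    FunctionUnit("fmac1", frozenset({"FMUL"}), {"FMUL": 4}, 4),
]
first = route("FDIV", 1, units)    # goes to fmac0, which is then busy through cycle 17
second = route("FMUL", 2, units)   # fmac0 is busy, so fmac1 takes it
third = route("FDIV", 2, units)    # no capable unit is free: stays in the station
fourth = route("FMUL", 18, units)  # fmac0 accepts again on cycle 18
```

An FDIV accepted on cycle 1 frees the unit for cycle 18, which lines up with the FMUL executing in cycles 18-21 in the earlier example.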