> On Sun, 19 May 2024 18:37:51 +0200
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>
>> Thomas Koenig wrote:
>>> So, I did some more measurements on the POWER9 machine, and it came
>>> to around 18 cycles per FMA. Compared to the 13 cycles for the
>>> FMA instruction, this actually sounds reasonable.
>>>
>>> The big problem appears to be that, in this particular
>>> implementation, multiplication is not pipelined, but done
>>> piecewise by addition. This can be explained by the fact that
>>> this is mostly a decimal unit, with the 128-bit QP just added as
>>> an afterthought, and decimal multiplication does not happen all
>>> that often.
>>>
>>> A fully pipelined FMA unit capable of 128-bit arithmetic would be
>>> an entirely different beast; I would expect a throughput of 1 per
>>> cycle and a latency of (maybe) one cycle more than 64-bit FMA.
>>
>> The FMA normalizer has to handle a maximally bad cancellation, so it
>> needs to be around 350 bits wide. Mitch knows of course but I'm
>> guessing that this could at least be close to needing an extra cycle
>> on its own and/or heroic hardware?
>>
>> Terje
>>
>
> Why so wide?
> Assuming that subnormal multiplier inputs are normalized before

They are not; this is part of what you do to make subnormal numbers
exactly the same speed as normal inputs.
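To see roughly where a figure like "around 350 bits" comes from, here is a back-of-the-envelope sketch. It assumes the commonly cited single-path FMA layout, in which the exact 2p-bit product is added to an addend that may be aligned up to about p+2 bits above it, giving an adder/normalizer roughly 3p + 2 bits wide. The formula and the helper names `fma_datapath_width` and `product_leading_zeros` are illustrative assumptions, not a description of any particular chip. The second helper shows why skipping pre-normalization of subnormal inputs matters: an un-normalized subnormal significand leaves extra leading zeros in the product that the normalizer must shift out later.

```python
# Rough datapath-width estimate for a single-path fused multiply-add (FMA).
# ASSUMPTION: classic layout where the exact 2p-bit product is summed with an
# addend aligned up to ~p+2 bits above it, so the adder/normalizer is about
# 3p + 2 bits wide. Real designs differ in the details (guard/sticky bits etc.).

FORMATS = {
    "binary32": 24,   # significand precision p in bits, including hidden bit
    "binary64": 53,
    "binary128": 113,
}


def fma_datapath_width(p: int) -> int:
    """Approximate width in bits of the FMA adder/normalizer for precision p."""
    return 3 * p + 2


def product_leading_zeros(a_sig: int, b_sig: int, p: int) -> int:
    """Leading zeros of the 2p-bit product of two p-bit integer significands.

    A normal significand has its top bit set; a subnormal one does not, so
    multiplying an un-normalized subnormal significand leaves extra leading
    zeros that the normalizer must shift out afterwards.
    """
    product = a_sig * b_sig
    return 2 * p - product.bit_length()


if __name__ == "__main__":
    for name, p in FORMATS.items():
        print(f"{name}: p = {p}, datapath ~ {fma_datapath_width(p)} bits")

    p = 113
    normal = 1 << (p - 1)   # significand 1.000...0 (leading bit set)
    tiny_subnormal = 1      # smallest subnormal significand 0.000...01
    print(product_leading_zeros(normal, normal, p))          # 1
    print(product_leading_zeros(normal, tiny_subnormal, p))  # 113
```

Under this rule of thumb binary64 gives 161 bits and binary128 gives 341 bits, which is in the same neighbourhood as the estimate above once a few extra guard and sticky bits are counted.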
The messages shown come from Usenet.