Sujet : Re: Faster div or 1/sqrt approximations
De : tkoenig (at) *nospam* netcologne.de (Thomas Koenig)
Groupes : comp.archDate : 20. Jul 2024, 18:17:53
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <v7grg1$3kf77$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent : slrn/1.0.3 (Linux)
Terje Mathisen <
terje.mathisen@tmsw.no> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
I, personally, have found many Newton-Raphson iterators that converge
faster using 1/SQRT(x) than using the SQRT(x) equivalent.
I can well believe that.
It is interesting to see what different architectures offer for
faster reciprocals.
POWER has fre and fres (double and single version) for approximate
divisin, which are accurate to 1/256. These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
so obviously fully pipelined. With 1/256 accuracy, this could
actually be the original Quake algorithm (or its modification)
with a single Newton step, but this is of course much better in
hardware where exponent handling can be much simplified (and
done only once).
>
I've taken both a second and third look at InvSqrt() over the last few
years, it turns out that the Quake version is far from optimal: With the
exact same instructions, just a different set of constants, you can get
about 1.5 bits more than quake does, i.e. about 10 bits after that
single NR step (which isn't really NR since it modifies both the 1.5 and
the 0.5 factors).
Wikipedia has something on that, also literature; Moroz et al.
https://arxiv.org/abs/1603.04483 seem to give the optimum quasi-NR
method to do this, with one or two steps.
I've also looked a little bit at simplifying exp(-1/x) directly, and
it seems an unpleasent function to implement; at least there is no
argument reduction which comes to mind. With exp2(x), you can split
off the integer part and do an appoximation on the rest over a finite
interval, but if there is a fast approximation of the integer part
of 1/x without dividing, I am not aware of one :-)