Sujet : Re: Faster div or 1/sqrt approximations
De : terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Groupes : comp.archDate : 20. Jul 2024, 19:12:07
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <v7guln$3l7kj$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.2
Thomas Koenig wrote:
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
Thomas Koenig wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
>
I, personally, have found many Newton-Raphson iterators that converge
faster using 1/SQRT(x) than using the SQRT(x) equivalent.
>
I can well believe that.
>
It is interesting to see what different architectures offer for
faster reciprocals.
>
POWER has fre and fres (double and single version) for approximate
divisin, which are accurate to 1/256. These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
so obviously fully pipelined. With 1/256 accuracy, this could
actually be the original Quake algorithm (or its modification)
with a single Newton step, but this is of course much better in
hardware where exponent handling can be much simplified (and
done only once).
>
I've taken both a second and third look at InvSqrt() over the last few
years, it turns out that the Quake version is far from optimal: With the
exact same instructions, just a different set of constants, you can get
about 1.5 bits more than quake does, i.e. about 10 bits after that
single NR step (which isn't really NR since it modifies both the 1.5 and
the 0.5 factors).
Wikipedia has something on that, also literature; Moroz et al.
https://arxiv.org/abs/1603.04483 seem to give the optimum quasi-NR
method to do this, with one or two steps.
Impressive amounts of math here, unfortunately they completely miss the point!
What they derive is the exact optimal value for the magic constant, using zero, one or two text-book NR iterations.
However, if you are willing to take that first NR iteration
halfnumber = 0.5f*x
...
i = R-(i>>1);
...
x = x*(1.5f-halfnumber*x*x);
and then make both the 0.5f and 1.5f constants free variables, you can in fact get 1.5 more bits than what they show in this paper.
In fact, using slightly different values for R (the magic constant) results in very marginal changes in max rel error, while optimizing all three constants improves that max error from the stated 1.75e-3 to about 0.6e-3!
I've also looked a little bit at simplifying exp(-1/x) directly, and
it seems an unpleasent function to implement; at least there is no
argument reduction which comes to mind. With exp2(x), you can split
off the integer part and do an appoximation on the rest over a finite
interval, but if there is a fast approximation of the integer part
of 1/x without dividing, I am not aware of one :-)
:-(
Fast is probably what you get from a pure lookup table, but with no obvious NR iteration to improve it.
Terje
-- - <Terje.Mathisen at tmsw.no>"almost all programming can be viewed as an exercise in caching"