Re: Faster div or 1/sqrt approximations (was: Continuations)

Subject: Re: Faster div or 1/sqrt approximations (was: Continuations)
From: tkoenig (at) *nospam* netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Date: 20 Jul 2024, 23:58:59
Organization: A noiseless patient Spider
Message-ID: <v7hbv3$3nb28$1@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent: slrn/1.0.3 (Linux)
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 19 Jul 2024 20:25:51 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
>
MitchAlsup1 <mitchalsup@aol.com> wrote:
 
I, personally, have found many Newton-Raphson iterators that
converge faster using 1/SQRT(x) than using the SQRT(x) equivalent. 
 
I can well believe that.
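
One plausible reason the 1/SQRT(x) form is attractive is that its Newton-Raphson step needs only multiplies and a subtract, while the step for SQRT(x) contains a division. A minimal sketch in C, with function names made up for illustration (this is not code from anyone in the thread):

```c
#include <assert.h>

/* One Newton-Raphson step for y ~ 1/sqrt(x).
   Uses only multiplications and one subtraction. */
double rsqrt_nr_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* One Newton-Raphson (Heron) step for s ~ sqrt(x).
   Note the division x / s in every step. */
double sqrt_nr_step(double x, double s)
{
    return 0.5 * (s + x / s);
}

/* Refine a rough positive guess for 1/sqrt(x) with a fixed number of steps.
   The guess must be reasonably close, or the iteration may not converge. */
double rsqrt_refine(double x, double guess, int steps)
{
    assert(x > 0.0 && guess > 0.0);
    for (int i = 0; i < steps; i++)
        guess = rsqrt_nr_step(x, guess);
    return guess;
}
```

If sqrt(x) is what is wanted in the end, it can be recovered as x * rsqrt_refine(x, guess, n), still without a division.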
 
It is interesting to see what different architectures offer for
faster reciprocals.
 
POWER has fre and fres (double and single version) for approximate
division, which are accurate to 1/256.  These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
so obviously fully pipelined.  With 1/256 accuracy, this could
actually be the original Quake algorithm (or its modification)
with a single Newton step, but this is of course much better in
hardware where exponent handling can be much simplified (and
done only once).
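
For reference, the Quake-style trick mentioned above can be sketched in C as follows. Note that it computes 1/sqrt(x) rather than 1/x, so a plain reciprocal would need a modified constant and a different Newton step. The function name quake_rsqrt is made up for illustration; the constant 0x5f3759df is the one from the published Quake III source. With one Newton step the relative error is roughly 0.2% at worst, which is in the same ballpark as the 1/256 quoted for fre.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal single-precision x.
   Assumes float is a 32-bit IEEE 754 value. */
float quake_rsqrt(float x)
{
    uint32_t i;
    float y;

    assert(x > 0.0f);

    /* Reinterpret the float's bits as an integer (memcpy avoids
       the aliasing problems of a pointer cast). */
    memcpy(&i, &x, sizeof i);

    /* Halving and negating the exponent field gives a rough
       estimate of x^(-1/2); the constant tunes the mantissa. */
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);

    /* A single Newton-Raphson step to tighten the estimate. */
    return y * (1.5f - 0.5f * x * y * y);
}
```

In software, the exponent manipulation costs a round trip between the floating-point and integer domains; in hardware that part is nearly free, which fits the remark above.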
 
x86_64 has rcpss, accurate to 1/6144, with (looking at the
instruction tables) a latency of 6 cycles on newer architectures
and a throughput of 1/4.
>
It seems, you looked at the wrong instruction table.

[Note I was not writing about inverse square root, I was writing
about inverse].

I have to admit to being almost terminally confused by Intel
generation names, so I am likely to mix up what is old and what
is new.

Here are figures for some not-so-modern x86-64 cores:
Arch      Latency  Throughput (scalar/128b/256b)
Zen3      3        2/2/1
Skylake   4        1/1/1
Ice Lake  4        1/1/1
Power9    5-7      4/2/N/A

Power9 has it for 128-bit, but not for 256 bits (it doesn't have
those registers), and if I read the handbook correctly, that
would also be 4 operations in parallel.

>
So, if your business depends on calculating many inaccurate
square roots, fast, buy a POWER :-)
 
>
If you have enough independent rsqrt operations to do, all four processors
have the same theoretical peak throughput, but x86 tends to have more
cores and to run at a faster clock. And lower latency makes achieving
peak throughput easier. Also, depending on target precision, higher
initial precision of x86 estimate means that sometimes you can get away
with 1 less NR iteration.
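
The point about saving an iteration can be made concrete with an idealized model in which every Newton-Raphson step squares the relative error. This is only a sketch: it ignores rounding error and the constant factor in the rsqrt recurrence, and the function name nr_steps_needed is made up for illustration. The accuracies 1/256 and 1/6144 are simply the figures quoted earlier in this thread, taken at face value.

```c
#include <assert.h>

/* Number of Newton-Raphson steps needed to get from an initial
   relative error down to a target relative error, assuming each
   step squares the error (idealized quadratic convergence). */
int nr_steps_needed(double initial_rel_err, double target_rel_err)
{
    /* The model only makes sense for errors in (0, 1) and a
       positive target; otherwise the loop would not terminate. */
    assert(initial_rel_err > 0.0 && initial_rel_err < 1.0);
    assert(target_rel_err > 0.0);

    int steps = 0;
    double err = initial_rel_err;
    while (err > target_rel_err) {
        err *= err;   /* one NR step: e -> e^2 */
        steps++;
    }
    return steps;
}
```

For a target of about 20 bits (a relative error of 1/1048576), the model gives two steps when starting from 1/256 and one step when starting from 1/6144.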
>
Also, if what you really need is sqrt rather than rsqrt, then depending
on how much inaccuracy you can accept, sometimes on modern x86
calculating an accurate sqrt can be a better solution than calculating
an approximation. It is less likely to be the case on POWER9.

[table reformatted, hope I got this right]

Accurate sqrt (single precision)
Zen3      14      0.20/0.200/0.200
Skylake   12      0.33/0.333/0.167
Ice Lake  12      0.33/0.333/0.167
Power9    26      0.20/0.095/N/A
>
Accurate sqrt (double precision)
Zen3      20      0.111/0.111/0.111
Skylake   12      0.167/0.167/0.083
Ice Lake  12      0.167/0.167/0.083
Power9    36      0.111/0.067/N/A
>
>
Other architectures I have tried don't seem to have it.
 
>
Arm64 has it. It is called FRSQRTE.

Interesting that "gcc -O3 -ffast-math -mrecip" does not
appear to use it.

>
>
Does it make sense? Well, if you want to calculate lots of Arrhenius
equations, you don't need full accuracy and (like in Mitch's case)
exp has become as fast as division, then it could actually make a
lot of sense.  It is still possible to add Newton steps afterwards,
which is what gcc does if you add -mrecip -ffast-math.
>
I don't know about POWER, but on x86 I wouldn't do it.
I'd either use plain division, which on modern cores is quite fast,
or use NR to calculate a normal reciprocal. x86 provides an initial
estimate for that too (RCPSS).

Note that I was talking about the inverse in the first place.
