Subject : Re: Faster div or 1/sqrt approximations (was: Continuations)
From : already5chosen (at) *nospam* yahoo.com (Michael S)
Newsgroups : comp.arch
Date : 20. Jul 2024, 23:46:53
Organization : A noiseless patient Spider
Message-ID : <20240721014653.00004c9d@yahoo.com>
User-Agent : Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
On Sat, 20 Jul 2024 21:58:59 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Michael S <already5chosen@yahoo.com> schrieb:
On Fri, 19 Jul 2024 20:25:51 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
MitchAlsup1 <mitchalsup@aol.com> schrieb:
I, personally, have found many Newton-Raphson iterators that
converge faster using 1/SQRT(x) than using the SQRT(x)
equivalent.
I can well believe that.
It is interesting to see what different architectures offer for
faster reciprocals.
POWER has fre and fres (double and single version) for approximate
division, which are accurate to 1/256. These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
so obviously fully pipelined. With 1/256 accuracy, this could
actually be the original Quake algorithm (or its modification)
with a single Newton step, but this is of course much better in
hardware where exponent handling can be much simplified (and
done only once).
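(As an aside, a minimal sketch of that estimate-plus-one-Newton-step
idea, in the spirit of the Quake trick - purely illustrative, the
function name is mine, and the hardware estimate instructions use
small lookup tables on the exponent/mantissa rather than this bit
hack:)

#include <stdint.h>
#include <string.h>

/* Quake-style approximate 1/sqrt(x): bit-level initial guess,
   then a single Newton-Raphson step to refine it. */
static float quake_rsqrt(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);              /* view the float's bits */
    i = 0x5f3759df - (i >> 1);             /* magic initial estimate */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
}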
x86_64 has rcpss, accurate to 1/6144, with (looking at the
instruction tables) a latency of 6 cycles for newer architectures,
and a throughput of 1/4.
>
It seems, you looked at the wrong instruction table.
[Note I was not writing about inverse square root, I was writing
about inverse].
>
Oh, sorry. I misunderstood.
I have to admit to being almost terminally confused by Intel
generation names, so I am likely to mix up what is old and what
is new.
>
Gen 2 is Sandy Bridge.
Gen 3 is Ivy Bridge, which has a very similar core microarchitecture.
Gen 4 is Haswell.
Gen 5 is Broadwell, which is similar but has a few changes.
Gen 6, 7, 8, 9 and the majority of 10 are all the same
microarchitecture - Skylake.
Gen 11 is mostly Tiger Lake, and partially Ice Lake and Rocket Lake.
All three differ from each other on the silicon process side, but are
very similar in core microarchitecture.
Gen 12 is Alder Lake, which has P cores and E cores. The
microarchitecture of the P core is called Golden Cove.
Here are numbers for some not-quite-the-newest x86-64 (and POWER)
cores:
Arch      Latency  Throughput (scalar/128b/256b)
Zen3      3        2/2/1
Skylake   4        1/1/1
Ice Lake  4        1/1/1
Power9    5-7      4/2/N/A
Power9 has it for 128-bit, but not for 256 bits (it doesn't have
those registers), and if I read the handbook correctly, that
would also be 4 operations in parallel.
>
The handbook is quite clear that both xvresp and xvredp have a max
throughput of 2.
So, if your business depends on calculating many inaccurate
square roots, fast, buy a POWER :-)
That's the sentence that caused my misunderstanding.
>
If you have enough independent rsqrt operations to do, all four
processors have the same theoretical peak throughput, but x86 tends
to have more cores and to run at a faster clock. And lower latency
makes achieving peak throughput easier. Also, depending on the target
precision, the higher initial precision of the x86 estimate means
that sometimes you can get away with one less NR iteration.
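To make the iteration count concrete, here is a minimal sketch (my
own illustrative code, not anything gcc emits) of the x86 estimate
plus a single NR step; since each step roughly doubles the number of
correct bits, a ~12-bit estimate refined once lands close to full
single precision, while a 1/256 estimate refined once does not:

#include <immintrin.h>

/* Approximate 1/sqrt(x): RSQRTSS estimate (~12 bits) plus one
   Newton-Raphson step, which roughly doubles the correct bits. */
static inline float rsqrt_nr1(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}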
>
Also, if what you really need is sqrt rather than rsqrt, then
depending on how much inaccuracy you can accept, on modern x86
calculating an accurate sqrt can sometimes be a better solution than
calculating an approximation. That is less likely to be the case on
POWER9. (A sketch of the rsqrt-based alternative follows the tables
below.)

[table reformatted, hope I got this right]

Accurate sqrt (single precision)
Arch      Latency  Throughput (scalar/128b/256b)
Zen3      14       0.20/0.200/0.200
Skylake   12       0.33/0.333/0.167
Ice Lake  12       0.33/0.333/0.167
Power9    26       0.20/0.095/N/A
>
Accurate sqrt (double precision)
Arch      Latency  Throughput (scalar/128b/256b)
Zen3      20       0.111/0.111/0.111
Skylake   12       0.167/0.167/0.083
Ice Lake  12       0.167/0.167/0.083
Power9    36       0.111/0.067/N/A
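For reference, the approximate alternative those tables are being
compared against would look roughly like this (illustrative sketch
only; note the x == 0 corner case that the real sqrt instruction
handles for free):

#include <immintrin.h>

/* Approximate sqrt(x) = x * (1/sqrt(x)), built from the rsqrt
   estimate plus one Newton-Raphson step.  Caveat: x == 0 yields NaN
   here (0 * inf), whereas the real sqrt instruction returns 0. */
static inline float approx_sqrt(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    y = y * (1.5f - 0.5f * x * y * y);
    return x * y;
}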
>
Other architectures I have tried don't seem to have it.
>
Arm64 has it. It is called FRSQRTE.
Interesting that "gcc -O3 -ffast-math -mrecip" does not
appear to use it.
>
You mean, on other architectures gcc does emit approximate rsqrt from
plain C or plain Fortran? In which situations?
>
Does it make sense? Well, if you want to calculate lots of
Arrhenius equations, don't need full accuracy, and (like in
Mitch's case) exp has become as fast as division, then it could
actually make a lot of sense.
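For reference, the Arrhenius rate is k = A * exp(-Ea/(R*T)), so each
evaluation costs one exp and one division (or reciprocal). A trivial
sketch, with an illustrative function name of my own:

#include <math.h>

/* Arrhenius rate constant k = A * exp(-Ea / (R*T)).  One division
   (or approximate reciprocal) and one exp per evaluation. */
static double arrhenius(double A, double Ea, double T)
{
    const double R = 8.314462618;   /* molar gas constant, J/(mol*K) */
    return A * exp(-Ea / (R * T));
}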
Mitch was told more than one time that floating point division on
modern cores (esp. Apple, but others too) is much faster than he thinks.
But he tends to forget it quickly.
It is still possible to add Newton
steps afterwards, which is what gcc does if you add -mrecip
-ffast-math.
>
I don't know about POWER, but on x86 I wouldn't do it.
I'd either use plain division, which on modern cores is quite fast,
or use NR to calculate a normal reciprocal. x86 provides an initial
estimate for that too (RCPSS).
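A minimal sketch of that RCPSS-plus-NR route (illustrative only; each
NR step y' = y*(2 - x*y) roughly doubles the number of correct bits
of the ~12-bit estimate):

#include <immintrin.h>

/* Approximate 1/x: RCPSS estimate (~12 bits) refined with one
   Newton-Raphson step y' = y * (2 - x*y). */
static inline float rcp_nr1(float x)
{
    float y = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
    return y * (2.0f - x * y);
}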
Note that I was talking about the inverse in the first place.