Subject: Re: Radians Or Degrees?
From: already5chosen (at) *nospam* yahoo.com (Michael S)
Newsgroups: comp.lang.c comp.arch
Date: 20 Mar 2024, 23:03:48
Other headers
Organization: A noiseless patient Spider
Message-ID: <20240321000348.00004b37@yahoo.com>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
User-Agent: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
On Wed, 20 Mar 2024 20:33:44 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:

> Michael S wrote:
>
>> On Wed, 20 Mar 2024 09:54:36 -0400
>> Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>>> [ Their key insight is the idea that to get correct rounding, you
>>> shouldn't try to compute the best approximation of the exact
>>> result and then round, but you should instead try to compute any
>>> approximation whose rounding gives the correct result. ]
>>> My impression was that their performance was good enough that the
>>> case for not-correctly-rounded implementations becomes very weak.
>>
>> It all depends on what you compare against.
>> For scalar calls to the majority of transcendental functions on the
>> IEEE-754 list, it's probably very easy to get correctly rounded
>> binary32 results in approximately the same time as results calculated
>> with a max. error of, say, 0.75 ULP. Especially so if the target
>> machine has fast binary64 arithmetic.
>> But in practice, when we use lower (than binary64) precision, we often
>> care about vector performance rather than scalar.
>> I.e. we care little about the speed of sinf(), but want ippsTone_32f()
>> to be as fast as possible. In case you wonder, this function is part
>> of Intel Performance Primitives and it is FAST. Writing a correctly
>> rounded function that approaches the speed of this *almost* correctly
>> rounded routine (I think, for sane input ranges it's better than
>> 0.55 ULP) would not be easy at all!
>
> I challenge ANY software version of SIN() correctly rounded or not
> to compete with my <patented> HW implementations for speed (or even
> power).
>
Before you post this response, you might as well look at what
ippsTone_32f() is doing. Hint - it's not a generic scalar sin().
IMHO, for a long enough vector and on a modern enough Intel or AMD CPU,
it will very easily beat any scalar-oriented, binary64-oriented HW
implementation of sin() or cos().
This function is not about latency. It's about throughput.
AFAIR, you were quite surprised by the speed (throughput) of another IPP
primitive, ippsSin_64f_A53(), when I posted timing measurements here
less than 2 years ago. So, before you issue a challenge, just take into
account that ippsTone_32f() is both more specialized than
ippsSin_64f_A53() and has much lower precision. So, while I didn't test
it, I expect that it is much, much faster.