Subject : Re: Radians Or Degrees?
From : terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Groups : comp.lang.c comp.arch
Date : 21. Mar 2024, 08:38:53
Organisation : A noiseless patient Spider
Message-ID : <utgo6e$22viq$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.1
Stefan Monnier wrote:
>>>> There are groups who have shown that exactly rounded transcendental
>>>> functions are in fact achievable with maybe 3X reduced performance.
>>> That much? I had the impression it was significantly cheaper.
>> The J. M. Muller book indicates about 2× to 2.5×
> The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
> to get much better performance (basically, in the same ballpark as
> not-correctly-rounded implementations).
> [ Their key insight is the idea that to get correct rounding, you
> shouldn't try to compute the best approximation of the exact result
> and then round, but you should instead try to compute any
> approximation whose rounding gives the correct result. ]

That is indeed interesting. However, it is also very interesting that they only do this for 32-bit or less. I.e. the domains for which it is almost trivially easy to verify the results by checking all possible inputs. :-)

> My impression was that their performance was good enough that the case
> for not-correctly-rounded implementations becomes very weak.

I agree in principle: If you can use polys that are not much more complicated than the min-max/cheby case, and which always round to the desired values, then that's an obvious good.
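
To make the "check all possible inputs" point concrete, here is a minimal sketch of an exhaustive comparison harness in C. It is not Rlibm's code. The names `count_mismatches`, `recip_oracle` and `recip_via_double` are hypothetical, and the reciprocal is only a stand-in: IEEE 754 division is already correctly rounded, so it can serve as an oracle without a multiprecision library. For a real transcendental function the oracle would have to come from something like MPFR.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing issues). */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t bits_from_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

typedef float (*float_fn)(float);

/* Walk every float whose bit pattern lies in [first, last] and count the
   inputs where candidate and oracle return different bit patterns.
   Two NaN results are treated as agreement regardless of payload. */
static uint64_t count_mismatches(uint32_t first, uint32_t last,
                                 float_fn candidate, float_fn oracle) {
    uint64_t mismatches = 0;
    uint32_t bits = first;
    for (;;) {
        float x = float_from_bits(bits);
        float a = candidate(x);
        float b = oracle(x);
        int both_nan = (a != a) && (b != b);
        if (!both_nan && bits_from_float(a) != bits_from_float(b))
            mismatches++;
        if (bits == last)   /* test before incrementing so last = UINT32_MAX works */
            break;
        bits++;
    }
    return mismatches;
}

/* Stand-in oracle (assumption): float division, correctly rounded per IEEE 754. */
static float recip_oracle(float x) { return 1.0f / x; }

/* Stand-in candidate: evaluate in double, then round once to float. */
static float recip_via_double(float x) { return (float)(1.0 / (double)x); }
```

Calling `count_mismatches(0, UINT32_MAX, recip_via_double, recip_oracle)` covers all 2^32 inputs in a single loop, which is why the float case is so much easier to verify than double; the same loop over 2^64 patterns is out of reach.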
..
Reading the full log2f() code, it seems that they can use double for all evaluations, and with a single (!) mantissa exception, this produces the correct results for all inputs and all rounding modes. :-)
I.e. with 53 bits to work with, giving up about one ulp for the range reduction, the 52 remaining bits correspond to 2n+6 bits available for the 5-term poly to evaluate a final float result.
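
The bit budget and the "evaluate in double, round once" pattern can be sketched as follows. This is only an illustration under stated assumptions: n is read as 23, the stored mantissa bits of a float, since that is the value for which 2n+6 equals 52. The coefficients are plain Taylor terms for ln(1+r), not the Rlibm-generated ones, and `log1p_poly5` is a hypothetical name.

```c
#include <float.h>

/* Bit budget: double precision minus one bit conceded to range reduction,
   compared with twice the stored float mantissa width (n = 23 assumed). */
enum {
    FLOAT_STORED_BITS = FLT_MANT_DIG - 1,                     /* n = 23      */
    WORKING_BITS      = DBL_MANT_DIG - 1,                     /* 53 - 1 = 52 */
    SPARE_BITS        = WORKING_BITS - 2 * FLOAT_STORED_BITS  /* 52 - 46 = 6 */
};

/* Placeholder coefficients (assumption): Taylor series of ln(1+r),
   r - r^2/2 + r^3/3 - r^4/4 + r^5/5. A correctly rounded library would
   use coefficients chosen so that the final rounding always lands on
   the right float. */
static const double COEFF[5] = { 1.0, -1.0 / 2.0, 1.0 / 3.0, -1.0 / 4.0, 1.0 / 5.0 };

/* Five-term polynomial evaluated entirely in double with Horner's rule;
   the only rounding to float happens in the final cast. */
static float log1p_poly5(float r) {
    double x = (double)r;
    double acc = COEFF[4];
    for (int i = 3; i >= 0; i--)
        acc = acc * x + COEFF[i];
    return (float)(acc * x);
}
```

All intermediate rounding errors stay at the double level, so they sit well below the float ulp; whether the final cast always picks the correctly rounded float then depends on the coefficients and on how close the exact result comes to a rounding boundary.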
When asking for perfectly rounded transcendentals, with approximately the same runtime, the float case is a form of cheating, simply because current FPUs tend to run float and double at the same speed.
OTOH, I'm still persuaded that, for a float library, this approach might in fact end up being included in the 754 standard.
Doing the same for double by "simply" doing all operations with fp128 variables would definitely take significantly longer, and verification would be an "interesting" problem. (Interesting in the "multiple PhDs" domain.)
Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"