Subject: Re: Radians Or Degrees?
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.lang.c comp.arch
Date: 23 Mar 2024, 09:11:38
Organisation : A noiseless patient Spider
Message-ID : <utm2rr$3hb2q$1@dont-email.me>
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.1
Michael S wrote:
On Thu, 21 Mar 2024 08:52:18 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
MitchAlsup1 wrote:
Stefan Monnier wrote:
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically needs 2×n+6 to 2×n+12 bits to get
correct rounding 100% of the time. FP128 only has 2×n+3 and is
insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
IEEE-754 binary64 is 1:11:52 :-)
Oops! Mea Culpa!
I _do_ know that double has a 10-bit exponent bias (1023), so it has to be 11 bits wide. :-(
But anyway, I am skeptical about Muller's rules of thumb.
I'd expect that different transcendental functions would exercise
non-trivially different behaviors, mostly because they have different
relationships between input and output ranges. Some of them compress
wider inputs into narrower output and some do the opposite.
Yet another factor is luck.
I agree, this is a per-function problem, with some being substantially harder than others.
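To make Muller's quoted rule of thumb concrete (the arithmetic below is mine, not from the thread, and the function name is invented for illustration):

```python
def muller_bounds(n):
    """Muller's rule of thumb as quoted above: correctly rounding an
    n-bit-significand format typically needs 2*n+6 .. 2*n+12 bits of
    internal precision. A heuristic, not a guarantee -- and, as noted,
    a per-function matter in practice."""
    return 2 * n + 6, 2 * n + 12

# binary64 carries a 53-bit significand (52 stored + implicit bit),
# so the estimate is 112..118 bits internally; binary128's 113-bit
# significand only just reaches the low end of that range.
print(muller_bounds(53))   # (112, 118)
print(muller_bounds(113))  # (232, 238) -- what correctly rounded binary128 would need
```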
Besides, I see nothing special about binary128 as a helper format.
It is not supported on the vast majority of HW, and even when it is
supported, like on IBM POWER, it is slower than emulated 128-bit
fixed-point for the majority of operations. Fixed-point is more work
for the coder, but sounds like a surer path to success.
In my own code (since I don't have Mitch's ability to use much wider internal fp formats) I also prefer 64-bit u64 as the working chunk size.
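As an illustration of working in fixed-width chunks (my sketch, not Terje's code; Python's unbounded ints stand in for machine registers, with explicit masking to mimic them), here is the classic schoolbook 64×64→128-bit multiply that u64-based fixed-point code is built from:

```python
MASK32 = 0xFFFFFFFF

def mul_64x64_to_128(a, b):
    """Schoolbook 64x64 -> 128-bit unsigned multiply built from 32-bit
    partial products, the basic chunk operation of u64 fixed-point code.
    Returns (hi, lo) as two 64-bit halves."""
    al, ah = a & MASK32, a >> 32
    bl, bh = b & MASK32, b >> 32
    ll = al * bl          # low  x low
    lh = al * bh          # low  x high
    hl = ah * bl          # high x low
    hh = ah * bh          # high x high
    # Sum the middle column, keeping track of the carry into the top half.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    lo = ((mid & MASK32) << 32) | (ll & MASK32)
    hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return hi, lo
```

On real hardware the four partial products would be 64-bit MULs (or a single widening multiply where available); the point is that everything stays inside fixed-width chunks with explicit carries.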
Almost 30 years ago, during the FDIV workaround, I needed a q&d way to verify that our fpatan2 replacement was correct, so what I did was to write a 1:31:96 format library over a weekend.
Yeah, it was much more exponent than needed, but with only 32-bit registers available it was much easier to get the asm correct.
For the fpatan2 I used a dead simple approach with little range reduction, just a longish Taylor series (i.e. no Chebyshev optimizations).
I had previously written 3-4 different implementations of arbitrary-precision atan2() back when I wanted to calculate as many digits of pi as possible, so I just reused one of those algorithms.
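A rough sketch of that dead-simple approach (my Python with invented names, working in plain doubles; Terje's original was 32-bit asm over his 1:31:96 format):

```python
import math

def atan_taylor(x, terms=60):
    """Plain alternating Taylor series for atan(x), valid for |x| <= 1.
    No Chebyshev economization; convergence is slow as |x| approaches 1."""
    total, term, x2 = 0.0, x, x * x
    for k in range(terms):
        total += term / (2 * k + 1)
        term *= -x2
    return total

def atan2_taylor(y, x):
    """atan2 with minimal range reduction: fold the ratio into |t| <= 1
    via atan(t) = sign(t)*pi/2 - atan(1/t), then fix up the quadrant."""
    if x == 0.0:
        return math.copysign(math.pi / 2, y)
    t = y / x
    if abs(t) <= 1.0:
        r = atan_taylor(t)
    else:
        r = math.copysign(math.pi / 2, t) - atan_taylor(1.0 / t)
    if x < 0.0:                      # quadrants II and III
        r += math.pi if y >= 0.0 else -math.pi
    return r
```

Away from |t| = 1 the series converges quickly; near |t| = 1 a serious implementation would add more range reduction (e.g. via the atan addition formula) rather than just piling on terms.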
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"