On Sun, 1 Sep 2024 21:21:38 +0000, BGB wrote:
OK. IIRC, I had padded the mantissa with '01' mostly out of paranoia.
On 9/1/2024 1:34 AM, Terje Mathisen wrote:
MitchAlsup1 wrote:
it is 53×53->106 to get correct rounding in 1 step.
>
It was a revelation to me when I wrote my first fp emulation code and
grok'ed how having a single guard bit followed by a sticky bit was
sufficient to do this for all rounding modes.
>
At that point I only needed to maintain enough intermediate bits to
guarantee I would still have those rounding bits after normalization.
>
This doesn't mean that I could skip calculating all the bits of the full
NxN->2N mantissa product, only that I didn't need to keep them all
around after normalization.
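
A minimal Python sketch of the guard-plus-sticky idea described above, assuming normalized 53-bit significands with the hidden bit set, and ignoring exponent range, denormals and special values. The names `round_product` and `mul_via_guard_sticky` and the mode strings are made up for illustration; this is not anyone's actual emulation code.

```python
import math

def round_product(ma, mb, mode="nearest_even", negative=False):
    """Multiply two 53-bit significands and round the product to 53 bits.

    Returns (significand, exponent_adjust). The full product is formed,
    but only one guard bit and one sticky bit are consulted for rounding.
    """
    assert ma >> 52 == 1 and mb >> 52 == 1   # normalized, hidden bit set
    p = ma * mb                              # full 105- or 106-bit product
    shift = p.bit_length() - 53              # 52 or 53 bits fall below the result
    kept = p >> shift                        # top 53 bits
    guard = (p >> (shift - 1)) & 1           # first discarded bit
    sticky = int(p & ((1 << (shift - 1)) - 1) != 0)  # OR of all bits below guard
    inexact = guard | sticky
    if mode == "nearest_even":
        round_up = guard & (sticky | (kept & 1))
    elif mode == "toward_zero":
        round_up = 0
    elif mode == "toward_pos":
        round_up = inexact if not negative else 0
    elif mode == "toward_neg":
        round_up = inexact if negative else 0
    else:
        raise ValueError(mode)
    kept += round_up
    exp_adjust = shift - 52
    if kept >> 53:                           # rounding carried out of the top bit
        kept >>= 1
        exp_adjust += 1
    return kept, exp_adjust

def mul_via_guard_sticky(x, y):
    """Multiply two doubles in [1, 2) via round_product (nearest-even)."""
    ma, mb = int(x * 2**52), int(y * 2**52)  # exact for values in [1, 2)
    kept, exp_adjust = round_product(ma, mb)
    return math.ldexp(kept, exp_adjust - 52)
```

On a machine with IEEE binary64 multiplication, `mul_via_guard_sticky(x, y)` should agree with the native `x * y` for inputs in [1, 2).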
>
OK.
>
It seemed like when I looked over the 1985 spec initially, it only
required that the intermediate result be wider than the destination
(I seemingly missed the point of it also requiring infinite precision).
>
Say, 54*54 => 68 bits, where 68 > 52, under this interpretation, it
would have worked. Granted, this does turn it into a probability game
whether the result is correct or off by 1.
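
To illustrate the "probability game", here is a small Python sketch that rounds a 53×53 product once from the full product and once from a product cut to 68 bits. It is only a model under assumptions: the 68-bit width is taken from the post, the low bits are simply dropped with no sticky bit, and `rne` and `full_vs_truncated` are hypothetical helper names, not a description of any real FPU.

```python
def rne(p, keep=53):
    """Round integer p to its top `keep` bits, ties to even.

    The result is returned at the original scale (low bits zeroed).
    """
    shift = p.bit_length() - keep
    kept = p >> shift
    rem = p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    return kept << shift

def full_vs_truncated(ma, mb, product_bits=68):
    """Return (rounding of full product, rounding of truncated product)."""
    p = ma * mb
    drop = p.bit_length() - product_bits
    p_trunc = (p >> drop) << drop      # low bits discarded, no sticky kept
    return rne(p), rne(p_trunc)

# Significands of 1 + 2^-52 and 1.5 + 2^-52: the exact product lies just
# above a rounding tie, and the deciding bit sits below the 68 kept bits.
a = 2**52 + 1
b = 2**52 + 2**51 + 1
full, trunc = full_vs_truncated(a, b)
print((full - trunc) >> 52)            # difference in units of the last place
```

For this pair the truncated path sees an exact tie and rounds to even, while the full product rounds up, so the two results differ by one unit in the last place.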
But, have now since noticed that it did specify computing to infinite
precision (in this version of the standard), which, my FPU does not do.
>
My point exactly,
>
Though, if one is free to choose what is done in HW or SW, a target which does DIV and SQRT in software should still be valid.
>
There was mention of some operations that I have generally not seen in
the ISA in real-world FPUs:
An FP remainder operator;
>
Something IEEE specifies but would require an intermediate of 2045
bits to get correct in all circumstances. This is easier to do in
SW! MC68881 did it in nearly 2300 cycles!!
Converters to/from ASCII strings;
Easier and better in SW.
Granted.
>
An FP->Int truncate operator with the result still in FP format;
RND (round) instruction.
Neither BJX2 nor RISC-V has this, nor did it exist in SH-4, ...
>
Usually, one goes round-trip FP->Int->FP;
Has underflow and overflow problems: 2^1022 -> int => overflow, ...
Possible, though the library call can check if the input is larger than a limit and return identity, since any value larger than 2^52 is not going to have any fractional part to round or remove.
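
A minimal sketch of that library-call approach in Python, assuming binary64 and a limit of 2^52; `trunc_to_fp` is a hypothetical name, not a function from any particular C library.

```python
import math

def trunc_to_fp(x):
    """Truncate toward zero, returning the result as a float.

    Values with magnitude >= 2**52 (and NaN/infinity) have no fractional
    part to remove, so they are returned unchanged and never reach the
    integer conversion that could overflow.
    """
    if x != x or abs(x) >= 2.0**52:
        return x
    # int() truncates toward zero; copysign keeps the sign of -0.0 results.
    return math.copysign(float(int(x)), x)

print(trunc_to_fp(2.75), trunc_to_fp(-2.75), trunc_to_fp(2.0**1022))
```

The early return is what sidesteps the 2^1022 -> int overflow case: only values that fit comfortably in a 64-bit integer take the round trip.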
...
More modern machines have RND; nobody will ever have REM.
Which is probably not a lot, as off-hand I am not aware of many ISAs that have floor/ceil/round in the ISA itself, rather than doing it via conversion to an integer type.
>
Seems like pretty much everyone offloaded these tasks to the C library.
>
I had ended up with coverage of most of the rest, albeit still lacking a
"trap on denormal" handler (seemingly worked for MIPS and friends, *).
>
So, it seemed like it was getting pretty close to "could maybe pass the
1985 spec if one lawyers it...". Maybe not so much it seems, unless I
fix the FMUL issue (TBD if it can be done without significantly
increasing adder-chain latency).
>
You could check for "inability to correctly round" and trap on that
{I have a patent on doing this in transcendental instructions}
>
Most likely option is detecting the presence of non-zero values in the low 19 bits of the mantissa for the inputs (on both values), carry beyond the low 8 bits of the result, and the presence of values with zero exponent but non-zero value, ...
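
A rough Python model of two of the input-side checks mentioned here (non-zero low 19 mantissa bits on both operands, and zero exponent with non-zero value), assuming binary64 operands. The function names are hypothetical, and the "carry beyond the low 8 bits of the result" condition is left out because it depends on multiplier internals the thread does not spell out.

```python
import struct

def fields(x):
    """Split a binary64 value into (biased exponent, 52-bit fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def fmul_needs_trap(a, b, low_bits=19):
    """Conservatively decide whether a multiply should go to software."""
    ea, fa = fields(a)
    eb, fb = fields(b)
    low_mask = (1 << low_bits) - 1
    # Both operands have non-zero low-order mantissa bits.
    both_low = (fa & low_mask) != 0 and (fb & low_mask) != 0
    # Either operand has a zero exponent but a non-zero value (denormal).
    denormal = (ea == 0 and fa != 0) or (eb == 0 and fb != 0)
    return both_low or denormal

print(fmul_needs_trap(1.5, 3.0), fmul_needs_trap(1 + 2**-52, 1.5 + 2**-52))
```

A check like this is deliberately pessimistic: it traps on some multiplies the hardware would have rounded correctly anyway, trading a few unnecessary traps for a cheap test.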
>
It is possible I could also add a check to detect and trap multiplies
for cases where both values have non-zero low-order bits (allowing these
to also be emulated in software).
>
So, went and added a flag for "Trap as needed to emulate full IEEE
semantics" to FPSCR, where the idea is that enabling this will cause it
to trap in cases where the FPU detects that the results would likely not
match the IEEE standard (if using FADDG/FSUBG/FMULG/..., generally if
fenv_access is enabled).
>
Might make sense to have a compiler option to assume fenv_access is
always enabled.
>
>
*: Though, from what I can gather, most of the N64 games and similar had
operated with this disabled (giving DAZ/FTZ semantics) which apparently
posed an annoyance for later emulators (things like moving platforms in
games like SMB64 would apparently slowly drift upwards or away from the
origin if the map was left running for long enough, etc; due to SSE and
similar tending to operate with denormals enabled).
>
GPUs started out without even IEEE 754 formats and over many generations
did more and more of 754, then 2008, and closing in on 2019
>
OK.
>
FMAC (with single rounding, which is the interesting one) you can of
course get catastrophic cancellation, so you need all the 2N mantissa
bits of the multiplication plus the N bits from the addend, then you
either need a normalizer wide enough to take in any possible alignment
of the two parts, or you must have separate logic for each of the major
cases.
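
A small Python sketch of why the single rounding matters, using exact rational arithmetic as a stand-in for the wide intermediate. It assumes CPython, where converting a `Fraction` to `float` rounds correctly; the function names are made up for illustration.

```python
from fractions import Fraction

def fma_single_rounding(a, b, c):
    """a*b + c computed exactly, then rounded once to binary64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def fma_double_rounding(a, b, c):
    """Product rounded to binary64 first, then the sum rounded again."""
    return a * b + c

# The exact product is 1 - 2^-60, which rounds to 1.0 on its own, so the
# separately rounded version cancels to zero and loses the whole result.
a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
print(fma_single_rounding(a, b, c), fma_double_rounding(a, b, c))
```

The surviving bits of the fused result come entirely from the low half of the 2N-bit product, which is the cancellation case the normalizer has to cope with.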
>
Yeah, for the 2008 spec onward, would also need this...
>
It is possible to provide it as a library call, but granted this makes
it slower.
>
>
There are FMAC instructions, but they are currently both slow and
double-rounded (so, not so useful). Well, except for Binary16 and
Binary32 which appear single-rounded mostly because they happen to be
performed internally as Binary64 (but are still slow).
>