Re: binary128 implementation

Subject : Re: binary128 implementation
From : bohannonindustriesllc (at) *nospam* gmail.com (BGB-Alt)
Newsgroups : comp.arch
Date : 24 May 2024, 22:22:12
Organization : A noiseless patient Spider
Message-ID : <v2r0e5$2ggiu$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 5/24/2024 2:07 AM, Terje Mathisen wrote:
MitchAlsup1 wrote:
BGB-Alt wrote:
>
On 5/20/2024 7:28 AM, Terje Mathisen wrote:
Anton Ertl wrote:
Michael S <already5chosen@yahoo.com> writes:
On Sun, 19 May 2024 18:37:51 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
The FMA normalizer has to handle a maximally bad cancellation, so it
needs to be around 350 bits wide. Mitch knows of course but I'm
guessing that this could at least be close to needing an extra cycle
on its own and/or heroic hardware?
>
Terje
>
>
Why so wide?
Assuming that subnormal multiplier inputs are normalized before
multiplication, the product of multiplication is 226 bits
>
The product of the mantissa multiplication is at most 226 bits even if
you don't normalize subnormal numbers.  For cancellation to play a
role the addend has to be close in absolute value and have the
opposite sign as the product, so at most one additional bit comes into
play for that case (for something like the product being
0111111... and the addend being -10000000...).
>
This is the part of Mitch's explanation that I have never been able to totally grok. I do think you could get away with fewer bits, but only if you can collapse the extra mantissa bits into sticky while aligning the product with the addend. If that takes too long, or it turns out to be easier/faster in hardware to simply work with a much wider mantissa, then I'll accept that.
>
I don't think I've ever seen Mitch make a mistake on anything like
this!
>
>
It is a mystery, though it seems like maybe Binary128 FMA could be done in software via an internal 384-bit intermediate?...
>
My thinking is, say, 112*112, padded by 2 bits (so 114 bits), leads to 228 bits. If one adds another 116 bits (for maximal FADD), this comes to 344.
>
Maximal product with minimal augend::
>
     pppppppp-pppppppp-aaaaaaaa
>
Maximal augend with minimal product
>
     aaaaaaaa-pppppppp-pppppppp
>
So the way one builds HW is to have the augend shifter cover the whole
length and place the product in the middle::
>
        max                        min
     aaaaaaaa-aaaaaaaa-aaaaaaaa-aaaaaaaa
              pppppppp-pppppppp
>
The output of the product is still in carry-save form and the augend is
in pure binary so the adder is 3-input for 2×-width. This generates a
carry into the high order incrementor.
>
So one has a sticky generator for the right hand side augend, and an
incrementor for the left hand side augend. When doing high speed
denormals one cannot count on the left hand side of the product to have
HoBs set with standard ramifications (imagine a denorm product and a
denorm augend and you want the right answer.)
>
Any way you cook it, you have a 4× wide intermediate (minus 2 bits,
IIRC): 4×112 = 448, and 448 - 2 = 446.
There is a reason these things are not standard at this point of
technology.
 So this is basically due to the product part still being in carry-save format, so it cannot easily be moved/aligned; instead, the augend has to be able to move to either side of it. OK, that makes sense!
 
OK.
I guess it is possible that doing it this way could be more efficient with resources than doing a narrower FADD with "select larger value on left, shift smaller value on right" logic.
Though, for a software implementation, I would still assume select-and-shift.
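For the software case, the select-and-shift step could look roughly like the following. This is only a sketch under assumptions: the fpart struct and the shr_sticky / align_operands names are made up for illustration, the significand is held in an unsigned __int128 (a GCC/Clang extension) with the lowest bit doubling as sticky, and signs, rounding and special values are left out entirely.

```c
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension, assumed available */

/* Hypothetical unpacked operand: value is roughly man * 2^exp. */
typedef struct { int exp; u128 man; } fpart;

/* Shift right by n, OR-ing any bits shifted out into bit 0 (sticky). */
static u128 shr_sticky(u128 m, unsigned n)
{
    if (n == 0) return m;
    if (n >= 128) return (u128)(m != 0);
    u128 lost = m & (((u128)1 << n) - 1);
    return (m >> n) | (u128)(lost != 0);
}

/* Select: put the operand with the larger exponent in *a.
   Shift: align *b to a's exponent, keeping a sticky bit. */
static void align_operands(fpart *a, fpart *b)
{
    if (a->exp < b->exp || (a->exp == b->exp && a->man < b->man)) {
        fpart t = *a; *a = *b; *b = t;
    }
    b->man = shr_sticky(b->man, (unsigned)(a->exp - b->exp));
    b->exp = a->exp;
}
```

After alignment, both significands share one exponent, so the add or subtract itself is a plain integer operation on the two man fields.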

 
>
Could you do it (IEEE accuracy) with less HW--yes, but only if you allow
certain special cases to take more cycles in calculation. At a certain
point (a point made by Terje) it is easier to implement with wide integer
calculations 128+128 and/or 128*128 along with double width shifts,
inserts, and extracts.
>
IEEE did not make these things any easier by having a 2× std width fraction have 2×+3 bits of length requiring 8 multiplications with
minimal HW instead of 4 multiplications. On the other hand IBM did us
no favors with Hex FP either (keeping the exponent size the same and
having 2×+8 bits of fraction.)
 This is an intentional feature, not a bug!
 By making sure that all larger IEEE formats have a mantissa with at least 2n+3 bits compared to the smaller format below, you avoid all double rounding issues if you do a calculation in the larger format and then immediately store it back to a smaller format container.
 By also having a wider exponent you can do things like sqrt(x^2+y^2) and completely avoid spurious overflows during the squaring ops: As long as the final result fits in float, it will be the correct result.
 We started out with 1:8:23 and 1:11:52, then we got 1:15:112 at the higher end and 1:5:10 for fp16 and 1:3:4 for fp8.
 Do note that the 8 and 16-bit variants do break the 2n+3 rule, also note that the AI training people like truncated 32-bit, i.e. 1:8:7 which keeps the full float range but with ~1/3 the mantissa resolution.
 Anyway, doing fp128 in SW I would of course do it using u64 unsigned integer ops: FMUL128 becomes 4 64x64->128 MUL ops plus the adding/merging of the terms and a bunch of bookkeeping work on the signs and exponents.
 With a single fully pipelined integer multiplier taking 4 cycles, this would be 7 cycles for the MULs, with the last three cycles overlapped with the initial ADD/ADC operations. Seems like it could be doable in sub-20 cycles?
 I'm assuming the CPU to be wide enough that the special cases can be handled in parallel with the default/normal inputs case, also assuming reg-reg MOVes to be zero cycles, handled in the renamer, in order to overcome the dedicated register (RDX) issue which we have retained even using MULX.
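
As a rough illustration of the four-multiply scheme described above, here is a minimal sketch of just the significand product. The names (u256, mul128x128) are hypothetical, and unsigned __int128 is assumed as a stand-in for a 64x64->128 MUL; the sign/exponent handling, normalization and rounding that a full FMUL128 needs are not shown.

```c
#include <stdint.h>

typedef unsigned __int128 u128;          /* GCC/Clang extension, assumed */
typedef struct { uint64_t w[4]; } u256;  /* w[0] is the least significant word */

/* 128x128 -> 256 bit unsigned multiply from four 64x64 -> 128 products. */
static u256 mul128x128(uint64_t a_hi, uint64_t a_lo,
                       uint64_t b_hi, uint64_t b_lo)
{
    u128 ll = (u128)a_lo * b_lo;   /* weight 2^0   */
    u128 lh = (u128)a_lo * b_hi;   /* weight 2^64  */
    u128 hl = (u128)a_hi * b_lo;   /* weight 2^64  */
    u128 hh = (u128)a_hi * b_hi;   /* weight 2^128 */
    u256 r;

    r.w[0] = (uint64_t)ll;

    /* Column 1: three 64-bit terms, so the sum cannot overflow 128 bits. */
    u128 mid = (ll >> 64) + (uint64_t)lh + (uint64_t)hl;
    r.w[1] = (uint64_t)mid;

    /* Column 2: carry from column 1 plus three more 64-bit terms. */
    u128 top = (mid >> 64) + (lh >> 64) + (hl >> 64) + (uint64_t)hh;
    r.w[2] = (uint64_t)top;

    /* Column 3: the full product fits in 256 bits, so no carry out. */
    r.w[3] = (uint64_t)(hh >> 64) + (uint64_t)(top >> 64);
    return r;
}
```

For binary128 the two inputs would be 113-bit significands, so only the low 226 bits of the result are ever populated.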
 
When I looked into possible options, the most sensible option at the time seemed to be to support some 128-bit ALU instructions to allow for faster software emulation, mostly because the cost of having 128-bit ALU ops is significantly less than that of doing 128-bit floating-point in hardware.
Though, part of the thing with ALU is that one can implement it by gluing multiple 64-bit units together. With Shift, neither unit needs to "see" the results of the other half, so it is mostly a trick of routing the inputs.
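
A software analogy of the shift case, as a sketch only: each 64-bit output half is computed directly from the two input halves, and neither output half depends on the other output half. The u128p struct and the shl128 name are made up here, and this is not meant to describe how the actual hardware units are wired.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128p;

/* 128-bit logical shift left, 0 <= n < 128. */
static u128p shl128(u128p x, unsigned n)
{
    u128p r;
    assert(n < 128);
    if (n == 0) {
        r = x;
    } else if (n < 64) {
        r.lo = x.lo << n;                        /* low unit: only sees x.lo   */
        r.hi = (x.hi << n) | (x.lo >> (64 - n)); /* high unit: funnel of inputs */
    } else {
        r.lo = 0;
        r.hi = x.lo << (n - 64);                 /* n == 64 gives r.hi = x.lo  */
    }
    return r;
}
```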
With ADD/SUB/CMP, one has to get a little more clever and route the carry bits between the units, which has a potential latency cost. Though, if one adds an extra cycle of latency, it is possible to compute both possible answers for the high-order bits and then select whichever one matches the carry observed from the low-order bits (never mind the annoyance of 128-bit ADD having a 3-cycle latency, but this would still be faster than doing it with ADC instructions in my case).
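
The "compute both answers, then select" idea is essentially a carry-select adder. A minimal C model of it, with hypothetical names (u128p, add128_carry_select), purely to show the data flow and not the timing:

```c
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128p;

/* 128-bit add, modelled as two 64-bit units plus a late select. */
static u128p add128_carry_select(u128p a, u128p b)
{
    u128p r;

    /* Low unit: plain 64-bit add; carry-out detected by wraparound. */
    uint64_t lo = a.lo + b.lo;
    unsigned carry = lo < a.lo;

    /* High unit: both candidates computed without waiting for the carry. */
    uint64_t hi0 = a.hi + b.hi;      /* candidate if carry-in is 0 */
    uint64_t hi1 = a.hi + b.hi + 1;  /* candidate if carry-in is 1 */

    /* Final step: pick the candidate matching the observed carry. */
    r.lo = lo;
    r.hi = carry ? hi1 : hi0;
    return r;
}
```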
Not entirely sure why other 64-bit ISAs tend not to have 128-bit ALU ops though.
In my case, BGBCC supports __int128 operations whether or not the ALUX instructions are enabled (along with _BitInt, *1).
*1:
   _BitInt(56)  x0;  //maps to 64-bit
   _BitInt(64)  x1;  //maps to 64-bit
   _BitInt(80)  x2;  //maps to 128-bit
   _BitInt(128) x3;  //maps to 128-bit
   _BitInt(160) x4;  //maps to 256-bit
   _BitInt(256) x5;  //maps to 256-bit
   _BitInt(272) x6;  //maps to 384-bit
   ...
All sizes beyond 256 bits map to the next integer multiple of 128 bits. The 256-bit type is special in that it has its own dedicated logic, but exists via the _BitInt type. For 384 bits and beyond, generic logic is used that deals with any size value, but is slower.
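
The size mapping in the list above can be written as a small function. This is a sketch that only reproduces the cases listed: the function name is made up, and treating every width of 64 bits or less as a 64-bit container is an assumption, since only 56 and 64 are shown.

```c
#include <assert.h>

/* Container size in bits for _BitInt(n), following the examples above. */
static unsigned bitint_container_bits(unsigned n)
{
    assert(n >= 1);
    if (n <= 64)  return 64;    /* assumption: all small widths use 64 bits */
    if (n <= 128) return 128;
    if (n <= 256) return 256;   /* the dedicated 256-bit case */
    return (n + 127) / 128 * 128;  /* next multiple of 128 bits */
}
```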
Can note that in my implementation, _BitInt does not enforce modulo behavior at the declared width in the case of overflow (it is modulo only to the size of the container; enforcing odd-bit modulo behavior would add a fair bit of cost to using them).
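
To make the difference concrete, here is a sketch for the unsigned case only, using unsigned __int128 (a GCC/Clang extension) as a stand-in for the 128-bit container. The function names are hypothetical; the point is just that wrapping at the declared width costs an extra mask after each operation (a signed type would need a sign extension instead).

```c
#include <stdint.h>

typedef unsigned __int128 u128;  /* stand-in for a 128-bit container */

/* Add in the container only: wraps modulo 2^128. */
static u128 add_container(u128 a, u128 b)
{
    return a + b;
}

/* Add, then reduce to the declared width in bits (1..128). */
static u128 add_wrapped(u128 a, u128 b, unsigned bits)
{
    u128 sum = a + b;
    if (bits >= 128) return sum;
    return sum & (((u128)1 << bits) - 1);  /* the extra masking step */
}
```

With an 80-bit value of all ones, adding 1 in the container gives 2^80, while the masked version gives 0.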

Terje
 
