Re: Cost of handling misaligned access

Subject: Re: Cost of handling misaligned access
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 18 Feb 2025, 23:34:48
Organization: A noiseless patient Spider
Message-ID : <vp31uk$1tg85$2@dont-email.me>
User-Agent : Mozilla Thunderbird
On 2/18/2025 7:07 AM, Michael S wrote:
> On Tue, 18 Feb 2025 02:55:33 +0000
> mitchalsup@aol.com (MitchAlsup1) wrote:
>
>> It takes Round Nearest Odd to perform Kahan-Babuška Summation.
>
> Are you aware of any widespread hardware that supplies Round to Nearest
> with tie broken to Odd? Or of any widespread language that can request
> such rounding mode?
> Until both, implementing RNO on niche HW looks to me as wastage of both
> HW resources and of space in your datasheet.
>
> Instead, think of what you possibly forgot to do in order to help
> software implementation IEEE binary128. That would be orders of
> magnitude more useful in real world. And don't take me wrong, "orders of
> magnitude more useful" is still small niche on the absolute scale of
> usefulness.

IME, what helps with Binary128 is mostly support for efficient handling of big integer values:
   128-bit, ideally fast;
   256-bit, at least semi-efficient
     Will need 256-bit ADD/SUB.
You will likely need a 128 x 128 -> 256-bit widening multiplier.
In my case, this is best implemented with 32x32->64-bit widening multiply ops plus MOVLLD/MOVLHD/MOVHLD/MOVHHD (roughly equivalent to PCKBB/PCKBT/PCKTB/PCKTT in the RV-P extension; MOVLLD is also equivalent to 'PACK', but RV-B lacks the other variants [1]).
[1]: 'B' has a subset of things that are useful, a bunch of holes where useful stuff is absent or was dropped, and a bunch of random stuff that seems very niche and/or not likely to be all that useful.
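
As a rough illustration of the widening multiply described above, here is a minimal C sketch that builds a 128 x 128 -> 256-bit product from nothing but 32x32->64 multiplies. The limb layout (four little-endian 32-bit words per operand) and the names mul32w and mul128x128 are assumptions made for this example, not anything from an actual ISA or library.

```c
#include <stdint.h>

/* The only multiply primitive assumed: 32x32 -> 64 widening multiply. */
static uint64_t mul32w(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 128 x 128 -> 256-bit schoolbook multiply.
   a and b are four little-endian 32-bit limbs; out receives eight limbs. */
void mul128x128(const uint32_t a[4], const uint32_t b[4], uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < 4; j++) {
            /* product + old limb + carry is at most 2^64-1, so no overflow */
            uint64_t t = mul32w(a[i], b[j]) + out[i + j] + carry;
            out[i + j] = (uint32_t)t;   /* keep the low 32 bits */
            carry = t >> 32;            /* pass the high 32 bits along */
        }
        out[i + 4] = (uint32_t)carry;   /* this limb has not been written yet */
    }
}
```

This uses 16 partial products. In hardware or hand-tuned code the pack-style ops would presumably be used to split and recombine the 32-bit halves of 64-bit registers; the array indexing stands in for that here.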

>> That is: comply with IEEE 754-2019
>
> I'd say, comply with mandatory requirements of IEEE 754-2019.
> For optional requirements, be selective. Prefer those that can be
> accessed from widespread languages (including incoming editions of
> language standards) over the rest.

I lean toward thinking a different direction might be preferable: instead of going for "numerical purity" or "having a bunch of math features in hardware", try to optimize for something that:
   More or less gives the useful parts of the IEEE specs;
   Is possible to make integer exact within a reasonable cost.
Would keep the formats from the newer standards, though probably toss out the Decimal formats.
Though, this would go in a different direction, say:
   DAZ+FTZ with Truncate as semi-canonical;
   Sub-ULP bits explicitly fall off the bottom in a defined way.
Could optimize for implementation with either hardware or with equivalent-size integer operations.
Say, one could imagine an abstract model where Binary64 FADD works sort of like:
   sgnA=valA>>63;
   sgnB=valB>>63;
   expA=(valA>>52)&2047;
   expB=(valB>>52)&2047;
   fraA=(valA&((1ULL<<52)-1));
   fraB=(valB&((1ULL<<52)-1));
   if(expA!=0)fraA|=1ULL<<52;  //restore the hidden bit
   if(expB!=0)fraB|=1ULL<<52;
   fraA=fraA<<9;  //9 sub ULP bits
   fraB=fraB<<9;
   shrA=(expB>=expA)?(expB-expA):0;
   shrB=(expA>=expB)?(expA-expB):0;
   if(shrA>63)shrA=63;  //keep the C shifts well defined
   if(shrB>63)shrB=63;
   sgn2A=sgnA; exp2A=expA; fra2A=fraA>>shrA;
   sgn2B=sgnB; exp2B=expB; fra2B=fraB>>shrB;
   exp2C=(expA>=expB)?expA:expB;  //result starts at the larger exponent
   //logical clock-edge here.
   fr1C_A=fra2A+fra2B;
   fr1C_B=fra2A-fra2B;
   fr1C_C=fra2B-fra2A;
   if(sgn2A^sgn2B)
   {
     if(fr1C_C>>63)  //B-A went negative, so A has the larger magnitude
       { sgn2C=sgn2A; fra2C=fr1C_B; }
     else
       { sgn2C=sgn2B; fra2C=fr1C_C; }
   }
   else
     { sgn2C=sgn2A; fra2C=fr1C_A; }  //same sign: keep it, add magnitudes
   //logical clock-edge here.
   if(fra2C>>62)  //carry out of the hidden bit (bit 61)
     { exp3C=exp2C+1; fra3C=fra2C>>1; }
   else
   {
     shl=clz64(fra2C)-2; fra3C=fra2C<<shl;
     exp3C=(fra2C!=0)?(exp2C-shl):0;  //exact cancellation gives zero
   }
   //logical clock-edge here.
   if((exp3C>=2047) || (exp3C<=0))
     { sgnC=sgn2C; expC=(exp3C<=0)?0:2047; fraC=0; }
   else
   {
     sgnC=sgn2C; expC=exp3C;
     fraC=(fra3C>>9)&((1ULL<<52)-1);  //drop sub-ULP bits and hidden bit
     //if rounding is done, it goes here.
   }
   valC=(sgnC<<63)|(expC<<52)|fraC;
   //final clock edge.
   //result is now ready.
There are some other tricks that are possible in a Verilog implementation but are absent in this C-like model, but they don't change the end result.
The main expensive parts being the mantissa-adder and shifts.
   The shift usually requires log2(N) stages of 2-input MUXing.
   Or, 1 MUX stage for each bit of the shift-amount value.
Though, at least on Xilinx hardware, 2 bits of MUX can be fused into 1 stage of LUT6's (so, a 64-bit shift needs ~ 3 levels of LUT6).
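
To make the stage counting above concrete, here is a small C sketch of a 64-bit logical right shift written the way the shifter is described: one 2-input MUX stage per shift-amount bit, and a second variant that consumes two shift-amount bits per stage (a 4:1 MUX, which is what fits in a LUT6). The function names are made up for this example, and C loops are only a stand-in for the actual wiring.

```c
#include <stdint.h>

/* Six stages, one per bit of the 6-bit shift amount. */
uint64_t shr64_staged(uint64_t v, unsigned sh)
{
    for (unsigned k = 0; k < 6; k++) {
        uint64_t shifted = v >> (1u << k);   /* fixed shift: wiring only */
        v = ((sh >> k) & 1u) ? shifted : v;  /* the 2-input MUX */
    }
    return v;
}

/* Three stages, each a 4:1 MUX selected by two shift-amount bits. */
uint64_t shr64_radix4(uint64_t v, unsigned sh)
{
    for (unsigned k = 0; k < 6; k += 2) {
        unsigned sel = (sh >> k) & 3u;  /* selects a shift of 0, 1, 2 or 3 times 2^k */
        v = v >> (sel << k);
    }
    return v;
}
```

Both give the same result as a plain `v >> sh` for shift amounts 0..63; the difference is only in how many levels of selection the value passes through.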
Still, it is not great to try doing much more than a shift in a single clock cycle. Though it costs more resources, running the shifts and the add/subtract variants in parallel has lower latency than trying to detect and invert the output (and bitwise NOT does not give acceptable results here; the subtracts need to be two's complement).
It can be cheaper to implement the FPU with ones' complement, but this does not give acceptable results if one tries doing integer operations on the FPU (it is a reasonable request that integer math done via the FPU gives exact integer answers for the ranges covered by the mantissa).
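
A tiny C sketch of why the cheap variant breaks integer exactness, assuming "ones' complement" here means subtracting by adding the bitwise NOT without the +1 carry-in (that reading, and the function names, are assumptions for this example):

```c
#include <stdint.h>

/* Proper two's complement subtract: a + ~b + 1. */
uint64_t sub_twos(uint64_t a, uint64_t b)
{
    return a + ~b + 1u;
}

/* Cheap variant: invert only, no carry-in. Result is a - b - 1. */
uint64_t sub_not_only(uint64_t a, uint64_t b)
{
    return a + ~b;
}
```

For a mantissa holding the integer 1000, subtracting 1 gives 999 with the first form and 998 with the second. Off by one unit in the last place may be tolerable for approximate floating-point work, but not when the values are supposed to be exact integers.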
But, this logic is still annoyingly expensive...
Generally, some special cases can be added on the input and output side to allow for integer conversion.
Int->Binary64:
Handled as an ADD between a negative zero and a synthesized FP value (non-normalized). Basically, fake the signs and exponent and shove the integer value into the mantissa.
Binary64->Int:
Handled as an ADD between a zero mantissa with a large exponent, and the input value. The resulting mantissa (extracted before normalization, and with some sign-selection hacks) being used as the output value.
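
The same two ideas have well-known software analogs on ordinary IEEE doubles, which may help show what the adder is doing. This is only a sketch under assumptions: it relies on plain double-precision evaluation (no extended precision, no fast-math), it covers only 32-bit integers, and unlike a truncating FPU it rounds according to the current rounding mode. The helper names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double double_of(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Int -> Binary64: fake the exponent (2^52) and shove the integer into
   the mantissa, then let an add/subtract do the normalization. */
double u32_to_double(uint32_t x)
{
    return double_of(0x4330000000000000ULL | x) - 4503599627370496.0; /* 2^52 */
}

/* Binary64 -> Int: add a large constant (2^52 + 2^51) so the integer part
   lands in the low mantissa bits, then read those bits back out.
   Valid for |x| < 2^31. */
int32_t double_to_i32(double x)
{
    uint32_t lo = (uint32_t)bits_of(x + 6755399441055744.0);
    int32_t r;
    memcpy(&r, &lo, sizeof r);  /* reinterpret as two's complement */
    return r;
}
```

For example, double_to_i32(-3.0) returns -3 and u32_to_double(4294967295u) returns 4294967295.0 exactly.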
It is tempting to consider an intermediate between Binary16 and BF16:
Binary16: Often not enough dynamic range, but more precision than needed;
BF16: Overkill dynamic range, not enough precision.
I am starting to suspect S.E6.M9 might have been closer to ideal.
   S.E7.M8 goes probably too far.
It is annoying that one needs to consider having both Binary16 and BF16, but adding a 3rd format to the mix wouldn't necessarily make this better.
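
To put rough numbers on the trade-off, here is a small C sketch that computes the largest finite value, smallest normal value, and relative step size for a 16-bit format with a given exponent/mantissa split. It assumes an IEEE-style layout (bias 2^(E-1)-1, top exponent code reserved for Inf/NaN), which is an assumption for the hypothetical S.E6.M9 and S.E7.M8 formats; the type and function names are made up for the example.

```c
typedef struct { int ebits, mbits; } MiniFmt;

/* Exact 2^n as a double, for n within the normal double range. */
static double pow2(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Largest finite value: (2 - 2^-M) * 2^emax. */
double fmt_max(MiniFmt f)
{
    int bias = (1 << (f.ebits - 1)) - 1;
    int emax = ((1 << f.ebits) - 2) - bias;
    return (2.0 - pow2(-f.mbits)) * pow2(emax);
}

/* Smallest normal value: 2^(1 - bias). */
double fmt_min_normal(MiniFmt f)
{
    int bias = (1 << (f.ebits - 1)) - 1;
    return pow2(1 - bias);
}

/* Relative spacing between adjacent values near 1.0: 2^-M. */
double fmt_epsilon(MiniFmt f)
{
    return pow2(-f.mbits);
}
```

Under these assumptions Binary16 (E5.M10) tops out at 65504 with a step of 2^-10, while S.E6.M9 would reach about 4.29e9 with a step of 2^-9, and BF16 (E8.M7) reaches about 3.39e38 with a step of only 2^-7.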
There are cases where Binary16 does not have enough precision, but these usually end up being handled by 16-bit fixed-point. If anything, a case could be made for integer conversions with a variable exponent offset (to support fixed-point).
Say (in RV terms):
   FCNVSC.X.D  Xd, Fs, ImmExpAdj  //convert to integer with scale
   FCNVSC.D.X  Fd, Xs, ImmExpAdj  //convert to double with scale
Which could save using multiply/divide to scale FPU values.
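
A possible reading of what these two conversions would compute, modeled in C. The instructions are only proposed above, so everything here is an assumption: that a positive ImmExpAdj scales by 2^ImmExpAdj, that the integer conversion truncates toward zero, and the function names. The model uses a multiply; the point of the instruction would be to fold the exponent adjustment into the conversion instead.

```c
#include <stdint.h>
#include <string.h>

/* Exact 2^n built directly from the exponent field; valid for -1022 <= n <= 1023. */
static double pow2_double(int n)
{
    uint64_t u = (uint64_t)(1023 + n) << 52;
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Model of FCNVSC.X.D: scale by 2^imm_exp_adj, then convert to integer. */
int64_t fcnvsc_x_d(double fs, int imm_exp_adj)
{
    return (int64_t)(fs * pow2_double(imm_exp_adj));
}

/* Model of FCNVSC.D.X: convert to double, then scale by 2^imm_exp_adj. */
double fcnvsc_d_x(int64_t xs, int imm_exp_adj)
{
    return (double)xs * pow2_double(imm_exp_adj);
}
```

For 16.16 fixed-point, fcnvsc_x_d(1.5, 16) gives 98304 and fcnvsc_d_x(98304, -16) gives back 1.5.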
Also maybe instructions to add/subtract a value from the exponent without needing to use a multiply.
Say:
   FADJEXP Fd, Fs, ImmExpAdj
Which does the equivalent of multiply/divide by power of 2.
   Could take the place of multiply in operations like:
     y=x*4096;
   Or:
     y=x/4096;
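
A C sketch of what such an exponent adjust could do at the bit level, assuming the DAZ+FTZ behavior discussed earlier (zero or denormal inputs and underflowing results become signed zero, overflow becomes signed infinity, Inf/NaN pass through). These edge-case choices and the function name are assumptions for the example, not a specification.

```c
#include <stdint.h>
#include <string.h>

/* Multiply by 2^imm_exp_adj by adding to the exponent field directly. */
double fadjexp(double fs, int imm_exp_adj)
{
    uint64_t u;
    memcpy(&u, &fs, sizeof u);
    uint64_t sign = u & (1ULL << 63);
    int e = (int)((u >> 52) & 2047);
    if (e != 2047) {                   /* Inf/NaN pass through unchanged */
        if (e != 0)
            e += imm_exp_adj;          /* zero/denormal input stays at 0 (DAZ) */
        if (e <= 0)
            u = sign;                              /* flush to signed zero */
        else if (e >= 2047)
            u = sign | (2047ULL << 52);            /* overflow to signed Inf */
        else
            u = (u & ~(2047ULL << 52)) | ((uint64_t)e << 52);
    }
    memcpy(&fs, &u, sizeof fs);
    return fs;
}
```

With this, y=x*4096 becomes fadjexp(x, 12) and y=x/4096 becomes fadjexp(x, -12), with no multiplier or divider involved.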
...
