On 2/18/2025 7:07 AM, Michael S wrote:
> On Tue, 18 Feb 2025 02:55:33 +0000
> mitchalsup@aol.com (MitchAlsup1) wrote:
>
>> It takes Round Nearest Odd to perform Kahan-Babuška Summation.
>
> Are you aware of any widespread hardware that supplies Round to Nearest
> with tie broken to Odd? Or of any widespread language that can request
> such a rounding mode?
> Until both exist, implementing RNO on niche HW looks to me like a waste
> of both HW resources and of space in your datasheet.
> Instead, think of what you possibly forgot to do in order to help
> software implementation of IEEE binary128. That would be orders of
> magnitude more useful in the real world. And don't get me wrong,
> "orders of magnitude more useful" is still a small niche on the
> absolute scale of usefulness.
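For reference, a C sketch of the Kahan-Babuška (Neumaier variant) summation being discussed; the compensation term 'c' recovers the rounding error of each add under the usual round-to-nearest:

```c
#include <math.h>

/* Kahan-Babuška (Neumaier variant) compensated summation: 'c'
   accumulates the low-order bits lost by each addition, and is
   folded back in at the end. */
static double kb_sum(const double *x, int n)
{
    double s = 0.0, c = 0.0;
    for (int i = 0; i < n; i++) {
        double t = s + x[i];
        if (fabs(s) >= fabs(x[i]))
            c += (s - t) + x[i];  /* low bits of x[i] were lost */
        else
            c += (x[i] - t) + s;  /* low bits of s were lost */
        s = t;
    }
    return s + c;
}
```

The classic test case {1.0, 1e100, 1.0, -1e100} sums to 2.0 here, where naive summation gives 0.0.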
IME, what helps with Binary128 is mostly support for efficient handling of big integer values:
128-bit, ideally fast;
256-bit, at least semi-efficient
Will need 256-bit ADD/SUB.
You will likely need a 128 x 128 -> 256-bit widening multiplier.
In my case, this is best implemented with 32x32->64 bit widening multiply ops, plus MOVLLD/MOVLHD/MOVHLD/MOVHHD (roughly equivalent to PCKBB/PCKBT/PCKTB/PCKTT in the RV-P extension; MOVLLD is also equivalent to 'PACK' in RV-B, but RV-B lacks the other variants [1]).
[1]: Where 'B' has a subset of things that are useful, a bunch of holes where stuff that would be useful is absent or was dropped, and a bunch of random stuff that seems very niche and/or unlikely to be all that useful.
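As a rough illustration, the 128x128->256-bit widening multiply can be built from 32x32->64 widening multiplies as described above (a plain C sketch; the limb layout and names are my own, not any particular ISA):

```c
#include <stdint.h>

/* 128x128 -> 256-bit widening multiply, built from 32x32->64-bit
   widening multiplies. Operands and result are little-endian arrays
   of 32-bit limbs (illustrative layout). */
static void mul128x128(const uint32_t a[4], const uint32_t b[4],
                       uint32_t r[8])
{
    uint64_t acc[8] = {0};
    for (int i = 0; i < 4; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < 4; j++) {
            uint64_t p = (uint64_t)a[i] * b[j];  /* 32x32->64 widening */
            uint64_t s = acc[i + j] + (p & 0xFFFFFFFFu) + carry;
            acc[i + j] = s & 0xFFFFFFFFu;
            carry = (s >> 32) + (p >> 32);
        }
        acc[i + 4] += carry;
    }
    for (int i = 0; i < 8; i++)
        r[i] = (uint32_t)acc[i];
}
```

The 256-bit ADD/SUB works out similarly: 32- or 64-bit adds with explicit carry propagation across limbs.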
>> That is:: comply with IEEE 754-2019
>
> I'd say, comply with the mandatory requirements of IEEE 754-2019.
> For optional requirements, be selective. Prefer those that can be
> accessed from widespread languages (including upcoming editions of
> language standards) over the rest.
I lean toward thinking a different direction might instead be preferable: rather than going for "numerical purity" or "having a bunch of math features in hardware", try to optimize for something that:
More or less gives the useful parts of the IEEE specs;
Can be made exact (reproducible with integer operations) at a reasonable cost.
Would keep the formats from the newer standards, though probably toss out the Decimal formats.
Though, this would go in a different direction, say:
DAZ+FTZ with Truncate as semi-canonical;
Sub-ULP bits explicitly fall off the bottom in a defined way.
Could be optimized for implementation either in hardware or with equivalent-size integer operations.
Say, one could imagine an abstract model where Binary64 FADD works sort of like:
sgnA=valA>>63;
sgnB=valB>>63;
expA=(valA>>52)&2047;
expB=(valB>>52)&2047;
fraA=(valA&((1ULL<<52)-1));
fraB=(valB&((1ULL<<52)-1));
if(expA!=0)fraA|=1ULL<<52;
if(expB!=0)fraB|=1ULL<<52;
fraA=fraA<<9; //9 sub ULP bits
fraB=fraB<<9;
shrA=(expB>=expA)?(expB-expA):0;
shrB=(expA>=expB)?(expA-expB):0;
sgn2A=sgnA; exp2A=expA; fra2A=fraA>>shrA;
sgn2B=sgnB; exp2B=expB; fra2B=fraB>>shrB;
exp2C=(expB>=expA)?expB:expA; //larger exponent carried forward
//logical clock-edge here.
fr1C_A=fra2A+fra2B;
fr1C_B=fra2A-fra2B;
fr1C_C=fra2B-fra2A;
if(sgn2A^sgn2B)
{
if(fr1C_C>>63)
{ sgn1C=sgn2A; fra1C=fr1C_B; }
else
{ sgn1C=sgn2B; fra1C=fr1C_C; }
}
else
{ sgn1C=sgn2A; fra1C=fr1C_A; }
//logical clock-edge here (latched: sgn2C=sgn1C, fra2C=fra1C).
if(fra2C>>62)
{ exp3C=exp2C+1; fra3C=fra2C>>1; }
else
{ shl=clz64(fra2C)-2; exp3C=exp2C-shl; fra3C=fra2C<<shl; }
//logical clock-edge here.
if((exp3C>=2047) || (exp3C<=0))
{ sgnC=sgn2C; expC=(exp3C<=0)?0:2047; fraC=0; }
else
{
sgnC=sgn2C; expC=exp3C; fraC=(fra3C>>9)&((1ULL<<52)-1); //drop hidden bit
//if rounding is done, it goes here.
}
valC=(sgnC<<63)|(expC<<52)|fraC;
//final clock edge.
//result is now ready.
There are some other tricks possible in a Verilog implementation that are absent in this C-like model, but they don't change the end result.
The main expensive parts being the mantissa-adder and shifts.
The shift usually requires log2(N) stages of 2-input MUXing.
Or, 1 MUX stage for each bit of the shift-amount value.
Though, at least on Xilinx hardware, 2 bits of MUX can be fused into 1 stage of LUT6s (so, a 64-bit shift needs ~3 levels of LUT6).
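The MUX-stage structure can be mirrored in plain C, one stage per bit of the shift amount (a sketch of the structure only, not of any particular implementation):

```c
#include <stdint.h>

/* A 64-bit right shift as log2(64)=6 levels of 2-way MUXing,
   one level per bit of the shift amount; this mirrors how the
   hardware stages stack. */
static uint64_t shr64_staged(uint64_t v, int amt)
{
    v = (amt & 1)  ? (v >> 1)  : v;
    v = (amt & 2)  ? (v >> 2)  : v;
    v = (amt & 4)  ? (v >> 4)  : v;
    v = (amt & 8)  ? (v >> 8)  : v;
    v = (amt & 16) ? (v >> 16) : v;
    v = (amt & 32) ? (v >> 32) : v;
    return v;
}
```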
Still not great to try doing much more than a shift in a single clock cycle. Though it costs more resource budget, doing the shifts and add/subtracts in parallel gives lower latency than trying to detect and invert the output (and a bitwise NOT does not give acceptable results here; the subtracts need to be two's complement).
It can be cheaper to implement the FPU with ones' complement, but this does not give acceptable results if one tries doing integer operations on the FPU (it can be considered a reasonable request that integer math done via the FPU gives exact integer answers for the range covered by the mantissa).
But, this logic is still annoyingly expensive...
Generally, some special cases can be added on the input and output side to allow for integer conversion.
Int->Binary64:
Handled as an ADD between a negative zero and a synthesized FP value (non-normalized). Basically, fake the signs and exponent and shove the integer value into the mantissa.
Binary64->Int:
Handled as an ADD between a zero mantissa with a large exponent, and the input value. The resulting mantissa (extracted before normalization, and with some sign-selection hacks) being used as the output value.
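These same tricks have well-known software analogues; a C sketch of both directions (the constants are the standard 2^52-based "magic numbers"; this assumes round-to-nearest and in-range inputs):

```c
#include <stdint.h>
#include <string.h>

/* Int->Binary64: fake the exponent and shove the integer into the
   mantissa (giving 2^52 + u), then subtract 2^52 to renormalize. */
static double u32_to_double_via_bits(uint32_t u)
{
    uint64_t bits = 0x4330000000000000ULL | (uint64_t)u;  /* 2^52 + u */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d - 4503599627370496.0;  /* minus 2^52 */
}

/* Binary64->Int: add a constant with a large exponent so the input
   ends up in the low mantissa bits of the sum. Valid for the int32
   range; note it rounds to nearest rather than truncating. */
static int32_t double_to_int32_via_add(double x)
{
    double sum = x + 6755399441055744.0;  /* 2^52 + 2^51 */
    uint64_t bits;
    memcpy(&bits, &sum, sizeof bits);
    return (int32_t)(uint32_t)bits;  /* low 32 mantissa bits */
}
```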
It is tempting to consider an intermediate between Binary16 and BF16:
Binary16: Often not enough dynamic range, but more precision than needed;
BF16: Overkill dynamic range, not enough precision.
I am starting to suspect S.E6.M9 might have been closer to ideal.
S.E7.M8 goes probably too far.
It is annoying that one needs to consider having both Binary16 and BF16, but adding a 3rd format to the mix wouldn't necessarily make this better.
There are cases where Binary16 does not have enough precision, but usually these end up being handled with 16-bit fixed-point. If anything, a case could be made for integer conversions with a variable exponent offset (to support fixed-point).
Say (in RV terms):
FCNVSC.X.D Xd, Fs, ImmExpAdj //convert to integer with scale
FCNVSC.D.X Fd, Xs, ImmExpAdj //convert to double with scale
Which could avoid needing a multiply or divide to scale FPU values.
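In plain C terms, the hypothetical FCNVSC pair would behave roughly like ldexp fused with a conversion (the mnemonics and exact semantics here are my own sketch):

```c
#include <math.h>
#include <stdint.h>

/* Roughly what FCNVSC.X.D / FCNVSC.D.X would compute: scale by
   2^ImmExpAdj as part of the conversion, e.g. for Q16.16 fixed-point
   use +16 on the way in and -16 on the way out. (Sketch only; the C
   cast truncates toward zero, a real encoding might round.) */
static int64_t fcnvsc_x_d(double x, int expadj)
{ return (int64_t)ldexp(x, expadj); }

static double fcnvsc_d_x(int64_t v, int expadj)
{ return ldexp((double)v, expadj); }
```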
Also, maybe instructions to add/subtract a value to/from the exponent without needing to use a multiply.
Say:
FADJEXP Fd, Fs, ImmExpAdj
Which does the equivalent of multiply/divide by power of 2.
Could take the place of multiply in operations like:
y=x*4096;
Or:
y=x/4096;
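In software terms, FADJEXP would amount to adding the immediate to the exponent field (a sketch handling only the normal range; zero/Inf/NaN and over/underflow would need the usual special cases):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical FADJEXP: add an immediate to the Binary64 exponent
   field, i.e. multiply/divide by a power of two without a multiplier.
   Normal-range inputs only; no over/underflow handling. */
static double fadjexp(double x, int adj)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int exp = (int)((bits >> 52) & 2047);
    if (exp != 0 && exp != 2047) {  /* skip zero/subnormal/Inf/NaN */
        exp += adj;
        bits = (bits & ~(2047ULL << 52)) | ((uint64_t)exp << 52);
        memcpy(&x, &bits, sizeof bits);
    }
    return x;
}
```

So y=x*4096; becomes fadjexp(x, 12), and y=x/4096; becomes fadjexp(x, -12).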
...