Re: Cost of handling misaligned access

Subject : Re: Cost of handling misaligned access
From : cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups : comp.arch
Date : 19. Feb 2025, 23:42:04
Organization : A noiseless patient Spider
Message-ID : <vp5mnu$2fjhi$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Mozilla Thunderbird
On 2/19/2025 11:31 AM, MitchAlsup1 wrote:
> On Wed, 19 Feb 2025 16:35:41 +0000, Terje Mathisen wrote:
>
>> MitchAlsup1 wrote:
>>> On Tue, 18 Feb 2025 21:09:54 +0000, Terje Mathisen wrote:
>>>
>>>> MitchAlsup1 wrote:
>>>>> On Tue, 18 Feb 2025 13:07:39 +0000, Michael S wrote:
>>>>>
>>>>>> On Tue, 18 Feb 2025 02:55:33 +0000
>>>>>> mitchalsup@aol.com (MitchAlsup1) wrote:
>>>>>>
>>>>>>> It takes Round Nearest Odd to perform Kahan-Babuška Summation.
>>>>>>
>>>>>> Are you aware of any widespread hardware that supplies Round to Nearest
>>>>>> with tie broken to Odd? Or of any widespread language that can request
>>>>>> such rounding mode?
>>>>>
>>>>> No, No
>>>>>
>>>>>> Until both, implementing RNO on niche HW looks to me as wastage of both
>>>>>> HW resources and of space in your datasheet.
>>>>>
>>>>> The way I implement it, it is only an additional 10± gates.
>>>>
>>>> With discrete logic, it should be identical to RNE, except for flipping
>>>> the ulp bit when deciding upon the rounding direction, right?
>>>
>>> Yes,
>>>
>>>> With a full 4-bit lookup table you need a few more gates, but that is
>>>> still the obvious way to implement rounding in SW. (It is only ceil()
>>>> and floor() that requires the sign bit as input, the remaining rounding
>>>> modes can make do with ulp+guard+sticky.)
>>>
>>> sign+ULP+Guard+sticky is all you ever need for any rounding mode
>>> IEEE or beyond.
>>
>> That's what I believed all through the 2019 standards process and up to
>> a month or two ago:
>>
>> In reality, the "NearestOrEven" rounding rule has an exception if/when
>> you need to round the largest possible fp number, with guard=1 and
>> sticky=0:
>>
>> I.e. exactly halfway to the next possible value (which would be Inf)
>>
>> In just this particular case, the OrEven part is skipped in favor of not
>> rounding up, so leaving a maximum/odd mantissa.
>>
>> In the same case but sticky=1 we do round up to Inf.
>>
>> This unfortunately means that the rounding circuit needs to be combined
>> with an exp+mant==0b111...111 input. :-(
>
> You should rename that mode as "Round but stay finite"
 
So, does it overflow?...
Meanwhile, I just did a sort of "poor man's rounding":
   Tries to round the low 8 bits;
   If there would be a carry out of the low 8 bits, no rounding happens.
Granted, proper rounding could have possibly allowed using bitwise inversion for negation earlier in a few places; but propagating the carry across the whole mantissa is a fairly high-latency operation.
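
To make the "poor man's rounding" above a bit more concrete, here is a minimal C sketch. It is only a software model under assumed names (`round_low8` and `round_up` are hypothetical, not taken from the actual core), and it assumes the round-up decision has already been made elsewhere:

```c
#include <stdint.h>

/* Software model of "round only if the carry stays inside the low 8 bits".
 * mant:     mantissa value to be rounded (hypothetical layout)
 * round_up: nonzero if the rounding decision says "increment"
 * If the low 8 bits are all ones, the increment would carry into bit 8,
 * so the value is left as-is (i.e. truncated) instead. */
static uint64_t round_low8(uint64_t mant, int round_up)
{
    if (round_up && (mant & 0xFF) != 0xFF)
        mant += 1;   /* carry cannot propagate past bit 7 */
    return mant;
}
```

For example, 0x1FE with `round_up` set becomes 0x1FF, whereas 0x1FF is left at 0x1FF because the increment would have carried into bit 8.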
Can note that the latency of carry-select adders is a little weird:
   16/32/64: Latency goes up steadily;
     But, still less than linear;
   128-bit: Only slightly more latency than 64-bit.
The best I could find in past testing was seemingly 16-bit chunks for normal adding. Where, 16-bits seemed to be around the break-even between the chained CARRY4's and the Carry-Select (CS being slower below 16 bits).
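
For reference, a rough software model of the carry-select idea with 16-bit chunks. This only models the structure (both candidate sums per chunk, then a select by the incoming carry); it says nothing about actual FPGA timing, and the function name is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 64-bit carry-select adder built from four 16-bit chunks.
 * In hardware, sum0/sum1 for every chunk would be computed in parallel;
 * only the select chain depends on the previous chunk's carry. */
static uint64_t carry_select_add64(uint64_t a, uint64_t b, unsigned *carry_out)
{
    uint64_t result = 0;
    unsigned carry = 0;

    for (int i = 0; i < 4; i++) {
        uint32_t ca = (uint32_t)(a >> (16 * i)) & 0xFFFF;
        uint32_t cb = (uint32_t)(b >> (16 * i)) & 0xFFFF;
        uint32_t sum0 = ca + cb;        /* candidate for carry-in = 0 */
        uint32_t sum1 = ca + cb + 1;    /* candidate for carry-in = 1 */
        uint32_t sel  = carry ? sum1 : sum0;

        result |= (uint64_t)(sel & 0xFFFF) << (16 * i);
        carry = (sel >> 16) & 1;        /* bit 16 is the chunk's carry-out */
    }
    if (carry_out != NULL)
        *carry_out = carry;
    return result;
}
```

The result should match a plain 64-bit `a + b`; the point of the structure is just that each chunk's add does not need to wait on the chunk below it.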
But, for a 64-bit adder, still basically need to give it a clock-cycle to do its thing. Though, not like 32 is particularly fast either; hence part of the whole 2 cycle latency on ALU ops thing. Mostly has to do with ADD/SUB (and CMP, which is based on SUB).
Admittedly part of why I have such mixed feelings on full compare-and-branch:
   Pro: It can offer a performance advantage (in terms of per-clock);
   Con: Branch is now beholden to the latency of a Subtract.
As-is, the branch direction is determined in EX1, but timing is tight.
Branching from EX2 is too late though, as by this point EX1 will have already started running the following instruction, and there would be no good way to avoid the existence of a branch delay slot.
Vs, say:
   X< 0: Sign
   X>=0: !Sign
   X<=0: Sign || Zero
   X> 0: !Sign && !Zero
   X==0: Zero
   X!=0: !Zero
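Unsigned compares against zero reduce to EQ/NE (X<=0 is the same as X==0, X>0 is the same as X!=0; X<0 is never true and X>=0 is always true).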
Where, detecting all zeroes is at least cheaper than a subtract. But, detecting all zeroes still isn't free (for 64b, ~ 10 LUTs and 3 LUTs delay).
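
As a sketch of what the Sign/Zero table above amounts to, here is a small C model of a compare-with-zero branch condition. The enum and function names are hypothetical (they are not RV or XG3 mnemonics); the only inputs are the sign bit and an all-zeroes detect, with no subtract involved:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition codes for a branch-on-zero-compare (BccZ) subset. */
typedef enum { ZC_LT, ZC_GE, ZC_LE, ZC_GT, ZC_EQ, ZC_NE } zcond_t;

/* Decide the branch using only the sign bit and a zero detect. */
static bool branch_taken_z(zcond_t cond, int64_t x)
{
    bool sign = (((uint64_t)x >> 63) & 1) != 0;  /* top bit of the register */
    bool zero = ((uint64_t)x == 0);              /* all-zeroes detect */

    switch (cond) {
    case ZC_LT: return sign;
    case ZC_GE: return !sign;
    case ZC_LE: return sign || zero;
    case ZC_GT: return !sign && !zero;
    case ZC_EQ: return zero;
    case ZC_NE: return !zero;
    }
    return false;
}
```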
In either RV or XG3, a subset could be possible where Bcc requires Rs2 or Rm to be 0. Though, seemingly, such a subset isn't a rule in RV land (IIRC, even RV32E is still using normal compare-and-branch).
Granted, if one did a "reduced cost" version of RV32E, using only BccZ, dropping SLL/SRL/SRA, only allowing certain values for SLLI/SRLI/SRAI, ..., people might be a bit like "WTF?".
But, seemingly, microcontroller land seems to be mostly going for RV32IMC or RV32IMFC, which is aiming the bar a little higher...
But, I guess, the reason I can note for considering a possible BccZ only subset rule for XG3 is that (if one excludes RV support) it would make it easier to try for a higher clock-speed.
For XG1/XG2, compare-and-branch is an optional feature, and mostly ended up in use because, if RV needs it and is enabled, the core has already paid for it (and, it does still offer a performance advantage).
Though, I guess one could debate between which would be better:
   XG2 only core;
   XG3 only core;
   Both XG2 and XG3, noting that the same decoder can do both.
Though, there is a cost difference between XG2 and XG3 for 3-wide:
   XG2 relies on compiler tagging;
   XG3 has the CPU infer it.
There is a visible cost difference for having the CPU infer 3-wide cases (3-wide superscalar causes cost to jump up by around 1000 LUTs).
Though, for such a profile, may make sense to aim for a 2-wide configuration (and probably disallow 96-bit encodings in this case).
Granted, maybe I should instead try to stop fragmenting my own ISA to explore each sub-path?...
As-is, ATM, I am left testing Verilog simulations with 4 different copies of Doom in an attempt to validate that the ISA variants all work (can't do much more than this, my PC only has so much CPU).
Recently, I had thought I had found the bugs that were stopping RV+Jx and XG3 from fully working, but it seems more bugs remain...
XG3: Seemingly some compound "if()" statements are not working, but it takes around a day of simulation to get to where the branch occurs (I had thought I had fixed the issue, but it seems it is still not working). Current guess is that "CMPLT Rs, Imm33s, Rn" is not decoding correctly, but need to get to it in the simulation (where it should print out a debug message and hopefully give some clues) to verify.
At this rate, may need to add it as a sanity check in the Boot-ROM, but the Boot-ROM only has so much space for sanity-testing cruft.
RV+Jx: Nature of the newer bug has not yet been identified.
A few steps forward, but still not in the clear. Debugging goes a lot faster when one doesn't need to wait a day to check whether a fix attempt worked, or to try to get the relevant debug prints to try to figure out why it did not.
...

