Re: Cost of handling misaligned access

Subject : Re: Cost of handling misaligned access
From : cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups : comp.arch
Date : 20. Feb 2025, 06:38:07
Organization : A noiseless patient Spider
Message-ID : <vp6f42$2n5q2$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 2/19/2025 9:02 PM, MitchAlsup1 wrote:
> On Wed, 19 Feb 2025 22:42:04 +0000, BGB wrote:
>
>> On 2/19/2025 11:31 AM, MitchAlsup1 wrote:
>>> On Wed, 19 Feb 2025 16:35:41 +0000, Terje Mathisen wrote:
>>>
>>> ------------------
>>>>> sign+ULP+Guard+sticky is all you ever need for any rounding mode
>>>>> IEEE or beyond.
>>>>
>>>> That's what I believed all through the 2019 standards process and up to
>>>> a month or two ago:
>>>>
>>>> In reality, the "NearestOrEven" rounding rule has an exception if/when
>>>> you need to round the largest possible fp number, with guard=1 and
>>>> sticky=0:
>>>>
>>>> I.e. exactly halfway to the next possible value (which would be Inf)
>>>>
>>>> In just this particular case, the OrEven part is skipped in favor of not
>>>> rounding up, so leaving a maximum/odd mantissa.
>>>>
>>>> In the same case but sticky=1 we do round up to Inf.
>>>>
>>>> This unfortunately means that the rounding circuit needs to be combined
>>>> with an exp+mant==0b111...111 input. :-(
>>>
>>> You should rename that mode as "Round but stay finite"
>>
>> So, does it overflow?...
>
> Based on how IEEE 754 worked throughout its history::
>
> If the calculation overflows without the need for rounding;
> yes, it overflows. It is just that rounding all by itself does
> not overflow that is different.
> ----------------
OK.
It is almost kind of sad, in a way, that IEEE-754 lacks the sort of wonky overflow behavior that we accept as standard in integer land.
Like, what if, say:
There were no Inf or NaN, and the FPU just quietly overflowed and wrapped around (probably back down to the near-zero range, and probably with the opposite sign).
Though, IIRC, this was sort of a thing (at least at one point) for Binary16 on ARM. Like, it wasn't until later that they switched to having Inf and NaN and similar.
In my case, I discard Inf/NaN for FP8, but make it saturating (0x7F and 0xFF representing the maximum and minimum values). Then 0x00/0x80 are "usually" understood as 0. This sorta makes more sense for FP8, as the values are small enough that it is practical to also deal with the entire mantissa in these cases.
But it is a tradeoff: without Inf/NaN, 99968.0 can be expressed in Binary16, but with Inf/NaN in existence, it is out of range...
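
To make the Binary16 range tradeoff concrete, here is a minimal sketch of a Binary16 decoder with a switch for whether the top exponent code is reserved for Inf/NaN. The function name `decode_binary16` and the `reserve_inf_nan` flag are made up for illustration, and the "no Inf/NaN" variant simply assumes the all-ones exponent is treated as one more normal exponent; it is not meant to describe any particular hardware.

```python
def decode_binary16(bits, reserve_inf_nan=True):
    """Decode a 16-bit pattern: 1 sign bit, 5 exponent bits, 10 fraction bits.

    reserve_inf_nan=True  : exponent 31 encodes Inf/NaN (IEEE-style).
    reserve_inf_nan=False : exponent 31 is just another normal exponent
                            (assumption made for this sketch).
    """
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0:
        # Zero and subnormals share the minimum exponent.
        return sign * (frac / 1024.0) * 2.0 ** -14
    if exp == 0x1F and reserve_inf_nan:
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1.0 + frac / 1024.0) * 2.0 ** (exp - 15)


# Largest finite value when exponent 31 is reserved.
max_ieee = decode_binary16(0x7BFF)
# Largest value when exponent 31 is an ordinary exponent.
max_wide = decode_binary16(0x7FFF, reserve_inf_nan=False)
# 0x7E1A has exponent 31 and fraction 538: a NaN with Inf/NaN reserved,
# but an ordinary number otherwise.
value = decode_binary16(0x7E1A, reserve_inf_nan=False)
print(max_ieee, max_wide, value)
```

Under these assumptions the reserved-exponent maximum is 65504.0, the unreserved maximum is 131008.0, and the pattern 0x7E1A decodes to 99968.0.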

>
>> Admittedly part of why I have such mixed feelings on full
>> compare-and-branch:
>>    Pro: It can offer a performance advantage (in terms of per-clock);
>>    Con: Branch is now beholden to the latency of a Subtract.
>
>      Con: it can't compare to a constant
>      Con: it can't compare floating point

I have experimented with encodings in my RV+Jx mode and XG3 that can allow for constants...
   However, the performance delta is pretty small.
Meanwhile:
   SLT X6, X18, 0x123
   BNE .L0, X6, X0
Isn't that much different than:
   LI    X6, 0x123
   BNE  .L0, X18, X6
In my case, I also have BTST/BNTST cases; but these can be done with lower latency...
Though, the relative gain from BTST/BNTST is also fairly modest.
I wouldn't expect much savings from FPU compare here, as they tend not to be all that high on the clock-cycle rankings. Theoretically, they wouldn't cost too much to add (though, potentially, with fairly loose adherence to IEEE semantics).
Though, one could argue that the IEEE rules for comparison are a bit too complicated, and one could simplify:
   <, >, <=, >=: Behave as equivalent to a sign/magnitude integer comparison.
   ==, !=: NaN special case merely makes == false, and thus != true by extension.
This does implicitly mean NaN>Inf is true, but that is probably fine in practice...
   Main useful case is using "if(!(x==x)) ..." to detect NaN.
Otherwise, which side of the branch one's comparison falls on in the case of a NaN depends mostly on the whims of the compiler and similar.
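
The simplified rules above can be modeled in software roughly as follows. This is only a sketch on Binary32 bit patterns; the helper names (`f32_bits`, `sm_key`, `simple_lt`, and so on) are hypothetical, and treating +0 and -0 as equal is an assumption that falls out of the sign/magnitude view rather than something stated above.

```python
import struct


def f32_bits(x):
    """Return the Binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_nan_bits(bits):
    """NaN: exponent all ones and a nonzero fraction."""
    return (bits & 0x7FFFFFFF) > 0x7F800000


def sm_key(bits):
    """Interpret the bits as a sign/magnitude integer."""
    mag = bits & 0x7FFFFFFF
    return -mag if bits & 0x80000000 else mag


def simple_lt(a, b):
    # Ordered compares ignore NaN entirely and just compare the integers.
    return sm_key(f32_bits(a)) < sm_key(f32_bits(b))


def simple_gt(a, b):
    return simple_lt(b, a)


def simple_eq(a, b):
    ba, bb = f32_bits(a), f32_bits(b)
    # The only NaN special case: == is false if either side is NaN.
    if is_nan_bits(ba) or is_nan_bits(bb):
        return False
    return sm_key(ba) == sm_key(bb)


def simple_ne(a, b):
    return not simple_eq(a, b)


nan = float("nan")
inf = float("inf")
print(simple_gt(nan, inf))   # NaN sorts above Inf under these rules
print(simple_eq(nan, nan))   # so "!(x == x)" still detects NaN
```

With a positive NaN, `simple_gt(nan, inf)` comes out true, since the NaN bit pattern is a larger magnitude than Inf, while `simple_eq(nan, nan)` stays false.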

> ----------------
>
>> Where, detecting all zeroes is at least cheaper than a subtract. But,
>> detecting all zeroes still isn't free (for 64b, ~ 10 LUTs and 3 LUTs
>> delay).
>
> 1 gate   4-inputs inverted
> 2 gates 16-inputs true
> 3 gates 64-inputs inverted
I was thinking here of LUT6:
6 bits in, 1 bit out;
But 64 bits can't be done in 2 levels of 6-input LUTs (that only covers 36 bits), so one needs to split it up and use 3 levels.
With ASIC logic, presumably one could just construct a 64-input OR gate. Though, maybe this would get fiddly as one would have to balance pull-up and pull-down strength against transistor leakage; so 2-levels of 8-input OR's might make more sense.
Though, on FPGA, one could combine it with AND logic: (A & B) != 0
Which could still be handled in 3 levels of LUT6 (albeit with 26 LUTs).
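
As a rough cross-check of the level counts, here is a small sketch that sizes a plain layered reduction tree for a given fan-in. The function `reduction_tree` is hypothetical, and it assumes each level simply groups the previous level's signals `fan_in` at a time; a real synthesis tool may pack things differently, so the cell counts are only ballpark figures.

```python
def reduction_tree(n_inputs, fan_in):
    """Return (levels, cells) for a plain layered reduction tree.

    Each level groups the surviving signals fan_in at a time until a
    single output remains.
    """
    levels = 0
    cells = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division
        cells += signals
        levels += 1
    return levels, cells


# 64-bit zero detect with 6-input LUTs.
lut6_64 = reduction_tree(64, 6)
# 4-input gates at 4, 16, and 64 inputs.
gate4 = [reduction_tree(n, 4) for n in (4, 16, 64)]
# Two levels of 8-input OR gates cover 64 inputs.
or8_64 = reduction_tree(64, 8)
print(lut6_64, gate4, or8_64)
```

Under this model, 64 inputs with 6-input LUTs take 3 levels and 14 cells, 4-input gates take 1, 2, and 3 levels for 4, 16, and 64 inputs, and 8-input gates take 2 levels for 64 inputs.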
I am admittedly a little annoyed, as Windows recently rebooted my PC on its own for Windows Update, which means I probably need to wait until tomorrow to see the results of a few Verilog tweaks (I might otherwise have seen them today if not for the reboot).
Mostly because the crashes happen after Doom does its whole
"[.....            ]"
thing, which doesn't exactly happen quickly.
It could almost make sense to set up a decently fast PC, probably running Linux and with minimal or no GPU, mostly just to run Verilator simulations.
While I have an old Xeon E5410 based rack server, this is not ideal:
   Uses a lot of power;
   Sounds like one is running a vacuum cleaner;
   The E5410 (at 2.3 GHz) is slower than my main PC at running Verilator.
Would probably want enough RAM and CPU that it could run, say, 8 simulations at the same time.
Though, reminds me of seeing a lot of people complaining online about CPU fan noise from PCs and similar...
Would be funny to see how this type of person would respond to rack-server levels of fan noise.
...
Maybe I should just go add the test case to the Boot ROM and fire up a 5th simulation. Then I will at least not have to wait until tomorrow to see the results on this one (e.g., whether the debug prints happen to reveal enough clues to locate the decoding bug...).
The first parts of the Boot ROM, and also of the TestKern shell/kernel, are mostly just a bunch of sanity-test code to verify whether or not various parts of the ISA and similar are behaving as expected.
But, in this case, I can't put it in the shell, since this is still built in XG1 mode; though I did add the ability to boot the Boot ROM in RISC-V and XG3 modes (via an ugly hack), partly for this sort of testing.
Technically, the kernel could be built in XG2 or XG3 mode, but this would add hassle (at least, beyond that already spent building Doom for multiple ISA modes).
Though, in part, the kernel running in XG1 mode mostly works as a "confirm I haven't broken XG1" test.
...
