Re: Cost of handling misaligned access

Subject: Re: Cost of handling misaligned access
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date : 18. Feb 2025, 02:00:18
Organization: A noiseless patient Spider
Message-ID : <vp0m3f$1cth6$1@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla Thunderbird
On 2/14/2025 3:52 PM, MitchAlsup1 wrote:
> On Fri, 14 Feb 2025 21:14:11 +0000, BGB wrote:
>
>> On 2/13/2025 1:09 PM, Marcus wrote:
> -------------
>>>
>>> The problem arises when the programmer *deliberately* does unaligned
>>> loads and stores in order to improve performance. Or rather, if the
>>> programmer knows that the hardware supports unaligned loads and stores,
>>> he/she can use that to write faster code in some special cases.
>>>
>>
>> Pretty much.
>>
>>
>> This is partly why I am in favor of potentially adding explicit keywords
>> for some of these cases, or to reiterate:
>>    __aligned:
>>      Inform compiler that a pointer is aligned.
>>      May use a faster version if appropriate.
>>        If a faster aligned-only variant exists of an instruction.
>>        On an otherwise unaligned-safe target.
>>    __unaligned: Inform compiler that an access is unaligned.
>>      May use a runtime call or similar if necessary,
>>        on an aligned-only target.
>>      May do nothing on an unaligned-safe target.
>>    None: Do whatever is the default.
>>      Presumably, assume aligned by default,
>>        unless target is known unaligned-safe.
>
> It would take LESS total man-power world-wide and over-time to
> simply make HW perform misaligned accesses.

I think the usual issue is that on low-end hardware, it is seen as "better" to skip out on misaligned access in order to save some cost in the L1 cache.
Though, not sure how this mixes with 16/32-bit ISAs: if one allows misaligned 32-bit instructions, and allows a misaligned 32-bit instruction to cross a cache-line boundary, one still has to deal with essentially the same issues.
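To make the line-crossing case concrete, here is a minimal C sketch of the check involved. The helper name and the power-of-two line size are assumptions for illustration only; this is not taken from any actual core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an access of `size` bytes starting
 * at `addr` touches more than one cache line. `line_size` is assumed to be
 * a power of two, and `size` is assumed to be at least 1. */
static bool crosses_line(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t line_mask = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & line_mask;               /* line of first byte */
    uintptr_t last  = (addr + size - 1) & line_mask;  /* line of last byte  */
    return first != last;
}
```

With 16-byte lines, a 32-bit instruction at byte offset 14 (only 16-bit aligned) spans two lines, same as a misaligned 32-bit data load at that offset would.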
Another related thing I can note is internal store-forwarding within the L1 D$, to avoid RAW and WAW penalties for multiple accesses to the same cache line.
More expensive option:
   Detect and forward the stored data back into the Load side, so that the following Load or Store has an up-to-date view of the cache line;
Cheaper option:
   Stall the pipeline until the prior store is able to complete and write its data back into the L1 cache arrays.
This partly affects the structure of prologs and memcpy:
Simple case: Just store or copy in sequential order.
   May take a significant penalty if the cache does not forward stores.
Stagger the store order to avoid WAW penalties:
   Reduces penalties (if accesses are properly aligned);
   More convoluted logic.
Say, it is less convoluted to do:
   MOV.X  R24, (SP, 0)
   MOV.X  R26, (SP, 16)
   MOV.X  R28, (SP, 32)
   MOV.X  R30, (SP, 48)
Than, say:
   MOV.X  R24, (SP, 0)
   MOV.X  R28, (SP, 32)
   MOV.X  R26, (SP, 16)
   MOV.X  R30, (SP, 48)
This would be a much bigger headache though with 32-byte cache lines than with 16. Though, if switching to 32B lines, might also make sense to switch to half-line addressing.
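As a rough way to see why store order matters, here is a toy C model. It assumes (and this is only an assumption for illustration, not a description of my L1 or anyone else's) that a store pays a penalty when it hits the same cache line as the immediately preceding store. The function name is hypothetical.

```c
#include <stddef.h>

/* Toy model: count stores that land in the same cache line as the store
 * issued directly before them. Offsets are in bytes; line_size is the
 * cache line size in bytes. */
static int count_back_to_back(const unsigned *offsets, size_t n,
                              unsigned line_size)
{
    int hits = 0;
    for (size_t i = 1; i < n; i++) {
        if (offsets[i] / line_size == offsets[i - 1] / line_size)
            hits++;
    }
    return hits;
}
```

Under this model, with 32-byte lines the sequential order {0, 16, 32, 48} gives 2 back-to-back same-line stores, while the staggered order {0, 32, 16, 48} gives 0. A real cache will have different rules (multi-cycle store latency, line pairing, and so on), so this only shows the general shape of the problem.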
But, yeah, I have recently gotten caught up in lots of bug hunting, which seems to be negatively affecting my mood.
Did eventually find a few bugs that were holding up XG3 in my Verilog core.
MOV.X was misbehaving, as XG3 had addressed R32..R63 in a different way than XG1 and XG2:
XG1/XG2:
   Even numbers encode R0..R30, odd numbers encode R32..R62.
XG3:
   Just uses plain register numbers from R0..R62.
The outer logic for dealing with register pairs wasn't aware of the difference, so it was incorrectly decoding references to R32..R62 as R0..R30.
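The two register-pair numbering rules can be sketched in C as below. The function names are made up for illustration, and the 5-bit field width for the older scheme is an assumption; only the even/odd mapping itself comes from the description above.

```c
#include <stdint.h>

/* Older scheme: even field values name R0..R30 directly, odd field values
 * name R32..R62. Assumes a field value in the range 0..31. */
static unsigned decode_pair_old(unsigned field)
{
    return (field & 1) ? (field - 1) + 32 : field;
}

/* XG3 scheme: the field value is simply the register number (R0..R62). */
static unsigned decode_pair_xg3(unsigned field)
{
    return field;
}
```

For example, field value 1 names R32 under the older rule, while XG3 names R32 with the plain value 32. Any logic that applies one rule to a field encoded with the other ends up pointing at the wrong register pair.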
The "CMPxx 3RI Imm6s" instructions were also decoding incorrectly in XG3 mode when a jumbo prefix was used and the immediate was negative, partly again due to an XG2/XG3 rules difference:
XG2 had switched to the EI bit for sign, using WI to select unsigned comparison, whereas XG3 continues using WI for sign in the presence of a jumbo prefix. So, it was decoding as an unsigned compare rather than a signed compare.
Where, the unsigned case can't currently be encoded in XG3 with a full 33-bit immediate (but could be encoded with a 17-bit immediate).
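To show why this only bites with negative immediates, here is a small C sketch. The helper names are hypothetical, and a greater-than compare is used as a stand-in for the CMPxx family; it models the symptom, not the actual decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 33 bits of `raw` to a 64-bit signed value. */
static int64_t sext33(uint64_t raw)
{
    const uint64_t mask = (UINT64_C(1) << 33) - 1;
    const uint64_t sign = UINT64_C(1) << 32;
    return (int64_t)((raw & mask) ^ sign) - (int64_t)sign;
}

/* "Rn > imm" as a signed compare (the intended behavior). */
static bool cmp_gt_signed(int64_t rn, int64_t imm)
{
    return rn > imm;
}

/* The same bits as an unsigned compare (the mis-decoded behavior). */
static bool cmp_gt_unsigned(int64_t rn, int64_t imm)
{
    return (uint64_t)rn > (uint64_t)imm;
}
```

With Rn = 5 and an immediate of -1, the signed compare is true, but the unsigned compare sees the immediate as the largest 64-bit value and comes out false. For non-negative operands the two agree, which is part of why this sort of bug can hide for a while.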
This crap has taken me several months to hunt down, I am not feeling very productive.
For RISC-V + Jumbo prefixes, there was another bug that was being a problem for a while.
But, it appears I might have figured out this one:
   GBR was not previously allowed to be fetched via the Ru port (Lane 2).
However, the offending encoding:
   ADDI Xd, X3, Imm33s
Needed the ability to fetch GBR from this port. This being because it goes through the ALU rather than the AGU, and the ALU also has a quirk that (unlike most everything else) it puts the low half in Lane 2 and the high half in Lane 1 (most other instructions put the low half in Lane 1 and high half in Lane 2). This was mostly because of the way the signal routing needed to work for ALUX (only the Lane 1 ALU could update the S/T bits, and this needs to be done from the high-half ALU for operations like CMPxx and ADC/SBB).
Didn't seem to think that the issue might have been in the ID2/RF stage.
I can note it still isn't allowed from the 'Rv' port, but likely this shouldn't matter unless one wants to use (in XG3):
   RSUB  R3, Imm33s, Rn
Or:  Rn = Imm33s - GBR;
But, this case is likely obscure enough that it might be better to "just leave it broken" to save some LUTs (or make it disallowed).
Sometimes, it does seem like I might be too dumb for a lot of this stuff.
Otherwise:
Have now added an option to widen superscalar fetch for XG3 and RISC-V to 3 instructions.
So, now things like:
   ADD, ADD, ADD
Can use all 3 lanes...
However, can note that Shift ops are still not allowed in Lane 3, and the fetch isn't smart enough to shuffle instructions into valid lanes (this is theoretically possible, but would likely add too much cost).
Seemingly, it adds around 1k LUT to the cost of the core to enable the logic to detect/handle 3-wide superscalar (vs 2-wide) which is a little steep (though, timing does seemingly improve in this case).
Also went and widened GBR to being a full 64 bits (to match emulator behavior); however, the high bits still remain fused with FPSCR. So, using the high bits of GBR may affect FPU behavior (with dynamic rounding mode ops), and dynamic rounding mode ops may magically change bits in the high part of GBR (cough, GP/X3).
Previously, fetching GBR via the normal GPR ports would give a version with the high 16 bits zeroed.
But, arguably, a full-width 64 bits is probably "more correct" even with the glued-on FPSCR.
Implicitly, this makes things like the rounding mode and similar effectively callee save rather than global (so, if the rounding mode is set in a called function, it will revert when this function returns).
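A toy C model of the effect follows. The bit layout here is purely hypothetical (a 2-bit rounding-mode field at bits 49:48 of a 64-bit GBR whose top 16 bits stand in for FPSCR), and it assumes the callee saves and restores GBR in its prolog and epilog; none of the names are from the actual ISA or ABI.

```c
#include <stdint.h>

/* Hypothetical layout: rounding mode lives in bits 49:48 of GBR. */
#define RM_SHIFT 48
#define RM_MASK  (UINT64_C(3) << RM_SHIFT)

static uint64_t gbr;  /* stands in for the hardware register */

static void set_rounding_mode(unsigned rm)
{
    gbr = (gbr & ~RM_MASK) | ((uint64_t)(rm & 3) << RM_SHIFT);
}

static unsigned get_rounding_mode(void)
{
    return (unsigned)((gbr & RM_MASK) >> RM_SHIFT);
}

/* A callee that changes the rounding mode for its own use. Since GBR is
 * saved on entry and restored on exit, the change does not leak out. */
static unsigned callee_with_custom_rounding(void)
{
    uint64_t saved_gbr = gbr;             /* prolog: save GBR           */
    set_rounding_mode(2);                 /* some non-default mode      */
    unsigned seen = get_rounding_mode();  /* mode visible in the callee */
    gbr = saved_gbr;                      /* epilog: restore GBR        */
    return seen;
}
```

The callee sees mode 2 while it runs, but the caller sees its own mode again after the return, which is the "callee save rather than global" behavior.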
Then again, I have heard that apparently there are libraries that rely on the global-rounding-mode behavior, but I have also heard of such libraries having issues or non-determinism when mixed with other libraries which try to set a custom rounding mode when these modes disagree.
I prefer my strategy instead:
   FADD/FSUB/FMUL:
     Hard-wired Round-Nearest / RNE.
     Does not modify FPU flags.
   FADDG/FSUBG/FMULG:
     Dynamic Rounding;
     May modify FPU flags.
Can note that RISC-V burns 3 bits for FPU instructions always encoding a rounding mode (whereas in my ISA, encoding a rounding mode other than RNE or DYN requires a 64-bit encoding).
Proper RV has some "user accessible CSRs" here, but these are not yet supported. How to best deal with CSRs is an open issue, as most don't have a 1:1 mapping with those in BJX2. In theory, a mechanism could be added mostly for trying to deal with CSRs (possibly as some weird appendage to be glued onto the part of the register file that deals with control registers).
...
Well, also, some amount of internal emotional conflicts.
Generally feeling kinda worthless recently.
Well, and some amount of ongoing social issues, but I don't really want to go into my thoughts here right now.
But, alas...
