Re: Cost of handling misaligned access

Subject: Re: Cost of handling misaligned access
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 18 Feb 2025, 11:53:28
Organization: A noiseless patient Spider
Message-ID : <vp1ori$1llrm$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 2/17/2025 11:07 PM, Robert Finch wrote:
On 2025-02-17 8:00 p.m., BGB wrote:
On 2/14/2025 3:52 PM, MitchAlsup1 wrote:
On Fri, 14 Feb 2025 21:14:11 +0000, BGB wrote:
>
On 2/13/2025 1:09 PM, Marcus wrote:
-------------
>
The problem arises when the programmer *deliberately* does unaligned
loads and stores in order to improve performance. Or rather, if the
programmer knows that the hardware supports unaligned loads and stores,
he/she can use that to write faster code in some special cases.
>
>
Pretty much.
>
>
This is partly why I am in favor of potentially adding explicit keywords
for some of these cases, or to reiterate:
   __aligned:
     Inform compiler that a pointer is aligned.
     May use a faster version if appropriate.
       If a faster aligned-only variant exists of an instruction.
       On an otherwise unaligned-safe target.
   __unaligned: Inform compiler that an access is unaligned.
     May use a runtime call or similar if necessary,
       on an aligned-only target.
     May do nothing on an unaligned-safe target.
   None: Do whatever is the default.
     Presumably, assume aligned by default,
       unless target is known unaligned-safe.
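
As a rough illustration of what these hints could mean in present-day C, the sketch below uses memcpy for the "unaligned" case and the GCC/Clang builtin __builtin_assume_aligned for the "aligned" case. The function names are made up for the example, and the __aligned / __unaligned keywords themselves remain a proposal here, not something this code depends on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Roughly what "__unaligned" asks for: correct at any address.
   memcpy lets the compiler choose a byte-wise or native sequence. */
static uint32_t load32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Roughly what "__aligned" asks for: the caller promises 4-byte
   alignment, so the compiler may pick an aligned-only instruction. */
static uint32_t load32_aligned(const void *p)
{
    uint32_t v;
    assert(((uintptr_t)p & 3u) == 0);      /* the promise, checked in debug builds */
#if defined(__GNUC__) || defined(__clang__)
    p = __builtin_assume_aligned(p, 4);    /* GCC/Clang builtin; a no-op elsewhere */
#endif
    memcpy(&v, p, sizeof v);
    return v;
}
```

On an unaligned-safe target both functions would likely compile to the same single load; on an aligned-only target the first one is where a byte-wise sequence or runtime call would appear.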
>
It would take LESS total man-power world-wide and over-time to
simply make HW perform misaligned accesses.
>
>
 
I think the usual issue is that on low-end hardware, it is seen as "better" to skip out on misaligned access in order to save some cost in the L1 cache.
>
I always include support for unaligned accesses even with a ‘low-end’ CPU. I think it is not that expensive and sure makes some things a lot easier when handled in hardware. For Q+ it just runs two bus cycles if the data spans a cache line and pastes results together as needed.
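
The "two bus cycles and paste" idea can be sketched in software as below. This is only a model under simplifying assumptions: memory is an array of aligned 32-bit words instead of cache lines, byte numbering within a word is little-endian, and the function names are invented for the example. It is not a description of the actual Q+ logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory that can only be read one aligned 32-bit word at a time. */
static uint32_t read_aligned_word(const uint32_t *mem, size_t word_index)
{
    return mem[word_index];
}

/* 32-bit load from any byte address (little-endian byte numbering). */
static uint32_t load32_any(const uint32_t *mem, size_t byte_addr)
{
    assert(mem != NULL);
    size_t   idx   = byte_addr / 4;
    unsigned shift = (unsigned)(byte_addr % 4) * 8;
    uint32_t lo    = read_aligned_word(mem, idx);       /* first access */
    if (shift == 0)
        return lo;                                      /* aligned: one access is enough */
    uint32_t hi = read_aligned_word(mem, idx + 1);      /* second access */
    return (lo >> shift) | (hi << (32 - shift));        /* paste the two pieces together */
}
```

With mem = { 0x44332211, 0x88776655 }, a load at byte address 1 takes the upper three bytes of the first word and the low byte of the second.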
 
I had gone aligned-only with some 32-bit cores in the past.
Whole CPU core fit into fewer LUTs than I currently spend on just the L1 D$...
Granted, some of these used a very minimal L1 cache design:
   Only holds a single cache line.
The smallest cores I managed used a simplified SH-based design:
   Fixed-length 16 bit instructions, with 16 registers;
   Only (Reg) and (Reg, R0) addressing;
   Aligned only;
   No shift or multiply;
   ...
Where, say:
   SH-4 -> BJX1-32 (Added features)
   SH-4 -> B32V (Stripped down)
   BJX1-32 -> BJX1-64A (64-bit, Modal Encoding)
   B32V -> B64V (64-bit, Encoding Space Reorganizations)
   B64V ~> BJX1-64C (No longer Modal)
Where, BJX1-64C was the end of this project (before I effectively did a soft-reboot).
Then transition phase:
   B64V -> BtSR1 (Dropped to 32-bit, More Encoding Changes)
     Significant reorganization.
     Was trying to optimize for code density closer to MSP430.
   BtSR1 -> BJX2 (Back to 64-bit, re-adding features from BJX1-64C)
     A few features added for BtSR1 were dropped again in BJX2.
The original form of BJX2 was still a primarily 16-bit ISA encoding, but by this point it had pretty much mutated beyond recognition (and relatively few instructions were still in the same places that they were in SH-4).
For example (original 16-bit space):
   0zzz:
     SH-4: Ld/St (Rm,R0); also 0R and 1R spaces, etc.
     BJX2: Ld/St Only (Rm) and (Rm,R0)
   1zzz:
     SH-4: Store (Rn, Disp4)
     BJX2: 2R ALU ops
   2zzz:
     SH-4: Store (@Rn, @-Rn), ALU ops
     BJX2: Branch Ops (Disp8), etc
   3zzz:
     SH-4: ALU ops
     BJX2: 0R and 1R ops
   4zzz:
     SH-4: 1R ops
     BJX2: Ld/St (SP, Disp4); MOV-CR, LEA
   5zzz:
     SH-4: Load (Rm, Disp4)
     BJX2: Load (Unsigned), ALU ops
   6zzz:
     SH-4: Load (@Rm+ and @Rm), ALU
     BJX2: FPU ops, CMP-Imm4
   7zzz:
     SH-4: ADD Imm8, Rn
     BJX2: (XGPR 32-bit Escape Block)
   8zzz:
     SH-4: Branch (Disp8)
     BJX2: Ld/St (Rm, Disp3)
   9zzz:
     SH-4: Load (PC-Rel)
     BJX2: (XGPR 32-bit Escape Block)
   Azzz:
     SH-4: BRA Disp12
     BJX2: MOV Imm12u, R0
   Bzzz:
     SH-4: BSR Disp12
     BJX2: MOV Imm12n, R0
   Czzz:
     SH-4: Some Imm8 ops
     BJX2: ADD Imm8, Rn
   Dzzz:
     SH-4: Load (PC-Rel)
     BJX2: MOV Imm8, Rn
   Ezzz:
     SH-4: MOV Imm8, Rn
     BJX2: (32-bit Escape, Predicated Ops)
   Fzzz:
     SH-4: FPU Ops
     BJX2: (32-bit Escape, Unconditional Ops)
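
The BJX2 column of the map above can be written down as a lookup on the top nibble of the first 16-bit instruction word. This is just a sketch: it assumes the "0zzz".."Fzzz" notation means the most significant hex digit of that word, and that an "Escape" block means the instruction continues into a second 16-bit word. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* BJX2 side of the table, indexed by top nibble (original 16-bit space). */
static const char *const bjx2_block[16] = {
    "Ld/St (Rm) and (Rm,R0)",       /* 0zzz */
    "2R ALU ops",                   /* 1zzz */
    "Branch ops (Disp8), etc",      /* 2zzz */
    "0R and 1R ops",                /* 3zzz */
    "Ld/St (SP,Disp4); MOV-CR, LEA",/* 4zzz */
    "Load (Unsigned), ALU ops",     /* 5zzz */
    "FPU ops, CMP-Imm4",            /* 6zzz */
    "XGPR 32-bit escape",           /* 7zzz */
    "Ld/St (Rm,Disp3)",             /* 8zzz */
    "XGPR 32-bit escape",           /* 9zzz */
    "MOV Imm12u, R0",               /* Azzz */
    "MOV Imm12n, R0",               /* Bzzz */
    "ADD Imm8, Rn",                 /* Czzz */
    "MOV Imm8, Rn",                 /* Dzzz */
    "32-bit escape, predicated",    /* Ezzz */
    "32-bit escape, unconditional", /* Fzzz */
};

static const char *bjx2_block_name(unsigned top_nibble)
{
    assert(top_nibble < 16);
    return bjx2_block[top_nibble];
}

/* Nonzero if the first word selects one of the 32-bit escape blocks. */
static int bjx2_is_32bit_escape(uint16_t first_word)
{
    unsigned top = first_word >> 12;
    return top == 0x7 || top == 0x9 || top == 0xE || top == 0xF;
}
```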
For the 16-bit ops, SH-4 had more addressing modes than BJX2:
   SH-4: @Reg, @Rm+, @-Rn, @(Reg,R0), @(Reg,Disp4), @(PC,Disp8)
   BJX2: (Rm), (Rm,R0), (Rm,Disp3), (SP,Disp4)
Although it may seem like it, I didn't just completely start over on the layout, but rather it was sort of an "ant-hill reorganization".
Say, for example:
   1zzz and 5zzz were merged into 8zzz, reducing Disp by 1 bit
   2zzz and 3zzz were partly folded into 0zzz and 1zzz
   8zzz's contents were moved to 2zzz
   4zzz and part of 0zzz were merged into 3zzz
   ...
A few CR's are still in the same places and SR still has a similar layout I guess, ...
Early on, there was the idea that the 32-bit ops were prefix-modified versions of the 16-bit ops, but this symmetry soon broke and the 16-bit and 32-bit encoding spaces became independent of each other.
Though, the 32-bit F0 space still has some amount of similarity to the 16-bit space.
Later on I did some testing and performance comparisons, and realized that using 32-bit encodings primarily (or exclusively) gave significantly better performance than relying primarily or exclusively on 16-bit ops. At this point the ISA transitioned from a primarily 16-bit ISA (with 32-bit extension ops) to a primarily 32-bit ISA with a 16-bit encoding space. This transition didn't directly affect encodings, but did affect how the ISA developed from then on (more so, there was no longer an idea that the 16-bit ISA would need to be able to exist standalone; but now the 32-bit ISA did need to be able to exist standalone).
But, now newer forms of BJX2 (XG2 and XG3) have become almost unrecognizable from early BJX2 (as an ISA still primarily built around 16-bit encodings).
Except that XG2's instruction layout still carries vestiges of its origins as a prefix encoding. But, XG3 even makes this part disappear (by reorganizing the bits to more closely resemble RISC-V's layout).
Well, and there is:
   ZnmX -> ZXnm
But:
   F0nm_ZeoX

I prefer my strategy instead:
   FADD/FSUB/FMUL:
     Hard-wired Round-Nearest / RNE.
     Does not modify FPU flags.
   FADDG/FSUBG/FMULG:
     Dynamic Rounding;
     May modify FPU flags.
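
To make the split concrete, here is a toy model of just the rounding step, applied to an unsigned significand that loses its low k bits. Everything in it (the mode names, the fpu_state struct, the function names) is invented for illustration and is not taken from the ISA; it only shows "fixed RNE, flags untouched" next to "mode from state, flags may change".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rmode { RM_RNE, RM_RTZ, RM_RUP, RM_RDN };   /* hypothetical mode names */

struct fpu_state {            /* hypothetical control/status model */
    enum rmode dyn_mode;      /* dynamic rounding mode */
    int        inexact;       /* sticky "result was rounded" flag */
};

/* Drop the low k bits of sig, rounding according to m. */
static uint64_t round_shift(uint64_t sig, unsigned k, enum rmode m, int *inexact)
{
    assert(k > 0 && k < 64);
    uint64_t q    = sig >> k;
    uint64_t rem  = sig & ((1ull << k) - 1);
    uint64_t half = 1ull << (k - 1);
    if (inexact != NULL && rem != 0)
        *inexact = 1;
    switch (m) {
    case RM_RNE: if (rem > half || (rem == half && (q & 1))) q++; break;
    case RM_RUP: if (rem != 0) q++; break;
    case RM_RTZ:
    case RM_RDN: break;       /* unsigned value: both simply truncate */
    }
    return q;
}

/* FADD-style: hard-wired round-nearest-even, never touches flags. */
static uint64_t round_fixed_rne(uint64_t sig, unsigned k)
{
    return round_shift(sig, k, RM_RNE, NULL);
}

/* FADDG-style: rounding mode comes from state, flags may be set. */
static uint64_t round_dynamic(struct fpu_state *st, uint64_t sig, unsigned k)
{
    return round_shift(sig, k, st->dyn_mode, &st->inexact);
}
```

The fixed variant needs no access to FPU state at all, which is the point of keeping it separate from the dynamic one.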
>
Can note that RISC-V burns 3 bits for FPU instructions always encoding a rounding mode (whereas in my ISA, encoding a rounding mode other than RNE or DYN requires a 64-bit encoding).
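
For reference, the 3 bits in question are the rm field, bits 14:12 of RISC-V OP-FP arithmetic instructions such as FADD.S (some other OP-FP instructions reuse those bits as a function selector instead). A small sketch of packing and extracting it; the helper names are made up, the field positions and rm values follow the RISC-V F extension.

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V rm encodings; 5 and 6 are reserved. */
enum rv_rm { RV_RNE = 0, RV_RTZ = 1, RV_RDN = 2, RV_RUP = 3, RV_RMM = 4, RV_DYN = 7 };

/* Assemble an R-type OP-FP word (major opcode 0x53) from its fields. */
static uint32_t rv_op_fp(unsigned funct7, unsigned rs2, unsigned rs1,
                         unsigned rm, unsigned rd)
{
    assert(funct7 < 128 && rs2 < 32 && rs1 < 32 && rm < 8 && rd < 32);
    return ((uint32_t)funct7 << 25) | ((uint32_t)rs2 << 20) |
           ((uint32_t)rs1 << 15)    | ((uint32_t)rm << 12)  |
           ((uint32_t)rd << 7)      | 0x53u;
}

/* Extract the rounding-mode field from an OP-FP word. */
static unsigned rv_rm_field(uint32_t insn)
{
    assert((insn & 0x7Fu) == 0x53u);
    return (insn >> 12) & 7u;
}
```

For example, FADD.S (funct7 = 0) with rd = f10, rs1 = f11, rs2 = f12 and rm = DYN packs to 0x00C5F553.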
>
>
Q+ encodes rounding mode the same way as RISC-V as there are lots of bits available in the instruction. Burning bits on the rounding mode seems reasonable to me when bits are available.
 
Initially:
   3 bits of entropy were eaten by the 16-bit space;
   2 more bits were eaten by predication and WEX.
So, the initial ISA design for 32-bit ops had 5 fewer bits than in RISC-V land.
XG2 reclaimed the 16-bit space, but used the bits to expand all the register fields to 6 bits.
Not many bits left to justify burning on a rounding mode.
   And, my Imm/Disp fields were generally 3 bits less than RV.

Modified the PRED modifier in Q+ to take a predicate bit from one of three registers used to supply bits. Previously an array of two-bit mask values encoded in the instruction indicated to 1) ignore the predicate bit 2) execute if predicate true or 3) execute if predicate false.
Since there were three reg specs available in the PRED modifier, it seemed to make more sense to specify three regs instead of one. So now it works 1) as before, 2) execute if bit in Ra is set, 3) execute if bit in Rb is set, 4) execute if bit in Rc is set.
The same register may be specified for Ra, Rb, and Rc. Since there is sign inversion available, the original operation may be mimicked by specifying Ra, ~Ra.
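
A small model of the two schemes, as a sketch only. The numeric values of the two-bit selector are made up (the text lists the cases but not their bit patterns), case 1 of the new scheme is read here as "ignore the predicate", and the struct simply holds the bit each register supplies for the current instruction after any inversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Old scheme: one predicate bit, two-bit mask per instruction. */
enum pred_old { PO_IGNORE, PO_IF_TRUE, PO_IF_FALSE };   /* hypothetical numbering */

static bool exec_old(enum pred_old m, bool pred_bit)
{
    switch (m) {
    case PO_IF_TRUE:  return pred_bit;
    case PO_IF_FALSE: return !pred_bit;
    default:          return true;
    }
}

/* New scheme: the mask picks which of three registers supplies the bit. */
enum pred_new { PN_IGNORE, PN_RA, PN_RB, PN_RC };       /* hypothetical numbering */

struct pred_src { bool ra, rb, rc; };   /* bits seen after optional inversion */

static bool exec_new(enum pred_new m, struct pred_src s)
{
    switch (m) {
    case PN_RA: return s.ra;
    case PN_RB: return s.rb;
    case PN_RC: return s.rc;
    default:
        assert(m == PN_IGNORE);
        return true;
    }
}
```

Specifying Ra and ~Ra as the first two sources makes PN_RA behave like the old "execute if true" and PN_RB like the old "execute if false".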
 
In BJX2, all 32-bit instructions encode predication in 2 bits in each instruction.
In XG3, the space that would have otherwise encoded WEX was instead left to RISC-V (to create a conglomerate ISA).
But, there is also the possibility to use XG3 by itself without any RISC-V parts in the mix.

 
