On 9/27/2024 2:40 PM, MitchAlsup1 wrote:
I always did code motion prior to assembler. Code motion only has to
On Fri, 27 Sep 2024 18:26:28 +0000, BGB wrote:
>
But, generally this does still impose limits:
Can't reorder instructions across a label;
Can't move instructions with an associated reloc;
Can't reorder memory instructions unless they can be proven to not alias
(loads may be freely reordered, but the relative order of loads and
stores may not unless provably non-aliasing);
Same base register different displacement.
The effectiveness of this does depend on how the C code is written
though (works favorably with larger blocks of mostly-independent
expressions).
One of the reasons reservation stations came into vogue.
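A minimal sketch of the reordering rule discussed above, written in Python for illustration only: two loads may always be swapped, and a pair involving a store may be swapped only when both accesses use the same base register with non-overlapping displacement ranges. The names `MemOp`, `provably_disjoint` and `may_swap` are hypothetical, and the sketch assumes the base register is not redefined between the two instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    is_store: bool  # True for a store, False for a load
    base: str       # base register name (hypothetical encoding)
    disp: int       # byte displacement from the base register
    size: int       # access size in bytes


def provably_disjoint(a: MemOp, b: MemOp) -> bool:
    """True only for the easy case: same base, non-overlapping byte ranges.

    Assumes the base register holds the same value at both instructions.
    """
    if a.base != b.base:
        return False  # different bases: nothing can be proven here
    return a.disp + a.size <= b.disp or b.disp + b.size <= a.disp


def may_swap(a: MemOp, b: MemOp) -> bool:
    """Loads reorder freely; anything involving a store needs a proof."""
    if not a.is_store and not b.is_store:
        return True
    return provably_disjoint(a, b)
```

This is deliberately conservative: any pair it cannot prove disjoint is kept in program order.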
-----
Most agree it is closer to 30% than 25% {{Unless you clutter up the ISA
such that your typical memref needs a support instruction.}}
>
Cough, RV64...
-----
Which makes that 16% (above) into 48% and renormalizing to::
~ 63% fixed-displacement;
~ 36% register-indexed and support instructions.
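The renormalization above can be checked with a few lines of Python. This is a sketch under two assumptions that are not stated in this part of the thread: that the remaining 84% of memory references are fixed-displacement, and that the jump from 16% to 48% comes from each register-indexed reference costing three instructions. The function name `renormalize` is hypothetical.

```python
def renormalize(fixed_pct: float, indexed_pct: float, insns_per_indexed: int):
    """Return (fixed share, indexed-plus-support share) in percent."""
    indexed = indexed_pct * insns_per_indexed  # 16% becomes 48% when tripled
    total = fixed_pct + indexed                # 84 + 48 = 132
    return 100.0 * fixed_pct / total, 100.0 * indexed / total


fixed_share, indexed_share = renormalize(84, 16, 3)
print(f"{fixed_share:.1f}% fixed, {indexed_share:.1f}% indexed+support")
```

Under those assumptions the shares come out at 84/132 and 48/132, which agree with the ~63% and ~36% figures quoted.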
Yeah.
>
I think there are reasons here why I am generally getting lackluster
performance out of RV64...
The assembler gets to choose based on the memory model::
Comparably, XG2 has a 16K or 32K reach here (depending on immediate
size), which hits most of the global variables. The fallback Jumbo
encoding hits the rest.
I get ±32K with 16-bit displacements
>
Baseline has special case 32-bit ops:
MOV.L (GBR, Disp10u), Rn //4K
MOV.Q (GBR, Disp10u), Rn //8K
>
But, in XG2, it gains 2 bits:
MOV.L (GBR, Disp12u), Rn //16K
MOV.Q (GBR, Disp12u), Rn //32K
>
Jumbo can encode +/- 4GB here (64-bit encoding).
MOV.L (GBR, Disp33s), Rn //+/- 4GB
MOV.Q (GBR, Disp33s), Rn //+/- 4GB
>
Mostly because GBR displacements are unscaled.
Plan for XG3 is that all Disp33s encodings would be unscaled.
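The reach figures in the comments above follow from the displacement width and the scale factor. A small Python sketch, assuming (as the //4K and //8K comments suggest) that the Disp10u and Disp12u forms are scaled by the access size while Disp33s is unscaled; the helper names are hypothetical.

```python
def unsigned_reach(bits: int, scale: int = 1) -> int:
    """Bytes addressable above the base with an unsigned displacement."""
    return (1 << bits) * scale


def signed_reach(bits: int, scale: int = 1) -> int:
    """Bytes addressable on each side of the base with a signed displacement."""
    return (1 << (bits - 1)) * scale


KIB = 1024
GIB = 1024 ** 3

print(unsigned_reach(10, 4) // KIB)  # MOV.L (GBR, Disp10u)
print(unsigned_reach(10, 8) // KIB)  # MOV.Q (GBR, Disp10u)
print(unsigned_reach(12, 4) // KIB)  # MOV.L (GBR, Disp12u)
print(unsigned_reach(12, 8) // KIB)  # MOV.Q (GBR, Disp12u)
print(signed_reach(33) // GIB)       # Disp33s, unscaled
print(signed_reach(16) // KIB)       # a signed 16-bit displacement
```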
BJX2 can also do (PC, Disp33s) in a single logical instruction...
>
But, RISC-V can't...
What is your definition of "single logical instruction"? In my parlance,
>
Compared to::
Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
(there does seem to be some interest for ELF FDPIC but limited to 32-bit
RISC-V ...). Ironically, ideas for doing FDPIC in RV aren't too far off
from PBO (namely, using GP for a global section and then chaining the
sections for each binary).
How are you going to do dense PIC switch() {...} in RISC-V ??
Already implemented...
>
With pseudo-instructions:
SUB Rs, $(MIN), R10
MOV $(MAX-MIN), R11
BGTU R11, R10, Lbl_Dfl
>
MOV .L0, R6 //AUIPC+ADD
SHAD R10, 2, R10 //SLLI
ADD R6, R10, R6
JMP R6 //JALR X0, X6, 0
>
.L0:
BRA Lbl_Case0 //JAL X0, Lbl_Case0
BRA Lbl_Case1
...
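As a rough model of the sequence above, the following Python sketch computes which address the final JMP would reach. It assumes the BGTU is meant to branch to the default label when the biased index exceeds MAX-MIN as an unsigned compare, and that each table entry is one 4-byte branch instruction (hence the shift by 2). The function name and the example addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1


def switch_target(rs: int, lo: int, hi: int, table_base: int, default: int) -> int:
    """Address reached for case value rs in a dense switch covering lo..hi."""
    idx = (rs - lo) & MASK64       # SUB: values below lo wrap to a huge unsigned number
    if idx > (hi - lo):            # unsigned range check catches both out-of-range sides
        return default
    return table_base + (idx << 2) # index into the table of 4-byte branches at .L0


# Example: cases 10..13, table at 0x1000, default handler at 0x2000.
for value in (9, 10, 13, 14):
    print(value, hex(switch_target(value, 10, 13, 0x1000, 0x2000)))
```

Because the table holds branch instructions rather than absolute addresses, and its base is formed PC-relative, the table itself needs no load-time fixups, which is what makes the scheme position independent.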
1 is less than 4, too.
Currently, BGBCC does not use this strategy.
Though, for 64-bit constants it could be more compact and faster.
>
But, better still would be having Jumbo prefixes or similar, or even a
SHORI instruction.
Better Still Still is having 32-bit and 64-bit constants available
from the instruction stream and positioned in either operand position.
>
Granted...
>
Say, 64-bit constant-load in SH-5 or similar:
xxxxyyyyzzzzwwww
MOV ImmX, Rn
SHORI ImmY, Rn
SHORI ImmZ, Rn
SHORI ImmW, Rn
Where, one loads the constant in 16-bit chunks.
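The chunked load above can be modeled in a few lines of Python. This sketch assumes SHORI shifts the register left by 16 bits and ORs in a zero-extended 16-bit immediate, with a 64-bit register; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def shori(rn: int, imm16: int) -> int:
    """Shift the register left by 16 and OR in a 16-bit immediate."""
    return ((rn << 16) | (imm16 & 0xFFFF)) & MASK64


def load_const64(value: int) -> int:
    """Build a 64-bit constant from four 16-bit chunks, high chunk first."""
    x, y, z, w = ((value >> s) & 0xFFFF for s in (48, 32, 16, 0))
    rn = x              # MOV   ImmX, Rn
    rn = shori(rn, y)   # SHORI ImmY, Rn
    rn = shori(rn, z)   # SHORI ImmZ, Rn
    rn = shori(rn, w)   # SHORI ImmW, Rn
    return rn


print(hex(load_const64(0x0123456789ABCDEF)))
```

The cost is four dependent instructions for one constant, which is the point of the instruction-count comparison in the thread.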
Yech
>
But, 4 is still less than 6.