Newsportal USENET - Re: Misc: BGBCC targeting RV64G, initial results...

On Fri, 27 Sep 2024 18:26:28 +0000, BGB wrote:

On 9/27/2024 7:50 AM, Robert Finch wrote:
On 2024-09-27 5:46 a.m., BGB wrote:
---------
>
But, BJX2 does not spam the ADD instruction quite so hard, so is more
forgiving of latency. In this case, an optimization that reduces
common-case ADD to 1 cycle was being used (it only works though in the
CPU core if the operands are both in signed 32-bit range and no overflow
occurs; IIRC optionally using a sign-extended AGU output as a stopgap
ALU output before the output arrives from the main ALU the next cycle).
>

RISC-V group opinion is that "we have done nothing to damage pipeline
operating frequency". {{Except the moving of register specifier fields
between 32-bit and 16-bit instructions; except for: AGEN-RAM-CMP-ALIGN
in 2 cycles, and several others...}}

>
>
Comparably, it appears BGBCC leans more heavily into ADD and SLLI than
GCC does, with a fair chunk of the total instructions executed being
these two (more cycles are spent adding and shifting than doing memory
load or store...).
>
That seems to be a bit off. Mem ops are usually around 1/4 of

Most agree it is closer to 30% than 25% {{Unless you clutter up the ISA
such that your typical memref needs a support instruction.}}

instructions. Spending more than 25% on adds and shifts seems like a
lot. Is it address calcs? Register loads of immediates?
>
>
It is both...
>
>
In BJX2, the dominant instruction tends to be memory Load.
   Typical output from BGBCC for Doom is (at runtime):
   ~ 70% fixed-displacement;
   ~ 30% register-indexed.
   Static output differs slightly:
   ~ 84% fixed-displacement;
   ~ 16% register-indexed.
>
RV64G lacks register-indexed addressing, only having fixed displacement.
>
If you need to do a register-indexed load in RV64:
   SLLI X5, Xo, 2 //shift by size of index
   ADD X5, Xm, X5 //add base and index
   LW Xn, X5, 0   //do the load
>
This case is bad...
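
For reference, a minimal sketch of what the three-instruction sequence above computes, assuming a 4-byte element and a little-endian memory image. The function name `indexed_load_w` and the byte-buffer memory model are illustrative only, not anything from BGBCC or GCC:

```python
import struct

def indexed_load_w(mem: bytes, base: int, index: int) -> int:
    """Model SLLI/ADD/LW: load a signed 32-bit word at base + index*4."""
    t = index << 2                              # SLLI X5, Xo, 2  (scale index by element size)
    t = base + t                                # ADD  X5, Xm, X5 (add base and scaled index)
    return struct.unpack_from("<i", mem, t)[0]  # LW   Xn, X5, 0  (sign-extending 32-bit load)

# Hypothetical 4-word memory image.
mem = struct.pack("<4i", 10, -20, 30, 40)
print(indexed_load_w(mem, 0, 1))
```

An ISA with register-indexed addressing folds the first two steps into the load itself, which is the point being made here.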

Which turns that 16% (above) into 48%, renormalizing to:
~ 64% fixed-displacement;
~ 36% register-indexed and support instructions.
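
The arithmetic behind that renormalization, as a rough sketch. It assumes the static 84/16 split quoted above and that every register-indexed access costs 3 instructions (SLLI, ADD, LW); the function name `renormalize` is made up for illustration:

```python
def renormalize(fixed_pct: float, indexed_pct: float, insns_per_indexed: int = 3):
    """Return (fixed %, indexed+support %) after expanding indexed accesses."""
    indexed = indexed_pct * insns_per_indexed  # 16 -> 48 with 3 instructions each
    total = fixed_pct + indexed                # 84 + 48 = 132
    return 100.0 * fixed_pct / total, 100.0 * indexed / total

fixed, indexed = renormalize(84, 16)
print(f"{fixed:.1f}% fixed-displacement, {indexed:.1f}% indexed + support")
```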

>
>
Also global variables outside the 2kB window:
   LUI   X5, DispHi
   ADDI X5, X5, DispLo
   ADD   X5, GP, X5
   LW Xn, X5, 0
>
Where, sorting global variables by usage priority gives:
   ~ 35%: in range
   ~ 65%: not in range

Illustrating the fallacy of 12 bits of displacement.
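
A small sketch of the range check and the hi/lo split involved, assuming DispHi/DispLo in the sequence above are the usual upper-20/lower-12 split where the low part is sign-extended by ADDI (so the high part has to be adjusted to compensate). The helper names are hypothetical:

```python
def fits_simm12(disp: int) -> bool:
    """True if disp fits the signed 12-bit displacement of an RV load/store."""
    return -2048 <= disp <= 2047

def split_hi_lo(disp: int):
    """Split disp into (hi, lo) with (hi << 12) + lo == disp and lo in signed 12-bit range."""
    lo = ((disp & 0xFFF) ^ 0x800) - 0x800  # sign-extend the low 12 bits
    hi = (disp - lo) >> 12                 # compensate for a negative lo
    return hi, lo                          # a real LUI would keep only the low 20 bits of hi

for disp in (100, 0x12345, 0x12FFF, -5000):
    hi, lo = split_hi_lo(disp)
    print(disp, fits_simm12(disp), hi, lo)
```

Anything that fails `fits_simm12` needs the longer LUI/ADDI/ADD/LW form, which is the ~65% case above.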

Comparably, XG2 has a 16K or 32K reach here (depending on immediate
size), which hits most of the global variables. The fallback Jumbo
encoding hits the rest.

I get ±32K with 16-bit displacements

>
Theoretically, could save 1 instruction here, but would need to add two
more reloc types to allow for:
   LUI, ADD, Lx
   LUI, ADD, Sx
Because annoyingly Load and Store have different displacement encodings;
and I still need the base form for other cases.
>
>
More compact way to load/store global variables would be to use absolute
32-bit or PC relative:
   LUI + Lx/Sx : Abs32
   AUIPC + Lx/Sx : PC-Rel32

MEM Rd,[IP,,DISP32/64] // IP-rel
-----

>
Likewise, no one seems to be bothering with 64-bit ELF FDPIC for RV64
(there does seem to be some interest for ELF FDPIC but limited to 32-bit
RISC-V ...). Ironically, ideas for doing FDPIC in RV aren't too far off
from PBO (namely, using GP for a global section and then chaining the
sections for each binary).

How are you going to do dense PIC switch() {...} in RISC-V ??

Main difference being that FDPIC uses fat
function pointers and does the GP reload on the caller, vs PBO where I
use narrow function pointers and do the reload on the callee (with
load-time fixups for the PBO Offset).
>
>
The result of all this is a whole lot of

unnecessary

Shifts and ADDs.

Seemingly, even more for BGBCC than for GCC, which already had a lot of
shifts and adds.
>
BGBCC basically entirely dethrones the Load and Store ops ...
>
>
Possibly more so than GCC, which tended to turn most constant loads into
memory loads. It would load a table of constants into a register and
then pull constants from the table, rather than compose them inline.
>
Say, something like:
   AUIPC X18, DispHi
   ADDI X18, X18, DispLo
   (X18 now holds a table of constants, pointing into .rodata)
>
And, when it needs a constant:
   LW Xn, X18, Disp //offset of the constant it wants.
Or:
   LD Xn, X18, Disp //64-bit constant
>
>
Currently, BGBCC does not use this strategy.
Though, for 64-bit constants it could be more compact and faster.
>
But, better still would be having Jumbo prefixes or similar, or even a
SHORI instruction.

Better Still Still is having 32-bit and 64-bit constants available
from the instruction stream and positioned in either operand position.

Say, 64-bit constant-load in SH-5 or similar:
   xxxxyyyyzzzzwwww
   MOV   ImmX, Rn
   SHORI ImmY, Rn
   SHORI ImmZ, Rn
   SHORI ImmW, Rn
Where, one loads the constant in 16-bit chunks.
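
A rough model of that chunked load, assuming SHORI shifts the register left by 16 and ORs in a 16-bit immediate, and that registers are 64 bits wide. The names `shori` and `load_const64` are illustrative, not taken from any toolchain:

```python
MASK64 = (1 << 64) - 1

def shori(rn: int, imm16: int) -> int:
    """Rn = (Rn << 16) | imm16, truncated to 64 bits."""
    return ((rn << 16) | (imm16 & 0xFFFF)) & MASK64

def load_const64(value: int) -> int:
    """Build a 64-bit constant from four 16-bit chunks, top chunk first."""
    chunks = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    rn = chunks[0]            # MOV ImmX, Rn (any extension bits are shifted out later)
    for imm in chunks[1:]:    # SHORI ImmY / ImmZ / ImmW
        rn = shori(rn, imm)
    return rn

print(hex(load_const64(0x0123456789ABCDEF)))
```

That is four instructions for one 64-bit constant, which is the cost being weighed against a constant-table load or a Jumbo prefix.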

Yech

>
>

Don't you ever snip anything ??
