On 6/11/2025 11:37 AM, MitchAlsup1 wrote:
On Wed, 11 Jun 2025 9:42:47 +0000, BGB wrote:
On 6/11/2025 12:56 AM, Thomas Koenig wrote:
quadibloc <quadibloc@gmail.com> schrieb:
>
Since the basis of the ISA is a RISC-like ISA,
>
[...]
>
3) Use only four base registers instead of eight.
4) Use only three index registers instead of seven.
5) Use only six index registers instead of seven, and use only four base
registers instead of eight when indexing is used.
>
Having different classes of base and index registers is very
un-RISCy, and not generally a good idea. General purpose registers
is one of the great things that the /360 got right, as the VAX
later did, and the 68000 didn't.
>
>
Agreed.
>
Ideally, one has an ISA where nearly all registers are the same:
No distinction between base/index/data registers;
No distinction between integer and floating point registers;
No distinction between general registers and SIMD registers;
...
Agreed:: But most architectures get the FP registers wrong under that
distinction, and apparently everyone gets the SIMD registers wrong.
Maybe it should be stated:: There is one register file of k-bits per
register (where k = 32, 64, or 128) and that there is no distinction between
what kind of data can go in what register.
Yeah.
I can note that my ISA seems to be one of the few that uses GPRs for SIMD.
But, I have also noted that 256 and 512 bit vectors seem to be solidly in "diminishing returns" territory and also even higher end processors can struggle to deal with them effectively.
Well, and in cases where AVX256 is used, it seems to be more useful for doing memcpy in bigger chunks than for actually doing SIMD with 256-bit vectors.
If taking the approach of, say: "Vectors are 64 or 128 bit, and call it done", then putting SIMD in GPR space (and using register pairs for 128-bit vectors) is fairly viable.
Well, more so when most of the 128-bit logic is being handled as "ganged up" 64-bit logic (so, little additional cost as the relevant paths had existed already for sake of 2-wide superscalar).
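As a rough illustration of the "ganged up 64-bit logic" idea (a sketch in C, not how my core or any particular ISA actually implements it): a 128-bit vector is modeled as a pair of 64-bit registers, and a packed 4x32-bit add is just the same 2x32-bit operation applied to each half. The names vec128_t, padd32x2 and padd32x4 are made up for the example.

```c
#include <assert.h>  /* for assert() in quick checks */
#include <stdint.h>

/* Hypothetical model: a 128-bit vector held in a pair of 64-bit GPRs. */
typedef struct { uint64_t lo, hi; } vec128_t;

/* Add two packed 32-bit lanes held in one 64-bit register.
 * The top bit of each lane is handled separately so that no carry
 * crosses from the low lane into the high lane. */
static uint64_t padd32x2(uint64_t a, uint64_t b)
{
    const uint64_t H = UINT64_C(0x8000000080000000); /* top bit of each lane */
    uint64_t sum = (a & ~H) + (b & ~H);  /* add the low 31 bits of each lane */
    return sum ^ ((a ^ b) & H);          /* fold in the top bits, no carry-out */
}

/* 128-bit packed add: the same 64-bit operation applied to both halves. */
static vec128_t padd32x4(vec128_t a, vec128_t b)
{
    vec128_t r;
    r.lo = padd32x2(a.lo, b.lo);
    r.hi = padd32x2(a.hi, b.hi);
    return r;
}
```

In hardware the two halves would go down the two existing 64-bit lanes in parallel; the C version only shows that nothing in the operation needs a dedicated 128-bit register file.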
Though, there are tradeoffs. For example, SPRs can be, by definition,
not the same as GPRs. Say, if you have an SP or LR, almost by
definition, you will not be using it as a GPR.
Disagree:: One uses the SP as a base register "all the time",
one uses LR as a JMP source "every subroutine return".
Either is generally done using GPRs, and thus the problem
is to guarantee that you don't have so many of them that
you can't use them naturally in your ISA.
I am not saying they can't be in GPR space or GPR-like in some ways, just that they are not proper GPRs.
Say, both RISC-V and XG3 have:
R0/X0: ZR (Zero)
R1/X1: LR / RA (Link Register)
R2/X2: SP (Stack Pointer)
R3/X3: GP (Global Pointer)
They are mostly used like GPRs, but as far as I am concerned, none of these is a GPR proper.
Ironically, the main exception to this is X4/TP, as while it has an assigned role, it is one of the few registers in this group that, in fact, doesn't need special consideration here. (Decided to leave out going onto a tangent about how glibc seems to use this register though).
SP: Often parts of the ISA will assume that SP is where SP is.
Even if the surface-level ISA is fairly agnostic to which register is SP, things like interrupt handler mechanisms, etc, are almost invariably going to need to interact with SP in ways that differ from other GPRs.
LR: Functionally, in most ways the same as a GPR, but is assigned a special role and is assumed to have that role. Pretty much no one uses it as a base register though, with the partial exception of potential JALR wonk.
JALR X0, X1, 16 //not technically disallowed...
If one uses the 'C' extension, assumptions about LR and SP are pretty solidly baked into the ISA design.
ZR: Always reads as 0, assignments are ignored; this behavior is very un-GPR-like.
GP: Similar situation to LR, as it mostly looks like a GPR.
In my CPU core and JX2VM, the high bits of GP were aliased to FPSR, so saving/restoring GP will also implicitly save/restore the dynamic rounding mode and similar (as opposed to proper RISC-V which has this stuff in a CSR).
Though, this isn't practically too much different from using the HOB's of captured LR values to hold the CPU ISA mode and similar (which my newer X3VM retains, though I am still on the fence about the "put FPSR bits into HOBs of GP" thing).
This does mean that either the dynamic rounding mode is lost every time a GP reload is done (though, only for the callee), or that setting the rounding mode also needs to update the corresponding PBO GP pointer (which would effectively make it semi-global but tied to each PE image).
The traditional assumption though was that dynamic rounding mode is fully global, and I had been trying to make it dynamically scoped.
So, it may be that having FPSR as its own thing, and then explicitly saving/restoring FPSR in functions that modify the rounding mode, may be a better option.
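To make the GP/FPSR aliasing tradeoff above a bit more concrete, here is a minimal sketch in C. The layout is an assumption made up for the example (low 48 bits are the address, high 16 bits are FPSR-like flags); the actual bit assignment is not stated here, and the helper names are hypothetical.

```c
#include <assert.h>  /* for assert() in quick checks */
#include <stdint.h>

/* ASSUMED layout, for illustration only:
 * bits 0..47 = global pointer address, bits 48..63 = FPSR-like flags. */
#define GP_ADDR_BITS 48
#define GP_ADDR_MASK ((UINT64_C(1) << GP_ADDR_BITS) - 1)

static uint64_t gp_pack(uint64_t addr, uint16_t fpsr)
{
    return (addr & GP_ADDR_MASK) | ((uint64_t)fpsr << GP_ADDR_BITS);
}

static uint64_t gp_addr(uint64_t gp) { return gp & GP_ADDR_MASK; }
static uint16_t gp_fpsr(uint64_t gp) { return (uint16_t)(gp >> GP_ADDR_BITS); }

/* Plain reload: GP is overwritten wholesale, so whatever rounding-mode
 * bits the caller had set are replaced by those in the stored value. */
static uint64_t gp_reload(uint64_t current_gp, uint64_t stored_gp)
{
    (void)current_gp;
    return stored_gp;
}

/* Alternative: take only the address from the stored value and carry
 * the current flag bits across the reload. */
static uint64_t gp_reload_keep_fpsr(uint64_t current_gp, uint64_t stored_gp)
{
    return gp_pack(gp_addr(stored_gp), gp_fpsr(current_gp));
}
```

The plain reload is the "rounding mode is lost" case; the second variant costs extra masking on every reload, which is part of why a separate FPSR with explicit save/restore may end up being the simpler option.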
Either way, still better than the situation with SH-4 where, unless the code sequence is written specially in ASM or similar, any attempt to use the rounding mode would be effectively useless (compiler would effectively end up needing to switch between Single and Double, *).
*: Though, this did mutate into the "keep everything in Double internally and merely convert to/from Single on Load/Store" strategy, which BGBCC still uses (including for RV64, where it is arguably not the most efficient strategy). Though, in C type rules, every time you encounter a floating-point constant without an explicit 'f' suffix or similar, you need to promote to Double, so ironically, in a "following C's type rules" implementation, converting everything to Double on load actually results in fewer type conversions than "widen to Double only to immediately narrow back to Float" (though, GCC seems to often quietly ignore this rule).
Though, it goes further, at least on SH-4, GCC seemed to use the final precision as the operating precision, so seemingly if preceding calculations were double but then resulted in a float result, it would back-propagate the 'float' such that everything was done as float (even when the variables were declared as double). Probably justifiable for SH-4 though as its FPU design was "kinda crap" (though, still, nominally better than x87).
There were scenarios where the lack of intermediate precision could visibly affect results though (and needing the function to return the value as double to not squash all the intermediate math down to float was kinda weak).
Though, I suspect leaving all 'float' values as 'double' internally (unless the address of the variable is taken or similar), is arguably the better option if one is concerned with precision.
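A small case where the intermediate precision visibly matters, as a sketch (assuming IEEE-754 binary32/binary64 and the default round-to-nearest-even mode; the function names are made up for the example). The first function rounds to float after every step, the second keeps the intermediates as double and narrows once at the end.

```c
#include <assert.h>  /* for assert() in quick checks */

/* Sum three floats, rounding to float after every step. */
static float sum3_float_steps(float a, float b, float c)
{
    volatile float t = a + b;  /* volatile forces a real narrowing to float */
    t = t + c;
    return t;
}

/* Same sum, but intermediates are kept as double and the result
 * is narrowed to float only once, at the end. */
static float sum3_double_steps(float a, float b, float c)
{
    double t = (double)a + (double)b;
    t = t + (double)c;
    return (float)t;
}
```

With a = 16777216.0f (2^24) and b = c = 1.0f, the float-steps version gives 16777216, since each +1 lands exactly halfway between two representable floats and rounds back down to even, while the double-steps version gives 16777218.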
Though, OTOH, Quake has stuff like:
typedef float vec3_t[3];
vec3_t v0, v1, v2;
...
VectorCopy(v0, v1);
Where VectorCopy is a macro that expands it out to something like, IIRC,
do { v1[0]=v0[0]; v1[1]=v0[1]; v1[2]=v0[2]; } while(0);
Where BGBCC will naively load each value, widen it to double, narrow it back to float, and store the result.
Could special-case it, maybe do the "actually semi-efficient" thing of turning it into LWU+SW or similar on RV.
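At the C level, the special case would amount to something like the following sketch (not BGBCC's actual output; vec3_copy_bits is a made-up name, and it assumes float is 32 bits): each element is moved as a raw 32-bit word, so no float/double conversion happens at all, which is what a 32-bit integer load plus SW would do on RV.

```c
#include <assert.h>  /* for assert() in quick checks */
#include <stdint.h>
#include <string.h>

typedef float vec3_t[3];

/* Assumption for this sketch: float is a 32-bit type. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy a vec3_t element by element as raw 32-bit words.
 * The values never pass through a floating-point conversion,
 * so the bit patterns are preserved exactly. */
static void vec3_copy_bits(const vec3_t src, vec3_t dst)
{
    for (int i = 0; i < 3; i++) {
        uint32_t bits;
        memcpy(&bits, &src[i], sizeof bits);   /* 32-bit load  */
        memcpy(&dst[i], &bits, sizeof bits);   /* 32-bit store */
    }
}
```

For ordinary values the widen/narrow round trip gives the same result, since float to double and back is exact; the difference is just the wasted conversions.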
But, mostly working OK despite the FPU code that BGBCC generates for the RISC-V target being "kinda crap" (partly because BGBCC doesn't currently understand the distinction between GPRs and FPRs as it exists in RV64G; and "type register affinity" is still kinda weak).
Where, say, the compiler doesn't so much understand a hard split between integer and FPU registers but more "preferentially allocate integer registers here, and floating point values there" (with a sort of soft partition in the register allocation).
And, if a value is on the wrong side of the partition, it just sorta jostles it back and forth using "stomp registers" (X5/X6/X7 and F0..F3).
Though, may allocate FPU values on the integer side, but needs to disallow putting any integer or pointer type values on the FPR side.
Still somehow managing to be reasonably competitive with GCC despite the code generation for RV64 being kinda crap.
This is less of an issue for XG3 though, as XG3 is back to being a unified register space (though may still matter in cases where it tries to leverage RV64 encodings).
So, if ZR/LR/SP/GP are "not GPR", this is fine.
Pretty much everything else is best served by being a GPR or suitably
GPR like.
>
....