On 8/27/2024 2:59 PM, John Dallman wrote:
In article <vajo7i$2s028$1@dont-email.me>, tkoenig@netcologne.de (Thomas
Koenig) wrote:
Just read that some architects are leaving Intel and doing their own
startup, apparently aiming to develop RISC-V cores of all things.
They're presumably intending to develop high-performance cores, since
they have substantial experience in doing that for x86-64. The question
is if demand for those will develop.
Making RISC-V "not suck" in terms of performance will probably at least be easier than making x86-64 "not suck".
Android is apparently waiting for a new RISC-V instruction set extension;
you can run various Linuxes, but I have not heard about anyone wanting to
do so on a large scale.
My list of "major missing features" is still:
Needs register-indexed load;
Needs an intermediate size constant load (such as 17-bit sign extended) in a 32-bit op.
There is a sizeable chunk of constants between 12 and 17 bits, but not quite as many between 17 and 32 (and 32-64 bits is comparably infrequent).
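To put rough numbers on this, here is a minimal sketch (Python, purely illustrative) of how many instructions a constant load takes with the base encodings versus with a hypothetical 17-bit sign-extended immediate op. The function names and the 17-bit op itself are assumptions for illustration, not an existing extension; on RV64 the two-op case would be LUI+ADDIW rather than LUI+ADDI.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement field."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi

def ops_base(value):
    """Instruction count for loading a constant with base RISC-V encodings."""
    if fits_signed(value, 12):
        return 1                      # ADDI rd, x0, imm12
    if fits_signed(value, 32):
        # LUI alone is enough when the low 12 bits are zero,
        # otherwise LUI followed by ADDI (ADDIW on RV64).
        return 1 if (value & 0xFFF) == 0 else 2
    return None                       # wider constants: longer sequence or memory load

def ops_with_imm17(value):
    """Same count, assuming a hypothetical single op with a 17-bit signed immediate."""
    if fits_signed(value, 17):
        return 1
    return ops_base(value)

if __name__ == "__main__":
    for c in (100, 2048, 40000, -65536, 65537, 0x12345678):
        print(c, ops_base(c), ops_with_imm17(c))
```

Under these assumptions, everything in the 12 to 17 bit range drops from two ops to one, which is where the claimed win would come from.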
I could also make a case for an instruction to load a Binary16 value and convert to Binary32 or Binary64 in an FPR, but this is arguably a bit niche (but, would still beat out using a memory load).
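For reference, a rough sketch of what such an instruction would need to compute: widening Binary16 bits to Binary32 bits. This is plain IEEE 754 widening written out in Python; the function names are made up for illustration and this is not taken from any particular core.

```python
import struct

def half_bits_to_single_bits(h):
    """Widen a 16-bit Binary16 pattern to a 32-bit Binary32 pattern."""
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0x1F:                       # Inf or NaN: max exponent, keep payload
        return (sign << 31) | (0xFF << 23) | (frac << 13)
    if exp == 0:
        if frac == 0:                     # signed zero
            return sign << 31
        # Subnormal half: normalize, since it is a normal single.
        e = 127 - 15 + 1
        while not (frac & 0x400):
            frac <<= 1
            e -= 1
        frac &= 0x3FF                     # drop the now-implicit leading 1
        return (sign << 31) | (e << 23) | (frac << 13)
    # Normal number: rebias exponent (127 - 15 = 112), widen the fraction.
    return (sign << 31) | ((exp + 112) << 23) | (frac << 13)

def half_to_float(h):
    """Convenience wrapper: Binary16 bits to a Python float."""
    bits = half_bits_to_single_bits(h)
    return struct.unpack('<f', struct.pack('<I', bits))[0]

if __name__ == "__main__":
    print(hex(half_bits_to_single_bits(0x3C00)))   # 1.0 as Binary32 bits
    print(half_to_float(0xC000))                   # -2.0
```

Since the widening is exact (no rounding), the hardware cost is mostly a rebias and a shift, plus the subnormal case if one bothers to handle it.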
The big annoying thing is that, to have any hope of adoption, one needs an "actually involved" party to add it. There doesn't seem to be any sort of aggregated list of "known in-use" opcodes, or any real mechanism for "informal" extensions.
The closest we have on the latter point is the "Composable Extensions" extension by Jan Gray, which seems to mostly amount to letting part of the ISA's encoding space be banked out based on a CSR or similar.
I had considered something similar as part of the F3 block in my ISA, but have not implemented it yet. My idea would differ somewhat in that the F3 block would be divided into multiple smaller instruction blocks, rather than a single large block, with extensions mostly identified by FOURCCs.
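As a loose illustration of the FOURCC part only (nothing here is implemented, and the slot layout and extension tags below are entirely made up), identifying the extension bound to each sub-block could look something like:

```python
def fourcc(tag):
    """Pack a 4-character ASCII tag into a 32-bit little-endian integer."""
    if len(tag) != 4 or not tag.isascii():
        raise ValueError("FOURCC must be exactly 4 ASCII characters")
    return int.from_bytes(tag.encode('ascii'), 'little')

def fourcc_name(code):
    """Unpack a 32-bit FOURCC back into its 4-character tag."""
    return code.to_bytes(4, 'little').decode('ascii')

# Hypothetical: each small instruction sub-block is bound to one extension ID.
# The tags "FP16" and "SIMD" are placeholders, not real extension names.
slots = {
    0: fourcc("FP16"),
    1: fourcc("SIMD"),
}

def slot_for(tag):
    """Return the sub-block index bound to an extension, or None if absent."""
    code = fourcc(tag)
    for index, bound in slots.items():
        if bound == code:
            return index
    return None

if __name__ == "__main__":
    print(hex(fourcc("FP16")), slot_for("SIMD"), slot_for("XYZW"))
```

The appeal of a FOURCC over a small integer ID is mostly that independent parties can pick tags without a central registry, at the cost of possible collisions.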
Though, bigger immediate values and register-indexed loads do arguably better belong in the base ISA encoding space.
Also, for bigger immediate values, I more prefer the "jumbo prefix" approach that I had used, over the "use larger encodings and define a bunch of all-new instructions" approach that apparently Qualcomm is using/considering (well, along with apparently dropping the C extension and reusing the 16-bit encoding space for more 32-bit ops).
Well, along with people arguing over whether or not "that ship had sailed" and/or whether Qualcomm could justify a break in binary compatibility with programs compiled for SiFive CPUs, ...
Meanwhile, for the RV32IMC use case, one almost may as well just do an unlicensed Cortex-M clone or similar, as Thumb2 still tends to win in terms of code density and performance, and any patents on Thumb2 should (in theory) already be expired (would mostly just need to name it something different so as to not infringe on ARM's trademarks).
Well, say, doing an off-brand Thumb2 and calling it "Finger" or something (or a whole off-brand ARM11 clone and calling it "Hand", and then maybe running Raspbian binaries or similar).
At present, I am still on the fence about whether or not to support the C extension in RISC-V mode in the BJX2 Core, mostly because the encoding scheme just sucks bad enough that I don't really want to deal with it.
I had considered a mode that would use the 2 low-order bits for things like WEX hinting, but ended up not doing this and instead just implemented in-order superscalar for RISC-V, though as of yet not for BJX2.
Though, there is a non-zero possibility that I might consider adding a "NOWEX" ISA variant that would effectively drop WEX in favor of more opcode space; and instead switch over to in-order superscalar. It likely wouldn't be that much different for BJX2 than for RISC-V, and arguably could have the "merit" of in theory allowing full-performance binary compatibility between 2-wide and 3-wide cores.
Though, TBD how well it would work out in practice.
I guess it is possible that I could drop/replace XG2RV Mode, as thus far XG2RV Mode is "kinda useless" and I would be better off just faking it in the assembler (using normal XG2 Mode).
Realistically, can't likely expect anyone else to adopt BJX2 though.
Also, progress has been slow, as at this point there seemingly is not much obvious to do other than debugging and misc stuff (like trying to figure out a seemingly elusive bug where RISC-V mode + virtual memory isn't working in the Verilog implementation).
Went off and worked on font stuff and a new image format for TestKern (a small Rice-coded JPEG-like format optimized for smaller code size), mostly as I got burnt out on trying and failing to figure out the cause of the bug. Seems like it may be a behavioral bug, as it seems to be resistant both to changing/recompiling binaries and to enabling/disabling features that would affect timing. I have not yet found anything that seems to cause the behavior to change.
It seems to behave more like a corrupted register or memory bug (and mostly seems to cause the program to try to branch to a bad address).
...
It is almost a question of whether I might be better off just doing a "purely" RISC-V core, or maybe a modified BJX2 Core that better mimics SiFive's behavior for privileged operation in RISC-V mode (the CLINT mechanism, hardware page walk, RISC-V CSRs, ...) to potentially allow running a RISC-V Linux kernel (and maybe figure out a way to allow them to coexist with the kernel existing in RISC-V mode).
Though, the bigger issue might be how to make it able to access hardware devices (seems like part of the physical address space is used as a PCI config space, and I would need to figure out what sorts of devices the Linux kernel expects to be there in such a scenario).
Well, and then there is the issue that, if the CPU can still run BJX2 code, how to deal with the issue that more GPRs exist in BJX2 mode than exist in RISC-V mode (things will get wrecked if the OS scheduler is running in RISC-V mode and doesn't save/restore all of the registers).
...
John