On 9/17/2024 3:11 PM, MitchAlsup1 wrote:
On Tue, 17 Sep 2024 19:51:19 +0000, BGB wrote:
On 9/17/2024 4:39 AM, David Brown wrote:
On 16/09/2024 21:46, BGB wrote:
On 9/16/2024 4:27 AM, David Brown wrote:
>
Albeit, types like _Bool in my implementation are padded to a full
byte (it is treated as an "unsigned char" that is assumed to always
hold either 0 or 1).
>
That's the usual way to handle them.
Smallest C container is 1 byte
__BOOL can use as small a container as C can address
It works, and is typical.
But, does kinda seem like a waste sometimes to burn 8 bits to hold a 1 bit value.
Then again, it wastes even more space in function argument lists (a full 64 bits), though one doesn't usually care too much about the size of function arguments (or the bits "wasted" in the upper unused parts of GPRs).
>
>
Another option would be for adjacent _Bool values to merge similar to
bitfields...
Though, seems that simply turning it into a byte is the typical option.
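As a rough illustration of the trade-off (a sketch, not a description of any particular compiler's layout): the two struct names below are made up for the example, and the exact sizes are implementation-defined, though typical ABIs give 8 bytes for the first and one storage unit for the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Eight flags, each in its own byte-sized _Bool on typical ABIs. */
struct flags_bytes {
    bool f0, f1, f2, f3, f4, f5, f6, f7;
};

/* The same eight flags packed as 1-bit bitfields (hypothetical layout;
 * the size of the containing storage unit is implementation-defined). */
struct flags_bits {
    unsigned f0 : 1, f1 : 1, f2 : 1, f3 : 1,
             f4 : 1, f5 : 1, f6 : 1, f7 : 1;
};

/* Assumption for this sketch: the packed form is the smaller one. */
static_assert(sizeof(struct flags_bits) < sizeof(struct flags_bytes),
              "expected bitfields to pack tighter than byte-sized bools");

size_t flags_bytes_size(void) { return sizeof(struct flags_bytes); }
size_t flags_bits_size(void)  { return sizeof(struct flags_bits); }
```

The packed form saves space, but each access turns into a mask-and-shift on the containing word, and the individual flags no longer have addresses of their own.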
One can do ATOMIC stuff on a __BOOL
one cannot do ATOMIC stuff on struct { unsigned __bool: 1};
Probably true.
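A small C11 sketch of the difference, assuming <stdatomic.h> is available. The names ready_flag, packed_flags and FLAG_READY are invented for the example. A byte-sized bool can be the direct target of an atomic operation; a bitfield has no address to hand to the atomic functions, so the closest equivalent is an atomic read-modify-write on the whole containing word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the target has lock-free byte-sized atomics. */
static_assert(ATOMIC_BOOL_LOCK_FREE == 2,
              "atomic_bool expected to be lock-free on this target");

static atomic_bool ready_flag;        /* one flag, one addressable object */

/* Atomically set the flag; returns the previous value. */
bool set_ready(void) {
    return atomic_exchange(&ready_flag, true);
}

/* Packed alternative: the flag is one bit of a shared word, so the
 * atomic operation has to cover the whole word. */
#define FLAG_READY 0x1u
static _Atomic unsigned packed_flags;

bool set_ready_packed(void) {
    unsigned old = atomic_fetch_or(&packed_flags, FLAG_READY);
    return (old & FLAG_READY) != 0;
}
```

Both functions return whether the flag was already set, so the first call yields false and later calls yield true.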
>
>
This comes up as an issue in some Windows file formats, where one
can't just naively use a struct with 32-bit fields because some 32-bit
members only have 16-bit alignment.
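One concrete case (my choice of example; the post does not name a format) is the 14-byte BMP file header, where the 32-bit size field sits at byte offset 2. A hedged sketch of reading it byte-wise instead of through a struct; the helper names are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive mapping: most compilers insert 2 bytes of padding after bfType,
 * so bfSize no longer lines up with offset 2 in the file. */
struct naive_bmp_header {
    uint16_t bfType;
    uint32_t bfSize;
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;
};

/* Little-endian loads that work at any alignment. */
static uint16_t rd_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* hdr points at the first 14 bytes of the file. */
int bmp_has_magic(const uint8_t *hdr) {
    assert(hdr != NULL);
    return rd_le16(hdr) == 0x4D42;      /* "BM" */
}

uint32_t bmp_file_size(const uint8_t *hdr) {
    assert(hdr != NULL);
    return rd_le32(hdr + 2);            /* 32-bit field, 16-bit aligned */
}

uint32_t bmp_pixel_offset(const uint8_t *hdr) {
    assert(hdr != NULL);
    return rd_le32(hdr + 10);
}
```

A packed-struct attribute or pragma is the other common workaround, but that is compiler-specific, and on targets without unaligned loads the compiler ends up generating much the same byte-wise code.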
>
Ah, the joys of using ancient formats with new systems!
>
>
I was around when this stuff was still newish.
>
Some are essentially frozen in time with their misaligned members.
In HW the packing and unpacking of multi-container single variables
is easy--it's just wires.
Still better than:
"Well, initial field wasn't big enough";
"Repurpose those bytes from over there, and glue them on".
Really NOT a problem in HW--understandably low efficiency in SW.
Yeah.
Bunch of shifts and ORs.
Then again, by the time one is dealing with this part (say, in a FAT directory entry), chances are they have already gone through the (more expensive) process of linearly searching the directory to find the file they are looking for.
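For the FAT case, the "glue them on" part presumably refers to the FAT32 starting cluster, which is split across two 16-bit fields in the 32-byte directory entry (high half at offset 20, low half at offset 26). A minimal sketch; the function name is made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassemble the 32-bit first-cluster number from a raw FAT32
 * directory entry: high 16 bits at offset 20, low 16 bits at 26. */
uint32_t fat32_first_cluster(const uint8_t *entry) {
    assert(entry != NULL);
    uint32_t hi = (uint32_t)entry[20] | ((uint32_t)entry[21] << 8);
    uint32_t lo = (uint32_t)entry[26] | ((uint32_t)entry[27] << 8);
    return (hi << 16) | lo;
}
```

That is four byte loads, three shifts and three ORs in software, versus a fixed wiring pattern in hardware.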
Ironically, also, FAT is a much more convoluted format than, say, ELF.
But conversely, getting ELF shared-object loading to work is proving to be a much bigger pain, mostly because seemingly no one has really bothered to document this stuff for crap (like, "what, exactly, is supposed to be the correct initial state of the registers, auxiliary vectors, ..., such that ld-linux.so and similar is happy?...").
As-is, it now seems it gets a good part of the way through the process, but then dies trying to deref a NULL pointer before getting to the last stage (after which it will apparently transfer control to the entry point for the main binary).
Seemingly, I would also expect there to be some way to supply to ld-linux.so the locations for where one put the various ELF images (besides just the main binary and the interpreter, as it appears to assume that the kernel loads them into memory, but ld-linux.so does the symbol lookups and relocs).
All this being made a little harder as I don't have any symbolic information for these images, so need to try to manually figure out where it is at based on the memory addresses (and by looking at the code for the C library).
>
There would need to be a mechanism in the ISA to select between these
modes though (probably a "magic branch" scheme different from the one
used for Inter-ISA branches).
Modes make testing significantly harder. Each mode adds 1 to the exponent of
how many test cases it takes to adequately test a part.
Possibly.
But, modes are kinda unavoidable here:
  CPU only runs RV64GC or similar:
    Doomed to relative slowness;
  CPU only does CoEx:
    Closes off the ability to run binaries that assume RV64GC.
  CPU only does new ISA:
    Well, then it can't run RISC-V code, making all this kinda moot.
This would be different from my current multi-ISA scheme, where RISC-V and BJX2 are essentially entirely separate ISAs (and jumping from RV64 to BJX2 mode currently requires using a full 64-bit pointer).
And, also the ABIs don't match.
The "new ISA" design could express both the RISC-V and BJX2 ABI;
While there are a few differences, they would matter more for a kernel mode context switcher (the new ISA lacks access to R0/R1/R14 from the existing ISA, similar to XG2RV, but this will not matter for function calls);
...
As-is, the CoEx scheme would not be possible with XG2 + RISC-V as BJX2 and RISC-V instruction encodings are entirely incompatible.
Whereas, in the new ISA case, I designed the encoding to allow for some level of compatibility with RISC-V.
Modes would also avoid one of the argued main downsides of the Qualcomm proposal. Though, apparently, from what I gather they had also added auto-increment addressing, which I am not so sold on (wasn't able to find a proper spec though for what exactly they did; have only been able to find second-hand accounts).
>
This would likely include an RV64 encoding for "Branch to/from CoEx",
and an encoding within this ISA to jump between CoEx and "Native" mode.
>
Magic branches make sense mostly as any such mode switch is going to
require a pipeline flush.
>
This is assuming an implementation that would want to be able to support
both this ISA and also RV64GC.
>
One possibility could be (in native RV notation):
  RV64 (Branches if supported, NOP if not):
    LBU X0, Xs, Disp12s  //Dest=RV64GC
    LWU X0, Xs, Disp12s  //Dest=CoEx
    LHU X0, Xs, Disp12s  //Dest=Native
  New ISA:
    LBU X0, Xs, Disp10s  //Dest=RV64GC
    LWU X0, Xs, Disp10s  //Dest=CoEx
    LHU X0, Xs, Disp10s  //Dest=Native
This only gives 36 bits (top) or 30 bits (bottom) of range. What you are
going to want is 64 bits of range -- especially when switching modes--
you PROBABLY want to use an entirely different sub-tree of the translation
table trees.
Idea here is that 'Xs' will give the base address for the target.
On the RISC-V side, this would mean, say:
  AUIPC X7, disp
  LWU   X0, X7, disp
Similar to a normal JALR.
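As a sketch of what that pair would look like as machine words: the encoders below use the standard RV64 I-type and U-type layouts (LOAD opcode 0x03 with funct3 4/5/6 for LBU/LHU/LWU, AUIPC opcode 0x17). The meaning "load to X0 = mode-switch branch" is the proposal above, not anything in the RISC-V spec, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

enum { OPC_LOAD = 0x03, OPC_AUIPC = 0x17 };
enum { F3_LBU = 4, F3_LHU = 5, F3_LWU = 6 };  /* RV64GC / Native / CoEx above */

/* Standard I-type: imm[11:0] | rs1 | funct3 | rd | opcode. */
uint32_t rv_itype(uint32_t opc, uint32_t f3, uint32_t rd,
                  uint32_t rs1, int32_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    return (((uint32_t)imm12 & 0xFFFu) << 20) | (rs1 << 15) |
           (f3 << 12) | (rd << 7) | opc;
}

/* Standard U-type: imm[31:12] | rd | opcode. */
uint32_t rv_auipc(uint32_t rd, uint32_t hi20) {
    return ((hi20 & 0xFFFFFu) << 12) | (rd << 7) | OPC_AUIPC;
}

/* Emit "AUIPC X7, hi ; LWU X0, X7, lo" for a 32-bit PC-relative offset,
 * splitting it the same way an AUIPC+JALR pair would be split. */
void rv_magic_branch_to_coex(int32_t offset, uint32_t out[2]) {
    /* Sign-extended low 12 bits. */
    int32_t lo12 = (int32_t)(((uint32_t)offset & 0xFFFu) ^ 0x800u) - 0x800;
    /* Upper 20 bits, compensated for the sign of lo12. */
    uint32_t hi20 = ((uint32_t)offset - (uint32_t)lo12) >> 12;
    out[0] = rv_auipc(7, hi20);
    out[1] = rv_itype(OPC_LOAD, F3_LWU, 0, 7, lo12);
}
```

For an offset of 0x1234 this gives the words 0x00001397 and 0x2343E003; swapping F3_LWU for F3_LBU or F3_LHU would select the other two destination modes.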
I could almost interpret X0 as PC, except that on a "standard" RISC-V CPU, the non-supported case would be, likely: "program crashes trying to access a NULL pointer", which is less useful.
Branches in the new ISA would likely be encoded using jumbo prefixes.
Well, partly because the new ISA lacks AUIPC, but the new ISA can encode it more directly as, essentially:
LWU X0, PC, Disp33s
Though, in CoEx mode, it could still borrow RISC-V's AUIPC instruction (in the "native mode", faking an AUIPC would require a 64-bit encoding).
However, using AUIPC would offer little benefit here (over a 64-bit PC-relative encoding).
TBD if the CoEx mode would allow Jumbo-prefixing RISC-V ops. In theory, could be possible, but would also probably be more hair than it is worth...
The encoding for RISC-V immediate values is already a mess, and don't necessarily want to throw jumbo-prefix logic into the mix...
Most obvious case is to combine them with Imm12 instructions, adding 21 bits from the prefix to extend them to 33 bits.
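A sketch of the arithmetic only, assuming the 21 prefix bits simply become the high bits above the 12-bit immediate and the result is sign-extended from bit 32. The actual bit placement is not specified above, so treat the layout (and the function name) as hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a 21-bit jumbo-prefix payload with a 12-bit immediate field
 * into a sign-extended 33-bit value (assumed layout: prefix = high bits). */
int64_t jumbo_imm33(uint32_t prefix21, uint32_t imm12) {
    assert(prefix21 < (1u << 21));
    assert(imm12 < (1u << 12));
    uint64_t raw  = ((uint64_t)prefix21 << 12) | imm12;   /* 33 bits */
    uint64_t sign = (uint64_t)1 << 32;
    /* Flip the sign bit, then subtract it back: classic sign-extension. */
    return (int64_t)(raw ^ sign) - (int64_t)sign;
}
```

With this layout the reachable range is -2^32 to 2^32-1, which matches the Disp33s form mentioned earlier.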
As for a possible ISA name.
Well, if I implemented it with my current stuff, possibly XG3 (for "native mode") or RVXG3 (for CoEx mode).
Would still require implementing support for it in BGBCC, and evaluating whether or not the additional instruction-decoder logic is affordable (not much would need to be added in terms of EX side logic, as the BSR4I/XG3 design doesn't really add anything that RISC-V and BJX2 don't do already...).
Well, apart from possibly making the decoder bigger.
Though, this would be more attractive for a CPU that was made to run RISC-V as the primary ISA, rather than one running both BJX2 and RISC-V, as I would expect the relative cost for the decoder would be lower than what is needed for the existing BJX2 decoder.
...