On 3/7/2025 11:34 AM, MitchAlsup1 wrote:
On Fri, 7 Mar 2025 11:08:56 +0000, BGB wrote:
On 3/6/2025 10:09 PM, Lawrence D'Oliveiro wrote:
----------------
So, writing things like:
y[55:48]=x[19:12];
2 instructions in My 66000. One extract, one insert.
1 instruction in this case...
The 3 sub-fields being 36, 48, and 56 (shift, low bit, and high bit + 1).
The way I defined things does mean adding 1 to the high bit in the encoding, so 63:56 would be expressed as 64:56, which nominally needs 1 more bit of range. Though, if expressed in 6 bits, the way I defined the behavior effectively makes it modulo.
Though, ironically, while I could have eliminated the "off by 1" in the high index encoding, this would also implicitly disallow expressing modulo (wraparound) bitfields. So, it seemed like a reasonable tradeoff (even if wraparound bitfields aren't really a thing).
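A minimal C model of the BITMOV semantics described above may help make the encoding concrete. It assumes the shift is applied first (positive = left, negative = right), that `hi` holds the high bit plus one, and that the 6-bit fields wrap modulo 64, so `hi == 0` means 64 and `hi <= lo` selects a wrapped range. The function names are hypothetical; this is a sketch of the described behavior, not the actual hardware definition.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits [lo, hi) modulo 64, where hi is "high bit + 1".
   Assumed semantics: if hi > lo the range is contiguous; otherwise it
   wraps (bits lo..63 plus bits 0..hi-1), so hi == 0 encodes 64 and
   hi == lo selects all 64 bits. */
static uint64_t bitmask_mod64(unsigned lo, unsigned hi)
{
    lo &= 63;
    hi &= 63;
    uint64_t from_lo  = ~0ULL << lo;                    /* bits lo..63  */
    uint64_t below_hi = hi ? (~0ULL >> (64 - hi)) : 0;  /* bits 0..hi-1 */
    return (hi > lo) ? (from_lo & below_hi) : (from_lo | below_hi);
}

/* Hypothetical BITMOV model: shift rs, then insert the masked bits
   into rd while keeping rd's other bits. */
static uint64_t bitmov(uint64_t rd, uint64_t rs, int shift,
                       unsigned lo, unsigned hi)
{
    assert(shift > -64 && shift < 64);
    uint64_t v = (shift >= 0) ? (rs << shift) : (rs >> -shift);
    uint64_t m = bitmask_mod64(lo, hi);
    return (rd & ~m) | (v & m);
}
```

Under these assumptions, y[55:48]=x[19:12] is `bitmov(y, x, 36, 48, 56)`, and y[63:56]=x[7:0] would use a high field of 64, which the 6-bit field stores as 0: `bitmov(y, x, 56, 56, 0)`. Passing a zero register as `rd` with a negative shift gives a plain extract.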
In the layout for my ISA, I had used 8,8,8 because I had the bits, and this allows the same layout for both 64 and 128 bit operations. Though, 8,7,7 would have also worked.
For RV+Jx, there weren't quite enough bits, so it instead used 8,6,6,1; for the 64-bit case it functions as a modulo ring. For the 128-bit case there is some wonk:
Hi>=Lo: the extra bit is added to the high bit of both;
Else: the extra bit is added to the high bit of Hi only.
This leaves modulo wraparound as N/E for 128 bits, but wraparound is likely to be a rare edge case.
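A literal transcription of the two-line rule above, assuming "added to the high bit" means the extra bit becomes bit 6 of each 7-bit index. The type and function names are hypothetical, and the actual decoder may treat edge cases differently (for example, a field ending at bit 127, whose high index would be 128).

```c
#include <assert.h>

/* Hypothetical decoded range for a 128-bit op: bits [lo, hi), hi = high bit + 1. */
typedef struct { unsigned lo, hi; } BitRange7;

static BitRange7 decode_128(unsigned lo6, unsigned hi6, unsigned ext)
{
    assert(lo6 < 64 && hi6 < 64 && ext < 2);
    BitRange7 r;
    if (hi6 >= lo6) {
        /* Hi>=Lo: extend both indices, so the field sits in one 64-bit half */
        r.lo = lo6 | (ext << 6);
        r.hi = hi6 | (ext << 6);
    } else {
        /* Else: extend only the high index, so a field can straddle bit 64;
           a wrapped range starting in the upper half cannot be expressed */
        r.lo = lo6;
        r.hi = hi6 | (ext << 6);
    }
    return r;
}
```

For example, under this reading a field 119:112 is encoded as lo6=48, hi6=56, ext=1, while a field 71:56 that crosses bit 64 is encoded as lo6=56, hi6=8, ext=1.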
Within BGBCC, the IR stage internally uses 16,16,16, but mostly because this allows for potentially large _BitInt vectors.
Though, bit vectors larger than 128 bits, or bitfields larger than 64 bits, won't be directly supported at the ISA level (and will need to be faked by the compiler).
But, in Verilog, it is not uncommon to go well outside of the usual 64-bit limit that C typically imposes.
But, "32kb" in a single bit-vector should probably be enough for anyone...
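As a rough illustration of what "faked by the compiler" could look like, assume a wide _BitInt is stored as an array of 64-bit words with word 0 holding bits 0..63. Extracting a field of up to 64 bits may then need to combine two adjacent words. The function name and storage layout here are assumptions, not BGBCC's actual lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Extract v[hi:lo] (width 1..64) from a multiword bit vector, where
   word 0 holds bits 0..63, word 1 holds bits 64..127, and so on. */
static uint64_t bv_extract(const uint64_t *v, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi - lo < 64);
    unsigned width = hi - lo + 1;
    unsigned w = lo / 64, b = lo % 64;
    uint64_t r = v[w] >> b;
    if (b != 0 && b + width > 64)       /* field spills into the next word */
        r |= v[w + 1] << (64 - b);
    if (width < 64)
        r &= (1ULL << width) - 1;       /* drop bits above the field */
    return r;
}
```

The second word is only read when the field actually crosses a word boundary, which is the case where a native 64-bit extract would not suffice.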
And:
j=x[19:12];
Also a single instruction, or 2 or 3 in the fallback case (encoded as a shift and mask).
1 instruction--extract (SLL or SLA)
BITMOV, parameters being -12, 0, 8.
Though, at present, for XG1 and XG2 this does require a register holding 0 as an input.
Fallback cases:
SHAD.Q Rm, -12, Rt
AND Rt, 255, Rd
And, for RISC-V:
SRLI Rt, Rm, 12
ANDI Rd, Rt, 255
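The two fallback sequences differ in that SHAD.Q is presumably an arithmetic shift while SRLI is logical. A small C check, assuming that reading of SHAD.Q, shows why this does not matter here: a right shift by 12 only changes bits 52..63, and the AND with 255 keeps only bits 0..7. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift on an unsigned value, without relying on
   implementation-defined signed shifts: the sign bit is copied into
   the vacated high bits. Requires n < 64. */
static uint64_t sar64(uint64_t x, unsigned n)
{
    assert(n < 64);
    uint64_t r = x >> n;
    if (x >> 63)
        r |= ~(~0ULL >> n);
    return r;
}

/* j = x[19:12] via each fallback sequence. */
static uint64_t extract_shad_and(uint64_t x)  { return sar64(x, 12) & 255; } /* SHAD.Q + AND */
static uint64_t extract_srli_andi(uint64_t x) { return (x >> 12) & 255; }    /* SRLI + ANDI  */
```

The same reasoning holds for any field whose top bit stays below the region filled by the shift, which is why either shift flavor works for a shift-and-mask extract.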
----------------------
For a simple test:
lj[ 7: 0]=li[31:24];
lj[15: 8]=li[23:16];
lj[23:16]=li[15: 8];
lj[31:24]=li[ 7: 0];
This does seem to compile down to 4 instructions.
1 instruction: BITR rd,rs1,<8>
In this particular case, there is also a SWAP.L instruction, but I was ignoring it for the sake of this example, and my compiler isn't that clever.
It was already an ugly chunk of code to get it to recognize the above cases as single-instruction encodings.
I went the route of adding Verilog-style syntax because, for now, inferring all this from shifts and masks would be asking too much of my compiler...
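For comparison, the four assignments in the test move each byte of a 32-bit value to its mirrored position, i.e. a 32-bit byte swap. Written in plain C with shifts and masks (the form a compiler would otherwise have to pattern-match), it might look like the sketch below; the function name is only for illustration and a 32-bit `lj` is assumed.

```c
#include <stdint.h>

/* Equivalent of:
     lj[ 7: 0]=li[31:24];  lj[15: 8]=li[23:16];
     lj[23:16]=li[15: 8];  lj[31:24]=li[ 7: 0];  */
static uint32_t swap_bytes32(uint32_t li)
{
    uint32_t lj = 0;
    lj |= (li >> 24) & 0x000000FFu;   /* li[31:24] -> lj[ 7: 0] */
    lj |= (li >>  8) & 0x0000FF00u;   /* li[23:16] -> lj[15: 8] */
    lj |= (li <<  8) & 0x00FF0000u;   /* li[15: 8] -> lj[23:16] */
    lj |= (li << 24) & 0xFF000000u;   /* li[ 7: 0] -> lj[31:24] */
    return lj;
}
```

Each line is one shift plus one mask, so recognizing this as a single swap or bit-reverse instruction means matching all four pieces together.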
Though, looking at the compiler code, it would be subject to the "side effects in lvalue may be applied twice" bug:
(*ct++)[19:12]=(*cs++)[15:8];
5 instructions: LD, LD, EXT, INS, ST; with deferred ADDs to Rcs and Rct.
Ideally, one would have:
LD, LD, BITMOV, ST, ADD, ADD
However, because of how the logic is structured in my compiler, in this case it would instead produce:
LD, ADD, LD, ADD, BITMOV, ST, ADD
In the past, for operators like "+=" and similar, there was a workaround, effectively detecting the non-clean expression and compiling it as:
tmp=&(*ct++);
*tmp=*tmp+val;
But I haven't yet added this special case for the bit handling.
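Applied to the bitfield case, the same workaround amounts to evaluating each pointer expression exactly once before doing the read-modify-write. A hand-written C equivalent of `(*ct++)[19:12]=(*cs++)[15:8];` under that lowering might look like this sketch (32-bit elements and the function name are assumptions); it roughly corresponds to the LD, LD, BITMOV, ST, ADD, ADD sequence rather than the one with the extra ADD.

```c
#include <stdint.h>

/* (*ct++)[19:12] = (*cs++)[15:8];  with each side effect applied once. */
static void copy_field_once(uint32_t **ct, uint32_t **cs)
{
    uint32_t *dst = (*ct)++;                      /* like tmp=&(*ct++); one increment */
    uint32_t  src = *(*cs)++;                     /* one increment of cs */
    uint32_t  f   = (src >> 8) & 0xFFu;           /* src[15:8] */
    *dst = (*dst & ~(0xFFu << 12)) | (f << 12);   /* dst[19:12] = f */
}
```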
It is a pesky issue, that reappears readily...
I am debating whether it is worthwhile to deal with this case, or if I should just be lazy and do the "detect and issue a warning or error" thing instead (the proper workaround is fairly awkward, and this is likely a fairly obscure edge case).
I mostly added this for sake of possible Verilog support, and this is an edge case that can't happen in Verilog.
Unlike Verilog, C mode currently requires a single-bit fetch to use a notation like x[17:17], mostly because in C a person is much more likely to type "x[17]" by accident (such as by using the wrong variable, a missing '*', or ...).
I am considering adding something to infer bitfield extract from "((x>>SHR)&MASK)".
This will likely need a different sub-operator though, mostly because the type of the original expression would likely be "int" or similar, while at present the bit-extract operator gives a type of '_BitInt(N)', which is not quite the same.
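A sketch of the check such an inference might perform: `((x>>SHR)&MASK)` is a plain bitfield extract x[SHR+n-1:SHR] when MASK is a contiguous low mask 2^n-1 and the field fits within 64 bits. The struct and function names are hypothetical, and rejecting fields that would run past bit 63 is a simplifying choice made here.

```c
#include <stdint.h>

typedef struct { int ok; unsigned hi, lo; } ExtractMatch;  /* hypothetical */

/* Try to match ((x >> shr) & mask) as x[hi:lo]. */
static ExtractMatch match_shift_mask(unsigned shr, uint64_t mask)
{
    ExtractMatch m = {0, 0, 0};
    /* mask must be 2^n - 1 with n >= 1, i.e. all ones in the low bits */
    if (mask == 0 || (mask & (mask + 1)) != 0)
        return m;
    unsigned n = 0;
    for (uint64_t t = mask; t != 0; t >>= 1)
        n++;
    if (shr >= 64 || shr + n > 64)   /* field would run past bit 63: not handled */
        return m;
    m.ok = 1;
    m.lo = shr;
    m.hi = shr + n - 1;
    return m;
}
```

For example, `(x>>12)&255` matches as x[19:12], while a non-contiguous or shifted mask such as `0xF0` is left alone.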
One uncertainty at present is whether register declarations with a non-zero low bit are actually a thing:
reg[23: 8] wackyVar1;
reg[11:-4] wackyVar2;
It would simplify things a fair bit if I can just pretend these cases do not exist... (then everything can mostly just decay into the existing _BitInt type internally).
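If such declarations do turn out to need support, one way they could still decay into the existing _BitInt storage is to store the vector starting at storage bit 0 and subtract the declared low index on every access. A minimal sketch, assuming descending [msb:lsb] ranges no wider than 64 bits; the `VecDecl` type and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A vector declared as reg[msb:lsb]; bit 'lsb' lives at storage bit 0. */
typedef struct { uint64_t bits; int msb, lsb; } VecDecl;

static unsigned vec_get(const VecDecl *v, int i)
{
    assert(v->msb >= v->lsb && v->msb - v->lsb < 64);
    assert(i >= v->lsb && i <= v->msb);
    return (unsigned)((v->bits >> (i - v->lsb)) & 1u);
}

static void vec_set(VecDecl *v, int i, unsigned b)
{
    assert(i >= v->lsb && i <= v->msb);
    uint64_t m = 1ULL << (i - v->lsb);
    v->bits = b ? (v->bits | m) : (v->bits & ~m);
}
```

With this rebasing, wackyVar1 (reg[23:8]) keeps bit 8 in storage bit 0, and wackyVar2 (reg[11:-4]) keeps bit 0 in storage bit 4; only the index arithmetic differs from a plain [N-1:0] vector.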
Well, along with an open question of how much the compiler needs to distinguish reg/wire/input/output. Can maybe ignore "inout" as Verilator doesn't support it, and for this I mostly only care about supporting similar functionality to Verilator (so, tristate logic will probably not be a thing in most cases).
Literals like "16'h4zz6" will need to remember which bits are Z; current thinking is that these will be a special literal integer-pair type, representing each bit as 2 bits, say:
00: 0, 01: 1, 10: Z, 11: X
They will matter for "casez" or similar, but in other cases will just decay into the value bits.
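A rough sketch of that literal representation as two bit planes, where each bit's 2-bit code (unk, val) is 00 = 0, 01 = 1, 10 = Z, 11 = X. It only handles the hex digits of a literal like 16'h4zz6 ('?' is accepted as a synonym for z, as in Verilog); the size prefix and width checks are left out. The casez match treats Z bits in the pattern as don't-care, and the decay step clears X and Z to 0, which is how SystemVerilog converts 4-state values to 2-state and is one reading of "just the value bits". All names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* Per bit, (unk, val): 00 = 0, 01 = 1, 10 = Z, 11 = X. */
typedef struct { uint64_t val, unk; } Lit4;

/* Parse hex digits such as "4zz6". Returns 0 on an invalid digit.
   Literals wider than 64 bits are not handled. */
static int parse_hex4(const char *s, Lit4 *out)
{
    Lit4 r = {0, 0};
    for (; *s; s++) {
        int c = tolower((unsigned char)*s);
        uint64_t v, u = 0;
        if (c == '_')
            continue;                                  /* digit separator */
        if (c >= '0' && c <= '9')      v = (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f') v = (uint64_t)(c - 'a' + 10);
        else if (c == 'z' || c == '?') { v = 0x0; u = 0xF; }
        else if (c == 'x')             { v = 0xF; u = 0xF; }
        else return 0;
        r.val = (r.val << 4) | v;
        r.unk = (r.unk << 4) | u;
    }
    *out = r;
    return 1;
}

/* casez against a 2-state expression: Z bits in the pattern are
   don't-care; an X bit can never match a 2-state value. */
static int casez_match(uint64_t expr, Lit4 pat)
{
    uint64_t z = pat.unk & ~pat.val;
    uint64_t x = pat.unk & pat.val;
    if (x != 0)
        return 0;
    return ((expr ^ pat.val) & ~z) == 0;
}

/* Decay to a plain value: X and Z become 0. */
static uint64_t decay4(Lit4 l) { return l.val & ~l.unk; }
```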
Though, the input/output distinction would be needed for module ports, so I may need something here. Could almost just repurpose signed _BitInt for output, except that (annoyingly) SystemVerilog does make a semantic distinction between signed and unsigned values.
So, at least, I may need a new subtype of _BitInt for "output". But I could maybe get by with dropping the distinction between "reg"/"wire"/"input", as was the case in older versions of Verilator.
These distinctions don't matter as much if running Verilog on a CPU.
Well, and probably also working on fixing the "fast" mode for JX2VM (emulator performance hasn't been a particularly high priority here; but would matter a bit more if using it for Verilog simulation).
Well, with the goal of hopefully making it possible to add some sort of source-level debugging on top of this stuff.
Debugging with Verilator is currently a problem, as trying to navigate through undocumented auto-generated C++ confetti in GDB isn't particularly workable.
Like, if I could just do something like (in GDB):
(break simulation via CTRL-C or similar)
print top->cpu->regGpr->gprArrA[7]
I would have much less reason to complain...
It is much less practical to dig through ~10MB of generated C++ trying to figure out how it has mangled the names and/or seemingly decomposed many of the modules and merged them all into some big mass of code.
Say, if every module mapped 1:1 to a class, and every module instance were an object instance, it would be less of an issue (and I would probably have less reason to bother).
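For illustration, a generated model with one struct type per module and one object per instance might be laid out like the sketch below, so that the expression from the GDB example is a valid path. Only the names top, cpu, regGpr and gprArrA come from that example; the struct types, the 64-bit register width, and the 64-entry array size are assumptions.

```c
#include <stdint.h>

/* Hypothetical module types: each instance is an object linked from its parent. */
typedef struct { uint64_t gprArrA[64]; } RegGprModule;
typedef struct { RegGprModule *regGpr; } CpuModule;
typedef struct { CpuModule *cpu; } TopModule;

/* Same access path a debugger could follow directly:
   print top->cpu->regGpr->gprArrA[7] */
static uint64_t read_gpr7(const TopModule *top)
{
    return top->cpu->regGpr->gprArrA[7];
}
```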
...