On 3/7/2025 12:03 PM, Robert Finch wrote:
In other news, got around to implementing the BITMOV logic in BGBCC and similar, and adding support for Verilog-style notation.
>
So, writing things like:
y[55:48]=x[19:12];
Is now technically possible in my C variant, and (if the BITMOV instruction is enabled) can be encoded in a single instruction.
>
Where:
y[55:48]=x[19:12];
Is 1 instruction; the main requirement is that the source and destination bitfields are the same width (widening will require multiple ops).
>
If the instruction is not enabled, the fallback path is 4 to 6 instructions (in BJX2), or around 12 to 16 instructions for RV64G. I decided to also add support for BITMOV to the RV decoder via my jumbo-extension encodings (though, with some limitations on the 128-bit case due to it needing to fit into a 21-bit immediate).
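As a rough illustration, here is what the generic shift-and-mask fallback for the above works out to in plain C (a minimal sketch; the function name is made up, and x and y are assumed to be uint64_t):

#include <stdint.h>

/* Plain-C equivalent of y[55:48]=x[19:12], done the fallback way
   (extract, clear, insert); BITMOV collapses this into one op. */
static inline uint64_t bitmov_example(uint64_t y, uint64_t x)
{
    uint64_t fld = (x >> 12) & 0xFFull;   /* extract x[19:12]       */
    y &= ~(0xFFull << 48);                /* clear y[55:48]         */
    y |= fld << 48;                       /* insert field at bit 48 */
    return y;
}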
>
And:
j=x[19:12];
Also a single instruction, or 2 or 3 in the fallback case (encoded as a shift and mask).
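The extract case is the usual shift-and-mask; a minimal plain-C sketch (assuming j and x are uint64_t):

j = (x >> 12) & 0xFF;   /* j = x[19:12]: shift down, mask to 8 bits */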
>
I find these operations handy when dealing with I/O devices that have bitfields. They make it easy to test bits, and this compiles to an extract instruction.
inbyte = *irqport;
if (inbyte[7])
<do something>
Support for bitfield indexing (Verilog style) is in the Arpl compiler. I found there were enough differences from standard C that I had better call it by a different name.
BGBCC mostly accepts standard C.
But, has some amount of potentially wacky non-standard extensions...
Though, it seems that the idea of bitfield instructions or gluing some Verilog-style functionality onto a C compiler isn't quite as crazy or wacky as it may at first seem...
Then again, it turns out even Thumb2 has bitfield instructions, so these are not super rare (rarer, though, is having an explicit syntax for this sort of stuff).
Likely, if it was "that" weird or obscure, they wouldn't have bothered adding it.
Formally though, RISC-V still lacks them.
Apparently early forms of BitManip had considered them, but they were dropped.
Like, bitfield helpers were too weird/obscure, but hard-coding parts of the CRC or stuff related to DES encryption and similar into the ISA is fine...
Like, it is kinda unbalanced:
Most "general use" features ended up being dropped, leaving most of it (beyond Zba) as just sort of a grab-bag of very niche features, most of which only being relevant to certain specific algorithms.
Like, it is a bit of a "stupid rebellion" if one ends up reviving previously dropped instructions, like ADDWU, mostly because "ADD with zero extension" is "something I actually have a use for...".
Well, and I guess because my design impetus is not to micro-optimize how quickly the CPU can run specific benchmarks (say, adding or removing instructions based on whether or not they benefit the CoreMark score...).
Granted, arguably, maybe tuning for Doom performance over-weights the relative effect of register-indexed load/store and similar, but it's not exactly like these are all that rare in non-Doom workloads.
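Going back to ADDWU for a moment: as I understand the dropped BitManip draft, it is a 32-bit add whose result is zero-extended rather than sign-extended (as ADDW would do). A minimal C sketch of that semantic (the function name is just for illustration):

#include <stdint.h>

/* ADDWU semantics as described in the early BitManip drafts (assumption):
   add the low 32 bits and zero-extend the result to 64 bits. */
static inline uint64_t addwu(uint64_t a, uint64_t b)
{
    return (uint64_t)(uint32_t)(a + b);
}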
...
For a simple test:
lj[ 7: 0]=li[31:24];
lj[15: 8]=li[23:16];
lj[23:16]=li[15: 8];
lj[31:24]=li[ 7: 0];
Does seem to compile down to 4 instructions.
>
Apart from some special-case recognition logic, it would have otherwise needed 8 or 12 instructions...
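For comparison, a rough plain-C equivalent of the four assignments above (effectively a 32-bit byte swap, assuming li and lj are uint32_t), which is roughly what the generic shift-and-mask fallback has to build:

#include <stdint.h>

/* Byte swap via shifts and masks; each term mirrors one of the
   bit-vector assignments above. */
static inline uint32_t swap32(uint32_t li)
{
    return ((li >> 24) & 0x000000FFu)    /* lj[ 7: 0]=li[31:24] */
         | ((li >>  8) & 0x0000FF00u)    /* lj[15: 8]=li[23:16] */
         | ((li <<  8) & 0x00FF0000u)    /* lj[23:16]=li[15: 8] */
         | ((li << 24) & 0xFF000000u);   /* lj[31:24]=li[ 7: 0] */
}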
Ended up adding special cases checking for things like (see the sketch below):
Both sides of the assignment are bit vectors;
The sizes match up, etc.;
The bit-vector source/dest is a normal variable (common case).
If these fail, it falls back to more generic logic (and thus generating more instructions in the final machine code).
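Roughly, the shape of that check (a hypothetical sketch, not BGBCC's actual internals; the SliceInfo type and field names are made up for illustration):

#include <stdbool.h>

/* Hypothetical sketch of the special-case test: a single BITMOV is only
   viable when both sides are bit-vector slices of plain variables and
   the slice widths match; otherwise fall back to generic codegen. */
typedef struct {
    bool is_bit_slice;       /* expression has the form var[hi:lo]    */
    bool base_is_plain_var;  /* the sliced thing is a normal variable */
    int  hi, lo;             /* slice bounds                          */
} SliceInfo;

static bool can_emit_single_bitmov(SliceInfo lhs, SliceInfo rhs)
{
    if (!lhs.is_bit_slice || !rhs.is_bit_slice)
        return false;
    if (!lhs.base_is_plain_var || !rhs.base_is_plain_var)
        return false;
    /* widths must match; widening needs multiple ops */
    return (lhs.hi - lhs.lo) == (rhs.hi - rhs.lo);
}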
Though, some of this is relevant for the "=" operator in general, which has a whole lot of special-case pattern matching assigned to it.
Well, as dealing with assignment often isn't quite as simple as:
Compile RHS expression;
Store result value into LHS expression.
And, a "simple" solution like:
Take address of LHS;
Turn it into an operation on a pointer.
Wouldn't be acceptable as a general solution in most cases.
In many cases, taking the address of something incurs penalties in other areas (it requires the object to exist in memory, and then pointer-aliasing semantics get involved, ...).
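For illustration, the pointer-based route for the earlier y[55:48]=x[19:12] example might look something like this (a sketch assuming a little-endian target and uint64_t operands, where y[55:48] is byte 6 of y; the function name is made up):

#include <stdint.h>

/* The "take the address, store through a pointer" approach.  Once &y is
   taken, y generally has to live in memory, and later accesses have to
   worry about the pointer aliasing it; this is the penalty in question. */
static uint64_t assign_via_pointer(uint64_t y, uint64_t x)
{
    uint8_t *p = (uint8_t *)&y;     /* y now needs a memory location */
    p[6] = (uint8_t)(x >> 12);      /* store x[19:12] into y[55:48]  */
    return y;
}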
...
I kinda suspect there are no good simple/elegant solutions here, and given how huge and complicated GCC is in these areas, I suspect they mostly just went down the "brute force every scenario" path.
Like, every time you try to be "simple" in one place, it often pushes the complexity somewhere else, so if anything it turns more into a game of pushing it around and seeing what needs to deal with it (programmer, compiler frontend, compiler backend, logic in the CPU, time itself, ...).
Like, seemingly, things like compiler logic are also subject to the laws of entropy...
One could impose limits, but C has a long tradition of people being able to do:
whatever=whatever;
With no real concern for the implicit entropy cost of having the ability to do so...
Well, and seemingly the design of both compilers and CPU ISAs turns into a problem of trying to twiddle stuff to reduce the implicit entropy costs.
...