On 10/17/2024 9:28 PM, MitchAlsup1 wrote:
On Thu, 17 Oct 2024 0:03:26 +0000, BGB wrote:
On 10/16/2024 5:16 PM, MitchAlsup1 wrote:
On Wed, 16 Oct 2024 20:23:08 +0000, BGB wrote:
Ironically, one of the main arguable use cases for old Fortran-style arithmetic IF
statements is implementing the dispatch logic in a binary-subdivided
"switch()", but that is not common enough to justify having a dedicated
instruction for it.
Say:
MOV Imm, Rt //pivot case
BLT Rt, Rx, .lbl_lo
BGT Rt, Rx, .lbl_hi
BRA .lbl_case
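For reference, the decision made at each node of such a binary-subdivided switch can be sketched in C as a three-way compare against a pivot case. This is a minimal sketch assuming the case values sit in a sorted array; `switch_dispatch` and `keys` are hypothetical names. A compiler would normally emit an unrolled tree of compare/branch sequences like the one above rather than a loop.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of x in the sorted case table
 * keys[0..n-1], or -1 for the default case. */
static int switch_dispatch(const int *keys, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int pivot = keys[mid];      /* the pivot case value (MOV Imm, Rt) */
        if (x < pivot)
            hi = mid;               /* "less" edge: search the lower half */
        else if (x > pivot)
            lo = mid + 1;           /* "greater" edge: search the upper half */
        else
            return (int)mid;        /* equal: this is the target case */
    }
    return -1;                      /* no match: default case */
}
```

With keys {1, 5, 9, 42, 100}, a value of 42 is found at index 3 after comparing against the pivots 9, 100, and 42.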
With a 64-bit instruction one could do:
B3W .lbl_lo,.lbl_zero,.lbl_hi
rather straightforwardly...
Possibly, but the harder part would be to deal with decoding and feeding
the instruction through the pipeline.
Feed the 3×15-bit displacements to the branch unit. When the condition
resolves, use the selected one of the 3 displacements as the target address.
No dedicated "branch unit" in my case.
Generally, non-predicted branching is handled by using the AGU to generate the address, as in a memory load, but then signaling that a branch should be initiated (in the EX1 stage's glue logic).
Generating a 3-way branch does not map to the AGU though.
One downside of such a branch is that it also would not mix with my existing branch predictor logic, which thus far is built around a taken vs. not-taken state machine, so it would likely ignore a 3-way branch (making it potentially slower than multiple conventional branches).
Granted, I guess it could be decoded as if it were a normal 3RI op or
similar, but then split up the immediate into multiple parts in EX1.
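Splitting the immediate in EX1 might look roughly like this. The sketch assumes a 33-bit immediate carrying three signed 11-bit displacements packed as lo | zero<<11 | hi<<22; the width, field order, and the names `split_imm33` and `sext` are assumptions for illustration, not the actual BJX2 3RI format.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (1 <= bits <= 32). */
static int32_t sext(uint64_t v, unsigned bits)
{
    uint64_t m = (uint64_t)1 << (bits - 1);   /* sign bit of the field */
    v &= ((uint64_t)1 << bits) - 1;           /* keep only the field */
    return (int32_t)((int64_t)(v ^ m) - (int64_t)m);
}

/* Hypothetical: split a 33-bit immediate into three signed 11-bit
 * displacements, assumed packed as lo | zero << 11 | hi << 22. */
static void split_imm33(uint64_t imm, int32_t *d_lo, int32_t *d_zero,
                        int32_t *d_hi)
{
    *d_lo   = sext(imm, 11);
    *d_zero = sext(imm >> 11, 11);
    *d_hi   = sext(imm >> 22, 11);
}
```

An 11-bit field only reaches -1024..+1023, which is part of why 3×11 bits is a tight fit for branch displacements.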
Why would you want to make it 3×11-bit displacements when you can
make it 3×16-bit displacements?
+------+-----+-----+----------------+
|  Bc  |  3W |  Rt |     .lb_lo     |
+------+-----+-----+----------------+
|     .lb_zero     |     .lb_hi     |
+------------------+----------------+
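A decoder for the 64-bit layout in the diagram could be sketched as follows. The diagram gives no bit positions, so the field placement (6-bit Bc, 5-bit 3W, 5-bit Rt, three 16-bit displacements), the word-scaled PC-relative displacements, and the names `b3w_decode` and `b3w_target` are all assumptions. The target is chosen from the sign of Rt's value, in the manner of a Fortran arithmetic IF.

```c
#include <stdint.h>

/* Sign-extend a 16-bit field without relying on implementation-defined casts. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

typedef struct {
    unsigned rt;                     /* register whose sign selects the target */
    int32_t disp_lo, disp_zero, disp_hi;
} b3w_fields;

/* Hypothetical bit layout (an assumption, not from the diagram):
 *   w0: [31:26] Bc, [25:21] 3W, [20:16] Rt, [15:0] .lb_lo
 *   w1: [31:16] .lb_zero, [15:0] .lb_hi */
static b3w_fields b3w_decode(uint32_t w0, uint32_t w1)
{
    b3w_fields f;
    f.rt        = (w0 >> 16) & 0x1Fu;
    f.disp_lo   = sext16(w0);
    f.disp_zero = sext16(w1 >> 16);
    f.disp_hi   = sext16(w1);
    return f;
}

/* Pick the target from the sign of Rt's value. Displacements are assumed
 * to count 32-bit words relative to the branch PC. */
static uint64_t b3w_target(uint64_t pc, const b3w_fields *f, int64_t rt_val)
{
    int32_t d = (rt_val < 0)  ? f->disp_lo
              : (rt_val == 0) ? f->disp_zero
              : f->disp_hi;
    return pc + (uint64_t)((int64_t)d * 4);
}
```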
Neither BJX2 nor RISC-V have the encoding space to pull this off...
Even in a clean-slate ISA, it would be a big ask.
Could be possible in both, though, via a 96-bit encoding.
Likely, a 2-way with fall-through on equal might make more sense:
Cheaper to implement;
If it falls through, one has already found the target case.
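The 2-way variant with fall-through on equal reduces to a next-PC selection like the following minimal sketch. It assumes a two-operand compare (value vs. pivot), an 8-byte instruction, and word-scaled 16-bit displacements; `b2w_next_pc` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical 2-way compare-branch with fall-through on equal.
 * Assumptions: 64-bit (8-byte) instruction, two signed 16-bit
 * displacements counted in 32-bit words from the branch PC. */
static uint64_t b2w_next_pc(uint64_t pc, int16_t disp_lo, int16_t disp_hi,
                            int64_t a, int64_t b)
{
    if (a < b)
        return pc + (uint64_t)((int64_t)disp_lo * 4);
    if (a > b)
        return pc + (uint64_t)((int64_t)disp_hi * 4);
    return pc + 8;   /* equal: fall through into the matching case body */
}
```

The equal case needs no third displacement because the fall-through path can simply be the first instruction of the matched case.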
But, yeah, 3×11-bit isn't super useful; 2×16-bit could be more useful.
But it still wouldn't play well with the branch predictor.
FWIW: I actually went with the current jumbo-prefix encoding, rather than the official 64-bit instruction encoding scheme, for my RV64 extension because, ironically, that route eats less of the encoding space.
Working more on BGBCC's RV64 support, I have recently ended up adding a mode to mimic native RISC-V ASM syntax. Ended up mostly relying on mnemonics to try to detect whether to use "Rd, Rs1, Rs2" vs "Rs1, Rs2, Rd" ordering.
Some things are a little wonky in the assembler, since the way BGBCC had been doing things and the way RV ASM specifies them don't always match up 1:1.
Ended up using mnemonics because they are:
First thing on the line, so easy to parse;
One of the biggest points of divergence between native RV and what BGBCC had been using (there weren't really enough other syntactic differences to rely on to tell them apart).
The assembler basically counts them up, and whichever side has more votes wins in terms of operand ordering.
Say:
LD X10, 16(X2) //will vote for Rd first ordering
MOV.Q (SP, 16), R10 //will vote for Rd last ordering.
LI X11, 1234 //will vote for Rd first ordering
MOV 1234, R11 //will vote for Rd last ordering.
MV X12, X10 //will vote for Rd first ordering
MOV R10, R12 //will vote for Rd last ordering.
...
Names that are shared in both styles have no vote either way.
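The per-blob vote can be sketched roughly as below. This is a hypothetical reconstruction, not BGBCC's code: the mnemonic tables are filled only from the examples in this post (including RET/JALR and RTS/JMP), and the tie result is an assumption, since the post does not say what happens when the votes are even.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { ORDER_UNKNOWN = 0, ORDER_RD_FIRST = 1, ORDER_RD_LAST = -1 };

/* Hypothetical vote tables, taken from the examples in the post. Names
 * shared by both styles (ADD, etc.) are in neither table, so they never vote. */
static const char *const rd_first_mn[] = { "LD", "LI", "MV", "RET", "JALR", NULL };
static const char *const rd_last_mn[]  = { "MOV.Q", "MOV", "RTS", "JMP", NULL };

/* Case-insensitive match of tok[0..len-1] against an upper-case table. */
static int mn_in(const char *tok, size_t len, const char *const *tab)
{
    for (; *tab != NULL; tab++) {
        size_t i;
        if (strlen(*tab) != len)
            continue;
        for (i = 0; i < len; i++)
            if (toupper((unsigned char)tok[i]) != (unsigned char)(*tab)[i])
                break;
        if (i == len)
            return 1;
    }
    return 0;
}

/* Decide operand ordering for a whole ASM blob (not per line). */
static int asm_blob_order(const char *const *lines, size_t nlines)
{
    int votes = 0;
    size_t i;
    for (i = 0; i < nlines; i++) {
        const char *p = lines[i], *q;
        while (isspace((unsigned char)*p))
            p++;
        q = p;                        /* mnemonic = first token on the line */
        while (*q != '\0' && !isspace((unsigned char)*q))
            q++;
        if (mn_in(p, (size_t)(q - p), rd_first_mn))
            votes++;
        else if (mn_in(p, (size_t)(q - p), rd_last_mn))
            votes--;
    }
    if (votes > 0)
        return ORDER_RD_FIRST;
    if (votes < 0)
        return ORDER_RD_LAST;
    return ORDER_UNKNOWN;             /* tie: resolution not stated in the post */
}
```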
Stuff will not necessarily work as intended if one goes mix-and-match with the ASM styles (it is determined per ASM blob, not per line).
Potentially, though, an ASM blob could be too simple to determine the ordering:
RET and JALR vote for Rd first;
RTS and JMP vote for Rd last.
So, theoretically, even the simplest inline ASM function should be unambiguous (and one isn't going to use inline ASM just to specify a single ADD instruction or similar...).
For LW and SW, both are parsed as if they were loads, but SW and similar have been given new ID numbers, so if one tries to do a Load with one of the Store IDs, it is bounced over to the Store path in the instruction emitter logic. This is a little wonky, but alas (it was either this or adding wonky special-case logic in the ASM parsing).
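The LW/SW arrangement amounts to a redirect at the top of the load emitter. A minimal sketch follows; the op IDs, the `mem_op` and `emitted` types, and the emitter functions are hypothetical stand-ins, and the emitter here only records which path was taken rather than producing real encodings.

```c
/* Hypothetical instruction IDs: stores get their own IDs even though the
 * parser handles "SW X10, 16(X2)" exactly like "LW X10, 16(X2)". */
enum op_id { OP_LW, OP_LD, OP_SW, OP_SD };

typedef struct {
    enum op_id id;
    int reg;    /* first operand: load destination, or store data source */
    int base;
    int disp;
} mem_op;

/* Stand-in for real encoder output: which path ran, plus the operands. */
typedef struct {
    int is_store;
    int data_reg, base, disp;
} emitted;

static int is_store_id(enum op_id id)
{
    return id == OP_SW || id == OP_SD;
}

static emitted emit_store(const mem_op *op)
{
    emitted e = { 1, op->reg, op->base, op->disp };
    return e;
}

/* Everything the parser produced on the load path comes through here. */
static emitted emit_load(const mem_op *op)
{
    emitted e = { 0, op->reg, op->base, op->disp };
    if (is_store_id(op->id))
        return emit_store(op);   /* store ID on the load path: bounce it */
    return e;
}
```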
The main alternative would have been to add assembler directives to indicate the operand ordering more explicitly (at which point one could go mix-and-match with the ASM styles if they wanted, provided directives were used).
Some operand lists are only valid in certain modes though:
ADD Rs, Imm, Rn //only valid if Rd last
ADD Rn, Rs, Imm //only valid if Rd first
Though, these cases don't count in the vote, as they would have required more involved parsing. They could still be used as "keys", though, since ASM parsing would fail (resulting in a compiler error) if in the wrong mode.
Note that in the ASM parsing, "(R4, 16)" and "16(R4)" are considered functionally equivalent. If I wanted, I could in theory also add support for Intel-style "[R4+16]" syntax.
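Accepting both memory-operand spellings could be done along these lines. This sketch assumes registers are written R<n> or X<n> and that the displacement is a decimal integer literal; `parse_mem` and its helpers are hypothetical, and the Intel-style "[R4+16]" form is deliberately rejected since it is only mentioned as a possibility.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Parse "R<n>" or "X<n>" (assumed register spellings); NULL on failure. */
static const char *parse_reg(const char *p, int *reg)
{
    char *end;
    long v;
    int c;
    p = skip_ws(p);
    c = toupper((unsigned char)*p);
    if ((c != 'R' && c != 'X') || !isdigit((unsigned char)p[1]))
        return NULL;
    v = strtol(p + 1, &end, 10);
    if (v > 63)
        return NULL;
    *reg = (int)v;
    return end;
}

/* Parse a decimal displacement; NULL if no digits were consumed. */
static const char *parse_int(const char *p, long *val)
{
    char *end;
    p = skip_ws(p);
    *val = strtol(p, &end, 10);
    return (end == p) ? NULL : end;
}

/* Accept "(Rb, disp)" or "disp(Rb)"; returns 0 on success, -1 on error. */
static int parse_mem(const char *s, int *base, long *disp)
{
    const char *p = skip_ws(s);
    if (*p == '(') {                        /* "(Rb, disp)" form */
        if (!(p = parse_reg(p + 1, base)))
            return -1;
        p = skip_ws(p);
        if (*p != ',' || !(p = parse_int(p + 1, disp)))
            return -1;
    } else {                                /* "disp(Rb)" form */
        if (!(p = parse_int(p, disp)))
            return -1;
        p = skip_ws(p);
        if (*p != '(' || !(p = parse_reg(p + 1, base)))
            return -1;
    }
    p = skip_ws(p);
    if (*p != ')')
        return -1;
    p = skip_ws(p + 1);
    return (*p == '\0') ? 0 : -1;
}
```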
In other news:
Was poking around and implemented a simplistic vaguely-MP3-like audio codec.
General:
Uses AdRice for the entropy coder;
Uses Block-Haar as the main transform;
As 2 levels of an 8-element Haar transform, for a 64-element block.
Groups of 4 center blocks and 1 side block form a larger 256-sample block;
Uses a "half-linear cubic spline" for low-frequency components;
Multiple 256-sample blocks are encoded end-to-end into larger blocks, which are entropy-coded separately;
A group of headers is re-encoded occasionally; these give general features like the encoded sample rate and main quantization tables (though quantization is primarily controlled by a dynamically encoded parameter: a fixed-point scale encoded per block).
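One reading of "2 levels of an 8-element Haar transform, for a 64-element block" is: transform each group of 8 samples, then transform the 8 resulting DC terms again. The sketch below assumes that reading and uses an integer lifting (S-transform style) Haar so that it inverts exactly; the codec's actual ordering, scaling, and rounding are not given in the post, and the function names are hypothetical.

```c
#include <string.h>

/* 8-point integer Haar via lifting, exactly invertible. Per stage:
 * d = b - a, s = a + d/2 (midpoint of a and b, rounded toward a).
 * Output layout per stage: averages first, then differences. */
static void haar8_fwd(int *v)
{
    int tmp[8];
    int n, i;
    for (n = 8; n > 1; n /= 2) {
        int half = n / 2;
        for (i = 0; i < half; i++) {
            int a = v[2 * i], b = v[2 * i + 1];
            int d = b - a;
            tmp[i] = a + d / 2;       /* low-pass */
            tmp[half + i] = d;        /* high-pass */
        }
        memcpy(v, tmp, (size_t)n * sizeof(int));
    }
}

static void haar8_inv(int *v)
{
    int tmp[8];
    int n, i;
    for (n = 2; n <= 8; n *= 2) {
        int half = n / 2;
        for (i = 0; i < half; i++) {
            int s = v[i], d = v[half + i];
            int a = s - d / 2;        /* undo the lifting step exactly */
            tmp[2 * i] = a;
            tmp[2 * i + 1] = a + d;
        }
        memcpy(v, tmp, (size_t)n * sizeof(int));
    }
}

/* Hypothetical 2-level layout: level 1 transforms each 8-sample group,
 * level 2 transforms the 8 group DC terms (stored at blk[g*8]). */
static void blockhaar64_fwd(int *blk)
{
    int dc[8], g;
    for (g = 0; g < 8; g++) {
        haar8_fwd(blk + g * 8);
        dc[g] = blk[g * 8];
    }
    haar8_fwd(dc);
    for (g = 0; g < 8; g++)
        blk[g * 8] = dc[g];
}

static void blockhaar64_inv(int *blk)
{
    int dc[8], g;
    for (g = 0; g < 8; g++)
        dc[g] = blk[g * 8];
    haar8_inv(dc);
    for (g = 0; g < 8; g++) {
        blk[g * 8] = dc[g];
        haar8_inv(blk + g * 8);
    }
}
```

A constant block collapses to a single nonzero coefficient at blk[0], which is also why a block whose DC drifts shows up as a step at the block edges.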
The audio is encoded relative to a spline because, with just the block-Haar by itself, the results sounded kinda awful. Low frequencies resulted in significant blocking artifacts, and blocky stair-stepping sounds pretty bad in audio.
I had set up the spline with the control points aligned with the edges of the blocks. This initially made sense, but I have found that sounds in a certain frequency range can cause the DC of the block to move significantly relative to the spline (turning them into obvious square waves).
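The spline-relative residual can be sketched as follows, with control points sitting on block edges as described. The post's "half-linear cubic spline" is not specified, so a uniform Catmull-Rom cubic is swapped in purely as a stand-in; `spline_residual` and its parameters are hypothetical. The residual is what would then be handed to the block-Haar stage.

```c
#include <stddef.h>

/* Stand-in curve: uniform Catmull-Rom segment between p1 (t=0) and p2 (t=1). */
static double catmull_rom(double p0, double p1, double p2, double p3, double t)
{
    double t2 = t * t, t3 = t2 * t;
    return 0.5 * ((2.0 * p1) + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3);
}

/* Subtract the spline from the samples. ctrl[k] sits at sample k*blk
 * (a block edge); requires n <= (ncp - 1) * blk. Neighbor control points
 * past either end are clamped to the first/last one. */
static void spline_residual(const double *in, double *out, size_t n,
                            const double *ctrl, size_t ncp, size_t blk)
{
    size_t i;
    for (i = 0; i < n; i++) {
        size_t k = i / blk;
        double t = (double)(i % blk) / (double)blk;
        double p0 = ctrl[k > 0 ? k - 1 : 0];
        double p1 = ctrl[k];
        double p2 = ctrl[k + 1 < ncp ? k + 1 : ncp - 1];
        double p3 = ctrl[k + 2 < ncp ? k + 2 : ncp - 1];
        out[i] = in[i] - catmull_rom(p0, p1, p2, p3, t);
    }
}
```

Because every control point lands exactly on a block edge, the residual is forced toward zero there, while anything the curve cannot follow inside a block ends up in that block's coefficients.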
...