On 2/20/2025 7:26 PM, Robert Finch wrote:
Admittedly part of why I have such mixed feelings on full
compare-and-branch:
Pro: It can offer a performance advantage (in terms of per-clock);
Con: Branch is now beholden to the latency of a Subtract.
Con: it can't compare to a constant
Con: it can't compare floating point
>
Compare to constant and floating-point compares are supported in Q+, so those cons are gone.
Constant compare is supported with the constant postfix. Because bits are available in the instruction, there is also a precision field. One can compare bytes and branch, for instance.
Similar, just with a prefix encoding in my case, and support is optional.
I have recently also added it to BJX2, where it is now also possible to encode:
JCMPxx Rm, Imm, Disp8s (XG1/XG2)
Bxx Rm, Imm, Disp10s (XG3)
Note that both use the same encoding space; XG2 just kept the XG1 encodings, and I can't change this detail without breaking binary compatibility (I was able to change it for XG3, though, as XG3 is sort of its own thing encoding-wise, so I was free to try to fix up some of the worse hair here).
And:
MOV.x Imm17s, (Rm, Disp10s)
I was initially going to have Rm get turned into the immediate, but while this would have worked for JCMPxx/Bxx, it would not have worked for MOV.x, and it makes sense to have decoding consistency when possible.
So:
  BTSTT Rm, Imm17s, Disp10s  //if((Imm&Rm)==0)
  BTSTF Rm, Imm17s, Disp10s  //if((Imm&Rm)!=0)
  BGT   Rm, Imm17s, Disp10s  //if(Imm> Rm) || if(Rm< Imm)
  BLE   Rm, Imm17s, Disp10s  //if(Imm<=Rm) || if(Rm>=Imm)
  BGTU  Rm, Imm17s, Disp10s  //if(Imm> Rm) || if(Rm< Imm)
  BLEU  Rm, Imm17s, Disp10s  //if(Imm<=Rm) || if(Rm>=Imm)
  BEQ   Rm, Imm17s, Disp10s  //if(Imm==Rm)
  BNE   Rm, Imm17s, Disp10s  //if(Imm!=Rm)
Never mind if the operand ordering is convoluted...
And, arguably, GT and LE may have been better off had they been called LT and GE in this case, but...
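To make the operand ordering concrete, here is a rough C model of the branch conditions as written in the comments of the listing above. It is only a sketch: the names (BranchOp, sign_extend, branch_taken) are made up for illustration, and it assumes that the Imm17s field is sign-extended to 64 bits and that the BGTU/BLEU forms compare that sign-extended value as unsigned. None of this is taken from the actual BJX2 Verilog or compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum; the members mirror the mnemonics in the listing. */
typedef enum { BTSTT, BTSTF, BGT, BLE, BGTU, BLEU, BEQ, BNE } BranchOp;

/* Sign-extend the low `bits` bits of `raw` (e.g. bits=17 for Imm17s).
 * Assumes two's complement, as on essentially all current targets. */
int64_t sign_extend(uint64_t raw, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t sign = 1ull << (bits - 1);
    uint64_t mask = (sign << 1) - 1;
    raw &= mask;
    return (int64_t)((raw ^ sign) - sign);
}

/* Returns true if the branch would be taken.
 * Note the ordering: the immediate is on the left of the compare,
 * so "BGT Rm, Imm" is taken when Imm > Rm (equivalently Rm < Imm). */
bool branch_taken(BranchOp op, int64_t rm, int64_t imm)
{
    switch (op) {
    case BTSTT: return (imm & rm) == 0;
    case BTSTF: return (imm & rm) != 0;
    case BGT:   return imm >  rm;
    case BLE:   return imm <= rm;
    /* Assumption: unsigned forms compare the full 64-bit values. */
    case BGTU:  return (uint64_t)imm >  (uint64_t)rm;
    case BLEU:  return (uint64_t)imm <= (uint64_t)rm;
    case BEQ:   return imm == rm;
    case BNE:   return imm != rm;
    }
    return false;
}
```

With this model, branch_taken(BGT, 3, 5) is true (Imm=5 is greater than Rm=3), which is the case one would more naturally spell as "Rm < Imm", hence the remark about LT/GE naming.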
It is basically encoded in a similar way to the 3RI Imm17s special case, just with the Rn register field rather than Ro/Rt (and operating on F1 block instructions rather than F0 block). Not defined for the F2 block.
Not actually tested yet; I still need to implement compiler support and similar.
BTW: I have now confirmed that my Verilog bugfix yesterday for XG3 has worked, so now Doom starts up and runs the demo loop. However, demos desync in a different way than usual, showing that there are still some issues.
Still no update on the remaining RV+Jx bug(s); I poked at something to see if it changes anything. At the last cycle, it is crashing on an invalid memory access (causing a breakpoint in the TLB miss handler), which doesn't tell me much in the Verilog sim about what caused said memory access.
Q+ supports all kinds of compares including those generating bit vectors and SETxx, ZSETxx type compares. That is maybe its drawback, supporting too many things.
Yeah, it is a tradeoff. Too many edge cases lead to cost and debugging effort. Sometimes it is better to try not to get too complicated.
My stuff is already getting annoyingly complicated and difficult to debug.