On 3/11/2025 7:38 PM, Lawrence D'Oliveiro wrote:
On Wed, 12 Mar 2025 00:26:50 -0000 (UTC), John Levine wrote:
Next was PDP-11 where MOV R1,R2 copies R1 into R2.
What about CMP (compare) versus SUB (subtract)? CMP does the subtract
without updating the destination operand, only setting the condition
codes. But are the operands the same way around as SUB (i.e. backwards for
comparison purposes) or are they flipped?
Well, it is that way on x86 at least...
Otherwise, depends on arch...
At present, I have:
SUB Rm, Rn //Rn=Rn-Rm
SUB Rs, Rt, Rn //Rn=Rs-Rt
CMPGT Rm, Rn //SR.T = Rn>Rm OR ~ ((Rn-Rm)>0) (N/A XG3)
CMPGT Rs, Rt, Rn //Rn = Rs>Rt
In neither case does SUB update flags. CMPGT only updates SR.T, and only in its two-register form; that form is only allowed in XG1 and XG2 (in XG3, it is demoted to optional and, if allowed, is instead encoded as "CMPGT Rs, Rt, ZZR").
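As a rough sketch of the operand ordering in the four forms listed above, here is how they might be modeled in C. The Cpu struct, the 64-entry register file, and the function names are made up for illustration and are not taken from any actual emulator; only the "Rn=Rn-Rm" style semantics come from the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical machine state: a register file plus the SR.T bit.
   The register count of 64 is an assumption for this sketch. */
typedef struct {
    int64_t r[64];
    bool    sr_t;
} Cpu;

/* Wrapping subtract, done in unsigned arithmetic to avoid signed-overflow UB. */
static int64_t wrap_sub(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a - (uint64_t)b);
}

/* SUB Rm, Rn       : Rn = Rn - Rm (no flags touched) */
static void sub2(Cpu *c, int m, int n) {
    c->r[n] = wrap_sub(c->r[n], c->r[m]);
}

/* SUB Rs, Rt, Rn   : Rn = Rs - Rt (no flags touched) */
static void sub3(Cpu *c, int s, int t, int n) {
    c->r[n] = wrap_sub(c->r[s], c->r[t]);
}

/* CMPGT Rm, Rn     : SR.T = (Rn > Rm), same operand order as 2R SUB */
static void cmpgt2(Cpu *c, int m, int n) {
    c->sr_t = c->r[n] > c->r[m];
}

/* CMPGT Rs, Rt, Rn : Rn = (Rs > Rt), result goes to a register, not SR.T */
static void cmpgt3(Cpu *c, int s, int t, int n) {
    c->r[n] = c->r[s] > c->r[t];
}
```

Note that the 2R compare reads its operands the same way around as the 2R subtract (Rn on the left of the operation), while the 3R forms read left to right.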
There are several CMPxx ops available:
CMPEQ / SEQ
CMPNE / SNE
CMPGT / SGT / SLT
CMPGE / SGE / SLE
CMPHI / SGTU / SLTU
CMPHS / SGEU / SLEU
TEST / STST
TESTN / SNTST
The SLT/SGT and SLE/SGE operators are basically the same operator, differing primarily in that the arguments are flipped.
That said, the combination of SLT and SGE (or SGT and SLE) makes more sense if immediate-synthesis is being used. This is a bit of a design wart in my own ISA, as I didn't think of this and had used SGT and SGE.
Note that with both SLT and SGE (or the reverse), it is possible to fake the other 2 cases (SLE and SGT).
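One way to read "faking" the other two cases, sketched in C. This is my reading rather than a statement of what any particular assembler does: with a register operand the arguments can simply be swapped, and with an immediate in a fixed operand slot the immediate can be bumped by one. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The two operators assumed to exist natively. */
static int slt(int64_t a, int64_t b) { return a <  b; }
static int sge(int64_t a, int64_t b) { return a >= b; }

/* Register forms: swap the operands. */
static int sgt_reg(int64_t a, int64_t b) { return slt(b, a); } /* a >  b */
static int sle_reg(int64_t a, int64_t b) { return sge(b, a); } /* a <= b */

/* Immediate forms: the immediate stays in the second slot, so adjust it.
   x >  imm  <=>  x >= imm + 1
   x <= imm  <=>  x <  imm + 1
   Only valid while imm + 1 does not overflow. */
static int sgt_imm(int64_t x, int64_t imm) {
    assert(imm != INT64_MAX);
    return sge(x, imm + 1);
}

static int sle_imm(int64_t x, int64_t imm) {
    assert(imm != INT64_MAX);
    return slt(x, imm + 1);
}
```

With SGT and SGE as the native pair, the immediate forms of SLT and SLE would instead need an inverted result, which is presumably why the SLT/SGE pairing works out more cleanly.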
The latter names were partly influenced by RISC-V naming, though RISC-V only had SLT and SLTU. I added the latter cases as RV extensions (more recently with 32-bit encodings), as they were used often enough in the BGBCC output that they can save a few percent off the size of the binary. With 32-bit encodings, they save a little over 3% relative to using multi-op sequences to fake these cases.
BTST/BNTST, STST/SNTST, are actually more common than unsigned compare-branch and compare in my compiler output, but RISC-V lacks them.
Well, the ISA variants also tweaked branches:
XG1:
CMPxx Rm, Rn; BT/BF label //Compare Rn with Rm (Core)
JCMPxx Rn, Disp10 //compare Rn with 0 (Opt)
JCMPxx Rm, Rn, Disp8 //compare Rn with Rm (Opt)
XG2:
Basically same as XG1.
Just BT/BF has a larger displacement in XG2 (8MB vs 1MB).
XG3:
Bxx Rm, Rn, Disp10 //compare Rn with Rm (Core)
Sxx Rm, Rn, ZZR; BT/BF label //Compare Rm with Rn (Opt)
In the BT/BF case, range is 16MB.
Bxx Disp10, Rn, Rm //Same, but RV ordering.
In XG1/2/3, Jumbo_Imm will expand a normal Imm/Disp field to 33 bits.
In XG3, immediate-synthesis may also encode an Imm33.
Vs 29 bits in XG1 and XG2.
In RV+Jx, typical immediate-synthesis is limited to 17 bits, though a 26-bit case exists (with some tradeoffs).
In XG1/2, there was a "CMPxx Imm10, Rn" case, to compare an Imm10 with Rn in a 32-bit encoding. This encoding was (for now) not formally carried over into XG3. Immediate-synthesis can deal with this, but this is a 64-bit encoding.
One possibility would be to give RV64's "SLTI Imm12s" encoding the same basic semantics when the destination is ZERO (like in the XG3 ops) and then borrow the RV64 encoding in this case. Though, it seems wonky to borrow an RV encoding to encode semantics that are, in effect, alien to RISC-V...
The seemingly backwards compare ordering was itself partly a holdover from SH, but, one gets used to it (it is at least consistent with how they are decoded and performed, if one imagines them decoded as 2R subtract).
...