Subject: Re: Decrement And Branch
From: kegs (at) *nospam* provalid.com (Kent Dickey)
Newsgroups: comp.arch
Date: 09 Sep 2024, 04:31:00
Organization: provalid.com
Message-ID: <vblq5k$2991r$1@dont-email.me>
User-Agent: trn 4.0-test76 (Apr 2, 2001)
In article <2024Aug15.123928@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> mitchalsup@aol.com (MitchAlsup1) writes:
>> On Wed, 14 Aug 2024 9:10:01 +0000, Anton Ertl wrote:
>>
>>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>>> Like I said, I wondered why this sort of thing wasn't more common ...
>> [snip]
>> My 66000 finds use cases all the time, and I also have Branch on bit
>> instructions and have my CMP instructions build bit-vectors of outcomes.
>
> If an architecture has the 88000-style treatment of comparison results
> (fill a GPR with conditions, one bit per condition), instructions like
> TBNZ and TBZ certainly are useful, but ARM A64 uses a condition code
> register with NZCV flags for dealing with conditions, so what is TBNZ
> and TBZ used for on this architecture? Looking at a binary I have at
> hand, I see a lot of checking bit #63 and some checking of #31, #15,
> #7, i.e., checking for whether a 64-bit, ... 8-bit number is negative.
> There are also a number of uses coming from libgcc, e.g.,
>
> 6f0a8: 37e001c3 tbnz w3, #28, 6f0e0 <__aarch64_sync_cache_range+0x50>
> 6f0e8: 37e801e2 tbnz w2, #29, 6f124 <__aarch64_sync_cache_range+0x94>
> 6f6dc: b7980b84 tbnz x4, #51, 6f84c <__addtf3+0x71c>
> 6fb28: b79000a3 tbnz x3, #50, 6fb3c <__addtf3+0xa0c>
> 6fc30: b79000a3 tbnz x3, #50, 6fc44 <__addtf3+0xb14>
> 70248: b7980d02 tbnz x2, #51, 703e8 <__multf3+0x728>
> 7036c: b79809a2 tbnz x2, #51, 704a0 <__multf3+0x7e0>
> 70430: b77801a2 tbnz x2, #47, 70464 <__multf3+0x7a4>
> 7048c: b79ffae2 tbnz x2, #51, 703e8 <__multf3+0x728>
> 70498: b79ffa82 tbnz x2, #51, 703e8 <__multf3+0x728>
>
> The tf3 stuff probably is the implementation of long doubles. In any
> case, in this binary with 26473 instructions, there are 30 occurrences
> of tbnz and 41 of tbz, for a total of 71 (0.3% of static instruction
> count).
>
> Apparently the usefulness of decrement-and-branch is even lower.
>
> Certainly in my code most loops count upwards.
>
> - anton
> -- 
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
>   Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
PA-RISC had "ADDIB,cond,n imm,reg,target": add a 5-bit signed immediate
to reg, then branch based on comparing the result to 0 (effectively),
allowing branching on <, <=, =, >, >=, overflow, carry, etc. There was
also a non-immediate version, ADDB. The branch target range was +/-8KB.
Really simple loops could be done with the one-instruction loop body in
the delay slot of ADDIB.
The HP C/C++ compiler converted nearly all for() loops to count down to
0, whenever that wasn't too awkward. So:
    for(i = 0; i < 100; i++) {
        array[i] = 0;
    }
would effectively be transformed to:
    ptr = &array[0];
    for(i = 99; i >= 0; i--) {
        *ptr++ = 0;
    }
Which becomes the following (PA-RISC lists the target register last, and
has delay slots and nullification: with ",n" on a backward branch like
this one, the next instruction is nullified if the branch is not taken):
        MOV     array,r8
        LDI     100,r9          ; 100 iterations: ADDIB decrements before testing
LOOP:   ADDIB,>=,n -1,r9,LOOP   ; r9=r9-1. If r9 >= 0, jump to LOOP
        STD,ma  r0,8(r8)        ; (r8)=r0; r8=r8+8 (in the delay slot)
So it could use ADDIB for many "for" loops. Because of the way
nullification works, the code is correct even if the loop should never
execute: if r9 starts at 0, no STD is done. There was no need to change
the source code; the compiler did the transform for you. PA-RISC also had
CMPIB, which just does the compare and branch. ADDIB is a very simple
instruction that costs very little to add, and it saves 2 instructions in
many loops (ADDI, CMP_0, Bcc -> ADDIB). I think it is a mistake for ARM
not to have it. I see a lot of "ADD, CMP, Bcc" sequences in ARM assembly
code.
To avoid inverting the counter, an "ADD1CMPBcc" instruction could add 1
to a counter, compare the counter to another register, and branch on the
condition.
As for ARM TBNZ and TBZ, I see them used all the time in my code, where I
often use single-bit flags in control variables:
    if(flags & FLAG_SPECIAL1) {     // FLAG_SPECIAL1 = 0x40
        // Do "SPECIAL1" stuff
    }
In one program I've written on ARM, 2.3% of all instructions are TBZ or
TBNZ.
Kent