Sujet : Re: auto predicating branches
De : monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Groupes : comp.archDate : 23. Apr 2025, 03:59:10
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <jwvcyd338k2.fsf-monnier+comp.arch@gnu.org>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Gnus/5.13 (Gnus v5.13)
I do not see 2 LDDs being performed parallel unless the execution
width is at least 14-wide. In any event loop recurrence restricts the
IIUC you'll get multiple loads in parallel if the loads take a long time
because of cache misses. Say each load takes 100 cycles, then there is
plenty of time during one load to predict many more iterations of the
loop and hence issue many more loads.
With a branching code, the addresses of those loads depend mostly on the
branch predictions, so the branch predictions end up performing a kind
of "value prediction" (where the value that's predicted is the address
of the next lookup).
With predication your load address will conceptually depend on 3 inputs:
the computation of `base + middle`, the computation of `base + 0`, and
the computation of the previous `needle < *base[middle]` test to choose
between the first two. If the LD of `*base[middle]` takes 100 cycle,
that means a delay of 100 cycles before the next LD can be issued.
Of course, nothing prevents a CPU from doing "predicate prediction":
instead of waiting for an answer to `needle < *base[middle]`, it could
try and guess whether it will be true or false and thus choose to send
one of the two addresses (or both) to the memory (and later check the
prediction and rollback, just like we do with normal branches).
Stefan