mitchalsup@aol.com (MitchAlsup1) writes:
>On Mon, 21 Apr 2025 6:05:32 +0000, Anton Ertl wrote:
>
>>Robert Finch <robfi680@gmail.com> writes:
>>>Having branches automatically convert into
>>>predicates when they branch forward a short distance (<7 instructions).
>
>>If-conversion in hardware is a good idea, if done well, because it
>>involves issues that tend to be unknown to compilers:
>
>I had little trouble teaching Brian how to put if-conversion into the
>compiler with my PRED instructions. Alleviating HW from having to bother
>other than being able to execute PREDicated clauses.
Compilers certainly can perform if-conversion, and there have been
papers about that for at least 30 years, but compilers do not know
when if-conversion is profitable. E.g., "branchless" (if-converted)
binary search can be faster than a branching one when the searched
array is cached at a close-enough level, but slower when most of the
accesses miss the close caches
<https://stackoverflow.com/questions/11360831/about-the-branchless-binary-search>.
>>* How predictable is the condition? If the condition is very well
>>predictable, if-conversion is not a good idea, because it turns the
>>control dependency (which does not cost latency when the prediction
>>is correct) into a data dependency. Moreover, in this case the
>>if-conversion increases the resource consumption. Compilers are not
>>good at predicting the predictability AFAIK.
>
>Rather than base the choice on the predictability of the condition,
>it is based on whether FETCH will pass the join-point before the
>condition resolves. On an 8-wide machine this might be "THE next
>cycle".
On a CPU with out-of-order execution, the instruction fetcher often
runs many dozens of instructions ahead of the functional units which
produce the conditions, so your criterion could cover pretty big IFs.
And, given that you want the compiler to do it, the compiler would
have to know about that. Ok, what decision will you take in what
case, and why?
>>* Is the condition available before or after the original data
>>dependencies? And if afterwards, by how many cycles? If it is
>>afterwards and the branch prediction would be correct, the
>>if-conversion means that the result of the instruction is available
>>later, which may reduce IPC.
>
>Generally, it only adds latency--if the execution window is not stalled
>at either end this does not harm IPC.
If the additional latency is on the critical path and execution is
dependency-limited, this reduces IPC. And yes, this will result in
the buffers (especially the schedulers) filling up and stalling the
front end.
>>OTOH, if the branch prediction would
>>be incorrect, the recovery also depends on when the condition
>>becomes available,
>
>There is no "recovery" from PREDication, just one clause getting
>nullified.
I apparently wrote that in a way that was easy to misunderstand. Here's another
attempt: When comparing the branching variant to the predicated
(if-converted) variant, if the branching variant would be
mispredicted, it is always at a disadvantage wrt. latency compared to
the predicated variant, because the branching variant restarts from
the instruction fetch when the condition becomes available, while the
predicated variant is already fetched and decoded and waits in a
scheduler for the condition.
Note that in the binary-search case linked-to above, that's also the
case, but in the branchy version the benefit comes from the correct
predictions and the lack of data-dependencies between the loads: In
those cases the cache-missing load does not depend on the previous
cache-missing load (unlike the branchless version), resulting in an
overall shorter latency.
>>and the total latency is higher in the case of no
>>if-conversion. The compiler may do an ok job at predicting whether
>>a condition is available before or after the original data
>>dependencies (I don't know a paper that evaluates that), but without
>>knowing about the prediction accuracy of a specific condition that
>>does not help much.
>
>>So the hardware should take predictability of a condition and the
>>availability of the condition into consideration for if-conversion.
>
>My argument is that this is a SW decision (in the compiler) not a
>HW decision (other than providing the PREDs).
That's a position, not an argument. Do you have an argument for your
position?
>Since PREDs are not
>predicted (unless you think they are predicted BOTH ways) they do
>not diminish the performance of the branch predictors.
Nor increase it. But it sounds like you think that the compiler
should choose predication when the condition is not particularly
predictable. How should the compiler know that?
>The compiler chooses PRED because FETCH reaches the join-point prior
>to the branch resolving. PRED is almost always faster--and when
>it has both then-clause and else-clause, it always saves a branch
>instruction (jumping over the else-clause).
It appears that you have an underpowered front end in mind. Even in
the wide machines of today and without predication, the front end
normally does not have a problem fetching at least as many
instructions as the rest of the machine can handle, as long as the
predictions are correct. Even if the instruction fetcher cannot fetch
the full width in one cycle due to having more taken branches than the
instruction fetcher can handle or some other hiccup, this is usually
made up by delivering more instructions in other cycles than the rest
of the CPU can handle. E.g., Skymont
<https://old.chipsandcheese.com/2024/06/15/intel-details-skymont/>
fetches from three sequential streams, 3*32 bytes/cycle, and decodes
into 3*3 uops/cycle and stores them in 3 32-entry uop queues; the
renamer consumes 8 instructions from these queues, so as long as the
predictions are correct and the average fetching and decoding is >8
uops/cycle, the renamer will rarely see fewer than 8 uops available,
even if there is the occasional cycle where the taken branches are so
dense that the instruction fetcher cannot deliver enough relevant
bytes to the instruction decoder.
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined
behavior.'  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>