Re: Dealing with mispredictions

Liste des GroupesRevenir à c arch 
Sujet : Re: Dealing with mispredictions
De : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groupes : comp.arch
Date : 26. Dec 2024, 22:54:53
Autres entêtes
Organisation : Rocksolid Light
Message-ID : <f82a13d29b2d9c697242fbffb9656ed4@www.novabbs.org>
References : 1 2 3 4 5 6 7 8
User-Agent : Rocksolid Light
On Thu, 26 Dec 2024 9:46:21 +0000, Anton Ertl wrote:

mitchalsup@aol.com (MitchAlsup1) writes:
Sooner or later, the pipeline designer needs to recognize the of
occuring
code sequence pictured as::
>
    INST
    INST
    BC-------\
    INST     |
    INST     |
    INST     |
/----BR       |
|    INST<----/
|    INST
|    INST
\--->INST
    INST
>
So that the branch predictor predicts as usual, but DECODER recognizes
the join point of this prediction, so if the prediction is wrong, one
only nullifies the mispredicted instructions and then inserts the
alternate instructions while holding the join point instructions until
the alternate instruction complete.
>
Would this really save much?  The main penalty here would still be
fetching and decoding the alternate instructions.  Sure, the
instructions after the join point would not have to be fetched and
decoded, but they would still have to go through the renamer, which
typically is as narrow or narrower than instruction fetch and decode,
so avoiding fetch and decode only helps for power (ok, that's
something), but probably not performance.
When you have the property that FETCH will stumble over the join point
before the branch resolves, the fact you reached the join point means
a branch misprediction is avoided (~16 cycles) and you nullify 4
instructions from reservation stations. FETCH is not disrupted, and
execution continues.
The balance is the mispredict recovery overhead (~16 cycles) compared
to the cost of inserting the un-predicted path into execution (1 cycle
in the illustrated case).

And the kind of insertion you imagine makes things more complicated,
and only helps in the rare case of a misprediction.
PREDication is designed for the unpredictable branches--as a means to
directly express the fact that the <equivalent> branch code is not
expected to be predicted well.
For easy to predict branches, don't recode as PRED--presto; done. So,
rather than having to Hint branches or to guard individual instructions;
I PREDicate short clauses; saving bits in each instruction because these
bits come from the PRED-instruction.

What alternatives do we have?  There still are some branches that are
hard to predict and for which it would be helpful to optimize them.
>
Classically the programmer or compiler was supposed to turn
hard-to-predict branches into conditional execution (e.g., someone
(IIRC ARM) has an ITE instruction for that, and My 6600 has something
similar IIRC).  These kinds of instructions tend to turn the condition
from a control-flow dependency (free when predicted, costly when
mispredicted) into a data-flow dependency (usually some cost, but
usually much lower than a misprediction).
Conditional execution and merging (CMOV) rarely takes as few
instructions
as branchy code and <almost> always consumes more power. However, there
are a few cases where CMOV works out better than PRED, so: My 66000 has
both.

But programmers are not that great on predicting mispredictions (and
programming languages usually don't have ways to express them),
compilers are worse (even with feedback-directed optimization as it
exists, i.e., without prediction accuracy feedback), and
predictability might change between phases or callers.
A conditional branch inside a subroutine is almost always dependent on
who calls the subroutine. Some calls may have a nearly 100% prediction
rate in one direction, other calls a near 100% prediction rate in the
other direction.
One thing that IS different n My 66000 (other than PREDs not needing to
be predicted) is that loops are not predicted--there is a LOOP instr-
uction that performs ADD-CMP-BC back to the top of the loop in 1 cycle.
Since HW can see the iterating register and the terminating limit,
one does not overpredict iterations and then mispredict them away,
instead on predicts loop only so long as the arithmetic supports that
prediction.
Thus, in My 66000, looping branches do not contribute to predictor
pollution (updates), leaving the branch predictors to deal with the
harder stuff. In addition we have a LD IP instruction (called CALX)
that loads a value from a table directly into IP, so no jumps here.
And finally:: My 66000 has a Jump Through Table (JTT) instruction,
which performs:: range check, table access, add scaled table entry
to IP and transfer control.
Thus, there is very little indirect prediction (maybe none on smaller
implementations), switch tables are all PIC, and the tables are
typically ¼ the size of the equivalents in other 64-bit ISAs.
So, by taking Looping branches, indirect branches, and indirect calls
out of the prediction tables, those that remain should be more
predictable.

So it seems to me that this is something that the hardware might use
history data to predict whether a branch is hard to predict (and maybe
also taking into account how the dependencies affect the cost), and to
switch between a branch-predicting implementation and a data-flow
implementation of the condition.
>
I have not followed ISCA and Micro proceedings in recent years, but I
would not be surprised if somebody has already done a paper on such an
idea.
>
- anton

Date Sujet#  Auteur
3 Oct 24 * Microarchitectural support for counting33Anton Ertl
3 Oct 24 +* Re: Microarchitectural support for counting28Brett
5 Oct 24 i`* Re: Microarchitectural support for counting27MitchAlsup1
5 Oct 24 i +- Re: Microarchitectural support for counting1Brett
5 Oct 24 i +* Interrupts in OoO (was: Microarchitectural support for counting)7Anton Ertl
7 Oct 24 i i+* Re: Interrupts in OoO (was: Microarchitectural support for counting)4Brett
7 Oct 24 i ii+* Re: Interrupts in OoO2MitchAlsup1
8 Oct 24 i iii`- Re: Interrupts in OoO1MitchAlsup1
8 Oct 24 i ii`- Re: Interrupts in OoO1Terje Mathisen
7 Oct 24 i i+- Re: Interrupts in OoO1MitchAlsup1
13 Oct 24 i i`- Re: Interrupts in OoO1Anton Ertl
5 Oct 24 i +* Re: Microarchitectural support for counting2MitchAlsup1
25 Dec 24 i i`- Re: Microarchitectural support for counting1MitchAlsup1
25 Dec 24 i +* Re: Microarchitectural support for counting8Paul A. Clayton
25 Dec 24 i i`* Re: Microarchitectural support for counting7MitchAlsup1
25 Dec 24 i i +- Re: Microarchitectural support for counting1MitchAlsup1
31 Dec 24 i i `* Re: Microarchitectural support for counting5Paul A. Clayton
1 Jan 25 i i  `* Re: Microarchitectural support for counting4MitchAlsup1
2 Jan 25 i i   +- Re: Microarchitectural support for counting1MitchAlsup1
6 Jan 25 i i   `* Re: Microarchitectural support for counting2Paul A. Clayton
7 Jan 25 i i    `- Re: Microarchitectural support for counting1Terje Mathisen
25 Dec 24 i `* Re: Microarchitectural support for counting8MitchAlsup1
26 Dec 24 i  +* Dealing with mispredictions (was: Microarchitectural support ...)2Anton Ertl
26 Dec 24 i  i`- Re: Dealing with mispredictions1MitchAlsup1
26 Dec 24 i  `* Re: Microarchitectural support for counting5Michael S
26 Dec 24 i   `* Re: branch guessing, Microarchitectural support for counting4John Levine
26 Dec 24 i    +- Re: branch guessing, Microarchitectural support for counting1Michael S
26 Dec 24 i    +- Re: branch guessing, Microarchitectural support for counting1MitchAlsup1
26 Dec 24 i    `- Re: branch guessing, Microarchitectural support for counting1Thomas Koenig
26 Dec 24 +* Re: Microarchitectural support for counting2Chris M. Thomasson
26 Dec 24 i`- Re: Microarchitectural support for counting1Anton Ertl
27 Dec 24 `* Re: Microarchitectural support for counting2jseigh
28 Dec 24  `- Re: Microarchitectural support for counting1jseigh

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal