Subject : Re: Reverse engineering of Intel branch predictors
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groups : comp.arch
Date : 05 Nov 2024, 21:41:22
Organisation : Rocksolid Light
Message-ID : <a3d81b5c64ce058ad21f42a8081162cd@www.novabbs.org>
References : 1 2 3
User-Agent : Rocksolid Light
On Mon, 28 Oct 2024 17:55:29 +0000, Stefan Monnier wrote:
>> In MY 66000 ISA::
>> a) RET is not predicted
>> b) switch() is not predicted
>> c) method calls are not predicted
>> d) GOT calls are not predicted
>
>> Which pretty much gets rid of the problem.
>
> By "the problem" I guess you mean "indirect jumps", right?
Mostly, but I also changed what the jump is predicted on.
>> c+d) GOT calls and method calls use the CALX instruction which
>> loads IP from memory--thus not needing prediction--and not using
>> a trampoline, either.
>
> I don't understand the "thus not needing prediction". Loading IP from
> memory takes time, doesn't it? Depending on your memory hierarchy and
> where the data is held, I'd say a minimum of 3 cycles and often more.
> What do you do during those cycles?
It is not that these things don't need prediction, it is that you do the
prediction and then verify the prediction using different data.
For example: the classical way to do dense switches is a LD of the
target address and a jump to the target. This requires verifying the
address of the target. Whereas if you predict as JTT does, you verify
by matching the index number (which is known earlier), and since the
table is read-only you don't need to verify the target address.
So, it is not that you don't predict, it is that the data used to
verify the prediction is more precise and available earlier.
>> b) switch() is the JTT instruction
>
> IIUC this is basically like CALX, except with a bit more work to
> generate the address from which you fetch the IP (plus a bit more work
> to generate the IP from the fetched data as well). So: same question.
So, CALX is a LDD IP,[address] and generally LDD IP,[IP,#GOT[extern]-.]
Since GOT is not writable from the thread using it (my architecture
has this requirement) and GOT is DW aligned, we can
a) avoid the LDAlign stage of the pipeline
b) feed the loaded value directly into FETCH
saving 2 cycles, while
c) for those situations where we predict through GOT, we verify the
GOT offset (DISP) instead of the loaded GOT value (an entry point).
And since different kinds of data are used in the prediction and
verification, the strategies used in the above paper are not
actual attack strategies in My 66000 implementations.
>
>
>         Stefan