Subject: Re: Reverse engineering of Intel branch predictors
From: monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Date: 12 Nov 2024, 20:00:02
Organization: A noiseless patient Spider
Message-ID: <jwvpln0qpel.fsf-monnier+comp.arch@gnu.org>
References: 1 2 3 4 5 6 7 8
User-Agent : Gnus/5.13 (Gnus v5.13)

>>>>>> Hmm... but in order not to have bubbles, your prediction structure still
>>>>>> needs to give you a predicted target address (rather than a predicted
>>>>>> index number), right?
>>>>> Yes, but you use the predicted index number to find the predicted
>>>>> target IP.
>>>> Hmm... but that would require fetching that info from memory.
>>>> Can you do that without introducing bubbles?
>>>
>>> In many/most (dynamic) cases, they have already been fetched and all
>>> that is needed is muxing the indexed field out of the Instruction Buffer.
>> I guess for a small jump table that would work well, indeed, but for
>> something like a bytecode interpreter, even if you can compact it to
>> have only 16 bits per entry, that still spans 512B. Is your IB large
>> enough for that?
>> If you're lucky it's in the L1 Icache, but that still takes a couple
>> cycles to get, doesn't it?
> My 1-wide machine fetches 4 words per cycle.
> My 6-wide machine fetches 3 ½-cache-lines per cycle.

Even with a 256B cache line width, it would take 2 cycles to get a 512B
jump table into your IB, after which you still have to select (and
compute, if the table is compacted) the corresponding target address,
and only after that can you start fetching (which itself will suffer
the L1 latency), so we're up to a 5-6 cycle bubble, no?
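
To make the arithmetic above explicit, here is a rough back-of-the-envelope sketch. It is only one way to arrive at the 5-6 cycle figure: the 256-entry table (one entry per one-byte opcode), the single cycle for selecting/computing the target, and the 2-3 cycle L1 latency are assumptions of this sketch, not measured numbers, and the function names are made up for illustration.

```python
import math


def jump_table_bytes(num_entries: int, entry_bits: int) -> int:
    """Size in bytes of a jump table with fixed-width entries."""
    return num_entries * entry_bits // 8


def bubble_cycles(table_bytes: int,
                  fetch_bytes_per_cycle: int,
                  select_cycles: int,
                  l1_latency_cycles: int) -> int:
    """Cycles lost before the first target instruction arrives.

    fill:   cycles to stream the whole table into the instruction buffer
    select: cycles to mux out the entry (and expand it if compacted)
    l1:     latency of the fetch at the computed target address
    """
    fill = math.ceil(table_bytes / fetch_bytes_per_cycle)
    return fill + select_cycles + l1_latency_cycles


# Assumed: 256 opcodes, 16-bit compacted entries -> 512 bytes.
table = jump_table_bytes(num_entries=256, entry_bits=16)

# Assumed: 256B fetched per cycle, 1 cycle to select, L1 latency of 2 or 3.
low = bubble_cycles(table, 256, select_cycles=1, l1_latency_cycles=2)
high = bubble_cycles(table, 256, select_cycles=1, l1_latency_cycles=3)

print(f"table = {table}B, bubble = {low}-{high} cycles")
```

With a narrower fetch width the fill term dominates: at 64B per cycle the same 512B table would need 8 cycles just to reach the IB under this model.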
Stefan