Subject: Re: An execution time puzzle
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 11 Mar 2025, 19:09:51
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Mar11.190951@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5
User-Agent: xrn 10.11
Michael S <already5chosen@yahoo.com> writes:
Another open issue is that the gcc-12 build of gforth-fast (using r13
instead of r14) is 3 cycles slower than the gcc-10 build. I don't see
an extension of my BTB theory that would explain this. So either my
BTB theory is wrong or there is another effect at work.
>
I tried to understand the Indirect Target Predictor paragraph in the
Opt. Manual, but failed.
Here is the text of this short paragraph for those who don't like to
look for things themselves, but have a better chance than me
of understanding what is going on (i.e. primarily for Mitch Alsup).
Thanks.
2.8.1.4 Indirect Target Predictor
The processor implements a 1024-entry indirect target array used to
predict the target of some non-RET indirect branches. If a branch has
had multiple different targets, the indirect target predictor chooses
among them using global history at L2 BTB correction latency.
Branches that have so far always had the same target are predicted
using the static target from the branch's BTB entry. This means the
prediction latency for correctly predicted indirect branches is
roughly 5-(3/N), where N is the number of different targets of the
indirect branch. For these reasons, code should attempt to reduce the
number of different targets per indirect branch.
In the case of this microbenchmark, every indirect branch has only one
target, and the fact that we see cases where this loop with two
indirect branches is executed in 2 cycles indicates that such indirect
branches can be performed in one cycle; that's probably the part about
the "static target".
What is written looks pretty clear to me; maybe when you have read the
indirect-branch sections of several chipsandcheese articles, this all
looks normal to you (although the formula looks curious to me). If
you have any questions, I can give you my interpretation of what is
written here.
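
For reference, here is what the quoted expression 5-(3/N) evaluates to
for a few values of N. This is only a minimal sketch of the arithmetic;
the function name is made up for illustration, and it says nothing
about what the hardware actually does or about how the manual intends
the number to be read.

```python
def predicted_latency(n_targets: int) -> float:
    """Evaluate the manual's rough expression 5 - (3/N).

    n_targets is N, the number of different targets of one indirect
    branch. The name and this helper are hypothetical, for illustration.
    """
    if n_targets < 1:
        raise ValueError("N must be at least 1")
    return 5 - 3 / n_targets


# Tabulate the expression for a few target counts.
for n in (1, 2, 3, 4):
    print(f"N={n}: {predicted_latency(n):.2f}")
```

For N=1 the expression gives 2, and it approaches 5 as N grows.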
- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>