Since each chased pointer starts back at LSQ, the cost is no different
than an explicit Prefetch instruction, except without (a), (b) and (c)
having been applied first.

I thought the important difference is that the decision to prefetch or
not can be done dynamically based on past history.

Programmers and compilers are notoriously bad at predicting
branches (except for error branches), but ought to be quite good
at predicting prefetches. If a pointer is loaded, chances are
very high that it will be dereferenced.

I don't think it's that simple: prefetches only bring the data into L1
cache, so they're only useful if:
- The data is not already in L1.
- The data will be used soon (i.e. before it gets evicted from the cache).
- The corresponding load doesn't occur right away.
In all other cases, the prefetch will be just wasted work.
It's easy for programmers to "predict" those (dependent) loads which will occur
right away, but those don't really benefit from a prefetch.
E.g. if the dependent load is done 2 cycles later, performing a prefetch
lets you start the memory access 2 cycles early, but since that access
misses in L1 it will take more than 10 cycles, so shaving
2 cycles off isn't of great benefit.
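
To make the "dependent load that occurs right away" case concrete, here is a minimal sketch in C. The node type and function names are made up for illustration, and `__builtin_prefetch` is the GCC/Clang builtin, so this assumes one of those compilers. Both loops compute the same result; the second merely issues the prefetch a few instructions before the dereference, which is the situation where little latency can be hidden.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    long value;
};

/* Plain pointer chase: the pointer loaded from n->next is
   dereferenced on the very next iteration. */
long sum_plain(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        sum += n->value;
        n = n->next;
    }
    return sum;
}

/* Same loop with an explicit prefetch of the next node.
   The prefetch is issued only a handful of instructions before
   the dependent load, so it can start the access only slightly early. */
long sum_prefetch(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        const struct node *next = n->next;
        __builtin_prefetch(next, 0, 3);  /* read access, keep in cache */
        sum += n->value;
        n = next;
    }
    return sum;
}
```

Whether the second version is any faster would have to be measured; on a short loop body like this one the prefetch may well be wasted work.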
Given that we're talking about performing a prefetch on the result of
a previous load, and loads tend to already have a fairly high latency
(3-5 cycles), "2 cycles later" really means "5-7 cycles after the
beginning of the load of that pointer". That can easily translate to 20
instructions later.
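
The arithmetic above can be written down as a rough sketch. The function names are hypothetical, and the sustained rate of 3 to 4 instructions per cycle used in the comments is an assumption chosen so that 5-7 cycles comes out at about 20 instructions; it is not a measured figure.

```c
/* Cycles of miss latency hidden by starting the access `lead` cycles
   early: at most the lead itself, and never more than the miss costs. */
unsigned hidden_cycles(unsigned lead, unsigned miss_latency)
{
    return lead < miss_latency ? lead : miss_latency;
}

/* Number of instructions the machine gets through in `cycles` cycles
   at an assumed sustained rate of `per_cycle` instructions per cycle. */
unsigned instructions_ahead(unsigned cycles, unsigned per_cycle)
{
    return cycles * per_cycle;
}
```

With a 2-cycle lead and a 10-cycle miss, `hidden_cycles(2, 10)` gives 2, so most of the miss remains exposed. With the assumed rates, `instructions_ahead(5, 4)` gives 20 and `instructions_ahead(7, 3)` gives 21.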
My gut feeling is that it's difficult for programmers to predict what
will happen more than 20 instructions ahead without looking at
detailed profiling.

Stefan

Difficult becomes impossible when the code has to operate "well" over