...I am sure that they
were aware that this call instruction was expensive, but they expected
that it was worth the cost, and also expected that implementors would
reduce the cost to below what a sequence of simpler instructions would
cost (looking at REP MOVSB in many generations of Intel and AMD CPUs,
we see such expectations disappointed; I have not measured recent
generations, though).
It depends on what you call "a sequence of simpler instructions".
For a count in RCX/ECX/CX above, say, a dozen, 'rep movsb' is faster
than a simple non-unrolled loop of single-byte loads and stores on
pretty much any Intel or AMD CPU since the dawn of time. If we are
talking about this century, then, at least for Intel, I think that we
can claim that the same is true even relative to a simple loop of
32-bit loads and stores. If we replace a dozen with a hundred or three
hundred, then it becomes true for a loop of 64-bit loads/stores as well.
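To make concrete which variants are being compared, here is a minimal sketch in C. The function names (copy_bytes, copy_u32, copy_rep_movsb, copies_agree) are made up for illustration, the inline assembly assumes GCC or Clang on x86 with the direction flag clear (as the usual ABIs guarantee), and other targets fall back to memcpy. It only checks that the variants agree; it measures nothing, and an optimizing compiler may well turn the plain loops into something else (a memcpy call or vector code), so any timing would have to inspect the generated code first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simple non-unrolled loop of single-byte loads and stores. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Simple loop of 32-bit loads and stores, with a byte loop for the tail.
   memcpy of 4 bytes is the portable way to express an unaligned access. */
void copy_u32(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n--)
        *d++ = *s++;
}

/* 'rep movsb': count in rCX, source in rSI, destination in rDI. */
void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* fallback on non-x86 targets (assumption) */
#endif
}

/* Returns 1 if all three variants copy n bytes identically. */
int copies_agree(size_t n)
{
    unsigned char src[512], a[512], b[512], c[512];
    assert(n <= sizeof src);
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    memset(c, 0, sizeof c);
    copy_bytes(a, src, n);
    copy_u32(b, src, n);
    copy_rep_movsb(c, src, n);
    return memcmp(a, src, n) == 0 &&
           memcmp(b, src, n) == 0 &&
           memcmp(c, src, n) == 0;
}
```

Calling copies_agree with lengths such as 12, 100 and 300 exercises the sizes mentioned above.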
>
Or maybe, in your book, 5KB of elaborate code that contains unrolled
and non-unrolled loops of YMM, XMM, Rxx, Exx, and byte memory accesses
is still considered 'a sequence of simpler instructions'?
If that is the case, then I am not going to argue.