Subject: Re: rep movsb vs. simpler instructions for memcpy/memmove
From: mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups: comp.arch
Date: 13 Mar 2025, 20:35:33
Organization: Rocksolid Light
Message-ID : <3104c2ab707086659d698a0377450527@www.novabbs.org>
User-Agent : Rocksolid Light
On Thu, 13 Mar 2025 16:43:07 +0000, Stefan Monnier wrote:
> What is different about MM compared to `rep movsb`
MM does not modify the pointers. MM keeps its current index,
thus the compiler can use the Rf pointer multiple times.
> that you can confidently state that it will always be optimal?
Compared to the explosion in the memmove() subroutine, yes.
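
To make the pointer point concrete, here is a rough C model of the two register conventions. It is only a sketch: `rep_movsb_model`, `mm_model` and the two `copy_twice_*` helpers are made-up names, the `rep movsb` model covers the forward (DF = 0) case only, and giving MM memmove-like overlap handling is an assumption, not something stated in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of x86 `rep movsb` register behavior (forward direction only):
   destination, source and count are all updated in place, the way
   RDI, RSI and RCX are. */
static void rep_movsb_model(unsigned char **dst, const unsigned char **src,
                            size_t *count)
{
    assert(dst && src && count);
    while (*count != 0) {
        *(*dst)++ = *(*src)++;
        --*count;
    }
}

/* Model of an MM-style copy as described above: operands are inputs only,
   so the caller's pointers and count survive the copy.
   ASSUMPTION: memmove-like overlap handling. */
static void mm_model(unsigned char *dst, const unsigned char *src, size_t count)
{
    memmove(dst, src, count);
}

/* Copy one source buffer to two destinations with the MM-style model. */
static void copy_twice_mm(unsigned char *a, unsigned char *b,
                          const unsigned char *src, size_t n)
{
    mm_model(a, src, n);
    mm_model(b, src, n);          /* src and n are still intact */
}

/* Same job with the rep-movsb-style model: the clobbered operands
   have to be saved and reloaded between the two copies. */
static void copy_twice_rep(unsigned char *a, unsigned char *b,
                           const unsigned char *src, size_t n)
{
    const unsigned char *s = src; /* extra copies kept by the caller */
    size_t c = n;
    rep_movsb_model(&a, &s, &c);
    s = src;                      /* reload what the first copy consumed */
    c = n;
    rep_movsb_model(&b, &s, &c);
}
```

Both helpers produce the same bytes; the difference is only in how much bookkeeping the caller (or compiler) has to do around each copy.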
>
> Are you suggesting that what prevents Intel from making `rep movsb` optimal
> is the fact that it modifies its pointers?
Certainly does not help.
But they never really "tried all that hard" to make them continuously
optimal.
And they have "So Many" extra burdens, such as when "from" is an MMI/O
space access and "to" is cache coherent, and all sorts of other
self-imposed problems. Using MTRRs, one can switch the kind of memory
that "to" and "from" point at in the middle of a REP MOVs. All of which
does nothing to make optimality easier.
So, at a certain point in time, designers punt. If all competing
parties punt, nobody is put asunder.
> I have no experience implementing such an instruction, but I find it odd
> that such a "cosmetic detail" would have such a profound impact on the
> performance of an instruction. Can't they just "macroexpand" it during
> decoding into two instructions (one which copies the bytes without
> modifying the pointers, and then one which just adjusts the pointers)?
My 66000 happens to know that memory space changes will not happen
in the middle of these kinds of things (including vectorized Loops).
My compilers don't create such problems for HW to solve. {That is,
the truly horrific x86 optimality problems don't exist.}
You may choose differently.
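
For reference, the two-step "macroexpansion" asked about in the quoted question can be written down as a small C model. This is a sketch under loud assumptions: `struct regs`, `uop_copy`, `uop_adjust` and `rep_movsb_expanded` are hypothetical names, only the forward (DF = 0) case is modeled, and it ignores exactly the hard parts raised in this thread, namely memory-type changes mid-copy and interrupts or faults arriving partway through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three architectural registers
   that `rep movsb` uses in 64-bit mode. */
struct regs {
    unsigned char       *rdi;  /* destination pointer */
    const unsigned char *rsi;  /* source pointer      */
    size_t               rcx;  /* remaining byte count */
};

/* Step 1: copy the bytes without touching the architectural registers.
   A forward byte-by-byte loop is used so overlapping buffers behave
   like the forward string move rather than like memmove. */
static void uop_copy(const struct regs *r)
{
    for (size_t i = 0; i < r->rcx; ++i)
        r->rdi[i] = r->rsi[i];
}

/* Step 2: adjust the registers to their architectural end state. */
static void uop_adjust(struct regs *r)
{
    r->rdi += r->rcx;
    r->rsi += r->rcx;
    r->rcx = 0;
}

/* The whole instruction as "copy, then fix up". */
static void rep_movsb_expanded(struct regs *r)
{
    assert(r != NULL);
    uop_copy(r);
    uop_adjust(r);
}
```

In this model the register update is trivially separable from the copy. It stops being trivial once the copy can be interrupted in the middle, because the registers then have to reflect how far the copy got.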
>
> Stefan