Subject: Re: rep movsb vs. simpler instructions for memcpy/memmove
From: monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Date: 13 Mar 2025, 20:53:25
Organization: A noiseless patient Spider
Message-ID: <jwvv7sc1zve.fsf-monnier+comp.arch@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13)
MitchAlsup1 [2025-03-13 19:35:33] wrote:
[...]
> On Thu, 13 Mar 2025 16:43:07 +0000, Stefan Monnier wrote:
>> What is different about MM compared to `rep movsb`
[...]
> But they never really "tried all that hard" to make them
> continuously Optimal.

But is there a reason to presume an implementer of My 66000 would have
the luxury of putting more effort into making MM "optimal" than Intel put
into making `rep movsb` optimal?

> And they have "So Many" extra burdens,

Ah, now you seem to be getting to the kind of answer I was looking for.

> such as when from is MMI/O space access and to is cache coherent, and
> all sorts of other self imposed problems. Using MTRRs one can switch
> the kind of memory to and from point in the middle of a REP MOVs.
> All of which do nothing to make optimality easier.

How does MM avoid those complexities?

> My 66000 happens to know that memory space changes will not happen
> in the middle of these kinds of things (including vectorized Loops).

How does it know? Is it because the ISA just says "don't do that" (I
guess MM would then signal an error if it happens?), or is there some
underlying difference in the way the semantics/cacheability of memory
pages is specified which makes it impossible to specify a memory range
to MM where the semantics change partway?

> My compilers don't create such problems for HW to solve. {That is;
> the truly horrific x86 optimality problems don't exist.}

How do compilers get into the picture? I thought they were basically
ignorant of such subtleties of memory caching, as controlled by MTRRs.
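
To make that last question concrete, here is a minimal sketch of the kind
of copy under discussion, assuming GCC or Clang on x86 (the function name
`copy_rep_movsb` is made up for illustration, and other targets just fall
back to `memcpy`). Note that nothing the compiler sees here says what kind
of memory `dst` and `src` point to: that is decided by the MTRRs and page
attributes at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: forward byte copy using `rep movsb` where available.
 * Assumes the direction flag is clear, as the usual x86 ABIs require. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* RDI/EDI = destination, RSI/ESI = source, RCX/ECX = byte count.
     * All three registers are updated by the instruction, hence "+". */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Not x86 (or no GNU-style inline asm): use the library routine. */
    memcpy(dst, src, n);
#endif
    return ret;
}
```

The pointers carry no memory-type information at all, so whatever
guarantee MM relies on would have to come from the ISA or the OS rather
than from anything the compiler knows at this point.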
Stefan