Subject : Re: rep movsb vs. simpler instructions for memcpy/memmove
From : monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Newsgroups : comp.arch
Date : 14 Mar 2025, 03:20:16
Organization : A noiseless patient Spider
Message-ID : <jwva59oz7ew.fsf-monnier+comp.arch@gnu.org>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : Gnus/5.13 (Gnus v5.13)
>> [ Of course, I still haven't understood either why it technically can
>> happen in amd64 but not in My 66000. ]
> The cartesian product is smaller, more amenable to buffering and
> caching, with more easily discovered (or eliminated) boundaries.
These are the parts I can guess. But the part I don't get is what makes
the factors of your cartesian product smaller, what makes your CPU
more amenable to buffering and caching, and what makes those boundaries
easier to discover or eliminate in My 66000 than in amd64.
From what I have gathered so far, the difference in optimizability
between `MM` and `rep movsb` is due not to the semantics of the
instruction, but to the rest of the CPU.
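For reference, here is a minimal sketch of the architectural result that
`rep movsb` has to produce with the direction flag clear, written as a
Python model. The function name and the flat `bytearray` standing in for
memory are illustrative assumptions, not anything taken from a manual.
Whatever fast path the hardware uses has to be indistinguishable from
this byte-by-byte loop, including for overlapping operands. The exact
semantics of `MM` are not spelled out here, so it is not modeled.

```python
def rep_movsb(mem: bytearray, dst: int, src: int, count: int) -> None:
    """Hypothetical model of `rep movsb` with DF=0.

    Copies `count` bytes from mem[src...] to mem[dst...], one byte at a
    time in ascending address order, which is the observable behaviour
    any optimized implementation must preserve.
    """
    for i in range(count):
        mem[dst + i] = mem[src + i]


# Non-overlapping copy: behaves like memcpy.
mem = bytearray(b"abcdefgh" + bytes(8))
rep_movsb(mem, dst=8, src=0, count=8)

# Overlapping copy with dst > src: the ascending byte order makes
# already-copied bytes feed later reads, so the source pattern repeats.
overlap = bytearray(b"ab" + bytes(4))
rep_movsb(overlap, dst=2, src=0, count=4)
```

The overlapping case is the one a wide-copy fast path has to check for
before it can move more than one byte per step.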
I guess part of my question is: would an `MM` instruction added to, say,
RISC-V or ARM be as easy to optimize as it is on My 66000, or would it
be more like `rep movsb` on amd64?
Stefan