Subject : Re: rep movsb vs. simpler instructions for memcpy/memmove
From : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 14. Mar 2025, 02:30:06
Organisation : Rocksolid Light
Message-ID : <2db3b1e37183511df9a0a539c618f1c6@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13
User-Agent : Rocksolid Light
On Fri, 14 Mar 2025 1:08:38 +0000, Stefan Monnier wrote:
>>>> The REP MOV straddles the boundary between two MTRRs.
>>> Why/when would this happen in practice?
>> Nobody said it was a good idea.
>
> But if it doesn't happen in normal cases, then it shouldn't be
> significant to performance. So is the problem that just detecting the
> occurrence of this situation is already too costly to make `rep
> movsb` fast?
A camel's back is only so strong.
Conjecture that there are 14 different kinds of memory on both
source and destination. So we need a 14×14 check on where we are
every cycle, or every time a boundary could be crossed.
Now, µCode (or HW sequencer) needs to check certain things at
certain boundaries, and switch optimal[DRAM,DRAM] to
optimal[Streaming-store, PCIe-config-space] on a cycle's notice,
while adjusting its memory model from "causal" to strongly ordered.
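To make the shape of that check concrete, here is a rough software analogy in C, not a description of how any real µCode or sequencer works. It assumes the memory kind is uniform inside an aligned 4 KiB region; the names strategy, demo_kind_of, segmented_copy and both copy routines are hypothetical stand-ins invented for this sketch. The copy is cut at every point where either pointer could cross a region boundary, and the [source kind][destination kind] table is consulted again for each piece.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_KINDS   14      /* the conjectured count from the post */
#define REGION_SIZE 4096u   /* assumption: kind is uniform within an aligned 4 KiB region */

typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);
typedef unsigned (*kind_fn)(uintptr_t addr);

/* One strategy per (source kind, destination kind) pair: 14 x 14 entries. */
static copy_fn strategy[NUM_KINDS][NUM_KINDS];
unsigned strategy_lookups;  /* counts how often the table had to be consulted */

/* Hypothetical "optimal[DRAM,DRAM]" path; overlapping buffers are not handled. */
static void copy_fast(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical conservative path: one byte at a time, stores in program order. */
static void copy_bytewise(unsigned char *dst, const unsigned char *src, size_t n)
{
    volatile unsigned char *d = dst;
    for (size_t i = 0; i < n; i++)
        d[i] = src[i];
}

/* Kind 0 stands in for ordinary DRAM; every other pairing takes the slow path. */
void init_strategy(void)
{
    for (int s = 0; s < NUM_KINDS; s++)
        for (int d = 0; d < NUM_KINDS; d++)
            strategy[s][d] = copy_bytewise;
    strategy[0][0] = copy_fast;
}

/* Hypothetical classifier: derives a kind from the region number alone. */
unsigned demo_kind_of(uintptr_t addr)
{
    return (unsigned)((addr / REGION_SIZE) % NUM_KINDS);
}

/* Bytes from addr up to (not including) the next region boundary. */
static size_t bytes_to_boundary(uintptr_t addr)
{
    return REGION_SIZE - (size_t)(addr % REGION_SIZE);
}

void segmented_copy(unsigned char *dst, const unsigned char *src, size_t n,
                    kind_fn kind_of)
{
    while (n > 0) {
        /* Never let one piece cross a boundary on either side. */
        size_t chunk = n;
        size_t s_left = bytes_to_boundary((uintptr_t)src);
        size_t d_left = bytes_to_boundary((uintptr_t)dst);
        if (s_left < chunk) chunk = s_left;
        if (d_left < chunk) chunk = d_left;
        assert(chunk > 0);

        /* Re-select the strategy for this piece. */
        unsigned sk = kind_of((uintptr_t)src);
        unsigned dk = kind_of((uintptr_t)dst);
        assert(sk < NUM_KINDS && dk < NUM_KINDS);
        strategy_lookups++;
        strategy[sk][dk](dst, src, chunk);

        src += chunk;
        dst += chunk;
        n -= chunk;
    }
}
```

In software the re-selection costs one table lookup per piece. The point of the paragraph above is that hardware has to be ready to make the same switch, including the change of ordering rules, with essentially no warning.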
> [ Of course, I still haven't understood either why it technically can
> happen in amd64 but not in My 66000. ]
The Cartesian product is smaller, more amenable to buffering and
caching, with more easily discovered (or eliminated) boundaries.
>
> Stefan