Subject: Re: rep movsb vs. simpler instructions for memcpy/memmove
From: already5chosen (at) *nospam* yahoo.com (Michael S)
Newsgroups: comp.arch
Date: 14 Mar 2025, 15:20:09
Organization: A noiseless patient Spider
Message-ID: <20250314162009.000078cf@yahoo.com>
References: 1 2 3 4 5 6 7 8 9 10 11 12
User-Agent: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
On Fri, 14 Mar 2025 13:18:37 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> As for the "transfer level speed", I would not know why delivering to
> DRAM should be faster than delivering to L3, L2, or L1. On the
> contrary, it seems to me that delivering to DRAM is at least as slow
> as the other variants.
Transfer level speed would be faster with DMA, because a CPU typically
has no way to issue Read requests for chunks of data bigger than 64
bytes.
OTOH, the DMA engine resides on the device itself and uses as big a
transfer unit as appropriate, up to a maximum of 4 KB.
In theory, "rep movsb" can generate bigger (than 64B) read transfers,
but I don't believe that the state of the art is that advanced yet.
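For readers who have not seen it spelled out, here is a minimal sketch of a "rep movsb" copy as seen from C. It assumes GCC or Clang inline-assembly syntax on x86 or x86-64 and falls back to memcpy() elsewhere; the name copy_rep_movsb is made up for illustration. The point is only that the whole length is handed to the hardware in one instruction, which is what would let an implementation pick transfers larger than a cache line. The sketch says nothing about whether any current CPU actually does so.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: forward copy of n bytes with one "rep movsb".
 * Relies on the direction flag being clear, as the usual x86 ABIs
 * guarantee at function entry. Regions must not overlap in a way that
 * a forward byte copy would corrupt. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* RDI = destination, RSI = source, RCX = byte count; all three
     * registers are modified by the instruction, hence "+". */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Assumption: on other targets a plain memcpy() stands in. */
    memcpy(dst, src, n);
#endif
    return ret;
}
```

A load/store loop, by contrast, presents the copy to the memory system one register-sized or cache-line-sized access at a time.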
Besides, on all PCI buses, but especially so on PCIe, write transfers
(DMA is doing Write transfers in this case) utilize the bus
significantly better than read transfers. The difference is most
pronounced for small transfers, but on something like 4-lane PCIe Gen4
the difference can be quite big even when Read transactions use the
maximal transfer size.
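A rough back-of-the-envelope model of that gap, as a sketch only. It is not taken from any measurement: the per-packet overhead, round-trip latency and number of outstanding reads are all caller-supplied assumptions, and the function names are made up for illustration. Writes are modelled as posted packets streamed back to back; reads are additionally capped by how much data can be in flight during one round trip.

```c
/* Fraction of link bandwidth that carries payload when every packet
 * costs a fixed number of overhead bytes (header, sequence number,
 * CRC, framing). The overhead value is an assumption supplied by the
 * caller, not a measured figure. */
static double payload_efficiency(double payload_bytes, double overhead_bytes)
{
    return payload_bytes / (payload_bytes + overhead_bytes);
}

/* Posted writes: packets can be sent back to back, so the modelled
 * rate is the raw link rate times the payload efficiency. */
static double write_rate(double link_bytes_per_ns,
                         double payload_bytes, double overhead_bytes)
{
    return link_bytes_per_ns *
           payload_efficiency(payload_bytes, overhead_bytes);
}

/* Reads: every request waits one round trip for its completion, so
 * with a limited number of requests in flight the rate is also capped
 * by outstanding * payload / round_trip (Little's law). The result is
 * the smaller of that cap and the wire cap. */
static double read_rate(double link_bytes_per_ns,
                        double payload_bytes, double overhead_bytes,
                        double round_trip_ns, double outstanding)
{
    double wire_cap = write_rate(link_bytes_per_ns,
                                 payload_bytes, overhead_bytes);
    double latency_cap = outstanding * payload_bytes / round_trip_ns;
    return latency_cap < wire_cap ? latency_cap : wire_cap;
}
```

With an assumed 24 bytes of overhead per packet, 64-byte payloads give an efficiency of 64/88, about 0.73, while 4096-byte payloads give 4096/4120, about 0.99. Taking a 4-lane Gen4 link as roughly 7.9 bytes/ns and assuming, purely for illustration, a 500 ns round trip with 10 reads of 64 bytes in flight, read_rate() is latency-bound at 1.28 bytes/ns, well below what write_rate() gives for the same packet size.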
> In any case, that's not what most uses of memcpy() or memmove(), or
> rep movsb with their synchronous interfaces are about.
Agreed.