Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>mitchalsup@aol.com (MitchAlsup1) writes:
>>So:
>>
>>    #define memcpy memmove
>>
>>and move forward with life--for the 2 extra cycles memmove costs
>>it saves everyone long term grief.

Incidentally, if one wants to do this, it's advisable to write

#undef memcpy

before the #define of memcpy, because <string.h> is allowed to
define memcpy as a macro already.
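
Spelled out, the redefinition looks like this (a minimal sketch of
my own, not code from either post):

#include <string.h>

#undef  memcpy            /* <string.h> may define memcpy as a macro */
#define memcpy memmove    /* every memcpy call now goes to memmove */

int main(void)
{
  char buf[16] = "abcdef";
  memcpy(buf+1, buf, 5);  /* overlapping copy: well-defined via memmove */
  return 0;
}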
Is it two extra cycles? Here are some data points from
<2017Sep23.174313@mips.complang.tuwien.ac.at>:

Haswell (Core i7-4790K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   15   15   17   30   48   85  150  281  570 1370 memmove
  15   16   13   16   19   32   48   86  161  327  631 1420 memcpy

Skylake (Core i5-6600K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   14   14   15   27   43   77  147  305  573 1417 memmove
  13   14   10   12   14   27   46   85  165  313  607 1350 memcpy

Zen (Ryzen 5 1600X), glibc 2.24
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  16   16   16   17   32   43   66  107  177  328  601 1225 memmove
  13   13   14   13   38   49   73  116  188  336  610 1233 memcpy

I don't see a consistent speedup of memcpy over memmove here.
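
For those who want to produce their own numbers, a rough harness
might look like the following (my sketch, not the one behind the
numbers above; it reports wall-clock ns per call rather than cycles,
and a serious harness would also keep the compiler from optimizing
the copies away):

#include <stdio.h>
#include <string.h>
#include <time.h>

static char src[16384], dst[16384];

/* Total ns for one million calls of fn on an n-byte block. */
static double bench(void *(*fn)(void *, const void *, size_t), size_t n)
{
  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < 1000000; i++)
    fn(dst, src, n);
  clock_gettime(CLOCK_MONOTONIC, &t1);
  return (t1.tv_sec-t0.tv_sec)*1e9 + (t1.tv_nsec-t0.tv_nsec);
}

int main(void)
{
  size_t sizes[] = {1, 8, 32, 64, 128, 256, 512, 1024};
  printf("%6s %10s %10s\n", "size", "memmove", "memcpy");
  for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
    printf("%6zu %10.2f %10.2f\n", sizes[i],
           bench(memmove, sizes[i]) / 1e6,   /* ns per call */
           bench(memcpy, sizes[i]) / 1e6);
  return 0;
}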
However, when one uses memcpy(&var,ptr,8) or the like to perform an
unaligned access, gcc compiles this into a single load (or store) as
long as memcpy is not redefined, but into much slower code with the
redefinition (i.e., when memmove is used instead of memcpy).
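
The idiom in question looks like this (my sketch; load64 is a
made-up name):

#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from a possibly unaligned address.  gcc
   compiles the fixed-size memcpy into a single load; per the
   observation above, the memmove variant comes out much slower. */
static inline uint64_t load64(const void *p)
{
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}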
>Simply replacing memcpy() by memmove() of course will always
>work, but there might be negative consequences beyond a cost
>of 2 extra cycles -- for example, if a negative stride is
>better performing than a positive stride, but the nature
>of the compaction forces memmove() to always take the slower
>choice.
If the two memory blocks don't overlap, memmove() can use the
fastest stride; only an overlap forces one particular copy direction.
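
To illustrate the point (a naive byte-wise sketch, nothing like the
real glibc code): only when the destination overlaps the source from
above is memmove forced to copy backwards.

#include <stddef.h>

void *my_memmove(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  /* Comparing pointers into different objects is technically not
     portable, but inside a libc implementation it is fair game. */
  if (d <= s || d >= s + n) {
    for (size_t i = 0; i < n; i++)   /* forward copy is safe */
      d[i] = s[i];
  } else {
    for (size_t i = n; i > 0; i--)   /* overlap from above: backwards */
      d[i-1] = s[i-1];
  }
  return dst;
}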