Michael S <already5chosen@yahoo.com> writes:
On Mon, 09 Sep 2024 10:30:34 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
One would hope so, but here's what happens with gcc-12:
#include <string.h>
void foo1(char *p, char* q)
{
memcpy(p,q,32);
}
void foo2(char *p, char* q)
{
memmove(p,q,32);
}
gcc -O3 -mavx2 -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o:
0000000000000000 <foo1>:
0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
12: c3 ret
13: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
1a: 00 00 00 00
1e: 66 90 xchg %ax,%ax
0000000000000020 <foo2>:
20: ba 20 00 00 00 mov $0x20,%edx
25: e9 00 00 00 00 jmp 2a <foo2+0xa>
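The jmp at offset 0x25 is probably a tail call to memmove(), with the length 32 (0x20) passed in %edx. In the unlinked object file its target is still an unresolved relocation, which is why the displacement shows as zero.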
My guess is that xmm registers and unrolling are used here rather
than ymm registers because waking up the upper 128 bits takes
time. But even so, the code uses two different registers, and if
scheduled differently (both loads before both stores), it could
also be used to implement foo2():
0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
12: c3 ret
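The scheduling argument can be checked in portable C. This is only a sketch under stated assumptions: the helper names copy32_interleaved, copy32_loads_first and count_bad_shifts are hypothetical, and 16-byte local arrays stand in for xmm0/xmm1 (whether a compiler keeps them in vector registers is up to the compiler). The first helper follows foo1()'s load/store/load/store order; the second does both loads before either store, like the reordered listing. count_bad_shifts() compares each against memmove() for every destination offset from -31 to +31 bytes relative to the source.

```c
#include <string.h>

/* Hypothetical helper in foo1()'s order: load half 0, store half 0,
   load half 1, store half 1. Valid for memcpy, not for memmove. */
static void copy32_interleaved(char *p, const char *q)
{
    char lo[16], hi[16];
    memcpy(lo, q, 16);
    memcpy(p, lo, 16);        /* may clobber q[16..31] if p > q */
    memcpy(hi, q + 16, 16);   /* ...and then read clobbered bytes */
    memcpy(p + 16, hi, 16);
}

/* Hypothetical helper in the reordered order: both loads happen
   before either store, so overlapping buffers are harmless. */
static void copy32_loads_first(char *p, const char *q)
{
    char lo[16], hi[16];
    memcpy(lo, q, 16);
    memcpy(hi, q + 16, 16);
    memcpy(p, lo, 16);
    memcpy(p + 16, hi, 16);
}

/* Count the overlap shifts (dst = src + shift, |shift| < 32) at
   which copy() leaves the buffer different from memmove(). */
static int count_bad_shifts(void (*copy)(char *, const char *))
{
    int bad = 0;
    for (int shift = -31; shift <= 31; shift++) {
        char ref[96], got[96];
        for (int i = 0; i < 96; i++)
            ref[i] = got[i] = (char)i;   /* distinct byte values */
        memmove(ref + 32 + shift, ref + 32, 32);
        copy(got + 32 + shift, got + 32);
        bad += memcmp(ref, got, sizeof ref) != 0;
    }
    return bad;
}
```

Working through the overlaps by hand, the interleaved order disagrees with memmove() for all 31 positive shifts (the first store overwrites source bytes that the second load still needs), while the loads-first order agrees everywhere: count_bad_shifts(copy32_interleaved) returns 31 and count_bad_shifts(copy32_loads_first) returns 0.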
- anton
Try -march instead of -mavx2. E.g. -march=haswell
Sometimes gcc is beyond logic.
For gcc -O3 -march=haswell I got the same result (with gcc-12). I
also tried -march=x86-64-v3 with the same result.
But gcc -O3 -march=x86-64-v4 produced:
0000000000000000 <foo1>:
0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
4: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
8: c5 f8 77 vzeroupper
b: c3 ret
c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000010 <foo2>:
10: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
14: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
18: c5 f8 77 vzeroupper
1b: c3 ret
And when changing the length to 64:
0000000000000000 <foo1>:
0: 62 f1 fe 48 6f 06 vmovdqu64 (%rsi),%zmm0
6: 62 f1 fe 48 7f 07 vmovdqu64 %zmm0,(%rdi)
c: c5 f8 77 vzeroupper
f: c3 ret
0000000000000010 <foo2>:
10: 62 f1 fe 48 6f 06 vmovdqu64 (%rsi),%zmm0
16: 62 f1 fe 48 7f 07 vmovdqu64 %zmm0,(%rdi)
1c: c5 f8 77 vzeroupper
1f: c3 ret
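With -march=x86-64-v4 each block fits in a single register (ymm0 for 32 bytes, zmm0 for 64), so the one load completes before the one store and the sequence already has memmove() semantics. That would explain why foo1() and foo2() come out identical here. A minimal portable sketch of the same idea, with hypothetical helper names: copy through one fixed-size temporary, then check against memmove() for every overlap. Whether a particular compiler lowers the temporary to a single vector load and store is an assumption, not something this code guarantees.

```c
#include <string.h>

/* Read the whole 32-byte source into a temporary, then write it out. */
static void copy32_via_temp(char *p, const char *q)
{
    unsigned char t[32];
    memcpy(t, q, sizeof t);
    memcpy(p, t, sizeof t);
}

/* Same for 64 bytes (one zmm register's worth). */
static void copy64_via_temp(char *p, const char *q)
{
    unsigned char t[64];
    memcpy(t, q, sizeof t);
    memcpy(p, t, sizeof t);
}

/* Return 1 if an n-byte copy() matches memmove() for every overlap
   shift in (-n, n); n must be at most 64 for this buffer size. */
static int agrees_with_memmove(void (*copy)(char *, const char *), int n)
{
    for (int shift = -(n - 1); shift <= n - 1; shift++) {
        char ref[192], got[192];
        for (int i = 0; i < 192; i++)
            ref[i] = got[i] = (char)(i % 128);
        memmove(ref + 64 + shift, ref + 64, (size_t)n);
        copy(got + 64 + shift, got + 64);
        if (memcmp(ref, got, sizeof ref) != 0)
            return 0;
    }
    return 1;
}
```

Both agrees_with_memmove(copy32_via_temp, 32) and agrees_with_memmove(copy64_via_temp, 64) should return 1.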
But when changing the length to 63:
0000000000000000 <foo1>:
0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
4: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
8: c5 fe 6f 4e 1f vmovdqu 0x1f(%rsi),%ymm1
d: c5 fe 7f 4f 1f vmovdqu %ymm1,0x1f(%rdi)
12: c5 f8 77 vzeroupper
15: c3 ret
16: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
1d: 00 00 00
0000000000000020 <foo2>:
20: ba 3f 00 00 00 mov $0x3f,%edx
25: e9 00 00 00 00 jmp 2a <foo2+0xa>
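For 63 bytes, foo1() uses two 32-byte accesses that overlap each other: bytes 0..31 and bytes 31..62 (offset 0x1f = 63 - 32). In the emitted order the first store precedes the second load, which is fine for memcpy() but not for memmove(), so foo2() again falls back to the library call. The sketch below uses a hypothetical helper, copy_32_to_64(), to show the same head/tail trick for any length from 32 to 64, but with both loads issued first, which makes it safe for overlapping buffers too. It is plain C, not what gcc actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (32 <= n <= 64) as a 32-byte head
   q[0..31] and a 32-byte tail q[n-32..n-1]; the two chunks overlap
   when n < 64. Both are loaded before either is stored. */
static void copy_32_to_64(char *p, const char *q, size_t n)
{
    char head[32], tail[32];
    assert(n >= 32 && n <= 64);
    memcpy(head, q, 32);
    memcpy(tail, q + n - 32, 32);   /* offset 31 (0x1f) when n == 63 */
    memcpy(p, head, 32);
    memcpy(p + n - 32, tail, 32);   /* rewrites the overlap with equal bytes */
}

/* Return 1 if copy_32_to_64() matches memmove() for every overlap
   shift in (-n, n). */
static int agrees_with_memmove(size_t n)
{
    for (int shift = -(int)n + 1; shift <= (int)n - 1; shift++) {
        char ref[192], got[192];
        for (int i = 0; i < 192; i++)
            ref[i] = got[i] = (char)(i % 128);
        memmove(ref + 64 + shift, ref + 64, n);
        copy_32_to_64(got + 64 + shift, got + 64, n);
        if (memcmp(ref, got, sizeof ref) != 0)
            return 0;
    }
    return 1;
}
```

Calling agrees_with_memmove(63) should return 1, as should the boundary lengths 32 and 64.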
- anton