Subject: Re: Computer architects leaving Intel...
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 09. Sep 2024, 11:30:34
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2024Sep9.123034@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : xrn 10.11
Michael S <already5chosen@yahoo.com> writes:
On Mon, 9 Sep 2024 10:20:00 +0200
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
float invsqrt(float x)
{
...
int32_t ix = *(int32_t *) &x;
[...]
int32_t ix;
memcpy(&ix, &x, sizeof(ix));
...
I don't know if it is always true in more complex cases, where the
absence of aliasing is less obvious to the compiler.
Something like
memmove(p, q, 8)
can be translated to something like
0: 48 8b 06 mov (%rsi),%rax
3: 48 89 07 mov %rax,(%rdi)
without any aliasing worries, and indeed gcc-9, gcc-10, and gcc-12
do that.
However, I'd expect that as
long as a copied item fits in a register, the magic will work equally
with both memcpy and memmove.
One would hope so, but here's what happens with gcc-12:
#include <string.h>
void foo1(char *p, char *q)
{
  memcpy(p, q, 32);
}

void foo2(char *p, char *q)
{
  memmove(p, q, 32);
}
gcc -O3 -mavx2 -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o:
0000000000000000 <foo1>:
0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
12: c3 ret
13: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
1a: 00 00 00 00
1e: 66 90 xchg %ax,%ax
0000000000000020 <foo2>:
20: ba 20 00 00 00 mov $0x20,%edx
25: e9 00 00 00 00 jmp 2a <foo2+0xa>
The jmp in line 25 is probably a tail-call to memmove().
My guess is that xmm registers plus unrolling are used here rather than
ymm registers because waking up the upper 128 bits takes time. But
even so, the code uses two different registers, and, scheduled
differently, the same instructions could implement foo2():
0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
12: c3 ret
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>