Subject: Re: Computer architects leaving Intel...
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Date: 10. Sep 2024, 18:16:07
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep10.191607@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11
David Brown <david.brown@hesbynett.no> writes:
>However, my point was that "hand-optimised" source code can lead to
>poorer results on newer /compilers/ compared to simpler source code. If
>you've googled for "bit twiddling hacks" for cool tricks, or written
>something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>results will be slower with a modern compiler and modern cpu, even
>though the "hand-optimised" version might have been faster two decades
>ago. You can expect the modern tool to convert the multiplication into
>shifts and adds if that is more efficient on the target, or a
>multiplication if that is best on the target. But you can't expect the
>compiler to turn the shifts and adds into a multiplication.
Why not? Let's see:
[b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 6b c7 15 imul $0x15,%rdi,%rax
4: c3 ret
[b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 8d 04 bf lea (%rdi,%rdi,4),%rax
4: 48 8d 04 87 lea (%rdi,%rax,4),%rax
8: c3 ret
So gcc-12 obviously understands that your "hand-optimized" version is
equivalent to the multiplication, and with -O3 it then decides that the
two leas are faster.
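The post does not show xxx-mul.c. The following minimal C sketch assumes that foo is the shift-and-add form of x*21 from the quoted text. The names foo_shift_add, foo_mul, foo_lea and check_forms are hypothetical. The sketch shows why the compiler may pick any of these forms: all three agree modulo 2^64, and foo_lea spells out what the two leas compute. It uses uint64_t because left-shifting a negative signed value is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction of foo: the shift-and-add form of x*21.
   Unsigned arithmetic keeps every shift well defined. */
uint64_t foo_shift_add(uint64_t x)
{
    return (x << 4) + (x << 2) + x;   /* 16x + 4x + x = 21x */
}

/* The plain multiplication (gcc -Os chose a single imul for foo). */
uint64_t foo_mul(uint64_t x)
{
    return x * 21;
}

/* Mirrors the two leas gcc -O3 emitted:
     lea (%rdi,%rdi,4),%rax   ->  t = x + 4*x = 5x
     lea (%rdi,%rax,4),%rax   ->  r = x + 4*t = 21x   */
uint64_t foo_lea(uint64_t x)
{
    uint64_t t = x + 4 * x;
    return x + 4 * t;
}

/* All three forms compute the same value modulo 2^64. */
void check_forms(uint64_t x)
{
    assert(foo_shift_add(x) == foo_mul(x));
    assert(foo_lea(x) == foo_mul(x));
}
```

To see which instruction sequence a given compiler picks for each function, compile this with gcc -O3 -c and disassemble it with objdump -d, as in the session above. The choice may differ between gcc versions and optimization flags.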
>(Sometimes it can, but you can't expect it to.)
That also works the other way.
But it becomes really annoying when I do not want it to perform a
transformation and it performs it anyway, e.g., when I write
"-(x>0)" and the compiler turns that into a conditional branch. These
days gcc does not do that, but I have just seen another twist:
long bar(long x)
{
return -(x>0);
}
gcc-12 -O3 turns this into:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: f7 d8 neg %eax
1a: 48 98 cltq
1c: c3 ret
So sign-extension optimization is apparently still lacking in gcc: a
64-bit neg would make the cltq unnecessary.
Clang-14 handles this fine:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: 48 f7 d8 neg %rax
1b: c3 ret
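The following C sketch shows what both listings compute. It assumes x86-64 semantics, where writing a 32-bit register such as %eax zero-extends into the full 64-bit %rax. bar is the function from the post. bar_gcc_style, bar_clang_style and select_if_positive are hypothetical names used only for illustration. The gcc sequence negates only the low 32 bits and therefore needs cltq to sign-extend the result. The clang sequence negates the already zero-extended 64-bit value directly.

```c
#include <assert.h>
#include <stdint.h>

/* From the post: 0 if x <= 0, -1 (all ones) if x > 0, with no branch. */
long bar(long x)
{
    return -(x > 0);
}

/* Emulates the gcc-12 sequence: setg leaves 0/1 in %al, neg negates the
   32-bit %eax (giving 0 or -1 in 32 bits), and cltq sign-extends %eax
   into %rax. */
int64_t bar_gcc_style(int64_t x)
{
    int32_t eax = (x > 0);      /* xor %eax,%eax; test; setg %al */
    eax = -eax;                 /* neg %eax */
    return (int64_t)eax;        /* cltq */
}

/* Emulates the clang-14 sequence: xor %eax,%eax zeroes all of %rax, so
   after setg the full 64-bit register holds 0/1, and one 64-bit neg
   gives the result without a separate sign extension. */
int64_t bar_clang_style(int64_t x)
{
    int64_t rax = (x > 0);      /* xor; test; setg, already zero-extended */
    return -rax;                /* neg %rax */
}

/* Hypothetical use of the mask: select a if x > 0, else b, branch-free. */
long select_if_positive(long x, long a, long b)
{
    long m = bar(x);            /* 0 or all ones */
    return (a & m) | (b & ~m);
}
```

Both emulations return the same values for every input. They differ only in whether the negation is done at 32 or 64 bits, which is where the extra cltq in the gcc output comes from.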
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>