David Brown <david.brown@hesbynett.no> writes:
>Again - sometimes a compiler will recognise a particular hand-optimised
>pattern, turn it back to something logically simpler, then optimise
>from there. But you cannot /expect/ that. On the whole, compilers are
>more likely to recognise clear and simple patterns than complex ones,
>especially using bit manipulation in odd ways.
>
>However, my point was that "hand-optimised" source code can lead to
>poorer results on newer /compilers/ compared to simpler source code. If
>you've googled for "bit twiddling hacks" for cool tricks, or written
>something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>results will be slower with a modern compiler and modern cpu, even
>though the "hand-optimised" version might have been faster two decades
>ago. You can expect the modern tool to convert the multiplication into
>shifts and adds if that is more efficient on the target, or a
>multiplication if that is best on the target. But you can't expect the
>compiler to turn the shifts and adds into a multiplication.

Why not? Let's see:

[b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 6b c7 15 imul $0x15,%rdi,%rax
4: c3 ret
[b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 8d 04 bf lea (%rdi,%rdi,4),%rax
4: 48 8d 04 87 lea (%rdi,%rax,4),%rax
8: c3 ret
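For reference, the arithmetic behind both listings, with x the argument in %rdi and r the value in %rax (AT&T syntax, where lea (base,index,scale),dst computes base + index*scale without touching memory). The imul in the -Os listing multiplies by the same constant, 0x15 = 21:

$$
\begin{aligned}
(x \ll 4) + (x \ll 2) + x &= 16x + 4x + x = 21x = \mathtt{0x15}\cdot x\\
\text{lea (\%rdi,\%rdi,4),\%rax:}\quad r &\leftarrow x + 4x = 5x\\
\text{lea (\%rdi,\%rax,4),\%rax:}\quad r &\leftarrow x + 4r = x + 4\cdot 5x = 21x
\end{aligned}
$$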
So gcc-12 obviously understands that your "hand-optimized" version is
equivalent to the multiplication, and at -O3 it decides that the two
lea instructions are faster.
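The post does not show xxx-mul.c. A minimal sketch of what it might contain, assuming that foo (the only name visible in the disassembly) takes and returns a long, as the 64-bit registers suggest, and that its body is the shift-and-add expression quoted above; foo_mul is a hypothetical helper added only for comparison:

```c
/* Hypothetical reconstruction of xxx-mul.c: the post shows only the
   disassembly of foo, not its source. The body is the shift-and-add
   form quoted above: 16*x + 4*x + x == 21*x. */
long foo(long x)
{
    /* In C, left-shifting a negative signed value is undefined, so this
       form is only well-defined for non-negative x that do not overflow. */
    return (x << 4) + (x << 2) + x;
}

/* Hypothetical helper for comparison: the plain multiplication. */
long foo_mul(long x)
{
    return x * 21;
}
```

Compiling this with gcc -Os -c and gcc -O3 -c and disassembling with objdump -d, as in the session above, shows what a particular compiler makes of each form; the exact instructions depend on the compiler version and flags.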

>(Sometimes it can, but you can't expect it to.)

That also works the other way.

>One day, perhaps, compilers will be perfect. But not yet :-(

But it becomes really annoying when I intend the compiler not to perform
a transformation and it performs it anyway, as when I write "-(x>0)" and
the compiler turns that into a conditional branch. These days gcc does
not do that, but I have just seen another twist:
long bar(long x)
{
return -(x>0);
}
gcc-12 -O3 turns this into:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: f7 d8 neg %eax
1a: 48 98 cltq
1c: c3 ret
So sign-extension optimization is apparently still lacking: gcc negates
the 32-bit %eax and then needs a separate cltq to sign-extend the result
to 64 bits.
Clang-14 handles this fine:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: 48 f7 d8 neg %rax
1b: c3 ret
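Why the two listings differ: in C, x>0 has type int, so -(x>0) is a 32-bit negation (on x86-64) whose result is then converted to long by the return statement. gcc-12's neg %eax; cltq follows that literally, while clang-14 relies on xor %eax,%eax plus setg having already left 0 or 1 in the full %rax and negates the 64-bit register directly. A minimal sketch of the equivalence; bar_narrow_then_widen and bar_widen_then_negate are hypothetical names mirroring the gcc and clang sequences, and select_positive is an assumed illustration of the branch-free mask one might want -(x>0) for, not something from the post:

```c
/* bar as in the post: (x > 0) is an int (0 or 1), so -(x > 0) is an int
   (0 or -1) that the return statement converts to long. */
long bar(long x)
{
    return -(x > 0);
}

/* Hypothetical: negate at int width, then sign-extend, mirroring the
   gcc-12 sequence neg %eax; cltq. */
long bar_narrow_then_widen(long x)
{
    int t = -(x > 0);  /* 0 or -1 as an int (32 bits on x86-64) */
    return (long)t;    /* sign extension to long */
}

/* Hypothetical: widen first, then negate at 64 bits, mirroring clang-14's
   neg %rax on a register that already holds 0 or 1. */
long bar_widen_then_negate(long x)
{
    return -(long)(x > 0);
}

/* Hypothetical use of the mask: choose a when x > 0, else b, without a
   conditional in the source (the compiler may still emit a branch). */
long select_positive(long x, long a, long b)
{
    long m = -(long)(x > 0);  /* all ones if x > 0, otherwise 0 */
    return (a & m) | (b & ~m);
}
```

All three bar variants return the same value for every x; they differ only in the order of widening and negation, which is exactly the extra cltq in the gcc-12 output.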
The messages displayed come from Usenet.