Subject : Re: else ladders practice
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.lang.c
Date : 22. Nov 2024, 16:21:55
Organisation : A noiseless patient Spider
Message-ID : <vhq7ik$17oi0$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 22/11/2024 15:19, Michael S wrote:
> On Fri, 22 Nov 2024 12:33:29 -0000 (UTC)
> antispam@fricas.org (Waldek Hebisch) wrote:
>> Bart <bc@freeuk.com> wrote:
>>>
>>> Sure. That's when you run a production build. I can even do that
>>> myself on some programs (the ones where my C transpiler still
>>> works) and pass it through gcc -O3. Then it might run 30% faster.
>>
>> On a fast machine running Dhrystone 2.2a I get:
>>
>> tcc-0.9.28rc    20000000
>> gcc-12.2 -O     64184852
>> gcc-12.2 -O2    83194672
>> clang-14 -O     83194672
>> clang-14 -O2    85763288
>>
>> so with -O2 this is more than 4 times faster. Dhrystone correlated
>> reasonably with runtime of tight compute-intensive programs.
>> Compilers started to cheat on the original Dhrystone, so there are
>> bigger benchmarks like SPEC INT. But Dhrystone 2 has modifications
>> to make cheating harder, so I think it is still a reasonable
>> benchmark. Actually, the difference may be much bigger, for example
>> in image processing both clang and gcc can use vector instructions,
>> which may give an additional speedup of order 8-16.
>>
>> 30% above means that you are much better than tcc or your program
>> is badly behaving (I have programs that make intensive use of
>> memory, here the effect of optimization would be smaller, but still
>> of order 2).
>>
> gcc -O is not what Bart was talking about. It is quite similar to -O1.

"Similar" in this particular case being a synonym for "identical" :-)

> Try gcc -O0.
> With regard to speedup, I had run only one or two benchmarks with tcc
> and my results were close to those of Bart. gcc -O0 is very similar to tcc
> in speed of the exe, but compiles several times slower. The gcc -O2 exe
> is about 2.5 times faster.

(Note that "gcc -O0" is still a vastly more powerful compiler than tcc in many ways.)

> I'd guess I can construct a case where gcc successfully vectorized
> some floating-point loop calculation and showed 10x speed up vs tcc on
> modern Zen5 hardware. But that would not be typical.

The effect you get from optimisation depends very much on the code in question, the exact compiler flags, and also on the processor you are using.
Fairly obviously, if your code spends a lot of time in system calls, waiting for external events (files, networks, etc.), or calling code in other separately compiled libraries, then optimisation of your code will make almost no difference. Something that does a lot of calculations and data manipulation, on the other hand, can be much faster. Even then, however, it depends on what you are doing.
Beyond simple "-O3" flags, things like "-march=native" and "-ffast-math" (if you have floating point calculations, and you are sure this does not affect the correctness of the code!) can make a huge difference by allowing more re-arrangements, vector/SIMD processing, using more instructions on newer processors, and having a more accurate model of scheduling.
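To make the floating point case concrete, here is a minimal sketch (the function name sum_squares is just something made up for illustration). As far as I know, without "-ffast-math" gcc has to keep the additions in source order, because floating point addition is not associative, so it will usually not vectorise a reduction like this one. With "-ffast-math" it is allowed to split the sum into several partial sums and use SIMD instructions, and "-march=native" lets it pick the widest vectors the build machine supports. The result can then differ in the last bits, which is exactly why you have to be sure it does not affect correctness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum of squares of n floats.
 *
 * Try comparing the generated code for, say,
 *     gcc -O3 -S example.c
 *     gcc -O3 -march=native -ffast-math -S example.c
 * (adding -fopt-info-vec makes gcc report which loops it vectorised).
 */
float sum_squares(const float *x, size_t n)
{
    assert(x != NULL || n == 0);    /* precondition of this sketch */

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Each addition depends on the previous one.  Strict IEEE
         * semantics forbid reordering them; -ffast-math permits it. */
        sum += x[i] * x[i];
    }
    return sum;
}
```

How much this gains depends entirely on the processor and the surrounding code, so measure rather than assume.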
And the type of processor is also very important. x86 processors are tuned to running crappy code, since a lot of the time they are used to run old binaries made by old tools, or binaries made by people who don't know how to use their tools well. So they have features like extremely local data caches to hide the cost of using the stack for local variables instead of registers. And often it doesn't matter if you do one instruction or a dozen instructions, because you are waiting for memory anyway. If you are looking at microcontrollers, on the other hand, optimisation can make a huge difference for a lot of real-world code.
There is also another substantial difference in code efficiency that is missed out in these sorts of pretend benchmarks. When efficiency really matters, top-shelf compilers give you features and extensions to help. You can use intrinsics, or vector extensions, or pragmas, or attributes, or "builtins", to give the compiler more information and work with it to give more opportunities for optimisation. Many of these are not portable (or of limited portability), and getting top speed from your code is not an easy job, but you certainly have possibilities with a tool like gcc or clang that you can never have with tcc or other tiny compilers.
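As a rough sketch of the sort of thing I mean, here is the gcc/clang "vector_size" extension used together with standard "restrict". The names v4f and scale_add are invented for this example, and the requirement that n is a multiple of 4 is an assumption made to keep it short. This is not standard C, and tcc would not be expected to accept it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* gcc/clang extension: four floats treated as a single 16-byte value. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical helper: y[i] += k * x[i] for i in [0, n).
 * "restrict" promises the compiler that x and y do not overlap. */
void scale_add(float *restrict y, const float *restrict x, float k, size_t n)
{
    assert(n % 4 == 0);             /* simplifying assumption of this sketch */

    const v4f vk = { k, k, k, k };
    for (size_t i = 0; i < n; i += 4) {
        v4f vx, vy;
        /* memcpy avoids alignment and aliasing problems; compilers
         * normally turn these into plain vector loads and stores. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy += vk * vx;              /* element-wise on all four lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

Whether hand-written vectors beat what the auto-vectoriser produces from a plain loop is something you have to check for your own code and target.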