Subject: Re: else ladders practice
From: bc (at) *nospam* freeuk.com (Bart)
Newsgroups: comp.lang.c
Date: 23 Nov 2024, 15:17:36
Organization: A noiseless patient Spider
Message-ID: <vhso61$1o2of$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 22/11/2024 19:29, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
>
> clang -O3 -march=native   126112us
> clang -O3                 222136us
> clang -O                  225855us
> gcc -O3 -march=native      82809us
> gcc -O3                   114365us
> gcc -O                    287786us
> tcc                       757347us

You've omitted -O0 for gcc and clang. That timing probably won't be too far from tcc, but compilation time for larger programs will be significantly longer (eg. 10 times or more).
The trade-off then is not worth it unless you are running gcc for other reasons (eg. for deeper analysis, or to compile less portable code that has only been tested on or written for gcc/clang; or just an irrational hatred of simple tools).

> There is some irregularity in timings, but this shows that
> factor of order 9 is possible.

That's an extreme case, for one small program with one obvious bottleneck where it spends 99% of its time, and with little use of memory either.
For simply written programs, the difference is more like 2:1. For more complicated C code that makes much use of macros that can expand to lots of nested function calls, it might be 4:1, since it might rely on optimisation to inline some of those calls.
Again, that would be code written to take advantage of specific compilers.
But that is still computationally intensive code working on small amounts of memory.
I have a text editor written in my scripting language. I can translate its interpreter to C and compile with both gcc-O3 and tcc.
Then, yes, you will notice twice as much latency with the tcc interpreter compared with gcc-O3, when doing things like deleting/inserting lines at the beginning of a 1000000-line text file.
But typically, the text files will be 1000 times smaller; you will notice no difference at all.
I'm not saying no optimisation is needed, ever; I'm saying that the NEED for optimisation is far smaller than most people seem to think.
Here are some timings for that interpreter, when used to run a script to compute fib(38) the long way:

Interp  Built with  Timing
qc      tcc         9.0 secs  (qc is C transpiled version)
qq      mm          5.0       (-fn; qq is original M version)
qc      gcc-O3      4.0
qq      mm          1.2       (-asm)

(My interpreter doesn't bother with faster switch-based or computed-goto based dispatchers. The choice is between a slower function-table-based one, and an accelerated threaded-code version using inline ASM.
These are selected with -fn/-asm options. The -asm version is not JIT; it is still interpreting a bytecode at a time).
So the fastest version here doesn't use compiler optimisation, and it's about 3 times the speed of the gcc-O3 build (1.2 vs 4.0 secs). My unoptimised HLL code is also only 25% slower than gcc-O3 (5.0 vs 4.0 secs).
That is for this test, but that's also one that is popular for language benchmarks.