On 22/11/2024 19:29, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
> clang -O3 -march=native   126112us
> clang -O3                 222136us
> clang -O                  225855us
> gcc -O3 -march=native      82809us
> gcc -O3                   114365us
> gcc -O                    287786us
> tcc                       757347us
You've omitted -O0 for gcc and clang. That timing probably won't be too
far from tcc's, but gcc/clang compilation time for larger programs will be
significantly longer than tcc's (e.g. 10 times or more).
The trade-off then is not worth it unless you are running gcc for other
reasons (e.g. for deeper analysis, or to compile less portable code that
has only been tested on, or written for, gcc/clang; or just an irrational
hatred of simple tools).
> There is some irregularity in the timings, but this shows that a
> factor of order 9 is possible.
That's an extreme case, for one small program with one obvious
bottleneck where it spends 99% of its time, and with little use of
memory either.
For simply written programs, the difference is more like 2:1. For more
complicated C code that makes heavy use of macros that can expand to lots
of nested function calls, it might be 4:1, since such code may rely on
optimisation to inline some of those calls.
Again, that would be code written to take advantage of specific compilers.
But that is still computationally intensive code working on small
amounts of memory.
I have a text editor written in my scripting language. I can translate
its interpreter to C and compile it with both gcc -O3 and tcc.
Then, yes, you will notice twice the latency with the tcc-built
interpreter compared with the gcc-O3 one when doing things like
deleting/inserting lines at the beginning of a 1,000,000-line text file.
But typically, the text files will be 1000 times smaller; you will
notice no difference at all.
I'm not saying no optimisation is needed, ever, I'm saying that the NEED
for optimisation is far smaller than most people seem to think.
Here are some timings for that interpreter, when used to run a script to
compute fib(38) the long way:
Interp  Built with  Time (s)  Notes
qc      tcc         9.0       qc is the C-transpiled version
qq      mm          5.0       -fn; qq is the original M version
qc      gcc -O3     4.0
qq      mm          1.2       -asm
(My interpreter doesn't bother with the faster switch-based or
computed-goto-based dispatchers. The choice is between a slower
function-table-based one and an accelerated threaded-code version using
inline ASM. These are selected with the -fn/-asm options. The -asm version
is not a JIT; it still interprets one bytecode instruction at a time.)
So the fastest version here doesn't use compiler optimisation, and it's
3 times the speed of gcc-O3. My unoptimised HLL code is also only 25%
slower than gcc-O3.
The messages displayed come from Usenet.