Bart <bc@freeuk.com> wrote:
On 01/12/2024 13:04, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 28/11/2024 12:37, Michael S wrote:
On Wed, 27 Nov 2024 21:18:09 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>
c:\cx>tm gcc sql.c #250Kloc file
TM: 7.38
>
Your example illustrates my point. Even 250 thousand lines of
source takes only a few seconds to compile. Only people nutty
enough to have single source files over 25,000 lines or so --
over 400 pages at 60 lines/page! -- are so obsessed about
compilation speed.
>
My impression was that Bart is talking about machine-generated code.
For machine generated code 250Kloc is not too much.
>
This file mostly comprises sqlite3.c which is a machine-generated
amalgamation of some 100 actual C files.
>
You wouldn't normally do development with that version, but in my
scenario, where I was trying to find out why the version built with my
compiler was buggy, I might try adding debug info to it then building
with a working compiler (e.g. gcc) to compare with.
Even in the context of developing a compiler I would not blindly run
many compilations of a large file.
Difficult bugs always occur in larger codebases, but with C, these are
in a language that I can't navigate, and in programs which are not mine,
and which tend to be badly written, bristling with typedefs and macros.
It could take a week to track down where the error might be ...
It could be. You could declare that the program is hopeless, or do
what is needed, which frequently means making effective use of the
available debugging features. For example, I got a strange crash.
Looking at data in the debugger suggested that the data was malformed.
So I used data breakpoints to figure out which instruction initialized
the data. That needed several runs of the program, in each run looking
at what happened to the suspected memory location. In the end I
localized the problem and the rest was easy.
Some problems are easy, for example a significant percentage of
segfaults: you have something which is not a valid address, and
frequently you immediately see why the address is wrong and how to fix
it. Still, finding this usually takes longer than compilation.
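A minimal sketch of the data-breakpoint approach in gdb; the variable
and function names here are invented, not taken from any real program:

  (gdb) run                    # program crashes
  (gdb) print *node            # the data looks malformed
  (gdb) break parse_item       # some point where the object already exists
  (gdb) run
  (gdb) watch -l node->len     # hardware watchpoint on that memory location
  (gdb) continue               # gdb stops at every write to it

The last stop before the value turns bad points at the instruction
(and, given debug info, the source line) that wrote the malformed data.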
At the first stage I would debug the compiled program, to find out
what is wrong with it.
... within the C program. Except there's nothing wrong with the C
program! It works fine with a working compiler.
The problem will be in the generated code, so in an entirely different
program.
Of course the problem is in the generated code. But debug info (I had
at least _some_ debug info; apparently you do not have it) shows you
which part of the source is responsible for a given piece of machine
code. And you can see the data, so you can see what is happening in the
generated program. And you have the C source, so you can see what
should happen. Once you know the place where "what is happening"
differs from "what should happen", you can normally produce a quite
small reproducing example.
So normal debugging tools are useful when several sets of source code
are involved, in different languages, or the error occurs in the
second-generation version of either the self-hosted tool, or the
program under test if it is to do with languages.
(For example, I got tcc.c working at one point. My generated tcc.exe
could compile tcc.c, but that second-generation tcc.exe didn't work.)
Clearly, you work in stages: first you find out what is wrong with the
second-generation tcc.exe. Then you find the piece of tcc.c that was
miscompiled by the first-generation tcc.exe (producing the wrong
second-generation compiler). Then you find the piece of tcc.c which was
responsible for this miscompilation. And finally you look at why your
compiler miscompiled that piece of tcc.c.
Tedious, yes. It is easier if you have a good testsuite, that is, a
collection of small programs that exercise various constructs and
potentially problematic combinations.
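As a concrete sketch of the staged check (the compiler name 'mycc' and
the test file are placeholders, not real commands from this thread):

  mycc tcc.c -o tcc1         # generation 1: tcc built by the compiler under test
  ./tcc1 tcc.c -o tcc2       # generation 2: tcc built by generation 1
  ./tcc2 hello.c -o hello    # generation 2 misbehaves here
  gcc tcc.c -o tcc-ref       # reference tcc built by a known-good compiler
  ./tcc-ref hello.c -o hello-ref
  # compare hello with hello-ref (e.g. objdump -d) to see what
  # generation 2 gets wrong, then find the tcc.c code that emits that
  # part, and continue up the chain as described above.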
Anyway, most of the work involves executing programs in the debugger
and observing critical things. Re-creating executables is rare in
comparison. The main point where compiler speed matters is the time
needed to run the compiler testsuite.
After that I would try to minimize the testcase, removing code which
does not contribute to the bug.
Again, there is nothing wrong with the C program, only with the code
generated for it. The bug can be very subtle, but it usually turns out
to be something silly.
Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not
practical. And yet, the aim is to isolate some code which can be used
to recreate the issue in a smaller program.
If you have "good" version (say one produced by 'gcc' or by earlier
worong verion of your compiler), then you can isolate problem by
linking parts produced by different compilers. Even if you have
one huge file, typically you can split it into parts (if it is one
huge function normally it is possible to split it into smaller
ones). Yes, it is work but getting quality product needs work.
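A sketch of the link-mixing idea, with made-up file names ('mycc' again
stands for the suspect compiler; it assumes both compilers follow the
same platform ABI, which they must anyway for a mixed binary to make
sense):

  gcc -c part1.c part2.c part3.c part4.c   # known-good object files
  mycc -c part3.c -o part3_my.o            # rebuild one part with the suspect compiler
  gcc part1.o part2.o part3_my.o part4.o -o test_mixed
  ./test_mixed    # if it now fails, the miscompiled code is in part3.c;
                  # repeat, narrowing down to a single part, then a single function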
Debugging can involve comparing two versions, one working, the other
not, looking for differences. And here tracking statements may be
added.
If the only working version is via gcc, then that's bad news because it
makes the process even more of a PITA.
Well, IME tracking statements frequently produce too much or too
little data. When dealing with C code I tend to depend more on the
debugger, setting breakpoints in crucial places and examining data
there. Extra printing functions can help; for example, gcc has printing
functions for its main data structures. Such functions can be called
from the debugger and give nicer output than the generic debugger
facilities. But even if you need extra printing functions, you can put
them in a separate file, compile it once and use it many times.
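A minimal sketch of such a helper; the structure and its fields are
invented for illustration (gcc's own equivalent is debug_tree()):

  /* dump.c - compile once, link into the debug build */
  #include <stdio.h>

  struct node {                 /* invented example structure */
      int kind;
      long value;
      struct node *next;
  };

  void dump_node(struct node *n)
  {
      if (!n) { puts("(null node)"); return; }
      printf("node %p: kind=%d value=%ld next=%p\n",
             (void *)n, n->kind, n->value, (void *)n->next);
  }

From gdb it can then be invoked at any breakpoint with
'call dump_node(p)', which is usually easier to read than 'print *p'.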
I added an interpreter mode to my IL, because I assumed that would give
a solid, reliable reference implementation to compare against.
It turned out to be even more buggy than the generated native code!
(One problem was to do with my stdarg.h header which implements VARARGS
used in function definitions. It assumes the stack grows downwards.
This is true on most machines, but not all. In my interpreter, it grows
downwards!)
You probably meant upwards? And handling such things is natural when
you have portability in mind: either you parametrise stdarg.h so that
it works for both stack directions, or you make sure that the
interpreter and the compiler use the same direction (the latter seems
to be much easier). Actually, I think the most natural way is to have
the data structure layout in the interpreter be as close as possible to
the compiler's data layout. Of course, there are some unavoidable
differences; the interpreter needs registers for its own operation, so
some variables that could be in registers in compiled code will end up
in the stack frame.
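To illustrate the parametrisation idea only (this is not anyone's real
stdarg.h, and it assumes variadic arguments are laid out contiguously
in memory next to the last named parameter, which most real ABIs do not
guarantee):

  /* my_stdarg.h - sketch of a direction-parametrised va_list */
  #define STACK_GROWS_DOWN 1   /* configuration knob */

  typedef char *my_va_list;

  /* round each argument up to a full machine word */
  #define MY_VA_SIZE(t) ((sizeof(t) + sizeof(long) - 1) & ~(sizeof(long) - 1))

  #if STACK_GROWS_DOWN
    /* later arguments sit at higher addresses: walk upwards */
    #define my_va_start(ap, last) ((ap) = (char *)&(last) + MY_VA_SIZE(last))
    #define my_va_arg(ap, t) \
        (*(t *)(((ap) += MY_VA_SIZE(t)) - MY_VA_SIZE(t)))
  #else
    /* stack grows upwards: later arguments sit at lower addresses */
    #define my_va_start(ap, last) ((ap) = (char *)&(last))
    #define my_va_arg(ap, t) (*(t *)((ap) -= MY_VA_SIZE(t)))
  #endif
  #define my_va_end(ap) ((void)0)

The same single knob has to be honoured by the code generator and by
the interpreter, which is exactly where the mismatch described above
comes from.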
That involves several compilations of files with quickly decreasing
sizes.
Tim isn't asking the right questions (or any questions!). WHY does gcc
take so long to generate indifferent code when the task can clearly be
done at least an order of magnitude faster?
The simple answer is: users tolerate long compile times. If users
abandoned 'gcc' for some other compiler due to long compile times, then
the 'gcc' developers would notice.
People use gcc. They come to depend on its features, or they might use
(perhaps unknowingly) some extensions. On Windows, gcc includes some
headers and libraries that belong to Linux, but other compilers don't
provide them.
The result is that if they were to switch to a smaller, faster compiler,
their program may not work.
They'd have to use it from the start. But then they may want to use
libraries which only work with gcc ...
Well, you see that there are reasons to use 'gcc'. Long ago I produced
an image processing DLL for Windows. The first version was developed on
Linux using 'gcc' and then compiled on Windows using Borland C. It
turned out that in Borland C 'setjmp/longjmp' did not work, so I had to
work around this. Not nice, but manageable. At that time the C standard
did not include a function to round floats to integers, and that proved
to be problematic. The C default, that is truncation, produced
artifacts that were not acceptable. So I used an emulation of rounding
based on 'floor'; that worked OK, but turned out to be slow (something
like 70% of the runtime went into rounding). So I replaced this with
assembler code. With Borland C I had to call a separate assembler
routine, which had some overhead.
The next version was cross-compiled on Linux using gcc. This version
used inline assembly for rounding and was significantly faster than
what Borland C produced. Note: the images to process were largish
(think of, say, 12000 by 20000 pixels) and speed was an important
factor. So using gcc-specific code was IMO justified (this code was
used conditionally; other compilers would get the slow portable version
using 'floor').
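For illustration only (this is not the original code): a floor-based
portable version and a GCC inline-assembly version of the kind
described; since C99 one would simply use lrint(), which compilers can
usually turn into a single instruction:

  #include <math.h>

  /* portable fallback: round half up via floor() */
  static int round_portable(double x)
  {
      return (int)floor(x + 0.5);
  }

  #if defined(__GNUC__) && defined(__i386__)
  /* x87: fistpl stores the top of the FP stack as a 32-bit integer,
     rounded according to the current FP control word (round-to-nearest
     by default), and pops it */
  static inline int round_fast(double x)
  {
      int r;
      __asm__ ("fistpl %0" : "=m" (r) : "t" (x) : "st");
      return r;
  }
  #else
  #define round_fast round_portable
  #endif

Note that the two do not agree on exact halfway cases (floor(x + 0.5)
rounds them up, the x87 default rounds to even), which rarely matters
for pixel data but is worth knowing.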
You need to improve your propaganda for faster C compilers...
I actually don't know why I care. I get the benefit of my fast tools
every day; they're a joy to use. So I'm not bothered that other people
are that tolerant of slow, cumbersome build systems.
But then, people in this group do like to belittle small, fast products
(tcc for example as well as my stuff), and that's where it gets annoying.
I tried compiling TeX with tcc. Long ago it did not work due to
limitations of tcc. This time it worked. A small comparison on the main
file (19062 lines):
Command         time (s)   code size   data size
tcc -g             0.017      290521        1188
tcc                0.015      290521        1188
gcc -O0 -g         0.440      248467          14
gcc -O0            0.413      248467          14
gcc -O -g          1.385      167565           0
gcc -O             1.151      167565           0
gcc -Os -g         1.998      142336           0
gcc -Os            1.724      142336           0
gcc -O2 -g         2.683      207913           0
gcc -O2            2.257      207913           0
gcc -O3 -g         3.510      255909           0
gcc -O3            2.934      255909           0
clang -O0 -g       0.302      232755          14
clang -O0          0.189      232755          14
clang -O -g        1.996      223410           0
clang -O           1.683      223410           0
clang -Os -g       1.693      154421           0
clang -Os          1.451      154421           0
clang -O2 -g       2.774      259569           0
clang -O2          2.359      259569           0
clang -O3 -g       2.970      280235           0
clang -O3          2.537      280235           0
I have duly provided both the times with '-g' and without. Both are
supposed to produce the same code (so the code and data sizes are the
same), but you can see that '-g' measurably increases compile time.
AFAIK compiler data structures contain slots for debug info even if
'-g' is not given and the compiler generates no debug info. So the
actual cost of supporting '-g' is higher than this difference; you pay
part of the cost even if you do not use the capability.
ATM I do not have data handy to compare runtimes (TeX needs extra data
to do useful work), so I provide code and data size as a proxy. As you
can see, even at -O0 gcc and clang manage to put almost all data into
instructions (actually in tex.c _all_ initialized data is constant),
while tcc keeps it as data, which requires extra instructions to
access. gcc at -O and -Os and clang at -Os produce code which is about
half the size of the tcc result. Some part of this may be due to using
smaller instructions, but most is likely because the gcc and clang
results simply have far fewer instructions. At higher optimization
levels code size grows; this is probably due to inlining and code
duplication. That usually gives some small speedup at the cost of
bigger code, but one would have to measure (sometimes attempts at
optimization backfire and lead to slower code).
Anyway, 19062 lines is much larger than the typical file that I work
with, and even for such a size the compile time is reasonable. Maybe
less typical is the modest use of include files: tex.c uses a few
standard C headers and 1613 lines of project-specific headers. Still,
there are macros, and the macro-expanded result is significantly bigger
than the source.
In the past TeX execution time correlated reasonably well with
Dhrystone. On Dhrystone, tcc-compiled code is about 4 times slower than
gcc/clang, so one can expect tcc-compiled TeX to be significantly
slower than one compiled by gcc or clang.
-- Waldek Hebisch