On 01/12/2024 13:04, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
>> On 28/11/2024 12:37, Michael S wrote:
>>> On Wed, 27 Nov 2024 21:18:09 -0800
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>>> c:\cx>tm gcc sql.c #250Kloc file
>>>>> TM: 7.38
>>>> Your example illustrates my point. Even 250 thousand lines of
>>>> source takes only a few seconds to compile. Only people nutty
>>>> enough to have single source files over 25,000 lines or so --
>>>> over 400 pages at 60 lines/page! -- are so obsessed about
>>>> compilation speed.
>>> My impression was that Bart is talking about machine-generated code.
>>> For machine generated code 250Kloc is not too much.
>> This file mostly comprises sqlite3.c which is a machine-generated
>> amalgamation of some 100 actual C files.
>> You wouldn't normally do development with that version, but in my
>> scenario, where I was trying to find out why the version built with my
>> compiler was buggy, I might try adding debug info to it then building
>> with a working compiler (eg. gcc) to compare with.
> Even in the context of developing a compiler I would not blindly run
> many compilations of a large file.
Difficult bugs always occur in larger codebases, but with C, these are in a language that I can't easily navigate, in programs which are not mine, and which tend to be badly written, bristling with typedefs and macros.
It could take a week to track down where the error might be ...
> At the first stage I would debug the compiled
> program, to find out what is wrong with it.
... within the C program. Except there's nothing wrong with the C program! It works fine with a working compiler.
The problem will be in the generated code, so in an entirely different program. Normal debugging tools are of little use when several sets of source code are involved, in different languages, or when the error occurs in the second-generation version of either the self-hosted tool or the program under test, where that program is itself a language implementation.
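A practical first step is to have both compilers emit their low-level output for the same small file and diff the listings. Roughly like this; gcc's -S option is real, but the -s option on my cc is made up for illustration:

c:\cx>gcc -S test.c            # real gcc option: writes test.s
c:\cx>cc -s test.c             # hypothetical option: writes test.asm
c:\cx>fc test.s test.asm       # eyeball the differences

Since the two compilers use different assembly syntax, this is more about spotting a structurally wrong sequence than an exact match.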
(For example, I got tcc.c working at one point. My generated tcc.exe could compile tcc.c, but that second-generation tcc.c didn't work.)
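The standard sanity check for that situation is a staged bootstrap: build the compiler, let it rebuild itself, then rebuild once more; the second- and third-generation binaries should come out identical. A sketch, with made-up file names and gcc/tcc-style options:

c:\cx>cc tcc.c -o tcc1.exe         # stage 1: tcc built by my compiler
c:\cx>tcc1 tcc.c -o tcc2.exe       # stage 2: that tcc builds itself
c:\cx>tcc2 tcc.c -o tcc3.exe       # stage 3: this is where mine fell over
c:\cx>fc /b tcc2.exe tcc3.exe      # should be byte-identical if all is well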
> After that I would try to minimize the testcase, removing code which
> does not contribute to the bug.
Again, there is nothing wrong with the C program; the problem is in the code generated for it. The bug can be very subtle, but it usually turns out to be something silly.
Removing code from 10s of 1000s of lines (or 250Kloc for sql) is not practical. Still, the aim is to isolate some code which can be used to recreate the issue in a smaller program.
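To give a flavour of what "something silly" means, the reduced test case often ends up only a few lines long. A hypothetical example of the kind of thing that shakes out (not an actual bug of mine): a code generator that forgets to sign-extend a narrow value would make this print FAIL with 255 instead of -1:

#include <stdio.h>

int widen(signed char c) { return c; }   /* must sign-extend: -1 stays -1 */

int main(void) {
    signed char c = -1;
    if (widen(c) != -1)
        printf("FAIL: widen(-1) = %d\n", widen(c));
    else
        printf("OK\n");
    return 0;
}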
Debugging can involve comparing two versions, one working and the other not, looking for differences. Here, tracking statements may be added.
If the only working version is via gcc, then that's bad news because it makes the process even more of a PITA.
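One cheap, compiler-independent way to do that comparison is to make both builds emit the same trace, redirect each run to a file, and diff the logs; the first divergence brackets the miscompiled code. A minimal sketch (the TRACK macro is my own naming, nothing standard):

#include <stdio.h>

/* Print file, line, expression text and value; redirect stdout to a
   file for each build, then compare the two logs. */
#define TRACK(val) \
    printf("%s:%d %s = %ld\n", __FILE__, __LINE__, #val, (long)(val))

int compute(int x) {
    int y = x * 3;
    TRACK(y);
    y ^= x >> 2;
    TRACK(y);
    return y;
}

int main(void) {
    for (int i = 0; i < 5; i++)
        TRACK(compute(i));
    return 0;
}

Run the gcc build and the suspect build with output redirected, then fc the two log files.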
I added an interpreter mode to my IL, because I assumed that would give a solid, reliable reference implementation to compare against.
It turned out to be even more buggy than the generated native code!
(One problem was to do with my stdarg.h header, which implements the varargs mechanism used in function definitions. It assumes the stack grows downwards; in my interpreter, it grows upwards!)
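The mechanism behind that bug is easy to show. A home-grown stdarg.h typically steps a char pointer on from the last fixed parameter, so the direction of stepping bakes in an assumption about argument layout. A simplified illustration, not my actual header (a real one must also deal with alignment and argument promotion):

typedef char *my_va_list;

/* point just past the last fixed parameter */
#define my_va_start(ap, last) ((ap) = (char *)&(last) + sizeof(last))

/* fetch the next argument and step the pointer upwards -- this is the
   baked-in layout assumption; if arguments are laid out the other way,
   every fetch reads the wrong slot */
#define my_va_arg(ap, type) \
    (*(type *)(((ap) += sizeof(type)) - sizeof(type)))

#define my_va_end(ap) ((ap) = 0)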
> That involves several compilations
> of files with quickly decreasing sizes.
>> Tim isn't asking the right questions (or any questions!). WHY does gcc
>> take so long to generate indifferent code when the task can clearly be
>> done at least an order of magnitude faster?
> The simple answer is: users tolerate long compile time. If users
> abandoned 'gcc' for some other compiler due to long compile time,
> then 'gcc' developers would notice.
People use gcc. They come to depend on its features, or they might use (perhaps unknowingly) some extensions. On Windows, gcc includes some headers and libraries that belong to Linux, but other compilers don't provide them.
The result is that if they were to switch to a smaller, faster compiler, their program may not work.
They'd have to use it from the start. But then they may want to use libraries which only work with gcc ...
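To give one concrete example of that lock-in: the following is fine under gcc, via two common extensions (typeof and statement expressions), but a strictly standard C89/C99 compiler rejects it at the first use of MAX:

#include <stdio.h>

/* gcc extensions: ({ ... }) statement expression and typeof */
#define MAX(a, b) \
    ({ typeof(a) _a = (a); typeof(b) _b = (b); _a > _b ? _a : _b; })

int main(void) {
    printf("%d\n", MAX(3, 7));   /* prints 7 */
    return 0;
}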
> You need to improve your propaganda for faster C compilers...
I actually don't know why I care. I get the benefit of my fast tools every day; they're a joy to use. So I'm not bothered that other people are that tolerant of slow, cumbersome build systems.
But then, people in this group do like to belittle small, fast products (tcc for example as well as my stuff), and that's where it gets annoying.
So, how long does LLVM take to build again? It used to be hours. Here's my version being built from scratch:
c:\px>tm mm pc
Compiling pc.m to pc.exe
TM: 0.08
This standalone program takes a source file containing an IL program rendered as text. It can create an EXE from it, run it in-memory, or interpret it.
Let's try it out:
c:\cx>cc -p lua # compile a C program to IL
Compiling lua.c to lua.pcl
c:\cx>\px\pc -r lua fib.lua # Now compile and run it in-memory
Processing lua.pcl to lua.(run)
Running: fib.lua
1 1
2 1
3 2
4 3
5 5
6 8
7 13
...
Or I can interpret it:
c:\cx>\px\pc -i lua fib.lua
Processing lua.pcl to lua.(int)
Running: fib.lua
1 1
...
All that from a product that took 80ms to build and comprises a self-contained 180KB executable.
If nobody here can appreciate the benefits of having such a baseline product, then there's nothing I can do about that.