Subject : Re: Suggested method for returning a string from a C program?
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Groups : comp.lang.c
Date : 29 Mar 2025, 13:37:09
Organisation : A noiseless patient Spider
Message-ID : <vs8phl$1adfu$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 29/03/2025 01:32, bart wrote:
On 28/03/2025 23:53, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/03/2025 22:33, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/03/2025 20:41, Scott Lurndal wrote:
[...]
The gnu compiler is not multithreaded. The single thread was
compute bound for 13 minutes and 46 seconds.
>
So what was that -j96 about?
"-j96" is an option to GNU make, not to the compiler. It might
invoke
gcc multiple times in parallel, but each invocation of gcc will still be
single-threaded.
>
So, is there just one instance of gcc at work during those 13 minutes, or multiple?
>
In other words, would it take longer than 13:40 mins without it, or
does it help? If -j96 makes no difference, then why specify it?
>
I haven't done any measurements, but I don't know what's unclear.
>
If a single thread was compute bound for 13:46, using "-j96"
won't make that single thread run any faster, but it can enable
"make" to do other things while that single thread is running.
It's also common to use "-j" without an argument, to run as many
jobs simultaneously as possible, or "-j$(nproc)" to run as many
parallel jobs as the number of processing units available (if you
have the "nproc" command; it's part of GNU coreutils).
>
I can imagine "-j" causing problems if dependencies are expressed
incorrectly, but I haven't run into such a problem myself.
>
Are you saying that this job consists of a single C (or C++) source file, so it is not possible to parallelise the processes necessary to compile it? (I've no idea of gcc's capabilities there.)
That would be funny given that I've had criticisms myself for attempting to compile monolithic C programs.
My guess - and only Scott can say for sure - is that his software contains a very large number of files, no doubt some C and some C++.
Today's lesson comes in three parts.
First, "make -j".
When you use "make", the "make" program will coordinate all the programs needed to do the build - running gcc on source files, running pre-processing steps, post-processing steps, linkers, analysers, documentation programs, little utility programs - anything that needs to be done for the build. "make" does this in an order to match the dependencies - if action "A" depends on the output from action "B", then action "A" is not started until "B" is finished. And if the inputs needed for "B" have not changed since it's output was last generated, then action "B" doesn't need to be run at all. The tasks are collected together into a directed acyclic graph, using the partial ordering of dependencies.
When you use "make -j", "make" will run all these tasks in parallel. The partial order of the DAG is preserved. So if you have a 96 core system, and you have hundreds of files that need compiled in this build, and you run "make -j 96", then "make" will coordinate 96 instances of the compiler (or other needed tasks) running at the same time. It won't run more than 96 of them - as compilations finish, they free up "job slots" in make's "job server", and other compilations or tasks are started. When there are not enough tasks that can be done (perhaps due to the dependencies), fewer tasks will run in parallel.
As always with multi-tasking of any sort, if there is a long-running task, then it takes the time it takes - you can't speed it up, no matter how many cpu cores you have.
So one of Scott's compiles takes 13 minutes. "make -j" won't speed that up. But it will mean that any other compilations can be done in parallel. Maybe he has 600 other files that each take 30 seconds to compile. With "make -j", the build takes the 13 minutes it has to for the one awkward file - all the rest are compiled while that is going on. With non-parallel "make", it would take about 5 hours (600 files × 30 seconds = 18,000 seconds = 5 hours, on top of the 13-minute file).
Thus "make -j" is a really good idea, even if you have a particularly long-running task (compilation or anything else).
Second, why is gcc single-threaded when compiles can sometimes take a long time?
The prime reason for that is that multi-threaded compilation is very difficult. Some aspects of it could be run in parallel - for example, some of the analysis and optimisation could be split per function. But the overhead of multi-threading and of keeping shared data and information safe and synchronised would be significant, and you would still typically run the big time-consuming part - the inter-procedural optimisations - as a single thread. In practice, most big pieces of software are built from many files, so parallelising at the build level (such as "make -j") is easier, safer, and more efficient.
The bottleneck of many big builds is the link process. Traditionally, this needs to collect together all the object files and static libraries. In more modern systems, especially with C++, it also de-duplicates sections. Since linking is a task that usually can't begin until all the compilation is finished, and it is usually just one single task, it makes sense to focus on making linking multi-threaded. And this is what we see with modern linkers - a great deal of effort is put into multi-threading the linking process (especially when partitions from the link are passed back to the compiler for link-time optimisation and code generation).
The third point from this thread is: why is gcc so slow on a particular C file? As you have noted before, some aspects of compilation and optimisation - particularly inter-procedural optimisation - increase super-linearly with size, both the size of individual functions and the number of functions. I don't know what this particular file is, but given what I know of Scott's work and my own experience, I think this could be a generated file for hardware simulation. These typically lead to very large files and very large functions, with a great many variables that are used in simple expressions or statements (like "if (node_1234.enabled && clock.rising_edge) node_1234.next = node_1235.output"). Tracking all these variables and their lifetimes, and re-arranging code in an efficient manner, becomes a very time-consuming problem for the compiler. But that effort can make a significant difference to the run-time of the simulation, which will normally be orders of magnitude longer than the compilation time. Thus it can be worth having code structured this way.
It is not a typical use-case for compilation, and thus not a major focus for compiler development, but it is used in real systems.