In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 13/06/2026 14:02, Dan Cross wrote:
In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations against compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

Yes.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

Even once it was a part of C, the concept was communicated
poorly.

It is certainly the case that C code has been written for a long time. And it is certainly the case that some C code was written long ago, and is still used on systems today. But I think it is important to keep in mind that the solid majority of C code is relatively recent. Very little pre-C90 code is ever compiled with modern tools. Code that is old and still in use is important code, but modern code and modern tools should not be kept back because of it.
Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.

Agreed.
Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) depends on, and now
you've got UB; locally, everything appears correct; but it's the
combination where the UB manifests.

That can certainly happen. But that's just bugs in the code. I don't see why UB should be considered as something special here. People making changes to existing code sometimes misunderstand things, or accidentally break something that worked before. That's life as a programmer, and there are techniques to reduce the risk - code reviews, linters, testing regimes, etc. Nothing gives 100% guarantees, and everything has to weigh risks, consequences, costs and resources. UB is not special here.
Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.

UB means precisely that I can choose trapping, or IB, or optimising on the assumption it does not happen. If signed integer overflow were defined as wrapping, then compilers could not put in traps to catch the errors because as far as the language is concerned, they are not errors. If they are defined as causing traps, then that's the semantics - compilers could not optimise code assuming overflow does not happen, unless it can prove there is no overflow.
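
For readers who have not met the overflow-avoidance techniques mentioned above, here is a minimal sketch of one portable approach: test the precondition before adding. The helper names `add_would_fit`, `checked_add` and `add_asserting` are invented for this illustration and come from neither poster. The last helper uses `assert()`, which gives a trap in debug builds and is compiled out when `NDEBUG` is defined, so it touches on both positions in the exchange.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b is representable as int.
   The comparisons are arranged so that they cannot overflow themselves. */
bool add_would_fit(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;   /* INT_MAX - b is safe because b > 0 */
    return a >= INT_MIN - b;       /* INT_MIN - b is safe because b <= 0 */
}

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false without performing the addition when it would overflow. */
bool checked_add(int a, int b, int *out)
{
    if (!add_would_fit(a, b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical helper: the precondition is checked by assert(), which
   aborts in debug builds and disappears when NDEBUG is defined. */
int add_asserting(int a, int b)
{
    assert(add_would_fit(a, b));
    return a + b;
}
```

With these definitions, `checked_add(INT_MAX, 1, &r)` returns false and leaves `r` untouched, while `checked_add(1, 2, &r)` returns true and stores 3.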
It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.

This is kind of my point. If you need a fast way to convert

(I think you missed a bit of your answer here?)
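
As an illustration of the memcpy() route described above, here is a minimal sketch. The function names `float_bits` and `bits_to_float` are made up for this example, and the code assumes `float` is 32 bits wide, which the `static_assert` checks at compile time. Whether the copy compiles down to a single register move depends on the compiler and its options, as the post says.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: returns the object representation of f as an integer.
   Copying bytes with memcpy() is defined behaviour, unlike reading the
   float through a uint32_t pointer. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: the inverse conversion, again via memcpy(). */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

On an implementation that represents `float` as IEEE 754 binary32, `float_bits(1.0f)` yields `0x3F800000`.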
Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I want good definitions of things that should be defined. Things that cannot have good definitions, are fine left undefined. A language standard should not be trying to define the behaviour of /everything/.
I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

For some programs, yes. For others, no.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

I agree that wrapping semantics should be something you have to ask for. (As an aside, I think it is a mistake for languages to have types that have wrapping semantics - it's the operations that should wrap, not the types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

and that I cannot call a function with a different number or
type of parameters than its definition.

Yup. That should be a compile-time error.

There I agree entirely. The build model of compiling units to separate object files without any information beyond symbol names made sense 50 years ago - we should be doing far better now. (We /can/ do far better, but it requires conventions in the way you write your C code and the options used when compiling or linting the program.)
I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.

Certainly it is possible to eliminate a number of things that are UB in C. UB that is not necessary, or not useful, is a bad thing in a language.
Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.

I never said that it did.

I didn't say you said it did :-)
It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

I am all in favour of compile-time checks and rejecting code with errors (not just UB) as soon as possible. The "perfect" language is one where you really can follow the old Ada saying - if you can make it compile, it's ready to ship.
(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

It's not just that it's nowhere close to being as well-defined
as it should be, it's that the language as defined permits
behavior that varies far too widely, specifically because of UB.
Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.
That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

I'd be happy for the C standard to say that signed integer overflow is a bug, or that code is not allowed to overflow its integer arithmetic. I would not be happy if it said compilers must trap on the bug or handle it in some specific way - what happens when a bug is reached is still UB. And if the wording of the standard were changed to call it a "bug" rather than "UB", it would make absolutely zero difference to the way I write my code.
This is one reason the committee is trying to rein some of this
in.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?

It was badly worded - I meant that programmers do not want mistakes that they might make to lead to additional problems. We can all appreciate and expect that if we make a mistake in code with an incorrect calculation, that will give incorrect output, or perhaps a crash in the program. But we hope that it will not lead to corruption of a filesystem, or an exploitable security hole - something out of proportion with the mistake.
Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.

In my field, people usually put a lot of effort into writing code simply and clearly. You avoid mistakes not by being "clever", but by being meticulous and careful. I don't think successful C programming requires greater intellect, knowledge or experience compared to other programming languages - but it /does/ require an appropriate attitude. You are working with sharp knives - pay attention to what you are doing, and you'll be fine.
My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

I was avoiding signed integer overflow long before I had read any C standards or even knew about the term "UB". Programming in C does not need a lawyer's knowledge of the language. It is just like programming in any other programming language - use features that you know are correct, and if you want to do something and don't know how to do so correctly, look it up.
In fairness, I think the current members of the committee
recognize this.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources combine with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.

Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.

As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.

I am surprised that more projects do not adopt canary CI builds
against newer toolchains.

In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

Fun fact: part of the reason Google got involved in clang and
LLVM development was that the vendor toolchain for a
particular microcontroller used in Android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.

Buggy toolchains are always a pain. (So is buggy hardware - microcontrollers and cpus have their errors too.)
Google could afford to do that; I recognize not many
organizations can.

Unfortunately that's true.
And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.

See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.

It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed `int`. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

There's no simple answer here.
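
The typedef scenario described above can be sketched as follows. The names `field_t`, `product_fits_int` and `product_wrapping` are hypothetical, chosen only for this illustration, and the sketch assumes `int` is wider than `unsigned short`, which the `static_assert` checks. It shows one way to test whether the promoted multiplication would overflow, and one way to restore the original unsigned arithmetic regardless of what the typedef says.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption of this sketch: unsigned short promotes to (signed) int. */
static_assert(USHRT_MAX < INT_MAX, "unsigned short must promote to int");

/* Hypothetical typedef. While it was 'unsigned int', a * b converted b to
   unsigned and wrapped modulo UINT_MAX + 1, which is defined behaviour.
   As 'unsigned short', a promotes to int, the multiplication is signed,
   and overflow is undefined. */
typedef unsigned short field_t;

/* Hypothetical helper: true if a * b, computed in int after promotion,
   is representable. The divisions cannot overflow: the divisor is never
   -1 when the dividend is INT_MIN. */
bool product_fits_int(field_t a, int b)
{
    if (a == 0 || b == 0)
        return true;
    if (b > 0)
        return a <= INT_MAX / b;
    return b >= INT_MIN / (int)a;
}

/* Hypothetical helper: forces unsigned arithmetic, so the result wraps
   instead of overflowing, independent of the typedef. */
unsigned int product_wrapping(field_t a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

Nothing at the call site changes when the typedef does, which is why a hazard of this kind is easy to miss in review.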
And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

Or the language, or the tools.