Subject : Re: Computer architects leaving Intel...
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.arch
Date : 02. Sep 2024, 13:22:51
Organisation : A noiseless patient Spider
Message-ID : <vb4amr$2rcbt$1@dont-email.me>
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 02/09/2024 06:08, George Neuner wrote:
> On Sun, 1 Sep 2024 22:07:53 +0200, David Brown
> I'm not going to argue about whether UB in code is wrong. The
> question I have concerns what to do with something that explicitly is
> mentioned as UB in some standard N, but was not addressed in previous
> standards.
> Was it always UB? Or should it be considered ID until it became UB?
I can't answer for languages other than C and C++ (others might be able to compare usefully to, for example, Ada or Fortran). But the C standards explicitly state that behaviour not defined in the standards is undefined behaviour in exactly the same way as cases explicitly labelled as undefined behaviour, and likewise for cases where the program violates a "shall" or "shall not" requirement.
To be clear - the meaning of "undefined behaviour" is simply that no behaviour has been defined. The C standards can say that something is "undefined behaviour" (or just fail to give a definition of the behaviour) and then the implementation can give a definition of it. An example here would be that the C standards say that signed integer arithmetic overflow is undefined behaviour - if you have a signed integer operation and the mathematically correct results can't be represented in the type, then there is no possible way for the generated code to give the correct result. The C standards therefore leave this as "undefined behaviour". However, if you use "gcc -fwrapv" then the behaviour /is/ defined - it is defined as two's complement wrapping.
So if you write C code that overflows signed integer arithmetic and relies on given behaviour and results, the code is wrong because it has undefined behaviour - you are, at best, relying on luck. But if you write C code with such demands and specify that it is only suitable for use with the gcc "-fwrapv" flag, then it is not wrong and there is no undefined behaviour because the compiler implementation has given a definition of the behaviour. However, if you use the same code with, say, old versions of MSVC then you are back to luck and UB even if that compiler does not have optimisations based on knowing that signed integer arithmetic overflow is UB. And it is /your/ fault when the code fails on newer versions of MSVC that /do/ have such optimisations.
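As a small illustration (a sketch of my own, nothing from any standard or real project), the following overflow is undefined behaviour in standard C, but has fully defined two's complement wrapping when built with "gcc -fwrapv":

    /* Sketch: signed overflow.  UB in standard C; with "gcc -fwrapv" it
       is defined to wrap, so y becomes INT_MIN. */
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        int y = x + 1;          /* UB in standard C; wraps under -fwrapv */
        printf("%d\n", y);
        return 0;
    }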
This is all very different from what the C standards call "implementation-defined behaviour". Such things as how out-of-range values are converted to signed integer types are explicitly IB in the C standards - implementations must define and document the behaviour.
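A minimal sketch of the difference, assuming the usual 32-bit "int": converting a value that does not fit into a signed type gives an implementation-defined result (or raises an implementation-defined signal), so the compiler documentation must tell you what happens:

    /* Sketch: implementation-defined conversion to a signed type. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = 3000000000u;   /* does not fit in a 32-bit int */
        int s = (int) u;                /* implementation-defined result */
        printf("%d\n", s);
        return 0;
    }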
> It does seem to me that as the C standard evolved, and as more things
> have *explicitly* become documented as UB, compiler developers have
> responded largely by dropping whatever the compiler did previously -
> sometimes breaking code that relied on it.
I think that is perhaps partly true, partly a myth, and partly simply a side-effect of compilers gaining more optimisations as they are able to analyse more code at a time and do more advanced transforms. The C standards have clarified some of the text over time (most people would agree there is still plenty of scope for improvement there!). That can include changing some things that were previously undefined by omission to being explicitly labelled UB. I can't think of any examples off-hand. But note that this would not in any way change the meaning of the code - UB by omission is the same as explicit UB as far as the C language is concerned. There are very few cases where code was correct for original standard C90 (i.e., independent of any IB and independent of particular compilers) and is not correct C23 with identical defined behaviour. There were a few things changed between C90 and C99, but I don't know of any since then other than a few added keywords that could conflict with user identifiers.
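To make the keyword point concrete, here is a contrived sketch - "restrict" became a keyword in C99, and "bool" became a keyword in C23, so identifiers like these were fine in C90 but no longer compile under the newer standards:

    /* Sketch: identifiers that clash with keywords added later. */
    int restrict = 1;   /* OK in C90; fails to compile from C99 onwards */
    int bool = 0;       /* OK in C90/C99/C11; fails to compile in C23 */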
It is an unfortunate truth that older C compilers did not do as good a job at optimisation as newer ones. And this meant that many tricks were used in order to get efficient results, even though some of these relied on UB. Such code can have different results on different compilers, or different sets of options, because there is no definition of what the "correct" result should be. The programmer will have a clear idea of what they think is "correct", but it is not defined or specified anywhere. Usually the programmer feels it is "obvious" what the intended behaviour is - but "obvious" to a programmer does not mean "obvious" to a compiler. Thus you end up with code that works (as intended by the programmer) by testing and good luck with some compilers and options, and fails by bad luck on other compilers or options. The compiler didn't "break" the code - the code was broken to start with. But it is entirely reasonable and understandable why the programmer wrote the "broken" code in the first place, and why it did a useful job despite having UB.
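A typical example of the kind of trick I mean (a sketch, not any particular real code): reading the bits of a float through a pointer to a 32-bit integer. It is "obviously" fine to the programmer, but it violates the effective type ("strict aliasing") rules and is UB - while a memcpy does the same job with fully defined behaviour, and modern compilers typically turn it into a single register move anyway:

    /* Sketch: type punning a float. */
    #include <stdint.h>
    #include <string.h>

    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "this sketch assumes a 32-bit float");

    /* UB: accesses a float object through an incompatible uint32_t lvalue. */
    static uint32_t float_bits_ub(float f)
    {
        return *(uint32_t *) &f;
    }

    /* Defined: memcpy copies the object representation. */
    static uint32_t float_bits_ok(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }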
So I appreciate when people get frustrated that changes to a tool change the apparent behaviour of their code. But it is important to understand that the compiler is not wrong here - it is doing the best job it can for people writing correct code. A development tool should prioritise the people using it /now/ - and while there is C code in use today that was written many decades ago, the majority of C code (and even more so for C++) is much more recent. It would be wrong to limit modern programmers because of code written long ago - even more so when there is no clear specification of how that old code was supposed to work.
> I have moved on from C (mostly), and I learned long ago to archive
> toolchains and to expect that any new version of a tool might break
> something that worked previously. I don't like it, but it generally
> doesn't annoy me that much.
This all depends on the kind of code you write, and the kind of system you target. On my embedded targets, most of my code can be written in standard C. But a lot of it also uses at least some gcc extensions to improve the code - enhancing static error checking, making it more efficient, or making it easier and clearer to write. I am quite clear there that the code is dependent on gcc (it would probably also be fine for clang, but I have not checked that). For all such code, I do my utmost to make sure it is correct and safe, with no UB and no IB beyond what is obvious and necessary. Most programs will also contain code that is more specifically toolchain-dependent, perhaps with snippets of inline assembly, or target-specific features that are needed. This was more of an issue before, when I was using a wider range of compilers.
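To give a flavour of the kind of gcc extension I mean (an illustrative sketch only - the function names here are made up, not taken from any real project), attributes like these add static checking that standard C cannot express:

    /* Sketch: gcc attributes for extra static checking. */
    #include <stdarg.h>
    #include <stdio.h>

    /* Check the format string and arguments just like printf's. */
    __attribute__((format(printf, 1, 2)))
    void log_msg(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vprintf(fmt, ap);
        va_end(ap);
    }

    /* Warn if a caller ignores the returned error code. */
    __attribute__((warn_unused_result))
    int init_device(void);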
But for any given project, I stick to a single compiler version and usually one set of compiler flags. For my work, code without C-level UB is not enough - I sometimes also need to test for things like run-time speed and code size, or interaction with external tools of various sorts, or stack usage limits - all things that are outside the scope of C.
However, I don't remember when I last found that portable code that I wrote, and that was working with one compiler, failed to have correct C-level functionality when compiled with a newer compiler (or flags) due to undefined behaviour, new optimisations, or changes in the C standard. I've had portability issues with older code due to IB such as writing code for a microcontroller with a different size of "int". I've seen issues with third-party code - I've had to compile such code with "-fwrapv -fno-strict-aliasing" on occasion. I've made other mistakes in my code. And I got UB things wrong in my early days when I was new to C programming. But truly, I am at a loss to understand why some people are so worried about UB in C - you simply need to know the rules and specifications for the language features you use, and follow those rules.
> MMV. Certainly Anton's does. ;-)
Anton writes code that seriously pushes the boundary of what can be achieved. For at least some of the things he does (such as GForth) he is trying to squeeze every last drop of speed out of the target. And he is /really/ good at it. But that means he is forever relying on nuances about code generation. His code, at least for efficiency if not for correctness, is dependent on details far beyond what is specified and documented for C and for the gcc compiler. He might spend a long time working with his code and a version of gcc, fine-tuning the details of his source code to get out exactly the assembly he wants from the compiler. Of course it is frustrating for him when the next version of gcc generates very different assembly from that same source, but he is not really programming at the level of C, and he should not expect consistency from C compilers like he does.
> Similar to you (David), I came from a - not embedded per se - but
> kiosk background: HRT industrial QA/QC systems. I know well the
> attraction of a new compiler yielding better performing code. I also
> know a large amount of my code was hardware and OS specific, that
> those are the things beyond the scope of the compiler, but they also
> are things that I don't want to have to revisit every time a new
> version of the compiler is released.
Yes. For this kind of work, you want to keep your build environment consistent - no matter how careful you are to write correct code without UB.
13 of one, baker's dozen of the other.