Subject : Re: A Famous Security Bug
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.lang.c
Date : 23. Mar 2024, 17:08:48
Organisation : A noiseless patient Spider
Message-ID : <utmuqg$3nr3t$1@dont-email.me>
References : 1 2 3 4 5 6 7 8
User-Agent : Mozilla Thunderbird
On 23/03/2024 10:20, Richard Kettlewell wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> I have tried to explain the reality of what the C standards say in a
>> couple of posts (including one that I had not posted before you wrote
>> this one). I have tried to make things as clear as possible, and
>> hopefully you will see the point.
>>
>> If not, then you must accept that you interpret the C standards in a
>> different manner from the main compiler vendors, as well as some "big
>> names" in this group. That is, of course, not proof in itself - but
>> you must realise that for practical purposes you need to be aware of
>> how others interpret the standard, both for your own coding and for
>> the advice or recommendations you give to others.
>
> Agreed that the ship has sailed on whether LTO is a valid optimization.
> But it’s understandable why someone might reach a different conclusion.

I /do/ understand why Kaz thinks the way he does. I am just trying to show that his interpretation is wrong, so that he can better understand what is going on, and how to get the behaviour he wants.
> - Phase 7 says the tokens are “semantically analyzed and translated as a
>   translation unit”.
> - Phase 8 does not use either verb, “analyzed” or “translated”.
> - At least two steps (in the abstract, as-if model) are explicitly
>   happening in the “as a translation unit” level but not in any wider
>   context.
> - The result of those two steps (“translator output”) is then
>   “collected”.
> - Unless you somehow understand that “collected” implicitly includes
>   further analysis and translation, it does not seem unnatural to
>   conclude that many of the whole-program optimizations done by LTO
>   implementations would be outside the spec.
>
> This would be very easy to address, by replacing “collected” with a word
> or phrase that makes clear that further analysis and translation can
> happen outside the “as a translation unit” context.

I would be entirely happy to see clearer wording in the standards here, or at least some footnotes saying what is allowed or not allowed.
> Obviously this would violate the principle from the rationale that
> existing code (that uses TU boundaries to get memset to “work”) is
> important and existing implementations (LTO) are not, but C
> standardization has never actually behaved as if that is true anyway.

Oh, I think the C standards committee have done quite well at that. But doing it /completely/ would clearly be impossible, as different people have different ideas about how they think C is defined, and how they think C compilers have to behave. In my line of work, I see plenty of old code that makes assumptions that are not remotely justified by the C standards, but which happened to work on the old or limited toolchain used by the person who wrote the code. If the C standards tried to codify such practices, or if C compilers tried to make sure that /all/ code that worked with other compilers or older versions works on newer tools, progress on compilers would be completely stalled and we'd have no optimisations that weren't already in common use in the 1970s.
What the standards committee try to say is that if code follows C standard N correctly, then when it is compiled under C standard N+1 it should have the same semantics and the same behaviour. And they do that reasonably, but not perfectly.
It would be unreasonable to expect them to guarantee the behaviour of code under new standards when the code did not have guaranteed behaviour under the old standards. Using TU boundaries to "get memset to work" has never been guaranteed.
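
As a minimal sketch of that distinction (the function names, the buffer size and the "hunter2" password are hypothetical, made up purely for illustration): the first function relies on a plain memset of a buffer that is never read again, which the as-if rule allows a compiler to drop, and with LTO moving the wipe into another translation unit does not change that. The second writes through a volatile-qualified pointer, which compilers in practice treat as observable accesses; that is an assumption about implementations rather than a guarantee spelled out in this thread. Where available, C23's memset_explicit or the optional Annex K memset_s are intended for exactly this purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fragile: buf is dead after the memset, so under the as-if rule the call
   may be removed. Putting the wipe in a separate translation unit only
   hides it from the optimiser until LTO is enabled. */
int check_fragile(const char *input)
{
    char buf[32] = {0};
    strncpy(buf, input, sizeof buf - 1);      /* buf stays NUL-terminated */
    int ok = (strcmp(buf, "hunter2") == 0);   /* hypothetical secret */
    memset(buf, 0, sizeof buf);               /* may be optimised away */
    return ok;
}

/* Hypothetical helper: zero n bytes through a volatile-qualified lvalue,
   so that each store is treated as a volatile access. */
void secure_zero(void *p, size_t n)
{
    assert(p != NULL);
    volatile unsigned char *vp = p;
    while (n--) {
        *vp++ = 0;
    }
}

/* Same check, but the wipe does not depend on where TU boundaries fall. */
int check_wiped(const char *input)
{
    char buf[32] = {0};
    strncpy(buf, input, sizeof buf - 1);
    int ok = (strcmp(buf, "hunter2") == 0);
    secure_zero(buf, sizeof buf);
    return ok;
}
```

Both functions return the same result for any input; the only difference is whether the wipe is something the compiler is entitled to remove.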