Re: A Famous Security Bug

Subject: Re: A Famous Security Bug
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Date: 22 Mar 2024, 21:26:22
Organization: A noiseless patient Spider
Message-ID: <utkphe$34l73$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 22/03/2024 19:55, Kaz Kylheku wrote:
On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
You should read the footnotes to 5.1.1.2 "Translation phases".
Footnotes are not normative, but they are helpful in explaining the
meaning of the text.  They note that compilers don't have to follow the
details of the translation phases, and that source files, translation
units, and translated translation units don't have to have one-to-one
correspondences.
 Yes, I'm aware of that. For instance preprocessing can all be jumbled
into one process. But it has to produce that result.
 Even if translation phases 7 and 8 are combined, the semantic analysis
of the individual translation unit has to appear to be settled before
linkage. So for instance a translation unit could incrementally emerge
from the semantic analysis steps, and those parts of it already analyzed
(phase 7) could start to be linked to other translation units (phase 8).
 
Again, you are inferring far too much here.  The standard is /not/ limiting like this.
Compilers can make use of all sorts of additional information.  They have always been able to do so.  They can use extra information provided by compiler extensions - such as gcc attributes.  They can use information from profiling to optimise based on real-world usage.  They can analyse source code files and use that analysis for optimisation (and hopefully also static error checking).
Consider this:
A compiler can happily analyse each source code file in all kinds of ways, completely independently of what the C standards describe (or perhaps, by happy coincidence, using the same types of pre-processing and interpretation).  This analysis can be stored in files or some other storage place.  Do you agree that this is allowed, or do you think the C standards somehow ban it?  Note that we are calling this "analysis" - not C compilation.
Now the compiler starts the "real" compilation, passing through the translation phases one by one.  When it gets to phase 7, it reads all this stored analysis information.  (Nothing in the standards says the compiler can't pull in extra information - it is quite normal, for example, to pull in code snippets as part of the compilation process.) For each translation unit, it produces two outputs (in one "fat" object file) - one part is a relatively dumb translation that does not make use of the analysis, the other uses the analysis information to generate more optimal code.  Both parts make up the "translator output" for the translation unit.  Again, can you point to anything in the C standards that would forbid this?
Then we come to phase 8.  The compiler (or linker) reads all the "translator output" files needed for the complete program.  It checks that it has the same set of input files as were used during the pre-compilation analysis.  If they are all the same, then the analysis information about the different units is valid, and thus the optimisations using that extra information are valid.  The "dumb translation" versions can be used as a fallback if the analysis was not valid - otherwise they are thrown out, and the more optimised versions are linked together.
There is nothing in the description of the translation phases that hinders this.  All the compiler has to do is ensure that the final program - not any individual translation units - has correct observable behaviour.
I would also refer you to section 1 of the C standards - "Scope".  In particular, note that "This document does /not/ specify the mechanism by which C programs are transformed for use by a data-processing system". (Emphasis mine.)  The workings of the compiler are not part of the standard.

I'm just saying that certain information leakage is clearly permitted,
regardless of how the phases are integrated.
 
The standard also does not say what the output of "translation" is - it
does not have to be assembly or machine code.  It can happily be an
internal format, as used by gcc and clang/llvm.  It does not define what
"linking" is, or how the translated translation units are "collected
into a program image" - combining the partially compiled units,
optimising, and then generating a program image is well within that
definition.
>
(That can be inferred
from the rules which forbid semantic analysis across translation
units, only linkage.)
>
The rules do not forbid semantic analysis across translation units -
they merely do not /require/ it.  You are making an inference without
any justification that I can see.
 Translation phase 7 is clearly about a single translation unit in
isolation:
 "The resulting tokens are syntactically and semantically analyzed
  and translated as a translation unit."
 Not: "as a combination of multiple translation units".
The point is that many things are local to a translation unit, such as statics, type definitions, and so on.  These are valid within the translation unit (within their scope, of course), and independent of identically named items in other translation units.  It is about defining a kind of "unit of compilation" for the language semantics - it is /not/ restricting the behaviour of a compiler.
LTO does not change the language semantics in any way.  The language semantics determine the observable behaviour of the program, and we have already established that this must be unchanged.  Generated instructions for a target are not part of the language semantics.

 5.1.1.1 clearly refers to "[t]he separate translation units of a
program".
It does so all in terms of what a compiler /may/ do.
And there is never any specification of the result of a "translation". It can happily be byte-code, or internal toolchain-specific formats.

 LTO pretends that the program is still divided into the same translation
units, while mingling them together in ways contrary to all those
chapter 5 descriptions.
No.

 The conforming way to obtain LTO is to actually combine multiple
preprocessing translation units into one.
 
You could do that if you like (after manipulating things to handle statics, type definitions, etc.).
And you would then find that if "foo()" in "foo.c" called "bar()" in "bar.c", the call to "bar()" might be inlined, or omitted, or otherwise optimised, just as it could be if they were both defined in the same translation unit.
The result would be the same kind of object code as you get with LTO - one in which the observable behaviour is as expected, but you might get different details in the generated code.
I don't know why you would think that this kind of combination of units is conforming, but LTO is not.  It's all the same thing in principle - the only difference is that real-world implementations of LTO are designed to be scalable, do as much as possible in parallel, and avoid re-doing work for files that don't change.
Some link-time optimisation or "whole program optimisation" toolchains are aimed at small code bases (such as might fit into a small microcontroller) and combine all the source code together then handle it all at once.  Again, the principles and the semantics are not any different from gcc LTO - it's just a different way of splitting up the work.
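
As a minimal sketch of the foo()/bar() case: the function bodies below are hypothetical, and both functions sit in one file only so that the example is self-contained. Whether the generated code contains a call instruction for bar() or has it folded into foo(), the value returned and the text printed are the same.

```c
#include <stdio.h>

/* Hypothetical stand-in for a function that would live in "bar.c". */
int bar(int x)
{
    return x + 1;
}

/* Hypothetical stand-in for a function that would live in "foo.c".
 * A compiler that can see the body of bar() (same unit, merged units,
 * or LTO) may inline it, so foo() might be compiled with no call
 * instruction at all. */
int foo(int x)
{
    return 2 * bar(x);
}

/* The observable behaviour is the text written to stdout, not the
 * sequence of calls used to produce it. */
void show_foo(int x)
{
    printf("foo(%d) = %d\n", x, foo(x));
}
```

Comparing the assembly for foo() at -O0 and -O2 is one way to see the difference in generated code while the program output stays unchanged.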

That's why we can have a real world security issue caused by zeroing
being optimized away.
>
No, it is not.  We have real-world security issues for all sorts of
reasons, including people mistakenly thinking they can force particular
types of code generation by calling functions in different source files.
 In fact, that code generation is forced, when people do not use LTO,
which is not enabled by default.
 
No, it is not.
The C standards don't talk about LTO, or whether or not it is enabled, or what is "default", or even what kind of code generation you get.
If the compiler knows that a function call will not have or affect observable behaviour, it can omit that call.  It does not matter how it knows this.  LTO is a very practical way to get this information, but it might not be the only way.  Profile-guided optimisation information may provide the same information.  So could attributes given in the function declaration (and a future C standard will likely support such attributes).
But if the compiler doesn't know for sure that it is safe to omit the call, then it must generate it.  Correctness trumps optimisation!
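
A small sketch of the attribute route, assuming a gcc-compatible compiler. The macro name ATTR_CONST and the functions are hypothetical; on other compilers the macro expands to nothing, so the compiler simply has less information and must keep the calls.

```c
/* gcc/clang extension; expands to nothing elsewhere. */
#if defined(__GNUC__)
#define ATTR_CONST __attribute__((const))
#else
#define ATTR_CONST
#endif

/* Hypothetical function.  The attribute is a promise that the result
 * depends only on the argument and that there are no side effects. */
ATTR_CONST int square(int x);

int sum_of_squares_twice(int x)
{
    /* Result unused: given the promise above, the compiler may omit
     * this call entirely. */
    (void)square(x);

    /* Two identical calls: the compiler may evaluate square(x) once
     * and reuse the value. */
    return square(x) + square(x);
}

int square(int x)
{
    return x * x;
}
```

If the promise is false (say, square() logged each call), the attribute would be a bug in the source, not in the compiler.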

The rules spelled out in ISO C allow us to unit test a translation
unit by linking it to some harness, and be sure it has exactly the
same behaviors when linked to the production program.
>
No, they don't.
>
If the unit you are testing calls something outside that unit, you may
get different behaviours when testing and when used in production.
 Yes; if you do nonconforming things.
No one is suggesting doing "nonconforming things".
To give a simple example, suppose your unit is intended to perform some calculations and then call a callback with the result.  In a test harness, you would provide a callback that checks the result against the expected value, and provides a pass/fail log message.  In production use, you would provide a callback that pops up a window with the value, or sends it in an email to the user.  The observable behaviour of the production program and the test program is very different.
In fact, unless you are testing the production version, or you are producing a test harness, you would normally expect very different observable behaviours from any unit testing and real usage of the code.
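
The callback example can be sketched like this. All names and the calculation itself are hypothetical; the point is only that the same unit produces different observable behaviour depending on which callback it is linked or called with.

```c
#include <stdio.h>

/* Callback type: receives the computed result plus caller context. */
typedef void (*result_cb)(int result, void *ctx);

/* The "unit": performs a calculation, then reports it via callback. */
void compute_and_report(int a, int b, result_cb cb, void *ctx)
{
    cb(a * b + 1, ctx);
}

/* Test-harness callback: compares against an expected value and
 * writes a pass/fail log message. */
struct expectation {
    int expected;
    int passed;
};

void check_result(int result, void *ctx)
{
    struct expectation *e = ctx;
    e->passed = (result == e->expected);
    printf("%s\n", e->passed ? "PASS" : "FAIL");
}

/* "Production" callback: shows the value to the user instead. */
void show_result(int result, void *ctx)
{
    (void)ctx;
    printf("The answer is %d\n", result);
}

/* Runs the unit under the test harness; returns 1 on pass. */
int run_unit_test(void)
{
    struct expectation e = { 13, 0 };
    compute_and_report(3, 4, check_result, &e);
    return e.passed;
}
```

In production the same unit would be invoked as compute_and_report(3, 4, show_result, NULL), and the program's output is quite different from the test run.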

 
The only thing you can be sure of from testing is that if you find a bug
during testing, you have a bug in the code.  You can never use testing
to be sure that the code works (with the exception of exhaustive testing
of all possible inputs, which is rarely practical).
 LTO will break translation units that are simple enough to be trivially
proven to have a certain behavior.
 
Again, claiming this will not make it true.  You need to update your ideas about what observable behaviour actually is.

If I have some translation unit in which there is a function foo, such
that when I call foo, it then calls an external function bar, that's
observable.
>
5.1.2.3p6 lists the three things that C defines as "observable
behaviour".  Function calls - internal or external - are not amongst these.
 External calls are de facto observable,
The phrase "de facto" is an admission that you understand that none of this is part of the /actual/ standards.  You have dropped from "the official standards make this clear" down to "I think this".

because we take it for granted that
when we have a translation unit that calls a certain function, we can
supply another translation unit which supplies that function. In
that function we can communicate with the host environment to confirm
that it was called.
 
All such boundaries are lost in the link stage, before observable behaviour becomes relevant.

I can link that unit to a program which supplies bar,
containing a printf call, then call foo and verify that the printf call
is executed.
>
Yes, you can.  The printf call - or, more exactly, the "input and output
dynamics" - are observable behaviour.  The call to "bar", however, is not.
 If foo does not call the function, then the observable behavior of
printf doesn't occur either; they are linked by logic / cause-and-effect.
 
Nonsense.
The compiler-generated code must produce the correct observable behaviour.  It can do that however it likes.  It can put a call to "printf" directly in "foo".  It can replace the "printf" with a "puts" or a series of target-specific "write_a_char" calls if the results are the same.
C is defined in terms of behaviour, not particular instruction sequences.  If you write "x = y * 4;", the compiler can generate instructions that look like "x = y + y + y + y;", or "x = y * 2; x = x + y + y;", or "x = (y << 3) - (2 * y + 3 * y - y);", or anything it likes as long as the result is correct (and obviously avoiding any extra overflows).
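
To make that concrete, here is a sketch with several ways of computing four times a value, including a shift-and-subtract form. The function names are hypothetical, and unsigned arithmetic is an assumption chosen so that none of the variants can overflow into undefined behaviour or shift a negative value.

```c
/* The straightforward source-level form. */
unsigned times4_mul(unsigned y)
{
    return y * 4u;
}

/* Repeated addition. */
unsigned times4_add(unsigned y)
{
    return y + y + y + y;
}

/* Double first, then add twice more. */
unsigned times4_steps(unsigned y)
{
    unsigned x = y * 2u;
    x = x + y + y;
    return x;
}

/* A deliberately roundabout form: 8*y - (2*y + 3*y - y) == 4*y. */
unsigned times4_shift(unsigned y)
{
    return (y << 3) - (2u * y + 3u * y - y);
}

/* Returns 1 if every variant agrees for all y below limit. */
int all_agree(unsigned limit)
{
    for (unsigned y = 0; y < limit; ++y) {
        unsigned want = times4_mul(y);
        if (times4_add(y) != want ||
            times4_steps(y) != want ||
            times4_shift(y) != want) {
            return 0;
        }
    }
    return 1;
}
```

A compiler is free to emit the same instructions for all four functions, since nothing observable distinguishes them.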

A behavior that is not itself formally classified as observable can be
discovered by logical linkage to be necessary for the production of
observable behavior. It can be an "if, and only if" linkage.
 If an observable behavior B occurs if, and only if, some behavior A
occurs, then the fact of whether A occurs or not is de facto observable.
Calling it "de facto observable behaviour" is just confusing your understanding here.  But you can well say that if B is observed, that means A must have happened.
However, you have not in any way shown that A (in this case, instructions to call the function "bar") is the only way to result in the observable behaviour.

 
The compiler, when compiling the source of "foo", will include a call to
"bar" when it does not have the source code (or other detailed semantic
information) for "bar" available at the time.
 Translation phases 1 to 7 forbid processing material from another
translation unit.
Nope.

Conforming semantic analysis of a translation unit has
nothing but that translation unit.
 
Nope.

But you are mistaken to
think it does so because the call is "observable" or required by the C
standard.
 Sure; let's say that the call can be tied to observable behavior
elsewhere such that the call occurs if and only if the observable
behavior occurs.
 
That would be a better way to put it.  But it is still not the case here.

It does so because it cannot prove that /running/ the
function "bar" contains no observable behaviour, or otherwise affects
the observable behaviour of the program.  The compiler cannot skip the
call unless it can be sure it is safe to do so - and if it knows nothing
about the implementation of "bar", it must assume the worst.
 The compiler cannot do any of this if it is in a conforming mode.
The compiler can omit the call to "bar" if it is sure that it results in no observable behaviour.  It cannot omit it if it is not sure of this. It is /that/ simple.
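
This is the rule behind the zeroing problem this thread started from. A rough sketch, with hypothetical names (secure_zero, check_password) and a hard-coded string used purely for illustration: a plain memset() on a buffer that is never read again has no effect on observable behaviour, so it may be dropped. Writing through a volatile-qualified pointer is a widely used idiom to discourage that, though whether the standard strictly guarantees it for an object not itself declared volatile has been debated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero a buffer through a volatile lvalue so the
 * stores are less likely to be treated as removable dead stores. */
void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--) {
        *vp++ = 0;
    }
}

/* Hypothetical example of handling a secret in a local buffer. */
int check_password(const char *input)
{
    char copy[64];
    int ok;

    strncpy(copy, input, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    ok = (strcmp(copy, "hunter2") == 0);

    /* memset(copy, 0, sizeof copy) here would be a dead store: "copy"
     * is never read afterwards, so a compiler may remove it. */
    secure_zero(copy, sizeof copy);
    return ok;
}
```

Where available, purpose-built functions such as memset_s (optional Annex K), explicit_bzero (non-standard) or C23's memset_explicit are the more direct way to express the intent.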

 But sure, in the nonconforming LTO paradigm, which does have to adhere
to sane rules, that more or less follow what would have to happen if
multiple preprocessing translation units were merged at the token level
and thus analyzed together.
 
Sometimes the compiler may have additional information - such as if it
is declared with the gcc "const" or "pure" attributes (or the standardised
"unsequenced" and "reproducible" attributes in C23).
 If the declarations are available only in another translation unit,
they cannot be taken into account when analyzing this translation unit.
 
Wrong.
This is really the crux of your misunderstandings.  You have read between the lines of the standard and imagined rules that don't exist. Once you realise that they are imaginary, I expect the rest to fall into place.

Since ISO C says that the semantic analysis has been done (that
unit having gone through phase 7), we can take it for granted as a
done-and-dusted property of that translation unit that it calls bar
whenever its foo is invoked.
>
No, we can't - see above.  Nothing in the C standards forbids any
additional analysis, or using other information in code generation.
 Any semantic analysis performed must be that which is stated in translation
phase 7, which happens for one translation unit, before considering
linkage to other translation units.
 What forbids it is that no semantic analysis activity is described as
taking place in translation phase 8, other than linkage.
The C standards also don't describe drinking coffee while waiting for the compiler.  Just because something is not mentioned, does not mean it is forbidden!

 
Say I have a call to foo in main, and the definition of foo is in
another translation unit.  In the absence of LTO, the compiler will have
to generate a call to foo.  If LTO is able to determine that foo doesn't
do anything, it can remove the code for the function call, and the
resulting behavior of the linked program is unchanged.
>
There are always situations in which optimizations that have been forbidden
don't cause a problem, and are even desirable.
>
>
Can you give examples?
>
You already mentioned "-ffast-math" (and by implication, its various
subflags in gcc, clang and icc).  These are clearly documented as
allowing some violations of the C standards (and not least, the IEEE
floating point standards, which are stricter than those of C).
 Yes, and some people want that, learn how it works, and get their
programs working with it, all the while knowing that it's
nonconforming to IEEE and ISO C.
Indeed.  I am "some people" in this context.

 Another tool in the box.
Agreed.
But "-ffast-math" was already covered, and is irrelevant precisely because it is entirely clear that it is potentially standards-violating. (But it is not "forbidden".  I have yet to see any ISO C police enforcers at my office door, waving a warrant.)
I wanted to know if you had other examples of what you see as standards-violating optimisations that are not documented as such.

 
(While I don't much like an "appeal to authority" argument, I think it's
worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
all support link-time optimisation.  They also all work together with
both the C and C++ standards committees.  It would be quite the scandal
if there were any truth in your claims and these compiler vendors were
all breaking the rules of the languages they help to specify!)
 Why would it be?
It would run counter to the whole point of having a standard.

 In the first place, all the implementations you mention have to be
explicitly put into a nondefault configuration in order to resemble
conforming ISO C implementations.
Yes, but they are clear about that.  (At least, gcc is - I haven't read the documentation for clang as thoroughly, and have barely touched MSVC.)
It is absolutely fine for a compiler to have conforming and non-conforming modes.  But it is /not/ fine for it to have a major part of its optimisation that is as critically non-conforming as you seem to believe, and not even mention this fact.

 LTO is not even enabled by default (for good reasons).
The good reasons are that not all setups support it (it needs particular linkers), it can significantly increase build times, it makes some kinds of debugging nearly impossible, it plays badly with other tools such as profilers and code coverage analysis, and you can have trouble if you are doing weird things with compiler and linker file interaction or some other kinds of non-standard C coding.
And like many optimisations, it can change the behaviour of incorrect code that happens to work by luck with different choices of optimisation settings.
Those are all very good reasons for not enabling it by default, when the results are often only a few percent improvement in efficiency (though for some code, it can be a lot more helpful).
Most compilers don't enable /any/ significant optimisation by default.

 A few goofballs who maintain GNU/Linux distros are turning on LTO for
compiling upstream packages whose development they know nothing about
beyond ./configure && make. (Luckily, the projects themselves can take
countermeasures to defend against this.)
 I think the fact that LTO is almost certainly nonconforming deserves
more attention, but not panic or anything like that.
If it /were/ nonconforming, I think that would deserve huge attention. But it is not.

 LTO should be made into a conforming feature that is optional.
Translation phase 8 can be split into 8 and 9. In 8, translation units
would be optionally partitioned into subsets. Each subset containing
two or more translation units would be subjected to further semantic
analysis, as a group, and turned into a subset translation unit.
Phase 9 would be same as former 8.
 Whether an implementation supports subsetting and the manner in which
units are indicated for subsetting would be implementation-defined, but
it would be clear that there is a semantic difference, and that each
implementation must support a translation mode in which the subsetting
isn't performed.
 
