Dan Cross <
cross@spitfire.i.gajendra.net> wrote:
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.
>
But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.
>
In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.
So there are two things that are at play here.
First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.
I think that this paragraph (and several other it this post and
other posts) represent fundamental misanderstanding. This may
be due to the way C standard is written. AFAIK Extended Pascal
standard (once you translate terminalogy) states the same things as
C about UB, but in clearer way. Some relevant parts below:
: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.
: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a violation is
: not designated an error or dynamic-violation,
...
: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the language
: specified in clause 6;
UB in C standard corresponds with 'error' in Pascal standard. And
(by clause above) program is allowed only to use defined features,
trying to use something that has no definition (undefined by
ommision of definition) is automatically an error.
Overflow in arithmetic in Pascal is an error, as is accessing
wrong variant of variant record. Due to this accessing variable
using wring type is an error in Pascal.
Since valid programs shall contain no errors (as defined above)
Pascal compiler my optimize assuming that user program contains
no errors. This is the same as C compiler optimizing on assumption
that there is no udefined behaviour in the program.
Of course, C is different language than Pascal and in particular
C contains more "dangerous" constructs that may lead to
undefined behaviour.
However, the fundamental thing remain: detecting undefined behaviour
("errors") at compile time in general is hard, and compilers are
not obliged to do so. But they may optimize trusting that program
contains no undefined behaviour ("no error").
Indeed, I would go so far as to suggest that _most_ instances of
UB are detected and used (by the translator) during translation.
I think it is different: compiler _assumes_ no undefined behaviour
and optimizes accoringly. But when there is undefined behaviour,
then program behaves in unexpected way at runtime. Also, when
you assume a false thing, then you can logically derive anything
from it, so there is no limit to possible damage.
So to say that, "this program doesn't have UB because the
statement that contains UB is never executed" doesn't make a lot
of sense to me. It would be closer to being correct if one said
"this program is unaffected by UB since the expression that has
UB is never evaluated when the program executes": again, in this
case (as, I suspect, in most cases) the UB simply _is_: the
expression `INT_MAX + 1` does not become well-defined just
because it is never executed.
Well, what is interesting to users is runtime behaviour of programs
and undefined behaviour usually is runtime thing (as troubles that
can be easily detected at compile time are usually constraint
violations which should be detected at compile time). Fact that
some undefined behaviour can be detected at compile time does
not change this. And AFAICS there was very deliberate decision to
allow programs which contains code that would be undefined behaviour
if executed, but are considerd OK if such code is not executed.
BTW: Pascal wording is different but Pascal standard contains
identical provision and Pascal validation suite contains explicit
tests of this sort.
Second, there's this notion that the standard is just
underspecified with respect to these matters, specifically, it
does not _prohibit_ a translation from implementing an emulator
for the abstract machine that evaluates code at translation
time. Indeed, I suspect that _most_ compilers do something
largely analogous to that; that's how they detect UB so that
they can take advantage of it when optimizing. But if that's
the case, then nothing prohibits them from relieving themselves
of their obligation to follow the standard once they observe
that some bit of code has UB.
As I wrote, this is different. Compilers routinely compute some
constant expressions at compile time. Constant here meaning that
expression does not depend on runtime values. Compilers track
ranges of variables. But this is done using assumption that
there are no undefined behaviour. For example in loop:
for(int i = 1; i > 0; i++) {
...
}
absent assigments to i in loop body compiler may infer that 'i > 0'
and skip the test. If however there is undefined behaviour, then
compiler may infer any nonsense. This may look like compiler
detected undefined behaviour, but compiler typically do not check
consistency of inferences. In fact, intermediate things and
useless facts are quickly discarded, so detecting undefined
behaviour via inconsistency of inferred facts would significantly
increase memory use and probably also compile time.
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
As already noted C standard explictely forbids such behaviour.
BTW: There were past discussions of the same and other people
quited relevant passage which is quite explicit. I am not
going to search for it, but it is in the standard.
Is it? I am unable to locate where the standard _actually says
that it is_. That is my whole point.
Sorry, I looked in place given by other people, but I do not
remember exact location. I would say that once you find right
place and read it carefuly it is pretty clear.
And yet the standard does not say that. That is an
interpretation; I assume it is universally shared, but if we
want to limit ourselves to what the standard _actually says_ it
is woefully underspecified in this regard.
There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied.
Originally C was defined by single implementation which was not
doing much optimisation. But clearly starting from the first
C standard undefined behaviour had the same meaning as Pascal
error: permission for compilers to optimize on assumption that
is does not happen. The issue was well understood in seventies.
Already in late sixties Fortran compilers could do interesting
optimizations, not expected by naive users. In seventies
majority (or at least most influential) view was that it is
programmer resonsibility to obey language rules and that
compiler should optimize on assumption that rules are obeyed.
C reflect this point of view.
One can discuss if such point of view is valid now, but C
is a product of such thinking.
This is circular reasoning. You're saying that something that
is provably UB in this program cannot prevent that program from
being strictly confirming because the program is strictly
confirming.
What you wrote above is similar to standard wording, except that
standard formilates it much better, closer to "code that would
cause undefined behaviour if executed does not prevent otherwise
strictly confirming program from being strictly confirming".
In the past there were disscusion when an implementation can reject
a program. I do not remember what was the conclusion in the case
when implementation can prove that program must cause undefined
behaviour, but otherwise program violates no constraints. Probably
it can reject it, but I am not sure. But if there were any possiblity
that program may execute without undefined behaviour (including
containg code that would cause undefined behaviour if executed),
then implementation should accept such program.
This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is proveably UB (as this one is)
then the program cannot said to be strictly conforming.
As I wrote, there is quite explicit statement in the standard
which says opposite of what you wrote above: mere presence of
code that would cause undefined behaviour if executed does not
make program non conforming.
In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.
I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.
I do not thing operational semantics is best way to define C.
Naive operational semantics would define too much things, so
one would need serious work to define what should be defined
and leave undefined what should be undefined.
My personal favorite is axiomatic semantics. IMO it is quite well
adapted to defining programming languages. Substantial part of
Pascal was given axiomatic semantics in Alagic and Arbib book.
Pascal standard do not explitely use axiomatic semantics, but
my impression was that with managable effort it coulde be rewritten
using axiomatic semantics. Modern C standard is bigger, partly due
to library case, partly because C have much more operators. But
problem seem to be mostly quantity of needed text.
I think that compiler writers would welcome axiomatic semantics,
it would make their work simpler. More preciely, now compiler
writers must temselves translate standad text into formulation
similar to axiomatic semantics. Having official semantics
would make their work simpler. It would prevent implementing
some optimizations based on misunderstanding of the standard,
but if such optimizations were deemed worthty they could
implement then as a nonstandard thing (like -fast-math now)
and lobby for change in the standard.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
-- Waldek Hebisch
Haut de la page
Les messages affichés proviennent d'usenet.
NewsPortal