Bernd Linsel <bl1-thispartdoesnotbelonghere@gmx.com> writes:
> Maybe my previous post was not clear enough: It's not a general UB
> detector that I'd like to have integrated into the compiler (there are
> static checker tools available that can nearly perfectly do that);
Undefined behaviour is something that is exercised at run-time.
That's why the "undefined behaviour sanitizers" insert run-time
checks. And of course they only detect the behaviour when it is
actually exercised. In particular, they usually will not detect
buffers that can overflow, because your usual test inputs don't
exercise those overflows.
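To illustrate why such checks only fire on inputs that actually exercise them, here is a hand-written sketch of the kind of guard that -fsanitize=undefined conceptually places around a signed multiplication. It uses the GCC/Clang builtin __builtin_mul_overflow; the function name checked_scale and the factor 1000 are hypothetical, and the sanitizer's real instrumentation differs in detail (it prints a diagnostic instead of returning a flag).

```c
#include <stdbool.h>

/* Hypothetical example: compute x * 1000 and report whether the signed
   multiplication overflowed (a plain `x * 1000` would be UB then).
   Returns true if the product fit into *result. */
bool checked_scale(int x, int *result)
{
    /* __builtin_mul_overflow (GCC, Clang) computes the product as if with
       infinite precision, stores the wrapped value in *result, and
       returns true if it did not fit. */
    return !__builtin_mul_overflow(x, 1000, result);
}
```

With typical test inputs such as 42 the check never fails; only an input like 3000000 (assuming 32-bit int) triggers it, which is exactly the case a test suite tends to miss.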
What do you mean by the static checker tools you mention?
> instead, I'd like to get a warning when the compiler does something
> other than you would expect when reading the code in a "do what I mean"
> manner.
Of course the fans of compilers that do what nobody means found a
counterargument long ago: They claim that compilers would need psychic
powers to know what you mean. So one way to specify what I guess you
mean by 'read the code in a "do what I mean" manner' is the
behaviour that the compiler exhibits without "knowledge" coming
from the assumption that there is no undefined behaviour in the
program. For a longer discussion, read
<https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>.
And yes, compilers could actually produce information about
differences between such a compilation and a compilation where the
compiler assumes that undefined behaviour does not happen.
If you intend to run the compiler in "Assume That Undefined Behaviour
Does Not Happen" (ATUBDNH) mode for production code, one way to use
such information is to check *every* case where the resulting code behaves
differently. If the behaviour of the ATUBDNH compiler is not
according to your intentions, change the source code to avoid
undefined behaviour in such cases, forcing the ATUBDNH compiler to
behave as you intend. If the behaviour of the ATUBDNH compiler is as
you intended, you can keep the source code as-is (but then you get the
same warning the next time 'round). Or you can change the source code
in a way that results in the compiler not needing to ATUBDNH in order
to produce the code you would like (see below for examples).
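As a hedged illustration of changing the source so that the compiler no longer needs to ATUBDNH: the check `x + 1 < x` is meant to detect wrap-around, but because signed overflow is undefined, an ATUBDNH compiler may fold it to 0. The hypothetical helper increment_would_overflow below expresses the same intent without undefined behaviour, so ATUBDNH and don't-ATUBDNH compilers agree on what it does.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper. The "obvious" version
       return x + 1 < x;
   relies on wrap-around; since signed overflow is undefined in C, an
   ATUBDNH compiler may fold that expression to false. This version has
   no undefined behaviour for any int x. */
bool increment_would_overflow(int x)
{
    return x == INT_MAX;
}
```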
If you instead intend to run the compiler in don't-ATUBDNH mode for
production code, you only need to look at a few cases: those occurring
in the most frequently executed code. Again, for each difference there
are two cases: If your intention is reflected only in the don't-ATUBDNH
code, you can either leave things as they are, or change the source
code such that the warning goes away in the future (without changing
the generated code). If your intention is also covered by the ATUBDNH
case, you can change the source code so that the don't-ATUBDNH
compiler also performs the optimization.
Here are examples: Wang et al. [Section 3.3 of wang+12] found that in
all of SPECint 2006 there were only two places where ATUBDNH made
a measurable difference to performance. These were two inner loops.
In one case the code is
int k;
int *ic, *is;
...
for (k = 1; k <= M; k++) {
...
ic[k] += is[k];
...
}
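and the don't-ATUBDNH variant has a sign extension after the "k++"
that the ATUBDNH variant does not have (on a 64-bit machine the int k
has to be widened for the address computation, and only ATUBDNH may
assume that k++ never wraps). Wang et al. suggest changing the type
of k to size_t to avoid this sign-extension operation. After that
change, ATUBDNH makes no difference to this loop.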
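A self-contained sketch of the loop after Wang et al.'s change; the function name add_arrays, the parameter types, and the 1-based indexing up to M inclusive are assumptions reconstructed from the excerpt above, not the actual SPEC source.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SPECint loop: ic[k] += is[k] for
   k = 1..M. With k of type size_t (as Wang et al. suggest), the index
   already has pointer width on typical 64-bit targets, so no sign
   extension is needed after k++. */
void add_arrays(int *ic, const int *is, size_t M)
{
    for (size_t k = 1; k <= M; k++)
        ic[k] += is[k];
}
```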
The other loop is
quantum_reg *reg;
...
// reg->size: int
// reg->node[i].state: unsigned long long
for (i = 0; i < reg->size; i++)
reg->node[i].state = ...;
Here ATUBDNH pulls the load of reg->size out of the loop (it assumes
that reg->size does not alias with reg->node[i].state). Wang et
al. solved that by assigning reg->size to a variable outside the loop,
i.e., something like:
quantum_reg *reg;
...
long reg_size = reg->size;
for (i = 0; i < reg_size; i++)
reg->node[i].state = ...;
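A compilable version of the hoisted loop, under stated assumptions: the quantum_reg and quantum_reg_node definitions below are simplified hypothetical stand-ins containing only the two fields named in the comments above (the real libquantum types have more members), and set_states with its value parameter is illustrative.

```c
/* Simplified, hypothetical versions of the types; only the fields used
   by the loop are modelled. */
typedef struct {
    unsigned long long state;
} quantum_reg_node;

typedef struct {
    int size;
    quantum_reg_node *node;
} quantum_reg;

/* Loads reg->size once, before the loop, so the loop bound does not
   depend on the compiler knowing that the stores to node[i].state
   cannot modify reg->size. */
void set_states(quantum_reg *reg, unsigned long long value)
{
    long reg_size = reg->size;
    for (long i = 0; i < reg_size; i++)
        reg->node[i].state = value;
}
```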
But once we are at it, why stop at optimization suggestions coming
from ATUBDNH? E.g., consider a loop similar to the second loop:
quantum_reg *reg;
...
// reg->size: int
// reg->node[i].state: int <==== HERE'S THE DIFFERENCE
for (i = 0; i < reg->size; i++)
reg->node[i].state = ...;
In this case even ATUBDNH would not allow pulling reg->size out of the
loop (a store through an int lvalue may modify any int object), yet
you never intend to alias reg->size with reg->node[i].state. A
compiler could actually guess your intention and suggest that you may
want to pull reg->size out (and also mention the caveats about
possible aliasing).
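A sketch of the int-state variant, again with simplified hypothetical types (int_reg, int_reg_node) and illustrative function names. Because state and size are both int, a store to node[i].state could in principle modify size, so the compiler generally has to reload the bound in set_states_as_written; set_states_hoisted is the manual optimization such a suggestion would propose, and it is only equivalent when node never overlaps size.

```c
/* Simplified, hypothetical types: here state has the same type as size. */
typedef struct {
    int state;
} int_reg_node;

typedef struct {
    int size;
    int_reg_node *node;
} int_reg;

/* As written: the store to node[i].state has type int, so even under
   strict aliasing the compiler cannot assume that it leaves reg->size
   unchanged; it generally reloads reg->size on every iteration. */
void set_states_as_written(int_reg *reg, int value)
{
    for (int i = 0; i < reg->size; i++)
        reg->node[i].state = value;
}

/* The suggested manual optimization: load the bound once.
   Caveat: only equivalent if reg->node never overlaps reg->size. */
void set_states_hoisted(int_reg *reg, int value)
{
    int reg_size = reg->size;
    for (int i = 0; i < reg_size; i++)
        reg->node[i].state = value;
}
```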
So once we are there, we no longer need ATUBDNH, we just need
don't-ATUBDNH and a compiler option that produces manual-optimization
suggestions, ordered by the expected payoff (probably it's a good idea
to use profile data for this ordering).
I personally try to turn GCC into don't-ATUBDNH as far as possible
with options like "-fno-delete-null-pointer-checks
-fno-strict-aliasing -fno-strict-overflow".
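As a hedged example of what -fno-strict-aliasing protects, and of the alternative of changing the source so that the option is not needed: reading a float's bits through a uint32_t pointer is undefined behaviour under C's aliasing rules, and GCC enables -fstrict-aliasing at -O2, which may optimize based on that; -fno-strict-aliasing makes GCC treat the access as written. Copying the bytes with memcpy is well-defined in either mode. The helper name float_bits is hypothetical, and the code assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 single precision). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper. The pointer-cast version
       return *(uint32_t *)&f;
   violates the aliasing rules and is only reliable with
   -fno-strict-aliasing. Punning via memcpy is well-defined C. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```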
@InProceedings{wang+12,
author = {Xi Wang and Haogang Chen and Alvin Cheung and Zhihao Jia and Nickolai Zeldovich and M. Frans Kaashoek},
title = {Undefined Behavior: What Happened to My Code?},
booktitle = {Asia-Pacific Workshop on Systems (APSYS'12)},
OPTpages = {},
year = {2012},
url1 = {http://homes.cs.washington.edu/~akcheung/getFile.php?file=apsys12.pdf},
url2 = {http://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf},
OPTannote = {}
}
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>