Anton writes:
>David Brown <david.brown@hesbynett.no> writes:
>>OK.
>>
>>Anton writes code that seriously pushes the boundary of what can be
>>achieved. For at least some of the things he does (such as GForth)
>>he is trying to squeeze every last drop of speed out of the target.
>>And he is /really/ good at it. But that means he is forever relying
>>on nuances about code generation. His code, at least for efficiency
>>if not for correctness, is dependent on details far beyond what is
>>specified and documented for C and for the gcc compiler. He might
>>spend a long time working with his code and a version of gcc,
>>fine-tuning the details of his source code to get out exactly the
>>assembly he wants from the compiler.
>
>No. We distribute Gforth as source code. It works for a wide variety
>of architectures and compilers. So unlike what you suggest, and what
>some people have suggested earlier to avoid problems with new
>"optimizations" in newer releases of gcc, we don't concentrate on a
>specific version of gcc.
As long as you are sticking to defined behaviour (defined by the C
standards, or by the gcc documentation), and use specified C standard
versions in the build, then there should not be any incorrect
behaviour in different versions. Performance might regress, and of
course there's always the risk of bugs.

>>Of course it is frustrating for him when the next version of
>>gcc generates very different assembly from that same source, but he
>>is not really programming at the level of C, and he should not
>>expect consistency from C compilers like he does.
>
>It's normal and no problem when the next version of gcc generates
>different assembly language. There are some basic assumptions that
>our code relies on, and those mostly do not change between gcc
>versions.
>An essential assumption is that, when we have:
>
>A:
>  C code
>B:
>
>... that when we take &&A and &&B (which is documented in the GNU C
>manual), we get addresses pointing to the start and the end of the
>machine code corresponding to the C code.

I don't see anything in the gcc reference manual suggesting that &&B
is the end of the corresponding code. What you get - all you get - is
that "goto * &&A" gives the same effect as "goto A".
>In the days starting with gcc-3.0, we found that gcc started
>reordering the basic blocks within loops, so we turned the loops in
>the part of the code that needs such assumptions into separate
>functions. Around gcc-7, gcc started to compile
>
>A: C-code1
>B: C-code2
>C: goto *...
>
>to the same code as
>
>A: C-code1; C-code2; goto *...;
>B: C-code2; goto *...;
>C: goto *...;
>
>I found a workaround that avoids this kind of code generation.

This is all the kind of thing you can expect when you make assumptions
about code generation that are not supported by the documentation.
Compilers can, and do, move code around in various ways - duplicate
it, combine it, unroll it, compress it - whatever gives (or tries to
give; optimisation is not an exact science) better results while
giving the documented behaviour.
>Another problem, from gcc-3.1 to at least gcc-4.4 (intermittently),
>is that gcc compiled
>
>goto *ca;
>
>into the equivalent of
>
>goto gotoca;
>/* and elsewhere */
>gotoca: goto *ca;
>
>We reported that repeatedly. At one point a gcc maintainer gave us
>some bullshit about a possible performance advantage from this
>transformation, of course without presenting any empirical support,
>while we saw a big slowdown in our code. We developed workarounds for
>that, and they are in Gforth to this day; we have not encountered a
>gcc version with this problem for over a decade, but new Gforth
>should also work with old gcc.

Again, the compiler is not doing anything outside its specifications.
What you want here is a guarantee of behaviour that is not defined
anywhere. You are not seeing a bug in the compiler, or an
incompatibility with previous versions - you are seeing the need for a
feature (and a controlling compiler flag) that gcc does not currently
have. It is a potential feature that might be useful to other people
too, while being an anti-feature to others.
>Another assumption is that when we concatenate the code snippet
>between labels A and B (which contains C-code1) and the code snippet
>between labels X and Y (which contains C-code3), executing the result
>will behave like the concatenation of C-code1 and C-code3 in source
>code. This assumption has two aspects:
>
>1) Do the register assignments at the labels fit together? It turns
>   out that we never had a problem with that, and I think that the
>   reason is that the "goto *" can jump to any of those labels (all
>   their addresses are taken), and so the register assignment must be
>   the same right after each label.
>
>   What guarantees that the assignments are the same right before
>   each label? Probably that there is not much between a label and
>   the next "goto *", and that makes the registers at all potential
>   targets live.
>
>2) If we have two pieces of machine code produced in this fashion,
>   does the architecture guarantee that such a concatenation works?
>   It turns out that among general-purpose architectures, all but one
>   do. That includes IA-64. The exception is MIPS, with its
>   architectural load-delay slot (and there are also scheduling
>   restrictions having to do with the hi/lo registers that may be
>   problematic): the first code snippet may end in a load, and the
>   next code snippet may start with an instruction that reads the
>   result of the load. So we just disabled this concatenation on
>   MIPS.
>
>We do a number of things to achieve stability: we do sanity checks on
>the resulting machine-code snippets and fall back to plain threaded
>code if the snippets turn out not to be relocatable.
>
>Also, we enable all the flags for defining behaviour in gcc that we
>find (unfortunately, in the documentation they are intermixed with
>other options). For good measure, this includes
>-fno-delete-null-pointer-checks, although I doubt that it makes a
>difference for our code either way.

(-fno-delete-null-pointer-checks will make no difference to code that
doesn't accidentally use leap-before-you-look checking.)
>One thing that came up about a year ago was that gcc auto-vectorizes
>adjacent memory accesses on AMD64 (apparently the AMD64 port
>maintainers are unhappy because AMD64 does not have instructions like
>ARM A64's ldp and stp :-), which did not impact correctness, but led
>to worse performance (not just in Gforth; I have also seen it in the
>bubble benchmark from John Hennessy's Stanford small integer
>benchmarks; I'm sure there is some SPEC benchmark that benefits). A
>quick addition of -fno-tree-vectorize fixed that.

That happens sometimes. In my brief testing of clang, it often seems a
bit too keen on vectorising code that would be better kept short and
simple. I have no doubt gcc gets that wrong sometimes too.
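Pulling the flags mentioned in this exchange together, a build command might look like this. This is an illustrative sketch, not Gforth's actual build configuration: the file name `engine.c` is hypothetical, and `-fwrapv`/`-fno-strict-aliasing` are my examples of the classic behaviour-defining flags; only the last two flags are named in the discussion above.

```shell
# Illustrative only -- not Gforth's real configure output.
# -fwrapv:               define signed integer overflow as wrapping
# -fno-strict-aliasing:  define cross-type memory accesses
# the last two flags are the ones discussed in this thread
gcc -O2 -fwrapv -fno-strict-aliasing \
    -fno-delete-null-pointer-checks -fno-tree-vectorize \
    -o engine engine.c
```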
>We have been thinking about moving from C to a better-defined
>language, namely assembly language, but have not yet taken the
>plunge, and it has not been necessary yet. Gcc has not been as crazy
>in our experience as the UB rhetoric might make one think. Why is
>that? I think the reasons are:
>
>1) Gforth and a lot of other "irrelevant" (to the gcc maintainers)
>   projects sail in the slipstream of "relevant" code like SPEC and
>   the Linux kernel, which are full of undefined behaviour (Linux
>   defines many of the behaviours with flags, as Gforth does), so
>   gcc does not "optimize" as crazily as a UB fan might wish.
>
>2) The code snippets are very short, with many in-edges at the
>   preceding and following labels, which tends to destroy any
>   "knowledge" that the compiler might have derived from the
>   assumption that the program does not exercise undefined
>   behaviour. This severely limits the distance over which such
>   "optimizations" can be performed.
>
>Nevertheless, the last time I tried compiling without the
>behaviour-defining options, the result did not work; I did not
>investigate this further.

You are looking for more than C and the gcc documented extensions give
you. That is always going to be hard.
The messages shown here come from Usenet.