David Brown <david.brown@hesbynett.no> writes:
> On 05/09/2024 13:31, Anton Ertl wrote:
>> It's normal and no problem when the next version of gcc generates
>> different assembly language. There are some basic assumptions that
>> our code relies on, and those mostly do not change between gcc
>> versions.
>> ...
>> In the days starting with gcc-3.0, we found that gcc started
>> reordering the basic blocks within loops, so we moved the loops in the
>> part of the code that needs such assumptions into separate functions.
>> Around gcc-7, gcc started to compile
>>
>>     A: C-code1
>>     B: C-code2
>>     C: goto *...
>>
>> to the same code as
>>
>>     A: C-code1; C-code2; goto *...;
>>     B: C-code2; goto *...;
>>     C: goto *...;
>>
>> I found a workaround that avoids this kind of code generation.
>
> This is all the kind of thing you can expect when you make assumptions
> about code generation that are not supported by the documentation.
Nobody said that gcc did anything wrong here. We were, however,
surprised that -fno-reorder-blocks did not suppress the reordering; we
reported this as a bug, but were told that this option does something
different from what it says. Anyway, we developed a workaround, and
we also developed a workaround for the code duplication problem that
showed up in gcc-7.
> I too have written code that relies on being able to identify the start
> and end of certain bits of code, typically for microcontrollers where
> you want some bits of code (like flash programming routines or very
> timing-critical interrupt code) put in RAM rather than flash. Sometimes
> that can be done with compiler extensions; sometimes it takes extra
> flags, linker file magic, or other messing around. But it's not
> something I would expect to be portable, and it needs to be confirmed
> for every compiler version and selection of flags used. (I realise that
> this is a vastly simpler task for the kind of work I do than for an open
> source project!)
Between what we developed in 2003 for gcc-3.2 (released 2002) and
today, the only new development in these 21 years was the code
duplication in gcc-7 and our workaround for it. IIRC Gforth also
worked without that workaround, just more slowly.
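The linker route mentioned above for locating a piece of code can be sketched as follows, assuming a GNU-style toolchain producing ELF (gcc or clang with GNU ld or lld). For an output section whose name is a valid C identifier, these linkers define __start_NAME and __stop_NAME symbols marking its bounds. The section name "ramfunc" and the helper functions are hypothetical; on a real microcontroller the section would normally be placed by the linker script, and actually running the copied bytes from RAM would also require position-independent code (and, on a hosted OS, executable memory), none of which is shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section name; it must be a valid C identifier so that the
   linker synthesizes __start_ramfunc and __stop_ramfunc (ELF only). */
#define RAMFUNC __attribute__((section("ramfunc"), noinline))

/* Stand-in for a routine one would want to run from RAM,
   e.g. a flash-programming helper. */
RAMFUNC int ram_add(int a, int b)
{
    return a + b;
}

/* Bounds of the "ramfunc" output section, defined by the linker. */
extern char __start_ramfunc[];
extern char __stop_ramfunc[];

/* Size in bytes of everything the linker placed in "ramfunc". */
size_t ramfunc_size(void)
{
    return (size_t)((uintptr_t)__stop_ramfunc - (uintptr_t)__start_ramfunc);
}

/* Copy the section's bytes into dst; returns the number of bytes copied,
   or 0 if dst is too small.  This only copies bytes; it does not make
   them executable. */
size_t copy_ramfunc(unsigned char *dst, size_t cap)
{
    size_t n = ramfunc_size();
    if (n > cap)
        return 0;
    memcpy(dst, __start_ramfunc, n);
    return n;
}
```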
>> Another problem, from gcc-3.1 to at least gcc-4.4 (intermittently), was
>> that gcc compiled
>>
>>     goto *ca;
>>
>> into the equivalent of
>>
>>     goto gotoca;
>>     /* and elsewhere */
>>     gotoca: goto *ca;
>>
>> We reported that repeatedly. At one point a gcc maintainer gave us
>> some bullshit about a possible performance advantage from this
>> transformation, of course without presenting any empirical support,
>> while we saw a big slowdown on our code. We developed workarounds for
>> that, and they are in Gforth to this day, even though we have not
>> encountered a new gcc version with this problem for over a decade:
>> new Gforth versions should also work with old gcc versions.
>
> Again, the compiler is not doing anything outside its specifications.
Nobody said it did. We did, however, report this as a pessimization
repeatedly, and eventually the gcc people fixed it. IIRC we already
saw versions without this bug in gcc-4.0 or 4.1, but in 4.4 it was back
again; apparently they have since fixed it for good.
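For readers who have not used GNU C's "labels as values" extension: `goto *ca` jumps to a label address held in a variable. Below is a minimal sketch of a direct-threaded interpreter for a hypothetical three-instruction VM (it is not Gforth's engine; enum op, cell, run and NEXT are made-up names). Each VM instruction ends with its own copy of the dispatch jump via the NEXT macro. The transformation described above would instead funnel all of them through a single shared `gotoca: goto *ca;`, which typically makes the indirect branch much harder for the hardware to predict.

```c
/* Opcodes of a hypothetical toy VM. */
enum op { OP_LIT, OP_ADD, OP_HALT };

/* A threaded-code cell: the code address of a VM instruction,
   or an inline literal operand. */
typedef union cell {
    void *ca;
    long n;
} cell;

/* Translate ops[0..len) into direct-threaded code and run it.
   LIT is followed by its literal in ops[]; ADD adds the top two stack
   items; HALT stores the top of the stack in *result.
   Returns 0 on success, -1 for a malformed or too-long program. */
int run(const long *ops, int len, long *result)
{
    /* Label addresses indexed by opcode (GNU C "labels as values"). */
    static void *const labels[] = { &&lit, &&add, &&halt };
    cell code[64];
    long stack[64];
    long *sp = stack;   /* next free stack slot */
    cell *ip;           /* VM instruction pointer */
    void *ca;           /* code address being dispatched to */
    int depth = 0, n = 0;

    if (len < 0 || len > 63)
        return -1;
    /* Translation; with no branches, stack depth can be checked statically. */
    for (int i = 0; i < len; i++) {
        long op = ops[i];
        if (op < OP_LIT || op > OP_HALT)
            return -1;
        code[n++].ca = labels[op];
        if (op == OP_LIT) {
            if (++i >= len)
                return -1;
            code[n++].n = ops[i];
            depth++;
        } else if (op == OP_ADD) {
            if (depth < 2)
                return -1;
            depth--;
        } else {
            break;      /* OP_HALT: ignore anything after it */
        }
    }
    if (depth < 1)
        return -1;
    code[n].ca = &&halt;    /* make sure execution terminates */

    /* NEXT: every VM instruction ends with its own indirect jump. */
#define NEXT do { ca = (ip++)->ca; goto *ca; } while (0)
    ip = code;
    NEXT;
lit:
    *sp++ = (ip++)->n;  /* push the inline literal */
    NEXT;
add:
    sp--;
    sp[-1] += sp[0];    /* replace the top two items by their sum */
    NEXT;
halt:
    *result = sp[-1];
    return 0;
#undef NEXT
}
```

For example, the program { OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT } stores 5 in *result.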
> You are looking for more than C and gcc's documented extensions give
> you. That is always going to be hard.
Really? It works.
> Ideally, you need a new gcc flag or two with documented and guaranteed
> effects to give you the assurance you need for your code. That's going
> to take a lot of effort, I would expect, and I can see it being hard for
> a relatively niche project like Gforth to push for that.
Our approach has been to find sanity-checks and workarounds based on
what gcc provided.
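As an illustration of the kind of sanity check meant here (a hypothetical sketch, not Gforth's actual checks, which are not shown in this thread): take the addresses of labels around a code fragment with GNU C's labels as values, and only treat the bytes between them as copyable if the layout looks plausible. The struct fragment type, the helper names and the max_len bound are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Addresses of two labels that bracket a (hypothetical) code fragment. */
struct fragment {
    const void *start;
    const void *end;
};

/* Return the addresses of the labels around a tiny fragment
   (GNU C "labels as values").  The volatile access keeps the compiler
   from deleting the code between the labels. */
static struct fragment get_fragment(volatile int *x)
{
    struct fragment f = { &&start, &&end };
    if (*x == 0)
        return f;
start:
    (*x)++;
end:
    return f;
}

/* Plausibility check before treating the bytes between the labels as a
   copyable fragment: end must come after start, and the distance must not
   exceed max_len.  Code copying would only be enabled when this passes;
   otherwise an interpreter would fall back to ordinary threaded code. */
int fragment_plausible(struct fragment f, size_t max_len)
{
    uintptr_t s = (uintptr_t)f.start;
    uintptr_t e = (uintptr_t)f.end;
    return e > s && e - s <= max_len;
}
```

A real check would need more than this: the bytes between the labels must also be relocatable (for example, free of PC-relative references to code outside the fragment), which an address comparison alone cannot establish.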
However, we were not the only ones working with code copying:
Prokopski and Verbrugge implemented changes to gcc that support
this technique, and presented them at the GCC Developers’ Summit 2007
<https://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=GCC2007-Proceedings.pdf>
and at CC'08:
@InProceedings{prokopski&verbrugge08,
  author =    {Gregory B. Prokopski and Clark Verbrugge},
  title =     {Compiler-Guaranteed Safety in Code-Copying Virtual Machines},
  booktitle = {Compiler Construction (CC'08)},
  pages =     {163--177},
  year =      {2008},
  publisher = {Springer LNCS 4959},
  url =       {http://www.sable.mcgill.ca/publications/papers/2008-2/paper.pdf}
}
The source code was available, but the gcc maintainers were apparently
not interested. So much for "patches welcome".
Looking back, while there was quite a bit of interest in code copying
(both for interpreters and for partial evaluators) from about 1998 to
2008, AFAIK Gforth is the only project that stuck with this
technique.
When others consider relatively unsophisticated interpreters too
slow, they tend to go for JIT compilers that generate machine code
with target-specific code (including code for encoding machine
instructions).
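To make "code for encoding machine instructions" concrete, here is a sketch for one target, x86-64, assuming a calling convention where an int result is returned in eax (as in the common x86-64 ABIs). It emits `mov eax, imm32` (opcode B8 followed by a little-endian 32-bit immediate) and `ret` (C3). The function name is hypothetical, and actually calling the generated code would also need executable memory (e.g. mmap with PROT_EXEC on POSIX systems), which is omitted here. Every target needs its own encoder like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Emit x86-64 machine code for "mov eax, imm32; ret" into buf.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t emit_return_const(unsigned char *buf, size_t cap, int32_t value)
{
    uint32_t u = (uint32_t)value;
    if (cap < 6)
        return 0;
    buf[0] = 0xB8;                  /* mov eax, imm32 (opcode B8+rd, rd = 0 for eax) */
    for (int i = 0; i < 4; i++)     /* 32-bit immediate, little-endian */
        buf[1 + i] = (unsigned char)(u >> (8 * i));
    buf[5] = 0xC3;                  /* ret */
    return 6;
}
```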
Maybe the constant advocacy that anything outside the standard is
broken, and that the next compiler will not compile it as intended,
has had its effect. Or maybe, if we had published a code-copying
howto, more people would have found out how to do it in a way that
works pretty reliably.
OTOH, we ourselves have been thinking about switching to the kind of
JIT compiler that others have gone for. So we fell for this advocacy
ourselves. But looking at the stability of Gforth, this is not really
justified. Still, a solid foundation like machine code provides more
confidence than a foundation based on C, where every new compiler
version may bring unpleasant surprises (and not just for projects such
as Gforth), even if experience shows that things work.
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>