Re: "The provenance memory model for C", by Jens Gustedt

Subject: Re: "The provenance memory model for C", by Jens Gustedt
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.lang.c
Date: 11 Jul 2025, 03:09:26
Organization: A noiseless patient Spider
Message-ID : <104proo$15qjv$1@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla Thunderbird
On 7/10/2025 4:34 AM, David Brown wrote:
On 10/07/2025 04:28, BGB wrote:
On 7/9/2025 4:41 AM, David Brown wrote:
On 09/07/2025 04:39, BGB wrote:
On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>
...
>
There have been plenty of papers and blogs written about pointer provenance (several by Gustedt) and how it could work.  It's not a very easy thing to follow in any format.  A patch to current C standards is perhaps the least easy to follow, but it is important for how the concept could be added to C.
>
>
Admittedly, as of yet, I haven't quite figured out what exactly provenance is supposed to be, or how it is supposed to work in practice.
>
 I've read a bit, but I think it would take quite an effort to understand the details.
 As a compiler user (albeit one with an interest in compilers and code generation), rather than a compiler developer, my attitude to writing C code will be the same if and when pointer provenance becomes part of the C model and C compiler optimisations - don't lie to your compiler.  If you want to do weird stuff behind the compiler's back (and that is certainly possible in embedded development), use "volatile" accesses in the right places.  So for me, in practical use, pointer provenance will simply mean that the compiler can do a bit more optimisation with less manual work - and that's a nice thing.  (I'll still be interested in how it works, but that's for fun, not for real work.)
 
Probably true, but I am also thinking as a compiler developer.
Granted, arguably my compiler isn't great, but this is a different issue.
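For what it's worth, the canonical example from the provenance discussions is roughly the following (a sketch of my understanding, not code taken from the papers). Two adjacent objects, where a one-past-the-end pointer happens to compare equal to the address of a different object:

   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
       int x = 1, y = 2;
       int *p = &x + 1;        // one-past-the-end; provenance is x
       int *q = &y;            // provenance is y
       if ((uintptr_t)p == (uintptr_t)q) {
           *p = 11;            // same address, wrong provenance: UB
           printf("%d\n", y);  // optimizer may assume y is untouched
       }
       return 0;
   }

The two pointers can be numerically equal, yet the store through p is not permitted to modify y; that license is exactly what lets an optimizer keep y in a register across the store.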

>
If you think that certain code could go faster because certain suspected
aliasing isn't actually taking place, then since C99 you were able to
spin the roulette wheel and use "restrict".
>
>
"restrict" can certainly be useful in some cases.  There are also dozens of compiler extensions (such as gcc attributes) for giving the compiler extra information about aliasing.
>
>
And, the annoyance of them being compiler dependent...
 Sure.  "restrict" is, of course, not compiler dependent - but the effect it has on optimisation is compiler dependent.
 
I was unclear: I meant more that GCC attributes and similar are compiler dependent.
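For reference, the portable form is plain C99 (a minimal sketch, nothing compiler-specific):

   #include <stddef.h>

   // With restrict, the compiler may assume out, a, and b never
   // overlap, so it is free to vectorize or reorder the accesses.
   void add_arrays(float *restrict out, const float *restrict a,
                   const float *restrict b, size_t n)
   {
       size_t i;
       for (i = 0; i < n; i++)
           out[i] = a[i] + b[i];
   }

If the arrays do overlap at some call site, the behavior is undefined, hence the "roulette wheel" remark above.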

Often you can also get improved results by manually "caching" data in local variables, instead of using pointer or array access directly, thus avoiding any extra memory accesses the compiler has to put in just in case pointers alias.  But code is neater if you don't have to do that kind of thing.
 
This is often sort of what I end up doing anyways, because manually caching stuff in local variables, and being selective about when things are loaded or stored to external locations, is often better for performance in MSVC.
Doesn't make so much difference with GCC though.
But, for native Windows, I am primarily using MSVC.
For BGBCC, the best-case performance comes from manually caching things, manually unrolling or modulo-scheduling loops, and trying to organize expressions such that results are not reused too quickly. Typically this means breaking up complex expressions into multiple simpler ones so as to limit dependencies. Also avoiding expensive operations whenever possible (divide, modulo, 64-bit multiply, or similar), ...
Such a style tends to look kinda like writing assembler in C, but can often perform OK.
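A minimal sketch of that style (point_t and the field names are invented for illustration):

   typedef struct { float x, y; } point_t;

   void scale_points(point_t *pts, int n, float s)
   {
       int i;
       float x, y;
       for (i = 0; i < n; i++) {
           x = pts[i].x;    // load into locals once
           y = pts[i].y;
           x = x * s;       // operate only on locals
           y = y * s;
           pts[i].x = x;    // store back once
           pts[i].y = y;
       }
   }

Nothing here invites the compiler to re-read memory mid-computation: the loads and stores sit exactly where the programmer put them, rather than where the aliasing rules would otherwise force them.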

>
>
So the aliasing analysis and its missed opportunities are the
programmer's responsibility.
>
It's always better for the machine to miss opportunities than to miss
compile. :)
>
>
Agreed.
>
It is always better for the toolchain to be able to optimise automatically than to require manual intervention by the programmer. (It should go without saying that optimisations are only valid if they do not affect the observable behaviour of correct code.)  Programmers are notoriously bad at figuring out what will affect their code efficiency, and will either under-use "restrict" where it could clearly be safely used to speed up code, or over-use it resulting in risky code.
>
If the compiler can't be sure that accesses don't alias, then of course it should assume that aliasing is possible.
>
The idea of pointer provenance is to let compilers (and programmers!) have a better understanding of when accesses are guaranteed to be alias- free, when they are guaranteed to be aliasing, and when there are no guarantees.  This is useful for optimisation and program analysis (including static error checking).  The more information the compiler has, the better.
>
>
That is the idea at least.
>
Though, if one assumes the compiler has non-local visibility, this is a problem.
>
Granted, as long as one can keep using more traditional semantics, probably OK.
 Of course compilers can (and must!) fall back to the "assume accesses might alias" approach when they don't have the extra information.  But at least for code in the same compilation, they can do better.
 And there is a trend amongst those wanting higher performance to use link-time optimisation, whole-program optimisation, or similarly named techniques to share information across units.  Traditional separate compilation to object files then linking by identifier name only is a nice clear model, but hugely limiting for both optimisation and static error checking.
 
Ironically, my compiler doesn't do traditional separate compilation in the first place.
The frontend (separate stages) are basically:
   Preprocess;
   Parse to AST;
   Flatten AST to a Stack-Based IL.
The handling of value caching and similar is basically in the 3AC stage, which doesn't exist until the final binary is being generated.
If you try to produce object files, or static libraries, in this case, they are basically just blobs of this stack-oriented bytecode.
The behavior of the IL is sort of part-way between that of JVM and .NET bytecode. Though, it has some structural aspects in common with WASM (e.g., a more monolithic bytecode blob rather than structures and tables), but the bytecode semantics are more like those of .NET bytecode.
Where, say:
   JVM:
     ".class" files, each representing a single class type;
      Has tables for literals, fields, and methods;
        Each table contains various structures for each item.
      Bytecode operations have explicit types;
      Member and method types, etc, are identified by ASCII strings;
      Handles internal control flow with byte-based branch offsets;
      The stack offset must be the same on all paths to a given spot.
        However, it is not always obvious what the stack offset is.
   .NET:
     Has various tables embedded in a COFF or PE/COFF;
     Has tables organized similar to those in a relational database;
       Uses a dense packing scheme for table contents.
     Bytecode operation types are carried implicitly on the stack;
     Type signatures are encoded as binary blobs;
     IIRC, mixed offsets and label breaks;
     No items may be on the stack during a branch.
   WASM:
     Big blob of bytecode, which defines other structure;
     Sorta type-inferred, often using wonk to express types implicitly;
     Uses structured control-flow even at the IL level.
   BGBCC / RIL3:
     Big blob of bytecode, defining all structure;
       Internally, compiler uses two large tables for everything;
       One table holds all top-level declarations and similar;
       The other holds any literal values and non-declaration entities.
     Operation types are carried on the stack;
     Types are mostly identified using ASCII strings;
     Control flow via label markers and branches to label IDs;
       Any branch requires a label instruction with a matching ID.
     Usually no items on stack during branch.
       Except for special edge cases like "a?b:c".
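As a rough illustration of the stack-IL flavor (hypothetical mnemonics made up for this post, not the actual RIL3 encoding), "x=a+b;" would come out something like:

   load  a      ; push the value of a
   load  b      ; push the value of b
   add          ; pop two values, push the sum (type taken from the stack)
   store x      ; pop the result into x

Since the operand types ride along on the stack (as in .NET or RIL3), one generic "add" opcode suffices; a JVM-style design instead needs explicitly typed iadd/ladd/fadd/dadd variants.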
Though, part of this was because BGBCC's origins weren't originally as a C compiler, rather it was an early fork off of my BGBScript interpreter (though, using parts from two different versions of the interpreter, *1).
Where, BGBScript was a language superficially resembling JavaScript.
*1:
   2001: Wrote a Scheme interpreter (early high-school);
     Mostly worked using S-Expression walking and similar.
   2004: Wrote a JS clone around the time I graduated high-school;
     First version used repurposed DOM for the ASTs;
     And AST walking for interpretation;
     Major property: Horridly slow.
   2006: Rewrote BGBScript VM
     Used chunks from the Scheme interpreter;
     Switched over to bytecode;
     Went from conservative to precise GC (...pain...).
   2007:
     BGBCC starts as a fork of the 2004 interpreter,
       With parts of the 2006 interpreter.
     Soon discovers: Interpreted C not a great script language.
   2008/2009: Sorta writes a Doom 3 clone;
     A (mostly final) version of BGBScript is created.
       Mostly borrowing more from ActionScript;
       Gains a JIT compiler;
       Switched back to conservative GC.
   2010/2011: The Doom 3 clone was partly reworked into a Minecraft clone.
     Uses BGBScript as a major script language, partial impl language.
   2014:
     Rebooted script language, new version more like Java and C#;
     Also a new 3D engine, designed to be simpler and lighter weight.
   2017:
     I started designing my own ISA and similar;
     BGBCC was revived and used as a C compiler for it.
       Had also looked at LCC,
       but noted I had less of a starting point with LCC;
       What LCC could do, seemingly BGBCC already did.
However, partly because of this, BGBCC remained mostly using a bytecode format. But, stuff got wonky in the middle stages, partly because at one point I had planned on trying to eliminate the stack IL stage (so the stages aren't fully separate in the code).
Also some wonk because, earlier in BGBCC's development, I was trying to move to a purely ASCII IL syntactically derived from PostScript, but this sorta fell apart.

>
>
>
In my compiler, the default was to use a fairly conservative aliasing strategy.
>
...
With pointer operations, all stores can be assumed potentially aliasing unless restrict is used, regardless of type.
>
>
C does not require that.  And it is rare in practice, IME, for code to actually need to access the same data through different lvalue types (other than unsigned char).  It is rarer still for it not to be handled better using type-punning unions or memcpy() - assuming the compiler handles memcpy() decently.
>
>
I take a conservative approach because I want the compiler to be able to run code that assumes traditional behavior (like that typical of 1990s era compilers, or MSVC).
 Please don't call this "traditional behaviour" of compilers - be honest, and call it limited optimisation and dumb translation.  And don't call it "code that assumes traditional behaviour" - call it "code written by people who don't really understand the language".  Code which assumes you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is broken code.  It always has been, and always will be - even if it does what the programmer wanted on old or limited compilers.
 There were compilers in the 1990's that did type-based alias analysis, and many other "modern" optimisations - I have used at least one.
 
Either way, MSVC mostly accepts this sorta code.
Also I think a lot of this code was originally written for compilers like Watcom C and similar.
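For reference, the union-based punning mentioned above looks like this; unlike the pointer cast, it has defined behavior in C (a minimal sketch):

   #include <stdint.h>

   static uint32_t float_bits(float f)
   {
       union { float f; uint32_t u; } pun;
       pun.f = f;
       return pun.u;    // reinterpret the stored bytes as an integer
   }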
Have noted that there are some behavioral inconsistencies, for example:
Some old code seems to assume that in "x<<y", the shift always goes left, with y taken modulo the width of the type. Except, when both x and y are constants, the code seems to expect the result as if it were calculated with a wider type, with negative shifts going in the opposite direction, ... the result then being converted to the final type.
Meanwhile, IIRC, GCC and Clang complain loudly (warning or error) about a constant shift that is too large or negative, while MSVC merely warns.
Though, in most cases, if the shift is larger than the width of the type, or negative, it is usually a programming error.
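A minimal sketch of the variable-vs-constant split (the run-time result shown is typical of x86, where the hardware masks the count; formally, both over-wide shifts are undefined behavior):

   #include <stdio.h>

   int main(void)
   {
       unsigned x = 1;
       volatile unsigned y = 35;   // volatile keeps the count non-constant
       printf("%u\n", x << y);     // x86: count masked mod 32, prints 8
       // printf("%u\n", 1u << 35);   // constant form: compilers warn/error
       return 0;
   }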

It's okay to be conservative in a compiler (especially when high optimisation is really difficult!).  It's okay to have command-line switches or pragmas to support additional language semantics such as supporting access via any lvalue type, or giving signed integer arithmetic two's complement wrapping behaviour.  It's okay to make these the defaults.
 But it is not okay to encourage code to make these compiler-specific assumptions without things like a pre-processor check for the specific compiler and pragmas to explicitly set the required compiler switches. It is not okay to excuse bad code as "traditional style" - that's an insult to people who have been writing good C code for decades.
 
A lot of the code I have seen from the 90s was written this way.
Though, a lot of it comes from a few major sources:
   id Software;
     Can mostly be considered "standard" practice,
     along with maybe Linux kernel, ...
   Apogee Software
     Well, some of this code is kinda bad.
     This code tends to be dominated by global variables.
     Also treating array bounds as merely a suggestion.
   Raven Software
      Though, most of this was merely modified id Software code.
Early on, I think I also looked a fair bit at the Linux kernel, and also some of the GNU shell utilities and similar (though, the "style" was very different vs either the Linux kernel or id's code).
Early on, I had learned C partly by tinkering around with id's code and trying to understand what secrets it contained.
But, alas, an example from Wikipedia shows a relevant aspect of id's style:
https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
Which is, at least to me, what I consider "traditional".
A particularly infamous example of Apogee's style is the code for the Build Engine. Ironically, the dominant "modern" Duke Nukem 3D ports are derived from ground-up rewrites rather than ports of the original Build Engine code.
And the Build Engine quickly saps one's motivation to try to port it to anything that is not 32-bit x86.
Like, at least id's code was mostly 64-bit clean...
Quake 1/2/3 also used a lot of "float *" and "double *" for general memory copies in a few places.
I had found though that these would break if compiling on 64-bit and then running on a Core2 based system, as seemingly the MOVSS and MOVSD instructions on Core2 may mangle bit patterns if they happen to encode a NaN (but not on AMD chips).
Though, the general workaround was to replace these cases with "memcpy()".
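A minimal sketch of the failure mode and the workaround (hedged: whether the bits actually get mangled is hardware dependent, per the above):

   #include <string.h>

   // Fragile: routes the bits through FP registers, where some
   // implementations may quieten or otherwise alter NaN patterns.
   void copy_bits_fragile(double *dst, const double *src, int n)
   {
       int i;
       for (i = 0; i < n; i++)
           dst[i] = src[i];
   }

   // Workaround: memcpy moves raw bytes and never touches FP registers.
   void copy_bits_safe(void *dst, const void *src, int n)
   {
       memcpy(dst, src, (size_t)n * sizeof(double));
   }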

 
>
Granted, it is a tradeoff that a lot of this code needs to be modified to work on GCC and Clang (absent the usual "-fwrapv -fno-strict-aliasing" options).
>
Granted, there is a command-line option to enable TBAA semantics, just it is not the default option in this case (so, in BGBCC, TBAA is opt- in; rather than opt-out in GCC and Clang).
>
BGBCC's handling of memcpy is intermediate:
It can turn it into loads and stores;
But, it can't turn it into a plain register move;
Taking the address of a variable will also cause the variable to be loaded/stored every time it is accessed in this function (regardless of where it is accessed in said function).
>
So:
   memcpy(&i, &f, 8);
Will still use memory ops and wreck the performance of both the i and f variables.
 Well, there you have scope for some useful optimisations (more useful than type-based alias analysis).  memcpy does not need to use memory accesses unless real memory accesses are actually needed to give the observable effects specified in the C standards.
 
Possibly, but by the stage where we know it could be turned into a reg-reg move (final code generation), most of the damage has already been done.
Basically, it would likely be necessary to detect and special-case this scenario at the AST level (probably by turning it into a cast or intrinsic). But, usually one doesn't want to add too much of this sort of cruft to the AST walk.
But, then, apart from code written to assume GCC or similar, most of the code doesn't use memcpy in this way.
So, it would mostly only bring significant advantage if pulling code in from GCC land.
Ideally, I would want the compiler to be smaller and faster.
Had at one point started looking into this, but the effort fizzled, and now I am not sure I was going in the right direction. It would have gone over to separate compilation, but now it seems like compiling via an IL stage may in fact be the better option.
Well, with the main debate being what the IL should look like:
   Stack IL: Like JVM, .NET, WASM, existing BGBCC, ...
   Untagged register IL: Like Dalvik
   SSA Form: Like LLVM/Clang.
But, my own experience suggests stack-oriented may still be the better option for a compiler IL (with translation into SSA form during the backend).
Then, the major difference is seemingly the perennial issue of trying to rework how the stack bytecode is packaged.
Which then turns into me wanting to stick it into a TLV package or similar, then losing motivation vs not bothering and staying with the existing "big blob of bytecode" format.
Well, and the simplest option of "take the existing format and put it in a RIFF wrapper" is mostly in "meh, whatever" territory.
Decided to leave out a bit where BMP vs RDIB led me off on a tangent about image and audio formats.

unsigned int f_to_u(float f) {
     unsigned int u;
     memcpy(&u, &f, sizeof(f));
     return u;
}
 gcc compiles that to:
 f_to_u:
     movd eax, xmm0
     ret
 
Yeah, it is more clever here, granted.

Meanwhile:
   i=*(uint64_t *)(&f);
Will only wreck the performance of 'f'.
>
>
The best option for performance in BGBCC is one of either:
   i=__float64_getbits(f);  //compiler intrinsic
   i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
>
Though, these options don't exist in the other compilers.
 Such compiler extensions can definitely be useful, but it's even better if a compiler can optimise standard code - that way, programmers can write code that works correctly on any compiler and is efficient on the compilers that they are most interested in.
 
Possibly.
For "semi-portable" code, usually used MSVC style, partly as by adding 'volatile' it seemingly also works in GCC. Though, often with macro wrappers.

>
Implicitly, casting via __m64 or __m128 is a double-cast though. In BGBCC, these types don't natively support any operators (so, they are basically sort of like the value-equivalents of "void *").
>
>
So:
   memcpy(&i, &f, 8);      //best for GCC and Clang
   i=*(uint64_t *)(&f);    //best for MSVC, error-prone in GCC
   i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
>
In a lot of cases, these end up with wrappers.
>
GCC:
   static inline uint64_t getU64(void *ptr)
   {
     uint64_t v;
     memcpy(&v, ptr, 8);
     return(v);
   }
MSVC or BGBCC:
   #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
>
Though, have noted that volatile usually works in GCC as well, though in GCC there is no obvious performance difference between volatile and memcpy, whereas in MSVC the use of a volatile cast is faster.
 In gcc, a memcpy here will need to use a single memory read unless "getU64" is called with the address of a variable that is already in a register (in which case you get a single register move instruction).  A volatile read will also do a single memory read - but it might hinder other optimisations by limiting the movement of code around.
 
Possibly.
When I tried benchmarking these before:
   GCC:
     Seemingly no difference between memcpy and volatile;
   MSVC:
     Adding or removing volatile made no real difference;
     Using memcpy is slower.
   BGBCC: Either memcpy or volatile carries an overhead.
     The use of volatile is basically a shotgun de-optimization;
     It doesn't know what to de-optimize, so it goes naive on everything.

On MSVC, last I saw (which is a long time ago), any use of "memcpy" will be done using an external library function (in a DLL) for generic memcpy() use - clearly that will have /massive/ overhead in comparison to the single memory read needed for a volatile access.
 
It is slightly more clever now, but still not great.
   It will not (always) generate a library call.
   Though, in VS2008 or similar, it was always still a library call.
     VS2010 and VS2013, IIRC, might set up and use "REP MOVSB" instead.
It will do it inline, but still often:
   Spill variables;
   Load addresses;
   Load from source;
   Store to destination;
   Load value from destination.
What BGBCC gives here is basically similar.

 
>
Don't want to use static inline functions in BGBCC though, as it still doesn't support inline functions in the general case.
>
 
