Newsportal USENET - Re: "The provenance memory model for C", by Jens Gustedt

On 7/2/2025 8:10 AM, Kaz Kylheku wrote:

> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>> "This work has finally resulted in the publication of an international
>> standard, Technical Specification ISO/IEC TS 6010 (edited by Henry
>> Kleynhans, Bloomberg, UK) ...
>
> OMG, it's a completely idiotic document. What it is is a kind of patch
> against a specific version of ISO C, written in plain language rather
> than in diff format. Like "replace this paragraph with this one, add
> this sentence after that one, ...".
>
> What the actual fuck? How will that be maintainable going forward, first
> of all.
>
> You can't follow what this is without applying the patch: obtaining
> the exact ISO C standard that it targets and performing the edits.
> Almost nobody is going to do that.
>
> Right off the bat I spotted pointless shit in it that has nothing to do
> with provenance:
>
>   6.4.5 Equality operators
>   1 In section 6.5.9 Equality operators, add the following after the first
>   sentence of paragraph 3:
>   2 None of the operands shall be an invalid pointer value.
>
> I don't have confidence in an author's understanding of C, if they
> believe that ISO C defines the behavior of invalid pointers being
> compared, such that this needs to be rectified by a private "patch"
> of the text.
>
> The concept of pointer provenance can be expressed other than
> as a textual patch against ISO C.
>
> It can be regarded as a language extension and documented similarly
> to how a sane compiler documentor would do it.
>
>> "In this article, I will try to explain what this is all about, namely
>> on how a provenance model for pointers interferes with alias analysis of
>> modern compilers.
>
> Well, no shit; provenance is often dynamic; whereas aliasing analysis
> wants to be static.
>
>> For those that are not fluent with the terminology or
>> the concept we have a short intro what pointer aliasing is all about, a
>> review of existing tools to help the compiler and inherent difficulties
>> and then the proposed model itself. At the end there is a brief takeaway
>> that explains how to generally avoid complications and loss of
>> optimization opportunities that could result from mis-guided aliasing
>> analysis."
>
> If you think that certain code could go faster because certain suspected
> aliasing isn't actually taking place, then since C99 you were able to
> spin the roulette wheel and use "restrict".
>
> So the aliasing analysis and its missed opportunities are the
> programmer's responsibility.
>
> It's always better for the machine to miss opportunities than to miss
> compile. :)

Agreed.
In my compiler, the default was to use a fairly conservative aliasing strategy.
Structure or array loads may be cached, but any "wild store" will flush cached loads:
A store through a free pointer invalidates every cached load;
A store to a struct member flushes any cached loads of that member (from the same struct type);
A store to an array flushes any cached loads for which non-alias can't be verified;
...
For example:
   x=arr[5];
   arr[10]=w;
   y=arr[5];
The first array load can be reused because it is provable that a store to arr[10] can't affect a load from arr[5].
A store to a pointer, however, would invalidate the load (unless the pointer is marked as restrict or similar).
Array stores and pointer stores could be partially distinguished at this level, in that an array store may not necessarily invalidate stuff, but a pointer store necessarily will.
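To make the array-versus-pointer distinction concrete, here is a rough sketch in plain C. The function names are made up for illustration, and the comments describe what a conservative compiler of the kind described above would be allowed to do, not what any particular compiler actually emits.

```c
int arr[16];

/* Store goes to a constant index of the same array: index 10 can be
   proven different from index 5, so the value loaded for x may be
   reused for y without touching memory again. */
int reload_after_array_store(int w)
{
    int x = arr[5];
    arr[10] = w;
    int y = arr[5];   /* may reuse the cached load */
    return x + y;
}

/* Store goes through a pointer of unknown origin: p might point at
   arr[5], so the cached load has to be flushed and y reloaded. */
int reload_after_pointer_store(int *p, int w)
{
    int x = arr[5];
    *p = w;
    int y = arr[5];   /* must be reloaded from memory */
    return x + y;
}
```

Calling the second function with p pointing at arr[5] is the case that forces the reload: the result then depends on the value just stored.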
With pointer operations, all stores can be assumed potentially aliasing unless restrict is used, regardless of type.
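As a small illustration of where restrict comes in, consider the following pair of functions (the names are hypothetical). Without restrict, the store to dst[i] could change *scale, so a conservative compiler reloads *scale on every iteration. With restrict, the programmer promises that the two pointers do not overlap, which permits loading *scale once before the loop.

```c
#include <stddef.h>

/* No restrict: dst may alias scale, so *scale is potentially changed
   by each store and has to be reloaded. */
void scale_plain(int *dst, const int *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] *= *scale;
}

/* restrict: the caller promises no overlap between dst and scale,
   so *scale may be kept cached across the stores. Calling this with
   overlapping pointers is undefined behavior. */
void scale_restrict(int *restrict dst, const int *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] *= *scale;
}
```

Both give the same result when the arguments do not overlap. Only scale_plain may be called with scale pointing into dst, and in that case the reload is visible in the result.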
The compiler keeps track of which variables have had their addresses taken, and uses more conservative semantics in these cases. If you take the address of a variable, or load the address of an array, etc., then its contents are treated as effectively volatile with respect to pointer stores: for example, the value of the variable may not be kept cached across a pointer store.
If no address has been taken (explicitly) then non-alias may still be assumed.
   int arr[16];
   int *ptr;
   arr[5]=10; //only affects arr
   *ptr=15; //no effect on arr[5] if arr's address is not taken
   x=arr[5];
But, if this exists somewhere:
   ptr2=arr;
Then this changes how 'arr' is treated: its loads may no longer be cached across pointer stores, and "*ptr=15;" would then flush the cached value.
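A self-contained version of the two cases might look like this. The function names and the extra variable 'other' are invented for the example; the fragments above leave the target of ptr unspecified, so here it is given one explicitly.

```c
/* arr's address is never taken: a store through ptr cannot reach it,
   so the final load of arr[5] can be treated as the constant 10. */
int demo_address_not_taken(void)
{
    int arr[16] = {0};
    int other = 0;
    int *ptr = &other;
    arr[5] = 10;
    *ptr = 15;        /* cannot affect arr[5] */
    return arr[5];
}

/* arr's address escapes into ptr2: from here on a pointer store may
   alias arr, so the cached value of arr[5] is flushed by "*ptr=15". */
int demo_address_taken(void)
{
    int arr[16] = {0};
    int *ptr2 = arr;
    int *ptr = ptr2 + 5;
    arr[5] = 10;
    *ptr = 15;        /* really does overwrite arr[5] */
    return arr[5];
}
```

The first function returns 10 and the second returns 15, which is why the second form has to give up the cached value.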
Similarly, casting a struct pointer to a different type (within a local scope) can be taken to disable the assumption of non-alias between members of different structs. Though, partly, one can ignore the case of casting "void *" to a struct type.
...
Say:
   Foo *foo;
   Bar *bar;
   x=foo->x;
   bar->y=5;
   ...
"foo->x" remains cached by default, but:
   bar=(Bar *)foo;
Would locally invalidate the assumption that Foo and Bar do not alias.
Whereas, say:
   ptr=(char *)foo;
Would only necessarily break caching for Foo but not necessarily for Bar.
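A sketch of the first and last of these cases follows. The struct layouts and function names are assumptions made up for the example (the originals are not given), and unsigned char is used instead of char so the byte stores are unambiguous. The (Bar *)foo case is left out on purpose, since accessing a Foo through a Bar pointer is not something ISO C defines.

```c
#include <stddef.h>

typedef struct { int x; } Foo;   /* hypothetical layout */
typedef struct { int y; } Bar;   /* hypothetical layout */

/* No cast in sight: a store to a Bar member is assumed not to touch
   any Foo member, so foo->x may stay cached across it. */
int read_after_bar_store(Foo *foo, Bar *bar)
{
    int x = foo->x;
    bar->y = 5;
    return x + foo->x;   /* second read may reuse the cached value */
}

/* Casting foo to a byte pointer breaks caching for Foo: the byte
   stores below really do modify foo->x, so it must be reloaded. */
int read_after_byte_store(Foo *foo)
{
    int x = foo->x;
    unsigned char *p = (unsigned char *)foo;
    for (size_t i = 0; i < sizeof foo->x; i++)
        p[i] = 0;        /* zero every byte of foo->x */
    return x + foo->x;   /* reloaded: now 0 */
}
```

With foo->x initially 7 and separate Foo and Bar objects, the first function returns 14 and the second returns 7.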
It varies some as to whether some things are evaluated as local (within the scope of a function) or global (whole program). Things like taking the address of a global variable or array also apply globally in my compiler; ... Though, some of this works mostly because the compiler lacks true separate compilation.
If something is marked volatile, no caching is performed.
But, yeah, a more conservative model allows a lot of the same performance gains as something like TBAA, but without breaking as easily. Ideally, one can use type-casts and pointer-based type punning however they want and not run into issues.
There may be errors here, as I am writing from memory.
...

Date	Subject	#	Author
2 Jul 25	"The provenance memory model for C", by Jens Gustedt	3	Alexis
2 Jul 25	Re: "The provenance memory model for C", by Jens Gustedt	2	Kaz Kylheku
9 Jul 03:39	Re: "The provenance memory model for C", by Jens Gustedt	1	BGB