Re: "The provenance memory model for C", by Jens Gustedt

Subject: Re: "The provenance memory model for C", by Jens Gustedt
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.lang.c
Date: 10 Jul 2025, 03:28:59
Organization: A noiseless patient Spider
Message-ID: <104n8he$lb42$1@dont-email.me>
References: 1 2 3 4
User-Agent: Mozilla Thunderbird
On 7/9/2025 4:41 AM, David Brown wrote:
On 09/07/2025 04:39, BGB wrote:
On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>
...
 
>
I don't have confidence in an author's understanding of C, if they
believe that ISO C defines the behavior of invalid pointers being
compared, such that this needs to be rectified by a private "patch"
of the text.
>
 You might not be aware of it, but the author Jens Gustedt is a member of the C standards committee, and has been for some time.  He is the most vocal, public and active member.  I think that suggests he has quite a good understanding of C and the ISO standards!  Not everyone agrees about his ideas and suggestions about how to move C forward - but that's fine (and it's fine by Jens, from what I have read).  That's why there is a standards committee, with voting, rather than a BDFL.
 
The concept of pointer provenance can be expressed other than
as a textual patch against ISO C.
>
 There have been plenty of papers and blogs written about pointer provenance (several by Gustedt) and how it could work.  It's not a very easy thing to follow in any format.  A patch to current C standards is perhaps the least easy to follow, but it is important for how the concept could be added to C.
 
Admittedly, as of yet, I haven't quite figured out what exactly provenance is supposed to be, or how it is supposed to work in practice.

It can be regarded as a language extension and documented similarly
to how a sane compiler documentor would do it.
>
"In this article, I will try to explain what this is all about, namely
on how a provenance model for pointers interferes with alias analysis of
modern compilers.
>
Well, no shit; provenance is often dynamic; whereas aliasing analysis
wants to be static.
>
For those that are not fluent with the terminology or
the concept we have a short intro what pointer aliasing is all about, a
review of existing tools to help the compiler and inherent difficulties
and then the proposed model itself. At the end there is a brief takeaway
that explains how to generally avoid complications and loss of
optimization opportunities that could result from mis-guided aliasing
analysis."
>
If you think that certain code could go faster because certain suspected
aliasing isn't actually taking place, then since C99 you were able to
spin the roulette wheel and use "restrict".
>
 "restrict" can certainly be useful in some cases.  There are also dozens of compiler extensions (such as gcc attributes) for giving the compiler extra information about aliasing.
 
And, the annoyance of them being compiler dependent...
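For anyone following along, here is a minimal sketch of what "restrict" changes. The function names are made up for illustration, and whether a given compiler actually generates better code for the second version is not guaranteed.

```c
#include <stddef.h>

/* Without restrict: dst and src may overlap, so the compiler has to
   assume that a store to dst[i] can change a later src[j]. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* With restrict: the caller promises that dst and src do not overlap.
   Breaking that promise is undefined behavior, hence the "roulette". */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Both versions give the same result when called with two distinct arrays; only the first is safe to call with overlapping ones.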

So the aliasing analysis and its missed opportunities are the
programmer's responsibility.
>
It's always better for the machine to miss opportunities than to miss
compile. :)
>
>
Agreed.
 It is always better for the toolchain to be able to optimise automatically than to require manual intervention by the programmer. (It should go without saying that optimisations are only valid if they do not affect the observable behaviour of correct code.)  Programmers are notoriously bad at figuring out what will affect their code efficiency, and will either under-use "restrict" where it could clearly be safely used to speed up code, or over-use it resulting in risky code.
 If the compiler can't be sure that accesses don't alias, then of course it should assume that aliasing is possible.
 The idea of pointer provenance is to let compilers (and programmers!) have a better understanding of when accesses are guaranteed to be alias-free, when they are guaranteed to be aliasing, and when there are no guarantees.  This is useful for optimisation and program analysis (including static error checking).  The more information the compiler has, the better.
 
That is the idea at least.
Though, if one assumes the compiler has non-local visibility, this is a problem.
Granted, as long as one can keep using more traditional semantics, probably OK.

>
In my compiler, the default was to use a fairly conservative aliasing strategy.
>
...
With pointer operations, all stores can be assumed potentially aliasing unless restrict is used, regardless of type.
>
 C does not require that.  And it is rare in practice, IME, for code to actually need to access the same data through different lvalue types (other than unsigned char).  It is rarer still for it not to be handled better using type-punning unions or memcpy() - assuming the compiler handles memcpy() decently.
 
I take a conservative approach because I want the compiler to be able to run code that assumes traditional behavior (like that typical of 1990s era compilers, or MSVC).
Granted, it is a tradeoff that a lot of this code needs to be modified to work on GCC and Clang (absent the usual need for "-fwrapv -fno-strict-aliasing" options).
Granted, there is a command-line option to enable TBAA semantics, just it is not the default option in this case (so, in BGBCC, TBAA is opt-in; rather than opt-out in GCC and Clang).
BGBCC's handling of memcpy is intermediate:
It can turn it into loads and stores;
But, it can't turn it into a plain register move;
Taking the address of a variable will also cause the variable to be loaded/stored every time it is accessed in this function (regardless of where it is accessed in said function).
So:
   memcpy(&i, &f, 8);
Will still use memory ops and wreck the performance of both the i and f variables.
Meanwhile:
   i=*(uint64_t *)(&f);
Will only wreck the performance of 'f'.
The best option for performance in BGBCC is one of either:
   i=__float64_getbits(f);  //compiler intrinsic
   i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
Though, these options don't exist in the other compilers.
Implicitly, casting via __m64 or __m128 is a double-cast though. In BGBCC, these types don't natively support any operators (so, they are basically sort of like the value-equivalents of "void *").
So:
   memcpy(&i, &f, 8);      //best for GCC and Clang
   i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
   i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
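As a portable baseline, the memcpy variant can be sketched as a small value-based helper. The name double_bits is made up for illustration, and the sketch assumes a 64-bit double (IEEE 754 binary64 on the usual targets).

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is 64 bits wide, as on the usual targets. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: return the raw bit pattern of a double. */
static inline uint64_t double_bits(double f)
{
    uint64_t i;
    memcpy(&i, &f, sizeof i);   /* defined behavior, no aliasing violation */
    return i;
}
```

With IEEE 754 doubles, double_bits(1.0) yields 0x3FF0000000000000. GCC and Clang will typically reduce the memcpy to a register move at -O1 and above, though that is an optimization, not a guarantee.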
In a lot of cases, these end up with wrappers.
GCC:
   static inline uint64_t getU64(void *ptr)
   {
     uint64_t v;
     memcpy(&v, ptr, 8);
     return(v);
   }
MSVC or BGBCC:
   #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
Though, I have noted that the volatile cast usually works in GCC as well. In GCC there is no obvious performance difference between volatile and memcpy, whereas in MSVC the volatile cast is faster.
Don't want to use static inline functions in BGBCC though, as it still doesn't support inline functions in the general case.
Though, a lot of the 90s era code I run doesn't assume inline functions either, but instead more often uses big macros:
   #define foo(x, y) \
   do { \
     int z = (x)+(y); \
     ... \
   } while(0)
But, ironically a few cases replaced these macros with functions, as the function-call overhead was less than the hassle of having bulky blobs of code being duplicated inline each time (and in this case, the function call overhead isn't too unreasonable).
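A small sketch of the macro style next to the function style. The names SWAP_INT, swap_int and order2 are made up for illustration. The point of leaving the semicolon off after while (0) is that the macro can then be used as a single statement in front of an else.

```c
/* Statement-like macro: the do { } while (0) wrapper makes it behave
   as a single statement; the caller supplies the semicolon. */
#define SWAP_INT(a, b) \
do { \
    int tmp_ = (a); \
    (a) = (b); \
    (b) = tmp_; \
} while (0)

/* Function form of the same operation: one copy of the code, at the
   cost of a call when it is not inlined. */
static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Puts *x and *y in ascending order. Returns 1 if they were swapped.
   The if/else would not compile if SWAP_INT ended in a semicolon. */
static int order2(int *x, int *y)
{
    if (*x > *y)
        SWAP_INT(*x, *y);
    else
        return 0;   /* already ordered */
    return 1;       /* swapped */
}
```

Each use of SWAP_INT expands the whole body in place, which is where the code-size cost of big macros comes from.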

Equally, this means that using type-based alias analysis generally gives only small efficiency benefits in C code (but more in C++).  The majority of situations where alias analysis and a compiler knowledge of no aliasing (or always aliasing) would make a difference, are between pointers or other lvalues of compatible types.  That is why provenance tracking can have potentially significant benefits.
 
But, the "tracking" part isn't great.
It implies potentially needing to be able to figure out where the value came from by walking backwards across the control flow graph (or, worse yet, the call graph). There be dragons there...
My model doesn't need tracking, merely keeping track of the relevant status flags and similar.
   Did variable 'a' ever have its address taken?
   Was type T ever cast to a different type?
   ...
It is much like how the compiler may also keep track of other things, for example, whether types like "__int128" or "__float128" were used, which operators were used on them, etc.
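Roughly, a flag-based scheme of this kind could be sketched as below. This is an illustrative guess at the general shape, not BGBCC's actual data structures; VarInfo, VAR_ADDR_TAKEN, VAR_VOLATILE and the helper names are all made up.

```c
#include <stdbool.h>

/* Hypothetical per-variable status flags, set once and never cleared. */
enum {
    VAR_ADDR_TAKEN = 1 << 0,   /* '&var' appeared somewhere in the function */
    VAR_VOLATILE   = 1 << 1    /* declared volatile */
};

typedef struct {
    const char *name;
    unsigned    flags;
} VarInfo;

/* Record that the front end saw '&var'. */
static void var_note_addr_taken(VarInfo *v)
{
    v->flags |= VAR_ADDR_TAKEN;
}

/* A variable may stay in a register across pointer stores only if
   nothing could be pointing at it. No data-flow walk is needed. */
static bool var_can_live_in_register(const VarInfo *v)
{
    return (v->flags & (VAR_ADDR_TAKEN | VAR_VOLATILE)) == 0;
}
```

The tradeoff is the one described above: a single '&var' anywhere in the function pessimizes every access to that variable, but the query is a constant-time bit test.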
Decided to leave out talking about past language/compiler efforts which ran into trouble in these areas.
But, preferable IMO to try to keep things simple.
Which ideally means avoiding things more complicated than what can be managed by setting flags or similar.
Even if, yeah, the current strategy also has some drawbacks.
Granted, there are some more complicated things in the compiler, like code for tracking the first and last use of a given "version" of a variable. Where, every time a variable is assigned, it gets a new version; and references to a variable track which version it is accessing.
But, this is mostly relevant to things like register allocator decisions.
If the current variable version isn't used again, it can be evicted;
If it isn't used as an input in the reachable basic-blocks, its current value can also be discarded (possibly saving a memory store).
...
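As a rough sketch of the "is this version used again?" check, here is a simplified form that looks at only one basic block (the real case also has to consider reachable blocks). The Insn layout, NO_VAR and version_is_used are hypothetical names for illustration, not taken from any actual compiler.

```c
#include <stddef.h>

#define NO_VAR (-1)

/* Hypothetical three-address instruction within one basic block:
   dst = op(src[0], src[1]); unused slots hold NO_VAR. */
typedef struct {
    int dst;
    int src[2];
} Insn;

/* Returns 1 if the version of a variable created by insns[def] is read
   by a later instruction before the variable is assigned again. */
static int version_is_used(const Insn *insns, size_t n, size_t def)
{
    int var = insns[def].dst;
    if (var == NO_VAR)
        return 0;
    for (size_t i = def + 1; i < n; i++) {
        /* Sources are checked first: 'x = x + 1' reads the old version. */
        if (insns[i].src[0] == var || insns[i].src[1] == var)
            return 1;
        if (insns[i].dst == var)
            return 0;            /* overwritten: a new version starts */
    }
    return 0;                    /* never read again in this block */
}
```

When the check returns 0, the register holding that version can be reused, and within this single-block simplification the store back to memory could be skipped as well.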
