Re: "The provenance memory model for C", by Jens Gustedt

Subject : Re: "The provenance memory model for C", by Jens Gustedt
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.lang.c
Date : 11. Jul 2025, 09:48:23
Organisation : A noiseless patient Spider
Message-ID : <104qj4o$1drn4$1@dont-email.me>
References : 1 2 3 4 5 6 7
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 11/07/2025 04:09, BGB wrote:
On 7/10/2025 4:34 AM, David Brown wrote:
On 10/07/2025 04:28, BGB wrote:
On 7/9/2025 4:41 AM, David Brown wrote:
On 09/07/2025 04:39, BGB wrote:
On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>
...

>
Please don't call this "traditional behaviour" of compilers - be honest, and call it limited optimisation and dumb translation.  And don't call it "code that assumes traditional behaviour" - call it "code written by people who don't really understand the language".  Code which assumes you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is broken code.  It always has been, and always will be - even if it does what the programmer wanted on old or limited compilers.
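For illustration, a minimal sketch of the difference, assuming (as such code implicitly does) that unsigned int and float are the same size; the function names are mine:

   #include <string.h>

   extern float x;

   /* Broken: reads the float object through an unsigned int lvalue,
      violating the effective-type ("strict aliasing") rules. */
   unsigned int bits_broken(void)
   {
       return *(unsigned int *) &x;
   }

   /* Well-defined: memcpy copies the object representation. */
   unsigned int bits_ok(void)
   {
       unsigned int u;
       memcpy(&u, &x, sizeof u);
       return u;
   }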
>
There were compilers in the 1990's that did type-based alias analysis, and many other "modern" optimisations - I have used at least one.
>
 Either way, MSVC mostly accepts this sorta code.
I remember reading in an MSVC blog somewhere that they had no plans to introduce type-based alias analysis in the compiler.  The same blog article announced their advanced new optimisations that treat signed integer overflow as undefined behaviour, and explained that they'd been doing that for years in a few specific cases.  I think it is fair to assume there is a strong overlap between the programmers who think MSVC, or C and C++ in general, have two's complement wrapping of signed integers when the hardware supports it, and those who think pointer casts let you access any data.
And despite the blog, I don't believe MSVC will be restricted that way indefinitely.  After all, they encourage the use of clang/llvm for C programming, and that does do type-based alias analysis and optimisation.
The C world is littered with code that "used to work" or "works when optimisation is not used" because it relied on shite like this - unwarranted assumptions about limitations in compiler technology.
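As a concrete sketch of one such assumption (gcc and clang at -O2 typically behave this way):

   /* With signed overflow undefined, the compiler may assume i + 1
      cannot wrap, so this "overflow check" folds to "return 1" under
      optimisation - the check the programmer wrote silently vanishes. */
   int will_not_overflow(int i)
   {
       return i + 1 > i;
   }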

 Also I think a lot of this code was originally written for compilers like Watcom C and similar.
  Have noted that there are some behavioral inconsistencies, for example:
Some old code seems to assume that x<<y always shifts left, with the shift amount taken modulo the width of the type. Except, when both x and y are constant, the code seems to expect the result as if it were calculated with a wider type, where negative shifts go in the opposite direction, ... with the result then being converted to the final type.
 Meanwhile, IIRC, GCC and Clang raise an error if trying to do a large or negative shift. MSVC will warn if the shift is large or negative.
 Though, in most cases, if the shift is larger than the width of the type, or negative, it is usually a programming error.
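For illustration, a sketch of the divergence (exact diagnostics vary by compiler and flags):

   void shift_demo(void)
   {
       unsigned int x = 1, y = 36;
       unsigned int a = x << y;    /* undefined behaviour in C; x86
                                      masks the count in hardware, so
                                      this often acts like x << (y & 31) */
       unsigned int b = 1u << 36;  /* constant operands: typically
                                      diagnosed at compile time */
       (void)a; (void)b;
   }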
 
It's okay to be conservative in a compiler (especially when high optimisation is really difficult!).  It's okay to have command-line switches or pragmas to support additional language semantics such as supporting access via any lvalue type, or giving signed integer arithmetic two's complement wrapping behaviour.  It's okay to make these the defaults.
>
But it is not okay to encourage code to make these compiler-specific assumptions without things like a pre-processor check for the specific compiler and pragmas to explicitly set the required compiler switches. It is not okay to excuse bad code as "traditional style" - that's an insult to people who have been writing good C code for decades.
>
 A lot of the code I have seen from the 90s was written this way.
 
Yes.  A lot of code from the 90's was written badly.  A lot of code today is written badly.  Just because a lot of code was, and still is, written that way does not stop it being bad code.

 Though, a lot of it comes from a few major sources:
   id Software;
     Can mostly be considered "standard" practice,
     along with maybe Linux kernel, ...
   Apogee Software
     Well, some of this code is kinda bad.
     This code tends to be dominated by global variables.
     Also treating array bounds as merely a suggestion.
   Raven Software
     Though, most of this was merely modified id Software code.
 Early on, I think I also looked a fair bit at the Linux kernel, and also some of the GNU shell utilities and similar (though, the "style" was very different vs either the Linux kernel or id code).
 
The Linux kernel is not a C style to aspire to.  But they do at least try to make such assumptions explicit - the kernel build process makes it very clear that it requires the "-fno-strict-aliasing" flag and can only be correctly compiled by a specific range of gcc versions (and I think experimentally, icc and clang).  Low-level and systems programming is sometimes very dependent on the details of the targets, or the details of particular compilers - that's okay, as long as it is clear in the code and the build instructions.  Then the code (or part of it at least) is not written in standard C, but in gcc-specific C or some other non-standard dialect.  It is not, however, "traditional C".

 Early on, I had learned C partly by tinkering around with id's code and trying to understand what secrets it contained.
  But, alas, an example from Wikipedia shows a relevant aspect of id's style:
https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
 Which is, at least to me, what I consider "traditional".
The declaration of all the variables at the top of the function is "traditional".  The reliance on a specific format for floating point is system-dependent code (albeit one that works on a great many systems). The use of "long" for a 32-bit integer is both "traditional" /and/ system-dependent.  (Though it is possible that earlier in the code there are pre-processor checks on the size of "long".)  The use of signed integer types for bit manipulation is somewhere between "traditional" and "wrong".  The use of pointer casts instead of a type-punning union is wrong.  The lack of documentation and comments, use of an unexplained magic number, and failure to document or comment the range for which the algorithm works and its accuracy limitations are also very traditional - a programming tradition that remains strong today.
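For instance, the punning steps of that function can be written without the pointer casts, using memcpy (a union would also do); a sketch, with a fixed-width type where the original assumed a 32-bit long:

   #include <stdint.h>
   #include <string.h>

   float q_rsqrt_pun(float number)
   {
       float y = number;
       uint32_t i;
       memcpy(&i, &y, sizeof i);   /* was: i = *(long *) &y; */
       i = 0x5f3759df - (i >> 1);
       memcpy(&y, &i, sizeof y);   /* was: y = *(float *) &i; */
       return y;                   /* Newton step omitted */
   }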
It is worth remembering that game code (especially commercial game code) is seldom written with a view to portability, standards correctness, or future maintainability.  It is written to be as fast as possible using the compiler chosen at the time, to be built and released as a binary in the shortest possible time-to-market.

>
So:
   memcpy(&i, &f, 8);
Will still use memory ops and wreck the performance of both the i and f variables.
>
Well, there you have scope for some useful optimisations (more useful than type-based alias analysis).  memcpy does not need to use memory accesses unless real memory accesses are actually needed to give the observable effects specified in the C standards.
>
 Possibly, but by the stage we know that it could be turned into a reg-reg move (in the final code generation), most of the damage has already been done.
 Basically, it would likely be necessary to detect and special-case this scenario at the AST level (probably by turning it into a cast or intrinsic). But, usually one doesn't want to add too much of this sort of cruft to the AST walk.
 
One thing to remember is that functions like "memcpy" don't have to be treated as normal functions.  You can handle it as a keyword in your compiler if that's easiest.  You can declare it as a macro in your <string.h>.  You can combine these, and have compiler-specific extensions (keywords, attributes, whatever) and have the declaration as a function with attributes.  Your key aim is to spot cases where there is a small compile-time constant size on the memcpy.
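gcc and clang already treat memcpy as a builtin and expand small constant-size copies inline.  A sketch of the macro route (not any compiler's actual implementation; __memcpy_intrin is a hypothetical name for something the compiler recognises specially):

   void *__memcpy_intrin(void *dst, const void *src, unsigned long n);
   #define memcpy(dst, src, n)  __memcpy_intrin((dst), (src), (n))

   /* The compiler can then lower __memcpy_intrin specially when n is
      a small compile-time constant (1, 2, 4, or 8): a single
      load/store pair, or a plain register move when both operands
      live in registers.  Other sizes fall back to the library call. */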

 But, then, apart from code written to assume GCC or similar, most of the code doesn't use memcpy in this way.
 So, it would mostly only bring significant advantage if pulling code in from GCC land.
How well do you handle type-punning unions?  Do they need to be moved out to the stack, or can they be handled in registers?
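For example, the union counterpart of the memcpy version just below (the function name is mine):

   unsigned int f_to_u_union(float f) {
       union { float f; unsigned int u; } pun;
       pun.f = f;
       return pun.u;   /* gcc also compiles this to a single movd */
   }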

 
unsigned int f_to_u(float f) {
     unsigned int u;
     memcpy(&u, &f, sizeof(f));
     return u;
}
>
gcc compiles that to :
>
f_to_u:
     movd eax, xmm0
     ret
>
 Yeah, it is more clever here, granted.
 
Meanwhile:
   i=*(uint64_t *)(&f);
Will only wreck the performance of 'f'.
>
>
The best option for performance in BGBCC is one of either:
   i=__float64_getbits(f);  //compiler intrinsic
   i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
>
Though, these options don't exist in the other compilers.
>
Such compiler extensions can definitely be useful, but it's even better if a compiler can optimise standard code - that way, programmers can write code that works correctly on any compiler and is efficient on the compilers that they are most interested in.
>
 Possibly.
 For "semi-portable" code, usually used MSVC style, partly as by adding 'volatile' it seemingly also works in GCC. Though, often with macro wrappers.
Code that has to be widely portable, with an aim to being efficient on many compilers and correct on all, always ends up with macro wrappers for this kind of thing, defined conditionally according to compiler detection.

 
>
Implicitly, casting via __m64 or __m128 is a double-cast though. In BGBCC, these types don't natively support any operators (so, they are basically sort of like the value-equivalents of "void *").
>
>
So:
   memcpy(&i, &f, 8);      //best for GCC and Clang
   i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
   i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
>
In a lot of cases, these end up with wrappers.
>
GCC:
   static inline uint64_t getU64(void *ptr)
   {
     uint64_t v;
     memcpy(&v, ptr, 8);
     return(v);
   }
MSVC or BGBCC:
   #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
>
Though, have noted that volatile usually works in GCC as well, though in GCC there is no obvious performance difference between volatile and memcpy, whereas in MSVC the use of a volatile cast is faster.
>
In gcc, a memcpy here will compile to a single memory read unless "getU64" is called with the address of a variable that is already in a register (in which case you get a single register move instruction).  A volatile read will also do a single memory read - but it might hinder other optimisations by limiting the movement of code around.
>
 Possibly.
 When I tried benchmarking these before:
   GCC:
     Seemingly no difference between memcpy and volatile;
As I explained, that is to be expected in cases where you can't get other optimisations that "volatile" would block.  Usually simple timing benchmarks have fewer optimisation opportunities than real code.

   MSVC:
     Adding or removing volatile made no real difference;
That will, of course, depend on the benchmark.  A volatile access will not normally take more time than a non-volatile access.  But non-volatile accesses can be re-ordered, combined, or omitted in ways that volatile accesses cannot.
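For illustration, the sort of case where it shows (a sketch; the function names are mine):

   extern unsigned int *src;

   unsigned int twice(void)
   {
       return *src + *src;           /* may be compiled as one load */
   }

   unsigned int twice_volatile(void)
   {
       volatile unsigned int *p = src;
       return *p + *p;               /* must perform two loads */
   }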

     Using memcpy is slower.
As I explained.

   BGBCC: Either memcpy or volatile carries an overhead.
     The use of volatile is basically a shotgun de-optimization;
     It doesn't know what to de-optimize, so it goes naive for everything.
 
Okay.

 
On MSVC, last I saw (which is a long time ago), any use of "memcpy" will be done using an external library function (in a DLL) for generic memcpy() use - clearly that will have /massive/ overhead in comparison to the single memory read needed for a volatile access.
>
 It is slightly more clever now, but still not great.
   Will not (always) generate a library call.
   Though, in VS2008 or similar, it was always still a library call.
     VS2010 and VS2013, IIRC, might set up and use "REP MOVSB" instead.
 It will do it inline, but still often:
   Spill variables;
   Load addresses;
   Load from source;
   Store to destination;
   Load value from destination.
 What BGBCC gives here is basically similar.
 
>
>
Don't want to use static inline functions in BGBCC though, as it still doesn't support inline functions in the general case.
>
>
 
