Subject: Re: C23 thoughts and opinions
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Date: 29 May 2024, 09:02:49
Organization: A noiseless patient Spider
Message-ID: <v36nf9$12bei$1@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 28/05/2024 22:21, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
On 28/05/2024 02:33, Keith Thompson wrote:
[...]
Without some kind of programmer control, I'm concerned that the rules
for defining an array so #embed will be correctly optimized will be
spread as lore rather than being specified anywhere.
>
They might, but I really do not think that is so important, since they
will not affect the generated results.
Right, it won't affect the generated results (assuming I use it
correctly). Unless I use `#embed optimize(true)` to initialize
a struct with varying member sizes, but that's my fault because I
asked for it.
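To make the struct case concrete: `optimize(true)` is a hypothetical parameter, not part of C23. In standard C23, #embed inside an initializer behaves like a comma-separated list of integer constants, so in a struct initializer each embedded byte initializes the next member, whatever that member's size. A shortcut that simply copied the raw bytes into the object would give different member values. The sketch below (the struct and function names are made up for illustration) contrasts the two, with the byte list written out by hand so it compiles without #embed support.

```c
#include <stdint.h>
#include <string.h>

/* Members of different sizes, as in the struct case above. */
struct record {
    uint8_t  a;
    uint16_t b;
    uint32_t c;
};

/* What #embed of the three bytes 01 02 03 means in a struct initializer:
   the directive expands to the list 1, 2, 3, and each integer initializes
   the next member in turn. */
struct record init_memberwise(void)
{
    struct record r = { 1, 2, 3 };
    return r;
}

/* What a naive "copy the bytes into the object" shortcut would produce.
   The resulting value of b depends on padding and byte order; c lies at
   offset 3 or later, so the three copied bytes never reach it and it
   stays 0. */
struct record init_bytewise(void)
{
    static const unsigned char bytes[] = { 1, 2, 3 };
    struct record r = { 0 };
    memcpy(&r, bytes, sizeof bytes);
    return r;
}
```

A compiler can therefore only use a byte-copy fast path where, as with an unsigned char array, it gives exactly the same result as the integer list.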
I am still not understanding your point. (I am confident that you have a point, even if I don't get it.)
I cannot see why there would be any need for, or use of, manually adding optimisation hints or controls in the source code. I cannot see how there is any possibility of getting incorrect results in any way.
The point is compile-time performance, and perhaps even the ability
to compile at all.
I'm thinking about hypothetical cases where I want to embed a
*very* large file and parsing the comma-delimited sequence could
have unacceptable compile-time performance, perhaps even causing
a compile-time stack overflow depending on how the parser works.
Every time the compiler sees #embed, it has to decide whether to
optimize it or not, and the decision criteria are not specified
anywhere (not at all in the standard, perhaps not clearly in the
compiler's documentation).
Yes, I agree with that. And this is how it should be - this is not something that should be specified. The C standards give minimum requirements for things like the number of identifiers or the length of lines. But pretty much all compilers, for most of the "translation limits", say they are "limited by the memory of the host computer". The same will apply to #embed. And some compilers will cope better than others with huge #embed's, some will be faster, some more memory efficient. Some will change from version to version. This is not something that can sensibly be specified or formalized - like pretty much everything in regard to compilation time, each compiler does the best it can without any specifications. I'd expect compiler reference manuals might have hints, such as saying #embed is fastest with unsigned char arrays (or whatever), but no more than that.
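For reference, here is a minimal C23 use of the directive with an unsigned char array, guarded so that it still compiles where #embed is unavailable. `limit` is one of the standard C23 embed parameters (alongside `prefix`, `suffix` and `if_empty`); none of them say anything about how the compiler should process the data internally. Embedding `__FILE__` is just a trick to name a file that exists, and the array and function names are hypothetical.

```c
#include <stddef.h>

/* Embed at most the first 16 bytes of this very source file, but only if
   the compiler supports #embed and can find the file. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
static const unsigned char head_of_this_file[] = {
#embed __FILE__ limit(16)
};
#    define HAVE_EMBED 1
#  endif
#endif

#ifdef HAVE_EMBED
#  define HEAD_LEN (sizeof head_of_this_file)
#else
/* Fallback for compilers without #embed: an empty payload. */
static const unsigned char head_of_this_file[1] = { 0 };
#  define HEAD_LEN 0
#endif

/* Number of embedded bytes (0 when the fallback is in use). */
size_t embedded_len(void) { return HEAD_LEN; }

/* Pointer to the embedded bytes. */
const unsigned char *embedded_bytes(void) { return head_of_this_file; }
```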
But again - I see no reason for manual optimisation hints, and no reason for any possible errors.
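One possible shape for the compiler's decision, as a sketch only (the enums, names and threshold are hypothetical, not taken from any real compiler): the fast path is taken only where it provably gives the same result as the integer list, so the choice can affect compile time but never the generated code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of where an #embed appears. */
enum embed_context {
    EMBED_IN_ARRAY_INIT,   /* initializer list of an array object */
    EMBED_IN_STRUCT_INIT,  /* initializer list of a struct or union */
    EMBED_IN_MACRO_ARGS,   /* arguments to a function-like macro */
    EMBED_IN_OTHER         /* any other context */
};

/* Hypothetical element type of the array being initialized. */
enum elem_type { ELEM_UNSIGNED_CHAR, ELEM_PLAIN_CHAR, ELEM_INT, ELEM_OTHER };

/* Conservative rule: use the raw-blob path only when each embedded byte
   initializes exactly one unsigned char element, and only for payloads
   large enough to be worth it. Plain char is left out because it may be
   signed, and values above CHAR_MAX then go through an implementation-
   defined conversion; a compiler that knows its own rule could add it.
   Everything else uses the ordinary comma-separated integer list. */
bool embed_can_use_blob(enum embed_context ctx, enum elem_type elem,
                        size_t len, size_t threshold)
{
    if (ctx != EMBED_IN_ARRAY_INIT)
        return false;
    if (elem != ELEM_UNSIGNED_CHAR)
        return false;
    return len >= threshold;
}
```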
Let me outline a possible strategy for a compiler like gcc. (I have not looked at the prototype implementations from thephd, nor any gcc developer discussions.)
gcc splits the C pre-processor and the compiler itself, and (currently) communicates dataflow in only one direction, via a temporary file or a pipe. But the "gcc" (or "g++", according to preference) driver program calls and coordinates the two programs.
If the pre-processor is called stand-alone, then it will generate a comma-separated list of integers, helpfully split over multiple lines of reasonable size. This will clearly always be correct, and always work, within the compiler's translation limits.
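The stand-alone output can be sketched as a small formatting routine. This is a hypothetical helper, not gcc's actual code, and real -E output may lay things out differently; it mainly shows how much text the compiler then has to tokenise, roughly two to four characters plus line breaks per embedded byte.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render bytes as a comma-separated decimal list, with
   at most per_line values per output line. Like snprintf, it writes at most
   cap bytes (NUL-terminated when cap > 0) and returns the length the
   complete output needs, excluding the NUL. */
size_t format_embed_list(const unsigned char *data, size_t len,
                         size_t per_line, char *out, size_t cap)
{
    size_t need = 0;
    if (per_line == 0)
        per_line = 1;
    for (size_t i = 0; i < len; i++) {
        char item[16];
        int last = (i + 1 == len);
        int eol = last || (i + 1) % per_line == 0;
        /* Every value but the last is followed by a comma; each output
           line ends with a newline. */
        int n = snprintf(item, sizeof item, "%u%s%s", (unsigned)data[i],
                         last ? "" : ",", eol ? "\n" : "");
        if (need + 1 < cap) {
            size_t room = cap - 1 - need;
            memcpy(out + need, item, (size_t)n < room ? (size_t)n : room);
        }
        need += (size_t)n;
    }
    if (cap > 0)
        out[need < cap ? need : cap - 1] = '\0';
    return need;
}
```

For example, the bytes 1, 22, 255 with two values per line become "1,22,\n255\n". At one integer and one comma per byte, a 1 MiB file turns into about two million tokens for the parser.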
But when the gcc driver calls it, it will have a flag indicating that the target compiler is gcc and supports an extended pre-processed syntax (and also that the source is C23 - after all, the C pre-processor can be used as a macro processor for other files with no relation to C). Now the pre-processor has a lot more freedom. Whenever it meets an #embed directive, it can generate a line:
#embed_data 123456
followed in the file by 123456 (or whatever) bytes of binary data. The C compiler, when parsing this file, will pull that in as a single blob. Then it is up to the C compiler - which knows how the #embed data will be used - to tell whether these bytes should be used as parameters to a macro, initialisation for a char array, or whatever. And it can use them as efficiently as practically possible. (It is probably only worth using this for #embed data over a certain size - smaller #embed's could just generate the integer sequences.)
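The hypothetical `#embed_data` framing is easy to produce and to parse. The sketch below illustrates the idea only; the function names and exact header layout are assumptions, not any compiler's actual format. Because the length comes first, the payload can contain anything, including newlines and '#' characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical framing: the text line "#embed_data N\n" followed by exactly
   N raw bytes. */
static const char EMBED_TAG[] = "#embed_data ";

/* Writes header plus payload into out; returns the bytes written, or 0 if
   cap is too small. */
size_t write_embed_blob(const unsigned char *data, size_t len,
                        unsigned char *out, size_t cap)
{
    char header[48];
    int h = snprintf(header, sizeof header, "%s%zu\n", EMBED_TAG, len);
    if (h < 0 || (size_t)h > cap || len > cap - (size_t)h)
        return 0;
    memcpy(out, header, (size_t)h);
    if (len > 0)
        memcpy(out + h, data, len);
    return (size_t)h + len;
}

/* Parses one frame at the start of in. On success, points *blob at the
   payload (no copy), stores its length and returns the bytes consumed;
   returns 0 for a malformed or truncated frame. */
size_t parse_embed_blob(const unsigned char *in, size_t in_len,
                        const unsigned char **blob, size_t *blob_len)
{
    const size_t tag_len = sizeof EMBED_TAG - 1;
    size_t pos, n = 0;
    int digits = 0;

    if (in_len < tag_len || memcmp(in, EMBED_TAG, tag_len) != 0)
        return 0;
    for (pos = tag_len; pos < in_len && in[pos] >= '0' && in[pos] <= '9'; pos++) {
        size_t d = (size_t)(in[pos] - '0');
        if (n > (SIZE_MAX - d) / 10)
            return 0;               /* length does not fit in size_t */
        n = n * 10 + d;
        digits++;
    }
    if (digits == 0 || pos >= in_len || in[pos] != '\n')
        return 0;
    pos++;                          /* skip the newline */
    if (in_len - pos < n)
        return 0;                   /* payload truncated */
    *blob = in + pos;
    *blob_len = n;
    return pos + n;
}
```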
Nowhere in this is there any call for manual optimisation hints, nor any risk of incorrect results.