Re: C23 thoughts and opinions

Subject: Re: C23 thoughts and opinions
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Date: 27 May 2024, 13:42:28
Organisation: A noiseless patient Spider
Message-ID: <v31rj5$o20$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 27/05/2024 01:17, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
On 26/05/2024 00:58, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
On 25/05/2024 03:29, Keith Thompson wrote:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
David Brown <david.brown@hesbynett.no> writes:
On 23/05/2024 14:11, bart wrote:
[...]

>
The compiler will generate results /as if/ it had expanded the file to
a list of numbers and parsed them.  But it will not do that in
practice. (At least, not for more serious implementations - simple
solutions might do so to get support implemented quickly.)
 I'll start by acknowledging that the prototype implementation apparently
*does* optimize #embed when it can.  I was mistaken on that point.
 #embed *must* expand to the standard-defined comma-delimited sequence in
*some* cases.
 Which means that the piece of the compiler that implements #embed has to
recognize when it must generate that sequence, and when it can do
something more efficient.
Yes, exactly.

I'd expect implementations to have extremely fast implementations for
initialising arrays of character types, and probably also for other
arrays of scalar types.  More complicated examples - such as
parameters in a macro or function call - would probably use a
fall-back of generating naïve lists of integer constants.
 My problem is not just with how the compiler can figure out when it can
optimize, but how programmers are supposed to understand whatever rules
it uses.  Can I rely on the optimization being performed if I use a
typedef for unsigned char, or if I use an enumeration type whose
underlying type is unsigned char, or if I have initialization elements
before and after the #embed directive?
I don't know if that is something the programmer should need to consider, at least for most cases.  Generally as a programmer you don't consider the compilation speed when writing code.  You simply expect that compiler writers try to make their tools as fast as reasonably possible without sacrificing features.  Sometimes there can be particular use-cases where the programmer has to look at the compiler manuals and adapt the code or build procedures to suit.  I think that will be the case here too - compiler manuals should document what types of #embed usage they optimise.  But I think it is unlikely that people writing portable code will do anything other than initialising a const (or constexpr) array of unsigned char if they have big enough files for optimisation to be relevant.  Any compiler that does any #embed optimisation will handle this case.  And even simple #embed implementations will likely be better than any alternatives (such as using xxd).
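As a minimal sketch of that portable case: a const array of unsigned char initialised with #embed. The file embedded here is the source file itself (via `__FILE__`), chosen only so the example needs no extra data file; a real program would name its own file. The `__has_embed` guard, the `DEMO_HAVE_EMBED` macro, the placeholder fallback bytes and the names `blob`, `blob_size` and `blob_data` are all assumptions made for illustration.

```c
#include <stddef.h>

/* Use #embed only if the preprocessor advertises it and can find the file. */
#if defined(__has_embed)
#  if __has_embed(__FILE__) == __STDC_EMBED_FOUND__
#    define DEMO_HAVE_EMBED 1
#  endif
#endif

#ifdef DEMO_HAVE_EMBED
/* The common case discussed above: a const array of unsigned char. */
static const unsigned char blob[] = {
#  embed __FILE__
};
#else
/* Placeholder bytes for compilers without #embed (illustration only). */
static const unsigned char blob[] = { 0x64, 0x65, 0x6d, 0x6f };
#endif

/* Number of bytes in the embedded data. */
size_t blob_size(void)
{
    return sizeof blob;
}

/* Pointer to the first embedded byte. */
const unsigned char *blob_data(void)
{
    return blob;
}
```

Whether a given compiler takes a fast path for this form is implementation-specific; the code only shows the shape that any optimising implementation would be expected to handle.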

 Effective use of #embed requires too much "magic" for my taste --
particularly having the preprocessor rely on information from later
phases.  The semantics of #embed don't rely on that information, but
efficient use for large files does.
 
It is a violation of the neat layered (or pipeline) view of C compilation.  But you could argue that this has been broken for decades - you have _Pragma that is syntactically an operator but duplicates preprocessor work, you have compiler pragmas that duplicate command-line flags (and command-line flags that duplicate preprocessor defines), you have pre-compiled headers, you have LTO that passes data multiple times through different parts of the pipeline.

If you have a binary file containing a sequence of int values, you can
use #embed to initialize an unsigned char array that's aliased with or
copied to the int array.
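A hedged sketch of the "copied to the int array" variant follows. It assumes the bytes (which #embed would supply in a real program) were written with the same int size and byte order as the target uses; the function name `ints_from_bytes` is hypothetical. Copying with memcpy sidesteps the alignment and aliasing questions that treating the byte array as an int array directly would raise.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy as many whole int values as fit from a byte buffer into dst.
 * Any trailing bytes that do not make up a complete int are ignored.
 * Returns the number of ints copied.
 */
size_t ints_from_bytes(const unsigned char *bytes, size_t nbytes,
                       int *dst, size_t max_ints)
{
    size_t count = nbytes / sizeof(int);

    if (count > max_ints)
        count = max_ints;
    if (count > 0)
        memcpy(dst, bytes, count * sizeof(int));
    return count;
}
```

A caller would pass the embedded unsigned char array and its sizeof as the first two arguments.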
The *embed element width* is typically going to be CHAR_BIT bits by
default.  It can only be changed by an *implementation-defined* embed
parameter.  It seems odd that there's no standard way to specify the
element width.
It seems even more odd that the embed element width is
implementation defined and not set to CHAR_BIT by default.
>
I agree.  But it may be left flexible for situations where the host
and target have different ideas about CHAR_BIT.  (Targets with
CHAR_BIT other than 8 are very rare, hosts with CHAR_BIT other than 8
are non-existent, but C remains flexible.)
 I would think that you'd want the element width to match CHAR_BIT *on
the target* (which is the only CHAR_BIT that's relevant or available).
If you're cross-compiling, you'd probably want to embed a file that
could have been used on the target system.
Yes, I think so.

 And if I'm not doing that kind of exotic cross-compiling, I can't rely
on the element width being CHAR_BIT *or* on any standard way to specify
that I want it to be CHAR_BIT.
 Requiring the default width to be CHAR_BIT would, I'm guessing, solve
99% of cases.  Allowing it to be specified by a parameter would solve
the remaining 1%.  And I expect it *will* be CHAR_BIT in most or all
implementations, and programmers will rely on that assumption.  I think
the standard should guarantee that.
 
I agree with you.  I'm just trying to think of why the standards might not make that guarantee.

For a very large file, that could be a significant burden.  (I don't
have any numbers on that.)
>
I do :
>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
>
(That's from a proposal for #embed for C and C++.  Generating the
numbers and parsing them is akin to using xxd.)
>
More useful links:
>
<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>
>
(These are from someone who did a lot of the work for the proposals,
and prototype implementations, as far as I understand it.)
 That second link does have a lot of good information.  I think I had
seen it before, but I hadn't read it thoroughly.  It refers to prototype
implementations for both gcc and clang.  I've built the prototype on my
system, and godbolt.org has it, but the gcc prototype (for which the
article provides good performance data) doesn't seem to be available
anywhere.
 
You are putting a lot more effort into this testing than I have.  For my work, I am generally dependent on "official" toolchain builds - provided by the manufacturers of the microcontrollers we use, or at least by the manufacturers of the cpu cores.  I like to keep track of what's coming - future versions of C or C++, future versions of compilers, etc.  But details such as implementation efficiency (rather than features) don't matter much to me until they are available as part of these pre-built toolchains.  (Sometimes it's fun to try things earlier, and I enjoy playing with newer compilers on godbolt.org, but I don't see testing the speed of #embed to be /so/ much fun that I'd bother building a compiler for it!)
But it's nice to see you've done some independent testing.  I have no particular reason to doubt "thephd.dev", but no particular reason to consider it authoritative either.

My experiments with the clang prototype have been a bit confusing.  I
assumed that `clang -E` would give me meaningful results, but it always
produces the comma-delimited sequence of integer constants, and even
that output is inconsistent.  It looks like "-E" synthesizes naive and
not entirely correct output.  Feeding that output to clang produces
warnings that I don't get without "-E".  Some of this might be the
result of user error on my part.
 I did some tests with a 100MB file, both with #embed and with #include
using the output of "xxd".  #embed *is* much faster.
 According to <https://thephd.dev/implementing-embed-c-and-c++>, it
internally generates __builtin_pp_embed, which takes as arguments the
expected type (always unsigned char for now), the filename as a string
literal, and the data encoded as a base64 string literal.  That's not
going to be as fast as a hypothetical pure binary blob, but apparently
it's still much faster than parsing a comma-delimited sequence.
 I haven't been able to get "clang -E" in the prototype to generate
__builtin_pp_embed, or to get clang to recognize it.  There are internal
things going on that I don't understand.
 The author points out that using binary blobs would break tools that
work with -E preprocessed source files.  If you could assume that the
preprocessed output will be processed only by the same compiler, that
wouldn't be an issue, but apparently that's not a safe assumption.
 The author acknowledges that the prototype implementation doesn't handle
all cases correctly.
Those are all good test results - thanks for reporting them.
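To give a feel for what the xxd-style route hands to the compiler, here is a small sketch that formats bytes as a comma-delimited list of hex constants. The function name `format_byte_list` and the exact output format are assumptions for illustration, not what xxd or any compiler emits verbatim. In this format each byte of data turns into roughly six characters of source text that then have to be tokenised and parsed.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write data[0..n-1] into out as "0x00, 0xff, ..." (NUL-terminated).
 * Stops before any item that would not fit completely in cap bytes.
 * Returns the number of characters written, excluding the terminator.
 */
size_t format_byte_list(const unsigned char *data, size_t n,
                        char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, cap - used, "%s0x%02x",
                         i ? ", " : "", (unsigned)data[i]);
        if (w < 0 || (size_t)w >= cap - used) {
            out[used] = '\0';   /* drop the partially written item */
            return used;
        }
        used += (size_t)w;
    }
    return used;
}
```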

>
Prototypes have been made, and they do have such optimisations.  How
things end up in real tools remains to be seen, of course.
 Here's how I personally would have preferred for #embed to be specified:
 - As in current C23 drafts, #embed with no parameters must operate *as
   if* it expanded to a comma-delimited list of integer constant
   expressions.
- With no parameters, both the common cases (initializing an array of
   characters) and odd cases (e.g., initializing a struct object with
   varying types and sizes of members) must work as specified.
- A standard-defined parameter allows control over optimization.
 The parameter can be "optimize(true)" or "optimize(false)".
 "optimize(false)" has no formal effect, but the compiler *should*
generate the canonical sequence of constants.
 "optimize(true)" causes undefined behavior if #embed is used in a
context other than the initialization of an array of character type.
 
I disagree here.  I want the compiler to generate the "as if" results regardless of any optimisation, working as currently specified.  And /if/ the compiler is able to optimise the #embed, then I want it to do so automatically - I see no situation in which I would ever want "optimize(false)".
What would be nice is an optional warning if the #embed size is over a certain limit and it is unable to optimise it - a message telling the user that an array of "unsigned char" would be faster than an array of "signed char", or whatever, would be helpful.  But that kind of thing is definitely implementation-specific.
I'd also like a pre-processor command-line option (again this is clearly implementation-specific) to force non-optimised output from #embed, for use with "gcc -E" (or "clang -E") and third-party tools.

A naive compiler can quietly ignore the optimize() parameter and always
generate the comma-delimited sequence.  An exceedingly clever compiler
could ignore it and always make a correct decision about whether to
optimize #embed.
 Without the optimize parameter, typical compilers are expected to
optimize #embed depending on the context in which it's used, and should
produce the correct results in all cases.  The parameter can be used to
override the compiler's judgement.
 Another possibility might have been to specify that #embed can *only* be
used to initialize an array of character type, and any other use either
has undefined behavior or is a constraint violation.  That would avoid
all the complication of determining from context whether it can be
optimized, and would probably cover 99% of cases.  But it's probably too
late for that.
 
Agreed.
As it is, #embed is complicated because it covers more than the simple case of initialising a const array of unsigned char.  But it can't cover anything like all cases of embedding external data in C programs.  (I have programs with internal web servers - they need to embed all files in a directory, and create an indexing structure.  This is currently all automated by a Python script called from the makefile - switching to #embed only would involve manual source changes when files are added or removed.)
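A rough sketch of the kind of indexing structure meant here, with every name (`struct embedded_file`, `file_index`, `find_embedded_file`) and the file contents invented for illustration. The two data arrays are written out by hand; with #embed each array body could come from the file, but the table entries would still have to be kept in step with the directory contents manually or by a generator script.

```c
#include <stddef.h>
#include <string.h>

/* One entry per embedded file: URL path, data and length. */
struct embedded_file {
    const char *path;
    const unsigned char *data;
    size_t size;
};

/* Stand-in contents; a real build might fill each array with #embed. */
static const unsigned char index_html[] = {
    '<', 'h', '1', '>', 'h', 'i', '<', '/', 'h', '1', '>'
};
static const unsigned char style_css[] = {
    'b', 'o', 'd', 'y', '{', '}'
};

/* The index itself: this is the part #embed alone cannot generate. */
static const struct embedded_file file_index[] = {
    { "/index.html", index_html, sizeof index_html },
    { "/style.css",  style_css,  sizeof style_css  },
};

/* Linear lookup by path; returns NULL if the file is not embedded. */
const struct embedded_file *find_embedded_file(const char *path)
{
    for (size_t i = 0; i < sizeof file_index / sizeof file_index[0]; i++) {
        if (strcmp(file_index[i].path, path) == 0)
            return &file_index[i];
    }
    return NULL;
}
```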
