Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> uc"..." string literals might be made even simpler, for example allowing
>> only hex digits and not requiring \x (uc"01020304" rather than
>> uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
>> could be useful in other contexts, and programmers will want
>> flexibility. Maybe something like hex"01020304" (embedded spaces could
>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
> [...]
> *If* hexadecimal string literals were to be added to a future version
> of the language, I think I have a syntax that I like better than
> what I suggested.

I like your suggestion here. It's very similar to mine, though with a prefix 0x"..." rather than b"...". I'd be fine with either.

> Inspired by the existing syntax for integer and floating-point
> hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
> expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
> with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
> irrelevant; we're specifying byte values in order, not bytes of
> the representation of some larger type. memcpy()ing 0x"deadbeef"
> to a uint32_t might yield either 0xdeadbeef or 0xefbeadde (or other
> more exotic possibilities).
>
> Again, unlike other string literals, there is no implicit terminating
> null byte. And I suggest making them const, since there's no
> existing code to break.
>
> If CHAR_BIT==8, each byte is represented by two hex digits. More
> generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
> the absence of whitespace. Added whitespace marks the end of a byte;
> 0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
> respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
> 0x"" is a syntax error, since C doesn't support zero-length arrays.
> Anything between the quotes other than hex digits and spaces is a
> syntax error.

Fair enough.

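For readers who want to see the byte-order point in today's C, here is a minimal sketch. It spells the proposed 0x"deadbeef" as an ordinary array initializer and assumes CHAR_BIT==8; the names `blob` and `blob_as_u32` are made up for illustration and are not part of the proposal.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: 8-bit bytes, so uint32_t occupies 4 bytes. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Today's spelling of what the proposed 0x"deadbeef" would denote:
   four bytes in the order written, with no terminating null. */
static const unsigned char blob[] = { 0xde, 0xad, 0xbe, 0xef };

static_assert(sizeof blob == sizeof(uint32_t), "blob must fill a uint32_t");

/* Copy the bytes into a 32-bit object. The array contents are fixed,
   but the resulting integer value depends on the platform's byte order. */
uint32_t blob_as_u32(void)
{
    uint32_t v = 0;
    memcpy(&v, blob, sizeof v);
    return v;
}
```

On a big-endian machine `blob_as_u32()` would return 0xdeadbeef and on a little-endian one 0xefbeadde; the array itself is identical in both cases.
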
> 0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
> end of a byte, but the usage of spaces doesn't have to be consistent.
>
> This could be made more flexible by allowing various backslash
> escapes, but I'm not inclined to complicate it too much.

I would /definitely/ vote against any kind of backslash escapes here. That would mess up the simplicity of the syntax.

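The grouping rule quoted above (up to (CHAR_BIT+3)/4 hex digits per byte, a space ends the current byte, anything else is an error) can be sketched as a small counting function. This is only one reading of the proposal: the name `hex_blob_length`, the explicit `char_bit` parameter, and the choice to reject a body containing only spaces are all assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes that the body of a proposed
   0x"..." literal would denote for a given CHAR_BIT.
   Returns -1 for an empty body or any character other than a hex
   digit or a space. */
long hex_blob_length(const char *s, unsigned char_bit)
{
    assert(s != NULL);

    unsigned per_byte = (char_bit + 3) / 4; /* max hex digits in one byte */
    unsigned in_group = 0;                  /* digits seen in current byte */
    long count = 0;

    for (; *s != '\0'; s++) {
        if (*s == ' ') {
            in_group = 0;                   /* a space ends the current byte */
        } else if (isxdigit((unsigned char)*s)) {
            if (in_group == 0)
                count++;                    /* first digit starts a new byte */
            if (++in_group == per_byte)
                in_group = 0;               /* byte is full */
        } else {
            return -1;                      /* e.g. a backslash escape */
        }
    }
    return count > 0 ? count : -1;          /* zero-length is an error */
}
```

With this reading, "deadbeef" gives 1, 2, 3, or 4 bytes for a `char_bit` of 32, 16, 12, or 8, while "de ad be ef" gives 4 for any of them, matching the numbers in the quoted text.
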
> Note that the value of a (proposed) hex string literal is not a
> string unless it happens to end in zero. I still use the term
> "string literal" because it's closely tied to existing string
> literal syntax, and existing string literals don't necessarily
> represent strings anyway ("embedded\0null\0characters").
>
> Binary string literals 0b"11001001" might also be worth
> considering (that's of type `const unsigned char[1]`).

That is /highly/ unlikely to be useful. I work in the field that uses binary more than anywhere else, and where compilers have supported 0b11001001 format for binary literals from /long/ before they reached the C standards - and I have very rarely seen them in practice. When you do see them, they are in isolation - no one will write enough binary values in a row for such a format to be useful. Hex strings are potentially useful because you are cutting { 0x12, 0x34, 0x45, 0x67 } to 0x"12344567", which is a fair bit more compact. For binary, the compaction is irrelevant and indeed counter-productive - binary literals became a lot more practical with the introduction of digit separators. (For standard C, these are from C23, but for C++ they came in C++14, and compilers have supported them as extensions in C.)

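On the quoted remark that existing string literals need not represent strings, a short sketch in standard C shows the gap between the array a literal creates and the string that the library functions see. The names `lit`, `lit_array_size`, and `lit_string_length` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An ordinary string literal with two embedded nulls. */
static const char lit[] = "embedded\0null\0characters";

/* 8 + 1 + 4 + 1 + 10 written characters, plus the implicit terminator. */
static_assert(sizeof lit == 25, "array holds every character of the literal");

/* Size of the array object, embedded and final nulls included. */
size_t lit_array_size(void)
{
    return sizeof lit;
}

/* Length of the string as the library sees it: stops at the first null. */
size_t lit_string_length(void)
{
    return strlen(lit);
}
```

Here `lit_array_size()` is 25 while `lit_string_length()` is 8, so the array and the "string" already diverge in current C.
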
> Octal string literals 0"012 345 670" *might* be worth considering.

Most situations where octal could be useful died out many decades ago - it is vastly more likely that "012" is intended to mean 12 than 10. No serious programming language supports a leading 0 as an indication of octal unless it is forced to do so by backwards compatibility, and many that used to support it have dropped it.

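For anyone who has not been bitten by it, the leading-zero rule for integer constants in current C looks like this. It is a tiny sketch; `LEADING_ZERO` and `leading_zero_value` are names invented for the example.

```c
#include <assert.h>

/* A leading 0 makes this an octal constant: 1*8 + 2 = 10, not twelve. */
enum { LEADING_ZERO = 012 };

static_assert(LEADING_ZERO == 10, "012 is an octal constant");

int leading_zero_value(void)
{
    return LEADING_ZERO;
}
```
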
> <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
> proposes a new "0o123" syntax for octal constants; if that's adopted,
> I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
> to suggest hex only, or doing hex, octal, and binary for the sake
> of completeness.

Binary support is useless, and octal support would be worse than useless - even using an 0o rather than 0 prefix. Completeness is not a justification for repeating old mistakes or complicating a good idea with features that will never be used.

> What I'm trying to design here is a more straightforward way to
> represent raw (unsigned char[]) data in C code, largely but not
> exclusively for use by #embed.

Personally, I'd see it as useful when /not/ using #embed. I really do not think programmers will care what format #embed uses. I don't share your concerns about efficiency of implementation, or that programmers need to know when it is efficient or not. In almost all circumstances, C programmers never see or need to think about a separation between the C preprocessor and the compiler proper - they are seen as a single entity, and can use whatever format is convenient between them. And once you ignore the implementation details, which are an SEP (somebody else's problem), the way #embed is defined is better than a definition using these new hex blob strings.