Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
> uc"..." string literals might be made even simpler, for example allowing
> only hex digits and not requiring \x (uc"01020304" rather than
> uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
> could be useful in other contexts, and programmers will want
> flexibility. Maybe something like hex"01020304" (embedded spaces could
> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
[...]
*If* hexadecimal string literals were to be added to a future version
of the language, I think I have a syntax that I like better than
what I suggested.
Inspired by the existing syntax for integer and floating-point
hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
irrelevant; we're specifying byte values in order, not bytes of
the representation of some larger type. memcpy()ing 0x"deadbeef"
to a uint32_t might yield either 0xdeadbeef or 0xefbeadde (or other
more exotic possibilities).
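The byte-order point can be seen today with an ordinary array standing
in for the proposed literal. This is only a sketch: it assumes
CHAR_BIT==8 and that uint32_t exists, and the names load_u32 and
deadbeef are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed 0x"deadbeef": four byte values, in order. */
static const unsigned char deadbeef[4] = { 0xde, 0xad, 0xbe, 0xef };

/* Copy four bytes, in the order given, into a 32-bit object.
   The result depends on how the implementation lays out uint32_t. */
static uint32_t load_u32(const unsigned char bytes[4])
{
    uint32_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

load_u32(deadbeef) gives 0xdeadbeef on a big-endian implementation and
0xefbeadde on a little-endian one; the array contents are the same in
both cases.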
Again, unlike other string literals, there is no implicit terminating
null byte. And I suggest making them const, since there's no
existing code to break.
If CHAR_BIT==8, each byte is represented by two hex digits. More
generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
the absence of whitespace. Added whitespace marks the end of a byte:
0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
0x"" is a syntax error, since C doesn't support zero-length arrays.
Anything between the quotes other than hex digits and spaces is a
syntax error.
0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
end of a byte, but the usage of spaces doesn't have to be consistent.
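To make the counting rule concrete, here is a rough sketch of how a
translator might count the bytes in the body of such a literal. The
function name hex_literal_bytes is hypothetical, CHAR_BIT is passed in
as a parameter so other widths can be tried, and only the space
character is treated as a separator (an assumption on my part). It
does not check that a group of digits actually fits in a byte.

```c
#include <ctype.h>

/* Count the bytes denoted by the text between the quotes of a proposed
   0x"..." literal, for a given CHAR_BIT.  Returns -1 if the body is
   empty or contains anything other than hex digits and spaces. */
static long hex_literal_bytes(const char *body, unsigned char_bit)
{
    const unsigned digits_per_byte = (char_bit + 3) / 4;
    long count = 0;
    unsigned pending = 0;   /* digits seen so far in the current byte */

    for (const char *p = body; *p != '\0'; p++) {
        if (*p == ' ') {
            /* A space ends the current byte, if one has been started. */
            if (pending > 0) { count++; pending = 0; }
        } else if (isxdigit((unsigned char)*p)) {
            /* A full group of digits also ends the byte. */
            if (++pending == digits_per_byte) { count++; pending = 0; }
        } else {
            return -1;
        }
    }
    if (pending > 0)
        count++;            /* trailing partial group is one more byte */
    return count > 0 ? count : -1;
}
```

With this, "deadbeef" counts as 1, 2, 3, or 4 bytes for a CHAR_BIT of
32, 16, 12, or 8, while "de ad be ef" counts as 4 bytes in every case.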
This could be made more flexible by allowing various backslash
escapes, but I'm not inclined to complicate it too much.
Note that the value of a (proposed) hex string literal is not a
string unless it happens to end in zero. I still use the term
"string literal" because it's closely tied to existing string
literal syntax, and existing string literals don't necessarily
represent strings anyway ("embedded\0null\0characters").
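For anyone who wants to check the existing behavior, a small sketch
(the names embedded, array_size and string_length are just for
illustration):

```c
#include <stddef.h>
#include <string.h>

/* An existing string literal with embedded null characters. */
static const char embedded[] = "embedded\0null\0characters";

/* Size of the whole array, including the implicit terminating null. */
static size_t array_size(void) { return sizeof embedded; }

/* Length of the string, which stops at the first null character. */
static size_t string_length(void) { return strlen(embedded); }
```

The array has 25 elements, but the string it contains is only the 8
characters of "embedded".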
Binary string literals 0b"11001001" might also be worth
considering (that's of type `const unsigned char[1]`). Octal
string literals 0"012 345 670" *might* be worth considering.
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
proposes a new "0o123" syntax for octal constants; if that's adopted,
I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
to suggest hex only, or doing hex, octal, and binary for the sake
of completeness.
What I'm trying to design here is a more straightforward way to
represent raw (unsigned char[]) data in C code, largely but not
exclusively for use by #embed.
-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */