Newsgroup: comp.lang.c
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> I recommend never using "char" as a type unless you really mean a
>> character, limited to 7-bit ASCII. So if your "outliers" array really
>> is an array of such characters, "char" is fine. If it is intended to
>> be numbers and for some reason you specifically want 8-bit values, use
>> "uint8_t" or "int8_t", and initialise with { 0 }.
>
> The implementation-definedness of plain char is awkward, but char
> arrays generally work just fine for UTF-8 strings.

Yes, but "generally work" is not quite as strong as I would like. My preference for UTF-8 strings is a const unsigned char type (with C23 it will be char8_t, which is defined to be the same type as "unsigned char"). But u8"Hello, world" UTF-8 string literals (since C11) are treated as arrays of plain "char" in C (until C23), so I guess UTF-8 strings will be safe in plain char arrays. Still, the bytes in a UTF-8 string are code units with values between 0 and 255, so I prefer to store them in a type that can hold that range of values.

> If char is signed, byte values greater than 127 will be stored as negative
> values, but it will almost certainly just work (if your system
> is configured to handle UTF-8). Likewise for Latin-1 and similar
> 8-bit character sets.
>
> The standard string functions operate on arrays of plain char, so
> storing UTF-8 strings in arrays of uint8_t or unsigned char will
> seriously restrict what you can do with them.
>
> (I'd like a future standard to require plain char to be unsigned,
> but I don't know how likely that is.)

I would also prefer that, but too much existing code relies on plain char being signed on the platforms it runs on. I personally think that talking about "signed" and "unsigned" characters was a very poor choice of terms, but it's far too late to change that! C23 has "char8_t", which is always unsigned.
The messages shown here come from Usenet.