Subject : Re: "undefined behavior"?
From : Keith.S.Thompson+u (at) *nospam* gmail.com (Keith Thompson)
Newsgroups : comp.lang.c
Date : 14. Jun 2024, 00:39:52
Organization : None to speak of
Message-ID : <87h6dw5s53.fsf@nosuchdomain.example.com>
References : 1 2 3 4
User-Agent : Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
David Brown <david.brown@hesbynett.no> writes:
> On 13/06/2024 00:18, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>> [...]
>>> I recommend never using "char" as a type unless you really mean a
>>> character, limited to 7-bit ASCII. So if your "outliers" array really
>>> is an array of such characters, "char" is fine. If it is intended to
>>> be numbers and for some reason you specifically want 8-bit values, use
>>> "uint8_t" or "int8_t", and initialise with { 0 }.
>> [...]
>> The implementation-definedness of plain char is awkward, but char
>> arrays generally work just fine for UTF-8 strings.
>
> Yes, but "generally work" is not quite as strong as I would like.

Agreed, but we're stuck with it.

> My preference for UTF-8 strings is a const unsigned char type (with
> C23, it will be char8_t, which is defined to be the same type as
> "unsigned char").

But then you can't use standard library functions (unless you use
pointer conversions).

    #include <stdio.h>
    int main(void) {
        const char *s = "héllo, wörld";
        /* cast needed here too: a plain string literal is an array of char */
        const unsigned char *u = (const unsigned char *)"héllo, wörld";
        puts(s);
        puts(u); // constraint violation
        puts((const char*)u); // valid but ugly
    }

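If the unsigned char representation is kept anyway, the cast can at least be confined to one place. What follows is only a sketch: `puts_u8` and `u8len` are made-up names, not standard functions, and both assume a NUL-terminated byte string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers (not part of any standard library): they keep the
   unsigned char -> char pointer conversion in one place. */

/* Print a NUL-terminated byte string held as unsigned char. */
static int puts_u8(const unsigned char *u)
{
    assert(u != NULL);
    /* The conversion is well defined: any object may be accessed
       through a pointer to a character type. */
    return puts((const char *)u);
}

/* Length in bytes (code units), not in characters. */
static size_t u8len(const unsigned char *u)
{
    assert(u != NULL);
    return strlen((const char *)u);
}
```

The calling code then stays free of casts, at the cost of one wrapper per library function used.
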
Implementations that make plain char signed *have to* deal sanely with
8-bit data. The standard might permit some things to misbehave, but as
a QoI issue it's reasonably safe to assume that it Just Works unless
you're using the DeathStation 9000.
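
A common way to stay independent of the signedness of plain char when inspecting UTF-8 bytes is to convert each one to unsigned char before testing it. A minimal sketch, assuming valid UTF-8 input and 8-bit bytes; `utf8_count` is a hypothetical name chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string stored in plain char.
   Hypothetical helper; assumes the input is valid UTF-8. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Convert through unsigned char so that bytes >= 0x80 test the
           same way whether plain char is signed or unsigned. */
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) != 0x80)   /* not a continuation byte (10xxxxxx) */
            n++;
    }
    return n;
}
```

Every byte that is not a continuation byte starts a new code point, so the count does not depend on how the implementation represents values above 0x7F in plain char.
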

> (What happens if you have a platform that uses ones' complement
> arithmetic, with "char" being signed and a range of -127 to +127, and
> you have a u8"..." string which has a code unit of 0x80 that cannot be
> represented in "char"? It's just a hypothetical question, of
> course.)

C23 mandates two's-complement for all integer types.
Ones'-complement implementations are rare, and I don't think any of
them support recent C standards, so u8"..." is going to be a syntax
error anyway. My guess (and it's nothing more than that) is that
any ones'-complement implementations make plain char unsigned just
to avoid this kind of issue. But even if they don't, a signed byte
with all bits 1 (-0 in ones'-complement) is likely to be treated
as 0xff by I/O functions.
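
On the library side, the <string.h> functions are specified to interpret each character as if it had type unsigned char, and strcmp compares bytes on that basis. The sketch below illustrates that on an ordinary implementation; it assumes CHAR_BIT is 8, and `byte_at` and `sorts_after` are hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Return the byte at s[i] as a value in 0..UCHAR_MAX by reading it
   through an unsigned char pointer, which matches how the <string.h>
   functions are specified to interpret characters. Hypothetical helper. */
static unsigned byte_at(const char *s, size_t i)
{
    assert(s != NULL);
    return ((const unsigned char *)s)[i];
}

/* Nonzero if a sorts after b; strcmp compares the first differing
   bytes as unsigned char, even when plain char is signed. */
static int sorts_after(const char *a, const char *b)
{
    return strcmp(a, b) > 0;
}
```

With these, a byte with all bits set reads back as 0xff, and "\x80" sorts after "\x7f" regardless of whether plain char is signed.
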

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */