Subject : Re: Representation of _Bool
From : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Newsgroups : comp.lang.c
Date : 18. Jan 2025, 21:17:02
Organization : A noiseless patient Spider
Message-ID : <86o7035135.fsf@linuxsc.com>
References : 1 2 3
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
learningcpp1@gmail.com (m137) writes:
>
Hi Keith,
>
Thank you for posting this.
>
The message being referred to is one I posted Sun 2021-05-23, with
Message-ID <87tums515a.fsf@nosuchdomain.example.com>. It's visible on
Google Groups at
<https://groups.google.com/g/comp.lang.c/c/4FUlV_XkmXg/m/OG8WeUCfAwAJ>.
>
As others have suggested, please include attribution information when
posting a followup. You don't need to quote the entire message,
but provide at least some context, particularly when the parent
message is old.
>
This is an update to that message.
>
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
- **Trap representation** was last defined in [N2731 3.19.4(1)]
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf#page=)
as "an object representation that need not represent a value of the
object type."
- **Non-value representation** is most recently defined in
[N3435 3.26(1)]
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3435.pdf#page=23)
as "an object representation that does not represent a value of the
object type."
>
The definition of non-value representation rules out object
representations that represent a value of the object type from
being non-value representations. So it seems to be stricter than
the definition of trap representation, which does not seem to rule
out such object representations from being trap representations.
Is this interpretation correct?
>
I don't believe so. As far as I can tell, a "non-value
representation" (C23 and later) is exactly the same thing as a
"trap representation" (C17 and earlier). The older term was
probably considered unclear, since it could imply that a trap is
required. In fact, reading an object with a trap/non-value
representation has undefined behavior, which can include yielding
the value you might have expected.
>
If so, what happens to the 254 trap representations that GCC and
Clang reserve for `_Bool`?
>
I see no evidence in gcc's documentation that gcc treats
representations other than 0 or 1 as trap/non-value representations.
I see only two references to "trap representation", one for signed
integer types (saying that there are no trap representations) and
one regarding type-punning via unions. There are no relevant
references to "padding bits".
>
I'm less familiar with clang's documentation, but I see no reference
to "trap representation" or "non-value representation".
>
We can get some information about this by running a test program.
See below.
>
Assuming a width of 1, each of those 254
object representations represents a value in `_Bool`'s domain (the
half whose value bit is 1 represents the value `true`, while the
other half whose value bit is 0 represents the value `false`), so
they cannot be thought of as non-value representations (since a
non-value representation must be an object representation that
**does not** represent a value of the object type).
>
Reading an object with a non-value representation has undefined
behavior. If the observed value happens to be a valid value of
the object's type, that's still consistent with undefined
behavior. *Everything* is consistent with undefined behavior.
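>
A minimal sketch of how the stored byte of a bool object could be
inspected without ever reading the object as a bool, which sidesteps
the undefined behavior described above. The helper name
bool_repr_kind is hypothetical, and the sketch assumes
sizeof(bool) == 1 (checked at compile time):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption of this sketch: bool occupies exactly one byte. */
_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

/* Hypothetical helper: classify the byte stored at p without reading
   it through type bool.  Returns 0 if it matches the byte this
   implementation stores for false, 1 if it matches the byte stored
   for true, and -1 for any other bit pattern. */
static int bool_repr_kind(const void *p)
{
    const bool f = false, t = true;
    unsigned char byte, fbyte, tbyte;

    assert(p != NULL);
    memcpy(&byte, p, 1);    /* byte under inspection */
    memcpy(&fbyte, &f, 1);  /* representation of false */
    memcpy(&tbyte, &t, 1);  /* representation of true */

    if (byte == fbyte) return 0;
    if (byte == tbyte) return 1;
    return -1;
}
```

A result of -1 says only that the byte is not one of the two patterns
the implementation itself stores; it does not say whether the
implementation treats that pattern as a non-value representation.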
>
I've been stuck on this for quite some time, so would be grateful
for any guidance you could provide.
>
Editions of the C standard earlier than C23 were not entirely
clear about the representation of _Bool. (C90 does not have _Bool
or bool. C99 through C17 have _Bool as a keyword, with bool as
a macro defined in <stdbool.h>. C23 has bool as a keyword, with
_Bool as an alternate spelling.)
>
In C99 and later, _Bool/bool is required to be an unsigned integer
type large enough to hold the values 0 and 1. Its size must be at
least CHAR_BIT bits (which is at least 8). The *rank* of _Bool is
less than the rank of all other standard integer types.
>
The rank implies that the range of values is a subset of the
range of values of any other unsigned integer type. The rank does
*not* imply anything about relative sizes. unsigned char has a
higher rank than bool, but bool could have additional padding bits
making sizeof(bool)>1. (Probably no implementation does this.)
unsigned char has no padding bits.
>
C11 implies that _Bool can have more than one value bit, which
means it could represent values greater than 1 (but no more than
0..UCHAR_MAX).
>
C23 (I'm using the N3096 draft) tightens the requirements, saying
that bool has exactly one value bit and (sizeof(bool)*CHAR_BIT)-1
padding bits -- again implying that sizeof(bool) might be greater
than 1, but forbidding values greater than 1.
>
Typically in C17 and earlier, and always in C23, _Bool/bool will
have exactly 1 value bit and CHAR_BIT-1 padding bits. Padding bits
do not contribute to the value of an object (so 0 and 1 are the
only possible values), but non-zero padding bits *may or may not*
create trap/non-value representations. (A gratuitously exotic
implementation might use a representation other than 00000001 for
true, but 00000000 is guaranteed to be a representation for 0/false.)
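>
The arithmetic above can be sketched as follows. The function names
are hypothetical, and the "leftover" count assumes a one-byte bool in
which exactly one bit pattern each is used for false and true:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Padding bits in bool when it has exactly one value bit (the C23 rule). */
static size_t bool_padding_bits(void)
{
    return sizeof(bool) * CHAR_BIT - 1;
}

/* Number of distinct bit patterns a one-byte object can hold. */
static unsigned long byte_patterns(void)
{
    return (unsigned long)UCHAR_MAX + 1;
}

/* Patterns left over once one pattern each is assigned to false and
   true.  Whether these are non-value representations is up to the
   implementation. */
static unsigned long noncanonical_patterns(void)
{
    assert(byte_patterns() >= 2);
    return byte_patterns() - 2;
}
```

With CHAR_BIT == 8 and sizeof(bool) == 1 this gives 7 padding bits
and 254 leftover patterns, which is the figure in the question.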
>
As far as I can tell, the standard is silent on whether a bool object
with non-zero padding bits is a trap/non-value representation or not.
There are no conditions other than the rules for how integer
types are represented. As long as those conditions are met, an
implementation is free to treat any set of object representations
as trap representations (and I assume that hasn't changed for
C23, apart from the new requirement that the width of _Bool be
one).
I wrote a test program to explore how bool is treated. It uses
memcpy to set the representation of a bool object and then prints
the value of that object. Source is at the bottom of this message.
>
If bool has no non-value representations, then the values of the
CHAR_BIT-1 padding bits must be ignored when reading a bool object,
and the value of such an object is determined only by its single
value bit, 0 or 1. If it does have non-value representations,
then reading such an object has undefined behavior.
>
With gcc 14.2.0, with "-std=c23", all-zeros is treated as false
when used in a condition and all other representations are treated
as true. Converting the value of a bool object to another integer
type yields the value of its full 8-bit representation. If a bool
object holds a representation other than 00000000 or 00000001,
it compares equal to both `true` and `false`.
>
This implies that bool has 1 value bit and 7 padding bits (as
required by C23) and that it has 2 value representations and 254
trap representations. The observed behavior for the non-value
representations is the result of undefined behavior. (gcc -std=c23
sets __STDC_VERSION__ to 202000L, not 202311L. The documentation
acknowledges that support for C23 is experimental and incomplete.)
>
With clang 19.1.4, with "-std=c23", the behavior is consistent
with bool having no non-value representations. The 7 padding bits
do not contribute to the value of a bool object. Any bool object
with 0 as the low-order bit is treated as false in a condition and
yields 0 when converted to another integer type. Any bool object
with 1 as the low-order bit is treated as true, and yields 1 when
converted to another integer type. I presume the intent is for bool
to have 256 value representations and no non-value representations
(with the padding bits ignored as required), but it's also consistent
with bool having non-value representations and the observed behavior
being undefined. It's not possible to determine with a test program
whether the output is the result of undefined behavior or not.
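>
The two observed behaviors can be modelled on plain unsigned char
values, with no bool object involved and therefore no undefined
behavior. This is only a sketch of the output reported above for
gcc 14.2.0 and clang 19.1.4, not of anything either compiler
documents; all function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* Model of the gcc observations: any non-zero byte is truthy, and
   conversion to an integer yields the whole stored byte. */
static int gcc_like_truthy(unsigned char repr) { return repr != 0; }
static int gcc_like_as_int(unsigned char repr) { return repr; }

/* Model of the clang observations: only the low-order bit counts. */
static int clang_like_truthy(unsigned char repr) { return (repr & 1u) != 0; }
static int clang_like_as_int(unsigned char repr) { return repr & 1u; }

/* Count the byte values on which the two models disagree about
   truthiness (the non-zero values whose low-order bit is 0). */
static unsigned count_truthiness_disagreements(void)
{
    unsigned count = 0;
    for (unsigned r = 0; r <= UCHAR_MAX; r++) {
        const unsigned char repr = (unsigned char)r;
        /* A set low bit implies a non-zero byte, so the gcc-like
           model must agree whenever the clang-like model says true. */
        if (clang_like_truthy(repr))
            assert(gcc_like_truthy(repr));
        if (gcc_like_truthy(repr) != clang_like_truthy(repr))
            count++;
    }
    return count;
}
```

The models agree on 00000000 and 00000001 and differ only on the
other patterns, which is exactly where the standard permits
differences if those patterns are non-value representations.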
>
As far as I can tell, the question of whether bool has non-value
representations is unspecified but not implementation-defined,
meaning that an implementation is not required to document its
choice.
6.2.6.1 paragraph 2 says objects other than bitfields are composed
of contiguous sequences of one or more bytes, the number, order,
and encoding of which are either explicitly specified or
implementation-defined. Which object representations are legal
values and which are non-value/trap representations should be
part of the encoding, and hence implementation-defined.
#include <stdio.h>
#include <string.h>
#include <limits.h>
#if __STDC_VERSION__ < 202311L
#include <stdbool.h>
#endif

int main(void) {
    printf("__STDC_VERSION__ = %ldL\n", __STDC_VERSION__);
#if __STDC_VERSION__ < 202311L
    puts("Older than C23, using <stdbool.h>");
#else
    puts("C23 or later, using bool directly");
#endif
    printf("sizeof (unsigned char) = %zu, sizeof (bool) = %zu\n",
           sizeof (unsigned char), sizeof (bool));

    /* Show how false and true are stored. */
    const bool no = false;
    const bool yes = true;
    unsigned char uc;
    memcpy(&uc, &no, 1);
    printf("false is represented as %d\n", (int)uc);
    memcpy(&uc, &yes, 1);
    printf("true is represented as %d\n", (int)uc);

    /* Store every possible byte value in a bool and read it back.
       Reading b may have undefined behavior if the byte is a
       non-value representation. */
    for (int i = 0; i <= UCHAR_MAX; i ++) {
        const unsigned char uc = i;
        bool b;
        memcpy(&b, &uc, 1);
        const unsigned char value = b;
        printf("uc = 0x%02x b = 0x%02x b is %s, b%sfalse, b%strue\n",
               (unsigned)uc,
               (unsigned)value,
               b ? "truthy" : "falsy ",
               b == false ? "==" : "!=",
               b == true ? "==" : "!=");
    }
}
I was surprised to discover that running this program (as C11,
under gcc 8.4.0) with the last 'false' changed to 'no' and the
last 'true' changed to 'yes' gave a different result, namely,
except for value==0 and value==1 there were no "==" for the
b comparisons.