Subject : Re: The integral type 'byte' (was Re: Suggested method for returning a string from a C program?)
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.lang.c
Date : 25. Mar 2025, 10:38:31
Organization : A noiseless patient Spider
Message-ID : <vrttin$321rm$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 24/03/2025 23:57, Keith Thompson wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>> On 21.03.2025 00:10, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
>>> [...]
>>>> Look at this one for example:
>>>>
>>>>     typedef uint8_t byte; // from arduino.h
>>>>
>>>> I can only think of one reason this exists, which is that 'byte' is a far
>>>> nicer denotation.
>>>
>>> I agree in this case. "byte" documents what the type is intended for.
>>
>> I disagree on both above expressed opinions in more than one way.
>>
>> Byte is a bad term to denote a quantity or an intention. Formerly
>> a "Byte" was used to carry characters; its size could be anything
>> from 5 to 9 bit. There was a reason why in international standards
>> documents there's the 'octet' introduced to unambiguously hint to
>> an 8-bit quantity. Neither is it good, as we see in practice, to
>> assume a 'byte' (whatever it actually is) to be able to carry a
>> character, not even 'char' or 'unsigned char' seem to be able to
>> accomplish that given the "wide character" types in the context of
>> Unicode (16 bit, 32 bit) characters and (variable-length) UTF-8
>> encodings.
>
> "Byte" is a defined term in C. The definition is "addressable
> unit of data storage large enough to hold any member of the basic
> character set of the execution environment", but other parts of
> the standard make it clear that sizeof yields a count of bytes,
> that there are CHAR_BIT bits in a byte, that types char, unsigned
> char, and signed char are all one byte in size, and that CHAR_BIT
> is at least 8 but can be bigger. I'm aware of the history, but if
> I defined a "byte" type in C that's what I would mean.
>
> It is IMHO unfortunate that "bytes" and "characters" are conflated
> in C. This was done before multi-byte or wide characters were a
> thing, but we're stuck with it.
>
> The definition above:
>
>     typedef uint8_t byte; // from arduino.h
>
> is IMHO not ideal. Various language rules taken together imply
> that uint8_t *either* is exactly one byte *or* does not exist (if
> CHAR_BIT>8), but unsigned char is directly specified to be exactly
> one byte. But it's system-specific, so I wouldn't worry about it
> or advocate changing it.
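
The implication described in the quoted text can be checked at compile time. This is only a minimal sketch, assuming a C11 compiler (for _Static_assert); the helper bits_in() is a hypothetical name used for illustration and does not come from any header discussed in the thread.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof counts in bytes, and unsigned char is one byte by definition. */
_Static_assert(sizeof(unsigned char) == 1, "unsigned char is exactly one byte");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

#ifdef UINT8_MAX
/* uint8_t exists here. It is exactly 8 bits wide with no padding bits,
   and no object is smaller than one byte, so CHAR_BIT must be 8. */
_Static_assert(CHAR_BIT == 8, "uint8_t implies 8-bit bytes");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");
#endif

/* Hypothetical helper: number of bits in an object of n_bytes bytes. */
size_t bits_in(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

On an implementation with CHAR_BIT > 8, UINT8_MAX is not defined, so the two guarded assertions are skipped and only the unsigned char checks remain.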
Personally, I think "typedef uint8_t byte;" /is/ ideal - and much better than "typedef unsigned char byte;" would be. I say that as someone who works in a field with bits and bytes more than ints and doubles.
While the term "byte" has historically been used for different sizes, and has a specific meaning in the C standards, the meaning has changed and solidified over time. Using it to refer to anything other than 8-bit octets is archaic - it's like using the word "computer" to refer to a person who does calculations. That's fine for a history lesson (or the excellent film "Hidden Figures"), but not for practicality. And the use of the C-specific version of the term "byte" in the C standards should be seen as archaic - it is regularly jumbled with "char" and "unsigned char".
There are C implementations for devices that have a "byte" greater than 8 bits. I've used a few of them (DSP devices, rather than dinosaurs). In all their documentation, information, SDK header files, I have never seen "byte" in reference to addressable memory units. Instead, they are considered "word-addressable" devices - where "word" is far more flexible in size. (Some systems have 16-bit words, some 24-bit words, some 32-bit words, etc.) Any reference to "byte" is invariably 8-bit.
Of course, the number of C programmers who ever see or target a device which does not have CHAR_BIT == 8 is tiny. Although there are DSP devices with CHAR_BIT > 8, they are /very/ niche. DSP programming is a highly specialised field, mostly done by small groups in tight cooperation with the manufacturers - and much of it is done with MATLAB or other such tools, rather than direct C programming. Almost no C code is written with the expectation that it will run on CHAR_BIT == 8 machines and also CHAR_BIT > 8 machines.
Thus pretty much any programmer in the last 50 years sees "byte" as synonymous with 8-bit octet, including C programmers, and for the last 30 years or so it has been the ISO standard definition of the term. Allowing even the possibility of something else - as you would have by typedef'ing unsigned char - is doing the user a disservice. The term "byte", without very clear additional context, should be used /solely/ to refer to 8-bit sizes. So the correct type to use for typedef'ing it would be "uint8_t". Code that uses a type named "byte", when compiled for a target with CHAR_BIT > 8, /should/ fail hard on the compile.
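
As a rough illustration of the "fail hard on the compile" point, here is a minimal sketch. The helper split_u16() is a hypothetical name, not something from arduino.h; it simply stands in for code that depends on a byte being exactly 8 bits.

```c
#include <stdint.h>

/* uint8_t is an optional type: an implementation with CHAR_BIT > 8
   cannot provide it, so this typedef fails to compile there. */
typedef uint8_t byte;

/* Hypothetical helper: split a 16-bit value into two 8-bit bytes,
   high byte first. It relies on byte being exactly 8 bits wide. */
void split_u16(uint16_t value, byte out[2])
{
    out[0] = (byte)(value >> 8);
    out[1] = (byte)(value & 0xFFu);
}
```

With "typedef unsigned char byte;" the same code would still compile on a target with 16-bit chars, and the mismatch between the name and the actual width would go unreported.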