By that you mean the Unicode index. But you say elsewhere that everything in your source code is UTF8.

On 13/08/2024 21:33, Keith Thompson wrote:
> Bart <bc@freeuk.com> writes:
> [...]
>> What exactly do you mean by multi-byte characters? Is it a literal
>> such as 'ABCD'?
>>
>> I've no idea what C makes of that,
>
> It's a character constant of type int with an implementation-defined
> value. Read the section on "Character constants" in the C standard
> (6.4.4.4 in C17).
>
> (With gcc, its value is 0x41424344, but other compilers can and do
> behave differently.)
>
> We discussed this at some length several years ago.
>
> [...]

"An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single value in the literal encoding (6.2.9) is the numerical value of the representation of the mapped character in the literal encoding interpreted as an integer. The value of an integer character constant containing more than one character (e.g. 'ab'), or containing a character or escape sequence that does not map to a single value in the literal encoding, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int."
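
To make the gcc value quoted above concrete, here is a minimal sketch of the packing that produces 0x41424344 for 'ABCD'. The helper name pack_multichar is hypothetical, and the sketch assumes 8-bit char and at most four characters; it only mirrors the shift-and-or behaviour gcc documents, which the standard leaves implementation-defined, so other compilers may give a different result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any compiler or library).
   Packs up to four characters the way gcc is documented to evaluate a
   multi-character constant: shift the previous value left by the width
   of a char (assumed to be 8 bits here), then or in the next character. */
static uint32_t pack_multichar(const char *s, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | (unsigned char)s[i];
    return v;
}
```

With ASCII input, pack_multichar("ABCD", 4) gives 0x41424344, which matches the gcc value mentioned in the quoted text.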
I am suggesting that we define this:
"The value of an integer character constant containing more than one character (e.g. 'ab'), or containing a character or escape sequence that does not map to a single value in the literal encoding, is implementation-defined."
How?
First, all source code should be UTF-8.
Then I am suggesting we first decode the bytes.
For instance, '×' is encoded as the two bytes 195 and 151 (0xC3 0x97). We consume these 2 bytes, and the UTF-8 decoded value is 215 (U+00D7).
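
The decoding step can be sketched as follows. This is only an illustration of the idea, not how any particular compiler does it; utf8_decode is a hypothetical helper, and rejecting overlong forms and surrogates is my assumption about what a conforming decoder would want.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from any standard library.
   Decodes one UTF-8 sequence from s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0) return 0;

    uint32_t c = s[0];
    size_t len;
    uint32_t min;   /* smallest code point valid for this length */

    if (c < 0x80)                { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; min = 0x10000; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;

    *cp = c;
    return len;
}
```

Fed the two bytes 195 and 151, this consumes both and stores 215 in *cp, which is the value the character constant would have under the proposal.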
Then this is the defined behavior:
static_assert('×' == 215);

This is where you need to decide whether the integer value within '...', AT RUNTIME, represents the Unicode index or the UTF8 sequence.