Re: multi bytes character - how to make it defined behavior?

Subject: Re: multi bytes character - how to make it defined behavior?
From: richard (at) *nospam* damon-family.org (Richard Damon)
Newsgroups: comp.lang.c
Date: 14 Aug 2024, 05:44:24
Organization: i2pn2 (i2pn.org)
Message-ID: <1ffb2244967a28423c968f4b4a9fec5a2553f356@i2pn2.org>
References: 1
User-Agent: Mozilla Thunderbird
On 8/13/24 10:45 AM, Thiago Adams wrote:
> static_assert('×' == 50071);
> GCC - warning multi byte
> CLANG - error character too large
> I think instead of "multi bytes" we need "multi characters" - not bytes.
> We decode utf8 then we have the character to decide if it is multi char or not.
> decoding '×' would consume bytes 195 and 151; the result is the decoded Unicode value of 215.
> It is not multi byte : 256*195 + 151 = 50071
> On the other hand 'ab' is "multi character" resulting
> 256 * 'a' + 'b' = 256*97+98 = 24930
> One consequence is that
> 'ab' == '𤤰'
> But I don't think this is a problem. At least everything is defined.
When you use single quotes by themselves ('x'), you are specifying a character in the narrow execution character set, typically ASCII, but possibly some other 8-bit character encoding. Such a constant cannot specify an extended character beyond that set.
You can (if the implementation allows it) place multiple characters in the constant to get an integer value with those characters packed; the resulting value is implementation-defined.
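To put numbers on that packing, here is a minimal sketch of the arithmetic from the quoted post. The helper name pack2 is made up for illustration, and the "shift left by 8, then OR" rule is an assumption about how a given compiler packs the characters (it matches the 256 * 'a' + 'b' result quoted above); the standard only says the value is implementation-defined.

```c
#include <assert.h>

/* Hypothetical helper (not a standard function): packs two narrow
   characters the way the quoted post describes, i.e. 256 * hi + lo.
   Real compilers may pack differently or reject the constant. */
static int pack2(unsigned char hi, unsigned char lo)
{
    return (hi << 8) | lo;
}
```

With an ASCII execution character set, pack2('a', 'b') gives 24930, and pack2(195, 151) gives 50071, which is the value you get when the two UTF-8 bytes of '×' are treated as two separate characters.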
When you use double quotes by themselves ("xx"), you are specifying a string of these narrow characters, although this form might allow for multi-byte encodings of some characters, as is done with UTF-8.
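As a rough illustration of the multi-byte case, the sketch below decodes a two-byte UTF-8 sequence by hand. The function names are hypothetical, the decoder only handles the two-byte form (U+0080 to U+07FF), and the string is spelled with hex escapes so that it does not depend on the source file's encoding.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: decodes one two-byte UTF-8 sequence.
   Returns the code point, or -1 if the bytes are not a valid
   two-byte sequence (wrong lead/continuation bits, or overlong). */
static long decode_utf8_2(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;
    long cp = ((long)(b0 & 0x1F) << 6) | (b1 & 0x3F);
    return cp >= 0x80 ? cp : -1;
}

/* Length in bytes of a narrow string holding U+00D7 encoded as UTF-8. */
static size_t times_sign_len(void)
{
    return strlen("\xC3\x97");
}
```

Here decode_utf8_2(0xC3, 0x97) yields 215 (U+00D7), while the narrow string holding that one character is two bytes long.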
You can specify wide character constants with the syntax L'x', u'x', or U'x'.
L'x' will give you whatever the implementation calls its "wide character set". This MIGHT be UCS-2/UTF-16 or UCS-4/UTF-32 encoded, but doesn't need to be.
The u'x' form will always be UCS-2/UTF-16, and U'x' will always be UCS-4/UTF-32 (strictly, before C23 this is only promised when the implementation defines __STDC_UTF_16__ and __STDC_UTF_32__).
As with the plain 'x' form, the result for a single character cannot be a multi-unit value, so u'x' can't generate a surrogate pair for a single source character.
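A small sketch of what the prefixed constants give you, under the assumption that char16_t is UTF-16 and char32_t is UTF-32 (true for C23, and for earlier implementations that define the corresponding macros). The function names are invented for the example, and the characters are written as universal character names so the result does not depend on the source encoding.

```c
#include <assert.h>

/* Both constants name U+00D7 (MULTIPLICATION SIGN). Under the UTF-16 /
   UTF-32 assumption stated above, each is a single unit with value 215. */
static unsigned long times_as_u16(void) { return u'\u00D7'; }
static unsigned long times_as_u32(void) { return U'\u00D7'; }

/* U+24930 fits in one UTF-32 unit, so U'...' can hold it. The u'...'
   spelling of the same character would need a surrogate pair, so it is
   deliberately left out; compilers are expected to diagnose it. */
static unsigned long cjk_as_u32(void) { return U'\U00024930'; }
```

Note that the decoded value of '×' asked about in the original post (215) is what the u and U forms produce, without any packing of UTF-8 bytes.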
Change the ' to a " and you get wide strings, just like the characters, but now u"xx" and L"xx" can generate characters that use surrogate pairs (or other multi-part encodings for L"xx").
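For the string case, here is a sketch of how a character outside the BMP ends up as two UTF-16 code units. The encoder utf16_encode is a hypothetical helper written for this post, not a library function, and literal_units assumes the implementation encodes u"..." literals as UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written
   (0 if cp is not a valid Unicode scalar value). */
static size_t utf16_encode(uint_least32_t cp, uint_least16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint_least16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint_least16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint_least16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Code units in the u"..." literal for U+24930, not counting the
   terminating null unit. */
static size_t literal_units(void)
{
    return sizeof(u"\U00024930") / sizeof(u"\U00024930"[0]) - 1;
}
```

For U+24930 the encoder produces the pair 0xD852 0xDD30, and the u"..." literal for the same character is two code units long, which is exactly what a single u'x' constant has no room for.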

