Subject : Re: multi bytes character - how to make it defined behavior?
From : thiago.adams (at) *nospam* gmail.com (Thiago Adams)
Newsgroups : comp.lang.c
Date : 14 Aug 2024, 19:28:10

Other headers

Organization : A noiseless patient Spider
Message-ID : <v9isvq$i0fs$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11
User-Agent : Mozilla Thunderbird

On 14/08/2024 15:12, Bart wrote:

On 14/08/2024 18:40, Thiago Adams wrote:
On 14/08/2024 14:07, Bart wrote:

That Chinese ideogram occupies 4 bytes. It is impossible for 'ab' to clash with some other Unicode character.

My suggestion again. I am using a string here, but imagine this working with bytes read from a file.

#include <stdio.h>
#include <assert.h>
...
int get_value(const char* s0)
{
    const char * s = s0;
    int value = 0;
    int uc;
    s = utf8_decode(s, &uc);   // decode the first code point
    while (s)
    {
        if (uc < 0x007F)
        {
            // multichar formula: pack ASCII characters, 8 bits each
            value = value*256+uc;
        }
        else
        {
            // single char: the value is the Unicode code point
            value = uc;
            break; // if more characters follow, that should be an error
        }
        s = utf8_decode(s, &uc);
    }
    return value;
}

int main(){
    printf("%d\n", get_value(u8"×"));
    printf("%d\n", get_value(u8"ab"));
}
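
The listing above elides utf8_decode, so it does not compile on its own. What follows is only a sketch of one way to fill the gap, not the decoder the post actually used: sketch_utf8_decode and sketch_get_value are hypothetical names, and the decoder is assumed to return NULL at the end of the string or on malformed input. It does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Hypothetical stand-in for the elided utf8_decode(): decodes one code
   point from s into *out and returns a pointer just past it, or NULL at
   the end of the string or on malformed input. */
const char *sketch_utf8_decode(const char *s, int *out)
{
    const unsigned char *p = (const unsigned char *)s;
    int n;                       /* number of continuation bytes expected */

    if (p[0] == 0) return NULL;  /* end of string */
    if (p[0] < 0x80)                { *out = p[0];        n = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { *out = p[0] & 0x1F; n = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { *out = p[0] & 0x0F; n = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { *out = p[0] & 0x07; n = 3; }
    else return NULL;            /* stray continuation or invalid lead byte */

    for (int i = 1; i <= n; i++) {
        /* A NUL byte fails this check too, so we never read past the end. */
        if ((p[i] & 0xC0) != 0x80) return NULL;
        *out = (*out << 6) | (p[i] & 0x3F);
    }
    return s + n + 1;
}

/* Same rule as get_value in the post, built on the hypothetical decoder. */
int sketch_get_value(const char *s)
{
    int value = 0, uc;
    while ((s = sketch_utf8_decode(s, &uc)) != NULL) {
        if (uc < 0x007F) {
            value = value * 256 + uc;  /* pack ASCII characters */
        } else {
            value = uc;                /* single non-ASCII code point */
            break;
        }
    }
    return value;
}
```

Under these assumptions, sketch_get_value("\xC3\x97") (the UTF-8 bytes of ×, U+00D7) gives 215 and sketch_get_value("ab") gives 97*256+98 = 24930. As in the original, more than four ASCII characters would overflow a 32-bit int.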
I see your problem. You're mixing things up.

The objectives are:
- make single characters have their Unicode value without having to use U''
- allow more than one char, like 'ab', in the cases where each character is less than 0x007F. This can break code that uses, for instance, '¼¼',
but I suspect people are not using it this way (I hope).

gcc will combine BYTE values together (by shifting by 8 bits or multiplying by 256), including the individual bytes that represent UTF-8.
You are combining ONLY ASCII bytes, and comparing the results with 21-bit Unicode values.
That is meaningless. I'm not surprised you get a clash between A*256+B, and some arbitrary Unicode index.
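
The byte-combining rule described here can be sketched as a small function. This is an illustration of the rule as stated, not gcc's actual implementation, and pack_bytes is a hypothetical name.

```c
/* Combine every byte of the string, 8 bits at a time. UTF-8 lead and
   continuation bytes are treated like any other byte. */
unsigned pack_bytes(const char *s)
{
    unsigned v = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        v = v * 256 + *p;
    return v;
}
```

With this rule, pack_bytes("ab") is 0x6162 (24930), while the CJK ideograph U+6162, whose UTF-8 bytes are E6 85 A2, packs to 0xE685A2. The two differ as long as bytes are compared with bytes. The clash only appears when the ASCII-packed value 0x6162 is compared with the code point U+6162 itself, which is the mix of the two schemes being objected to.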

In any case, my suggestion looks dangerous. But meanwhile, this is not well specified in the standard.
