Re: multi bytes character - how to make it defined behavior?

Subject: Re: multi bytes character - how to make it defined behavior?
From: bc (at) *nospam* freeuk.com (Bart)
Newsgroups: comp.lang.c
Date: 14 Aug 2024, 20:12:34
Organization: A noiseless patient Spider
Message-ID: <v9is2h$i0sd$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 14/08/2024 18:40, Thiago Adams wrote:
On 14/08/2024 14:07, Bart wrote:

That Chinese ideogram occupies 4 bytes. It is impossible for 'ab' to clash with some other Unicode character.
My suggestion again. I am using a string here, but imagine this working with bytes read from a file.
#include <stdio.h>
#include <assert.h>
...
int get_value(const char* s0)
{
    const char* s = s0;
    int value = 0;
    int uc;
    s = utf8_decode(s, &uc);
    while (s)
    {
        if (uc < 0x007F)
        {
            // multi-char formula: shift in each ASCII character
            value = value * 256 + uc;
        }
        else
        {
            // single non-ASCII character: use its code point directly
            value = uc;
            break; // if more input follows, that should be an error
        }
        s = utf8_decode(s, &uc);
    }
    return value;
}

int main()
{
    printf("%d\n", get_value(u8"×"));
    printf("%d\n", get_value(u8"ab"));
}
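
The listing above elides utf8_decode behind the "...". For readers who want to try it, here is a minimal sketch of a decoder with the signature those calls imply. It is a hypothetical stand-in, not the code from the original post, and it assumes the function returns NULL at the terminating NUL or on malformed input, which is what lets the `while (s)` loop terminate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the elided utf8_decode.
   Decodes one code point from s into *pc and returns a pointer just past it.
   Assumption: returns NULL at end of string or on a malformed sequence.
   It does not reject overlong encodings or surrogate code points. */
const char *utf8_decode(const char *s, int *pc)
{
    const unsigned char *p = (const unsigned char *)s;
    int len, i, cp;

    assert(s != NULL && pc != NULL);

    if (p[0] == 0)
        return NULL;                                   /* end of string */

    if (p[0] < 0x80)                { cp = p[0];        len = 1; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; len = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; len = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; len = 4; }
    else
        return NULL;               /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return NULL;           /* truncated or malformed sequence */
        cp = (cp << 6) | (p[i] & 0x3F);
    }

    *pc = cp;
    return s + len;
}
```

With this stand-in, get_value steps through "ab" one ASCII character at a time and stops when the decoder reaches the terminating NUL.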
I see your problem. You're mixing things up.
gcc combines BYTE values (shifting by 8 bits, or multiplying by 256), and that includes the individual bytes of a UTF-8 sequence.
You are combining ONLY the ASCII bytes, and then comparing the results with 21-bit Unicode code points.
That is meaningless. I'm not surprised you get a clash between A*256+B and some arbitrary Unicode code point.
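
To make the clash concrete, here is a small sketch comparing the two schemes. The names pack_bytes and show_clash are made up for illustration, and the code assumes an ASCII execution character set, so 'a' is 0x61 and 'b' is 0x62. Packing "ab" gives 0x61*256 + 0x62 = 0x6162, which is also the code point U+6162 in the CJK Unified Ideographs block. When every byte is packed, that character's UTF-8 encoding (E6 85 A2) gives 0xE685A2 instead, so the two inputs stay distinct.

```c
#include <assert.h>
#include <stdio.h>

/* Byte-wise packing: every byte, ASCII or not, is shifted in as
   value*256 + byte. Assumption: the input is at most 4 bytes long,
   so the result fits in 32 bits. */
unsigned long pack_bytes(const char *s)
{
    unsigned long value = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        value = value * 256 + (unsigned char)*s;
    return value;
}

/* Prints the value of "ab" and of U+6162 under both schemes. */
void show_clash(void)
{
    const char *ab  = "ab";
    const char *cjk = "\xE6\x85\xA2";       /* UTF-8 encoding of U+6162 */
    unsigned long code_point = 0x6162UL;    /* what a UTF-8 decoder yields for cjk */

    /* All bytes packed: the two inputs get different values. */
    printf("packed bytes, \"ab\":   0x%lX\n", pack_bytes(ab));
    printf("packed bytes, U+6162: 0x%lX\n", pack_bytes(cjk));

    /* Mixed scheme: ASCII is packed, non-ASCII uses the code point. */
    printf("mixed scheme, \"ab\":   0x%lX\n", pack_bytes(ab));
    printf("mixed scheme, U+6162: 0x%lX\n", code_point);
}
```

Calling show_clash() from main reports 0x6162 and 0xE685A2 for the byte-packed values, and 0x6162 twice for the mixed scheme, which is the collision.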

