Re: multi bytes character - how to make it defined behavior?

Subject: Re: multi bytes character - how to make it defined behavior?
From: thiago.adams (at) *nospam* gmail.com (Thiago Adams)
Newsgroups: comp.lang.c
Date: 14 Aug 2024, 20:28:10
Organization: A noiseless patient Spider
Message-ID: <v9isvq$i0fs$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 14/08/2024 15:12, Bart wrote:
> On 14/08/2024 18:40, Thiago Adams wrote:
>> On 14/08/2024 14:07, Bart wrote:
>>> That Chinese ideogram occupies 4 bytes. It is impossible for 'ab' to clash with some other Unicode character.
>>
>> My suggestion again. I am using a string here, but imagine this working with bytes read from a file.

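#include <stdio.h>
#include <assert.h>
 ...
/* utf8_decode() is in the elided part. From its use here it presumably
   stores the next code point in *uc and returns the advanced pointer,
   or a null pointer once the string is exhausted. */
int get_value(const char *s0)
{
    const char *s = s0;
    int value = 0;
    int uc;

    s = utf8_decode(s, &uc);
    while (s)
    {
        if (uc < 0x007F)
        {
            // multichar formula: accumulate ASCII characters base 256
            value = value * 256 + uc;
        }
        else
        {
            // single char: the value is the code point itself
            value = uc;
            break; // TODO: report an error if more characters follow
        }
        s = utf8_decode(s, &uc);
    }
    return value;
}

int main(void)
{
    printf("%d\n", get_value(u8"×"));
    printf("%d\n", get_value(u8"ab"));
}
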
> I see your problem. You're mixing things up.

The objective is:
  - make single characters have their Unicode value without having to use U''
  - allow more than one character, like 'ab', in some cases where each character is less than 0x007F. This can break code that uses, for instance, '¼¼',
    but I suspect people are not using it this way (I hope).

> gcc will combine BYTE values together (by shifting by 8 bits or multiplying by 256), including the individual bytes that represent UTF8.
>
> You are combining ONLY ASCII bytes, and comparing the results with 21-bit Unicode values.
>
> That is meaningless. I'm not surprised you get a clash between A*256+B, and some arbitrary Unicode index.
 
In any case, my suggestion looks dangerous. But meanwhile, this is not well specified in the standard.
