On 14/08/2024 18:40, Thiago Adams wrote:
On 14/08/2024 14:07, Bart wrote:
...
That Chinese ideogram occupies 4 bytes. It is impossible for 'ab' to clash with some other Unicode character.
The objective is :
>
>
My suggestion again. I am using string but imagine this working with bytes from file.
>
>
#include <stdio.h>
#include <assert.h>

int get_value(const char* s0)
{
    const char * s = s0;
    int value = 0;
    int uc;
    s = utf8_decode(s, &uc);
    while (s)
    {
        if (uc < 0x007F)
        {
            //multichar formula
            value = value*256+uc;
        }
        else
        {
            //single char
            value = uc;
            break; //check if there is more then error..
        }
        s = utf8_decode(s, &uc);
    }
    return value;
}

int main(){
    printf("%d\n", get_value(u8"×"));
    printf("%d\n", get_value(u8"ab"));
}

In any case.. my suggestion looks dangerous. But meanwhile this is not well specified in the standard.

I see your problem. You're mixing things up.

gcc will combine BYTE values together (by shifting by 8 bits or multiplying by 256), including the individual bytes that represent UTF8.
You are combining ONLY ASCII bytes, and comparing the results with 21-bit Unicode values.
That is meaningless. I'm not surprised you get a clash between A*256+B, and some arbitrary Unicode index.
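
To make the clash concrete, here is a minimal sketch. It assumes a `utf8_decode` that returns NULL at the end of the string or on malformed input; the post does not show that function, so the one below is a hypothetical stand-in (it does not reject overlong encodings or surrogates). `get_value` follows the posted suggestion, and `combine_bytes` is an illustrative helper (not from the post) that imitates the byte-wise combining described above for gcc.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical decoder (not shown in the post): reads one UTF-8 sequence
   from s, stores the code point in *out and returns a pointer past it.
   Returns NULL at the terminating NUL or on a malformed sequence. */
static const char *utf8_decode(const char *s, int *out)
{
    const unsigned char *p = (const unsigned char *)s;
    int cp, len, i;

    if (p[0] == 0) return NULL;
    if (p[0] < 0x80)                { cp = p[0];        len = 1; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; len = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; len = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; len = 4; }
    else return NULL;

    for (i = 1; i < len; i++) {
        /* A NUL or any non-continuation byte ends decoding here. */
        if ((p[i] & 0xC0) != 0x80) return NULL;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *out = cp;
    return s + len;
}

/* The posted suggestion: ASCII characters are combined base 256,
   anything else yields its code point directly. */
static int get_value(const char *s0)
{
    const char *s = s0;
    int value = 0;
    int uc;

    s = utf8_decode(s, &uc);
    while (s) {
        if (uc < 0x007F) {
            value = value * 256 + uc;   /* multichar formula */
        } else {
            value = uc;                 /* single char */
            break;
        }
        s = utf8_decode(s, &uc);
    }
    return value;
}

/* Illustrative helper: combine the raw bytes, 8 bits at a time,
   without decoding UTF-8 first. Intended for at most 3 bytes here. */
static int combine_bytes(const char *s)
{
    unsigned int value = 0;

    for (; *s; s++)
        value = (value << 8) | (unsigned char)*s;
    return (int)value;
}
```

With these definitions, `get_value("ab")` is 0x61*256 + 0x62 = 0x6162, and `get_value("\xE6\x85\xA2")` (the UTF-8 bytes of U+6162) is also 0x6162, so the two inputs are indistinguishable. Combining raw bytes instead gives 0x6162 for "ab" but 0xE685A2 for the three-byte sequence, so that scheme keeps them apart.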