On 14/08/2024 18:40, Thiago Adams wrote:
> On 14/08/2024 15:12, Bart wrote:
>> On 14/08/2024 14:07, Bart wrote:
>>> That Chinese ideogram occupies 4 bytes. It is impossible for 'ab' to clash with some other Unicode character.
>>
>> Obviously that can't work, for example because two printable ASCII characters with codes 32 to 96 will have values from 1024 to 9216 when combined in a character literal. Those are going to clash with Unicode characters with those values.
>
>
> My suggestion again. I am using a string here, but imagine this working with bytes read from a file.
>
>
> #include <stdio.h>
> #include <assert.h>
> ...
> int get_value(const char* s0)
> {
>     const char * s = s0;
>     int value = 0;
>     int uc;
>     s = utf8_decode(s, &uc);
>     while (s)
>     {
>         if (uc < 0x007F)
>         {
>             // multichar formula: shift the earlier characters up by one byte
>             value = value*256+uc;
>         }
>         else
>         {
>             // single char: the value is the code point itself
>             value = uc;
>             break; // if more characters follow, that should be an error
>         }
>         s = utf8_decode(s, &uc);
>     }
>     return value;
> }
>
> int main(){
>     printf("%d\n", get_value(u8"×"));
>     printf("%d\n", get_value(u8"ab"));
> }
I see your problem. You're mixing things up.
> The objective is:
> - make single characters have their Unicode value without having to use U''
> - allow more than one character, like 'ab', in some cases where each character is less than 0x007F. This can break code, for instance '¼¼',
>   but I suspect people are not using it this way (I hope).
>
> In any case, my suggestion looks dangerous. But meanwhile this is not well specified in the standard.

It wasn't well specified even when dealing with 100% ASCII. For example, 'AB' might have the hex value 0x4142 on one compiler, 0x4241 on another, maybe just 0x41 or 0x42 on a third, or even 0x41410000.