On 5/9/2025 12:52 PM, Bonita Montero wrote:
> On 07.05.2025 at 12:08, BGB wrote:
>> If you know one side is UTF-8 and the other is UTF-16, then
>> conversion does not need to know or care which locale is in effect.
>
> Unicode hasn't locales, i.e. alternative meanings for the same code
> point. Even the characters from 128 to 255 are fixed to Latin-1.
A locale is not an encoding; nor is it a codepage.
>
A locale is a set of formatting and language-specific rules to apply.
>
Historically, a locale was often associated with the use of a specific
code page, but code pages do not apply to Unicode. Even so, various
language-specific rules still exist.
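To illustrate that encoding conversion is independent of locale, here is a minimal sketch in C of UTF-8 to UTF-16 conversion, which is nothing more than arithmetic on code points. The function name `utf8_to_utf16` and its error convention (returning `(size_t)-1`) are assumptions made for this example; they do not come from the thread or from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode n bytes of UTF-8 from src into UTF-16 code
   units in dst (capacity cap). Returns the number of code units written,
   or (size_t)-1 on malformed input or insufficient space.
   No locale is consulted anywhere. */
size_t utf8_to_utf16(const unsigned char *src, size_t n,
                     uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char b = src[i];
        uint32_t cp;
        size_t len, k;

        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return (size_t)-1;            /* stray continuation or invalid lead byte */

        if (len > n - i) return (size_t)-1; /* truncated sequence */
        for (k = 1; k < len; k++) {
            if ((src[i + k] & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(src[i + k] & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;             /* overlong, out of range, or surrogate */
        i += len;

        if (cp < 0x10000) {
            if (o + 1 > cap) return (size_t)-1;
            dst[o++] = (uint16_t)cp;
        } else {
            /* Encode as a surrogate pair. */
            if (o + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[o++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return o;
}
```

The same bytes always yield the same code units, whatever the user's language settings are; locale only becomes relevant once you start interpreting the text (case, collation, formatting).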
>
For things like case-folding, you may still need to care about which
language (AKA, locale) is in effect, as some conversions may apply to
some languages but not others.
>
Some letters case-map differently depending on the language; ligatures
may also be in effect (these may compose, decompose, or map to other
ligatures), etc.
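A well-known instance of language-dependent case mapping is the Turkic dotted/dotless "i". The following is a minimal sketch in C, assuming a hypothetical interface that takes a language tag string ("tr", "az", or anything else); the function names are made up for illustration, and the default branch only handles ASCII.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if the language tag uses Turkic i/I rules. */
static int is_turkic(const char *lang)
{
    return lang && (strcmp(lang, "tr") == 0 || strcmp(lang, "az") == 0);
}

/* Uppercase one code point; only ASCII plus the Turkic special cases. */
uint32_t upper_for_lang(uint32_t cp, const char *lang)
{
    if (is_turkic(lang) && cp == 0x0069)
        return 0x0130;                  /* i -> LATIN CAPITAL I WITH DOT ABOVE */
    if (cp == 0x0131)
        return 0x0049;                  /* dotless i -> I */
    if (cp >= 0x0061 && cp <= 0x007A)
        return cp - 0x20;
    return cp;
}

/* Lowercase one code point; only ASCII plus the Turkic special cases. */
uint32_t lower_for_lang(uint32_t cp, const char *lang)
{
    if (is_turkic(lang)) {
        if (cp == 0x0049) return 0x0131; /* I -> dotless i */
        if (cp == 0x0130) return 0x0069; /* dotted capital I -> i */
    }
    if (cp >= 0x0041 && cp <= 0x005A)
        return cp + 0x20;
    return cp;
}
```

So the same code point 0069 uppercases to 0049 under English rules but to 0130 under Turkish rules, even though the encoding is identical in both cases.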
>
>
Or, one just throws a lot of this out and uses a simplified set of
"mostly language neutral" rules.
>
Say, case conversion maps:
Upper: 0061..007A -> 0041..005A
Lower: 0041..005A -> 0061..007A
Upper: 00E0..00FE -> 00C0..00DE (except 00F7, the division sign)
Lower: 00C0..00DE -> 00E0..00FE (except 00D7, the multiplication sign)
... (Add a few more, for Greek / Cyrillic / etc)
>
And, maybe a few special cases, say (*):
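009A <-> 008A (s-caron)
009C <-> 008C (oe ligature)
009E <-> 008E (z-caron)
00FF <-> 009F (y-diaeresis)
*: Assuming the nonstandard convention of placing the Windows-1252
mappings in Unicode space, replacing the C1 controls (0080..009F).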
>
Probably ignore most everything else; it passes through as-is.
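The simplified rules above could be sketched in C roughly as follows. The function names `simple_toupper` and `simple_tolower` are made up for this example. The sketch assumes the Windows-1252-in-C1 convention mentioned in the footnote, and it additionally skips 00D7 and 00F7 (multiplication and division signs), which fall inside the Latin-1 ranges but are not a case pair. Greek, Cyrillic, and other ranges are left out.

```c
#include <stdint.h>

/* Sketch: "mostly language neutral" uppercase mapping for one code point. */
uint32_t simple_toupper(uint32_t c)
{
    if (c >= 0x0061 && c <= 0x007A)
        return c - 0x20;                /* ASCII a..z */
    if (c >= 0x00E0 && c <= 0x00FE && c != 0x00F7)
        return c - 0x20;                /* Latin-1 letters, not the division sign */
    switch (c) {                        /* 1252-in-C1 special cases (assumption) */
    case 0x009A: return 0x008A;
    case 0x009C: return 0x008C;
    case 0x009E: return 0x008E;
    case 0x00FF: return 0x009F;
    }
    return c;                           /* everything else passes through */
}

/* Sketch: the matching lowercase mapping. */
uint32_t simple_tolower(uint32_t c)
{
    if (c >= 0x0041 && c <= 0x005A)
        return c + 0x20;                /* ASCII A..Z */
    if (c >= 0x00C0 && c <= 0x00DE && c != 0x00D7)
        return c + 0x20;                /* Latin-1 letters, not the multiplication sign */
    switch (c) {                        /* 1252-in-C1 special cases (assumption) */
    case 0x008A: return 0x009A;
    case 0x008C: return 0x009C;
    case 0x008E: return 0x009E;
    case 0x009F: return 0x00FF;
    }
    return c;                           /* everything else passes through */
}
```

Because each function looks at a single code point and nothing else, it needs no locale state; the price is that language-specific and one-to-many mappings (such as 00DF, which passes through unchanged here) are simply not handled.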