Subject : Re: Character non-equivalence, was Byte Addressability And Beyond
From : johnl (at) *nospam* taugh.com (John Levine)
Newsgroups : comp.arch
Date : 07. Jun 2024, 22:26:03
Organization : Taughannock Networks
Message-ID : <v3vttb$5tk$1@gal.iecc.com>
References : 1 2 3 4
User-Agent : trn 4.0-test77 (Sep 1, 2010)
It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
> Eeewww... I didn't even think of that.
> What does one do about them? You can't treat them as equivalent in a
> string compare... the user might want the first B and not second B.
People keep rediscovering that when you're using Unicode, nothing is
simple. One of its normalization forms is NFKC, which uses composed
versions of accented characters and applies compatibility equivalence
rules to turn some kinds of characters that look similar into a single
form. That solves some of the problems but not even close to all of them.
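A minimal sketch of what NFKC does and does not fold together, using Python's standard unicodedata module. The helper name nfkc_equal and the sample characters are my own choices for illustration, not anything from the thread.

```python
import unicodedata

def nfkc_equal(a: str, b: str) -> bool:
    """Compare two strings after NFKC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

composed = "\u00e9"     # e-acute as a single code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
fullwidth_b = "\uff22"  # FULLWIDTH LATIN CAPITAL LETTER B
cyrillic_ve = "\u0412"  # CYRILLIC CAPITAL LETTER VE, which looks like Latin B

# Composed and decomposed accents normalize to the same string.
print(nfkc_equal(composed, decomposed))
# The fullwidth B is a compatibility variant of Latin B, so it folds too.
print(nfkc_equal("B", fullwidth_b))
# The Cyrillic lookalike is a different letter, so NFKC leaves it alone.
print(nfkc_equal("B", cyrillic_ve))
```

The last case is the kind of thing normalization does not address: two letters that look the same on screen but belong to different scripts stay distinct.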
The rules about whether two strings are upper/lower case equivalent
depend on the language and sometimes even the local version of the
language, e.g. French French and Quebec French have different
conventions about accented capital letters.
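As a rough illustration of language-dependent case rules, here is a sketch using Turkish, which is a different example from the French one above and is my own choice. Python's built-in str.lower() is not locale-aware, so the lower_turkish helper below is hypothetical and handles only the dotted and dotless I.

```python
def lower_default(s: str) -> str:
    """Locale-independent lowercasing as provided by Python's str.lower()."""
    return s.lower()

def lower_turkish(s: str) -> str:
    """Hypothetical helper: lowercase with the Turkish dotted/dotless I rule.

    In Turkish, capital I lowercases to dotless i (U+0131), and capital
    dotted I (U+0130) lowercases to ordinary i. Only that one tailoring is
    handled here; a real application would use a locale-aware library.
    """
    s = s.replace("\u0130", "i")   # dotted capital I -> i
    s = s.replace("I", "\u0131")   # plain capital I -> dotless i
    return s.lower()

word = "ISTANBUL"
# The same input lowercases to two different strings depending on the rule set.
print(lower_default(word))
print(lower_turkish(word))
print(lower_default(word) == lower_turkish(word))
```

So a case-insensitive compare that is right for English text gives the wrong answer for Turkish text, and the string alone does not tell you which rule the user expects.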
The only thing I can say with confidence is that any rule that starts
with "You can just ..." is wrong.
-- 
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly