Subject : Re: Simple string conversion from UCS2 to ISO8859-1
From : richard (at) *nospam* damon-family.org (Richard Damon)
Newsgroups : comp.lang.c
Date : 26. Feb 2025, 05:21:21
Organization : i2pn2 (i2pn.org)
Message-ID : <620d04356778b5162e9f094f77f4ab3d749fab91@i2pn2.org>
References : 1 2 3 4 5 6 7 8 9
User-Agent : Mozilla Thunderbird
On 2/25/25 3:31 PM, Lawrence D'Oliveiro wrote:
> On Tue, 25 Feb 2025 15:53:23 +0100, pozz wrote:
>> ... the standard says UCS2
> Does it mention anything about the surrogates ranges (0xD800 .. 0xDFFF)?
> In order for it to be strict UCS-2, they would have to be forbidden. If
> they are allowed, then that makes it UTF-16.
To my knowledge, UCS-2 doesn't say those codes are "forbidden", just that they are not defined codes.
UCS-2 basically became a legacy encoding when Unicode needed to expand beyond 16 bits. Systems defined to use it basically just treat a UTF-16 surrogate pair as two characters whose meaning they don't know, just as a lot of programs can treat UTF-8 as "ASCII" with some extra codes whose meaning they don't know.
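For the conversion in the subject line, that approach could look roughly like the sketch below. The function name `ucs2_to_latin1` and the choice of '?' as the replacement are my own assumptions, not anything from the standard being discussed. It relies on ISO 8859-1 matching the first 256 Unicode code points, so any code unit above 0xFF, surrogates included, is simply "a character I can't represent".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n UCS-2 code units to ISO 8859-1.
 * dst must have room for n bytes. Code units above 0xFF (which
 * includes the surrogate range 0xD800..0xDFFF) have no Latin-1
 * equivalent and are replaced with '?'.
 * Returns the number of code units that were replaced. */
size_t ucs2_to_latin1(const uint16_t *src, size_t n, unsigned char *dst)
{
    size_t replaced = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] <= 0xFF) {
            /* Latin-1 is the first 256 code points: copy the value. */
            dst[i] = (unsigned char)src[i];
        } else {
            /* Unknown to us: no special case for surrogates needed. */
            dst[i] = '?';
            replaced++;
        }
    }
    return replaced;
}
```

Note that a surrogate pair comes out as two '?' characters, which is exactly the "two unknown characters" behavior; whether that is acceptable depends on what the output is used for.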
The ignorance-is-bliss method works well for a number of tasks. You just need to alter strings only at points you "understand", and avoid needing to actually count characters (which is hard to get totally right in Unicode anyway).
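As a small illustration of altering a string only at points you understand, here is a sketch that edits UTF-8 text bytewise without decoding it. The name `replace_ascii` is made up for this example. It works because every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte that compares equal to an ASCII character really is that character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace every ASCII byte `from` with the ASCII
 * byte `to` in a NUL-terminated UTF-8 string, in place.
 * Both must be non-NUL ASCII (0x01..0x7F); otherwise nothing is done.
 * Returns the number of bytes replaced. */
size_t replace_ascii(char *s, char from, char to)
{
    unsigned char f = (unsigned char)from;
    unsigned char t = (unsigned char)to;
    size_t count = 0;

    /* Refuse anything we do not "understand": NUL or non-ASCII bytes. */
    if (f == 0 || f >= 0x80 || t == 0 || t >= 0x80)
        return 0;

    /* Bytes >= 0x80 belong to multi-byte sequences and never match f,
     * so they pass through untouched. */
    while ((s = strchr(s, from)) != NULL) {
        *s++ = to;
        count++;
    }
    return count;
}
```

The function never needs to know where one character ends and the next begins, and it never counts characters, only bytes.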