Subject: Re: python text, Byte Addressability And Beyond
From: ldo (at) *nospam* nz.invalid (Lawrence D'Oliveiro)
Newsgroups: comp.arch
Date: 03 Jun 2024, 09:11:10
Organization: A noiseless patient Spider
Message-ID: <v3jtqu$3qdv3$1@dont-email.me>
User-Agent: Pan/0.158 (Avdiivka; )

On Thu, 30 May 2024 12:47:35 GMT, Anton Ertl wrote:

> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>
>> On Wed, 29 May 2024 08:20:03 GMT, Anton Ertl wrote:
>>
>>> In UTF-32 a character is a sequence of (32-bit) code units.
>>> In UTF-8 a character is a sequence of (8-bit) code units.
>>
>> The point being, there is a 1:1 correspondence between the two
>> representations of the same characters/code points. So your claim that
>> use of one is somehow a “mistake” while the other is not, is spurious.
>
> If the data you are working on is provided in files containing UTF-8,
> conversion to UTF-32 does not provide any benefits and is therefore an
> unnecessary complication, and therefore a mistake.

Assuming it does not provide any benefits is the mistake.
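
For illustration only, here is a minimal Python sketch of the two properties under discussion: the conversion between UTF-8 and UTF-32 is lossless in both directions, and in UTF-32 the N-th code point sits at a fixed byte offset. The helper names (utf8_to_utf32, utf32_to_utf8, code_point_at) and the sample string are made up for this example, and constant-offset indexing is only one candidate benefit, not necessarily the one meant above.

```python
import struct


def utf8_to_utf32(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-32 (little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-32-le")


def utf32_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-32 (little-endian, no BOM) bytes as UTF-8."""
    return data.decode("utf-32-le").encode("utf-8")


def code_point_at(utf32: bytes, index: int) -> int:
    """Read the index-th code point directly at byte offset 4 * index."""
    (cp,) = struct.unpack_from("<I", utf32, 4 * index)
    return cp


# Hypothetical sample: ASCII, a 2-byte, two 3-byte and one 4-byte UTF-8 sequence.
sample = "na\u00efve \u6587\u5b57 \U0001F600".encode("utf-8")
wide = utf8_to_utf32(sample)

# Same code points either way: the round trip is lossless.
assert utf32_to_utf8(wide) == sample

# 10 code points: variable width in UTF-8, exactly 4 bytes each in UTF-32.
print(len(sample), len(wide))
print(hex(code_point_at(wide, 9)))
```

In the UTF-8 form, finding the tenth code point means scanning from the start (or keeping a side index), since each code point takes between one and four bytes. Whether that difference matters depends on how the data is actually accessed.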