Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
From: Keith.S.Thompson+u (at) *nospam* gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Date: 08. May 2025, 22:13:43
Organization: None to speak of
Message-ID: <87v7qaerg8.fsf@nosuchdomain.example.com>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
User-Agent: Gnus/5.13 (Gnus v5.13)
BGB <cr88192@gmail.com> writes:
On 5/8/2025 6:13 AM, Janis Papanagnou wrote:
On 08.05.2025 05:30, BGB wrote:
[...]
>
Though, even for the Latin alphabet, once one goes much outside of ASCII
and Latin-1, it gets messy.
I noticed that in several places you were referring to Latin-1. Since
decades that has been replaced by the Latin-9 (ISO 8859-15) character
set[*] for practical reasons ('€' sign, for example).
Why is your focus still on the old Latin-1 (ISO 8859-1) character
set?
Janis, just curious
[*] Unless Unicode and its encodings are used.
>
U+00A0..U+00FF are designated as Latin-1 in Unicode.
I don't think that's accurate. Do you have a reference for that?
It's true that those characters have the same names in Unicode
as in Latin-1. Though the Wikipedia article says that the ranges
0x00..0x1F and 0x7F..0x9F are *undefined*. (That doesn't match my
recollection; I thought they were defined as control characters.)
In any case, Latin-1 and Latin-9 treat those ranges in the same way.
Both can be seen as encodings for small subsets of Unicode.
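
To make the "subsets of Unicode" point concrete, here is a minimal C sketch of both mappings. The function names latin1_to_unicode and latin9_to_unicode are made up for illustration and do not come from any library. It assumes the usual published tables: Latin-1 byte values coincide with their Unicode code points, and Latin-9 differs from Latin-1 at eight positions only.

```c
#include <stdint.h>

/* Latin-1 (ISO 8859-1): each byte value equals its Unicode code point.
   Hypothetical helper name, for illustration only. */
uint32_t latin1_to_unicode(uint8_t byte)
{
    return byte;
}

/* Latin-9 (ISO 8859-15): identical to Latin-1 except at eight positions.
   Hypothetical helper name, for illustration only. */
uint32_t latin9_to_unicode(uint8_t byte)
{
    switch (byte) {
    case 0xA4: return 0x20AC; /* EURO SIGN */
    case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
    case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
    case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    default:   return byte;   /* everything else matches Latin-1 */
    }
}
```

Both functions treat the 0x00..0x1F and 0x7F..0x9F ranges identically, passing the byte value through unchanged.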
[...]
CP-1252, the dominant remaining ASCII character set in use, is
based on Latin-1, with a few characters from Latin-15 shoved into the
places where control codes previously went.
CP-1252 is not an ASCII character set. ASCII is a 7-bit character set.
CP-1252 is an 8-bit character set as are the Latin-* sets. Most 8-bit
sets are *based on* ASCII.
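
As a rough illustration of the 7-bit/8-bit distinction, the following C sketch checks whether a buffer contains only ASCII. The name is_ascii is hypothetical and not a standard library function. Any byte above 0x7F means the data cannot be plain ASCII, though the check says nothing about which 8-bit set it actually is.

```c
#include <stddef.h>

/* Returns 1 if every byte fits in 7 bits (0x00..0x7F), otherwise 0.
   Hypothetical helper name, for illustration only. */
int is_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return 0; /* high bit set: needs an 8-bit (or wider) encoding */
    }
    return 1;
}
```

Text that passes this check decodes the same way under ASCII, Latin-1, Latin-9, and CP-1252; the sets only diverge for bytes in the 0x80..0xFF range.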
-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */