Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
From: janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.c
Date: 28. Apr 2025, 19:05:18
Organization: A noiseless patient Spider
Message-ID: <vuog0v$3s200$1@dont-email.me>
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 28.04.2025 11:10, Bonita Montero wrote:
> On 28.04.2025 at 10:08, Janis Papanagnou wrote:
>> My file system (and obviously also the file systems of others who
>> are posting here) has no problems with any locale.
> That's the problem: the filesystem should have a specific locale.
> Otherwise you copy some files from a different computer where the
> user has a different locale and you get Swahili filenames.
Okay, I think I see where you're coming from. It reminds me of my former
view on the file-name topic; some decades ago I argued that it might
be good to allow only ASCII text in file names, specifically no
control characters (and maybe some other characters as well), to avoid
some common issues with such characters. Needless to say, with such a
"standard" we wouldn't have been able to support I18N. So, decades
ago, I changed my opinion on that. (Note that I'm not saying this is
the same as your opinion, but there are similarities: the wish to
have a well-defined "transfer syntax", including the character set.)
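For illustration, a minimal C sketch of that old "printable ASCII only" rule (the function name 'ascii_only_name' is hypothetical, not from any library). It rejects control characters, DEL, and every byte at or above 0x80, so any UTF-8 encoded non-ASCII name fails, which is exactly why such a rule rules out I18N.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for the "printable ASCII only" file-name rule.
 * Rejects control characters (0x00-0x1F), DEL (0x7F) and all bytes
 * >= 0x80, so every UTF-8 encoded non-ASCII name is rejected too.
 * A stricter variant could exclude further characters as well. */
bool ascii_only_name(const char *name)
{
    if (name == NULL || *name == '\0')
        return false;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p < 0x20 || *p >= 0x7F)   /* control chars, DEL, non-ASCII */
            return false;
    }
    return true;
}
```

With this rule, "report.txt" passes while the UTF-8 name "caf\xc3\xa9.txt" ("café.txt") is refused.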
>> The historic architecture of Linux file systems can represent file
>> names in arbitrary languages. That's why Unix file systems don't
>> show the issues that other (popular) OSes show.
> Windows only has UTF-16 filenames and no varying locale.
(I thought Windows used "UCS-2". Anyway, would 16 bits suffice to
support full Unicode? I thought they wouldn't, or only old, restricted
versions of Unicode.)
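On the 16-bit question: UCS-2 can only represent code points up to U+FFFF, whereas UTF-16 reaches the rest of Unicode (up to U+10FFFF) by combining two 16-bit code units into a surrogate pair. A minimal encoder sketch (the function name 'utf16_encode' is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-16 into out[0..1].
 * Returns the number of 16-bit code units written (1 or 2), or 0
 * for values that are not Unicode scalar values (the surrogate
 * range D800-DFFF or anything above 10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* reserved for surrogates */
    if (cp <= 0xFFFF) {                             /* BMP: one unit, as in UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                           /* beyond UCS-2: surrogate pair */
        cp -= 0x10000;                              /* leaves a 20-bit value */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: low 10 bits */
        return 2;
    }
    return 0;
}
```

For example, U+1F600 comes out as the pair 0xD83D 0xDE00, something a pure UCS-2 system cannot express.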
But first, let's talk about [character] "encodings" (not "locales");
I'm re-inserting the snipped paragraph:
>> Generally, and specifically if you choose to use international
>> characters in file names, the prevalent and nowadays de facto
>> standard is UTF-8 encoding.
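To show what that encoding looks like at the byte level, here is a minimal UTF-8 encoder sketch (the function name 'utf8_encode' is hypothetical). ASCII stays a single unchanged byte; other code points become 2 to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-8 into out[0..3].
 * Returns the byte count (1-4), or 0 for surrogates (D800-DFFF)
 * and values above 10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                                  /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                     /* not encodable */
    if (cp < 0x10000) {                               /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                             /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A design point worth noting: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a non-ASCII character never produces a '/' or NUL byte. That is why UTF-8 fits the Unix byte-string model of file names without any support from the file system.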
Above you said:
> Otherwise you copy some files from a different computer where the
> user has a different locale and you get Swahili filenames.
Interoperability requires standards, also for the character encoding;
otherwise some conversion will be necessary. Nowadays the most common
and most widely used encoding standard seems to be UTF-8 (not UCS-2
and not UTF-16). Where you exchange data with systems that do not
use that de facto standard, you have to convert the data, and there
are tools that do that for you, like 'iconv'.
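A minimal sketch of such a conversion from C, using the POSIX iconv(3) interface and assuming the foreign names are ISO-8859-1 encoded (the wrapper 'convert_name' is hypothetical). From the shell, the rough equivalent is 'iconv -f ISO-8859-1 -t UTF-8'.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: converts the NUL-terminated string `in` from
 * encoding `from` to encoding `to`, storing a NUL-terminated result in
 * out[0..outsize-1]. Returns 0 on success, -1 on failure (unknown
 * encoding, invalid input, or output buffer too small). */
int convert_name(const char *from, const char *to,
                 const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open(to, from);           /* argument order: (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;                      /* iconv() takes non-const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;                /* keep one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)                        /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}
```

Encoding names accepted by iconv_open() are implementation-defined; glibc accepts "ISO-8859-1" and "UTF-8", and on some non-glibc systems you may need to link with -liconv.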
Coupling the file system to a fixed character encoding would have
prevented I18N, as I said above, and it's also not necessary to
couple the two.
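To illustrate that decoupling: on a typical Unix file system a name component is an opaque byte string, and only '/' and NUL are excluded ('.' and '..' are reserved directory entries). A minimal sketch of that rule, with a hypothetical function name and an assumed 255-byte limit (the real limit is file-system dependent and can be queried with pathconf(dir, _PC_NAME_MAX)):

```c
#include <stdbool.h>
#include <string.h>

/* Assumed limit for this sketch only; query the real one per
 * directory with pathconf(dir, _PC_NAME_MAX). */
#define SKETCH_NAME_MAX 255

/* Returns true if `name` could be a single path component on a
 * typical Unix file system. No character encoding is checked:
 * UTF-8, Latin-1 or any other byte sequence is accepted as long as
 * it contains no '/' (NUL cannot occur inside a C string anyway). */
bool unix_component_ok(const char *name)
{
    size_t len = strlen(name);
    if (len == 0 || len > SKETCH_NAME_MAX)
        return false;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;                   /* reserved directory entries */
    return strchr(name, '/') == NULL;   /* '/' is the path separator */
}
```

Both the UTF-8 name "caf\xc3\xa9.txt" and the Latin-1 name "caf\xe9.txt" pass; the file system stores the bytes, and interpreting them is left to the user's locale.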
As long as Windows continues using its own "standards", I understand
that some [Windows] folks angrily cuss at systems that rely on
prevalent standards.
(So we can agree to disagree on the file system and encoding topic.)
Janis