Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.lang.c
Date: 07. May 2025, 19:26:49
Organization: A noiseless patient Spider
Message-ID: <vvg8uq$1647n$2@dont-email.me>
User-Agent: Mozilla Thunderbird
On 5/7/2025 7:58 AM, Janis Papanagnou wrote:
> On 07.05.2025 12:08, BGB wrote:
>> [...]
>>
>> Though, if someone really must make something case-insensitive, a case
>> could be made for only supporting it for maybe Latin, Greek, and
>> Cyrillic.
>
> I don't understand what you want to say here; it just sounds strange
> to me. - Mind to elaborate?
Latin, Greek, and Cyrillic are the main alphabets that actually have a useful and reasonably well-defined concept of "case", so "case folding" actually makes sense for them.
For most other scripts it does not, and one can likely ignore case rules for anything outside these alphabets. This eliminates a bunch of rules for scripts that don't have "case" as we would understand it.
By limiting the rules this way, a simpler and more manageable rule set is possible, vs. the actual Unicode rules, which tend to have special cases all over the place.
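As a rough illustration of how small such a restricted rule set can be, here is a minimal C sketch, assuming names have already been decoded to UTF-32 code points. The function names fold_cp and fold_equal are hypothetical. It covers only simple 1:1 mappings for ASCII, Latin-1, basic Greek (including the tonos forms and final sigma) and basic Cyrillic; Latin Extended, historic Cyrillic, and Unicode special cases such as the Turkish dotted/dotless i or ß are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 1:1 case folding restricted to ASCII, Latin-1, basic Greek and
 * basic Cyrillic (a sketch, not the full Unicode CaseFolding table).
 * Code points outside these ranges are returned unchanged. */
uint32_t fold_cp(uint32_t c)
{
    /* ASCII A-Z, and Latin-1 U+00C0..U+00DE except U+00D7 (multiplication sign) */
    if ((c >= 0x41 && c <= 0x5A) || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return c + 0x20;

    /* Greek capitals Alpha..Upsilon-with-dialytika, skipping unassigned U+03A2 */
    if (c >= 0x391 && c <= 0x3AB && c != 0x3A2)
        return c + 0x20;

    /* Greek capitals with tonos, plus final sigma folding to sigma */
    switch (c) {
    case 0x386: return 0x3AC;
    case 0x388: case 0x389: case 0x38A: return c + 0x25;
    case 0x38C: return 0x3CC;
    case 0x38E: case 0x38F: return c + 0x3F;
    case 0x3C2: return 0x3C3;
    }

    /* Cyrillic: U+0400..U+040F map to U+0450..U+045F, A..Ya map to a..ya */
    if (c >= 0x400 && c <= 0x40F)
        return c + 0x50;
    if (c >= 0x410 && c <= 0x42F)
        return c + 0x20;

    return c;
}

/* Returns 1 if the two code point sequences are equal after folding. */
int fold_equal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; i++)
        if (fold_cp(a[i]) != fold_cp(b[i]))
            return 0;
    return 1;
}
```

Anything outside those ranges (Hebrew, CJK, etc.) passes through unchanged, which is the point: those scripts simply don't participate in folding, so no rules are needed for them.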
Ligatures pose an issue, though; presumably the options are one of:
Case-fold between the upper- and lowercase forms of a ligature, when both exist;
Treat the ligature as its own character;
Decompose and compare.
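The third option might look something like the following minimal C sketch. It assumes UTF-32 input and uses hypothetical names (decompose_ligatures, lig_equal) and a hypothetical 256-code-point limit per name. Only ligatures that have Unicode compatibility decompositions are expanded here (U+0132/U+0133 "IJ"/"ij" and U+FB00..U+FB02 "ff", "fi", "fl"); Æ and Œ have no Unicode decomposition, so treating them as "AE"/"OE" would be a local policy choice rather than NFKD behavior. Folding is ASCII-only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a few ligatures that have Unicode compatibility decompositions.
 * Returns the number of code points written, or (size_t)-1 if out is too small. */
size_t decompose_ligatures(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const char *expansion = NULL;
        switch (in[i]) {
        case 0x0132: expansion = "IJ"; break;
        case 0x0133: expansion = "ij"; break;
        case 0xFB00: expansion = "ff"; break;
        case 0xFB01: expansion = "fi"; break;
        case 0xFB02: expansion = "fl"; break;
        }
        if (expansion) {
            for (; *expansion; expansion++) {
                if (len == cap)
                    return (size_t)-1;
                out[len++] = (uint32_t)(unsigned char)*expansion;
            }
        } else {
            if (len == cap)
                return (size_t)-1;
            out[len++] = in[i];
        }
    }
    return len;
}

/* ASCII-only folding, just enough for the example. */
static uint32_t fold_ascii(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + 0x20 : c;
}

/* Decompose both names, then compare case-insensitively.
 * Names that overflow the (hypothetical) 256-code-point buffer compare unequal. */
int lig_equal(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    uint32_t da[256], db[256];
    size_t la = decompose_ligatures(a, na, da, 256);
    size_t lb = decompose_ligatures(b, nb, db, 256);
    if (la == (size_t)-1 || lb == (size_t)-1 || la != lb)
        return 0;
    for (size_t i = 0; i < la; i++)
        if (fold_ascii(da[i]) != fold_ascii(db[i]))
            return 0;
    return 1;
}
```

Note that this only compares the decomposed forms and never writes the decomposed spelling back to the stored name, which sidesteps the locale-dependent recomposition problem.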
Though, FWIW, in my normalization code I mostly ignored ligatures: while they could be decomposed in many cases, they could only be recomposed for locales that actually use the ligature in question (e.g., in English, if AE and IJ spontaneously started merging into single characters, this would be weird and out of place; and a filesystem layer that merely decomposed any ligatures it encountered would not be ideal).
Ideally, this would be better handled in a file-browser or
similar, and not in the VFS or FS driver itself.
Janis