Subject : Re: Rationale for aligning data on even bytes in a Unix shell file?
From : cr88192 (at) *nospam* gmail.com (BGB)
Groups : comp.lang.c
Date : 09. May 2025, 06:28:26
Organisation : A noiseless patient Spider
Message-ID : <vvk43f$2jat9$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 5/8/2025 9:22 PM, Lawrence D'Oliveiro wrote:
On Thu, 8 May 2025 01:57:05 -0500, BGB wrote:
Either way, case-insensitivity at the FS level adds complexity.
If you look around some other groups, you will see discussion of a recent
rant from Linus Torvalds on this very issue. Basically, he doesn’t like
case-insensitivity. And he is justified in pointing out that it leads to
more opportunities for bugs in the kernel code. The only reason we need to
have it is because it makes certain things easier for users.
OK.
I have mixed feelings on the whole idea.
I guess, one intermediate option could be to keep the FS proper as case
sensitive, but then fake case insensitivity at the level of the OS APIs
(based on a system-level locale setting).
There is a standard Unicode locale-independent case-folding algorithm.
That is what Linux implements. At the time of volume initialization, it
only involves setting one filesystem parameter, which says to assume that
all filenames are UTF-8-encoded.
Possibly.
The issue is partly one of how to deal with it when opening files in a way that isn't painfully slow (and is compatible with "fast" lookup strategies, like hash tables or tree-based lookups).
Like, the specifics of the case-folding algorithm itself aren't the main issue, so much as keeping it small and fast.
Both of which become harder the more rules there are; and ideally performance needs to remain "acceptable" when the code is effectively running 5 orders of magnitude slower than on a modern PC (like, you really "feel" code inefficiency when running in a Verilog simulation at kHz speeds).
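Roughly the sort of thing I have in mind (a minimal sketch, with made-up names, and ASCII-only folding for brevity): hash the case-folded form of the name, and only compare folded forms on a bucket collision, so a lookup stays one fold pass plus one hash probe. A full Unicode fold table would make the fold step bigger and slower, which is the concern above.

#include <stdint.h>

/* Fold a single character; ASCII-only for the sketch. */
static int fold_ch(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c - 'A' + 'a') : c;
}

/* FNV-1a over the folded name, so "README.TXT" and "readme.txt"
   land in the same bucket. */
static uint32_t fold_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name; name++) {
        h ^= (uint32_t)fold_ch((unsigned char)*name);
        h *= 16777619u;
    }
    return h;
}

/* Equality under the same folding rule, used on bucket collisions. */
static int fold_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (fold_ch((unsigned char)*a) != fold_ch((unsigned char)*b))
            return 0;
    return *a == *b;
}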
Granted, in this project, despite using FAT, some trickery was used to speed things up, say:
Caching directory contents and using hashed lookups (*);
Reading in and caching FAT chains when files are opened, effectively using cached arrays of cluster numbers for IO operations (sketch below);
...
*: Chances are, if one accesses a directory once, they may access it again in the near future.
But this doesn't scale all that well.
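As for the FAT-chain caching item above, roughly this sort of thing (again only a sketch; the names and the fat_next() accessor standing in for the cached FAT read are hypothetical): walk the FAT once at open time and flatten the chain into an array, so mapping a file offset to a cluster becomes one array index instead of a chain walk per read.

#include <stdint.h>
#include <stdlib.h>

#define FAT32_MIN_CLUST 2u
#define FAT32_BAD       0x0FFFFFF7u   /* bad-cluster marker; EOC values are above this */

/* Hypothetical accessor: next-cluster entry from the cached FAT. */
extern uint32_t fat_next(uint32_t clust);

struct fat_chain {
    uint32_t *clusters;   /* flat array of cluster numbers */
    uint32_t  count;
};

/* Walk the chain once when the file is opened and flatten it. */
static int fat_chain_load(struct fat_chain *fc, uint32_t first)
{
    uint32_t cap = 16, n = 0, c = first;
    uint32_t *arr = malloc(cap * sizeof(*arr));
    if (!arr)
        return -1;

    while (c >= FAT32_MIN_CLUST && c < FAT32_BAD) {
        if (n == cap) {
            uint32_t *tmp = realloc(arr, (cap *= 2) * sizeof(*arr));
            if (!tmp) { free(arr); return -1; }
            arr = tmp;
        }
        arr[n++] = c;
        c = fat_next(c);
    }
    fc->clusters = arr;
    fc->count = n;
    return 0;
}

/* Byte offset -> cluster number, O(1) once the chain is cached. */
static uint32_t fat_chain_cluster(const struct fat_chain *fc,
                                  uint32_t offset, uint32_t bytes_per_clust)
{
    uint32_t idx = offset / bytes_per_clust;
    return (idx < fc->count) ? fc->clusters[idx] : 0;
}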