On 5/7/2025 8:07 PM, Lawrence D'Oliveiro wrote:
On Wed, 7 May 2025 05:08:03 -0500, BGB wrote:
Ideally, filesystems should be case sensitive by default;
If someone wants case insensitivity, this can be better handled at the
application or file-browser level.
Even Linux has given in on this. The widely-used ext4 filesystem has an
option for case-insensitivity, which, once enabled for a volume, can be
activated on a per-directory basis.
Ironically, Windows and NTFS went the other way, adding an option for case-sensitive directories (though one needs to run a special command, "fsutil file setCaseSensitiveInfo", e.g. from PowerShell, to enable it on a per-directory basis).
...
Either way, case-insensitivity at the FS level adds complexity.
I guess one intermediate option could be to keep the FS proper case-sensitive, but fake case insensitivity at the level of the OS APIs (based on a system-level locale setting).
Say a program tries to open "Foo.txt";
the kernel sees that no "Foo.txt" exists but "foo.txt" does, and since the directory is flagged as case-insensitive, it does a case-folded open.
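A minimal sketch of that fallback, assuming ASCII-only folding (a real locale-aware version would need proper Unicode case folding) and an in-memory array of names standing in for the on-disk directory. The names "struct dir_view", "ascii_ieq" and "dir_lookup" are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one directory; a real FS driver
   would read these names from disk. */
struct dir_view {
    const char *const *names; /* stored, case-preserved names */
    size_t count;
    bool casefold;            /* directory flagged case-insensitive */
};

/* ASCII-only case-insensitive string equality. */
static bool ascii_ieq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        unsigned char ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (ca >= 'A' && ca <= 'Z') ca += 'a' - 'A';
        if (cb >= 'A' && cb <= 'Z') cb += 'a' - 'A';
        if (ca != cb) return false;
    }
    return *a == *b; /* true only if both strings ended together */
}

/* Returns the index of the entry to open, or -1 if none.
   The exact name always wins; the folded match is only tried
   when the directory carries the case-insensitive flag. */
long dir_lookup(const struct dir_view *d, const char *want)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->names[i], want) == 0)
            return (long)i;
    if (!d->casefold)
        return -1;
    for (size_t i = 0; i < d->count; i++) /* linear walk and match */
        if (ascii_ieq(d->names[i], want))
            return (long)i;
    return -1;
}
```

Trying the exact name first means case-correct callers never pay for the folded scan. If two stored names differ only in case, the folded pass just returns the first one it hits, which is an ambiguity a real design would need a rule for.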
Granted, how to implement this semi-efficiently is its own issue.
Externally doing a directory walk and checking whether any of the files match the requested name is possible, albeit inefficient. Building a case-folded hash of a directory is also possible, but only makes sense if one expects such lookups to happen repeatedly in a given directory (for a one-off lookup, building the hash is little better than a linear walk and match).
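As a rough illustration of the hash option: building the index costs one pass over the directory plus some memory, after which each lookup is expected O(1). This sketch assumes ASCII folding, FNV-1a hashing and linear probing; those choices, and the "fold_index_*" names, are mine rather than anything from a real FS.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical case-folded index over one directory's names. */
struct fold_index {
    const char *const *names;
    size_t nslots;   /* always a power of two */
    long *slots;     /* entry index, or -1 for an empty slot */
};

static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* FNV-1a over the folded bytes, so "Foo" and "fOO" hash alike. */
static uint32_t fold_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= fold_ascii((unsigned char)*s);
        h *= 16777619u;
    }
    return h;
}

static int fold_eq(const char *a, const char *b)
{
    while (*a && fold_ascii((unsigned char)*a) == fold_ascii((unsigned char)*b)) {
        a++;
        b++;
    }
    return fold_ascii((unsigned char)*a) == fold_ascii((unsigned char)*b);
}

/* Build once per directory; returns 0 on success, -1 on allocation failure. */
int fold_index_build(struct fold_index *ix, const char *const *names, size_t count)
{
    size_t n = 8;
    while (n < 2 * count) n *= 2;  /* load factor <= 1/2, so probes terminate */
    ix->slots = malloc(n * sizeof *ix->slots);
    if (!ix->slots) return -1;
    for (size_t i = 0; i < n; i++) ix->slots[i] = -1;
    ix->names = names;
    ix->nslots = n;
    for (size_t i = 0; i < count; i++) {
        size_t j = fold_hash(names[i]) & (n - 1);
        while (ix->slots[j] != -1) j = (j + 1) & (n - 1); /* linear probe */
        ix->slots[j] = (long)i;
    }
    return 0;
}

/* Returns the matching entry index, or -1. */
long fold_index_find(const struct fold_index *ix, const char *want)
{
    size_t j = fold_hash(want) & (ix->nslots - 1);
    while (ix->slots[j] != -1) {
        if (fold_eq(ix->names[ix->slots[j]], want)) return ix->slots[j];
        j = (j + 1) & (ix->nslots - 1);
    }
    return -1;
}

void fold_index_free(struct fold_index *ix)
{
    free(ix->slots);
    ix->slots = NULL;
}
```

The build step is the part that only amortizes over many lookups; for a single open, the linear walk does the same amount of folding work without the allocation.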
One intermediate option could be a hidden metadata file, such as a case-folded names table. This merely lists all the files in a directory, but with the filenames normalized to all lower case or similar (with a bitmap of which characters were case-folded).
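A sketch of how such a table might be used, under assumed details: each row holds the lower-cased name plus a 64-bit mask of which bytes were upper case, names are ASCII and shorter than 64 bytes, and "struct fold_row" / "fold_table_find" are hypothetical. The point is that the bitmap lets the lookup rebuild the real on-disk name without storing it twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_NAME_MAX 64  /* hypothetical limit for this sketch */

/* One row of a hypothetical case-folded names table. */
struct fold_row {
    char folded[META_NAME_MAX];  /* lower-cased name */
    uint64_t upper_mask;         /* bit i set: byte i was upper case */
};

static char lower_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Look up a request in any case. On a hit, rebuild the stored name
   into out (at least META_NAME_MAX bytes) and return the row index;
   otherwise return -1. */
long fold_table_find(const struct fold_row *rows, size_t n,
                     const char *want, char *out)
{
    char key[META_NAME_MAX];
    size_t len = strlen(want);
    if (len >= META_NAME_MAX) return -1;
    for (size_t i = 0; i <= len; i++)   /* also copies the NUL */
        key[i] = lower_ascii(want[i]);
    for (size_t r = 0; r < n; r++) {
        if (strcmp(rows[r].folded, key) != 0) continue;
        for (size_t i = 0; i <= len; i++) {
            char c = rows[r].folded[i];
            /* mask bits are only ever set for folded letters */
            if ((rows[r].upper_mask >> i) & 1u)
                c = (char)(c - 'a' + 'A');
            out[i] = c;
        }
        return (long)r;
    }
    return -1;
}
```

For example, a row of "readme.txt" with mask 0x3F rebuilds to "README.txt" whichever case the caller asked for.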
Ironically, this isn't too far off from how one might support Unix-style metadata on FAT32. Say one has a hidden file, "$_TKMETA.DAT", which isn't shown in directory listings but may be used by the FS driver for extended metadata (in this case keyed by the 8.3 name).
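A minimal sketch of the driver side, assuming records keyed by the raw 11-byte FAT short name (8 name bytes plus 3 extension bytes, space-padded, no dot, as stored in FAT directory entries). The record fields (uid, gid, mode) and all function names here are hypothetical; the post only says "Unix style metadata".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record stored inside "$_TKMETA.DAT". */
struct tkmeta_rec {
    char name83[11];   /* e.g. "FOO     TXT" */
    uint32_t uid, gid;
    uint16_t mode;
};

/* Convert "FOO.TXT" to the space-padded 8+3 form. Assumes the input
   is already a valid upper-case short name. */
void pack83(const char *s, char out[11])
{
    size_t i = 0;
    memset(out, ' ', 11);
    for (; *s && *s != '.'; s++)
        if (i < 8) out[i++] = *s;
    if (*s == '.') s++;
    for (i = 8; *s && i < 11; s++) out[i++] = *s;
}

/* The driver filters its own metadata file out of directory listings;
   FAT names compare case-insensitively, hence the ASCII fold. */
bool tkmeta_hidden(const char *name)
{
    const char *m = "$_TKMETA.DAT";
    for (; *name && *m; name++, m++) {
        char c = *name;
        if (c >= 'a' && c <= 'z') c = (char)(c - 'a' + 'A');
        if (c != *m) return false;
    }
    return *name == '\0' && *m == '\0';
}

/* Find the metadata record for a file by its short name, or NULL. */
const struct tkmeta_rec *tkmeta_find(const struct tkmeta_rec *recs,
                                     size_t n, const char *short_name)
{
    char key[11];
    pack83(short_name, key);
    for (size_t i = 0; i < n; i++)
        if (memcmp(recs[i].name83, key, 11) == 0) return &recs[i];
    return NULL;
}
```

Keying on the short name is also why a copy that regenerates 8.3 names orphans the metadata: the records survive, but nothing matches them any more.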
It is kind of a crap option, but it (mostly) survives Windows intrusions (though it will still break on a directory copy, as modern Windows versions do not preserve the original 8.3 names). This variant uses the LFNs for the user-visible name, unlike UMSDOS, which provided its own filenames and didn't use the VFAT LFN scheme.
Though, if doing a natively case-insensitive filesystem, I guess one option could be to fold all names to lower case and then store a bitmask of which bytes to flip back to upper case.
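A sketch of that encoding, assuming ASCII-only folding, a 40-byte zero-padded name field, and a 40-bit mask split into a 32-bit word and an 8-bit byte, matching the ncase1/ncase2 fields in the dirent layout below. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 40

/* Fold src to lower case into a zero-padded 40-byte field and record
   which bytes were upper case: bits 0..31 go to *ncase1, bits 32..39
   to *ncase2. Returns -1 if the name does not fit. */
int fold_name(const char *src, uint8_t name[NAME_LEN],
              uint32_t *ncase1, uint8_t *ncase2)
{
    size_t len = strlen(src);
    uint64_t mask = 0;
    if (len > NAME_LEN) return -1;
    memset(name, 0, NAME_LEN);
    for (size_t i = 0; i < len; i++) {
        uint8_t c = (uint8_t)src[i];
        if (c >= 'A' && c <= 'Z') {
            c = (uint8_t)(c + ('a' - 'A'));
            mask |= (uint64_t)1 << i;
        }
        name[i] = c;
    }
    *ncase1 = (uint32_t)mask;
    *ncase2 = (uint8_t)(mask >> 32);
    return 0;
}

/* Rebuild the case-preserved name; out needs NAME_LEN + 1 bytes. */
void unfold_name(const uint8_t name[NAME_LEN], uint32_t ncase1,
                 uint8_t ncase2, char *out)
{
    uint64_t mask = ((uint64_t)ncase2 << 32) | ncase1;
    size_t i;
    for (i = 0; i < NAME_LEN && name[i]; i++)
        out[i] = (char)(((mask >> i) & 1) ? name[i] - ('a' - 'A') : name[i]);
    out[i] = '\0';
}
```

With this scheme a case-insensitive lookup is a plain memcmp of folded bytes, and the mask is only consulted when a name is handed back to the user.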
Assuming a 64-byte dirent and a similar AVL-like directory structure:
{
u32 ino; //00, inode number (low 32 bits)
u16 lsn; //04, left child node
u16 psn; //06, parent node
u16 rsn; //08, right child node
u16 hsn; //0A, node high bits
u16 ino_hi; //0C, inode high bits
byte zdepth; //0E, Z height of node (0=Leaf)
byte etype; //0F, dirent type
byte name[40]; //10, name
u32 ncase1; //38, case fold (first 32 bytes)
byte ncase2; //3C, case fold (next 8 bytes)
byte pad1; //3D, MBZ
u16 hsn2; //3E, more node high bits
}
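The offsets in the comments above can be checked by the compiler. This sketch assumes u32/u16/byte are the usual fixed-width types and uses a hypothetical struct tag, tk_dirent. On common ABIs natural alignment already yields these offsets with no gaps, so no packing attribute should be needed; the asserts would catch it if not.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed meanings of the shorthand types. */
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t  byte;

struct tk_dirent {
    u32  ino;       /* 00, inode number (low 32 bits) */
    u16  lsn;       /* 04, left child node */
    u16  psn;       /* 06, parent node */
    u16  rsn;       /* 08, right child node */
    u16  hsn;       /* 0A, node high bits */
    u16  ino_hi;    /* 0C, inode high bits */
    byte zdepth;    /* 0E, Z height of node (0=Leaf) */
    byte etype;     /* 0F, dirent type */
    byte name[40];  /* 10, case-folded name */
    u32  ncase1;    /* 38, case bits for name bytes 0..31 */
    byte ncase2;    /* 3C, case bits for name bytes 32..39 */
    byte pad1;      /* 3D, must be zero */
    u16  hsn2;      /* 3E, more node high bits */
};

_Static_assert(offsetof(struct tk_dirent, lsn)    == 0x04, "lsn");
_Static_assert(offsetof(struct tk_dirent, hsn)    == 0x0A, "hsn");
_Static_assert(offsetof(struct tk_dirent, zdepth) == 0x0E, "zdepth");
_Static_assert(offsetof(struct tk_dirent, name)   == 0x10, "name");
_Static_assert(offsetof(struct tk_dirent, ncase1) == 0x38, "ncase1");
_Static_assert(offsetof(struct tk_dirent, ncase2) == 0x3C, "ncase2");
_Static_assert(offsetof(struct tk_dirent, hsn2)   == 0x3E, "hsn2");
_Static_assert(sizeof(struct tk_dirent) == 64, "dirent must be 64 bytes");
```

This only pins down the in-memory layout; the on-disk byte order of the multi-byte fields is a separate decision.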
The base name field drops from 48 to 40 bytes, to accommodate the case-folding bits.
The LFN entries could have a similar modification.
The hsn member adds 5 more bits to each of lsn, rsn, and psn, extending each from 16 to 21 bits.
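A sketch of the bit packing. The post does not say which hsn bits belong to which link, so this assumes bits 0..4 for lsn, 5..9 for psn and 10..14 for rsn, leaving bit 15 spare; node_get, node_set and the LINK_* constants are hypothetical.

```c
#include <stdint.h>

/* Assumed hsn bit assignment: bits 0..4 -> lsn, 5..9 -> psn,
   10..14 -> rsn; bit 15 unused. */
enum { LINK_L = 0, LINK_P = 1, LINK_R = 2 };
#define NODE_BITS 21
#define NODE_MAX ((1u << NODE_BITS) - 1)  /* 2^21 - 1 = 2,097,151 */

/* Combine a 16-bit link field with its 5 high bits from hsn. */
uint32_t node_get(uint16_t lo, uint16_t hsn, int which)
{
    uint32_t hi = (uint32_t)(hsn >> (5 * which)) & 0x1Fu;
    return (hi << 16) | lo;
}

/* Store a 21-bit index: low 16 bits into *lo, high 5 bits into *hsn,
   leaving the other links' bits in *hsn untouched. */
void node_set(uint16_t *lo, uint16_t *hsn, int which, uint32_t idx)
{
    unsigned shift = 5u * (unsigned)which;
    *lo = (uint16_t)(idx & 0xFFFFu);
    *hsn = (uint16_t)((*hsn & ~(0x1Fu << shift))
                      | (((idx >> 16) & 0x1Fu) << shift));
}
```

Three 5-bit groups use 15 of hsn's 16 bits, which gives 2^21 = 2,097,152 addressable nodes per link.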
Though, hsn2 could potentially extend the size of the node index by another 5 bits per link, increasing the maximum directory size from about 2 million files (2^21) to about 64 million files (2^26). Granted, a hard limit of 2 million files in a directory is probably fairly reasonable (given that the largest directory on my actual HDDs seems to hold around 3600 files).
...