On Sat, 12/21/2024 10:46 PM, Lawrence D'Oliveiro wrote:
> On Sat, 21 Dec 2024 20:56:27 -0500, Paul wrote:
>
>> NTFS then, is now just about perfect ...
>
> Still does a poor job of handling lots of small files, though.
First and foremost, we want file systems that
don't lose any of our goods. The file systems we
have now are more robust than they used to be,
and that's a start. The file system is scanned in
the background; details are not provided as to
how this works, merely that latent faults are not
the issue they used to be.
I don't live on a diet of synthetic tests. Synthetic
tests are fun, but they're for characterization and
for telling people the best way to use the file system.
The file system is good enough for casual home use.
I would not bill something like NTFS as
a hyperscaler product.
I can tell you from my testing not to put four billion
files in a single flat directory. The transfer would
never finish. A balanced tree of directories works better,
and you might be able to handle 4x the files that way.
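
Just to illustrate what I mean by a balanced tree, here is
a rough Python sketch (the root path, the hash, and the
two-level 256 x 256 bucketing are illustrative assumptions,
not my actual test harness):

# Sketch only: bucket files into a two-level directory tree
# instead of one flat directory, so no single directory
# index grows unmanageably large.
import hashlib
from pathlib import Path

ROOT = Path("N:/testtree")    # hypothetical destination volume

def bucketed_path(name: str) -> Path:
    """Map a name to ROOT/xx/yy/name via the leading hex digits of a hash."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return ROOT / digest[:2] / digest[2:4] / name   # 256 * 256 = 65,536 buckets

def create_files(count: int) -> None:
    for i in range(count):
        path = bucketed_path(f"file{i:08d}.txt")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("x")     # tiny placeholder payload

if __name__ == "__main__":
    create_files(100_000)        # scale up with care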
I would also not attempt to put four billion files
into a balanced tree. When the Wikipedia article on NTFS
says the file system has a theoretical limit of
four billion files, I doubt anyone has finished
testing that statement. It is quite common for file
systems to over-promise, and for exhaustive testing
to discover that the limit is not actually as stated
on the tin. Apple, for example, had two incidents where
they released a Technical Note (TN) stating a capability,
and then three months later had to retract and rewrite
a few of the TN details to suit.
The OS has enough issues with things like File Explorer
to discourage running a giant lemonade stand off the OS.
It's not nearly good enough for that.
The Federated Search has a recommended maximum size of one
million files. I have 1.2 million files in my collection,
and the federated search works OK on that. The reason
for the recommended size is that it takes a whole day to
index a file collection that big, and it slows down
the larger the collection gets, as you would expect with
the generation of inverted indexes.
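
For the curious, an inverted index in miniature is just a map
from each word to the set of files containing it, something
like this toy sketch (the real Windows Search indexer is far
more elaborate; the paths here are made up):

# Toy inverted-index sketch: word -> set of files containing it.
import re
from collections import defaultdict
from pathlib import Path

def build_index(root: str) -> dict[str, set[Path]]:
    index: defaultdict[str, set[Path]] = defaultdict(set)
    for path in Path(root).rglob("*.txt"):        # plain-text files only
        words = re.findall(r"[A-Za-z0-9]+",
                           path.read_text(errors="ignore").lower())
        for word in words:
            index[word].add(path)                 # posting list for this word
    return index

# Usage (hypothetical path):
#   idx = build_index("C:/Users/Paul/Documents")
#   print(sorted(idx.get("ntfs", set()))[:10])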
When I do synthetic tests here, I do them on a RAM drive,
not for speed (nothing has "speed" here), but to
avoid shaking the disk drive to shit. The fastest way to
transfer files off a Windows disk is at the cluster level
with a backup product. I can transfer 40 million files
from one storage device to another in ten minutes, if
working at the cluster level, then enjoy random access to
them through the file system again once the files are
on the new device. For performance reasons, I would never
recommend "cp -R src dest" as the right way for every job.
Transferring 40 million files with cp -R is going
to take more than a day. And that's why we experiment
with the odd synthetic case, to see what the best method is.
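
Conceptually, a cluster-level transfer is one big sequential
streaming copy of the volume rather than 40 million per-file
open/create/close round trips. A sketch of the idea (the
device paths are placeholders; a real backup product copies
only the allocated clusters and locks or dismounts the
target volume first):

# dd-style block copy: large sequential reads and writes.
CHUNK = 4 * 1024 * 1024           # 4 MiB per read/write

def block_copy(src_device: str, dst_device: str) -> None:
    with open(src_device, "rb") as src, open(dst_device, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)

# block_copy(r"\\.\D:", r"\\.\E:")   # raw volume handles on Windows (admin only)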
Paul