On Thu, 1/16/2025 10:51 PM, vallor wrote:
> On Fri, 17 Jan 2025 02:49:17 -0000 (UTC), Lawrence D'Oliveiro
> <ldo@nz.invalid> wrote in <vmcgfd$3osq8$2@dont-email.me>:
>
>> On Thu, 16 Jan 2025 18:06:00 -0500, Paul wrote:
>>
>>> You can freeze a copy of C: for example, and run a Robocopy over it.
>>
>> Until you hit the limitations of Windows and Robocopy, and have to give
>> up on it and use Linux instead.
>>
>> <https://www.theregister.com/2010/09/24/sysadmin_file_tools/>
>
> I suspect things may have changed in the last *14 years*.

I think we discussed this one already.
That's *not* how you transfer sixty million files.
You image the partition with a cluster-level tool, and
compared to file-level copying, it screams. It runs at
disk speed.

Then, at the other machine, you restore at the cluster
level. Now all sixty million files are there, and you've
saved enough time to take a vacation.
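
To make the contrast concrete (and since Linux came up), the
cluster-level versus file-level difference is roughly dd on the raw
partition versus cp on the mounted tree. A sketch, with device and
mount names made up:

    # cluster level: one sequential pass over the raw partition,
    # runs at disk speed no matter how many files are on it
    $ dd if=/dev/sda2 of=/dev/sdb2 bs=4M status=progress

    # file level: open/read/write/close, sixty million times over
    $ cp -a /mnt/src/. /mnt/dst/
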
I have samples of fragmented volumes here, stored as .mrimg .
If I want to show an example of a fragmented volume, I can load
one of those up in about ten minutes or so. The top one is
64 million files stored in a single directory. The second one
is 64 million files in a tree.

    writeflat-64million-Ddrive-736391-00-00.mrimg   2,323,353,194 bytes
    writeidx4-64million-Ddrive-736391-00-00.mrimg   2,381,377,130 bytes

It's the same with using 7ZIP for archiving. 7ZIP can run pretty
fast on solid block files of a good size. However, if you feed it
a tree structure full of small files, it doesn't matter how fast your
CPU is; the program "starves" for want of responses from the file
system. You might get 10 MB/sec from it, because the disk heads are
flying around. Even on my RAMDrive it's *still* slow, because the
file system stack alone takes too long per file handled.
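
Roughly what the two cases look like on the command line (archive
and input names made up for illustration):

    # one big solid file: CPU-bound, 7ZIP keeps its threads fed
    $ 7z a big.7z some-large-image.vhd

    # a tree of tiny files: file-system-bound, maybe 10 MB/sec
    # no matter what the CPU is
    $ 7z a tree.7z F:\out
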
But working at the cluster level works a treat: a backup and restore,
and you're done, and it isn't even lunch hour yet. It does take time
for the backup tool to "crawl" the tree and build the index it likes
to use; that's still an expensive part of it. If you attack that
problem file by file, it's heat-death-of-the-universe bad.

A dir of F:\out in a Command Prompt took four minutes, dumped
to list.txt:

$ tail list.txt
05/26/2024  06:09 AM                 7 3FFFFF8.txt
05/26/2024  06:09 AM                 7 3FFFFF9.txt
05/26/2024  06:09 AM                 7 3FFFFFA.txt
05/26/2024  06:09 AM                 7 3FFFFFB.txt
05/26/2024  06:09 AM                 7 3FFFFFC.txt
05/26/2024  06:09 AM                 7 3FFFFFD.txt
05/26/2024  06:09 AM                 7 3FFFFFE.txt
05/26/2024  06:09 AM                 7 3FFFFFF.txt
        67108864 File(s)    469,762,048 bytes
               1 Dir(s)  7,497,859,072 bytes free

$ head list.txt
 Volume in drive F is RAMDrive
 Volume Serial Number is 50A8-F4E5

 Directory of F:\out

05/26/2024  06:09 AM    <DIR>          .
05/26/2024  04:47 AM                 7 0000000.txt
05/26/2024  04:47 AM                 7 0000001.txt
05/26/2024  04:47 AM                 7 0000002.txt
05/26/2024  04:47 AM                 7 0000003.txt

$ wc -l list.txt
67108872 list.txt

Windows has a copy of tar.exe in System32 now, and I
can try that, but it processes about 10 MB/sec. A wild guess
is that this might take two hours, but I can't really be sure.
(You can't afford to turn on Verbose, or the terminal
might become the limiting factor.)
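
For the record, the System32 copy is bsdtar, so the invocation
would be something like this (archive name made up):

    # -c create, -f archive file, -C F:\ so the stored
    # paths stay relative to the drive root
    $ tar -cf out.tar -C F:\ out
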
I'm trying to avoid explorer.exe getting a "whiff" of
the folder, because once that happens, explorer pegs
one core as it counts files and so on. The way to stop
explorer.exe from working on that is to rip the file system
away; once the "pointer" to the file system is no longer
available, it stops. You would not want explorer.exe
doing the copy. tar.exe is looking less efficient than
a backup and restore (the restore of 64 million files took
about 3 minutes 40 seconds). You can also kill explorer.exe in
Task Manager and start another one, if you like.
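
The command-line equivalent of the Task Manager method, from a
Command Prompt:

    :: kill every running Explorer, then start a fresh one
    taskkill /F /IM explorer.exe
    start explorer.exe
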
Robocopy should not be counting files the way explorer.exe does
when it prepares to do a copy, but I would expect it to be
one of those two-hour things. It's going to be in a different
ballpark than the cluster-level restore.
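
If I were to time it, the invocation would be something like this
(G:\out is a made-up destination):

    :: /E copy subdirectories, /NFL /NDL /NP keep the console from
    :: becoming the bottleneck, /MT:8 use eight copy threads
    robocopy F:\out G:\out /E /NFL /NDL /NP /MT:8
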
Paul