On 2024-04-12, James Harris <james.harris.1@gmail.com> wrote:
For a number of reasons I am looking for a way of recording a list of
the files (and file-like objects) on a Unix system at certain points in
time. The main output would simply be sorted text with one
fully-qualified file name on each line.
>
What follows is my first attempt at it. I'd appreciate any feedback on
whether I am going about it the right way or whether it could be
improved either in concept or in coding.
>
There are two tiny scripts. In the examples below they write to
temporary files f1 and f2 to test the mechanism but the idea is that the
reports would be stored in timestamped files so that comparisons between
one report and another could be made later.
>
The first, and primary, script generates nothing other than names and is
as follows.
>
export LC_ALL=C
sudo find / \
  -path "/proc/*" -prune -o \
  -path "/run/*" -prune -o \
  -path "/sys/*" -prune -o \
  -path "/tmp/*/*" -prune -o \
  -print0 | sort -z | tr '\0' '\n' > /tmp/f1
>
You'll see I made some choices such as to omit files from /proc but not
from /dev, for example, to record any lost+found contents, to record
mounted filesystems, to show just one level of /tmp, etc.
>
I am not sure I coded the command right albeit that it seems to work on
test cases.
>
The output from that starts with lines such as
>
/
/bin
/boot
/boot/System.map-5.15.0-101-generic
/boot/System.map-5.15.0-102-generic
...etc...
>
Such a form would be ideal for input to grep and diff to look for
relevant files that have been added or removed between any two runs.
>
The second, and less important, part is to store (in a separate file)
info about each of the file names as that may be relevant in some cases.
That takes the first file as input and has the following form.
>
cat /tmp/f1 |\
tr '\n' '\0' |\
xargs -0 sudo ls -ld > /tmp/f2
>
The output from that is such as
>
drwxr-xr-x 23 root root 4096 Apr 13 2023 /
lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
...etc...
>
As for run times, if anyone's interested, despite the server I ran this
on having multiple locally mounted filesystems and one NFS the initial
tests ran in 90 seconds to generate the first file and 5 minutes to
generate the second, which would mean (as long as no faults are found)
that it would be no problem to run at least the first script whenever
required. Other than that, I'd probably also schedule both to run each
night.
>
That's the idea. As I say, comments, advice and criticisms on the idea
or on the coding would be appreciated!
>
One thing: find has a "-printf" option that lets you format the
output. You can remove the need for "tr" (and for the "-z" on sort)
by using this instead of "-print0":
  -printf "%P\n"
That will also remove the leading slash, which I think is a good idea
in this case. Use the lower-case "%p" instead to keep the starting
point and so have the full leading path.
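
As a rough sketch of that suggestion (assuming GNU find, since -printf
is a GNU extension; list_tree is just a name made up for illustration),
the pipeline shrinks to find plus sort. It is run here against a
scratch directory rather than / so it can be tried without sudo:

```shell
#!/bin/sh
# Sketch: one path per line, sorted, without the tr step.
# Assumes GNU find; -printf is not part of POSIX find.
export LC_ALL=C

# list_tree is a hypothetical helper; $1 is the starting point.
list_tree() {
    find "$1" -printf '%p\n' | sort
}

# Try it on a scratch directory instead of /
demo=$(mktemp -d)
mkdir -p "$demo/b/c"
touch "$demo/a" "$demo/b/c/d"

full=$(list_tree "$demo")                    # %p: paths keep the starting point
rel=$(find "$demo" -printf '%P\n' | sort)    # %P: starting point stripped

printf '%s\n' "$full"
printf '%s\n' "$rel"
rm -rf "$demo"
```

Note that %P prints an empty line for the starting point itself, and
that a file name containing a newline will still span two lines, just
as it does after the tr step.
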
If you want to validate a directory tree, that is, see whether it
has changed, I would recommend using mtree. It's available in Debian
under the mtree-bsd package.

Mtree can write a list of files, plus other attributes, to a spec
file, and can tell you later, according to that spec file, what changes
have been made. The problem with your "find" method is that you can't
tell whether a file has simply been modified.
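
If you would rather stay with find, one way round that limitation is
to have -printf record a size and modification time next to each name,
so that a later diff shows modified files as well as added and removed
ones. A minimal sketch, again assuming GNU find; the tab-separated
layout and the snapshot helper name are only illustrative choices:

```shell
#!/bin/sh
# Sketch: record name, size and mtime per entry so a diff reveals modifications.
# Assumes GNU find. %s = size in bytes, %T@ = mtime as seconds since the epoch.
export LC_ALL=C

# snapshot is a hypothetical helper; $1 is the starting point.
snapshot() {
    find "$1" -printf '%p\t%s\t%T@\n' | sort
}

demo=$(mktemp -d)
echo one > "$demo/conf"
before=$(snapshot "$demo")

echo "one and a bit" > "$demo/conf"    # same name, different content
after=$(snapshot "$demo")

# The name is unchanged but the recorded size (and usually the mtime) differs
if [ "$before" != "$after" ]; then
    changed=yes
else
    changed=no
fi
echo "modified: $changed"
rm -rf "$demo"
```

This catches changes in size or timestamp only; a file rewritten with
the same size and its mtime restored would go unnoticed.
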
Using mtree, you can do two things: generate a specification file,
which is really a list of files plus selected attributes at a given
point in time, and see what changes have been made since. As a bonus,
you can get it to output the spec in a simple format, using the "-C"
option, and you get output very similar to "find" with a little extra
info tacked on, which you could remove using a pipe.
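
Roughly, and only as a sketch: the following assumes an mtree that
accepts -c, -p, -k, -f and -C (the NetBSD-derived one does), and it
skips the mtree steps if the command is not installed. The scratch
directory and the choice of keywords are just for illustration:

```shell
#!/bin/sh
# Sketch only: record a spec for a tree, then check the tree against it.
# Assumes an mtree supporting -c, -p, -k, -f and -C (NetBSD-derived mtree).
demo=$(mktemp -d)
spec=$(mktemp)
mkdir -p "$demo/etc"
echo one > "$demo/etc/conf"

if command -v mtree >/dev/null 2>&1; then
    # -c: create a spec, -p: root of the tree, -k: keywords to record
    mtree -c -p "$demo" -k type,mode,size,time > "$spec"

    # -C: print the stored spec in a one-entry-per-line form
    mtree -C -f "$spec"

    echo "one and a bit" > "$demo/etc/conf"    # simulate a later modification

    # Compare the tree with the stored spec; differences are reported
    mtree -p "$demo" -f "$spec"
    result="compared, mtree exit status $?"
else
    result="mtree not installed; nothing compared"
fi
echo "$result"
rm -rf "$demo" "$spec"
```

For the real thing you would point -p at / (with sudo) and keep the
spec somewhere other than /tmp.
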
You could output to a spec file which has the date in the filename,
then run mtree against any previous spec file to see what has changed
between that spec and the current state.
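
The dated-file idea applies equally to plain find lists. A minimal
sketch, using comm on two sorted lists; the file naming and the
take_list helper are made up for illustration, and scratch directories
stand in for the real tree and the report store:

```shell
#!/bin/sh
# Sketch: keep dated lists and compare a newer one with an older one.
# The naming scheme (files-YYYYmmdd-HHMMSS-N.txt) is an assumption.
export LC_ALL=C

store=$(mktemp -d)     # stand-in for wherever reports would be kept
tree=$(mktemp -d)      # stand-in for the filesystem being recorded
touch "$tree/keep" "$tree/old"

# take_list is a hypothetical helper: $1 = tree, $2 = output file
take_list() {
    find "$1" | sort > "$2"
}

# The -1/-2 suffixes only stop the two demo runs colliding within one second
first="$store/files-$(date +%Y%m%d-%H%M%S)-1.txt"
take_list "$tree" "$first"

rm "$tree/old"; touch "$tree/new"      # changes between the two runs

second="$store/files-$(date +%Y%m%d-%H%M%S)-2.txt"
take_list "$tree" "$second"

# comm needs sorted input: -13 = only in second (added), -23 = only in first (removed)
added=$(comm -13 "$first" "$second")
removed=$(comm -23 "$first" "$second")
echo "added:   $added"
echo "removed: $removed"
rm -rf "$store" "$tree"
```

Both lists must be sorted under the same locale for comm to work,
which is why LC_ALL=C is set once at the top.
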
If you just want the list of files, find works fine, with the
suggestion I made about printing the filename, but have a look at
mtree because I think it will save you a bit of coding.

That's the thing with Linux, or computing in general: it's likely that
what you thought of has already been done, and there is a tool which
does it, or which can easily be adapted to do it.