On 08/03/2024 03:39, Rich wrote:
James Harris <james.harris.1@gmail.com> wrote:
That's right. What this is for is code to list files in a folder
which are duplicates of those in another folder (same name, same
relative place in the folder hierarchy, etc). For example, say there
are two folders, c and d, and one is potentially a copy of the other.
The command
$ ./lsdup.py -r c d
lists files in d (and subdirectories due to the -r recurse option) which
are duplicates of those in c.
In answer to your other point about using xargs, I would use it if it
would do what's required, and do so consistently,
It will, provided you output file names with ASCII null terminators
instead of ASCII newlines, and use the -0 option to tell xargs the
filenames are null separated.
but I am not sure whether I can trust it or not.
You can. Provided you feed it ASCII null terminated filenames, it will
work properly, no matter what other weird characters might be in the
filenames.
You are right: unusual characters in file names are the kind of issue I am wary about with xargs. Running a command to do automatic deletion requires a lot of trust in the command's operation.
I /have/ been thinking about adding a -0 option but it has issues such as:
(1) With -0 one cannot easily postprocess the list of file names, if required.
(2) With xargs interactivity (e.g. the -i in rm -i) is lost.
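For reference, the output side of a -0 option is small. The following is only a sketch of what it might look like in a Python script; the function name write_names_nul is made up for illustration and is not taken from lsdup.py.

```python
import os
import sys


def write_names_nul(names, out=None):
    """Write each file name followed by a NUL byte instead of a newline.

    Hypothetical helper, not part of lsdup.py. Output is written as bytes
    so that names which are not valid text in the current locale survive.
    """
    out = sys.stdout.buffer if out is None else out
    for name in names:
        # os.fsencode converts a str path back to the bytes the OS uses.
        out.write(os.fsencode(name) + b"\0")
    out.flush()
```

A consumer such as xargs -0 would then split on the NUL bytes only, so spaces, quotes and even newlines in names pass through untouched.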
Yes, there are ways round such issues but they may require altering a command line /after/ it has been found to be correct. That was the issue which led to this thread. If I generate a list of delete commands with a series of commands such as
A | B
then once I am happy with them I would prefer simply to append | sh as in
A | B | sh
rather than changing the form to
sh <(A | B)
This is not so much about convenience as about making sure less can go wrong - always a good idea when deleting files by means of a command.
IMO better than a -print- or -0 option would be a function which would render file names exactly as the current shell would - same escape sequences, etc.
..
(A separate command, lsempty.py, is used to delete the resultant
empty folders.)
A separate python command is completely unnecessary. Removing empty
leaf directories is already easy using the tools provided by the
system:
$ find c d -type d -empty -print0 | xargs -0 rmdir
If you also want to remove parent empty directories should all their
children go away, change to:
$ find c d -type d -empty -print0 | xargs -0 rmdir -p
Thanks. That looks as though it would work, albeit that it would print a few spurious error messages which add to the work of the person running the code. For example,
$ find f -type d -empty -print0 | xargs -0 rmdir -pv
rmdir: removing directory, 'f/1/2/3'
rmdir: removing directory, 'f/1/2'
rmdir: removing directory, 'f/1'
rmdir: failed to remove directory 'f/1': Directory not empty
rmdir: removing directory, 'f/b/c/d'
rmdir: removing directory, 'f/b/c'
rmdir: removing directory, 'f/b'
rmdir: failed to remove directory 'f/b': Directory not empty
rmdir: removing directory, 'f/b/g/h'
rmdir: removing directory, 'f/b/g'
rmdir: removing directory, 'f/b'
rmdir: failed to remove directory 'f/b': Directory not empty
$
Note, in particular, the repetitions of f/b.
By contrast, my version of the above is clearer:
$ ./lsempty.py f | sed 's/^/rmdir -v /' | sh
rmdir: removing directory, 'f/1/2/3'
rmdir: removing directory, 'f/1/2'
rmdir: removing directory, 'f/b/c/d'
rmdir: removing directory, 'f/b/c'
rmdir: removing directory, 'f/b/g/h'
rmdir: removing directory, 'f/b/g'
$
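The cleaner listing comes from working out, bottom-up, which directories would end up empty, and naming each one once with children before parents. The following is a rough sketch of that idea, not the actual lsempty.py; the function name removable_dirs is invented for illustration, and it assumes the top-level directory itself should never be listed.

```python
import os


def removable_dirs(root):
    """Return directories under root that hold no files, deepest first.

    Hypothetical sketch. A directory qualifies if it contains no files and
    every subdirectory in it also qualifies. root itself is never returned.
    """
    removable = set()
    ordered = []
    # topdown=False visits children before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root or filenames:
            continue
        # Symlinks to directories appear in dirnames but are never walked,
        # so they are not in 'removable' and keep the parent in place.
        if all(os.path.join(dirpath, d) in removable for d in dirnames):
            removable.add(dirpath)
            ordered.append(dirpath)
    return ordered
```

Because each name appears once and only after its children, running rmdir on the names in order should produce no "Directory not empty" complaints, provided nothing changes in the tree between listing and deleting.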
In answer to the point about filenames with spaces and odd characters, I
currently output names in single quotes, as above. This will
complain until the parent is empty, but the complaints can be
ignored.
Single quote is also a possible filename character, so if, by chance,
you end up with a file containing one ' somewhere your wrapping in
single quotes will result in a fail at that point.
I deal with that (at present) by juxtaposing single-quoted strings. For example, if there is a file called
won't scan.txt
with an apostrophe and a space then I get the following results. First, the command:
$ ./lsdup.py c d -r | grep won | sed 's/^/rm -v /'
rm -v 'd/won'\''t scan.txt'
Then, piping that command to the shell works properly in:
$ ./lsdup.py c d -r | grep won | sed 's/^/rm -v /' | sh
removed "d/won't scan.txt"
IOW the file name becomes the concatenation of
'd/won'
\'
't scan.txt'
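In Python terms the juxtaposition amounts to something like the sketch below. The function name sq_quote is made up for illustration, and it assumes a POSIX-style shell in which nothing is special inside single quotes.

```python
def sq_quote(name):
    """Wrap name in single quotes for a POSIX shell.

    Hypothetical helper. Each embedded apostrophe closes the current
    quoted string, adds a backslash-escaped apostrophe, and reopens.
    """
    return "'" + name.replace("'", "'\\''") + "'"


print("rm -v " + sq_quote("d/won't scan.txt"))
# prints: rm -v 'd/won'\''t scan.txt'
```

A name with no apostrophe simply comes out wrapped in one pair of single quotes.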
That said, I am not sure that this will work in all cases and expect that a function to render a file name to match the current shell would be more dependable. Pity that shells don't provide such a function ... AFAIK.
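One thing that comes close on the Python side is shlex.quote from the standard library. It is documented as targeting Unix shells in general, not whichever shell happens to be running, so it is not quite the function wished for above. A minimal sketch, with rm_command as a made-up helper name:

```python
import shlex


def rm_command(path):
    """Build an 'rm -v' command line with path quoted for a POSIX shell.

    Hypothetical helper; the quoting itself is done by shlex.quote.
    """
    return "rm -v " + shlex.quote(path)


line = rm_command("d/won't scan.txt")
print(line)
# prints: rm -v 'd/won'"'"'t scan.txt'

# Round trip: splitting the line the way a POSIX shell would gives back
# the original name as a single argument.
assert shlex.split(line) == ["rm", "-v", "d/won't scan.txt"]
```

shlex.quote escapes an apostrophe by switching briefly to double quotes where the juxtaposition above uses a backslash, but the effect is the same. Names made up only of safe characters are returned unquoted.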
-- James Harris