Re: program to remove duplicates

Subject : Re: program to remove duplicates
From : fir (at) *nospam* grunge.pl (fir)
Newsgroups : comp.lang.c
Date : 22 Sep 2024, 15:32:05
Organization : i2pn2 (i2pn.org)
Message-ID : <66F02A65.3000802@grunge.pl>
References : 1 2 3 4 5 6 7 8 9 10
User-Agent : Mozilla/5.0 (Windows NT 5.1; rv:27.0) Gecko/20100101 Firefox/27.0 SeaMonkey/2.24
fir wrote:
fir wrote:
Bart wrote:
On 22/09/2024 11:24, fir wrote:
Paul wrote:
>
The normal way to do this is to do a hash check on the
files and compare the hashes. You can use MD5SUM, SHA1SUM, or SHA256SUM
as a means to compare two files. If you want to be picky about
it, stick with SHA256SUM.
>
>
the code i posted works ok, and anyone who has windows and mingw/tdm
may compile it and check the application if they want
>
hashing is not necessary imo, though it could probably speed things up -
i'm not strongly convinced that the probability of a mistake in this
hashing is strictly zero (as i have never used it and would probably
need to produce my own hashing).. probably it is mathematically proven
to be almost zero, but for now at least it is more interesting to me
whether the code i posted is ok
>
I was going to post similar ideas (doing a linear pass working out
checksums for each file, then sorting the list by checksum and size, so
that candidates for a byte-by-byte comparison, if you want to do that,
will be grouped together).
>
But if you're going to reject everyone's suggestions in favour of your
own already working solution, then I wonder why you bothered posting.
>
(I didn't post after all because I knew it would be futile.)
>
>
>
yet to say about this efficiency
>
when i observe how it works - this program is square in the sense that
it has a half-square loop over the directory file list, so it may be
like 20k*20k/2 - 20k comparisons, but it mostly only compares sizes, so
i'm not sure how serious this kind of squareness is.. are 200M int
comparisons a problem? - maybe they start to be for larger sets
>
in the sense of real binary comparisons it is not fully square but
more like sets of smaller squares on the diagonal of this large square,
if you (some) know what i mean... and that may be a problem, as
if among those 20k files 100 have the same size then it makes about
100x100 full loads and 100x100 full byte-to-byte binary compares, which
is practically full if there are indeed 100 duplicates
(maybe it's less than 100x100, as at the first finding of a duplicate i
mark it as a duplicate and skip it in the loop)
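(a sketch of that exact byte-to-byte step, done with buffered reads and
an early exit on the first mismatch instead of full loads - a
hypothetical helper, not the posted code:)

```c
#include <stdio.h>
#include <string.h>

/* Compare two files byte by byte, assuming their sizes already matched.
   Returns 1 if identical, 0 if they differ or either fails to open. */
static int same_bytes(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = 0;

    if (fa && fb) {
        unsigned char buf_a[4096], buf_b[4096];
        size_t na, nb;
        result = 1;
        do {
            na = fread(buf_a, 1, sizeof buf_a, fa);
            nb = fread(buf_b, 1, sizeof buf_b, fb);
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = 0;   /* first mismatch: stop reading early */
                break;
            }
        } while (na == sizeof buf_a);   /* short read means end of file */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;
}
```

the early exit matters for the non-duplicate pairs in a same-size
group: they usually differ within the first block, so most of the
100x100 compares stop after one read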
>
but indeed it shows in practice that in the case of folders bigger than
3k files it slows down, probably disproportionately, so optimisation is
at hand / needed for large folders
>
that's from observation of it
>
>
>
but as i said i mainly wanted this to be done to free some space from
these recovered, somewhat junk files.. and having it done, even the
partially square way, is more important than having it optimised
>
it works, and if i see it slow down on large folders i can divide those
big folders into a few of 3k files each and run this duplicate mover on
each one
>
more hand work, but it can be done by hand

however, saying that, the checksumming/hashing idea is of course kinda
good (the sorting probably less so, as it may be a bit harder to write -
i'm never sure if my old hand-written quicksort code has no error; i
once tested like 30 quicksort versions in my life trying to rewrite it,
once got some mistake in this code, and later was never strictly sure if
the version i finally got is good - it's probably good but i'm not sure)
but i would need to be sure that my own way of hashing has practically
no chance of generating the same hash for different files..
and i have never done these things, so i haven't thought it through..
and now it's a side thing, possibly not worth studying
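(the collision worry goes away if equal checksums are treated only as
*candidates* and the byte-to-byte compare stays as the final word - then
even a cheap home-made checksum is safe, since a collision only costs
one extra compare, never a wrong result. a small sketch using the
public-domain FNV-1a function; none of this is from the posted program:)

```c
#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a over a memory buffer. Equal inputs always get equal
   checksums; different inputs *may* collide, so a matching checksum
   only marks a candidate pair - the byte compare stays exact. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];                      /* fold in one byte */
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}
```

checksumming each file once is a single linear pass, and sorting or
bucketing on the 64-bit value groups the candidates the same way the
size runs do, just with far fewer false candidates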
