Hello,
I am very glad to have found dupd, as it offers the best workflow for my use-case.
I ran the following command on about 150 TB of data; it took about 70 hours:
# dupd scan --path /path1 --path /path2
Files: 2420698 0 errors 1354 s
Total duplicates: 2108486 files in 690968 groups in 238110 s
Run 'dupd report' to list duplicates
Then, I did:
dupd started listing the files that are unique to /path2, but it is taking a very long time, with the CPU pegged at about 50%.
Is this normal? I thought that since the files have already been recorded in the SQLite database, printing such a list would be fast?
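For reference, my mental model was something like this (the table and column names below are my guess at the layout, not necessarily dupd's actual schema, and the example runs against a tiny mock database rather than the real ~150 TB scan):

```python
import sqlite3

# Mock of the schema I assume dupd keeps in its database:
# one row per duplicate group, with the member paths in a text column.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE duplicates ("
    "id INTEGER PRIMARY KEY, count INTEGER, each_size INTEGER, paths TEXT)"
)
con.execute(
    "INSERT INTO duplicates (count, each_size, paths) "
    "VALUES (2, 1024, '/path1/a.txt,/path2/a.txt')"
)

# Listing the stored groups is a single sequential table scan,
# which SQLite handles quickly even for hundreds of thousands of rows.
for count, size, paths in con.execute(
    "SELECT count, each_size, paths FROM duplicates"
):
    print(f"{count} files of {size} bytes each: {paths}")
```

That is why I expected the listing step to finish in seconds rather than hours.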