#1
File Manager: Any attribute to tell whether files are duplicates?
Besides file name and file size...
There are so many different attributes, including about 10 different Date attributes, that one would think some attribute would match, at least to suggest that two files are duplicates. You know, like a Date attribute for when the thing was authored... Am I missing something? Thanks.
#2
John Doe wrote:
> Besides file name and file size... There are so many different attributes, including about 10 different Date attributes, one would think that some attribute would match to at least suggest that two files are duplicates. You know, like a Date attribute of when the thing was authored... Am I missing something?

No, that's pretty much it, as far as Explorer is concerned. You can have it display the "Date created", but that gets reset when copying a file, or when moving it from one disk to another.

Personally, I just use the file size. If two files are the same size, I open them and look. (If there are many, I check md5sums.)

--
Mars was where it should be. I felt more real.
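The size-first approach can be sketched in Python: only files whose size matches another file's get hashed, since files of different sizes cannot be identical. This is a minimal illustration, not a tool from the thread; the function names (`group_by_size`, `md5_of`, `same_size_then_md5`) are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(root):
    """Map file size -> list of paths, keeping only sizes shared by 2+ files."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_size_then_md5(root):
    """Hash only same-size files, then group them by MD5."""
    groups = []
    for paths in group_by_size(root).values():
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

On a large tree, the size filter usually rules out most files before any hashing happens, which is where the time savings come from.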
#3
Auric__ wrote:
> John Doe wrote:
>> Besides file name and file size... There are so many different attributes, including about 10 different Date attributes, one would think that some attribute would match to at least suggest that two files are duplicates. You know, like a Date attribute of when the thing was authored... Am I missing something?
>
> No, that's pretty much it, as far as Explorer is concerned. You can have it display the "Date created", but that gets reset when copying a file, or moving from one disk to another. Personally, I just use the filesize. If two files are the same size, I open them and look. (If there are many, I check md5sums.)

You can run hashdeep on a drive and compute an MD5 hash (a kind of checksum) for every file on it. Sort the output by hash (in LibreOffice Calc, for example) and identify duplicates that way. The first thing that stands out is a small set of zero-length files, which all have the same checksum.

Hashdeep can be blazing fast on an SSD, because it is multithreaded and can work on more than one file at a time. It might be less well tuned for an HDD (head thrashing), and there it might work better with threading turned off.

With the hashdeep executable, you can name the algorithm to use, while md5deep just does MD5. (Basically, the one file is renamed, and the program checks its own name to figure out which role to play.)

http://md5deep.sourceforge.net/

In terms of computing checksums, they vary in "speed" and "certainty". CRC32 is fast (1GB/sec maybe), but there could be collisions. MD5 is slower (300MB/sec?), but there will be fewer false positives. SHA1 is slower still (100MB/sec?). And so on. SHA256 is quality stuff (30MB/sec?), and the odds of two files matching by mistake are very low indeed. (VirusTotal uses it so you don't have to upload a file: your browser calculates the hash, and VirusTotal searches its database with it.)

MD5 is a good trade-off between speed (how fast you can read a disk to compute the hashes) and certainty.
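For readers without hashdeep, here is a rough Python stand-in for the hash-everything-and-sort workflow. It is a sketch under stated assumptions, not hashdeep: it is single-threaded, it returns plain (digest, path) pairs rather than hashdeep's output format, and the names `hash_file`, `hash_tree` and `duplicate_groups` are hypothetical. `hashlib.new()` takes an algorithm name such as "md5", "sha1" or "sha256", loosely mirroring hashdeep's choice of algorithm. Every zero-length file has the MD5 `d41d8cd98f00b204e9800998ecf8427e`, which is why empty files cluster together.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, algorithm="md5", chunk_size=1 << 20):
    """Hash one file with any algorithm name hashlib knows ('md5', 'sha1', 'sha256', ...)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root, algorithm="md5"):
    """(digest, path) pairs for every regular file under root, sorted by digest."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                rows.append((hash_file(path, algorithm), path))
    rows.sort()  # same role as sorting the hash column in a spreadsheet
    return rows


def duplicate_groups(rows):
    """Collapse (digest, path) rows into {digest: [paths]} for digests seen 2+ times."""
    by_digest = defaultdict(list)
    for digest, path in rows:
        by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Swapping `algorithm="md5"` for `"sha256"` trades speed for certainty in the way the post describes.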
All of the methods named above give much better quality than a simple arithmetic checksum, which could mistakenly claim that two files are the same. You can test the relative speeds using the right-click menu provided by 7-Zip, which can now generate checksums for selected files; you can get some speed numbers that way.

*******

There are also third-party "duplicate finders" that do the same thing for you, with no scripting or command-line work.

*******

In any case, it takes time, and it isn't free. It's about as heavy as an AV scan. The only absolute check of identical contents is a byte-by-byte compare. A duplicate finder could identify candidates by sorting on MD5, then switch to a byte-by-byte compare for the small number of candidates with identical MD5, just to be sure.

Paul
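The two-stage check (MD5 to find candidates, then a byte-by-byte compare to confirm) might look like this in Python. A minimal sketch only: `md5_of`, `same_bytes` and `confirmed_duplicates` are hypothetical names, and the byte-by-byte stage only changes the result in the rare event of an MD5 collision.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(a, b, chunk_size=1 << 16):
    """True only if both files have identical contents, compared chunk by chunk."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False  # cheap early exit
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both files exhausted at the same point
                return True


def confirmed_duplicates(paths):
    """Stage 1: group by MD5. Stage 2: confirm each group byte by byte."""
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[md5_of(path)].append(path)

    confirmed = []
    for candidates in by_hash.values():
        remaining = list(candidates)
        while len(remaining) > 1:
            first, *others = remaining
            same = [p for p in others if same_bytes(first, p)]
            if same:
                confirmed.append(sorted([first] + same))
            # Files that did not match `first` (an MD5 collision) are re-checked among themselves.
            remaining = [p for p in others if p not in same]
    return sorted(confirmed)
```

Reading both files in equal-sized chunks keeps memory use flat regardless of file size.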
#4
On Sat, 21 Oct 2017 06:32:13 -0000 (UTC), "Auric__" wrote:
> John Doe wrote:
>> Besides file name and file size... There are so many different attributes, including about 10 different Date attributes, one would think that some attribute would match to at least suggest that two files are duplicates. You know, like a Date attribute of when the thing was authored... Am I missing something?
>
> No, that's pretty much it, as far as Explorer is concerned. You can have it display the "Date created", but that gets reset when copying a file, or moving from one disk to another. Personally, I just use the filesize. If two files are the same size, I open them and look. (If there are many, I check md5sums.)

Then you could get a program like Total Commander, which lets you take a couple of files and compare them by actual content.