A Windows XP help forum. PCbanter

File Manager: Any attribute to tell whether files are duplicates?



 
 
  #1  
Old October 21st 17, 01:52 AM posted to alt.comp.os.windows-10
John Doe[_8_]

Besides file name and file size...

There are so many different attributes, including about 10
different Date attributes, one would think that some attribute
would match to at least suggest that two files are duplicates.
You know, like a Date attribute of when the thing was
authored... Am I missing something?

Thanks.
  #2  
Old October 21st 17, 07:32 AM posted to alt.comp.os.windows-10
Auric__

John Doe wrote:

> Besides file name and file size...
>
> There are so many different attributes, including about 10
> different Date attributes, one would think that some attribute
> would match to at least suggest that two files are duplicates.
> You know, like a Date attribute of when the thing was
> authored... Am I missing something?


No, that's pretty much it, as far as Explorer is concerned. You can have it
display the "Date created", but that gets reset when copying a file, or
moving from one disk to another.

Personally, I just use the filesize. If two files are the same size, I open
them and look. (If there are many, I check md5sums.)
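
The size-first, hash-second approach described above can be sketched in Python. This is only an illustration, not anything Explorer does: the function names `md5_of_file` and `same_size_then_md5` are made up for the example, and it assumes every file under the folder is readable.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_size_then_md5(root):
    """Group files under root by size, then hash only the same-size groups."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of_file(path)].append(path)
        # keep only hash groups that contain more than one file
        duplicates.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Calling `same_size_then_md5` on a folder path returns a list of groups, each group being the paths that share both size and MD5. Files with a unique size are never read at all, which is where the time saving comes from.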

--
Mars was where it should be. I felt more real.
  #3  
Old October 21st 17, 10:23 AM posted to alt.comp.os.windows-10
Paul[_32_]

Auric__ wrote:
> John Doe wrote:
>
>> Besides file name and file size...
>>
>> There are so many different attributes, including about 10
>> different Date attributes, one would think that some attribute
>> would match to at least suggest that two files are duplicates.
>> You know, like a Date attribute of when the thing was
>> authored... Am I missing something?
>
> No, that's pretty much it, as far as Explorer is concerned. You can have it
> display the "Date created", but that gets reset when copying a file, or
> moving from one disk to another.
>
> Personally, I just use the filesize. If two files are the same size, I open
> them and look. (If there are many, I check md5sums.)


You can run hashdeep on a drive and compute MD5 (a kind of checksum)
for every file on it. Sort by hash (in LibreOffice Calc, say) and identify
duplicates that way. The first thing that stands out is a small set
of zero-length files, which all have the same checksum.
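
As a rough Python stand-in for the hash-then-sort idea (this is not hashdeep, and the name `hash_rows` is invented for the example), the sketch below builds one (hash, path) row per file and sorts the rows so that equal hashes end up next to each other. It reads each file whole for brevity, so it assumes files that fit in memory.

```python
import hashlib
import os


def hash_rows(root):
    """Return (md5_hex, path) rows for every file under root, sorted by hash."""
    rows = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # whole-file read: fine for a sketch, wasteful for very large files
            with open(path, "rb") as handle:
                rows.append((hashlib.md5(handle.read()).hexdigest(), path))
    return sorted(rows)


# Every zero-length file hashes to the MD5 of empty input,
# which is why they all show up together after sorting.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()  # d41d8cd98f00b204e9800998ecf8427e
```

Rows whose hash equals `EMPTY_MD5` are the zero-length files; any other run of adjacent rows with the same hash is a set of duplicate candidates.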

Hashdeep can be blazing fast on an SSD, because it is multithreaded
and can work on more than one file at a time. It might be less well
tuned for an HDD (head thrashing), where it might work better
with threading turned off. With the hashdeep executable, you can
name the algorithm to use, while md5deep just does MD5. (Basically
it is one file under different names, and the program "checks its
name" to figure out what role to play.)

http://md5deep.sourceforge.net/

Checksum algorithms trade off "speed" against "certainty":

CRC32 is fast (1GB/sec maybe), but collisions are possible.
MD5 is slower (300MB/sec ?), but the mistaken positives will be fewer.
SHA1 is slower still (100MB/sec ?).
SHA256 is quality stuff (30MB/sec ?), and the odds of two different files
matching by mistake will be very low indeed. (Virustotal uses it to
avoid uploading a file: your browser calculates the hash, and
Virustotal searches its database with it.)

MD5 is a good trade-off between speed (how fast you can read a
disk to compute the hashes) and certainty. All of the methods
named above give much better quality than a simple arithmetic
checksum, which could mistakenly claim that two different files
are the same.

You can test the relative speeds using the right-click menu provided
by 7-Zip, which now generates checksums on selected files for you,
and get some speed numbers that way.
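
If you would rather script the comparison, something like the following Python sketch gives rough relative numbers. It times the hashing of an in-memory buffer only, so it says nothing about disk read speed, and the helper names `throughput_mb_per_s` and `compare_speeds` are made up for this example. Results will differ from machine to machine.

```python
import hashlib
import time
import zlib


def throughput_mb_per_s(update, data, repeats=5):
    """Feed data through update() several times and return MB per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        update(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return (len(data) * repeats) / (1024 * 1024) / elapsed


def compare_speeds(size_mb=8):
    """Return a dict of algorithm name -> measured MB/s on a zero-filled buffer."""
    data = bytes(size_mb * 1024 * 1024)
    results = {"crc32": throughput_mb_per_s(zlib.crc32, data)}
    for name in ("md5", "sha1", "sha256"):
        results[name] = throughput_mb_per_s(hashlib.new(name).update, data)
    return results


if __name__ == "__main__":
    for name, speed in compare_speeds().items():
        print(f"{name:8s} {speed:10.1f} MB/s")
```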

*******

There are also third-party "duplicate finders" which do the
same thing for you, with no scripting or command-line stuff.

*******

In any case, it takes time, and isn't free. It's about
as nasty as an AV scan.

The only absolute check of identical contents is a byte-by-byte
compare. A duplicate finder could identify candidates by sorting
on MD5, then switch to byte-by-byte for the small number
of candidates with identical MD5, just to be sure.
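
A minimal Python sketch of that two-stage idea, assuming a list of file paths is already in hand: MD5 picks the candidates, and the standard library's `filecmp.cmp` with `shallow=False` does the content comparison. The function name `confirmed_duplicates` is hypothetical, and files are read whole for brevity. No claim is made that any particular duplicate finder works this way.

```python
import filecmp
import hashlib
from collections import defaultdict


def confirmed_duplicates(paths):
    """Return pairs of paths whose MD5 matches and whose bytes are identical."""
    by_hash = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            by_hash[hashlib.md5(handle.read()).hexdigest()].append(path)

    confirmed = []
    for group in by_hash.values():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                # shallow=False compares file contents, not just os.stat() data
                if filecmp.cmp(first, second, shallow=False):
                    confirmed.append((first, second))
    return confirmed
```

Only files that already share an MD5 are compared byte by byte, so the expensive step runs on the small candidate set rather than on every pair.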

Paul
  #4  
Old October 21st 17, 05:40 PM posted to alt.comp.os.windows-10
No_Name

On Sat, 21 Oct 2017 06:32:13 -0000 (UTC), "Auric__" wrote:

> John Doe wrote:
>
>> Besides file name and file size...
>>
>> There are so many different attributes, including about 10
>> different Date attributes, one would think that some attribute
>> would match to at least suggest that two files are duplicates.
>> You know, like a Date attribute of when the thing was
>> authored... Am I missing something?
>
> No, that's pretty much it, as far as Explorer is concerned. You can have it
> display the "Date created", but that gets reset when copying a file, or
> moving from one disk to another.
>
> Personally, I just use the filesize. If two files are the same size, I open
> them and look. (If there are many, I check md5sums.)


Then you could get a program like Total Commander, which allows you to
take a couple of files and compare them by actual content.
 



