**Paul[_32_]** · December 10th 18, 07:50 PM posted to alt.windows7.general

Ken Blake wrote:
> On Mon, 10 Dec 2018 16:09:08 +0000, "J. P. Gilliver (John)"
> wrote:
>
>> In message , Big Al
>> writes:
>>> On 12/10/18 4:44 AM, JBI wrote:
>>>> Just what it says, I'm trying to find something portable and free
>>>> that would do a search of my hard drive the way Google does, where an
>>>> input image is given and then the search finds the same or similar
>>>> images. Thanks!
>>> Now that would be cool. I know I have duplicates under different names.
>>> You say "or similar", I'd just like to find a binary duplicate.
>>>
>>> Al
>> I've found "Duplicate Image Finder", by Runningman software, works well:
>> you can set the percentage similarity you wish, including 100%. However,
>> this seems to have disappeared. I await with interest the results that
>> another poster in this thread reports from several other utilities that
>> person has found.
>>
>> For binary duplicates, _if_ you know one filename, using the
>> "Everything" search utility, with no name specified, highlighting the
>> file, then sorting the result by the size column, will list files of
>> identical size adjacent to the highlighted one, so that their names and
>> locations can be seen, and then compared (if only with "fc /b").
>
> A very good suggestion. That has to be *much* faster than comparing a
> file with every other file on the drive, bit by bit.
>
> On the other hand, perhaps a good duplicate finding program would
> begin by doing the same kind of thing. I hadn't thought of that when I
> posted my earlier message in this thread.
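
The size-first idea above can be sketched in a few lines of
Python. This is only an illustration, not how any particular
duplicate finder works: the function names are made up, and
filecmp.cmp() with shallow=False stands in for "fc /b".

```python
import filecmp
import os
from collections import defaultdict


def size_groups(root):
    """Map file size -> list of paths under root that have that size."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                groups[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
    return groups


def binary_duplicates(root):
    """Return lists of paths whose contents are byte-for-byte identical."""
    result = []
    for paths in size_groups(root).values():
        remaining = paths
        # Only files of equal size are ever compared with each other.
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same, different = [first], []
            for other in rest:
                # shallow=False forces a real content comparison.
                if filecmp.cmp(first, other, shallow=False):
                    same.append(other)
                else:
                    different.append(other)
            if len(same) > 1:
                result.append(sorted(same))
            remaining = different
    return result
```

Most files on a drive have a size nobody else shares, so
they are never opened at all. Only the ones that tie on
size get read.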

They do make some pretty strange software.

http://www.mindgems.com/products/VS-...SDIF-About.htm

This is similar to what Google's search-by-image upload
was *supposed* to do. It pretends to compute a "percentage
of similarity", and notice that it's not doing autocorrelation.
It almost seems like a crude classifier.

http://www.mindgems.com/products/VS-...tae_images.png

When I tested it, the Google one was terrible. And it's
a "hard" problem, so I'm not faulting them for this. No
matter what approach you use, it simply won't do a good
job on all possible inputs.

*******

Finding "duplicate files", where the MD5 of the files must
match exactly to count as a duplicate, is an O(n) problem.
Detecting similar images can be O(n) or O(n^2),
depending on how you attempt to do it.
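
As a rough sketch of why the exact-match case is O(n):
each file is read and hashed once, and the digests go
into a dictionary, so no file is ever compared against
every other file. The function names here are made up
for illustration.

```python
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_by_md5(paths):
    """Group paths by digest; keep only groups with more than one file."""
    by_digest = defaultdict(list)
    for path in paths:
        # One hash and one dictionary insert per file: a single pass.
        by_digest[md5_of_file(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

The cost is one full read of every file. Grouping by
size first would avoid hashing files that cannot have
a twin.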

I can't find it now, but I did have a heuristic program
that computed a signature for each photo, and you compared
"how many bits are different" between two photo signatures
to tell whether they were similar. And that one was O(n).
It could easily produce false positives, and wasn't intended
for the "flower" problem above. You'll notice the "flower"
example is much more tolerant (whatever the method is).
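
I don't know what that program did internally, but the
general idea can be sketched with a so-called "average
hash". This is an assumption on my part, not that
program's method. The sketch takes a small grid of
grayscale values (a real tool would first decode the
photo and shrink it to something like 8x8), and all
the names are made up.

```python
def average_hash(pixels):
    """Build a signature with one bit per pixel.

    pixels is a small 2D list of grayscale values that has already
    been downscaled. A bit is 1 where the pixel is above the mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    signature = 0
    for value in flat:
        signature = (signature << 1) | (1 if value > mean else 0)
    return signature


def differing_bits(sig_a, sig_b):
    """Count the bits that differ between two signatures."""
    return bin(sig_a ^ sig_b).count("1")


def looks_similar(sig_a, sig_b, max_bits=5):
    """Call two photos similar if few signature bits differ.

    max_bits is an arbitrary threshold chosen for illustration.
    """
    return differing_bits(sig_a, sig_b) <= max_bits
```

Computing the signatures is one pass over the photos.
Comparing every signature against every other one would
be O(n^2) again, unless the signatures are bucketed
somehow. And two unrelated photos with the same layout
of light and dark areas get the same bits, which is
where the false positives come from.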

Paul