August 18th 18, 09:06 PM, posted to alt.comp.freeware,alt.comp.os.windows-10
Terry Pinnell[_3_]
Sort files by aspect ratio?

Paul wrote:

Paul wrote:
Terry Pinnell wrote:
Terry Pinnell wrote:

Paul wrote:

Terry Pinnell wrote:
Terry Pinnell wrote:

Anyone know of a tool or hack that will do something that Win 10 File
Explorer unfortunately cannot: sort a folder of files by aspect ratio
(width/height)?

It's an operation I need quite frequently, such as when trying to
isolate all files with say a 16:9 ratio (to some fine tolerance if
necessary).

Terry, East Grinstead, UK
Reinhard has developed a great solution based on GAWK and EXIFTOOL,
pulled together by a batch file.

It's not only marginally faster than Destinations2Folders but also shows
the originals sorted by aspect ratio, which I sometimes prefer before I
split off into subfolders. (That's the approach I used in my own macro,
which was about 150 times slower!)

In practice the source folder, typically a copy of the originals, is
selected and 'Ratio-Rename' chosen from the right-click context menu, a
shortcut to that batch file having already been copied to the Send To
folder. The files are then renamed with prefixes defining the AR and
sorted by that name.
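
A rough sketch of the rename-with-prefix idea, in Python rather than the GAWK/EXIFTOOL/batch combination described above. The prefix format (zero-padded ratio to three decimals, then an underscore) and the function name are assumptions for illustration only; the post does not show what Reinhard's batch file actually writes.

```python
from pathlib import Path


def ratio_prefixed_name(filename: str, width: int, height: int) -> str:
    """Return the file name with an aspect-ratio prefix (hypothetical format)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    # Fixed-width, zero-padded prefix so a plain name sort orders by ratio.
    return f"{ratio:06.3f}_{Path(filename).name}"


names = [
    ratio_prefixed_name("a.jpg", 1920, 1080),
    ratio_prefixed_name("b.jpg", 1024, 768),
    ratio_prefixed_name("c.bmp", 600, 800),
]
print(sorted(names))
```

Because the prefix has a fixed width, sorting by name in File Explorer gives the same order as sorting by ratio.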

My test folder of 100 JPGs and BMPs was processed in under 4 seconds on
this PC under Win 10 Pro (i7, 4.0GHz, 32 GB).

Excellent work, thanks Reinhard!

Terry, East Grinstead, UK
Here's my test result.

My test folder had 120 files, of which my
criterion of 4:3 with a 1 percent tolerance
caused 47 files to be copied.
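
For reference, one way to express a "4:3 with a 1 percent tolerance" criterion, sketched in Python. Treating the tolerance as relative to the target ratio is an assumption; skimmer.awk may define it differently, and the function name is made up for this example.

```python
def matches_ratio(width: int, height: int,
                  target: float = 4 / 3, tolerance: float = 0.01) -> bool:
    """True if width/height is within `tolerance` (relative) of `target`."""
    ratio = width / height
    # Assumed meaning of "1 percent": within 1% of the target ratio itself.
    return abs(ratio - target) <= target * tolerance


print(matches_ratio(1024, 768))    # exact 4:3
print(matches_ratio(1600, 1195))   # slightly off 4:3, inside 1 percent
print(matches_ratio(1920, 1080))   # 16:9, well outside
```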

The timing function uses "timeit" from rktools.

https://s15.postimg.cc/6z9oq7k7v/skimmer_test.gif

I have problems with most of the timing tools I've
used over the years, in that I can't believe what
I'm seeing. I guess I'm not very good at guesstimating
time with the ole Mark One Eyeball.

I modified skimmer.awk slightly, to remove some
of the print statements in the output, so the screen
wouldn't be quite such a mess. The debug statements
were left in the version I published, to make it
easier to figure out when something was going wrong.

Paul
Just under a tenth of a second? From clicking 'Go' to seeing your
finished result?

Terry, East Grinstead, UK

OK, I was looking at 'Process Time', my mistake. But I see your Elapsed
Time is impressive at under 1.5 seconds. That's what I've been
measuring with my stopwatch.
I'm curious: what goes on for the other 1.4 s on top of 'Process
Time'?

Terry, East Grinstead, UK


I actually hate those little timer programs, because
they never seem to match what I see going on the screen.
In this case, it's possible Windows Defender scans
the files before the run actually starts.

I'm thinking maybe I should be shooting video or something :-)

One possibility is that I could watch the run with Process Monitor
and use timestamps off that run as a metric.

*******

The results show that the PowerShell call has more
variation than the file copying.

Here are three runs after booting up.
The PowerShell step (database query) seems to have a variable time.

The third run is after several repeated invocations,
so is "fully warmed up".

The Timeit time seems to match the gawk stamps.

3:44:57.533 gawk process start
3:44:57.603 powershell.exe process start
3:44:59.913 powershell.exe process exit
3:45:00.988 gawk process exit

Timeit Elapsed Time 3.457 seconds
Timeit Process Time 0.140 seconds

*******

3:57:05.562 gawk process start
3:57:05.623 powershell.exe process start
3:57:07.057 powershell.exe process exit
3:57:08.082 gawk process exit

Timeit Elapsed Time 2.522 seconds
Timeit Process Time 0.125 seconds

*******

4:02:49.400 gawk process start
4:02:49.456 powershell.exe process start
4:02:50.738 powershell.exe process exit
4:02:51.777 gawk process exit

Timeit Elapsed Time 2.379 seconds
Timeit Process Time 0.078 seconds
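
To split a run into phases, the stamps can simply be subtracted. A small Python sketch using the third run's figures from above; the phase labels in the comments are my reading of the process start and exit events, not something the tools report.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Difference between two H:MM:SS.mmm stamps taken on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


gawk_start, ps_start = "4:02:49.400", "4:02:49.456"
ps_exit, gawk_exit = "4:02:50.738", "4:02:51.777"

startup = seconds_between(gawk_start, ps_start)    # gawk running before the query
query = seconds_between(ps_start, ps_exit)         # powershell.exe lifetime
after_query = seconds_between(ps_exit, gawk_exit)  # remainder, includes the copying
total = seconds_between(gawk_start, gawk_exit)

print(f"{startup:.3f} {query:.3f} {after_query:.3f} {total:.3f}")
```

The total comes out within a few milliseconds of the Timeit Elapsed Time for that run.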

And I'm not even getting close to yesterday's 1.416 second
time. But some sort of update came in yesterday. Benchmarking
using Windows 10 as a platform is largely a waste of time.
You're not in control.

Paul


Sometimes benchmarking can teach you a lot.

In the first new test case, I threw 50000 JPGs into
my work folder, then had supper while the indexer
indexed them all. In the first case, I selected
an aspect ratio that does not match the majority
of the files. The total time is 4.8 seconds or so.

50144 images in tree, of which 47 got copied

10:09:42.145
10:09:42.192  start of database lookup
10:09:45.963  end of lookup (about 3.8 seconds), start of file copy
10:09:47.028  end of file copy (about 1 second)

50144 images in tree, of which 47 got copied

*******

In this case, I selected an aspect ratio of 1280x720
which matches all 50000 of the new files. The lookup
of the files still takes 3.8 seconds (because my
search makes no change to how the query is done). But
the file copy took 10 minutes. It needs a speedup
of around a factor of 50. I blame this on all
the subshells getting forked. I would need to rewrite
the code a tiny bit, to fix this.
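
If the per-file subshell really is the bottleneck (that is a guess, not something measured here), the usual remedy is to do all the copies inside one process. A minimal sketch of that idea in Python, not the gawk script itself; `copy_matches` is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_matches(paths, dest_dir) -> int:
    """Copy each file into dest_dir from a single process; return the count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in paths:
        # One library call per file, no new shell or process per copy.
        # Files with the same name from different folders overwrite each other.
        shutil.copy2(src, dest / Path(src).name)
        copied += 1
    return copied
```

Whether this gets anywhere near a factor of 50 would need timing on the same 50000-file tree.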

10:10:28.879
10:10:28.926
10:10:32.736
... dnc === computer *crashed* without finishing - nice
timeit=10:00.498 === 3.8 seconds database, 596 seconds to copy files

The computer crashed, because I was using procmon to monitor
the experiment, and it exhausted system memory. All of it.
Removing Procmon and rerunning with "timeit" completed
the measurement for me. I did eventually get the
10 minute number.

50144 images in tree, of which 50010 got copied

*******

So there is some value in testing "scaling" a bit.
I don't consider it very encouraging that my
script takes 10 minutes to do that.

I really wanted to use "robocopy" to transfer
the files, but robocopy doesn't accept a copy_list.
Robocopy would kick ass, given a chance. Using
"copy" like I did, was a poor second choice.

Paul
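
On the robocopy point: it has no copy-list option, but its command line does take several file names after the source and destination folders. So one possible workaround is to group the matched files by source folder and issue one robocopy call per group. The sketch below only builds the command lines and does not run them; the function name and `batch_size` (there to keep each command line short) are assumptions, and whether this beats "copy" is untested.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def robocopy_commands(paths, dest_dir, batch_size=100):
    """Build robocopy argument lists: one per source folder and batch of names."""
    by_dir = defaultdict(list)
    for p in paths:
        wp = PureWindowsPath(p)
        by_dir[str(wp.parent)].append(wp.name)

    commands = []
    for src_dir, names in by_dir.items():
        for i in range(0, len(names), batch_size):
            # robocopy <source> <destination> <file> [<file> ...]
            commands.append(["robocopy", src_dir, dest_dir,
                             *names[i:i + batch_size]])
    return commands


for cmd in robocopy_commands(
        [r"C:\pics\a\1.jpg", r"C:\pics\a\2.jpg", r"C:\pics\b\3.jpg"],
        r"D:\out"):
    print(cmd)
```

If these were actually run, note that robocopy uses non-zero exit codes for success as well, so the return code needs checking against its documented values rather than against zero.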


I baulked at 50,000 but I doubled up my 100 successively to 3,200. That
5 GB folder had a wide range of ARs, mostly JPGs, a few BMPs.

The first attempt failed because AWK apparently dislikes filenames
containing spaces, and FE created lots of those, like
20020302-125739-Ashdown6 - Copy - Copy - Copy - Copy - Copy.JPG
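
A likely cause, though only a guess without seeing the script's input, is splitting each record on whitespace, which is what AWK does by default. A Python illustration with a made-up record layout (name, width, height separated by tabs):

```python
# Hypothetical record: file name, width, height, separated by tabs.
line = "20020302-125739-Ashdown6 - Copy - Copy.JPG\t1600\t1200"

naive = line.split()       # whitespace splitting, like AWK's default
fields = line.split("\t")  # tab splitting, like setting FS to a tab in gawk

print(len(naive), naive[0])
print(len(fields), fields[0])
```

With whitespace splitting the name falls apart into several fields; with a tab separator it stays whole. The other usual suspect is an unquoted path in the batch file's copy command.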

But after renaming them simply in Bulk Rename Utility (0001 to 3200),
the elapsed time of Reinhard's AWK/BAT combo was 18 secs, by stopwatch.
If the relationship is roughly linear, that would imply about 4:40 for
50,000; close to five minutes.
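
The arithmetic behind that estimate, as a quick Python check. It assumes time scales strictly in proportion to file count, which is only a rough guess.

```python
def extrapolate_seconds(measured_seconds: float, measured_files: int,
                        target_files: int) -> float:
    """Scale a measured time linearly to a different number of files."""
    return measured_seconds * target_files / measured_files


def as_min_sec(seconds: float) -> str:
    """Format a number of seconds as M:SS, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


estimate = extrapolate_seconds(18, 3200, 50000)
print(estimate, as_min_sec(estimate))
```

That gives 281.25 seconds, so a shade over 4:40.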

So, ProcMon, not something to leave running while you have a coffee
then!


Terry, East Grinstead, UK