Sort files by aspect ratio?
Terry Pinnell wrote:
I baulked at 50,000 but I doubled up my 100 successively to 3,200. That 5 GB folder had a wide range of ARs, mostly JPGs, a few BMPs.

The first attempt failed because AWK apparently dislikes filenames containing spaces, and FE created lots of those, like
20020302-125739-Ashdown6 - Copy - Copy - Copy - Copy - Copy.JPG

But after renaming them simply in Bulk Renamer Utility (0001 to 3200), the elapsed time of Reinhard's AWK/BAT combo was 18 secs, by stop watch. If the relationship is roughly linear that would imply 4:40 for 50,000; close to five minutes.

So, ProcMon, not something to leave running while you have a coffee then!

Terry, East Grinstead, UK

Here's a new version. query.ps1 shouldn't have changed.

There are now three files in the kit, plus having to acquire a copy of gawk.exe version 3.

skimmer.awk   gawk script
query.ps1     powershell query of Windows Search database
copy.ps1      powershell used to copy output files
gawk.exe      gnuwin32 GAWK version 3 for Windows

**************** Helper script "query.ps1" ********************
# powershell -file query.ps1 -TREEDIR "'C:\'"

param([string]$TREEDIR="'C:\'")

$sql = "SELECT System.ItemFolderPathDisplay, `
               System.ItemName, `
               System.Image.HorizontalSize, `
               System.Image.VerticalSize FROM SYSTEMINDEX `
               WHERE System.Image.HorizontalSize > 0 AND `
               System.Image.VerticalSize > 0 AND `
               SCOPE=$TREEDIR"

$provider = "provider=search.collatordso;extended properties='application=windows';"
$connector = new-object system.data.oledb.oledbdataadapter -argument $sql, $provider
$dataset = new-object system.data.dataset
if ($connector.fill($dataset)) { $dataset.tables[0] | Export-CSV query.csv }
**************** end of Helper script "query.ps1" **************

**************** "skimmer.awk" ********************
# gawk -f skimmer.awk width height percent scan_path out_dir
#
# gawk -f skimmer.awk 16 9 1 "C:\\" "C:\users\user name\downloads\outdir" < NUL
#
# 0                   1  2 3 4      5                                     (no input file)
#
# ARGC = 6    ARGV[0] .. ARGV[5]
#
# query.csv looks like this, skip the first two lines. There can be commas in the filename!
#
# #TYPE System.Data.DataRow
# "SYSTEM.ITEMFOLDERPATHDISPLAY","SYSTEM.ITEMNAME","SYSTEM.IMAGE.HORIZONTALSIZE","SYSTEM.IMAGE.VERTICALSIZE"
# "C:\Users\user name\Downloads\JPG2","0000014994_1.jpg","669","600"
# "C:\Users\user name\Downloads\JPG2","04.jpg","500","375"
#
# Powershell copy loop
# Get-Content .\abspathnfile.txt | Foreach-Object { copy-item -Path $_ -Destination "X:\out\"}
#
###########################################################################
# This is a first cut script, with no error handling or disaster proofing!
# No warranty expressed or implied. Paul.
#
# Aug19,2018 Switch to Powershell for file copying. About 1000 files per second.

BEGIN {

  if (ARGC != 6) {
    print "Usage: width height percent_tol source_tree destdir"
    print "gawk -f skimmer.awk 16 9 1 " "\"C:\\\\\" " "\"C:\\users\\user name\\downloads\\outdir\" < NUL"
    print ""
    print "The program needs five arguments."
    print "In some cases, two backslashes may be required on the end of a path, to work."
    print "This proof print will then show one backslash as having made it through."
    print ""
    for (i = 1; i < ARGC; i++)
      print ARGV[i]
    exit 0
  } else {
    print "Called with"
    print ""
    for (i = 1; i < ARGC; i++)
      print ARGV[i]
    print ""
    width   = ARGV[1]+0
    height  = ARGV[2]+0
    percent = ARGV[3]+0
    outdir  = ARGV[5]
  }

  # houseclean before run - no collision protection, run one copy only!

  cmd = "\"del query.csv copyme.txt\""
  system( cmd )

  cmd = "\"powershell -executionpolicy bypass -file query.ps1 -TREEDIR \"'" ARGV[4] "'\"\""
  print "Query: " cmd
  print ""
  system( cmd )

  # You can redirect stderr output to clean the output a bit.
  # Here, I'm hiding the warning that the directory already exists.

  cmd = "\"md \"" outdir "\" 2>NUL\""
  print "Cmd: " cmd
  print ""
  system( cmd )

  # Checking whether any files in the output folder, will conflict.
  # Assumes outdir is one big flat folder, and not a tree!
  # The "a-d" removes directory names from the listing.

  cmd = "dir /b /a-d \"" outdir "\" 2>NUL"
  print "Cmd: " cmd
  print ""

  dest[ "no file by this name" ] = 0
  src[ "no file by this name" ] = 0

  while ((cmd | getline) > 0) {
    dest[ $0 ] = 0            # holds current outdir filename list
  }
  close( cmd )

  high = width/height * (1 + percent/100)
  low  = width/height * (1 - percent/100)
  if ( (high < 0) || (low < 0) ) exit 0

  i=0
  j=0
  trouble=0

  # No FPAT in Gawk3
  FS="\""

  while ( (getline < "query.csv") > 0 ) {     # scan for collisions, make copyme.txt
    if ( i >= 2 ) {
      # "C:\Users\xxxx yyyyyyy\Downloads\JPG2","04.jpg","500","375"
      #  2                                    3 4      5 6   7 8

      aspect = $6/$8
      if ( (high >= aspect) && (low <= aspect) ) {

        if ( $4 in dest ) {
          trouble++
          if (trouble <= 20) {
            print $2 "\\" $4 " already exists in output directory"
          }
        }

        if ( $4 in src ) {
          trouble++
          src[ $4 ]++
          if (trouble <= 20) {
            print $4 " exists " src[ $4 ] " times on copyme.txt list"
          }
        } else {
          src[ $4 ] = 1
        }

        # do the simplified copy scheme here and make a file list

        print $2 "\\" $4 > "copyme.txt"
        j++
      }
    }
    i++
  }
  close( "query.csv" )
  close( "copyme.txt" )

  if (trouble > 0) {          # collision detection on file names...
    print ""
    print "Trouble detected, " trouble " problems, exiting run before copying anything"
    print "   Only the first 20 problems are printed to screen"
    exit 0
  }

  # Only copy files if there is no trouble.

  cmd = "\"powershell -executionpolicy bypass -file copy.ps1 -TREEDIR \"" outdir "\"\""
  print "Cmd: " cmd
  print ""
  system( cmd )

  print i " images in tree, of which " j " got copied"
}
************* end of "skimmer.awk" ****************

**************** Helper script "copy.ps1" ********************
# Source filelist is hardwired to "copyme.txt".
# Accepts a text file of absolute_path_filenames to copy to TREEDIR
# powershell -executionpolicy bypass -file copy.ps1 -TREEDIR 'F:\outdir test'

param([string]$TREEDIR="'X:\does_not_exist'")

Get-Content .\copyme.txt | Foreach-Object { copy-item -Path $_ -Destination "$TREEDIR" }
**************** end of Helper script "copy.ps1" **************

Time for    47 files copied from a scan of 50100 files =  3 seconds
Time for 50000 files copied from a scan of 50100 files = 55 seconds

Paul
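For anyone who wants to check the selection logic without a Windows Search index to hand, here is a rough Python sketch of the two things skimmer.awk does with query.csv: keep rows whose width/height falls inside the percent tolerance band, and flag file names that would collide in a flat output folder. It is not part of the kit. The names aspect_band, select_images and sample are made up for illustration, and it parses the rows with Python's csv module rather than splitting on the double-quote character as the gawk script does.

```python
import csv
import io


def aspect_band(width, height, percent):
    """Return (low, high) bounds for width/height within +/- percent."""
    ratio = width / height
    return ratio * (1 - percent / 100), ratio * (1 + percent / 100)


def select_images(csv_text, width, height, percent, existing=()):
    """Pick rows whose aspect ratio is inside the band; report name clashes.

    csv_text is assumed to look like the query.csv sample in the script
    comments: a '#TYPE' line, a header line, then four quoted fields per row.
    'existing' stands in for the names already present in the output folder.
    """
    low, high = aspect_band(width, height, percent)
    selected, clashes, seen = [], [], set(existing)
    for n, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if n < 2 or len(row) != 4:      # skip '#TYPE' line and header
            continue
        folder, name, w, h = row
        if not (low <= int(w) / int(h) <= high):
            continue
        if name in seen:                # would overwrite in a flat folder
            clashes.append(name)
        seen.add(name)
        selected.append(folder + "\\" + name)
    return selected, clashes


# Made-up rows in the query.csv layout; the third row is added to force a clash.
sample = '''#TYPE System.Data.DataRow
"SYSTEM.ITEMFOLDERPATHDISPLAY","SYSTEM.ITEMNAME","SYSTEM.IMAGE.HORIZONTALSIZE","SYSTEM.IMAGE.VERTICALSIZE"
"C:\\Users\\user name\\Downloads\\JPG2","0000014994_1.jpg","669","600"
"C:\\Users\\user name\\Downloads\\JPG2","04.jpg","500","375"
"C:\\Users\\user name\\Downloads\\JPG3","04.jpg","1024","768"
'''

selected, clashes = select_images(sample, 4, 3, 1)
print(selected)   # paths that fall inside the 4:3 +/- 1% band
print(clashes)    # names that appear more than once among them
```

As in the gawk script, the idea would be to copy nothing if the clash list is non-empty, since two files with the same name cannot both land in one flat folder.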