#136
Why is search so brain dead these days?
On 6/26/2020 6:49 AM, Frank Slootweg wrote:
I assume you mean *extra* context menu stuff, i.e. tools which are added to the default (right-click) context menu. If so, are you sure these extra tools do not work in the context menu of a file dialog box?

I do not have many such tools, but I did a quick test and if I do a 'Save as...' in Chrome, the context menu in the 'Save as' popup offers 'Share with Skype' (probably not a good example), '7-zip' and 'Convert to PDF in Foxit PhantomPDF'. For other items, it offers things like VLC and Google Drive.

So if the context menu of a file dialog box shows *these* third-party tools, why shouldn't it show the mentioned third-party *search* tools?

The context menus may or may not work, but that's irrelevant if you can't make use of the findings. If you search for something from the file access dialog box with Agent Ransack or a similar tool, AR's findings can't be passed back to the open/save dialog box for use by the program trying to open or save a file.

Yousuf Khan
#137
On 6/23/2020 8:04 PM, Stan Brown wrote:
On Tue, 23 Jun 2020 13:09:57 -0400, Yousuf Khan wrote:

Remember there used to be a time when if you wanted to delete entire groups of files or folders in DOS, and you used a "del *.*" command, and the whole thing would be done in under 1 second? But then later in Windows, doing the same thing would take minutes, just because the Explorer is doing it in a braindead way, where it deletes each file individually?

Maybe I'm having a lapse of memory, but I can't remember ever seeing that happen -- unless files were in use, of course, but then they couldn't be deleted at all.

It was actually a well-known feature, which I think was a leftover from CP/M days (predecessor to DOS), which DOS retained but deprecated in favour of the newer DOS commands. But Microsoft itself just kept using this older CP/M call within command.com, so it never got rid of the old call, since it was so much faster than DOS's own slow version, which did things one file at a time.

Yousuf Khan
#138
On 6/23/20 12:09 PM, Yousuf Khan wrote:
On 6/23/2020 10:48 AM, Ken Blake wrote:
On 6/22/2020 12:45 PM, Alan Baker wrote:

All the hits on my sister-in-law's name on the Mac: 3 seconds for 1405 hits. Just sayin' :-)

That's very slow compared to Search Everything on Windows.

Another area where Microsoft has ****ed something up, even though it was working simply beforehand, is file deletions. Remember there used to be a time when if you wanted to delete entire groups of files or folders in DOS, and you used a "del *.*" command, and the whole thing would be done in under 1 second? But then later in Windows, doing the same thing would take minutes, just because the Explorer is doing it in a braindead way, where it deletes each file individually? Then it took 3rd party utils to bring back the 1 second deletes?

Yousuf Khan

I think that, starting with Vista, deleting a file involved a long wait because Windows needed to calculate disc space. Pretty annoying and absurd.
#139
Alan Baker wrote:
On 2020-06-21 7:42 a.m., philo wrote:
On 6/21/2020 8:38 AM, philo wrote:
On 6/20/2020 6:06 PM, Rene Lamontagne wrote:
On 2020-06-20 5:54 p.m., Yousuf Khan wrote:

I'm referring mainly to Windows search, but this applies to a lot of other search algorithms all over the place and on the Internet too. In the olden days, search was very efficient and somewhat intuitive. For example, let's say you try to do a search for "virtual" and expect you might find something like VirtualBox, VirtualPC, whatever. But for some reason, the current Windows search cannot find these. If you do a search for the full name, then it may find them (hit and miss). In the old days, these searches would find all instances where the string would occur, even as part of a substring. It was very easy to do searches, and you could even do multiple words to narrow down the searches. What has gone wrong with search algorithms now?

Yousuf Khan

I really can't help you here because I never use Windows search. I use "Search Everything" and "Agent Ransack" exclusively. Sorry.

Rene

Thanks for the info. As one who recently did a search that found close to nothing, I am happy with the much improved results using the free version of Agent Ransack.

Ransack: 52 hits in ten minutes
From Explorer, after one hour, four hits... search nowhere near complete

Spotlight: all the hits on the entire drive in 15 seconds.

The Microsoft search is able to do this in under a second. *But*, there are caveats.

The Windows Search is accessible as an SQL operation (requiring scripting), or through File Explorer (the normal user path). I was able to create a test partition with a million files on it, and the search can correctly figure out how many text files are present, in less than a second.

But if we do the SQL benchmark instead, why are the results different? Well, it's a hint.

*******

I prepared a "tree" of folders, deep enough to hold a billion files, but only partially filled it (16 files per folder at the bottom level).
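For anyone who wants to try a similar experiment, here is a minimal sketch of how such a test tree could be generated. It is not the script used for the test in this post. The folder fan-out, the nesting depth, and the two-digit folder names are assumptions made for illustration; only the 16 files per bottom-level folder and the "file 00012345.txt contains 00012345" convention (described in the next paragraph) come from the post.

```python
import os
import tempfile


def leaf_dir(root, leaf_index, fanout, depth):
    """Map a leaf-folder number to a nested path such as root/00/02."""
    parts = []
    for _ in range(depth):
        leaf_index, digit = divmod(leaf_index, fanout)
        parts.append(f"{digit:02d}")
    return os.path.join(root, *reversed(parts))


def make_test_tree(root, total_files, files_per_folder=16, fanout=16, depth=2):
    """Write total_files tiny text files, each containing its own 8-digit name.

    fanout and depth are hypothetical parameters; the post does not say
    how the real tree was laid out.
    """
    capacity = files_per_folder * fanout ** depth
    if total_files > capacity:
        raise ValueError(f"tree holds at most {capacity} files")
    for i in range(total_files):
        folder = leaf_dir(root, i // files_per_folder, fanout, depth)
        os.makedirs(folder, exist_ok=True)
        name = f"{i:08d}"
        # Every file holds a unique "word", which is what makes the
        # corpus hard on an inverted index.
        with open(os.path.join(folder, name + ".txt"), "w") as fh:
            fh.write(name)
    return total_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        count = make_test_tree(tmp, 64)
        print("created", count, "files under", tmp)
```

Scaling this up to a million files is only a matter of raising total_files and depth, but expect it to take a while on a hard drive.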
The test pattern is pathological, in that an inverted index will not be able to deal with the pattern all that well. It means the dictionary has a vocabulary of a million words, not an English dictionary of fifty thousand.

   filename 00012345.txt contains "00012345"

I didn't start with a million files. I started with 32 million files, and the indexer could not index more than around 2 million of them. It looked like the merge step failed silently (inverted indexer merges main index with the small index it creates as it's scanning). As a result, I wiped the test partition and loaded up a million files.

With "real user files", this limitation might never be evident. It's the pathology ("incompressibility" if you will) of the content, which eventually prevents indexing. The index file stopped in this case, at around 8GB. In the following test, the index file is around 3.5GB or so, for 1048576 txt files.

String            files-returned   time(sec)
0000000*          none             0.042
000000*           27               0.054
00000*            4080             0.184    === reasonably good
0000*             65520            2.528
000*              FAIL             30.01    ("permission error" ???)
Filename != 'A'   1114100          44.4

Rather than the SQL being incapable of returning more than 65520 file references, it's the nature of the query that seems to break it. A query intended to "light up" the index works as expected. It takes 44.4 seconds to produce a 45MB file listing, with all of the hits in it.

So now I switch over to File Explorer. Any of the above searches completes in under a second. How is this possible? Well, the "results display" is obviously not completely computed. When you use the scroll bar, as you scroll down, additional searches take place. Just enough to populate the screen. In the SQL results, you can see that for small numbers of screen objects (like 27 items), the results can come back reasonably quickly.

OK, so why am I not crowing about this result? Well, you can't save the output in the File Explorer window. You cannot *print* the output in the File Explorer window.
You can do a copy/paste, but the copy/paste only has the "path" column. Not all columns are copied if you do copy/paste. If you use Nirsoft sysexp.exe (a tool to copy a recalcitrant window like this one), there is wheel spin for an *hour*, and no result.

Like an expensive hooker, Windows Search is "pretty to look at, but can't boil an egg". Who needs fast results exactly, if you can't do anything with them?

*******

Summary:

The "easy to obtain" results, in under a second, are to me, mostly useless. I may want to post-process a result, which seems just about impossible with the GUI.

I *can* get *a* result from Windows Search, but it takes a script calling into an SQL engine. The runtime of using SQL depends on the pathology of what's in the index.

The results hint that the technology does not scale well. Even with an easier-to-index regular file mix, it's eventually going to have trouble. Just at some slightly higher number of files.

The index was stored on rotating rust. The test was not done using an SSD. Users have described having 50GB Windows.edb files, but I don't know if that's still the case on the current release of Win10.

Paul
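As a rough illustration of the "script calling into an SQL engine" route, here is a sketch in Python. The queries actually used for the timings above are not shown in this thread, so the query text, the column choice, and the helper names are assumptions. The sketch uses the Windows Search OLE DB provider (Search.CollatorDSO) and the SystemIndex catalog; running it needs Windows plus the third-party pywin32 package, while the query builder works anywhere.

```python
# Connection string for the Windows Search OLE DB provider.
CONNECTION = "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"


def build_filename_query(pattern, column="System.ItemPathDisplay"):
    """Turn a wildcard such as '0000*' into a Windows Search SQL statement.

    Mapping '*' to the LIKE wildcard '%' is an assumption about how the
    patterns in the table above were expressed.
    """
    like = pattern.replace("'", "''").replace("*", "%")
    return f"SELECT {column} FROM SystemIndex WHERE System.FileName LIKE '{like}'"


def save_hits(pattern, out_path):
    """Run the query through ADO and write one path per line to out_path."""
    import win32com.client  # pywin32, Windows only

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(CONNECTION)
    rs = win32com.client.Dispatch("ADODB.Recordset")
    rs.Open(build_filename_query(pattern), conn)
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        while not rs.EOF:
            fh.write(str(rs.Fields.Item("System.ItemPathDisplay").Value) + "\n")
            count += 1
            rs.MoveNext()
    rs.Close()
    conn.Close()
    return count


if __name__ == "__main__":
    print(build_filename_query("0000*"))
```

Unlike the File Explorer window, this leaves the complete hit list in a text file that can be sorted, counted, or diffed afterwards.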
#140
On 6/20/2020 3:54 PM, Yousuf Khan scribbled:

I'm referring mainly to Windows search, but this applies to a lot of other search algorithms all over the place and on the Internet too. In the olden days, search was very efficient and somewhat intuitive. For example, let's say you try to do a search for "virtual" and expect you might find something like VirtualBox, VirtualPC, whatever. But for some reason, the current Windows search cannot find these. If you do a search for the full name, then it may find them (hit and miss). In the old days, these searches would find all instances where the string would occur, even as part of a substring. It was very easy to do searches, and you could even do multiple words to narrow down the searches. What has gone wrong with search algorithms now?

Yousuf Khan

Explorer is not made right; it is a liar. The true layout of the directory tree shows you that it is. And that is why no one can figure it out. Neither can its search engine.

Everyone should know Bill Gates is a massive thief, and a criminal. It's too bad the 666 does not see it this way.

You can get a file viewer that is not from Bill Gates, and file search may work better and faster. I mean, without getting lost.
#141
On 2020-06-28 10:50 a.m., Paul wrote:
Alan Baker wrote:
On 2020-06-21 7:42 a.m., philo wrote:
On 6/21/2020 8:38 AM, philo wrote:
On 6/20/2020 6:06 PM, Rene Lamontagne wrote:
On 2020-06-20 5:54 p.m., Yousuf Khan wrote:

I'm referring mainly to Windows search, but this applies to a lot of other search algorithms all over the place and on the Internet too. In the olden days, search was very efficient and somewhat intuitive. For example, let's say you try to do a search for "virtual" and expect you might find something like VirtualBox, VirtualPC, whatever. But for some reason, the current Windows search cannot find these. If you do a search for the full name, then it may find them (hit and miss). In the old days, these searches would find all instances where the string would occur, even as part of a substring. It was very easy to do searches, and you could even do multiple words to narrow down the searches. What has gone wrong with search algorithms now?

Yousuf Khan

I really can't help you here because I never use Windows search. I use "Search Everything" and "Agent Ransack" exclusively. Sorry.

Rene

Thanks for the info. As one who recently did a search that found close to nothing, I am happy with the much improved results using the free version of Agent Ransack.

Ransack: 52 hits in ten minutes
From Explorer, after one hour, four hits... search nowhere near complete

Spotlight: all the hits on the entire drive in 15 seconds.

The Microsoft search is able to do this in under a second.

Ummm...

...it's not faster than Spotlight.

Sorry.

*But*, there are caveats.

Right!

The Windows Search is accessible as an SQL operation (requiring scripting), or through File Explorer (the normal user path). I was able to create a test partition with a million files on it, and the search can correctly figure out how many text files are present, in less than a second.

With an SQL query...

...or in an Explorer window?

But if we do the SQL benchmark instead, why are the results different? Well, it's a hint.
*******

I prepared a "tree" of folders, deep enough to hold a billion files, but only partially filled it (16 files per folder at the bottom level). The test pattern is pathological, in that an inverted index will not be able to deal with the pattern all that well. It means the dictionary has a vocabulary of a million words, not an English dictionary of fifty thousand.

   filename 00012345.txt contains "00012345"

I didn't start with a million files. I started with 32 million files, and the indexer could not index more than around 2 million of them. It looked like the merge step failed silently (inverted indexer merges main index with the small index it creates as it's scanning). As a result, I wiped the test partition and loaded up a million files.

With "real user files", this limitation might never be evident. It's the pathology ("incompressibility" if you will) of the content, which eventually prevents indexing. The index file stopped in this case, at around 8GB. In the following test, the index file is around 3.5GB or so, for 1048576 txt files.

String            files-returned   time(sec)
0000000*          none             0.042
000000*           27               0.054
00000*            4080             0.184    === reasonably good
0000*             65520            2.528
000*              FAIL             30.01    ("permission error" ???)
Filename != 'A'   1114100          44.4

Rather than the SQL being incapable of returning more than 65520 file references, it's the nature of the query that seems to break it. A query intended to "light up" the index works as expected. It takes 44.4 seconds to produce a 45MB file listing, with all of the hits in it.

So now I switch over to File Explorer. Any of the above searches completes in under a second. How is this possible?
Well, the "results display" is obviously not completely computed. When you use the scroll bar, as you scroll down, additional searches take place. Just enough to populate the screen. In the SQL results, you can see that for small numbers of screen objects (like 27 items), the results can come back reasonably quickly.

OK, so why am I not crowing about this result? Well, you can't save the output in the File Explorer window. You cannot *print* the output in the File Explorer window. You can do a copy/paste, but the copy/paste only has the "path" column. Not all columns are copied if you do copy/paste. If you use Nirsoft sysexp.exe (a tool to copy a recalcitrant window like this one), there is wheel spin for an *hour*, and no result.

Like an expensive hooker, Windows Search is "pretty to look at, but can't boil an egg". Who needs fast results exactly, if you can't do anything with them?

*******

Summary:

The "easy to obtain" results, in under a second, are to me, mostly useless. I may want to post-process a result, which seems just about impossible with the GUI.

I *can* get *a* result from Windows Search, but it takes a script calling into an SQL engine. The runtime of using SQL depends on the pathology of what's in the index.

The results hint that the technology does not scale well. Even with an easier-to-index regular file mix, it's eventually going to have trouble. Just at some slightly higher number of files.

The index was stored on rotating rust. The test was not done using an SSD. Users have described having 50GB Windows.edb files, but I don't know if that's still the case on the current release of Win10.

And Spotlight works from the command line.

man mdfind

mdfind -- finds files matching a given query
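To make the command-line point concrete, here is a small sketch that wraps mdfind from Python so the hits can be post-processed. The helper names are made up for illustration. It relies only on the -onlyin and -name options from the mdfind man page, and the run step works only on macOS, where mdfind is installed.

```python
import shutil
import subprocess


def build_mdfind_command(query, only_in=None, name_only=False):
    """Assemble the argument list for an mdfind call."""
    cmd = ["mdfind"]
    if only_in:
        cmd += ["-onlyin", only_in]   # restrict the search to one directory
    if name_only:
        cmd += ["-name", query]       # search on file name only
    else:
        cmd.append(query)
    return cmd


def run_mdfind(query, only_in=None, name_only=False):
    """Return the matching paths as a list of strings (macOS only)."""
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind not found; this only works on macOS")
    result = subprocess.run(
        build_mdfind_command(query, only_in, name_only),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(build_mdfind_command("virtual", only_in="/Applications", name_only=True))
```

Because the hits come back as an ordinary list of paths, they can be counted, filtered, or written to a file.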