#46
A browser question: now address used in hosts file
"J. P. Gilliver (John)" wrote
| Maybe I will. Though for my hosts file, I'd not be able to report much
| difference (only 105 entries) - I think. Anyway, I've just done it
| (that's how I know it was 105), so we'll see.

Ick. It sounds like you need an editor with a Replace All function.

| Yes. But that's silly. People with gigantic
| HOSTS files are reasoning that they can have
| a smaller file by reducing the number of characters.
| But the real time is in searching for the URL string,
| and that operation is probably taking a few ms,
| even in a gigantic HOSTS file. It's so fast that
| it's hard to measure.
|
| Well, let's say 5 ms. 100 links would thus make for half a second - and
| web pages with 100 links probably aren't uncommon

Maybe. But most of those are in the same domain. And the browser or Acrylic will cache DNS.

I like to keep HOSTS small. But I doubt there's any notable difference in speed between looking through 100 lines for "doubleclick" or looking through 1000 lines. (There would, however, be a big boost from blocking the gobs of script that have to be parsed, via HOSTS and/or NoScript and/or by blocking script altogether. Many pages now have so much script that it's like loading a large software program.)
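For what it's worth, the 100-vs-1000-line claim is easy to test with a throwaway script. A rough sketch in Python (the hostnames and list sizes are invented for the test; real resolver behaviour will differ):

```python
# Rough, synthetic check of the claim that scanning 100 vs. 1000
# HOSTS-style lines for a string like "doubleclick" makes no
# noticeable per-lookup difference.
import timeit

def make_hosts(n):
    # Build n filler entries plus one entry we will search for.
    lines = ["0.0.0.0 host%d.example.com" % i for i in range(n)]
    lines.append("0.0.0.0 ads.doubleclick.net")
    return lines

def scan(lines, needle):
    # Sequential top-to-bottom scan, like a naive hosts-file search.
    for line in lines:
        if needle in line:
            return line
    return None

small, big = make_hosts(100), make_hosts(1000)

t_small = timeit.timeit(lambda: scan(small, "doubleclick"), number=1000)
t_big = timeit.timeit(lambda: scan(big, "doubleclick"), number=1000)
print("100 lines : %.4f s per 1000 scans" % t_small)
print("1000 lines: %.4f s per 1000 scans" % t_big)
```

On any modern machine both timings come out in the low milliseconds, which is consistent with the "hard to measure" observation above.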
#47
A browser question
On Mon, 25 Sep 2017 11:55:33 -0400, Mayayana wrote:
"Rodney Pont" wrote | I think you're right. I never noticed that before. | The guidance doesn't explain in detail but does show | this example: | | 127.0.0.1 ad.* ads.* | | Isn't it .* for regex to match anything? You might be right. Like I said, life's too short for regexp so I don't care. Here's the Acrylic info: I've left the Acrylic info in below. I had meant to put my comment after your *doubleclick* example but I cut too early. Your example would have worked then since it wasn't using regex. I do find regex confusing and I'm glad I'm not alone :-) --------------------------------------- The separator between IPADDRESS and HOSTNAMES can be any number of spaces # # or tabs or both. If the HOSTNAMES contain the special characters '*' and # # '?' a (slow) "dir" like pattern matching algorithm is used instead of a # # (fast) binary search within the list of host names: # # # # 127.0.0.1 ad.* ads.* # # # # If a HOSTNAME starts with the '/' character instead it is treated like a # # regular expression (also very slow compared to a binary search): # # # # 127.0.0.1 /^ads?\..*$ # # # # Note: More info about the regular expression engine and its syntax can be # # found at: http://regexpstudio.com ------------------------------------------ So it only parses lines beginning with / as regexp. * is treated as a wildcard. -- Faster, cheaper, quieter than HS2 and built in 5 years; UKUltraspeed http://www.500kmh.com/ |
#48
A browser question: now address used in hosts file
In message , Mayayana writes:
"J. P. Gilliver (John)" wrote
| Maybe I will. Though for my hosts file, I'd not be able to report much
| difference (only 105 entries) - I think. Anyway, I've just done it
| (that's how I know it was 105), so we'll see.

Ick. It sounds like you need an editor with a Replace All function.

I have. Notepad+. (Though I think the basic Notepad has that as well.)

[]

Maybe. But most of those are in the same domain. And the browser or Acrylic will cache DNS. I like to keep HOSTS small. But I doubt there's any notable

Me too. It'll be a lot smaller when I get the round tuit for installing Acrylic and editing it; I've only just downloaded Acrylic today.

[]

NoScript and/or by blocking script altogether. Many pages now have so much script that it's like loading a large software program.)

Couldn't agree more; unfortunately, a lot of pages don't work properly if you block scripts. (Sometimes they don't work but in an obvious manner, which would be fine; however, sometimes they don't work in a way that's far from obvious - something just isn't there.)

-- 
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf
Just because you're old it doesn't mean you go beige. Quite the reverse. - Laurence Llewelyn-Bowen, RT 2015/7/11-17
#49
A browser question
Mayayana wrote:
"Paul" wrote | Well, consider what would happen, if you enabled the | IIS web server in your copy of Windows. But I'm not running a server. How about real world scenarios? Maybe you could volunteer to do thorough testing of 0.0.0.0 vs 127.0.0.1. I have actually run IIS. But I'm just not a web person, so not today thanks. I think there might be IIS in Win10 in Windows Features. Someone with more web monkey genes, might enjoy messing with that for you. Paul |
#50
A browser question: now address used in hosts file
A lookup on 0.0.0.0 will immediately fail. It cannot be assigned to any one host (though on a server it can be used as a wildcard to bind on any IP address). 127.0.0.1 is a permitted assignment to a host (localhost), and as such there must be time to listen for a response. After all, you might have a local server running on your host that listens for connection requests, so a redirect to 127.0.0.1 has to wait to see if there is a listener to respond at that IP address (on whatever port was used). There are many apps that run as a local server and rely on connections to localhost (127.0.0.1).

Redirects in the hosts file to 0.0.0.0 are more quickly rejected than redirects to 127.0.0.1. Windows knows immediately that it cannot get to a host with an IP address of 0.0.0.0. Windows doesn't know there is no listening process (on any port) at 127.0.0.1 until it tries. If you run a program that is a local server listening on a port at 127.0.0.1, the hosts file will get even slower.

You will be using the hosts file with your web browser. Your web browser defaults to port 80 for its outbound connection requests, although a different port can be specified (http://host.domain:port); port 80 is the default for HTTP and 443 for HTTPS. If your local server is listening on 80, 443, or both, then redirecting to 127.0.0.1 in your hosts file will have Windows pointing your web browser at that local server process listening on those ports. Instead of getting an immediate rejection (nothing listening on that port), your web browser connects to that local server.

Using 127.0.0.1 can conflict with many, if not most, local servers you might run on your localhost, which is why it was a bad choice of IP address for auto-blocking. It continues to get used out of tradition: it started that way, so it remains that way, even though that way is fraught with hazard.

When I tried the pre-compiled hosts files (e.g., MVPS hosts), I changed all the 127.0.0.1 addresses to 0.0.0.0. That made the rejects happen quicker.
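As an aside, the special status of 0.0.0.0 is easy to confirm from code. A minimal sketch using Python's standard ipaddress module:

```python
# 0.0.0.0 is the IPv4 "unspecified" address: it can never identify a
# host, while 127.0.0.1 is an assignable loopback address that may
# have a real listener behind it.
import ipaddress

blocked = ipaddress.ip_address("0.0.0.0")
loopback = ipaddress.ip_address("127.0.0.1")

print(blocked.is_unspecified)   # True  - not routable to any host
print(loopback.is_loopback)     # True  - a real, connectable address
```

That distinction is exactly why a connection attempt to 0.0.0.0 can be rejected up front, while 127.0.0.1 has to be tried.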
The speed-up was not large per lookup, but it was still quicker. The reduced rejection time was noticeable only on web pages that had hundreds of links to external resources. There could be a host listening at 127.0.0.1, so there's time spent waiting for a response. 0.0.0.0 cannot be assigned as a device address.

https://en.wikipedia.org/wiki/0.0.0.0
https://www.lifewire.com/four-zero-ip-address-818384

However, I dropped using a pre-compiled hosts file. They are too often out of date and often too aggressive in what they block. They also must list a *host*; they cannot list a domain. So, for example, it takes over 50 entries to block some (not all) of the hosts at doubleclick.com. Instead of URL blocking, where you could specify only a domain or perhaps even use wildcards to match several hosts at a domain, a host must be specified in the hosts file.

Also, some domains will accept any hostname and redirect all connection requests to their server regardless of the hostname used. That means a web page could use any random hostname and still connect to the ad or tracking server. In that domain's nameserver, they map ANY hostname to the IP address of their server, and no text file could contain every possible combination of characters in a hostname, so there is no way to effectively block the entire domain.

Using the hosts file for blocking was a frankenjob purpose for that file. It happened to work, but the file wasn't intended for that purpose. For EVERY lookup in a web page, the hosts file gets opened and searched sequentially from top to bottom in a text search (not a binary database search). For the next lookup, the hosts file has to be reopened and searched sequentially again. The hosts file is *NOT* cached into memory (a good thing, too, considering how huge the pre-compiled hosts files are).
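The open-scan-close behaviour being described can be sketched in a few lines of Python (the entries and temp file are invented for illustration; real resolvers differ in detail):

```python
# A naive model of hosts-file lookup: every query re-opens the file
# and scans it line by line from the top.
import tempfile, os

def lookup(hosts_path, hostname):
    """Return the redirect IP for hostname, or None if not listed."""
    with open(hosts_path) as f:           # re-opened on every lookup
        for line in f:                    # sequential top-to-bottom scan
            line = line.split("#", 1)[0]  # strip comments
            fields = line.split()
            if len(fields) >= 2 and hostname in fields[1:]:
                return fields[0]
    return None

# Build a tiny example hosts file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("# demo entries\n0.0.0.0 ads.example.com trk.example.com\n")

hit = lookup(path, "ads.example.com")   # "0.0.0.0"
miss = lookup(path, "www.example.com")  # None
os.remove(path)
print(hit, miss)
```

A page with 100 external resources means 100 calls to something like `lookup()`, each paying the open/read/close cost again, which is the overhead being described.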
For web pages with a lot of external resources (not relatively addressed), all this repeated file opening and serial searching adds time, so much so that the lookup time can outweigh the DNS lookup times. If the web page has 100 externally linked resources, that means opening the hosts text file 100 times and parsing the text within it from top to bottom 100 times. Parsing a text file to search within it is slow, and all that file handling (open, read, close) takes time. The hosts file was intended to hold from a few entries to a few hundred entries, not 20 THOUSAND entries. It is just a text file, not a binary database. Using the hosts file for blocking got started because it was a kludge that worked; however, pre-compiled hosts files have become ridiculously huge. Someone realized a couple of decades ago that the kludge would work, but users still think it is an applicable scheme today. Old info lasts forever on the Internet, and often a lot of it is not datestamped, or it is simply regurgitated without being thought through.

Also, unlike other ad- and tracking-blocking methods, there is no easy and quick way to disable or enable the use of the hosts file. Your web browser has no means of toggling the hosts file on and off. Almost all adblock apps provide a means to toggle them off and back on. Even Internet Explorer's TPL feature can be quickly toggled on and off. With a hosts file, you would need a separate batch file, and a shortcut to it, to rename the hosts file so it couldn't be found in the next and subsequent DNS lookup requests, and then rename it back to "hosts" when you wanted to start using that list again.

Toggling the hosts file on and off won't work if you leave the local DNS Client service running. Once a positive lookup gets cached in the DNS Client's cache, it gets used first, so that IP address gets reused instead of what you specified in the hosts file. The DNS Client caches positive entries for 1 day (it is configurable).
You would need to flush the DNS cache after toggling off (renaming) the hosts file, or leave the DNS Client service always disabled and have your web client reissue DNS requests to the server for every lookup (not in the hosts file) rather than use the faster local DNS cache.

I don't remember if it was you or someone else that mentioned using HostsMan. Well, if you are really intent on using a pre-compiled hosts file, then you want it to get updated. What good is an antiquated blacklist? Sure, there are some hosts (not domains ... hosts!) listed in the hosts file that don't often change. Mostly those would be for the big CDN (Content Delivery Network) providers. Doubleclick is just one of thousands of CDNs. Those pre-compiled hosts files also list smaller ad and tracking sources. The blacklist isn't static. If it were, we'd still be using the same pre-compiled hosts file from a couple of decades ago with only a couple thousand entries. The sources change, so the blacklist should also change. Are you going to revisit the site where you got a pre-compiled hosts file, download the latest one, and then write it atop the old hosts file (which means losing any edits you made for sites you considered false positives in their blacklist)?

HostsMan does an automated job of retrieving the latest version of the pre-compiled hosts file for you and replacing the existing (old) hosts file. You could run it manually if you don't like anything auto-updating itself; however, you are using a blacklist against ads and tracking, and that changes often. Would you disable the auto-update in your anti-virus software and then remember to do it manually every year or two? If so, don't bother using an AV program, or a hosts file, as they will be way out of date.
#51
A browser question: now address used in hosts file
"VanguardLH" wrote
| So you are NOT using someone else's pre-compiled hosts file which is
| what I was discussing.

I don't think it matters very much. The MVPS HOSTS is very bloated, but if people are not waiting seconds for parsing then we have to conclude the bloat is not a problem.

| It isn't just doing the file I/O calls to open the file. Then the file
| has to be read. It is a text file. It is parsed line by line and then
| a search done on match criteria to determine if the host to be visited
| is listed in the text file. Then the file has to get closed.

Reading in a file is extremely fast. Even for multiple MBs it's virtually instant. Operations to find a specific string are also very fast. The logical approach would be to treat the file as a single string and look for, say, ads.doubleclick.net. If found, then look to the left for the IP. There's no point checking each line. I don't know how you think you know that's how it's done. It would be an extremely wasteful approach.

I also use an Acrylic HOSTS file with wildcards. I'm not seeing a lag. If you find there's a parsing lag then you may be doing something wrong.

If uBlock Origin works for you, that's not a problem. But it sounds like you're trying to find reasons to convince yourself that it's better than HOSTS, and then going further to convince yourself that HOSTS is actually a problem. uBlock is written in javascript and presumably has to use some kind of blacklist. According to Wikipedia it uses several:

"uBlock Origin and uBlock support the majority of Adblock Plus's filter syntax. The popular filter lists EasyList and EasyPrivacy are enabled as default subscriptions. The extensions are capable of importing hosts files, and a number of community maintained lists are available at installation. Among the host files available, Peter Lowe's Ad servers list and Malware Domains are also enabled as default."

So uBlock is using several lists, including at least one HOSTS file, and filtering them with poor-performance javascript rather than compiled code. Waddayaknow'boutthat! Who'd have thunk it has to parse a list to filter ads?
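The "treat the file as a single string" idea is simple to sketch (Python; entries invented for illustration). Whether any real resolver works this way is exactly what's in dispute in this thread; this only shows that the two approaches give the same answer:

```python
# Two ways to find a blocked host in HOSTS-style text: a line-by-line
# scan versus one substring search over the whole file as a string.
hosts_text = """0.0.0.0 ads.example.com
0.0.0.0 ads.doubleclick.net
0.0.0.0 trk.example.net
"""

def scan_lines(text, hostname):
    # Parse each line into fields and check the hostname columns.
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[0]
    return None

def scan_string(text, hostname):
    # Single search over the whole buffer, then look left for the IP.
    i = text.find(" " + hostname + "\n")
    if i == -1:
        return None
    start = text.rfind("\n", 0, i) + 1  # beginning of that line
    return text[start:i]                # the IP field to the left

print(scan_lines(hosts_text, "ads.doubleclick.net"))   # 0.0.0.0
print(scan_string(hosts_text, "ads.doubleclick.net"))  # 0.0.0.0
```

Note the single-string version as sketched only handles one hostname per line; real hosts files allow several hostnames after the IP, which is one reason a parser might scan line by line after all.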