A Windows XP help forum. PCbanter

A browser question



 
 
#46
September 25th 17, 05:04 PM posted to alt.windows7.general,alt.comp.os.windows-10,microsoft.public.windowsxp.general
Mayayana
Posts: 6,438
A browser question: now address used in hosts file

"J. P. Gilliver (John)" wrote

| Maybe I will. Though for my hosts file, I'd not be able to report much
| difference (only 105 entries) - I think. Anyway, I've just done it
| (that's how I know it was 105), so we'll see.

Ick. It sounds like you need an editor with a
Replace All function.

| Yes. But that's silly. People with gigantic
| HOSTS files are reasoning that they can have
| a smaller file by reducing the number of characters.
| But the real time is in searching for the URL string,
| and that operation is probably taking a few ms,
| even in a gigantic HOSTS file. It's so fast that
| it's hard to measure.
|
| Well, let's say 5 ms. 100 links would thus make for half a second - and
| web pages with 100 links probably aren't uncommon

Maybe. But most of those are in the same domain.
And the browser or Acrylic will cache DNS. I like to
keep HOSTS small. But I doubt there's any notable
difference in speed between looking through 100 lines
for "doubleclick" or looking through 1000 lines. (There
would, however, be a big boost from blocking the gobs
of script that have to be parsed, via HOSTS and/or
NoScript and/or by blocking script altogether. Many
pages now have so much script that it's like loading
a large software program.)
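The point about DNS caching can be illustrated with a toy time-to-live cache. This is a minimal sketch, not how Acrylic or any browser actually implements its cache: `TtlDnsCache` and its parameters are hypothetical, and the `resolve` function passed in stands for the slow path (hosts file, then DNS). With a page of 100 links spread over a handful of hosts, only the first lookup per host pays the full cost.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Toy resolver cache (hypothetical): remembers answers until their TTL expires."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve          # the slow path: hosts file, then DNS
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.misses = 0                  # how many times the slow path ran

    def lookup(self, name: str) -> str:
        key = name.lower()
        now = self._clock()
        hit: Optional[Tuple[str, float]] = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]                # cached and still fresh
        self.misses += 1
        address = self._resolve(name)
        self._entries[key] = (address, now + self._ttl)
        return address
```

The one-day default TTL is an arbitrary choice for the sketch; real resolvers generally honor the TTL carried in each DNS answer.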


#47
September 25th 17, 05:21 PM posted to alt.windows7.general
Rodney Pont[_5_]
Posts: 95
A browser question

On Mon, 25 Sep 2017 11:55:33 -0400, Mayayana wrote:

"Rodney Pont" wrote


| I think you're right. I never noticed that before.
| The guidance doesn't explain in detail but does show
| this example:
|
| 127.0.0.1 ad.* ads.*
|
| Isn't it .* for regex to match anything?

You might be right. Like I said, life's
too short for regexp so I don't care.
Here's the Acrylic info:


I've left the Acrylic info in below. I had meant to put my comment
after your *doubleclick* example but I cut too early. Your example
would have worked then since it wasn't using regex. I do find regex
confusing and I'm glad I'm not alone :-)



---------------------------------------
# The separator between IPADDRESS and HOSTNAMES can be any number of spaces
# or tabs or both. If the HOSTNAMES contain the special characters '*' and
# '?' a (slow) "dir" like pattern matching algorithm is used instead of a
# (fast) binary search within the list of host names:
#
# 127.0.0.1 ad.* ads.*
#
# If a HOSTNAME starts with the '/' character instead it is treated like a
# regular expression (also very slow compared to a binary search):
#
# 127.0.0.1 /^ads?\..*$
#
# Note: More info about the regular expression engine and its syntax can be
# found at: http://regexpstudio.com
------------------------------------------

So it only parses lines beginning with / as regexp.
* is treated as a wildcard.
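The three matching modes in the quoted Acrylic notes (plain names via binary search, `*`/`?` wildcards, and `/`-prefixed regular expressions) can be sketched in a few lines. This is a hypothetical illustration of the documented behavior, not Acrylic's code: the class name, the order in which the modes are tried, and the use of Python's `fnmatch` and `re` semantics are all assumptions.

```python
import bisect
import fnmatch
import re
from typing import List, Optional, Tuple


class AcrylicStyleHosts:
    """Hypothetical sketch of the three matching modes the Acrylic notes describe."""

    def __init__(self, lines: List[str]) -> None:
        exact: List[Tuple[str, str]] = []
        self.wildcards: List[Tuple[str, str]] = []   # (glob pattern, address)
        self.regexes = []                            # (compiled regex, address)
        for line in lines:
            line = line.split("#", 1)[0].strip()     # drop comments and blanks
            if not line:
                continue
            address, *names = line.split()          # spaces, tabs, or both
            for name in names:
                if name.startswith("/"):
                    self.regexes.append((re.compile(name[1:], re.IGNORECASE), address))
                elif "*" in name or "?" in name:
                    self.wildcards.append((name.lower(), address))
                else:
                    exact.append((name.lower(), address))
        exact.sort()                                 # sorted, so binary search works
        self.names = [n for n, _ in exact]
        self.addresses = [a for _, a in exact]

    def lookup(self, host: str) -> Optional[str]:
        host = host.lower()
        i = bisect.bisect_left(self.names, host)    # fast path: binary search
        if i < len(self.names) and self.names[i] == host:
            return self.addresses[i]
        for pattern, address in self.wildcards:     # slower: "dir"-style globbing
            if fnmatch.fnmatchcase(host, pattern):
                return address
        for regex, address in self.regexes:         # slowest: regular expressions
            if regex.search(host):
                return address
        return None
```

For example, `AcrylicStyleHosts(["127.0.0.1 ad.* ads.*"]).lookup("ads.example.com")` returns `"127.0.0.1"` through the wildcard branch, while a name listed without `*`, `?` or a leading `/` is found by the binary search.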



--
Faster, cheaper, quieter than HS2
and built in 5 years;
UKUltraspeed http://www.500kmh.com/


#48
September 25th 17, 05:29 PM posted to alt.windows7.general,alt.comp.os.windows-10,microsoft.public.windowsxp.general
J. P. Gilliver (John)[_4_]
Posts: 2,679
A browser question: now address used in hosts file

In message , Mayayana
writes:
"J. P. Gilliver (John)" wrote

| Maybe I will. Though for my hosts file, I'd not be able to report much
| difference (only 105 entries) - I think. Anyway, I've just done it
| (that's how I know it was 105), so we'll see.

Ick. It sounds like you need an editor with a
Replace All function.


I have. Notepad+. (Though I think the basic Notepad has that as well.)
[]
Maybe. But most of those are in the same domain.
And the browser or Acrylic will cache DNS. I like to
keep HOSTS small. But I doubt there's any notable


Me too. It'll be a lot smaller when I get the round tuit for installing
Acrylic and editing it; I've only just downloaded Acrylic today.
[]
NoScript and/or by blocking script altogether. Many
pages now have so much script that it's like loading
a large software program.)

Couldn't agree more; unfortunately, a lot of pages don't work properly
if you block scripts. (Sometimes they don't work but in an obvious
manner, which would be fine; however, sometimes they don't work in a way
that's far from obvious - something just isn't there.)

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

Just because you're old it doesn't mean you go beige. Quite the reverse.
- Laurence Llewelyn-Bowen, RT 2015/7/11-17

#49
September 25th 17, 10:01 PM posted to alt.windows7.general,alt.comp.os.windows-10
Paul[_32_]
Posts: 11,873
A browser question

Mayayana wrote:
"Paul" wrote

| Well, consider what would happen, if you enabled the
| IIS web server in your copy of Windows.

But I'm not running a server. How about
real world scenarios? Maybe you could volunteer
to do thorough testing of 0.0.0.0 vs 127.0.0.1.


I have actually run IIS.

But I'm just not a web person, so not today thanks.

I think there might be IIS in Win10 in Windows Features.
Someone with more web monkey genes might enjoy
messing with that for you.

Paul

#50
September 26th 17, 03:16 AM posted to alt.windows7.general,alt.comp.os.windows-10,microsoft.public.windowsxp.general
VanguardLH[_2_]
Posts: 10,881
A browser question: now address used in hosts file

A lookup on 0.0.0.0 will immediately fail. It cannot be assigned to any
one host (although a server can bind to it to listen on all of its IP
addresses). 127.0.0.1 can be assigned to a host (it is localhost), and
as such there must be time to listen for a response. After all, you
might have a local server running on your host that listens for
connection requests, so a redirect to 127.0.0.1 has to wait to see if
there exists a listener to respond at that IP address (on whatever port
was used). There are many apps that run as a local server and rely on
connections to localhost (127.0.0.1). Redirects in the hosts file to
0.0.0.0 are rejected more quickly than redirects to 127.0.0.1. Windows
knows immediately it cannot get to a host with an IP address of
0.0.0.0. Windows doesn't know there is no listening process (on any
port) at 127.0.0.1 until it tries. If you run a program that is a local
server listening on a port at 127.0.0.1, redirects through the hosts
file will get even slower.

You will be using the hosts file with your web browser. Your web
browser defaults to port 80 for HTTP and 443 for HTTPS, although a
different port can be specified in the URL (http://host.domain:port).
If your local server is listening on 80, 443, or both, then redirecting
to 127.0.0.1 in your hosts file will have Windows pointing your web
browser to that local server process listening on those ports. Instead
of getting an immediate rejection (nothing listening on that port), your
web browser connects to that local server. Using 127.0.0.1 will conflict
with many if not most local servers that you can run on your localhost,
which is why it was a bad choice of IP address for auto-blocking. It
continues to get used because of tradition: it started that way, so it
remains that way despite being fraught with hazard.
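To see the local-server conflict for yourself, a small probe can time a TCP connection attempt and report whether something answered. This is a sketch under assumptions: `probe` is a hypothetical helper, it only tests raw TCP connects (no browser, no hosts file), and what a connect to 0.0.0.0 does varies by operating system (some systems treat it as the local machine rather than failing), so results should be checked on the system in question.

```python
import socket
import time
from typing import Tuple


def probe(address: str, port: int, timeout: float = 2.0) -> Tuple[str, float]:
    """Hypothetical helper: try a TCP connect, report the outcome and elapsed seconds."""
    start = time.perf_counter()
    try:
        with socket.create_connection((address, port), timeout=timeout):
            outcome = "connected"          # something is listening there
    except ConnectionRefusedError:
        outcome = "refused"                # host answered: nothing on that port
    except socket.timeout:
        outcome = "timeout"                # no answer within the timeout
    except OSError as exc:
        outcome = f"error: {exc}"          # e.g. address not usable as a destination
    return outcome, time.perf_counter() - start
```

A throwaway listener bound to 127.0.0.1 stands in for a local server: a connection aimed at that address and port reaches it, which is what a browser sent there by a hosts entry would do. Once the listener is closed, the same probe is turned away instead.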

When I tried the pre-compiled hosts files (e.g., MVPS hosts), I changed
all the 127.0.0.1 addresses to 0.0.0.0. That made the rejects happen
quicker. Not by a lot per lookup, but still quicker. Reduced rejection
time was noticed only in web pages that had hundreds of links to
external resources. There could be a host listening at 127.0.0.1 so
there's time waiting for a response. 0.0.0.0 cannot be assigned as a
device address.
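The 127.0.0.1 to 0.0.0.0 conversion described above is a Replace All in an editor, but as a script it can also skip the genuine `localhost` entry that local tools rely on. A minimal sketch, assuming the lines come from `str.splitlines()` (no line endings); the function name is hypothetical.

```python
import re
from typing import List

# A hosts entry whose first field is exactly 127.0.0.1, followed by whitespace.
_LOOPBACK_ENTRY = re.compile(r"^(\s*)127\.0\.0\.1(\s+)(.*)$")


def loopback_to_zero(lines: List[str]) -> List[str]:
    """Rewrite blocking entries from 127.0.0.1 to 0.0.0.0, but keep localhost."""
    out = []
    for line in lines:
        m = _LOOPBACK_ENTRY.match(line)
        if m:
            # Hostnames on the line, ignoring any trailing comment.
            names = m.group(3).split("#", 1)[0].split()
            if "localhost" not in (n.lower() for n in names):
                line = f"{m.group(1)}0.0.0.0{m.group(2)}{m.group(3)}"
        out.append(line)   # comments and other addresses pass through unchanged
    return out
```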

https://en.wikipedia.org/wiki/0.0.0.0
https://www.lifewire.com/four-zero-ip-address-818384

However, I dropped using a pre-compiled hosts file. They are too often
out of date and often too aggressive in what they block. They also must
list a *host*. They cannot list a domain. So, for example, it takes
over 50 entries to block some (not all) of the hosts at doubleclick.com.
Unlike URL blocking, where you could specify just a domain or perhaps
even use wildcards to match several hosts at a domain, the hosts file
requires each host to be listed. Also, some domains will accept any
hostname and redirect all connection requests to their server
regardless of the hostname used. That means a web page could use any
random hostname and still connect to the ad or tracking server. No text
file could contain every possible combination of characters in a
hostname at such a domain, so there is no effective way to block the
entire domain: in that domain's nameserver, they map ANY hostname to
the IP address of their server.
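The difference between listing hosts and blocking a domain can be shown with two small matchers. A sketch with hypothetical names: `hosts_style_blocked` mimics the exact-name matching of a hosts file, while `domain_blocked` stands in for the domain-level rules of URL blockers (real blockers have their own filter syntax). Only the domain rule catches a randomly generated hostname under a domain whose nameserver answers for any name.

```python
from typing import Set


def hosts_style_blocked(host: str, listed: Set[str]) -> bool:
    """A hosts file can only match the exact hostname that was listed."""
    return host.lower().rstrip(".") in listed


def domain_blocked(host: str, domains: Set[str]) -> bool:
    """Domain-level rule: block the domain itself and every name under it."""
    labels = host.lower().rstrip(".").split(".")
    # Try "a.b.example.com", then "b.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```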

Using the hosts file for blocking was a frankenjob: it happened to work,
but the file wasn't intended for that purpose. For EVERY lookup in a web
page, the hosts file gets opened and searched sequentially from top to
bottom in a text search (not a binary database search). For the next
lookup, the hosts file has to be reopened and searched sequentially
again. The hosts file is *NOT* cached into memory (and a good thing,
too, considering how huge the pre-compiled hosts files are). For web
pages with a lot of external resources (not relatively addressed), all
this repeated file opening and serial searching adds time, so much so
that the lookup time can outweigh the DNS lookup times. If the web page
had 100 externally linked resources, that means opening the hosts text
file 100 times and parsing its text from top to bottom 100 times.
Parsing a text file to search within it is slow, and all that file
handling (open, read, close) takes time. The hosts file was intended to
hold a few entries to a few hundred entries, not 20 THOUSAND entries.
It is just a text file, not a binary database. Using the hosts file for
blocking got started because it was a kludge that worked; however,
pre-compiled hosts files have become ridiculously huge. Someone realized
a couple of decades ago that the kludge would work, and users still
think it is an applicable scheme today. Old info lasts forever on the
Internet, and much of it is not datestamped or is simply regurgitated
without being thought through.
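The cost model in this paragraph (reopen and scan the file for every lookup) can be contrasted with parsing the list once into a hash table. Whether Windows really rereads the hosts file per lookup is exactly what Mayayana disputes further down, so this sketch illustrates the two approaches, not Windows internals; both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def parse_hosts(lines: Iterable[str]) -> Dict[str, str]:
    """Load once: build a name -> address table (first entry for a name wins)."""
    table: Dict[str, str] = {}
    for raw in lines:
        fields = raw.split("#", 1)[0].split()   # strip comments, split on whitespace
        if len(fields) < 2:
            continue                            # blank, comment, or malformed line
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), address)
    return table


def scan_lookup(path: str, host: str) -> Optional[str]:
    """Per-lookup pattern: open the file, scan top to bottom, close it."""
    host = host.lower()
    with open(path, encoding="utf-8", errors="replace") as f:
        for raw in f:
            fields = raw.split("#", 1)[0].split()
            if len(fields) >= 2 and host in (n.lower() for n in fields[1:]):
                return fields[0]                # first matching line wins
    return None
```

Loading once pays the read-and-parse cost a single time; after `with open(path, encoding="utf-8") as f: table = parse_hosts(f)`, each lookup is just `table.get(host.lower())`.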

Also, unlike other ad and tracking blocking methods, there is no easy
and quick method to disable or enable the use of the hosts file. Your
web browser has no means of toggling the hosts file on and off. Almost
all adblock apps provide a means to toggle them off and back on. Even
Internet Explorer's TPL (Tracking Protection List) feature can be
quickly toggled on and off. With a hosts file, you would need a
separate batch file and a shortcut to it to rename the hosts file so it
can't be found by subsequent DNS lookup requests, and then rename it
back to "hosts" when you want to start using that list again. Toggling
the hosts file on and off won't work if you leave the local DNS Client
service running. Once a positive lookup gets cached in the DNS Client's
cache, it gets used first, so that IP address gets reused instead of
what you specified in the hosts file. The DNS Client caches positive
entries for 1 day (it is configurable). You would need to flush the DNS
cache after toggling off (renaming) the hosts file, or leave the DNS
Client service always disabled and have your web client reissue DNS
requests to the server for every lookup (not in the hosts file) rather
than use the faster local DNS cache.
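The rename-and-flush toggle described above could be scripted roughly as follows. A sketch under assumptions: the `.off` suffix and the function names are hypothetical, the path is the standard Windows hosts location, renaming it requires an elevated (administrator) process, and `ipconfig /flushdns` is used to clear the DNS Client cache after each switch.

```python
import os
import subprocess
from typing import Callable, Optional

# Standard hosts location on Windows; pass another path to experiment safely.
DEFAULT_HOSTS = r"C:\Windows\System32\drivers\etc\hosts"


def flush_windows_dns_cache() -> None:
    """Clear the DNS Client cache so stale cached answers are not reused."""
    subprocess.run(["ipconfig", "/flushdns"], check=True)


def toggle_hosts(hosts_path: str = DEFAULT_HOSTS,
                 disabled_suffix: str = ".off",   # hypothetical naming convention
                 flush: Optional[Callable[[], None]] = flush_windows_dns_cache) -> bool:
    """Rename hosts <-> hosts.off; return True if the hosts file is now active."""
    disabled_path = hosts_path + disabled_suffix
    if os.path.exists(hosts_path):
        os.replace(hosts_path, disabled_path)     # turn blocking off
        active = False
    elif os.path.exists(disabled_path):
        os.replace(disabled_path, hosts_path)     # turn blocking back on
        active = True
    else:
        raise FileNotFoundError(hosts_path)
    if flush is not None:
        flush()                                   # flush either way
    return active
```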

Don't remember if it was you or someone else who mentioned using
HostsMan. Well, if you are really intent on using a pre-compiled hosts
file, then you want it to get updated. What good is an antiquated
blacklist? Sure, there are some hosts (not domains ... hosts!) listed
in the hosts file that don't often change. Mostly those would be for
the big CDN (Content Delivery Network) providers. Doubleclick is just
one of thousands of CDNs. Those pre-compiled hosts files also list
smaller ad and tracking sources. The blacklist isn't static. If it
were, we'd still be using the same pre-compiled hosts file from a
couple of decades ago with only a couple thousand entries. The sources
change, so the blacklist should also change. Are you going to revisit
the site where you got a pre-compiled hosts file, download the latest
one, and then write it over the old hosts file (which means losing any
edits you made for sites you consider false positives in their
blacklist)? HostsMan does an automated job of retrieving the latest
version of the pre-compiled hosts file for you and replacing the
existing (old) hosts file. You could run it manually if you don't like
anything auto-updating itself; however, you are using a blacklist
against ads and tracking, and that changes often. Would you disable the
auto-update in your anti-virus software and then remember to update it
manually every year or two? If so, don't bother using an AV program, or
a hosts file, as they will be way out of date.
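One way to keep your own false-positive fixes across updates is to store them separately and reapply them after each download. This is a hypothetical sketch, independent of how HostsMan handles exclusions: `merge_update` drops whitelisted names from the freshly downloaded list and appends local additions.

```python
from typing import Iterable, List, Set


def merge_update(new_lines: Iterable[str], allowed: Set[str],
                 extra_lines: Iterable[str] = ()) -> List[str]:
    """Filter a downloaded hosts list through a local whitelist, then append
    local additions, so the user's edits survive each update (hypothetical)."""
    allowed = {a.lower() for a in allowed}
    merged: List[str] = []
    for raw in new_lines:
        body, sep, comment = raw.partition("#")
        fields = body.split()
        if len(fields) < 2:
            merged.append(raw)                  # blank or comment-only line
            continue
        names = [n for n in fields[1:] if n.lower() not in allowed]
        if names:                               # drop the line if nothing is left
            merged.append(" ".join([fields[0], *names]) + (" #" + comment if sep else ""))
    merged.extend(extra_lines)
    return merged
```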

#51
September 26th 17, 02:26 PM posted to alt.windows7.general,alt.comp.os.windows-10,microsoft.public.windowsxp.general
Mayayana
Posts: 6,438
A browser question: now address used in hosts file

"VanguardLH" wrote

| So you are NOT using someone else's pre-compiled hosts file which is
| what I was discussing.

I don't think it matters very much. The MVPS HOSTS
file is very bloated, but if people are not waiting seconds
for parsing then we have to conclude the bloat is not
a problem.

| It isn't just doing the file I/O calls to open the file. Then the file
| has to be read. It is a text file. It is parsed line by line and then
| a search done on match criteria to determine if the host to be visited
| is listed in the text file. Then the file has to get closed.

Reading in a file is extremely fast. Even for multiple
MBs it's virtually instant. Operations to find a specific
string are also very fast. The logical approach would be
to treat the file as a single string and look for, say,
ads.doubleclick.net. If found then look to the left for
the IP. There's no point checking each line. I don't
know how you think you know that's how it's done.
It would be an extremely wasteful approach.
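The whole-file string search suggested here could look like the following: find the name anywhere in the text, confirm it is a complete hostname on an uncommented line, then look left for the IP address. A sketch only, with a hypothetical function name; it says nothing about how Windows or Acrylic actually search the file.

```python
from typing import Optional


def find_in_hosts_text(text: str, host: str) -> Optional[str]:
    """Treat the file as one string; on a whole-token match, return the line's IP."""
    host = host.lower()
    lowered = text.lower()
    start = 0
    while True:
        i = lowered.find(host, start)
        if i == -1:
            return None                                # not listed
        end = i + len(host)
        before = lowered[i - 1] if i > 0 else "\n"
        after = lowered[end] if end < len(lowered) else "\n"
        line_start = lowered.rfind("\n", 0, i) + 1     # look left to the line start
        prefix = text[line_start:i]
        # Whole hostname only (not "xads..." or "...net.evil"), and not commented out.
        if before.isspace() and (after.isspace() or after == "#") and "#" not in prefix:
            fields = prefix.split()
            if fields:
                return fields[0]                       # the IP is the first field
        start = i + 1                                  # keep searching
```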

I also use an Acrylic HOSTS file with wildcards. I'm
not seeing a lag. If you find there's a parsing lag then
you may be doing something wrong.

If uBlock Origin works for you that's not a problem.
But it sounds like you're trying to find reasons to
convince yourself that it's better than HOSTS and then
going further to convince yourself that HOSTS is
actually a problem.

uBlock is written in javascript and presumably has to
use some kind of blacklist. According to wikipedia it
uses several:

"uBlock Origin and uBlock support the majority of Adblock Plus's filter
syntax. The popular filter lists EasyList and EasyPrivacy are enabled as
default subscriptions. The extensions are capable of importing hosts files,
and a number of community maintained lists are available at installation.
Among the host files available, Peter Lowe's Ad servers list and Malware
Domains are also enabled as default."

So uBlock is using several lists, including at least
one HOSTS file, and filtering them with poor-
performance javascript rather than compiled code.
Waddayaknow'boutthat! Who'd have thunk it has
to parse a list to filter ads?



 




Copyright ©2004-2024 PCbanter.
The comments are property of their posters.