-
Notifications
You must be signed in to change notification settings - Fork 1
Automatically download and convert hosts files and Adblock filters to Privoxy action files
License
lcferrum/adhosts2privoxy
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
adhosts2privoxy 1. License ---------- Copyright (c) 2018-2019 Lcferrum This program comes with no warranty. You must use this program at your own risk. Licensed under BSD 2-Clause License - see LICENSE file for details. 2. About -------- This Python 2.7 script converts multiple hosts files and Adblock filters into single Privoxy action file. It has simple built-in download functionality so you can supply web address to the script instead of specifying local file path. Script gets information about input files via config file. 3. Where to get --------------- Main project homepage is at GitHub: https://github.com/lcferrum/adhosts2privoxy 4. Usage -------- On Linux/Mac: ./adhosts2privoxy.py [CONFIG [ACTION_FILE [INPUT_DIR]]] On Windows: python adhosts2privoxy.py [CONFIG [ACTION_FILE [INPUT_DIR]]] CONFIG is a relative (to current dir) or absolute path to config file containing list of input files (default: 'adhosts2privoxy.conf'). More on config file in the next section. Config file should have the same encoding as current locale. ACTION_FILE is a relative (to current dir) or absolute path to resulting Privoxy action file (default: 'hosts.action'). Action file will be written using current locale encoding. INPUT_DIR is a relative (to current dir) or absolute path to directory where input files are located or will be downloaded to (defaults to current directory). You can specify absolute path for each input file in config - in this case, INPUT_DIR is ignored for such files. Built-in download functionality is good enough for most hosted hosts files and Adblock filters on the web. It takes in account Content-Disposition and Content-Type headers, supports Internationalized Domain Names. But you can always specify local file path for input file in config, instead of URL, if you don't need to download anything or prefer to rely on external download utilities such as Wget or Curl. 5. Configuration file --------------------- Configuration file consists of any number of sections, each corresponding to input file to be processed. Section starts with '[NAME]' header and followed by 'VARIABLE: VALUE' or 'VARIABLE=VALUE' entries. Comment lines start with '#' or ';', inline comments start with ';'. Possible VARIABLEs are: 'Url', 'File', 'Keep', 'Type' and 'Encoding'. Example config file: [Ad Hosts] Url=http://foo.bar/adhosts.txt File=ad_hosts Keep=1 Type=hosts Encoding=UTF-8 Section name is a display name of input file to be processed. It will be used in script output. In this example, section name is 'Ad Hosts'. 'Url' variable holds a web address of input file. 'Url' variable is optional and, if it is omitted, local input file will be processed and 'File' variable should be set accordingly. In this example, input file will be downloaded from 'http://foo.bar/adhosts.txt'. 'File' variable contains path where downloaded file will be saved or, in case if 'Url' is omitted, path to locally stored input file. 'File' variable is optional only when 'Url' is provided: in absence of 'File' variable, input filename will be deduced from Content-Disposition header or web address itself. 'File' contains relative or absolute path. Relative paths (this includes automatically deduced filenames of downloaded input files) are calculated relative to directory specified in HOSTS_DIR script parameter (which is current directory by default). In this example, downloaded input file will be saved in current directory under filename 'ad_hosts'. By default, after script finishes processing input file, it deletes it. To control this behavior 'Keep' variable is used - if it translates to True (values 'yes', 'true', 'on' and non-zero decimals), file won't be deleted after being processed. If it translates to False (values 'no', 'false', 'off' and zero equivalents) or variable is omitted - default action takes place and file becomes deleted. In this example, input file won't be deleted after being processed. 'Type' variable defines type of the input file. Possible values are: 'hosts' for hosts files, 'adblock' for ordinary Adblock filters, 'adblock+hosts' for uBlock Origin flavored Adblock filters and 'auto' for type auto-detection. Note that 'adblock+hosts' type can't be auto-detected. 'Type' defaults to 'auto'. In this example, input file type is hosts file. Input files can have wide variety of encodings. So, when reading input file, script uses encoding specified in Content-Type header or in current locale, if Content-Type encoding is not available or when input file is stored locally. Automatically selected input file encoding can be overridden with 'Encoding' variable. In this example, downloaded input file will be read using 'UTF-8' encoding. 6. Hosts file processing ------------------------ When hosts file is processed by the script, it's hostnames (and aliases) become patterns for 'block' action in Privoxy action file. Multiple hostnames are collapsed in single pattern if there is a hostname that acts as high level domain for all of them. For example, consider this hosts file: 0.0.0.0 ads.foo.bar virii.foo.bar 0.0.0.0 foo.bar 0.0.0.0 more.ads.foo.bar 0.0.0.0 ad1.foo.baz ad2.foo.baz After processing it, resulting action file is similar to this: {+block{Blocked hostname.}} .foo.bar .ad1.foo.baz .ad2.foo.baz Script algorithm skips malformed hosts entries or entries with "non-blocking" IPs. Only following IPs are considered "blocking": whole 127.0.0.0/8 network (incl. 127.0.0.1), 0.0.0.0, ::1, ::. Also, typical "loopback" and "localhost" hostnames are ignored. 7. Adblock filter support ------------------------- Script only processes following Adblock filter rules: '||www.foo.com^', '||www.foo.com|' and '||www.foo.com'. Type options ('$') may be present but largely ignored by the script. The only option that it isn't ignored is 'domain=' option which, if present, should contain excluding list of domains ('~') - otherwise, rule will be skipped. If uBlock Origin flavored type of Adblock filter is chosen, additionally script will process rules like 'www.foo.com'. This behaviour stems from the difference in how uBlock Origin and other Adblock clones handle non-anchored non-wildcard patterns. While uBlock Origin treats such patterns as complete hostname, other Adblock clones treat it like wildcard that can be matched anywhere within the address. By using uBlock Origin flavored type, plain text files, containing simple list of hostnames, can be processed by the script. Hostnames extracted from filter rules are treated the same way as in hosts files. 8. Usage example ---------------- Let's say, you need to turn web hosted hosts file and Adblock filter to Privoxy action file. Hosts file is at 'http://foo.bar/hosts.txt', Adblock filter is at 'http://foo.baz/easylist.txt'. For this purpose, config file can be as simple as this: [Hosts file] Url=http://foo.bar/hosts.txt [Adblock filter] Url=http://foo.baz/easylist.txt Save this config as 'adhosts2privoxy.conf' in the same directory where adhosts2privoxy script is located. Then just run the script from this directory without any additional parameters. Resulting Privoxy action file 'hosts.action' will be created in the same directory. Downloaded files won't be kept and will be deleted after processing.
About
Automatically download and convert hosts files and Adblock filters to Privoxy action files
Topics
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published