Circumvention monitor

Ad networks often switch to random new domains, which makes them hard to keep track of by hand. This crawler automates that tracking.

Reports

Every day the circumvention monitor runs automatically and generates two files:

report/report.md - human-readable report.
report/rules.txt - blocking rules for the domains discovered by the crawler.
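
As a rough illustration of how an entry in rules.txt could be derived from a discovered request URL, the sketch below applies the three scope formats and the modifiers described under "How to configure it". The names buildRule and naiveRegisteredDomain are hypothetical and are not taken from this project. The default scope and the last-two-labels shortcut for eTLD+1 are assumptions; the shortcut gives wrong results for suffixes such as co.uk, where a Public Suffix List lookup would be needed.

```javascript
// Hypothetical helper (not part of this project): approximates eTLD+1 by
// keeping the last two labels of the hostname. This is an assumption made
// for brevity and is wrong for multi-label suffixes such as "co.uk".
function naiveRegisteredDomain(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Hypothetical helper: builds one blocking rule for a request URL using the
// "scope" and "modifiers" fields of ruleProperties.
// Assumption: "domain" is used when no scope is given.
function buildRule(requestUrl, ruleProperties = {}) {
  const { scope = 'domain', modifiers = [] } = ruleProperties;
  const url = new URL(requestUrl);

  let pattern;
  switch (scope) {
    case 'domain':
      // Full domain name: ||exact.domain.name^
      pattern = '||' + url.hostname + '^';
      break;
    case 'registeredDomain':
      // Registered domain name (eTLD+1): ||domain.name^
      pattern = '||' + naiveRegisteredDomain(url.hostname) + '^';
      break;
    case 'domainAndPath':
      // Domain plus path, without the query string
      pattern = '||' + url.hostname + url.pathname;
      break;
    default:
      throw new Error('Unknown scope: ' + scope);
  }

  // Modifiers are appended after "$" and separated by commas.
  return modifiers.length > 0 ? pattern + '$' + modifiers.join(',') : pattern;
}

console.log(buildRule('https://ads.cdn.example.com/js/ad.js?x=1', {
  scope: 'registeredDomain',
  modifiers: ['third-party'],
}));
```

With the sample input above, the sketch prints ||example.com^$third-party.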

How to configure it

To monitor a new ad system, add a new JS object to the configuration:

{
  "name": "AD SYSTEM NAME",
  "criteria": [
    {
      "urlPattern": "URL PATTERN",
      "contentPattern": "CONTENT PATTERN",
      "contentType": "script",
      "thirdParty": true,
      "ruleProperties": {
        "modifiers": ["third-party"],
        "scope": "registeredDomain"
      }
    }
  ],
  "pages": ["https://example.net/", "https://example.com/"]
}
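
For comparison with the template above, here is what a filled-in entry might look like. Every value in it (the ad system name, both patterns, and the page URL) is invented for illustration and does not describe a real ad network or this project's actual configuration.

```javascript
// Hypothetical configuration entry; all values are made up for illustration.
const exampleAdSystem = {
  name: 'Example Ads',
  criteria: [
    {
      // Wildcard pattern: matches request URLs containing "/ads/loader"
      urlPattern: '*/ads/loader*',
      // Plain string: the response body must contain this text
      contentPattern: 'showAd(',
      contentType: 'script',
      thirdParty: true,
      ruleProperties: {
        modifiers: ['third-party'],
        scope: 'registeredDomain',
      },
    },
  ],
  // Pages to crawl when looking for this ad system's domains
  pages: ['https://example.net/'],
};

console.log(JSON.stringify(exampleAdSystem, null, 2));
```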

name - ad system name. Will be used in the report to identify this ruleset.
criteria - a list of criteria that will be used to identify ad requests.
- urlPattern (optional) - ad request URL must match this pattern. It can be a string, a wildcard, or a regular expression.
  
  Examples:
  - test - string, all URLs that contain this string.
  - *test*test* - wildcard, the URL must match this wildcard.
  - /.*test.*/ - regular expression. Note that the enclosing / characters are only delimiters and are not part of the regular expression.
- contentPattern (optional) - response content must match this pattern. Just like urlPattern, it can be a string, a wildcard, or a regular expression.
- contentType (optional) - the content type the request must have, for example script.
- thirdParty (optional) - if specified, the request must be third-party (true) or first-party (false).
- ruleProperties (optional) - additional properties for the rules generated by the compiler.
  - modifiers (optional) - an array of modifiers that should be added to the rule.
  - scope (optional) - rule scope. Possible values are:
    - domain - full domain name (||exact.domain.name^)
    - registeredDomain - registered domain name (eTLD+1) (||domain.name^)
    - domainAndPath - domain + path (||exact.domain.name/path/without/query)
pages - a list of webpages that will be crawled in order to extract this ad system's domains.
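
The three pattern forms accepted by urlPattern and contentPattern can be sketched as follows. This is a minimal illustration, not the project's actual matcher: patternToRegExp and matchesPattern are hypothetical names, and the choices to anchor wildcards to the whole value and to treat a missing pattern as a match are assumptions.

```javascript
// Hypothetical helper (not part of this project): converts a pattern from the
// configuration into a RegExp.
function patternToRegExp(pattern) {
  // "/.../" is a regular expression; the slashes are only delimiters.
  if (pattern.length > 2 && pattern.startsWith('/') && pattern.endsWith('/')) {
    return new RegExp(pattern.slice(1, -1));
  }

  // Escape every regex special character except "*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');

  if (pattern.includes('*')) {
    // Wildcard. Assumption: the whole value must match, with "*" standing
    // for any sequence of characters.
    return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
  }

  // Plain string: matches any value that contains it.
  return new RegExp(escaped);
}

// Hypothetical helper: checks a URL or a response body against a pattern.
// Assumption: an omitted (optional) pattern matches everything.
function matchesPattern(value, pattern) {
  if (pattern === undefined) {
    return true;
  }
  return patternToRegExp(pattern).test(value);
}

console.log(matchesPattern('https://example.com/test.js', 'test'));
console.log(matchesPattern('https://test.example/test', '*test*test*'));
console.log(matchesPattern('https://example.com/test.js', '/.*test.*/'));
```

One consequence of this design is that a plain string which happens to start and end with / (for example a path such as /ads/) would be interpreted as a regular expression, so such strings are better written as wildcards.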

How to run it

yarn install - install dependencies
yarn monitor - run the crawler with default arguments

Run yarn monitor -v to make it print the verbose log.

TODO

Make basic rule modifiers configurable (see report.js)
Allow monitoring DOM state (I need examples where this is needed)
"criteria" should allow blocking or adding custom CSS to test pages so that we could trigger circumvention scripts

ameshkov/circumvention-monitor