Skip to content

Commit

Permalink
Fix validation of TLD domains. #63
Browse files Browse the repository at this point in the history
Squashed commit of the following:

commit db11c98
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Sun Dec 22 22:53:24 2024 +0300

    add release date

commit 6df0c77
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Sat Dec 21 11:58:50 2024 +0300

    Applied suggestion

commit b15d7f4
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Sat Dec 21 11:58:45 2024 +0300

    Applied suggestion

commit 3579f0c
Author: jellizaveta <e.egorova@adguard.com>
Date:   Fri Dec 20 16:58:22 2024 +0300

    readme, add case with composite TLDs

commit dad1a78
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:57:56 2024 +0300

    add todo

commit d20db9c
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:55:32 2024 +0300

    remove redundant rule

commit ed53fb2
Merge: 3448837 eac70e2
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:54:29 2024 +0300

    merge

commit 3448837
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:53:03 2024 +0300

    add tests

commit 5654e6f
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:45:47 2024 +0300

    fix var name

commit 6aa69ef
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:44:38 2024 +0300

    update changelog

commit cc59b99
Merge: fc54e6b 85e2953
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Dec 19 11:43:38 2024 +0300

    merge master

commit eac70e2
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Wed Dec 18 18:05:44 2024 +0300

    Applied suggestion

commit 7b7022c
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Wed Dec 18 18:04:05 2024 +0300

    Applied suggestion

commit fc54e6b
Author: Slava Leleka <v.leleka@adguard.com>
Date:   Wed Dec 18 17:04:09 2024 +0300

    Applied suggestion

commit b8375db
Author: jellizaveta <e.egorova@adguard.com>
Date:   Wed Dec 18 12:03:06 2024 +0300

    update changelog

commit a2c8adc
Author: jellizaveta <e.egorova@adguard.com>
Date:   Wed Dec 18 12:00:31 2024 +0300

    handle cases with wildcard in adblock style syntax

commit 22e49a9
Author: jellizaveta <e.egorova@adguard.com>
Date:   Fri Dec 13 17:36:14 2024 +0300

    allow TLD if limiting modifiers present

commit f946f3a
Author: jellizaveta <e.egorova@adguard.com>
Date:   Fri Dec 13 16:55:07 2024 +0300

    add test
  • Loading branch information
jellizaveta committed Dec 23, 2024
1 parent 85e2953 commit d6c545e
Show file tree
Hide file tree
Showing 4 changed files with 141 additions and 21 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
## [1.0.31] - 2024-12-23

### Fixed

- Validation of TLD domains [#63]

### Added

- Calculation of checksum for filters and `! Checksum` string to the filter list meta [#76]

[1.0.31]: https://github.com/AdguardTeam/HostlistCompiler/compare/v1.0.29...v1.0.31
[#63]: https://github.com/AdguardTeam/HostlistCompiler/issues/63
[#76]: https://github.com/AdguardTeam/FiltersCompiler/issues/76

## [1.0.29] - 2024-09-26
Expand Down
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,10 @@ So here's what it does:
- Discards rules with unsupported modifiers. [Click here](https://github.com/AdguardTeam/AdGuardHome/wiki/Hosts-Blocklists#-adblock-style-syntax) to learn more about which modifiers are supported.
- Discards rules that are too short.
- Discards IP addresses. If you need to keep IP addresses, use [ValidateAllowIp](#validate-allow-ip) instead.
- Removes rules that block entire top-level domains (TLDs) like `||*.org^`, unless they have specific limiting modifiers such as `$denyallow`, `$badfilter`, or `$client`.
Examples:
- `||*.org^` - this rule will be removed
- `||*.org^$denyallow=example.com` - this rule will be kept because it has a limiting modifier

If there are comments preceding the invalid rule, they will be removed as well.

Expand Down
84 changes: 64 additions & 20 deletions src/transformations/validate.js
Original file line number Diff line number Diff line change
Expand Up @@ -4,20 +4,36 @@ const tldts = require('tldts');
const utils = require('../utils');
const ruleUtils = require('../rule');

const DOMAIN_PREFIX = '||';
const DOMAIN_SEPARATOR = '^';
const WILDCARD = '*';
const WILDCARD_DOMAIN_PART = '*.';

// TODO: Remove lodash from the project if possible

/**
* The list of modifiers that limit the rule for specific domains.
*/
const ANY_PATTERN_MODIFIER = [
'denyallow',
'badfilter',
// DNS-related modifiers.
'client',
];

/**
* The list of modifiers supported by hosts-level blockers.
*/
const SUPPORTED_MODIFIERS = [
'important',
'~important',
'badfilter',
'ctag',
'denyallow',
// DNS-related modifiers.
'client',
'dnstype',
'dnsrewrite',
'ctag',
// modifiers that limit the rule for specific domains
...ANY_PATTERN_MODIFIER,
];

/**
Expand All @@ -31,9 +47,10 @@ const MAX_PATTERN_LENGTH = 5;
* @param {string} hostname - hostname to check
* @param {string} ruleText - original rule text (for logging)
* @param {boolean} allowedIP - flag to determine if IP validation is allowed
* @param {boolean} hasLimitModifier - flag to determine if the rule has a limit modifier, e.g. denyallow
* @returns {boolean} true if the hostname is okay to be in the blocklist.
*/
function validHostname(hostname, ruleText, allowedIP) {
function validHostname(hostname, ruleText, allowedIP, hasLimitModifier) {
const result = tldts.parse(hostname);

if (!result.hostname) {
Expand All @@ -46,7 +63,7 @@ function validHostname(hostname, ruleText, allowedIP) {
return false;
}

if (result.hostname === result.publicSuffix) {
if (result.hostname === result.publicSuffix && !hasLimitModifier) {
consola.debug(`matching the whole public suffix ${hostname} is not allowed: ${ruleText}`);
return false;
}
Expand Down Expand Up @@ -108,6 +125,9 @@ function validAdblockRule(ruleText, allowedIP) {
return false;
}

// need to check if rules with TLD has limit modifier
let hasLimitModifier = false;

// 1. It checks if the rule contains only supported modifiers.
if (props.options) {
// eslint-disable-next-line no-restricted-syntax
Expand All @@ -116,6 +136,10 @@ function validAdblockRule(ruleText, allowedIP) {
consola.debug(`Contains unsupported modifier ${option.name}: ${ruleText}`);
return false;
}
// if the rule has a limit modifier, TLD as a hostname is allowed
if (ANY_PATTERN_MODIFIER.includes(option.name)) {
hasLimitModifier = true;
}
}
}

Expand Down Expand Up @@ -151,30 +175,49 @@ function validAdblockRule(ruleText, allowedIP) {

// 4. Validate domain name
// Note that we don't check rules that contain wildcard characters
const sepIdx = props.pattern.indexOf('^');
const wildcardIdx = props.pattern.indexOf('*');
const sepIdx = props.pattern.indexOf(DOMAIN_SEPARATOR);
const wildcardIdx = props.pattern.indexOf(WILDCARD);
if (sepIdx !== -1 && wildcardIdx !== -1 && wildcardIdx > sepIdx) {
// Smth like ||example.org^test* -- invalid
return false;
}

if (_.startsWith(props.pattern, '||')
&& sepIdx !== -1
&& wildcardIdx === -1) {
const hostname = utils.substringBetween(ruleText, '||', '^');
if (!validHostname(hostname, ruleText, allowedIP)) {
return false;
}
// Check if the pattern does not start with the domain prefix and does not contain a domain separator
if (!_.startsWith(props.pattern, DOMAIN_PREFIX) || sepIdx === -1) {
return true;
}

// Extract the domain to check from the rule text
const domainToCheck = utils.substringBetween(ruleText, DOMAIN_PREFIX, DOMAIN_SEPARATOR);

// If there's something after ^ in the pattern - something went wrong
// unless it's `^|` which is a rather often case
if (props.pattern.length > (sepIdx + 1)
&& props.pattern[sepIdx + 1] !== '|') {
return false;
// If there are wildcard characters in the pattern
if (wildcardIdx !== -1) {
// Check if the rule has wildcard characters but includes only TLD (e.g., ||*.org^ or ||*.co.uk^)
const startsWithWildcard = domainToCheck.startsWith(WILDCARD_DOMAIN_PART);
// Get the TLD pattern to check (e.g. ||*.org^ --> org)
const TLDPattern = domainToCheck.replace(WILDCARD_DOMAIN_PART, '');
// Compare the TLD pattern with the public suffix
const isOnlyTLD = tldts.getPublicSuffix(TLDPattern) === TLDPattern;
// If it's a wildcard with TLD, validate the cleaned TLD
if (startsWithWildcard && isOnlyTLD) {
const cleanedDomain = domainToCheck.replace(WILDCARD_DOMAIN_PART, '');
return validHostname(cleanedDomain, ruleText, allowedIP, hasLimitModifier);
}
// If the rule has wildcard characters but is not a TLD (e.g., ||*.example.org^)
return true;
}

return true;
// Validate the domain
if (!validHostname(domainToCheck, ruleText, allowedIP, hasLimitModifier)) {
return false;
}

// Ensure there's nothing after the domain separator unless it's `^|`
// If there's something after ^ in the pattern - something went wrong
// unless it's `^|` which is a rather often case
// ||example.org^| -- valid
// @@||example.org^|$important -- invalid
return !(props.pattern.length > (sepIdx + 1) && props.pattern[sepIdx + 1] !== '|');
}

/**
Expand All @@ -197,6 +240,7 @@ function valid(ruleText, allowedIP) {
if (ruleUtils.isEtcHostsRule(ruleText)) {
return validEtcHostsRule(ruleText, allowedIP);
}

return validAdblockRule(ruleText, allowedIP);
}

Expand Down
66 changes: 66 additions & 0 deletions test/transformations/validate.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -67,4 +67,70 @@ describe('Validate', () => {
'@@||example.org^|$important',
]);
});

it('adblock-style rules with wildcard and denyallow modifier', () => {
const rules = `||*.org^$denyallow=example.com
||*.asia^
||*.example.org^
||*.asia^$denyallow=fap.bar
||xyz^$denyallow=example.com
||xyz^`.split(/\r?\n/);
const filtered = validate(rules);

expect(filtered).toEqual([
'||*.org^$denyallow=example.com',
'||*.example.org^',
'||*.asia^$denyallow=fap.bar',
'||xyz^$denyallow=example.com',
]);
});

it('adblock-style rules with wildcard and badfilter modifier', () => {
const rules = `||*.org^$badfilter
||*.asia^
||*.example.org^
||*.asia^$badfilter
||xyz^$badfilter
||xyz^`.split(/\r?\n/);
const filtered = validate(rules);

expect(filtered).toEqual([
'||*.org^$badfilter',
'||*.example.org^',
'||*.asia^$badfilter',
'||xyz^$badfilter',
]);
});

it('adblock-style rules with wildcard and client modifier', () => {
const rules = `@@||*.org^$client=127.0.0.1
||*.asia^
||*.example.org^
||*.asia^$client=192.168.0.0/24
||xyz^$client=192.168.0.0/24
||xyz^`.split(/\r?\n/);
const filtered = validate(rules);

expect(filtered).toEqual([
'@@||*.org^$client=127.0.0.1',
'||*.example.org^',
'||*.asia^$client=192.168.0.0/24',
'||xyz^$client=192.168.0.0/24',
]);
});

it('check for composite TLDs', () => {
const rules = `||*.com.tr^$denyallow=example.com
||*.com.tr^
||*.co.uk^$client=127.0.0.1
||*.co.uk^
||*.example.org^`.split(/\r?\n/);
const filtered = validate(rules);

expect(filtered).toEqual([
'||*.com.tr^$denyallow=example.com',
'||*.co.uk^$client=127.0.0.1',
'||*.example.org^',
]);
});
});

0 comments on commit d6c545e

Please sign in to comment.