A fast, low-dependency public suffix list parser for Rust, written with nom

dwerner/nom-psl

Faster public suffix domain parsing.

The scope of this library is limited to finding the TLD+1 of a given domain (its public suffix plus one more label) using the public suffix list.

Approach:

  • Load public suffix list entries into memory
  • Match immutable, owned values of domains to be parsed
  • Use an LRU cache for entries, with a size chosen by the user
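
As an illustration of the first step, here is a minimal sketch of loading list entries into memory. It is not this crate's implementation: the crate parses with nom, while this sketch uses plain std string handling, and `load_rules` is a hypothetical name. It assumes the published list format, where each rule sits on its own line, `//` starts a comment line, and a rule ends at the first whitespace. Wildcard (`*.`) and exception (`!`) rules are stored verbatim and not interpreted here, and no cache is shown.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of this crate): collect the rules of a
/// public suffix list into a set of slices borrowed from `source`.
fn load_rules(source: &str) -> HashSet<&str> {
    source
        .lines()
        .map(str::trim)
        // Skip blank lines and `//` comment lines.
        .filter(|line| !line.is_empty() && !line.starts_with("//"))
        // A rule ends at the first whitespace character.
        .filter_map(|line| line.split_whitespace().next())
        .collect()
}
```

Because the set borrows from `source`, the text of the list has to stay alive for as long as the rules are in use.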

Goals:

  • provide (mostly) compliant public suffix domain parsing.
  • avoid allocations during domain parsing.
  • offload as much work as possible to the parsing stage.
  • avoid dependencies that might themselves bring unwanted baggage
  • inputs are not mutated, outputs are slices of inputs
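
The last goal, outputs as slices of inputs, can be sketched as follows. This is an illustration only, not the crate's code: `tld_plus_one` and the plain `HashSet` of suffixes are hypothetical, and wildcard and exception rules are ignored. The point is that the result is a `&str` borrowed from the input, so no `String` is allocated while matching.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of this crate): return the public suffix
/// plus one label as a slice of `domain`, or `None` if there is no match.
fn tld_plus_one<'a>(suffixes: &HashSet<&str>, domain: &'a str) -> Option<&'a str> {
    // Byte offset where the previously examined candidate started.
    let mut prev_start: Option<usize> = None;
    let mut start = 0;
    loop {
        // Candidates shrink one label at a time from the left, so the
        // first suffix that matches is also the longest one.
        let candidate = &domain[start..];
        if suffixes.contains(candidate) {
            // The label just before the suffix completes the result.
            // A bare suffix has no preceding label and yields `None`.
            return prev_start.map(|p| &domain[p..]);
        }
        match candidate.find('.') {
            Some(dot) => {
                prev_start = Some(start);
                start += dot + 1;
            }
            // No labels left and nothing matched.
            None => return None,
        }
    }
}
```

With a suffix set containing `uk` and `co.uk`, the input `abc.one.two.example.co.uk` yields `Some("example.co.uk")`, and the returned slice points into the original string.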

Caveats:

  • still relies on the idna crate for punycode parsing
  • input is not lower-cased (this step is skipped for performance)

Environment Variables

PUBLIC_SUFFIX_LIST_FILE=somefile - overrides which file is loaded in place of public_suffix_list.dat
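
A minimal sketch of how such an override could be resolved with the standard library. How the crate itself reads the variable is not shown here, and `psl_path` is a hypothetical helper:

```rust
use std::env;

/// Hypothetical helper (not part of this crate): prefer the path given in
/// PUBLIC_SUFFIX_LIST_FILE, falling back to `default` when it is unset
/// or not valid Unicode.
fn psl_path(default: &str) -> String {
    env::var("PUBLIC_SUFFIX_LIST_FILE").unwrap_or_else(|_| default.to_string())
}
```

Calling `psl_path("public_suffix_list.dat")` returns the default file name unless the variable is set in the environment.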

Example:

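use lazy_static::lazy_static;
// `List` is this crate's parsed public suffix list type; import it from the crate.

lazy_static! {
    // Parse the list once, on first access, and share it for the rest of the program.
    static ref LIST: List = {
        let list = List::parse_source_file("public_suffix_list.dat", 10_000_000);
        list.expect("unable to parse PSL file")
    };
}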

...

fn foo() {
    let domain = "abc.one.two.example.co.uk";
    // The result is a slice of `domain`; the input is not mutated.
    let tldp1 = LIST.parse_domain(domain);

    assert_eq!(tldp1, Some("example.co.uk"));
}

TODO:

  • benchmarks
