list quality #702

Closed
jawz101 opened this issue Mar 4, 2019 · 3 comments
Labels
question user question

Comments

@jawz101
Contributor

jawz101 commented Mar 4, 2019

At some point a list should be considered misguided.

If a list blindly enumerates all subdomains of a company and considers that "coverage", it isn't a deliberate list.
Blocking a company's corporate pages (its forums, blogs, documentation, privacy policies, and careers sites) doesn't make a list an ad blocklist. Sometimes you just need to research a company by visiting its webpages to see what sort of business it does.

I would just consider this when putting lists on the site, along with whether they have been updated within the past X months, whether they remove dead hosts, and whether they block any of the top X most popular sites. Unless a list focuses on a topic such as adult or gambling sites, or targets a particular service (a Microsoft blocklist), it might be worth having a vetting team, or having people resubmit some lists only if they are the official maintainer, or something.

At some point, the 1,400 lists are 95% a pile of turds.

I spent the weekend working with about 150 hosts files. Some were dead, some were full of dead hosts, some were duplicates of others' lists, and some just caused confusing breakage given the lists' stated intentions.
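
For anyone who wants to repeat that kind of check, here is a rough Python sketch of how duplicates and dead hosts could be measured. It is not the tooling used above. The function names (`parse_hosts`, `overlap`, `dead_fraction`) are made up for illustration, and it assumes plain hosts-file syntax (`0.0.0.0 ads.example.com`, `#` comments), not Adblock-style rules.

```python
import socket


def parse_hosts(text):
    """Return the set of blocked hostnames found in hosts-file text."""
    ignored = {"localhost", "localhost.localdomain", "broadcasthost"}
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line:
            continue
        parts = line.split()
        # Typical entry is "<ip> <hostname>"; a bare hostname on its own is also accepted.
        names = parts[1:] if len(parts) > 1 else parts
        for name in names:
            name = name.lower()
            if name not in ignored:
                hosts.add(name)
    return hosts


def overlap(a, b):
    """Jaccard similarity of two host sets; values near 1.0 suggest a duplicate list."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _resolves(host):
    """Default liveness check (an assumption): the name currently resolves in DNS."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def dead_fraction(hosts, resolves=_resolves):
    """Fraction of hosts for which the `resolves` callable returns False."""
    if not hosts:
        return 0.0
    dead = sum(1 for host in hosts if not resolves(host))
    return dead / len(hosts)
```

Comparing every pair of lists with `overlap` flags likely copies, and `dead_fraction` gives a per-list staleness number. A DNS lookup is only a crude proxy for "dead", so the liveness check is passed in as a parameter and can be swapped for something stricter.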

@jawz101 jawz101 closed this as completed Mar 4, 2019
@collinbarrett
Owner

Hey, @jawz101 , I completely agree.

The goal of FilterLists is to index all known lists, so we want to keep doing that. However, we can certainly keep improving how we surface the higher-quality ones.

One tip to view the most frequently updated lists is to show the "Updated" column via the checkbox below the grid and sort on that column.

We do have #371, which I'd really like to incorporate in some form eventually and which would help with this.

@collinbarrett collinbarrett added the question user question label Mar 4, 2019
@jawz101
Contributor Author

jawz101 commented Mar 4, 2019

How is the updated date derived? Is it from the file timestamp, or also from the datetimes some lists put in the header of the actual file?

@collinbarrett
Owner

FilterLists crawls all of the lists. It gets through all of them every couple of weeks or so (rough estimate). It compares hashes of the full file and bumps the updated date if the hash has changed since the last crawl. It's not a perfect solution (see #537), but it's somewhat helpful.
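
To make the idea concrete, here is a minimal Python sketch of hash-based change detection. It is not the actual FilterLists crawler code. The `record` dictionary, the `record_crawl` name, and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
from datetime import datetime, timezone


def content_hash(data):
    """Hash the full file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def record_crawl(record, data, now=None):
    """Bump record["updated"] only when the file's hash differs from the stored one.

    `record` is a hypothetical per-list dict holding "hash" and "updated".
    Returns True if the list was treated as updated on this crawl.
    """
    now = now or datetime.now(timezone.utc)
    new_hash = content_hash(data)
    if record.get("hash") == new_hash:
        return False  # unchanged since last crawl; keep the old date
    record["hash"] = new_hash
    record["updated"] = now  # the crawl time stands in for the real edit time
    return True
```

Two things follow from this approach: the date is only as precise as the crawl interval, and any byte-level change to the file, including a comment or header line, counts as an update.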
