expose "Dead Weight" of a list #303

jawz101 · 2018-06-28T16:11:24Z

I think displaying the number of dead hosts on the webpage may encourage list maintainers to keep their lists actively maintained. It would also show users when a list is poorly maintained.

https://pyfunceble.readthedocs.io/en/latest/

I came across this utility when looking for ways to clean up a tiny host file I maintain.

collinbarrett · 2018-06-28T17:20:13Z

Nice. I love the idea.

jawz101 · 2018-06-29T13:53:47Z

I played around with it last night on a list I maintain on GitHub. Really easy to use.

I saw it when reviewing a pull request on AdAway's hosts file (I hate that list, btw. The tool proves my point). 41% of its 410 hosts don't even exist anymore, and the list hasn't been updated in 2 years. I once tried to submit some actual mobile ad domains, and they reverted it because I'd accidentally blocked a URL shortener. Grr.

collinbarrett · 2018-06-29T15:55:40Z

PyFunceble looks cool, but I would prefer a .NET approach or a 3rd-party API over the extra overhead of rolling in a Python tool.

funilrys · 2018-09-27T10:50:46Z

Hi there,
May I ask what the size of the list is?

I could add it to https://github.com/dead-hosts. You would then only have to pull the clean.list (generated once the whole list has been tested) and run your comparison, business logic, or whatever else you want on the generated data.
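If an instance produces a clean.list, the "dead weight" can be derived with a simple set difference against the original list. The sketch below is a minimal illustration and is not part of dead-hosts or PyFunceble. It assumes both files hold one entry per line, optionally in hosts-file form such as `0.0.0.0 example.com`, with `#` comments. The function names are hypothetical.

```python
def parse_domains(text: str) -> set:
    """Return the set of entries in a one-entry-per-line text blob.

    Assumption: at most one domain per line; hosts-file lines such as
    "0.0.0.0 example.com" are reduced to their last field.
    """
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()  # drop comments
        if line:
            domains.add(line.split()[-1])
    return domains


def dead_weight(original_text: str, clean_text: str) -> tuple:
    """Return (dead entries, percentage of the original list that is dead).

    clean.list is assumed to contain only entries reported as ACTIVE, so
    anything in the original list but missing from clean.list counts as dead.
    """
    original = parse_domains(original_text)
    active = parse_domains(clean_text)
    dead = original - active
    percent = 100.0 * len(dead) / len(original) if original else 0.0
    return dead, percent
```

For example, an original list with three entries, one of which is absent from clean.list, has a dead weight of about 33%.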

collinbarrett · 2018-09-27T11:02:49Z

Hey, @funilrys, @dead-hosts looks like a neat project. I hadn't heard about it.

The idea of this issue for FilterLists.com would be to monitor all of the ~800 (as of now) lists that we index for dead domains, IPs, and maybe even (though this is much harder) static syntax rules that are no longer valid. We could then expose this in various ways, e.g. a percentage of dead vs. active entries, a listing of dead rules, etc.
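To illustrate what "dead vs. active" could mean, here is a much-simplified stand-in for a checker that counts an entry as active if it resolves in DNS. PyFunceble itself does considerably more (it also consults WHOIS records, for instance), so treat this as a sketch under that simplifying assumption. `resolves` and `summarize` are hypothetical names. The resolver is injectable so the summary logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable


def resolves(entry: str) -> bool:
    """Return True if the domain or IP currently resolves to an address."""
    try:
        socket.getaddrinfo(entry, None)
        return True
    except (socket.gaierror, UnicodeError):
        # NXDOMAIN, no address, or a label that is not even valid IDNA
        return False


def summarize(entries: Iterable[str], is_alive: Callable[[str], bool] = resolves) -> dict:
    """Split entries into active/dead and compute the dead percentage."""
    active, dead = [], []
    for entry in entries:
        (active if is_alive(entry) else dead).append(entry)
    total = len(active) + len(dead)
    return {
        "active": active,
        "dead": dead,
        "dead_percent": 100.0 * len(dead) / total if total else 0.0,
    }
```

For example, `summarize(["example.com", "no-such-host.invalid"])` should report the `.invalid` name as dead, because that TLD is reserved and is not expected to resolve.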

I need to learn a bit more about how @dead-hosts works. Maybe we could partner up to not re-invent the wheel.

Is @dead-hosts scalable to the point where it could potentially monitor a majority or all of the lists that FilterLists indexes? (see here)

funilrys · 2018-09-27T17:30:52Z

Hi @collinbarrett,

In reality, it depends on the frequency of the tests.
I'm part of @Ultimate-Hosts-Blacklist (main repository: https://github.com/mitchellkrogza/Ultimate.Hosts.Blacklist), and with the large amount of data we test there, I can only say that it comes down to how often the tests run.
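A back-of-the-envelope estimate makes "it depends on the frequency of the tests" concrete. Divide the total testing time per pass by the retest interval to get the number of jobs that must run in parallel. The sketch below assumes a fixed average cost per entry, which is a simplification. The inputs are parameters to fill in, not measured figures, and the function names are hypothetical.

```python
import math


def hours_per_full_pass(total_entries: int, seconds_per_entry: float, concurrent_jobs: int) -> float:
    """Rough wall-clock hours to test every entry once, ignoring scheduling overhead."""
    return total_entries * seconds_per_entry / concurrent_jobs / 3600


def jobs_needed(total_entries: int, seconds_per_entry: float, retest_days: float) -> int:
    """Parallel jobs required so a full pass fits inside the retest interval."""
    busy_seconds = total_entries * seconds_per_entry
    return math.ceil(busy_seconds / (retest_days * 86400))
```

Doubling the retest interval halves the number of parallel jobs needed, which is why the test frequency is the deciding factor.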

Let me go through some of the details to help you understand.

About Dead-Hosts

The idea behind Dead-Hosts is to offer list and hosts-list maintainers a place where they can get PyFunceble results without having to think about the logistics behind them.

We recently started generating a clean.list file, which contains the ACTIVE domains caught by PyFunceble. I have since discovered that some maintainers use it to clean up their lists.

How I work

Taking contact with maintainers

One way is to contact a maintainer personally and ask whether they are interested in PyFunceble.
If they are, I offer to create an instance in @dead-hosts. If they agree, I create the repository structure and the whole process runs automatically.

Maintainers contact me

As an example, https://github.com/dead-hosts/WindowsSpyBlocker-spy_git_crazy-max was a spontaneous request from the list maintainer. He asked me to create an instance so he could work with the results of PyFunceble. So I did 😸.

Issues or other talks about PyFunceble

I'm pretty sure you have already heard of https://github.com/StevenBlack/hosts.
All the lists that make up its unified hosts file are part of dead-hosts. In fact, it all started with a discussion we had about dead domains. Steven does not agree with cleaning the unified hosts file, since that is the role of each curator, but he agrees that seeing some numbers may make some curators rethink their lists. So we did it 😸 Now every new list that becomes part of the unified hosts file gets its own instance under @dead-hosts.

Automation

How do I create a new instance?

For now, new repositories are created manually, but I build the repository structure with a Python script I wrote.

How are external users/maintainers/teams handled?

One of the advantages of the GitHub organization system is that it lets us create teams and manage permissions. I use those features to give the original maintainer (and their team members) write access to the repository.

The process is as follows:

- I create the repository and the repository structure.
- I assign the repository maintainer as the repository Admin.
- I create a team with the name of the maintainer.
- I invite the repository maintainer to the team.
- Once he joins, I make him a maintainer of the team so he can add his team members.
- Finally (even before the maintainer joins the team), I give the team write permission on the created repository.
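These steps map roughly onto GitHub REST API calls, which could be used to automate them. The sketch below only builds the request list (method, path, JSON body) and sends nothing. The helper names are hypothetical, and `team_slug` only approximates how GitHub derives a team's slug from its name. One difference from the manual flow: through the API, a team membership can be created with the `maintainer` role directly, so the user does not need to be promoted after joining.

```python
import re
from typing import Dict, List, Tuple


def team_slug(name: str) -> str:
    """Approximation (assumption) of GitHub's slug rule: lowercase, runs of
    non-alphanumerics collapsed to '-', leading/trailing '-' stripped."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def onboarding_requests(org: str, repo: str, maintainer: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Build (method, path, body) tuples for onboarding one list maintainer.

    Assumes the instance repository already exists under `org`. Nothing is
    sent here; a real script would issue these with an authenticated client.
    """
    slug = team_slug(maintainer)
    return [
        # Make the list maintainer an admin of the instance repository.
        ("PUT", f"/repos/{org}/{repo}/collaborators/{maintainer}", {"permission": "admin"}),
        # Create a team named after the maintainer.
        ("POST", f"/orgs/{org}/teams", {"name": maintainer}),
        # Invite the maintainer with the team "maintainer" role so they can add members.
        ("PUT", f"/orgs/{org}/teams/{slug}/memberships/{maintainer}", {"role": "maintainer"}),
        # Grant the team write ("push") access to the repository.
        ("PUT", f"/orgs/{org}/teams/{slug}/repos/{org}/{repo}", {"permission": "push"}),
    ]
```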

How do we run the tests?

The tests run inside Travis CI containers.
Every repository is set up automatically with the help of our internal script.

We simply call a script named update.py, which handles the whole process before PyFunceble even starts.

update.py works as follows:

- We check whether a test is currently in progress.
- If so, we launch PyFunceble and it continues its work normally.
- If no test is in progress, we check how many days have passed since the last test. Since some lists are not updated every day, each repository's info.json sets a number of days between retests.
- Once that number of days is reached, we clean the whole repository (the output directory) and test the list from the beginning.
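The decision update.py makes can be sketched as a small pure function. The info.json keys used here (`currently_under_test`, `last_test`, `days_until_next_test`) are hypothetical placeholders, not the actual schema used by dead-hosts.

```python
def next_action(info: dict, now: float) -> str:
    """Return "continue", "restart" or "wait" for one repository.

    Hypothetical info.json keys:
      currently_under_test: bool  - a previous run stopped mid-list
      last_test: float            - Unix time the last full test started
      days_until_next_test: int   - retest interval for this list
    """
    if info.get("currently_under_test"):
        return "continue"  # resume the interrupted PyFunceble run
    elapsed_days = (now - info.get("last_test", 0)) / 86400
    if elapsed_days >= info.get("days_until_next_test", 1):
        return "restart"  # wipe the output directory and test from the start
    return "wait"  # too early to retest this list
```

A caller would load the repository's info.json with `json.load` and pass `time.time()` as `now`.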

Pushing is handled by PyFunceble itself, thanks to its Travis CI "mode", which allows us to bypass the limitations of Travis CI.

Note: Once a month, I check the whole process to see whether anything is going wrong.

What are the limitations of Travis CI?

Travis CI is great for what we are doing. It's free for public repositories, and it has many IPs, which lets us run our tests without being constantly blocked by WHOIS servers, for example.

The only limitation I can find with Travis CI is that it only allows 5 instances to run at the same time. But that limitation is not hard to live with.

Note: To avoid being blocked by WHOIS servers (when we are allowed to use WHOIS records for our tests), we stop the test after 10 minutes and continue it in a new container.
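The stop-and-continue pattern is a time-boxed loop that records where it stopped. This minimal sketch assumes a `check` callable that returns a status string per domain, with a 600-second budget matching the 10 minutes above. It leaves out persisting the resume index between containers. The function name is hypothetical, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def run_time_boxed(
    domains: Sequence[str],
    start_index: int,
    check: Callable[[str], str],
    budget_seconds: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, Dict[str, str]]:
    """Test domains from start_index until the time budget runs out.

    Returns (index to resume from in the next container, results of this
    session). A resume index equal to len(domains) means the list is done.
    """
    deadline = clock() + budget_seconds
    results: Dict[str, str] = {}
    index = start_index
    while index < len(domains) and clock() < deadline:
        results[domains[index]] = check(domains[index])
        index += 1
    return index, results
```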

About Ultimate-Hosts-Blacklist and Ultimate.Hosts.Blacklist

You have to understand that @Ultimate-Hosts-Blacklist is the backend of Ultimate.Hosts.Blacklist.

@Ultimate-Hosts-Blacklist works and runs almost exactly like @dead-hosts. The only difference between the two is how PyFunceble is configured.

What we do with Ultimate.Hosts.Blacklist is:

We run our update script every day at 14:09:13Z.
The update script:
- Pulls every clean.list
- Pulls the domains.list wherever the clean.list does not exist
- Generates all hosts files, lists, deny files ...
- Pushes everything to the repository
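The merge step can be sketched as follows. It assumes each instance repository is represented as an in-memory mapping from file name to contents rather than actually pulled from GitHub, and it assumes a `0.0.0.0` sink address for the generated hosts file. Only the hosts-file output is shown; the other formats would be rendered from the same merged set. The names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def pick_list(repo_files: Dict[str, str]) -> Optional[str]:
    """Prefer clean.list; fall back to domains.list when clean.list is missing."""
    if "clean.list" in repo_files:
        return repo_files["clean.list"]
    return repo_files.get("domains.list")


def build_hosts(repos: Iterable[Dict[str, str]], sink_ip: str = "0.0.0.0") -> List[str]:
    """Merge the chosen lists into sorted, de-duplicated hosts-file lines."""
    domains = set()
    for files in repos:
        text = pick_list(files)
        if text is None:
            continue  # nothing usable in this repository yet
        for line in text.splitlines():
            line = line.strip().lower()
            if line and not line.startswith("#"):
                domains.add(line)
    return [f"{sink_ip} {domain}" for domain in sorted(domains)]
```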

About your question:

Is @dead-hosts scalable to the point where it could potentially monitor a majority or all of the lists that FilterLists indexes?

For 800 lists!? No, we had better create and set up a new organization just for your use 😸

I hope this helps you understand it better. If you need anything else or have any other questions, please let me know.

Have a nice day/night.

Cheers,
Nissar

collinbarrett · 2020-09-21T01:14:07Z

closing into #371

collinbarrett added the enhancement label Jun 28, 2018

collinbarrett changed the title ~~Feature Request: Consider integrating PyFunceble to show % of dead entries~~ add support for flagging a list of and/or displaying a count of unreachaple domains/IPs in lists Jun 28, 2018

collinbarrett changed the title ~~add support for flagging a list of and/or displaying a count of unreachaple domains/IPs in lists~~ support flagging a list of and/or displaying a count of unreachaple domains/IPs in lists Jun 30, 2018

collinbarrett mentioned this issue Jul 6, 2018

Ublock Origin are consuming so much resource #314

Closed

collinbarrett changed the title ~~support flagging a list of and/or displaying a count of unreachaple domains/IPs in lists~~ support flagging a list of and/or displaying a count of unreachable domains/IPs in lists Aug 13, 2018

collinbarrett added the web front-end user interface label Aug 13, 2018

collinbarrett mentioned this issue Aug 13, 2018

dead weight #337

Closed

collinbarrett changed the title ~~support flagging a list of and/or displaying a count of unreachable domains/IPs in lists~~ expose "Dead Weight" of a list Aug 21, 2018

collinbarrett added the url-validation service that validates URLs label Aug 24, 2018

collinbarrett removed the web front-end user interface label Sep 29, 2018

collinbarrett added the web front-end user interface label Sep 3, 2019

collinbarrett removed the enhancement label Feb 17, 2020

collinbarrett added analytics service that provides various statistics about FilterLists and removed url-validation service that validates URLs web front-end user interface labels Sep 13, 2020

collinbarrett added the url-validation service that validates URLs label Sep 21, 2020

collinbarrett closed this as completed Sep 21, 2020
