This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

RawGit is shutting down #220

Closed
jspenguin2017 opened this issue Oct 16, 2018 · 49 comments
Labels
archived This thread was archived, open new issues for similar problems. fixed

Comments

@jspenguin2017
Member

jspenguin2017 commented Oct 16, 2018

RawGit will stop working in about a year. Got to update those URLs...

cc @gorhill

@jspenguin2017
Member Author

cc @collinbarrett of filterlists.com
cc @farrokhi of IRN: Adblock-Iran

@farrokhi

Are you aware of (or have any recommendation on) any replacement service? Unfortunately access to raw.githubusercontent.com is not possible from certain countries, this was why we used RawGit.

@gwarser

gwarser commented Oct 16, 2018

https://rawgit.com/ proposes some replacements.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 16, 2018

Most of the recommended replacements are for delivering JavaScript libraries, not really for filter lists.

I'm looking at GitCDN, but we might want to ask the owner first.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 16, 2018

Note that jsDelivr and UNPKG will work for delivering filter lists; we just might want to ask them first, because they are obviously designed to deliver JavaScript libraries, not filter lists.

@DandelionSprout

DandelionSprout commented Oct 17, 2018

@farrokhi I don't have any first-hand knowledge of the Iranian internet climate, but here's a quick list of other services that I've seen used to host adblock lists. Maybe at least one of them could be accessible there?

  • GitLab
  • Dropbox (dl.dropboxusercontent.com)
  • Repo.or.cz
  • OSDN (git.sourceforge.jp)
  • GitHub Gist (gist.githubusercontent.com)
  • GitLab IO (e.g. zerodot1.gitlab.io, for one user-specific such site)
  • s3.amazonaws.com (which unfortunately isn't free of charge)
  • 000webhostapp.com (used by someone I know to host a list that they wanted to remain secret)

Purely personally, I've had good recent experiences with Repo.or.cz, as it can be set up to auto-sync with a GitHub repo for free.

Edit: Adding a few more options that I saw when looking more closely through https://github.com/collinbarrett/FilterLists/blob/master/data/FilterList.json:

  • Bitbucket (bitbucket.org)
  • Ideone (ideone.com; seems to work for a list called "Adblock-Persian list")
  • NotABug.org

@jspenguin2017
Member Author

jspenguin2017 commented Oct 17, 2018

@DandelionSprout

  • Dropbox - Let's not
  • AWS - You'll have better luck with your wallet with OVH or Scaleway, they offer unmetered traffic
  • 000webhost - It's too slow, I tried to host a website with it, and oh boy it's trash; it might be OK for a small filter that only a few people use, but 20+ mil users of uBO will be too much for it

@jspenguin2017
Member Author

Can someone ask jsDelivr, UNPKG, and/or GitCDN if we can host filter lists on them?

@dimqua

dimqua commented Oct 17, 2018

What about GitLab? I mean, why use a CDN in the first place?

@jspenguin2017
Member Author

jspenguin2017 commented Oct 17, 2018

@gorhill was worried that GitHub/GitLab may ask us to host filters elsewhere, so we want to have a plan B in case things go wrong.

The raw file access feature of GitHub/GitLab is not really designed to be a CDN, so we want to use a real CDN to minimize the chance that we'll have to move on short notice.

@collinbarrett

Even setting aside the use of GitHub as a CDN, with the recent CHEF-KOCH debacle, list maintainers hosting on GitHub should unfortunately be a bit concerned...

@farrokhi

GitCDN seems to be a hassle-free replacement of RawGit for now. I just opened a PR on the main uBlock repo: gorhill#3738

@jspenguin2017
Member Author

@collinbarrett Eh... Got to put this on the priority list.

@mikhoul

mikhoul commented Oct 17, 2018

jsDelivr accepts any kind of file (including filters); they have even made a landing page to welcome RawGit users 😸

https://www.jsdelivr.com/rawgit

But GitCDN also seems to be good 😄

Regards :octocat:
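
For anyone updating links by hand, here is a minimal sketch of rewriting a RawGit URL into jsDelivr's GitHub endpoint form (`https://cdn.jsdelivr.net/gh/user/repo@ref/path`). The function name and the example repository are made up for illustration, and it assumes the branch or tag name contains no slash.

```python
from urllib.parse import urlsplit

# Both the development and the CDN host of RawGit use the same path layout.
RAWGIT_HOSTS = {"rawgit.com", "cdn.rawgit.com"}


def rawgit_to_jsdelivr(url: str) -> str:
    """Rewrite a RawGit file URL to jsDelivr; return any other URL unchanged."""
    parts = urlsplit(url)
    if parts.hostname not in RAWGIT_HOSTS:
        return url
    # Expected path layout: /<user>/<repo>/<ref>/<path/to/file>
    segments = parts.path.lstrip("/").split("/", 3)
    if len(segments) < 4 or not all(segments):
        raise ValueError(f"not a RawGit file URL: {url}")
    user, repo, ref, path = segments
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo}@{ref}/{path}"


if __name__ == "__main__":
    # Hypothetical repository, used only to show the shape of the result.
    print(rawgit_to_jsdelivr("https://rawgit.com/example-user/example-lists/master/list.txt"))
```

URLs on other hosts pass through untouched, so the function can be run over a whole list of subscription links.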

@DandelionSprout

DandelionSprout commented Oct 17, 2018

Both of the topics discussed in this thread have made me think about the importance of maintaining backups in general.

My lists and I are decently well covered already, as I maintain a mirror of my GitHub repo on Repo.or.cz. I must admit that I also don't feel that I'm in any immediate danger of being targeted by GitHub's bots, as I maintain neither NSFW lists nor custom bots. But still.

Let's take https://github.com/NanoAdblocker/NanoCore2/blob/master/src/assets.json (alternatively https://github.com/gorhill/uBlock/blob/master/assets/assets.json). Many of the lists there have no registered backups, so if they run into problems with e.g. server moderators, domain admins, or national content censors, those lists would be in serious trouble.

I therefore propose the hosting of mirrors for most (if not all) of the lists in question. GitLab allows mirroring of GitHub repos... but only on paid plans. Repo.or.cz and GitCDN allow mirroring for free, and as far as I can tell they even allow mirroring without any input from a GitHub repo's original owner. As for the other 12-ish services that have been suggested in this thread, they'd have to be looked into to determine how easy it is to mirror repos on them.
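
To find the lists with no registered backups, something like the sketch below could work. It assumes each entry in assets.json carries a `contentURL` field that is either a single string or a list of strings, and it counts only http(s) URLs as mirrors; the sample entries are invented, so check the real file's shape before relying on this.

```python
import json


def remote_urls(entry: dict) -> list[str]:
    """Return the http(s) URLs of one entry (assumed field name: contentURL)."""
    urls = entry.get("contentURL", [])
    if isinstance(urls, str):
        urls = [urls]
    # Relative paths (bundled copies) are not counted as remote mirrors.
    return [u for u in urls if u.startswith(("http://", "https://"))]


def lists_without_mirrors(assets: dict) -> list[str]:
    """Keys of entries that have fewer than two remote URLs."""
    return sorted(key for key, entry in assets.items() if len(remote_urls(entry)) < 2)


# Invented sample data in the assumed shape.
sample = json.loads("""
{
  "example-list-a": {"contentURL": "https://example.com/a.txt"},
  "example-list-b": {"contentURL": ["https://example.com/b.txt",
                                    "https://mirror.example.org/b.txt"]},
  "example-list-c": {"contentURL": ["https://example.com/c.txt",
                                    "assets/thirdparties/c.txt"]}
}
""")

if __name__ == "__main__":
    print(lists_without_mirrors(sample))  # ['example-list-a', 'example-list-c']
```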

@jspenguin2017
Member Author

GitCDN doesn't say what kind of file you can serve with it, so I'll assume it means any file on GitHub. I'll move my links to it.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 17, 2018

@DandelionSprout I have backups and mirrors of all default-enabled assets, along with the full commit history of uAssets.

I think the biggest technical problem is how to create an up-to-date copy of issues. It's not that hard to dump existing issues, but it's a bit difficult to keep the dump up to date.
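
One possible way to keep such a dump current without a full re-download is to poll the GitHub REST issues endpoint with its `since` parameter and merge the results by issue number. The sketch below only builds the request URL and does the merge; the helper names are hypothetical, authentication, pagination loops, and rate limits are left out, and issue comments would need a separate request.

```python
from urllib.parse import urlencode


def issues_url(owner: str, repo: str, since: str, page: int = 1) -> str:
    """URL listing issues (open and closed) updated at or after `since` (ISO 8601)."""
    query = urlencode({"state": "all", "since": since, "per_page": 100, "page": page})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def merge_dump(dump: dict, changed: list) -> dict:
    """Return a new dump where each changed issue replaces the stored copy."""
    merged = dict(dump)
    for issue in changed:
        merged[issue["number"]] = issue
    return merged


def next_since(dump: dict, default: str = "1970-01-01T00:00:00Z") -> str:
    """Newest `updated_at` in the dump; UTC ISO 8601 strings sort chronologically."""
    return max((issue["updated_at"] for issue in dump.values()), default=default)


if __name__ == "__main__":
    dump = {1: {"number": 1, "title": "old title", "updated_at": "2018-10-16T12:00:00Z"}}
    changed = [
        {"number": 1, "title": "new title", "updated_at": "2018-10-17T08:00:00Z"},
        {"number": 2, "title": "another issue", "updated_at": "2018-10-17T09:30:00Z"},
    ]
    dump = merge_dump(dump, changed)
    print(issues_url("example-owner", "example-repo", next_since(dump)))
```

Polling on a schedule trades freshness for simplicity; a webhook would push changes instead, at the cost of running a publicly reachable server.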

@DandelionSprout

DandelionSprout commented Oct 17, 2018

I also propose the unilateral keeping of backups of others' filter lists, be it with or without outright permission from a list's owner, especially for those lists that have reasonably open-sourced licences. Randomly picked examples of Git-based lists that currently have no backups listed in the Assets lists include hufilter, EasyList Hebrew, Frellwit's Swedish Filter, and EasyList Italy, among dozens of others.

Conversely (or even additionally), some popular lists that are hosted on limited-userbase sites could have already-existing Git mirrors added for them. MVPS Hosts comes to mind, for which a perfectly good (as far as I can tell) mirror exists at https://raw.githubusercontent.com/StevenBlack/hosts/master/data/mvps.org/hosts.

A side effect of doing this is that with ≥1 mirror for each list, if not ≥2, most cases of end users having problems updating their lists would disappear almost overnight. If I interpret farrokhi correctly, a lot of lists that are currently hosted only on raw.githubusercontent.com cannot be accessed in e.g. Iran; adding various mirrors for those lists would let them be accessed and updated by users in pretty much every country on Earth.
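
As a rough illustration of why extra mirrors help, here is a sketch of trying an ordered list of URLs and taking the first one that answers. This is not how uBO or Nano actually update lists; the function names are hypothetical, and the fetcher is injectable so the logic can be tried without network access.

```python
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL; network and HTTP failures surface as OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_first_available(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> tuple[str, bytes]:
    """Try each mirror in order and return (url, content) for the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))
```

The order of the list matters: the preferred source goes first, and slower or less frequently synced mirrors go after it.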

@DandelionSprout

If it's needed, and if the idea doesn't sound repulsive to you guys (e.g. jspenguin2017 and @gorhill), then I myself could easily set up mirrors for various lists on Repo.or.cz and/or GitCDN fairly soon (Possibly even tonight), which could then be added to the assets.json files.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 17, 2018

I have the code ready to mirror any filter list, but the current repository structure design is kind of bad... https://github.com/NanoMeow/MDLMirror

Ideally we want to pull all filters into a central repository that is not on GitHub. We don't want one point of failure. When GitHub removes a user, I don't think those CDNs will keep serving the content for long.

@mikhoul

mikhoul commented Oct 17, 2018

BTW I've modified an old userscript that added a button to GitHub with a direct link to RawGit. It now works again, but uses GitCDN: https://greasyfork.org/en/scripts/373361-github-gitcdn-button
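
The core of such a button is a URL rewrite from a GitHub file page to a raw CDN link. The sketch below is not taken from the userscript; it assumes GitCDN serves files under `https://gitcdn.xyz/repo/<user>/<repo>/<branch>/<path>` (worth verifying against GitCDN's own documentation), assumes the branch name contains no slash, and keeps the prefix as a parameter so another service can be swapped in.

```python
from urllib.parse import urlsplit

# Assumed GitCDN URL prefix; not confirmed anywhere in this thread.
GITCDN_PREFIX = "https://gitcdn.xyz/repo"


def github_blob_to_cdn(url: str, prefix: str = GITCDN_PREFIX) -> str:
    """Turn https://github.com/<user>/<repo>/blob/<branch>/<path> into a CDN URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/", 4)
    if parts.hostname != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {url}")
    user, repo, _blob, branch, path = segments
    return f"{prefix}/{user}/{repo}/{branch}/{path}"


if __name__ == "__main__":
    print(github_blob_to_cdn("https://github.com/gorhill/uBlock/blob/master/assets/assets.json"))
```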

@DandelionSprout

Testing that script out a bit, I'm impressed that it also accounts for !#include tags, as far as I can tell. This isn't a critical thing for me personally, but for people who e.g. can access github.com but not raw.githubusercontent.com (e.g. Iranians, and Westerners on some school networks), that button could mean the world to them.

@jspenguin2017
Member Author

gorhill#3739

@DandelionSprout

DandelionSprout commented Oct 19, 2018

> When GitHub removes a user, I don't think those CDNs will keep serving the content for long.

I have now done a bit of additional testing of Repo.or.cz, and am very glad to be able to say that their mirrors do not get deleted when the GitHub source repo is. I've been able to confirm this by making https://repo.or.cz/RepoOrCz-deletion-test-repo.git, whose source repo link was deleted soon after the mirror was created. Since Repo.or.cz tries to update its mirrors every 60~90min, this means that it doesn't try to delete mirrors that it detects as having dead source links. Moreover, there's an option for repo owners (i.e. those who know the editing password) to change the source repo link, in case a list's main repo is simply moved.

And to demonstrate the ease of making mirrors there, I've set up https://repo.or.cz/Adblock-list-backups-Frellwits-filter-lists.git, whose raw version of the SWE list at https://repo.or.cz/Adblock-list-backups-Frellwits-filter-lists.git/blob_plain/refs/heads/master:/Frellwits-Swedish-Filter.txt seems to work perfectly well for me.

Note 1: Repo.or.cz does not have a user system for mirror repos that I'm aware of. Instead, editing privileges are based on per-repo passwords. I've set up the SWE repo mirror above with a 45-character password that includes non-ASCII characters (A scenario that I think that very few password cracker tools have been set up for), but am not sure how I'd go about with sharing the passwords of this and of future mirrors with relevant people (e.g. you, gorhill, the lists' original owners, etc.) in a sufficiently secretive way.

Note 2: Repo.or.cz mirrors are not updated instantly, and may therefore be very slightly behind GitHub and GitCDN on the newest file updates. Additionally, their About page says that their server rack isn't a particularly powerful one. As such, I'd list repo.or.cz links below GitHub and GitCDN links in assets.json files.

@DandelionSprout

While doing research on a pull for the assets.json files (presumably that of uBO), I stumbled across https://notabug.org, which is used by the LVT list and which also allows hassle-free, free-of-cost mirroring. Suddenly I feel that we've got a lot of good options on the table now.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 19, 2018

We can also use jsDelivr and UNPKG. We do want to have as many mirrors as possible.

We do need to mirror those filters that are not on GitHub onto GitHub first. Anyone want to do it or should I do it?


Also, I've been looking into how to mirror GitHub issues, and it seems that we'll need a webhook to keep the mirror up to date...

@DandelionSprout

https://github.com/StevenBlack/hosts/tree/master/data has mirrors of MVPS, Dan Pollock, and Peter Lowe's List.

It's unfortunately way beyond my skills to figure out how to use GitHub as a mirror host for all the rest of the non-GitHub lists.

@DandelionSprout

DandelionSprout commented Oct 19, 2018

> We can also use jsDelivr and UNPKG. We do want to have as many mirrors as possible.

That's also great, although I haven't looked into those sites just yet.

Edit: I somehow struggle to find out how to create repos on those two sites, let alone mirrors.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 19, 2018

@DandelionSprout
It'll be like this one: https://github.com/NanoMeow/MDLMirror
Except that it'll be one repository for all filters.

I can set up a Lambda function or Lightsail server to automate the process.
The mirror will be updated once a day max though.

@DandelionSprout

In comparison, NotABug mirrors can be set to sync as often as every hour (The default is 8 hours), and Repo.or.cz syncs every ca. 75min. As such, users that have connection issues with the source link and who need the newest version of a list, could be mildly inconvenienced by having 24~48hr old versions of frequently updated lists.

@KonoromiHimaries

KonoromiHimaries commented Oct 21, 2018

https://raw.githack.com/ looks good, and it supports Bitbucket and GitLab.

@KonoromiHimaries

KonoromiHimaries commented Oct 21, 2018

> I also propose the unilateral keeping of backups of others' filter lists, be it with or without outright permission from a list's owner, especially for those lists that have reasonably open-sourced licences.

Check this: https://peerpad.net/#/

@DandelionSprout

Jspenguin seems to have written off Githack as an option for now, at least for GitHub lists.

Taking a quick look through Peerpad, I can't find any ways to view documents on it in raw mode, which is kind-of a make-or-break if they're meant to be subscribable by adblockers. It also doesn't seem to allow easy file mirroring, which rules out having me use it for my lists, at least.

@KonoromiHimaries

> Taking a quick look through Peerpad, I can't find any ways to view documents on it in raw mode, which is kind-of a make-or-break if they're meant to be subscribable by adblockers. It also doesn't seem to allow easy file mirroring, which rules out having me use it for my lists, at least.

You can report this suggestion here: https://github.com/ipfs-shipyard/peer-pad/issues

@DandelionSprout

DandelionSprout commented Oct 21, 2018

I have my eyes set on NotABug and (to an increasingly smaller degree) Repo.or.cz, but if we were to need even more places to back up lists in a subscribable format, then I'll make sure to report those things there. 👍

@neoascetic

@jspenguin2017 rawgithack now correctly caches master branch for 5 minutes, even for CDN URLs.

@jspenguin2017
Member Author

@neoascetic Oh, that's great! Thanks!

@jspenguin2017
Member Author

jspenguin2017 commented Oct 21, 2018

@DandelionSprout
I'd say we can put raw.githack.com on the list and take repo.or.cz off. I'm worried that a failover onto repo.or.cz can take their servers down.

We have way more flexibility when it comes to backing up Git history; we can even set up our own Git servers. I just need something that's 8 times lighter than GitLab... I don't know why people like Ruby on Rails; it's slow and heavy beyond reason. It might be easier to code, but it's not easier to host.

Things I want to try:

  • Gogs (golang)
  • Gitea (golang)
  • GitBucket (scala)

@jspenguin2017
Member Author

jspenguin2017 commented Oct 27, 2018

Alright, I got time to test all of them out, here's the summary:

GitLab CE: Written in Ruby, so that goes out of the window.
RhodeCode CE: Written in Python, so out of the window as well.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 27, 2018

GitBucket: Scala on Java VM, uses quite a bit of memory (300MB+ idle). Easy reverse proxy configuration. Two sets of labels for issues (normal labels and priority labels).

Simple and mobile friendly, has basically everything we need. GitBucket also has a plugin system that can be used to extend features.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 27, 2018

Gogs: Written in Go, Gogs is blazing fast while using little memory (50MB+ idle). Even with the SQLite embedded database, it is significantly faster than GitBucket.

It is, however, not mobile friendly, and a lot of configuration needs to be changed using the configuration file.

@jspenguin2017
Member Author

Gitea: A fork of Gogs, also in Go. Uses a little bit more resources, but has more features.

Pretty mobile friendly, although some places look a bit weird in mobile mode. Like Gogs, a lot of configuration needs to be changed using the configuration file.

@jspenguin2017
Member Author

Overall, I think Gitea is the best. The extra features it has are pretty useful.

@jspenguin2017
Member Author

jspenguin2017 commented Oct 28, 2018

It appears that Gitea's mirror feature doesn't copy issues... That's unfortunate... I still need to think about how to back up the issues.

@jspenguin2017
Member Author

Is there anything else to do besides mirroring all resources in assets.json?

@DandelionSprout

Not that I know of, off the top of my head.

@jspenguin2017
Member Author

Alright. I'm closing this then.

I moved progress tracking for mirroring all resources to another issue: #225

@github-actions github-actions bot added the archived This thread was archived, open new issues for similar problems. label Aug 21, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Aug 21, 2020