[Suggestion] GitHub life boat #3337

Open
HebaruSan opened this issue Mar 30, 2021 · 5 comments
Labels
AutoUpdate Issues affecting the automatic updating Build Issues affecting the build system Core (ckan.dll) Issues affecting the core part of CKAN Discussion needed Infrastructure Issues affecting everything around CKAN (the GitHub repos, build process, CI, ...) Netkan Issues affecting the netkan data Policy Issues with our policy

Comments

@HebaruSan
Member

HebaruSan commented Mar 30, 2021

Background

CKAN leans heavily on GitHub and GitHub services for almost everything:

  • All of our source code is here, obviously
  • Our client's releases are hosted here (and built by GitHub Actions)
    • The client's auto-updater looks for them here
  • The client gets all of its metadata from here (CKAN-meta)
  • All the meta-metadata used to generate that metadata is hosted here (NetKAN)
  • Most of our work flows revolve around pull requests, now using GitHub Actions for validation (xKAN-meta_testing)
  • Our user and developer documentation is in the project wiki
  • Many, many mods are hosted here (but that's not something we are responsible for or could change)

We also use a few resources that are independent of GitHub:

  • The KSP forum
  • Discord
  • AWS / S3
  • ksp-ckan.space

Motivation

While enjoying some completely unrelated 5-year-old drama in a GitHub issue that will not be linked here, I started thinking about GitHub's business model, which is essentially "freemium": they offer free services to draw in users, some fraction of whom will then see value in paying for extras. It occurred to me that depending on the metrics available to GitHub's management, this model may not last forever. If they could analyze users or projects and determine definitively which ones will never ever pay them one dollar for anything, wouldn't it make economic sense to be less generous to those users? Maybe the drawing-in effect could be achieved just as well by a one-year free trial?

While it's true that we all have local clones of all the important git repos and could push them elsewhere, many other parts of CKAN's architecture strongly assume that GitHub will continue providing various services for free. We could be thrown into some pretty severe turmoil overnight by a "bad" announcement from GH HQ.

Ideas

It may be wise to formulate contingency plans for what the CKAN project would do if, for example, GitHub announced that after some date all projects would have to pay. We should also identify any steps that might make us so dependent on GitHub that we could no longer escape, if there are any.

  • Pay up - We decide that GitHub is so valuable that we add it to pjf's monthly expenses
  • GitLab - Move the repos across one-for-one
  • S3 - Put the metadata in an S3 bucket
  • Self-hosted - Set up our own git server for the metadata

In every case other than the first, we would have to update both the client and the Infra with the new URLs, whatever they might be.

@HebaruSan HebaruSan added Core (ckan.dll) Issues affecting the core part of CKAN Policy Issues with our policy Build Issues affecting the build system Netkan Issues affecting the netkan data Infrastructure Issues affecting everything around CKAN (the GitHub repos, build process, CI, ...) Discussion needed Metadata AutoUpdate Issues affecting the automatic updating labels Mar 30, 2021
@DasSkelett
Member

I think a "GitHub goes paid" scenario is unlikely, but I could imagine other trouble happening, like a DMCA takedown request from someone who doesn't like CKAN and doesn't understand how it works; we've already learned that GitHub is quite happy to enact those without prior notice.

In any case, I agree that we should give this some thought, and maybe even make some preparations to make an emergency jump (or a planned one, because who knows why) easier.

Adding some thoughts to your ideas:

  • Pay up - We decide that GitHub is so valuable that we add it to pjf's monthly expenses
    That one's probably only relevant in the "GitHub goes paid" scenario, and wouldn't help if GH shows us the door for some reason. It would likely be more than I'd like to add to pjf's expenses, though maybe we could compensate with user donations (and our own money).

  • GitLab - Move the repos across one-for-one
    A lot of projects use GitLab as a mirror or "hot standby", with the repositories syncing automatically. This could be done right now: we could start by simply mirroring the CKAN, NetKAN and CKAN-meta repos passively to GitLab. Then we could slowly expand it and make some client and infra changes to use them as a potential fallback. Aside from an easier (emergency) transition, this has further benefits:

    • For CKAN-meta: a fallback for when GitHub has one of its usual outages. If GH returns an error for the archive, retry from GitLab. This might actually not be that hard to implement.
    • For CKAN: if we manage to publish releases there as well, we could already make the update logic use it as a fallback. More complex to set up.
  • Self-hosted - Set up our own git server for the metadata
    Maintaining our own git server is probably not something we have the personnel for, and recent news has shown how fast this can go sideways. It could even turn out more expensive than paying GH.

  • S3 - Put the metadata in an S3 bucket
    That's an interesting one... The archive is around 3.4MB currently, so as far as my very basic knowledge of S3 billing goes, this shouldn't even be expensive. Maybe it's another fallback we could already add to the client, even if only to handle GH outages.
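A minimal sketch of the "GH returns an error for the archive, retry from GitLab" idea, written in Python purely for illustration (the actual client is a C# project, and this is not its code). It tries each archive URL in order and returns the first one that downloads. Both URLs are assumptions based on GitHub's and GitLab's usual branch-archive URL patterns, and `download_with_fallback` is a hypothetical helper name.

```python
import urllib.request
from typing import Callable, Sequence

# Assumed archive URLs, following GitHub's and GitLab's usual
# "download a branch as a tarball" patterns for CKAN-meta and its mirror.
DEFAULT_SOURCES = [
    "https://github.com/KSP-CKAN/CKAN-meta/archive/master.tar.gz",
    "https://gitlab.com/KSP-CKAN/CKAN-meta/-/archive/master/CKAN-meta-master.tar.gz",
]


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return its body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    sources: Sequence[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> tuple[str, bytes]:
    """Try each source in order; return (url, body) from the first that works."""
    errors = []
    for url in sources:
        try:
            return url, fetch(url)
        except OSError as exc:
            # urllib's URLError/HTTPError and socket timeouts are all OSError
            # subclasses, so one clause covers the "GitHub is down" cases.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("All metadata sources failed:\n" + "\n".join(errors))
```

Passing `fetch` in as a parameter keeps the fallback order testable without network access. An S3 copy, if we went that way, would just be one more URL appended to the list.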

@HebaruSan
Member Author

I like the idea of passive mirroring to GitLab and treating it as a fallback now, before any disaster. 👍

@techman83
Member

I really like the idea of passive mirroring. It's always been a bit of a risk in my mind that we have something of an 'Eggs in One Basket' approach. I don't think we need to get carried away too fast, but some work towards it would be good.

> S3 - Put the metadata in an S3 bucket
> That's an interesting one... The archive is around 3.4MB currently, so this shouldn't even be expensive to do, to my very basic S3 billing knowledge. Maybe another fallback we could already add to the client, if it's only to handle GH outages.

Storage charges aren't the killer for S3; for that amount of data they'd be almost negligible. It's outbound data transfer that will get us, and we can't predict that. We could front it with a free service like CloudFlare, but that involves handing them our DNS as well, and it's another basket to keep our eggs in.
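To make the storage-versus-transfer point concrete: storage is paid for one copy, while transfer out grows with every download. The sketch below only computes volumes. The 3.4 MB archive size comes from this thread; the download count in the usage example is made up, since the real figure is exactly what we can't predict. Multiplying by the provider's current per-GB egress rate would turn the volume into a cost estimate.

```python
# Archive size quoted in the thread; everything else here is illustrative.
ARCHIVE_MB = 3.4


def stored_gb(archive_mb: float = ARCHIVE_MB) -> float:
    """Data at rest: one copy of the archive, no matter how often it's fetched."""
    return archive_mb / 1000


def monthly_egress_gb(downloads_per_month: int, archive_mb: float = ARCHIVE_MB) -> float:
    """Transfer out (decimal GB) if every download fetches the full archive."""
    return downloads_per_month * archive_mb / 1000
```

With a hypothetical 100,000 downloads a month, `monthly_egress_gb(100_000)` comes to 340 GB transferred out, versus 0.0034 GB stored. The egress volume is just the stored size multiplied by the download count, which is why it dominates the bill.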

@DasSkelett
Member

DasSkelett commented Apr 21, 2021

GitLab group created: https://gitlab.com/KSP-CKAN
Going to set up mirroring next, probably starting with this repository (CKAN).

Edit: I've now set up mirroring for all unarchived repositories except:

  • "Test": contains releases used in CKAN's unit tests; releases can't be mirrored
  • "MirrorKAN": we are about to archive it as well

I did decide in favor of mirroring ".github", since it contains files like the Code of Conduct and the PR and issue templates, which are useful in themselves and good to have in case of a jump, even if we'd need to move them to different places for GitLab to use them.
I had to name it "_github" though, since GitLab doesn't allow a dot as the first character.

Notable observations:

> In case of pull mirroring, your user will be the author of all events in the activity feed that are the result of an update, like new branches being created or new commits being pushed to existing branches.

This means that you need to allow at least your user (or the "Maintainer" role) to push to mirrored repos, otherwise the mirror will fail to push new commits. Initially I had tried to prohibit pushing completely, to force the mirrors to be read-only.
I suspect you also need to enable "Allow force push" if you want to make use of "Overwrite diverged branches" (i.e. handling force-pushes in the upstream repo).

@HebaruSan
Member Author

HebaruSan commented Apr 23, 2021

Excellent! For what it's worth, I checked the mirror of CKAN-meta and things seem to be propagating across nicely. Trying to brainstorm the remaining steps that would allow us to close this issue as addressed via GitLab...
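One way to repeat the "is the mirror propagating?" check without clicking through both sites is to compare branch heads with `git ls-remote`. A minimal Python sketch, assuming `git` is on the PATH and that each mirror lives at the same path under the gitlab.com/KSP-CKAN group as on GitHub; the function names are hypothetical.

```python
import subprocess

# Assumed clone URLs for the upstream repository and its GitLab mirror.
GITHUB_URL = "https://github.com/KSP-CKAN/CKAN-meta.git"
GITLAB_URL = "https://gitlab.com/KSP-CKAN/CKAN-meta.git"


def parse_ls_remote(output: str) -> dict[str, str]:
    """Turn `git ls-remote` output ("<sha>\t<ref>" per line) into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split("\t", 1)
            refs[ref] = sha
    return refs


def remote_heads(url: str) -> dict[str, str]:
    """List the branch heads of a remote without cloning it."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout)


def out_of_sync(upstream: dict[str, str], mirror: dict[str, str]) -> list[str]:
    """Branches whose mirror commit is missing or differs from upstream."""
    return sorted(ref for ref, sha in upstream.items() if mirror.get(ref) != sha)
```

`out_of_sync(remote_heads(GITHUB_URL), remote_heads(GITLAB_URL))` returns an empty list when every upstream branch matches. Since GitLab updates pull mirrors periodically rather than instantly, a branch that looks stale right after a push is expected; it's only worth worrying about if it stays stale.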

Fallback capabilities in the short term:

  • Figure out how to host client releases on GitLab (last I checked they have "releases" but not assets, so some clever hacking may be required)
  • Update the auto-updater to fall back to GitLab if it can't access GitHub
  • Update the client's registry updater to fall back to GitLab if it can't access GitHub
    • Maybe something more generic where we can specify arbitrary many sources or host a list of URLs on ksp-ckan.space
  • Mirror the wiki to GitLab (does this exist?) or find some platform-neutral hosting for it
    • Update the client to fall back to the mirror
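For the "host a list of URLs on ksp-ckan.space" variant, the client would still need built-in defaults so that the hosted list doesn't become a new single point of failure. A Python sketch, assuming a hypothetical JSON format like `{"metadata": [...]}`; the document's location, the key name, and the `resolve_sources` helper are all made up for illustration.

```python
import json
from typing import Optional

# Built-in defaults shipped with the client (assumed URLs), used when the
# hosted list is missing, unreachable, or malformed.
BUILTIN_METADATA_SOURCES = [
    "https://github.com/KSP-CKAN/CKAN-meta/archive/master.tar.gz",
    "https://gitlab.com/KSP-CKAN/CKAN-meta/-/archive/master/CKAN-meta-master.tar.gz",
]


def resolve_sources(remote_json: Optional[str], key: str = "metadata") -> list[str]:
    """Return remotely listed URLs first, then built-in ones not already listed.

    remote_json is the text of the hypothetical hosted list; None or invalid
    input simply means "use the built-in list".
    """
    remote: list[str] = []
    if remote_json:
        try:
            data = json.loads(remote_json)
        except json.JSONDecodeError:
            data = None
        candidates = data.get(key) if isinstance(data, dict) else None
        if isinstance(candidates, list):
            # Only accept HTTPS URLs from the remote list.
            remote = [u for u in candidates if isinstance(u, str) and u.startswith("https://")]
    # Keep order, drop duplicates, and always end with the built-in sources.
    merged: list[str] = []
    for url in remote + BUILTIN_METADATA_SOURCES:
        if url not in merged:
            merged.append(url)
    return merged
```

Remote entries come first, so the hosted list could add or reorder sources without a client release, while the built-in GitHub and GitLab URLs are always kept as a last resort. The same shape would work for the auto-updater's release URLs.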

Longer term preparations:

  • Find/design GitLab replacements for webhooks, GitHub Actions, etc.
    • Can arguably leave the concrete implementation till after the "emergency", as long as we know it will be possible and the general approaches to use
  • Make the repository URLs configurable in the Infra so it can treat GitLab as the source of truth
    • Figure out the GitLab equivalent of GitHub tokens for the Infra
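A sketch of making the repository URLs configurable, so the Infra could be pointed at GitLab as the source of truth. It assumes configuration via environment variables; the variable names (`CKAN_FORGE`, `NETKAN_REPO_URL`, `CKAN_META_REPO_URL`) and the `RepoSettings` type are hypothetical, not the Infra's actual settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RepoSettings:
    """Where the bot reads NetKAN from and writes CKAN-meta to."""
    netkan_url: str
    ckan_meta_url: str
    forge: str  # "github" or "gitlab"; would select which API client/token to use


def load_repo_settings(env: Optional[Mapping[str, str]] = None) -> RepoSettings:
    """Read repository locations from the environment, defaulting to GitHub."""
    env = os.environ if env is None else env
    forge = env.get("CKAN_FORGE", "github").lower()
    if forge not in ("github", "gitlab"):
        raise ValueError(f"Unsupported forge: {forge!r}")
    host = "github.com" if forge == "github" else "gitlab.com"
    return RepoSettings(
        # Explicit URLs win; otherwise derive them from the chosen forge.
        netkan_url=env.get("NETKAN_REPO_URL", f"https://{host}/KSP-CKAN/NetKAN.git"),
        ckan_meta_url=env.get("CKAN_META_REPO_URL", f"https://{host}/KSP-CKAN/CKAN-meta.git"),
        forge=forge,
    )
```

With no variables set it behaves as today; setting `CKAN_FORGE=gitlab` switches both defaults to the GitLab mirrors. The `forge` field is where the choice between GitHub and GitLab tokens would hang once that equivalent is figured out.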
