This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

Wikipedia Integrations #46

Open
10 tasks
jbenet opened this issue Sep 15, 2015 · 9 comments

Comments

@jbenet
Member

jbenet commented Sep 15, 2015

We've been planning to "put Wikipedia on IPFS" for a long, long time. This issue will track possible integration points and their progress. These may lead to independent repos, etc.

In short, the way I see it, we have multiple layers of "integration" with Wikipedia. These are discussed below in more detail.

  1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives
  2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")
  3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)
  4. Restructure: rethink wikipedia's datastructures as CRDTs (or even basic git commits), to create new wiki software that leverages IPFS.

(4) is the most exciting to me, but won't happen for a while. (1-3) we can already do. Let's start with (1) and (2).

1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives

This is a matter of regularly downloading data dumps and adding them. We need to construct "help archive X" pages to publish the newest heads and guide people through setting up an archive. (We may need ipfs-cluster for this to succeed.)

We can do this on our own and do not need to ask for permission, as everything is CC-licensed. (Correct me if I'm wrong, pls.)

Steps:

  • open an issue in https://github.com/ipfs/archives
  • plan out there how to ingest all of it
  • ingest all of it
  • figure out how to keep up to date
  • make the "help archive X" pages
  • make ipfs-cluster
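
A rough sketch of the "download a dump and add it" step, not a worked-out ingestion plan. It assumes the dump URL pattern on dumps.wikimedia.org and a local `ipfs` binary on the PATH; the helper names (`dump_url`, `ipfs_add_command`, `archive`) are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: Wikimedia publishes the latest article dump under this pattern.
DUMP_URL = "https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"


def dump_url(wiki):
    """Build the URL of the latest article dump for a wiki such as 'enwiki'."""
    return DUMP_URL.format(wiki=wiki)


def ipfs_add_command(path):
    """Command line that adds a file or directory to the local IPFS node.

    -r adds directories recursively, -Q prints only the final root hash.
    """
    return ["ipfs", "add", "-r", "-Q", str(path)]


def archive(path, run=subprocess.run):
    """Add `path` to IPFS and return the root hash printed by `ipfs add`.

    `run` is injectable so the function can be exercised without a node.
    """
    result = run(ipfs_add_command(path), check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

The hash returned by `archive(Path("dumps/enwiki"))` would be the "newest head" to publish on a "help archive X" page. Keeping it up to date would mean re-running this on each new dump.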

2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")

This means hosting all of the big files that wikipedia has to serve. It's perhaps where we can contribute the most, but then again our poor gateway may not be able to deal with the massive bandwidth usage.

What we need, then, is

3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)

After 2 is done, we can proceed with a full mirror. (It may be easier to skip 2 and go straight to 3. This is to be discussed, but it seems harder, given the difficulty on their end of integrating with their backend and so on.)
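
To make "falling back to ipfs http gateway" concrete, here is a minimal sketch: try a local node's gateway first, then a public one. The gateway addresses are assumptions (a local gateway on port 8080, `ipfs.io` as the public one), and `QmExampleRootHash` in the usage note is a placeholder, not a real archive hash.

```python
from urllib.request import urlopen

# Assumptions: a local IPFS gateway on its usual port, and ipfs.io as fallback.
LOCAL_GATEWAY = "http://127.0.0.1:8080"
PUBLIC_GATEWAY = "https://ipfs.io"


def candidate_urls(root_hash, page):
    """URLs to try, in order, for one page under a mirror's root hash."""
    page = page.lstrip("/")
    return [f"{gw}/ipfs/{root_hash}/{page}" for gw in (LOCAL_GATEWAY, PUBLIC_GATEWAY)]


def _http_get(url):
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_page(root_hash, page, fetch=_http_get):
    """Return the page bytes from the first gateway that answers."""
    last_error = None
    for url in candidate_urls(root_hash, page):
        try:
            return fetch(url)
        except OSError as err:  # URLError and connection errors are OSErrors
            last_error = err
    raise last_error
```

Usage would look like `fetch_page("QmExampleRootHash", "wiki/IPFS.html")`. Readers running their own node never touch the public gateway, which is the point given the bandwidth concern in 2.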

4. Restructure: rethink wikipedia's datastructures

This means restructuring how wikipedia's internal datastructures work to provide an editing model based on either CRDTs or basic git commits. We could then put these directly on top of IPFS and allow people to edit + create "wikipedia commits" and "wikipedia PRs" all over IPFS.

This is a large undertaking, so perhaps step 1 is to rethink the mediawiki data storage layer over ipfs first, and try making a demo. It's also worth thinking about federated wiki in this context, to see where "upgrading wikipedia with fedwiki" might lead. I think in general, it may be safest to just replace the storage layer first, and go from there.

To me, this is the most interesting part. But it's the biggest and the one which will take the longest to do.
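
A toy sketch of the "basic git commits" idea, just to show the shape of the data. Everything here is an assumption for illustration: a Python dict stands in for IPFS, SHA-256 over JSON stands in for IPFS's own hashing, and the function names are made up. Each page revision is an immutable object that points at its parent revisions, so two replicas can be merged by taking the union of their objects (a grow-only set, the simplest CRDT).

```python
import hashlib
import json


def put(store, obj):
    """Store an object under the hash of its canonical JSON encoding."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    store[key] = obj
    return key


def commit(store, text, parents=(), author="anonymous"):
    """Record a page revision that links back to its parent revisions."""
    return put(store, {"text": text, "parents": sorted(parents), "author": author})


def history(store, head):
    """All revision keys reachable from `head` by following parent links."""
    seen, stack = set(), [head]
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            stack.extend(store[key]["parents"])
    return seen


def merge_stores(a, b):
    """Union of two replicas; identical keys always hold identical objects."""
    return {**a, **b}
```

Two people editing the same page produce two commits with the same parent; a "wikipedia PR" would then be a commit with both as parents. Merging the page text itself is left out here and would need either a human or a text CRDT.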

@jbenet
Member Author

jbenet commented Sep 15, 2015

@domschiener

moved by @jbenet to #47 (comment)

@jbenet
Member Author

jbenet commented Sep 15, 2015

moved by @jbenet to #47 (comment)

@domschiener

moved by @jbenet to #47 (comment)

@jbenet
Member Author

jbenet commented Sep 15, 2015

@domschiener please move this discussion to another issue. I moved it to #47.

@davidar
Member

davidar commented Sep 15, 2015

👍

@rht

rht commented Sep 17, 2015

For layer 4, there are already several implementations of git-based wikis today (i.e. they can be distributed, but lack a built-in way to preserve a canonical DAG chain).

@almereyda

almereyda commented Jun 12, 2016

@opn and @WardCunningham have been working on a so-called transformerporter to load Wikipedia pages into Federated Wiki.

Entrance points to this could be


Hey, what's this?

@ldct

ldct commented Feb 22, 2018

Hey @jbenet, I saw the blog post about the Turkish Wikipedia dump on IPFS. Are goals 2-4 still being worked on?
