Consolidate IPFS Repositories #8543
Comments
2021-11-05 notes
|
Assuming this is about avoiding coupling within the monorepo, I'm happy to help with this bit; it should be a fairly straightforward Go test, e.g. in the root package. |
Some useful data: A topological ordering of
List of those modules and their versions in the
|
Script I'll use for transferring issues: https://gist.github.com/guseggert/b9622b794b0886d66e8fbf8f234ca709 |
I'm going to try out moving github.com/ipfs/tar-utils to Another interesting one is For now I'm not going to touch |
At the moment, assuming you change the module path, the best you can do is freeze/archive the old repo and just let the existing users keep using old versions. That's as far as we can go in terms of not "breaking" them, even though hiding newer versions from them can be a form of breakage if they use There are middle grounds perhaps, such as publishing one more version in the old module that adds something like:
The right solution is https://go-review.googlesource.com/c/proposal/+/335849, but that's still a draft proposal right now, so we wouldn't be able to rely on it for at least another 12-18 months until it's shipped in a Go release. What that would get you is full forwarding - users doing |
I'm going to provide the contrarian opinion here, and aim to keep the status quo. I think the reasons are a bit exaggerated. I am personally fine with the current layout as the purpose and interface of every different repo is very clear. It is possible to follow what happens on every different piece very easily. Dependabot makes the process painless.
Dependabot makes this trivial
One commit. Not having to wait until everything is aligned in a huge repo to publish a module release and notify the world about it.
Well, single repo or multi-repo, for anything you need to do it is necessary to ensure CI still works. In fact, CI in bigger repos is way worse and tends to be broken way more often (see go-ipfs).
Are we in a hurry? I think travis keeps working.
Was done already? It is just more lines in a config file and a bot does it?
We don't do major versions except in a very reduced number of places.
Monorepo or single repo, same responsibility.
Absolutely the same. go-ipfs centralizes most issues and many PRs and that doesn't mean they are tended to any better.
They don't. But they did before these repos were extracted from go-ipfs. The fact that these repos are separate is actually an assurance that the dependency graph is sane. At least this is not "often".
What libp2p did is way more lightweight than what is proposed here. Libp2p created a "core" repo that contains interfaces and data types, but MOST individual repositories still exist (check your libp2p dependency graph). I understand that landing on "ipfs" and seeing 50 repos (+ other 50 for libp2p) is a very daunting thing. A big "WTF, how on earth did we get to this". Consolidation may be a perfectly sane thing, but it also causes a few issues that are not there:
I want to think some of the consolidation proposed makes sense, but certainly there is room between going from 50 to 3, and reducing less, or just consolidating types and interfaces in one place (ala libp2p). Disclaimer: I created dozens of these individual repos and extracted code so that it could be re-used independently, so that building a fully functional IPFS application did not require importing go-ipfs, and you could actually pick and use versions of each module as needed, without being forced by go-ipfs to use a defined set. |
I am a huge fan of monorepos. Right now I can't use the as-a-library example because of cidutil :/ Working on this and I'll share code if I don't make progress. Basically ipfs is perfect for downloading chunked []byte blockchain state. |
For visibility, some initial experiments are happening in https://github.com/ipfs/libkubo/pull/1 |
We did not want to namesquat on "IPFS" with "libipfs", like we did with go-ipfs, but found "libkubo" to be even more confusing since its intention is to have code that is reusable and not Kubo-specific, which its name belies. Since this is a library, not an implementation, we don't think "libipfs" has the same problem that led to renaming "go-ipfs" to "kubo". So we have moved back to "libipfs", you can find the repo here: https://github.com/ipfs/go-libipfs I have written some tools to ease this migration: https://github.com/guseggert/repo-migration-tools. These tools:
You can find a generic checklist for moving a repo into go-libipfs in the Example Workflow section of the README. |
Flagging some where some thought is going to be needed: Things that are highly depended on:
Things that aren't 'ipfs owned':
|
Agreed ^^ that list was not intended as an official todo list for libipfs, I should have made that more clear. |
Here's my proposed list of repos we should definitely migrate, repos that are unclear, and repos definitely not to migrate to go-libipfs To definitely migrate, in the order of migration:
Repos that I'm unclear about:
Repos that should definitely not be moved into go-libipfs:
|
As per discussion in ipfs/boxo#36, please don't move go-merkledag, it would be preferable to wean people off it than lock it in stone with neverending releases as if it's best-practice dagpb. github.com/ipfs/go-log really should be excluded too, it's used almost universally across all our repos as a generic logger. The fact that you've moved github.com/ipfs/go-block-format is also pretty disruptive. Ideally we wouldn't be relying on it but it's got a deep dependency tree all over the place and I don't see why it makes sense to absorb it into the mega-repo. (tbqh I think this is all much less than ideal and disruptive for everyone but Kubo). |
The end goal of this project is not to benefit Kubo devs, that is just a side effect--we want to lower the barrier to entry so that people will use these libraries and refactor/contribute, instead of using Kubo when it's not appropriate or avoiding the ecosystem altogether. The number of repos and their version inconsistencies is overwhelming, even for some of us who work on them full-time. The cost and risk of bubbling changes around between dozens of repos and versions is so high that even PL folks try to avoid making changes to them, which is not a healthy dynamic. The pain of the refactor that you're pointing out is an example of this. We believe that putting IPFS things in one place (as much as possible/practical) and treating them as one cohesive product, testing them together, and ensuring version consistency will result in a much better experience for other devs who want to build applications and implementations on top of these libraries. There will be pain as we make this transition, and some ambiguities to work out, but I think the end result for the community and users will be more than worth it. |
You've gotten concerns from other devs on this transition. What signal are you using to track if this ends up being a better experience for us? Anecdotally, I can say that the ipfs org is harder for my team (and I suspect non-stewards generally) to work in today than it was 18 months ago.
|
@guseggert to quote you from the all-hands today: "Kubo is becoming a kitchen sink" .. so "we're extracting stuff to go-libipfs". If this were all it is then I think that's a laudable goal. But what's going on here that's causing the rest of us pain is that go-libipfs is becoming the kitchen-sink replacement; it's just a shell game of kitchen sinks. By pulling in existing repos, you're building up a DX that's similar to the Kubo UX—it does too much, all in one place, with no opt-out mechanism. A lot of the components that have been sucked in here are good for one-off use, the libraries were modular and small enough that they could be pulled in for special-purpose tasks that are IPFS-ish or IPFS-adjacent, but don't require everything else. Now you're forcing us to require everything to get simple things done. Want a peertaskqueue? You need all of go-libipfs. Want to manipulate IPFS-compatible paths? Go get all-the-things and you can do that! As is our style, we have multiple generations of tools / libraries in our ecosystem; we have trouble putting things down and saying goodbye and telling users that that thing isn't supported and that either there's a replacement or it shouldn't be used at all anymore—this is something we need to get better at. Unfortunately, by baking things into an official go-libipfs, you're making it much much harder to retire components. One of my personal gripes is around the previous generation IPLD tooling. I'd really like all of the I also think the definition of "IPFS" is a huge problem here, you're essentially saying that it's all of the things in go-libipfs. That's far too expansive and just smells like Kubo's notion of "IPFS". People building their own IPFS on top of go-libipfs basically means building their own Kubo, but perhaps with some features removed. Many of us would prefer that the definition of "IPFS" be much more trim and leave room for significant innovation around it. 
That's best achieved by having a more decoupled set of components that may or may not be used. I really, really don't want to have to pull in this new beast repo just to get simple things done that might happen to go anywhere near the Kubo-style of "IPFS" and now find myself developing around the deprecation landmines that are being regularly set off. |
Another interesting lens to view this through: What makes a repo in or out of To me this is a weird place to end up for a "general purpose go library for IPFS" Bitswap and GraphSync seem like two widely deployed data transfer protocols for content addressed data. But I have to import This seems like we're ending up with "go-libkubo" despite the original intention for it not to be that. Other libraries are not in this specifically because it would drag development for the other teams that maintain them. This brings us back to "a thing that is supposed to make DX easier is making it harder". I believe the missing signal here is new developers in the ecosystem, which are the ones who might possibly be helped by a less complex repo structure. @rvagg @willscott and I are all devs who are experienced in the existing structure, so this can only be disruptive for us. I also wonder though if a big repo rearrange is the right approach to "making it easier for new devs". It seems to me that the real barrier to entry for new devs who don't want to just talk to one of the IPFS implementations' HTTP APIs is step by step instructions on how to build a functioning IPFS node from the various go libraries. I feel like this could be a first step before a big repo re-org. I'm not going to get very far with go-libipfs if I don't have some step by step code on how to use it. Either way, it seems like the arbiter is signal from new devs, so I wonder what the arbiter of that should be. |
There’s lots here 😅. I appreciate the feedback. It’s been heard by the team and me. I’m going to do my best to reply here. Much correspondence here comes from conversations and notes with @aschmahmann and @guseggert. (Anything useful or well-said should be credited to them; anything foolish or ignorant is on me.) I want to hear all sides. That said, we need to get over this hump as this is a drain on everyone while in limbo. I suspect we’re getting close to needing to disagree and commit. General comments The set of people the go-libipfs maintainers plan to be helping here is primarily people trying to build with IPFS that are currently either giving up or relying on the Kubo HTTP RPC API. Some of these people will be better served by IPFS tooling in other languages (Javascript, Rust, Java, Python, …). Still, for those who are either looking to write in Go or to leverage the set of IPFS tooling we already have in Go, we’d like to make their lives easier. We’d also like to make life easier on ourselves as the maintainers by reducing the maintenance burden that comes from being the owners of many repos and then use that time to contribute more to the community in the form of easier-to-use libraries, better implementations, improved protocols, new protocols, etc. Some of those changes will make their way into Kubo and others will not. None of us (EngRes IPFS Stewards) likes moving repos around for fun. We’re doing this because we spend time in chat channels, forums, in-person events, and engaging either directly or indirectly with companies and hackathon builders operating on our stack. Many of these people find building an IPFS implementation with just the parts they need hard, some have explicitly flagged the many repos as a problem, so we’re going to try making it easier for them. 
Last week alone, I know Adin got pulled into multiple conversations around people not understanding how they can pull in the relevant libraries and get going rather than pulling in all of Kubo. Over the years, there have been a few “lite” implementations that try and pull in some of the basic libraries together. However, these tend to be maintained by one person who has a lot of experience with where all the libraries are or were created with the assistance of someone who has. I don’t think this is a scalable solution for supporting many developers trying to build using IPFS. If somehow, repo consolidation manages to make things worse for both users of our stack and the maintainers, then this endeavor will not have fulfilled its intended purpose. We’re optimistic it’ll do both though. |
Is this because of the repo access permissions delays you mentioned or something else? If it's not related to repo organization (the topic of this issue), then let's cover it in a different forum. (I'd like to learn more.)
I haven't been tracking SLAs on ipfs/github-mgmt. I'm game to look into this further (but maybe best to raise an issue in that repo). That said, doesn't it support repo consolidation in a minor way? Instead of asking for permissions in many repos, it will be much fewer.
We intend to follow the merge criteria here: https://github.com/ipfs/go-libipfs#should-i-add-my-ipfs-component-to-go-libipfs At the moment the policy is:
There's been some discussion for alternatives if we want to make it easier to add experimental components (e.g., an experimental subpackage). However, there has been no PR to change the policy. If there's going to be a policy change it'll happen there. I assume some of the confusion/concern here is that this is a PR in go-libipfs, and there is a "master plan / roadmap" for this functionality showing up in go-libipfs. I can see how that could be misinterpreted and need clarification. A couple of callouts:
I think a clarifying step we could take currently is to move this PR and its issues out into a separate repo.
If I understand correctly, as it depends on tagged versions of most of the Versioning together makes dealing with this much easier. Of course, if you need to fork go-libipfs or some subpackage, you're welcome to. If you want to contribute that code back to go-libipfs you're welcome to do that too. |
We believe doing this at a library and binary level are pretty different things. A binary level “put everything in here” allows for very little choice in what you support and results in either one-size-misfits-all defaults or unwieldy config files. A repo that has tons of sub-packages is not really that; you can use the ones you want and not use the ones you don’t. As Gus wrote above, there will still be “Careful consideration of cross-package dependencies.” Just because the packages live together and version together doesn’t mean we want them to all depend on each other.
Aside from whether that particular repo should be moved, what are you concerned about here, binary bloat? It seems like Go should mostly avoid that now with module pruning + lazy loading.
Yes, generally the more users we have dependent on a given chunk of code the more we try to avoid breaking them. Generally speaking, when we are able to retire components or make breaking changes to them it comes with an effort around communication and making upgrade paths doable. This happened with the go-ipfs-blockstore changes (dropping v0 support and breaking changes around context plumbing). It was a pain for many people depending on them, but the status quo was painful too, so we made the changes, bubbled them up and communicated with people about when the changes would be coming. A lot of the plumbing there was pretty miserable and the number of repos we had to communicate around was painful too. With a smaller number of repos these kinds of changes would be easier to execute and communicate about.
Jorropo’s proposal to bring in go-unixfs was to tag it with the “style” of the module ipfs/boxo#36. I’d hope over time we’d only feel the need to support one and take the best of what we need from each and help users migrate over, but if they’re sufficiently different and important to the community then we can have multiple in there.
I’m not sure what this means. As before we’re going to keep maintaining existing code and working to make things better, which might include migrations and breaking changes. Is a chief concern here the repo name “ipfs/go-libipfs”? Is the feedback that this should be named something different? Gus posted this #8543 (comment) in November 2022 with no comments to the contrary and so life has moved on. To be clear, I’m fully supportive if Bedrock (or any other team) would like to make an ipfs/go-greatipfslib repo rather than contribute to go-libipfs. It can be listed on docs.ipfs.tech’s list of implementations.
I’m going to sidestep the issue of the definition of IPFS. We’ll update the README to be clear that IPFS != all the things in go-libipfs, but rather if you’d like to build an IPFS implementation here are some tools you might want that are maintained by a group that has long-term commitments to the IPFS project. The fact that some of the repos Bedrock maintains and doesn’t want included here (e.g. go-car, go-graphsync, most things IPNI related) are useful to IPFS implementations is fine. Similarly, if someone uses Rust to make a UnixFS implementation that could be used through Go (via FFI or WASM) that’s cool too and absolutely doesn’t need to be part of go-libipfs (and likely shouldn’t). Our goal is to help people build things. Right now they can’t find anything or figure out how to use what they do find so they run kubo and use its HTTP RPC API… We’d like them to be able to do better. Taking the libraries they were already effectively relying on in production and making them more easily discoverable and usable is one way we’re trying.
We might have to diverge here, but what are you thinking is going to go wrong here? (I want to make sure we’re knowingly entering risks here.) However, if you find it easier to fork and maintain a subset of functionality or rewrite things in an alternative style that’s fine. If you want that code to be usable with anything that already exists you’ll need some bridging code (e.g. like the go-ipld-prime storage adapters) and if not then you won’t. |
With being on the front lines of support issues, user conversations, etc. and handling much of the maintenance of these repos, we have a perspective on what will make users’ lives better. I admit we are certainly giving preference currently to the repos that we believe help users and Kubo maintainers. I’m not going to claim this ensemble of repos is perfect and feedback welcome. I do want to make sure it’s understood though that this isn’t being formed in a vacuum.
Yeah, you’re right. Per above, libipfs isn’t exhaustive, and [we’ll update the docs](ipfs/boxo#171) to make this more clear. Per Gus’s comment, there isn’t any opposition to graphsync being in libipfs in principle. But given the only folks who have signed up for go-libipfs maintenance don’t maintain go-graphsync currently and because IPFS Stewards haven’t encountered users requesting how to pull it in to solve their problem, I agree with Gus that it doesn’t make sense to include currently.
I agree this is key. I realize this issue description doesn’t make that clear as this effort was originally motivated from the maintenance drain for the EngRes IPFS Stewards team. It was fueled and escalated for us though as:
Not complete, but this has been the intent of the go-libipfs examples. We make sure these pass in CI. Some gateway functionality is covered now, and we plan to expand this as a more scalable mechanism when handling IPFS support inquiries. Contributions are also welcome here. |
Closing thoughts Here are some actions I believe should be taken:
Thanks again all for the input here. Your patience with reading, engaging, and dealing with the fallout are appreciated. |
Give a 👍 here if you'd like to be involved in a synchronous closeout this week. |
Is there a realistic outcome that consists in "leaving things as they were"? Because if I'm reading it right, that is the request from comments above: to reconsider this effort. I don't think we can hide behind the "we're doing this for the developers" in a way that holds together. We have (had*) a consistent approach, which was "every distinct component lives in its repository" and we had a consistent developer workflow (open PR, get approval if needed, merge, release the component separately). It was not perfect but it was consistent and most flexible. The original issue description does not mention the topic of "supporting developers", because that was not what was driving the rationale of the change (rather it was the code maintenance effort paid by Stewards). The best way to help users and developers is mostly uncontroversial and consists in more, better and improved documentation. |
I don't think it's worth it, the issues were a mistake on my part, however I don't see what moving them now that everyone got spammed with notifications will do. (I will if anyone insists) About the PR, it is not targeting master, I don't see what should be done here, should I make a |
Circling back to this quickly. The team has been tied up with other release and operational events this week.
The request to reconsider is heard. Leaving things as they were isn't an outcome we're considering currently.
There are also additional items related to better project maintenance/hygiene after Kubo 0.19 ships:
I understand that delayed responses and communication here aren't ideal. The offer to go-back-and-forth verbally still stands too in the interim. |
2023-02-22 update (that was also in FIL Slack): |
We have had consistent feedback from users and PL new hires that the sheer volume of repos and effort required to plumb changes around is immediately off-putting. I was one of those new hires when I created this issue, there have been others with the same sentiment, there are users who contribute with the same sentiment, and there's likely a large group of people who would have contributed or built on this ecosystem but don't because the cost is so high. I'm not sure how addressing this issue is not "for the developers" ? Sure it may not be for all developers but there is a clear signal from many folks that they don't like working with the existing setup. |
There is no more documentation, there are still dozens of modules (even in the same repo) and understanding things is complex. Plumbing a change that touches everything will be easier, but the contribution flow for people like me or anyone that uses modules separately in mix&match&fork fashion is much worse.
Not only is this an assumption, but also it doesn't take into account that the people that are OK with how things were are not reaching out to remind you that they were happy. This is a very complex change, with uncertain outcome, based on assumptions and personal preferences, that alters the status quo and has been contested. These are all flags that it is not an endeavor to take in a world with much much lower hanging fruit and more pressing matters when it comes to the adoption of our tech. I sincerely hope I am wrong and hope to see an uptick in contributions from the community in the coming months. ☮️ |
2023-03-09 update:
|
Closing this as the work is happening in ipfs/boxo and relevant docs have been updated there including: |
Description
Problem
The go-ipfs dependency closure includes 47 modules under github.com/ipfs. Here are their interdependencies (this does not include libp2p nor other PL orgs):
Pain
Current Desirable Properties
go-ipfs
Why are repos structured this way?
The intention of the current layout was to encourage flexibility, extensibility, and experimentation. Functionality of IPFS could be reused in other projects without depending on IPFS as a whole.
Also these repos predate most Go tooling.
How much does a repo cost?
Repo maintenance costs include:
Why now? What's changed?
We have an increasing amount of:
Also, Go modules now exist, along with module graph pruning. The latter is key to preventing consumers from having an explosion of transitive dependencies if they just want to reuse some small piece of code.
How can we consolidate repos? What's the ideal end state?
We want our repo layout to facilitate day-to-day development, while also letting us reuse components and functionality. Code that is commonly changed and built together should be in the same repo (as much as possible), so that it can be tested and released together.
We can leverage some of the new tooling around Go modules to retain the flexibility of separate repos, without having to pay the significant cost.
The ideal repo layout:
github.com/ipfs/go-* go-libipfs repo for long-term maintenance
go-libipfs and produces the ipfs binary
go-libipfs
Other consumers of go-libipfs include libp2p (datastore), Filecoin, IPFS Cluster, ipfs-lite, and the IPFS examples.
What about consumers of repos we want to remove/archive? How do we roll this out?
go-libp2p did something similar a couple years ago, largely avoiding breaking consumers by shimming out existing repos to point to the consolidated one, example: https://github.com/libp2p/go-libp2p-protocol/blob/master/protocol.go
We can use this same trick to incrementally consolidate without breaking consumers.
See e.g. this PoC of moving go-namesys into go-ipfs while preserving backwards compatibility (in reality we'd move it to go-libipfs):
There may be some cases where this isn't possible without breaking changes.
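To illustrate the shim trick described above, here is a minimal single-file sketch. In reality the "old" and "consolidated" parts would be two separate packages in two separate repos, with the old one importing the new one. Here they are collapsed into one file so the example runs on its own, and all names (`coreID`, `coreConvert`, `takesCore`) are hypothetical stand-ins rather than real go-libipfs identifiers:

```go
package main

import "fmt"

// --- Stand-in for the consolidated package (the new home of the code). ---

// coreID is the canonical type in the new location.
type coreID string

// coreConvert is the canonical implementation in the new location.
func coreConvert(s string) coreID { return coreID(s) }

// --- Stand-in for the old, frozen package that only forwards. ---

// ID is a type alias (note the "="), not a new type, so values of ID and
// coreID are interchangeable without any conversion.
//
// Deprecated: use the consolidated package instead.
type ID = coreID

// Convert forwards to the consolidated implementation.
//
// Deprecated: use the consolidated package instead.
func Convert(s string) ID { return coreConvert(s) }

// takesCore represents new code written against the consolidated package.
func takesCore(id coreID) string { return string(id) }

func main() {
	old := Convert("/ipfs/example") // legacy caller still uses the old names
	fmt.Println(takesCore(old))     // compiles because ID and coreID are the same type
}
```

The alias is what keeps existing consumers compiling: code that mixes the old and new import paths sees one type, not two. Cases that cannot be expressed as aliases or thin forwarding functions are likely the ones that need a breaking change.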