
Unvendor dependencies #1668

Open
JeanChristopheMorinPerso opened this issue Feb 25, 2024 · 12 comments
Labels
openssf-best-practices https://www.bestpractices.dev/en/projects/8389 tech-debt

Comments

@JeanChristopheMorinPerso
Member

JeanChristopheMorinPerso commented Feb 25, 2024

As part of the OpenSSF Best Practices badge, we must monitor and periodically verify that none of the project's dependencies contain known vulnerabilities. We also need to list the dependencies in a machine-parseable format, and we need to make it easy to update them.

Currently, all our dependencies are vendored in src/rez/vendor. This makes them hard to monitor and update. It's also very easy to start modifying them, which makes updating them afterwards even harder. Even worse, some of our vendored dependencies were heavily modified :(

This was done mainly because some studios don't have internet access and the original author of rez wanted those studios to be able to install rez easily. Another reason was that rez is "special". After multiple conversations within the TSC, we think this is not a valid concern, for multiple reasons:

  • they still need to download rez in some way
  • they probably have other software to install that does require internet access
  • they probably already have infrastructure in place to mirror parts of the internet. For example, they probably have a PyPI cache/mirror (Artifactory, Nexus, devpi, etc.).

But vendoring does provide value: dependencies are guaranteed to always work, because their versions are fixed and the code ships with rez.

We should look at unvendoring our dependencies. We'll have to plan this very carefully and potentially do it gradually. We might also want to run our test suite using the lowest and highest supported version of each library (so one run with lower bounds and one with upper bounds) and have them run on a periodic basis. We'd want these tests to report any error very loudly so that we can address incompatibilities with newer versions.

We could have a requirements.txt file for users who install with the install script. That requirements file would pin every version with ==. (Not sure though.)
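To illustrate, such a file would look roughly like this (the versions are placeholders, not a proposal; the package names are examples of things we vendor today):

    # requirements.txt -- hypothetical, fully pinned
    PyYAML==6.0.1
    colorama==0.4.6
    sortedcontainers==2.4.0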

Requirements:

@maxnbk
Contributor

maxnbk commented Feb 28, 2024

I'm not entirely on board with this idea (But hear me out).

I think a valid alternative might be looking into an automated approach to vendoring dependencies, such as using the https://pypi.org/project/vendoring/ project. (I am not saying this exactly, but some sort of similar automated approach.) I could imagine a setup where the dependencies are still shipped as vendored (provided that licensing is not a problem, as it isn't a problem now), but where it's easier to automatically apply updates to the latest versions, automatically test that nothing is broken, etc. The same caveats about "doing this carefully" apply, and I do agree we should remove any/every vendored dep that we don't actually need, but I don't necessarily think that moving to a "user pip-installs things" model is 100% the best approach.

Edit: And, in the case that we decide to nix the easy "install rez behind studio firewall" capabilities, we should be very careful to explicitly state what approach we expect users/studios to take who are in that scenario, and vet that the approach works. While it was never an explicitly stated "compatibility guarantee", it was a fairly big implicit compatibility promise that rez made for years and years, and we shouldn't necessarily abandon it without explicit alternatives laid out.

@JeanChristopheMorinPerso
Member Author

Thanks for the input @maxnbk!

I'd be very interested in understanding what makes you hesitate. You probably wrote about this in the past, but I can't remember your arguments. When you have time, can you describe what value you see in keeping our dependencies vendored please? I did try to add some pros in the issue description, but I'm sure I missed some.

It might be one of those decisions that will require us to probe the community to get a good sense of what they want. I could even take some time to write a proper REP with lots of detail and a clear description of how things would work for our users.

I think a valid alternative might be looking into an automated approach to vendoring dependencies, such as using the https://pypi.org/project/vendoring/ project.

I thought about using vendoring and I think it would work. But it wouldn't remove the burden of keeping things up to date. We'd have to come up with a way to automatically monitor the vendored dependencies and update them. That could be achieved by creating a requirements.txt file and feeding it to vendoring. With that requirements.txt file, we could instruct dependabot to create PRs when newer versions are available. The missing piece is how to actually trigger the re-vendoring process from there. I'm not sure how we could do that.
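For the monitoring half, a minimal Dependabot configuration watching that requirements.txt could look something like this (a sketch, assuming the pins live at the repository root; the re-vendoring trigger remains the open question):

    # .github/dependabot.yml (sketch)
    version: 2
    updates:
      - package-ecosystem: "pip"
        directory: "/"
        schedule:
          interval: "weekly"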

I'm still not sold on the idea though. It just adds more work and more things for us to maintain. Another argument for unvendoring is that we would get support for newer Python versions for free.

Edit: And, in the case that we decide to nix the easy "install rez behind studio firewall" capabilities, we should be very careful to explicitly state what approach we expect users/studios to take who are in that scenario, and vet that the approach works. While it was never an explicitly stated "compatibility guarantee", it was a fairly big implicit compatibility promise that rez made for years and years, and we shouldn't necessarily abandon it without explicit alternatives laid out.

I agree that if we go that route, we'll need to better document how to deploy rez.

@JeanChristopheMorinPerso
Member Author

@instinct-vfx and @bpabel any opinions? I think Thorsten was/is supportive of unvendoring our dependencies, but a written statement might help here.

@bpabel
Contributor

bpabel commented Feb 29, 2024

I don't see any reasons to vendor something that could simply be specified as a dependency that is handled at build time, so I'd prefer to get rid of them.

That being said, I think the most likely issue with switching from vendored libraries to dependencies is that it introduces the potential for dependency conflicts with other libraries that use the rez python API. I think that's actually a good thing for users of the API, because it means the 3rd party types are better defined and documented, rather than being an unknown version of some vendored library. And vendoring a library at a different version than the one normally installed in a python environment has potential for weird errors as well, so vendoring isn't really an ironclad solution anyway. I'd prefer to offload that problem to package management systems.

For the "install rez behind firewall" situation, all studios have at least one machine connected to the internet, and are capable of building a rez installation that can then by manually copied to all the other airgapped machines, so I don't really see that as a real issue. I also don't really like that rez needs to be installed from source. That really shouldn't be the way that rez is most commonly distributed IMHO.

I'd prefer to see rez deployed mainly as a bundled exe/app. PyInstaller is an option, but there are a ton of others. Technically, we'd still be redistributing dependencies, but they would be managed as dependencies via a package management tool, like pip-tools or poetry, instead of vendoring them. Using a package manager for dependencies that are resolved at build time makes managing them easier. It makes it trivial to use compiled dependencies with different versions for different architectures, python versions, OS's, etc. At the moment, we're basically limited to pure python dependencies because we vendor them.
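As a rough illustration of that model (a sketch only, not a worked-out build; the entry-point module path below is a placeholder, not rez's actual one), the release pipeline could install the pinned dependencies and then bundle everything into a single executable:

    # sketch: install pinned deps, then bundle with PyInstaller
    pip install -r requirements.txt pyinstaller
    pyinstaller --onefile --name rez src/rez/cli/_entry_points.py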

Even better if we can distribute MSI installers, app pkg's, brew installers, etc., though I know that's asking a lot.

Having a bundled exe/app allows us to still install in airgapped environments and get rid of vendoring.

@chadrik
Contributor

chadrik commented Mar 21, 2024

From my POV, the most important aspect of vendoring was -- and still remains -- site package isolation. Rez's python dependencies must be accessible only to the rez CLI -- they should not end up on the PYTHONPATH within environments created by Rez. Hopefully that statement comes as no surprise.

That said, this story gets more complicated for those who wish to use rez as a python API. With the vendored solution, you have a guarantee that providing access to the rez python package via the PYTHONPATH will not introduce Rez's third-party python libraries, e.g. yaml, in a way that could cause conflicts with your own third-party requirements. This is a nice guarantee to have for such a low-level tool. Providing access to rez's python dependencies via rez-pip is not a valid solution, because in order to detect/prevent conflicts, a studio would need to adopt rez-pip for their own tooling and, frankly, rez-pip is an abomination (any solution that could add potentially hundreds of entries to the PYTHONPATH is a non-starter for performance reasons; also, Windows has a very real maximum on env-var length, and python's platform compatibility tags can get very complex when it comes to platform, implementation, and ABI compatibility, and trying to reverse engineer them using Rez concepts is a Sisyphean task). I think that unvendoring will force you into a position where you, as the authors of Rez, are forced to prescribe how the users of Rez ought to handle python dependencies (in order to provide access to rez as a python API), and I think you may want to carefully consider if you want to insert yourself into that debate. As one of the lowest level tools in a studio, Rez is special -- it needs to keep a low profile and avoid creating or complicating the types of problems it's designed to solve.

Personally, I think the best choice is an automated vendoring solution like the one proposed by @maxnbk, so that you can achieve your OpenSSF goal rapidly without painfully rediscovering why vendoring was introduced in the first place. If you choose not to vendor, that's fine, just make sure you carefully consider the API scenario presented above.

@chadrik
Contributor

chadrik commented Mar 21, 2024

By the way, the policy and rationale for pip’s use of vendoring is probably relevant here: https://pip.pypa.io/en/stable/development/vendoring-policy/#

@instinct-vfx
Contributor

Sorry for not following up earlier. Will give some feedback today.

@instinct-vfx
Contributor

I have always been pro-unvendoring (or contra-vendoring?). The current status is a testament to how bad things get. The most recent (!) vendored version is 4 years old. For some of the dependencies we do not even really know which version we have, nor whether we may have modified them.

For $current_job, we isolate our rez runtime environment completely. But even if you don't, would you not at least use a virtualenv? Why would that force us to prescribe handling of dependencies? It is a virtualenv separate from whatever you would need at the system level or in other runtime environments.
I actually feel it is the other way around right now: I have no choice but to use old vendored versions (that might get flagged by security scans and, worst case, force the removal of rez altogether). With the dependencies documented as python dependencies we also get to use dependabot, for example.

re: requirements.txt: For some of our projects we use requirements.in -> requirements.txt compilation to manage "general restrictions for requirements vs. version pins for releases". That seems to work quite well.
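For reference, a minimal sketch of that workflow using pip-tools' pip-compile (the package name and bounds below are placeholders, not a proposed dependency list):

    # requirements.in -- loose, human-maintained constraints (hypothetical)
    PyYAML>=6.0,<7

    # compile to a fully pinned requirements.txt for a release
    pip-compile requirements.in --output-file requirements.txt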

re: air gapped scenarios: My take has always been that there needs to be some kind of process already in place if you are in an air-gapped scenario, and my expectation would be for studios to use their chosen process (we are not air-gapped, so this is pure speculation on my behalf!). We also use Nexus as a PyPI proxy/cache, which we use to host internal packages privately as well, for example.

Sorry for the brevity, this week is wild here. Also keep in mind this is from a mostly Windows perspective, so I might be missing Linux specifics completely.

@maxnbk
Contributor

maxnbk commented Mar 21, 2024

Just to be clear, nothing says that an automated-vendoring approach wouldn't also use dependabot-updatable requirements files. I don't see that as an entirely valid argument either for or against; we want dependabot to work and flag issues, but neither approach (other than leaving things as they are) has to go without that aspect.

I could imagine a world where we could ship an "all dependencies vendored-in" package, and a "some assembly required, but you can tweak the requirements yourselves or bring in your own PyPI cache or whatever" package. (Which is really just the source, but that's semantics.)

@JeanChristopheMorinPerso
Member Author

Thank you so much @chadrik for taking some time to give us your perspective on this!

I'll take some time to reply to @instinct-vfx and @chadrik later, but

I could imagine a world where we could ship an "all dependencies vendored-in" package, and a "some assembly required, but you can tweak the requirements yourselves or bring in your own PyPI cache or whatever" package. (Which is really just the source, but that's semantics.)

is kind of what I was also starting to think about.

@maxnbk
Contributor

maxnbk commented Mar 21, 2024

I'm also not married to the "all deps vendored" package being anything more than a matrix of platform/arch/OS venvs that get built and shipped out from the same requirements, if it has to be that. Automation can still be what we get at the end of the day. Of course, it'd be a lot easier if we can still rely on locking ourselves to ABI-agnostic dependencies.

@dbr
Contributor

dbr commented Aug 9, 2024

Two thoughts:

1: We have a way to download files to the locked-down production network - but not an easy way to mirror an arbitrary set of PyPI dependencies.

So it is meaningfully easier to grab the rez source tarball and be ready to install it, compared to grabbing the rez source and finding/downloading 19 separate packages at compatible versions for the vendored dependencies.

I suspect this is not uncommon - it's been the case in at least two places I've worked.

Something like Gaffer's dependencies repo might be a good middle ground - where the packages can be referenced in a way compatible with dependency scans, but a known-good set of dependencies is also packaged into a single download.

2: A bunch of the dependencies (which are nicely listed here) look like they should maybe be removed, rather than unvendored, e.g.:

  • six should be unneeded now that Python 2 support is dropped
  • enum is also included in Python 3.4 onward
  • atomicwrites is unmaintained, and from its README it seems like it can now be implemented with os.rename (see the sketch after this list)
  • pydot seems like it could potentially be replaced with a small helper function (removing the pyparsing dependency as well) - as, based on a very brief search, it only seems to be used in graph_utils.py to essentially call dot tempfile.dot -o output.png (also covered in the sketch below)
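Something along these lines, for illustration only - the helper names are hypothetical and this is not the code currently in graph_utils.py, just a sketch of what replacing atomicwrites and pydot could look like:

    import os
    import subprocess
    import tempfile

    def atomic_write(path, data):
        # Write to a temp file in the destination directory, then rename it
        # over the target; os.replace is atomic on both POSIX and Windows.
        dirname = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=dirname)
        try:
            with os.fdopen(fd, "w") as f:
                f.write(data)
            os.replace(tmp, path)
        except Exception:
            os.remove(tmp)
            raise

    def write_dot_image(dot_source, out_path, fmt="png"):
        # Render graphviz source by piping it to the 'dot' binary directly,
        # instead of going through pydot/pyparsing.
        subprocess.run(
            ["dot", "-T" + fmt, "-o", out_path],
            input=dot_source.encode("utf-8"),
            check=True,
        )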

It's also hard to tell from that list which are dependencies-of-dependencies (e.g. I think six is not used directly; it's unclear what requires it), and how that situation would change when updating to newer versions.
