Proposal: pipenv patterns and antipatterns for python library project #1911
Some references to: …
I love this too. Maybe we should add this to Pipenv’s documentation, or even the Python Packaging User Guide.
The corollary of the above advice appears to be "forego deterministic/reproducible CI builds", which strikes me as a very large anti-pattern. What are you proposing as an alternative which would still allow for determinism?
@tsiq-oliverc Deterministic builds have their place the moment an application is to be built. Imagine the following attempt to perform really deterministic builds of a python library: …

This is a lot of extra effort. And what you get is a library which will be installed in a different context (e.g. a week later a standard installation will pick up an upgraded dependency or two) and which will not gain anything from the fact you used `pipenv`. The conflict is in the fact that the library must never define strict dependencies inside. If you think there is another alternative to gain deterministic builds for a python library - describe it.
@vlcinsky - If a consumer of your library uses different versions of dependencies, etc. then that's out of your control. So I agree there's no feasible way for a library maintainer to manage that. But the goal here is presumably much smaller scope. In particular, I'd see the goals for a library maintainer as the following (which are roughly equivalences):
If any of those three things don't hold, it strikes me as antithetical to quality control. So yes, I'd say that if you guarantee to support Python variants A, B and C to your consumer, and they behave differently enough that one lockfile (etc.) doesn't cut it, then you should have three lockfiles (or whatever). I haven't used Pipenv enough to know how easy that would be in practice, though.
I'm currently considering adding `pipenv` to one of our library projects. I absolutely need the dependency locking (+hashing) for complying with company-wide security guidelines, and I currently don't need to test with different Python versions, since there's only one that's officially supported. And the fact that pipenv simplifies setting up a local development environment, including the virtualenv, is a nice side-effect.
This is not universally true. In the world of enterprise software, you still have very specific environments that are officially supported and a security issue in a dependency results in your product being updated rather than the customer updating the dependency themselves. (Yes, I'm talking about a library, not an application here...)
@Moritz90 your scenario is for a python library in an enterprise environment, and there the rules may differ. My description is aiming at general python libraries such as `flask` or `requests`. Getting deterministic builds is a good thing, but it incurs costs. And if done wrong, you may pay extra for a lower quality result - and this is what I wanted to prevent.
I’d argue this is one of the instances where we don’t want the builds to be absolutely deterministic. If you don’t pin your dependencies, your CI catches incompatibilities with new dependency releases as soon as they appear.
@uranusjr - Sure. I agree that if the desire is "non-deterministic builds", then the advice up top may well make sense. In fact, it's almost a logical equivalence, and could be stated much more succinctly: "If you don't want deterministic builds, then don't use a tool (`pipenv`) whose purpose is deterministic builds." But that's certainly not a desirable goal in general.
@tsiq-oliverc nice scope definition - it supports focused discussion. I would add one more requirement: the CI determinism shall not hide possible issues within the tested library. If we test against a frozen `Pipfile.lock`, issues in the library's own (abstract) dependency declarations may stay hidden. To me it seems more important to detect issues within a library than to run CI in a deterministic way. If there is a way to do both (e.g. running the test behind a private pypi index, which could also support determinism) I have no problem, but if there is a conflict, I have my priorities. Do not take me wrong: there is no desire to run non-deterministic builds; my desire is to run CI builds which will detect as many issues as possible.
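To make the hidden-issue point concrete, here is a sketch (package names are made up) of how a frozen lockfile can mask a missing `install_requires` entry:

```
$ # mylib's setup.py forgot to list requests in install_requires,
$ # but an earlier `pipenv install requests` left it in Pipfile.lock
$ pipenv sync --dev          # installs everything from Pipfile.lock, requests included
$ pipenv run pytest          # green - the missing dependency stays invisible
$ # a user installing the library the standard way is less lucky:
$ pip install mylib          # does NOT pull in requests
$ python -c "import mylib"   # ImportError: No module named requests
```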
@vlcinsky Sure, I just wanted to share my experience to make sure that the updated documentation reflects it as well. The current documentation does a great job at explaining the tradeoffs:
(Highlighted the part that applies in my case.) I just want to make sure it stays that way. I think your original post contains too many blanket statements without a disclaimer that you're talking about an open-source project that's going to be published on PyPI.
@Moritz90 I completely agree. I was trying to highlight that focus but I can make it even more visible.
@Moritz90 I added an introductory note reflecting your comment.
@vlcinsky - That makes sense. I understand that you don't explicitly want non-deterministic builds, but I think that it's unavoidably equivalent to what you do want (i.e. to catch issues when your upstream dependencies update). Thinking out loud, what's the best way to resolve these two conflicting goals? One possibility is to have a two-phase CI process:

1. A deterministic phase, which runs the tests against the committed/approved lockfile.
2. A canary phase, which re-resolves the abstract dependencies to their latest allowed versions and runs the tests again.
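A minimal sketch of what such a two-phase job could run (assuming a committed `Pipfile.lock` and `pytest` as the test runner):

```
$ # phase 1: deterministic - reproduce the approved environment exactly
$ pipenv sync --dev
$ pipenv run pytest

$ # phase 2: canary - re-resolve from abstract dependencies and retest
$ pipenv lock
$ pipenv sync --dev
$ pipenv run pytest
```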
@tsiq-oliverc To get deterministic builds, I would think of the following setup:

**Build pypi cache job**

```
$ git clone <repo_url> <project_dir>
$ cd <project_dir>
$ pip install pipenv
$ # clean pypi cache and make it ready to cache somehow - not described here
$ pipenv install -e .[test]
$ # if we need extra testing packages in pipenv
$ pipenv install <extra_test_packages>
$ # record current requirements expressed in Pipfile.lock
$ pipenv lock
$ # if needed, record the Pipfile.lock somewhere
```

Outputs of such a job are: …

**Library testing job**

There are phases: …

**What we get**

…

Another advantage is, this setup does not require developers to maintain `Pipfile.lock`.

**What is still missing (and can be done)**

The pypi cache is the part which needs some research. I guess a simple directory would be sufficient, and maybe a local pypi index server would serve well.
As a package that does dependency resolution, many of our own tests rely on deterministic builds — that is, taking known stuff and expecting a resolved graph. We use a fixed, known package index for that.

Love the lively discussion on this topic. I think the nuance is important, and you should always test against known dependencies as well as unpinned ones.
I second this suggestion. It's a good idea to always have an explicit "known good state" for reproducible builds and to simplify debugging in case an update breaks something, in addition to making sure that newer minor/bugfix versions work as well. (In my very personal opinion, the ideal situation would be that the package manager installs the latest minor versions by default so that libraries can always specify the concrete dependency versions that they were tested with, but I realize that's a highly controversial opinion and requires everyone to follow semver.)
@Moritz90 @techalchemy @uranusjr @tsiq-oliverc Here is my summary from the previous discussion.

**Particular problems and proposed solutions**

Many execution contexts - who shall maintain the `Pipfile.lock`(s)? …
If the goal here is to update the advice in the docs, then honestly it feels irresponsible to say something dramatically different from "Follow best practice (reproducible builds) by default, until you have no choice."
@vlcinsky Under the headline "Mode: Generate and seal", it might make sense to mention that the last successful build's `Pipfile.lock` should be kept (e.g. as a CI artifact), so a known-good environment can be restored when a fresh resolution breaks the build.
The more I think about it, the more I feel like this documentation will become a section on why using `pipenv` in a library project is mostly a bad idea.
@tsiq-oliverc the vast majority of general python packages are in mode "Run, Forrest, Run". I have helped a few of these packages with introducing `tox`. Now there is another great tool and I wonder how to use it properly in this context. What would I say to the Flask project? …

The goal is to find a functional working style. If it ends in the doc, nice; if not, no problem.
@vlcinsky I'd say (1) and (4) should be the recommendation for such projects. While without a pre-existing `Pipfile.lock` the very first build will not be reproducible, all subsequent ones can be. Edit: The tl;dr version of my recommendation would be: …
(Of course, the actual documentation should have a bit more detail and examples.)
@Moritz90 I modified the "Generate and Seal" section as you proposed. Re (1): easy to say, impossible to execute without being more specific. Re (4): yes, I also think that "Generate and Seal" is the most feasible mode. But in the case of Flask I would not dare (at least not at the moment). Re a pre-existing `Pipfile.lock`: I guess such a workflow would be more secure compared to someone creating it (semi)manually. But my knowledge of the enterprise environment is very limited.
Hello pipenv team. I do share a lot of what is said in this text; it helps any developer better understand the limitations of Pipfile/pipenv when developing a library. I do want to see this text, or part of it, integrated inside the official pipenv documentation. I do have the following amendment I would like to discuss:

For our internal python packages, fully reusable, published on our internal pypi, etc., and even for my own python packages (ex: cfgtree, txrwlock, pipenv-to-requirements), I use a package that some may already know or even use, that abstracts these details and makes the life of a python developer easier: PBR. I work on support of Pipfile for PBR, so that it will be able to read the abstract dependencies from the `Pipfile` directly. I do not know if other similar packages exist, because it also does other things people might not want (version from git history, auto generation of AUTHORS and ChangeLog). But in the end, I really feel it is so much easier to write, maintain and handle versioning of a Python library that I would be sad not to share this experience. I am promoting it as the "recommended" way of writing modern python libraries in my company. I do reckon that it is like "cheating" on all the difficulties about libraries and pipenv, but in the end the work is done and developers are happy to use it so far.

Part of the python training I am giving to new python developers in my company involves first writing a python library, maintaining its dependency declarations and packaging. Part of the reason to declare the library's dependencies using a dedicated file also for libraries is to be able to use tools such as readthedocs or pyup (even if pyup makes more sense when linked to an application). I do not necessarily want to promote this method as the "standard" way of doing python packages - it is actually the "OpenStack" way - but I would like to share my experience, and if others have similar or contradictory experiences, I'll be happy to hear them and update my point of view.

Team, what do you think of a kind of "community" section in the documentation? So that users like me can share their experience on how they use pipenv, without necessarily the full endorsement of the pipenv team?

PS: I can move this to a dedicated issue if you do not want to pollute this thread
@vlcinsky (1) is very easy to execute - put your lockfile in your repo. I think what you instead mean is: it's impossible to give specific advice once this basic strategy is no longer sufficient. That's certainly true, but that's because the specific problem probably differs on a case-by-case basis. Or to put it another way, the solution depends on what additional guarantees you want your CI workflow to provide.
@gsemet you know what? All my python packages created in the last two years are based on `pbr`. In the case of this issue (searching for `pipenv` patterns and antipatterns for a python library project) I kept `pbr` out of scope, as it would make the discussion too broad.

On the other hand, I am really looking forward to a recipe of yours for pbr-lovers. I will read it.
See the summary above. I think that one `Pipfile.lock` cannot cover all the execution contexts a general library has to support.
@vlcinsky I'm not really sure where you want to take this (I'm not sure what kind of PR you're asking for!), so I'm going to bow out of this conversation for now. To restate the TL;DR of my position: …
@uranusjr Got it. Though I don't think there's anything language-specific here, it's simply that different communities have settled on different heuristics for dealing with a problem with no generic solution - if you have version conflicts, you have a problem. Maven/Java (for example) forces you to think about it at build time. The NPM way means you have runtime issues if the mismatched versions cross an interface. Runtime resolution (e.g. Python, dynamic libraries) means that a dependent may crash/etc. if the dependency version is not what it expected.
pbr seems nice and all, but it falls under the category that I was trying to address with this:

> I think such tools shouldn't be necessary in the first place.
When it comes to pypi packages, I ended up using Pipenv for handling dev-dependencies and the virtualenv, while the runtime dependencies stay in `setup.py`. In other modern languages I have to learn one command line tool (package manager) plus one dependency file format. Documentation is in one place and easier to follow, and a newcomer will get all this sorted out in a couple of hours. It's a matter of one tool and one file, rather than half a dozen.
I reiterate my call for a kind of "community" section/wiki for this discussion. There are several "patterns" that are legit, and some of us might want to share their "way of doing python libraries" - some, like me, with pbr; others might have a very good pattern as well. But a page inside the pipenv documentation - not sure if it is a good idea. PS: to prepare migration to the new pypi, you should use twine and not `python setup.py upload`. Using "upload" should be considered an antipattern. Maybe pipenv can grow a "publish" command?
@feluxe You might want to take a look at poetry. I just stumbled across it and it seems that it's what you are looking for. It does what you describe with a single tool and a single file. I wonder if `pipenv` could learn from it.
I want to reiterate myself again before this discussion goes too far. Pipenv cannot simply grow a `publish` command. It may seem almost everyone is on board with this merge, but the truth is there are a lot more people not joining this discussion because things work for them and they are doing something else. I’ve repeatedly said it: discussion about improving the design of toolchains and file formats should happen somewhere higher in the Python packaging hierarchy, so it receives more exposure to people designing more fundamental things that Pipenv relies on. Please take the discussion there. There is no use suggesting it here, because Pipenv is not in the position to change it.
I agree that the discussion on this bug spirals out of control now that packaging and publishing came up (this bug is only about dependency management!), but could you please point us at the right place to have this discussion? People are having it here because pipenv is seen as a much-needed step in the right direction, not because they want to impose additional responsibilities upon the pipenv maintainers. Edit: Sorry, I must have missed the post in which you did exactly that when reading the new comments the first time.
I very much agree with this. We should first figure out what the best possible workflow for library maintainers is right now before we come up with big plans. So let's focus on that again, as we did at the start of this thread. I don't think we've reached a conclusion yet.
Back to topic: Quoting @uranusjr's post about why dependencies should be defined in a different file for libraries:
I still don't see why the official recommendation for libraries, for now, cannot be to use `setup.py` for the abstract dependencies and `pipenv` only for development. And I also don't see why this is an argument against defining your abstract dependencies in the same file that applications use to define their abstract dependencies. It's okay if `pipenv` does not fully support that workflow yet.
See my above post. Could you please elaborate on why you think that? I simply cannot see any downside in principle. Right now, it might be a bad idea to include a `Pipfile` in a library for practical reasons, but that is not fundamental. Note that I've already agreed that … Edit: Also, if it turns out that …
@Moritz90 Several of Python’s mailing lists would be good venues to hold this discussion. pypa-dev is the most definite for discussions centring on Python packaging and the ecosystem around it; I’d probably start there if I were to post a similar discussion. python-ideas is a place to get ideas discussed, and has quite high visibility to the whole Python community. It would also be a good starting point if you want to push this to the PEP level (eventually you would, I think).
@tsiq-oliverc By PR I mean: show an example proving your concept viable. So pick up some existing library, fork it, apply your (1) - you say it shall be easy with `pipenv` - and show the rest of us how it works. If your (2) means "someone else has to do the work", your PR will not exist. In (3) you talk about a "small subset of cases" without giving any real number. Are all the top libraries I described in regards to the number of virtualenvs considered a "small subset"?
To conclude this discussion, I created a short summary of what was found during the discussion. Focus: …
@vlcinsky You have a lock file for reproducible development environments. And when it packages your library, it will use the abstract dependencies (and not the pinned ones) so you keep the flexibility when distributing your package (via PyPI for example). The advantage of this is that it will use abstract dependencies for libraries and the lock file for applications. This is the best of both worlds.
@zface poetry not using pinned dependencies is literally defeating the entire purpose. Pipenv is idempotent, and this requires reproduction of an environment. Please stop using this issue as a platform to try and sell everyone something whose first listed reason to use it over pipenv is that the author doesn't like the cli. At the end of the day, our software is deployed across hundreds of thousands of machines and actually acknowledges and uses the best practices around packaging. If you don't want an idempotent environment and you do want to blur the lines between development and packaging, please don't participate in this discussion, because we are not moving in that direction and it will not be productive.
Essentially, we spend a lot of time and effort on resiliency that small projects making lofty claims don't have to spend, because people aren't hitting their edge cases yet. If you truly believe that another tool offers you the best of all worlds, then I encourage you to use it— pipenv itself is not going to handle packaging for you in the near term, if ever.
@techalchemy I am not selling anything, really. I am merely directing towards ideas that could be used in `pipenv`. And `poetry` does use the pinned dependencies from its lock file when developing the project. The only time it uses abstract dependencies is when it packages the project for distribution (so basically for libraries), since in this case you do not want pinned dependencies.
@vlcinsky There are still a few points that need to be sorted out, corrected, or expanded on, but I am still very keen on this going into documentation form, Pipenv or otherwise. Would you be interested in sending in a pull request? I’d be more than happy to help flesh out the article.
Regarding poetry, I am not personally a fan as a whole, but it does do many correct things. It should probably not be mentioned in Pipenv docs, because it violates a few best practices Pipenv devs want to push people towards, but it should be mentioned if the discussion is held in pypa-dev or similar, to provide a complete picture of how the packaging ecosystem currently is. poetry can also use more attention and contribution; this would be the best for the community, including Pipenv. With viable choices, people can weigh their options instead of going into Pipenv head first and complaining about it not doing what they expect. Good competition between libraries can also spur forward technical improvements on the dependency resolution front, which Pipenv and poetry both do (and neither perfectly). We can learn a lot from each other.
@uranusjr Yes, I think a few things were clarified and deserve sharing with a wider audience. Your assistance is really welcome. What about "pair documentation drafting"? I think that at this moment it would be most effective to work on it on a small scale of two persons only. Things to do are (possibly with one or two iterations): …

If you feel like writing it on your own (based on what was discussed) and having me as a reviewer, I would not complain. I will contact you by e-mail to agree on next actions.
@vlcinsky Also I’m available as `uranusjr` on the usual channels, if e-mail turns out to be inconvenient.
@uranusjr That's what I meant by gathering effort. Python desperately needs a good package manager like cargo. The Python ecosystem pales in comparison with the other languages due to the lack of a standard way to do things. And `pipenv` alone does not fill that gap. What bothers me is that `pipenv` presents itself as the officially recommended packaging tool while it does not do packaging. Also, you say that it was inspired by cargo, npm, yarn, which are packaging tools along with dependency managers, while pipenv is not. And here is the flaw of `pipenv` in my opinion. And when you say:

…
What do you mean? From what I have seen, their dependency manager is much more resilient than the one provided by `pipenv`. Anyway, I think, like you said, that both projects can learn from each other. And, one more thing: what's the future of pipenv if, ultimately, pip handles the `Pipfile`?
If the poetry dependency manager relies on the json api, it’s not only sometimes wrong due to ‘badly published packages’, it’s going to be very limited in what it can actually resolve correctly. The warehouse json api posts the most recent dependencies even if you’re dealing with an old version, and that’s if it has that info at all. We used to incorporate the json api too; it was great because it was fast, but the infrastructure team told us not to trust it. It seems a bit disingenuous to call something resilient if it relies on an unreliable source to start off with. Ultimately the challenges are around actually building a dependency graph, which requires executing a setup file, because currently that’s how packaging works. There is just no way around it. A dependency graph that resolves on my machine may be different from one that resolves on your machine, even for the same package. It’s easy to hand wave and say ‘well, doesn’t that just make pipenv a virtualenv manager if pip can read a pipfile?’ No. Pipenv is a dependency manager. It manages idempotent environments and generates a reproducible lockfile. I realize this must seem trivial to you because you are waving it away and reducing this tool to a virtualenv manager, but it isn’t. We resolve lockfiles and include markers for python versions that you don’t have and aren’t using, and keep that available so that you can precisely deploy and reproduce across platforms and python versions. We use several resolution methods, including handling local wheels and files, vcs repositories (we resolve the graph there too), remote artifacts, pypi packages, private indexes, etc. At the end of the day pip will handle pipfiles; that’s been the plan since the format was created. But that is the same as asking ‘but what about when pip can handle requirements files?’ The question is basically identical. Pip can install that format. It’s not really relevant to any of the functionality I described, other than that we also install the files (using pip, by the way).
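For readers unfamiliar with the format, here is a trimmed `Pipfile.lock` entry illustrating the per-package markers and hashes described above (the hash values are placeholders):

```json
"six": {
    "hashes": [
        "sha256:<hash of the wheel>",
        "sha256:<hash of the sdist>"
    ],
    "markers": "python_version >= '2.6'",
    "version": "==1.11.0"
}
```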
This is just plain wrong. You can get the dependencies of a specific version by calling `https://pypi.org/pypi/<package>/<version>/json`. And the packaging/publishing part of python projects really needs to be improved, because in the end it will benefit everyone, since it will make it possible to use the JSON API reliably.
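For what it's worth, a minimal sketch of querying that per-version endpoint (the `requires_dist` field is exactly the metadata whose reliability is being debated; `requests` is used here merely as an example package and client):

```python
import requests

# Warehouse JSON API, per-version metadata endpoint
url = "https://pypi.org/pypi/requests/2.18.4/json"
info = requests.get(url).json()["info"]

# abstract dependencies as declared in the published metadata (may be None)
print(info["requires_dist"])
```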
And so does `poetry`.
@zface I will say this one final time: please take this to somewhere higher in the hierarchy. Pipenv does not self-proclaim to be the officially recommended Python packaging tool; it says that because it is. If you feel that is inappropriate, tell it to the officials that recommend Pipenv. Please do not put these things on Pipenv devs. This is the wrong place to complain, and you cannot possibly get resolutions for your complaints here. You can also get better answers to the technical questions you have there. This is an issue tracker for Pipenv, not a discussion board for Python packaging tools and how Python packaging is done.
Pipenv doesn't just rely on pip-tools for resolution; please stop reducing our software to one-liners that demonstrate a lack of understanding. I know very well how the PyPI api works; I talked directly to the team that implemented it.

This kind of attitude is not welcome here. Do not assume we don't understand what we are talking about. Please practice courtesy.

Pipenv does not currently flatten dependency graphs. Pointing to one specific issue where a tree has been flattened and claiming the entire tool is therefore both better and more resilient is foolish; you are proving over and over again that you are simply here to insult pipenv and promote poetry. Please be on your way; this behavior is not welcome.
I agree the discussion is way off-topic for this issue, which was trying to capture the "good practices" around pipenv. However, …
I share this opinion: getting new developers to successfully package their own Python code is actually complex, too complex, and requires reading way too much online documentation. And pipenv (and probably poetry) is a very good step forward. Having to maintain `Pipfile` on one side and `setup.py`/`requirements.txt` on the other is duplicated effort. There should be a way of extracting just the part that does the dependency resolution and locking. So we would be able to have the better of each world: …
This is another project that would not be related to `pipenv`.
@gsemet From my understanding, PyPA has been trying to fill that gap with pyproject.toml instead, led by flit. You’ll need to talk to them first (at pypa-dev or distutils-sig) about this before proceeding to use Pipfile as the source format. As for parsing Pipfile (and the lock file), that is handled in pypa/pipfile (which Pipenv vendors to provide the core parsing logic). Edit: Please drop me a message if you decide to start a discussion about this in either mailing list. I do have some ideas on how we can bring the two parts of Python packaging distribution together.
I must admit I am a bit sad seeing dependencies declared in `pyproject.toml` while the `Pipfile` format also exists. Thanks for the pointer to flit and pipfile. There is also Kenneth Reitz's pipenvlib, which seems lighter. PBR's `setup.cfg` support seems more complete compared to the official documentation (ex: …).
Hacking `maya` I learned a few lessons which resulted in my following proposal of recommended usage of `pipenv` in python libraries. I expect others to review the proposal and, if we reach agreement, the (updated) text could end up in `pipenv` docs.

# `pipenv` patterns and antipatterns for python library project

EDIT: The following is best applicable for general (mostly Open Source) python libraries, which are supposed to run on different python versions and OSes. Libraries developed in a strict Enterprise environment may be a different case (be sure to review all the Problems sections anyway). END OF EDIT

TL;DR: Adding `pipenv` files into a python library project is likely to introduce extra complexity and can hide some errors while not adding anything to library security. For this reason, keep `Pipfile`, `Pipfile.lock` and `.env` out of library source control. You will be able to use the full power of `pipenv` regardless of its files living in `.gitignore`.
## Python library versus python application

By python library I mean a project, typically having a `setup.py`, being targeted for distribution and usage on various platforms differing in python version and/or OS. Examples being `maya`, `requests`, `flask` etc.

On the other side (not a python library) there are applications targeted at a specific python interpreter and OS, often being deployed in a strictly consistent environment.

`pipfile` describes these differences very well in its Pipfile vs setup.py.

## What is `pipenv` (deployment tool)

I completely agree with the statement that `pipenv` is a deployment tool, as it allows one to:

- define a deterministic environment (`Pipfile.lock`) for deployment of a virtual environment

It helps when one has to deploy an application or develop in a python environment very consistent across multiple developers.

To call `pipenv` a packaging tool is misleading if one expects it to create python libraries or to be deeply involved in their creation. Yes, `pipenv` can help a lot (in local development of libraries) but can possibly harm (often in CI tests when used without deeper thought).

## Applying "security reasons" in wrong context
TL;DR: `pipenv` provides a secure environment via applying approved concrete dependencies described in the `Pipfile.lock` file, while a python library is only allowed to define abstract dependencies (and thus cannot provide a `Pipfile.lock`).

`pipenv` shines in deployment scenarios following these steps:

- declare abstract dependencies (`Pipfile`)
- generate concrete dependencies from them (`Pipfile.lock`)
- bless the `Pipfile.lock` as the definition of the approved python environment
- use `pipenv sync` to apply "the golden" `Pipfile.lock` elsewhere, getting an identical python environment (a console sketch of this flow follows)
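For illustration, a minimal sketch of that deployment flow (the package name is arbitrary):

```
$ # declare abstract dependencies and resolve them into concrete ones
$ pipenv install requests
$ pipenv lock        # (re)generates Pipfile.lock with pinned versions and hashes
$ # review, approve and commit Pipfile.lock ("bless" it)
$ git add Pipfile Pipfile.lock && git commit -m "approved environment"
$ # ... later, on the deployment target ...
$ pipenv sync        # installs exactly what Pipfile.lock prescribes
```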
With development of a python library one cannot achieve such security, because libraries must not define concrete dependencies. Breaking this rule (thus trying to declare concrete dependencies in a python library) results in problems such as:

### Problem: Hiding broken `setup.py` defined dependencies

`setup.py` shall define all abstract dependencies via `install_requires`. If `Pipfile` defines those dependencies too, it may easily hide problems such as:

- a dependency missing in `install_requires`
- `Pipfile` defining specific rules (version ranges etc.) for a dependency while `install_requires` does not

To prevent it, follow these rules:

- a library shall define its abstract dependencies solely in `setup.py` `install_requires`, not in `Pipfile`
- the `[packages]` section in `Pipfile` shall be either empty or define only a single dependency on the library itself (a sketch of such a `Pipfile` follows)
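To make the second rule concrete, a sketch of a library's `Pipfile` kept this way (the package name is hypothetical; the dev tools are examples):

```toml
[packages]
# either empty, or just the library itself in editable mode:
mylib = {path = ".", editable = true}

[dev-packages]
pytest = "*"
tox = "*"
```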
### Problem: `Pipfile.lock` in repository

Keeping `Pipfile.lock` (typically for "security reasons") in a library repository is wrong, because:

- …

To prevent it, one should:

- remove `Pipfile.lock` from the repository and add it into `.gitignore`
### Problem: Competing with `tox` (hiding `usedevelop`)

If `tox.ini` contains in its `commands` section entries such as:

- `pipenv install`
- `pipenv install --dev`
- `pipenv lock`

it is often a problem, because:

- `pipenv install` shall install only the library itself, and `tox` is (by default) doing it too. Apart from the duplicity, it also prevents combining `usedevelop=True` and `usedevelop=False` in `tox.ini`, because `Pipenv` is able to express it only in one variant (while `tox.ini` allows differences across environments).

To prevent it, one should:

- avoid using `pipenv` in `tox.ini`; see requests' `tox.ini` (a sketch without `pipenv` follows)
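For illustration, a minimal `tox.ini` along those lines (the env list and the extras name are assumptions; `extras` needs a reasonably recent tox):

```ini
[tox]
envlist = py27, py36

[testenv]
usedevelop = True
# pull test dependencies from setup.py extras instead of pipenv
extras = tests
commands = pytest {posargs}
```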
### Problem: Breaking builds, if `pipenv` fails

`pipenv` is under heavy development and things break sometimes. If such an issue breaks your CI build, there is a failure which could have been prevented by not using `pipenv` and using traditional tools (which are often a bit more mature).

To prevent it, one should:

- think twice before adding `pipenv` into a CI build script, `tox.ini` or similar place. Do you know what value you get from adding it? Could the job be done with existing tooling?

## Summary
Key questions regarding the role of `pipenv` in development of a python library are:

- Q: What value does `pipenv` really bring? A: It is a virtualenv management tool.
- Q: What is the proper role of `pipenv` here? A: Manage the virtualenv.

A few more details and tricks follow.
### `pipenv` will not add any security to your package

Do not push it into a project just because everybody does it or because you expect extra security. It will disappoint you.

Securing by using concrete (and approved) dependencies shall take place in a later phase, in the application going to use your library.
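To illustrate where the pinning belongs, a sketch of the consuming application's side (library name and version are made up):

```toml
# the APPLICATION's Pipfile - this is where concrete pinning pays off
[packages]
mylib = "==1.2.3"
```

Running `pipenv lock` there freezes the whole transitive graph (with hashes), which is exactly the security the library itself cannot provide.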
### Keep `Pipfile`, `Pipfile.lock` and `.env` files out of repository

Put the files into `.gitignore`.

`Pipfile` is easy to recreate, as demonstrated below, because most or all requirements are already defined in your `setup.py`. And the `.env` file probably contains private information which shall not be shared.

Keeping these files out of the repository will prevent all the problems which may happen with CI builds when using `pipenv` in situations which are not appropriate.
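For completeness, the corresponding `.gitignore` entries are simply:

```
Pipfile
Pipfile.lock
.env
```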
### `pipenv` as developer's private toolbox

`pipenv` may simplify a developer's work as a virtualenv management tool.

The trick is to learn how to quickly recreate your (private) `pipenv` related files; a sketch follows at the end of this section.

Use the `.env` file if you need a convenient method for setting up environment variables.

Remember: keep `pipenv` usage out of your CI builds and your life will be simpler.
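A minimal sketch of recreating the files from scratch (the `tests` extra and the concrete dev tool are assumptions; see the trick below):

```
$ # Pipfile + virtualenv built from what setup.py already declares
$ pipenv install -e .[tests]
$ # any personal extras on top
$ pipenv install --dev ipython
$ # private environment variables, never committed
$ echo 'MY_PRIVATE_VAR=secret' > .env
```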
### Trick: Use `setup.py` ability to declare extras dependencies

In your `setup.py`, use the `extras_require` section:
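A minimal sketch (the package name and the concrete dependencies are illustrative):

```python
from setuptools import setup

setup(
    name="mylib",
    install_requires=["requests"],   # abstract runtime dependencies
    extras_require={
        "tests": ["pytest", "tox"],  # dependencies needed only for testing
    },
)
```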
To install all dependencies declared for the `tests` extra:
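For example, with the project root as the current directory:

```
$ pipenv install -e .[tests]
```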
Note that it will always include the `install_requires` dependencies.

This method does not allow splitting dependencies into default and dev sections, but this shall not be a real problem in the expected scenarios.