Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve virtualenv support & egg-link resolution #562

Closed
wants to merge 2 commits into from

Conversation

immerrr
Copy link
Contributor

@immerrr immerrr commented Apr 5, 2015

AFAIU, it is generally considered ok to mix jedi's own sys.path and virtualenv paths, but recently I have run into a conflict between setuptools/distribute found in:

  • jedi's virtualenv
  • user site directory ($HOME/.local/lib/pythonX.X/site-packages)
  • virtualenv I was actually using

This conflict caused problems in resolving something as simple as setuptools.setup and made me rethink module resolution towards "purity".

This PR adds the following:

  • add sys_path= kwarg to Script & Evaluator constructors
  • store sys_path for each evaluator instance (closes Suggestion: keep sys.path per Evaluator instance #384)
  • add get_venv_path(venv) function to determine virtualenv's path:
    • by default, dump path from live python interpreter
    • use old path extension variant for fallback
    • in old variant, use addsitedir to load .pth extension files (closes .pth status #443)
  • look for egg-link files in all directories in path

Several discussion questions here:

  • starting python interpreter from the virtualenv is the most precise way to know its sys.path, but are there drawbacks besides process startup overhead? e.g. should we clean PYTHONSTARTUP env var?
  • if running python interpreter is ok, how do we test it?
  • as path resolution grows more and more heavy (.pth, egg-links, now subprocesses), do we need to cache it somehow? and for a long-running jedi process, like ycmd, how do we flush the cache to reflect package installation/removal.
  • is there a use case for removing site.USER_SITE from lookups, too?

@immerrr
Copy link
Contributor Author

immerrr commented Apr 5, 2015

I have tried to use this with emacs and it turns out that storing sys_path per Script (or Evaluator) instances is not very useful as long as Script objects are not persistent. Otherwise one might as well just calculate the sys_path to store it in their IDE of choice and send it with the regular completion/goto-definition requests.

Another alternative would be to make Script objects stateful, enable changing source piecewise (e.g. script.set_source(begin, end, text) and run completion as completions(line, column). But something tells me that avoiding state was one of the initial design decisions.

@immerrr
Copy link
Contributor Author

immerrr commented Apr 6, 2015

I have encountered #539 which uses the same "live interpreter" approach and see it is turned down because of non-cross-version-safe imports happening in jedi. Ping me if you need this venv path detection scheme removed.

@davidhalter
Copy link
Owner

Wow! That's quite a lot! Thanks for all the work that went into this!

I really like the general direction your pushing this. Please give me some time to review it.

if sys_path is None:
venv = os.getenv('VIRTUAL_ENV')
if venv:
sys_path = list(get_venv_path(venv)) + sys.path
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if VIRTUAL_ENV is set, why is it not enough to just take the current sys.path? I have never really understood this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to have jedi installed somewhere once per system (or per user) and to have it look into various virtual environments without installing jedi in all of these environments. In Emacs one of the easiest ways to accomplish this (without pulling virtualenv names through all the APIs) is to launch jedi process with this extra environment variable whenever I change the venv I'm working on.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you change the virtualenv variable, wouldn't the sys.path also change automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. AFAIR, sys.path is generated once at the startup and the virtualenv magic really relies on looking up candidate directories relative to the python interpreter location. Here's an article that probably can explain it better than I can (especially, within the scope of a comment): http://mikeboers.com/blog/2014/05/23/where-does-the-sys-path-start

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I've read it, very good article! I think I finally understand why site.py exists and what it does.

If you start a new process with Jedi in an activated virtualenv, wouldn't we see the correct sys.path? Are you talking about a changing just the VIRTUAL_ENV variable without restarting the process?

In general, have you read the idea of #385, to make virtualenv support easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this some thought at the time and I couldn't find a way around the client being able to supply the requested sys.path to Jedi in public API. Now, to my liking, full sys_path is too long for a proper interface parameter and most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better. For example, we could add venvs=[path1, path2] to use those venvs and extra_sys_path=[path1, path2] just in case someone wants, say, to additionally use some source code checkouts without installing them into the said venvs.

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic:
it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards, but as long as jedi has no external dependencies, we can just start up PATH/TO/VENV/bin/python PATH/TO/JEDI/portable.py with portable.py adding its location to sys.path and then running jedi as usual. That way it will have more binary compatibility with code in the venv.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better.

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic: it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards,

Hmm I know what you mean. I personally think that adding sys_path as an API is really valuable, because some people really want to play with it. Most of the time this is not the case and we should focus on what happens when it is None. This still includes virtual envs. However, I agree totally with the portable.py idea. I think Jedi should be started with just the virtualenv as its path. This is what I'm talking about when I talk about Jedi's future async server/client structure. You name a certain virtualenv and Jedi starts a new process that includes that virtualenv. This way, we don't have segfaults and all other crazy stuff...

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

~ Dave

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

Yes, that was a mistake. Some time ago I tried conserving precious SSD space using a virtualenv for project dependencies and another virtualenv for common tools, like ipython, and pudb, and profilers, and whatnot. But then I realized that with conda you don't need that as it hard-links packages where possible.

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

I am not completely against ditching this PR altogether, although I do think that this PR offers better support of the current Jedi workflow. It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon. But if it is and there's no time window during which this PR can be useful then feel free to close it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon.

Unfortunately and probably true :( Can you quote yourself from earlier how this PR improves virtualenvs? I still don't see it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you quote yourself from earlier how this PR improves virtualenvs?

  • fetching sys_path from a live interpreter in the specified virtualenv, if possible
  • if online fetching fails, offline detector now additionally follows .pth links (cannot seem to find the addsitedir invocation in the old code)
  • egg-link files are consulted in all new path directories, not just the site-packages one

Ok. I don't see the advantage from fetching it from a different interpreter (ecept for pth stuff). However, I feel that pth is something we shouldn't really execute like this, because it can contain arbitrary Python code and I'm not sure if I want to execute that (addsitedir does this).

What are "new path directories"? I'm strongly for splitting up this PR, because it tries to change too many things at once. We also needs tests. It would be way less messy, then.

@davidhalter
Copy link
Owner

I don't see how this solves #443. Can you explain?

starting python interpreter from the virtualenv is the most precise way to know its sys.path, but are there drawbacks besides process startup overhead? e.g. should we clean PYTHONSTARTUP env var?

Good question, I think it's ok as it is. The overhead shouldn't be that crazy - let's at least try. If the overhead is too big, we could work with caches that are deleted over time and take VIRTUAL_ENV as a key. I would actually delete the PYTHONSTARTUP var, I think that would make sense.

if running python interpreter is ok, how do we test it?

Well, good question. Virtualenv stuff hasn't been really tested anyways. I guess we should start testing it. Those tests might be a bit "slow", but as long as there's an easy switch (py.test -k or --ignore), that's not a big issue. You could setup a virtualenv with code or just fake it (with a few symlinks). Whatever you prefer.

as path resolution grows more and more heavy (.pth, egg-links, now subprocesses), do we need to cache it somehow? and for a long-running jedi process, like ycmd, how do we flush the cache to reflect package installation/removal.

As written above, Jedi supports time caches. However, I stopped caching stuff without actually profiling it first. So if you think that this really takes quite a lot of time, try to profile it first. Get a good use case and then we can actually work on improving the speed there.

My intuition is that actually checking all of that is not as time consuming as you might think it is. The big parts are probably still parsing and evaluation of the tree, but I might be wrong.

is there a use case for removing site.USER_SITE from lookups, too?

I don't see a reason for that.

@immerrr
Copy link
Contributor Author

immerrr commented Apr 8, 2015

I don't see how this solves #443. Can you explain?

When using live interpreter, the interpreter's startup procedure will resolve .pth links as it happens for the regular Python scripts. For offline detection, this section is an approximation of what happens during startup.

Well, good question. Virtualenv stuff hasn't been really tested anyways. I guess we should start testing it.

I'm not sure I understand jedi/travis testing quite well, is it OK if I ask you to address the testing infrastructure (UPD: and I will do the tests themselves) ?

re:drop PYTHONSTARTUP

ok, will do

re:caching

The problem with caching this stuff is that the cache will never know when I install or remove packages into the environment, so there has to be an API to reset it.

I was looking at this from emacs-jedi perspective and couldn't find a better solution than just calculating sys.path once and then pass it for every jedi RPC command. Basically, RPC-wise it would be nice to be able to create a named piece of configuration via API (e.g. configs["virtualenv=MYVENV"] = get_venv_path('MYVENV') to live with the server and refer to it in future API requests, however it feels more like a problem for Jedi's RPC servers, not for Jedi itself. I wonder if we could do this in ycmd API...

@davidhalter davidhalter reopened this Apr 30, 2015
@davidhalter
Copy link
Owner

I'm not sure I understand jedi/travis testing quite well, is it OK if I ask you to address the testing infrastructure (UPD: and I will do the tests themselves) ?

What do you mean by that? I'm certainly willing to help, I just don't know what you mean :)

I wonder if we could do this in ycmd API...

That would certainly be an option! I think at the same time it would also be nice to have something lightweight, pure Python.

"""Get sys.path for specified virtual environment."""
try:
sys_path = _get_venv_path_online(venv)
except Exception as e:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. Exceptions should not be catched like that. Only catch what you really know can go wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this catch-all expression is that whatever prevents this sys.path detection heuristic from working should not prevent normal jedi functioning and only report the error as a warning (which is done one line below that) as "An indication that something unexpected happened... The software is still working as expected."

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is: Whenever you do something like this, it's really hard to tell what the issue is from the outside. That's why I never do this.

@cjrh
Copy link

cjrh commented Jun 30, 2015

I've made vim-conda which modifies $PATH (for running code) and sys.path (for jedi-vim completion) depending on the selected conda environment.

I would much prefer to pass in a particular sys.path for completion as per this PR, rather than what is currently happening. I just wanted to mention that hard-coded 'VIRTUAL_ENV' statements make me a little nervous. Conda uses CONDA_DEFAULT_ENV (not a complete path) and CONDA_ENV_PATH (a complete path, but not of the virtual environment) as part of its internal mechanism for switching between envs. I guess I just wanted offer a suggestion that it might be worthwhile to treat the handling of custom sys.path args and the use of venv/conda as separate concerns. I have not looked at this PR in enough detail of offer any specific advice however.

@davidhalter
Copy link
Owner

@cjrh I agree with you: It would be nice if we had a sys.path like in this PR. I would also be ok with a separate PR for that (if you or anyone else wants to do it). There has been a long time in which @immerrr has not replied and the code base has shifted anyway (so merge conflicts will be the case anyway). Just create a new PR if you want.

@immerrr
Copy link
Contributor Author

immerrr commented Jul 12, 2015

@davidhalter i'm sorry for being rather inactive, my schedule is a bit tight.

I was under impression that you rather wanted something client-server like with a portable jedi server to be run by a python in the target environment (portable in a sense that it doesn't need to be installed in that environment).

@davidhalter
Copy link
Owner

@immerrr No problem, I don't expect you to be active, thanks for the PR anyway :) Hmm you're probably right. I wasn't actually reading the old conversation anymore. I just realized that this solution was an improvment, but has still certain short comings.

I was waiting for the questions to be answered. There are some things in this PR that are really nice (if the virtualenv stuff gets merged or not, like egg-link resolution and sys.path api support.

- add sys_path= kwarg to Script & Evaluator constructors

- store sys_path for each evaluator instance

- add get_venv_path(venv) function to determine virtualenv's path:
  - by default, dump path from live python interpreter
  - use old path extension variant for fallback
  - in old variant, use addsitedir to load .pth extension files

- look for egg-link files in all directories in path
@immerrr
Copy link
Contributor Author

immerrr commented Sep 22, 2015

hey @davidhalter is there anything else you want me to do here before merging?

I have stopped env-vars leaking into subprocesses by using -E cmdline parameter, but other than that I think the code is pretty much the same.

@immerrr
Copy link
Contributor Author

immerrr commented Oct 27, 2015

Ok, I'm probably going to close this, because the offline sys path detection after #636 is good enough for me. More accurate detection of paths that involves running a subprocess will invariably pick up some incompatible binary libraries which might crash the interpreter.

@immerrr immerrr closed this Oct 27, 2015
@davidhalter
Copy link
Owner

Ok. Thanks still. I think some of this code might even be useful when we're trying to improve virtualenv stuff, but for now I agree that this is an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants