Improve virtualenv support & egg-link resolution #562

immerrr · 2015-04-05T10:36:41Z

AFAIU, it is generally considered ok to mix jedi's own sys.path and virtualenv paths, but recently I have run into a conflict between setuptools/distribute found in:

jedi's virtualenv
user site directory ($HOME/.local/lib/pythonX.X/site-packages)
virtualenv I was actually using

This conflict caused problems in resolving something as simple as setuptools.setup and made me rethink module resolution towards "purity".

This PR adds the following:

add sys_path= kwarg to Script & Evaluator constructors
store sys_path for each evaluator instance (closes Suggestion: keep sys.path per Evaluator instance #384)
add get_venv_path(venv) function to determine virtualenv's path:
- by default, dump path from live python interpreter
- use old path extension variant for fallback
- in old variant, use addsitedir to load .pth extension files (closes .pth status #443)
look for egg-link files in all directories in path

Several discussion questions here:

starting python interpreter from the virtualenv is the most precise way to know its sys.path, but are there drawbacks besides process startup overhead? e.g. should we clean PYTHONSTARTUP env var?
if running python interpreter is ok, how do we test it?
as path resolution grows more and more heavy (.pth, egg-links, now subprocesses), do we need to cache it somehow? and for a long-running jedi process, like ycmd, how do we flush the cache to reflect package installation/removal.
is there a use case for removing site.USER_SITE from lookups, too?

immerrr · 2015-04-05T20:39:36Z

I have tried to use this with emacs and it turns out that storing sys_path per Script (or Evaluator) instances is not very useful as long as Script objects are not persistent. Otherwise one might as well just calculate the sys_path to store it in their IDE of choice and send it with the regular completion/goto-definition requests.

Another alternative would be to make Script objects stateful, enable changing source piecewise (e.g. script.set_source(begin, end, text) and run completion as completions(line, column). But something tells me that avoiding state was one of the initial design decisions.

immerrr · 2015-04-06T14:54:55Z

I have encountered #539 which uses the same "live interpreter" approach and see it is turned down because of non-cross-version-safe imports happening in jedi. Ping me if you need this venv path detection scheme removed.

davidhalter · 2015-04-07T12:23:21Z

Wow! That's quite a lot! Thanks for all the work that went into this!

I really like the general direction your pushing this. Please give me some time to review it.

davidhalter · 2015-04-07T12:28:55Z

jedi/api/__init__.py

+        if sys_path is None:
+            venv = os.getenv('VIRTUAL_ENV')
+            if venv:
+                sys_path = list(get_venv_path(venv)) + sys.path


So if VIRTUAL_ENV is set, why is it not enough to just take the current sys.path? I have never really understood this.

The idea is to have jedi installed somewhere once per system (or per user) and to have it look into various virtual environments without installing jedi in all of these environments. In Emacs one of the easiest ways to accomplish this (without pulling virtualenv names through all the APIs) is to launch jedi process with this extra environment variable whenever I change the venv I'm working on.

But if you change the virtualenv variable, wouldn't the sys.path also change automatically?

I don't think so. AFAIR, sys.path is generated once at the startup and the virtualenv magic really relies on looking up candidate directories relative to the python interpreter location. Here's an article that probably can explain it better than I can (especially, within the scope of a comment): http://mikeboers.com/blog/2014/05/23/where-does-the-sys-path-start

Hmm I've read it, very good article! I think I finally understand why site.py exists and what it does.

If you start a new process with Jedi in an activated virtualenv, wouldn't we see the correct sys.path? Are you talking about a changing just the VIRTUAL_ENV variable without restarting the process?

In general, have you read the idea of #385, to make virtualenv support easier?

I gave this some thought at the time and I couldn't find a way around the client being able to supply the requested sys.path to Jedi in public API. Now, to my liking, full sys_path is too long for a proper interface parameter and most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better. For example, we could add venvs=[path1, path2] to use those venvs and extra_sys_path=[path1, path2] just in case someone wants, say, to additionally use some source code checkouts without installing them into the said venvs.

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic:
it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards, but as long as jedi has no external dependencies, we can just start up PATH/TO/VENV/bin/python PATH/TO/JEDI/portable.py with portable.py adding its location to sys.path and then running jedi as usual. That way it will have more binary compatibility with code in the venv.

most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better.

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic: it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards,

Hmm I know what you mean. I personally think that adding sys_path as an API is really valuable, because some people really want to play with it. Most of the time this is not the case and we should focus on what happens when it is None. This still includes virtual envs. However, I agree totally with the portable.py idea. I think Jedi should be started with just the virtualenv as its path. This is what I'm talking about when I talk about Jedi's future async server/client structure. You name a certain virtualenv and Jedi starts a new process that includes that virtualenv. This way, we don't have segfaults and all other crazy stuff...

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

~ Dave

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

Yes, that was a mistake. Some time ago I tried conserving precious SSD space using a virtualenv for project dependencies and another virtualenv for common tools, like ipython, and pudb, and profilers, and whatnot. But then I realized that with conda you don't need that as it hard-links packages where possible.

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

I am not completely against ditching this PR altogether, although I do think that this PR offers better support of the current Jedi workflow. It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon. But if it is and there's no time window during which this PR can be useful then feel free to close it.

It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon.

Unfortunately and probably true :( Can you quote yourself from earlier how this PR improves virtualenvs? I still don't see it :)

Can you quote yourself from earlier how this PR improves virtualenvs?

fetching sys_path from a live interpreter in the specified virtualenv, if possible

if online fetching fails, offline detector now additionally follows .pth links (cannot seem to find the addsitedir invocation in the old code)

egg-link files are consulted in all new path directories, not just the site-packages one

Ok. I don't see the advantage from fetching it from a different interpreter (ecept for pth stuff). However, I feel that pth is something we shouldn't really execute like this, because it can contain arbitrary Python code and I'm not sure if I want to execute that (addsitedir does this).

What are "new path directories"? I'm strongly for splitting up this PR, because it tries to change too many things at once. We also needs tests. It would be way less messy, then.

davidhalter · 2015-04-08T13:27:32Z

I don't see how this solves #443. Can you explain?

starting python interpreter from the virtualenv is the most precise way to know its sys.path, but are there drawbacks besides process startup overhead? e.g. should we clean PYTHONSTARTUP env var?

Good question, I think it's ok as it is. The overhead shouldn't be that crazy - let's at least try. If the overhead is too big, we could work with caches that are deleted over time and take VIRTUAL_ENV as a key. I would actually delete the PYTHONSTARTUP var, I think that would make sense.

if running python interpreter is ok, how do we test it?

Well, good question. Virtualenv stuff hasn't been really tested anyways. I guess we should start testing it. Those tests might be a bit "slow", but as long as there's an easy switch (py.test -k or --ignore), that's not a big issue. You could setup a virtualenv with code or just fake it (with a few symlinks). Whatever you prefer.

as path resolution grows more and more heavy (.pth, egg-links, now subprocesses), do we need to cache it somehow? and for a long-running jedi process, like ycmd, how do we flush the cache to reflect package installation/removal.

As written above, Jedi supports time caches. However, I stopped caching stuff without actually profiling it first. So if you think that this really takes quite a lot of time, try to profile it first. Get a good use case and then we can actually work on improving the speed there.

My intuition is that actually checking all of that is not as time consuming as you might think it is. The big parts are probably still parsing and evaluation of the tree, but I might be wrong.

is there a use case for removing site.USER_SITE from lookups, too?

I don't see a reason for that.

immerrr · 2015-04-08T13:50:14Z

I don't see how this solves #443. Can you explain?

When using live interpreter, the interpreter's startup procedure will resolve .pth links as it happens for the regular Python scripts. For offline detection, this section is an approximation of what happens during startup.

Well, good question. Virtualenv stuff hasn't been really tested anyways. I guess we should start testing it.

I'm not sure I understand jedi/travis testing quite well, is it OK if I ask you to address the testing infrastructure (UPD: and I will do the tests themselves) ?

re:drop PYTHONSTARTUP

ok, will do

re:caching

The problem with caching this stuff is that the cache will never know when I install or remove packages into the environment, so there has to be an API to reset it.

I was looking at this from emacs-jedi perspective and couldn't find a better solution than just calculating sys.path once and then pass it for every jedi RPC command. Basically, RPC-wise it would be nice to be able to create a named piece of configuration via API (e.g. configs["virtualenv=MYVENV"] = get_venv_path('MYVENV') to live with the server and refer to it in future API requests, however it feels more like a problem for Jedi's RPC servers, not for Jedi itself. I wonder if we could do this in ycmd API...

davidhalter · 2015-04-30T17:55:01Z

I'm not sure I understand jedi/travis testing quite well, is it OK if I ask you to address the testing infrastructure (UPD: and I will do the tests themselves) ?

What do you mean by that? I'm certainly willing to help, I just don't know what you mean :)

I wonder if we could do this in ycmd API...

That would certainly be an option! I think at the same time it would also be nice to have something lightweight, pure Python.

davidhalter · 2015-05-01T00:06:56Z

jedi/evaluate/sys_path.py

+    """Get sys.path for specified virtual environment."""
+    try:
+        sys_path = _get_venv_path_online(venv)
+    except Exception as e:


I don't like this. Exceptions should not be catched like that. Only catch what you really know can go wrong.

The idea behind this catch-all expression is that whatever prevents this sys.path detection heuristic from working should not prevent normal jedi functioning and only report the error as a warning (which is done one line below that) as "An indication that something unexpected happened... The software is still working as expected."

Problem is: Whenever you do something like this, it's really hard to tell what the issue is from the outside. That's why I never do this.

cjrh · 2015-06-30T23:09:57Z

I've made vim-conda which modifies $PATH (for running code) and sys.path (for jedi-vim completion) depending on the selected conda environment.

I would much prefer to pass in a particular sys.path for completion as per this PR, rather than what is currently happening. I just wanted to mention that hard-coded 'VIRTUAL_ENV' statements make me a little nervous. Conda uses CONDA_DEFAULT_ENV (not a complete path) and CONDA_ENV_PATH (a complete path, but not of the virtual environment) as part of its internal mechanism for switching between envs. I guess I just wanted offer a suggestion that it might be worthwhile to treat the handling of custom sys.path args and the use of venv/conda as separate concerns. I have not looked at this PR in enough detail of offer any specific advice however.

davidhalter · 2015-07-12T18:24:58Z

@cjrh I agree with you: It would be nice if we had a sys.path like in this PR. I would also be ok with a separate PR for that (if you or anyone else wants to do it). There has been a long time in which @immerrr has not replied and the code base has shifted anyway (so merge conflicts will be the case anyway). Just create a new PR if you want.

immerrr · 2015-07-12T19:21:21Z

@davidhalter i'm sorry for being rather inactive, my schedule is a bit tight.

I was under impression that you rather wanted something client-server like with a portable jedi server to be run by a python in the target environment (portable in a sense that it doesn't need to be installed in that environment).

davidhalter · 2015-07-12T23:13:17Z

@immerrr No problem, I don't expect you to be active, thanks for the PR anyway :) Hmm you're probably right. I wasn't actually reading the old conversation anymore. I just realized that this solution was an improvment, but has still certain short comings.

I was waiting for the questions to be answered. There are some things in this PR that are really nice (if the virtualenv stuff gets merged or not, like egg-link resolution and sys.path api support.

- add sys_path= kwarg to Script & Evaluator constructors - store sys_path for each evaluator instance - add get_venv_path(venv) function to determine virtualenv's path: - by default, dump path from live python interpreter - use old path extension variant for fallback - in old variant, use addsitedir to load .pth extension files - look for egg-link files in all directories in path

immerrr · 2015-09-22T16:51:39Z

hey @davidhalter is there anything else you want me to do here before merging?

I have stopped env-vars leaking into subprocesses by using -E cmdline parameter, but other than that I think the code is pretty much the same.

immerrr · 2015-10-27T20:27:24Z

Ok, I'm probably going to close this, because the offline sys path detection after #636 is good enough for me. More accurate detection of paths that involves running a subprocess will invariably pick up some incompatible binary libraries which might crash the interpreter.

davidhalter · 2015-10-28T10:21:49Z

Ok. Thanks still. I think some of this code might even be useful when we're trying to improve virtualenv stuff, but for now I agree that this is an issue.

immerrr force-pushed the virtualenv-per-evaluator branch from a64a55f to c0ea437 Compare April 5, 2015 21:19

davidhalter reviewed Apr 7, 2015
View reviewed changes

This was referenced Apr 7, 2015

Autocompletion in virtualenv davidhalter/jedi-vim#379

Closed

Jedi 0.9.0 release #566

Closed

davidhalter closed this Apr 30, 2015

davidhalter reopened this Apr 30, 2015

davidhalter reviewed May 1, 2015
View reviewed changes

immerrr added 2 commits July 13, 2015 10:01

_get_venv_path_online: prevent PYTHON* vars leaking into subprocess (-E)

96b87b5

immerrr force-pushed the virtualenv-per-evaluator branch from e26583b to 96b87b5 Compare September 22, 2015 16:50

davidhalter mentioned this pull request Oct 6, 2015

[READY] Use a http wrapper for jedi ycm-core/ycmd#246

Merged

immerrr mentioned this pull request Oct 20, 2015

Improve virtualenv support & egg-link resolution #636

Merged

immerrr closed this Oct 27, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve virtualenv support & egg-link resolution #562

Improve virtualenv support & egg-link resolution #562

immerrr commented Apr 5, 2015

immerrr commented Apr 5, 2015

immerrr commented Apr 6, 2015

davidhalter commented Apr 7, 2015

davidhalter Apr 7, 2015

immerrr Apr 7, 2015

davidhalter Apr 7, 2015

immerrr Apr 7, 2015

davidhalter Apr 8, 2015

immerrr Jul 13, 2015

davidhalter Sep 30, 2015

immerrr Sep 30, 2015

davidhalter Oct 1, 2015

immerrr Oct 1, 2015

davidhalter commented Apr 8, 2015

immerrr commented Apr 8, 2015

davidhalter commented Apr 30, 2015

davidhalter May 1, 2015

immerrr Jul 12, 2015

davidhalter Sep 30, 2015

cjrh commented Jun 30, 2015

davidhalter commented Jul 12, 2015

immerrr commented Jul 12, 2015

davidhalter commented Jul 12, 2015

immerrr commented Sep 22, 2015

immerrr commented Oct 27, 2015

davidhalter commented Oct 28, 2015

Improve virtualenv support & egg-link resolution #562

Improve virtualenv support & egg-link resolution #562

Conversation

immerrr commented Apr 5, 2015

immerrr commented Apr 5, 2015

immerrr commented Apr 6, 2015

davidhalter commented Apr 7, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidhalter commented Apr 8, 2015

immerrr commented Apr 8, 2015

davidhalter commented Apr 30, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cjrh commented Jun 30, 2015

davidhalter commented Jul 12, 2015

immerrr commented Jul 12, 2015

davidhalter commented Jul 12, 2015

immerrr commented Sep 22, 2015

immerrr commented Oct 27, 2015

davidhalter commented Oct 28, 2015