Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to make targets in a new session #333

Closed
wlandau opened this issue Mar 20, 2018 · 12 comments
Closed

Option to make targets in a new session #333

wlandau opened this issue Mar 20, 2018 · 12 comments

Comments

@wlandau
Copy link
Member

wlandau commented Mar 20, 2018

From #296, #330, and @rkrug.

make(session = "vanilla")

Creates the config object and then calls make_with_config() in a new callr::r_vanilla() process. Here, spend some overhead to enhance reproducibility.

make(session = "background")

Same as make(session = "vanilla"), but with callr::r_bg(). That way, make() will not block the user's R session. Here, we may also want to print a console message when make() is done.

make(session = "local")

The default option, lowest overhead, already the current behavior. Just run make_with_config() in the current R session.

Other options from callr?

Should we consider r_copycat() and r() itself? Maybe pass the callr function directly to session instead of a string? The arguments look mostly similar enough.

@rkrug
Copy link
Contributor

rkrug commented Mar 21, 2018

That sounds like a very useful suggestion.

Other possible options would be, to use a saved session. One would need a function in drake which safes the session (including loaded packages?) and which could be recalled when passed as an argument:

drake_save_session("MySession")

to save the current session in the .drake directory

make(session = "saved", sessionname = "MySession"

to start the make with a vanilla session but needs a sessionname.

r_copycat() should be the new default option, as all pollution of the user environment woluld be avoided. The behavior should be the same, I assume, as "local", which should be kept for backward compatibility(?).

@wlandau
Copy link
Member Author

wlandau commented Mar 21, 2018

Other possible options would be, to use a saved session.

I am not sure about this. It seems like something users can take care of outside drake.

r_copycat() should be the new default option, as all pollution of the user environment would be avoided.

Yes, drake is about reproducibility, but many aspects of reproducibility are just out of scope. For example, packrat is a much better solution to package reproducibility (#6) than drake could provide. On similar lines, I do not believe drake should automatically try to micromanage your session. I think of it as a different (but similar) problem than drake was designed to solve. Drake was deliberately designed to be part of the calling R session for the sake of minimizing surprises and adding convenience. I think users are already accustomed to putting their workflows in re-usable R scripts and then executing them in new sessions. I do like how r_vanilla() ignores .Rprofile files, but not enough to make it the default.

@wlandau
Copy link
Member Author

wlandau commented Mar 22, 2018

Re: #296 (comment), I agree with @krlmlr. Drake is totally R-focused, so it should concern itself with what happens inside its session (#296 (comment)). I think redrake is a better place to manage drake sessions.

@wlandau
Copy link
Member Author

wlandau commented Mar 28, 2018

Re: #296 (comment) and #296 (comment), it should not be too hard to add a session argument to make() and drake_config() that takes a function from callr. It seems like the trickiest thing will be to make sure the new session has the right global environment, and maybe some callr functions already take care of that.

I had concerns before, and my biggest remaining one is about defaults. Initially, I think the default should be to build the targets in the current session. Then, we can actually look at the overhead and make a decision for the next release.

@AlexAxthelm
Copy link
Collaborator

Initially, I think the default should be to build the targets in the current session.

I think this is a reasonable choice, for now. Perhaps using make(session = NULL) would be the best way to signal this?

@wlandau
Copy link
Member Author

wlandau commented Mar 28, 2018

I think so.

To be honest, still feel resistant to this feature, and I think a separate NULL default is a nice way to at least give it a try. Maybe I'm wrong about the overhead, but I will need to rely on the reports of users with large global environments. I also worry about users who set global options outside make() and then expect their plot targets to respond. These are things I think we will need to observe as they play out.

@AlexAxthelm
Copy link
Collaborator

I hear you, but I am viewing this as primarily a power-user feature. Maybe vanilla is way too strict, and disrupts all the global options. But on the other hand, If I'm planning on using drake in a production environment, the idea of forcing a vanilla session, and putting everything in the script rather than the environment sounds pretty nice.

@wlandau
Copy link
Member Author

wlandau commented Mar 29, 2018

Just submitted a PR (#346) to fix this. @rkrug and @AlexAxthelm, it would be great to have your input if you have time to take a look.

@rkrug
Copy link
Contributor

rkrug commented Mar 29, 2018

But on the other hand, If I'm planning on using drake in a production environment, the idea of forcing a vanilla session, and putting everything in the script rather than the environment sounds pretty nice.

That is exactly what I am planning to do. I want to use drake in a production environment (real time likely) to import data, clean, process, analyse, model, ... . And for that, having the vanilla environment will be crusial, as I will be calling drake::make() from littler.

@wlandau
Copy link
Member Author

wlandau commented Mar 29, 2018

Be warned: in #346, the new session is still unavoidably dirty. Imports are usually globals, and they need to be copied to the new session individually. Even if you define an envir for make(), you will still have this problem if envir inherits from globalenv(). If I thought this would be such a problem from the beginning, I might have designed drake to focus on script files rather than environments. But with the current design choices, #346 is as far as we can go.

@AlexAxthelm
Copy link
Collaborator

Thanks! excited to start trying this today.

@wlandau
Copy link
Member Author

wlandau commented May 18, 2018

Update: r-lib/processx#113 poses major problems with this feature if the user selects jobs > 1. It appears processx (the backend of callr) and parallel (which has mclapply(), etc.) cannot be used simultaneously without a high chance of zombie processes.

I will need to rethink the newly refactored "future_lapply" backend as well. Probably will go with system2() rather than processx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants