Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicting build time #80

Closed
wants to merge 15 commits into from
Closed

Predicting build time #80

wants to merge 15 commits into from

Conversation

violetcereza
Copy link
Contributor

@violetcereza violetcereza commented Aug 29, 2017

I'm not sure if this is what you had in mind, but here's my initial implementation (issue #64)! I'm open to critique and revisions on this. Here's what the output looks like for the debug plan:

> predict_runtime(plan, envir = envir, from_scratch = T)
import i
import a
import b
import c
import saveRDS
import 'input.rds'
import readRDS
import j
import h
import g
import f
Build stage 1 
  Targets: myinput 
  Est build time: 0s 
Build stage 2 
  Targets: nextone 
  Est build time: 0s 
Build stage 3 
  Targets: yourinput 
  Est build time: 0s 
Build stage 4 
  Targets: combined 
  Est build time: 0s 
Build stage 5 
  Targets: 'intermediatefile.rds' 
  Est build time: 0s 
Build stage 6 
  Targets: final 
  Est build time: 0s 

TOTAL BUILD TIME: 0s 
  0 untimed targets (never built)
  (assuming max_useful_jobs)
  (not including hashing and storage time [yet])

Jasper added 2 commits August 29, 2017 18:02
I wouldn't actually consider it the "build phase" if length(remaining_targets) == 0.
@violetcereza violetcereza changed the title Predicting build time (issue #64) Predicting build time Aug 29, 2017
@codecov-io
Copy link

codecov-io commented Aug 29, 2017

Codecov Report

Merging #80 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #80   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          42     42           
  Lines        1951   2004   +53     
=====================================
+ Hits         1951   2004   +53
Impacted Files Coverage Δ
R/outdated.R 100% <ø> (ø) ⬆️
R/timing.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 61e3164...843840b. Read the comment docs.

@wlandau-lilly
Copy link
Collaborator

@dapperjapper Looks like a great start. I like that you print an estimate separately for each parallelizable stage. This feature will take some iteration, and I have a couple preliminary suggestions.

  • We may want to return more than just a single time estimate. Maybe a list or data frame of the stage-specific information would help.
  • What about a range of times rather than a single time estimate? Sorry this did not occur to me before, but it may not be right to always assume either max_useful_jobs() or serial execution.
  • You could use drake:::multiline_message() to keep console output clean if there are many targets.
  • We may want to include a verbosity flag (for the imports).
  • And of course, docs and tests.

Also, I should mention that I have done a lot of development on my own this month, and none of it has been approved for release from my company yet. I will merge your feature into the upstream copy when it is ready, and I will integrate the changes into my local development copy, but the merge conflicts for the latter will be a headache. Your contribution will keep its functionality, but your code will probably look different in the end.

@wlandau-lilly
Copy link
Collaborator

@dapperjapper I just pushed 4.1.0 (on its way to CRAN). Sorry, but there are some new merge conflicts.

Jasper added 6 commits September 20, 2017 13:30
@wlandau-lilly
Copy link
Collaborator

wlandau-lilly commented Sep 21, 2017

@dapperjapper I really like your idea to include untimed_method. For my own projects, I plan to supply min() and max() to predict best-case and worst-case scenarios.

That got me thinking: what if we allowed users to manually pre-specify their own runtime predictions for each target? Actual stored build times would override them. I have a couple different interface alternatives in mind:

  1. Allow an optional third column, something like time or runtime_prediction, for the workflow plan.
  2. Allow untimed_method to be a named list, named vector, or data frame of target-level runtime predictions.

What do you think?

@ropensci ropensci deleted a comment from lintr-bot Sep 21, 2017
As per wlandau-lilly suggestions
R/timing.R Outdated
config = NULL,
...){

if (missing(config))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For missing(), see a tweet from @gaborcsardi and the ensuing thread.

@wlandau-lilly
Copy link
Collaborator

I really appreciate what you have done here, and I have thought a lot about it. It has helped me think about what a good prediction would look like and what the user might want. I just sketched out my own first attempt in the predict_runtime branch. I plan to debug it, add unit tests, and write a whole vignette on timing.

@ropensci ropensci deleted a comment from lintr-bot Sep 27, 2017
@wlandau-lilly
Copy link
Collaborator

Like I said, you really helped me think about #64. But I went ahead and wrote my own code, so I will not be accepting the exact code in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants