Predicting build time #80

violetcereza · 2017-08-29T22:07:54Z

I'm not sure if this is what you had in mind, but here's my initial implementation (issue #64)! I'm open to critique and revisions on this. Here's what the output looks like for the debug plan:

> predict_runtime(plan, envir = envir, from_scratch = T)
import i
import a
import b
import c
import saveRDS
import 'input.rds'
import readRDS
import j
import h
import g
import f
Build stage 1 
  Targets: myinput 
  Est build time: 0s 
Build stage 2 
  Targets: nextone 
  Est build time: 0s 
Build stage 3 
  Targets: yourinput 
  Est build time: 0s 
Build stage 4 
  Targets: combined 
  Est build time: 0s 
Build stage 5 
  Targets: 'intermediatefile.rds' 
  Est build time: 0s 
Build stage 6 
  Targets: final 
  Est build time: 0s 

TOTAL BUILD TIME: 0s 
  0 untimed targets (never built)
  (assuming max_useful_jobs)
  (not including hashing and storage time [yet])

codecov-io · 2017-08-29T22:17:16Z

Codecov Report

Merging #80 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master    #80   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          42     42           
  Lines        1951   2004   +53     
=====================================
+ Hits         1951   2004   +53

| Impacted Files | Coverage Δ |
|---|---|
| R/outdated.R | `100% <ø> (ø)` |
| R/timing.R | `100% <100%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 61e3164...843840b.

wlandau-lilly · 2017-08-30T13:00:48Z

@dapperjapper Looks like a great start. I like that you print an estimate separately for each parallelizable stage. This feature will take some iteration, and I have a couple preliminary suggestions.

- We may want to return more than just a single time estimate. Maybe a list or data frame of the stage-specific information would help.
- What about a range of times rather than a single time estimate? Sorry this did not occur to me before, but it may not be right to always assume either max_useful_jobs() or serial execution.
- You could use drake:::multiline_message() to keep console output clean if there are many targets.
- We may want to include a verbosity flag (for the imports).
- And of course, docs and tests.
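
The first two suggestions (return stage-level information, and report a range rather than a single number) could be sketched roughly as follows. This is a minimal base R illustration, not drake's implementation: the `stages` data frame, its column names, the timings, and the helper `summarize_stages()` are all hypothetical. It assumes that with enough jobs a stage takes as long as its slowest target, and that serial execution takes the sum of all target times.

```r
# Hypothetical per-target timings grouped by parallelizable stage.
# Target names and times are invented for illustration.
stages <- data.frame(
  stage = c(1, 1, 2, 3, 3, 3),
  target = c("a", "b", "c", "d", "e", "f"),
  seconds = c(2, 5, 1, 4, 4, 10),
  stringsAsFactors = FALSE
)

# Summarize each stage with a best case (all targets run in parallel,
# so the slowest target dominates) and a worst case (targets run serially).
summarize_stages <- function(stages) {
  best <- tapply(stages$seconds, stages$stage, max)
  worst <- tapply(stages$seconds, stages$stage, sum)
  data.frame(
    stage = as.numeric(names(best)),
    n_targets = as.vector(table(stages$stage)),
    best_case = as.vector(best),
    worst_case = as.vector(worst)
  )
}

stage_summary <- summarize_stages(stages)

# Overall range: sum the per-stage bounds, since stages run one after another.
total_range <- c(
  best = sum(stage_summary$best_case),
  worst = sum(stage_summary$worst_case)
)
print(stage_summary)
print(total_range)
```

Returning the data frame rather than printing a single total leaves it to the caller to decide how to aggregate or display the stages.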

Also, I should mention that I have done a lot of development on my own this month, and none of it has been approved for release from my company yet. I will merge your feature into the upstream copy when it is ready, and I will integrate the changes into my local development copy, but the merge conflicts for the latter will be a headache. Your contribution will keep its functionality, but your code will probably look different in the end.

wlandau-lilly · 2017-09-01T21:47:55Z

@dapperjapper I just pushed 4.1.0 (on its way to CRAN). Sorry, but there are some new merge conflicts.

wlandau-lilly · 2017-09-21T02:58:48Z

@dapperjapper I really like your idea to include untimed_method. For my own projects, I plan to supply min() and max() to predict best-case and worst-case scenarios.

That got me thinking: what if we allowed users to manually pre-specify their own runtime predictions for each target? Actual stored build times would override them. I have a couple different interface alternatives in mind:

1. Allow an optional third column, something like time or runtime_prediction, for the workflow plan.
2. Allow untimed_method to be a named list, named vector, or data frame of target-level runtime predictions.

What do you think?
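
One way to picture the precedence described in this comment is sketched below: stored build times override user-supplied predictions, and a summary function such as min() or max() fills in whatever is still untimed. Everything here is an assumption made for illustration. The helper `fill_times()`, its arguments, and the choice to apply `untimed_method` to the stored times are hypothetical and are not taken from the PR's code.

```r
# Assign a time to every target. Precedence (an assumption of this sketch):
#   1. stored build times,
#   2. user-supplied predictions,
#   3. untimed_method applied to the stored times.
fill_times <- function(targets, stored, predicted = numeric(0),
                       untimed_method = mean) {
  times <- setNames(rep(NA_real_, length(targets)), targets)

  # Start from the user's predictions where they exist.
  use_pred <- intersect(targets, names(predicted))
  times[use_pred] <- predicted[use_pred]

  # Stored build times override predictions.
  use_stored <- intersect(targets, names(stored))
  times[use_stored] <- stored[use_stored]

  # Anything still untimed gets a summary of the known build times.
  untimed <- is.na(times)
  if (any(untimed) && length(stored) > 0) {
    times[untimed] <- untimed_method(stored)
  }
  times
}

stored <- c(a = 2, b = 8)        # hypothetical recorded build times (seconds)
predicted <- c(b = 100, c = 30)  # hypothetical user predictions (seconds)
targets <- c("a", "b", "c", "d")

best <- fill_times(targets, stored, predicted, untimed_method = min)
worst <- fill_times(targets, stored, predicted, untimed_method = max)
print(rbind(best, worst))
```

Here target `b` keeps its stored time of 8 seconds even though a prediction of 100 was supplied, and only `d`, which has neither, differs between the best and worst case.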

wlandau-lilly · 2017-09-26T20:00:12Z

R/timing.R

+                            config = NULL,
+                            ...){
+
+  if (missing(config))


For missing(), see a tweet from @gaborcsardi and the ensuing thread.
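
The linked thread is not reproduced here, so the following only illustrates one well-known behavioral difference between the two checks and does not restate its argument. With a `NULL` default, `missing()` and `is.null()` disagree when a caller passes `NULL` explicitly, which can happen when a wrapper forwards its own `NULL` default. The function names below are hypothetical.

```r
# Two hypothetical variants of the same argument check.
with_missing <- function(config = NULL) {
  if (missing(config)) "build default config" else "use supplied config"
}

with_null <- function(config = NULL) {
  if (is.null(config)) "build default config" else "use supplied config"
}

# Called with no argument, both take the default branch.
no_arg_missing <- with_missing()
no_arg_null <- with_null()

# Called with an explicit NULL, missing() is FALSE because an argument
# was supplied, so only the is.null() variant takes the default branch.
explicit_missing <- with_missing(config = NULL)
explicit_null <- with_null(config = NULL)

print(c(no_arg_missing, no_arg_null, explicit_missing, explicit_null))
```

Testing `is.null(config)` keeps the behavior the same whether the argument was omitted or passed along as `NULL`.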

wlandau-lilly · 2017-09-27T20:40:15Z

I really appreciate what you have done here, and I have thought a lot about it. It has helped me think about what a good prediction would look like and what the user might want. I just sketched out my own first attempt in the predict_runtime branch. I plan to debug it, add unit tests, and write a whole vignette on timing.

wlandau-lilly · 2017-09-28T03:42:36Z

Like I said, you really helped me think about #64. But I went ahead and wrote my own code, so I will not be accepting the exact code in this PR.

Jasper added 2 commits August 29, 2017 18:02

Fixed confusing "checking hashes" messages

6a56809

I wouldn't actually consider it the "build phase" if length(remaining_targets) == 0.

First stab at runtime prediction

52a029a

#64

violetcereza changed the title ~~Predicting build time (issue #64)~~ Predicting build time Aug 29, 2017

Jasper added 2 commits August 30, 2017 11:39

Tests for predict_runtime

8f7d6ae

Beginning docs

bc86efc

wlandau-lilly added the TOP PRIORITY label Sep 4, 2017

Jasper added 6 commits September 20, 2017 13:30

Merge branch 'master' of https://github.com/wlandau-lilly/drake

7a8d190

Conflicts: R/cache.R R/parallel.R tests/testthat/test-cache.R

Moved predict function to timing.R

a76df27

Updated function to return data.frame

e7de81b

And more condensed output, returns data.table instead of single number, smarter (?) about untimed targets

Updating testing for new prediction format

b27b59a

Added documentation to predict_runtime()

8ff045c

Tryna get travis CI to like me

fe685e8

Forgot to document plan argument for predict_runtime

a20f7f2

ropensci deleted a comment from lintr-bot Sep 21, 2017

Adds build_times argument for predict_runtime

fb032e9

As per wlandau-lilly suggestions

ropensci deleted a comment from lintr-bot Sep 24, 2017

wlandau-lilly added status: priority and removed status: priority TOP PRIORITY labels Sep 24, 2017

wlandau-lilly reviewed Sep 26, 2017

violetcereza and others added 3 commits September 26, 2017 16:03

Stop using missing()

fc1c3dc

Merge branch 'master' of https://github.com/wlandau-lilly/drake

8bf1cbe

Conflicts: NAMESPACE R/cache.R

Merge branch 'master' of https://github.com/dapperjapper/drake

843840b

ropensci deleted a comment from lintr-bot Sep 27, 2017

wlandau-lilly closed this Sep 28, 2017

wlandau mentioned this pull request Apr 12, 2019: new transform function split to chunk a data.frame #833 (closed)
