Releases: ropensci/drake

Down with drake_config()!

01 Feb 17:59

Version 7.10.0

Unavoidable but minor breaking changes

These changes invalidate some targets in some workflows, but they are necessary bug fixes.

  • Remove spurious local variables detected in $<-() and @<-() (#1144).
  • Avoid target names with trailing dots (#1147, @plebejer).

Bug fixes

  • Handle unequal list columns in bind_plans() (#1136, @jennysjaarda).
  • Handle non-vector sub-targets in dynamic branching (#1138).
  • Handle calls in analyze_assign() (#1119, @jennysjaarda).
  • Restore correct environment locking (#1143, @kuriwaki).
  • Log "running" progress of dynamic targets.
  • Log dynamic targets as failed if a sub-target fails (#1158).

New features

  • Add a new "fst_tbl" format for large tibble targets (#1154, @kendonB).
  • Add a new format argument to make(), an optional custom storage format for targets without an explicit target(format = ...) in the plan (#1124).
  • Add a new lock_cache argument to make() to optionally suppress cache locking (#1129). (It can be annoying to interrupt make() repeatedly and unlock the cache manually every time.)
  • Add new functions cancel() and cancel_if() to cancel targets mid-build (#1131).
  • Add a new subtarget_list argument to loadd() and readd() to optionally load a dynamic target as a list of sub-targets (#1139, @MilesMcBain).
  • Prohibit dynamic file_out() (#1141).
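
As a hedged sketch of how a few of these additions might be combined (assuming drake 7.10.0 or later; the plan, the target names, and the temporary cache are illustrative, not taken from the release), the following cancels a target mid-build, sets a fallback storage format, and skips cache locking:

```r
library(drake)

# Illustrative plan: `y` cancels itself whenever `x` is positive.
plan <- drake_plan(
  x = 1,
  y = {
    cancel_if(x > 0)  # halt this target and move on to the next one
    "not stored on this run"
  }
)

# Throwaway cache so the example does not touch a project's .drake/ folder.
cache <- new_cache(tempfile())

# format = "rds" applies to targets without an explicit target(format = ...).
# lock_cache = FALSE skips cache locking for this call.
make(plan, cache = cache, format = "rds", lock_cache = FALSE)

# List the targets that were actually stored.
stored <- cached(cache = cache)
print(stored)
```

Because `y` is cancelled before it returns a value, only `x` should appear in the cache after this first run.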

Enhancements

  • Check for illegal formats early on at the drake_config() level (#1156, @MilesMcBain).
  • Smoothly deprecate the config argument in all user-side functions (#1118, @vkehayas). Users can now supply the plan and other make() arguments directly, without bothering with drake_config(). Now, you only need to call drake_config() in the _drake.R file for r_make() and friends. Old code with config objects should still work. Affected functions:
    • make()
    • outdated()
    • drake_build()
    • drake_debug()
    • recoverable()
    • missed()
    • deps_target()
    • deps_profile()
    • drake_graph_info()
    • vis_drake_graph()
    • sankey_drake_graph()
    • drake_graph()
    • text_drake_graph()
    • predict_runtime(). Needed to rename the targets argument to targets_predict and jobs to jobs_predict.
    • predict_workers(). Same argument name changes as predict_runtime().
  • Because of #1118, the only remaining user-side purpose of drake_config() is to serve functions r_make() and friends.
  • Document the limitations of grouping variables (#1128).
  • Handle the @ operator. For example, in the static code analysis of x@y, do not register y as a dependency (#1130, @famuvie).
  • Remove superfluous/incorrect information about imports from the output of deps_profile() (#1134, @kendonB).
  • Append hashes to deps_target() output (#1134, @kendonB).
  • Add S3 class and pretty print method for drake_meta_() objects.
  • Use call stacks instead of environment inheritance to power drake_envir() and id_chr() (#1132).
  • Allow drake_envir() to select the environment with imports (#882).
  • Improve visualization labels for dynamic targets: clarify that the listed runtime is a total runtime over all sub-targets and list the number of sub-targets.
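
The deprecation of the config argument can be illustrated with a small sketch. It assumes drake 7.10.0 or later; the plan and the temporary cache are made up for the example. The point is that the plan and other make() arguments go straight into the user-side functions:

```r
library(drake)

# Illustrative two-target plan.
plan <- drake_plan(
  x = 1,
  y = x + 1
)

# Throwaway cache so the example leaves no .drake/ folder behind.
cache <- new_cache(tempfile())

# Old style (should still work):
#   config <- drake_config(plan, cache = cache)
#   outdated(config)
# New style: pass the plan and other make() arguments directly.
before <- outdated(plan, cache = cache)  # both targets are outdated
make(plan, cache = cache)
after <- outdated(plan, cache = cache)   # nothing left to build

print(before)
print(after)
```

A _drake.R file for r_make() and friends would still end with a call to drake_config(plan).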

Speedups and better dynamic branching

22 Dec 16:48

Version 7.9.0

Breaking changes in dynamic branching

  • Embrace the vctrs paradigm and its type stability for dynamic branching (#1105, #1106).
  • Accept target as a symbol by default in read_trace(). Required for the trace to make sense in #1107.

Bug fixes

  • Repair reference to custom HPC resources in the "future" backend (#1083, @jennysjaarda).
  • Properly copy data when importing targets from one cache into another (#1120, @brendanf).
  • Prevent dynamic vector sizes from conflicting with file sizes in metadata.

New features

  • Add a new log_build_times argument to make() and drake_config(). Allows users to disable the recording of build times. Produces a speedup of up to 20% on Macs (#1078).
  • Implement cache locking to prohibit concurrent calls to make(), outdated(make_imports = TRUE), recoverable(make_imports = TRUE), vis_drake_graph(make_imports = TRUE), clean(), etc. on the same cache.
  • Add a new format trigger to invalidate targets when the specialized data format changes (#1104, @kendonB).
  • Add new functions cached_planned() and cached_unplanned() to help selectively clean workflows with dynamic targets (#1110, @kendonB).
  • Add S3 classes and pretty print methods for drake_config() objects and analyze_code() objects.
  • Add a new "qs" format (#1121, @kendonB).

Speedups

  • Avoid setting seeds for imports (#1086, @adamkski).
  • Avoid working directly with POSIXct times (#1086, @adamkski).
  • Avoid excessive calls to %||% (%|||% is faster). (#1089, @billdenney)
  • Remove %||NA% due to slowness (#1089, @billdenney).
  • Use hash tables to speed up is_dynamic() and is_subtarget() (#1089, @billdenney).
  • Use getVDigest() instead of digest() (#1089, #1092, eddelbuettel/digest#139 (comment), @eddelbuettel, @billdenney).
  • Pre-compute backtick and .deparseOpts() to speed up deparse() (#1086, https://stackoverflow.com/users/516548/g-grothendieck, @adamkski).
  • Pre-compute which targets exist in advance (#1095).
  • Avoid gratuitous cache interactions and data frame operations in build_times() (#1098).
  • Use mget_hash() in progress() (#1098).
  • Get target progress info only once in drake_graph_info() (#1098).
  • Speed up the retrieval of old metadata in outdated() (#1098).
  • In make(), avoid checking for nonexistent metadata for missing targets.
  • Reduce logging in drake_config().
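
To make the %||% versus %|||% item concrete, here is a base R sketch. The two definitions are assumptions about what such internal helpers typically look like, not code copied from drake: the slower variant also checks the length of its left-hand side, while the faster one only checks for NULL.

```r
# Hypothetical definitions for illustration only.

# Falls back to `y` when `x` is NULL or has length zero.
`%||%` <- function(x, y) {
  if (is.null(x) || length(x) < 1L) y else x
}

# Falls back to `y` only when `x` is NULL (one check fewer per call).
`%|||%` <- function(x, y) {
  if (is.null(x)) y else x
}

a <- NULL %||% "default"
b <- NULL %|||% "default"
c1 <- character(0) %||% "default"   # length-zero input triggers the fallback
c2 <- character(0) %|||% "default"  # length-zero input is kept as is
```

The cheaper operator is only a safe substitute where a zero-length value cannot occur or is acceptable.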

Enhancements

  • Write a complete project structure in use_drake() (#1097, @lorenzwalthert, @tjmahr).
  • Add a minor logger note to say how many dynamic sub-targets are registered at a time (#1102, @kendonB).
  • Handle dependencies that are dynamic targets but not declared as such for the current target (#1107).
  • Internally, the "layout" data structure is now called the "workflow specification", or "spec" for short. The spec is drake's interpretation of the plan. In the plan, all the dependency relationships among targets and files are implicit. In the spec, they are all explicit. We get from the plan to the spec using static code analysis, e.g. analyze_code().
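
For a rough feel of what static code analysis extracts, the exported helper deps_code() can be pointed at an expression. This is only a sketch: deps_code() is a user-facing function rather than the internal spec itself, and the function and variable names in the expression are made up.

```r
library(drake)

# Hypothetical command: `clean_data` and `raw_data` stand in for a
# user-defined function and an upstream target.
command <- quote(clean_data(raw_data))

# Returns a data frame of detected dependencies and their types.
deps <- deps_code(command)
print(deps)
```

In the plan, the link from this command to raw_data is implicit in the code; the analysis makes it explicit.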

Dynamic branching

02 Dec 12:10

Version 7.8.0

Bug fixes

  • Prevent drake::drake_plan(x = target(...)) from throwing an error if drake is not loaded (#1039, @mstr3336).
  • Move the transformations lifecycle badge to the proper location in the docstring (#1040, @jeroen).
  • Prevent readd() / loadd() from turning an imported function into a target (#1067).
  • Align in-memory disk.frame targets with their stored values (#1077, @brendanf).

New features

  • Implement dynamic branching (#685).
  • Add a new subtargets() function to get the cached names of the sub-targets of a dynamic target.
  • Add new subtargets arguments to loadd() and readd() to retrieve specific sub-targets from a parent dynamic target.
  • Add new get_trace() and read_trace() functions to help track which values of grouping variables go into the making of dynamic sub-targets.
  • Add a new id_chr() function to get the name of the target while make() is running.
  • Implement plot(plan) (#1036).
  • vis_drake_graph(), drake_graph_info(), and render_drake_graph() now take arguments that allow behavior to be defined upon selection of nodes (#1031, @mstr3336).
  • Add a new max_expand argument to make() and drake_config() to scale down dynamic branching (#1050, @hansvancalster).
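
A minimal sketch of dynamic branching, assuming drake 7.9.0 or later (where sub-targets are combined with vctrs) and using an illustrative plan and a temporary cache:

```r
library(drake)

# `y` gets one sub-target per element of `x`, created while make() runs.
plan <- drake_plan(
  x = c(1, 2, 3),
  y = target(x * 10, dynamic = map(x))
)

# Throwaway cache so the example leaves no .drake/ folder behind.
cache <- new_cache(tempfile())
make(plan, cache = cache)

whole <- readd(y, cache = cache)                 # all sub-targets combined
names_y <- subtargets(y, cache = cache)          # cached sub-target names
second <- readd(y, subtargets = 2, cache = cache)  # only the second sub-target

print(whole)
print(names_y)
print(second)
```

Passing max_expand to make() would limit how many sub-targets are built, which can help when testing a large workflow.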

Enhancements

  • Document transformation functions in a way that avoids having to create true functions (#979).
  • Avoid always invalidating the memoized layout when we set the knitr hash.
  • Change the names of environments in drake_config() objects.
  • Assert that prework is a language object, list of language objects, or character vector (#1 at pat-s/multicore-debugging on GitHub, @pat-s).
  • Use an environment instead of a list for config$layout. Supports internal modifications by reference. Required for #685.
  • Clean up the code of the parallel backends.
  • Make dynamic a formal argument of target().
  • Always lock/unlock the environment target by target, allowing informative error messages to appear more readily (#1062, @PedramNavid).
  • Automatically ignore storrs and decorated storrs (#1071).
  • Speed up memory management by avoiding a call to setdiff() and avoiding names(config$envir_targets).

disk.frame and code_to_function()

15 Oct 00:04

Version 7.7.0

Bug fixes

  • Take the sum instead of the max in dir_size(). Incurs rehashing for some workflows, but should not invalidate any targets.

New features

  • Add a new which_clean() function to preview which targets will be invalidated by clean() (#1014, @pat-s).
  • Add serious import and export methods for the decorated storr (#1015, @billdenney, @noamross).
  • Add a new "diskframe" format for larger-than-memory data (#1004, @xiaodaigh).
  • Add a new drake_tempfile() function to help with "diskframe" format. It makes sure we are not copying large datasets across different physical storage media (#1004, @xiaodaigh).
  • Add a new function code_to_function() to parse script-based workflows into functions so drake_plan() can begin to manage the workflow and track dependencies (#994, @thebioengineer).
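
As a small sketch of previewing a cleanup with which_clean() (assuming drake 7.10.0 or later so the plan can be passed to make() directly; the plan and temporary cache are illustrative):

```r
library(drake)

plan <- drake_plan(
  x = 1,
  y = x + 1
)

# Throwaway cache so the example leaves no .drake/ folder behind.
cache <- new_cache(tempfile())
make(plan, cache = cache)

# Preview: which targets would clean(x) invalidate?
preview <- which_clean(x, cache = cache)
print(preview)

# Then actually remove the target from the cache.
clean(x, cache = cache)
remaining <- cached(cache = cache)
print(remaining)
```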

Continuing with efficient data formats

14 Sep 20:11

Version 7.6.2

Bug fixes

  • Remove README.md from CRAN altogether. Also remove all links from the news and vignette. The links trigger too many CRAN notes, which made the automated checks too brittle.
  • Serialize formats that need serialization (like "keras") before sending the data from HPC workers to the master process (#989).
  • Check for custom-formatted files when checking checksums.
  • Force fst-formatted targets to plain data frames. Same goes for the new "fst_dt" format.
  • Change the meaning and behavior of max_expand in drake_plan(). max_expand is now the maximum number of targets produced by map(), split(), and cross(). For cross(), this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when .id is FALSE (#1002). Note: max_expand is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in drake_plan() originally due to max_expand, but drake_plan() is still fast, so that is not so bad.
  • Drop specialized formats of NULL targets (#998).
  • Prevent false grouping variables from partially tagging along in cross() (#1009). The same fix should apply to map() and split() too.
  • Respect graph topology when recovering old grouping variables for map() (#1010).

New features

  • Add a new "fst_dt" format for fst-powered saving of data.table objects.
  • Support a custom "caching" column of the plan to select master vs worker caching for each target individually (#988).
  • Make transform a formal argument of target() so that users do not have to type "transform =" all the time in drake_plan() (#993).
  • Migrate the documentation website from ropensci.github.io/drake to docs.ropensci.org/drake.
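
The following sketch shows transform used as a positional argument of target() together with the revised meaning of max_expand. The command and the values are made up for illustration.

```r
library(drake)

# transform is the second formal argument of target(), so the
# "transform =" label can be dropped.
full_plan <- drake_plan(
  y = target(sqrt(x), map(x = c(1, 4, 9)))
)

# max_expand caps the number of targets each transformation produces.
# It is meant for scaling down while testing, not for production runs.
small_plan <- drake_plan(
  y = target(sqrt(x), map(x = c(1, 4, 9))),
  max_expand = 2
)

print(full_plan)
print(small_plan)
```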

Enhancements

  • Document the HPC limitations of target(format = "keras") (#989).
  • Remove the now-superfluous vignette.
  • Wrap up console and text file logging functionality into a reference class (#964).
  • Deprecate the verbose argument in various caching functions. The location of the cache is now only printed in make(). This made the previous feature easier to implement.
  • Carry forward nested grouping variables in combine() (#1008).
  • Improve the encapsulation of hash tables in the decorated storr (#968).

CRAN hotfix

19 Aug 18:34

Fix broken README links.

Big data formats

17 Aug 16:46

Version 7.6.0

New features

  • Support specialized data storage via a decorated cache and format argument of target() (#971). This allows users to leverage faster ways to save and load targets, such as write_fst() for data frames and save_model_hdf5() for Keras models. It also improves memory because it prevents storr from making a serialized in-memory copy of large data objects.
  • Add tidyselect functionality for ... in progress(), analogous to loadd(), build_times(), and clean().
  • Support S3 for user-defined generics (#959). If the generic do_stuff() and the method stuff.your_class() are defined in envir, and if do_stuff() has a call to UseMethod("stuff"), then drake's code analysis will detect stuff.your_class() as a dependency of do_stuff().
  • Add authentication support for file_in() URLs. Requires the new curl_handles argument of make() and drake_config() (#981).
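
The naming pattern in the S3 item can be easier to follow in code. This base R sketch reuses the names from the note (do_stuff(), stuff.your_class()) and only demonstrates the dispatch itself; it does not run drake's code analysis.

```r
# The generic is called do_stuff(), but it dispatches on the name "stuff".
do_stuff <- function(x) {
  UseMethod("stuff")
}

# Method for objects of class "your_class".
stuff.your_class <- function(x) {
  "handled by stuff.your_class()"
}

obj <- structure(list(), class = "your_class")
result <- do_stuff(obj)
print(result)
```

Since do_stuff() can end up calling stuff.your_class(), a change to the method affects the generic's behavior, which is why the analysis treats the method as a dependency of do_stuff().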

Bug fixes

  • Make drake_plan(transform = slice()) understand .id and grouping variables (#963).
  • Repair clean(garbage_collection = TRUE, destroy = TRUE). Previously it destroyed the cache before trying to collect garbage.
  • Ensure that r_make() passes informative error messages back to the calling process (#969).
  • Avoid downloading full contents of URLs when rehashing (#982).
  • Retain upstream grouping variables of map() and cross() on topologically side-by-side targets (#983).
  • Manually enforce the correct ordering in dsl_left_outer_join() so cross() selects the right combinations of existing targets (#986). This bug was probably introduced in the solution to #983.
  • Make the output of progress() more consistent, less dependent on whether tidyselect is installed.

Enhancements

  • Document DSL keywords as if they were true functions: target(), map(), split(), cross(), and combine() (#979).
  • Do garbage collection between the unloading and loading phases of memory management.
  • Keep file_out() files in clean() unless garbage_collection is TRUE. That way, make(recover = TRUE) is a true "undo button" for clean(). clean(garbage_collection = TRUE) still removes data in the cache, as well as any file_out() files from targets currently being cleaned.
  • The menu in clean() only appears if garbage_collection is TRUE. Also, this menu is added to rescue_cache(garbage_collection = TRUE).
  • Reorganize the internal code files and functions to make development easier.
  • Move the history inside the cache folder .drake/. The old .drake_history/ folder was awkward. Old histories are migrated during drake_config() and drake_history().
  • Add lifecycle badges to exported functions.

CRAN hotfix

21 Jul 14:41

History, provenance, and recovery

21 Jul 01:59

Version 7.5.0

New features

  • Add automated data recovery (#945). This is still experimental and disabled by default. Requires make(recover = TRUE).
  • Add new functions recoverable() and r_recoverable() to show targets that are outdated but recoverable via make(recover = TRUE).
  • Track the history and provenance of targets, viewable with drake_history(). Powered by txtq (#918, #920).
  • Add a new no_deps() function, similar to ignore(). no_deps() suppresses dependency detection but still tracks changes to the literal code (#910).
  • Add a new "autoclean" memory strategy (#917).
  • Export transform_plan().
  • Allow a custom seed column of drake plans to set custom seeds (#947).
  • Add a new seed trigger to optionally ignore changes to the target seed (#947).
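
A hedged sketch of automated data recovery follows. It uses the later interface from drake 7.10.0, where the plan is passed directly instead of a config object, and the plan and temporary cache are illustrative. The target stores a timestamp so that a recovered value can be told apart from a rebuilt one.

```r
library(drake)

# The timestamp changes on every real build, so an identical value
# after recovery indicates the old data was restored, not recomputed.
plan <- drake_plan(x = Sys.time())

# Throwaway cache so the example leaves no .drake/ folder behind.
cache <- new_cache(tempfile())
make(plan, cache = cache)
first_value <- readd(x, cache = cache)

# Without garbage collection, clean() invalidates targets but keeps the data.
clean(cache = cache)

# Outdated targets that make(recover = TRUE) could restore.
candidates <- recoverable(plan, cache = cache)
print(candidates)

make(plan, cache = cache, recover = TRUE)
second_value <- readd(x, cache = cache)
```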

Enhancements

  • In drake_plan(), interpret custom columns as non-language objects (#942).
  • Suggest and assert clustermq >= 0.8.8.
  • Log the target name in a special column in the console log file (#909).
  • Rename the "memory" memory strategy to "preclean" (with deprecation; #917).
  • Deprecate ensure_workers in drake_config() and make().
  • Warn when the user supplies additional arguments to make() after config is already supplied.
  • Prevent users from running make() from inside the cache (#927).
  • Add CITATION file with JOSS paper.
  • In deps_profile(), include the seed and change the names.
  • Allow the user to set a different seed in make(). All this does is invalidate old targets.
  • Use set_hash() and get_hash() in storr to double the speed of progress tracking.

Bug fixes

  • In the static code analysis for dependency detection, ignore list elements referenced with $ (#938).
  • Minor: handle strings embedded in language objects (#934).
  • Minor: supply xxhash64 as the default hash algorithm for non-storr hashing if the driver does not have a hash algorithm.

Data splitting, URL tracking, and advanced memory management

07 Jun 03:15

Version 7.4.0

Mildly breaking changes

These changes are technically breaking changes, but they should only affect advanced users.

  • rescue_cache() no longer returns a value.

Bug fixes

  • Restore compatibility with clustermq (#898). Suggest version >= 0.8.8 but allow 0.8.7 as well.
  • Ensure drake recomputes config$layout when knitr reports change (#887).
  • Do not rehash large imported files every make() (#878).
  • Repair parsing of long tidy eval inputs in the DSL (#878).
  • Clear up cache confusion when a custom cache exists adjacent to the default cache (#883).
  • Accept targets as symbols in r_drake_build().
  • Log progress during r_make() (#889).
  • Repair expose_imports(): do not do the environment<- trick unless the object is a non-primitive function.
  • Use different static analyses of assign() vs delayedAssign().
  • Fix a superfluous code analysis warning incurred by multiple file_in() files and other strings (#896).
  • Make ignore() work inside loadd(), readd(), file_in(), file_out(), and knitr_in().

New features

  • Add experimental support for URLs in file_in() and file_out(). drake now treats file_in()/file_out() files as URLs if they begin with "http://", "https://", or "ftp://". The fingerprint is a concatenation of the ETag and last-modified timestamp. If neither can be found or if there is no internet connection, drake throws an error.
  • Implement new memory management strategies "unload" and "none", which do not attempt to load a target's dependencies from memory (#897).
  • Allow users to give each target its own memory strategy (#897).
  • Add drake_slice() to help split data across multiple targets. Related: #77, #685, #833.
  • Introduce a new drake_cache() function, which is now recommended instead of get_cache() (#883).
  • Introduce a new r_deps_target() function.
  • Add RStudio addins for r_make(), r_vis_drake_graph(), and r_outdated() (#892).
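
As an illustration of drake_slice(), the sketch below splits a built-in data frame into four row-wise pieces. The dataset and the number of slices are arbitrary choices for the example; in a real plan each slice would typically be its own target.

```r
library(drake)

# Split mtcars into 4 slices along the rows (the default margin)
# and collect each slice in a list.
pieces <- lapply(
  seq_len(4),
  function(i) drake_slice(mtcars, slices = 4, index = i)
)

# Together, the slices should cover every row of the original data.
rows_per_piece <- vapply(pieces, nrow, integer(1))
print(rows_per_piece)
```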