Import drake caches into other drake caches #1015
Comments
The decorated Line 130 in 9d50193
|
Thanks for starting this one! In my use case, I’d like to have a bit more and different functionality. I have a few different concerns and use cases:
For my collaboration work, people are often working with similar data types and therefore target names. For example, cache1 and cache2 may both have targets named “vital_signs”. For that, a direct import would fail without putting the caches into some separated namespace.
The other issue that I encounter is that I’d like to be able to run the plans that originally generated cache1 and cache2 in case new data comes in without modifying the plans.
In practice, I usually want only a subset of the targets out of cache1 and cache2 in the new, combined cache. Often, I only need one target from both when I’m combining them.
In my ideal scenario the way I’m thinking about it now, I think it would be possible to have something akin to namespaces within caches. Then, I could import cache1 and cache2 into a new cache3. I could access cache1 objects from cache3 like: cache1::targetname or cache2::targetname, and then when generating new items in cache3 they could refer to those target names.
And, overall, I would be ok rerunning everything and my use case is more at the plan than the cache level (though both would be helpful in different ways). For me, most of my plans take minutes to hours to run, and a one time cost of rerunning them upon combination would be ok. |
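To make the namespace idea above concrete, here is a minimal sketch at the storr level only, not a drake feature. It assumes plain in-memory storr objects as stand-ins for the collaborators' caches, and the data frames are made up for illustration. drake uses storr namespaces for its own bookkeeping, so this shows the concept rather than a supported way to address drake targets; the `cache1::targetname` syntax itself does not exist.

```r
library(storr)

# Two in-memory stand-ins for collaborators' caches (hypothetical data).
cache1 <- storr_environment()
cache2 <- storr_environment()
cache1$set("vital_signs", data.frame(id = 1:2, hr = c(60, 72)))
cache2$set("vital_signs", data.frame(id = 3:4, hr = c(80, 65)))

# Combined store: copy the one needed key from each source
# into a namespace named after its origin, so the names cannot clash.
cache3 <- storr_environment()
cache3$set("vital_signs", cache1$get("vital_signs"), namespace = "cache1")
cache3$set("vital_signs", cache2$get("vital_signs"), namespace = "cache2")

# Both versions of "vital_signs" are retrievable side by side.
from1 <- cache3$get("vital_signs", namespace = "cache1")
from2 <- cache3$get("vital_signs", namespace = "cache2")
```

Copying key by key also covers the "only a subset of targets" case, since nothing is imported unless it is named explicitly.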
I think this would cover it in my case. In my scenario people are working on the same project / plan, so I don't expect clashes between target names from different plans. I note that the alternative workflow I'm thinking of uses a shared S3-based storr, which would require someone, maybe me, wrapping up this old PR to |
Sounds good. I think the path to a first implementation is clear.
Yes, if the @billdenney, unfortunately, |
@wlandau, yes, I think that an |
Hmm... I hesitate to check for outdatedness because it is comparatively slow and requires an examination of the whole plan. Ultimately, you will need to call |
Another thing I just realized. In |
Could there be a flag to check for the imported target being outdated when imported? I ask because, while slow/expensive, I would appreciate the consistency that it would guarantee for the plan combination. |
Taking a step back
Sorry, I think that's the point where things get too complicated to automate well. Another way to think about it: a target you import could be invalid in your cache and plan even if it was originally up to date in your colleague's cache and plan. So there is no way to be sure about the kind of target status that ultimately matters. Spelling this out actually makes me question the wisdom of The more I think about it, the more I think the merging you describe is going to require a fully-fledged version control system. For all I know, git may be able to merge everyone's Can we talk about why you want to merge different
Proposal
library(drake)
cache1 <- new_cache("cache1")
cache2 <- new_cache("cache2")
plan1 <- drake_plan(x = 1)
plan2 <- drake_plan(
  x = target(
    ignore(cache1)$get("x"),
    trigger = trigger(change = ignore(cache1)$get_hash("x"))
  )
)
# First runthroughs
make(plan1, cache = cache1)
#> target x
make(plan2, cache = cache2)
#> target x
# Change nothing
make(plan1, cache = cache1)
#> All targets are already up to date.
make(plan2, cache = cache2)
#> All targets are already up to date.
# Change x in the *first* plan
plan1 <- drake_plan(x = 2)
make(plan1, cache = cache1)
#> target x
# x changes in plan2 because of the change trigger
make(plan2, cache = cache2)
#> target x
Created on 2019-09-27 by the reprex package (v0.3.0) |
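The change trigger in the reprex above depends on the upstream hash changing whenever the upstream value changes. A minimal sketch of that behavior, assuming an in-memory storr as a stand-in for cache1 rather than a real drake cache:

```r
library(storr)

cache1 <- storr_environment()  # in-memory stand-in for the upstream cache

cache1$set("x", 1)
hash_first <- cache1$get_hash("x")

# Re-saving an identical value leaves the hash unchanged,
# so a change trigger watching it would not fire.
cache1$set("x", 1)
hash_same <- cache1$get_hash("x")

# A new value produces a different hash, which is what
# makes the downstream target rebuild.
cache1$set("x", 2)
hash_new <- cache1$get_hash("x")

unchanged <- identical(hash_first, hash_same)
changed <- !identical(hash_first, hash_new)
```

Watching the hash instead of the value keeps the trigger cheap: the downstream plan compares a short string rather than loading and comparing the full object.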
That type of chaining would do effectively all of what I need. Everything else I'm asking for (like detecting outdated objects from the other cache) is more of a want. I don't think I would have gotten to the As a simplification, would it be reasonable to do something like this instead for plan2?
If so, perhaps the cache argument of |
Special-casing would unfortunately break the promise of static code analysis that the values of objects behind the symbols do not matter. But if it helps make the plan look cleaner, this will probably work:
cache1 <- function() {
  drake_cache("cache1")
}
plan2 <- drake_plan(
  x = target(
    cache1()$get("x"),
    trigger = trigger(change = cache1()$get_hash("x"))
  )
) |
In that case, I'd go for the slightly-less-indirection method you originally proposed. I think that from my perspective, you can consider this closed. My only slight additional request would be that your suggestion in #1015 (comment) would be helpful to have in some form of documentation (perhaps it's not exactly a "frequently" asked question, but knowing how to do it could perhaps help others ;) ). |
I think an FAQ is the easiest way to insert this workaround in the docs. If you find a place in the manual that fits, please feel free to submit a PR. I changed my mind on
As a side-effect of this new strategy, you will be able to import targets one at a time. |
Thanks! |
@billdenney, just an FYI: in the next release of |
That's great, thank you! |
Prework
drake's code of conduct.
Description
$import() in storr gets us part of the way there, but we also have history files and specialized data files to wrangle too. (So we need wlandau/txtq#17). With some work on drake's decorated storr, it should be possible to do the following with impunity:
@billdenney and @noamross, how much would this address the issues around remote caches and collaborative projects you brought up recently?