Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache$destroy deletes entire directory #926

Closed
2 tasks
nettoyoussef opened this issue Jul 1, 2019 · 5 comments
Closed
2 tasks

cache$destroy deletes entire directory #926

nettoyoussef opened this issue Jul 1, 2019 · 5 comments
Assignees

Comments

@nettoyoussef
Copy link

nettoyoussef commented Jul 1, 2019

Prework

  • Read and abide by drake's code of conduct.
  • Advanced users: Version 7.3.0.

Description

Trying to understand better what happened at #925, I decided to remove the whole cache of my Drake plan. Strangely, even after the removal of the cache, when I tried to generate a visualization of the plan there were some steps of the plan that were up to date (green).

Wishing to remove all those targets, I re-run cache$destroy(). That had the side effect of permanently deleting all files in the current working directory, including datasets and code scripts. I'm unsure exactly why this was the case.

I think that what has probably happened is that find_cache() or some similar function to return the path to the drake cache returned NULL and some kind of command such as rm delete permanently all the files in the current working directory.

This, of course, is giving me some extensive headache to recover the most updated code files from my project. I think some kind of warning or check before destroying the cache should be in place, such as if(is.null(find_cache())){stop("Cache was not found.")} to avoid this kind of problem.

Reproducible example

library(here)

#This version will destroy everything in the .here folder 
#For protection, instead of using a .here file, I am creating an temporary folder
dir.create(here("project_folder"))


cache <- storr::storr_rds(here("project_folder"), compress = FALSE)

#Drake part
plan <- drake_plan(some plan....)
config<- drake_config(plan, cache = cache)
make(config)

#Warning - it has side effects - use with caution
#This will destroy the project_folder created above
cache$destroy()

#This version creates an specific folder for the Drake cache
cache <- storr::storr_rds(here("project_folder/.drake"), compress = FALSE)

#Drake part
plan <- drake_plan(some plan....)
config<- drake_config(plan, cache = cache)
make(config)

# With the last format, only the Drake cache is destroyed.
cache$destroy()

Expected output

Drake would not have removed files not related to Drake's cache.

@wlandau
Copy link
Member

wlandau commented Jul 2, 2019

Sorry to hear that.

Strangely, even after the removal of the cache, when I tried to generate a visualization of the plan there were some steps of the plan that were up to date (green).

There is probably a cache in a parent directory. By default, drake searches up through parent directories to find the nearest .drake/ cache and assumes that one is the cache of the current project. If find_cache() returns NULL, then your caches are clear and no targets should be up to date.

Wishing to remove all those targets, I re-run cache$destroy(). That had the side effect of permanently deleting all files in the current working directory, including datasets and code scripts. I'm unsure exactly why this was the case.

Odd, that should not happen. When you set up your project, what does cache$driver$path say? This path should be something like your/project/root/.drake, not your/project/root. In the latter case, all the files in your project root are, in fact, deleted. But drake itself makes sure your cache folder and your project root are different.

Reproducible example
? # Warning - contain side effects
setwd("path/to/directory")
cache$destroy()

Sorry, this is not enough information to reproduce what you are seeing. It is missing the steps to define cache and set up your project. Without more steps, I cannot help.

@wlandau wlandau closed this as completed Jul 2, 2019
@nettoyoussef
Copy link
Author

nettoyoussef commented Jul 3, 2019

Hi Will,

Sorry, I was kind of in a rush when I submitted this issue. I apologize. I'm editing the reprex above to correctly reproduce the error.

TL/DR: the users may not be aware of the behavior of how Drake's cache is created/destroyed and may suffer side effects from their lack of knowledge/attention.

I am now aware of what exactly happened. Not sure however if something must be done about it.

I was using the library here to create the path for the drake cache, but without creating a specific folder for it. I was aware that Drake used the storr package to create the cache, but I thought that even when submitting the path for it in the config or make functions, it would still create a .drake directory to save the cache.

Unfortunately, that does not happen. Drakes uses the exact folder that was given in the cache argument. Since Drake does not have its own method to destroy its cache but instead uses the method cache$destroy from the storr package, it deletes everything that exists on the specified cache path. So, even if drake creates only some of the files on the cache path, cache$destroy will delete everything that happens to be there.

This can be very dangerous if the user does not specify an exclusive folder for the Drake cache.

For beginners, the manual mentions on a brief passage how to delete the cache, but as it is, it appears that cache$destroy is some kind of internal method that the Drake package uses to identify and destroy its own caches. If the users, unaware of this behavior, give Drake a project folder in the hope that it would create its own .drake hidden folder there, then they risk deleting all their information.

While this now seems obvious to me, given the potential destructive behavior of the function cache$destroy, maybe Drake would benefit of its own wrapper for the storr functions, being picky in how it creates and destroy its files in order to avoid unintended consequences for its users.

@wlandau
Copy link
Member

wlandau commented Jul 3, 2019

Thanks, I did not realize this was a problem.

So then was the path of the cache really the same as the root of the project? How exactly were you creating the cache? There are a couple ways we could handle this, but before I implement something, it would help to see the code you were using to set up the cache and call make().

@nettoyoussef
Copy link
Author

nettoyoussef commented Jul 3, 2019

So then was the path of the cache really the same as the root of the project?

In the first example.

Let me explain what I did. The here package works on the following way. If you don't set a starting directory, it will create relative paths from the same directory where you started your first script. So, you would usually start your r session with the scripts that are in your project folder.

So, let's suppose my project folder is located on ~/my_project. I can put a .here file in this directory using the function set_here(). I usually have the following organization:

  • Project folder
    • code
    • data
    • dockerfiles
    • experiments

So, to access the data directory to read some file, I just have to use read_file(here("data/my_file.ext")). In other words, all my paths are relative paths.

When I was creating the drake cache trying to follow your suggestion on #907, I inadvertently used the line cache <- storr::storr_rds(here(), compress = FALSE). This made Drake creates its usual folders directly on ~/my_project.

I then created my plan:

library(here)
library(drake)
cache <- storr::storr_rds(here(), compress = FALSE)
plan <- drake_plan(temp = matrix(runif(10^3), ncol = 5))
config <- drake_config(plan,
  cache_log_file = "cache_log.csv",
  verbose = 2,
  garbage_collection = T,
  retries = 2,
  lock_envir = FALSE,
  cache = cache
)
make(config = config)

Drake created other subdirectories:

  • Project folder
    • code
    • data
    • dockerfiles
    • experiments
    • config
    • data
    • keys
    • scratch

And then, finally, when I ran cache$destroy(), it erased the whole ~/my_project folder.

The Drake step doesn't really matter, because everything is being managed by the storr package. What Drake would do is to some kind of search of a .drake folder before calling cache$destroy, like a program that only uninstall its files. So, if Drake doesn't find it, it doesn't destroy anything.

An example would be:

delete_cache <- function() {
  library(storr)

  drake_path <- dir(pattern = "\\.drake", recursive = TRUE, all.files = T, include.dirs = T)
  if (length(drake_path) == 0) {
    stop("Cache not found.")
  }

  cache <- storr::storr_rds(paste0(getwd(), "/", drake_path), compress = FALSE) # not sure if this destroys the cache created
  # by make - it is just a sketch
  cache$destroy()
}

# Using the here package
delete_cache <- function() {
  library(storr)
  library(here)

  setwd(here())

  drake_path <- dir(pattern = "\\.drake", recursive = TRUE, all.files = T, include.dirs = T)
  if (length(drake_path) == 0) {
    stop("Cache not found.")
  }

  cache <- storr::storr_rds(paste0(here(), "/", drake_path), compress = FALSE)
  cache$destroy()
}

@wlandau
Copy link
Member

wlandau commented Jul 3, 2019

Thanks. I think it is enough for make() to throw a warning when the cache path and the current working directory are the same. This will not totally prevent accidental deletion, but it will alert users when there is a non-negligible risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants