Huge number of files in .drake/keys .drake/data #154
Edit: TL;DR
Original post:

Ultimately, I think it will be up to you to define bigger, fewer targets. Unfortunately, there is no way to consolidate targets post hoc.

What kind of quota is it, and how many files? Is it imposed by the operating system or storr? How many targets do you have, and what is the cap on the number of files? Is the quota on a per-folder basis? Because then you could cleverly use multiple

To consolidate targets, the closest thing

#129 probably exacerbated the problem, but it was the right design choice, and I would rather not go back.
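To illustrate why "bigger, fewer targets" reduces the file count, here is a minimal base-R sketch. It is only an analogy and does not use drake: the directory names and the `results` list are made up for the illustration, and in a real cache each target costs more than one file (key files plus a data file), so the actual savings would differ.

```r
# Hypothetical analogy: 100 small results stored one per file,
# versus the same results stored together as a single list.
results <- lapply(1:100, function(i) i^2)

# One file per result (analogous to many small targets).
many_dir <- tempfile("many_")
dir.create(many_dir)
for (i in seq_along(results)) {
  saveRDS(results[[i]], file.path(many_dir, sprintf("result_%03d.rds", i)))
}

# One file for everything (analogous to one bigger target).
one_dir <- tempfile("one_")
dir.create(one_dir)
saveRDS(results, file.path(one_dir, "results.rds"))

n_many <- length(list.files(many_dir))
n_one <- length(list.files(one_dir))
cat("files with many small results:", n_many, "\n")
cat("files with one combined result:", n_one, "\n")
```

The trade-off is that a combined target is rebuilt as a whole whenever any part of it is out of date.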
Can you run
Just to garbage collect the underlying storr (#118 and links within). If you really want to restrict the number of files, you can move to a non-rds-based storr (this will require changes in drake, but these are restricted to cache initialisation). There is support for SQLite in the current storr release, but it is significantly slower than rds (a factor of 10 to 100, from memory). I have a new package (thor) which will be about as fast as rds, perhaps a little faster.
By the way: I am thinking of adding a
It's not very fast - it must read all files that contain the hash references (the contents of keys/objects etc) - that's probably the limiting factor and not something that I can see a way around unfortunately. But it's not crazy slow either. You might make it something disableable within
Exactly what I am thinking.
As of 29a299f, you can use
On second thought, you may just want to use the new
> system.time(drake_gc())
cache /....
   user  system elapsed
 49.299  19.371 282.253

Not super fast, but it's an infrequent operation, so not a big deal. I also am not sure it makes sense to have garbage collection in

# GBs in drake after
> sum(file.info(list.files("./.drake", all.files = TRUE, full.names = TRUE, recursive = TRUE))$size)/1024^3
[1] 25.22917
# Files in drake after
> sum(length(list.files("./.drake", all.files = TRUE, full.names = TRUE, recursive = TRUE)))
[1] 234441

For my case, this unfortunately didn't solve anything, as I guess my builds were new enough that there wasn't much (or any) junk left behind. I have taken some data off the server to make file-count room for now. I presume
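When chasing a file-count quota, it may help to see which top-level folder of the cache (for example keys versus data) holds most of the files, not only the totals. The following is a base-R sketch under that assumption. The helper name `summarize_cache_files` is hypothetical and not part of drake or storr, and the demo directory is a stand-in for a real `.drake` folder.

```r
# Hypothetical helper: count files and total size per top-level folder.
summarize_cache_files <- function(cache_dir = ".drake") {
  files <- list.files(cache_dir, all.files = TRUE, recursive = TRUE)
  if (length(files) == 0) {
    return(data.frame(folder = character(0), n_files = integer(0),
                      size_gb = numeric(0), stringsAsFactors = FALSE))
  }
  # The first path component is the top-level folder
  # (or the file name itself for files directly under cache_dir).
  folder <- vapply(strsplit(files, "/", fixed = TRUE), `[`, character(1), 1)
  sizes <- file.info(file.path(cache_dir, files))$size
  n <- tapply(sizes, folder, length)
  gb <- tapply(sizes, folder, sum) / 1024^3
  out <- data.frame(folder = names(n), n_files = as.integer(n),
                    size_gb = as.numeric(gb), stringsAsFactors = FALSE)
  out[order(-out$n_files), ]
}

# Demo on a throwaway directory that mimics a keys/ and data/ layout.
demo_dir <- tempfile("cache_demo_")
dir.create(file.path(demo_dir, "keys"), recursive = TRUE)
dir.create(file.path(demo_dir, "data"))
for (i in 1:3) writeLines("k", file.path(demo_dir, "keys", paste0("key", i)))
saveRDS(1:10, file.path(demo_dir, "data", "abc.rds"))

demo_summary <- summarize_cache_files(demo_dir)
print(demo_summary)
```

On a real cache, calling `summarize_cache_files(".drake")` should show whether the small key files or the data files dominate the count.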
You can already use custom

Ordinarily, yes, a function should do one thing well. I think I will keep the
Also, I will reiterate how lucky I am to have someone test drive
Update: effective 3a6e613, a bunch of superfluous storr namespaces are eliminated. For new

cache <- get_cache()
cache$clear(namespace = "attempts") # Totally harmless.
cache$clear(namespace = "imported") # My tests say you don't need this one.
cache$clear(namespace = "type") # Or this one.
cache$gc() # Garbage collection
# cache$clear(namespace = "readd") # Not actually sure about this one.

You might need it for existing caches. Otherwise, I really like spreading out data for a target over different namespaces (#129). Namespaces are one of my favorite features of storr, and they help organize, clean up, and future-proof
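The effect of clearing a namespace and then garbage collecting can be tried on a throwaway storr without touching a real drake cache. This is a minimal sketch that assumes the storr package is installed; the keys, values, and the `count_files` helper are made up for the illustration, and the namespace names only echo the ones mentioned in this thread.

```r
library(storr)

# A scratch rds-backed storr in a temporary directory (not a drake cache).
path <- tempfile("storr_demo_")
st <- storr_rds(path)

# Store the same key in two namespaces, as drake does for target-level data.
st$set("x", mtcars, namespace = "objects")
st$set("x", "finished", namespace = "attempts")

# Hypothetical helper: number of files on disk under the storr.
count_files <- function(dir) length(list.files(dir, recursive = TRUE))
before <- count_files(path)

# Clearing a namespace removes its key files; gc() then removes
# any stored values that no key refers to any more.
st$clear(namespace = "attempts")
st$gc()
after <- count_files(path)

cat("files before:", before, "\n")
cat("files after: ", after, "\n")
```

Values still referenced from another namespace survive `gc()`, which is why clearing a namespace alone may free fewer files than expected.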
FYI:
In light of #181, I am considering condensing some namespaces into a single
So we would be left with the following target-level namespaces.
I think that should halve the number of small files, but the change would affect backward compatibility with the current development version. So I would submit a GitHub release of the current version 4.4.1.9001 and then make the change for 4.4.1.9002.
I have a solution that is nearly ready to deploy. It dramatically reduces the number of tiny files in the cache, and I predict that

I condensed all the target-level namespaces into a single
To get this to work, I needed to define "subspaces" of
I wanted to record progress in stages during
To summarize our

load_basic_example()
make(my_plan)
get_cache()$list_namespaces()
## [1] "config" "kernels" "meta" "objects" "progress" "session"

The
I'm running up against my file count quota. Is there any way to consolidate these? I used clean() to remove unwanted targets already.