Download only necessary packages #112
Comments
This probably duplicates #56. The issue is that dependencies may have changed (new dependency packages) and/or dependency versions may have changed, so I think downloading all dependencies is probably the correct thing to do here. Of course it would be better if there were a way to only update packages that need to be updated, but that's more complicated to determine. Do you have any thoughts on how to examine/determine the update dependency graph? |
I agree you want to check all dependencies as well. I'm not sure how How-to: Once you make the list of all packages that are needed using |
You're correct that it currently doesn't address this issue. We hadn't yet taken the time to work through a solution. Presumably, package dependencies that are no longer required should be removed from the repo, along with adding new package deps. |
Below is a draft of a function that downloads only packages that are not already in an (online) repo, or for which a newer version is available on CRAN. It seems to work and saves a good amount of time for a reasonably big repo (https://github.com/radiant-rstats/minicran). The function also returns any packages that could be removed, but doesn't do anything with that information just yet. Looking forward to hearing your comments.

For an example script that uses the function, see: https://github.com/radiant-rstats/minicran/blob/gh-pages/minicran.R
This script also has a (very) crude function to remove older-version files from the repo. If you have ideas on how to improve that, I'd also be interested.

selMakeRepo <- function(
  pkgs, path, minicran, repos = getOption("repos"),
  type = "source", Rversion = R.version, ...
) {
  ## versions currently in the miniCRAN repo and on CRAN
  minicran_avail <- miniCRAN::pkgAvail(repos = minicran, type = type, Rversion = Rversion)[, "Version"]
  cran_avail <- miniCRAN::pkgAvail(repos = repos, type = type, Rversion = Rversion)[, "Version"]
  ## in dependent pkgs but not in miniCRAN repo
  to_fetch <- pkgs[!pkgs %in% names(minicran_avail)]
  ## not in dependent pkgs but in miniCRAN repo
  to_remove <- minicran_avail[!names(minicran_avail) %in% pkgs]
  ## which packages should be updated
  to_compare <- intersect(names(cran_avail), names(minicran_avail))
  pkgs_comp <- data.frame(
    compare = to_compare,
    pkgs = cran_avail[to_compare],
    minicran = minicran_avail[to_compare],
    stringsAsFactors = FALSE
  )
  ## compareVersion() returns 1 when the CRAN version is newer
  to_update <- apply(pkgs_comp, 1, function(x) utils::compareVersion(x[2], x[3]))
  to_update <- names(to_update[to_update == 1])
  to_fetch <- unique(c(to_update, to_fetch))
  ## selective set of packages to download and add to repo
  dwnload <- miniCRAN::makeRepo(to_fetch, path = path, type = type, Rversion = Rversion, ...)
  ## returning packages to remove
  invisible(to_remove)
} |
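The comparison logic in the draft above can be separated from the download step, which makes it easier to check without touching CRAN. The following is only a sketch under stated assumptions: `plan_repo_update()` and its arguments are hypothetical names, not part of miniCRAN, and the inputs are plain named character vectors of versions (for example the `"Version"` column returned by `pkgAvail()`). Unlike the draft, it only considers packages listed in `needed` when looking for updates.

```r
## Hypothetical helper (not a miniCRAN function): decide which packages to
## fetch, update, or remove, given named character vectors of versions.
plan_repo_update <- function(needed, local_versions, cran_versions) {
  in_local <- needed %in% names(local_versions)

  ## needed but not yet in the local repo
  fetch <- needed[!in_local]

  ## needed, present locally, and also known on CRAN
  common <- intersect(needed[in_local], names(cran_versions))
  newer <- package_version(unname(cran_versions[common])) >
    package_version(unname(local_versions[common]))
  update <- common[newer]

  ## present locally but no longer needed
  remove <- setdiff(names(local_versions), needed)

  list(fetch = fetch, update = update, remove = remove)
}

## Toy data, made up for illustration only
needed <- c("dplyr", "Rcpp", "plyr")
local  <- c(dplyr = "0.7.4", Rcpp = "0.12.17", oldpkg = "1.0")
cran   <- c(dplyr = "0.7.5", Rcpp = "0.12.17", plyr = "1.8.4", oldpkg = "1.0")
plan   <- plan_repo_update(needed, local, cran)
```

The union of `plan$fetch` and `plan$update` is what would then be passed on to `makeRepo()`; `plan$remove` lists candidates for deletion from the repo.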
Thanks for contributing the code. I'm taking a look in a new branch. My thought is to make this part of (I'm open to suggestions for what this argument should be called.) |
Note that my function also has the url for the remote host of the minicran repo. You could use the |
Good pointer, thank you. I think we can extract the package versions from a local path with something along these lines:

pkg_versions <- function(path) {
  ## match only the file extension, anchored at the end of the file name
  file_ptn <- "\\.(tar\\.gz|zip|tgz)$"
  p <- basename(list.files(path, pattern = file_ptn, recursive = TRUE))
  ## file names look like <package>_<version>.<extension>
  z <- strsplit(p, "_")
  pkg <- sapply(z, "[[", 1)
  version <- gsub(file_ptn, "", sapply(z, "[[", 2))
  names(version) <- pkg
  version
}

pkg_versions(path)
## dplyr plyr Rcpp
## "0.7.5" "1.8.4" "0.12.17" |
Nice @andrie! I like this better than my approach of using the remote miniCRAN repo. Once you have the local path, you should also be able to remove deps that are no longer needed, as @achubaty suggested.

Note: This runs through the entire repo, right? So if my repo has macOS binaries for R 3.4 and 3.5, which version will be used to check whether new files should be downloaded? CRAN only very slowly updates 3.4 for macOS (if at all). So if 3.5 is up to date, this function would never update the 3.4 packages, right? The same issue might come up with src files, which are always first: once you have the src file in your repo, the others may not get updated. Perhaps the search could be per type and Rversion?

Minor tweak: |
We don't need any of this complication, since pkgAvail() can read the versions directly from a local repo path:
pkgAvail(path, type = "source", Rversion = "3.5.0")[, "Version"]
|
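To make the "per type and Rversion" suggestion concrete, here is a rough sketch that runs the local-versus-CRAN comparison once for every type/Rversion pair. The function name `outdated_per_target()`, the `targets` data frame, and the `avail` argument are all assumptions made for illustration; none of them exist in miniCRAN. `avail` defaults to `miniCRAN::pkgAvail` but can be swapped for a stub, which keeps the sketch checkable offline.

```r
## Hypothetical helper (not a miniCRAN function): list outdated packages
## separately for each combination of package type and R version.
outdated_per_target <- function(path, repos, targets,
                                avail = miniCRAN::pkgAvail) {
  versions <- function(where, type, rv) {
    a <- avail(repos = where, type = type, Rversion = rv)
    ## keep package names even when the matrix has a single row
    stats::setNames(as.character(a[, "Version"]), rownames(a))
  }
  result <- lapply(seq_len(nrow(targets)), function(i) {
    type <- as.character(targets$type[i])
    rv <- as.character(targets$Rversion[i])
    local <- versions(path, type, rv)
    cran <- versions(repos, type, rv)
    common <- intersect(names(local), names(cran))
    newer <- package_version(unname(cran[common])) >
      package_version(unname(local[common]))
    common[newer]
  })
  names(result) <- paste(targets$type, targets$Rversion, sep = "/")
  result
}

## Offline stand-in for pkgAvail(), with made-up versions for illustration
stub_avail <- function(repos, type, Rversion) {
  v <- if (repos == "local") {
    c(a = "1.0", b = "2.0")
  } else if (Rversion == "3.5.0") {
    c(a = "1.1", b = "2.0")
  } else {
    c(a = "1.0", b = "2.0")
  }
  matrix(v, ncol = 1, dimnames = list(names(v), "Version"))
}

targets <- data.frame(
  type = c("source", "source"),
  Rversion = c("3.4.0", "3.5.0"),
  stringsAsFactors = FALSE
)
res <- outdated_per_target("local", "cran", targets, avail = stub_avail)
```

Because each target is compared on its own, an up-to-date 3.5 entry does not hide a stale 3.4 entry, which is the concern raised above.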
Even better :) |
Sadly I no longer seem to have a trace of this branch that I reference. |
When I run the lines below, all dependencies of the packages in the pkg_src character vector are downloaded from CRAN, even the ones that are already up to date in the local miniCRAN directory. This can take a pretty long time if there are many deps. Is there already a function in the miniCRAN package to check this? I looked around but could not see one. Did I miss something?