Allow incomplete renv.lock #10

Merged on Nov 16, 2023 (8 commits)

68 changes: 49 additions & 19 deletions README.md
@@ -4,11 +4,18 @@

`locksmith` is a utility to generate `renv.lock` file containing all dependencies of given set of R packages.

Given the input list of git repositories containing the R packages, as well as a list of R package repositories (e.g. in a package manager, CRAN, BioConductor etc.), `locksmith` will try to determine the list of all dependencies and their versions required to make the input list of packages work. It will then save the result in an `renv.lock`-compatible file.
Given the input list of git repositories containing the R packages, as well as a list of R package
repositories (e.g. in a package manager, CRAN, BioConductor etc.), `locksmith` will try to determine
the list of all dependencies and their versions required to make the input list of packages work.
It will then save the result in an `renv.lock`-compatible file.

For additional information about `renv.lock`, please refer to the [`renv` documentation](https://rstudio.github.io/renv/articles/renv.html).

## Installation

Simply download the project for your distribution from the [releases](https://github.com/insightsengineering/locksmith/releases) page. `locksmith` is distributed as a single binary file and does not require any additional system requirements.
Simply download the project for your distribution from the
[releases](https://github.com/insightsengineering/locksmith/releases) page. `locksmith` is
distributed as a single binary file and does not need any additional system requirements.

Alternatively, you can install the latest version by running:

@@ -18,7 +25,8 @@ go install github.com/insightsengineering/locksmith@latest

## Usage

`locksmith` is a command line utility, so after installing the binary in your `PATH`, simply run the following command to view its capabilities:
`locksmith` is a command line utility, so after installing the binary in your `PATH`, simply run the
following command to view its capabilities:

```bash
locksmith --help
@@ -31,13 +39,15 @@ locksmith --logLevel debug --exampleParameter 'exampleValue'
```

Real-life example with multiple input packages and repositories.
Please see below for [an example](#configuration-file) how to set package and repository lists more easily in a configuration file.
Please see below for [an example](#configuration-file) how to set package and repository lists more
easily in a configuration file.

```bash
locksmith --inputPackageList https://raw.githubusercontent.com/insightsengineering/formatters/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/rtables/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/scda/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/scda.2022/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/nestcolor/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/tern/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/rlistings/main/DESCRIPTION --inputRepositoryList BioC=https://bioconductor.org/packages/release/bioc,CRAN=https://cran.rstudio.com
locksmith --inputPackageList https://raw.githubusercontent.com/insightsengineering/formatters/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/rtables/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/scda/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/scda.2022/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/nestcolor/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/tern/main/DESCRIPTION,https://raw.githubusercontent.com/insightsengineering/rlistings/main/DESCRIPTION,https://gitlab.example.com/projectgroup/projectsubgroup/projectname/-/raw/main/DESCRIPTION --inputRepositoryList BioC=https://bioconductor.org/packages/release/bioc,CRAN=https://cran.rstudio.com
```

In order to download the packages from GitHub or GitLab repositories, please set the environment variables containing the Personal Access Tokens.
In order to download the packages from non-public GitHub or GitLab repositories, please set the environment
variables containing the Personal Access Tokens.

* For GitHub, set the `LOCKSMITH_GITHUBTOKEN` environment variable.
* For GitLab, set the `LOCKSMITH_GITLABTOKEN` environment variable.
@@ -46,12 +56,15 @@ By default `locksmith` will save the resulting output file to `renv.lock`.

## Configuration file

If you'd like to set the above options in a configuration file, by default `locksmith` checks `~/.locksmith`, `~/.locksmith.yaml` and `~/.locksmith.yml` files.
If you'd like to set the above options in a configuration file, by default `locksmith` checks
`~/.locksmith`, `~/.locksmith.yaml` and `~/.locksmith.yml` files.

If any of these files exist, `locksmith` will use options defined there, unless they are overridden by command line flags or environment variables.
If any of these files exist, `locksmith` will use options defined there, unless they are overridden
by command line flags or environment variables.

You can also specify custom path to configuration file with `--config <your-configuration-file>.yml` command line flag.
When using custom configuration file, if you specify command line flags, the latter will still take precedence.
You can also specify custom path to configuration file with `--config <your-configuration-file>.yml`
command line flag. When using custom configuration file, if you specify command line flags,
the latter will still take precedence.

Example contents of configuration file:

@@ -62,6 +75,7 @@ inputPackages:
- https://raw.githubusercontent.com/insightsengineering/rtables/main/DESCRIPTION
- https://raw.githubusercontent.com/insightsengineering/scda/main/DESCRIPTION
- https://raw.githubusercontent.com/insightsengineering/scda.2022/main/DESCRIPTION
- https://gitlab.example.com/projectgroup/projectsubgroup/projectname/-/raw/main/DESCRIPTION
inputRepositories:
- Bioconductor.BioCsoft=https://bioconductor.org/packages/release/bioc
- CRAN=https://cran.rstudio.com
@@ -70,11 +84,23 @@ inputRepositories:
The example above shows an alternative way of providing input packages, and input repositories,
as opposed to `inputPackageList` and `inputRepositoryList` CLI flags/YAML keys.

Additionally, `inputPackageList`/`inputRepositoryList` CLI flags take precedence over `inputPackages`/`inputRepositories` YAML keys.
Additionally, `inputPackageList`/`inputRepositoryList` CLI flags take precedence over
`inputPackages`/`inputRepositories` YAML keys.

## Environment variables

`locksmith` reads environment variables with `LOCKSMITH_` prefix and tries to match them with CLI
flags. For example, setting the following variables will override the respective values from the
configuration file: `LOCKSMITH_LOGLEVEL`, `LOCKSMITH_INPUTPACKAGELIST`, `LOCKSMITH_INPUTREPOSITORYLIST` etc.

The order of precedence is:

CLI flag → environment variable → configuration file → default value.
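
The precedence order above can be illustrated with a minimal Go sketch. It is not taken from the `locksmith` sources: the function `resolveOption` and all of its parameters are hypothetical names, and treating an empty string as "not present in the configuration file" is a simplifying assumption.

```go
package main

import (
	"fmt"
	"os"
)

// resolveOption picks the effective value of one option using the order
// CLI flag, environment variable, configuration file, default value.
// All names here are illustrative and are not part of locksmith.
func resolveOption(
	flagValue string, flagSet bool,
	envName string, lookupEnv func(string) (string, bool),
	configValue string, defaultValue string,
) string {
	if flagSet {
		return flagValue
	}
	if value, ok := lookupEnv(envName); ok {
		return value
	}
	// Simplification: an empty string means "not present in the configuration file".
	if configValue != "" {
		return configValue
	}
	return defaultValue
}

func main() {
	// No --logLevel flag was given and the configuration file says "warn",
	// so the result depends on whether LOCKSMITH_LOGLEVEL is set.
	level := resolveOption("", false, "LOCKSMITH_LOGLEVEL", os.LookupEnv, "warn", "info")
	fmt.Println("effective logLevel:", level)
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise without touching the real process environment.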

## Binary dependencies

For `locksmith` in order to generate an `renv.lock` with binary R packages, it is necessary to provide URLs to binary repositories in `inputRepositories`/`inputRepositoryList`.
For `locksmith` in order to generate an `renv.lock` with binary R packages,
it is necessary to provide URLs to binary repositories via `inputRepositories`/`inputRepositoryList`.

Examples illustrating the expected format of URLs to repositories with binary packages:

@@ -113,23 +139,27 @@ As a result, the configuration file could look like this:
- Bioc-Windows=https://www.bioconductor.org/packages/release/bioc/bin/windows/contrib/4.3
```

## Environment variables
## Packages not found in the repositories

`locksmith` reads environment variables with `LOCKSMITH_` prefix and tries to match them with CLI flags.
For example, setting the following variables will override the respective values from configuration file:
`LOCKSMITH_LOGLEVEL`, `LOCKSMITH_EXAMPLEPARAMETER` etc.
It may happen that some of the dependencies required by the input packages cannot be found in any of
the input repositories. By default, `locksmith` will fail in such case and show a list of such dependencies.

The order of precedence is:
However, it is possible to override this behavior by using the `--allowIncompleteRenvLock` flag.
Simply list the types of dependencies which should not cause the `renv.lock` generation to fail:

CLI flag → environment variable → configuration file → default value.
```bash
locksmith --allowIncompleteRenvLock 'Imports,Depends,Suggests,LinkingTo'
```
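
As a rough illustration of how such a comma-separated value could be interpreted, here is a minimal Go sketch. The helper `parseAllowedTypes` is hypothetical. `stringInSlice` shares its name with the helper called in `cmd/construct.go` in this PR, but its body is not shown in the diff, so the implementation below is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedTypes is a hypothetical helper: it splits a comma-separated
// flag value into dependency types, trimming spaces and dropping empty items.
func parseAllowedTypes(flagValue string) []string {
	var types []string
	for _, item := range strings.Split(flagValue, ",") {
		item = strings.TrimSpace(item)
		if item != "" {
			types = append(types, item)
		}
	}
	return types
}

// stringInSlice reports whether target occurs in list (assumed implementation).
func stringInSlice(target string, list []string) bool {
	for _, s := range list {
		if s == target {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowedTypes("Suggests,LinkingTo")
	for _, dependencyType := range []string{"Imports", "Suggests", "LinkingTo"} {
		if stringInSlice(dependencyType, allowed) {
			fmt.Println(dependencyType, "dependency missing: warn and continue")
		} else {
			fmt.Println(dependencyType, "dependency missing: fail")
		}
	}
}
```

With this value, only a missing `Imports` dependency would stop the `renv.lock` generation in the sketch; missing `Suggests` and `LinkingTo` dependencies would only be reported.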

## Development

This project is built with the [Go programming language](https://go.dev/).

### Development Environment

It is recommended to use Go 1.21+ for developing this project. This project uses a pre-commit configuration and it is recommended to [install and use pre-commit](https://pre-commit.com/#install) when you are developing this project.
It is recommended to use Go 1.21+ for developing this project. This project uses a pre-commit
configuration and it is recommended to [install and use pre-commit](https://pre-commit.com/#install)
when you are developing this project.

### Common Commands

28 changes: 21 additions & 7 deletions cmd/construct.go
@@ -24,9 +24,10 @@ import (
// which should be included in the output renv.lock file,
// based on the list of package descriptions, and information contained in the PACKAGES files.
func ConstructOutputPackageList(packages []PackageDescription, packagesFiles map[string]PackagesFile,
repositoryList []string) []PackageDescription {
repositoryList []string, allowedMissingDependencyTypes []string) []PackageDescription {
var outputPackageList []PackageDescription
var fatalErrors string
var nonFatalErrors string
// Add all input packages to output list, as the packages should be downloaded from git repositories.
for _, p := range packages {
outputPackageList = append(outputPackageList, PackageDescription{
@@ -44,7 +45,8 @@ func ConstructOutputPackageList(packages []PackageDescription, packagesFiles map
log.Info(p.Package, " → ", d.DependencyName, " (", d.DependencyType, ")")
ResolveDependenciesRecursively(
&outputPackageList, d.DependencyName, d.VersionOperator,
d.VersionValue, repositoryList, packagesFiles, 1, &fatalErrors,
d.VersionValue, d.DependencyType, allowedMissingDependencyTypes,
repositoryList, packagesFiles, 1, &fatalErrors, &nonFatalErrors,
)
}
}
@@ -53,6 +55,9 @@ func ConstructOutputPackageList(packages []PackageDescription, packagesFiles map
if fatalErrors != "" {
log.Fatal(fatalErrors)
}
if nonFatalErrors != "" {
log.Error(nonFatalErrors)
}
return outputPackageList
}

@@ -61,8 +66,9 @@ func ConstructOutputPackageList(packages []PackageDescription, packagesFiles map
// (later used to generate the renv.lock), or if the dependency should be downloaded from a package repository.
// Repeats the process recursively for all dependencies not yet processed.
func ResolveDependenciesRecursively(outputList *[]PackageDescription, name string, versionOperator string,
versionValue string, repositoryList []string, packagesFiles map[string]PackagesFile, recursionLevel int,
fatalErrors *string) {
versionValue string, dependencyType string, allowedMissingDependencyTypes []string,
repositoryList []string, packagesFiles map[string]PackagesFile, recursionLevel int,
fatalErrors *string, nonFatalErrors *string) {
var indentation string
for i := 0; i < recursionLevel; i++ {
indentation += " "
@@ -103,7 +109,8 @@ func ResolveDependenciesRecursively(outputList *[]PackageDescription, name strin
)
ResolveDependenciesRecursively(
outputList, d.DependencyName, d.VersionOperator, d.VersionValue,
repositoryList, packagesFiles, recursionLevel+1, fatalErrors,
d.DependencyType, allowedMissingDependencyTypes, repositoryList,
packagesFiles, recursionLevel+1, fatalErrors, nonFatalErrors,
)
}
}
@@ -115,9 +122,16 @@ func ResolveDependenciesRecursively(outputList *[]PackageDescription, name strin
}
var versionConstraint string
if versionOperator != "" && versionValue != "" {
versionConstraint = " in version " + versionOperator + " " + versionValue
versionConstraint = " (version " + versionOperator + " " + versionValue + ")"
}
message := "Could not find package " + name + versionConstraint + " in any of the repositories.\n"
if stringInSlice(dependencyType, allowedMissingDependencyTypes) {
log.Warn(indentation + message)
*nonFatalErrors += message
} else {
log.Error(indentation + message)
*fatalErrors += message
}
*fatalErrors += "Could not find package " + name + versionConstraint + " in any of the repositories.\n"
}

// CheckIfBasePackage checks whether the package should be treated as a base R package
9 changes: 9 additions & 0 deletions cmd/construct_test.go
@@ -326,6 +326,12 @@ func Test_ConstructOutputPackageList(t *testing.T) {
"",
"",
},
{
"LinkingTo",
"nonExistentPackage",
"",
"",
},
},
"", "", "", "", "", "", "",
},
@@ -382,6 +388,9 @@ func Test_ConstructOutputPackageList(t *testing.T) {
},
},
packagesFiles, repositoryList,
// Let the generation of renv.lock proceed, despite 'nonExistentPackage'
// (dependency type LinkingTo) not being found in any repository.
[]string{"LinkingTo"},
)
assert.Equal(t, outputPackageList,
[]PackageDescription{
29 changes: 22 additions & 7 deletions cmd/parse.go
@@ -46,14 +46,21 @@ func ParsePackagesFiles(repositoryPackageFiles map[string]string) map[string]Pac
// with those fields/properties that are required for further processing.
func ProcessPackagesFile(content string) PackagesFile {
var allPackages PackagesFile
// PACKAGES files in binary Windows repositories use CRLF line endings.
// Therefore, we first change them to LF line endings.
for _, lineGroup := range strings.Split(strings.ReplaceAll(content, "\r\n", "\n"), "\n\n") {
if lineGroup == "" {
continue
}
// Each lineGroup contains information about one package and is separated by an empty line.
firstLine := strings.Split(lineGroup, "\n")[0]
packageName := strings.ReplaceAll(firstLine, "Package: ", "")
cleaned := CleanDescriptionOrPackagesEntry(lineGroup)
cleaned := CleanDescriptionOrPackagesEntry(lineGroup, false)
if cleaned == "" {
// Package entry pointing to a "Path:" subdirectory encountered.
// Such package entries are skipped altogether.
continue
}
packageMap := make(map[string]string)
err := yaml.Unmarshal([]byte(cleaned), &packageMap)
if err != nil {
@@ -75,7 +82,7 @@ func ProcessPackagesFile(content string) PackagesFile {
// ProcessDescription reads a string containing DESCRIPTION file and returns a structure
// with those fields/properties that are required for further processing.
func ProcessDescription(description DescriptionFile, allPackages *[]PackageDescription) {
cleaned := CleanDescriptionOrPackagesEntry(description.Contents)
cleaned := CleanDescriptionOrPackagesEntry(description.Contents, true)
packageMap := make(map[string]string)
err := yaml.Unmarshal([]byte(cleaned), &packageMap)
checkError(err)
@@ -92,16 +99,24 @@ func ProcessDescription(description DescriptionFile, allPackages *[]PackageDescr
)
}

// CleanDescriptionOrPackagesEntry processes a multiline string representing information about one package
// from PACKAGES file, or the whole contents of DESCRIPTION file. Removes newlines occurring within
// filtered fields (which are predominantly fields containing lists of package dependencies).
// Also removes fields which are not required for further processing.
func CleanDescriptionOrPackagesEntry(description string) string {
// CleanDescriptionOrPackagesEntry processes a multiline string representing information about one
// package from PACKAGES file (if isDescription is false), or the whole contents of DESCRIPTION file
// (if isDescription is true). Removes newlines occurring within filtered fields (which are
// predominantly fields containing lists of package dependencies). Also removes fields which are not
// required for further processing.
func CleanDescriptionOrPackagesEntry(description string, isDescription bool) string {
lines := strings.Split(description, "\n")
filterFields := []string{"Package:", "Version:", "Depends:", "Imports:", "Suggests:", "LinkingTo:"}
outputContent := ""
processingFilteredField := false
for _, line := range lines {
if strings.HasPrefix(line, "Path:") && !isDescription {
// This means that the package is located in a subdirectory mentioned in this field.
// For example "Path: 4.4.0/Recommended" means that the package is located in
// "latest/src/contrib/4.4.0/Recommended/" subdirectory. We want to avoid these kinds of
// packages and prefer to download them from "latest/src/contrib/".
return ""
}
filteredFieldFound := false
// Check if we start processing any of the filtered fields.
for _, field := range filterFields {
11 changes: 9 additions & 2 deletions cmd/root.go
@@ -33,6 +33,7 @@ var logLevel string
var gitHubToken string
var gitLabToken string
var outputRenvLock string
var allowIncompleteRenvLock string

// In case the lists are provided as arrays in YAML configuration file:
var inputPackages []string
@@ -93,13 +94,14 @@ in an renv.lock-compatible file.`,
fmt.Println("inputPackages =", inputPackages)
fmt.Println("inputRepositories =", inputRepositories)
fmt.Println("outputRenvLock =", outputRenvLock)
fmt.Println("allowIncompleteRenvLock =", allowIncompleteRenvLock)

packageDescriptionList, repositoryList, repositoryMap := ParseInput()
packageDescriptionList, repositoryList, repositoryMap, allowedMissingDependencyTypes := ParseInput()
inputDescriptionFiles := DownloadDescriptionFiles(packageDescriptionList, DownloadTextFile)
inputPackages := ParseDescriptionFileList(inputDescriptionFiles)
repositoryPackagesFiles := DownloadPackagesFiles(repositoryList, DownloadTextFile)
packagesFiles := ParsePackagesFiles(repositoryPackagesFiles)
outputPackageList := ConstructOutputPackageList(inputPackages, packagesFiles, repositoryList)
outputPackageList := ConstructOutputPackageList(inputPackages, packagesFiles, repositoryList, allowedMissingDependencyTypes)
renvLock := GenerateRenvLock(outputPackageList, repositoryMap)
writeJSON(outputRenvLock, renvLock)
},
@@ -118,6 +120,10 @@ in an renv.lock-compatible file.`,
"Token to download non-public files from GitLab.")
rootCmd.PersistentFlags().StringVar(&outputRenvLock, "outputRenvLock", "renv.lock",
"File name to save the output renv.lock file.")
rootCmd.PersistentFlags().StringVar(&allowIncompleteRenvLock, "allowIncompleteRenvLock", "",
"Locksmith will fail if any of dependencies of input packages cannot be found in the repositories. "+
"However, it will not fail for comma-separated dependency types listed in this argument, e.g.: "+
"'Imports,Depends,Suggests,LinkingTo'")

// Add version command.
rootCmd.AddCommand(extension.NewVersionCobraCmd())
@@ -173,6 +179,7 @@ func initializeConfig() {
"gitHubToken",
"gitLabToken",
"outputRenvLock",
"allowIncompleteRenvLock",
} {
// If the flag has not been set in newRootCommand() and it has been set in initConfig().
// In other words: if it's not been provided in command line, but has been
9 changes: 9 additions & 0 deletions cmd/testdata/PACKAGES
@@ -13,6 +13,15 @@ License: GPL-3
MD5sum: bbb222333444555666
NeedsCompilation: no

Package: skippedPackage
Version: 5.0.0
Depends: R (>= 3.6.0)
Imports: grDevices, graphics, grid, lattice, stats, utils
License: GPL-3
MD5sum: aaabbbccc999888777
NeedsCompilation: no
Path: 4.4.0/Recommended

Package: somePackage3
Version: 0.0.1
Depends: R (>= 3.1.0)
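
The `skippedPackage` entry in this fixture carries a `Path:` field, which the updated `CleanDescriptionOrPackagesEntry` treats as a reason to skip the whole entry. The following is a simplified, standalone Go sketch of that behaviour, not the PR's implementation: the function `packageNames` and the entry `examplePackage` are hypothetical, and only package names are extracted rather than full package metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// packageNames is a simplified stand-in for PACKAGES parsing: it normalizes
// CRLF line endings, splits the content into entries separated by an empty
// line, and skips every entry that contains a "Path:" field.
func packageNames(content string) []string {
	var names []string
	normalized := strings.ReplaceAll(content, "\r\n", "\n")
	for _, entry := range strings.Split(normalized, "\n\n") {
		if strings.TrimSpace(entry) == "" {
			continue
		}
		name := ""
		skip := false
		for _, line := range strings.Split(entry, "\n") {
			if strings.HasPrefix(line, "Path:") {
				// The package lives in a subdirectory; ignore the whole entry.
				skip = true
				break
			}
			if strings.HasPrefix(line, "Package: ") {
				name = strings.TrimPrefix(line, "Package: ")
			}
		}
		if !skip && name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// CRLF line endings, as found in binary Windows repositories.
	content := "Package: examplePackage\r\nVersion: 2.0.0\r\n\r\n" +
		"Package: skippedPackage\r\nVersion: 5.0.0\r\nPath: 4.4.0/Recommended\r\n\r\n" +
		"Package: somePackage3\r\nVersion: 0.0.1\r\n"
	fmt.Println(packageNames(content))
}
```

In this sketch, `skippedPackage` is left out of the result while the two entries without a `Path:` field are kept.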