Skip to content

Commit

Permalink
Merge pull request #1387 from jemrobinson/software-package-requests
Browse files Browse the repository at this point in the history
Policy for software package requests
  • Loading branch information
jemrobinson authored Feb 15, 2023
2 parents 8424c59 + 91681b5 commit a53f4d3
Show file tree
Hide file tree
Showing 6 changed files with 126 additions and 2 deletions.
42 changes: 42 additions & 0 deletions .github/ISSUE_TEMPLATE/software_package_request.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
---
name: Software package request
about: Request that a new package is added to the Tier 3 allow list
title: "<short description of issue>"
labels: "type: enhancement"
assignees: ""
---

## :white_check_mark: Checklist

<!--
Before reporting a problem please check the following. Replace the empty checkboxes [ ] below with checked ones [x] accordingly.
-->

- [ ] I have searched open and closed issues for duplicates.
- [ ] This is a request for a new software package to be added to the Data Safe Haven
- [ ] The package is still missing in the [latest version](https://github.com/alan-turing-institute/data-safe-haven/releases).

## :gift: Package details

<!--
Provide details about the package you would like to see added:
- Package name
- Target audience: "core" (broadly useful for any project) or "extra" (domain specific)
- Package version (if different from latest)
- Package repository (e.g. CRAN, PyPI)
- Number of authors/contributors to the package codebase
- Any existing versions that should not be used (linking to publicly-accessible CVE databases if relevant)
- Download statistics (recent and longer-term, for both current and previous versions)
- List of packages that this package depends on
-->

## :steam_locomotive: Why is this needed?

<!--
To support this request, please respond to the following questions:
- What will you be able to do with this package that you can't currently do?
- Is this the most widely supported package for the intended purpose? What alternatives are there?
- What risks to data integrity/security might arise from including this package or its dependencies?
-->
Original file line number Diff line number Diff line change
Expand Up @@ -123,10 +123,10 @@ function Get-Dependencies {
# --------------------------
$languageName = @{cran = "r"; pypi = "python" }[$Repository]
$coreAllowlistPath = Join-Path $PSScriptRoot ".." ".." "environment_configs" "package_lists" "allowlist-core-${languageName}-${Repository}-tier3.list"
$extraAllowlistPath = Join-Path $PSScriptRoot ".." ".." "environment_configs" "package_lists" "allowlist-extra-${languageName}-${Repository}-tier3.list"
$fullAllowlistPath = Join-Path $PSScriptRoot ".." ".." "environment_configs" "package_lists" "allowlist-full-${languageName}-${Repository}-tier3.list"
$dependencyCachePath = Join-Path $PSScriptRoot ".." ".." "environment_configs" "package_lists" "dependency-cache.json"
$corePackageList = Get-Content $coreAllowlistPath | Sort-Object -Unique

$corePackageList = (Get-Content $coreAllowlistPath) + (Get-Content $extraAllowlistPath) | Sort-Object -Unique

# Initialise the package queue
# ----------------------------
Expand Down
4 changes: 4 additions & 0 deletions docs/processes/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ data_ingress.md
data_egress.md
data_access_controls.md
data_transfer_protocol.md
software_package_approval.md
```

[Data handling](data_handling.md)
Expand All @@ -29,3 +30,6 @@ data_transfer_protocol.md

[Data transfer protocol](data_transfer_protocol.md)
: A description of the protocol used for data transfer at the Turing

[Software approval policy](software_package_approval.md)
: The checks needed before a software package can be approved for use in Tier 3 environments
78 changes: 78 additions & 0 deletions docs/processes/software_package_approval.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Software approval policy

To maximise the usability of the secure research environments, we pre-install certain software packages that are deemed broadly useful to a cross section of researchers, thus making them available to all users.

Other software packages which are only of interest to a subset of users can be made available for download from remote package repositories.
Currently, we support PyPI (Python) and CRAN (R) as remote repositories.

For higher {ref}`sensitivity tiers <policy_classification_sensitivity_tiers>` ({ref}`policy_tier_3` and above), only a subset of packages are made available in this way.
This subset of packages constitutes an "allowlist" of packages that have been vetted to mitigate the risk of introducing malicious or unsound software into the secure environment.

```{warning}
The Data Safe Haven team manages a default allowlist, but individual deployments may be using their own versions.
Check with your {ref}`role_system_manager` what is being used in your case
```

This page sets out the policy for adding software packages to the default allowlist and/or software to the pre-installed software list.
It also describes the procedure that users of the secure environment should follow to request new additions to the allowlist.

## Background

Given the safeguards afforded by the safe haven environment, and the separation of responsibilities between its constituent resources, the level of risk associated with the introduction of third party software packages is considered low.
Moreover, access to the environment is carefully controlled and there is a presumption of trust in the individual researchers to whom access is granted.

Nevertheless, the introduction of any software into the safe haven must be considered against the potential risks of:

- approved users having access to data to which they shouldn't (e.g. from data mixing)
- unapproved users having access to data (e.g. from a data breach)
- poisoning of data and/or outputs
- resource misuse (allocation of computational resources for unintended or wasteful purposes).

Such risks may originate unwittingly, from a user who wants to "just get the job done", or from a user, network team member or administrator acting maliciously.

Specific risks which this policy aims to mitigate include:

- package name squatting (allowlisting a similarly-named package instead of the intended one)
- privilege escalation attacks (enabling a user to gain elevated access permissions)
- unauthorised data ingress (in particular, it is possible to upload arbitrary data to PyPI without review)

(package_inclusion_policy)=

## Policy

- For each supported repository, three package lists will be maintained:
- a core allowlist of broadly useful packages that should be pre-installed in each environment
- an extra allowlist of packages that may be useful for specific projects
- an expanded list to be made available from the package repositories consisting of the core and extra packages plus their dependencies
- Users may request to add packages to these allowlists via the {ref}`package request procedure <package_request_procedure>`.
- In the interests of improving researcher productivity the aim will be to accommodate such requests, provided there are no outweighing security concerns associated with the package or its dependencies.
- Requests will be reviewed by the project team using the information provided by the user when making the request
- If approved, a requested package will be added to either the core or extra allowlist (as appropriate)

(package_inclusion_criteria)=

### Criteria for inclusion in core

Only software that is considered broadly useful to a cross section of researchers should be included in core.

To meet this condition, a package should:

- implement at least one generic (i.e. not domain-specific) statistical algorithm or method, or
- provide support for a cross-cutting analysis technique (e.g. geospatial data analysis, NLP), or
- facilitate data science or software development best practices (e.g. for robustness, correctness, reproducibility), or
- enhance the presentational features of the programming language (e.g. for producing plots, notebooks, articles, websites), or
- enhance the usability of the programming language or development environment (e.g. RStudio, PyCharm)

(package_request_procedure)=

## Package request/review procedure

- A user requests a package by opening a `Software package request` issue on the Data Safe Haven GitHub repository, including responses to the following questions:
- Is this package the mostly widely supported for the intended purpose?
- What will you be able to do with this package that you can't currently do? What alternatives are there?
- What risks to data integrity/security might arise from including this package or its dependencies?
- A member of the project team reviews the request according to the terms of the {ref}`package_inclusion_policy`.
- The reviewer adds their decision (accept/reject) to the issue and notifies the user who made the request.
- If the decision is to reject, the reviewer must include an explanation. Any subsequent request for the same package should address the specific concern raised.
- If the decision is to accept, a pull request should be made that will add the package to the appropriate list.
- Once the pull request is approved, system administrators of any running deployment can decide whether to update to the new allowlist definitions.
Empty file.
Empty file.

0 comments on commit a53f4d3

Please sign in to comment.