From 4b10ac3af97ededae3b877046925ab5cb8535fc5 Mon Sep 17 00:00:00 2001 From: David Evans Date: Fri, 25 Oct 2024 14:33:43 +0100 Subject: [PATCH 1/3] Revert "Update T1OO text to reflect new reality" This reverts commit 7bc93d1c690375cb513122f8c330b858ae2e7e00. --- docs/type-one-opt-outs.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/docs/type-one-opt-outs.md b/docs/type-one-opt-outs.md index c199c1601..b1719781d 100644 --- a/docs/type-one-opt-outs.md +++ b/docs/type-one-opt-outs.md @@ -25,18 +25,12 @@ Instead they describe the data they require using [ehrQL](https://docs.opensafel At the point where ehrQL needs to fetch the data, it is told (by the system described above) whether it should include data from opted-out patients or not. Every ehrQL query contains a "population definition" which specifies exactly which criteria a patient must meet to be included in the result e.g. "patients between the ages of 18 and 65 who have not recently changed GP practice". -Unless a project is named in the project permissions file, ehrQL will automatically add an extra condition to this population definition: the patient's pseudonymous ID number must appear in the list of allowed ID numbers. +Unless a project is named in the project permissions file, ehrQL will automatically add an extra condition to this population definition: the patient's pseudonymous ID number must not appear in the list of ID numbers with a registered type 1 opt-out. This list is provided by the system suppliers and stored in the secure database along with the rest of the patient data. -It contains all patients known to the system supplier with two classes of patient _removed_: +It consists of a single bespoke type 1 opt-out table, with a single list of pseudonymous IDs and no other information. 
- * patients which are known to have registered a type 1 opt-out; and - * patients which _might_ have registered a type 1 opt-out elsewhere, which would not be recorded by the system supplier. - -That is, the list contains just those patients which the system supplier can be confident have _not_ registered a type 1 opt-out. -It consists of a single bespoke table, with a single list of pseudonymous IDs and no other information. - -Again, the [code which enforces this](https://github.com/opensafely-core/ehrql/blob/f5b0d5f56b53039062cf1f95ea76dda584f485de/ehrql/backends/tpp.py#L97-L136) is publicly available on Github. +Again, the [code which enforces this](https://github.com/opensafely-core/ehrql/blob/8494b943be0d73d02413ad41272a612a5fddbff3/ehrql/backends/tpp.py#L75-L100) is publicly available on Github. ### Data access which does _not_ go via ehrQL @@ -45,7 +39,7 @@ There are three sorts of circumstances under which data access in OpenSAFELY doe #### 1. Cohort Extractor ehrQL's predecessor was a tool called "Cohort Extractor" and studies which began before the launch of ehrQL continue to use this tool; these all had permission to process data from patients with a type 1 opt-out. -Cohort Extractor applies exactly the [same rules](https://github.com/opensafely-core/cohort-extractor/blob/f07867c1b277115c28859bcf356e7379953ca43b/cohortextractor/tpp_backend.py#L420-L441) as does ehrQL with respect to opt-outs. +Cohort Extractor applies exactly the [same rules](https://github.com/opensafely-core/cohort-extractor/blob/18c954499ec0a8fbcf5f83e0a4d1bbe2a469b0c1/cohortextractor/tpp_backend.py#L417-L435) as does ehrQL with respect to opt-outs. However, as a tool, it was not originally intended to enforce data access controls and its design makes it difficult to implement the same security boundaries as ehrQL. As a result, we have limited access to Cohort Extractor to just those projects which _already_ have access to opted-out data. 
This is enforced by the same mechanism as access to opted-out data i.e. an auditable file of [permitted projects](https://github.com/opensafely-core/job-server/blob/main/jobserver/permissions/cohortextractor.py), and enforced [code protection rules](https://github.com/opensafely-core/job-server/blob/main/.github/CODEOWNERS). From e400418a9d5beb8703b4416553abdebcafd3a90e Mon Sep 17 00:00:00 2001 From: David Evans Date: Tue, 5 Nov 2024 16:49:24 +0000 Subject: [PATCH 2/3] Update text on discontinued Cohort Extractor --- docs/type-one-opt-outs.md | 25 ++++++++----------------- 1 file changed, 8 insertions(+), 17 deletions(-) diff --git a/docs/type-one-opt-outs.md b/docs/type-one-opt-outs.md index b1719781d..e180427ee 100644 --- a/docs/type-one-opt-outs.md +++ b/docs/type-one-opt-outs.md @@ -34,42 +34,33 @@ Again, the [code which enforces this](https://github.com/opensafely-core/ehrql/b ### Data access which does _not_ go via ehrQL -There are three sorts of circumstances under which data access in OpenSAFELY does not go via ehrQL, each with different behaviour with respect to type 1 opted-out patients' data. +There are two sorts of circumstances under which data access in OpenSAFELY does not go via ehrQL, each with different behaviour with respect to type 1 opted-out patients' data. -#### 1. Cohort Extractor - -ehrQL's predecessor was a tool called "Cohort Extractor" and studies which began before the launch of ehrQL continue to use this tool; these all had permission to process data from patients with a type 1 opt-out. -Cohort Extractor applies exactly the [same rules](https://github.com/opensafely-core/cohort-extractor/blob/18c954499ec0a8fbcf5f83e0a4d1bbe2a469b0c1/cohortextractor/tpp_backend.py#L417-L435) as does ehrQL with respect to opt-outs. -However, as a tool, it was not originally intended to enforce data access controls and its design makes it difficult to implement the same security boundaries as ehrQL. 
-As a result, we have limited access to Cohort Extractor to just those projects which _already_ have access to opted-out data. -This is enforced by the same mechanism as access to opted-out data i.e. an auditable file of [permitted projects](https://github.com/opensafely-core/job-server/blob/main/jobserver/permissions/cohortextractor.py), and enforced [code protection rules](https://github.com/opensafely-core/job-server/blob/main/.github/CODEOWNERS). - -We may make limited exceptions to this in the short term if there are specific reasons why a project cannot feasibly use ehrQL and where we can be confident there is no attempt to subvert Cohort Extractor's security. -Any such exceptions will appear, along with details of who approved them and why, in the public [audit log](https://github.com/opensafely-core/job-server/commits/main/jobserver/permissions/cohortextractor.py). - -Longer term, Cohort Extractor will be retired entirely. - -#### 2. SQL Runner +#### 1. SQL Runner SQL Runner is a tool which allows the user to retrieve data by writing "raw" SQL rather than ehrQL. It is intended for the data curation and investigation tasks necessary for operating the platform, rather than research purposes. Its use is therefore limited to just those OpenSAFELY staff involved in this work. Details of the circumstances under which OpenSAFELY staff may perform development and maintenance activities are described in our [Data Access Policy](https://docs.opensafely.org/data-access-policy/). -This is enforced by a parallel mechanism to that which controls use of Cohort Extractor and any changes to this policy will appear in the public [audit log](https://github.com/opensafely-core/job-server/commits/main/jobserver/permissions/sqlrunner.py). 
+This is enforced by a parallel mechanism to that which controls access to type 1 opt-out data via ehrQL, and any changes to this policy will appear in the public [audit log](https://github.com/opensafely-core/job-server/commits/main/jobserver/permissions/sqlrunner.py). SQL Runner does not itself grant or deny access to opted-out data. Instead the user must declare whether the task in question should properly exclude such data or not, and SQL Runner enforces that such a declaration has been made by rejecting any queries which do not explicitly reference the opt-out table. All SQL Runner code run against patient data is also visible on our public “jobs” server: [https://jobs.opensafely.org/](https://jobs.opensafely.org/); therefore, it will be possible to see which code (or jobs) were run against patients with a type 1 opt-out. -#### 3. Direct access to pseudonymised data +#### 2. Direct access to pseudonymised data In order to facilitate the operation and maintenance of the OpenSAFELY platform a small number of individuals are able to access the pseudonymised data directly, without going via ehrQL, Cohort Extractor or SQL Runner. It is important to note that the code run in such circumstances will not be publicly visible on our “jobs” server, but it is logged in the database audit file of the GP system suppliers; preventing access to patient data with a type 1 opt-out is not enforceable at this level. The circumstances under which this is permitted and the rationale are covered in detail in our [Data Access Policy](https://docs.opensafely.org/data-access-policy/) but, importantly, such access is never used for research purposes. +#### Cohort Extractor is discontinued + +ehrQL's predecessor was a tool called "Cohort Extractor". This has now been discontinued and its use is no longer permitted. 
Although Cohort Extractor applied exactly the same rules as does ehrQL with respect to opt-outs, it was not originally intended to enforce data access controls and its design made it difficult to implement the same security boundaries as ehrQL. + ## Summary Diagram ![](./images/t1oos.png) From 1b5f8dac36e84f7cf8b684f034ea0485104d42af Mon Sep 17 00:00:00 2001 From: David Evans Date: Tue, 5 Nov 2024 16:52:30 +0000 Subject: [PATCH 3/3] Update ehrQL code links While we're updating this text we may as well point to the very latest version of the code. --- docs/type-one-opt-outs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/type-one-opt-outs.md b/docs/type-one-opt-outs.md index e180427ee..3d6b5d729 100644 --- a/docs/type-one-opt-outs.md +++ b/docs/type-one-opt-outs.md @@ -30,7 +30,7 @@ Unless a project is named in the project permissions file, ehrQL will automatica This list is provided by the system suppliers and stored in the secure database along with the rest of the patient data. It consists of a single bespoke type 1 opt-out table, with a single list of pseudonymous IDs and no other information. -Again, the [code which enforces this](https://github.com/opensafely-core/ehrql/blob/8494b943be0d73d02413ad41272a612a5fddbff3/ehrql/backends/tpp.py#L75-L100) is publicly available on Github. +Again, the [code which enforces this](https://github.com/opensafely-core/ehrql/blob/72f289b0183e4c5dcbd9cbd6fcfa243a76fb9a67/ehrql/backends/tpp.py#L97-L129) is publicly available on Github. ### Data access which does _not_ go via ehrQL