-
Notifications
You must be signed in to change notification settings - Fork 0
/
export.qmd
177 lines (90 loc) · 12.5 KB
/
export.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
---
title: "Export guidelines"
---
## **Export Review Guidelines**
To provide ADRF users with the ability to draw from sensitive data, results that are exported from the ADRF must meet rigorous standards meant to protect privacy and confidentiality. To ensure that those standards are met, the ADRF Export Review team reviews each request to ensure that it follows formal guidelines that are set by the respective agency providing the data in partnership with the Coleridge Initiative. Prior to moving data into the ADRF from the agency, the Export Review team suggests default guidelines to implement, based on standard statistical approaches in the U.S. government ^[1,2](references.qmd#References)^ as well as international standards ^[3,4](references.qmd#References)^, and ^[5](references.qmd#References)^. The Data Steward from the agency supplying the data works with the team to amend these default rules in line with the agency's requirements. If you are unsure about the review guidelines for the data you are using in the ADRF or if you have any questions relating to exports, please reach out to support\@coleridgeinitiative.org before submitting an export request.
To learn more about limiting disclosure more generally, please refer to the [textbook](https://textbook.coleridgeinitiative.org/ "https://textbook.coleridgeinitiative.org/") or view the [videos](https://www.youtube.com/playlist?list=PLCtgqmGgzkEzwdPYKfrNHtOKFu3P5ErrU "https://www.youtube.com/playlist?list=PLCtgqmGgzkEzwdPYKfrNHtOKFu3P5ErrU").
### General Best Practices for a Successful Export
1. Currently, the review process is highly manual: Reviewers will read your code and view your output files, which may be time-consuming.
2. Each additional release adds disclosure risk and therefore limits subsequent releases; we ask that users limit the number of files they request to export to just the outputs necessary to produce a particular report or paper. If you are requesting an export of more than 10 files, there may be an additional charge.
3. The reviewers may ask you to make changes to your code or output to meet the requirements of guidelines that have been given by the providers of the data in the ADRF. Thus, we strongly encourage you to produce all output files---tables with rounded numbers, graphs with titles, and so forth---through code, rather than manually.
4. We ask that you only request review of final versions of output files, rather than in-progress versions. Any file containing intermediate output will be rejected.
5. Every code file should have a header describing the contents of the file, including a summary of the data manipulation that takes place in the file (e.g., regression, table or figure creation, etc.).
6. Documenting code by using comments throughout is helpful for disclosure reviews. The better the documentation, the faster the turnaround of export requests. If data files are aggregated, please provide documentation on the level of aggregation and for where in the code the aggregation takes place.
7. To help reviewers, who may not have seen your code before, we ask that users create meaningful variable names. For instance, if you are calculating outflows, it is better to name the variable "outflows" than to name it "var1."
### Timelines for Export Process
1. Coleridge reviewers have five business days to complete an export from the day you submit an export request. However, timelines may differ depending on your agency, so please refer to your specific agency's guidelines.
2. The review process can be delayed if the reviewer needs additional information or if the reviewer needs you to make changes to your code or output to meet the ADRF nondisclosure requirements.
### Export Review Process
The ADRF Export Review process typically involves two main stages:
1. Primary Review:
This is an initial, cursory review of your documentation and exports to ensure they do not include micro-data. A primary review can take up to 5 business days, so please plan accordingly when submitting your materials.
In cases where the reviewer has questions or requires additional information, the primary review may extend beyond 5 business days.
2. Secondary Review:
This is a comprehensive review conducted by an approved Data Steward who has content knowledge for the data permissioned to your workspace.
If your submission pertains to multiple data assets, it will require approval by each Data Steward before the material can be exported from the ADRF.
### How to Check Your Export Review Status:
If you've submitted an export request, you can easily check the status of your submission by following these steps:
1. Log into the ADRF.
2. Open the ADRF Export module.
#### Status Descriptions:
To help you better understand the different stages of the Export Review process, here are the status descriptions you may encounter:
1. Awaiting Reviewer:
Your export is currently under primary review. If any issues arise during the primary review, your reviewer will notify you. Upon completion of the primary review, the secondary reviewer(s) will be notified.
2. Awaiting Secondary Review:
Your export is currently under secondary review. If your submission pertains to multiple data assets, it will require a review by each Data Steward before being approved.
## **Preparing Data for Export**
Each agency has specific disclosure review guidelines, especially with respect to the minimum allowable cell sizes for tables. Refer to these guidelines when preparing export requests. If you are unsure of what guidelines are in place for the dataset with which you are working in the ADRF, please reach out to [support\@coleridgeinitiative.org](mailto:support@coleridgeinitiative.org "mailto:support@coleridgeinitiative.org").
### Tables
1. Cell Sizes
1. For individual-level data, please report the number of observations from each cell. For individual-level data, the default rule is to suppress cells with fewer than 10 observations, unless otherwise directed by the guidelines of the agency that provided the data.
2. If your table includes row or column totals or is dependent on a preceding or subsequent table, reviewers will need to take into account complementary disclosure risks---that is, whether the tables' totals, or the separate tables when read together, might disclose information about individuals in the data in a way that a single, simpler table would not. Reviewers will work with you by offering guidance on implementing any necessary complementary suppression techniques.
2. Weighted Data
1. If weighted results are to be exported, you must report both weighted and unweighted counts.
3. Ratios
1. If ratios are reported, please report the number of valid cases for both the numerator and the denominator (e.g., number of men in state X and number of women in state X, in addition to the ratio of women in state X).
4. Percentiles
1. Do not report exact percentiles. Instead, for example, you may calculate a "fuzzy median," by averaging the true 45th and 55th percentiles.
5. Percentages
1. For any reported percentages or proportions, the underlying counts of individuals contributing to the numerators and denominators must be provided for each statistic in the desired export.
6. Maxima and Minima
1. Suppress maximum and minimum values in general.
2. You may replace an exact maximum or minimum with a top-coded value.
### Graphs
1. Graphs are representations of tables. Thus, for each graph (which may have, e.g., a jpg, pdf, png, or tif extension), provide the source data of the underlying table of the graph following the guidelines for tables above.
2. Because graphs and other figures take the most time to review, the number of generated graphs should be as low as possible. Please consider the possibility that you could export the underlying table instead, and generate the graph in another package.
3. If a graph is produced from aggregated data or from tables that have been disclosure-proofed following the guidelines above (e.g., bar charts of magnitudes), provide the underlying tables.
4. If a graph is produced directly from unit-record data but aggregated in the visualization (e.g., frequency histograms), provide the underlying tables.
5. If a graph is produced directly from unit-record data and displays unit-record values (e.g., scatterplots, plots of residuals), the graph can be released only after you ensure that individuals cannot be re-identified and that values can only be estimated with a high level of uncertainty. Further processing to meet this requirement can include, but is not restricted to, cutting off the tails of a distribution, removing outliers, jittering the actual values, and removing or modifying axis values.
6. If a graph is produced from the results of modeling or derivation and uses the unit-record data (e.g., regression curves), the graph can be released only if the values cannot be used to find original data values.
1. Graphs of this type are generally automatically cleared.
2. For precision/recall graphs, you will need to report the sample size used to generate your model(s).
### Model Output
1. Output from regression or machine-learning models generally does not pose a risk of disclosing personally identifiable information, as long as the models are not based on small samples. Provide the counts for each variable that produces the model output. If categorical variables are used then provide the counts for each category.
## **Submitting an Export Request**
To request an export be reviewed, please watch the following video or follow the instructions below:
[Export Video](https://www.youtube.com/watch?v=qXG_i0v_bDQ "https://www.youtube.com/watch?v=qXG_i0v_bDQ")
1. Click here: [http://adrf.okta.com](http://adrf.okta.com/ "http://adrf.okta.com") (ADRF 3).
2. Input your login credentials.
3. Verify yourself with Okta (download Okta Verify on your smartphone or other device).
4. Choose your project as seen in the photo below. For the purpose of this document, you are seeing the Coleridge Initiative Associate Access project.
![](images/exp1.png){fig-align="center"}
5. Select Desktop and login with the same credentials you had done previously.
6. Upon entering the ADRF, a chrome page will appear as shown in the photo below. On this page, click Export Request in the bottom left corner. Or, from the ADRF desktop, open Google Chrome and navigate to [export.adrf.net](http://export.adrf.net/ "http://export.adrf.net"). (Note: [export.adrf.net](http://export.adrf.net/ "http://export.adrf.net") is an address that will only work within the ADRF desktop).
![](images/exp2.png){fig-align="center"}
7. Click My Requests, or the top (person-shaped) icon, at the left side of the window as shown in the screenshot below.
![](images/exp3.png){fig-align="center"}
8. Click New Item as shown below
![](images/exp4.png){fig-align="center"}
- You will be asked to select the project to which your export relates. If you do not see the correct project listed in the dropdown list, please reach out to our support team at [support\@coleridgeinitiative.org](mailto:support@coleridgeinitiative.org "mailto:support@coleridgeinitiative.org").
- After selecting a project, click Continue.
![](images/exp5.png){fig-align="center"}
9. Read through the entire page that loads. This page, titled "Create Export Request," will ask for you to comment on all supporting code files to explain the commands used to generate the files in the export request. The Export Review team will reject all requests containing intermediate output, and there should be no more than 10 separate files for export unless approval is given in advance. The Export Review team will typically release export requests within five business days. However, if the team has any clarifying questions, this could result in a longer review process. You need to document your output files in the text box provided. See the example below:
![](images/exp6.png){fig-align="center"}
10. When you have read through and followed the page instructions, and are ready to proceed:
Move the slider at the bottom of the page to indicate that you have followed the page's guidelines.
At the bottom of the page, upload each of the files that you have prepared.
Click `Submit Request…` to create the export request.
![](images/exp7.png){fig-align="center"}
11. You can click `My Requests` at the left side of the window to view your current and previous export requests.
1. To learn more about exporting results, please watch these [videos](https://www.youtube.com/playlist?list=PLCtgqmGgzkEyzBP-R06Lvt5TRt1Wb2ySa "https://www.youtube.com/playlist?list=PLCtgqmGgzkEyzBP-R06Lvt5TRt1Wb2ySa").