First stab at the final data evaluation function. This includes a tem… #385
base: main
Conversation
…plate for the final data expectations form which is used in the test function. It also includes an example in QMD format based on the housing affordability final data
See overarching feedback in a comment on the .qmd. Looks good, JP!
```r
# subgroups (logical): a true or false value indicating if the final file has subgroups
# confidence_intervals (logical): a true or false value indicating if the final file has confidence intervals
# Returns:
# a series of test results that will throw an error is failed
```
Suggested change:

```r
# a series of test results that will throw an error if failed
```
```r
select(X1:X5) %>%
  mutate(quality_title = ifelse(X5 == "Yes", paste0(X2, "_", "quality"), NA_character_),
         ci_low_title = ifelse(X4 == "Yes", paste0(X2, "_", "lb"), NA_character_),
         ci_high_title = ifelse(X4 == "Yes", paste0(X2, "_", "up"), NA_character_),
```
Suggested change:

```r
ci_high_title = ifelse(X4 == "Yes", paste0(X2, "_", "ub"), NA_character_),
```
```r
  )

#For final data with multiple values expand the form results
if(exp_form_variables %>%
```
Personal preference, but I either like to have a conditional test on a single line, or to separate it out and assign the result of the test to a relevantly named variable that I then pass to the `if` expression.
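As a sketch of that preference (the data and names below are hypothetical stand-ins, not taken from the function under review):

```r
# Hypothetical stand-in for the expectations-form data under review.
exp_form_variables <- data.frame(value_count = c(1, 3, 2))

# Assign the result of the test to a relevantly named variable...
has_multiple_values <- any(exp_form_variables$value_count > 1)

# ...then pass that single, readable condition to the if expression.
if (has_multiple_values) {
  message("Expanding the form results for multiple values")
}
```

This keeps the `if` line short and makes the intent of the condition self-documenting.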
Suggested change:

```r
if (exp_form_variables %>%
```
```r
geography = "county", subgroups = FALSE, confidence_intervals = TRUE) {

  #Read in the data expectation form
  exp_form <- read_csv(here::here(exp_form_path),
```
I'd suggest renaming all variables here. I don't like working with variables named `X1`, `X2`, etc. because I never know what they're supposed to represent, and so it's difficult to know if they're treated incorrectly at any point. Can we skip the first few lines of metadata in `read_csv()` so that these are auto-named, then either manually rename each or use `janitor::clean_names()` or something similar?
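A minimal sketch of that approach, assuming the form starts with a couple of metadata lines before the real header row (the file contents and column names below are invented for illustration):

```r
library(dplyr)
library(janitor)
library(readr)

# Write a tiny stand-in for the expectations form: two metadata lines,
# then the real header row and one data row (all contents are hypothetical).
form_path <- tempfile(fileext = ".csv")
writeLines(c("Final Data Expectations Form",
             "Metric: housing affordability",
             "Variable Name,Data Type,Has CI",
             "share_affordable,numeric,Yes"),
           form_path)

# skip = 2 drops the metadata lines so the header row supplies column names,
# and clean_names() converts them to snake_case instead of X1, X2, ...
exp_form <- read_csv(form_path, skip = 2, show_col_types = FALSE) %>%
  clean_names()

names(exp_form)  # "variable_name" "data_type" "has_ci"
```

The `skip` count would need to match the actual number of metadata lines in the real form.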
```
Third, the function asks whether there are subgroups in this data. For this example we input true.

Finally, the function asks whether there are confidence intervals in this data. For this example we input false. Now we can run our tests on the data.
```
Suggested change:

```
Finally, the function asks whether there are confidence intervals in this data.
```
```
It passed! Yay!

Note that there is also an argument for geography in the function which is by default set to "county". Because we are evaluation county level data we do not have to change this but if you are evaluation place data you will.
```
Suggested change: remove this sentence.
```r
evaluate_final_data(exp_form_path = "functions/testing/example_final_data_evaluation_form_housing.csv",
                    data = final_data,
```
Suggested change:

```r
data = final_data,
geography = "county",
```
```
## Final Data Test Function

The data team for the Mobility Metrics project has created a function that tests the final data being produced by each metric program. The function tests a series of baseline requirements for how the final data should be written out. It also looks at information from the final data expectations form which is filled out by a metric lead prior to updating a metric and approved by their technical reviewer.
```
Suggested change:

```
The data team for the Mobility Metrics project has created a function that tests baseline requirements for the structure and content of data produced for each metric. This function relies in part on information provided in a metric-specific final data expectations form, which should be filled out by a metric lead prior to having the metric approved by their technical reviewer.
```
Is there guidance for where this function will be applied for each metric? I'm imagining that at the end of the metric-specific Quarto document, after they've produced their final data frame of metric results, they'll pass that data frame into this testing function, but I'm curious if you have different thoughts about what makes sense.
Note that there is also a page in the Wiki now describing this function and how to apply it. Some outstanding questions below:

- For final data expectations: do we have them read in the form at the top of each metric program, or is it fine as an argument in the function?
- Where should the final data expectations forms live? Right now I'm encouraging the final data folder for the relevant metric.
- And where should the final data expectations template live? I have it in the functions/testing folder as of now.
- Where should the test function example live? I have it in the functions/testing folder right now.