Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First stab at the final data evaluation function. This includes a tem… #385

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jwalsh28
Copy link
Collaborator

…plate for the final data expectations form which is used in the test function. It also includes an example in QMD format based on the housing affordability final data.

Note that there is also a page in the Wiki now describing this function and how to apply it. Some outstanding questions below:

For final data expectations - do we have them read in the form at top of each metric program or is it fine as an argument in the function?
Where should the final data expectations forms live? Right now I'm encouraging the final data folder for the relevant metric.
And where should the final data expectations template live? I have it in the functions/testing folder as of now.
Where should the test function example live? I have it in the functions/testing folder right now.

…plate for the final data expectations form which is used in the test function. It also includes an example in QMD format based on the housing affordability final data
Copy link
Collaborator

@wcurrangroome wcurrangroome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See overarching feedback in a comment on the .qmd. Looks good, JP!

# subgroups (logical): a true or false value indicating if the final file has subgroups
# confidence_intervals (logical): a true or false value indicating if the final file has confidence intervals
# Returns:
# a series of test results that will throw an error is failed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# a series of test results that will throw an error is failed
# a series of test results that will throw an error if failed

select(X1:X5) %>%
mutate(quality_title = ifelse(X5 == "Yes", paste0(X2, "_", "quality"), NA_character_),
ci_low_title = ifelse(X4 == "Yes", paste0(X2, "_", "lb"), NA_character_),
ci_high_title = ifelse(X4 == "Yes", paste0(X2, "_", "up"), NA_character_),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ci_high_title = ifelse(X4 == "Yes", paste0(X2, "_", "up"), NA_character_),
ci_high_title = ifelse(X4 == "Yes", paste0(X2, "_", "ub"), NA_character_),

)

#For final data with multiple values expand the form results
if(exp_form_variables %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference, but I either like to have a conditional test on a single line, or to separate it out and assign the result of the test to a relevantly named variable, that I then pass to the if expression.

)

#For final data with multiple values expand the form results
if(exp_form_variables %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(exp_form_variables %>%
if (exp_form_variables %>%

geography = "county", subgroups = FALSE, confidence_intervals = TRUE) {

#Read in the data expectation form
exp_form <- read_csv(here::here(exp_form_path),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest renaming all variables here. I don't like working with variables named X1, X2, etc. because I never know what they're supposed to represent, and so it's difficult to know if they're treated incorrectly at any point. Can we skip the first few lines of metadata in read_csv() so that these are auto-named, then either manually rename each or use janitor::clean_names() or something similar?


Third, the function asks whether there are subgroups in this data. For this example we input true.

Finally, the function asks whether there are confidence intervals in this data. For this example we input false. Now we can run our tests on the data.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Finally, the function asks whether there are confidence intervals in this data. For this example we input false. Now we can run our tests on the data.
Finally, the function asks whether there are confidence intervals in this data.


It passed! Yay!

Note that there is also an argument for geography in the function which is by default set to "county". Because we are evaluation county level data we do not have to change this but if you are evaluation place data you will.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note that there is also an argument for geography in the function which is by default set to "county". Because we are evaluation county level data we do not have to change this but if you are evaluation place data you will.


```{r}
evaluate_final_data(exp_form_path = "functions/testing/example_final_data_evaluation_form_housing.csv",
data = final_data,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data = final_data,
data = final_data,
geography = "county",


## Final Data Test Function

The data team for the Mobility Metrics project has created a function that tests the final data being produced by each metric program. The function tests a series of baseline requirements for how the final data should be written out. It also looks at information from the final data expectations form which is filled out by a metric lead prior to updating a metric and approved by their technical reviewer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The data team for the Mobility Metrics project has created a function that tests the final data being produced by each metric program. The function tests a series of baseline requirements for how the final data should be written out. It also looks at information from the final data expectations form which is filled out by a metric lead prior to updating a metric and approved by their technical reviewer.
The data team for the Mobility Metrics project has created a function that tests baseline requirements for the structure and content of data produced for each metric. This function relies in part on information provided in a metric-specific final data expectations form, which should be filled out by a metric lead prior to having the metric approved by their technical reviewer.


## Final Data Test Function

The data team for the Mobility Metrics project has created a function that tests the final data being produced by each metric program. The function tests a series of baseline requirements for how the final data should be written out. It also looks at information from the final data expectations form which is filled out by a metric lead prior to updating a metric and approved by their technical reviewer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there guidance for where this function will be applied for each metric? I'm imagining that at the end of the metric-specific Quarto document--after they've produced their final data frame of metric results--they'll pass that dataframe into this testing function, but curious if you have different thoughts about where makes sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants