Quality check and pipeline code fixes #169
First mutate then filter, so that mutate no longer throws a warning when filter has resulted in zero records. Also add PopID as grouping variable to C5. Quality check should be able to run on single-population as well as multi-population pipeline outputs.
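The ordering issue can be illustrated with a minimal sketch. The data, the column `ClutchSize`, the threshold, and the use of `max()` are assumptions chosen for illustration; the actual check code is not shown in this PR, so the real warning may come from a different expression.

```r
library(dplyr)

# Hypothetical brood records; names and values are illustrative only.
broods <- tibble(
  PopID = c("A", "A", "B"),
  ClutchSize = c(8, 9, 10)
)

# filter() first: no rows remain, so max() sees an empty vector and warns.
filter_first <- function(data) {
  data %>%
    filter(ClutchSize > 20) %>%
    mutate(MaxClutch = max(ClutchSize))
}

# mutate() first: max() runs on the full data, then rows are dropped.
mutate_first <- function(data) {
  data %>%
    mutate(MaxClutch = max(ClutchSize)) %>%
    filter(ClutchSize > 20)
}

# Helper that reports whether evaluating an expression raises a warning.
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}
```

Both versions return zero rows here, but only `filter_first()` evaluates the summary on empty input.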
This function should not try to fit a logistic model when all ChickAge or Mass records are NA.
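A guard of this kind might look roughly like the sketch below. `has_growth_data()` and `fit_if_possible()` are hypothetical names, and `fit_fun` stands in for whatever routine actually fits the logistic model; none of these come from the package itself.

```r
# Returns TRUE only when both columns contain at least one non-missing value.
has_growth_data <- function(data) {
  !all(is.na(data$ChickAge)) && !all(is.na(data$Mass))
}

# Hypothetical wrapper: skip the fit entirely when there is nothing to fit.
fit_if_possible <- function(data, fit_fun) {
  if (!has_growth_data(data)) {
    return(NULL)
  }
  fit_fun(data)
}
```

Returning `NULL` is just one option; the caller could equally skip the check or record that it was not run.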
First mutate then filter, so that mutate no longer throws a warning when filter has resulted in zero records. Also change which column/variable is preferred for B6. When pipeline outputs from two versions (v1.0 and v1.1) are combined, datasets will contain old and new columns. Quality check should be run on the columns that actually contain the information.
Quality check should be able to run on single-population as well as multi-population pipeline outputs.
Checking whether individuals have BroodIDs is only done for chicks. Now the message supplied in the report explicitly says that the record concerns a chick without a BroodID.
Part of the message "Impossible chick age may be caused by problems with hatch date." was printed for every record, but it is clearer to only show this on the pages of the report where the checks are described.
Previously the check used RingAge == "chick", but this may include individuals first caught after they fledged, who are not expected to have a BroodID. Now we use the more accurate Age_observed == 1.
Set empty strings ("") for FemaleID and MaleID in Brood_data to NA.
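One way to do this with dplyr is sketched below. The example table is made up, and this is not necessarily how the pipeline implements it.

```r
library(dplyr)

# Hypothetical Brood_data with empty strings in the parent ID columns.
Brood_data <- tibble(
  BroodID = c("1", "2", "3"),
  FemaleID = c("F1", "", "F3"),
  MaleID = c("", "M2", NA)
)

# Replace "" with NA in both parent ID columns; existing NAs are unchanged.
Brood_data <- Brood_data %>%
  mutate(across(c(FemaleID, MaleID), ~ na_if(.x, "")))
```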
`create_individual_UAN()` is now primarily based on Capture_data. Only Sex is determined from the primary data, because the UAN pipeline was created in version 1.0 of the standard format and therefore does not have Sex columns in Capture_data.
Previously, when selecting either BOS or PEE, both were selected in the pipeline as there was no pop_filter.
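As a rough illustration of filtering by population, assuming `pop_filter` is a character vector of PopID codes (the PR does not show how the pipeline uses it, and the table below is invented):

```r
library(dplyr)

# Hypothetical capture records from two populations.
captures <- tibble(
  IndvID = c("a", "b", "c"),
  PopID = c("BOS", "PEE", "BOS")
)

# Keep only the requested population(s).
pop_filter <- "BOS"
selected <- captures %>%
  filter(PopID %in% pop_filter)
```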
BroodIDs were wrongly filled into BroodIDLaid for individuals first caught as adults.
When grouping structures inserted by `dplyr::group_by()` and `dplyr::rowwise()` are not removed (by `dplyr::ungroup()` or `dplyr::summarise(..., .groups = "drop")`), quality check is very slow.
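A small sketch of dropping the grouping again, using invented capture data (the column names are illustrative):

```r
library(dplyr)

# Hypothetical capture records.
captures <- tibble(
  PopID = c("A", "A", "B"),
  Mass = c(10, 12, 14)
)

# group_by() + mutate() leaves the result grouped ...
grouped <- captures %>%
  group_by(PopID) %>%
  mutate(MeanMass = mean(Mass))

# ... until ungroup() removes the grouping structure.
cleaned <- grouped %>% ungroup()

# summarise() can drop the grouping directly.
summary_tbl <- captures %>%
  group_by(PopID) %>%
  summarise(MeanMass = mean(Mass), .groups = "drop")

# rowwise() also adds structure that ungroup() removes.
by_row <- captures %>%
  rowwise() %>%
  mutate(MassKg = Mass / 1000) %>%
  ungroup()
```

`dplyr::is_grouped_df()` can be used to confirm that a table no longer carries groups before it is passed on.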
After some quality check and pipeline fixes, I've run the quality check on subsets (years: 2005-2015) of the datasets, resulting in relatively small reports:
*Note that we haven't fixed the issue in the NIOO pipeline (the one related to individuals in Individual_data missing in Capture_data) yet. This subset seems to be unaffected (check I6 flags no missing records), but it might still be worth fixing before sending the documents to Marcel. @LiamDBailey - what do you think?
This bug appeared when a pipeline output table is not a tbl/tibble, like for Brood_data in the WYT pipeline.
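One defensive option, sketched here under the assumption that the pipeline output is a named list of tables, is to coerce every table to a tibble before running the checks. The list contents below are invented, and this is not necessarily the fix applied in this PR.

```r
library(tibble)

# Hypothetical pipeline output in which one table is a plain data.frame.
pipeline_output <- list(
  Brood_data = data.frame(BroodID = c("1", "2")),
  Capture_data = tibble(IndvID = c("a", "b"))
)

# Coerce every table to a tibble so downstream code sees one consistent class.
pipeline_output <- lapply(pipeline_output, as_tibble)
```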
We are in the process of getting feedback on the quality check procedure, report and protocol document from the advisory council.
Before we send them the documents, I am fixing some bugs/issues in the quality check and pipeline code that were revealed by the quality check procedure.
The finished pipelines of advisory council members are: NIOO, UAN, WYT, MON and PFN.
Quality check will be run on subsets of the pipeline outputs (approximately 5 years) so that the quality check reports are not terrifyingly large.
Quality check protocol document is here: https://github.com/SPI-Birds/documentation/blob/master/quality_check/SPI-Birds_quality-check-protocol_v1.0.pdf
Changed `create_individual_UAN()` to use Capture_data instead of unprocessed capture information.