Requirements
Description
Split the large checksum file into smaller files before loading it.
Use case
Timeout errors occur in the Checksum step of the pipeline because the 'LOAD DATA INFILE' command is run on a very large file. This fix splits the large file into multiple smaller files and runs the command on each of them. Once loading has finished, the code merges the smaller files back into a single file, restoring the state that existed before the step ran.
This change is accompanied by a change to the DB model (ensemb-py) that sets the engine of the checksum_xref table to MyISAM, which reduces the likelihood of the error.
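The split, load and merge flow described above can be sketched roughly as follows. This is an illustrative outline only, not the code in this pull request: the function names, the `.partNNNN` suffix, the chunk size and the `execute` callback are all assumptions, and the real pipeline may split and load the file differently.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def split_file(source: Path, lines_per_chunk: int) -> List[Path]:
    """Split source into chunk files next to it (hypothetical .partNNNN suffix)."""
    chunks: List[Path] = []
    out = None
    with source.open("rb") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                # Start a new chunk every lines_per_chunk lines.
                if out is not None:
                    out.close()
                chunk = source.with_name(f"{source.name}.part{len(chunks):04d}")
                chunks.append(chunk)
                out = chunk.open("wb")
            out.write(line)
    if out is not None:
        out.close()
    return chunks


def merge_files(chunks: List[Path], target: Path) -> None:
    """Concatenate the chunks back into target, then delete the chunks."""
    with target.open("wb") as dst:
        for chunk in chunks:
            with chunk.open("rb") as part:
                shutil.copyfileobj(part, dst)
    for chunk in chunks:
        chunk.unlink()


def load_in_chunks(
    source: Path,
    execute: Callable[[str], None],
    lines_per_chunk: int = 1_000_000,  # assumed value, tune for the real data
    table: str = "checksum_xref",
) -> None:
    """Run one LOAD DATA INFILE statement per chunk, then restore the file."""
    chunks = split_file(source, lines_per_chunk)
    if not chunks:
        return  # empty input: nothing to load, leave the file untouched
    source.unlink()  # assumption: only the chunks exist while loading
    try:
        for chunk in chunks:
            # execute is a caller-supplied callback that sends SQL to the DB.
            execute(f"LOAD DATA INFILE '{chunk.as_posix()}' INTO TABLE {table}")
    finally:
        # Restore the single large file even if a load fails.
        merge_files(chunks, source)
```

Passing `execute` in as a callback keeps the sketch independent of any particular database driver; in practice it would wrap whatever connection the pipeline already uses.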
Benefits
Timeout errors in the Checksum step become less likely.
Possible Drawbacks
If applicable, describe any possible undesirable consequence of the changes.
Testing
Dependencies