Provide a no-curation option for NCBI Ingest #30
Labels
enhancement
New feature or request
Comments
I'm not sure we would need custom rules here... It should be enough to document the NCBI Datasets output target
joverlee521 added a commit that referenced this issue Mar 28, 2024
Provides an easy way for first-time users to get the uncurated metadata from NCBI Datasets commands by running the ingest workflow with the specified target `data/ncbi_dataset_report.tsv`. Afterwards, users can easily remove fields that are not needed as part of the workflow to reduce the file size and save space. Prompted by @jameshadfield in review of the tutorial¹ and resolves #30. ¹ nextstrain/docs.nextstrain.org#195 (comment)
joverlee521 added a commit that referenced this issue Mar 29, 2024
Provides an easy way for first time users to get the full uncurated metadata from NCBI Datasets commands by running the ingest workflow with the specified target `dump_ncbi_dataset_report`. They can then inspect and explore the raw data to determine if they want to configure the workflow to use additional fields from NCBI. The rule is added to `fetch_from_ncbi.smk` to make it easy to run without additional configs. Note that it is not run as part of the default workflow and only intended to be used as a specified target. Prompted by @jameshadfield in review of the tutorial¹ and resolves #30. ¹ nextstrain/docs.nextstrain.org#195 (comment) Co-authored-by: James Hadfield <hadfield.james@gmail.com>
Context
Inspired by a review comment of the measles/add-ingest pull request, propose an optional rule for skipping (or bypassing) curation steps to produce raw outcomes. These outcomes can then guide the selection of specific curation steps or metadata field transformations.
This idea is loosely connected to the effort of standardizing NCBI field transformations, as discussed in #20.
Description
Allow users to initiate ingest to produce an uncurated `metadata.tsv` file, which they can compare against the curated `metadata.tsv` file. Users can then decide which curation steps to opt in to or choose to use our defaults.
Examples
Possible solution
One potential approach includes utilizing the "custom_rules" config in the Snakefile. Users can then import "no-curate" rules that take `data/ncbi.ndjson`, pass it through `augur curate passthru`, and generate an uncurated `metadata.tsv` file.
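To make the "no-curate" idea concrete, here is a minimal stand-alone Python sketch of what an uncurated passthrough amounts to: each NDJSON record is written to TSV with its values untouched. This is only an illustration, not how `augur curate passthru` is implemented; the function name `ndjson_to_tsv` and the sample field names are hypothetical, and real NCBI records will have different fields.

```python
import csv
import io
import json


def ndjson_to_tsv(ndjson_lines, out):
    """Write NDJSON records to `out` as TSV without altering any values."""
    # Skip blank lines; every other line is expected to be one JSON object.
    records = [json.loads(line) for line in ndjson_lines if line.strip()]

    # Collect column names in first-seen order so that no field is dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Fields missing from a record are written as empty strings.
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter="\t",
        restval="",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


if __name__ == "__main__":
    # Hypothetical records standing in for lines of data/ncbi.ndjson.
    sample = [
        '{"accession": "A1", "country": "USA"}',
        '{"accession": "B2", "host": "Homo sapiens"}',
    ]
    buffer = io.StringIO()
    ndjson_to_tsv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Comparing a file produced this way against the curated `metadata.tsv` shows which columns and values the curation steps change, which is the comparison the issue asks for.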