Ingest #24

Closed
wants to merge 6 commits into from

Contributor

@j23414 j23414 commented Dec 13, 2022

Description of proposed changes

See commit messages.

Related issue(s)

Testing

  • Checks pass

If a reviewer wants to run a local check, as documented in ingest/README.md and README.md:

```
# Checkout branch
git clone https://github.com/nextstrain/zika.git
cd zika
git checkout ingest

# Test Ingest
cd ingest
nextstrain build .
ls -ltr data

# Test build
cd ..
cp -r ingest/data data
nextstrain build .
nextstrain view auspice
```

@j23414 j23414 marked this pull request as draft April 12, 2023 18:05
@j23414 j23414 marked this pull request as ready for review May 4, 2023 07:45
@j23414 j23414 requested a review from a team May 4, 2023 07:46
Member

@victorlin victorlin May 5, 2023

Just to clarify, 342ead8 is copying from nextstrain/dengue#6 which copies from the monkeypox repo in nextstrain/dengue@4b19154 and makes additional changes?

Contributor Author

Haha, yes! I wish there was a simpler way

Future commits will change this to work with zika data
Since the original zika ingest only pulled data from 2013 onward, modify
the genbank-url script to do the same.
Since ingest pushes data to a different endpoint, update the zika build to pull
from the new endpoint. Other modifications from the original are described below:

Strains (or isolates) may be re-sequenced, resulting in duplicate strain
names that break Nextstrain builds. Other pathogen repos (e.g. Monkeypox)
switched to indexing records by GenBank ID instead (`rule wrangle_metdata`)
and swapping in the final strain name at the end (`rule final_strain_name`).
Zika was updated accordingly.

Other changes include updating the list of dropped strain names to GenBank IDs
(with the original strain names moved to comments) and updating the example
sequence fasta headers. In addition, the strain name was added as a Tip Label
dropdown item, as discussed in:

  #25 (comment)
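
The indexing change described in the commit message above can be illustrated with a minimal sketch. This is not the code from the PR: the record fields (`accession`, `strain`), the function names, and the fallback to the accession when a strain name is missing or duplicated are all assumptions made for illustration, and the accessions in the usage example are made up.

```python
from collections import Counter


def index_by_accession(records):
    """Key records by GenBank accession, which stays unique even when a
    strain is re-sequenced and its strain name appears more than once."""
    indexed = {}
    for record in records:
        accession = record["accession"]  # assumed field name
        if accession in indexed:
            raise ValueError(f"duplicate accession: {accession}")
        indexed[accession] = record
    return indexed


def final_strain_names(indexed):
    """Choose the display name at the very end of the workflow.

    Assumption: use the strain name when it is present and unique,
    otherwise fall back to the accession.
    """
    counts = Counter(record.get("strain") for record in indexed.values())
    names = {}
    for accession, record in indexed.items():
        strain = record.get("strain")
        names[accession] = strain if strain and counts[strain] == 1 else accession
    return names


# Hypothetical records: the first two share a strain name.
example = [
    {"accession": "XX000001", "strain": "strainA"},
    {"accession": "XX000002", "strain": "strainA"},
    {"accession": "XX000003", "strain": "strainB"},
]
print(final_strain_names(index_by_accession(example)))
```

Keeping the accession as the key for every intermediate step means the duplicate strain names only matter in the last step, where they can be handled in one place.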
@j23414 j23414 force-pushed the ingest branch 2 times, most recently from 6842ed2 to fd166ba Compare July 5, 2023 18:41
@j23414 j23414 marked this pull request as draft July 26, 2023 20:54
Rescue some of the original functionality of the zika_upload script from fauna.
https://github.com/nextstrain/fauna/blob/master/vdb/zika_upload.py#L14-L30

* Remove monkeypox annotations.tsv
* Move strain name, location, and date fixes to annotations.tsv
* Match strain names in fauna database
* Match locations in fauna database if both region and country names do not match
* Match dates in fauna database unless genbank has been updated
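
As a rough illustration of moving strain name, location, and date fixes into annotations.tsv, the sketch below applies per-record overrides keyed by GenBank accession. The file layout assumed here (tab-separated `accession`, `field`, `value` columns, with `#` comment lines) and the function names are assumptions for illustration, not taken from this PR; the accessions in the usage example are made up.

```python
import csv
import io


def read_annotations(handle):
    """Parse annotation overrides from an open text handle.

    Assumed layout: accession<TAB>field<TAB>value, with lines starting
    with '#' treated as comments.
    """
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with too few columns.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        accession, field, value = (cell.strip() for cell in row[:3])
        annotations.setdefault(accession, {})[field] = value
    return annotations


def apply_annotations(records, annotations):
    """Return copies of the records with any matching overrides applied."""
    updated = []
    for record in records:
        overrides = annotations.get(record["accession"], {})
        updated.append({**record, **overrides})
    return updated


# Hypothetical annotations and records.
example_tsv = (
    "# fixes keyed by accession\n"
    "XX000001\tstrain\tfixed_name\n"
    "XX000001\tdate\t2015-01-01\n"
)
example_records = [
    {"accession": "XX000001", "strain": "old_name", "date": "2015-XX-XX"},
    {"accession": "XX000002", "strain": "kept_name", "date": "2016-01-01"},
]
annotations = read_annotations(io.StringIO(example_tsv))
for record in apply_annotations(example_records, annotations):
    print(record)
```

Records without an entry pass through unchanged, so the annotations file only needs to list the records that require a fix.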
Contributor Author

j23414 commented Nov 9, 2023

There have been changes to how we add ingest so closing this PR. A new attempt can start from a blank slate.

@j23414 j23414 closed this Nov 9, 2023
@j23414 j23414 deleted the ingest branch November 27, 2023 23:56