Update zika instructions #147
--fstem zika
git clone https://github.com/nextstrain/zika.git
cd zika
git checkout persephone
I'm out of the loop. What's `persephone` and why use it instead of the default branch?
Good question! The default branch doesn't include any ingest rules. We're still in the midst of designing the golden path, but we also need to keep our zika build up-to-date, so I created a `persephone` branch to "live long enough to update the build"; it will be replaced by the official path in the future. The `persephone` branch also includes changes to the build parameters (e.g. indexing by "accession" instead of strain name), which were a necessary divergence from the default branch.
This PR documents the "reality" of updating the build but, I agree, it should point at the default branch once the default branch includes `ingest` and the subsequent build parameter modifications.
I see, thanks for the explanation. Was the latest update to nextstrain.org/zika made using `persephone` or the default branch? If using `persephone`, I'd think to merge that into the default branch even if the golden path is not yet set in stone. Otherwise, what's seen at github.com/nextstrain/zika would be outdated.
The latest update was indeed made using `persephone`; the old method (ViPR) doesn't work. The changes to ingest and phylogenetic are here:
Feel free to glance through. I'm certainly willing to submit a `persephone` PR but am certain it will encounter a block, given that some of the modifications overlap with an ongoing Dengue PR that is currently delayed and unmerged (nextstrain/dengue#13). Additionally, there are specific changes related to zika, such as the inclusion of fauna-zika-data processing steps and the merging of USVI data. And given that some people might be pointed to zika as a "template for new pathogens", it might be more prudent to cherry-pick generally accepted changes from `persephone` as smaller PRs later. (Open to suggestions on how to split these out.)
The current github.com/nextstrain/zika is outdated but so is the current github.com/nextstrain/dengue .
You make good points! (And they may yet be incorporated in subsequent PRs.) But I'd still scope this fauna PR as replacing the non-working ViPR-ingest documentation with some currently working zika pipeline documentation.
> I’m certainly willing to submit a persephone PR but am certain it will encounter a block
I'm sorry if the review process has been a hindrance to your work! I did not realize you were running builds off of branches. I would make a PR and merge it so that the default branch is not outdated. We can do post-merge reviews and incremental updates later. For reproducibility and keeping everyone on the same page, I think it's prudent that the production builds are made using the default branch.
> given that some people might be pointed to zika as a “template for new pathogens”, it might be more prudent to cherry-pick generally accepted changes from persephone as smaller PRs later

I still think we can cherry-pick the `phylogenetic` modifications in smaller PRs later; the review process is fine as long as the live builds can be updated concurrently and those steps are documented somewhere.
Could I at least merge this fauna PR? This way we have appropriately documented the status quo?
I can submit a `persephone` zika PR separately but don’t want it to be a pre-requisite to this PR. And if `persephone` is eventually merged, I'm happy to make the subsequent change to `builds/ZIKA.md` by basically deleting a few `persephone`s from link urls and removing a `git checkout` statement.
> Could I at least merge this fauna PR? This way we have appropriately documented the status quo?
Sure, I'm not against merging this. For future reference, I think it would be less likely to get out-of-sync and/or forgotten if documented within the zika repo itself.
The old instructions were written for ViPR which became obsolete and was replaced by BV-BRC. The old instructions no longer work and we have since moved to using NCBI datasets for downloading sequences and metadata files. The filtering steps are already part of the phylogenetic build steps so are no longer a consideration during ingest. Point team members to how to ingest recent zika data and push to the nextstrain data endpoint. Point team members to the current phylogenetic build steps.
Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
Since the new instructions no longer use or refer to fauna, I'd be inclined to just add a link to the zika repo's README and add the detailed instructions there. This way we only have to maintain a single set of instructions.
Agree, my rewording basically refers to the READMEs in
After thinking about the audience and purpose of this document, I envisioned a new hire pointed at
Therefore, the instructions should walk that new hire through pulling the appropriate branch, ingesting the data, uploading that data, and updating the build. Each section provides commands they can copy and paste, while also linking to the official READMEs for more detail. In comparison, the official zika README is more general purpose: it is for an external user who wants to run their own zika builds or modify the build for their pathogen or biological question of interest.
This PR has a well-defined small scope (to update outdated instructions) and I think it's great to have it merged. The outdated instructions prompted two broader discussion points which we'll continue on Slack:
Description of proposed changes
The old instructions were written for ViPR which became obsolete and was replaced by BV-BRC. The old instructions no longer work and we have since adopted NCBI datasets for downloading sequences and metadata files.
Filtering by length and collection year is already part of the phylogenetic build steps, so it is no longer a consideration during ingest.
This PR updates our instructions on how to ingest recent zika data and push to the nextstrain data endpoint. This PR also points to the current phylogenetic build steps.
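To make the length and collection-year filtering mentioned above concrete, here is a minimal shell sketch of what such a filter does. It is not the build's actual filter step: the function name `filter_metadata`, the column names `length` and `date`, and the thresholds in the usage line are all assumptions chosen for illustration.

```shell
# Illustrative only: keep metadata rows that meet a minimum sequence length
# and a minimum collection year. Assumes tab-separated input on stdin with a
# header row containing hypothetical columns named "length" and "date"
# (date given as YYYY or YYYY-MM-DD).
filter_metadata() {
  min_length="$1"
  min_year="$2"
  awk -F'\t' -v min_length="$min_length" -v min_year="$min_year" '
    NR == 1 {
      # Find the columns by name so their position does not matter.
      for (i = 1; i <= NF; i++) {
        if ($i == "length") len_col = i
        if ($i == "date") date_col = i
      }
      print
      next
    }
    # Keep the row only if both the length and the year prefix pass.
    ($len_col + 0) >= min_length && (substr($date_col, 1, 4) + 0) >= min_year
  '
}
```

A call such as `filter_metadata 9000 2012 < metadata.tsv > filtered.tsv` would keep the header plus every row that passes both checks; the file names and both threshold values here are placeholders, not the build's real parameters.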
Related issue(s)
Checklist