Include option to prune outgroup in augur tree or augur refine #340
Comments
Would this be better, since a very divergent root then wouldn't affect TreeTime's clock inference? We would then have to ensure TreeTime didn't reroot!
This is true. A divergent outgroup could affect the clock.
Leaving the following code snippets here which we're using to perform this function prior to building it into augur:
|
I think this is worth attending to at some point. There are a bunch of SARS-CoV-2 analyses that now want to work with just a more recent clade rather than a full tree. Here, these trees should still be rooted off of Wuhan-Hu-1/2019, but we'd like Wuhan-Hu-1/2019 removed during the |
Bumping this issue as a solution to the incorrect rooting problem we are seeing in seasonal influenza builds. (see Slack thread). |
Add reference outgroup for 2y and 6m builds to avoid incorrect rooting of trees. The reference is pruned from the tree after the refine step using James' code snippet¹. The references added are the references used by Nextclade, but with the strain names changed to match the strain name within fauna. ¹ nextstrain/augur#340 (comment)
Add reference outgroup for 2y and 6m builds to avoid incorrect rooting of trees. The reference is pruned from the tree after the refine step based on James' code snippet¹. The references added are the references used by Nextclade, but with the strain names changed to match the strain name within fauna. ¹ nextstrain/augur#340 (comment)
+1 for a flag that removes the outgroup. As I mentioned in a related PR, we may want to rename |
Another bump, and a comment 😉 I'd suggest this be done inside In If we're not inferring a timetree then it's much simpler as we can prune the root straight after rerooting, and this change would be part of |
@joverlee521, @j23414, and I looked into this a bit more today and came to the same conclusion that supporting a In our discussion today, we wondered why rooting (and pruning of the root) isn't part of I see an argument against the |
Hi- @huddlej @joverlee521 and @j23414 I hope I have understood this thread correctly - at the moment TreeTime supports rooting the tree on an outgroup - the outgroup needs to be given to the |
Ideally this would not be the case -- the outgroup shouldn't bias the clock calculations (as it's often really divergent). The easiest solution seems to be to prune the root in |
we could add such a
this would also remove some duplication here: |
@rneher's point above makes it clearer to me how we could do this without touching TreeTime internals. Thinking through the user experience of asking
To avoid handling each of these use cases, we could implement If we want to support removing the root for all other The alternative suggestion of implementing the root/remove root in
After working through these examples, I think implementing the
We don't actually use the term "outgroup" anywhere in the |
Thanks for this really detailed take John and working through with examples! I agree with you. In particular: I think moving rooting/removing-root to I also agree about "remove root" being only really useful in cases where we unambiguously know what the user wants removing. For using The only remaining case I'd perhaps be on the fence about is in specifying two sequences where the monophyletic MRCA is the "root" (your example 2). If you've selected an outgroup well and you know what you're doing, then it'll remove exactly what you expect, and this could be useful (sometimes supplying 2-3 outgroup seqs can give better results than just 1 seq, in my experience). However, if you are using slightly less 'clear' outgroups you might end up deleting much more than you intended to (especially if you don't check that your tree came out how you think before diving into I think in the end I'd probably vote for those two use-cases to be supported (one seq supplied, or two and remove the whole clade), but not supporting more ambiguous options. |
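The two cases favoured above (one outgroup sequence, or several that form a single clade) can be sketched in one function. This is only an illustration, not augur or TreeTime code: the `Node` class, `tip_names`, and `remove_outgroup_clade` are hypothetical names, the tree is assumed to be already rooted on the outgroup, and branch lengths are ignored. Refusing to prune when the supplied names do not form a clade directly under the root is a design assumption of this sketch, chosen to avoid deleting more than intended.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical minimal tree node; a tip is a node with no children."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def tip_names(node: Node) -> Set[str]:
    """Collect the names of all tips below (or at) this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= tip_names(child)
    return names


def remove_outgroup_clade(root: Node, outgroup: Set[str]) -> Node:
    """Remove the child of the root whose tips are exactly `outgroup`.

    Raises ValueError when the outgroup is not a clade hanging directly
    off the root, or when nothing would be left after pruning.
    """
    matches = [c for c in root.children if tip_names(c) == outgroup]
    if len(matches) != 1 or len(root.children) < 2:
        raise ValueError("outgroup is not a clade at the root; refusing to prune")
    remaining = [c for c in root.children if c is not matches[0]]
    if len(remaining) == 1:
        # Bifurcating root: the ingroup subtree becomes the new root.
        return remaining[0]
    # Multifurcating root: keep the root and drop only the outgroup clade.
    root.children = remaining
    return root
```

A single-sequence outgroup is the same call with a one-element set, for example `remove_outgroup_clade(tree, {"Wuhan-Hu-1/2019"})`.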
If we're rooting based on an outgroup, it should be common to then remove this outgroup / root when displaying the tree and doing analyses. I believe a good approach could be something like the following:
So, we explicitly specify an outgroup to root on using existing `--root` functionality. We then make a new argument `--remove-outgroup` that prunes an outgroup if it exists. This means that there is a single tip subtending one of the branches from the root node of the tree. If there doesn't exist a single tip, then `--remove-outgroup` does nothing.

Alternatively, some of the outgroup rooting logic could be pushed into `augur tree`, which is a lot slimmer than `refine`. Then `refine` could be just run with `--keep-root` when handed a tree rooted by `augur tree` that's already had outgroup pruned.
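The pruning rule proposed for `--remove-outgroup` (prune only when exactly one tip hangs directly off the root, otherwise do nothing) can be sketched as follows. This is a minimal illustration under stated assumptions, not augur's implementation: the `Node` class and the `remove_outgroup` function are hypothetical, the tree is assumed to be already rooted, and branch lengths are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node; a tip is a node with no children."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        return not self.children


def remove_outgroup(root: Node) -> Node:
    """Prune a lone outgroup tip attached to the root and return the new root.

    If the root does not have exactly one tip among its children, or if
    pruning would leave nothing behind, the tree is returned unchanged.
    """
    tips = [c for c in root.children if c.is_tip()]
    remaining = [c for c in root.children if not c.is_tip()]
    if len(tips) != 1 or not remaining:
        return root
    if len(remaining) == 1:
        # Bifurcating root: the ingroup subtree becomes the new root.
        return remaining[0]
    # Multifurcating root: keep the root and drop only the outgroup tip.
    root.children = remaining
    return root


# Example: a tree rooted on a single outgroup tip.
tree = Node(children=[
    Node("Wuhan-Hu-1/2019"),
    Node(children=[Node("A"), Node("B")]),
])
pruned = remove_outgroup(tree)
print([child.name for child in pruned.children])  # ['A', 'B']
```

A real implementation would also need to decide what happens to the branch length between the old root and the ingroup, which this sketch leaves out.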