Explore how to add context to filtered selection #1241

Open
jameshadfield opened this issue Dec 10, 2020 · 7 comments

Comments

@jameshadfield
Member

A common use case of auspice is to filter to a set of samples (e.g. https://nextstrain.org/ncov/europe?f_country=Spain&label=clade:20A.EU1) and wish to understand them in context.

A possible approach is to mirror the priority calculations currently used in our nCoV subsampling and allow users to choose the number of contextual samples to include. A slight variant of this would be to use a SNP cutoff or similar.
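
As a rough illustration of the "number of contextual samples" and "SNP cutoff" variants, here is a minimal Python sketch. It is not the ncov priority calculation and not auspice code. The function names `hamming` and `contextual_samples`, the parameters `n_context` and `snp_cutoff`, and the toy sequences are all hypothetical. It assumes aligned sequences of equal length and ranks each non-selected sample by its smallest SNP distance to any selected sample.

```python
from typing import Dict, List, Optional, Set


def hamming(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences, skipping N and gaps."""
    ignore = {"N", "-"}
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def contextual_samples(
    sequences: Dict[str, str],
    selected: Set[str],
    n_context: Optional[int] = None,
    snp_cutoff: Optional[int] = None,
) -> List[str]:
    """Return non-selected samples, closest first (hypothetical helper).

    Each candidate is scored by its minimum distance to any selected sample,
    so context is found around every selected sample independently.
    """
    if not selected:
        return []
    scored = []
    for name, seq in sequences.items():
        if name in selected:
            continue
        dist = min(hamming(seq, sequences[s]) for s in selected)
        if snp_cutoff is None or dist <= snp_cutoff:
            scored.append((dist, name))
    scored.sort()  # by distance, then by name for stable ties
    if n_context is not None:
        scored = scored[:n_context]
    return [name for _, name in scored]


# Toy data (made up for illustration only).
toy_sequences = {
    "spain_1": "ACGTACGT",
    "spain_2": "TTGTACGA",
    "uk_1": "ACGTACGA",
    "france_1": "TTGTACGA",
    "usa_1": "GGGGGGGG",
}
toy_selected = {"spain_1", "spain_2"}
print(contextual_samples(toy_sequences, toy_selected, n_context=2))
```

Scoring by the minimum distance to any selected sample, rather than the mean, is a deliberate choice in this sketch: it keeps neighbours of each selected sample even when the selected samples sit far apart on the tree.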

First pass at a UI:
image

cc @cassiawag

@emmahodcroft
Member

emmahodcroft commented Dec 10, 2020

I'm not sure I understand the issue here - is this just... non-filtered but contextual tips to make visible? Could you not just zoom in to part of the tree (which is, in itself, a genetic filter, in a way) and then turn the filter for country off? (Then once the other issues are resolved, download just that part of the tree, etc?)

@cassiawag
Contributor

@emmahodcroft: This doesn't work well when you're investigating outbreaks that are epi-linked but not actually genetically linked. For example, if I'm exploring outbreaks at a facility and there are two separate introductions into it, I'll have to pan out, find where my two introductions + outbreaks fall on the tree, and zoom in separately to each introduction to see what's related. It works, but it's cumbersome. Being able to see closely related sequences automatically, especially when combined with #1132, would really improve functionality.

I'm not sure how much of a group by location filter current ncov subsampling strategies use, but I could imagine that being a problem if it was too stringent since you really just want the most genetically related sequences. Otherwise, this is a neat first pass!

@emmahodcroft
Member

emmahodcroft commented Dec 11, 2020

Ok, I think I'm just still not understanding the actual proposal - sorry!! 🙃 I think I'm also a bit confused about whether this is an auspice or ncov (/augur filter) feature?

I guess what I'm not understanding is what are 'contextual samples'? Are these... samples from the same place? Having the same metadata value? Genetically similar? (But your comment says 'not actually genetically linked'.) And how is seeing them different from removing (or temporarily disabling) a filter? (Wouldn't two introductions still be just as far apart on the tree?)

Maybe a specific example could be detailed? I feel like I'm just wrapping my head around this issue - sorry!

@cassiawag
Contributor

cassiawag commented Dec 11, 2020

Thanks for trying to understand, @emmahodcroft! I really appreciate your feedback and perspective on this.

For background: A general challenge that we are running into in collaborations with public health labs is using a larger tree to interrogate smaller outbreaks. The big trees contain so many nodes, and it can be hard to visualize the “trees through the forest” (hah!). Thus, many of the new PRs focus on ways to subset/slice the tree easily. If you or I were investigating an outbreak, we’d probably build a tree with outbreak samples + contextual samples for each outbreak. But w/ public health groups, the person building the tree and the one using the tree are often not the same. It’s not necessarily feasible to build a separate tree for each outbreak of interest.

It’s quite common that samples all linked to the same location are from separate introductions.
image (1)

Of course, we’d like to see the genetic context of each of these separate introductions (this is what I meant by contextual samples). You can definitely get the genetic context by zooming in to that branch of the tree (which you’ll want to do to get a closer look at relationships). But this PR would enable a user to visualize the genetic context for multiple introductions at the same time, and with big trees where tips get buried behind tips, that can be quite difficult to do.

However, I think the benefit mostly comes when combined with (#1235) because then you can download a tree containing samples of interest + appropriate contextual samples.

Does that make sense? Happy to also talk offline more about this.

@emmahodcroft
Member

Ahhh so if I'm understanding: it would show sequences 'around' the ones selected right now (similar in practice to if you zoomed in on each point in the tree and turned off the filter) but without having to zoom in and do this individually?

So context are genetically similar to each 'selected' sequence, even if the 'selected' sequences are not genetically similar to each other.

I guess you'd still often have to zoom in to see them in more detail, but at least you wouldn't have to remember to toggle/untoggle the filters to do so and not lose your place?

@cassiawag
Contributor

> So context are genetically similar to each 'selected' sequence, even if the 'selected' sequences are not genetically similar to each other.

Exactly!
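
One way to picture "context around each selected sequence" directly on the tree is sketched below in Python. This is only an assumption about how context could be chosen, not how auspice does or will do it. The function `context_by_ancestry`, its `levels` parameter, and the toy tree are hypothetical, and the nested `name`/`children` dict is a simplified stand-in for a tree JSON. For each selected tip, it climbs a fixed number of ancestors and collects every other tip under that ancestor.

```python
from typing import Dict, Iterable, Optional, Set


def context_by_ancestry(tree: dict, selected: Iterable[str], levels: int = 1) -> Set[str]:
    """Collect tips sharing an ancestor `levels` steps above each selected tip."""
    parent: Dict[str, Optional[str]] = {}
    tips_below: Dict[str, Set[str]] = {}

    # Recursive walk for brevity; a very deep real tree may need an
    # iterative traversal instead.
    def walk(node: dict, par: Optional[str]) -> Set[str]:
        parent[node["name"]] = par
        children = node.get("children", [])
        if not children:
            tips = {node["name"]}
        else:
            tips = set()
            for child in children:
                tips |= walk(child, node["name"])
        tips_below[node["name"]] = tips
        return tips

    walk(tree, None)

    selected = set(selected)
    context: Set[str] = set()
    for tip in selected:
        node = tip
        for _ in range(levels):
            if parent[node] is None:  # reached the root
                break
            node = parent[node]
        context |= tips_below[node]
    return context - selected


# Toy tree (made up): two selected tips in separate clades.
toy_tree = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "spain_1"}, {"name": "uk_1"}]},
        {"name": "B", "children": [
            {"name": "spain_2"},
            {"name": "france_1"},
            {"name": "C", "children": [{"name": "usa_1"}, {"name": "usa_2"}]},
        ]},
        {"name": "china_1"},
    ],
}
print(sorted(context_by_ancestry(toy_tree, {"spain_1", "spain_2"}, levels=1)))
```

Because each selected tip is handled separately, the two introductions each pick up their own neighbours without the user having to zoom into each clade in turn.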

@emmahodcroft
Member

Ok! I think I finally understand this now - thanks for taking the time to explain it so well @cassiawag !!

I think this indeed seems like a good idea and a useful one. Might take a bit of thinking on how to set 'context sequences' so that it's useful across different use-cases, but I'm sure there are good solutions to be found!

3 participants