Option to disallow gap and/or N and/or non-AGTC ? #28

Closed
tseemann opened this issue Oct 28, 2015 · 4 comments

Comments

@tseemann
Contributor

We are impressed by the speed of this tool (due to being C code).

A very useful feature we need is the ability to also filter out things like:

  • gap -
  • N
  • non-AGTC eg * and X etc

These would need to be independent options.

Ideally the current default behaviour of removing conserved (monomorphic) sites could also be an option, e.g. so we could remove only the columns that contain a gap and leave the rest.
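To make the request concrete, here is a rough Python sketch of what independent per-column filters could look like. This is not the tool's C implementation; the function name `filter_columns` and its keyword arguments are hypothetical, and "monomorphic" is assumed here to mean fewer than two distinct A/C/G/T bases in a column.

```python
ACGT = set("ACGT")

def filter_columns(seqs, drop_gap=False, drop_n=False,
                   drop_other=False, drop_monomorphic=True):
    """Return the aligned sequences with unwanted columns removed.

    Hypothetical helper: each drop_* flag is independent of the others.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    keep = []
    for i in range(length):
        column = {s[i].upper() for s in seqs}
        if drop_gap and "-" in column:
            continue
        if drop_n and "N" in column:
            continue
        # "other" = anything that is not A, C, G, T, gap or N (e.g. * or X)
        if drop_other and column - ACGT - {"-", "N"}:
            continue
        # assumption: monomorphic = fewer than two distinct ACGT bases
        if drop_monomorphic and len(column & ACGT) < 2:
            continue
        keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs]


aln = ["AC-GTA", "ACNGTX", "ATAGCA"]
# default: only the monomorphic filter is on
print(filter_columns(aln))  # ['CT', 'CT', 'TC']
# drop only columns containing a gap, keep everything else
print(filter_columns(aln, drop_gap=True, drop_monomorphic=False))
# ['ACGTA', 'ACGTX', 'ATGCA']
```

The second call is the "remove gap columns and leave the rest" case described above.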

@ONeillMB1

Great tool! We also are very impressed with the speed!

Building off the recommendation of @tseemann, it would also be nice to be able to impose thresholds for tolerable amounts of missing data. For example, if 75% of samples in the alignment have data and there exists a SNP among them, retain the data in the SNP alignment.
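As an illustration of the threshold idea, a minimal Python sketch follows. The function name `snp_sites_with_min_data` and the `min_present` parameter are made up for this example, and "having data" is assumed to mean the sample carries an A, C, G or T at that position.

```python
def snp_sites_with_min_data(seqs, min_present=0.75):
    """Return 0-based indices of columns that are SNPs among the samples
    with data, provided enough samples have data at that column.

    Hypothetical helper: "data" is assumed to mean an A/C/G/T call.
    """
    if not seqs:
        return []
    acgt = set("ACGT")
    n_samples = len(seqs)
    kept = []
    for i in range(len(seqs[0])):
        called = [s[i].upper() for s in seqs if s[i].upper() in acgt]
        enough_data = len(called) / n_samples >= min_present
        is_snp = len(set(called)) >= 2
        if enough_data and is_snp:
            kept.append(i)
    return kept


aln = ["AAC", "AT-", "A-N", "ATT"]
print(snp_sites_with_min_data(aln))                   # [1]
print(snp_sites_with_min_data(aln, min_present=0.5))  # [1, 2]
```

Column 1 is kept at the 75% threshold because three of four samples have a base and they differ; column 2 only passes once the threshold is relaxed to 50%.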

@andrewjpage
Member

I've added in a 'pure' mode as discussed with @tseemann and a 'keep monomorphic' mode (so it will work with BEAST).

@jamiethompson77

This does work with amino acids though, right?

@tseemann
Contributor Author

tseemann commented Apr 8, 2020

@jamiethompson77 I think AGTC is quite hard-coded into the tool, sorry. SNP is in the name and that is DNA-specific.

@tseemann closed this as completed Apr 8, 2020