added relevant COI mock community info and rep seqs #92
Thanks @devonorourke! It looks like the tests failed; could you please fix those, and then I can review once the tests pass? The error suggests that the dataset metadata file's header line is space-delimited, not tab-delimited.
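For anyone hitting the same failure, here is a minimal sketch of a local check for a space-delimited header. It is not the repository's actual test; the function name `check_tsv_header` and the example column names are hypothetical.

```python
def check_tsv_header(header_line: str) -> list:
    """Return the header fields of a TSV file's first line.

    Raises ValueError when the line contains no tab but does contain
    spaces, which is the symptom described above. The function name and
    the column names used in the example are hypothetical.
    """
    line = header_line.rstrip("\r\n")
    if "\t" not in line and " " in line.strip():
        raise ValueError(
            "header appears to be space-delimited, not tab-delimited: %r" % line
        )
    return line.split("\t")


# Example usage with placeholder column names:
print(check_tsv_header("column-a\tcolumn-b\n"))
```

To check a real file, pass its first line, for example `check_tsv_header(open("dataset-metadata.tsv").readline())`.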
Sorry; I fixed the |
Looks good @devonorourke! Just a couple of minor comments and a request.
In addition to the source formats (which can be provided as-is), would it be possible to provide expected taxonomy files?
- See here for an example file
- the directory structure should be
.../mock-29/<database-name>/<database-version-or-download-date-MMDDYYYY>/<OTU-cluster-percent>/
- The taxonomy file will contain taxon names (as row names) that match valid taxa in the reference database/version/OTU% that you used. Ideally these should be formatted for use with QIIME 2 (e.g., semicolon-delimited)
- The "database identifier" file is a list of reference database identifiers that match the expected taxon names
- If you base this on a custom database, just make sure the database is available on GitHub, Zenodo, or elsewhere (I think this is what you are already doing with your databases, correct?), and make sure it is all well documented (e.g., you can link to a GitHub repo with code describing how the database was made)
- Note, a long time ago I put together some shoddy untested code for automatically generating the expected taxonomy files. Specifically, you want this.
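
As a rough illustration of the layout described in the list above, the sketch below builds the expected-taxonomy directory path and joins taxonomic ranks into a semicolon-delimited string. The database name, version string, OTU percentage, rank prefixes, and both function names are hypothetical placeholders, not values from this pull request.

```python
from pathlib import PurePosixPath


def expected_taxonomy_dir(dataset, database, version, otu_percent):
    """Build <dataset>/<database-name>/<version-or-date>/<OTU-cluster-percent>.

    All argument values in the example below are hypothetical placeholders.
    """
    return PurePosixPath(dataset) / database / version / str(otu_percent)


def format_taxon(ranks):
    """Join taxonomic ranks into one semicolon-delimited string.

    A bare ";" separator is an assumption; use whatever separator the
    chosen reference database actually uses.
    """
    return ";".join(ranks)


# Example usage with placeholder values:
print(expected_taxonomy_dir("mock-29", "example-db", "01012019", 99))
print(format_taxon(["k__Animalia", "p__Arthropoda", "c__Insecta"]))
```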
The expected taxonomy files are not required at submission, so if this is too much to ask right now that is fine.
Thanks!
@@ -0,0 +1,12 @@
# mock-coi1
let's call this mock-29 (to keep consistent)
Note:
The mock sample described above was sequenced in conjunction with hundreds of bat guano samples in a single MiSeq run. All data are available as BioSamples [here at NCBI](https://www.ncbi.nlm.nih.gov/bioproject/518082). Individual sequence data specific to the mock sample are found in the `dataset-metadata.tsv` document.

These reads contain dual-index barcodes modeled after the Schloss lab [workflow described here](https://github.com/SchlossLab/MiSeq_WetLab_SOP/blob/master/MiSeq_WetLab_SOP.md). Reads were processed in QIIME 2 as described in [this GitHub repo](https://github.com/devonorourke/tidybug/blob/master/docs/sequence_filtering.md#raw-sequence-data-processing).
it may be useful to provide a snippet of code showing how to import these reads into QIIME 2 (note that dual-index barcode support is now available in QIIME 2!)
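
One possible shape for such a snippet, as a sketch only: it assumes the reads are already demultiplexed into paired-end FASTQ files with Phred33 quality scores, and it does not cover demultiplexing the dual-index barcodes. The sample ID, file paths, and function names are hypothetical. It writes a QIIME 2 paired-end manifest (V2 format) and assembles the matching `qiime tools import` command.

```python
import csv
import io

# Column names of the QIIME 2 paired-end manifest (V2) format.
MANIFEST_HEADER = [
    "sample-id",
    "forward-absolute-filepath",
    "reverse-absolute-filepath",
]


def build_manifest(samples):
    """Return manifest text for a {sample_id: (forward, reverse)} mapping."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(MANIFEST_HEADER)
    for sample_id, (forward, reverse) in sorted(samples.items()):
        writer.writerow([sample_id, forward, reverse])
    return buf.getvalue()


def import_command(manifest_path, output_path):
    """Return the `qiime tools import` argument list for paired-end reads."""
    return [
        "qiime", "tools", "import",
        "--type", "SampleData[PairedEndSequencesWithQuality]",
        "--input-path", manifest_path,
        "--input-format", "PairedEndFastqManifestPhred33V2",
        "--output-path", output_path,
    ]


# Example usage; the sample ID and paths are hypothetical placeholders.
samples = {"mock-sample": ("/data/mock_R1.fastq.gz", "/data/mock_R2.fastq.gz")}
print(build_manifest(samples), end="")
print(" ".join(import_command("manifest.tsv", "demux.qza")))
```

Writing the manifest text to `manifest.tsv` and running the printed command (for example via `subprocess.run`) would produce the `demux.qza` artifact.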
Hi mockrobiota folk,
I've added the FASTA file for the mock COI dataset I've used in a few bat guano related projects. Though I don't have a publication to link these data to at the moment, @nbokulich is on the forthcoming paper that describes their use. Reads are deposited as BioSamples at NCBI, and I've provided a link in the `README.md` file for users to access. Please let me know what other information you'd like me to add.
Cheers