new data integrity checks #54

gregcaporaso · 2017-01-12T14:51:29Z

1. OTU directories should be named in the format <similarity-threshold>-otus.
2. Expected sequences files should be named expected-sequences.fasta, and no other fasta files should be present in those directories.
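
As a rough illustration of the two checks proposed above, here is a minimal Python sketch. The function names, the whole-number percent threshold, and the set of extensions treated as fasta are assumptions made for the example; they are not taken from the repository's actual integrity checks.

```python
import re

# Assumption: the similarity threshold is a whole-number percent, e.g. "97-otus".
OTU_DIR_PATTERN = re.compile(r"(\d{1,3})-otus")

# Assumption: these extensions are what counts as a "fasta file".
FASTA_EXTENSIONS = (".fasta", ".fa", ".fna")


def is_valid_otu_dir_name(name):
    """Return True if name looks like <similarity-threshold>-otus."""
    match = OTU_DIR_PATTERN.fullmatch(name)
    if match is None:
        return False
    # Reject thresholds outside 1..100 percent.
    return 0 < int(match.group(1)) <= 100


def check_fasta_files(filenames, expected="expected-sequences.fasta"):
    """Return a list of problems found among the file names of one directory."""
    fasta_files = [f for f in filenames if f.lower().endswith(FASTA_EXTENSIONS)]
    problems = []
    if expected not in fasta_files:
        problems.append("missing " + expected)
    for f in fasta_files:
        if f != expected:
            problems.append("unexpected fasta file: " + f)
    return problems
```

Both functions take plain names rather than paths, so they make no assumption about where these directories sit in the repository.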

nbokulich · 2017-01-12T20:37:53Z

I agree with item 1 but have some "devil's advocate" questions regarding item 2.

In some ways, the source directory could be useful as a sort of "junk drawer" for the mock community, where contributors could include other information that does not fit elsewhere. For example, a list of GenBank accession numbers for whole genome sequences (which might not be appropriate in the "expected taxonomy" directories, which are specific to reference databases that provide taxonomy information). Of course, we have control over this, so the files would never be "junk", just a collection of useful files that do not fit in the other directories (which are more regulated).

Naming conventions in source could also have some flexibility. For example, expected-sequences.fasta can be rather vague — instead, full-length-16S-expected-sequences.fasta or V4-domain-expected-sequences.fasta could be more informative.
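
One way the more flexible naming could be checked is sketched below in Python. The rule used here (an optional descriptive prefix of hyphen-separated alphanumeric tokens, followed by expected-sequences.fasta) is only an assumption about how such flexibility might be expressed, and the function name is hypothetical.

```python
import re

# Assumption: a descriptive prefix is zero or more alphanumeric tokens,
# each followed by a hyphen, placed before "expected-sequences.fasta".
SOURCE_FASTA_PATTERN = re.compile(r"(?:[A-Za-z0-9]+-)*expected-sequences\.fasta")


def is_valid_source_fasta_name(filename):
    """Return True if filename is expected-sequences.fasta, optionally prefixed."""
    return SOURCE_FASTA_PATTERN.fullmatch(filename) is not None
```

Under this rule, expected-sequences.fasta, full-length-16S-expected-sequences.fasta and V4-domain-expected-sequences.fasta would all pass, while a name such as sequences.fasta would not.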

What do you think?

gregcaporaso · 2017-01-12T20:50:42Z

I think that all makes sense, I'm good with it.

nbokulich · 2017-01-12T22:05:11Z

What should we do for shotgun metagenome datasets? I think I support keeping the <similarity-threshold>-otus requirement across the board for simplicity's sake, and such datasets could be labeled 100-otus. But would it be better to enforce this rule only for marker-gene datasets, and use different rules for metagenome datasets?

gregcaporaso · 2017-01-13T21:42:46Z

I think your suggestions would work well.

nbokulich · 2017-01-13T21:44:02Z

Thanks! I will make that rule standard when I update the integrity checks.
