extended README
eppinglen committed Aug 8, 2017
1 parent 5f9b4c6 commit a5297ef
Showing 11 changed files with 103 additions and 50 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG
@@ -1,3 +1,3 @@
Version 0.0.1 of SeroBA
Version 0.1.3 of SeroBA: commandline parameter parser for k-mer coverage, adding updated database to SeroBA including serotypes 6E, 6F, 11E, 10X, 39X and two NT references
Version 0.1.3 of SeroBA: commandline parameter parser for k-mer coverage, adding updated database to SeroBA including serotypes 6E, 6F, 11E, 10X, 39X, 35D and two NT references
Version 0.1.4 bug fix with serogroup 6 and 11, bug fix unittest in for createDBs
16 changes: 10 additions & 6 deletions README.md
@@ -3,12 +3,12 @@ SeroBA is a k-mer based Pipeline to identify the Serotype from Illumina NGS read
## Usage
Since SeroBA v0.1.3 an updated variant of the CTV from PneumoCat is provided in the SeroBA package. This includes the serotypes 6E, 6F, 11E, 10X, 39X and two NT references. It is not necessary to use SeroBA getPneumocat. You can directly start with seroba createDBs with the database folder from this repository. It is recommended to make a working copy of it.
```
usage: seroba getPneumocat out_dir
usage: seroba getPneumocat <database dir>
Downloads PneumoCat and build an tsv formatted meta data file out of it
positional arguments:
out_dir directory to store the PneumoCats capsular type variant (CTV) database
database dir directory to store the PneumoCats capsular type variant (CTV) database
usage: seroba createDBs <database dir> <kmer size>
@@ -21,12 +21,14 @@ positional arguments:
usage: seroba runSerotyping [options] <databases directory> <read1> <read2> <prefix>
identify serotype of your input data
Example : seroba createDBs my_database/ 71
Identify serotype of your input data
positional arguments:
databases path to database directory
database dir path to database directory
read1 forward read file
read2 backward read file
read2 reverse read file
prefix unique prefix
optional arguments:
@@ -39,6 +41,7 @@ positional arguments:
Summaries the output in one tsv file
usage: seroba summary <output folder>
@@ -54,7 +57,8 @@ In the folder 'prefix' you will find a pred.tsv including your predicted serotyp
as well as en file called detailed_serogroup_info.txt including information about
snps, genes, and alleles that are found in your reads.
After the use of "seroba summary" a tsv file called summary.tsv is created that
consists of two columns (sample Id , serotype).
consists of three columns (sample Id , serotype, comments).
Serotypes that do not match any reference are marked as "untypable"(v0.1.3).

## Database
You can use the CTV von PneumoCat by using seroba getPneumocat. It is also
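
The README change above describes summary.tsv as a three-column file (sample Id, serotype, comments) in which unmatched samples are marked "untypable". A minimal Python sketch of reading such a file follows. It assumes tab-separated rows without a header line, which the README does not confirm; the function name `read_summary` and the sample rows are illustrative only.

```python
import csv
import io


def read_summary(handle):
    """Yield (sample_id, serotype, comment) tuples from a summary.tsv-like stream.

    Assumption: tab-separated rows, no header line. Short rows are padded
    so that a missing comment column becomes an empty string.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        row = row + [""] * (3 - len(row))
        yield tuple(row[:3])


# Made-up rows for illustration; real files come from "seroba summary".
example = "sampleA\t09V\t\nsampleB\tuntypable\t\n"
records = list(read_summary(io.StringIO(example)))

# Collect the samples that did not match any reference.
untypable = [sample for sample, serotype, _ in records if serotype == "untypable"]
```

For a file on disk, the same function can be given `open(path, newline="")` in place of the `io.StringIO` object.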
49 changes: 49 additions & 0 deletions pneumocat_db_test.tsv
@@ -0,0 +1,49 @@
01 01 01 1 ref
06A 06A 06A 1 ref
06B 06A 06B 1 ref
06C 06A 06C 1 ref
06D 06A 06D 1 ref
06E 06A 06E 1 ref
07A 07A 07A 1 ref
07B 07B 07B 1 ref
07C 07B 07C 1 ref
07F 07A 07F 1 ref
40 07B 40 1 ref
wciN-1 06A 06A 0 allele
wciN-1 06A 06B 0 allele
wciN-2 06A 06C 0 allele
wciN-2 06A 06D 0 allele
wciN-3 06A 06E 0 allele
wciP_06A 06A 06A 0 snps 583 AGT
wciP_06B 06A 06B 0 snps 583 AAT
wciP_06C 06A 06C 0 snps 583 AGT
wciP_06D 06A 06D 0 snps 583 AAT
wcwD 07A 07A 1 pseudo 1
wcwD 07A 07F 1 pseudo 0
wcwK_07B 07B 07B 0 snps 145 CTT
wcwK_07B 07B 07B 0 snps 385 TTT
wcwK_07B 07B 07B 0 snps 46 GAT
wcwK_07B 07B 07B 0 snps 487 ACT
wcwK_07B 07B 07B 0 snps 706 CAT
wcwK_07B 07B 07B 0 snps 880 CTT
wcwK_07B 07B 07B 0 snps 928 AAT
wcwK_07B 07B 07B 0 snps 937 GCA
wcwK_07B 07B 07B 0 snps 946 GGT
wcwK_07C 07B 07C 0 snps 145 CTT
wcwK_07C 07B 07C 0 snps 385 TGT
wcwK_07C 07B 07C 0 snps 46 GGT
wcwK_07C 07B 07C 0 snps 487 GCT
wcwK_07C 07B 07C 0 snps 706 CAT
wcwK_07C 07B 07C 0 snps 880 CTT
wcwK_07C 07B 07C 0 snps 928 AAT
wcwK_07C 07B 07C 0 snps 937 GAA
wcwK_07C 07B 07C 0 snps 946 GGT
wcwK_40 07B 40 0 snps 145 TTT
wcwK_40 07B 40 0 snps 385 ACT
wcwK_40 07B 40 0 snps 46 AAT
wcwK_40 07B 40 0 snps 487 ACT
wcwK_40 07B 40 0 snps 706 TAT
wcwK_40 07B 40 0 snps 880 TTT
wcwK_40 07B 40 0 snps 928 AGT
wcwK_40 07B 40 0 snps 937 GAA
wcwK_40 07B 40 0 snps 946 GAT
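
Each row of the fixture above has five leading fields, and the `snps` and `pseudo` rows carry extra trailing fields. The sketch below groups such rows by their fifth field. The field names (`name`, `group`, `serotype`, `flag`, `kind`) are guesses from the data shown here, not documented SeroBA column names, and splitting on any whitespace is an assumption about the separator.

```python
from collections import defaultdict


def parse_meta(text):
    """Group metadata rows by their fifth field (ref, allele, snps, pseudo).

    Field names are hypothetical labels inferred from the fixture.
    """
    by_kind = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row
        name, group, serotype, flag, kind = fields[:5]
        by_kind[kind].append(
            {
                "name": name,
                "group": group,
                "serotype": serotype,
                "flag": flag,
                "extra": fields[5:],  # e.g. position and codon for snps rows
            }
        )
    return by_kind


# Four rows copied from the fixture, one per kind.
sample = """06A 06A 06A 1 ref
wciN-1 06A 06A 0 allele
wciP_06A 06A 06A 0 snps 583 AGT
wcwD 07A 07A 1 pseudo 1
"""
meta = parse_meta(sample)
```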
2 changes: 1 addition & 1 deletion scripts/seroba
@@ -15,7 +15,7 @@ subparsers = parser.add_subparsers(title='Available commands', help='', metavar=
subparser_getPneumocat = subparsers.add_parser(
'getPneumocat',
help='downloads genetic information from PneumoCat',
usage='seroba getPneumocat <out_dir>',
usage='seroba getPneumocat <database dir>',
description='Downlaods PneumoCat and build an tsv formated meta data file out of it',
)
subparser_getPneumocat.add_argument('out_dir', type=str, \
2 changes: 1 addition & 1 deletion seroba/tasks/getPneumocat.py
@@ -3,5 +3,5 @@


def run(options):
pneumo = get_pneumocat_data.GetPneumocatData(options.out_dir)
pneumo = get_pneumocat_data.GetPneumocatData(options.database_dir)
pneumo.run()
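
The switch to `options.database_dir` only works if the parser stores the positional argument under that name. In argparse, the first string passed to `add_argument` for a positional becomes the attribute name on the parsed namespace, while `usage=` is display text only. The `add_argument('out_dir', ...` line in scripts/seroba is cut off in this diff, so whether the attribute was renamed as well cannot be confirmed from what is shown. A minimal sketch of a consistent pairing, with the help text and the sample command line as illustrative assumptions:

```python
import argparse

parser = argparse.ArgumentParser(prog="seroba")
subparsers = parser.add_subparsers(title="Available commands", metavar="")

subparser_getPneumocat = subparsers.add_parser(
    "getPneumocat",
    usage="seroba getPneumocat <database dir>",  # display text only
)
# The positional's name decides the attribute: options.database_dir
subparser_getPneumocat.add_argument(
    "database_dir",
    type=str,
    help="directory to store the CTV database",  # illustrative help text
)

# Illustrative command line, parsed without touching sys.argv.
options = parser.parse_args(["getPneumocat", "my_database/"])
```

With this pairing `options.database_dir` holds `"my_database/"` and no `out_dir` attribute exists, so a task that still read `options.out_dir` would raise `AttributeError`.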
18 changes: 9 additions & 9 deletions seroba/tests/data/09V/detailed_serogroup_info.txt
@@ -2,14 +2,14 @@ Predicted Serotype: 09V
Serotype predicted by ariba :09A
assembly from ariba as an identiy of: 99.99 with this serotype
Serotype Genetic Variant
09L snps ['wcjA', '852', 'GCA']
09L snps ['wcjA', '429', 'GGT']
09L snps ['wcjA', '528', 'GAT']
09L snps ['wcjA', '636', 'GAT']
09L genes wcjD
09N snps ['wcjA', '957', 'ACT']
09N snps ['wcjA', '414', 'TAT']
09N genes wcjD
09V pseudo wcjE
09V genes wcjD
09V pseudo wcjE
09N genes wcjD
09N snps ['wcjA', '414', 'TAT']
09N snps ['wcjA', '957', 'ACT']
09A genes wcjD
09L genes wcjD
09L snps ['wcjA', '852', 'GCA']
09L snps ['wcjA', '636', 'GAT']
09L snps ['wcjA', '528', 'GAT']
09L snps ['wcjA', '429', 'GGT']
4 changes: 2 additions & 2 deletions seroba/tests/data/15B_C/detailed_serogroup_info.txt
@@ -3,5 +3,5 @@ Serotype predicted by ariba :15C
assembly from ariba as an identiy of: 99.38 with this serotype
Serotype Genetic Variant
15B pseudo wciZ
15C allele wzd
15C allele wchL
15B allele wzd
15B allele wchL
14 changes: 7 additions & 7 deletions seroba/tests/data/ERR1438851/detailed_serogroup_info.txt
@@ -2,14 +2,14 @@ Predicted Serotype: 09N
Serotype predicted by ariba :09L
assembly from ariba as an identiy of: 99.53 with this serotype
Serotype Genetic Variant
09N snps ['wcjB', '789', 'ACC']
09V pseudo wcjE
09N snps ['wzy', '846', 'AAC']
09N snps ['wchA', '504', 'TCT']
09N snps ['wchA', '879', 'TCA']
09N snps ['wcjA', '957', 'ACT']
09N snps ['wcjA', '852', 'TCA']
09N snps ['wcjA', '429', 'AGT']
09N snps ['wcjA', '414', 'TAT']
09N snps ['wcjA', '528', 'GGT']
09N snps ['wcjA', '636', 'AAT']
09V pseudo wcjE
09N snps ['wcjA', '528', 'GGT']
09N snps ['wcjA', '429', 'AGT']
09N snps ['wcjA', '957', 'ACT']
09N snps ['wchA', '879', 'TCA']
09N snps ['wchA', '504', 'TCT']
09N snps ['wcjB', '789', 'ACC']
20 changes: 10 additions & 10 deletions seroba/tests/data/ERR1439287/detailed_serogroup_info.txt
@@ -2,18 +2,18 @@ Predicted Serotype: 07C
Serotype predicted by ariba :07C
assembly from ariba as an identiy of: 99.86 with this serotype
Serotype Genetic Variant
07B snps ['wcwK', '928', 'AAT']
07B snps ['wcwK', '946', 'GGT']
07B snps ['wcwK', '880', 'CTT']
07B snps ['wcwK', '706', 'CAT']
07B snps ['wcwK', '145', 'CTT']
40 snps ['wcwK', '937', 'GAA']
07C snps ['wcwK', '928', 'AAT']
07C snps ['wcwK', '385', 'TGT']
07C snps ['wcwK', '937', 'GAA']
07C snps ['wcwK', '46', 'GGT']
07C snps ['wcwK', '946', 'GGT']
07C snps ['wcwK', '880', 'CTT']
07C snps ['wcwK', '706', 'CAT']
07C snps ['wcwK', '46', 'GGT']
07C snps ['wcwK', '928', 'AAT']
07C snps ['wcwK', '487', 'GCT']
07C snps ['wcwK', '145', 'CTT']
07C snps ['wcwK', '385', 'TGT']
07C snps ['wcwK', '706', 'CAT']
40 snps ['wcwK', '937', 'GAA']
07B snps ['wcwK', '946', 'GGT']
07B snps ['wcwK', '880', 'CTT']
07B snps ['wcwK', '928', 'AAT']
07B snps ['wcwK', '145', 'CTT']
07B snps ['wcwK', '706', 'CAT']
10 changes: 5 additions & 5 deletions seroba/tests/data/ERR1439321/detailed_serogroup_info.txt
@@ -2,11 +2,11 @@ Predicted Serotype: 11A
Serotype predicted by ariba :11A
assembly from ariba as an identiy of: 99.17 with this serotype
Serotype Genetic Variant
11C pseudo gct
11C genes wcwC_11A
11A snps ['wcrL', '334', 'AAT']
11A pseudo gct
11C pseudo gct
11D genes wcwC_11A
11D pseudo gct
11A genes wcwC_11A
11A pseudo gct
11A snps ['wcrL', '334', 'AAT']
11B genes wcwC_11A
11D pseudo gct
11D genes wcwC_11A
16 changes: 8 additions & 8 deletions seroba/tests/data/ERR1440275/detailed_serogroup_info.txt
@@ -2,15 +2,15 @@ Predicted Serotype: 09L
Serotype predicted by ariba :09L
assembly from ariba as an identiy of: 99.98 with this serotype
Serotype Genetic Variant
09L snps ['wcjB', '789', 'GCC']
09V pseudo wcjE
09A pseudo wcjE
09L snps ['wzy', '846', 'GAC']
09L snps ['wchA', '504', 'TAT']
09L snps ['wchA', '879', 'CCA']
09L snps ['wcjA', '957', 'ATT']
09L snps ['wcjA', '852', 'GCA']
09L snps ['wcjA', '429', 'GGT']
09L snps ['wcjA', '414', 'CAT']
09L snps ['wcjA', '528', 'GAT']
09L snps ['wcjA', '636', 'GAT']
09V pseudo wcjE
09A pseudo wcjE
09L snps ['wcjA', '528', 'GAT']
09L snps ['wcjA', '429', 'GGT']
09L snps ['wcjA', '957', 'ATT']
09L snps ['wchA', '879', 'CCA']
09L snps ['wchA', '504', 'TAT']
09L snps ['wcjB', '789', 'GCC']
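
Many of the fixture changes above list the same kinds of rows in a different order. If row order in detailed_serogroup_info.txt is not meaningful (an assumption, not something this commit states), a test could compare two such files as multisets of rows instead of line by line. A minimal sketch, where the helper names and the set of recognised row kinds are taken from the fixtures shown here and are not part of SeroBA's API:

```python
from collections import Counter

# Row kinds seen in the second column of the fixtures in this diff.
KINDS = {"snps", "genes", "pseudo", "allele"}


def parse_variant_rows(text):
    """Return (serotype, kind, variant) tuples, skipping the header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # keep the variant column intact
        if len(parts) == 3 and parts[1] in KINDS:
            rows.append(tuple(parts))
    return rows


def same_rows_ignoring_order(text_a, text_b):
    """True if both texts contain the same rows with the same multiplicity."""
    return Counter(parse_variant_rows(text_a)) == Counter(parse_variant_rows(text_b))


# Rows copied from the ERR1439321 fixture, in two different orders.
first = """Serotype Genetic Variant
11A snps ['wcrL', '334', 'AAT']
11A pseudo gct
11D genes wcwC_11A
"""
second = """Serotype Genetic Variant
11D genes wcwC_11A
11A pseudo gct
11A snps ['wcrL', '334', 'AAT']
"""
reordered_only = same_rows_ignoring_order(first, second)
```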
