
TAMA GO: Split Files

GenomeRIK edited this page Feb 21, 2020 · 7 revisions

This set of tools in TAMA-GO is used to split files for parallel processing.

tama_mapped_sam_splitter.py

To split mapped and sorted SAM/BAM files for input into TAMA Collapse, use tama_mapped_sam_splitter.py.

You can split either SAM or BAM files, but the output is always SAM. Sort your SAM/BAM files before splitting: the splitter preserves the sort order and keeps each chromosome intact rather than dividing it across output files. Reads that multi-map to different chromosomes will need to be addressed using TAMA Merge.
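Before splitting, it can be worth checking what sort order a SAM file declares in its header. The sketch below reads the `SO:` tag from the `@HD` line, as defined in the SAM format specification. It assumes that "sorted" on this page means coordinate-sorted, which the page does not state explicitly, and the function name `declared_sort_order` is hypothetical. It only handles SAM text (not BAM) and reports what the header claims rather than verifying the records themselves.

```python
def declared_sort_order(sam_lines):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header section is over; alignment records follow
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Hypothetical header lines for illustration only.
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
print(declared_sort_order(header))  # coordinate
```

For a real file, the same function can be applied to an open file handle, for example `declared_sort_order(open("reads.sam"))`, where `reads.sam` is a placeholder name.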

USAGE:

python tama_mapped_sam_splitter.py sam_file num_output output_prefix

sam_file - This is the mapped and sorted SAM or BAM file.

num_output - This is the number of split files you want. The actual number may differ if large chromosomes prevent an even split. For example, if you enter 10, you will get at least 10 split files, and possibly more depending on chromosome sizes.

output_prefix - This prefix will be used for the output files along with the number of the split file.
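As a minimal sketch of scripting the call above, the helper below assembles the positional arguments in the order given under USAGE. The helper name `build_sam_split_command` and the file names `sorted_reads.sam` and `split_reads` are hypothetical, and the sketch assumes the script sits in the working directory and that `python` resolves to an interpreter TAMA supports.

```python
import subprocess


def build_sam_split_command(sam_file, num_output, output_prefix,
                            script="tama_mapped_sam_splitter.py",
                            interpreter="python"):
    """Return the argument list in the positional order shown under USAGE."""
    if int(num_output) < 1:
        raise ValueError("num_output must be a positive integer")
    return [interpreter, script, sam_file, str(int(num_output)), output_prefix]


# Placeholder input and prefix names, for illustration only.
cmd = build_sam_split_command("sorted_reads.sam", 10, "split_reads")
print(" ".join(cmd))

# To actually run the splitter, uncomment the next line:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.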


tama_fasta_splitter.py

To split the ORF amino acid sequence fasta files before running BlastP for the ORF/NMD pipeline, use tama_fasta_splitter.py.

After running BlastP, you can just concatenate the BlastP result files before running the next step of the TAMA ORF/NMD pipeline.
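The concatenation step can be sketched in a few lines of Python (a shell `cat` would do the same job). The function name `concatenate_files` and the file names in the commented usage are hypothetical, since this page does not specify how the BlastP result files are named.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append each input file, in the order given, to a single output file."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Example usage with placeholder names. Keep the output name outside the
# glob pattern so the combined file is never read back in as an input.
# import glob
# concatenate_files(sorted(glob.glob("blastp_results_*.txt")),
#                   "all_blastp_results.txt")
```

Note that `sorted` orders names lexicographically, so a file numbered 10 comes before one numbered 2. This sketch assumes the order of the result files does not matter to the next step.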

USAGE:

python tama_fasta_splitter.py fasta_file output_prefix num_splits

fasta_file - This is the amino acid sequence fasta file generated by tama_orf_seeker.py.

output_prefix - This is the prefix for the file names that will be generated by the splitting tool.

num_splits - This is the number of files you want to end up with. Note that the splitter chooses an appropriate number based on this input, so the actual number of files may not match it exactly.
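To illustrate how a requested split count can differ from the number of files produced, here is a minimal sketch that groups FASTA records into fixed-size chunks. This is not the TAMA implementation, and whether tama_fasta_splitter.py works this way is an assumption. The function names and the example records are hypothetical.

```python
import math


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, num_splits):
    """Group records into chunks of ceil(total / num_splits) records each."""
    records = list(records)
    if num_splits < 1:
        raise ValueError("num_splits must be a positive integer")
    if not records:
        return []
    per_file = math.ceil(len(records) / num_splits)
    return [records[i:i + per_file] for i in range(0, len(records), per_file)]


# Ten made-up records, six requested splits.
example = "\n".join(">orf%d\nMKT" % i for i in range(10))
chunks = chunk_records(parse_fasta(example), 6)
print(len(chunks))  # 5
```

With ten records and six requested splits, each chunk holds two records, so only five chunks result. Rounding the chunk size is one simple way a count can drift from the request.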