## Overview

To define SPRITE clusters, all reads that share the same barcode sequence are grouped into a single cluster. To remove possible PCR duplicates, reads with the same genomic position and an identical barcode are removed. The result is a SPRITE cluster file used for all subsequent analyses: each cluster occupies one line of this text file, containing the barcode name followed by its genomic alignments.

## Usage

The SPRITE cluster generation script can be run as follows:

```
python get_clusters.py --input example.DNA.chr.masked.bam --output example.clusters --num_tags N
```

where `N` is the number of tags in a barcode.

### Output file format

The line below represents a single cluster of size three. The first column is the barcode, which ends with the sample name. Each subsequent column gives the SPRITE library type (`DNA`), the strand in square brackets, and then the alignment chromosome with its start and end coordinates:

```
DPM6A5.NYBot35_Stg.Odd2Bo71.Even2Bo19.Odd2Bo6.example DNA[+]_chr1:18884355-18884455 DNA[-]_chr1:18834000-18834100 DNA[+]_chr1:200041887-200041900
```
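
For downstream analysis it can help to read a cluster line back into structured records. The following is a minimal sketch, not part of `get_clusters.py`: the names `Alignment` and `parse_cluster_line` are hypothetical, and it assumes that columns are separated by whitespace and that every alignment column follows the `LIBRARY[strand]_chrom:start-end` pattern shown in the example above.

```python
import re
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """One alignment column of a cluster line (hypothetical helper type)."""
    library: str   # e.g. "DNA"
    strand: str    # "+" or "-"
    chrom: str     # e.g. "chr1"
    start: int
    end: int


# Matches one column such as DNA[+]_chr1:18884355-18884455.
# Assumption: this pattern is inferred from the example line only.
_FIELD = re.compile(r"^(\w+)\[([+-])\]_([^:]+):(\d+)-(\d+)$")


def parse_cluster_line(line: str) -> Tuple[str, List[Alignment]]:
    """Split a cluster line into its barcode and a list of alignments."""
    fields = line.split()  # assumes whitespace-separated columns
    if not fields:
        raise ValueError("empty cluster line")
    barcode, columns = fields[0], fields[1:]
    alignments = []
    for column in columns:
        match = _FIELD.match(column)
        if match is None:
            raise ValueError(f"unrecognized alignment column: {column!r}")
        library, strand, chrom, start, end = match.groups()
        alignments.append(Alignment(library, strand, chrom, int(start), int(end)))
    return barcode, alignments


example = (
    "DPM6A5.NYBot35_Stg.Odd2Bo71.Even2Bo19.Odd2Bo6.example "
    "DNA[+]_chr1:18884355-18884455 "
    "DNA[-]_chr1:18834000-18834100 "
    "DNA[+]_chr1:200041887-200041900"
)
barcode, alignments = parse_cluster_line(example)
print(barcode, len(alignments))  # cluster size is the number of alignments
```

Under these assumptions, the cluster size is simply the number of alignment columns on the line, so the example line yields a cluster of size three.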