## Overview

After the barcodes have been identified, the reads need to be aligned to the reference genome. In our standard ligation scheme, only read 1 contains genomic DNA, so only read 1 is aligned. We do not perform a paired-end alignment, despite having paired-end reads.

## Usage

Any aligner will work. We use bowtie2:

```
bowtie2 -p 10 -t --phred33 -x -U example.R1.barcoded_full.fastq.gz | samtools view -bq 20 -F 4 -F 256 - > example.DNA.bowtie2.mapq20.bam
```

* bowtie2 options: `-p` specifies the number of CPU threads to use; `-t` prints the wall-clock time required to load the index and perform the alignment; `--phred33` specifies the read quality encoding, which in this case is the latest one used by Illumina sequencers; `-x` points to the location of the bowtie2 index (its path must be supplied after the flag); and `-U` specifies the location of the FASTQ file of unpaired reads to be aligned.
* samtools options: we write BAM output (`-b`) and keep only alignments with a MAPQ score of at least 20 (`-q 20`), outputting only mapped reads (`-F 4`) and removing reads that are not the primary alignment (`-F 256`).

If the reference sequences came from Ensembl, we additionally convert the chromosome names to the UCSC style (chr1, chr2, etc.):

```
python ensembl2ucsc.py -i example.DNA.bowtie2.mapq20.bam -o example.DNA.chr.bam --assembly mm10
```

### Input and output files

Example input:

    @HISEQ:623:HY5KHBCXX:2:2206:7231:7108::[DPM6A5][NYBot35_Stg][NOT_FOUND][Even2Bo19][Odd2Bo69]
    ATTGGTAGGTCGGAATTGCACGCTGTAGCGGCATGCTGATGGAGAGGAGAGACTTCTAGCTAGCTACGTGACTGATCCGCACACTGCGACACGTGATCGC
    +
    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Example output:

    HISEQ:623:HY5KHBCXX:2:2206:7231:7108::[DPM6A5][NYBot35_Stg][NOT_FOUND][Even2Bo19][Odd2Bo69] 0 chr1 17644 255 100M * 0 0 ATTGGTAGGTCGGAATTGCACGCTGTAGCGGCATGCTGATGGAGAGGAGAGACTTCTAGCTAGCTACGTGACTGATCCGCACACTGCGACACGTGATCGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 MD:Z:100
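
To make the filtering and renaming steps from the Usage section concrete, here is a minimal Python sketch of the same logic applied to tab-separated SAM text lines. It is an illustration under stated assumptions, not the pipeline's code. The function names `keep_alignment` and `ensembl_to_ucsc` are hypothetical, and the renaming rule (prefix `chr`, map `MT` to `chrM`) is an assumed simplification that says nothing about how `ensembl2ucsc.py` is actually implemented.

```python
# Illustrative sketch only: mirrors `samtools view -q 20 -F 4 -F 256`
# on plain tab-separated SAM records. Not part of the pipeline.

UNMAPPED = 0x4      # SAM FLAG bit 4: the read is unmapped
SECONDARY = 0x100   # SAM FLAG bit 256: not the primary alignment


def keep_alignment(sam_line, min_mapq=20, exclude_flags=UNMAPPED | SECONDARY):
    """Return True if a SAM record passes the flag and MAPQ filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    if flag & exclude_flags:
        return False        # any excluded bit set: drop the record
    return mapq >= min_mapq  # -q keeps records with MAPQ >= threshold


def ensembl_to_ucsc(name):
    """Hypothetical helper: rename an Ensembl-style chromosome to UCSC style.

    Assumes the simple convention 1 -> chr1, X -> chrX, MT -> chrM.
    Unplaced scaffolds and contigs follow other rules and are not handled.
    """
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


# A shortened record with the same FLAG (0) and MAPQ (255) as the example output.
record = "read1\t0\tchr1\t17644\t255\t100M\t*\t0\t0\tACGT\tIIII"
print(keep_alignment(record))
print(ensembl_to_ucsc("1"))
```

Testing the flag with a bitwise AND against a combined mask rejects a record when either excluded bit is set, which is how the two `-F` values are described as working together above.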