## Overview

After the barcodes have been identified, the reads need to be aligned to the reference genome. In our standard ligation scheme, only read 1 contains genomic DNA, so we align read 1 alone as single-end data rather than performing a paired-end alignment, even though the reads are paired-end.

## Usage

Any aligner will work. We use bowtie2:

```
bowtie2 -p 10 -t --phred33 -x <bowtie2_index> -U example.R1.barcoded_full.fastq.gz | samtools view -bq 20 -F 4 -F 256 - > example.DNA.bowtie2.mapq20.bam
```

* -p specifies the number of CPU threads to use; -t prints the wall-clock time required to load the index and perform the alignment; --phred33 specifies the read quality encoding (Phred+33, the encoding used by current Illumina sequencers); -x points to the location of the bowtie2 index; and -U specifies the FASTQ file of unpaired reads to be aligned.
* samtools view then writes BAM output (-b), keeping only alignments with a MAPQ score of at least 20 (-q 20), dropping unmapped reads (-F 4) and removing reads that are not the primary alignment (-F 256).
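
The filtering logic described above can be sketched in a few lines of Python. This is only an illustration of what the flag and MAPQ tests mean, not a replacement for samtools; the function and constant names are hypothetical.

```python
# Illustrative re-statement of the filter `samtools view -q 20 -F 4 -F 256`.
# All names here are hypothetical and not part of the pipeline.

UNMAPPED = 0x4      # SAM flag bit 4: read is unmapped
SECONDARY = 0x100   # SAM flag bit 256: not the primary alignment


def passes_filter(flag: int, mapq: int, min_mapq: int = 20) -> bool:
    """Return True if an alignment would be kept by the filter."""
    # -F 4 -F 256: drop the record if either excluded bit is set.
    if flag & (UNMAPPED | SECONDARY):
        return False
    # -q 20: keep only records whose MAPQ is at least the threshold.
    return mapq >= min_mapq


if __name__ == "__main__":
    # (flag, MAPQ) pairs: forward hit, reverse-strand hit, unmapped,
    # secondary alignment, low-confidence hit.
    for flag, mapq in [(0, 42), (16, 20), (4, 0), (256, 42), (0, 19)]:
        print(flag, mapq, passes_filter(flag, mapq))
```

A reverse-strand alignment (flag 16) is kept, because only the unmapped and secondary bits are excluded.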

If the reference sequences came from Ensembl, we additionally convert the chromosome names to the UCSC style (chr1, chr2, etc.).

```
python ensembl2ucsc.py -i example.DNA.bowtie2.mapq20.bam -o example.DNA.chr.bam --assembly mm10
```
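
To illustrate what the renaming involves, here is a minimal sketch of an Ensembl-to-UCSC name mapping for the primary mouse chromosomes. It is an assumption about the general idea only and is not the code of ensembl2ucsc.py; the function name and the choice to return None for unrecognised names are hypothetical.

```python
# Hypothetical sketch of Ensembl -> UCSC chromosome renaming for mouse (mm10).
# This is NOT the implementation of ensembl2ucsc.py.

MOUSE_AUTOSOMES = {str(i) for i in range(1, 20)}  # mouse autosomes 1-19


def ensembl_to_ucsc(name: str):
    """Map an Ensembl-style name to UCSC style, or None if not recognised."""
    if name in MOUSE_AUTOSOMES or name in {"X", "Y"}:
        return "chr" + name
    if name == "MT":  # Ensembl "MT" corresponds to UCSC "chrM"
        return "chrM"
    # Unplaced scaffolds need an assembly-specific lookup table,
    # which this sketch does not attempt to provide.
    return None


if __name__ == "__main__":
    for ensembl_name in ["1", "19", "X", "MT", "GL456210.1"]:
        print(ensembl_name, "->", ensembl_to_ucsc(ensembl_name))
```

Note that the mitochondrial genome is the one primary sequence where simply prefixing "chr" is not enough.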

### Input and output files

Example input:

    @HISEQ:623:HY5KHBCXX:2:2206:7231:7108::[DPM6A5][NYBot35_Stg][NOT_FOUND][Even2Bo19][Odd2Bo69]
    ATTGGTAGGTCGGAATTGCACGCTGTAGCGGCATGCTGATGGAGAGGAGAGACTTCTAGCTAGCTACGTGACTGATCCGCACACTGCGACACGTGATCGC
    +
    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Example output:

    HISEQ:623:HY5KHBCXX:2:2206:7231:7108::[DPM6A5][NYBot35_Stg][NOT_FOUND][Even2Bo19][Odd2Bo69]	0	chr1	17644	255	100M	*	0	0	ATTGGTAGGTCGGAATTGCACGCTGTAGCGGCATGCTGATGGAGAGGAGAGACTTCTAGCTAGCTACGTGACTGATCCGCACACTGCGACACGTGATCGC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII		NM:i:0	MD:Z:100
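
Because the barcode tags travel in the read name, they remain available after alignment. The sketch below shows one way the output record above could be split into its SAM fields and barcode list. The helper name is hypothetical, and the sequence and quality strings are shortened for brevity.

```python
import re

# Hypothetical helper, not part of the pipeline: split a SAM line into a few
# named fields and extract the bracketed barcode tags from the read name.
BARCODE_RE = re.compile(r"\[([^\]]+)\]")


def parse_sam_record(line: str) -> dict:
    fields = line.rstrip("\n").split("\t")
    # The read name is "<read id>::[tag1][tag2]...".
    read_id, _, tag_string = fields[0].partition("::")
    return {
        "read_id": read_id,
        "barcodes": BARCODE_RE.findall(tag_string),
        "flag": int(fields[1]),
        "chrom": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


# Read name and alignment fields copied from the example output above;
# the SEQ and QUAL columns are truncated here.
example = (
    "HISEQ:623:HY5KHBCXX:2:2206:7231:7108::"
    "[DPM6A5][NYBot35_Stg][NOT_FOUND][Even2Bo19][Odd2Bo69]"
    "\t0\tchr1\t17644\t255\t100M\t*\t0\t0\tATTG\tIIII"
)
record = parse_sam_record(example)

if __name__ == "__main__":
    print(record["chrom"], record["pos"], record["barcodes"])
```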