10x Barcodes #2

Open
Simon-Coetzee opened this issue May 12, 2017 · 2 comments

Comments

@Simon-Coetzee
Contributor

10x barcodes with v2 chemistry work as shown below (examples from merge_barcodefiles_10x()).
It looks like there may be some confusion about which file does what.

I1 = sample barcode (SB) (8 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NTCGCCCT
+
#AAAFJ-J

python regex: (@.*)\n(?P<SB>.*)\n\+(.*)\n(.*)\n

R1 = cellular barcode (CB) (16 bp) + molecular barcode (MB, i.e. the UMI) (10 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NCATTTGAGTAACCCTGATGTCATAA
+
#AAFFJJJJJJJJJJJJJJJFJJJJJ

python regex: (@.*)\n(?P<CB>.{16})(?P<MB>.{10})\n\+(.*)\n(.*)\n
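
As a rough illustration of how the R1 pattern splits a read, here is a minimal sketch that applies it to the record above. The `+` separator is escaped so it is matched literally, and the helper name `split_r1` is hypothetical (it is not part of the project's code).

```python
import re

# One R1 record, copied from the example above (four FASTQ lines).
R1_RECORD = (
    "@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT\n"
    "NCATTTGAGTAACCCTGATGTCATAA\n"
    "+\n"
    "#AAFFJJJJJJJJJJJJJJJFJJJJJ\n"
)

# 16 bp cellular barcode followed by a 10 bp molecular barcode (UMI).
# The '+' separator line is escaped so the regex matches it literally.
R1_PATTERN = re.compile(r"(@.*)\n(?P<CB>.{16})(?P<MB>.{10})\n\+(.*)\n(.*)\n")


def split_r1(record):
    """Return (cellular barcode, molecular barcode) for one R1 record.

    Hypothetical helper for illustration only.
    """
    match = R1_PATTERN.match(record)
    if match is None:
        raise ValueError("record does not look like a 26 bp 10x v2 R1 read")
    return match.group("CB"), match.group("MB")


cb, mb = split_r1(R1_RECORD)
print(cb, mb)
```

The fixed-width groups mean a read that is not exactly 26 bp fails to match rather than being split at the wrong position.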

R2 = RNA reads (98 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NCATTTGAGTAACCCTGATGTCATAA
+
#AAFFJJJJJJJJJJJJJJJFJJJJJ

python regex: (?P<name>@.*) .*\n(?P<seq>.*)\n\+(.*)\n(?P<qual>.*)\n

@Simon-Coetzee
Contributor Author

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 2:N:0:NTCGCCCT
NAAGCCAGTTGTGAATCATGCACATCAGCTCCTTCTGAAATGTGTTTATGGCCTAGGACACAGGGACCCTGGAGACTATGGTGCTGCAGTGCATTATG
+
#<<A<FJJJFJFJJJJJJJJJFJFJJJJJJJJJJJJJJJFJJFFJJJJJAFJJFJF7JJJJFJAJJJ<J<7-A<FFFFJ-F<FJJJJJJJJJJ7FJJA

This is what I meant for R2.
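
For reference, a small sketch of the R2 pattern from the first comment applied to this record, with the `+` separator escaped. The `name` group stops at the space, so the read descriptor (`2:N:0:...`) is dropped and the name is the same as for the matching I1 and R1 records. The variable names are illustrative assumptions.

```python
import re

# The R2 record from this comment (four FASTQ lines).
R2_RECORD = (
    "@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 2:N:0:NTCGCCCT\n"
    "NAAGCCAGTTGTGAATCATGCACATCAGCTCCTTCTGAAATGTGTTTATGGCCTAGGACACAGGGACCCTGGAGACTATGGTGCTGCAGTGCATTATG\n"
    "+\n"
    "#<<A<FJJJFJFJJJJJJJJJFJFJJJJJJJJJJJJJJJFJJFFJJJJJAFJJFJF7JJJJFJAJJJ<J<7-A<FFFFJ-F<FJJJJJJJJJJ7FJJA\n"
)

# 'name' captures everything before the space in the header line;
# 'seq' and 'qual' capture the read and its quality string.
R2_PATTERN = re.compile(
    r"(?P<name>@.*) .*\n(?P<seq>.*)\n\+(.*)\n(?P<qual>.*)\n"
)

match = R2_PATTERN.match(R2_RECORD)
name = match.group("name")
seq = match.group("seq")
qual = match.group("qual")
print(name, len(seq))
```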

axtambe pushed a commit that referenced this issue Jul 19, 2017
@jnotwell

@Simon-Coetzee, this is a correct description of the 10X V2 chemistry. I believe concatenating the sample and cellular barcodes, however, is incorrect (merge_barcodefiles_10x(), args['barcode_start'] = 0, args['barcode_end'] = 26).

This is because 10X uses four 8 bp oligonucleotides per sample index to address sequencing biases. This can be easily observed with any sample barcode file:

zcat SAMPLE_I1_001.fastq.gz | awk '{if(NR % 4 == 2) {a[$1] += 1}} END {for(x in a) {print x "\t" a[x]}}' | sort -k2,2gr

Concatenating the sample and cellular barcodes will (I think) result in reads for a given cell being associated with four different barcodes. Using just the 16 bp cellular barcode should avoid these issues.
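
To make the concern concrete, here is a toy sketch (not the project's code) that groups reads once by sample index + cellular barcode and once by cellular barcode alone. Only `NTCGCCCT` and the cellular barcode come from the records above; the other three sample-index oligos and the function name `count_reads_per_key` are made up for illustration.

```python
from collections import defaultdict

# Four 8 bp sample-index oligos for one sample. Only the first appears in
# this thread; the other three are hypothetical placeholders.
SAMPLE_INDEX_OLIGOS = ["NTCGCCCT", "AAAAAAAA", "CCCCCCCC", "GGGGGGGG"]

# A single cell's 16 bp cellular barcode, taken from the R1 example.
CELL_BARCODE = "NCATTTGAGTAACCCT"


def count_reads_per_key(reads, include_sample_index):
    """Count reads per grouping key.

    reads is an iterable of (sample_index, cell_barcode) pairs.
    """
    counts = defaultdict(int)
    for sample_index, cell_barcode in reads:
        if include_sample_index:
            key = sample_index + cell_barcode
        else:
            key = cell_barcode
        counts[key] += 1
    return dict(counts)


# Two reads from the same cell for each of the four sample-index oligos.
reads = [(sb, CELL_BARCODE) for sb in SAMPLE_INDEX_OLIGOS for _ in range(2)]

concatenated = count_reads_per_key(reads, include_sample_index=True)
cb_only = count_reads_per_key(reads, include_sample_index=False)

print(len(concatenated), "keys when the sample index is prepended")
print(len(cb_only), "key when only the cellular barcode is used")
```

Under these assumptions, the same cell is split across four keys when the sample index is prepended, while the cellular barcode alone keeps all eight reads together.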
