This little helper uses jinja2 and igvf_utils to generate seqspec files for MPRA count and association data. It is very flexible because many templates can be defined. It loads all necessary information from the IGVF data portal, such as file size, md5sum, and sequencing platform.
You need jinja2 and click. I install them via mamba:
mamba install jinja2 click
Please install the current development version of seqspec:
pip install git+https://github.com/pachterlab/seqspec@devel
Then install igvf_utils (see its installation documentation):
pip install https://github.com/IGVF-DACC/igvf_utils/archive/master.zip
Set the IGVF_API_KEY environment variable to your API key and the IGVF_SECRET_KEY environment variable to your secret key. See the configuration documentation.
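Before running the script, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch and not part of generate_seqspec.py; the function name missing_igvf_credentials is hypothetical, and only the two variable names come from the instructions above.

```python
import os

# The two variable names are taken from the instructions above.
REQUIRED_VARS = ("IGVF_API_KEY", "IGVF_SECRET_KEY")


def missing_igvf_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_igvf_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("IGVF credentials are set.")
```

An empty list means both variables are set, so the portal lookups at least have credentials to try.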
python generate_seqspec.py --help
python generate_seqspec.py --template templates/igvf_mpra_lenti_assignment.v0.3.0.yml \
--name mpra_shendure_proximal_promoter --modality dna \
--r1-id IGVFFI7003XVUG --r1-id IGVFFI5231ATBS --r1-id IGVFFI0921LXBF \
--r2-id IGVFFI8640LLIG --r2-id IGVFFI0576IRDC --r2-id IGVFFI6354NJGB \
--r3-id IGVFFI4403ENTR --r3-id IGVFFI2142OYFW --r3-id IGVFFI6807PKQA \
--r1-primer GGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACT --r2-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r3-primer CATTGCGTGAACCGACACTAGAGGGTATATAATG \
--onlist-id IGVFFI2041KXFD \
--bc-length 15 --oligo-length 200 \
--output IGVF_shendure_proximalPromoter_assignment.yaml
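The command above repeats --r1-id, --r2-id, and --r3-id once per file accession, which gets long when typed by hand. As a rough sketch, such a command could be assembled from Python lists. The function build_command is hypothetical and not part of this repository; it only uses option names that appear in the command above, and for brevity the example passes just the R1 accessions.

```python
import shlex


def build_command(template, name, modality, output, read_ids, primers=None, extra=None):
    """Assemble an argument list for generate_seqspec.py (illustrative only).

    read_ids: dict such as {"r1": [accession, ...], "r2": [...]}
    primers:  dict such as {"r1": "ACGT..."}
    extra:    dict of further single-valued options, e.g. {"bc-length": 15}
    """
    cmd = ["python", "generate_seqspec.py",
           "--template", template, "--name", name, "--modality", modality]
    # One --rN-id flag per accession, in the order given for each read.
    for read in sorted(read_ids):
        for accession in read_ids[read]:
            cmd += [f"--{read}-id", accession]
    for read in sorted(primers or {}):
        cmd += [f"--{read}-primer", primers[read]]
    for option, value in (extra or {}).items():
        cmd += [f"--{option}", str(value)]
    cmd += ["--output", output]
    return cmd


cmd = build_command(
    template="templates/igvf_mpra_lenti_assignment.v0.3.0.yml",
    name="mpra_shendure_proximal_promoter",
    modality="dna",
    output="IGVF_shendure_proximalPromoter_assignment.yaml",
    read_ids={"r1": ["IGVFFI7003XVUG", "IGVFFI5231ATBS", "IGVFFI0921LXBF"]},
    extra={"onlist-id": "IGVFFI2041KXFD", "bc-length": 15, "oligo-length": 200},
)
# Print the command as a shell-quoted string instead of running it.
print(shlex.join(cmd))
```

The sketch only prints the command; passing the list to subprocess.run(cmd, check=True) would execute it.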
DNA assignment seqspec of lenti virus MPRA from Shendure (UW) grant:
python generate_seqspec.py --template templates/igvf_mpra_lenti_assignment.v0.3.0.yml \
--name mpra_shendure_80K --modality dna \
--r1-id IGVFFI9931MZQI --r2-id IGVFFI9154RAYY --r3-id IGVFFI7509PYSL \
--r1-primer GGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACT --r2-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r3-primer CATTGCGTGAACCGACACTAGAGGGTATATAATG \
--bc-length 15 --oligo-length 270 \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml
returns:
dna
---
|------------------------------------------------------------------------------------------------------------------------------------------------->(1) Oligo fwd
|-------------->(3) BC
AATGATACGGCGACCACCGAGATCTACACXXXXXXXXXXCAGCCTGCATTTCTGCCAGGGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCAAAGTGAACACATCGCTAAGCGAAAGCTAAGGAAGCTCGACTTCCAGCTTGGCAATCCGGTACTGTCATTGCGTGAACCGACACTAGAGGGTATATAATGXXXXXXXXXXXXXXXACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCATCTCGTATGCCGTCTTCTGCTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGXXXXXXXXXXGTCGGACGTAAAGACGGTCCCGGGCGAGATCTGGACGTCCTCCTGGCCTAGTTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGTTTCACTTGTGTAGCGATTCGCTTTCGATTCCTTCGAGCTGAAGGTCGAACCGTTAGGCCATGACAGTAACGCACTTGGCTGTGATCTCCCATATATTACXXXXXXXXXXXXXXXTGGCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGTAGAGCATACGGCAGAAGACGAAC
<-------------------------------------------------------------------------------------------------------------------------------------------------|(2) Oligo rev
RNA count seqspec of lenti virus MPRA from Shendure (UW) grant:
python generate_seqspec.py --template templates/igvf_mpra_lenti_counts.v0.3.0.yml \
--name mpra_shendure_80K --modality rna \
--r1-id IGVFFI8223UESF --r1-id IGVFFI9990NOMV --r1-id IGVFFI3050NXPU \
--r2-id IGVFFI9560VIAN --r2-id IGVFFI5074MDCR --r2-id IGVFFI4713QQLG \
--r3-id IGVFFI1814DAMK --r3-id IGVFFI0172AZKE --r3-id IGVFFI2509VPWV \
--r1-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r2-primer ACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGC \
--bc-length 15 \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml
returns:
rna
---
|-------------->(1) RNA BC count fwd
|--------------->(3) RNA BC count id
AATGATACGGCGACCACCGAGATCTACACXXXXXXXXXXGCAAAGTGAACACATCGCTAAGCGAAAGCTAAGNNNNNNNNNNNNNNNACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCXXXXXXXXXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGXXXXXXXXXXCGTTTCACTTGTGTAGCGATTCGCTTTCGATTCNNNNNNNNNNNNNNNTGGCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGXXXXXXXXXXXXXXXXTAGAGCATACGGCAGAAGACGAAC
<--------------|(2) RNA BC count rev
RNA count seqspec of plasmid MPRA from Mohlke (UNC) grant:
python generate_seqspec.py --template templates/igvf_mpra_unc_counts.v0.3.0.yml \
--name mpra_unc_hepg2 --modality rna \
--r1-id IGVFFI1586GLDT --r1-id IGVFFI1618FFIN \
--r1-primer CCAAGAAGGGCGGCAAGATCGCCGTGTAATAATTCTAGA --bc-length 20 --onlist-id IGVFFI9520JZQK \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml
returns:
rna
---
|-------------------------------------------------->(1) RNA BC count fw
AATGATACGGCGACCACCGAGATCTACACTACAACCGCCAAGAAGCTGCGCGGTGGTGTTGTGTTCGTGGACGAGGTGCCTAAAGGACTGACCGGCAAGTTGGACGCCCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCGGCAAGATCGCCGTGTAATAATTCTAGANNNNNNNNNNNNNNNNNNNNACTAGTACACTCCCCGTCGGCAGTTGGGAAGAGCATAGTCGTAGAGCACGCGGACTCCTATCTCGTATGCCGTCTTCTGGTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGATGTTGGCGGTTCTTCGACGCGCCACCACAACACAAGCACCTGCTCCACGGATTTCCTGACTGGCCGTTCAACCTGCGGGCGTTCTAGGCGCTCTAAGAGTAATTCCGGTTCTTCCCGCCGTTCTAGCGGCACATTATTAAGATCTNNNNNNNNNNNNNNNNNNNNTGATCATGTGAGGGGCAGCCGTCAACCCTTCTCGTATCAGCATCTCGTGCGCCTGAGGATAGAGCATACGGCAGAAGACCAAC
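In each seqspec-ascii output above, the second sequence row appears to be the base-wise complement of the first (not reversed), with the placeholder characters N and X left unchanged. The sketch below illustrates that relationship on the first bases of the outputs. It is not seqspec's implementation, and the function name complement is hypothetical.

```python
# Map each base to its complement; the N and X placeholders map to themselves.
_COMPLEMENT = str.maketrans("ACGTNX", "TGCANX")


def complement(sequence):
    """Return the base-wise complement of a sequence without reversing it."""
    return sequence.upper().translate(_COMPLEMENT)


# First bases of the top row in the lentiviral outputs above.
top = "AATGATACGGCGACCACCGAGATCTACACXXXXXXXXXX"
print(complement(top))
```

Comparing the printed string with the start of the second row is a quick way to read the diagram: the forward reads are drawn above the top strand and the reverse reads below the bottom strand.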