Making Representative Genome Graphs from Eukaryotic Viruses and Phages

NCBI-Codeathons/Virus_Graphs

Virus_Graphs

HIV inference based on NGS data.

Ideas

  • k-mer idea

HIV inference with reference genome assemblies

Figure: SWIGG graph of HIV-1 references (n = 170), visualized in Gephi.

n = 39 without non-recombinants.

env, V1-V5

pol

How much variation can we have in a reference graph genome?

Tools considered: VG, NovoGraph, SWIGG.

The color of a k-mer is a compact representation of the number of sub-lineages in which that k-mer appears. It can be computed by running the Annotate_Colors.ipynb notebook. The inputs are the sketch graph of the viral genomes, as obtained from SWIGG, and the metadata obtained from the reference genomes.
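The idea of a k-mer color can be illustrated with a minimal Python sketch. This is not the code in Annotate_Colors.ipynb: it reads k-mers directly from toy sequences rather than from a SWIGG sketch graph, and the names `kmers`, `annotate_colors`, `genomes`, and `sublineage_of`, as well as the toy sequences and sub-lineage labels, are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, skipping k-mers with non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def annotate_colors(genomes, sublineage_of, k):
    """Map each k-mer to the number of distinct sub-lineages it appears in.

    genomes:       dict of genome name -> sequence (assumed input shape)
    sublineage_of: dict of genome name -> sub-lineage label (the metadata)
    """
    seen = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            seen[kmer].add(sublineage_of[name])
    return {kmer: len(lineages) for kmer, lineages in seen.items()}


# Toy data, invented for illustration only.
genomes = {"ref1": "ACGTAC", "ref2": "ACGTTT", "ref3": "CGTACG"}
sublineage_of = {"ref1": "B", "ref2": "C", "ref3": "B"}

colors = annotate_colors(genomes, sublineage_of, k=4)
for kmer, color in sorted(colors.items()):
    print(kmer, color)
```

In this toy example, `ACGT` occurs in genomes from two sub-lineages and gets color 2, while `CGTA` occurs in two genomes of the same sub-lineage and gets color 1. Counting sub-lineages rather than genomes is what keeps the color compact.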

Implementation

INSERT WORKFLOW WHEN READY FROM PAPER (Alexis)

Method options: SWIGG, VG

Dataset options: T-virus, HIV-1

Visualization:

Datasets

HIV-1 Genome assemblies

HIV-1 datasets were obtained from Los Alamos National Laboratory's HIV Sequence Database on November 5-6, 2019, both as multiple sequence alignments (MSA, with gaps) and as multi-FASTA (gapless). Files were set to include HXB2 (K03455) as the reference.
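The relationship between the two formats can be shown with a short Python sketch that strips gap characters from an aligned FASTA to produce a gapless multi-FASTA. It is an illustration only, not how the files above were produced. The helper names `read_fasta` and `degap`, the choice of `-` and `.` as gap characters, and the toy records are assumptions.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def degap(records, gap_chars="-."):
    """Return the records with all gap characters removed from each sequence."""
    table = str.maketrans("", "", gap_chars)
    return [(header, seq.translate(table)) for header, seq in records]


# Toy alignment, invented for illustration; not real HIV-1 sequence.
aligned = """>toy_reference
AC--GT
A-CGT
>toy_seq
ACTTG-
"""

gapless = degap(read_fasta(aligned))
for header, seq in gapless:
    print(f">{header}\n{seq}")
```

Going the other way (gapless to aligned) requires an actual alignment step, so the MSA files carry information that the multi-FASTA files do not.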

All complete genomes (n=10568).

Tissue type

CD4+ T cell (n=) vs PBMC (n=1555).

Tropism

Only CCR5 (n=456) vs only CXCR4 (n=43) vs R5X4 (n=51).

Patient Information

Acute infection (n=525)

AIDS (n=66)

Asymptomatic (n=223)

Symptomatic (n=83)

Deceased (n=7)

Chronic (n=174)

HIV-1 Sequencing data

Figure: cDNA+PCR DNAseq ("classic RNAseq"). A. Coverage summary of reads mapped to HXB2 K03455 with HISAT2 on usegalaxy.eu. B. RmDup-processed reads, controlling for PCR duplicates after initial alignment. Search strategy: SRA. Search terms: "HIV-1 and RNAseq and virus". BioProject: PRJNA320293, specifically SRR3472915. Viewed in IGV.

Figure: Unpublished HXB2 ONT dataset. Gener, 2019. Coverage summary of reads mapped to HXB2 K03455 with minimap2 on usegalaxy.eu. Viewed in IGV. Note, "4000 reads" denotes the FASTQ subsetting from Guppy basecaller. Note also that PCR was used during ONT library prep.

Figure: Unpublished pNL4:3d1443 Tg26 subset. Coverage summary of reads mapped to HXB2 K03455 with minimap2 on usegalaxy.eu. Tg26 is an HIV-1 transgenic mouse with a deletion in gagpol. PCR-free PE 150. Viewed in IGV.