Repository for scripts, notebooks, data, and analyses associated with the paper entitled:
Characterization of the gene repertoire and environmentally driven expression patterns in Tanner crab (Chionoecetes bairdi)
Grace Crandall1, Pamela C. Jensen2, Sam White1, Steven Roberts1
1 School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington 98105, United States
2 Resource Assessment and Conservation Engineering Division, Alaska Fisheries Science Center, National Marine Fisheries Service, NOAA, 7600 Sand Point Way NE, Seattle, WA 98115, United States
For broader background information on this project, visit RobertsLab/project-crab
- Blastquery-GOslim.tab
Table from notebooks/transcriptomev3.1-BLAST-to-GOslim.ipynb from transcriptome v 3.1.
- GOslim-P-pie.txt
Text file of GOslim terms from transcriptome v 3.1 with counts. Used to create the GOslim pie chart for the paper. Created in this Jupyter notebook: notebooks/transcriptomev3.1-BLAST-to-GOslim.ipynb
- GOslim-pie-transcriptome-v3.1.pdf
PDF of the GOslim pie from transcriptome v 3.1, created in this R script: scripts/GOslim-pie-transcriptome-v3.1.Rmd
- _blast-sep.tab
From notebooks/transcriptomev3.1-BLAST-to-GOslim.ipynb. Nicely separated version of the blast output from transcriptome v 3.1.
- allterms.GOslim-pie-transcriptome-v3.1.pdf
PDF of the GOslim pie using all terms from transcriptome v 3.1, including less descriptive terms like "other biological processes", in order to make the pie slices more accurate.
- transcriptome-stress-genes.tab
Table of stress response genes from transcriptome v 3.1 according to the GOslim term "stress response".
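As a rough illustration of how a stress-gene table like the one above could be derived, the sketch below filters a tab-delimited table by GOslim term. It is a Python stand-in, not the repository's Rmd. The column name `GOslim_term`, the function `ids_for_goslim_term`, and the toy rows are all assumptions made for this example.

```python
import csv
import io

# Toy stand-in for a BLAST-to-GOslim table. The column names and rows are
# invented for illustration and do not come from the repository files.
TOY_TABLE = (
    "Trinity_ID\tGOslim_term\n"
    "contig_A\tstress response\n"
    "contig_B\tcell cycle and proliferation\n"
    "contig_C\tStress Response\n"
    "contig_A\tstress response\n"
)


def ids_for_goslim_term(handle, term, id_col="Trinity_ID", term_col="GOslim_term"):
    """Return unique IDs, in first-seen order, whose GOslim term matches `term`.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    wanted = term.strip().lower()
    seen = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[term_col].strip().lower() == wanted and row[id_col] not in seen:
            seen.append(row[id_col])
    return seen


stress_ids = ids_for_goslim_term(io.StringIO(TOY_TABLE), "stress response")
print(stress_ids)
```

To run this against a real file, pass an open file handle instead of the `io.StringIO` object and adjust the column names to match the table's header.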
DAVID output files from the single crab genes over time heatmap clusters (Fig. 3). There are 6 clusters, one file per cluster. Clusters 5 and 6 are very small compared to the other 4, and as such are not discussed in the paper.
DAVID output files from Fig. 2 for the first three clusters. The fourth cluster didn't have any DAVID results.
The files in this directory are from this Rmd: scripts/DESeq.Rmd
- DEGlist-contrast_temperature-counts.tab
List of 123 DEGs associated with infection status that are influenced by temperature treatment. This list includes gene counts from the 4 libraries.
- DEGlist-contrast_temperature.tab
List of 123 DEGs associated with infection status that are influenced by temperature treatment.
- DEGlist-infection-with-temp-counts.tab
List of 408 DEGs associated with infection status while taking temperature treatment into account. This is not the list of genes influenced by temperature. This list includes the gene count data for the four libraries.
- DEGlist-infection-with-temp.tab
List of 408 DEGs associated with infection status while taking temperature treatment into account.
- DEGlist-infectionONLY-counts.tab
List of 1343 DEGs comparing infection status. List includes count data from the four libraries.
- DEGlist-infectionONLY.tab
List of 1343 DEGs comparing infection status.
Output files from this Jupyter notebook to get gene count data from the 4 libraries used for DESeq2. Jupyter notebook: notebooks/kallisto-4libraries.ipynb
- kallisto-0812.isoform.TMM.EXPR.matrix
One of the files from kallisto.
- kallisto-0812.isoform.TPM.not_cross_norm
One of the files from kallisto.
- kallisto-0812.isoform.TPM.not_cross_norm.runTMM.R
One of the files from kallisto.
- kallisto-0812.isoform.counts.matrix
Count matrix for the 4 libraries. Used in the DESeq2 Rmd: scripts/DESeq.Rmd
Output files from the Jupyter notebook to get gene count data from the individual crab RNAseq libraries for visualization. Jupyter notebook: notebooks/kallisto-individual-crab.ipynb
- kallisto-single-crab.isoform.TMM.EXPR.matrix
One of the files from kallisto.
- kallisto-single-crab.isoform.TPM.not_cross_norm
One of the files from kallisto.
- kallisto-single-crab.isoform.TPM.not_cross_norm.runTMM.R
One of the files from kallisto.
- kallisto-single-crab.isoform.counts.matrix
Count matrix for the 3 libraries of the individual crab over time. Used in this Rmd to visualize counts: scripts/heatmaps-single_crab-over-time.Rmd
- contrast-tempDEGs_singlecrab-cluster-blast-go.tab
Table of the temperature-influenced infection DEGs that are present in the single crab over time (Fig. 3), with cluster assignment, uniprot/swissprot and gene ontology information. Of the 123 temperature-influenced infection DEGs, 94 were present in the individual crab over time.
- contrasttemp_DEGs-clusters-annot.tab
Table of the 123 temperature-influenced infection DEGs with cluster assignment from Fig. 2, with BLAST-uniprot/swissprot gene ontology annotation.
- individual-crab-over-time.pdf
PDF of Fig. 3: heatmap of single crab RNAseq contigs over time.
- infection-temp-DEGs-counts_annot.tab
Table of the 408 differentially expressed contigs between infection statuses while taking temperature treatments into account. Table includes library counts, BLAST-uniprot/swissprot and gene ontology annotation.
- infection-tempDEGs_singlecrab-cluster-blast-go.tab
Table of the what is this....
- infectionDEGs_singlecrab-clust-blast-go.tab
What is this...?
- single_crab-clusters-blast-GO.tab
Table of the Fig. 3 single crab RNAseq with cluster assignment, BLAST-uniprot/swissprot and gene ontology annotations.
- temp-influenced-infectionDEGs-singlecrab-overtime.tab
Table of the 123 temperature-influenced infection DEGs with the cluster assignments for both Fig. 2 and Fig. 3. Only 94 of the 123 DEGs were present in the individual crab (Fig. 3), so the column for the Fig. 3 cluster assignments has some "NA"s.
- tempinfluenced-infectionDEGs-heatmap.pdf
PDF of Fig. 2: heatmap of the 123 temperature-influenced infection DEGs. Created in this Rmd: scripts/heatmaps-4libraries.Rmd
- cbai_transcriptome_v3.1.zip
.zip of the transcriptome (v 3.1) used as the basis for differential expression analyses and characterization of crab response to temperature and infection with Hematodinium.
- uniprotKB-transcrblast3.1-GO.tab
Table of the uniprot Gene Ontology that is associated with the Trinity IDs from transcriptome v 3.1. Used in /scripts/get-stress_response-genes-transcriptome.Rmd to create /supplemental-material/Supp04-transcriptomev3.1-blast-uniprotGO.tab.
Directory containing one subdirectory per library, each holding output files from kallisto.
- 178_ambient_infected_0
Output files from kallisto for library 178 - individual crab, ambient, infected, Day 0. Jupyter notebook: notebooks/kallisto-individual-crab.ipynb
- 359_ambient_infected_2
Output files from kallisto for library 359 - individual crab, ambient, infected, Day 2. Jupyter notebook: notebooks/kallisto-individual-crab.ipynb
- kallisto/380822_cold_uninfected
Output files from kallisto for library 380822 - cold, uninfected from Day 2. Jupyter notebook: notebooks/kallisto-4libraries.ipynb
- kallisto/380823_cold_infected
Output files from kallisto for library 380823 - cold, infected from Day 2. Jupyter notebook: notebooks/kallisto-4libraries.ipynb
- kallisto/380824_warm_uninfected
Output files from kallisto for library 380824 - warm, uninfected from Day 2. Jupyter notebook: notebooks/kallisto-4libraries.ipynb
- kallisto/380825_warm_infected
Output files from kallisto for library 380825 - warm, infected from Day 2. Jupyter notebook: notebooks/kallisto-4libraries.ipynb
- 463_ambient_infected_17
Output files from kallisto for library 463 - individual crab, ambient, infected, Day 17. Jupyter notebook: notebooks/kallisto-individual-crab.ipynb
- kallisto-4libraries.ipynb
Notebook for getting gene count data for the 4 libraries (380822, 380823, 380824, and 380825) to use in DESeq2 from transcriptome v 3.1.
- kallisto-individual-crab.ipynb
Notebook for getting the count matrix for the individual crab RNAseq from transcriptome v 3.1 (individual RNAseq libraries: 178, 359, and 463).
- transcriptomev3.1-BLAST-to-GOslim.ipynb
Notebook for getting from transcriptome v 3.1 blast output to GOslim terms.
- DESeq.Rmd
Rmd to use DESeq2 with the count matrix for the 4 libraries (380822, 380823, 380824, 380825).
- GOslim-pie-transcriptome-v3.1.Rmd
Rmd to create a pie chart of GOslim terms with counts from transcriptome v 3.1.
- get-stress_response-genes-transcriptome.Rmd
Rmd to get the list of Trinity IDs from transcriptome v 3.1 that fall under the GOslim term "Stress Response".
- heatmaps-4libraries.Rmd
Rmd to create heatmaps using pheatmap for DEGs for the 4 libraries.
- heatmaps-DEGs_in-singlecrab_over-time.Rmd
Rmd to create a heatmap of the temperature-influenced infection DEGs in the single crab ... double check what this Rmd is..
- heatmaps-single_crab-over-time.Rmd
Rmd to create a heatmap of the contigs in the single crab over the three time points.
- tempdegs-individualcrab.Rmd
Rmd to join the cluster information from the Fig. 2 heatmap in the manuscript with the cluster information from the heatmap of single crab genes over time. The purpose is to have one file with the cluster assignments for both heatmaps, so that the temperature-influenced infection DEGs can be found in the individual crab and their expression patterns at the individual level over time can be examined.
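The join described for tempdegs-individualcrab.Rmd can be pictured as a left join keyed on contig ID: every Fig. 2 DEG is kept, and its Fig. 3 cluster is filled in where one exists. The sketch below is a Python stand-in under that assumption, not the code in the Rmd. The column names `cluster_fig2` and `cluster_fig3`, the helper names, and the toy rows are hypothetical.

```python
import csv
import io

# Toy inputs: IDs, column names, and cluster numbers are invented for
# illustration and are not taken from the repository tables.
FIG2_CLUSTERS = "Trinity_ID\tcluster_fig2\ncontig_A\t1\ncontig_B\t2\ncontig_C\t4\n"
FIG3_CLUSTERS = "Trinity_ID\tcluster_fig3\ncontig_A\t3\ncontig_C\t1\ncontig_D\t6\n"


def read_clusters(text, value_col):
    """Map each Trinity_ID to the value found in `value_col`."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Trinity_ID"]: row[value_col] for row in reader}


def left_join_clusters(fig2_text, fig3_text):
    """Keep every Fig. 2 row; use "NA" when the ID is missing from Fig. 3."""
    fig2 = read_clusters(fig2_text, "cluster_fig2")
    fig3 = read_clusters(fig3_text, "cluster_fig3")
    return [
        {"Trinity_ID": tid, "cluster_fig2": c2, "cluster_fig3": fig3.get(tid, "NA")}
        for tid, c2 in fig2.items()
    ]


joined = left_join_clusters(FIG2_CLUSTERS, FIG3_CLUSTERS)
n_present = sum(row["cluster_fig3"] != "NA" for row in joined)
for row in joined:
    print(row)
print("Fig. 2 DEGs also present in Fig. 3:", n_present)
```

A left join, rather than an inner join, keeps the DEGs that are absent from the single crab data, which matches the "NA" entries described for the combined table.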
- Supp01-sample-list_pool-RNAseq.csv
Table containing sample-level metadata for the RNAseq libraries.
- Supp02-RNAsequ-libraries.csv
Table containing descriptions of the pooled RNAseq libraries.
- Supp03-cbai_transcriptome_v3.1.zip
Compressed C. bairdi assembled transcriptome, with non-Alveolata taxonomic filter.
- Supp04-transcriptomev3.1-blast-uniprotGO.tab
Table of the C. bairdi transcriptome BLAST results with uniprot/swissprot and gene ontology information.
Columns are:
  - Trinity_ID - contig ID from Trinity assembly
  - swissprot - swissprot
  - uniprot_acc_ID - uniprot accession ID for the contig
  - gene_id - gene ID
  - pident - percentage of identical matches
  - length - alignment length
  - mismatch - number of mismatches
  - gapopen - number of gap openings
  - qstart - start of alignment in query
  - qend - end of alignment in query
  - sstart - start of alignment in subject
  - send - end of alignment in subject
  - evalue - expect value
  - bitscore - bit score
  - Entry.name - entry name
  - Status - reviewed or not
  - Protein.names - protein names for contig
  - Gene.names - gene names for contig
  - Ogranisms - taxonomic organisms
  - Length - length
  - Gene.Ontology.biological.process - biological process gene ontology
  - Gene.ontology.cellular.component - cellular component gene ontology
  - Gene.ontology.GO - GO terms
  - Gene.ontology.molecular.function - molecular function gene ontology
  - Gene.ontology.IDs - gene ontology IDs
- Supp05-transcriptome-stress-genes.tab
Table of the C. bairdi transcripts that fell under the GOslim term "Stress response".
- Supp06-infection-temp-DEGs-counts_annot.tab
Table of the differentially expressed contigs between infected and uninfected crab, taking temperature difference (decreased vs. elevated) into account, at Day 2 (408 DEGs). Includes library count data and uniprot/swissprot and gene ontology information.
- Supp07-contrasttemp_DEGs-clusters-annot.tab
Table of the subset of the above 408 DEGs that are directly influenced by temperature (123 total). Table includes library count data, cluster assignment as seen in Fig. 2, and uniprot/swissprot and gene ontology information.
- Supp08-single_crab-clusters-blast-GO.tab
Table containing the count data for the single crab RNAseq libraries over time, with the cluster assignments as shown in Fig. 3, along with uniprot/swissprot and gene ontology information.
- Supp09-temp-influenced-infectionDEGs-singlecrab-overtime.tab
Table combining Supplemental tables 07 and 08. Allows for finding the 123 infection DEGs that are influenced by temperature treatment in the individual crab over time.
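For readers who want to work with the Supp04 table, the sketch below shows one possible way to read a tab-delimited file that uses the column names listed for it and keep only hits below an e-value cutoff. This is an illustrative assumption and not an analysis step taken from the paper: the `1e-5` cutoff, the function name `filter_hits`, and the toy rows (including the accession IDs) are made up for the example.

```python
import csv
import io

# Toy rows using a few of the Supp04 column names. The values, including the
# accession IDs, are invented for illustration only.
TOY_SUPP04 = (
    "Trinity_ID\tuniprot_acc_ID\tpident\tevalue\tbitscore\n"
    "contig_A\tP00001\t85.5\t1e-50\t200\n"
    "contig_B\tP00002\t40.0\t0.5\t30.1\n"
    "contig_C\tP00003\t62.3\t3e-12\t75\n"
)


def filter_hits(handle, max_evalue=1e-5):
    """Return (Trinity_ID, accession, bitscore) for hits with evalue <= max_evalue.

    Results are sorted by bit score, highest first. The default cutoff is a
    hypothetical value chosen for this sketch.
    """
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["evalue"]) <= max_evalue:
            kept.append((row["Trinity_ID"], row["uniprot_acc_ID"], float(row["bitscore"])))
    return sorted(kept, key=lambda hit: hit[2], reverse=True)


hits = filter_hits(io.StringIO(TOY_SUPP04))
for trinity_id, accession, bitscore in hits:
    print(trinity_id, accession, bitscore)
```

The numeric columns arrive as strings from `csv.DictReader`, so they are converted with `float` before any comparison.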