-
Notifications
You must be signed in to change notification settings - Fork 1
/
get-stress_response-genes-transcriptome.Rmd
95 lines (76 loc) · 2.76 KB
/
get-stress_response-genes-transcriptome.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
title: "get-stress-genes"
output: html_document
---
Rmd to get list of stress response genes from the transcriptome v 3.1 with `blast` output and uniprot-SP-Go annotation.
```{r}
library(dplyr)
library(tidyverse)
```
Read in the annotation from the transcriptome v 3.1 blast output:
```{r}
uniprot <- read.delim("../data/uniprotKB-transcrblast3.1-GO.tab", sep = '\t', header = TRUE)
head(uniprot)
```
12931 annotations
Rename columns:
```{r}
colnames(uniprot) <- c("uniprot_acc_ID", "Entry.name", "Status", "Protein.names", "Gene.names", "Organism", "Length", "Gene.ontology.biological.process", "Gene.ontology.cellular.component", "Gene.ontology.GO", "Gene.ontology.molecular.function", "Gene.ontology.IDs")
head(uniprot)
```
Read in `blast` output from transcriptome v 3.1 - unique Trinity IDs (48551):
```{r}
blast <- read.delim("../analyses/BLAST-to-GOslim/_blast-sep.tab", header = FALSE)
head(blast)
```
Rename first and third column for `join`-ing purposes later on.
```{r}
colnames(blast) <- c("Trinity_ID", "swissprot", "uniprot_acc_ID", "gene_id", "pident", "length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send", "evalue", "bitscore")
head(blast)
```
`join` the `blast` output with the uniprot-SP-GO by uniprot_acc_ID:
```{r}
blastgo <- left_join(blast, uniprot, by = "uniprot_acc_ID")
head(blastgo)
```
48551 rows with 25 columns
Write out annotated blast go output:
```{r}
#write.table(blastgo, "../analyses/BLAST-to-GOslim/transcriptomev3.1-blast-uniprotGO.tab", sep = '\t', quote = FALSE, row.names = FALSE)
```
Wrote out 02/11/2021. Commented out code.
Read in the table from the BLAST to GOslim jupyter notebook called "Blastquery-GOslim.tab":
```{r}
blastquery <- read.delim("../analyses/BLAST-to-GOslim/Blastquery-GOslim.tab", sep = '\t', header = FALSE)
head(blastquery)
```
456451 rows.
Add column names:
```{r}
colnames(blastquery) <- c("Trinity_ID", "GO_ID", "GOslim", "biological_process")
head(blastquery)
```
Subset the GOslim terms that have the word "stress" in them. Only the term "stress response" has the term "stress":
```{r}
stress <- blastquery %>%
filter(str_detect(GOslim, "stress"))
head(stress)
```
10910 Trinity IDs.
`join` the stress list with the annotated `blast` output with uniprot (blastgo):
```{r}
stress_annot <- left_join(stress, blastgo, by = "Trinity_ID")
head(stress_annot)
```
10910 rows. 24 columns.
write out the annotated stress response table:
```{r}
#write.table(stress_annot, "../analyses/BLAST-to-GOslim/transcriptome-stress-genes.tab", sep = '\t', quote = FALSE, row.names = FALSE)
```
Wrote out 02/11/2021. Commented out code.
#### find unique Trinity_IDs (tells how many contigs fall under the category "Stress Response":
```{r}
distinct_stress <- unique(stress_annot$Trinity_ID)
head(distinct_stress)
```
6191 rows.