Using Self-attention Convolutional Features and Auto Encoder to Predict Enhancer-promoter Interactions
In biology, transcription is the process by which an enzyme called RNA polymerase copies DNA into RNA in order to regulate gene expression. Specifically, RNA polymerase transcribes regions of DNA called genes. However, with the human genome being 3.2 billion base pairs long, locating these regions is not trivial. To facilitate the process, the enzyme leverages promoters, DNA sequences at the beginning of genes that mark the start of transcription. DNA also contains enhancer sequences, which are located thousands of base pairs away from promoters and bind activator proteins that boost RNA polymerase's efficiency. How enhancers and promoters interact, and how that interaction ties to gene expression, remains an open question, with many studies focusing on determining the relation between the sequence structures of enhancers and promoters and their interactions.
In this research, we create EPSAT, a deep learning model based on SPEID (Sequence-based Promoter-Enhancer Interaction with Deep learning) enhanced with the self-attention approach from SATORI, as well as EPAE, a novel deep learning model built on a beta variational auto-encoder architecture. Both EPSAT and EPAE achieve higher F1 scores than SPEID and TargetFinder (a model that solves the same task using boosted-tree algorithms), while requiring fewer trainable parameters and training epochs. The models can be used not only to predict enhancer-promoter interactions (EPIs) in DNA, but also as a general method for evaluating the effects of sequence modifications on gene expression.
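For intuition, the sketch below shows a minimal PyTorch model in the spirit of EPSAT: per-sequence convolutions over one-hot DNA (as in SPEID) followed by multi-head self-attention over the feature positions (as in SATORI). The class name, layer sizes, and pooling choices are illustrative assumptions and do not reproduce the code in this repository.

# Minimal sketch of a CNN + self-attention EPI classifier (illustrative only;
# layer sizes, names, and pooling are assumptions, not the repo's code).
import torch
import torch.nn as nn

class EPIClassifierSketch(nn.Module):
    def __init__(self, n_filters=64, kernel_size=19, n_heads=4):
        super().__init__()
        # One convolutional branch per sequence; input is one-hot DNA (4 x L).
        self.enh_conv = nn.Sequential(
            nn.Conv1d(4, n_filters, kernel_size, padding="same"),
            nn.ReLU(), nn.MaxPool1d(10))
        self.pro_conv = nn.Sequential(
            nn.Conv1d(4, n_filters, kernel_size, padding="same"),
            nn.ReLU(), nn.MaxPool1d(10))
        # Multi-head self-attention over the concatenated feature positions.
        self.attn = nn.MultiheadAttention(embed_dim=n_filters, num_heads=n_heads,
                                          batch_first=True)
        self.head = nn.Linear(n_filters, 1)

    def forward(self, enhancer, promoter):
        # enhancer: (batch, 4, L_e), promoter: (batch, 4, L_p), one-hot encoded.
        e = self.enh_conv(enhancer).transpose(1, 2)   # (batch, L_e', n_filters)
        p = self.pro_conv(promoter).transpose(1, 2)   # (batch, L_p', n_filters)
        x = torch.cat([e, p], dim=1)                  # joint feature sequence
        attn_out, _ = self.attn(x, x, x)              # self-attention over positions
        pooled = attn_out.mean(dim=1)                 # average over positions
        return torch.sigmoid(self.head(pooled))       # probability of interaction

# Example forward pass with placeholder tensors (random values stand in for one-hot DNA):
# model = EPIClassifierSketch()
# out = model(torch.randn(2, 4, 3000), torch.randn(2, 4, 2000))  # shape (2, 1)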
In order to run the models, please first install all necessary packages in requirements.txt:
pip install -r requirements.txt
Download the dataset used here and put it in the data/ folder.
Train EPSAT by running:
python EPSAT.py
Train EPAE by running:
python AEClassification/train_VAE.py
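For reference, a beta variational auto-encoder such as EPAE typically optimizes a reconstruction term plus a beta-weighted KL divergence. The sketch below is a generic illustration of that objective, assuming one-hot inputs and a Gaussian latent; it does not reproduce AEClassification/train_VAE.py.

# Illustrative beta-VAE objective (reconstruction + beta * KL); not the actual
# EPAE implementation in AEClassification/train_VAE.py.
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, target, mu, logvar, beta=4.0):
    # Reconstruction term: how well the decoder rebuilds the input sequence.
    recon_loss = F.binary_cross_entropy(recon, target, reduction="sum")
    # KL divergence between the approximate posterior N(mu, sigma^2) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # beta > 1 encourages a more disentangled latent representation.
    return recon_loss + beta * kl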