Spandan2308/Protein-Protein-Interaction-Prediction

A sequence-based protein-protein interaction (PPI) prediction algorithm developed using XGBoost. Around 73,000 positive (interacting) and negative (non-interacting) protein pairs were extracted from Pan’s PPI dataset (http://www.csbio.sjtu.edu.cn/bioinf/LR_PPI/Data.htm).

Supp-A contains the positive protein pairs, while Supp-B contains the negative protein pairs. The 'Data_Loading' file extracts the protein pairs from these files into CSV files. The 'PPI Prediction' file contains the algorithm.

The function aa_composition(sequence) calculates the amino acid composition of a given protein sequence. It iterates over the 20 standard amino acids (represented by the letters 'ACDEFGHIKLMNPQRSTVWY'), counts the occurrences of each one in the sequence, and stores the counts in a dictionary. The final output is a list containing the count of each amino acid, in that letter order, which serves as the feature vector for the protein sequence. This feature vector captures the abundance of the different amino acids in the sequence and can be used as input to the model. The model reaches an accuracy of around 98%.
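The description of aa_composition above can be sketched roughly as follows. This is a minimal illustration based only on the prose in this README, not the repository's actual code. The constant AMINO_ACIDS and the helper pair_features are hypothetical names, and concatenating the two vectors of a pair is an assumption about how a protein pair might be turned into a single model input.

```python
# Order of the 20 standard amino acids, as listed in the README.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical constant name


def aa_composition(sequence):
    """Return the count of each standard amino acid, in AMINO_ACIDS order."""
    # Count the occurrences of each amino acid and store them in a dictionary.
    composition = {aa: sequence.count(aa) for aa in AMINO_ACIDS}
    # Flatten the dictionary into a fixed-order list of 20 counts.
    return [composition[aa] for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Hypothetical helper: build one feature vector for a protein pair.

    Assumption: the two 20-element composition vectors are concatenated
    into a single 40-element vector. The README does not state how pairs
    are combined.
    """
    return aa_composition(seq_a) + aa_composition(seq_b)
```

Characters outside the 20 standard letters (for example 'X' or lowercase letters) are simply not counted in this sketch. Dividing each count by the sequence length would give relative frequencies instead of raw counts, if length-independent features are wanted.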

Authors

@spandan_sureja
