Requires Python 3.5 or above (TensorFlow 2.1.0 supports Python 3.5-3.7).
Prediction requires TensorFlow 2.1.0 with Keras.
This is best installed via pip:
pip install tensorflow==2.1.0
Alternatively, TensorFlow can be run in a Docker container (see the TensorFlow website for more details).
TensorFlow is tested and supported on the following 64-bit systems:
- Python 3.5-3.7
- Ubuntu 16.04 or later
- Windows 7 or later
- macOS 10.12.6 (Sierra) or later (no GPU support)
- Raspbian 9.0 or later
Files containing computer-simulated (DL) sequences:
- spike_from_random_1.fasta
100 entirely simulated coronavirus spike protein sequences generated using seed texts of 16 amino acids selected at random
from the start of each protein in the training set.
- spike_from_sars_0.5.fasta
Entirely simulated coronavirus spike protein sequences generated using a seed text of 64 amino acids from the start of the
SARS-CoV-2 spike protein.
- spike_1000_sars_0.5.fasta
1000 simulated coronavirus spike protein sequences generated using a seed text of 64 amino acids from the start of the
SARS-CoV-2 spike protein.
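The FASTA files listed above can be inspected with a few lines of Python. The following is a minimal sketch, assuming the files follow the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines). The function name read_fasta is illustrative and is not part of this repository.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples read from a FASTA file."""
    records = []
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there is one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header = line[1:]
                chunks = []
            elif header is not None:
                # Sequence lines may be wrapped, so collect them and join later.
                chunks.append(line)
    # Store the final record, which has no following header to close it.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For instance, len(read_fasta("spike_from_random_1.fasta")) should give the number of sequences stored in that file.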
Place the prediction model files (model.h5 and model.json) in the same directory as spike_sequence_generation.py.
To seed predictions with random amino acids from the training set, also place seeds.txt in that directory.
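Before running a prediction it can help to confirm that the required files are in place. The following is a small sketch using only the standard library; the helper name missing_files and its arguments are hypothetical and are not part of the prediction script.

```python
import os

REQUIRED_FILES = ["model.h5", "model.json"]
SEED_FILE = "seeds.txt"  # only needed when seeding from random training-set amino acids


def missing_files(directory, use_random_seed=False):
    """Return the names of expected files that are absent from `directory`."""
    expected = list(REQUIRED_FILES)
    if use_random_seed:
        expected.append(SEED_FILE)
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]
```

An empty list from missing_files(".", use_random_seed=True) means that both model files and seeds.txt were found in the current directory.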
Run the prediction with the following options:
lengths = the length of each sequence to generate
seqs = the number of separate sequences of length [lengths] to generate
outfile = the name of the file in which to save the output
random = True to use 16 random amino acids as seed text (this also requires seeds.txt);
False or leave blank to use 64 amino acids from SARS-CoV-2 as seed text
temperature = scaling parameter between 0 and 1; higher values give more surprising sequences, while lower values
stay closer to the original training set sequences
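To illustrate what the temperature option does, here is a sketch of the temperature scaling commonly used in character-level text generation. It is an assumption that the script samples in this way, since its internals are not shown here, and the function names are illustrative. The model's predicted probabilities are converted to log-probabilities, divided by the temperature and renormalised, so that values below 1 sharpen the distribution towards the most likely amino acid.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide log-probabilities by the temperature (clamping zeros to avoid log(0)).
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temperature, rng=random):
    """Draw one index from `probs` after temperature rescaling."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]
```

With a temperature of 1 the distribution is left unchanged; as the temperature approaches 0 the sampling approaches always picking the most probable amino acid.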
Example:
python spike_sequence_generation.py --outfile tester --random True --lengths 1400 --seqs 10
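To run several predictions from Python, the command line can be assembled programmatically. This is a sketch only: build_command is a hypothetical helper, and the --temperature flag name is an assumption based on the option list (the example above does not show it).

```python
import sys


def build_command(outfile, lengths, seqs, random_seed=False, temperature=None,
                  script="spike_sequence_generation.py"):
    """Return the argument list for one run of the generation script."""
    command = [sys.executable, script,
               "--outfile", str(outfile),
               "--lengths", str(lengths),
               "--seqs", str(seqs)]
    if random_seed:
        command += ["--random", "True"]
    if temperature is not None:
        # Assumption: the temperature option is passed as --temperature.
        command += ["--temperature", str(temperature)]
    return command
```

Passing the result to subprocess.run, for example subprocess.run(build_command("tester", 1400, 10, random_seed=True), check=True), is intended to be equivalent to the example command above.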