This repository provides a "Python implementation" of the camcoil program (originally written in C) to estimate the random coil chemical shift values from a sequence (string) of amino-acids.
There might be updates in the future, but this first version is now fully operational.
M. Vrettas, PhD.
There are two options to install the software.
-
The easiest way is to visit the GitHub web-page of the project and download the code in zip format. This option does not require a prior installation of git on the computer.
-
Alternatively one can clone the project directly using git as follows:
$ git clone https://github.com/vrettasm/PyCamcoil.git
The recommended version is Python 3.6+. Some required packages are:
numpy, pathlib, pandas
To simplify the required packages just use:
$ pip install -r requirements.txt
To execute the program, first navigate to the main directory of the project (i.e. where the camcoil.py is located), and then run the following command:
$ ./camcoil.py -s TESTAMINOSEQ
The output should be:
SEQUENCE PROCESSED (pH=6.1):
1: TESTAMINOS
2: EQ
ID RES CA CB C H HA N
0 1 T 61.30774 70.08048 174.94060 8.22782 4.42540 115.78010
1 2 E 56.62326 30.30048 176.72142 8.39908 4.33240 122.87622
2 3 S 58.30388 63.78776 174.88248 8.30858 4.47402 116.85280
3 4 T 61.39336 69.90206 174.70254 8.17616 4.42510 116.43512
4 5 A 52.78414 19.21966 177.31598 8.29420 4.34400 126.55214
5 6 M 55.63120 32.83290 176.46296 8.22956 4.51812 120.84098
6 7 I 60.63876 39.09866 175.88120 8.26586 4.27638 120.65694
7 8 N 53.03970 39.16040 175.29378 8.39686 4.74836 122.90968
8 9 O 62.63300 33.84500 175.98300 NaN 4.76300 139.06800
9 10 S 58.07210 63.89002 174.83516 8.29660 4.51762 117.30086
10 11 E 56.58864 30.32174 176.38200 8.37418 4.34432 122.86766
11 12 Q 55.71748 29.58952 176.10200 8.20326 4.35074 120.44640
Hint: If you want to run the program on multiple sequences you can include them one after the other, with a space ' ' between them as follows:
$ ./camcoil.py -s TESTSEQA TESTSEQC TESTSEQD ...
This will run PyCamcoil on all the sequences and will display the results for each one separately.
To see the full help information, type:
$ ./camcoil.py --help
Note: The input sequence can be given with and without double-quotes. Example:
1. $ ./camcoil.py -s "TESTAMINOSEQ"
is equivalent to:
2. $ ./camcoil.py -s TESTAMINOSEQ
The code can also be called independently in any other Python program by importing the main module as:
# Import the main module.
from src.camcoil_engine import CamCoil
# Create an object.
r_coil = CamCoil()
# Use it to get a dataframe.
df_coil = r_coil("APKAPADGL")
# Display the random coil values.
print(df_coil)
ID RES CA CB C H HA N
0 1 A 51.41214 19.07014 176.52162 8.24460 4.47156 125.51232
1 2 P 62.60744 31.82580 177.00832 NaN 4.43674 137.18000
2 3 K 56.14292 32.88032 176.38484 8.31074 4.36828 122.10046
3 4 A 51.34046 19.11406 176.52994 8.21902 4.47378 125.42938
4 5 P 62.50610 31.85898 177.01894 NaN 4.44632 137.18000
5 6 A 52.64648 19.28308 177.36534 8.28882 4.33092 124.59918
6 7 D 54.28196 41.20876 176.66546 8.28206 4.61802 119.64270
7 8 G 45.88736 NaN 174.37640 8.27726 4.12000 109.77946
8 9 L 55.03396 42.38164 177.30200 8.04950 4.37060 122.16668
The main module (when is called directly inside a python environment as above) averages around '0.5' sec per '100' residue amino-acid chain:
%timeit r_coil("APKAPADGLKMEATKQHNAPVVAPKAPADGLKMEATKQHPVVAPKAPADG"
"LKMEATKQHPAPKAPADGLKMEATKQHNAPVVAPKAPADGLKMEATKQOH")
505 ms ± 35.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The work is described in detail at:
- Alfonso De Simone, Andrea Cavalli, Shang-Te Danny Hsu, Wim Vranken and Michele Vendruscolo (2009). "Accurate Random Coil Chemical Shifts from an Analysis of Loop Regions in Native States of Proteins". Journal of the American Chemical Society (JACS), 131 (45), 16332 - 16333.
For any questions/comments (regarding the code) please contact me at: vrettasm@gmail.com