This project implements DNA image footprinting and matching. The main idea is to convert a DNA sequence into an image so that related sequences can be found in it with common image-processing algorithms. The goal is to build a platform, similar to Google Maps, for finding, searching, and matching sequences.
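This README does not describe how dif.py maps bases to pixels. To illustrate the general idea of turning a sequence into something drawable, here is a minimal sketch of one well-known technique, a 2D "DNA walk". The function name `dna_walk` and the direction table `STEPS` are hypothetical and are not taken from this project's code.

```python
# Hypothetical illustration only: a 2D "DNA walk", not necessarily what dif.py does.
# Each base moves the cursor one step in a fixed direction (an assumed mapping).
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        # Unknown symbols (for example N) leave the cursor where it is.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

Plotting the returned points gives a curve whose shape depends on the sequence, so two related sequences tend to produce similar curves.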
First, install the requirements:
pip install -r requirements.txt
Then run the code on a sample FASTA file:
python dif.py sequence.fasta -o test.png
You should see a test.png like this in the folder:
Alternatively, use the DnaFootprint.ipynb notebook with Jupyter.
python dif.py sequence.fasta -o test.png -size 1000 -plotsize 4000000
The -plotsize parameter limits the plot size.
If you do not specify a recordid, the code uses the first record ID in the file. To use a known recordid, run:
python dif.py -o test.png -i sequence.fasta -size 1000 -recordid chr0 -plotsize 4000000
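To check which record ID would be picked by default, you can look at the first header line of the FASTA file. The sketch below uses only the standard library and assumes the usual FASTA convention that the ID is the header text up to the first whitespace; `first_record_id` is a hypothetical helper, not part of dif.py.

```python
import io


def first_record_id(handle):
    """Return the ID of the first FASTA record in an open text handle, or None."""
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumed convention: the ID is the header up to the first whitespace.
            return header.split()[0] if header else None
    return None


# Usage with an in-memory sample; a real file would be opened with open(path).
sample = io.StringIO(">chr0 example record\nACGT\n>chr1\nGGCC\n")
print(first_record_id(sample))  # chr0
```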
For example, with this footprint we can compare the "Human Genome" and "Chimpanzee Genome" sequences. As the pictures show, the similarities and differences between the two sequences are visible in the footprints.
The second sequence was mutated by 5% and 29%.
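The README does not say how the mutated sequences were produced. A minimal sketch, assuming random point substitutions at a fixed per-base rate, could look like this; `mutate` and its parameters are hypothetical names used only for illustration.

```python
import random


def mutate(sequence, rate, seed=None):
    """Replace each A/C/G/T with a different base with probability `rate`."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    bases = "ACGT"
    out = []
    for base in sequence:
        if base in bases and rng.random() < rate:
            # Choose among the three other bases so the position really changes.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


original = "ACGT" * 25
mutated_5 = mutate(original, 0.05, seed=1)   # roughly 5% of bases substituted
mutated_29 = mutate(original, 0.29, seed=1)  # roughly 29% of bases substituted
```

Footprints generated from `original` and from the mutated copies can then be compared side by side.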
Another application is mutation detection. I tested the image footprint on highly mutated sequences, and three results are shown here. Highly mutated sequences can be found with a computer vision algorithm.
The target application could look like the following image. One can see a sequence and the name of the known sequence (yellow lines), as well as its mutated versions in other animals (blue line).
- Initial algorithm and demo
- Generate footprint for common sequences
- Convert the footprint to tile images, like Google Maps
- Development of the image processing algorithm
- A website for comparing and viewing related sequences, like the demo
- Update the code for protein sequences
Currently, all footprints are plotted on one image; in future releases, tile images will be generated.
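As a rough sketch of what tiling involves, the function below computes the pixel bounding boxes that cut one large image into fixed-size tiles, the way map services do. It is not part of the current code; `tile_boxes` and the default tile size of 256 pixels are assumptions for illustration.

```python
def tile_boxes(width, height, tile=256):
    """Yield (col, row, left, top, right, bottom) for each tile covering the image."""
    for row, top in enumerate(range(0, height, tile)):
        for col, left in enumerate(range(0, width, tile)):
            # Tiles on the right and bottom edges are clipped to the image size.
            yield col, row, left, top, min(left + tile, width), min(top + tile, height)


# Usage: a 500x256 image is covered by two tiles, the second one clipped.
for box in tile_boxes(500, 256):
    print(box)
```

Each box could then be passed to an image library's crop function and saved under a name built from `col` and `row`.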
Any suggestions and opinions are welcome.