Draw Repeat Landscapes #92
Comments
Hello, Thank you for using EDTA. There are many ways to approach your goal, but most, if not all, of them require some coding skills of one kind or another. I have some old code that was developed to summarize LTR density in the LTR_retriever package: https://github.com/oushujun/LTR_retriever/blob/master/bin/LTR_sum.pl. Alternatively, you may try karyoploteR, which is made for similar tasks, but it requires some R coding skills. You can also try a simpler version in ggplot: https://www.biostars.org/p/69748/ Hope these are helpful, and feel free to share your code here. Best, |
I don't know if you figured that out, but I've made my own visualization in R using the results from RepeatMasker (createRepeatLandscape.pl) plus the histogram information at the bottom of the output file. You can create a stacked barplot with ggplot2, which works perfectly for a publication. Let me know if you still need help with that! |
@KristinaGagalova Can you share the code and usage here so that other users may benefit from your hard work? Thanks! |
Yes, of course! |
@KristinaGagalova Please upload them here. I have pinned this issue to the top so that everyone looking for this kind of illustration will see it. Thanks Kristina! |
Code is below
Below is the expected input (generated by createRepeatLandscape.pl from RepeatMasker). This plots all the classes of repeats; we may want to create a more efficient grouping of those, using the output from EDTA and collapsing it into a few classes of repeats. |
@KristinaGagalova Thanks for sharing the code. I notice that Gypsy is missing from the input and the figure. Is something wrong, or did you want to leave it out? |
@oushujun
Looks like the pipeline did not identify any LTR Gypsy elements; I am not sure if that's unusual. I have also added the repeats from Repbase for a closely related species that has several LTRs, but it looks like RepeatMasker did not identify those either. |
@KristinaGagalova I see, that makes sense! Thanks for sharing your data and the code. |
Hello, I am interested in producing a similar plot for my genome of interest. I have recently run EDTA on the genome. Is the Kimura substitution plot possible to make from the output of EDTA (i.e. is the output in one of the RepeatMasker directories)? Or do I need to run RepeatMasker separately using the TE library generated by EDTA? Thank you in advance, |
Hi Aaron,
If you check the above conversations, you will see the input is obtained
from createRepeatLandscape.pl from RepeatMasker. While I never tried
myself, you may check it out and see what the script needs.
Best,
Shujun
|
Hey Shujun, Thank you for your reply. I apologise, I got confused by the "RM" files generated by EDTA. I forgot that they come from RepeatModeler, not RepeatMasker. It seems I will have to run RepeatMasker separately and then follow createRepeatLandscape.pl from RepeatMasker. Aaron :) |
Sorry this is confusing. RM can mean either, depending on the context. It means
RepeatModeler in the final folder and RepeatMasker in all other contexts.
So you can get a masker-like out file in the anno folder, but I am not sure
if RepeatMasker can recognize it. -Shujun
|
Now that I have finally been able to come back to analysing the repeats almost a year later, I just wanted to thank everyone for their suggestions and help with this. I thought I would outline the commands that I used in the end for generating a repeat landscape from the EDTA library.
- singularity_wrapper exec EDTA.pl --genome genome.fasta --species others --step all --anno 1 --threads 16 --overwrite 1
- RepeatMasker -pa 2 -s -a -inv -dir ./RepMask -no_is -norna -xsmall -nolow -div 40 -lib EDTA.TElib.fa -cutoff 225 genome.fasta
- calcDivergenceFromAlign.pl -s genome.divsum genome.fasta.align
The Kimura distance table is made up of the last 72 lines of the divsum file and can be extracted using
- tail -n 72 genome.divsum > genome.Kimura.distance
This then provides a table which can be read into the R script from @KristinaGagalova above. A compilation of the landscapes that I plotted is attached.
[image: Repeat_Landscapes] <https://user-images.githubusercontent.com/39945819/119971780-1578e980-bfba-11eb-8148-17308e076cfc.jpg>
I am interested in figuring out what is up with the large peak at a Kimura distance of 0 in each of the landscapes. Could it be some artifact of the process? |
Thanks for sharing your route! You may want to check the LTR Assembly Index
to make sure these genomes are of good quality for a fair comparison. It
seems to me that the lappo genome has higher quality than the aestiva
genome.
-Shujun
|
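The `tail -n 72` step above relies on the Kimura table being exactly the last 72 lines of the divsum file. For anyone who would rather not depend on a fixed line count, here is a minimal Python sketch that looks for the table header instead. It assumes the header row of the table starts with a `Div` column (the "div" column mentioned in this thread) and that the table is the last such block in the file; `extract_kimura_table` is a hypothetical helper name, not part of RepeatMasker or EDTA.

```python
def extract_kimura_table(divsum_text):
    """Return (header, rows) for the last table whose header row starts with 'Div'.

    Assumption: the Kimura table in a .divsum file has a whitespace-separated
    header row beginning with 'Div', followed by one numeric row per
    divergence bin. Check this against your own file before relying on it.
    """
    lines = divsum_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0].lower() == "div":
            start = i  # keep the last match in the file
    if start is None:
        raise ValueError("no header line starting with 'Div' was found")

    header = lines[start].split()
    rows = []
    for line in lines[start + 1:]:
        fields = line.split()
        if len(fields) != len(header):
            break  # a blank or differently shaped line ends the table
        rows.append([float(x) for x in fields])
    return header, rows
```

To use it, read the file produced by calcDivergenceFromAlign.pl (for example `open("genome.divsum").read()`) and pass the text to the function; the returned header and rows can then be written out for the R script.
|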
This looks like an interesting package - https://github.com/dwinter/repeatR |
First of all, thank you all. The above data is the head of a file from @KristinaGagalova's dataset. Thank you |
Hi, these are the types of repeats that are annotated (broad definition) and their corresponding counts in the genome. It's basically a quantification per class. The div column shows the % divergence, and it is the x-axis in the Kimura plot. |
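As a rough illustration of the table layout described above, and of the coarser grouping suggested earlier in the thread, here is a minimal Python sketch that collapses `Class/Family` columns into their broad class and converts the values to percent of the genome. It assumes the values are base pairs covered per divergence bin and that the genome size is supplied by the user; `group_by_class` is a hypothetical helper, not the R script shared in this issue.

```python
from collections import defaultdict


def group_by_class(header, rows, genome_size):
    """Collapse 'Class/Family' columns into 'Class' and convert bp to % of genome.

    header: column names, with the divergence column first (e.g. 'Div').
    rows: numeric rows in the same column order as header.
    genome_size: assembly size in bp (an input you must provide).
    Returns a list of (divergence, {class: percent_of_genome}) tuples.
    """
    grouped = []
    for row in rows:
        totals = defaultdict(float)
        for name, bp in zip(header[1:], row[1:]):
            broad = name.split("/")[0]  # 'LTR/Copia' -> 'LTR'
            totals[broad] += bp
        percents = {cls: 100.0 * bp / genome_size for cls, bp in totals.items()}
        grouped.append((row[0], percents))
    return grouped
```

Each tuple gives one x-axis position (the divergence bin) and the stacked bar heights for that bin, which is the shape a stacked barplot needs.
|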
https://github.com/jtlovell/GENESPACE might be helpful here too! |
Thanks for sharing! But I'm confused: I added up all the numbers in the divsum file except for the first column and divided by the size of the genome. There is a huge difference between the two results: the result of the *EDTA file is 68.17%, and the result of my calculation is 98%! I would like to ask where I misunderstood, because I'm trying to draw a pie chart that shows the percentage of these transposons QAQ *mod.EDTA.TEanno.sum:
|
Hi,
This is not a complaint about the program at all. I think that it is great and easy to use. I have been able to identify repeat elements in two genomes now with relative ease. I am a relative newbie to the area of repeat genomics and I would like to continue the analysis with the repeats identified from the EDTA pipeline.
Would you be able to advise on the best way to use the information from EDTA to create a repeat landscape type plot? I tried taking the repeat library created by EDTA, masking my genome with it using RepeatMasker, and following the next few steps for creating a repeat landscape plot. But the landscape plot is empty; I have a feeling it is because the classification of repeats is different. But as I said, I am a bit naive when it comes to analysis of repeat elements.
Any help or suggestions would be greatly appreciated.
Thanks!