repeat landscape #159

Closed
hlin1221 opened this issue Nov 26, 2024 · 3 comments

@hlin1221

Hi,
I am confused about the repeat landscape.
Divergence is calculated by comparing each sequence with its consensus sequence (from the library). Why can it reflect time?
For example, lower values (closer to 0) represent more recent events and higher values represent older events.
If I change the library, will the landscape change?
If the consensus is old, could the time order be reversed?

Thanks,

@TobyBaril
Owner

Hi,

A consensus sequence is essentially a best guess at what the ancestral sequence could have looked like, as it is built as an "average" of good copies in the genome of interest. Given this, the divergence of an individual TE copy from its consensus (the estimated sequence of the original element) is taken as a proxy for relative TE activity. A copy that looks nearly identical to the consensus is taken as recent activity, as the sequence has not changed much since insertion. A copy that looks very different from the consensus is taken as ancient activity, as the sequence has changed a lot since insertion. All of this assumes that TEs evolve neutrally, which we know is unlikely to be the case. Even so, this is the generally accepted method of determining relative TE activity within genomes and has been used extensively in the field.
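As a rough illustration of the idea above, here is a minimal Python sketch that measures how far a TE copy has drifted from its consensus. It assumes the two sequences are already aligned to equal length, and it reports both the raw proportion of differing sites and the Kimura 2-parameter distance, a standard correction that weights transitions and transversions separately. The function name `divergence` and the toy sequences are made up for this example, and this is not the implementation used by any particular tool.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def divergence(copy, consensus):
    """Return (p_distance, kimura_2p) for two pre-aligned sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")

    sites = transitions = transversions = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    # Kimura 2-parameter distance: K = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    kimura = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return p + q, kimura


# Toy data (hypothetical): one consensus and two copies of the same family.
consensus = "ACGTACGTACGTACGTACGT"
young_copy = "ACGTACGTACGTACGTACGC"  # 1 difference in 20 sites
old_copy = "GCGTACTTACGAACGTCCGT"    # 4 differences in 20 sites

for name, seq in [("young", young_copy), ("old", old_copy)]:
    p_dist, k2p = divergence(seq, consensus)
    print(f"{name}: p-distance={p_dist:.3f}, Kimura 2P={k2p:.3f}")
```

In a repeat landscape, copies like `young_copy` would fall in the low-divergence bins and copies like `old_copy` in the higher bins. The ordering is relative: it says which copies have changed more since insertion, not how many years ago they inserted.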

If you change the TE library, the annotated TEs are compared to the consensus sequences in the new library, which can change the divergence estimates. For example, using a human TE library in mouse could make some TEs look older, as the consensus sequence generated in human might estimate an ancestral sequence that looks different from the consensus sequence generated using copies in the mouse.
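The effect of swapping libraries can be seen in a small Python sketch. It builds a simple majority-rule consensus from a handful of aligned copies, then compares one copy against that consensus and against a second consensus standing in for a library built in another species. Everything here is an assumption made for illustration: the helper names `majority_consensus` and `percent_divergence`, the toy sequences, and the majority-rule approach itself, which is far simpler than what real library-building tools do.

```python
from collections import Counter


def majority_consensus(aligned_copies):
    """Build a consensus by taking the most common base in each column."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_copies)
    )


def percent_divergence(copy, consensus):
    """Percentage of aligned positions where copy and consensus differ."""
    mismatches = sum(a != b for a, b in zip(copy, consensus))
    return 100.0 * mismatches / len(consensus)


# Hypothetical aligned copies of one TE family in the genome of interest.
copies = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]

own_consensus = majority_consensus(copies)
# Hypothetical consensus for the "same" family taken from another species' library.
foreign_consensus = "ACTTACGGAC"

copy = copies[0]
print("vs own consensus:    ", percent_divergence(copy, own_consensus))
print("vs foreign consensus:", percent_divergence(copy, foreign_consensus))
```

The copy itself has not changed between the two comparisons. Only the reference has, so the same insertion lands in a different divergence bin depending on which library is used.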

It is not possible to determine if a consensus sequence is "old", as it is an estimate of the ancestral sequence, so never actually existed, and is our best guess of what an active element might have looked like. In this regard, we always assume a TE looking similar to the consensus to have been active more recently, but this of course can be affected by the quality of the consensus.

It is generally very difficult to get accurate estimates of TE age, which is why most studies will use divergence as a measure of relative activity.

@hlin1221
Author

hlin1221 commented Nov 27, 2024 via email

@hlin1221
Author

Hi,
Sorry, I have another question.
In EDTA, the library is built by removing redundancy rather than by constructing consensus sequences. The divergence is then calculated with calcDivergenceFromAlign.pl from RepeatMasker (oushujun/EDTA#92).

So, can the divergence calculated by comparing the repeats with the EDTA library reflect TE activity?
Or can calcDivergenceFromAlign.pl construct the consensus sequence?

Thank you
