repeat landscape #159

hlin1221 · 2024-11-26T13:58:20Z

Hi,
I am confused about the repeat landscape.
The divergence comes from comparing each sequence with its consensus sequence (the library). Why does this reflect time?
For example, lower values closer to 0 represent more recent events and higher values represent older events.
If I change the library, will the landscape change?
If the consensus is old, could the time order be reversed?

Thanks,

TobyBaril · 2024-11-26T15:06:05Z

Hi,

Consensus sequences are essentially a best guess at what the ancestral sequence could have looked like, as each one is built as an "average" of good copies in the genome of interest. Given this, the divergence of an individual TE copy from its consensus sequence (the estimated sequence of the original element) is taken as a proxy for relative TE activity. A copy that is nearly identical to the consensus is taken as recent activity, as the sequence has not changed much since insertion, while a copy that looks very different from the consensus is taken as ancient activity, as the sequence has changed a lot since insertion. This all rests on the assumption that TEs are neutrally evolving, which we know is unlikely to be the case. Even so, this is the generally accepted method of determining relative TE activity within genomes and has been used extensively in the field.

If you change the TE library, then the annotated TEs are going to be compared to the respective consensus sequences, which can then change the divergence time estimates. For example, using a human TE library in mouse could make some TEs look older, as the consensus sequence generated in human might estimate an ancestral sequence that looks different to the consensus sequence that has been generated using copies in the mouse.
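To make the library dependence concrete, here is a minimal sketch in Python. It assumes the copy and consensus are already aligned to the same length and uses a plain mismatch percentage, which is simpler than the corrected distances that real pipelines report. The function name `percent_divergence` and the toy sequences are hypothetical and only for illustration.

```python
def percent_divergence(copy: str, consensus: str) -> float:
    """Percent of aligned, ungapped positions where the copy differs from the consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        # Skip alignment gaps so only substitutions are counted.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Toy data (hypothetical): one TE copy scored against two different consensus sequences.
te_copy = "ACGTTCGTAC"
consensus_own = "ACGTACGTAC"    # consensus built from copies in the same genome
consensus_other = "ACCTACGAAC"  # consensus taken from a library built in another species

div_own = percent_divergence(te_copy, consensus_own)
div_other = percent_divergence(te_copy, consensus_other)
print(div_own, div_other)
```

The copy itself is unchanged between the two calls. Only the reference changes, and the copy scores as more diverged (and so would be read as older) against the consensus from the other library.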

It is not possible to determine if a consensus sequence is "old", as it is an estimate of the ancestral sequence, so never actually existed, and is our best guess of what an active element might have looked like. In this regard, we always assume a TE looking similar to the consensus to have been active more recently, but this of course can be affected by the quality of the consensus.

It is generally very difficult to get accurate estimates of TE age, which is why most studies will use divergence as a measure of relative activity.
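As a rough sketch of how a landscape is then assembled from those divergence values: copies are grouped into divergence bins and the bases in each bin are expressed as a share of the genome. This is not the code of any particular tool. The function name `repeat_landscape`, the 1% bin width, and the toy numbers are assumptions for illustration, and the divergence of each copy is assumed to have been calculated already.

```python
from collections import defaultdict


def repeat_landscape(copies, genome_size, bin_width=1.0):
    """Sum TE base pairs per divergence bin and return them as percent of the genome.

    copies: iterable of (percent_divergence, length_bp) tuples, one per TE copy.
    Returns a dict mapping the lower edge of each bin to percent of genome.
    """
    bp_per_bin = defaultdict(int)
    for divergence, length_bp in copies:
        # Lower edge of the bin that this copy falls into.
        bin_start = int(divergence // bin_width) * bin_width
        bp_per_bin[bin_start] += length_bp
    return {
        bin_start: 100.0 * bp / genome_size
        for bin_start, bp in sorted(bp_per_bin.items())
    }


# Toy data (hypothetical): (divergence from consensus in %, copy length in bp).
toy_copies = [(0.5, 3000), (0.9, 2000), (12.3, 1000), (25.0, 4000)]
landscape = repeat_landscape(toy_copies, genome_size=1_000_000)
for bin_start, pct in landscape.items():
    print(f"{bin_start:5.1f}%  {pct:.2f}% of genome")
```

Bins near 0 are read as recent activity and bins at higher divergence as older activity. The x-axis is divergence, not calendar time, which is why the plot is read as relative activity.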

hlin1221 · 2024-11-27T01:35:08Z

Thank you for your kind reply.

hlin1221 · 2024-12-12T03:15:43Z

Hi,
Sorry, I have another question.
In EDTA, the library is built by removing redundancy rather than from consensus sequences. Divergence is then calculated with calcDivergenceFromAlign.pl from RepeatMasker (oushujun/EDTA#92).

So, can the divergence calculated by comparing repeats with the EDTA library reflect TE activity?
Or can calcDivergenceFromAlign.pl construct the consensus sequence?

Thank you

TobyBaril closed this as completed Nov 26, 2024
