output of pangenome #4

Open
liufy11 opened this issue Aug 31, 2020 · 5 comments

Comments

@liufy11

liufy11 commented Aug 31, 2020

Hello! I ran the pan-genome pipeline and got the results, but I don't understand what the output files mean, for example pan-genome.wga.conensus.stats, pan-genome.wga.core.stats, pan-genome.wga.newseq.stats, and the files in tmp such as An-1.com-2.wga.bed, tmp.An-1.C24.bed and so on.

@wen-biao
Contributor

Hi,

These output files result from pan-genome construction based on WGA:

pan-genome.wga.conensus.stats # for the pan-genome size

pan-genome.wga.core.stats # for the core-genome size

pan-genome.wga.newseq.stats # for the new sequence size

Since we had eight genomes, each result file contains eight lines.

Each line gives the pan-genome, core-genome, or new-sequence size for a given number of input genomes (from 1 to 8 genomes).

Each line is tab-separated, and each number on it is the pan-genome, core-genome, or new-sequence size calculated for one combination of genomes.

The files with the prefix tmp are just intermediate files.
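As a rough illustration of the layout described above, here is a minimal Python sketch for summarizing one of the .stats files. It is not part of the project's scripts. The function name `summarize_stats` is hypothetical, the sample numbers are made up, and treating the first column as a label rather than a size is an assumption.

```python
import io
import statistics

def summarize_stats(handle):
    """Return one (label, n_values, min, mean, max) tuple per data line.

    Assumption: the first tab-separated column is a label (for example the
    number of genomes) and every remaining column is a size value.
    """
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Keep only non-empty value columns after the first one.
        sizes = [float(x) for x in fields[1:] if x.strip()]
        if not sizes:
            continue  # skip blank or label-only lines
        rows.append((fields[0], len(sizes), min(sizes),
                     statistics.mean(sizes), max(sizes)))
    return rows

# Made-up numbers for three genomes, not real output.
example = (
    "1\t100\t110\t120\n"
    "2\t130\t135\t140\t132\t138\t141\n"
    "3\t150\t152\t151\n"
)
for row in summarize_stats(io.StringIO(example)):
    print(row)
```

For a real file, something like `with open("pan-genome.wga.conensus.stats") as fh: summarize_stats(fh)` should work, provided the file really follows the layout assumed here.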

@liufy11
Author

liufy11 commented Aug 31, 2020 via email

@wen-biao
Contributor

For your case with three genomes, the last line of the file pan-genome.wga.conensus.stats should contain three numbers, because each genome can be selected as the reference and the other genomes are then compared to it. Each number on the last line is the pan-genome size (total length of non-redundant sequences) when all three genomes are included.

If you want a single number for the pan-genome of all your genomes, just pick one of them.

@liufy11
Author

liufy11 commented Aug 31, 2020 via email

@wen-biao
Contributor

Again, since you have three genomes, each genome can be selected as the reference. The first line covers the case where you select just one genome to build the pan-genome, so there are three values (one for each of your three genomes). All columns except the first give the pan-genome size. The second line covers selecting two of your three genomes, which gives 2*3 combinations (again, any genome can be the reference).

Yes, this only gives the size of the pan-genome, because these scripts were written for a project we did recently, not as a general method for pan-genome building.
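The counting described in this reply (three values on the first line, 2*3 on the second, three on the last) can be reproduced with a short sketch. This only illustrates the counting and is not the project's code. The genome names and the function `reference_choices` are hypothetical, and the sketch assumes the order of the non-reference genomes does not matter.

```python
from itertools import combinations

def reference_choices(genomes, k):
    """List (reference, others) pairs for every k-genome subset.

    Each subset of size k contributes k entries, one per possible reference.
    """
    choices = []
    for subset in combinations(genomes, k):
        for ref in subset:
            others = tuple(g for g in subset if g != ref)
            choices.append((ref, others))
    return choices

genomes = ["A", "B", "C"]  # placeholder names
for k in range(1, len(genomes) + 1):
    print(k, len(reference_choices(genomes, k)))
# prints:
# 1 3
# 2 6
# 3 3
```

Under this assumption, line k of a stats file for three genomes would hold 3, 6, and 3 size values respectively, which matches the numbers quoted in the thread.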
