Running on mouse data #5

Open
dvantwisk opened this issue Sep 1, 2023 · 6 comments

Comments

@dvantwisk

Is genion capable of running on mouse data? I've switched out the references, but it doesn't appear to be working for me so I just want to know if the package was only designed for analysis of human data.

@f0t1h
Collaborator

f0t1h commented Sep 1, 2023

Hi, genion is species-agnostic (however, it has only been tested on human data).

The current version requires a segmental duplication annotation (in the genomicSuperDups format). If one is not available for the mouse genome, an empty file should work.

If the problem is something else please let me know.

@dvantwisk
Author

I've gotten the program to run without errors; however, it doesn't produce any output, whereas other long-read fusion transcript programs do. Perhaps it simply cannot find fusions in the mouse data that I have, but I suspect something is going wrong, possibly in generating the sequence similarity file. The command:

cat [cdna.selfalign.paf] | cut -f1,6 | sed 's/_/\t/g' | awk 'BEGIN{OFS="\t";}{print substr($1,1,15),substr($2,1,15),substr($3,1,15),substr($4,1,15);}' | awk '$1!=$3' | sort | uniq > [cdna.selfalign.tsv]

does not work on mouse data because mouse Ensembl transcript identifiers begin with ENSMUST, whereas human ones begin with ENST. An unversioned human identifier is 15 characters long (ENST plus 11 digits), but a mouse identifier is 18 (ENSMUST plus 11 digits), so the awk substr(row,1,15) calls must become substr(row,1,18) for mouse. It's also worth noting that the pipeline above is not portable across all operating systems and may fail on some. It may be worth providing a small tool to generate the file instead of the chain of piped commands above.
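
For illustration, here is a rough Python sketch of what such a tool might look like. It is not part of genion. Instead of trimming identifiers to a fixed width, it cuts each one at the first `.`, so it should not depend on the species prefix. It assumes that PAF columns 1 and 6 hold names of the form `<transcript_id>_<gene_id>` (which is what the `sed` step in the pipeline implies) and that the identifiers themselves contain no underscore. The function names `strip_version` and `paf_to_pairs` are made up for this example.

```python
import sys


def strip_version(identifier):
    """Drop a trailing '.version' suffix; IDs without one are returned unchanged."""
    return identifier.split(".", 1)[0]


def paf_to_pairs(paf_lines):
    """Return sorted, de-duplicated (tx1, gene1, tx2, gene2) rows from PAF lines.

    Assumption: PAF columns 1 and 6 (query and target names) look like
    '<transcript_id>_<gene_id>', as the sed step in the shell pipeline implies.
    """
    rows = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # too few columns to contain a target name
        query = fields[0].split("_")
        target = fields[5].split("_")
        if len(query) < 2 or len(target) < 2:
            continue  # name does not follow the assumed transcript_gene layout
        tx1, gene1 = strip_version(query[0]), strip_version(query[1])
        tx2, gene2 = strip_version(target[0]), strip_version(target[1])
        if tx1 != tx2:  # mirrors awk '$1!=$3': drop transcript self-hits
            rows.add((tx1, gene1, tx2, gene2))
    return sorted(rows)  # stands in for 'sort | uniq'


if __name__ == "__main__":
    # Reads PAF from stdin and writes the four-column TSV to stdout.
    for row in paf_to_pairs(sys.stdin):
        print("\t".join(row))
```

Saved under a hypothetical name such as `paf_to_tsv.py`, it could be run as `python paf_to_tsv.py < [cdna.selfalign.paf] > [cdna.selfalign.tsv]`. The row order may differ from what a locale-aware `sort` produces, and I have not checked whether genion depends on that order.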

@f0t1h
Copy link
Collaborator

f0t1h commented Sep 10, 2023

Thank you for bringing this up. I will patch the tool as soon as possible to avoid this problem on non-human datasets.

@fhach
Contributor

fhach commented Sep 30, 2023

@f0t1h any update on this?

@f0t1h
Collaborator

f0t1h commented Oct 2, 2023

I finished the fix and am testing it at the moment.

@dvantwisk
Author

Has the fix been implemented?
