pRegion for all #172
If I understood you correctly, you try to load only a certain part of the matrix instead of all of it, and then plot some regions within that part, right? As you mentioned yourself, this is only possible with `.cool` files. I am not sure I can see from your changes how you handle that. With `.h5` matrices the entire matrix needs to be loaded anyway. Or are pRegions basically the same regions as those given via `--bed`?
try:
    self.hic_ma = HiCMatrix.hiCMatrix(self.properties['file'], pChrnameList=region)
except Exception:
    region = [str(self.properties['region'][0]) + ':' + str(start) + '-' + str(self.properties['region'][2])]
    self.hic_ma = HiCMatrix.hiCMatrix(self.properties['file'], pChrnameList=region)
try:
Is this necessary?
I thought that if you gave chrX instead of X it would not work, but it worked, so you are right; I will remove this.
What I experienced is that if you give a list of length 2, this exits.
From the code, it looks like it could fail:
https://github.com/deeptools/HiCMatrix/blob/cd54a2e7982bc0880e17536a44b09459620cb6db/hicmatrix/lib/cool.py#L106-L120
but I did not manage to make it fail; maybe this is linked to the version of cooler.
I see what you mean. I think it means that it can only load one region at a time; @joachimwolff can confirm this. Maybe if you have more than one region you need to do it recursively?
I managed to make it fail:
If the chromosome names in the cool file are UCSC style (chr1, chr2...) and you want to plot `--region Y:2500000-2600000`, you get:
Wrong chromosome format. Please check UCSC / ensembl notation.
but as it is an `exit`, you cannot catch it with `except`...
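For context, the `try`/`except Exception` in the diff cannot recover here because `exit()` and `sys.exit()` raise `SystemExit`, which derives from `BaseException` rather than `Exception`. A minimal sketch of that behaviour follows; `load_matrix` and `try_load` are hypothetical stand-ins for illustration, not HiCMatrix code:

```python
import sys


def load_matrix(region):
    """Hypothetical stand-in for a library call that exits on bad input."""
    if not region.startswith("chr"):
        sys.exit("Wrong chromosome format. Please check UCSC / ensembl notation.")
    return "matrix for " + region


def try_load(region):
    """Return the loaded matrix, or a short string saying what was caught."""
    try:
        return load_matrix(region)
    except Exception:
        # Not reached for sys.exit(): SystemExit is not an Exception subclass.
        return "caught Exception"
    except SystemExit as err:
        # Catching SystemExit explicitly works, but it is a blunt workaround.
        return "caught SystemExit: " + str(err)
```

Catching `SystemExit` in the caller would work as a stopgap, but it also swallows any other exit the library may trigger, which is presumably why raising a regular exception in the library is the cleaner fix.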
So, I will remove the try/except and make sure the name is in UCSC format, as a UCSC-format name works on an Ensembl-format matrix but not the other way around.
@joachimwolff, in the next release, could you turn all the `exit` calls into exceptions, so we can handle them?
I think it was not a bad idea to fail, or at least warn the user about the format. Just adding chr at the beginning of every single chromosome name can cause trouble.
I think we really need to be able to catch the errors. @joachimwolff, do you want me to do a PR?
I realized that if the region to plot is at the end of the chromosome, it will exit and we have no way to deal with it (this is independent of this PR; it is a current bug).
If you want to deal with matrices in both formats (UCSC and Ensembl), it would be good to be able to plot both...
My preferred solution would be:
- update HiCMatrix so that the error can be caught and, even better, check that the pChromList does not go over the chromosome size.
- use try/except to deal with UCSC/Ensembl and, if it is not implemented in HiCMatrix, to deal with regions that extend beyond the chromosome size.
- have a good UCSC/Ensembl conversion, because we will still have issues with contigs.
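The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `change_chrom_name`, `resolve_region`, and the `chrom_sizes` dictionary are hypothetical names, not part of HiCMatrix or cooler, and the name conversion deliberately handles only the plain `chr` prefix and the mitochondrial case, not contigs.

```python
def change_chrom_name(chrom):
    """Toggle between UCSC-like ('chr1', 'chrM') and Ensembl-like ('1', 'MT').

    Hypothetical helper: contigs and other non-standard names are NOT handled.
    """
    if chrom.startswith("chr"):
        stripped = chrom[3:]
        return "MT" if stripped == "M" else stripped
    return "chrM" if chrom == "MT" else "chr" + chrom


def resolve_region(chrom, start, end, chrom_sizes):
    """Return (chrom, start, end) valid for chrom_sizes, or raise ValueError."""
    try:
        size = chrom_sizes[chrom]
    except KeyError:
        # Fall back to the other naming convention before giving up.
        alt = change_chrom_name(chrom)
        if alt not in chrom_sizes:
            raise ValueError("Unknown chromosome: {}".format(chrom)) from None
        chrom, size = alt, chrom_sizes[alt]
    # Clamp so the request never extends past the chromosome end.
    start, end = max(0, start), min(end, size)
    if start >= end:
        raise ValueError("Empty region after clamping")
    return chrom, start, end
```

Raising `ValueError` instead of exiting leaves the decision to the caller, which can then skip the track, warn, or abort.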
I went through your changes and I got the answer to my question, so you can ignore it.
Hi,
HiCMatrix is implemented so that you simply pass the region you want to load and HiCMatrix takes care of it, regardless of whether the file is a cool file (where partial loading is supported) or an h5 file (where HiCMatrix loads everything and trims down to the requested region).
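On the caller side, the diff earlier in this thread builds a `chrom:start-end` string and hands it over as a one-element list via `pChrnameList`. A small sketch of that step, where `region_to_list` is a hypothetical helper name and the commented-out call only mirrors the one in the diff (it is not executed here):

```python
def region_to_list(chrom, start, end):
    """Build the one-element region list in 'chrom:start-end' form."""
    return ["{}:{}-{}".format(chrom, start, end)]


region = region_to_list("chrX", 3000000, 3500000)
# As in the diff (requires the hicmatrix package and a real matrix file):
# hic_ma = HiCMatrix.hiCMatrix(path_to_matrix, pChrnameList=region)
```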
@lldelisle I see you removed the UCSC/Ensembl check. Do you cover UCSC/Ensembl checking somewhere else?
# The chromosome name will be UCSC format because
# cooler v0.8.5 can fetch UCSC format in Ensembl-like format
# but not the contrary
if not chrom.startswith('chr'):
What if it is mitochondrial DNA or some contigs?
I agree it would be better to be able to catch the exception from HiCMatrix, but for the moment we cannot, as it is an `exit`.
I will try to see how it behaves.
It is very bad practice to add chr to just everything; that is why cooler doesn't do it either, I suppose. We already have tests to check for Ensembl vs UCSC, and I think the failure message in HiCMatrix makes sense. This is my opinion. Maybe others have a better idea.
What do you mean by
> We already have tests to check for Ensembl vs UCSC

I did not remove it; it is still here, but later on: pyGenomeTracks/pygenometracks/tracks/HiCMatrixTrack.py Lines 174 to 182 in 8bbb22e
And this is still useful.
I was wrong: cooler cannot deal with UCSC vs Ensembl in either direction, but it raises exceptions with good messages:
I will remove the lines I added; however, I still think the best solution is:
I solved 3. in #174
This one needs an extensive rebase :(
Yes, but this one needs work on HiCMatrix first, so it is on standby...
Dear all,
Sorry, in the end I am still working on it, and I will let you know when it is ready.
@LeilyR @joachimwolff @bgruening I am ready for review.
I think this is a massive improvement for people using bedgraph/bed or gtf/cool files.
@lldelisle I will only have time over the weekend, sorry. You are way too fast for me :)
But weekend is more than fine.
…into pRegionForAll
Just to let you know, these changes imply that most color scales where
I'm not super happy with disabling the log messages for some lines of code, but I also don't have a better idea :(
Looks great @lldelisle ... feel free to merge.
Thanks for the review. Yes, for pybedtools, I don't like it either... but the only alternative I see is to create a single temporary file, write all the bedtools logs into it, and give its path at the beginning... Do you think that is better?
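For readers wondering what "disabling the log messages for some lines of code" can look like, here is one possible sketch using only the standard library. It is not necessarily what this PR does: `quiet_logging` is a hypothetical name, and it assumes the messages go through Python's `logging` module (output written directly to stderr by an external tool would not be affected).

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logging(level=logging.CRITICAL):
    """Silence log records at `level` and below inside the with-block."""
    logging.disable(level)
    try:
        yield
    finally:
        # Re-enable logging; this resets to NOTSET rather than to whatever
        # global disable level was in place before entering the block.
        logging.disable(logging.NOTSET)
```

Wrapping only the noisy call in `with quiet_logging():` keeps the suppression local, and the `finally` clause restores logging even if the call raises.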
…into pRegionForAll
I think it is ok'ish as it is. Thanks @lldelisle!
Hi,
There was an argument in HiCMatrix to load only part of the cool matrix. Unfortunately, through the various changes, this pRegion argument was no longer used.
I put it back, and I thought it could be useful to add it to all the other tracks, for example bed/bedgraph/links etc...
This should speed up the use of pygenometracks on real data.
If both this and #162 are merged, the message in #162 should be adapted.