Behaviour of missing chromosome name inside a narrowPeak/bedgraph file #120

Closed
mevers opened this issue Oct 28, 2019 · 24 comments · Fixed by #149
@mevers

mevers commented Oct 28, 2019

Versions

pyGenomeTracks

pgt --version
pgt 3.1.2

Python

python --version
Python 3.7.4

Issue

When I include a narrowPeak file that does not contain any entries for a specific chromosome (say 1) and then plot a region involving that chromosome, the following error is produced

*Error*
Neither 1 or chr1 exits as a chromosome name inside the bedgraph file.

I'm not sure if this is intended behaviour but it would be great to have the option to have pyGenomeTracks produce an empty track when plotting a region for which there are no entries in a narrowPeak file, rather than throwing an error.

This would be good for two reasons:

  1. We often look at specific genes/regions on chromosomes/scaffolds/contigs and want to show coverage and MACS2 narrowPeak tracks. However, the MACS2 narrowPeak file might not contain any peaks for that particular chromosome/scaffold/contig. We'd still like to show the coverage track and an empty narrowPeak track.
  2. The error breaks our automated snakemake analysis pipeline, in which we generate pyGenomeTracks plots for a set of different genes/regions. Having pyGenomeTracks produce an empty track for those genes/regions where there are no events in the narrowPeak file would allow the snakemake workflow to successfully finish without an error.
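Until an empty track is produced natively, a pipeline could check up front whether a file contains the requested chromosome. The following is a minimal sketch, not part of pyGenomeTracks: `chromosomes_in_file` and `resolve_chromosome` are hypothetical helper names, and the `1` / `chr1` fallback simply mirrors the wording of the error message above.

```python
def chromosomes_in_file(path):
    """Return the set of names found in the first column of a BED-like file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and the usual BED comment/header lines.
            if not line.strip() or line.startswith(("#", "track ", "browser ")):
                continue
            names.add(line.split()[0])
    return names


def resolve_chromosome(path, chrom):
    """Return the name under which `chrom` appears in the file, or None.

    The name is tried as given, then with the 'chr' prefix added or removed.
    """
    present = chromosomes_in_file(path)
    alternative = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    for candidate in (chrom, alternative):
        if candidate in present:
            return candidate
    return None
```

A workflow rule could call `resolve_chromosome(peaks_file, chrom)` and drop the narrowPeak section from the tracks file when it returns `None`.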
@LeilyR
Contributor

LeilyR commented Oct 29, 2019

Hi Maurits,
Thanks for bringing this issue up. It shouldn't behave like that. I will have a look and get back to you.

@mevers
Author

mevers commented Oct 29, 2019

Hi @LeilyR.

The same error occurs if we include a simple BED file.

Here is a minimal reprex:

tracks.ini

[x-axis]
where = top
title = Position
fontsize = 8

[bed test]
file = test.bed

test.bed

1	150000000	180000000	sample_feature	0	.

Running

pgt --tracks tracks.ini --region 2:100000000-200000000 -out test.pdf

produces the following error


title not set for 'section 2. [bed test]'
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:0.0010640621185302734
DEBUG:pygenometracks.tracksClass:Figure size in cm is 40 x 1.5. Dpi is set to 72

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [bed test]
ERROR:pygenometracks.tracks.GenomeTrack:*Error*
Neither 2 nor chr2 exits as a chromosome name inside the bed file.

Running

pgt --tracks tracks.ini --region 1:100000000-200000000 -out test.pdf

works fine.
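The reprex above can be recreated with a short script. This is only a sketch: `write_reprex` is a hypothetical helper name, and nothing beyond the file contents and the two command lines is taken from the report.

```python
from pathlib import Path

TRACKS_INI = """\
[x-axis]
where = top
title = Position
fontsize = 8

[bed test]
file = test.bed
"""

# One tab-separated BED6 feature on chromosome 1.
BED_LINE = "1\t150000000\t180000000\tsample_feature\t0\t.\n"


def command_for(region):
    """Build the pgt command line used in the report for one region."""
    return ["pgt", "--tracks", "tracks.ini", "--region", region, "-out", "test.pdf"]


def write_reprex(directory):
    """Write tracks.ini and test.bed into `directory` and return both commands."""
    directory = Path(directory)
    (directory / "tracks.ini").write_text(TRACKS_INI)
    (directory / "test.bed").write_text(BED_LINE)
    return {
        "missing": command_for("2:100000000-200000000"),  # chromosome absent from test.bed
        "present": command_for("1:100000000-200000000"),  # chromosome present in test.bed
    }
```

Each returned command can be passed to `subprocess.run(command, cwd=directory)` in an environment where `pgt` is installed.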

@LeilyR
Contributor

LeilyR commented Oct 30, 2019

Hi Maurits,
Thanks for the details; I had noticed that myself too. I already have a PR, soon to be merged into the develop branch, that addresses your issue. I will let you know as soon as it is merged so that you can use it.

@mevers
Author

mevers commented Oct 30, 2019

Hi @LeilyR.

Thanks for letting me know & the quick work. I've been enjoying pyGenomeTracks a lot!

@LeilyR
Contributor

LeilyR commented Oct 30, 2019

The develop branch is now updated with your request. Please feel free to use it, and let us know if any issue remains. It now only emits a warning message and still generates an empty track.
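The warn-and-continue behaviour described here can be pictured roughly as follows. This is an illustrative sketch under stated assumptions, not the code on the develop branch: `intervals_for_region`, the dictionary layout, and the logger name are all hypothetical.

```python
import logging

log = logging.getLogger("empty_track_sketch")


def intervals_for_region(intervals_by_chrom, chrom, start, end):
    """Return the (start, end) intervals overlapping the region, or [] with a warning."""
    alternative = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    for name in (chrom, alternative):
        if name in intervals_by_chrom:
            return [
                (s, e) for s, e in intervals_by_chrom[name]
                if s < end and e > start
            ]
    # A missing chromosome is reported but no longer stops the run;
    # the caller receives nothing to draw and the track stays empty.
    log.warning(
        "Neither %s nor %s exists as a chromosome name in the file.",
        chrom, alternative,
    )
    return []
```

Returning an empty list keeps the plotting loop unchanged: a track with nothing to draw is simply rendered blank.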

@mevers
Author

mevers commented Oct 30, 2019

Hi @LeilyR.

Thanks for the quick update. Much appreciated. The minimal reprex now works; the only difference being that I now need to run the example with pyGenomeTracks (pgt seems to be discontinued?).

However, if I load a narrowPeak file, I now get an error

Traceback (most recent call last):
  File "/home/mevers/miniconda3/bin/pyGenomeTracks", line 11, in <module>
    main(args)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/plotTracks.py", line 307, in main
    trp.plot(args.outFileName, *region, title=args.title)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracksClass.py", line 262, in plot
    track.plot(plot_axis, chrom, start, end)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracks/NarrowPeakTrack.py", line 93, in plot
    name, score, strand, signal_value, p_value, q_value, summit = peak
TypeError: cannot unpack non-iterable float object

I'm not sure if this is related, but this didn't happen with the previous version. I will do some more testing, and update with a suitable reprex.
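For readers unfamiliar with this message: it means the object on the right-hand side of the unpacking was a single float rather than a sequence of seven values. The sketch below only reproduces the message in isolation; it does not establish why `peak` was a float inside NarrowPeakTrack.py, and `unpack_peak` is a hypothetical function.

```python
def unpack_peak(peak):
    """Unpack the seven per-peak values named in the traceback."""
    name, score, strand, signal_value, p_value, q_value, summit = peak
    return name, summit


# A seven-field tuple unpacks as expected.
fields = ("IDR_peak_0", 60.0, ".", 6.09559, -1.0, -1.0, 55)
assert unpack_peak(fields) == ("IDR_peak_0", 55)

# A bare float in place of the tuple raises the TypeError from the traceback.
try:
    unpack_peak(6.09559)
except TypeError as error:
    message = str(error)
```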

@mevers
Author

mevers commented Oct 31, 2019

A minimal reprex:

track.ini

[x-axis]
where = top
title = Position
fontsize = 8

[narrowPeak test]
file = test.narrowPeak

test.narrowPeak

1	150000000	180000000	sample_feature	0	.	1	-1	-1	100

Running

pyGenomeTracks --tracks tracks.ini --region 2:100000000-200000000 -out test.pdf

gives the following error


title not set for 'section 2. [narrowPeak test]'
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:0.0009672641754150391
DEBUG:pygenometracks.tracksClass:Figure size in cm is 40 x 1.595744680851064. Dpi is set to 72

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [narrowPeak test]
Traceback (most recent call last):
  File "/Users/u2528469/miniconda3/bin/pyGenomeTracks", line 11, in <module>
    main(args)
  File "/Users/u2528469/miniconda3/lib/python3.7/site-packages/pygenometracks/plotTracks.py", line 307, in main
    trp.plot(args.outFileName, *region, title=args.title)
  File "/Users/u2528469/miniconda3/lib/python3.7/site-packages/pygenometracks/tracksClass.py", line 262, in plot
    track.plot(plot_axis, chrom, start, end)
  File "/Users/u2528469/miniconda3/lib/python3.7/site-packages/pygenometracks/tracks/NarrowPeakTrack.py", line 87, in plot
    score_list, pos_list = self.get_scores(chrom_region, start_region, end_region, return_nans=False)
  File "/Users/u2528469/miniconda3/lib/python3.7/site-packages/pygenometracks/tracks/BedGraphTrack.py", line 171, in get_scores
    self.log.warning("*Warning*\nNeither "
AttributeError: 'NarrowPeakTrack' object has no attribute 'log'

It seems that the get_scores method still performs the chromosome check, and the warning call then fails because the NarrowPeakTrack object has no log attribute.
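One common way to end up with this kind of AttributeError is an inherited method that relies on an attribute the subclass never sets. The sketch below shows that pattern with hypothetical classes (`BaseTrack`, `BrokenPeakTrack`, `FixedPeakTrack`); the thread does not state whether this is the actual cause in pyGenomeTracks.

```python
import logging


class BaseTrack:
    def __init__(self):
        # The logger is only created here, in the base initializer.
        self.log = logging.getLogger("track_sketch")

    def get_scores(self, known_chroms, chrom):
        if chrom not in known_chroms:
            self.log.warning("%s is not a chromosome name in the file.", chrom)
            return []
        return [1.0]


class BrokenPeakTrack(BaseTrack):
    def __init__(self):
        # Overrides __init__ without calling super().__init__(),
        # so the `log` attribute is never set on the instance.
        self.name = "narrowPeak"


class FixedPeakTrack(BaseTrack):
    def __init__(self):
        super().__init__()  # sets self.log before anything else
        self.name = "narrowPeak"


# The inherited method fails only when the warning branch is reached.
try:
    BrokenPeakTrack().get_scores({"1"}, "2")
except AttributeError as error:
    broken_message = str(error)
```

This also explains why such a bug can stay hidden: the failing line only runs when the requested chromosome is missing.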

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

Hi Maurits,
I know where the error in your last comment comes from; I will work on that.
As for the error in your previous message, all I can say at the moment is that it cannot come from my current changes, and it looks like a type issue. Can you send me the command line that generated that error (TypeError: cannot unpack non-iterable float object)?
Also, I am not sure I understood what you mean by "pgt seems to be discontinued?". That is totally independent of your issue with the data type, right?

@LeilyR LeilyR self-assigned this Oct 31, 2019
@mevers
Author

mevers commented Oct 31, 2019

Hi @LeilyR.

Regarding the TypeError: cannot unpack non-iterable float object: The full command was

pyGenomeTracks \
    --tracks analysis/GRCh38+rDNA_repeat/pygenometracks/tracks_genome_ratio_bw10.ini \
    --region 6:73509724-73527058 \
    --width 40 \
    --dpi 300 \
    --fontSize 8 \
    --outFileName analysis/GRCh38+rDNA_repeat/pygenometracks/ratio_IP_vs_pooled_control_6:73509724-73527058_bw10.pdf

title not set for 'section 27. [annotation]'
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:6.471177577972412
DEBUG:pygenometracks.tracksClass:Figure size in cm is 40.0 x 45.74468085106383. Dpi is set to 300

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [bigwig CX-5461+I-BET151 1]
INFO:pygenometracks.tracksClass:plotting 3. [bigwig CX-5461+I-BET151 2]
INFO:pygenometracks.tracksClass:plotting 4. [spacer]
INFO:pygenometracks.tracksClass:plotting 5. [narrowPeak CX-5461+I-BET151]
Traceback (most recent call last):
  File "/home/mevers/miniconda3/bin/pyGenomeTracks", line 11, in <module>
    main(args)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/plotTracks.py", line 307, in main
    trp.plot(args.outFileName, *region, title=args.title)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracksClass.py", line 262, in plot
    track.plot(plot_axis, chrom, start, end)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracks/NarrowPeakTrack.py", line 93, in plot
    name, score, strand, signal_value, p_value, q_value, summit = peak
TypeError: cannot unpack non-iterable float object

Here are the relevant lines from the ini file

[x-axis]
where = top
title = Position
fontsize = 8

[bigwig CX-5461+I-BET151 1]
file = /home/mevers/Projects/ChIPseq_gammaH2AX/analysis/GRCh38+rDNA_repeat/deeptools/bamCompare/ratio/IP_CX-5461+I-BET151_rep1_vs_pooled_control.normSES.bw10.bw
height = 3
title = CX-5461+I-BET151 1 IP vs. pooled input
min_value = 0
max_value = 30
color = #E64B35FF

[bigwig CX-5461+I-BET151 2]
file = /home/mevers/Projects/ChIPseq_gammaH2AX/analysis/GRCh38+rDNA_repeat/deeptools/bamCompare/ratio/IP_CX-5461+I-BET151_rep2_vs_pooled_control.normSES.bw10.bw
height = 3
title = CX-5461+I-BET151 2 IP vs. pooled input
min_value = 0
max_value = 30
color = #E64B35FF

And here is the head of the narrowPeak file that seems to cause the error

1	11824	12001	IDR_peak_0	60	.	6.0955900000000005	-1	-1	55
1	16338	16502	IDR_peak_1	151	.	7.804939999999999	-1	-1	80
1	32612	32751	IDR_peak_2	60	.	4.93508	-1	-1	67
1	34147	34396	IDR_peak_3	204	.	6.64658	-1	-1	103
1	42843	43181	IDR_peak_4	245	.	6.97182	-1	-1	102
1	46226	46356	IDR_peak_5	205	.	6.46985	-1	-1	60
1	50042	50406	IDR_peak_6	60	.	5.60064	-1	-1	165
1	50726	51379	IDR_peak_7	217	.	6.89927	-1	-1	364
1	52284	52642	IDR_peak_8	317	.	7.647760000000001	-1	-1	232
1	52778	52956	IDR_peak_9	204	.	6.46985	-1	-1	81

I hope this helps. I'm also not entirely sure how/if this is related to the other issue and am still investigating.
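For reference, each narrowPeak line above has the ten standard columns of the format (chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak offset). A small parser makes that layout explicit. This is a sketch: `NarrowPeak` and `parse_narrowpeak_line` are hypothetical names, not pyGenomeTracks API.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float  # -1 when no value is assigned
    q_value: float  # -1 when no value is assigned
    summit: int     # offset of the summit from `start`; -1 if not called


def parse_narrowpeak_line(line):
    """Parse one tab-separated narrowPeak line into a NarrowPeak record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, found {len(fields)}")
    return NarrowPeak(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], float(fields[6]), float(fields[7]), float(fields[8]),
        int(fields[9]),
    )
```

Parsing the first line of the file gives a record whose absolute summit position is `record.start + record.summit`.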

What I meant by "pgt seems to be discontinued?" was that I used to call pyGenomeTracks with pgt; that no longer seems to work after installing from develop. So yes, it is unrelated to the chromosome name issue, but it does seem to be related to the changes in develop.

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

Thanks a lot for the clarification. I agree that it has something to do with the develop version of the code. I will check on that and update you.

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

By the way, can you also send me the narrowPeak track section from your ini file? Thanks!

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

@lldelisle, could you please have a look at what you have recently added to develop to see whether any of it caused this error (TypeError: cannot unpack non-iterable float object)? I suspect it has something to do with the summit addition. Letting me know would save me some time. Thanks!

@mevers
Author

mevers commented Oct 31, 2019

Here is the link to the full narrowPeak file: CX-5461+I-BET151_IDRpeaks.narrowPeak

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

Hey, thanks. I meant the part of your tracks.ini file where you define your narrowPeak track. You have already sent me the part with your bigwigs. Sorry if I was not clear.

@mevers
Author

mevers commented Oct 31, 2019

Ah sorry. I forgot to include that part. Here is the relevant part from the ini.

[x-axis]
where = top
title = Position
fontsize = 8

[bigwig CX-5461+I-BET151 1]
file = /home/mevers/Projects/ChIPseq_gammaH2AX/analysis/GRCh38+rDNA_repeat/deeptools/bamCompare/ratio/IP_CX-5461+I-BET151_rep1_vs_pooled_control.normSES.bw10.bw
height = 3
title = CX-5461+I-BET151 1 IP vs. pooled input
min_value = 0
max_value = 30
color = #E64B35FF

[bigwig CX-5461+I-BET151 2]
file = /home/mevers/Projects/ChIPseq_gammaH2AX/analysis/GRCh38+rDNA_repeat/deeptools/bamCompare/ratio/IP_CX-5461+I-BET151_rep2_vs_pooled_control.normSES.bw10.bw
height = 3
title = CX-5461+I-BET151 2 IP vs. pooled input
min_value = 0
max_value = 30
color = #E64B35FF


[spacer]


[narrowPeak CX-5461+I-BET151]
file = analysis/GRCh38+rDNA_repeat/pygenometracks/CX-5461+I-BET151_IDRpeaks.narrowPeak
height = 1
title = CX-5461+I-BET151 IDR peaks
show labels = no
type = box
color = #E64B35FF

@lldelisle
Collaborator

@LeilyR I will look at it.

For the narrowPeak error (AttributeError: 'NarrowPeakTrack' object has no attribute 'log'), I know where it comes from, and it will be solved in the PR I will open today.

@lldelisle lldelisle mentioned this issue Oct 31, 2019
@LeilyR
Contributor

LeilyR commented Oct 31, 2019

Re: "What I meant with pgt seems to be discontinued? was that I used to call pyGenomeTracks with pgt; that doesn't seem to work anymore after installing from develop."

pgt is still working; I didn't see any issue there.

@LeilyR
Contributor

LeilyR commented Oct 31, 2019

Hi Maurits,
We have already started working on the issue you brought up. It will soon be added to develop and will hopefully be out in the next release. For now, feel free to use pyGenomeTracks from the narrowPeak_debug branch; it should work on your narrowPeak files now. Let me know if any problem remains.

@mevers
Author

mevers commented Oct 31, 2019

Hi @LeilyR.

Re: pgt is still working, i didn't see any issue there

Hmm, really? This is what I get with the previous minimal sample data

pyGenomeTracks --tracks tracks.ini --region 1:100000000-200000000 -out test.pdf

title not set for 'section 2. [narrowPeak test]'
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:0.0015549659729003906
DEBUG:pygenometracks.tracksClass:Figure size in cm is 40 x 1.595744680851064. Dpi is set to 72

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [narrowPeak test]
DEBUG:pygenometracks.tracks.GenomeTrack:ylim 100,0

versus

pgt --tracks tracks.ini --region 1:100000000-200000000 -out test.pdf
usage: pyGenomeTracks --tracks tracks.ini --region chr1:1000000-4000000 -o image.png

Plots genomic tracks on specified region(s). Citation : Ramirez et al. High-
resolution TADs reveal DNA sequences underlying genome organization in flies.
Nature Communications (2018) doi:10.1038/s41467-017-02525-w

optional arguments:
  -h, --help            show this help message and exit
  --tracks TRACKS       File containing the instructions to plot the tracks.
                        The tracks.ini file can be genarated using the
                        `make_tracks_file` program.
  --region REGION       Region to plot, the format is chr:start-end
  --BED BED             Instead of a region, a file containing the regions to
                        plot, in BED format, can be given. If this is the
                        case, multiple files will be created using a prefix
                        the value of --outFileName
  --width WIDTH         figure width in centimeters
  --height HEIGHT       Figure height in centimeters. If not given, the figure
                        height is computed based on the heights of the tracks.
                        If given, the track height are proportionally scaled
                        to match the desired figure height.
  --title TITLE, -t TITLE
                        Plot title
  --outFileName OUTFILENAME, -out OUTFILENAME
                        File name to save the image, file prefix in case
                        multiple images are stored
  --vlines VLINES [VLINES ...]
                        Genomic cooordindates separated by space. E.g.
                        --vlines 150000 3000000 124838433
  --fontSize FONTSIZE   Font size for the labels of the plot
  --dpi DPI             Resolution for the image in case theouput is a raster
                        graphics image (e.g png, jpg)
  --trackLabelFraction TRACKLABELFRACTION
                        By default the space dedicated to the track labels is
                        0.05 of theplot width. This fraction can be changed
                        with this parameter if needed.
  --version             show program's version number and exit

It seems pgt simply prints the usage. Perhaps this is intended, in which case I missed something and this can be ignored.

I will test the new version in narrowPeak_debug later and report back.

Cheers,
Maurits

@mevers
Author

mevers commented Nov 1, 2019

Hi @LeilyR

After updating pyGenomeTracks from narrowPeak_debug I am still getting the error

title not set for 'section 27. [annotation]'
INFO:pygenometracks.tracksClass:time initializing track(s):
INFO:pygenometracks.tracksClass:6.1711320877075195
DEBUG:pygenometracks.tracksClass:Figure size in cm is 40.0 x 45.74468085106383. Dpi is set to 300

INFO:pygenometracks.tracksClass:plotting 1. [x-axis]
INFO:pygenometracks.tracksClass:plotting 2. [bigwig CX-5461+I-BET151 rep1]
INFO:pygenometracks.tracksClass:plotting 3. [bigwig CX-5461+I-BET151 rep2]
INFO:pygenometracks.tracksClass:plotting 4. [spacer]
INFO:pygenometracks.tracksClass:plotting 5. [narrowPeak CX-5461+I-BET151]
Traceback (most recent call last):
  File "/home/mevers/miniconda3/bin/pyGenomeTracks", line 11, in <module>
    main(args)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/plotTracks.py", line 307, in main
    trp.plot(args.outFileName, *region, title=args.title)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracksClass.py", line 262, in plot
    track.plot(plot_axis, chrom, start, end)
  File "/home/mevers/miniconda3/lib/python3.7/site-packages/pygenometracks/tracks/NarrowPeakTrack.py", line 94, in plot
    name, score, strand, signal_value, p_value, q_value, summit = peak
TypeError: cannot unpack non-iterable float object

@lldelisle
Collaborator

Hi,
We are still working on it for the moment. I just opened a PR with a solution that I think should work (#128). You can use it in the meantime.

@LeilyR
Contributor

LeilyR commented Nov 4, 2019

Hi Maurits,
pgt is working fine, and I really cannot reproduce that issue. The fact that you see the help message means that pgt itself runs. Are you using conda? If so, could you please create a fresh environment and reinstall pyGenomeTracks from the narrowPeak_debug branch? We have been working on your issue for the past few days, and I have just tested it on your own narrowPeak file with a BED file of two regions, one present in the narrowPeak file and the other not, and it seems to work fine. We will merge it into the develop branch as soon as it works for you too. Please let me know if any issue remains. Thanks!

@mevers
Author

mevers commented Nov 4, 2019

Hi @LeilyR .

Thanks for the update! I've tested the version from the narrowPeak_debug branch, and everything seems to work fine. Very nice!

Re: pgt, yeah, I'm not sure what's going on. It's not really an issue as I can use pyGenomeTracks instead. I'll do some further testing as you suggested, please let me know if I should report back here or open a new issue.

Either way, thanks again to you and @lldelisle for looking into this and for the quick fix! Much appreciated.

Cheers,
Maurits

@LeilyR
Contributor

LeilyR commented Nov 5, 2019

Hi Maurits,
Glad to hear that your problem with narrow peaks has been solved.
I am closing this issue, but feel free to open another one if you still have an issue with pgt.

@LeilyR LeilyR closed this as completed Nov 5, 2019
bgruening added a commit that referenced this issue Dec 7, 2019
deprecations in preparation for a new 4.0 release.

Over the years we have introduced several ways to enable/disable settings.
We have used on/off, yes/no, 0/1, true/false. From now on we recommend
to only use true/false so that we can unify our config files and make it
more intuitive for all pgt-users.

We also removed all two-word items and now concatenate them with `_` (#132).
For example `line width` is now `line_width`. Thanks @LeilyR!
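An existing tracks file could be adapted to these two conventions with a small script. The following is a hypothetical migration sketch, not a tool shipped with pyGenomeTracks: `migrate_text` and its helpers are assumed names, `0`/`1` are deliberately left alone because keys such as `min_value` take those numbers, and `title` and `file` values are never rewritten.

```python
import re

# Only unambiguous word forms are mapped to the recommended spelling.
BOOLEAN_WORDS = {"yes": "true", "on": "true", "no": "false", "off": "false"}
# Free-text keys whose values must not be touched (assumed list).
FREE_TEXT_KEYS = {"title", "file"}


def migrate_line(line):
    """Rewrite one `key = value` line; return other lines unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", ";", "[")) or "=" not in line:
        return line  # blank lines, comments and section headers
    key, value = line.split("=", 1)
    new_key = re.sub(r"\s+", "_", key.strip())  # `line width` -> `line_width`
    value = value.strip()
    if new_key not in FREE_TEXT_KEYS:
        value = BOOLEAN_WORDS.get(value.lower(), value)
    return f"{new_key} = {value}\n"


def migrate_text(text):
    """Apply migrate_line to every line of a tracks file given as a string."""
    return "".join(migrate_line(line) for line in text.splitlines(keepends=True))
```

Applied to the narrowPeak section posted earlier in this thread, `show labels = no` becomes `show_labels = false`, while `height = 1` is left as it is.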

Features:
* Every config is now checked for syntax errors before anything is executed (@lldelisle)
* Added the possibility to merge transcripts into one single gene representation when using gtf (@lldelisle)
* Added the possibility to rasterize bedgraph plots (@lldelisle)
* Added the possibility to use summary functions on bedgraphs (@lldelisle)
* Generate an empty track if a requested region is not in the given track-files. Fixed #120 (@LeilyR)
* Generate an empty track if a chromosome is missing in bedgraph files. Fixed #120
* Improved UCSC style for intron arrows (@lldelisle)
* Flybase style now supports color_utr and height_utr (@lldelisle)
* A new tracktype `hlines` was added (@lldelisle)
* Allow to plot the arcs of the links with a color scale based on scores as proposed in #30 (@lldelisle)
* Allow to plot rectangle on a heatmap for loops. Fixed #47 (@Phlya, @lldelisle)
* A lot of HiC Matrix improvements (@lldelisle)
* The `alpha` property can now be used nearly everywhere (@lldelisle)

Also check out our updated readme.

Merry Xmas and a happy new year!
The PGT team!
3 participants