fix: 1) improve error in transform when providing an empty hap file and a --region and 2) allow for calling write() on Genotypes objects without variants #264

Merged
merged 9 commits into main from fix/transform-empty-hap
Dec 11, 2024

aryarm
Member

@aryarm aryarm commented Nov 27, 2024

This PR fixes two issues.

1. An issue with the transform command

When the input .hap file is empty and the --region parameter is provided, the transform command fails with cryptic pysam errors instead of outputting an empty genotypes file. This PR resolves both of them.

The first error occurs with a regular hap file (that isn't indexed):

OSError: index `empty.hap.tbi` not found

https://github.com/CAST-genomics/haptools/actions/runs/12058438776/job/33625133365#step:9:380

And the second occurs when the hap file is indexed:

ValueError: could not create iterator for region '1:10116-10122'

https://github.com/CAST-genomics/haptools/actions/runs/12058465901/job/33625213549#step:9:400
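Both errors can be sidestepped by checking whether the file contains any records before handing it to an indexed reader. Below is a minimal sketch of such a check using only the standard library. The helper name `hap_has_records` is hypothetical and is not part of haptools; it assumes only that comment and metadata lines in a .hap file begin with `#`.

```python
import gzip
from pathlib import Path


def hap_has_records(path):
    """Return True if the .hap file at `path` has at least one non-comment line.

    Hypothetical helper for illustration; not the haptools implementation.
    """
    path = Path(path)
    # Peek at the first two bytes to detect gzip (or bgzip) compression.
    with open(path, "rb") as raw:
        is_gzipped = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as handle:
        for line in handle:
            stripped = line.strip()
            # Skip blank lines and lines that start with '#'.
            if stripped and not stripped.startswith("#"):
                return True
    return False
```

A caller could run this check first and only attempt a region query when it returns True.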

2. An issue with the Genotypes classes

As a bonus, this PR also resolves an error in the GenotypesVCF and GenotypesPLINK classes when attempting to write files without any variants:

OSError: [Errno 9] Bad file descriptor

and

RuntimeError: b'No variants in test.pvar.\n'

https://github.com/CAST-genomics/haptools/actions/runs/12059160422/job/33627254672#step:9:378
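To illustrate what writing a file "without any variants" can look like for VCF, here is a rough, standard-library-only sketch. The function `write_vcf` and its tuple layout are assumptions made for this example; this is not how `GenotypesVCF.write()` is implemented. The point is only that the header is written unconditionally, so an empty variant list yields a header-only file rather than an error.

```python
import io


def write_vcf(handle, samples, variants):
    """Write a minimal VCF to `handle`; `variants` may be empty.

    Hypothetical helper for illustration. Each variant is a tuple of
    (chrom, pos, id, ref, alt, genotypes), where genotypes is a list of
    strings such as "0|1" with one entry per sample.
    """
    handle.write("##fileformat=VCFv4.2\n")
    handle.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    handle.write("\t".join(columns + list(samples)) + "\n")
    # With no variants, this loop does not run and the file is header-only.
    for chrom, pos, vid, ref, alt, gts in variants:
        if len(gts) != len(samples):
            raise ValueError(
                f"variant {vid} has {len(gts)} genotypes for {len(samples)} samples"
            )
        fields = [chrom, str(pos), vid, ref, alt, ".", ".", ".", "GT"]
        handle.write("\t".join(fields + list(gts)) + "\n")


# Usage: an empty variant list still produces the header lines.
buffer = io.StringIO()
write_vcf(buffer, ["sample1", "sample2"], [])
```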

@aryarm aryarm changed the title from "fix: avoid errors in transform when it is given an empty hap file" to "fix: avoid errors in transform when it is given an empty hap file and a --region" Nov 27, 2024
@aryarm aryarm changed the title from "fix: avoid errors in transform when it is given an empty hap file and a --region" to "fix: avoid pysam errors in transform when providing an empty hap file and a --region" Nov 27, 2024
@aryarm aryarm changed the title from "fix: avoid pysam errors in transform when providing an empty hap file and a --region" to "fix: avoid pysam errors in transform when providing an empty hap file and a --region and allow for writing genotypes files without variants" Nov 27, 2024
@aryarm aryarm requested a review from mlamkin7 November 27, 2024 22:38
@aryarm aryarm marked this pull request as ready for review November 27, 2024 22:38
@aryarm
Member Author

aryarm commented Nov 27, 2024

@mlamkin7, what do you think about this? Is the correct behavior to write an empty file or should we error out instead?

It gets a little weird for the PGEN files. For those, I think the pgenlib library won't even let me write an empty PGEN file, so I just manually write an empty file, instead. Not sure if that's a good idea, though...

@aryarm aryarm changed the title from "fix: avoid pysam errors in transform when providing an empty hap file and a --region and allow for writing genotypes files without variants" to "fix: 1) avoid pysam errors in transform when providing an empty hap file and a --region and 2) allow for writing genotypes files without variants" Nov 27, 2024
@mlamkin7
Collaborator

> @mlamkin7, what do you think about this? Is the correct behavior to write an empty file or should we error out instead?
>
> It gets a little weird for the PGEN files. For those, I think the pgenlib library won't even let me write an empty PGEN file, so I just manually write an empty file, instead. Not sure if that's a good idea, though...

Right now I can't think of any reason why transform would be run with an empty hap file and we would want an empty output. I'd say it's better just to error out and let them know the .hap file is empty.

@aryarm
Member Author

aryarm commented Dec 1, 2024

ok, that sounds reasonable

I'll add some code to raise those errors

@aryarm aryarm changed the title from "fix: 1) avoid pysam errors in transform when providing an empty hap file and a --region and 2) allow for writing genotypes files without variants" to "fix: 1) improve error in transform when providing an empty hap file and a --region and 2) allow for calling write() on Genotypes objects without variants" Dec 1, 2024
@aryarm
Member Author

aryarm commented Dec 1, 2024

ok, done! See 525fd44 for the new ValueError in transform
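For readers who don't want to open the commit, the general idea of failing early with a descriptive ValueError can be sketched as follows. This is only an illustration: the function name `require_haplotypes` and the wording of the message are assumptions and are not taken from 525fd44. It relies only on the fact that haplotype lines in a .hap file start with `H` followed by a tab.

```python
def require_haplotypes(hap_lines, hap_path, region=None):
    """Return the haplotype lines in `hap_lines`, or raise ValueError if none exist.

    Hypothetical helper for illustration; not the code added in this PR.
    """
    # Haplotype records in the .hap format begin with "H" and a tab.
    haplotypes = [line for line in hap_lines if line.startswith("H\t")]
    if not haplotypes:
        # Mention the region in the message only if one was requested.
        where = f" in region '{region}'" if region else ""
        raise ValueError(
            f"No haplotypes were found{where} in '{hap_path}'. "
            "Check that the .hap file is not empty."
        )
    return haplotypes
```

Raising before any indexed lookup is attempted means the user sees a message about the empty .hap file rather than a lower-level error about a missing index or iterator.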

Note that I also made some minor changes in 9889833 to unrelated things:
I integrated our project with VSCode's testing/validation suite in the .devcontainer.json file. VSCode then pointed out a few missing import statements and complained about some type hints, all of which I fixed. Together, these changes get us one step closer to using a type checker as requested in #117: we just need to address this TODO

If you want to view the changes in this PR without the minor changes in 9889833, click here

@aryarm aryarm merged commit 4e84178 into main Dec 11, 2024
14 checks passed
@aryarm aryarm deleted the fix/transform-empty-hap branch December 11, 2024 00:14