Upstream failing due to Xarray dims ordering change #583

Closed
tomwhite opened this issue May 25, 2021 · 5 comments · Fixed by #592
Labels
bug Something isn't working upstream Used when our build breaks due to upstream changes

Comments

@tomwhite
Collaborator

Xarray no longer sorts dimensions by name (pydata/xarray#4753). This changes the display representation, which is causing some upstream tests to fail. Example failure:

Differences (unified diff with -expected +actual):
    @@ -1,5 +1,5 @@
     <xarray.Dataset>
    -Dimensions:                  (alphas: 5, blocks: 2, contigs: 2, outcomes: 5, samples: 50)
    -Dimensions without coordinates: alphas, blocks, contigs, outcomes, samples
    +Dimensions:                  (blocks: 2, alphas: 5, samples: 50, outcomes: 5, contigs: 2)
    +Dimensions without coordinates: blocks, alphas, samples, outcomes, contigs
     Data variables:
         regenie_base_prediction  (blocks, alphas, samples, outcomes) float64 0.33...

I'm not sure how to fix this so that the tests work against both Xarray 0.18.2 and the main branch.
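One way to make a repr-based test insensitive to the dimension ordering is to normalize the two header lines before comparing. The sketch below is only an illustration, not something proposed in this thread: `normalize_dims_order` is a hypothetical helper (it is not part of sgkit or Xarray), and it assumes each header fits on a single line, as in the failure above.

```python
import re

# Hypothetical helper for illustration; not part of sgkit or Xarray.
# Matches the "Dimensions: (a: 1, b: 2)" header line of a Dataset repr.
_DIMS = re.compile(r"^(\s*Dimensions:\s*)\((.*)\)\s*$")
# Matches the "Dimensions without coordinates: a, b" header line.
_NO_COORDS = re.compile(r"^(\s*Dimensions without coordinates:\s*)(.*)$")


def normalize_dims_order(repr_text: str) -> str:
    """Return repr_text with the dimension names in the header lines sorted.

    Per-variable dimension tuples under "Data variables:" are left alone,
    because their order describes the array layout and is meaningful.
    """
    out = []
    for line in repr_text.splitlines():
        dims = _DIMS.match(line)
        no_coords = _NO_COORDS.match(line)
        if dims:
            items = sorted(part.strip() for part in dims.group(2).split(","))
            line = f"{dims.group(1)}({', '.join(items)})"
        elif no_coords:
            items = sorted(part.strip() for part in no_coords.group(2).split(","))
            line = no_coords.group(1) + ", ".join(items)
        out.append(line)
    return "\n".join(out)
```

A test could then compare `normalize_dims_order(actual)` with `normalize_dims_order(expected)`, which gives the same result whether or not the installed Xarray sorts dimensions.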

The test_gwas_linear_regression__validate_statistics test is also failing due to the same change, but the error is a numerical mismatch, which suggests the dimensions are somehow getting mixed up.

@tomwhite tomwhite added the bug Something isn't working label May 25, 2021
@hammer
Contributor

hammer commented May 25, 2021

From pydata/xarray#4753:

> Currently this PR retains the sorting in reprs.

Could we use repr() in the tests as a medium term fix?

@tomwhite
Collaborator Author

> Could we use repr() in the tests as a medium term fix?

I think that comment is out of date, since the repr was changed in the end.

@hammer
Contributor

hammer commented May 27, 2021

🤔 this reminds me of discussions around https://github.com/pystatgen/sgkit/issues/43. I suppose we might want some kind of schema-aware comparison approach within the library?
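One possible reading of a "schema-aware comparison" is to compare what the dataset contains rather than how it prints. The sketch below is an assumption about what that could look like, not an existing sgkit API: `same_schema` is a hypothetical function that takes plain mappings. With Xarray, these could be built from `ds.sizes` and from `ds[name].dims` for each variable.

```python
from typing import Mapping, Sequence


# Hypothetical function for illustration; not part of sgkit or Xarray.
def same_schema(
    actual_sizes: Mapping[str, int],
    expected_sizes: Mapping[str, int],
    actual_var_dims: Mapping[str, Sequence[str]],
    expected_var_dims: Mapping[str, Sequence[str]],
) -> bool:
    """Compare dimension sizes ignoring order, and variable dims in order."""
    # Dict equality ignores insertion order, so the order in which the
    # dataset lists its dimensions does not matter here.
    if dict(actual_sizes) != dict(expected_sizes):
        return False
    # The two datasets must contain the same variables.
    if set(actual_var_dims) != set(expected_var_dims):
        return False
    # For a single variable the dimension order describes the array layout,
    # so it is compared as an ordered tuple.
    return all(
        tuple(actual_var_dims[name]) == tuple(expected_var_dims[name])
        for name in expected_var_dims
    )
```

Using the names from the failure above, a dataset listing `blocks, alphas, samples, outcomes, contigs` and one listing `alphas, blocks, contigs, outcomes, samples` compare equal, provided `regenie_base_prediction` has dimensions `(blocks, alphas, samples, outcomes)` in both.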

@hammer
Contributor

hammer commented May 27, 2021

@eric-czech suggests using http://xarray.pydata.org/en/stable/generated/xarray.Dataset.transpose.html to enforce dimension order.

@tomwhite
Collaborator Author

I don't think transpose is what we want, since it transposes the underlying arrays, not the order in which the dimensions are listed in the dataset.
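A small sketch of the effect on the underlying arrays, assuming `numpy` and `xarray` are installed. The variable name `v` and the toy shape are made up for illustration. It makes no claim about the order in which the dataset header lists its dimensions afterwards.

```python
import numpy as np
import xarray as xr

# Toy dataset with a single variable laid out as (blocks, alphas).
ds = xr.Dataset({"v": (("blocks", "alphas"), np.arange(10).reshape(2, 5))})

# Dataset.transpose reorders the dimensions of each variable's array.
t = ds.transpose("alphas", "blocks")

original_dims = ds["v"].dims      # ("blocks", "alphas")
transposed_dims = t["v"].dims     # ("alphas", "blocks")
transposed_shape = t["v"].shape   # (5, 2)

# Lookups by dimension name are unaffected; only positional access changes.
by_name_before = ds["v"].isel(blocks=1, alphas=3).item()
by_name_after = t["v"].isel(blocks=1, alphas=3).item()
by_position_after = t["v"].values[3, 1]
```

Code that reads `.values` positionally depends on this layout, while code that selects by dimension name does not.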

@hammer hammer added the upstream Used when our build breaks due to upstream changes label Aug 30, 2021
2 participants