concat prealigned objects #1413
Conversation
Let me expand on what this does. Many netCDF datasets consist of multiple files with identical coordinates, except for one (e.g. time). With xarray we can open these datasets with

This PR is a draft in progress. I still need to propagate the

An alternative API would be to add another option to the

Feedback welcome.
```python
if not prealigned:
    datasets = align(*datasets, join='outer', copy=False, exclude=[dim])
else:
    coords = 'minimal'
```
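To make the intended fast path concrete, here is a pure-Python sketch of what skipping `align` buys. The mock datasets are plain dicts of dimension lengths, and `concat_dims` is a hypothetical name for illustration, not xarray's API:

```python
def concat_dims(datasets, dim, prealigned=False):
    """Return the dimension sizes of the concatenation result.

    `datasets` are plain dicts mapping dimension name -> length,
    standing in for real Datasets; `dim` is the concat dimension.
    """
    first = datasets[0]
    shared = [d for d in first if d != dim]
    if prealigned:
        # Fast path: only verify that shared dimension lengths agree,
        # never reading or comparing the actual index values.
        for ds in datasets[1:]:
            for d in shared:
                if ds.get(d) != first[d]:
                    raise ValueError("dimension %r does not match" % d)
    # (The normal path would instead call
    #  align(*datasets, join='outer', copy=False, exclude=[dim]),
    #  which loads and compares every index -- the expensive step.)
    result = dict(first)
    result[dim] = sum(ds[dim] for ds in datasets)
    return result
```

The point of the fast path is that only metadata (lengths) is touched; no coordinate arrays ever need to be read from disk.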
It's bad form to unilaterally override an argument with another value -- it's better to raise an error (or maybe a warning). The only value of `coords` that really breaks here is `'different'`, and even that value could conceivably make sense.
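Following that suggestion, a sketch of validating rather than overriding (`validate_coords` is a hypothetical helper name, not code from this PR):

```python
def validate_coords(coords, prealigned):
    # Hypothetical validation: instead of silently replacing the user's
    # `coords` argument with 'minimal', reject the one value that cannot
    # work without comparing coordinate values.
    if prealigned and coords == 'different':
        raise ValueError(
            "coords='different' requires comparing coordinate values, "
            "which prealigned mode skips")
    return coords
```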
What about just adding the option `coords='prealigned'`?
My initial thought was that, for prealigned data, all coords should just be drawn from the first object. But on second thought, what if there are other coords in the later datasets that do need to be concatenated, e.g. concat over `time` with an auxiliary coordinate `iteration_number` with dimension `time`.
It definitely doesn't work with `coords='different'`. I have not tried all the other options. I have a hard time conceptualizing what the different `coords` options do. Some guidance would be very welcome. I don't really understand what the function `_calc_concat_over` does.
This enhancement makes a lot of sense to me. Two things worth considering:
I guess we would want to check that (a) the necessary variables and dimensions exist in all datasets and (b) the dimensions have the same length. We would want to bypass the actual reading of the indices.

I agree it would be nicer to subsume this logic into

What is
I can add more careful checks once we sort out the align question.
It verifies that all dimensions have the same length, and coordinates along all dimensions (used for indexing) also match. Unlike the normal version of

It does not check that the necessary dimensions and variables exist in all datasets. But we should do that as part of the logic in
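The check described above might look like this in outline. The mock datasets are dicts of lengths and index labels, and `check_prealigned` is a hypothetical name, not the PR's actual function:

```python
def check_prealigned(datasets, concat_dim):
    """Verify dimension lengths and index coordinates match exactly.

    Each mock dataset is a dict like
    {'dims': {'time': 2, 'lat': 3}, 'indexes': {'lat': [10, 20, 30]}}.
    """
    ref = datasets[0]
    for ds in datasets[1:]:
        for name, size in ref['dims'].items():
            if name == concat_dim:
                continue
            if ds['dims'].get(name) != size:
                raise ValueError("length of dimension %r differs" % name)
            # Unlike align(), no reindexing is attempted: index labels
            # must already be identical across all inputs.
            if ds['indexes'].get(name) != ref['indexes'].get(name):
                raise ValueError("index along %r differs" % name)
```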
As I think about this further, I realize it might be futile to avoid reading the dimensions from all the files. This is a basic part of how `open_dataset` works.
Well, we could potentially write a fast path constructor for loading multiple netcdf files that avoids `open_dataset`. We just need another way to specify the schema, e.g., using NCML.
> On Fri, May 19, 2017 at 10:53 AM Ryan Abernathey wrote: As I think about this further, I realize it might be futile to avoid reading the dimensions from all the files. This is a basic part of how `open_dataset` works.
Since the expensive part (for me) is actually reading all the coordinates, I'm not sure that this PR makes sense any more. The same thing I am going for here could probably be accomplished by allowing the user to pass kwargs to `concat` via `open_mfdataset`.

For really big datasets, I think we will want to go the NCML approach, generating the xarray metadata as a pre-processing step. Then we could add a function like
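The pass-through being proposed is simple plumbing; a generic sketch (all names here are hypothetical stand-ins, not xarray's actual signatures):

```python
def open_mfdataset_sketch(paths, open_one, concat, dim, **concat_kwargs):
    # Hypothetical plumbing: open each file, then forward any extra keyword
    # arguments (e.g. a prealigned flag or a coords option) straight to
    # concat, so no dedicated fast-path logic is needed at this layer.
    datasets = [open_one(p) for p in paths]
    return concat(datasets, dim, **concat_kwargs)
```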
Sounds good to me!
@rabernat - I'm just catching up on this issue. Is your last comment indicating that we should close this PR?
Yes, I think it should be closed. There are better ways to accomplish the desired goals. Specifically, allowing the user to pass kwargs to `concat` via `open_mfdataset` would be useful.
Okay thanks, closing now. We can always reopen this if necessary.
- Passes `git diff upstream/master | flake8 --diff`
- `whats-new.rst` for all changes and `api.rst` for new API

This is an initial PR to bypass index alignment and coordinate checking when concatenating datasets.