intake-esm can't concatenate files #1

Closed
matt-long opened this issue Sep 30, 2019 · 5 comments

@matt-long
Contributor

notebooks/forcing_iron_flux.ipynb fails on this command:

dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})

I get the error below. One ensemble member may have a different file, but I haven't diagnosed it yet. We first need to identify the cause, and we might then need a mechanism in intake_esm to work around it.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-67f3a48652c5> in <module>
----> 1 dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})
      2 _, ds2 = dq.popitem()
      3 ds2 = ds2.drop([v for v in ds2.variables if v not in keep_vars])
      4 ds2

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in to_xarray(self, **kwargs)
    157         _kwargs.update(kwargs)
    158         self.kwargs = _kwargs
--> 159         return self.to_dask()
    160 
    161     def _get_schema(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _get_schema(self)
    166 
    167         if self._ds is None:
--> 168             self._open_dataset()
    169             metadata = {}
    170 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/cesm.py in _open_dataset(self)
    151             member_column_name='member_id',
    152             variable_column_name='variable',
--> 153             file_fullpath_column_name='file_fullpath',
    154         )

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _open_dataset_groups(self, dataset_fields, member_column_name, variable_column_name, file_fullpath_column_name, file_basename_column_name)
    134                         dsets,
    135                         time_coord_name_default=kwargs['time_coord_name'],
--> 136                         override_coords=kwargs['override_coords'],
    137                     )
    138                     var_dsets.append(var_dset_i)

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/aggregate.py in concat_time_levels(dsets, time_coord_name_default, restore_non_dim_coords, override_coords)
    181     objs_to_concat = [first] + rest
    182 
--> 183     ds = xr.concat(objs_to_concat, dim=time_coord_name, coords='minimal')
    184 
    185     new_history = f"\n{datetime.now()} xarray.concat(<ALL_TIMESTEPS>, dim='{time_coord_name}', coords='minimal')"

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134 
    135 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    321                 raise ValueError(
    322                     "variables %r are present in some datasets but not others. "
--> 323                     % absent_merge_vars
    324                 )
    325 

ValueError: variables {'DXU', 'latent_heat_fusion_mks', 'nsurface_u', 'HU', 'rho_sw', 'UAREA', 'radius', 'TAREA', 'HUS', 'vonkar', 'latent_heat_fusion', 'salinity_factor', 'dzw', 'DXT', 'hflux_factor', 'moc_components', 'salt_to_ppt', 'T0_Kelvin', 'HUW', 'sound', 'nsurface_t', 'HTN', 'ANGLE', 'DYU', 'rho_fw', 'salt_to_mmday', 'KMT', 'omega', 'TLAT', 'sea_ice_salinity', 'ULONG', 'KMU', 'DYT', 'momentum_factor', 'stefan_boltzmann', 'HT', 'dz', 'latent_heat_vapor', 'ULAT', 'cp_sw', 'rho_air', 'ocn_ref_salinity', 'transport_regions', 'ppt_to_salt', 'fwflux_factor', 'heat_to_PW', 'cp_air', 'transport_components', 'mass_to_Sv', 'REGION_MASK', 'grav', 'HTE', 'TLONG', 'ANGLET', 'salt_to_Svppt', 'days_in_norm_year', 'sflux_factor'} are present in some datasets but not others. 
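
A quick way to narrow this down is to list, for each variable, which of the datasets being concatenated lack it. The following is only a minimal sketch: `find_absent_variables` is a hypothetical helper (not part of intake-esm or xarray), and the stand-in objects replace the real CESM2 files so the snippet runs on its own. It relies only on each dataset exposing a `.variables` mapping, as `xarray.Dataset` does.

```python
from collections import defaultdict
from types import SimpleNamespace


def find_absent_variables(datasets):
    """Map each variable name to the indices of the datasets that lack it.

    `datasets` is any sequence of objects with a `.variables` mapping
    (for example, a list of xarray.Dataset objects).
    """
    names_per_dataset = [set(ds.variables) for ds in datasets]
    all_names = set().union(*names_per_dataset)
    absent = defaultdict(list)
    for index, names in enumerate(names_per_dataset):
        # Anything present somewhere but missing here is a concat suspect.
        for name in sorted(all_names - names):
            absent[name].append(index)
    return dict(absent)


# Hypothetical stand-ins: two files carry grid variables, one does not.
full = SimpleNamespace(
    variables={"IRON_FLUX": None, "time": None, "TLAT": None, "TAREA": None}
)
bare = SimpleNamespace(variables={"IRON_FLUX": None, "time": None})

report = find_absent_variables([full, bare, full])
print(report)
```

With real data, the list passed in would be the per-file datasets opened before the `xr.concat` call, so the indices point back at the files that differ.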

cc @andersy005, @mnlevy1981

@andersy005

@matt-long, can you confirm that you are running the latest version of xarray == v0.13.0?
I've seen this issue before, and my solution was to downgrade xarray to v0.12.3.

The concatenation/merging behavior changed in the latest version of xarray, and I have not had time to look into how to address it in intake-esm.

@matt-long
Contributor Author

I was indeed running v0.13.0. Downgrading to v0.12.3 solves the problem. Are we tracking this in intake-esm?
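
Until this is tracked properly, a notebook could guard against the one release this thread identifies as failing. This is a rough sketch under stated assumptions: `KNOWN_BAD` and `is_known_bad` are hypothetical names, and the set contains only v0.13.0 because that is the only version reported here as broken (v0.12.3 is reported as working).

```python
import re

# Only the version reported as failing in this thread; an assumption,
# not an authoritative list of affected xarray releases.
KNOWN_BAD = {(0, 13, 0)}


def parse_version(version):
    """Return (major, minor, patch) from strings like 'v0.13.0' or '0.12.3'.

    Returns None if the string does not start with three dotted integers.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_bad(version):
    """True if `version` is a release this thread reports as failing."""
    return parse_version(version) in KNOWN_BAD


print(is_known_bad("v0.13.0"), is_known_bad("v0.12.3"))
```

In a notebook this would be called as `is_known_bad(xarray.__version__)` before running the search, with a warning or an early exit if it returns True.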

@dcherian

dcherian commented Oct 1, 2019

Fixed upstream: pydata/xarray#3364

Please ask upstream about any such issues when combining/merging/concatenating. There are many edge cases that xarray's tests don't cover (though this one turned out to be an unimplemented feature request).

@andersy005

Thank you for fixing it, @dcherian!

> Please ask upstream about any such issues when combining/merging/concatenating

Definitely. I recently got swamped with a bunch of other things and was planning to report it upstream in the next few days.

@dcherian

dcherian commented Oct 1, 2019

Awesome, thanks for planning to do that!
