Safely open / close netCDF files without resource locking #2887
What is the recommended code pattern for reading a file, adding a few variables, and then writing back to the same file?
This pattern should work:

```python
with xr.open_dataset('test.nc') as ds:
    ds.load()
ds.to_netcdf('test.nc')
```
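Run end to end, that pattern looks like the sketch below. The temporary path, variable names, and data are placeholders, and a netCDF backend (e.g. netCDF4) is assumed to be installed:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# placeholder file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.nc")
xr.Dataset({"a": ("x", np.arange(3))}).to_netcdf(path)

# open, pull everything into memory, then let the context manager close the file
with xr.open_dataset(path) as ds:
    ds.load()

# no handle is left open, so writing back to the same file is safe
ds["b"] = ds["a"] * 2
ds.to_netcdf(path)
```

Because `ds.load()` caches all data in memory, the `Dataset` object remains fully usable after the `with` block closes the underlying file.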
OK. But what about the usual scientist workflow, where you work in multiple cells?

next cell:

...

I wonder if we should add
I think this is more of a limitation of netCDF-C / HDF5 than xarray. For example, the same round trip works if you use SciPy's netCDF reader/writer:
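The code block from this comment did not survive extraction; below is a hedged sketch of what a SciPy-backend round trip might look like. Note the `scipy` engine handles netCDF3-format files only (so no 64-bit integers), and the `.load()` call here is extra caution on my part — the comment suggests the overwrite works even without it:

```python
import os
import tempfile

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "test.nc")

# the scipy engine reads and writes netCDF3 files (float data avoids
# netCDF3's lack of 64-bit integer support)
xr.Dataset({"a": ("x", np.arange(3.0))}).to_netcdf(path, engine="scipy")

ds = xr.open_dataset(path, engine="scipy")
ds.load()  # pull data into memory before overwriting the source file
ds.to_netcdf(path, engine="scipy")
```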
I didn't notice that. I personally want a simpler function for daily (not-so-big-data) analysis, without caring about the open / close stuff.
Just to clarify,

```python
def load_dataset(*args, **kwargs):
    with xarray.open_dataset(*args, **kwargs) as ds:
        return ds.load()
```

This also seems pretty reasonable to me. I've written a version of this utility function a handful of times, so I at least would find it useful. Would we also want
I used to use the
Yes, that is actually on my mind. I added a tag
BUG: Fixes pydata#2887 by adding @shoyer's solution for load_dataset and load_dataarray, wrappers around open_dataset and open_dataarray which open, load, and close the file and return the Dataset/DataArray.
TST: Add tests for sequentially opening and writing to files using the new functions.
DOC: Add to whats-new.rst. Also a tiny change to the open_dataset docstring.
Took a stab at implementing these functions.
* Partial fix for #2841 to improve formatting. Updates formatting to use .format() instead of the % operator. Changed all instances of % to .format() and added a test for using a tuple as a key, which errored using the % operator.
* Revert "Partial fix for #2841 to improve formatting." This reverts commit f17f3ad.
* Implement load_dataset() and load_dataarray()
  BUG: Fixes #2887 by adding @shoyer's solution for load_dataset and load_dataarray, wrappers around open_dataset and open_dataarray which open, load, and close the file and return the Dataset/DataArray.
  TST: Add tests for sequentially opening and writing to files using the new functions.
  DOC: Add to whats-new.rst. Also a tiny change to the open_dataset docstring.
  Update docstrings and check for cache in kwargs. Undeprecate load_dataset. Add to api.rst, fix whats-new.rst typo, raise an error instead of a warning.
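With this PR merged, the wrappers became part of xarray's public API as xr.load_dataset and xr.load_dataarray (available in recent xarray versions; check your version's API reference). A sketch of their use:

```python
import os
import tempfile

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "test.nc")
xr.Dataset({"a": ("x", np.arange(3))}).to_netcdf(path)

# load_dataset opens the file, loads it into memory, and closes it
ds = xr.load_dataset(path)
ds.to_netcdf(path)  # safe: no handle remains open

# load_dataarray does the same for a file with a single data variable
da = xr.load_dataarray(path)
da.to_netcdf(path)
```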
Code Sample, a copy-pastable example if possible

(essentially the same as #1629)

Opening a netCDF file via xr.open_dataset locks a resource, preventing another program from writing a file with the same name (as pointed out, and answered as expected behavior, in #1629).

Problem description

Another program cannot write to the same netCDF file that xarray has opened until the close method is called.

-- EDIT --
The close() method does not return the object, so it cannot be put in a chained call.

This is understandable when we do not want to load the entire file into memory. However, sometimes I want to read a file that will soon be updated by another program. Also, I think many users who are not accustomed to netCDF may expect this load-and-release behavior (as np.loadtxt provides) and will be surprised to get a PermissionError. I think it would be nice to have an option such as load_all=True, or even make it the default.

Expected Output

No error
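A sketch of the explicit workaround implied above: since close() returns None and cannot be chained, the load and close must be separate statements before writing back (the file path and data here are placeholders):

```python
import os
import tempfile

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "test.nc")
xr.Dataset({"a": ("x", np.arange(3))}).to_netcdf(path)

ds = xr.open_dataset(path)
ds.load()   # read everything into memory first
ds.close()  # must be its own statement: close() returns None
ds.to_netcdf(path)  # the file handle is released, so this succeeds
```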
Output of xr.show_versions()

```
xarray: 0.12.0+11.g7d0e895f.dirty
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.0.0
distributed: 1.25.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: 4.0.1
IPython: 7.1.1
sphinx: 1.8.2
```