Derived variables save all data to disk before preprocessing #377

Closed
ledm opened this issue Nov 20, 2019 · 1 comment · Fixed by #1609
Labels
enhancement New feature or request

Comments

@ledm
Contributor

ledm commented Nov 20, 2019

When running the derivation, the derive preprocessor saves the loaded netCDF to disk as a new netCDF file in preproc. This isn't ideal, as writing to disk is extremely slow and also results in huge, bloated preproc directories. Furthermore, as the data isn't realised, I don't see why we can't keep it lazy and avoid saving it to disk.

For instance, I'm running a multi-model comparison of @tillku's derived variable Ocean Heat Content (ohc.py) between the years 1960 and 2014 (or 2005 for CMIP5). For each model in my recipe, the derivation preprocessor loads a thetao variable, applies the relevant fix, then saves the data as a netCDF file. Each of these is of order 20 GB and takes ~10 minutes on JASMIN. A small run would be around 5 models, so an hour of waiting and 100 GB of disk space. A real publishable run would be closer to 300 GB and 2-3 hours.
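
As a rough check on those numbers, the cost of the intermediate files scales linearly with the number of models. The sketch below only restates the per-model figures quoted above; the function name and the 15-model count for the larger run are assumptions chosen to match the ~300 GB estimate, not values taken from any recipe.

```python
# Figures quoted in the issue: roughly 20 GB and 10 minutes per model.
GB_PER_MODEL = 20
MINUTES_PER_MODEL = 10


def intermediate_cost(n_models):
    """Return (disk space in GB, wall time in hours) for n_models."""
    disk_gb = n_models * GB_PER_MODEL
    hours = n_models * MINUTES_PER_MODEL / 60
    return disk_gb, hours


# The "small run" of around 5 models mentioned above.
small_run = intermediate_cost(5)
# Assumed model count that reproduces the ~300 GB estimate.
large_run = intermediate_cost(15)

print(small_run, large_run)
```

With these assumptions the small run comes out at 100 GB and a little under an hour, and the larger run at 300 GB and 2.5 hours, in line with the estimates above.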

The frustration is compounded by the fact that the preprocessor loads all the thetao files first (which all work), then tries to load the volcello files (which don't work due to several issues; ask @valeriupredoi). That is fixed by iris=3.

It seems that this saving to disk is unnecessary. Is there any way that we can avoid it? We don't need to save to disk at every preprocessing stage elsewhere. Why do we need to do it here?
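
To make the distinction concrete, here is a toy, pure-Python sketch of the two approaches: one writes the fixed data to an intermediate file and reads it back before deriving, the other streams chunks straight into the derivation. Everything in it (`load_chunks`, `apply_fix`, `derive_eager`, `derive_lazy`, the Celsius-to-Kelvin "fix", the sum used as the "derivation") is hypothetical and only illustrates the idea; it is not how ESMValCore or iris implement loading, fixes, or derivation.

```python
import os
import tempfile


def load_chunks(n_chunks, chunk_size):
    """Yield chunks lazily; stands in for reading a large variable in pieces."""
    for i in range(n_chunks):
        yield [float(i * chunk_size + j) for j in range(chunk_size)]


def apply_fix(chunks):
    """Hypothetical per-chunk fix (here: Celsius to Kelvin), applied lazily."""
    for chunk in chunks:
        yield [value + 273.15 for value in chunk]


def derive_eager(n_chunks, chunk_size, workdir):
    """Write the fixed data to disk in full, then read it back to derive."""
    path = os.path.join(workdir, "intermediate.txt")
    with open(path, "w") as handle:
        for chunk in apply_fix(load_chunks(n_chunks, chunk_size)):
            handle.write("\n".join(map(repr, chunk)) + "\n")
    bytes_written = os.path.getsize(path)
    with open(path) as handle:
        total = sum(float(line) for line in handle)
    return total, bytes_written


def derive_lazy(n_chunks, chunk_size):
    """Stream fixed chunks straight into the derivation; nothing is written."""
    fixed = apply_fix(load_chunks(n_chunks, chunk_size))
    total = sum(sum(chunk) for chunk in fixed)
    return total, 0


with tempfile.TemporaryDirectory() as workdir:
    eager_total, eager_bytes = derive_eager(4, 1000, workdir)
lazy_total, lazy_bytes = derive_lazy(4, 1000)

print(eager_bytes, lazy_bytes)
```

Both paths produce the same derived value up to floating-point rounding, but only the eager one pays for the intermediate file.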

This continues a conversation with @bouweandela in issue #42.

@ledm ledm added the enhancement New feature or request label Nov 20, 2019
@bouweandela
Member

It would be nice to improve this, but I do not have the time to do it at the moment. If you do have time, it would be great if you can work on this.
