xCDAT/xarray/numpy and 'silent upcasting' question #575
We found that cdms2 incorrectly type casts weights. Some related comments:

I'm not 100% sure, but I'm pretty confident most NumPy functions/methods maintain the original dtype. From the team's experience developing xCDAT, we found that Xarray and xCDAT correctly maintain the original dtype.

I'm currently experimenting with Dask and getting performance metrics in PR #489. I'm comparing xCDAT's serial and parallel performance against CDAT (serial-only). The preliminary results are extremely promising for xCDAT so far, so stay tuned.
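As a quick sanity check on the dtype-preservation claim, here is a minimal NumPy sketch (illustrative only, not code from the discussion or from PR #489):

```python
import numpy as np

# Pure-float32 inputs: elementwise operations and reductions keep float32
a = np.arange(4, dtype=np.float32)
w = np.ones(4, dtype=np.float32)

weighted = a * w        # elementwise op stays float32
mean = weighted.mean()  # reduction also stays float32

assert weighted.dtype == np.float32
assert mean.dtype == np.float32
```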
Question criteria
Describe your question
When reading CDAT/cdms#449 this morning, I remembered why we had `Float`, `Float32`, and `Float64` in the first place. The idea was to accurately represent our data (with good enough numerical precision), have a good mapping between NetCDF types and numpy, and only use the RAM and disk space that we actually need.

There used to be a time when you performed a numerical operation on a Float32 cdms2 variable and ended up with a Float64 variable, thus doubling the required space. The `savespace` option was then introduced to fix this: `savespace` is an integer flag; if set to 1, internal Numpy operations will attempt to avoid silent upcasting, as described for instance in "2.10.2. Variable Constructors". This theoretically told cdms2 (or MV2, or numpy?) not to use more memory than actually required.
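The "silent upcasting" that `savespace` tried to prevent is easy to reproduce in plain NumPy (a minimal sketch, not cdms2 code):

```python
import numpy as np

x32 = np.zeros(3, dtype=np.float32)
x64 = np.zeros(3, dtype=np.float64)

# Mixing precisions silently upcasts the result to the wider type,
# doubling the memory needed for the output:
y = x32 + x64
assert y.dtype == np.float64

# Explicitly downcasting one operand keeps everything in float32:
y32 = x32 + x64.astype(np.float32)
assert y32.dtype == np.float32
```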
Does somebody know what governs this behavior these days (or what the rules are) in numpy, xarray, or xcdat, and in the way these packages interact?
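In modern NumPy, what governs this is the type-promotion rules, which NumPy exposes directly through `np.result_type` and `np.promote_types` (standard NumPy API; the specific dtype pairs below are just illustrative):

```python
import numpy as np

# Same-precision operands: no upcast
assert np.result_type(np.float32, np.float32) == np.float32

# Mixed precision promotes to the wider float
assert np.result_type(np.float32, np.float64) == np.float64

# int32 values cannot all be represented exactly in float32,
# so the promotion rules jump to float64
assert np.promote_types("float32", "int32") == np.dtype("float64")
```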
This may seem a bit philosophical, as we now have plenty of RAM available. But the size of our datasets has grown, and our end users have not grown more careful (I'd say many of them are lazier now...), so a conservative use of memory by default is still important.
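The memory argument is easy to quantify: a float32 array occupies exactly half the bytes of the equivalent float64 array, which `ndarray.nbytes` makes visible (illustrative sketch; the array size is made up):

```python
import numpy as np

big = np.zeros((1000, 1000), dtype=np.float64)
small = big.astype(np.float32)  # same shape, half the precision

assert big.nbytes == 8_000_000    # 8 bytes per element
assert small.nbytes == 4_000_000  # 4 bytes per element
```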
Loosely related use case: a PhD student asked me yesterday how to run his Python script on our cluster, because he did not have enough memory on his computer and had no time to split his Antarctica data. He later asked me about the

PBS: job killed: walltime 43304 exceeded limit 43200

error message he got. I have not seen his script yet, but reducing the memory used might allow the job to finish within the allowed time.

Are there any possible answers you came across?
No response
Minimal Complete Verifiable Example (MVCE)
No response
Relevant log output
No response
Environment
No response
Anything else we need to know?
No response