Extracting a time range from a cube is slow #4957

Closed
bouweandela opened this issue Sep 9, 2022 · 5 comments · Fixed by #4969

Comments

@bouweandela

bouweandela commented Sep 9, 2022

📰 Custom Issue

Extracting a time range as described in the documentation is slow when applied to many cubes and/or cubes with many time points. For a single cube with 10000 time points it already takes about 2 seconds on my computer, so subsetting a few hundred cubes takes several minutes.

Here is a script that demonstrates this:

import cf_units
import iris.cube
import iris.coords
import iris.time
import numpy as np

time_units = cf_units.Unit('days since 1850-01-01', calendar='standard')
time = iris.coords.DimCoord(np.arange(10000, dtype=np.float64), standard_name='time', units=time_units)
cube = iris.cube.Cube(np.arange(10000, dtype=np.float32))
cube.add_dim_coord(time, 0)
pdt1 = iris.time.PartialDateTime(year=1852)
pdt2 = iris.time.PartialDateTime(year=1854)
constraint = iris.Constraint(time=lambda cell: pdt1 <= cell.point < pdt2)

%timeit cube.extract(constraint)

Result:

1.83 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Looking at the code in iris.coords, the slowdown appears to come from converting the time points to datetimes individually for each cell, instead of converting all of them once and then generating the cells.

Here are timings for the two approaches, first converting all points in a single call and then converting them one at a time:

%timeit time.units.num2date(time.points)
27.3 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

and

%timeit list(time.units.num2date(p) for p in time.points)
1.53 s ± 29.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

If this is of interest, I can make a pull request that changes the code so it first converts all the time points and then generates the cells.
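
Until something like that lands, one possible workaround is to avoid converting the points at all: convert the two range boundaries to numbers once and compare the raw numeric points against them. The following is only a minimal standard-library sketch of that idea, not the change proposed above and not iris code. The names `EPOCH`, `date_to_days` and `select_time_range` are made up for illustration, and it assumes the 'days since 1850-01-01' reference from the script and dates late enough that Python's proleptic Gregorian `datetime` agrees with the 'standard' calendar.

```python
from datetime import datetime, timedelta

# Reference date taken from the units string 'days since 1850-01-01'.
# Assumption: all dates are after the 1582 calendar switch, so stdlib
# datetime arithmetic matches the 'standard' calendar.
EPOCH = datetime(1850, 1, 1)


def date_to_days(date):
    """Convert a datetime to (fractional) days since EPOCH."""
    return (date - EPOCH) / timedelta(days=1)


def select_time_range(points, start, end):
    """Return the indices of points with start <= point < end.

    Only the two boundaries are converted; every point is compared
    as a plain number, so no per-point datetime conversion happens.
    """
    lower = date_to_days(start)
    upper = date_to_days(end)
    return [i for i, p in enumerate(points) if lower <= p < upper]


# Same time axis as in the script above: 10000 daily points.
points = [float(i) for i in range(10000)]
indices = select_time_range(points, datetime(1852, 1, 1), datetime(1854, 1, 1))
print(indices[0], indices[-1], len(indices))  # 730 1460 731
```

The resulting indices could then be used to slice the cube along its time dimension. This only works when the range boundaries can be expressed as full dates in the coordinate's units, so it does not cover everything a PartialDateTime comparison can do.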

@rcomer

rcomer commented Sep 10, 2022

This was previously raised in #3609, which went stale. So I think this is a desirable feature that no one has got around to addressing yet.

@bjlittle

Fancy taking it on @rcomer ? 😉

@rcomer

rcomer commented Sep 14, 2022

I think @bouweandela was offering to put something up, and has clearly already given it more thought than I have!

@bouweandela

Yes, I have already tried to implement something. I'll open a pull request and we can take it from there.

@bouweandela

Just opened a pull request here: #4969
