Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate check_idle_saturated for Cythonization #4289

Merged
merged 5 commits into from
Dec 1, 2020

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Dec 1, 2020

In our profiling of the scheduler, we identified to main transitions that took a good chunk of time. These are transition_processing_memory and transition_waiting_processing. The former takes slightly longer than the other, but both are easily 2x slower than any transition that follows them. Both of them directly or indirectly make a call to check_idle_saturated. While this is not necessarily the worst bottleneck for either of them, it does stick out on the callgraph and stands a good chance of improving both transitions runtimes at once. Additionally check_idle_saturated includes a fair bit of code that simply crunches numbers and does not touch Python objects as much. It also isn't as dependent on the Cythonization of other classes as other functions in the profile are. So this makes it easier to Cythonize this piece of code without needing to touch too much other code.

Here we go through and annotate the local variables with types. Also we assign non-local variables accessed through attributes to local variables, which we type. Additionally we changed default argument values to be more friendly with C-style types. Initially we tried to type self.idle. However as self.idle is a sortedcontainers.SortedSet, this didn't work (as we need an actual Python set to type it). So we left this as untyped.

Should add as a good chunk of the time in check_idle_saturated is just spent doing ws.status == Status.closed, we still need PR ( #4270 ) to cutdown on the time spent in this method.

Combine assignments and separate them from `if` blocks.
To make it easier to type `occ` later, define this as a non-`None`
value, which is also clearly bogus (namely `-1.0`). That way we can
still replace this value when it hasn't been otherwise set while also
including a type that includes the default value.
For now to just see what is possible, assign these attributes to local
variables typed as `set`s. This will allow Cython to use the
corresponding Python C APIs with these objects; thus, optimizing how
they are handled. This saves us temporarily from trying to more
generally Cythonize this class while exploring optimizations here.
As `self.idle` is actually a `SortedSet` instead of a `set`, we can't
type it. So revert the typing of `idle`. Though leave all other
`set`-based typing for other optimizations to be applied where possible.
Also leave the assignment of `self.idle` to `idle` as this generates the
attribute access code only once and we need this for both branches
anyways.
@jakirkham
Copy link
Member Author

cc @mrocklin @quasiben

@mrocklin
Copy link
Member

mrocklin commented Dec 1, 2020

The sortedcontainers issue is interesting. It is convenient for scheduling logic to keep that around, but I'll be curious to learn how much that hurts us long-term.

The one other issue to consider is if we want to start including cython/ctypes level typing in the scheduler. This is still an experimental effort and so maybe a step backwards (readability gets mangled a bit). I think it's probably still ok because it'll be easy to strip out later on if we decide that to go in a different direction. I'm cc'ing @jcrist though, who I think is the person most likely to object to adding annotations like these.

@jakirkham
Copy link
Member Author

The Enum comparison presents a bigger issue ( #4270 ).

Sure. Would be interested to get thoughts here and PR ( #4290 ), Jim :)

@jcrist
Copy link
Member

jcrist commented Dec 1, 2020

I'm cc'ing @jcrist though, who I think is the person most likely to object to adding annotations like these.

No objections from me. I find running mypy in CI a bit onerous, but adding annotations for optional cythonization seems totally fine.

@mrocklin
Copy link
Member

mrocklin commented Dec 1, 2020

OK, in that case this looks good to me.

@jakirkham I'm happy to merge this. I notice that you still have the WIP flag up so I'll wait a bit unless you want to give the go-ahead. Alternatively, please merge at will.

@jakirkham jakirkham changed the title [WIP] Annotate check_idle_saturated for Cythonization Annotate check_idle_saturated for Cythonization Dec 1, 2020
@jakirkham jakirkham marked this pull request as ready for review December 1, 2020 20:43
@jakirkham
Copy link
Member Author

Thanks all! 😄

I think this is ready. May come back to this function after we have typed some of the *State objects as well, but that isn't a blocker for this change (just a future improvement 😉).

@mrocklin mrocklin merged commit db7de30 into dask:master Dec 1, 2020
@jakirkham jakirkham deleted the cy_check_idle_saturated branch December 1, 2020 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants