Annotate `Task*` objects for Cythonization #4302
I'm excited about seeing this one in. It's probably too early, but I'd be
curious to see how this changes profiles, if at all.
…On Wed, Dec 2, 2020, 8:09 PM jakirkham ***@***.***> wrote:
Requires PR ( #4294 )
Analogous to PR ( #4290 ) ( #4294 ) except this is
annotating TaskState and usages thereof. This is a bit longer simply
because of how many attributes TaskState has and how frequently it is
used. That said, it follows the same pattern as was seen with the other two
and so shouldn't be too surprising.
As TaskState and WorkerState overlap a lot, this just builds on top of PR
( #4294 ) to avoid conflicts. Will rebase this after PR ( #4294 ) is in.
Though PR ( #4294 ) should be reviewed and merged before worrying about
this one 😉
Commit Summary
- Use `ws` variable name for `WorkerState` objects
- Name `WorkerState` variable distinctly in closure
- Assign selected `WorkerState` to variable
- Annotate `WorkerState` for Cythonization
- Annotate all `WorkerState` variables
- Add Python-level `property`s for attributes
- Add some `property.setter`s
- Create `list` from generator
- Run `black`
- Relax `_address` to `object`
- Relax `_name` to `object`
- Use `tsp` variable name for `TaskStreamPlugin`s
- Use `dts` for iterated `TaskState` variables
- Create `list` from generator
- Use `-1` as `TaskState.nbytes` default
- Assign `TaskState` instances to variables
- Annotate `TaskState` for Cythonization
- Annotate all `TaskState` variables
- Use closure to access `TaskState.priority`
- Add `_` before all `TaskState` attributes
- Use `_` prefixed `TaskState` attributes throughout
- Add Python-level `property`s for attributes
- Add some `property.setter`s
- Drop recently added `TaskState.priority` closures
File Changes
- *M* distributed/scheduler.py <https://github.com/dask/distributed/pull/4302/files#diff-bbcf2e505bf2f9dd0dc25de4582115ee4ed4a6e80997affc7b22122912cc6591> (1934)
Patch Links:
- https://github.com/dask/distributed/pull/4302.patch
- https://github.com/dask/distributed/pull/4302.diff
Both PR ( #4294 ) and this help based on my own local profiling and looking at the call graphs. Though the other thing they do is unlock further optimizations in the Scheduler transition methods. Since all of those methods are just working with these 3 objects Even just the Anyways if you have time to review PR ( #4294 ) tomorrow that will be really helpful. This PR is 95% there, but there is one or two more things that it may still need. So will be working on them as well. |
Some minor comments here. In general all of this looks straightforward.
I'm curious about a couple of the types, but those can wait as well.
Also beware, you're likely to get a few conflicts when we merge in the annotations work #4279
_key: str
_hash: Py_hash_t
_prefix: object
_run_spec: object
bytes maybe
Interesting. This came from the docstring above. Later in the code it was implied this could be a `dict`, but might not be. Seems like it is not very clear what it may be so perhaps it is best to leave as `object` for now.
_prefix: object
_run_spec: object
_priority: tuple
_state: str
Should we make this into an enum? Would Cython prefer that?
Cython can't do much with a Python `Enum`. There are `enum`s in Cython that it does expose to the Python layer optionally. Though these are not available in pure Python mode unfortunately ( cython/cython#3923 ).
For now I think it is ok to just type this as a `str` as part of our broad effort to type things. We can then revisit specifics once everything is in and we have had an opportunity to profile more closely.
Sounds good. We check and change state pretty often. I wouldn't be surprised if this has some effect in the future.
Are pure Python enums something that would make sense to request upstream, or is this likely to be hard to achieve with cython?
Yeah it may. I think line profiling after this lands makes sense and should hopefully guide us to which of several things we should further optimize. It could be `Status` or it could be something else.
Seems like a reasonable request to me (though maybe I'm biased as I already made that request 😄). Have no idea if it will be easy or not. Certainly hope it is doable.
In any event, I think at a first pass having `str` here is fine. Am more interested in broadly typing things first and worrying about more tuning in subsequent PRs after more profiling.
One other thing I forgot to note, Cython does `intern` any `str` literals it sees by default as part of module initialization (IOW at `import` time), which it hangs onto globally (so they don't go out of scope). That way, when we do something like `ts._state = "running"`, Cython will already have `intern`ed `"running"` and assigned the `intern`ed `str` variable to `ts._state`. Further when we later do something like `assert ts._state == "running"`, this `"running"` will also refer to the same `intern`ed `str`. As a result comparisons of `str`s defined with literals should benefit from the `intern` speedup.
Though there is a caveat. This does not apply for any dynamically generated `str`s. So `"My name is %s" % s` will construct a `str` at runtime that is not `intern`ed.
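To make the interning point concrete, here is a minimal sketch in plain CPython (no Cython involved). It uses `sys.intern` explicitly as a stand-in for what is described above for literals; the reduced `TaskState` class and the `suffix` variable are hypothetical and only there for illustration.

```python
import sys

# Explicitly interned string, standing in for a literal interned at import time.
RUNNING = sys.intern("running")


class TaskState:  # hypothetical, stripped-down stand-in for the real class
    def __init__(self):
        self._state = RUNNING


ts = TaskState()

# Interning the same text again hands back the very same object,
# so an equality check can succeed on identity alone.
literal_is_same_object = ts._state is sys.intern("running")

# A string assembled at runtime has the same value but is a separate object.
suffix = "ning"
dynamic = "run%s" % suffix
dynamic_equal = dynamic == RUNNING
dynamic_is_same_object = dynamic is RUNNING

# Interning the runtime string maps it back onto the shared object.
dynamic_after_intern = sys.intern(dynamic) is RUNNING
```

The runtime-built string still compares equal, it just cannot take the identity shortcut until it is interned.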
Since this just includes the |
I don't have a preference where the work is done. It'd be nice to merge one and then the other for git history. It was easier for me to comment here just because everything was here. |
Understood. If we'd like to merge two separate PRs, would suggest to have That said, if we find it simply more convenient to work here, it may make more sense to close out that PR and do all the work here. It probably gets harder to disambiguate where changes apply if we start pushing both |
Copied the |
f89eaaa to aaeeb9c
Rebased on |
_resource_restrictions: dict
_loose_restrictions: bool
_metadata: dict
_annotations: dict
Also included changes related to annotations from PR ( #4279 ). AFAICT this is just a `dict`, but please let me know if I'm missing something.
6198ee2 to bdafb41
Do we want to handle Cythonization of Also would be good to get your thoughts on Thanks for the feedback thus far 🙂 |
No preference on TaskPrefix/TaskGroup. I think that you probably care more about git history than I do. It would be good to make |
Was more concerned about how that affects the review load for you. No strong feelings on commit history from me. Ok let's skip |
Review of this stuff is pretty simple. There isn't any tricky logic to go over. I'm generally happy. |
Found I was having issues getting Cython to use the type definitions correctly when quoted. So am proposing just swapping the ordering of |
842be89 to 182ece9
Need to flip the order of |
Now that we have properties accessible from `TaskState`, drop the closures we added previously to access the typed values internally. This should be equivalently performant and cut out a little bit of boilerplate.
Make sure to assign to the `TaskPrefix` variable, `tp`, first before assigning to the `dict`. This should avoid the admittedly likely low overhead of looking up the result in the dictionary when we already have the value available.
Saves us the need to fetch this twice. Also makes the code a bit more readable. Finally, this may allow Cython optimizations on the variable later.
Instead of using `None` for `TaskPrefix.duration_average`, set it to `-1`. This works better when typing `TaskPrefix.duration_average` as it can always be floating point. This also works logically with this value as it can't actually be negative unless it wasn't defined. Rework the logic around this variable to ensure it is non-negative once it has been set.
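As a rough illustration of the `-1` sentinel idea, the sketch below keeps `duration_average` as a float at all times and treats any negative value as "not yet defined". The reduced `TaskPrefix` class, the `add_duration` method name, and the equal-weight blending rule are assumptions made for this example, not a copy of the scheduler code.

```python
class TaskPrefix:  # hypothetical, reduced stand-in for the scheduler class
    def __init__(self, name):
        self.name = name
        # A negative value means "no duration recorded yet"; the attribute
        # is always a float, never None.
        self._duration_average = -1.0

    @property
    def duration_average(self):
        return self._duration_average

    def add_duration(self, duration):
        # Assumed update rule for illustration: take the first sample as-is,
        # then blend each new sample equally with the running value.
        old = self._duration_average
        if old < 0:
            self._duration_average = float(duration)
        else:
            self._duration_average = 0.5 * duration + 0.5 * old


tp = TaskPrefix("inc")
unset_before = tp.duration_average < 0
tp.add_duration(2.0)
first = tp.duration_average
tp.add_duration(4.0)
second = tp.duration_average
```

Callers check `duration_average < 0` instead of `is None`, which keeps the comparison purely numeric.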
This allows Cython to perform C-level optimizations on these variables and usages thereof.
Make sure to assign to the `TaskGroup` variable, `tg`, first before assigning to the `dict`. This should avoid the admittedly likely low overhead of looking up the result in the dictionary when we already have the value available.
This ensures Cython still uses `TaskGroup` to annotate the variable iterated over. Otherwise it constructs a generator with its own scope where this is ignored.
Ok this should be good to go. Please let me know if anything else is needed 🙂 |
I looked over the scheduler and seems like you are following the same techniques you laid out earlier -- _var with type and property decorators for Python access of the variables. Thanks @jakirkham !! Merging in now |
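For readers skimming the thread, the `_var` plus `property` pattern mentioned above might look roughly like the following. This is a plain-Python sketch under stated assumptions: the class is a reduced, hypothetical stand-in with only three attributes, and any Cython-specific declarations are left out so that it runs as ordinary Python.

```python
class TaskState:  # hypothetical, reduced stand-in with only a few attributes
    # Underscore-prefixed, annotated attributes hold the typed values.
    _key: str
    _state: str
    _nbytes: int

    def __init__(self, key: str, state: str):
        self._key = key
        self._state = state
        self._nbytes = -1  # -1 as the "unknown size" default

    # Read-only property: Python callers keep using `ts.key`.
    @property
    def key(self):
        return self._key

    # Property with a setter for attributes that are assigned from outside.
    @property
    def state(self):
        return self._state

    @state.setter
    def state(self, value: str):
        self._state = value

    @property
    def nbytes(self):
        return self._nbytes


ts = TaskState("x", "waiting")
ts.state = "running"

try:
    ts.key = "y"  # no setter defined, so this raises AttributeError
    key_is_read_only = False
except AttributeError:
    key_is_read_only = True
```

Internal code can touch `ts._state` directly, while existing Python callers go through `ts.state` unchanged.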
Thanks Ben! 😄 If anything else comes up, happy to follow up in a new issue/PR as appropriate. |
Analogous to PR ( #4290 ) ( #4294 ) except this is annotating `TaskState`, `TaskPrefix`, and `TaskGroup`. Plus all usages thereof. This is a bit longer simply because of how many attributes `TaskState` has and how frequently it is used. That said, it follows the same pattern as was seen with the other two and so shouldn't be too surprising.