[Distributed] Worker local race condition between put! and fetch for Futures #42339
can we lock `r::Future` here?
---

didn't add a lock here since it's supposed to be a quick cache check
---

I meant here exactly, as in after the quick lookup.

For (my own) reference on performance: (note: that last lock `lk` needs to be optimized, to be on par with the other `lk`s, but we didn't previously have the ability to express atomic operations in Julia)
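As a side note on the atomics mentioned above: Julia (since 1.7) can declare per-field atomics directly, which is what makes a set-once compare-and-swap expressible. A minimal illustrative sketch, with hypothetical names, not the actual Distributed.jl types:

```julia
# Illustrative struct with an atomic field; only the first writer's
# compare-and-swap succeeds, so the cache is set exactly once.
mutable struct OnceCache
    @atomic v::Union{Some{Int},Nothing}
end

c = OnceCache(nothing)
# @atomicreplace returns (old = ..., success = Bool).
ok1 = (@atomicreplace c.v nothing => Some(1)).success
ok2 = (@atomicreplace c.v nothing => Some(2)).success
# ok1 is true, ok2 is false; c.v still holds Some(1).
```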
---

If I put the whole thing in a `lock(r.lock)`, then a local `put!` will never be able to enter the lock, because the `fetch` will be stuck on the `fetch(rv.c)`. And in `put!`, if I put into the channel outside of the lock, I don't have control over who caches the local value first.

I'm not sure what else I could improve here. The local cache is always set once. The paths from `fetch(rv.c)` and `call_on_owner` will rightfully try to cache the value, and only fail if it was already cached or `put!` happened locally.
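The deadlock described in that comment can be modeled with a self-contained sketch. `MiniFuture` here is a stand-in, with `c` playing the role of `rv.c` and `v` the role of the `r.v` cache; none of this is the actual Distributed.jl code:

```julia
# Model of the deadlock: naive_fetch blocks on take! while holding
# the lock, so a concurrent naive_put! on the same object can never
# acquire that lock, and the value it would deliver never arrives.
mutable struct MiniFuture
    lock::ReentrantLock
    c::Channel{Int}               # stands in for rv.c
    v::Union{Some{Int},Nothing}   # stands in for the r.v cache
end
MiniFuture() = MiniFuture(ReentrantLock(), Channel{Int}(1), nothing)

function naive_fetch(f::MiniFuture)
    lock(f.lock) do
        f.v === nothing || return something(f.v)
        x = take!(f.c)            # blocks while f.lock is still held...
        f.v = Some(x)
        return x
    end
end

function naive_put!(f::MiniFuture, x)
    lock(f.lock) do               # ...so this is never granted: deadlock
        put!(f.c, x)
        f.v = Some(x)
    end
end
```

Run sequentially (put! first, then fetch) the pair works; the deadlock only appears when a fetcher is already parked inside the lock waiting on the channel.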
---

Okay, yeah, I see how it is complicated by the fact that we might be the owner, so we are waiting for someone else to set the value before we can return it here. But it seems like if this fails in the remote case, it is because there was a thread-synchronization error, which could happen because this lock was not being held across here.
---

the `call_on_owner` case won't fail due to this lock, because it fetches on a remote channel; this `fetch` function isn't used in the `call_on_owner` scenario.

This looks safe to me as it is right now because of that, but I haven't delved that deep into the remote scenarios.
---

This could probably be locked, so that `call_on_owner` only gets called once, but I'm not sure if there are any hidden implications of doing that.
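That idea is essentially a double-checked cache: a quick lockless check, then a re-check under the lock so only the first fetcher performs the remote call. A runnable sketch with hypothetical stand-ins (`call_on_owner`, `fetch_ref`, and the counter exist only to make this self-contained); whether holding the lock across the remote call interacts badly with a concurrent local `put!` is exactly the open question in this thread:

```julia
# Hypothetical stand-ins for the remote machinery, counting calls.
const REMOTE_CALLS = Ref(0)
fetch_ref(args...) = nothing
call_on_owner(f, r) = (REMOTE_CALLS[] += 1; 123)

mutable struct CachedRef
    lock::ReentrantLock
    v::Union{Some{Int},Nothing}
end
CachedRef() = CachedRef(ReentrantLock(), nothing)

function fetch_once(r::CachedRef)
    r.v === nothing || return something(r.v)  # quick cache check, no lock
    lock(r.lock) do
        if r.v === nothing                    # re-check under the lock
            r.v = Some(call_on_owner(fetch_ref, r))
        end
        return something(r.v)
    end
end
```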
---

@vtjnash Just not sure here. So when a fetch gets the value from `call_on_owner` at line 573, we cache the value in `r.v`. Should fetch then return the `v_local` obtained from `call_on_owner`, or should I load `r.v` and return that cached value?

Right now it's just returning `v_local`.
---

I am not sure here. Since `fetch(rv.c)` succeeded, it seems like that set the value for everyone, but now we might decide to copy it locally too. But then we must have strictly maintained that `rv.v === r.v`, since `r.v` is just a locally cached copy of the same pointer from `rv.v` (which has the canonical copy) in this case. Is that right?
---

Yes, that's correct. The `rv.v === r.v` invariant is maintained properly right now, I think.
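The invariant being agreed on here is pointer identity, not equality: caching must store the same object, never a copy. A trivial illustrative sketch (the variables merely play the roles of `rv.v` and `r.v`):

```julia
# Caching by assignment stores the same pointer, so === holds between
# the canonical value and the local cache.
x = [1, 2, 3]
rv_v = Some(x)   # canonical value (role of rv.v)
r_v  = rv_v      # local cache keeps the same pointer (role of r.v)
@assert r_v === rv_v                # the rv.v === r.v invariant
@assert something(r_v) === x        # no copy of the payload either
```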