
Worker key reference counting #3641

Closed

Conversation

@jacobtomlinson (Member) commented Mar 26, 2020:

Add reference counting to allow workers to keep track of key deletions.

Avoids race conditions when keys are quickly deleted and recreated.

Fixes dask/dask#6027.
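
For illustration, here is a minimal sketch of the idea (invented names; this is not the actual change to distributed/worker.py): the worker keeps a per-key counter, re-adding a key bumps the counter, and a release only deletes the data once the counter reaches zero, so a stale delete cannot remove freshly recreated data.

class WorkerDataStore:
    """Toy model of a worker's data with per-key reference counts."""

    def __init__(self):
        self.data = {}       # key -> value
        self.refcount = {}   # key -> number of outstanding references

    def put(self, key, value):
        # (Re)adding a key bumps its reference count.
        self.data[key] = value
        self.refcount[key] = self.refcount.get(key, 0) + 1

    def release(self, key):
        # Only drop the value once every reference has been released.
        if key not in self.refcount:
            return
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            self.data.pop(key, None)


store = WorkerDataStore()
store.put("x", 1)    # first reference
store.put("x", 1)    # key quickly recreated before the old release arrives
store.release("x")   # stale release: the data survives
assert "x" in store.data
store.release("x")   # final release: the data is actually removed
assert "x" not in store.data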

@jacobtomlinson (Member, Author):

Thanks for providing the test @quasiben!

@quasiben (Member) left a comment:

Thank you @jacobtomlinson. I left two comments. I also tried running @cjnolet's reproducer and I don't think it's passing quite yet.

distributed/worker.py (review thread, resolved)


@gen_cluster(client=True)
async def test_task_unique_groups_scatter(c, s, a, b):
@quasiben (Member):

Not sure why, but occasionally when I run it I get a TimeoutError:

E tornado.util.TimeoutError: Operation timed out after 20 seconds

@jacobtomlinson (Member, Author):

How occasionally? I'm not seeing this after 10 runs.

@quasiben (Member):

I was seeing it after 3 runs on my machine

@jacobtomlinson (Member, Author):

I tried converting Corey's example into a test and now I see that TimeoutError every time. I wonder if pytest is masking the CancelledError somehow.

from distributed import wait
from distributed.utils_test import gen_cluster


@gen_cluster(client=True)
async def test_task_unique_groups_scatter_tree_reduce(c, s, a, b):
    from toolz import first

    def tree_reduce(objs):
        while len(objs) > 1:
            new_objs = []
            n_objs = len(objs)
            for i in range(0, n_objs, 2):
                inputs = objs[i : i + 2]
                obj = c.submit(sum, inputs)
                new_objs.append(obj)
            wait(new_objs)
            objs = new_objs

        return first(objs)

    for n_parts in [1, 2, 5, 10, 15]:
        a = await c.scatter(range(n_parts))
        b = tree_reduce(a)

        assert sum(range(n_parts)) == await b

        del a
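
For what it's worth, a hang that surfaces as the harness timeout rather than as the underlying cancellation is easy to reproduce outside dask. A minimal asyncio sketch, purely illustrative (gen_cluster uses Tornado's timeout machinery rather than asyncio.wait_for):

import asyncio

async def wait_forever():
    # Stands in for awaiting a future whose key was silently released:
    # no result and no error ever arrive, so the await just hangs.
    await asyncio.Event().wait()

async def harness():
    try:
        # The harness timeout is the only thing that fires, so the failure
        # surfaces as TimeoutError rather than the cancellation that caused
        # the hang in the first place.
        await asyncio.wait_for(wait_forever(), timeout=0.2)
    except asyncio.TimeoutError:
        print("harness reports TimeoutError")

asyncio.run(harness())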

Another participant:

I've been seeing this timeout exception as well. What's very strange is that even after increasing the timeout to 50s, I still get the exception long before 50s has passed. This is specifically true in the case of dask/dask#6037.

@jacobtomlinson (Member, Author):

I'm getting it almost immediately. Within a second.

@jacobtomlinson (Member, Author) commented Mar 26, 2020:

I am now successfully able to run @cjnolet's reproducer without error.

It ran a couple of times without error, but not every time.

@jacobtomlinson (Member, Author):

The test is now consistently failing because the sum-aabbcc112233 key is getting deleted. This is better than the inconsistent failures I was seeing before.

The worker reference counting seems to have helped with the scattered items being deleted, but not other tasks which depend on the scattered items. I'll also look at adding some reference counting in the scheduler to try and avoid this.
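
A rough sketch of what that scheduler-side bookkeeping could look like, with hypothetical names rather than the scheduler's real data structures: each key records every client or dependent task that still wants it, and workers are only told to delete once that set is empty.

from collections import defaultdict


class SchedulerRefs:
    """Toy model: a key is deletable only when nothing wants it anymore."""

    def __init__(self):
        self.wanted_by = defaultdict(set)  # key -> clients/tasks that want it

    def add_wanter(self, key, who):
        self.wanted_by[key].add(who)

    def remove_wanter(self, key, who):
        self.wanted_by[key].discard(who)
        if not self.wanted_by[key]:
            del self.wanted_by[key]
            return True    # safe to tell workers to delete the key
        return False       # someone still needs it


refs = SchedulerRefs()
refs.add_wanter("int-abc", "client-1")
refs.add_wanter("int-abc", "sum-task")   # a dependent task also needs it
assert refs.remove_wanter("int-abc", "client-1") is False
assert refs.remove_wanter("int-abc", "sum-task") is True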

@mrocklin (Member):

I think that the problem here might be on the client/scheduler side rather than on the worker (or at least there may be another problem there).

I think that we can delete a future on the client, then scatter the same data, and then, because these two messages are sent on different channels, the scheduler can get these messages out of order.

To demonstrate this, here is a diff:

--- a/distributed/client.py
+++ b/distributed/client.py
@@ -1132,14 +1132,17 @@ class Client(Node):
         self.close()
 
     def _inc_ref(self, key):
+        print("inc", key, self.refcount[key])
         with self._refcount_lock:
             self.refcount[key] += 1
 
     def _dec_ref(self, key):
         with self._refcount_lock:
+            print("dec", key, self.refcount[key])
             self.refcount[key] -= 1
             if self.refcount[key] == 0:
                 del self.refcount[key]
+                print("release", key)
                 self._release_key(key)
 
     def _release_key(self, key):
@@ -1149,6 +1152,7 @@ class Client(Node):
         if st is not None:
             st.cancel()
         if self.status != "closed":
+            print("key in refcount", key in self.refcount)
             self._send_to_scheduler(
                 {"op": "client-releases-keys", "keys": [key], "client": self.id}
             )
@@ -1985,6 +1989,7 @@ class Client(Node):
                 if w.scheduler.address == self.scheduler.address:
                     direct = True
 
+        print("scatter data")
         if local_worker:  # running within task
             local_worker.update_data(data=data, report=False)
 

Which results in this output:

scatter data
inc int-5c8a950061aa331153f4a172bbcbfd1b 0
inc int-c0a8a20f903a4915b94db8de3ea63195 0
inc sum-3bf879e7275058e8fc6d43c5dd9a2e3e 0
scatter data
dec sum-3bf879e7275058e8fc6d43c5dd9a2e3e 1
release sum-3bf879e7275058e8fc6d43c5dd9a2e3e
key in refcount False
dec int-c0a8a20f903a4915b94db8de3ea63195 1
release int-c0a8a20f903a4915b94db8de3ea63195
key in refcount False
dec int-5c8a950061aa331153f4a172bbcbfd1b 1
release int-5c8a950061aa331153f4a172bbcbfd1b
key in refcount False
inc int-5c8a950061aa331153f4a172bbcbfd1b 0
inc int-c0a8a20f903a4915b94db8de3ea63195 0
inc sum-3bf879e7275058e8fc6d43c5dd9a2e3e 0

We scatter the second round of data before the first release gets sent, so the client is releasing the data that it sent the second time.

For background, there are two channels of communication between the client and scheduler (or between any pair, really):

  1. self.scheduler, a ConnectionPool that is used for large messages (like scatter)
  2. self.scheduler_comm, a BatchedSend that is used for small administrative messages. This also has a 2ms delay where it groups together lots of small messages into larger packages.
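
To make the ordering hazard concrete, here is a toy, dask-free simulation of those two channels (all names invented for illustration; this is not how Client, Scheduler, or BatchedSend are implemented):

import asyncio

# Toy model: "scatter" goes over the direct channel and is applied
# immediately, while "release" goes into a batched buffer that is flushed
# a moment later, like BatchedSend's small batching delay. The "scheduler"
# here is just an ordered log of what it saw.

log = []
release_buffer = []


async def scatter(key):
    log.append(("scatter", key))   # direct channel: applied immediately


def release(key):
    release_buffer.append(key)     # batched channel: deferred


async def flush_releases():
    await asyncio.sleep(0.002)     # stand-in for the batching delay
    for key in release_buffer:
        log.append(("release", key))
    release_buffer.clear()


async def main():
    await scatter("x")    # first scatter
    release("x")          # client drops its future; the release is buffered
    await scatter("x")    # same key scattered again, arrives immediately
    await flush_releases()
    # The release meant for the first scatter lands after the second one,
    # so the "scheduler" believes the freshly scattered data was released.
    print(log)  # [('scatter', 'x'), ('scatter', 'x'), ('release', 'x')]


asyncio.run(main())

The delayed channel reorders the release past the second scatter, which matches the inc/dec trace shown above.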

@mrocklin (Member):

One approach here would be to stop reference counting on the client, and instead keep a list of Future identifiers on the scheduler. So if a client had two different Futures pointing to the same key then the scheduler would be aware of that fact, and would track the deletion of each future independently.

Otherwise there is a period where the user has requested that data exists, but we haven't incremented the reference counter yet. This way we've stated that yes, these keys are important to us, even while we wait for the data to transfer. This stops the client from sending an erroneous release message to the scheduler.
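
Sketched abstractly, that approach might look like the following (hypothetical helper names, not Client.scatter's real code): claim the keys before awaiting the transfer, so a racing release of an older future with the same key can never drop the count to zero mid-scatter.

async def scatter_with_early_refs(client_refs, keys, do_transfer):
    # Bump the client-side reference counts *before* the data moves.
    for key in keys:
        client_refs[key] = client_refs.get(key, 0) + 1
    try:
        await do_transfer(keys)   # only now move the data to the workers
    except Exception:
        # Roll back the claims if the transfer never happened.
        for key in keys:
            client_refs[key] -= 1
        raise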
@mrocklin (Member):

> One approach here would be to stop reference counting on the client, and instead keep a list of Future identifiers on the scheduler

After trying this for a bit I'm not sure that it's a good idea. In any event, I think that only one of the scheduler and the client should be authoritative about whether a future is active. My guess now is that this is the client. Leaving stale client-releases-keys messages pending in the scheduler_comm.buffer is maybe something that we shouldn't allow.
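
If the client is to stay authoritative, one concrete option would be to prune any queued client-releases-keys message for a key that becomes wanted again before the batched buffer is flushed. A sketch under that assumption (plain data structures, not the BatchedSend API):

def prune_stale_releases(buffer, rereferenced_keys):
    """Drop queued release messages for keys the client wants again."""
    kept = []
    for msg in buffer:
        if msg.get("op") == "client-releases-keys":
            msg["keys"] = [k for k in msg["keys"] if k not in rereferenced_keys]
            if not msg["keys"]:
                continue   # nothing left to release; drop the message
        kept.append(msg)
    return kept


pending = [
    {"op": "client-releases-keys", "keys": ["int-5c8a95"], "client": "c1"},
    {"op": "some-other-message"},
]
pending = prune_stale_releases(pending, {"int-5c8a95"})
assert all(m.get("op") != "client-releases-keys" for m in pending)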

kumarprabhu1988 added a commit to kumarprabhu1988/distributed that referenced this pull request Jun 4, 2020
Base automatically changed from master to main March 8, 2021 19:04
@martindurant (Member):

This seems to have gone stale, suggest closing


Successfully merging this pull request may close these issues.

Getting concurrent.futures._base.CancelledError from simple binary tree built from futures