line_profiler results on 4 workers (w/o stealing) over 20 iterations #20

Open
jakirkham opened this issue Nov 23, 2020 · 26 comments
@jakirkham
Collaborator

jakirkham commented Nov 23, 2020

Using the changes in PR dask/distributed#4265 and running the shuffle.py benchmark for 20 iterations with 4 workers, as shown in PR #14, here are the results (they were too large to include inline, so they are attached as a text file). Note that these results come from multiple processes, which are broken out separately in the result file.

Edit: I should add that some methods may not run in every case. The profile reports these with no time spent in them (Total time: 0 s). These can be ignored.
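For anyone skimming the attached file, here is a rough sketch of how one might list only the functions that actually ran, ranked by total time. It assumes the report uses the same `Total time:` / `File:` / `Function:` header layout as the text embedded in this thread; `function_totals` and `REPORT` are hypothetical names made up for this illustration, not part of line_profiler.

```python
import re

# Matches the three-line header that precedes each function's table.
HEADER = re.compile(
    r"Total time: (?P<total>[\d.eE+-]+) s\n"
    r"File: (?P<file>.*)\n"
    r"Function: (?P<func>\w+) at line (?P<line>\d+)"
)


def function_totals(report):
    """Map function name -> total seconds, skipping entries with no time recorded."""
    totals = {}
    for m in HEADER.finditer(report):
        seconds = float(m["total"])
        if seconds > 0:  # "Total time: 0 s" entries are the ones to ignore
            totals[m["func"]] = seconds
    return totals


# Headers copied from the profile in this thread (tables omitted for brevity).
REPORT = """\
Total time: 27.2099 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4014

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_waiting at line 4066

Total time: 114.266 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4151
"""

totals = function_totals(REPORT)
for name, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.1f} s")
```

This only reads the headers, so it says nothing about where the time goes inside each function.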

@jakirkham
Collaborator Author

Taking a look at transition_processing_memory as an example (since a fair bit of time is spent there), we find the calls to _remove_from_processing and _add_to_memory together take ~55% of that function's runtime (22.6% and 32.4% respectively). The rest is spread roughly evenly across the other lines.
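To make that kind of reading repeatable, here is a minimal sketch that pulls the timed rows out of a line_profiler table and sums the `% Time` of the hottest ones. It assumes the column layout shown in this thread (Line #, Hits, Time, Per Hit, % Time, Line Contents); `parse_rows`, `hottest`, and `SAMPLE` are hypothetical names for illustration only, and the sample rows are copied from the transition_processing_memory table.

```python
import re

# A timed row has five numeric columns followed by the source text.
# Rows without timings (lines that never executed) do not match.
ROW = re.compile(
    r"^\s*(?P<lineno>\d+)\s+(?P<hits>\d+)\s+(?P<time>\d+(?:\.\d+)?)\s+"
    r"(?P<per_hit>\d+(?:\.\d+)?)\s+(?P<pct>\d+(?:\.\d+)?)\s+(?P<src>\S.*)$"
)


def parse_rows(text):
    """Return (lineno, time_in_timer_units, pct, source) for each timed row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(
                (int(m["lineno"]), float(m["time"]), float(m["pct"]), m["src"].strip())
            )
    return rows


def hottest(text, n=2):
    """Return the n rows with the largest Time column, largest first."""
    return sorted(parse_rows(text), key=lambda r: r[1], reverse=True)[:n]


SAMPLE = """\
  4320   1411200    1378988.0      1.0      1.1              if nbytes is not None:
  4321   1411200    5390384.0      3.8      4.2                  ts.set_nbytes(nbytes)
  4323   1411200    1392218.0      1.0      1.1              recommendations = {}
  4325   1411200   29274900.0     20.7     22.6              self._remove_from_processing(ts)
  4327   1411200   42100296.0     29.8     32.4              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4330                                                           assert not ts.processing_on
"""

top = hottest(SAMPLE)
share = sum(pct for _, _, pct, _ in top)
print([src for *_, src in top], round(share, 1))
```

The regex would misread an untimed source line that happens to begin with five numbers, which seems unlikely for Python code but is worth keeping in mind.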

@jakirkham
Collaborator Author

jakirkham commented Nov 23, 2020

Most of the relevant bits seem to be captured in this particular profile. Have embedded the text below for easier perusing :)

Results from prof_15384.lstat:

Timer unit: 1e-06 s

Total time: 27.2099 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4014

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4014                                               @profile
  4015                                               def transition_released_waiting(self, key):
  4016   1411200     882228.0      0.6      3.2          try:
  4017   1411200    1012036.0      0.7      3.7              ts = self.tasks[key]
  4018                                           
  4019   1411200     915500.0      0.6      3.4              if self.validate:
  4020                                                           assert ts.run_spec
  4021                                                           assert not ts.waiting_on
  4022                                                           assert not ts.who_has
  4023                                                           assert not ts.processing_on
  4024                                                           assert not any(dts.state == "forgotten" for dts in ts.dependencies)
  4025                                           
  4026   1411200     980521.0      0.7      3.6              if ts.has_lost_dependencies:
  4027                                                           return {key: "forgotten"}
  4028                                           
  4029   1411200    2254299.0      1.6      8.3              ts.state = "waiting"
  4030                                           
  4031   1411200     880052.0      0.6      3.2              recommendations = {}
  4032                                           
  4033   3864960    2592780.0      0.7      9.5              for dts in ts.dependencies:
  4034   2453760    1542102.0      0.6      5.7                  if dts.exception_blame:
  4035                                                               ts.exception_blame = dts.exception_blame
  4036                                                               recommendations[key] = "erred"
  4037                                                               return recommendations
  4038                                           
  4039   3864960    2493782.0      0.6      9.2              for dts in ts.dependencies:
  4040   2453760    1656881.0      0.7      6.1                  dep = dts.key
  4041   2453760    1639909.0      0.7      6.0                  if not dts.who_has:
  4042   2453760    1872609.0      0.8      6.9                      ts.waiting_on.add(dts)
  4043   2453760    2161532.0      0.9      7.9                  if dts.state == "released":
  4044                                                               recommendations[dep] = "waiting"
  4045                                                           else:
  4046   2453760    1906281.0      0.8      7.0                      dts.waiters.add(ts)
  4047                                           
  4048   1411200    2569667.0      1.8      9.4              ts.waiters = {dts for dts in ts.dependents if dts.state == "waiting"}
  4049                                           
  4050   1411200     962346.0      0.7      3.5              if not ts.waiting_on:
  4051     34560      24958.0      0.7      0.1                  if self.workers:
  4052     34560      25455.0      0.7      0.1                      recommendations[key] = "processing"
  4053                                                           else:
  4054                                                               self.unrunnable.add(ts)
  4055                                                               ts.state = "no-worker"
  4056                                           
  4057   1411200     836925.0      0.6      3.1              return recommendations
  4058                                                   except Exception as e:
  4059                                                       logger.exception(e)
  4060                                                       if LOG_PDB:
  4061                                                           import pdb
  4062                                           
  4063                                                           pdb.set_trace()
  4064                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_waiting at line 4066

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4066                                               @profile
  4067                                               def transition_no_worker_waiting(self, key):
  4068                                                   try:
  4069                                                       ts = self.tasks[key]
  4070                                           
  4071                                                       if self.validate:
  4072                                                           assert ts in self.unrunnable
  4073                                                           assert not ts.waiting_on
  4074                                                           assert not ts.who_has
  4075                                                           assert not ts.processing_on
  4076                                           
  4077                                                       self.unrunnable.remove(ts)
  4078                                           
  4079                                                       if ts.has_lost_dependencies:
  4080                                                           return {key: "forgotten"}
  4081                                           
  4082                                                       recommendations = {}
  4083                                           
  4084                                                       for dts in ts.dependencies:
  4085                                                           dep = dts.key
  4086                                                           if not dts.who_has:
  4087                                                               ts.waiting_on.add(dts)
  4088                                                           if dts.state == "released":
  4089                                                               recommendations[dep] = "waiting"
  4090                                                           else:
  4091                                                               dts.waiters.add(ts)
  4092                                           
  4093                                                       ts.state = "waiting"
  4094                                           
  4095                                                       if not ts.waiting_on:
  4096                                                           if self.workers:
  4097                                                               recommendations[key] = "processing"
  4098                                                           else:
  4099                                                               self.unrunnable.add(ts)
  4100                                                               ts.state = "no-worker"
  4101                                           
  4102                                                       return recommendations
  4103                                                   except Exception as e:
  4104                                                       logger.exception(e)
  4105                                                       if LOG_PDB:
  4106                                                           import pdb
  4107                                           
  4108                                                           pdb.set_trace()
  4109                                                       raise

Total time: 114.266 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4151

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4151                                               @profile
  4152                                               def transition_waiting_processing(self, key):
  4153   1411200    1048041.0      0.7      0.9          try:
  4154   1411200    1175460.0      0.8      1.0              ts = self.tasks[key]
  4155                                           
  4156   1411200    1033032.0      0.7      0.9              if self.validate:
  4157                                                           assert not ts.waiting_on
  4158                                                           assert not ts.who_has
  4159                                                           assert not ts.exception_blame
  4160                                                           assert not ts.processing_on
  4161                                                           assert not ts.has_lost_dependencies
  4162                                                           assert ts not in self.unrunnable
  4163                                                           assert all(dts.who_has for dts in ts.dependencies)
  4164                                           
  4165   1411200   34630891.0     24.5     30.3              ws = self.decide_worker(ts)
  4166   1411200    1149730.0      0.8      1.0              if ws is None:
  4167                                                           return {}
  4168   1411200    1086274.0      0.8      1.0              worker = ws.address
  4169                                           
  4170   1411200    2886642.0      2.0      2.5              duration = self.get_task_duration(ts)
  4171   1411200    4862507.0      3.4      4.3              comm = self.get_comm_cost(ts, ws)
  4172                                           
  4173   1411200    1782616.0      1.3      1.6              ws.processing[ts] = duration + comm
  4174   1411200    1109006.0      0.8      1.0              ts.processing_on = ws
  4175   1411200    1438232.0      1.0      1.3              ws.occupancy += duration + comm
  4176   1411200    1369699.0      1.0      1.2              self.total_occupancy += duration + comm
  4177   1411200    3457038.0      2.4      3.0              ts.state = "processing"
  4178   1411200    2057449.0      1.5      1.8              self.consume_resources(ts, ws)
  4179   1411200   18874418.0     13.4     16.5              self.check_idle_saturated(ws)
  4180   1411200    1554932.0      1.1      1.4              self.n_tasks += 1
  4181                                           
  4182   1411200    1109645.0      0.8      1.0              if ts.actor:
  4183                                                           ws.actors.add(ts)
  4184                                           
  4185                                                       # logger.debug("Send job to worker: %s, %s", worker, key)
  4186                                           
  4187   1411200   32422899.0     23.0     28.4              self.send_task_to_worker(worker, key)
  4188                                           
  4189   1411200    1217573.0      0.9      1.1              return {}
  4190                                                   except Exception as e:
  4191                                                       logger.exception(e)
  4192                                                       if LOG_PDB:
  4193                                                           import pdb
  4194                                           
  4195                                                           pdb.set_trace()
  4196                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_memory at line 4198

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4198                                               @profile
  4199                                               def transition_waiting_memory(self, key, nbytes=None, worker=None, **kwargs):
  4200                                                   try:
  4201                                                       ws = self.workers[worker]
  4202                                                       ts = self.tasks[key]
  4203                                           
  4204                                                       if self.validate:
  4205                                                           assert not ts.processing_on
  4206                                                           assert ts.waiting_on
  4207                                                           assert ts.state == "waiting"
  4208                                           
  4209                                                       ts.waiting_on.clear()
  4210                                           
  4211                                                       if nbytes is not None:
  4212                                                           ts.set_nbytes(nbytes)
  4213                                           
  4214                                                       self.check_idle_saturated(ws)
  4215                                           
  4216                                                       recommendations = {}
  4217                                           
  4218                                                       self._add_to_memory(ts, ws, recommendations, **kwargs)
  4219                                           
  4220                                                       if self.validate:
  4221                                                           assert not ts.processing_on
  4222                                                           assert not ts.waiting_on
  4223                                                           assert ts.who_has
  4224                                           
  4225                                                       return recommendations
  4226                                                   except Exception as e:
  4227                                                       logger.exception(e)
  4228                                                       if LOG_PDB:
  4229                                                           import pdb
  4230                                           
  4231                                                           pdb.set_trace()
  4232                                                       raise

Total time: 129.754 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_memory at line 4234

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4234                                               @profile
  4235                                               def transition_processing_memory(
  4236                                                   self,
  4237                                                   key,
  4238                                                   nbytes=None,
  4239                                                   type=None,
  4240                                                   typename=None,
  4241                                                   worker=None,
  4242                                                   startstops=None,
  4243                                                   **kwargs,
  4244                                               ):
  4245   1411200    1337525.0      0.9      1.0          try:
  4246   1411200    1529421.0      1.1      1.2              ts = self.tasks[key]
  4247   1411200    1290201.0      0.9      1.0              assert worker
  4248   1411200    1602443.0      1.1      1.2              assert isinstance(worker, str)
  4249                                           
  4250   1411200    1312144.0      0.9      1.0              if self.validate:
  4251                                                           assert ts.processing_on
  4252                                                           ws = ts.processing_on
  4253                                                           assert ts in ws.processing
  4254                                                           assert not ts.waiting_on
  4255                                                           assert not ts.who_has, (ts, ts.who_has)
  4256                                                           assert not ts.exception_blame
  4257                                                           assert ts.state == "processing"
  4258                                           
  4259   1411200    1736374.0      1.2      1.3              ws = self.workers.get(worker)
  4260   1411200    1289137.0      0.9      1.0              if ws is None:
  4261                                                           return {key: "released"}
  4262                                           
  4263   1411200    3722970.0      2.6      2.9              if ws != ts.processing_on:  # someone else has this task
  4264                                                           logger.info(
  4265                                                               "Unexpected worker completed task, likely due to"
  4266                                                               " work stealing.  Expected: %s, Got: %s, Key: %s",
  4267                                                               ts.processing_on,
  4268                                                               ws,
  4269                                                               key,
  4270                                                           )
  4271                                                           return {}
  4272                                           
  4273   1411200    1359794.0      1.0      1.0              if startstops:
  4274   1411200    1588021.0      1.1      1.2                  L = list()
  4275   2893522    2840527.0      1.0      2.2                  for startstop in startstops:
  4276   1482322    1533347.0      1.0      1.2                      stop = startstop["stop"]
  4277   1482322    1360072.0      0.9      1.0                      start = startstop["start"]
  4278   1482322    1354265.0      0.9      1.0                      action = startstop["action"]
  4279   1482322    1483189.0      1.0      1.1                      if action == "compute":
  4280   1411200    1697570.0      1.2      1.3                          L.append((start, stop))
  4281                                           
  4282                                                               # record timings of all actions -- a cheaper way of
  4283                                                               # getting timing info compared with get_task_stream()
  4284   1482322    2947428.0      2.0      2.3                      ts.prefix.all_durations[action] += stop - start
  4285                                           
  4286   1411200    1642864.0      1.2      1.3                  if len(L) > 0:
  4287   1411200    1694107.0      1.2      1.3                      compute_start, compute_stop = L[0]
  4288                                                           else:  # This is very rare
  4289                                                               compute_start = compute_stop = None
  4290                                                       else:
  4291                                                           compute_start = compute_stop = None
  4292                                           
  4293                                                       #############################
  4294                                                       # Update Timing Information #
  4295                                                       #############################
  4296   1411200    2562924.0      1.8      2.0              if compute_start and ws.processing.get(ts, True):
  4297                                                           # Update average task duration for worker
  4298   1411200    1536530.0      1.1      1.2                  old_duration = ts.prefix.duration_average or 0
  4299   1411200    1305091.0      0.9      1.0                  new_duration = compute_stop - compute_start
  4300   1411200    1326463.0      0.9      1.0                  if not old_duration:
  4301         4          5.0      1.2      0.0                      avg_duration = new_duration
  4302                                                           else:
  4303   1411196    1634254.0      1.2      1.3                      avg_duration = 0.5 * old_duration + 0.5 * new_duration
  4304                                           
  4305   1411200    1552510.0      1.1      1.2                  ts.prefix.duration_average = avg_duration
  4306   1411200    2010247.0      1.4      1.5                  ts.group.duration += new_duration
  4307                                           
  4308   1413162    2115529.0      1.5      1.6                  for tts in self.unknown_durations.pop(ts.prefix.name, ()):
  4309      1962       1829.0      0.9      0.0                      if tts.processing_on:
  4310      1962       1701.0      0.9      0.0                          wws = tts.processing_on
  4311      1962       2104.0      1.1      0.0                          old = wws.processing[tts]
  4312      1962       4648.0      2.4      0.0                          comm = self.get_comm_cost(tts, wws)
  4313      1962       1985.0      1.0      0.0                          wws.processing[tts] = avg_duration + comm
  4314      1962       2036.0      1.0      0.0                          wws.occupancy += avg_duration + comm - old
  4315      1962       2001.0      1.0      0.0                          self.total_occupancy += avg_duration + comm - old
  4316                                           
  4317                                                       ############################
  4318                                                       # Update State Information #
  4319                                                       ############################
  4320   1411200    1378988.0      1.0      1.1              if nbytes is not None:
  4321   1411200    5390384.0      3.8      4.2                  ts.set_nbytes(nbytes)
  4322                                           
  4323   1411200    1392218.0      1.0      1.1              recommendations = {}
  4324                                           
  4325   1411200   29274900.0     20.7     22.6              self._remove_from_processing(ts)
  4326                                           
  4327   1411200   42100296.0     29.8     32.4              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4328                                           
  4329   1411200    1619181.0      1.1      1.2              if self.validate:
  4330                                                           assert not ts.processing_on
  4331                                                           assert not ts.waiting_on
  4332                                           
  4333   1411200    1216503.0      0.9      0.9              return recommendations
  4334                                                   except Exception as e:
  4335                                                       logger.exception(e)
  4336                                                       if LOG_PDB:
  4337                                                           import pdb
  4338                                           
  4339                                                           pdb.set_trace()
  4340                                                       raise

Total time: 51.8889 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_released at line 4342

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4342                                               @profile
  4343                                               def transition_memory_released(self, key, safe=False):
  4344   1382400    1056038.0      0.8      2.0          try:
  4345   1382400    1190016.0      0.9      2.3              ts = self.tasks[key]
  4346                                           
  4347   1382400    1059344.0      0.8      2.0              if self.validate:
  4348                                                           assert not ts.waiting_on
  4349                                                           assert not ts.processing_on
  4350                                                           if safe:
  4351                                                               assert not ts.waiters
  4352                                           
  4353   1382400    1102310.0      0.8      2.1              if ts.actor:
  4354                                                           for ws in ts.who_has:
  4355                                                               ws.actors.discard(ts)
  4356                                                           if ts.who_wants:
  4357                                                               ts.exception_blame = ts
  4358                                                               ts.exception = "Worker holding Actor was lost"
  4359                                                               return {ts.key: "erred"}  # don't try to recreate
  4360                                           
  4361   1382400     993972.0      0.7      1.9              recommendations = {}
  4362                                           
  4363   1382400    1401972.0      1.0      2.7              for dts in ts.waiters:
  4364                                                           if dts.state in ("no-worker", "processing"):
  4365                                                               recommendations[dts.key] = "waiting"
  4366                                                           elif dts.state == "waiting":
  4367                                                               dts.waiting_on.add(ts)
  4368                                           
  4369                                                       # XXX factor this out?
  4370   3575944    2968041.0      0.8      5.7              for ws in ts.who_has:
  4371   2193544    2276794.0      1.0      4.4                  ws.has_what.remove(ts)
  4372   2193544    3941574.0      1.8      7.6                  ws.nbytes -= ts.get_nbytes()
  4373   2193544    3274682.0      1.5      6.3                  ts.group.nbytes_in_memory -= ts.get_nbytes()
  4374   4387088   12047183.0      2.7     23.2                  self.worker_send(
  4375   2193544    2312324.0      1.1      4.5                      ws.address, {"op": "delete-data", "keys": [key], "report": False}
  4376                                                           )
  4377   1382400    1336090.0      1.0      2.6              ts.who_has.clear()
  4378                                           
  4379   1382400    3055590.0      2.2      5.9              ts.state = "released"
  4380                                           
  4381   1382400    8066002.0      5.8     15.5              self.report({"op": "lost-data", "key": key})
  4382                                           
  4383   1382400    1353934.0      1.0      2.6              if not ts.run_spec:  # pure data
  4384                                                           recommendations[key] = "forgotten"
  4385   1382400    1108180.0      0.8      2.1              elif ts.has_lost_dependencies:
  4386                                                           recommendations[key] = "forgotten"
  4387   1382400    1265666.0      0.9      2.4              elif ts.who_wants or ts.waiters:
  4388                                                           recommendations[key] = "waiting"
  4389                                           
  4390   1382400    1104451.0      0.8      2.1              if self.validate:
  4391                                                           assert not ts.waiting_on
  4392                                           
  4393   1382400     974760.0      0.7      1.9              return recommendations
  4394                                                   except Exception as e:
  4395                                                       logger.exception(e)
  4396                                                       if LOG_PDB:
  4397                                                           import pdb
  4398                                           
  4399                                                           pdb.set_trace()
  4400                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_erred at line 4402

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4402                                               @profile
  4403                                               def transition_released_erred(self, key):
  4404                                                   try:
  4405                                                       ts = self.tasks[key]
  4406                                           
  4407                                                       if self.validate:
  4408                                                           with log_errors(pdb=LOG_PDB):
  4409                                                               assert ts.exception_blame
  4410                                                               assert not ts.who_has
  4411                                                               assert not ts.waiting_on
  4412                                                               assert not ts.waiters
  4413                                           
  4414                                                       recommendations = {}
  4415                                           
  4416                                                       failing_ts = ts.exception_blame
  4417                                           
  4418                                                       for dts in ts.dependents:
  4419                                                           dts.exception_blame = failing_ts
  4420                                                           if not dts.who_has:
  4421                                                               recommendations[dts.key] = "erred"
  4422                                           
  4423                                                       self.report(
  4424                                                           {
  4425                                                               "op": "task-erred",
  4426                                                               "key": key,
  4427                                                               "exception": failing_ts.exception,
  4428                                                               "traceback": failing_ts.traceback,
  4429                                                           }
  4430                                                       )
  4431                                           
  4432                                                       ts.state = "erred"
  4433                                           
  4434                                                       # TODO: waiting data?
  4435                                                       return recommendations
  4436                                                   except Exception as e:
  4437                                                       logger.exception(e)
  4438                                                       if LOG_PDB:
  4439                                                           import pdb
  4440                                           
  4441                                                           pdb.set_trace()
  4442                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_erred_released at line 4444

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4444                                               @profile
  4445                                               def transition_erred_released(self, key):
  4446                                                   try:
  4447                                                       ts = self.tasks[key]
  4448                                           
  4449                                                       if self.validate:
  4450                                                           with log_errors(pdb=LOG_PDB):
  4451                                                               assert all(dts.state != "erred" for dts in ts.dependencies)
  4452                                                               assert ts.exception_blame
  4453                                                               assert not ts.who_has
  4454                                                               assert not ts.waiting_on
  4455                                                               assert not ts.waiters
  4456                                           
  4457                                                       recommendations = {}
  4458                                           
  4459                                                       ts.exception = None
  4460                                                       ts.exception_blame = None
  4461                                                       ts.traceback = None
  4462                                           
  4463                                                       for dep in ts.dependents:
  4464                                                           if dep.state == "erred":
  4465                                                               recommendations[dep.key] = "waiting"
  4466                                           
  4467                                                       self.report({"op": "task-retried", "key": key})
  4468                                                       ts.state = "released"
  4469                                           
  4470                                                       return recommendations
  4471                                                   except Exception as e:
  4472                                                       logger.exception(e)
  4473                                                       if LOG_PDB:
  4474                                                           import pdb
  4475                                           
  4476                                                           pdb.set_trace()
  4477                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_released at line 4479

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4479                                               @profile
  4480                                               def transition_waiting_released(self, key):
  4481                                                   try:
  4482                                                       ts = self.tasks[key]
  4483                                           
  4484                                                       if self.validate:
  4485                                                           assert not ts.who_has
  4486                                                           assert not ts.processing_on
  4487                                           
  4488                                                       recommendations = {}
  4489                                           
  4490                                                       for dts in ts.dependencies:
  4491                                                           s = dts.waiters
  4492                                                           if ts in s:
  4493                                                               s.discard(ts)
  4494                                                               if not s and not dts.who_wants:
  4495                                                                   recommendations[dts.key] = "released"
  4496                                                       ts.waiting_on.clear()
  4497                                           
  4498                                                       ts.state = "released"
  4499                                           
  4500                                                       if ts.has_lost_dependencies:
  4501                                                           recommendations[key] = "forgotten"
  4502                                                       elif not ts.exception_blame and (ts.who_wants or ts.waiters):
  4503                                                           recommendations[key] = "waiting"
  4504                                                       else:
  4505                                                           ts.waiters.clear()
  4506                                           
  4507                                                       return recommendations
  4508                                                   except Exception as e:
  4509                                                       logger.exception(e)
  4510                                                       if LOG_PDB:
  4511                                                           import pdb
  4512                                           
  4513                                                           pdb.set_trace()
  4514                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_released at line 4516

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4516                                               @profile
  4517                                               def transition_processing_released(self, key):
  4518                                                   try:
  4519                                                       ts = self.tasks[key]
  4520                                           
  4521                                                       if self.validate:
  4522                                                           assert ts.processing_on
  4523                                                           assert not ts.who_has
  4524                                                           assert not ts.waiting_on
  4525                                                           assert self.tasks[key].state == "processing"
  4526                                           
  4527                                                       self._remove_from_processing(
  4528                                                           ts, send_worker_msg={"op": "release-task", "key": key}
  4529                                                       )
  4530                                           
  4531                                                       ts.state = "released"
  4532                                           
  4533                                                       recommendations = {}
  4534                                           
  4535                                                       if ts.has_lost_dependencies:
  4536                                                           recommendations[key] = "forgotten"
  4537                                                       elif ts.waiters or ts.who_wants:
  4538                                                           recommendations[key] = "waiting"
  4539                                           
  4540                                                       if recommendations.get(key) != "waiting":
  4541                                                           for dts in ts.dependencies:
  4542                                                               if dts.state != "released":
  4543                                                                   s = dts.waiters
  4544                                                                   s.discard(ts)
  4545                                                                   if not s and not dts.who_wants:
  4546                                                                       recommendations[dts.key] = "released"
  4547                                                           ts.waiters.clear()
  4548                                           
  4549                                                       if self.validate:
  4550                                                           assert not ts.processing_on
  4551                                           
  4552                                                       return recommendations
  4553                                                   except Exception as e:
  4554                                                       logger.exception(e)
  4555                                                       if LOG_PDB:
  4556                                                           import pdb
  4557                                           
  4558                                                           pdb.set_trace()
  4559                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_erred at line 4561

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4561                                               @profile
  4562                                               def transition_processing_erred(
  4563                                                   self, key, cause=None, exception=None, traceback=None, **kwargs
  4564                                               ):
  4565                                                   try:
  4566                                                       ts = self.tasks[key]
  4567                                           
  4568                                                       if self.validate:
  4569                                                           assert cause or ts.exception_blame
  4570                                                           assert ts.processing_on
  4571                                                           assert not ts.who_has
  4572                                                           assert not ts.waiting_on
  4573                                           
  4574                                                       if ts.actor:
  4575                                                           ws = ts.processing_on
  4576                                                           ws.actors.remove(ts)
  4577                                           
  4578                                                       self._remove_from_processing(ts)
  4579                                           
  4580                                                       if exception is not None:
  4581                                                           ts.exception = exception
  4582                                                       if traceback is not None:
  4583                                                           ts.traceback = traceback
  4584                                                       if cause is not None:
  4585                                                           failing_ts = self.tasks[cause]
  4586                                                           ts.exception_blame = failing_ts
  4587                                                       else:
  4588                                                           failing_ts = ts.exception_blame
  4589                                           
  4590                                                       recommendations = {}
  4591                                           
  4592                                                       for dts in ts.dependents:
  4593                                                           dts.exception_blame = failing_ts
  4594                                                           recommendations[dts.key] = "erred"
  4595                                           
  4596                                                       for dts in ts.dependencies:
  4597                                                           s = dts.waiters
  4598                                                           s.discard(ts)
  4599                                                           if not s and not dts.who_wants:
  4600                                                               recommendations[dts.key] = "released"
  4601                                           
  4602                                                       ts.waiters.clear()  # do anything with this?
  4603                                           
  4604                                                       ts.state = "erred"
  4605                                           
  4606                                                       self.report(
  4607                                                           {
  4608                                                               "op": "task-erred",
  4609                                                               "key": key,
  4610                                                               "exception": failing_ts.exception,
  4611                                                               "traceback": failing_ts.traceback,
  4612                                                           }
  4613                                                       )
  4614                                           
  4615                                                       cs = self.clients["fire-and-forget"]
  4616                                                       if ts in cs.wants_what:
  4617                                                           self.client_releases_keys(client="fire-and-forget", keys=[key])
  4618                                           
  4619                                                       if self.validate:
  4620                                                           assert not ts.processing_on
  4621                                           
  4622                                                       return recommendations
  4623                                                   except Exception as e:
  4624                                                       logger.exception(e)
  4625                                                       if LOG_PDB:
  4626                                                           import pdb
  4627                                           
  4628                                                           pdb.set_trace()
  4629                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_released at line 4631

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4631                                               @profile
  4632                                               def transition_no_worker_released(self, key):
  4633                                                   try:
  4634                                                       ts = self.tasks[key]
  4635                                           
  4636                                                       if self.validate:
  4637                                                           assert self.tasks[key].state == "no-worker"
  4638                                                           assert not ts.who_has
  4639                                                           assert not ts.waiting_on
  4640                                           
  4641                                                       self.unrunnable.remove(ts)
  4642                                                       ts.state = "released"
  4643                                           
  4644                                                       for dts in ts.dependencies:
  4645                                                           dts.waiters.discard(ts)
  4646                                           
  4647                                                       ts.waiters.clear()
  4648                                           
  4649                                                       return {}
  4650                                                   except Exception as e:
  4651                                                       logger.exception(e)
  4652                                                       if LOG_PDB:
  4653                                                           import pdb
  4654                                           
  4655                                                           pdb.set_trace()
  4656                                                       raise

Total time: 1.71535 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_forgotten at line 4708

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4708                                               @profile
  4709                                               def transition_memory_forgotten(self, key):
  4710     28800      16741.0      0.6      1.0          try:
  4711     28800      18455.0      0.6      1.1              ts = self.tasks[key]
  4712                                           
  4713     28800      15977.0      0.6      0.9              if self.validate:
  4714                                                           assert ts.state == "memory"
  4715                                                           assert not ts.processing_on
  4716                                                           assert not ts.waiting_on
  4717                                                           if not ts.run_spec:
  4718                                                               # It's ok to forget a pure data task
  4719                                                               pass
  4720                                                           elif ts.has_lost_dependencies:
  4721                                                               # It's ok to forget a task with forgotten dependencies
  4722                                                               pass
  4723                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4724                                                               # It's ok to forget a task that nobody needs
  4725                                                               pass
  4726                                                           else:
  4727                                                               assert 0, (ts,)
  4728                                           
  4729     28800      14684.0      0.5      0.9              recommendations = {}
  4730                                           
  4731     28800      16391.0      0.6      1.0              if ts.actor:
  4732                                                           for ws in ts.who_has:
  4733                                                               ws.actors.discard(ts)
  4734                                           
  4735     28800    1276319.0     44.3     74.4              self._propagate_forgotten(ts, recommendations)
  4736                                           
  4737     28800     225976.0      7.8     13.2              self.report_on_key(ts=ts)
  4738     28800     116022.0      4.0      6.8              self.remove_key(key)
  4739                                           
  4740     28800      14782.0      0.5      0.9              return recommendations
  4741                                                   except Exception as e:
  4742                                                       logger.exception(e)
  4743                                                       if LOG_PDB:
  4744                                                           import pdb
  4745                                           
  4746                                                           pdb.set_trace()
  4747                                                       raise

Total time: 25.5298 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_forgotten at line 4749

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4749                                               @profile
  4750                                               def transition_released_forgotten(self, key):
  4751   1382400     603103.0      0.4      2.4          try:
  4752   1382400     727341.0      0.5      2.8              ts = self.tasks[key]
  4753                                           
  4754   1382400     653809.0      0.5      2.6              if self.validate:
  4755                                                           assert ts.state in ("released", "erred")
  4756                                                           assert not ts.who_has
  4757                                                           assert not ts.processing_on
  4758                                                           assert not ts.waiting_on, (ts, ts.waiting_on)
  4759                                                           if not ts.run_spec:
  4760                                                               # It's ok to forget a pure data task
  4761                                                               pass
  4762                                                           elif ts.has_lost_dependencies:
  4763                                                               # It's ok to forget a task with forgotten dependencies
  4764                                                               pass
  4765                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4766                                                               # It's ok to forget a task that nobody needs
  4767                                                               pass
  4768                                                           else:
  4769                                                               assert 0, (ts,)
  4770                                           
  4771   1382400     625352.0      0.5      2.4              recommendations = {}
  4772   1382400   10739499.0      7.8     42.1              self._propagate_forgotten(ts, recommendations)
  4773                                           
  4774   1382400    7430808.0      5.4     29.1              self.report_on_key(ts=ts)
  4775   1382400    4142468.0      3.0     16.2              self.remove_key(key)
  4776                                           
  4777   1382400     607406.0      0.4      2.4              return recommendations
  4778                                                   except Exception as e:
  4779                                                       logger.exception(e)
  4780                                                       if LOG_PDB:
  4781                                                           import pdb
  4782                                           
  4783                                                           pdb.set_trace()
  4784                                                       raise
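
As a quick sanity check on the `transition_released_forgotten` table above, here is a minimal sketch that recomputes the `% Time` column for the three method calls from the raw `Time` values (in microseconds, per the `Timer unit` line) and adds them up. The dictionary and helper names are made up for illustration, and the total is taken from the rounded `Total time: 25.5298 s` line, so the figures are approximate.

```python
# Values copied from the transition_released_forgotten table above.
# TOTAL_US comes from the rounded "Total time: 25.5298 s" line (assumption:
# the rounding error is negligible for one-decimal percentages).
TOTAL_US = 25_529_800

# Hypothetical container: method name -> Time column (microseconds).
line_times_us = {
    "_propagate_forgotten": 10_739_499.0,
    "report_on_key": 7_430_808.0,
    "remove_key": 4_142_468.0,
}


def share(time_us, total_us=TOTAL_US):
    """Return the percentage of the function's total time spent on one line."""
    return 100.0 * time_us / total_us


# Per-line share, rounded to one decimal like line_profiler's % Time column.
shares = {name: round(share(t), 1) for name, t in line_times_us.items()}

# Combined share of the three calls.
combined = round(share(sum(line_times_us.values())), 1)

print(shares)
print(combined)
```

With the numbers above, the three calls together account for roughly 87% of the time in this function, which matches the `% Time` column (42.1 + 29.1 + 16.2).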

Total time: 718.89 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 4786

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4786                                               @profile
  4787                                               def transition(self, key, finish, *args, **kwargs):
  4788                                                   """Transition a key from its current state to the finish state
  4789                                           
  4790                                                   Examples
  4791                                                   --------
  4792                                                   >>> self.transition('x', 'waiting')
  4793                                                   {'x': 'processing'}
  4794                                           
  4795                                                   Returns
  4796                                                   -------
  4797                                                   Dictionary of recommendations for future transitions
  4798                                           
  4799                                                   See Also
  4800                                                   --------
  4801                                                   Scheduler.transitions: transitive version of this function
  4802                                                   """
  4803   7027200    6118742.0      0.9      0.9          try:
  4804   7027200    5614878.0      0.8      0.8              try:
  4805   7027200    7897663.0      1.1      1.1                  ts = self.tasks[key]
  4806                                                       except KeyError:
  4807                                                           return {}
  4808   7027200    7722672.0      1.1      1.1              start = ts.state
  4809   7027200    5787581.0      0.8      0.8              if start == finish:
  4810                                                           return {}
  4811                                           
  4812   7027200    5899690.0      0.8      0.8              if self.plugins:
  4813   7027200    8331747.0      1.2      1.2                  dependents = set(ts.dependents)
  4814   7027200    7728500.0      1.1      1.1                  dependencies = set(ts.dependencies)
  4815                                           
  4816   7027200    7323587.0      1.0      1.0              if (start, finish) in self._transitions:
  4817   7027200    6724423.0      1.0      0.9                  func = self._transitions[start, finish]
  4818   7027200  515948336.0     73.4     71.8                  recommendations = func(key, *args, **kwargs)
  4819                                                       elif "released" not in (start, finish):
  4820                                                           func = self._transitions["released", finish]
  4821                                                           assert not args and not kwargs
  4822                                                           a = self.transition(key, "released")
  4823                                                           if key in a:
  4824                                                               func = self._transitions["released", a[key]]
  4825                                                           b = func(key)
  4826                                                           a = a.copy()
  4827                                                           a.update(b)
  4828                                                           recommendations = a
  4829                                                           start = "released"
  4830                                                       else:
  4831                                                           raise RuntimeError(
  4832                                                               "Impossible transition from %r to %r" % (start, finish)
  4833                                                           )
  4834                                           
  4835   7027200    8628381.0      1.2      1.2              finish2 = ts.state
  4836   7027200   11789751.0      1.7      1.6              self.transition_log.append((key, start, finish2, recommendations, time()))
  4837   7027200    6142493.0      0.9      0.9              if self.validate:
  4838                                                           logger.debug(
  4839                                                               "Transitioned %r %s->%s (actual: %s).  Consequence: %s",
  4840                                                               key,
  4841                                                               start,
  4842                                                               finish2,
  4843                                                               ts.state,
  4844                                                               dict(recommendations),
  4845                                                           )
  4846   7027200    5993139.0      0.9      0.8              if self.plugins:
  4847                                                           # Temporarily put back forgotten key for plugin to retrieve it
  4848   7027200    8090607.0      1.2      1.1                  if ts.state == "forgotten":
  4849   1411200    1009916.0      0.7      0.1                      try:
  4850   1411200    1144130.0      0.8      0.2                          ts.dependents = dependents
  4851   1411200    1095419.0      0.8      0.2                          ts.dependencies = dependencies
  4852                                                               except KeyError:
  4853                                                                   pass
  4854   1411200    1384936.0      1.0      0.2                      self.tasks[ts.key] = ts
  4855  14054400   14741166.0      1.0      2.1                  for plugin in list(self.plugins):
  4856   7027200    5700658.0      0.8      0.8                      try:
  4857   7027200   40851164.0      5.8      5.7                          plugin.transition(key, start, finish2, *args, **kwargs)
  4858                                                               except Exception:
  4859                                                                   logger.info("Plugin failed with exception", exc_info=True)
  4860   7027200    8197994.0      1.2      1.1                  if ts.state == "forgotten":
  4861   1411200    1217534.0      0.9      0.2                      del self.tasks[ts.key]
  4862                                           
  4863   7027200    7951240.0      1.1      1.1              if ts.state == "forgotten" and ts.group.name in self.task_groups:
  4864                                                           # Remove TaskGroup if all tasks are in the forgotten state
  4865   1411200    1081768.0      0.8      0.2                  tg = ts.group
  4866   1411200    3200548.0      2.3      0.4                  if not any(tg.states.get(s) for s in ALL_TASK_STATES):
  4867       200        285.0      1.4      0.0                      ts.prefix.groups.remove(tg)
  4868       200        161.0      0.8      0.0                      del self.task_groups[tg.name]
  4869                                           
  4870   7027200    5570888.0      0.8      0.8              return recommendations
  4871                                                   except Exception as e:
  4872                                                       logger.exception("Error transitioning %r from %r to %r", key, start, finish)
  4873                                                       if LOG_PDB:
  4874                                                           import pdb
  4875                                           
  4876                                                           pdb.set_trace()
  4877                                                       raise

Total time: 642.483 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transitions at line 4879

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4879                                               @profile
  4880                                               def transitions(self, recommendations):
  4881                                                   """Process transitions until none are left
  4882                                           
  4883                                                   This includes feedback from previous transitions and continues until we
  4884                                                   reach a steady state
  4885                                                   """
  4886   1440025    1129581.0      0.8      0.2          keys = set()
  4887   1440025    1212167.0      0.8      0.2          recommendations = recommendations.copy()
  4888   7056025    2828403.0      0.4      0.4          while recommendations:
  4889   5616000    3401298.0      0.6      0.5              key, finish = recommendations.popitem()
  4890   5616000    3020471.0      0.5      0.5              keys.add(key)
  4891   5616000  626325166.0    111.5     97.5              new = self.transition(key, finish)
  4892   5616000    3787582.0      0.7      0.6              recommendations.update(new)
  4893                                           
  4894   1440025     778733.0      0.5      0.1          if self.validate:
  4895                                                       for key in keys:
  4896                                                           self.validate_key(key)

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 5751

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5751                                               @profile
  5752                                               def transition(self, key, start, finish, *args, **kwargs):
  5753                                                   if finish == "memory" or finish == "erred":
  5754                                                       ts = self.scheduler.tasks.get(key)
  5755                                                       if ts is not None and ts.key in self.keys:
  5756                                                           self.metadata[key] = ts.metadata
  5757                                                           self.state[key] = finish
  5758                                                           self.keys.discard(key)

Edit: After more perusing, I think this is the only one that is relevant to us. Have pushed commit ( dask/distributed@3562816 ) to skip writing out trivial profiles, which should simplify things going forward.
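
For anyone curious, here is a minimal sketch of what skipping trivial profiles could look like. It is not the code from the commit above. It assumes the timings are a dict mapping `(filename, first_lineno, func_name)` to a list of `(lineno, nhits, total_time)` tuples, which is the shape I believe line_profiler's `LineStats.timings` uses. The helper name `drop_trivial_profiles` and the sample numbers are made up for illustration.

```python
def drop_trivial_profiles(timings):
    """Keep only functions that recorded at least one hit.

    `timings` maps (filename, first_lineno, func_name) to a list of
    (lineno, nhits, total_time) tuples. That shape is an assumption
    about line_profiler's LineStats.timings, not something verified here.
    """
    return {
        func: lines
        for func, lines in timings.items()
        if any(nhits > 0 for _lineno, nhits, _time in lines)
    }


# Made-up entries for illustration, not taken from the profiles above.
example = {
    ("scheduler.py", 10, "ran"): [(11, 3, 120), (12, 3, 45)],
    ("scheduler.py", 20, "never_ran"): [],
}
kept = drop_trivial_profiles(example)
print(sorted(name for _file, _line, name in kept))
```

Functions that report "Total time: 0 s" (like the plugin `transition` above) would fall out of the result, so only the profiles with actual hits get written.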

@quasiben
Owner

quasiben commented Nov 23, 2020

Do you have an idea of what is slow in those functions?

@jakirkham
Collaborator Author

No, I haven't profiled them yet. Though agree that would be the next step :)

@jakirkham
Collaborator Author

Ok, have gone ahead and profiled every method and function called from those functions that took 10% or more of the time. Here are the results for the scheduler.
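
As a rough sketch of how a 10% cutoff like this could be applied to the text reports (not the procedure actually used here), the snippet below scans a line_profiler table and keeps the rows whose `% Time` is at or above the cutoff. The regex and the `hot_lines` helper are my own assumptions about the column layout shown in this issue. Other line_profiler versions may format rows differently.

```python
import re

# Matches a line_profiler row that has timing columns:
# line number, hits, time, per-hit, % time, then the source text.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.*)$"
)


def hot_lines(report_text, cutoff=10.0):
    """Return (lineno, percent, source) for rows at or above `cutoff` percent."""
    hot = []
    for row in report_text.splitlines():
        match = ROW.match(row)
        if match is None:
            continue  # header, separator, or a line that was never hit
        lineno, _hits, _time, _per_hit, percent, source = match.groups()
        if float(percent) >= cutoff:
            hot.append((int(lineno), float(percent), source.strip()))
    # Largest share of time first
    return sorted(hot, key=lambda item: item[1], reverse=True)


# A few rows copied from the _remove_from_processing table in this issue.
sample = """
  3969   1411200    1048480.0      0.7      2.2              duration = ws.processing.pop(ts)
  3976   1411200   37573980.0     26.6     79.2              self.check_idle_saturated(ws)
  3977   1411200    1803487.0      1.3      3.8              self.release_resources(ts, ws)
  3979                                                           self.worker_send(w, send_worker_msg)
"""
for lineno, percent, source in hot_lines(sample):
    print(lineno, percent, source)
```

On those rows only the `check_idle_saturated` call clears the cutoff, which is the kind of callee worth decorating with `@profile` next.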

Results from prof_27105.lstat:

Timer unit: 1e-06 s

Total time: 27.8392 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report at line 2589

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2589                                               @profile
  2590                                               def report(self, msg, ts=None, client=None):
  2591                                                   """
  2592                                                   Publish updates to all listening Queues and Comms
  2593                                           
  2594                                                   If the message contains a key then we only send the message to those
  2595                                                   comms that care about the key.
  2596                                                   """
  2597   4233600    3616675.0      0.9     13.0          comms = set()
  2598   4233600    2511367.0      0.6      9.0          if client is not None:
  2599                                                       try:
  2600                                                           comms.add(self.client_comms[client])
  2601                                                       except KeyError:
  2602                                                           pass
  2603                                           
  2604   4233600    2642927.0      0.6      9.5          if ts is None and "key" in msg:
  2605   2822400    2629326.0      0.9      9.4              ts = self.tasks.get(msg["key"])
  2606   4233600    2203882.0      0.5      7.9          if ts is None:
  2607                                                       # Notify all clients
  2608                                                       comms |= set(self.client_comms.values())
  2609                                                   else:
  2610                                                       # Notify clients interested in key
  2611   8467200    8705937.0      1.0     31.3              comms |= {
  2612                                                           self.client_comms[c.client_key]
  2613   4233600    2478586.0      0.6      8.9                  for c in ts.who_wants
  2614                                                           if c.client_key in self.client_comms
  2615                                                       }
  2616   4291200    2697782.0      0.6      9.7          for c in comms:
  2617     57600      29423.0      0.5      0.1              try:
  2618     57600     323288.0      5.6      1.2                  c.send(msg)
  2619                                                           # logger.debug("Scheduler sends message to client %s", msg)
  2620                                                       except CommClosedError:
  2621                                                           if self.status == Status.running:
  2622                                                               logger.critical("Tried writing to closed comm: %s", msg)

Total time: 47.3278 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: send_task_to_worker at line 2703

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2703                                               @profile
  2704                                               def send_task_to_worker(self, worker, key):
  2705                                                   """ Send a single computational task to a worker """
  2706   1413197    1352430.0      1.0      2.9          try:
  2707   1413197    1585059.0      1.1      3.3              ts = self.tasks[key]
  2708                                           
  2709   1413197    1568396.0      1.1      3.3              msg = {
  2710   1413197    1212180.0      0.9      2.6                  "op": "compute-task",
  2711   1413197    1171483.0      0.8      2.5                  "key": key,
  2712   1413197    1440921.0      1.0      3.0                  "priority": ts.priority,
  2713   1413197    2780412.0      2.0      5.9                  "duration": self.get_task_duration(ts),
  2714                                                       }
  2715   1413197    1322047.0      0.9      2.8              if ts.resource_restrictions:
  2716                                                           msg["resource_restrictions"] = ts.resource_restrictions
  2717   1413197    1297820.0      0.9      2.7              if ts.actor:
  2718                                                           msg["actor"] = True
  2719                                           
  2720   1413197    1295025.0      0.9      2.7              deps = ts.dependencies
  2721   1413197    1284963.0      0.9      2.7              if deps:
  2722   2756940    7307774.0      2.7     15.4                  msg["who_has"] = {
  2723   1378470    1162796.0      0.8      2.5                      dep.key: [ws.address for ws in dep.who_has] for dep in deps
  2724                                                           }
  2725   1378470    3137939.0      2.3      6.6                  msg["nbytes"] = {dep.key: dep.nbytes for dep in deps}
  2726                                           
  2727   1413197    1403380.0      1.0      3.0              if self.validate and deps:
  2728                                                           assert all(msg["who_has"].values())
  2729                                           
  2730   1413197    1513970.0      1.1      3.2              task = ts.run_spec
  2731   1413197    1577610.0      1.1      3.3              if type(task) is dict:
  2732   1407437    2386521.0      1.7      5.0                  msg.update(task)
  2733                                                       else:
  2734      5760       5268.0      0.9      0.0                  msg["task"] = task
  2735                                           
  2736   1413197   12521764.0      8.9     26.5              self.worker_send(worker, msg)
  2737                                                   except Exception as e:
  2738                                                       logger.exception(e)
  2739                                                       if LOG_PDB:
  2740                                                           import pdb
  2741                                           
  2742                                                           pdb.set_trace()
  2743                                                       raise

Total time: 14.9432 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: worker_send at line 2873

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2873                                               @profile
  2874                                               def worker_send(self, worker, msg):
  2875                                                   """Send message to worker
  2876                                           
  2877                                                   This also handles connection failures by adding a callback to remove
  2878                                                   the worker on the next cycle.
  2879                                                   """
  2880   3567969    1492691.0      0.4     10.0          try:
  2881   3567969   13450544.0      3.8     90.0              self.stream_comms[worker].send(msg)
  2882                                                   except (CommClosedError, AttributeError):
  2883                                                       self.loop.add_callback(self.remove_worker, address=worker)

Total time: 19.3332 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report_on_key at line 3687

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3687                                               @profile
  3688                                               def report_on_key(self, key=None, ts=None, client=None):
  3689   1411200     885885.0      0.6      4.6          assert (key is None) + (ts is None) == 1, (key, ts)
  3690   1411200     671667.0      0.5      3.5          if ts is None:
  3691                                                       try:
  3692                                                           ts = self.tasks[key]
  3693                                                       except KeyError:
  3694                                                           self.report({"op": "cancelled-key", "key": key}, client=client)
  3695                                                           return
  3696                                                   else:
  3697   1411200     697457.0      0.5      3.6              key = ts.key
  3698   1411200    1086992.0      0.8      5.6          if ts.state == "forgotten":
  3699   1411200   15991235.0     11.3     82.7              self.report({"op": "cancelled-key", "key": key}, ts=ts, client=client)
  3700                                                   elif ts.state == "memory":
  3701                                                       self.report({"op": "key-in-memory", "key": key}, ts=ts, client=client)
  3702                                                   elif ts.state == "erred":
  3703                                                       failing_ts = ts.exception_blame
  3704                                                       self.report(
  3705                                                           {
  3706                                                               "op": "task-erred",
  3707                                                               "key": key,
  3708                                                               "exception": failing_ts.exception,
  3709                                                               "traceback": failing_ts.traceback,
  3710                                                           },
  3711                                                           ts=ts,
  3712                                                           client=client,
  3713                                                       )

Total time: 47.4572 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _remove_from_processing at line 3960

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3960                                               @profile
  3961                                               def _remove_from_processing(self, ts, send_worker_msg=None):
  3962                                                   """
  3963                                                   Remove *ts* from the set of processing tasks.
  3964                                                   """
  3965   1411200     855912.0      0.6      1.8          ws = ts.processing_on
  3966   1411200     771073.0      0.5      1.6          ts.processing_on = None
  3967   1411200     700607.0      0.5      1.5          w = ws.address
  3968   1411200    1015128.0      0.7      2.1          if w in self.workers:  # may have been removed
  3969   1411200    1048480.0      0.7      2.2              duration = ws.processing.pop(ts)
  3970   1411200     701188.0      0.5      1.5              if not ws.processing:
  3971       199        233.0      1.2      0.0                  self.total_occupancy -= ws.occupancy
  3972       199        107.0      0.5      0.0                  ws.occupancy = 0
  3973                                                       else:
  3974   1411001    1327628.0      0.9      2.8                  self.total_occupancy -= duration
  3975   1411001     946203.0      0.7      2.0                  ws.occupancy -= duration
  3976   1411200   37573980.0     26.6     79.2              self.check_idle_saturated(ws)
  3977   1411200    1803487.0      1.3      3.8              self.release_resources(ts, ws)
  3978   1411200     713196.0      0.5      1.5              if send_worker_msg:
  3979                                                           self.worker_send(w, send_worker_msg)

Total time: 70.7545 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _add_to_memory at line 3981

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3981                                               @profile
  3982                                               def _add_to_memory(
  3983                                                   self, ts, ws, recommendations, type=None, typename=None, **kwargs
  3984                                               ):
  3985                                                   """
  3986                                                   Add *ts* to the set of in-memory tasks.
  3987                                                   """
  3988   1411200    1179941.0      0.8      1.7          if self.validate:
  3989                                                       assert ts not in ws.has_what
  3990                                           
  3991   1411200    2041289.0      1.4      2.9          ts.who_has.add(ws)
  3992   1411200    1549998.0      1.1      2.2          ws.has_what.add(ts)
  3993   1411200    2523381.0      1.8      3.6          ws.nbytes += ts.get_nbytes()
  3994                                           
  3995   1411200    1057374.0      0.7      1.5          deps = ts.dependents
  3996   1411200    1325179.0      0.9      1.9          if len(deps) > 1:
  3997    103680    1071461.0     10.3      1.5              deps = sorted(deps, key=operator.attrgetter("priority"), reverse=True)
  3998   3864960    2797469.0      0.7      4.0          for dts in deps:
  3999   2453760    2034093.0      0.8      2.9              s = dts.waiting_on
  4000   2453760    2000899.0      0.8      2.8              if ts in s:
  4001   2453760    1919727.0      0.8      2.7                  s.discard(ts)
  4002   2453760    1650558.0      0.7      2.3                  if not s:  # new task ready to run
  4003   1376640    1381982.0      1.0      2.0                      recommendations[dts.key] = "processing"
  4004                                           
  4005   3864960    2807379.0      0.7      4.0          for dts in ts.dependencies:
  4006   2453760    2058255.0      0.8      2.9              s = dts.waiters
  4007   2453760    2044653.0      0.8      2.9              s.discard(ts)
  4008   2453760    1821353.0      0.7      2.6              if not s and not dts.who_wants:
  4009   1382400    1365302.0      1.0      1.9                  recommendations[dts.key] = "released"
  4010                                           
  4011   1411200    1136401.0      0.8      1.6          if not ts.waiters and not ts.who_wants:
  4012                                                       recommendations[ts.key] = "released"
  4013                                                   else:
  4014   1411200    1406285.0      1.0      2.0              msg = {"op": "key-in-memory", "key": ts.key}
  4015   1411200     987284.0      0.7      1.4              if type is not None:
  4016   1411200    1127395.0      0.8      1.6                  msg["type"] = type
  4017   1411200   24088613.0     17.1     34.0              self.report(msg)
  4018                                           
  4019   1411200    3847571.0      2.7      5.4          ts.state = "memory"
  4020   1411200    1119739.0      0.8      1.6          ts.type = typename
  4021   1411200    1771122.0      1.3      2.5          ts.group.types.add(typename)
  4022                                           
  4023   1411200    1212808.0      0.9      1.7          cs = self.clients["fire-and-forget"]
  4024   1411200    1426949.0      1.0      2.0          if ts in cs.wants_what:
  4025                                                       self.client_releases_keys(client="fire-and-forget", keys=[ts.key])

Total time: 27.9123 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4027

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4027                                               @profile
  4028                                               def transition_released_waiting(self, key):
  4029   1411200     883814.0      0.6      3.2          try:
  4030   1411200    1010686.0      0.7      3.6              ts = self.tasks[key]
  4031                                           
  4032   1411200     954767.0      0.7      3.4              if self.validate:
  4033                                                           assert ts.run_spec
  4034                                                           assert not ts.waiting_on
  4035                                                           assert not ts.who_has
  4036                                                           assert not ts.processing_on
  4037                                                           assert not any(dts.state == "forgotten" for dts in ts.dependencies)
  4038                                           
  4039   1411200    1027454.0      0.7      3.7              if ts.has_lost_dependencies:
  4040                                                           return {key: "forgotten"}
  4041                                           
  4042   1411200    2279443.0      1.6      8.2              ts.state = "waiting"
  4043                                           
  4044   1411200     880032.0      0.6      3.2              recommendations = {}
  4045                                           
  4046   3864960    2660601.0      0.7      9.5              for dts in ts.dependencies:
  4047   2453760    1628576.0      0.7      5.8                  if dts.exception_blame:
  4048                                                               ts.exception_blame = dts.exception_blame
  4049                                                               recommendations[key] = "erred"
  4050                                                               return recommendations
  4051                                           
  4052   3864960    2567010.0      0.7      9.2              for dts in ts.dependencies:
  4053   2453760    1712263.0      0.7      6.1                  dep = dts.key
  4054   2453760    1755038.0      0.7      6.3                  if not dts.who_has:
  4055   2453760    1960436.0      0.8      7.0                      ts.waiting_on.add(dts)
  4056   2453760    2211599.0      0.9      7.9                  if dts.state == "released":
  4057                                                               recommendations[dep] = "waiting"
  4058                                                           else:
  4059   2453760    1964890.0      0.8      7.0                      dts.waiters.add(ts)
  4060                                           
  4061   1411200    2523281.0      1.8      9.0              ts.waiters = {dts for dts in ts.dependents if dts.state == "waiting"}
  4062                                           
  4063   1411200     991524.0      0.7      3.6              if not ts.waiting_on:
  4064     34560      25931.0      0.8      0.1                  if self.workers:
  4065     34560      28118.0      0.8      0.1                      recommendations[key] = "processing"
  4066                                                           else:
  4067                                                               self.unrunnable.add(ts)
  4068                                                               ts.state = "no-worker"
  4069                                           
  4070   1411200     846876.0      0.6      3.0              return recommendations
  4071                                                   except Exception as e:
  4072                                                       logger.exception(e)
  4073                                                       if LOG_PDB:
  4074                                                           import pdb
  4075                                           
  4076                                                           pdb.set_trace()
  4077                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_waiting at line 4079

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4079                                               @profile
  4080                                               def transition_no_worker_waiting(self, key):
  4081                                                   try:
  4082                                                       ts = self.tasks[key]
  4083                                           
  4084                                                       if self.validate:
  4085                                                           assert ts in self.unrunnable
  4086                                                           assert not ts.waiting_on
  4087                                                           assert not ts.who_has
  4088                                                           assert not ts.processing_on
  4089                                           
  4090                                                       self.unrunnable.remove(ts)
  4091                                           
  4092                                                       if ts.has_lost_dependencies:
  4093                                                           return {key: "forgotten"}
  4094                                           
  4095                                                       recommendations = {}
  4096                                           
  4097                                                       for dts in ts.dependencies:
  4098                                                           dep = dts.key
  4099                                                           if not dts.who_has:
  4100                                                               ts.waiting_on.add(dts)
  4101                                                           if dts.state == "released":
  4102                                                               recommendations[dep] = "waiting"
  4103                                                           else:
  4104                                                               dts.waiters.add(ts)
  4105                                           
  4106                                                       ts.state = "waiting"
  4107                                           
  4108                                                       if not ts.waiting_on:
  4109                                                           if self.workers:
  4110                                                               recommendations[key] = "processing"
  4111                                                           else:
  4112                                                               self.unrunnable.add(ts)
  4113                                                               ts.state = "no-worker"
  4114                                           
  4115                                                       return recommendations
  4116                                                   except Exception as e:
  4117                                                       logger.exception(e)
  4118                                                       if LOG_PDB:
  4119                                                           import pdb
  4120                                           
  4121                                                           pdb.set_trace()
  4122                                                       raise

Total time: 52.2607 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 4124

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4124                                               @profile
  4125                                               def decide_worker(self, ts):
  4126                                                   """
  4127                                                   Decide on a worker for task *ts*.  Return a WorkerState.
  4128                                                   """
  4129   1411200    7025587.0      5.0     13.4          valid_workers = self.valid_workers(ts)
  4130                                           
  4131   1411200     786067.0      0.6      1.5          if not valid_workers and not ts.loose_restrictions and self.workers:
  4132                                                       self.unrunnable.add(ts)
  4133                                                       ts.state = "no-worker"
  4134                                                       return None
  4135                                           
  4136   1411200     900271.0      0.6      1.7          if ts.dependencies or valid_workers is not True:
  4137   2753280   35798434.0     13.0     68.5              worker = decide_worker(
  4138   1376640     636106.0      0.5      1.2                  ts,
  4139   1376640    3037288.0      2.2      5.8                  self.workers.values(),
  4140   1376640     676764.0      0.5      1.3                  valid_workers,
  4141   1376640    1322111.0      1.0      2.5                  partial(self.worker_objective, ts),
  4142                                                       )
  4143     34560      38380.0      1.1      0.1          elif self.idle:
  4144       240        297.0      1.2      0.0              if len(self.idle) < 20:  # smart but linear in small case
  4145       240       1305.0      5.4      0.0                  worker = min(self.idle, key=operator.attrgetter("occupancy"))
  4146                                                       else:  # dumb but fast in large case
  4147                                                           worker = self.idle[self.n_tasks % len(self.idle)]
  4148                                                   else:
  4149     34320      21600.0      0.6      0.0              if len(self.workers) < 20:  # smart but linear in small case
  4150     68640     264546.0      3.9      0.5                  worker = min(
  4151     34320      62674.0      1.8      0.1                      self.workers.values(), key=operator.attrgetter("occupancy")
  4152                                                           )
  4153                                                       else:  # dumb but fast in large case
  4154                                                           worker = self.workers.values()[self.n_tasks % len(self.workers)]
  4155                                           
  4156   1411200     983558.0      0.7      1.9          if self.validate:
  4157                                                       assert worker is None or isinstance(worker, WorkerState), (
  4158                                                           type(worker),
  4159                                                           worker,
  4160                                                       )
  4161                                                       assert worker.address in self.workers
  4162                                           
  4163   1411200     705744.0      0.5      1.4          return worker

Total time: 213.18 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4165

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4165                                               @profile
  4166                                               def transition_waiting_processing(self, key):
  4167   1411200     971613.0      0.7      0.5          try:
  4168   1411200    1128389.0      0.8      0.5              ts = self.tasks[key]
  4169                                           
  4170   1411200    1000248.0      0.7      0.5              if self.validate:
  4171                                                           assert not ts.waiting_on
  4172                                                           assert not ts.who_has
  4173                                                           assert not ts.exception_blame
  4174                                                           assert not ts.processing_on
  4175                                                           assert not ts.has_lost_dependencies
  4176                                                           assert ts not in self.unrunnable
  4177                                                           assert all(dts.who_has for dts in ts.dependencies)
  4178                                           
  4179   1411200   65666200.0     46.5     30.8              ws = self.decide_worker(ts)
  4180   1411200     997021.0      0.7      0.5              if ws is None:
  4181                                                           return {}
  4182   1411200    1044060.0      0.7      0.5              worker = ws.address
  4183                                           
  4184   1411200    2849616.0      2.0      1.3              duration = self.get_task_duration(ts)
  4185   1411200    4680765.0      3.3      2.2              comm = self.get_comm_cost(ts, ws)
  4186                                           
  4187   1411200    1698853.0      1.2      0.8              ws.processing[ts] = duration + comm
  4188   1411200    1061695.0      0.8      0.5              ts.processing_on = ws
  4189   1411200    1428010.0      1.0      0.7              ws.occupancy += duration + comm
  4190   1411200    1277266.0      0.9      0.6              self.total_occupancy += duration + comm
  4191   1411200    3389479.0      2.4      1.6              ts.state = "processing"
  4192   1411200    1943276.0      1.4      0.9              self.consume_resources(ts, ws)
  4193   1411200   36456024.0     25.8     17.1              self.check_idle_saturated(ws)
  4194   1411200    1435879.0      1.0      0.7              self.n_tasks += 1
  4195                                           
  4196   1411200    1108989.0      0.8      0.5              if ts.actor:
  4197                                                           ws.actors.add(ts)
  4198                                           
  4199                                                       # logger.debug("Send job to worker: %s, %s", worker, key)
  4200                                           
  4201   1411200   83979281.0     59.5     39.4              self.send_task_to_worker(worker, key)
  4202                                           
  4203   1411200    1063302.0      0.8      0.5              return {}
  4204                                                   except Exception as e:
  4205                                                       logger.exception(e)
  4206                                                       if LOG_PDB:
  4207                                                           import pdb
  4208                                           
  4209                                                           pdb.set_trace()
  4210                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_memory at line 4212

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4212                                               @profile
  4213                                               def transition_waiting_memory(self, key, nbytes=None, worker=None, **kwargs):
  4214                                                   try:
  4215                                                       ws = self.workers[worker]
  4216                                                       ts = self.tasks[key]
  4217                                           
  4218                                                       if self.validate:
  4219                                                           assert not ts.processing_on
  4220                                                           assert ts.waiting_on
  4221                                                           assert ts.state == "waiting"
  4222                                           
  4223                                                       ts.waiting_on.clear()
  4224                                           
  4225                                                       if nbytes is not None:
  4226                                                           ts.set_nbytes(nbytes)
  4227                                           
  4228                                                       self.check_idle_saturated(ws)
  4229                                           
  4230                                                       recommendations = {}
  4231                                           
  4232                                                       self._add_to_memory(ts, ws, recommendations, **kwargs)
  4233                                           
  4234                                                       if self.validate:
  4235                                                           assert not ts.processing_on
  4236                                                           assert not ts.waiting_on
  4237                                                           assert ts.who_has
  4238                                           
  4239                                                       return recommendations
  4240                                                   except Exception as e:
  4241                                                       logger.exception(e)
  4242                                                       if LOG_PDB:
  4243                                                           import pdb
  4244                                           
  4245                                                           pdb.set_trace()
  4246                                                       raise

Total time: 228.539 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_memory at line 4248

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4248                                               @profile
  4249                                               def transition_processing_memory(
  4250                                                   self,
  4251                                                   key,
  4252                                                   nbytes=None,
  4253                                                   type=None,
  4254                                                   typename=None,
  4255                                                   worker=None,
  4256                                                   startstops=None,
  4257                                                   **kwargs,
  4258                                               ):
  4259   1411200    1248913.0      0.9      0.5          try:
  4260   1411200    1433688.0      1.0      0.6              ts = self.tasks[key]
  4261   1411200    1195227.0      0.8      0.5              assert worker
  4262   1411200    1479923.0      1.0      0.6              assert isinstance(worker, str)
  4263                                           
  4264   1411200    1258636.0      0.9      0.6              if self.validate:
  4265                                                           assert ts.processing_on
  4266                                                           ws = ts.processing_on
  4267                                                           assert ts in ws.processing
  4268                                                           assert not ts.waiting_on
  4269                                                           assert not ts.who_has, (ts, ts.who_has)
  4270                                                           assert not ts.exception_blame
  4271                                                           assert ts.state == "processing"
  4272                                           
  4273   1411200    1628692.0      1.2      0.7              ws = self.workers.get(worker)
  4274   1411200    1214684.0      0.9      0.5              if ws is None:
  4275                                                           return {key: "released"}
  4276                                           
  4277   1411200    3300956.0      2.3      1.4              if ws != ts.processing_on:  # someone else has this task
  4278                                                           logger.info(
  4279                                                               "Unexpected worker completed task, likely due to"
  4280                                                               " work stealing.  Expected: %s, Got: %s, Key: %s",
  4281                                                               ts.processing_on,
  4282                                                               ws,
  4283                                                               key,
  4284                                                           )
  4285                                                           return {}
  4286                                           
  4287   1411200    1250703.0      0.9      0.5              if startstops:
  4288   1411200    1471193.0      1.0      0.6                  L = list()
  4289   2863008    2658858.0      0.9      1.2                  for startstop in startstops:
  4290   1451808    1422805.0      1.0      0.6                      stop = startstop["stop"]
  4291   1451808    1250283.0      0.9      0.5                      start = startstop["start"]
  4292   1451808    1241881.0      0.9      0.5                      action = startstop["action"]
  4293   1451808    1363291.0      0.9      0.6                      if action == "compute":
  4294   1411200    1586687.0      1.1      0.7                          L.append((start, stop))
  4295                                           
  4296                                                               # record timings of all actions -- a cheaper way of
  4297                                                               # getting timing info compared with get_task_stream()
  4298   1451808    2732735.0      1.9      1.2                      ts.prefix.all_durations[action] += stop - start
  4299                                           
  4300   1411200    1562350.0      1.1      0.7                  if len(L) > 0:
  4301   1411200    1577738.0      1.1      0.7                      compute_start, compute_stop = L[0]
  4302                                                           else:  # This is very rare
  4303                                                               compute_start = compute_stop = None
  4304                                                       else:
  4305                                                           compute_start = compute_stop = None
  4306                                           
  4307                                                       #############################
  4308                                                       # Update Timing Information #
  4309                                                       #############################
  4310   1411200    2503039.0      1.8      1.1              if compute_start and ws.processing.get(ts, True):
  4311                                                           # Update average task duration for worker
  4312   1411200    1479167.0      1.0      0.6                  old_duration = ts.prefix.duration_average or 0
  4313   1411200    1263221.0      0.9      0.6                  new_duration = compute_stop - compute_start
  4314   1411200    1246370.0      0.9      0.5                  if not old_duration:
  4315         4          3.0      0.8      0.0                      avg_duration = new_duration
  4316                                                           else:
  4317   1411196    1544315.0      1.1      0.7                      avg_duration = 0.5 * old_duration + 0.5 * new_duration
  4318                                           
  4319   1411200    1492919.0      1.1      0.7                  ts.prefix.duration_average = avg_duration
  4320   1411200    1890589.0      1.3      0.8                  ts.group.duration += new_duration
  4321                                           
  4322   1413163    1904992.0      1.3      0.8                  for tts in self.unknown_durations.pop(ts.prefix.name, ()):
  4323      1963       2095.0      1.1      0.0                      if tts.processing_on:
  4324      1963       1765.0      0.9      0.0                          wws = tts.processing_on
  4325      1963       2066.0      1.1      0.0                          old = wws.processing[tts]
  4326      1963       4354.0      2.2      0.0                          comm = self.get_comm_cost(tts, wws)
  4327      1963       1892.0      1.0      0.0                          wws.processing[tts] = avg_duration + comm
  4328      1963       1984.0      1.0      0.0                          wws.occupancy += avg_duration + comm - old
  4329      1963       1959.0      1.0      0.0                          self.total_occupancy += avg_duration + comm - old
  4330                                           
  4331                                                       ############################
  4332                                                       # Update State Information #
  4333                                                       ############################
  4334   1411200    1338522.0      0.9      0.6              if nbytes is not None:
  4335   1411200    5125180.0      3.6      2.2                  ts.set_nbytes(nbytes)
  4336                                           
  4337   1411200    1292095.0      0.9      0.6              recommendations = {}
  4338                                           
  4339   1411200   59052078.0     41.8     25.8              self._remove_from_processing(ts)
  4340                                           
  4341   1411200  114914821.0     81.4     50.3              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4342                                           
  4343   1411200    1425268.0      1.0      0.6              if self.validate:
  4344                                                           assert not ts.processing_on
  4345                                                           assert not ts.waiting_on
  4346                                           
  4347   1411200    1170731.0      0.8      0.5              return recommendations
  4348                                                   except Exception as e:
  4349                                                       logger.exception(e)
  4350                                                       if LOG_PDB:
  4351                                                           import pdb
  4352                                           
  4353                                                           pdb.set_trace()
  4354                                                       raise

Total time: 69.4096 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_released at line 4356

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4356                                               @profile
  4357                                               def transition_memory_released(self, key, safe=False):
  4358   1382400    1054686.0      0.8      1.5          try:
  4359   1382400    1149472.0      0.8      1.7              ts = self.tasks[key]
  4360                                           
  4361   1382400    1036987.0      0.8      1.5              if self.validate:
  4362                                                           assert not ts.waiting_on
  4363                                                           assert not ts.processing_on
  4364                                                           if safe:
  4365                                                               assert not ts.waiters
  4366                                           
  4367   1382400    1097996.0      0.8      1.6              if ts.actor:
  4368                                                           for ws in ts.who_has:
  4369                                                               ws.actors.discard(ts)
  4370                                                           if ts.who_wants:
  4371                                                               ts.exception_blame = ts
  4372                                                               ts.exception = "Worker holding Actor was lost"
  4373                                                               return {ts.key: "erred"}  # don't try to recreate
  4374                                           
  4375   1382400     985600.0      0.7      1.4              recommendations = {}
  4376                                           
  4377   1382400    1377740.0      1.0      2.0              for dts in ts.waiters:
  4378                                                           if dts.state in ("no-worker", "processing"):
  4379                                                               recommendations[dts.key] = "waiting"
  4380                                                           elif dts.state == "waiting":
  4381                                                               dts.waiting_on.add(ts)
  4382                                           
  4383                                                       # XXX factor this out?
  4384   3508368    2860867.0      0.8      4.1              for ws in ts.who_has:
  4385   2125968    2235027.0      1.1      3.2                  ws.has_what.remove(ts)
  4386   2125968    3478060.0      1.6      5.0                  ws.nbytes -= ts.get_nbytes()
  4387   2125968    3131575.0      1.5      4.5                  ts.group.nbytes_in_memory -= ts.get_nbytes()
  4388   4251936   17162583.0      4.0     24.7                  self.worker_send(
  4389   2125968    2169517.0      1.0      3.1                      ws.address, {"op": "delete-data", "keys": [key], "report": False}
  4390                                                           )
  4391   1382400    1319461.0      1.0      1.9              ts.who_has.clear()
  4392                                           
  4393   1382400    3039483.0      2.2      4.4              ts.state = "released"
  4394                                           
  4395   1382400   21471364.0     15.5     30.9              self.report({"op": "lost-data", "key": key})
  4396                                           
  4397   1382400    1313419.0      1.0      1.9              if not ts.run_spec:  # pure data
  4398                                                           recommendations[key] = "forgotten"
  4399   1382400    1160559.0      0.8      1.7              elif ts.has_lost_dependencies:
  4400                                                           recommendations[key] = "forgotten"
  4401   1382400    1311730.0      0.9      1.9              elif ts.who_wants or ts.waiters:
  4402                                                           recommendations[key] = "waiting"
  4403                                           
  4404   1382400    1119062.0      0.8      1.6              if self.validate:
  4405                                                           assert not ts.waiting_on
  4406                                           
  4407   1382400     934429.0      0.7      1.3              return recommendations
  4408                                                   except Exception as e:
  4409                                                       logger.exception(e)
  4410                                                       if LOG_PDB:
  4411                                                           import pdb
  4412                                           
  4413                                                           pdb.set_trace()
  4414                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_erred at line 4416

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4416                                               @profile
  4417                                               def transition_released_erred(self, key):
  4418                                                   try:
  4419                                                       ts = self.tasks[key]
  4420                                           
  4421                                                       if self.validate:
  4422                                                           with log_errors(pdb=LOG_PDB):
  4423                                                               assert ts.exception_blame
  4424                                                               assert not ts.who_has
  4425                                                               assert not ts.waiting_on
  4426                                                               assert not ts.waiters
  4427                                           
  4428                                                       recommendations = {}
  4429                                           
  4430                                                       failing_ts = ts.exception_blame
  4431                                           
  4432                                                       for dts in ts.dependents:
  4433                                                           dts.exception_blame = failing_ts
  4434                                                           if not dts.who_has:
  4435                                                               recommendations[dts.key] = "erred"
  4436                                           
  4437                                                       self.report(
  4438                                                           {
  4439                                                               "op": "task-erred",
  4440                                                               "key": key,
  4441                                                               "exception": failing_ts.exception,
  4442                                                               "traceback": failing_ts.traceback,
  4443                                                           }
  4444                                                       )
  4445                                           
  4446                                                       ts.state = "erred"
  4447                                           
  4448                                                       # TODO: waiting data?
  4449                                                       return recommendations
  4450                                                   except Exception as e:
  4451                                                       logger.exception(e)
  4452                                                       if LOG_PDB:
  4453                                                           import pdb
  4454                                           
  4455                                                           pdb.set_trace()
  4456                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_erred_released at line 4458

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4458                                               @profile
  4459                                               def transition_erred_released(self, key):
  4460                                                   try:
  4461                                                       ts = self.tasks[key]
  4462                                           
  4463                                                       if self.validate:
  4464                                                           with log_errors(pdb=LOG_PDB):
  4465                                                               assert all(dts.state != "erred" for dts in ts.dependencies)
  4466                                                               assert ts.exception_blame
  4467                                                               assert not ts.who_has
  4468                                                               assert not ts.waiting_on
  4469                                                               assert not ts.waiters
  4470                                           
  4471                                                       recommendations = {}
  4472                                           
  4473                                                       ts.exception = None
  4474                                                       ts.exception_blame = None
  4475                                                       ts.traceback = None
  4476                                           
  4477                                                       for dep in ts.dependents:
  4478                                                           if dep.state == "erred":
  4479                                                               recommendations[dep.key] = "waiting"
  4480                                           
  4481                                                       self.report({"op": "task-retried", "key": key})
  4482                                                       ts.state = "released"
  4483                                           
  4484                                                       return recommendations
  4485                                                   except Exception as e:
  4486                                                       logger.exception(e)
  4487                                                       if LOG_PDB:
  4488                                                           import pdb
  4489                                           
  4490                                                           pdb.set_trace()
  4491                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_released at line 4493

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4493                                               @profile
  4494                                               def transition_waiting_released(self, key):
  4495                                                   try:
  4496                                                       ts = self.tasks[key]
  4497                                           
  4498                                                       if self.validate:
  4499                                                           assert not ts.who_has
  4500                                                           assert not ts.processing_on
  4501                                           
  4502                                                       recommendations = {}
  4503                                           
  4504                                                       for dts in ts.dependencies:
  4505                                                           s = dts.waiters
  4506                                                           if ts in s:
  4507                                                               s.discard(ts)
  4508                                                               if not s and not dts.who_wants:
  4509                                                                   recommendations[dts.key] = "released"
  4510                                                       ts.waiting_on.clear()
  4511                                           
  4512                                                       ts.state = "released"
  4513                                           
  4514                                                       if ts.has_lost_dependencies:
  4515                                                           recommendations[key] = "forgotten"
  4516                                                       elif not ts.exception_blame and (ts.who_wants or ts.waiters):
  4517                                                           recommendations[key] = "waiting"
  4518                                                       else:
  4519                                                           ts.waiters.clear()
  4520                                           
  4521                                                       return recommendations
  4522                                                   except Exception as e:
  4523                                                       logger.exception(e)
  4524                                                       if LOG_PDB:
  4525                                                           import pdb
  4526                                           
  4527                                                           pdb.set_trace()
  4528                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_released at line 4530

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4530                                               @profile
  4531                                               def transition_processing_released(self, key):
  4532                                                   try:
  4533                                                       ts = self.tasks[key]
  4534                                           
  4535                                                       if self.validate:
  4536                                                           assert ts.processing_on
  4537                                                           assert not ts.who_has
  4538                                                           assert not ts.waiting_on
  4539                                                           assert self.tasks[key].state == "processing"
  4540                                           
  4541                                                       self._remove_from_processing(
  4542                                                           ts, send_worker_msg={"op": "release-task", "key": key}
  4543                                                       )
  4544                                           
  4545                                                       ts.state = "released"
  4546                                           
  4547                                                       recommendations = {}
  4548                                           
  4549                                                       if ts.has_lost_dependencies:
  4550                                                           recommendations[key] = "forgotten"
  4551                                                       elif ts.waiters or ts.who_wants:
  4552                                                           recommendations[key] = "waiting"
  4553                                           
  4554                                                       if recommendations.get(key) != "waiting":
  4555                                                           for dts in ts.dependencies:
  4556                                                               if dts.state != "released":
  4557                                                                   s = dts.waiters
  4558                                                                   s.discard(ts)
  4559                                                                   if not s and not dts.who_wants:
  4560                                                                       recommendations[dts.key] = "released"
  4561                                                           ts.waiters.clear()
  4562                                           
  4563                                                       if self.validate:
  4564                                                           assert not ts.processing_on
  4565                                           
  4566                                                       return recommendations
  4567                                                   except Exception as e:
  4568                                                       logger.exception(e)
  4569                                                       if LOG_PDB:
  4570                                                           import pdb
  4571                                           
  4572                                                           pdb.set_trace()
  4573                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_erred at line 4575

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4575                                               @profile
  4576                                               def transition_processing_erred(
  4577                                                   self, key, cause=None, exception=None, traceback=None, **kwargs
  4578                                               ):
  4579                                                   try:
  4580                                                       ts = self.tasks[key]
  4581                                           
  4582                                                       if self.validate:
  4583                                                           assert cause or ts.exception_blame
  4584                                                           assert ts.processing_on
  4585                                                           assert not ts.who_has
  4586                                                           assert not ts.waiting_on
  4587                                           
  4588                                                       if ts.actor:
  4589                                                           ws = ts.processing_on
  4590                                                           ws.actors.remove(ts)
  4591                                           
  4592                                                       self._remove_from_processing(ts)
  4593                                           
  4594                                                       if exception is not None:
  4595                                                           ts.exception = exception
  4596                                                       if traceback is not None:
  4597                                                           ts.traceback = traceback
  4598                                                       if cause is not None:
  4599                                                           failing_ts = self.tasks[cause]
  4600                                                           ts.exception_blame = failing_ts
  4601                                                       else:
  4602                                                           failing_ts = ts.exception_blame
  4603                                           
  4604                                                       recommendations = {}
  4605                                           
  4606                                                       for dts in ts.dependents:
  4607                                                           dts.exception_blame = failing_ts
  4608                                                           recommendations[dts.key] = "erred"
  4609                                           
  4610                                                       for dts in ts.dependencies:
  4611                                                           s = dts.waiters
  4612                                                           s.discard(ts)
  4613                                                           if not s and not dts.who_wants:
  4614                                                               recommendations[dts.key] = "released"
  4615                                           
  4616                                                       ts.waiters.clear()  # do anything with this?
  4617                                           
  4618                                                       ts.state = "erred"
  4619                                           
  4620                                                       self.report(
  4621                                                           {
  4622                                                               "op": "task-erred",
  4623                                                               "key": key,
  4624                                                               "exception": failing_ts.exception,
  4625                                                               "traceback": failing_ts.traceback,
  4626                                                           }
  4627                                                       )
  4628                                           
  4629                                                       cs = self.clients["fire-and-forget"]
  4630                                                       if ts in cs.wants_what:
  4631                                                           self.client_releases_keys(client="fire-and-forget", keys=[key])
  4632                                           
  4633                                                       if self.validate:
  4634                                                           assert not ts.processing_on
  4635                                           
  4636                                                       return recommendations
  4637                                                   except Exception as e:
  4638                                                       logger.exception(e)
  4639                                                       if LOG_PDB:
  4640                                                           import pdb
  4641                                           
  4642                                                           pdb.set_trace()
  4643                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_released at line 4645

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4645                                               @profile
  4646                                               def transition_no_worker_released(self, key):
  4647                                                   try:
  4648                                                       ts = self.tasks[key]
  4649                                           
  4650                                                       if self.validate:
  4651                                                           assert self.tasks[key].state == "no-worker"
  4652                                                           assert not ts.who_has
  4653                                                           assert not ts.waiting_on
  4654                                           
  4655                                                       self.unrunnable.remove(ts)
  4656                                                       ts.state = "released"
  4657                                           
  4658                                                       for dts in ts.dependencies:
  4659                                                           dts.waiters.discard(ts)
  4660                                           
  4661                                                       ts.waiters.clear()
  4662                                           
  4663                                                       return {}
  4664                                                   except Exception as e:
  4665                                                       logger.exception(e)
  4666                                                       if LOG_PDB:
  4667                                                           import pdb
  4668                                           
  4669                                                           pdb.set_trace()
  4670                                                       raise

Total time: 5.99088 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: remove_key at line 4672

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4672                                               @profile
  4673                                               def remove_key(self, key):
  4674   1411200     826879.0      0.6     13.8          ts = self.tasks.pop(key)
  4675   1411200     993828.0      0.7     16.6          assert ts.state == "forgotten"
  4676   1411200     756618.0      0.5     12.6          self.unrunnable.discard(ts)
  4677   1411200     671349.0      0.5     11.2          for cs in ts.who_wants:
  4678                                                       cs.wants_what.remove(ts)
  4679   1411200     646576.0      0.5     10.8          ts.who_wants.clear()
  4680   1411200     632474.0      0.4     10.6          ts.processing_on = None
  4681   1411200     768481.0      0.5     12.8          ts.exception_blame = ts.exception = ts.traceback = None
  4682                                           
  4683   1411200     694671.0      0.5     11.6          if key in self.task_metadata:
  4684                                                       del self.task_metadata[key]

Total time: 20.5691 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _propagate_forgotten at line 4686

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4686                                               @profile
  4687                                               def _propagate_forgotten(self, ts, recommendations):
  4688   1411200    2372646.0      1.7     11.5          ts.state = "forgotten"
  4689   1411200     782302.0      0.6      3.8          key = ts.key
  4690   1411200     913445.0      0.6      4.4          for dts in ts.dependents:
  4691                                                       dts.has_lost_dependencies = True
  4692                                                       dts.dependencies.remove(ts)
  4693                                                       dts.waiting_on.discard(ts)
  4694                                                       if dts.state not in ("memory", "erred"):
  4695                                                           # Cannot compute task anymore
  4696                                                           recommendations[dts.key] = "forgotten"
  4697   1411200     900487.0      0.6      4.4          ts.dependents.clear()
  4698   1411200     894425.0      0.6      4.3          ts.waiters.clear()
  4699                                           
  4700   3864960    2030291.0      0.5      9.9          for dts in ts.dependencies:
  4701   2453760    1823758.0      0.7      8.9              dts.dependents.remove(ts)
  4702   2453760    1508690.0      0.6      7.3              s = dts.waiters
  4703   2453760    1524259.0      0.6      7.4              s.discard(ts)
  4704   2453760    1430587.0      0.6      7.0              if not dts.dependents and not dts.who_wants:
  4705                                                           # Task not needed anymore
  4706   1382400     682480.0      0.5      3.3                  assert dts is not ts
  4707   1382400    1012800.0      0.7      4.9                  recommendations[dts.key] = "forgotten"
  4708   1411200     862514.0      0.6      4.2          ts.dependencies.clear()
  4709   1411200     869880.0      0.6      4.2          ts.waiting_on.clear()
  4710                                           
  4711   1411200     795122.0      0.6      3.9          if ts.who_has:
  4712     28800      52812.0      1.8      0.3              ts.group.nbytes_in_memory -= ts.get_nbytes()
  4713                                           
  4714   1440000     883066.0      0.6      4.3          for ws in ts.who_has:
  4715     28800      28345.0      1.0      0.1              ws.has_what.remove(ts)
  4716     28800      36928.0      1.3      0.2              ws.nbytes -= ts.get_nbytes()
  4717     28800      18772.0      0.7      0.1              w = ws.address
  4718     28800      29165.0      1.0      0.1              if w in self.workers:  # in case worker has died
  4719     57600     275369.0      4.8      1.3                  self.worker_send(
  4720     28800      24445.0      0.8      0.1                      w, {"op": "delete-data", "keys": [key], "report": False}
  4721                                                           )
  4722   1411200     816533.0      0.6      4.0          ts.who_has.clear()
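
To rank the lines of a block like the one above by time, the rows can be pulled out with a small parser. This is only a sketch: `parse_rows` and `hottest` are hypothetical helpers written for this thread, not part of line_profiler, and the regex assumes the column layout shown here (line number, hits, time in timer units, per hit, % time, source). Lines that were never hit carry no numbers and are skipped.

```python
import re

# One profiled row: line no., hits, time, per-hit, % time, then the source text.
# Assumption: columns are whitespace-separated, as in the output above.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(.*)$"
)


def parse_rows(text):
    """Return one dict per row that has timing data; skip unhit lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, separator, or a line with no hits
        lineno, hits, time_us, _per_hit, pct, src = m.groups()
        rows.append(
            {
                "lineno": int(lineno),
                "hits": int(hits),
                "time_us": float(time_us),  # timer unit is 1e-06 s in this profile
                "pct": float(pct),
                "source": src.strip(),
            }
        )
    return rows


def hottest(text, n=3):
    """Return the n rows with the largest total time."""
    return sorted(parse_rows(text), key=lambda r: r["time_us"], reverse=True)[:n]


# A few rows copied from the _propagate_forgotten block above.
SAMPLE = """\
  4688   1411200    2372646.0      1.7     11.5          ts.state = "forgotten"
  4689   1411200     782302.0      0.6      3.8          key = ts.key
  4691                                                       dts.has_lost_dependencies = True
  4700   3864960    2030291.0      0.5      9.9          for dts in ts.dependencies:
  4701   2453760    1823758.0      0.7      8.9              dts.dependents.remove(ts)
"""

top = hottest(SAMPLE, n=2)
```

Sorting by `time_us` rather than `pct` keeps rows from different functions comparable, since `% Time` is relative to each function's own total.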

Total time: 4.97011 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_forgotten at line 4724

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4724                                               @profile
  4725                                               def transition_memory_forgotten(self, key):
  4726     28800      15569.0      0.5      0.3          try:
  4727     28800      17614.0      0.6      0.4              ts = self.tasks[key]
  4728                                           
  4729     28800      15391.0      0.5      0.3              if self.validate:
  4730                                                           assert ts.state == "memory"
  4731                                                           assert not ts.processing_on
  4732                                                           assert not ts.waiting_on
  4733                                                           if not ts.run_spec:
  4734                                                               # It's ok to forget a pure data task
  4735                                                               pass
  4736                                                           elif ts.has_lost_dependencies:
  4737                                                               # It's ok to forget a task with forgotten dependencies
  4738                                                               pass
  4739                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4740                                                               # It's ok to forget a task that nobody needs
  4741                                                               pass
  4742                                                           else:
  4743                                                               assert 0, (ts,)
  4744                                           
  4745     28800      14074.0      0.5      0.3              recommendations = {}
  4746                                           
  4747     28800      16807.0      0.6      0.3              if ts.actor:
  4748                                                           for ws in ts.who_has:
  4749                                                               ws.actors.discard(ts)
  4750                                           
  4751     28800    3965095.0    137.7     79.8              self._propagate_forgotten(ts, recommendations)
  4752                                           
  4753     28800     615491.0     21.4     12.4              self.report_on_key(ts=ts)
  4754     28800     296859.0     10.3      6.0              self.remove_key(key)
  4755                                           
  4756     28800      13213.0      0.5      0.3              return recommendations
  4757                                                   except Exception as e:
  4758                                                       logger.exception(e)
  4759                                                       if LOG_PDB:
  4760                                                           import pdb
  4761                                           
  4762                                                           pdb.set_trace()
  4763                                                       raise

Total time: 77.3148 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_forgotten at line 4765

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4765                                               @profile
  4766                                               def transition_released_forgotten(self, key):
  4767   1382400     640853.0      0.5      0.8          try:
  4768   1382400     748689.0      0.5      1.0              ts = self.tasks[key]
  4769                                           
  4770   1382400     664951.0      0.5      0.9              if self.validate:
  4771                                                           assert ts.state in ("released", "erred")
  4772                                                           assert not ts.who_has
  4773                                                           assert not ts.processing_on
  4774                                                           assert not ts.waiting_on, (ts, ts.waiting_on)
  4775                                                           if not ts.run_spec:
  4776                                                               # It's ok to forget a pure data task
  4777                                                               pass
  4778                                                           elif ts.has_lost_dependencies:
  4779                                                               # It's ok to forget a task with forgotten dependencies
  4780                                                               pass
  4781                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4782                                                               # It's ok to forget a task that nobody needs
  4783                                                               pass
  4784                                                           else:
  4785                                                               assert 0, (ts,)
  4786                                           
  4787   1382400     642141.0      0.5      0.8              recommendations = {}
  4788   1382400   35949786.0     26.0     46.5              self._propagate_forgotten(ts, recommendations)
  4789                                           
  4790   1382400   25551920.0     18.5     33.0              self.report_on_key(ts=ts)
  4791   1382400   12493646.0      9.0     16.2              self.remove_key(key)
  4792                                           
  4793   1382400     622849.0      0.5      0.8              return recommendations
  4794                                                   except Exception as e:
  4795                                                       logger.exception(e)
  4796                                                       if LOG_PDB:
  4797                                                           import pdb
  4798                                           
  4799                                                           pdb.set_trace()
  4800                                                       raise

Total time: 977.211 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 4802

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4802                                               @profile
  4803                                               def transition(self, key, finish, *args, **kwargs):
  4804                                                   """Transition a key from its current state to the finish state
  4805                                           
  4806                                                   Examples
  4807                                                   --------
  4808                                                   >>> self.transition('x', 'waiting')
  4809                                                   {'x': 'processing'}
  4810                                           
  4811                                                   Returns
  4812                                                   -------
  4813                                                   Dictionary of recommendations for future transitions
  4814                                           
  4815                                                   See Also
  4816                                                   --------
  4817                                                   Scheduler.transitions: transitive version of this function
  4818                                                   """
  4819   7027200    5621183.0      0.8      0.6          try:
  4820   7027200    5307244.0      0.8      0.5              try:
  4821   7027200    7597480.0      1.1      0.8                  ts = self.tasks[key]
  4822                                                       except KeyError:
  4823                                                           return {}
  4824   7027200    7446846.0      1.1      0.8              start = ts.state
  4825   7027200    5553585.0      0.8      0.6              if start == finish:
  4826                                                           return {}
  4827                                           
  4828   7027200    5873207.0      0.8      0.6              if self.plugins:
  4829   7027200    8071403.0      1.1      0.8                  dependents = set(ts.dependents)
  4830   7027200    7492950.0      1.1      0.8                  dependencies = set(ts.dependencies)
  4831                                           
  4832   7027200    7359525.0      1.0      0.8              if (start, finish) in self._transitions:
  4833   7027200    6505129.0      0.9      0.7                  func = self._transitions[start, finish]
  4834   7027200  779510115.0    110.9     79.8                  recommendations = func(key, *args, **kwargs)
  4835                                                       elif "released" not in (start, finish):
  4836                                                           func = self._transitions["released", finish]
  4837                                                           assert not args and not kwargs
  4838                                                           a = self.transition(key, "released")
  4839                                                           if key in a:
  4840                                                               func = self._transitions["released", a[key]]
  4841                                                           b = func(key)
  4842                                                           a = a.copy()
  4843                                                           a.update(b)
  4844                                                           recommendations = a
  4845                                                           start = "released"
  4846                                                       else:
  4847                                                           raise RuntimeError(
  4848                                                               "Impossible transition from %r to %r" % (start, finish)
  4849                                                           )
  4850                                           
  4851   7027200    8441461.0      1.2      0.9              finish2 = ts.state
  4852   7027200   11245817.0      1.6      1.2              self.transition_log.append((key, start, finish2, recommendations, time()))
  4853   7027200    5863675.0      0.8      0.6              if self.validate:
  4854                                                           logger.debug(
  4855                                                               "Transitioned %r %s->%s (actual: %s).  Consequence: %s",
  4856                                                               key,
  4857                                                               start,
  4858                                                               finish2,
  4859                                                               ts.state,
  4860                                                               dict(recommendations),
  4861                                                           )
  4862   7027200    6044480.0      0.9      0.6              if self.plugins:
  4863                                                           # Temporarily put back forgotten key for plugin to retrieve it
  4864   7027200    7922667.0      1.1      0.8                  if ts.state == "forgotten":
  4865   1411200     994510.0      0.7      0.1                      try:
  4866   1411200    1110818.0      0.8      0.1                          ts.dependents = dependents
  4867   1411200    1072177.0      0.8      0.1                          ts.dependencies = dependencies
  4868                                                               except KeyError:
  4869                                                                   pass
  4870   1411200    1415647.0      1.0      0.1                      self.tasks[ts.key] = ts
  4871  14054400   14448210.0      1.0      1.5                  for plugin in list(self.plugins):
  4872   7027200    5447556.0      0.8      0.6                      try:
  4873   7027200   39306982.0      5.6      4.0                          plugin.transition(key, start, finish2, *args, **kwargs)
  4874                                                               except Exception:
  4875                                                                   logger.info("Plugin failed with exception", exc_info=True)
  4876   7027200    8043597.0      1.1      0.8                  if ts.state == "forgotten":
  4877   1411200    1220531.0      0.9      0.1                      del self.tasks[ts.key]
  4878                                           
  4879   7027200    7790030.0      1.1      0.8              if ts.state == "forgotten" and ts.group.name in self.task_groups:
  4880                                                           # Remove TaskGroup if all tasks are in the forgotten state
  4881   1411200    1092401.0      0.8      0.1                  tg = ts.group
  4882   1411200    4121803.0      2.9      0.4                  if not any(tg.states.get(s) for s in ALL_TASK_STATES):
  4883       200        301.0      1.5      0.0                      ts.prefix.groups.remove(tg)
  4884       200        180.0      0.9      0.0                      del self.task_groups[tg.name]
  4885                                           
  4886   7027200    5289318.0      0.8      0.5              return recommendations
  4887                                                   except Exception as e:
  4888                                                       logger.exception("Error transitioning %r from %r to %r", key, start, finish)
  4889                                                       if LOG_PDB:
  4890                                                           import pdb
  4891                                           
  4892                                                           pdb.set_trace()
  4893                                                       raise

Total time: 798.499 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transitions at line 4895

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4895                                               @profile
  4896                                               def transitions(self, recommendations):
  4897                                                   """Process transitions until none are left
  4898                                           
  4899                                                   This includes feedback from previous transitions and continues until we
  4900                                                   reach a steady state
  4901                                                   """
  4902   1440025     972184.0      0.7      0.1          keys = set()
  4903   1440025    1161678.0      0.8      0.1          recommendations = recommendations.copy()
  4904   7056025    2737728.0      0.4      0.3          while recommendations:
  4905   5616000    3303728.0      0.6      0.4              key, finish = recommendations.popitem()
  4906   5616000    2947822.0      0.5      0.4              keys.add(key)
  4907   5616000  782935237.0    139.4     98.1              new = self.transition(key, finish)
  4908   5616000    3692147.0      0.7      0.5              recommendations.update(new)
  4909                                           
  4910   1440025     748527.0      0.5      0.1          if self.validate:
  4911                                                       for key in keys:
  4912                                                           self.validate_key(key)

Total time: 46.5721 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: check_idle_saturated at line 4947

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4947                                               @profile
  4948                                               def check_idle_saturated(self, ws, occ=None):
  4949                                                   """Update the status of the idle and saturated state
  4950                                           
  4951                                                   The scheduler keeps track of workers that are ..
  4952                                           
  4953                                                   -  Saturated: have enough work to stay busy
  4954                                                   -  Idle: do not have enough work to stay busy
  4955                                           
  4956                                                   They are considered saturated if they both have enough tasks to occupy
  4957                                                   all of their threads, and if the expected runtime of those tasks is
  4958                                                   large enough.
  4959                                           
  4960                                                   This is useful for load balancing and adaptivity.
  4961                                                   """
  4962   2868550   18403806.0      6.4     39.5          if self.total_nthreads == 0 or ws.status == Status.closed:
  4963                                                       return
  4964   2868550    1898719.0      0.7      4.1          if occ is None:
  4965   2845712    1862565.0      0.7      4.0              occ = ws.occupancy
  4966   2868550    1786135.0      0.6      3.8          nc = ws.nthreads
  4967   2868550    2184132.0      0.8      4.7          p = len(ws.processing)
  4968                                           
  4969   2868550    2334694.0      0.8      5.0          avg = self.total_occupancy / self.total_nthreads
  4970                                           
  4971   2868550    2691596.0      0.9      5.8          if p < nc or occ / nc < avg / 2:
  4972    143649     350383.0      2.4      0.8              self.idle.add(ws)
  4973    143649     184997.0      1.3      0.4              self.saturated.discard(ws)
  4974                                                   else:
  4975   2724901    6600862.0      2.4     14.2              self.idle.discard(ws)
  4976                                           
  4977   2724901    2516172.0      0.9      5.4              pending = occ * (p - nc) / p / nc
  4978   2724901    2242923.0      0.8      4.8              if p > nc and pending > 0.4 and pending > 1.9 * avg:
  4979     38498      49011.0      1.3      0.1                  self.saturated.add(ws)
  4980                                                       else:
  4981   2686403    3466109.0      1.3      7.4                  self.saturated.discard(ws)

Total time: 20.9667 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 5522

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5522                                           @profile
  5523                                           def decide_worker(ts, all_workers, valid_workers, objective):
  5524                                               """
  5525                                               Decide which worker should take task *ts*.
  5526                                           
  5527                                               We choose the worker that has the data on which *ts* depends.
  5528                                           
  5529                                               If several workers have dependencies then we choose the less-busy worker.
  5530                                           
  5531                                               Optionally provide *valid_workers* of where jobs are allowed to occur
  5532                                               (if all workers are allowed to take the task, pass True instead).
  5533                                           
  5534                                               If the task requires data communication because no eligible worker has
  5535                                               all the dependencies already, then we choose to minimize the number
  5536                                               of bytes sent between workers.  This is determined by calling the
  5537                                               *objective* function.
  5538                                               """
  5539   1376640     987014.0      0.7      4.7      deps = ts.dependencies
  5540   1376640    3723298.0      2.7     17.8      assert all(dts.who_has for dts in deps)
  5541   1376640     985276.0      0.7      4.7      if ts.actor:
  5542                                                   candidates = all_workers
  5543                                               else:
  5544   1376640    5922852.0      4.3     28.2          candidates = frequencies([ws for dts in deps for ws in dts.who_has])
  5545   1376640     937333.0      0.7      4.5      if valid_workers is True:
  5546   1376640     866451.0      0.6      4.1          if not candidates:
  5547                                                       candidates = all_workers
  5548                                               else:
  5549                                                   candidates = valid_workers & set(candidates)
  5550                                                   if not candidates:
  5551                                                       candidates = valid_workers
  5552                                                       if not candidates:
  5553                                                           if ts.loose_restrictions:
  5554                                                               return decide_worker(ts, all_workers, True, objective)
  5555                                                           else:
  5556                                                               return None
  5557   1376640     822773.0      0.6      3.9      if not candidates:
  5558                                                   return None
  5559                                           
  5560   1376640    1115502.0      0.8      5.3      if len(candidates) == 1:
  5561   1287680    1118238.0      0.9      5.3          return first(candidates)
  5562                                           
  5563     88960    4488007.0     50.4     21.4      return min(candidates, key=objective)

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 5769

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5769                                               @profile
  5770                                               def transition(self, key, start, finish, *args, **kwargs):
  5771                                                   if finish == "memory" or finish == "erred":
  5772                                                       ts = self.scheduler.tasks.get(key)
  5773                                                       if ts is not None and ts.key in self.keys:
  5774                                                           self.metadata[key] = ts.metadata
  5775                                                           self.state[key] = finish
  5776                                                           self.keys.discard(key)

@jakirkham
Collaborator Author

transition_waiting_processing spends a bit of time in decide_worker, which calls frequencies from toolz (or cytoolz if available); that one line accounts for ~28% of decide_worker's runtime in the profile above. That said, it seems like we don't care about the frequencies themselves, but merely which workers show up at all. Switching to set improves performance here and doesn't lose any relevant information. Submitted this change as PR ( dask/distributed#4267 ).
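
As a rough illustration of why nothing is lost (this is not the code from the PR), the sketch below builds the candidate workers both ways. `collections.Counter` stands in for `frequencies` so the example only needs the standard library, and the worker names, the `who_has_per_dep` list and the `occupancy` dict are made-up data.

```python
from collections import Counter

# Hypothetical data: for each dependency, the set of workers holding it.
who_has_per_dep = [
    {"worker-a", "worker-b"},
    {"worker-a"},
    {"worker-c", "worker-a"},
]

# Counting approach: Counter stands in for toolz.frequencies here.
counted = Counter(ws for who_has in who_has_per_dep for ws in who_has)

# Set approach: only records which workers appear at all.
candidates = {ws for who_has in who_has_per_dep for ws in who_has}

# Everything done with the result afterwards (truthiness, len, min with a
# key function) only looks at which workers are present, never the counts.
occupancy = {"worker-a": 3.0, "worker-b": 1.0, "worker-c": 2.0}  # made up
least_busy = min(candidates, key=occupancy.get)

assert set(counted) == candidates
assert len(counted) == len(candidates)
```

The counts themselves (`counted["worker-a"]` is 3 here) are never consulted, which is what makes the plain set sufficient.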

@jakirkham
Collaborator Author

With _add_to_memory the most notable thing is the call to report, which also comes up in a few other places. The most relevant lines in report are the ones that add additional values to a set, namely whether those values should be collected into their own set first or added directly. It appears to be a bit more efficient to just add them without that intermediate step, so submitted PR ( dask/distributed#4268 ) to make that change.
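
A minimal sketch of the two patterns being compared, assuming the earlier code built a temporary set before merging it (the exact earlier form is an assumption; the `client_comms` and `who_wants` values are placeholders, not scheduler objects):

```python
# Placeholder data standing in for scheduler state.
client_comms = {"client-1": "comm-1", "client-2": "comm-2"}
who_wants = ["client-1", "client-3"]  # client-3 has no registered comm


def collect_with_intermediate_set():
    comms = set()
    # Builds a throwaway set first, then merges it into comms.
    comms |= {client_comms[c] for c in who_wants if c in client_comms}
    return comms


def collect_directly():
    comms = set()
    # Feeds a generator straight into update(); no temporary set is built.
    comms.update(client_comms[c] for c in who_wants if c in client_comms)
    return comms


result_intermediate = collect_with_intermediate_set()
result_direct = collect_directly()
assert result_intermediate == result_direct
```

Both produce the same set; the second form simply skips allocating the temporary one on every call.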

@jakirkham
Collaborator Author

In _remove_from_processing ~80% of the time is spent calling check_idle_saturated. Looking at check_idle_saturated, ~40% of its time is spent just running this line: `if self.total_nthreads == 0 or ws.status == Status.closed:`. As total_nthreads is just an int, it seems unlikely that comparison takes much time, which suggests the Status comparison accounts for the bulk of it. Status is an Enum subclass that implements its own __eq__ method. Put together a WIP PR to drop this ( dask/distributed#4270 ), though there may be more discussion needed there.
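
To make the cost concrete, here is a small sketch of an Enum with a Python-level `__eq__`. The `Status` class below is a simplified stand-in written for this example, not the one in distributed, and the `eq_calls` counter exists only to show when the method runs.

```python
from enum import Enum

eq_calls = 0  # counts how often the Python-level __eq__ runs


class Status(Enum):
    """Simplified stand-in; not distributed's Status class."""

    running = "running"
    closed = "closed"

    def __eq__(self, other):
        global eq_calls
        eq_calls += 1
        if isinstance(other, Status):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ would otherwise make members unhashable.
        return hash(self.name)


status = Status.running

calls_before = eq_calls
is_closed_eq = status == Status.closed        # goes through __eq__ above
calls_after_eq = eq_calls

is_closed_identity = status is Status.closed  # identity check, no method call
calls_after_identity = eq_calls
```

Every `==` runs the Python method, while `is` does not, so on a line executed millions of times the difference can add up. Enum members are singletons, so the identity check gives the same answer whenever both sides are members.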

@jakirkham
Collaborator Author

Outside of those issues, a fair bit of time gets spent calling worker_send all over the place, which in turn spends ~90% of its time calling BatchedSend.send, which is a relatively short method. All it seems to do is queue up a message and wake up a background coroutine, which takes over and sends the message. In other words, we may be somewhat communication bound here.
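
The queue-and-wake pattern described above can be sketched with plain asyncio. This is a hypothetical, heavily simplified analogue (the `TinyBatchedSend` class and its `write` callback are invented for illustration) and not distributed's BatchedSend.

```python
import asyncio


class TinyBatchedSend:
    """Hypothetical analogue of a batched sender; not distributed's class."""

    def __init__(self, write):
        self.write = write            # async callable taking a list of messages
        self.buffer = []
        self.waker = asyncio.Event()
        self.closed = False

    def send(self, msg):
        # Cheap synchronous part: queue the message, wake the background task.
        self.buffer.append(msg)
        self.waker.set()

    async def background_send(self):
        while not self.closed:
            await self.waker.wait()
            self.waker.clear()
            payload, self.buffer = self.buffer, []
            if payload:
                await self.write(payload)  # one write for the whole batch


async def demo():
    batches = []

    async def write(payload):
        batches.append(payload)

    sender = TinyBatchedSend(write)
    task = asyncio.create_task(sender.background_send())
    sender.send({"op": "compute-task", "key": "x"})
    sender.send({"op": "compute-task", "key": "y"})
    while sender.buffer:              # yield until the batch has been flushed
        await asyncio.sleep(0)
    sender.closed = True
    sender.waker.set()                # wake the task so it can exit
    await task
    return batches


batches = asyncio.run(demo())
```

Here `send` itself does almost nothing, so time attributed to it in a profile is mostly per-call overhead multiplied by the number of messages.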

@jakirkham
Collaborator Author

With some of these changes merged into distributed as of commit ( dask/distributed@4f2130c ), reran the profile to see where things sit now. Results included below:

Results from prof_34963.lstat:

Timer unit: 1e-06 s

Total time: 31.9997 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report at line 2588

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2588                                               @profile
  2589                                               def report(self, msg, ts=None, client=None):
  2590                                                   """
  2591                                                   Publish updates to all listening Queues and Comms
  2592                                           
  2593                                                   If the message contains a key then we only send the message to those
  2594                                                   comms that care about the key.
  2595                                                   """
  2596   4233600    4098325.0      1.0     12.8          comms = set()
  2597   4233600    2680128.0      0.6      8.4          if client is not None:
  2598                                                       try:
  2599                                                           comms.add(self.client_comms[client])
  2600                                                       except KeyError:
  2601                                                           pass
  2602                                           
  2603   4233600    2853439.0      0.7      8.9          if ts is None and "key" in msg:
  2604   2822400    2927477.0      1.0      9.1              ts = self.tasks.get(msg["key"])
  2605   4233600    2383569.0      0.6      7.4          if ts is None:
  2606                                                       # Notify all clients
  2607                                                       comms.update(self.client_comms.values())
  2608                                                   else:
  2609                                                       # Notify clients interested in key
  2610   8467200   11125512.0      1.3     34.8              comms.update(
  2611                                                           self.client_comms[c.client_key]
  2612   4233600    2625687.0      0.6      8.2                  for c in ts.who_wants
  2613                                                           if c.client_key in self.client_comms
  2614                                                       )
  2615   4291200    2861137.0      0.7      8.9          for c in comms:
  2616     57600      32435.0      0.6      0.1              try:
  2617     57600     412013.0      7.2      1.3                  c.send(msg)
  2618                                                           # logger.debug("Scheduler sends message to client %s", msg)
  2619                                                       except CommClosedError:
  2620                                                           if self.status == Status.running:
  2621                                                               logger.critical("Tried writing to closed comm: %s", msg)

Total time: 51.6189 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: send_task_to_worker at line 2702

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2702                                               @profile
  2703                                               def send_task_to_worker(self, worker, key):
  2704                                                   """ Send a single computational task to a worker """
  2705   1413501    1435875.0      1.0      2.8          try:
  2706   1413501    1670599.0      1.2      3.2              ts = self.tasks[key]
  2707                                           
  2708   1413501    1764909.0      1.2      3.4              msg = {
  2709   1413501    1307200.0      0.9      2.5                  "op": "compute-task",
  2710   1413501    1242894.0      0.9      2.4                  "key": key,
  2711   1413501    1533081.0      1.1      3.0                  "priority": ts.priority,
  2712   1413501    2877789.0      2.0      5.6                  "duration": self.get_task_duration(ts),
  2713                                                       }
  2714   1413501    1355227.0      1.0      2.6              if ts.resource_restrictions:
  2715                                                           msg["resource_restrictions"] = ts.resource_restrictions
  2716   1413501    1344324.0      1.0      2.6              if ts.actor:
  2717                                                           msg["actor"] = True
  2718                                           
  2719   1413501    1339918.0      0.9      2.6              deps = ts.dependencies
  2720   1413501    1364302.0      1.0      2.6              if deps:
  2721   2757882    8070063.0      2.9     15.6                  msg["who_has"] = {
  2722   1378941    1224636.0      0.9      2.4                      dep.key: [ws.address for ws in dep.who_has] for dep in deps
  2723                                                           }
  2724   1378941    3493807.0      2.5      6.8                  msg["nbytes"] = {dep.key: dep.nbytes for dep in deps}
  2725                                           
  2726   1413501    1485129.0      1.1      2.9              if self.validate and deps:
  2727                                                           assert all(msg["who_has"].values())
  2728                                           
  2729   1413501    1598706.0      1.1      3.1              task = ts.run_spec
  2730   1413501    1692411.0      1.2      3.3              if type(task) is dict:
  2731   1407741    2681966.0      1.9      5.2                  msg.update(task)
  2732                                                       else:
  2733      5760       5087.0      0.9      0.0                  msg["task"] = task
  2734                                           
  2735   1413501   14131013.0     10.0     27.4              self.worker_send(worker, msg)
  2736                                                   except Exception as e:
  2737                                                       logger.exception(e)
  2738                                                       if LOG_PDB:
  2739                                                           import pdb
  2740                                           
  2741                                                           pdb.set_trace()
  2742                                                       raise

Total time: 16.9188 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: worker_send at line 2872

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2872                                               @profile
  2873                                               def worker_send(self, worker, msg):
  2874                                                   """Send message to worker
  2875                                           
  2876                                                   This also handles connection failures by adding a callback to remove
  2877                                                   the worker on the next cycle.
  2878                                                   """
  2879   3582837    1611627.0      0.4      9.5          try:
  2880   3582837   15307170.0      4.3     90.5              self.stream_comms[worker].send(msg)
  2881                                                   except (CommClosedError, AttributeError):
  2882                                                       self.loop.add_callback(self.remove_worker, address=worker)

Total time: 21.0588 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report_on_key at line 3686

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3686                                               @profile
  3687                                               def report_on_key(self, key=None, ts=None, client=None):
  3688   1411200     897376.0      0.6      4.3          assert (key is None) + (ts is None) == 1, (key, ts)
  3689   1411200     715072.0      0.5      3.4          if ts is None:
  3690                                                       try:
  3691                                                           ts = self.tasks[key]
  3692                                                       except KeyError:
  3693                                                           self.report({"op": "cancelled-key", "key": key}, client=client)
  3694                                                           return
  3695                                                   else:
  3696   1411200     721934.0      0.5      3.4              key = ts.key
  3697   1411200    1132930.0      0.8      5.4          if ts.state == "forgotten":
  3698   1411200   17591466.0     12.5     83.5              self.report({"op": "cancelled-key", "key": key}, ts=ts, client=client)
  3699                                                   elif ts.state == "memory":
  3700                                                       self.report({"op": "key-in-memory", "key": key}, ts=ts, client=client)
  3701                                                   elif ts.state == "erred":
  3702                                                       failing_ts = ts.exception_blame
  3703                                                       self.report(
  3704                                                           {
  3705                                                               "op": "task-erred",
  3706                                                               "key": key,
  3707                                                               "exception": failing_ts.exception,
  3708                                                               "traceback": failing_ts.traceback,
  3709                                                           },
  3710                                                           ts=ts,
  3711                                                           client=client,
  3712                                                       )

Total time: 52.6588 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _remove_from_processing at line 3959

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3959                                               @profile
  3960                                               def _remove_from_processing(self, ts, send_worker_msg=None):
  3961                                                   """
  3962                                                   Remove *ts* from the set of processing tasks.
  3963                                                   """
  3964   1411200     911838.0      0.6      1.7          ws = ts.processing_on
  3965   1411200     827070.0      0.6      1.6          ts.processing_on = None
  3966   1411200     726732.0      0.5      1.4          w = ws.address
  3967   1411200    1130843.0      0.8      2.1          if w in self.workers:  # may have been removed
  3968   1411200    1076609.0      0.8      2.0              duration = ws.processing.pop(ts)
  3969   1411200     704612.0      0.5      1.3              if not ws.processing:
  3970       200        209.0      1.0      0.0                  self.total_occupancy -= ws.occupancy
  3971       200        109.0      0.5      0.0                  ws.occupancy = 0
  3972                                                       else:
  3973   1411000    1130288.0      0.8      2.1                  self.total_occupancy -= duration
  3974   1411000     984596.0      0.7      1.9                  ws.occupancy -= duration
  3975   1411200   42458371.0     30.1     80.6              self.check_idle_saturated(ws)
  3976   1411200    1960908.0      1.4      3.7              self.release_resources(ts, ws)
  3977   1411200     746656.0      0.5      1.4              if send_worker_msg:
  3978                                                           self.worker_send(w, send_worker_msg)

Total time: 78.2735 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _add_to_memory at line 3980

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3980                                               @profile
  3981                                               def _add_to_memory(
  3982                                                   self, ts, ws, recommendations, type=None, typename=None, **kwargs
  3983                                               ):
  3984                                                   """
  3985                                                   Add *ts* to the set of in-memory tasks.
  3986                                                   """
  3987   1411200    1246781.0      0.9      1.6          if self.validate:
  3988                                                       assert ts not in ws.has_what
  3989                                           
  3990   1411200    2151966.0      1.5      2.7          ts.who_has.add(ws)
  3991   1411200    1669866.0      1.2      2.1          ws.has_what.add(ts)
  3992   1411200    2750898.0      1.9      3.5          ws.nbytes += ts.get_nbytes()
  3993                                           
  3994   1411200    1095964.0      0.8      1.4          deps = ts.dependents
  3995   1411200    1397070.0      1.0      1.8          if len(deps) > 1:
  3996    103680    1171433.0     11.3      1.5              deps = sorted(deps, key=operator.attrgetter("priority"), reverse=True)
  3997   3864960    3030428.0      0.8      3.9          for dts in deps:
  3998   2453760    2166838.0      0.9      2.8              s = dts.waiting_on
  3999   2453760    2140368.0      0.9      2.7              if ts in s:
  4000   2453760    2079241.0      0.8      2.7                  s.discard(ts)
  4001   2453760    1758894.0      0.7      2.2                  if not s:  # new task ready to run
  4002   1376640    1481114.0      1.1      1.9                      recommendations[dts.key] = "processing"
  4003                                           
  4004   3864960    3015508.0      0.8      3.9          for dts in ts.dependencies:
  4005   2453760    2146552.0      0.9      2.7              s = dts.waiters
  4006   2453760    2144959.0      0.9      2.7              s.discard(ts)
  4007   2453760    1912521.0      0.8      2.4              if not s and not dts.who_wants:
  4008   1382400    1427023.0      1.0      1.8                  recommendations[dts.key] = "released"
  4009                                           
  4010   1411200    1191693.0      0.8      1.5          if not ts.waiters and not ts.who_wants:
  4011                                                       recommendations[ts.key] = "released"
  4012                                                   else:
  4013   1411200    1526265.0      1.1      1.9              msg = {"op": "key-in-memory", "key": ts.key}
  4014   1411200    1051903.0      0.7      1.3              if type is not None:
  4015   1411200    1136769.0      0.8      1.5                  msg["type"] = type
  4016   1411200   28391676.0     20.1     36.3              self.report(msg)
  4017                                           
  4018   1411200    4171908.0      3.0      5.3          ts.state = "memory"
  4019   1411200    1221012.0      0.9      1.6          ts.type = typename
  4020   1411200    1987368.0      1.4      2.5          ts.group.types.add(typename)
  4021                                           
  4022   1411200    1323984.0      0.9      1.7          cs = self.clients["fire-and-forget"]
  4023   1411200    1483505.0      1.1      1.9          if ts in cs.wants_what:
  4024                                                       self.client_releases_keys(client="fire-and-forget", keys=[ts.key])

Total time: 28.7644 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4026

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4026                                               @profile
  4027                                               def transition_released_waiting(self, key):
  4028   1411200     946995.0      0.7      3.3          try:
  4029   1411200    1051159.0      0.7      3.7              ts = self.tasks[key]
  4030                                           
  4031   1411200     962117.0      0.7      3.3              if self.validate:
  4032                                                           assert ts.run_spec
  4033                                                           assert not ts.waiting_on
  4034                                                           assert not ts.who_has
  4035                                                           assert not ts.processing_on
  4036                                                           assert not any(dts.state == "forgotten" for dts in ts.dependencies)
  4037                                           
  4038   1411200    1016954.0      0.7      3.5              if ts.has_lost_dependencies:
  4039                                                           return {key: "forgotten"}
  4040                                           
  4041   1411200    2467455.0      1.7      8.6              ts.state = "waiting"
  4042                                           
  4043   1411200     951936.0      0.7      3.3              recommendations = {}
  4044                                           
  4045   3864960    2740298.0      0.7      9.5              for dts in ts.dependencies:
  4046   2453760    1629308.0      0.7      5.7                  if dts.exception_blame:
  4047                                                               ts.exception_blame = dts.exception_blame
  4048                                                               recommendations[key] = "erred"
  4049                                                               return recommendations
  4050                                           
  4051   3864960    2608848.0      0.7      9.1              for dts in ts.dependencies:
  4052   2453760    1733243.0      0.7      6.0                  dep = dts.key
  4053   2453760    1732984.0      0.7      6.0                  if not dts.who_has:
  4054   2453760    2007999.0      0.8      7.0                      ts.waiting_on.add(dts)
  4055   2453760    2244100.0      0.9      7.8                  if dts.state == "released":
  4056                                                               recommendations[dep] = "waiting"
  4057                                                           else:
  4058   2453760    2033831.0      0.8      7.1                      dts.waiters.add(ts)
  4059                                           
  4060   1411200    2691752.0      1.9      9.4              ts.waiters = {dts for dts in ts.dependents if dts.state == "waiting"}
  4061                                           
  4062   1411200     998483.0      0.7      3.5              if not ts.waiting_on:
  4063     34560      27706.0      0.8      0.1                  if self.workers:
  4064     34560      29391.0      0.9      0.1                      recommendations[key] = "processing"
  4065                                                           else:
  4066                                                               self.unrunnable.add(ts)
  4067                                                               ts.state = "no-worker"
  4068                                           
  4069   1411200     889878.0      0.6      3.1              return recommendations
  4070                                                   except Exception as e:
  4071                                                       logger.exception(e)
  4072                                                       if LOG_PDB:
  4073                                                           import pdb
  4074                                           
  4075                                                           pdb.set_trace()
  4076                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_waiting at line 4078

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4078                                               @profile
  4079                                               def transition_no_worker_waiting(self, key):
  4080                                                   try:
  4081                                                       ts = self.tasks[key]
  4082                                           
  4083                                                       if self.validate:
  4084                                                           assert ts in self.unrunnable
  4085                                                           assert not ts.waiting_on
  4086                                                           assert not ts.who_has
  4087                                                           assert not ts.processing_on
  4088                                           
  4089                                                       self.unrunnable.remove(ts)
  4090                                           
  4091                                                       if ts.has_lost_dependencies:
  4092                                                           return {key: "forgotten"}
  4093                                           
  4094                                                       recommendations = {}
  4095                                           
  4096                                                       for dts in ts.dependencies:
  4097                                                           dep = dts.key
  4098                                                           if not dts.who_has:
  4099                                                               ts.waiting_on.add(dts)
  4100                                                           if dts.state == "released":
  4101                                                               recommendations[dep] = "waiting"
  4102                                                           else:
  4103                                                               dts.waiters.add(ts)
  4104                                           
  4105                                                       ts.state = "waiting"
  4106                                           
  4107                                                       if not ts.waiting_on:
  4108                                                           if self.workers:
  4109                                                               recommendations[key] = "processing"
  4110                                                           else:
  4111                                                               self.unrunnable.add(ts)
  4112                                                               ts.state = "no-worker"
  4113                                           
  4114                                                       return recommendations
  4115                                                   except Exception as e:
  4116                                                       logger.exception(e)
  4117                                                       if LOG_PDB:
  4118                                                           import pdb
  4119                                           
  4120                                                           pdb.set_trace()
  4121                                                       raise

Total time: 57.3495 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 4123

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4123                                               @profile
  4124                                               def decide_worker(self, ts):
  4125                                                   """
  4126                                                   Decide on a worker for task *ts*.  Return a WorkerState.
  4127                                                   """
  4128   1411200    7633696.0      5.4     13.3          valid_workers = self.valid_workers(ts)
  4129                                           
  4130   1411200     876706.0      0.6      1.5          if not valid_workers and not ts.loose_restrictions and self.workers:
  4131                                                       self.unrunnable.add(ts)
  4132                                                       ts.state = "no-worker"
  4133                                                       return None
  4134                                           
  4135   1411200     951693.0      0.7      1.7          if ts.dependencies or valid_workers is not True:
  4136   2753280   39174857.0     14.2     68.3              worker = decide_worker(
  4137   1376640     696389.0      0.5      1.2                  ts,
  4138   1376640    3511349.0      2.6      6.1                  self.workers.values(),
  4139   1376640     740092.0      0.5      1.3                  valid_workers,
  4140   1376640    1518123.0      1.1      2.6                  partial(self.worker_objective, ts),
  4141                                                       )
  4142     34560      43410.0      1.3      0.1          elif self.idle:
  4143       240        292.0      1.2      0.0              if len(self.idle) < 20:  # smart but linear in small case
  4144       240       1286.0      5.4      0.0                  worker = min(self.idle, key=operator.attrgetter("occupancy"))
  4145                                                       else:  # dumb but fast in large case
  4146                                                           worker = self.idle[self.n_tasks % len(self.idle)]
  4147                                                   else:
  4148     34320      24660.0      0.7      0.0              if len(self.workers) < 20:  # smart but linear in small case
  4149     68640     264544.0      3.9      0.5                  worker = min(
  4150     34320      69789.0      2.0      0.1                      self.workers.values(), key=operator.attrgetter("occupancy")
  4151                                                           )
  4152                                                       else:  # dumb but fast in large case
  4153                                                           worker = self.workers.values()[self.n_tasks % len(self.workers)]
  4154                                           
  4155   1411200    1080029.0      0.8      1.9          if self.validate:
  4156                                                       assert worker is None or isinstance(worker, WorkerState), (
  4157                                                           type(worker),
  4158                                                           worker,
  4159                                                       )
  4160                                                       assert worker.address in self.workers
  4161                                           
  4162   1411200     762559.0      0.5      1.3          return worker
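
For anyone skimming, the fallback branches at lines 4142 to 4153 above trade accuracy for speed: scan for the least-occupied worker when there are fewer than 20, otherwise round-robin on `n_tasks`. Here is a minimal sketch of that idea outside the scheduler. `Worker`, `pick_worker`, and the `threshold` parameter are hypothetical names for illustration only, and a plain list stands in for the scheduler's worker collection.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for WorkerState; only `occupancy` matters here.
Worker = namedtuple("Worker", ["address", "occupancy"])


def pick_worker(workers, n_tasks, threshold=20):
    """Least-occupied worker for small pools, round-robin for large ones."""
    if len(workers) < threshold:
        # O(n) scan, affordable while n is small
        return min(workers, key=operator.attrgetter("occupancy"))
    # O(1) choice that ignores occupancy entirely
    return workers[n_tasks % len(workers)]


small = [Worker("a", 3.0), Worker("b", 1.0), Worker("c", 2.0)]
large = [Worker(str(i), float(i % 7)) for i in range(50)]

print(pick_worker(small, n_tasks=10).address)   # least occupied of three
print(pick_worker(large, n_tasks=123).address)  # index 123 % 50
```

In this run only the small-pool scan was exercised (4 workers), so the round-robin branches show no hits in the table.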

Total time: 233.682 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4164

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4164                                               @profile
  4165                                               def transition_waiting_processing(self, key):
  4166   1411200    1036939.0      0.7      0.4          try:
  4167   1411200    1211256.0      0.9      0.5              ts = self.tasks[key]
  4168                                           
  4169   1411200    1068914.0      0.8      0.5              if self.validate:
  4170                                                           assert not ts.waiting_on
  4171                                                           assert not ts.who_has
  4172                                                           assert not ts.exception_blame
  4173                                                           assert not ts.processing_on
  4174                                                           assert not ts.has_lost_dependencies
  4175                                                           assert ts not in self.unrunnable
  4176                                                           assert all(dts.who_has for dts in ts.dependencies)
  4177                                           
  4178   1411200   72213305.0     51.2     30.9              ws = self.decide_worker(ts)
  4179   1411200    1060550.0      0.8      0.5              if ws is None:
  4180                                                           return {}
  4181   1411200    1085216.0      0.8      0.5              worker = ws.address
  4182                                           
  4183   1411200    2988492.0      2.1      1.3              duration = self.get_task_duration(ts)
  4184   1411200    5453597.0      3.9      2.3              comm = self.get_comm_cost(ts, ws)
  4185                                           
  4186   1411200    1851110.0      1.3      0.8              ws.processing[ts] = duration + comm
  4187   1411200    1120435.0      0.8      0.5              ts.processing_on = ws
  4188   1411200    1506861.0      1.1      0.6              ws.occupancy += duration + comm
  4189   1411200    1393538.0      1.0      0.6              self.total_occupancy += duration + comm
  4190   1411200    3810611.0      2.7      1.6              ts.state = "processing"
  4191   1411200    2113746.0      1.5      0.9              self.consume_resources(ts, ws)
  4192   1411200   40980863.0     29.0     17.5              self.check_idle_saturated(ws)
  4193   1411200    1594913.0      1.1      0.7              self.n_tasks += 1
  4194                                           
  4195   1411200    1158362.0      0.8      0.5              if ts.actor:
  4196                                                           ws.actors.add(ts)
  4197                                           
  4198                                                       # logger.debug("Send job to worker: %s, %s", worker, key)
  4199                                           
  4200   1411200   90898678.0     64.4     38.9              self.send_task_to_worker(worker, key)
  4201                                           
  4202   1411200    1134199.0      0.8      0.5              return {}
  4203                                                   except Exception as e:
  4204                                                       logger.exception(e)
  4205                                                       if LOG_PDB:
  4206                                                           import pdb
  4207                                           
  4208                                                           pdb.set_trace()
  4209                                                       raise
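
In the table above, three calls (`decide_worker`, `check_idle_saturated`, `send_task_to_worker`) account for about 87% of the time in `transition_waiting_processing`. Since these tables are long, a rough sketch for ranking the timed rows of a pasted table may help. The `ROW` pattern and `hot_lines` helper are hypothetical and assume the column layout shown here (Line #, Hits, Time, Per Hit, % Time, Line Contents); they are not part of line_profiler.

```python
import re

# Matches a timed row of the text output pasted above:
#   Line #, Hits, Time, Per Hit, % Time, Line Contents
ROW = re.compile(
    r"^\s*(?P<lineno>\d+)\s+(?P<hits>\d+)\s+(?P<time>\d+\.\d+)"
    r"\s+(?P<per_hit>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s+(?P<code>.*)$"
)


def hot_lines(report, top=3):
    """Return the `top` timed rows of one table, slowest first."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None:  # untimed row, header, or blank line
            continue
        rows.append((float(m["pct"]), int(m["lineno"]), m["code"].strip()))
    return sorted(rows, reverse=True)[:top]


# Rows copied from the transition_waiting_processing table above.
sample = """\
  4178   1411200   72213305.0     51.2     30.9              ws = self.decide_worker(ts)
  4180                                                           return {}
  4184   1411200    5453597.0      3.9      2.3              comm = self.get_comm_cost(ts, ws)
  4192   1411200   40980863.0     29.0     17.5              self.check_idle_saturated(ws)
  4200   1411200   90898678.0     64.4     38.9              self.send_task_to_worker(worker, key)
"""

for pct, lineno, code in hot_lines(sample):
    print(f"{pct:5.1f}%  line {lineno}: {code}")
```

Rows that were never executed have no numeric columns, so the pattern skips them rather than counting them as zero.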

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_memory at line 4211

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4211                                               @profile
  4212                                               def transition_waiting_memory(self, key, nbytes=None, worker=None, **kwargs):
  4213                                                   try:
  4214                                                       ws = self.workers[worker]
  4215                                                       ts = self.tasks[key]
  4216                                           
  4217                                                       if self.validate:
  4218                                                           assert not ts.processing_on
  4219                                                           assert ts.waiting_on
  4220                                                           assert ts.state == "waiting"
  4221                                           
  4222                                                       ts.waiting_on.clear()
  4223                                           
  4224                                                       if nbytes is not None:
  4225                                                           ts.set_nbytes(nbytes)
  4226                                           
  4227                                                       self.check_idle_saturated(ws)
  4228                                           
  4229                                                       recommendations = {}
  4230                                           
  4231                                                       self._add_to_memory(ts, ws, recommendations, **kwargs)
  4232                                           
  4233                                                       if self.validate:
  4234                                                           assert not ts.processing_on
  4235                                                           assert not ts.waiting_on
  4236                                                           assert ts.who_has
  4237                                           
  4238                                                       return recommendations
  4239                                                   except Exception as e:
  4240                                                       logger.exception(e)
  4241                                                       if LOG_PDB:
  4242                                                           import pdb
  4243                                           
  4244                                                           pdb.set_trace()
  4245                                                       raise

Total time: 253.479 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_memory at line 4247

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4247                                               @profile
  4248                                               def transition_processing_memory(
  4249                                                   self,
  4250                                                   key,
  4251                                                   nbytes=None,
  4252                                                   type=None,
  4253                                                   typename=None,
  4254                                                   worker=None,
  4255                                                   startstops=None,
  4256                                                   **kwargs,
  4257                                               ):
  4258   1411200    1366112.0      1.0      0.5          try:
  4259   1411200    1577507.0      1.1      0.6              ts = self.tasks[key]
  4260   1411200    1308946.0      0.9      0.5              assert worker
  4261   1411200    1661476.0      1.2      0.7              assert isinstance(worker, str)
  4262                                           
  4263   1411200    1363001.0      1.0      0.5              if self.validate:
  4264                                                           assert ts.processing_on
  4265                                                           ws = ts.processing_on
  4266                                                           assert ts in ws.processing
  4267                                                           assert not ts.waiting_on
  4268                                                           assert not ts.who_has, (ts, ts.who_has)
  4269                                                           assert not ts.exception_blame
  4270                                                           assert ts.state == "processing"
  4271                                           
  4272   1411200    1890173.0      1.3      0.7              ws = self.workers.get(worker)
  4273   1411200    1333083.0      0.9      0.5              if ws is None:
  4274                                                           return {key: "released"}
  4275                                           
  4276   1411200    3929029.0      2.8      1.6              if ws != ts.processing_on:  # someone else has this task
  4277                                                           logger.info(
  4278                                                               "Unexpected worker completed task, likely due to"
  4279                                                               " work stealing.  Expected: %s, Got: %s, Key: %s",
  4280                                                               ts.processing_on,
  4281                                                               ws,
  4282                                                               key,
  4283                                                           )
  4284                                                           return {}
  4285                                           
  4286   1411200    1348042.0      1.0      0.5              if startstops:
  4287   1411200    1635488.0      1.2      0.6                  L = list()
  4288   2863086    2895992.0      1.0      1.1                  for startstop in startstops:
  4289   1451886    1541087.0      1.1      0.6                      stop = startstop["stop"]
  4290   1451886    1366939.0      0.9      0.5                      start = startstop["start"]
  4291   1451886    1370709.0      0.9      0.5                      action = startstop["action"]
  4292   1451886    1490374.0      1.0      0.6                      if action == "compute":
  4293   1411200    1730307.0      1.2      0.7                          L.append((start, stop))
  4294                                           
  4295                                                               # record timings of all actions -- a cheaper way of
  4296                                                               # getting timing info compared with get_task_stream()
  4297   1451886    3031811.0      2.1      1.2                      ts.prefix.all_durations[action] += stop - start
  4298                                           
  4299   1411200    1717184.0      1.2      0.7                  if len(L) > 0:
  4300   1411200    1718285.0      1.2      0.7                      compute_start, compute_stop = L[0]
  4301                                                           else:  # This is very rare
  4302                                                               compute_start = compute_stop = None
  4303                                                       else:
  4304                                                           compute_start = compute_stop = None
  4305                                           
  4306                                                       #############################
  4307                                                       # Update Timing Information #
  4308                                                       #############################
  4309   1411200    2707050.0      1.9      1.1              if compute_start and ws.processing.get(ts, True):
  4310                                                           # Update average task duration for worker
  4311   1411200    1590841.0      1.1      0.6                  old_duration = ts.prefix.duration_average or 0
  4312   1411200    1372434.0      1.0      0.5                  new_duration = compute_stop - compute_start
  4313   1411200    1347876.0      1.0      0.5                  if not old_duration:
  4314         4          3.0      0.8      0.0                      avg_duration = new_duration
  4315                                                           else:
  4316   1411196    1696634.0      1.2      0.7                      avg_duration = 0.5 * old_duration + 0.5 * new_duration
  4317                                           
  4318   1411200    1615312.0      1.1      0.6                  ts.prefix.duration_average = avg_duration
  4319   1411200    2061702.0      1.5      0.8                  ts.group.duration += new_duration
  4320                                           
  4321   1413462    2216008.0      1.6      0.9                  for tts in self.unknown_durations.pop(ts.prefix.name, ()):
  4322      2262       3107.0      1.4      0.0                      if tts.processing_on:
  4323      2262       2683.0      1.2      0.0                          wws = tts.processing_on
  4324      2262       3450.0      1.5      0.0                          old = wws.processing[tts]
  4325      2262       8006.0      3.5      0.0                          comm = self.get_comm_cost(tts, wws)
  4326      2262       3187.0      1.4      0.0                          wws.processing[tts] = avg_duration + comm
  4327      2262       3294.0      1.5      0.0                          wws.occupancy += avg_duration + comm - old
  4328      2262       3293.0      1.5      0.0                          self.total_occupancy += avg_duration + comm - old
  4329                                           
  4330                                                       ############################
  4331                                                       # Update State Information #
  4332                                                       ############################
  4333   1411200    1437553.0      1.0      0.6              if nbytes is not None:
  4334   1411200    5751214.0      4.1      2.3                  ts.set_nbytes(nbytes)
  4335                                           
  4336   1411200    1436400.0      1.0      0.6              recommendations = {}
  4337                                           
  4338   1411200   65156507.0     46.2     25.7              self._remove_from_processing(ts)
  4339                                           
  4340   1411200  127968757.0     90.7     50.5              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4341                                           
  4342   1411200    1554860.0      1.1      0.6              if self.validate:
  4343                                                           assert not ts.processing_on
  4344                                                           assert not ts.waiting_on
  4345                                           
  4346   1411200    1262897.0      0.9      0.5              return recommendations
  4347                                                   except Exception as e:
  4348                                                       logger.exception(e)
  4349                                                       if LOG_PDB:
  4350                                                           import pdb
  4351                                           
  4352                                                           pdb.set_trace()
  4353                                                       raise
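
As a side note on lines 4311 to 4316 above: the duration update takes the first observation as-is and afterwards blends old and new values 50/50, so older samples lose half their weight on every update. A small sketch of just that arithmetic, where `update_duration_average` is a hypothetical helper name and the observed durations are made-up inputs:

```python
def update_duration_average(old_duration, new_duration):
    """Mirror lines 4311-4316: first sample wins, then a 50/50 blend."""
    if not old_duration:
        return new_duration
    return 0.5 * old_duration + 0.5 * new_duration


avg = None
history = []
for observed in [4.0, 2.0, 2.0, 2.0]:  # made-up durations in seconds
    avg = update_duration_average(avg or 0, observed)
    history.append(avg)

print(history)
```

This matches the hit counts in the table: the "first sample" branch at line 4314 ran only 4 times, while the blend at line 4316 ran 1411196 times.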

Total time: 74.8063 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_released at line 4355

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4355                                               @profile
  4356                                               def transition_memory_released(self, key, safe=False):
  4357   1382400    1113260.0      0.8      1.5          try:
  4358   1382400    1213541.0      0.9      1.6              ts = self.tasks[key]
  4359                                           
  4360   1382400    1079620.0      0.8      1.4              if self.validate:
  4361                                                           assert not ts.waiting_on
  4362                                                           assert not ts.processing_on
  4363                                                           if safe:
  4364                                                               assert not ts.waiters
  4365                                           
  4366   1382400    1107035.0      0.8      1.5              if ts.actor:
  4367                                                           for ws in ts.who_has:
  4368                                                               ws.actors.discard(ts)
  4369                                                           if ts.who_wants:
  4370                                                               ts.exception_blame = ts
  4371                                                               ts.exception = "Worker holding Actor was lost"
  4372                                                               return {ts.key: "erred"}  # don't try to recreate
  4373                                           
  4374   1382400    1060116.0      0.8      1.4              recommendations = {}
  4375                                           
  4376   1382400    1489823.0      1.1      2.0              for dts in ts.waiters:
  4377                                                           if dts.state in ("no-worker", "processing"):
  4378                                                               recommendations[dts.key] = "waiting"
  4379                                                           elif dts.state == "waiting":
  4380                                                               dts.waiting_on.add(ts)
  4381                                           
  4382                                                       # XXX factor this out?
  4383   3522932    3027403.0      0.9      4.0              for ws in ts.who_has:
  4384   2140532    2337584.0      1.1      3.1                  ws.has_what.remove(ts)
  4385   2140532    3704401.0      1.7      5.0                  ws.nbytes -= ts.get_nbytes()
  4386   2140532    3246301.0      1.5      4.3                  ts.group.nbytes_in_memory -= ts.get_nbytes()
  4387   4281064   18814008.0      4.4     25.2                  self.worker_send(
  4388   2140532    2359787.0      1.1      3.2                      ws.address, {"op": "delete-data", "keys": [key], "report": False}
  4389                                                           )
  4390   1382400    1373390.0      1.0      1.8              ts.who_has.clear()
  4391                                           
  4392   1382400    3235238.0      2.3      4.3              ts.state = "released"
  4393                                           
  4394   1382400   23672684.0     17.1     31.6              self.report({"op": "lost-data", "key": key})
  4395                                           
  4396   1382400    1384073.0      1.0      1.9              if not ts.run_spec:  # pure data
  4397                                                           recommendations[key] = "forgotten"
  4398   1382400    1163620.0      0.8      1.6              elif ts.has_lost_dependencies:
  4399                                                           recommendations[key] = "forgotten"
  4400   1382400    1309552.0      0.9      1.8              elif ts.who_wants or ts.waiters:
  4401                                                           recommendations[key] = "waiting"
  4402                                           
  4403   1382400    1145898.0      0.8      1.5              if self.validate:
  4404                                                           assert not ts.waiting_on
  4405                                           
  4406   1382400     969014.0      0.7      1.3              return recommendations
  4407                                                   except Exception as e:
  4408                                                       logger.exception(e)
  4409                                                       if LOG_PDB:
  4410                                                           import pdb
  4411                                           
  4412                                                           pdb.set_trace()
  4413                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_erred at line 4415

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4415                                               @profile
  4416                                               def transition_released_erred(self, key):
  4417                                                   try:
  4418                                                       ts = self.tasks[key]
  4419                                           
  4420                                                       if self.validate:
  4421                                                           with log_errors(pdb=LOG_PDB):
  4422                                                               assert ts.exception_blame
  4423                                                               assert not ts.who_has
  4424                                                               assert not ts.waiting_on
  4425                                                               assert not ts.waiters
  4426                                           
  4427                                                       recommendations = {}
  4428                                           
  4429                                                       failing_ts = ts.exception_blame
  4430                                           
  4431                                                       for dts in ts.dependents:
  4432                                                           dts.exception_blame = failing_ts
  4433                                                           if not dts.who_has:
  4434                                                               recommendations[dts.key] = "erred"
  4435                                           
  4436                                                       self.report(
  4437                                                           {
  4438                                                               "op": "task-erred",
  4439                                                               "key": key,
  4440                                                               "exception": failing_ts.exception,
  4441                                                               "traceback": failing_ts.traceback,
  4442                                                           }
  4443                                                       )
  4444                                           
  4445                                                       ts.state = "erred"
  4446                                           
  4447                                                       # TODO: waiting data?
  4448                                                       return recommendations
  4449                                                   except Exception as e:
  4450                                                       logger.exception(e)
  4451                                                       if LOG_PDB:
  4452                                                           import pdb
  4453                                           
  4454                                                           pdb.set_trace()
  4455                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_erred_released at line 4457

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4457                                               @profile
  4458                                               def transition_erred_released(self, key):
  4459                                                   try:
  4460                                                       ts = self.tasks[key]
  4461                                           
  4462                                                       if self.validate:
  4463                                                           with log_errors(pdb=LOG_PDB):
  4464                                                               assert all(dts.state != "erred" for dts in ts.dependencies)
  4465                                                               assert ts.exception_blame
  4466                                                               assert not ts.who_has
  4467                                                               assert not ts.waiting_on
  4468                                                               assert not ts.waiters
  4469                                           
  4470                                                       recommendations = {}
  4471                                           
  4472                                                       ts.exception = None
  4473                                                       ts.exception_blame = None
  4474                                                       ts.traceback = None
  4475                                           
  4476                                                       for dep in ts.dependents:
  4477                                                           if dep.state == "erred":
  4478                                                               recommendations[dep.key] = "waiting"
  4479                                           
  4480                                                       self.report({"op": "task-retried", "key": key})
  4481                                                       ts.state = "released"
  4482                                           
  4483                                                       return recommendations
  4484                                                   except Exception as e:
  4485                                                       logger.exception(e)
  4486                                                       if LOG_PDB:
  4487                                                           import pdb
  4488                                           
  4489                                                           pdb.set_trace()
  4490                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_released at line 4492

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4492                                               @profile
  4493                                               def transition_waiting_released(self, key):
  4494                                                   try:
  4495                                                       ts = self.tasks[key]
  4496                                           
  4497                                                       if self.validate:
  4498                                                           assert not ts.who_has
  4499                                                           assert not ts.processing_on
  4500                                           
  4501                                                       recommendations = {}
  4502                                           
  4503                                                       for dts in ts.dependencies:
  4504                                                           s = dts.waiters
  4505                                                           if ts in s:
  4506                                                               s.discard(ts)
  4507                                                               if not s and not dts.who_wants:
  4508                                                                   recommendations[dts.key] = "released"
  4509                                                       ts.waiting_on.clear()
  4510                                           
  4511                                                       ts.state = "released"
  4512                                           
  4513                                                       if ts.has_lost_dependencies:
  4514                                                           recommendations[key] = "forgotten"
  4515                                                       elif not ts.exception_blame and (ts.who_wants or ts.waiters):
  4516                                                           recommendations[key] = "waiting"
  4517                                                       else:
  4518                                                           ts.waiters.clear()
  4519                                           
  4520                                                       return recommendations
  4521                                                   except Exception as e:
  4522                                                       logger.exception(e)
  4523                                                       if LOG_PDB:
  4524                                                           import pdb
  4525                                           
  4526                                                           pdb.set_trace()
  4527                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_released at line 4529

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4529                                               @profile
  4530                                               def transition_processing_released(self, key):
  4531                                                   try:
  4532                                                       ts = self.tasks[key]
  4533                                           
  4534                                                       if self.validate:
  4535                                                           assert ts.processing_on
  4536                                                           assert not ts.who_has
  4537                                                           assert not ts.waiting_on
  4538                                                           assert self.tasks[key].state == "processing"
  4539                                           
  4540                                                       self._remove_from_processing(
  4541                                                           ts, send_worker_msg={"op": "release-task", "key": key}
  4542                                                       )
  4543                                           
  4544                                                       ts.state = "released"
  4545                                           
  4546                                                       recommendations = {}
  4547                                           
  4548                                                       if ts.has_lost_dependencies:
  4549                                                           recommendations[key] = "forgotten"
  4550                                                       elif ts.waiters or ts.who_wants:
  4551                                                           recommendations[key] = "waiting"
  4552                                           
  4553                                                       if recommendations.get(key) != "waiting":
  4554                                                           for dts in ts.dependencies:
  4555                                                               if dts.state != "released":
  4556                                                                   s = dts.waiters
  4557                                                                   s.discard(ts)
  4558                                                                   if not s and not dts.who_wants:
  4559                                                                       recommendations[dts.key] = "released"
  4560                                                           ts.waiters.clear()
  4561                                           
  4562                                                       if self.validate:
  4563                                                           assert not ts.processing_on
  4564                                           
  4565                                                       return recommendations
  4566                                                   except Exception as e:
  4567                                                       logger.exception(e)
  4568                                                       if LOG_PDB:
  4569                                                           import pdb
  4570                                           
  4571                                                           pdb.set_trace()
  4572                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_erred at line 4574

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4574                                               @profile
  4575                                               def transition_processing_erred(
  4576                                                   self, key, cause=None, exception=None, traceback=None, **kwargs
  4577                                               ):
  4578                                                   try:
  4579                                                       ts = self.tasks[key]
  4580                                           
  4581                                                       if self.validate:
  4582                                                           assert cause or ts.exception_blame
  4583                                                           assert ts.processing_on
  4584                                                           assert not ts.who_has
  4585                                                           assert not ts.waiting_on
  4586                                           
  4587                                                       if ts.actor:
  4588                                                           ws = ts.processing_on
  4589                                                           ws.actors.remove(ts)
  4590                                           
  4591                                                       self._remove_from_processing(ts)
  4592                                           
  4593                                                       if exception is not None:
  4594                                                           ts.exception = exception
  4595                                                       if traceback is not None:
  4596                                                           ts.traceback = traceback
  4597                                                       if cause is not None:
  4598                                                           failing_ts = self.tasks[cause]
  4599                                                           ts.exception_blame = failing_ts
  4600                                                       else:
  4601                                                           failing_ts = ts.exception_blame
  4602                                           
  4603                                                       recommendations = {}
  4604                                           
  4605                                                       for dts in ts.dependents:
  4606                                                           dts.exception_blame = failing_ts
  4607                                                           recommendations[dts.key] = "erred"
  4608                                           
  4609                                                       for dts in ts.dependencies:
  4610                                                           s = dts.waiters
  4611                                                           s.discard(ts)
  4612                                                           if not s and not dts.who_wants:
  4613                                                               recommendations[dts.key] = "released"
  4614                                           
  4615                                                       ts.waiters.clear()  # do anything with this?
  4616                                           
  4617                                                       ts.state = "erred"
  4618                                           
  4619                                                       self.report(
  4620                                                           {
  4621                                                               "op": "task-erred",
  4622                                                               "key": key,
  4623                                                               "exception": failing_ts.exception,
  4624                                                               "traceback": failing_ts.traceback,
  4625                                                           }
  4626                                                       )
  4627                                           
  4628                                                       cs = self.clients["fire-and-forget"]
  4629                                                       if ts in cs.wants_what:
  4630                                                           self.client_releases_keys(client="fire-and-forget", keys=[key])
  4631                                           
  4632                                                       if self.validate:
  4633                                                           assert not ts.processing_on
  4634                                           
  4635                                                       return recommendations
  4636                                                   except Exception as e:
  4637                                                       logger.exception(e)
  4638                                                       if LOG_PDB:
  4639                                                           import pdb
  4640                                           
  4641                                                           pdb.set_trace()
  4642                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_released at line 4644

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4644                                               @profile
  4645                                               def transition_no_worker_released(self, key):
  4646                                                   try:
  4647                                                       ts = self.tasks[key]
  4648                                           
  4649                                                       if self.validate:
  4650                                                           assert self.tasks[key].state == "no-worker"
  4651                                                           assert not ts.who_has
  4652                                                           assert not ts.waiting_on
  4653                                           
  4654                                                       self.unrunnable.remove(ts)
  4655                                                       ts.state = "released"
  4656                                           
  4657                                                       for dts in ts.dependencies:
  4658                                                           dts.waiters.discard(ts)
  4659                                           
  4660                                                       ts.waiters.clear()
  4661                                           
  4662                                                       return {}
  4663                                                   except Exception as e:
  4664                                                       logger.exception(e)
  4665                                                       if LOG_PDB:
  4666                                                           import pdb
  4667                                           
  4668                                                           pdb.set_trace()
  4669                                                       raise

Total time: 6.19857 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: remove_key at line 4671

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4671                                               @profile
  4672                                               def remove_key(self, key):
  4673   1411200     889370.0      0.6     14.3          ts = self.tasks.pop(key)
  4674   1411200    1031175.0      0.7     16.6          assert ts.state == "forgotten"
  4675   1411200     782741.0      0.6     12.6          self.unrunnable.discard(ts)
  4676   1411200     665958.0      0.5     10.7          for cs in ts.who_wants:
  4677                                                       cs.wants_what.remove(ts)
  4678   1411200     671141.0      0.5     10.8          ts.who_wants.clear()
  4679   1411200     639907.0      0.5     10.3          ts.processing_on = None
  4680   1411200     827867.0      0.6     13.4          ts.exception_blame = ts.exception = ts.traceback = None
  4681                                           
  4682   1411200     690414.0      0.5     11.1          if key in self.task_metadata:
  4683                                                       del self.task_metadata[key]

Total time: 21.1178 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _propagate_forgotten at line 4685

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4685                                               @profile
  4686                                               def _propagate_forgotten(self, ts, recommendations):
  4687   1411200    2602659.0      1.8     12.3          ts.state = "forgotten"
  4688   1411200     795070.0      0.6      3.8          key = ts.key
  4689   1411200     945573.0      0.7      4.5          for dts in ts.dependents:
  4690                                                       dts.has_lost_dependencies = True
  4691                                                       dts.dependencies.remove(ts)
  4692                                                       dts.waiting_on.discard(ts)
  4693                                                       if dts.state not in ("memory", "erred"):
  4694                                                           # Cannot compute task anymore
  4695                                                           recommendations[dts.key] = "forgotten"
  4696   1411200     912857.0      0.6      4.3          ts.dependents.clear()
  4697   1411200     918376.0      0.7      4.3          ts.waiters.clear()
  4698                                           
  4699   3864960    2037529.0      0.5      9.6          for dts in ts.dependencies:
  4700   2453760    1883230.0      0.8      8.9              dts.dependents.remove(ts)
  4701   2453760    1532526.0      0.6      7.3              s = dts.waiters
  4702   2453760    1559837.0      0.6      7.4              s.discard(ts)
  4703   2453760    1446258.0      0.6      6.8              if not dts.dependents and not dts.who_wants:
  4704                                                           # Task not needed anymore
  4705   1382400     696653.0      0.5      3.3                  assert dts is not ts
  4706   1382400    1039558.0      0.8      4.9                  recommendations[dts.key] = "forgotten"
  4707   1411200     866784.0      0.6      4.1          ts.dependencies.clear()
  4708   1411200     873469.0      0.6      4.1          ts.waiting_on.clear()
  4709                                           
  4710   1411200     764540.0      0.5      3.6          if ts.who_has:
  4711     28800      59664.0      2.1      0.3              ts.group.nbytes_in_memory -= ts.get_nbytes()
  4712                                           
  4713   1440000     856937.0      0.6      4.1          for ws in ts.who_has:
  4714     28800      31305.0      1.1      0.1              ws.has_what.remove(ts)
  4715     28800      41380.0      1.4      0.2              ws.nbytes -= ts.get_nbytes()
  4716     28800      26864.0      0.9      0.1              w = ws.address
  4717     28800      35245.0      1.2      0.2              if w in self.workers:  # in case worker has died
  4718     57600     337811.0      5.9      1.6                  self.worker_send(
  4719     28800      28966.0      1.0      0.1                      w, {"op": "delete-data", "keys": [key], "report": False}
  4720                                                           )
  4721   1411200     824683.0      0.6      3.9          ts.who_has.clear()

Total time: 5.47277 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_forgotten at line 4723

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4723                                               @profile
  4724                                               def transition_memory_forgotten(self, key):
  4725     28800      18375.0      0.6      0.3          try:
  4726     28800      20682.0      0.7      0.4              ts = self.tasks[key]
  4727                                           
  4728     28800      18384.0      0.6      0.3              if self.validate:
  4729                                                           assert ts.state == "memory"
  4730                                                           assert not ts.processing_on
  4731                                                           assert not ts.waiting_on
  4732                                                           if not ts.run_spec:
  4733                                                               # It's ok to forget a pure data task
  4734                                                               pass
  4735                                                           elif ts.has_lost_dependencies:
  4736                                                               # It's ok to forget a task with forgotten dependencies
  4737                                                               pass
  4738                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4739                                                               # It's ok to forget a task that nobody needs
  4740                                                               pass
  4741                                                           else:
  4742                                                               assert 0, (ts,)
  4743                                           
  4744     28800      17293.0      0.6      0.3              recommendations = {}
  4745                                           
  4746     28800      18807.0      0.7      0.3              if ts.actor:
  4747                                                           for ws in ts.who_has:
  4748                                                               ws.actors.discard(ts)
  4749                                           
  4750     28800    4295467.0    149.1     78.5              self._propagate_forgotten(ts, recommendations)
  4751                                           
  4752     28800     738948.0     25.7     13.5              self.report_on_key(ts=ts)
  4753     28800     329526.0     11.4      6.0              self.remove_key(key)
  4754                                           
  4755     28800      15288.0      0.5      0.3              return recommendations
  4756                                                   except Exception as e:
  4757                                                       logger.exception(e)
  4758                                                       if LOG_PDB:
  4759                                                           import pdb
  4760                                           
  4761                                                           pdb.set_trace()
  4762                                                       raise

Total time: 80.311 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_forgotten at line 4764

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4764                                               @profile
  4765                                               def transition_released_forgotten(self, key):
  4766   1382400     668836.0      0.5      0.8          try:
  4767   1382400     794873.0      0.6      1.0              ts = self.tasks[key]
  4768                                           
  4769   1382400     700180.0      0.5      0.9              if self.validate:
  4770                                                           assert ts.state in ("released", "erred")
  4771                                                           assert not ts.who_has
  4772                                                           assert not ts.processing_on
  4773                                                           assert not ts.waiting_on, (ts, ts.waiting_on)
  4774                                                           if not ts.run_spec:
  4775                                                               # It's ok to forget a pure data task
  4776                                                               pass
  4777                                                           elif ts.has_lost_dependencies:
  4778                                                               # It's ok to forget a task with forgotten dependencies
  4779                                                               pass
  4780                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4781                                                               # It's ok to forget a task that nobody needs
  4782                                                               pass
  4783                                                           else:
  4784                                                               assert 0, (ts,)
  4785                                           
  4786   1382400     671888.0      0.5      0.8              recommendations = {}
  4787   1382400   36202052.0     26.2     45.1              self._propagate_forgotten(ts, recommendations)
  4788                                           
  4789   1382400   27593058.0     20.0     34.4              self.report_on_key(ts=ts)
  4790   1382400   13039606.0      9.4     16.2              self.remove_key(key)
  4791                                           
  4792   1382400     640508.0      0.5      0.8              return recommendations
  4793                                                   except Exception as e:
  4794                                                       logger.exception(e)
  4795                                                       if LOG_PDB:
  4796                                                           import pdb
  4797                                           
  4798                                                           pdb.set_trace()
  4799                                                       raise

Total time: 1057.41 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 4801

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4801                                               @profile
  4802                                               def transition(self, key, finish, *args, **kwargs):
  4803                                                   """Transition a key from its current state to the finish state
  4804                                           
  4805                                                   Examples
  4806                                                   --------
  4807                                                   >>> self.transition('x', 'waiting')
  4808                                                   {'x': 'processing'}
  4809                                           
  4810                                                   Returns
  4811                                                   -------
  4812                                                   Dictionary of recommendations for future transitions
  4813                                           
  4814                                                   See Also
  4815                                                   --------
  4816                                                   Scheduler.transitions: transitive version of this function
  4817                                                   """
  4818   7027200    5994248.0      0.9      0.6          try:
  4819   7027200    5634051.0      0.8      0.5              try:
  4820   7027200    8134181.0      1.2      0.8                  ts = self.tasks[key]
  4821                                                       except KeyError:
  4822                                                           return {}
  4823   7027200    7799890.0      1.1      0.7              start = ts.state
  4824   7027200    5777743.0      0.8      0.5              if start == finish:
  4825                                                           return {}
  4826                                           
  4827   7027200    6080713.0      0.9      0.6              if self.plugins:
  4828   7027200    8664187.0      1.2      0.8                  dependents = set(ts.dependents)
  4829   7027200    8019461.0      1.1      0.8                  dependencies = set(ts.dependencies)
  4830                                           
  4831   7027200    7537678.0      1.1      0.7              if (start, finish) in self._transitions:
  4832   7027200    6852817.0      1.0      0.6                  func = self._transitions[start, finish]
  4833   7027200  846515003.0    120.5     80.1                  recommendations = func(key, *args, **kwargs)
  4834                                                       elif "released" not in (start, finish):
  4835                                                           func = self._transitions["released", finish]
  4836                                                           assert not args and not kwargs
  4837                                                           a = self.transition(key, "released")
  4838                                                           if key in a:
  4839                                                               func = self._transitions["released", a[key]]
  4840                                                           b = func(key)
  4841                                                           a = a.copy()
  4842                                                           a.update(b)
  4843                                                           recommendations = a
  4844                                                           start = "released"
  4845                                                       else:
  4846                                                           raise RuntimeError(
  4847                                                               "Impossible transition from %r to %r" % (start, finish)
  4848                                                           )
  4849                                           
  4850   7027200    8791286.0      1.3      0.8              finish2 = ts.state
  4851   7027200   13000341.0      1.9      1.2              self.transition_log.append((key, start, finish2, recommendations, time()))
  4852   7027200    6281674.0      0.9      0.6              if self.validate:
  4853                                                           logger.debug(
  4854                                                               "Transitioned %r %s->%s (actual: %s).  Consequence: %s",
  4855                                                               key,
  4856                                                               start,
  4857                                                               finish2,
  4858                                                               ts.state,
  4859                                                               dict(recommendations),
  4860                                                           )
  4861   7027200    6162377.0      0.9      0.6              if self.plugins:
  4862                                                           # Temporarily put back forgotten key for plugin to retrieve it
  4863   7027200    8290464.0      1.2      0.8                  if ts.state == "forgotten":
  4864   1411200    1014898.0      0.7      0.1                      try:
  4865   1411200    1205599.0      0.9      0.1                          ts.dependents = dependents
  4866   1411200    1162623.0      0.8      0.1                          ts.dependencies = dependencies
  4867                                                               except KeyError:
  4868                                                                   pass
  4869   1411200    1484769.0      1.1      0.1                      self.tasks[ts.key] = ts
  4870  14054400   15304417.0      1.1      1.4                  for plugin in list(self.plugins):
  4871   7027200    5739830.0      0.8      0.5                      try:
  4872   7027200   43606307.0      6.2      4.1                          plugin.transition(key, start, finish2, *args, **kwargs)
  4873                                                               except Exception:
  4874                                                                   logger.info("Plugin failed with exception", exc_info=True)
  4875   7027200    8269286.0      1.2      0.8                  if ts.state == "forgotten":
  4876   1411200    1288030.0      0.9      0.1                      del self.tasks[ts.key]
  4877                                           
  4878   7027200    8074082.0      1.1      0.8              if ts.state == "forgotten" and ts.group.name in self.task_groups:
  4879                                                           # Remove TaskGroup if all tasks are in the forgotten state
  4880   1411200    1135224.0      0.8      0.1                  tg = ts.group
  4881   1411200    4022565.0      2.9      0.4                  if not any(tg.states.get(s) for s in ALL_TASK_STATES):
  4882       200        307.0      1.5      0.0                      ts.prefix.groups.remove(tg)
  4883       200        182.0      0.9      0.0                      del self.task_groups[tg.name]
  4884                                           
  4885   7027200    5568933.0      0.8      0.5              return recommendations
  4886                                                   except Exception as e:
  4887                                                       logger.exception("Error transitioning %r from %r to %r", key, start, finish)
  4888                                                       if LOG_PDB:
  4889                                                           import pdb
  4890                                           
  4891                                                           pdb.set_trace()
  4892                                                       raise

Total time: 854.22 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transitions at line 4894

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4894                                               @profile
  4895                                               def transitions(self, recommendations):
  4896                                                   """Process transitions until none are left
  4897                                           
  4898                                                   This includes feedback from previous transitions and continues until we
  4899                                                   reach a steady state
  4900                                                   """
  4901   1440025    1133280.0      0.8      0.1          keys = set()
  4902   1440025    1282607.0      0.9      0.2          recommendations = recommendations.copy()
  4903   7056025    2891591.0      0.4      0.3          while recommendations:
  4904   5616000    3576508.0      0.6      0.4              key, finish = recommendations.popitem()
  4905   5616000    3190613.0      0.6      0.4              keys.add(key)
  4906   5616000  837355473.0    149.1     98.0              new = self.transition(key, finish)
  4907   5616000    4020155.0      0.7      0.5              recommendations.update(new)
  4908                                           
  4909   1440025     770177.0      0.5      0.1          if self.validate:
  4910                                                       for key in keys:
  4911                                                           self.validate_key(key)
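
The loop above drains a dict of recommendations and feeds whatever each transition returns back into the same dict until it is empty. Here is a minimal sketch of that pattern. The names `run_transitions` and `toy_transition`, and the three-task chain, are hypothetical stand-ins and not part of distributed:

```python
def run_transitions(recommendations, transition):
    """Apply transitions until no recommendations are left; return the keys touched."""
    recommendations = dict(recommendations)  # copy so the caller's dict is untouched
    touched = set()
    while recommendations:
        key, finish = recommendations.popitem()
        touched.add(key)
        # Each transition may recommend further transitions for other keys.
        recommendations.update(transition(key, finish))
    return touched


# Hypothetical toy state: a chain a -> b -> c where moving one task
# recommends the same move for its dependent.
states = {"a": "waiting", "b": "waiting", "c": "waiting"}
dependents = {"a": ["b"], "b": ["c"], "c": []}


def toy_transition(key, finish):
    if states[key] == finish:
        return {}  # already in that state, nothing new to recommend
    states[key] = finish
    return {dep: finish for dep in dependents[key]}


touched = run_transitions({"a": "processing"}, toy_transition)
```

One seed recommendation ends up touching every key in the chain, which is why nearly all of the time in `transitions` shows up on the single `self.transition(key, finish)` line.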

Total time: 52.5477 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: check_idle_saturated at line 4946

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4946                                               @profile
  4947                                               def check_idle_saturated(self, ws, occ=None):
  4948                                                   """Update the status of the idle and saturated state
  4949                                           
  4950                                                   The scheduler keeps track of workers that are ..
  4951                                           
  4952                                                   -  Saturated: have enough work to stay busy
  4953                                                   -  Idle: do not have enough work to stay busy
  4954                                           
  4955                                                   They are considered saturated if they both have enough tasks to occupy
  4956                                                   all of their threads, and if the expected runtime of those tasks is
  4957                                                   large enough.
  4958                                           
  4959                                                   This is useful for load balancing and adaptivity.
  4960                                                   """
  4961   2870757   21371986.0      7.4     40.7          if self.total_nthreads == 0 or ws.status == Status.closed:
  4962                                                       return
  4963   2870757    2148355.0      0.7      4.1          if occ is None:
  4964   2846847    2001666.0      0.7      3.8              occ = ws.occupancy
  4965   2870757    1923697.0      0.7      3.7          nc = ws.nthreads
  4966   2870757    2428612.0      0.8      4.6          p = len(ws.processing)
  4967                                           
  4968   2870757    2572901.0      0.9      4.9          avg = self.total_occupancy / self.total_nthreads
  4969                                           
  4970   2870757    2981512.0      1.0      5.7          if p < nc or occ / nc < avg / 2:
  4971    147930     405389.0      2.7      0.8              self.idle.add(ws)
  4972    147930     200669.0      1.4      0.4              self.saturated.discard(ws)
  4973                                                   else:
  4974   2722827    7432112.0      2.7     14.1              self.idle.discard(ws)
  4975                                           
  4976   2722827    2838403.0      1.0      5.4              pending = occ * (p - nc) / p / nc
  4977   2722827    2506815.0      0.9      4.8              if p > nc and pending > 0.4 and pending > 1.9 * avg:
  4978     44212      61236.0      1.4      0.1                  self.saturated.add(ws)
  4979                                                       else:
  4980   2678615    3674345.0      1.4      7.0                  self.saturated.discard(ws)
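
To make the thresholds in the listing above easier to follow, here is a rough standalone sketch of the same classification as a pure function. `classify_worker` and its argument names are made up for illustration. It skips the `ws.status` check and returns a label instead of updating the `idle` and `saturated` sets:

```python
def classify_worker(occ, nthreads, nprocessing, total_occupancy, total_nthreads):
    """Return "idle", "saturated", or "neither" using the thresholds in the listing."""
    if total_nthreads == 0:
        return None
    avg = total_occupancy / total_nthreads
    # Idle: fewer tasks than threads, or well under the average occupancy per thread.
    if nprocessing < nthreads or occ / nthreads < avg / 2:
        return "idle"
    # Expected runtime of the tasks that do not fit on a thread right now.
    pending = occ * (nprocessing - nthreads) / nprocessing / nthreads
    if nprocessing > nthreads and pending > 0.4 and pending > 1.9 * avg:
        return "saturated"
    return "neither"


# Fewer tasks (2) than threads (4).
idle_case = classify_worker(1.0, 4, 2, 8.0, 8)
# 10 tasks on 2 threads with pending work far above the cluster average.
saturated_case = classify_worker(10.0, 2, 10, 12.0, 8)
# Exactly one task per thread at average occupancy.
neither_case = classify_worker(4.0, 2, 2, 8.0, 4)
```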

Total time: 21.9729 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 5521

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5521                                           @profile
  5522                                           def decide_worker(ts, all_workers, valid_workers, objective):
  5523                                               """
  5524                                               Decide which worker should take task *ts*.
  5525                                           
  5526                                               We choose the worker that has the data on which *ts* depends.
  5527                                           
  5528                                               If several workers have dependencies then we choose the less-busy worker.
  5529                                           
  5530                                               Optionally provide *valid_workers* of where jobs are allowed to occur
  5531                                               (if all workers are allowed to take the task, pass True instead).
  5532                                           
  5533                                               If the task requires data communication because no eligible worker has
  5534                                               all the dependencies already, then we choose to minimize the number
  5535                                               of bytes sent between workers.  This is determined by calling the
  5536                                               *objective* function.
  5537                                               """
  5538   1376640    1129968.0      0.8      5.1      deps = ts.dependencies
  5539   1376640    4322413.0      3.1     19.7      assert all(dts.who_has for dts in deps)
  5540   1376640    1094415.0      0.8      5.0      if ts.actor:
  5541                                                   candidates = set(all_workers)
  5542                                               else:
  5543   1376640    4886431.0      3.5     22.2          candidates = {ws for dts in deps for ws in dts.who_has}
  5544   1376640    1052237.0      0.8      4.8      if valid_workers is True:
  5545   1376640     993795.0      0.7      4.5          if not candidates:
  5546                                                       candidates = set(all_workers)
  5547                                               else:
  5548                                                   candidates &= valid_workers
  5549                                                   if not candidates:
  5550                                                       candidates = valid_workers
  5551                                                       if not candidates:
  5552                                                           if ts.loose_restrictions:
  5553                                                               return decide_worker(ts, all_workers, True, objective)
  5554                                                           else:
  5555                                                               return None
  5556   1376640     921920.0      0.7      4.2      if not candidates:
  5557                                                   return None
  5558                                           
  5559   1376640    1261606.0      0.9      5.7      if len(candidates) == 1:
  5560   1286266    1332526.0      1.0      6.1          return first(candidates)
  5561                                           
  5562     90374    4977623.0     55.1     22.7      return min(candidates, key=objective)
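
As a simplified illustration of the selection logic above, the sketch below keeps only the common path seen in this profile (`valid_workers is True`, no actors). `pick_worker`, the worker names, and the `load` objective are hypothetical and not what distributed uses:

```python
def pick_worker(dep_locations, all_workers, objective):
    """Pick a worker given one set of worker names per dependency."""
    # Prefer workers that already hold at least one dependency.
    candidates = {w for holders in dep_locations for w in holders}
    if not candidates:
        candidates = set(all_workers)
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates))  # skip the objective call entirely
    return min(candidates, key=objective)


# Hypothetical objective: lower load is better.
load = {"w1": 3, "w2": 1, "w3": 0}

single = pick_worker([{"w1"}], load, load.get)
shared = pick_worker([{"w1"}, {"w1", "w2"}], load, load.get)
no_deps = pick_worker([], load, load.get)
```

In the profile above most calls return through the single-candidate branch, so the `min(...)` line runs rarely but is by far the most expensive per hit.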

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 5768

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5768                                               @profile
  5769                                               def transition(self, key, start, finish, *args, **kwargs):
  5770                                                   if finish == "memory" or finish == "erred":
  5771                                                       ts = self.scheduler.tasks.get(key)
  5772                                                       if ts is not None and ts.key in self.keys:
  5773                                                           self.metadata[key] = ts.metadata
  5774                                                           self.state[key] = finish
  5775                                                           self.keys.discard(key)

@jakirkham
Collaborator Author

Ideally we would want to profile _background_send; however, this has proven tricky. Applying the profiling decorator before gen.coroutine seems to cause issues for gen.coroutine. Applying it after profiles the gen.coroutine wrapper instead, which is not of interest. Also tried using a with-block inside of _background_send, but the function doesn't show up in the profile results that way. So some more tweaking is needed to get a clearer picture of how this function behaves.
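
To illustrate why decorator order matters here, below is a toy sketch using only the standard library. `drive` is a hypothetical stand-in for a coroutine decorator and `record` for a profiling decorator; neither is Tornado's or line_profiler's actual code. It only shows which function object each position hands to the profiler and does not reproduce the gen.coroutine problem itself:

```python
import functools


def drive(func):
    """Toy stand-in for a coroutine decorator: runs a generator function to completion."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = None
        for result in func(*args, **kwargs):
            pass
        return result
    return wrapper


seen = []


def record(func):
    """Toy stand-in for a profiler: notes the name of the code object it was given."""
    seen.append(func.__code__.co_name)
    return func


@record  # applied last: receives the wrapper built by drive
@drive
def outer_profiled():
    yield 1


@drive
@record  # applied first: receives the generator function itself
def inner_profiled():
    yield 2


results = (outer_profiled(), inner_profiled())
```

Note that `functools.wraps` copies the name and docstring but not the code object, so a profiler keyed on code objects still sees `wrapper` in the first case.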

@jakirkham
Collaborator Author

report remains slow. The profile points to adding BatchedSend objects to a set, even though they don't implement __eq__ or __hash__. Filed an issue ( dask/distributed#4273 ) about this.
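
For context, a Python class that defines neither __eq__ nor __hash__ is still hashable: it inherits identity-based versions from object, so its instances can be put in a set. A small sketch with a hypothetical `PlainSender` class (not the real BatchedSend):

```python
class PlainSender:
    """Hypothetical stand-in for a class that defines neither __eq__ nor __hash__."""

    def __init__(self, name):
        self.name = name


a = PlainSender("x")
b = PlainSender("x")

# Both methods resolve to the implementations inherited from object.
uses_default_hash = PlainSender.__hash__ is object.__hash__
uses_default_eq = PlainSender.__eq__ is object.__eq__

# Identity semantics: equal-looking instances are distinct set members.
senders = {a, b, a}
```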

@jakirkham
Collaborator Author

jakirkham commented Nov 24, 2020

To summarize the major bottlenecks seen in the transition functions so far, here they are listed from most to least time consumed. Next to each are the functions it calls that took more than 10% of its runtime (usually much more), also ordered from most to least time consumed.

  • transition_processing_memory: _add_to_memory, _remove_from_processing
  • transition_waiting_processing: send_task_to_worker, decide_worker, check_idle_saturated
  • transition_released_forgotten: _propagate_forgotten, report_on_key, remove_key
  • transition_memory_released: report, worker_send
  • transition_released_waiting: NA
  • transition_memory_forgotten: _propagate_forgotten, report_on_key

Would then add that these functions have the following bottlenecks:

These lack any specific bottleneck:

  • decide_worker
  • transition_released_waiting
  • report
  • _propagate_forgotten
  • remove_key

Summary: Hashing and comparisons cause notable slowdowns in the major bottlenecks and also show up throughout due to the use of sets and the like. Another significant cause of slowdowns is simply communication between the Scheduler and Workers. A few functions lack specific bottlenecks and may merit Cythonization themselves (though this won't bring much improvement without also Cythonizing the objects they use). These few functions are less significant bottlenecks than the hashing/comparison and communication issues.
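
As a rough illustration of how hashing spreads through set-heavy code, the sketch below counts calls to a Python-level __hash__ for one membership test plus discard, the same pattern as `if ts in s: s.discard(ts)` in _add_to_memory. `CountingKey` is a hypothetical class, not one of the Scheduler's objects:

```python
class CountingKey:
    """Hypothetical key whose Python-level __hash__ counts how often it runs."""

    calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


key = CountingKey("x")
waiting_on = set()

waiting_on.add(key)          # hashes the key once
if key in waiting_on:        # hashes it again for the lookup
    waiting_on.discard(key)  # and once more for the removal

hash_calls = CountingKey.calls
```

Every set operation rehashes the key it is given, so any per-call cost in __hash__ is paid on each add, lookup, and discard.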

Edited: To reflect the profile below.

@jakirkham
Collaborator Author

With some of these changes merged into distributed as of commit ( dask/distributed@93f9f70 ), reran the profile to see where things sit now. Results included below:

Results from prof_1051.lstat:

Timer unit: 1e-06 s

Total time: 22.5638 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report at line 2597

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2597                                               @profiler
  2598                                               def report(self, msg, ts=None, client=None):
  2599                                                   """
  2600                                                   Publish updates to all listening Queues and Comms
  2601                                           
  2602                                                   If the message contains a key then we only send the message to those
  2603                                                   comms that care about the key.
  2604                                                   """
  2605   4233600    3793845.0      0.9     16.8          if ts is None and "key" in msg:
  2606   2822400    3086971.0      1.1     13.7              ts = self.tasks.get(msg["key"])
  2607                                           
  2608   4233600    2673058.0      0.6     11.8          if ts is None:
  2609                                                       # Notify all clients
  2610                                                       client_keys = list(self.client_comms)
  2611   4233600    2681804.0      0.6     11.9          elif client is None:
  2612                                                       # Notify clients interested in key
  2613   4233600    6633565.0      1.6     29.4              client_keys = [c.client_key for c in ts.who_wants]
  2614                                                   else:
  2615                                                       # Notify clients interested in key (including `client`)
  2616                                                       client_keys = [c.client_key for c in ts.who_wants if c.client_key != client]
  2617                                                       client_keys.append(client)
  2618                                           
  2619   4291200    3179837.0      0.7     14.1          for k in client_keys:
  2620     57600      34523.0      0.6      0.2              try:
  2621     57600      56010.0      1.0      0.2                  c = self.client_comms[k]
  2622                                                       except KeyError:
  2623                                                           continue
  2624     57600      34005.0      0.6      0.2              try:
  2625     57600     390175.0      6.8      1.7                  c.send(msg)
  2626                                                           # logger.debug("Scheduler sends message to client %s", msg)
  2627                                                       except CommClosedError:
  2628                                                           if self.status == Status.running:
  2629                                                               logger.critical("Tried writing to closed comm: %s", msg)

Total time: 50.0444 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: send_task_to_worker at line 2710

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2710                                               @profiler
  2711                                               def send_task_to_worker(self, worker, key):
  2712                                                   """ Send a single computational task to a worker """
  2713   1413668    1398667.0      1.0      2.8          try:
  2714   1413668    1623292.0      1.1      3.2              ts = self.tasks[key]
  2715                                           
  2716   1413668    1685266.0      1.2      3.4              msg = {
  2717   1413668    1253702.0      0.9      2.5                  "op": "compute-task",
  2718   1413668    1199258.0      0.8      2.4                  "key": key,
  2719   1413668    1479699.0      1.0      3.0                  "priority": ts.priority,
  2720   1413668    2815498.0      2.0      5.6                  "duration": self.get_task_duration(ts),
  2721                                                       }
  2722   1413668    1313493.0      0.9      2.6              if ts.resource_restrictions:
  2723                                                           msg["resource_restrictions"] = ts.resource_restrictions
  2724   1413668    1297004.0      0.9      2.6              if ts.actor:
  2725                                                           msg["actor"] = True
  2726                                           
  2727   1413668    1309127.0      0.9      2.6              deps = ts.dependencies
  2728   1413668    1326640.0      0.9      2.7              if deps:
  2729   2758216    7749386.0      2.8     15.5                  msg["who_has"] = {
  2730   1379108    1190512.0      0.9      2.4                      dep.key: [ws.address for ws in dep.who_has] for dep in deps
  2731                                                           }
  2732   1379108    3293042.0      2.4      6.6                  msg["nbytes"] = {dep.key: dep.nbytes for dep in deps}
  2733                                           
  2734   1413668    1442941.0      1.0      2.9              if self.validate and deps:
  2735                                                           assert all(msg["who_has"].values())
  2736                                           
  2737   1413668    1579525.0      1.1      3.2              task = ts.run_spec
  2738   1413668    1649253.0      1.2      3.3              if type(task) is dict:
  2739   1407908    2582710.0      1.8      5.2                  msg.update(task)
  2740                                                       else:
  2741      5760       5315.0      0.9      0.0                  msg["task"] = task
  2742                                           
  2743   1413668   13850118.0      9.8     27.7              self.worker_send(worker, msg)
  2744                                                   except Exception as e:
  2745                                                       logger.exception(e)
  2746                                                       if LOG_PDB:
  2747                                                           import pdb
  2748                                           
  2749                                                           pdb.set_trace()
  2750                                                       raise

Total time: 16.7257 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: worker_send at line 2880

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2880                                               @profiler
  2881                                               def worker_send(self, worker, msg):
  2882                                                   """Send message to worker
  2883                                           
  2884                                                   This also handles connection failures by adding a callback to remove
  2885                                                   the worker on the next cycle.
  2886                                                   """
  2887   3582280    1495686.0      0.4      8.9          try:
  2888   3582280   15230059.0      4.3     91.1              self.stream_comms[worker].send(msg)
  2889                                                   except (CommClosedError, AttributeError):
  2890                                                       self.loop.add_callback(self.remove_worker, address=worker)

Total time: 16.6451 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report_on_key at line 3694

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3694                                               @profiler
  3695                                               def report_on_key(self, key=None, ts=None, client=None):
  3696   1411200     903266.0      0.6      5.4          assert (key is None) + (ts is None) == 1, (key, ts)
  3697   1411200     690657.0      0.5      4.1          if ts is None:
  3698                                                       try:
  3699                                                           ts = self.tasks[key]
  3700                                                       except KeyError:
  3701                                                           self.report({"op": "cancelled-key", "key": key}, client=client)
  3702                                                           return
  3703                                                   else:
  3704   1411200     713971.0      0.5      4.3              key = ts.key
  3705   1411200    1097916.0      0.8      6.6          if ts.state == "forgotten":
  3706   1411200   13239339.0      9.4     79.5              self.report({"op": "cancelled-key", "key": key}, ts=ts, client=client)
  3707                                                   elif ts.state == "memory":
  3708                                                       self.report({"op": "key-in-memory", "key": key}, ts=ts, client=client)
  3709                                                   elif ts.state == "erred":
  3710                                                       failing_ts = ts.exception_blame
  3711                                                       self.report(
  3712                                                           {
  3713                                                               "op": "task-erred",
  3714                                                               "key": key,
  3715                                                               "exception": failing_ts.exception,
  3716                                                               "traceback": failing_ts.traceback,
  3717                                                           },
  3718                                                           ts=ts,
  3719                                                           client=client,
  3720                                                       )

Total time: 49.047 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _remove_from_processing at line 3970

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3970                                               @profiler
  3971                                               def _remove_from_processing(self, ts, send_worker_msg=None):
  3972                                                   """
  3973                                                   Remove *ts* from the set of processing tasks.
  3974                                                   """
  3975   1411200     850514.0      0.6      1.7          ws = ts.processing_on
  3976   1411200     783823.0      0.6      1.6          ts.processing_on = None
  3977   1411200     729739.0      0.5      1.5          w = ws.address
  3978   1411200    1046300.0      0.7      2.1          if w in self.workers:  # may have been removed
  3979   1411200    1044520.0      0.7      2.1              duration = ws.processing.pop(ts)
  3980   1411200     700733.0      0.5      1.4              if not ws.processing:
  3981       207        213.0      1.0      0.0                  self.total_occupancy -= ws.occupancy
  3982       207        103.0      0.5      0.0                  ws.occupancy = 0
  3983                                                       else:
  3984   1410993    1146035.0      0.8      2.3                  self.total_occupancy -= duration
  3985   1410993     965555.0      0.7      2.0                  ws.occupancy -= duration
  3986   1411200   39166196.0     27.8     79.9              self.check_idle_saturated(ws)
  3987   1411200    1878430.0      1.3      3.8              self.release_resources(ts, ws)
  3988   1411200     734881.0      0.5      1.5              if send_worker_msg:
  3989                                                           self.worker_send(w, send_worker_msg)

Total time: 69.4768 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _add_to_memory at line 3991

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3991                                               @profiler
  3992                                               def _add_to_memory(
  3993                                                   self, ts, ws, recommendations, type=None, typename=None, **kwargs
  3994                                               ):
  3995                                                   """
  3996                                                   Add *ts* to the set of in-memory tasks.
  3997                                                   """
  3998   1411200    1222514.0      0.9      1.8          if self.validate:
  3999                                                       assert ts not in ws.has_what
  4000                                           
  4001   1411200    1914079.0      1.4      2.8          ts.who_has.add(ws)
  4002   1411200    1592655.0      1.1      2.3          ws.has_what.add(ts)
  4003   1411200    2589781.0      1.8      3.7          ws.nbytes += ts.get_nbytes()
  4004                                           
  4005   1411200    1049157.0      0.7      1.5          deps = ts.dependents
  4006   1411200    1355229.0      1.0      2.0          if len(deps) > 1:
  4007    103680    1154872.0     11.1      1.7              deps = sorted(deps, key=operator.attrgetter("priority"), reverse=True)
  4008   3864960    2889018.0      0.7      4.2          for dts in deps:
  4009   2453760    2062738.0      0.8      3.0              s = dts.waiting_on
  4010   2453760    2030232.0      0.8      2.9              if ts in s:
  4011   2453760    1968661.0      0.8      2.8                  s.discard(ts)
  4012   2453760    1686841.0      0.7      2.4                  if not s:  # new task ready to run
  4013   1376640    1422570.0      1.0      2.0                      recommendations[dts.key] = "processing"
  4014                                           
  4015   3864960    2867176.0      0.7      4.1          for dts in ts.dependencies:
  4016   2453760    2102140.0      0.9      3.0              s = dts.waiters
  4017   2453760    2081852.0      0.8      3.0              s.discard(ts)
  4018   2453760    1839235.0      0.7      2.6              if not s and not dts.who_wants:
  4019   1382400    1392384.0      1.0      2.0                  recommendations[dts.key] = "released"
  4020                                           
  4021   1411200    1143829.0      0.8      1.6          if not ts.waiters and not ts.who_wants:
  4022                                                       recommendations[ts.key] = "released"
  4023                                                   else:
  4024   1411200    1432990.0      1.0      2.1              msg = {"op": "key-in-memory", "key": ts.key}
  4025   1411200    1013826.0      0.7      1.5              if type is not None:
  4026   1411200    1178108.0      0.8      1.7                  msg["type"] = type
  4027   1411200   21399331.0     15.2     30.8              self.report(msg)
  4028                                           
  4029   1411200    4088738.0      2.9      5.9          ts.state = "memory"
  4030   1411200    1177023.0      0.8      1.7          ts.type = typename
  4031   1411200    2123076.0      1.5      3.1          ts.group.types.add(typename)
  4032                                           
  4033   1411200    1260377.0      0.9      1.8          cs = self.clients["fire-and-forget"]
  4034   1411200    1438404.0      1.0      2.1          if ts in cs.wants_what:
  4035                                                       self.client_releases_keys(client="fire-and-forget", keys=[ts.key])

Total time: 28.4279 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4037

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4037                                               @profiler
  4038                                               def transition_released_waiting(self, key):
  4039   1411200     925571.0      0.7      3.3          try:
  4040   1411200    1073000.0      0.8      3.8              ts = self.tasks[key]
  4041                                           
  4042   1411200     962241.0      0.7      3.4              if self.validate:
  4043                                                           assert ts.run_spec
  4044                                                           assert not ts.waiting_on
  4045                                                           assert not ts.who_has
  4046                                                           assert not ts.processing_on
  4047                                                           assert not any(dts.state == "forgotten" for dts in ts.dependencies)
  4048                                           
  4049   1411200    1023841.0      0.7      3.6              if ts.has_lost_dependencies:
  4050                                                           return {key: "forgotten"}
  4051                                           
  4052   1411200    2366596.0      1.7      8.3              ts.state = "waiting"
  4053                                           
  4054   1411200     932001.0      0.7      3.3              recommendations = {}
  4055                                           
  4056   3864960    2668594.0      0.7      9.4              for dts in ts.dependencies:
  4057   2453760    1606678.0      0.7      5.7                  if dts.exception_blame:
  4058                                                               ts.exception_blame = dts.exception_blame
  4059                                                               recommendations[key] = "erred"
  4060                                                               return recommendations
  4061                                           
  4062   3864960    2590176.0      0.7      9.1              for dts in ts.dependencies:
  4063   2453760    1744592.0      0.7      6.1                  dep = dts.key
  4064   2453760    1745343.0      0.7      6.1                  if not dts.who_has:
  4065   2453760    1984079.0      0.8      7.0                      ts.waiting_on.add(dts)
  4066   2453760    2237170.0      0.9      7.9                  if dts.state == "released":
  4067                                                               recommendations[dep] = "waiting"
  4068                                                           else:
  4069   2453760    2003901.0      0.8      7.0                      dts.waiters.add(ts)
  4070                                           
  4071   1411200    2630896.0      1.9      9.3              ts.waiters = {dts for dts in ts.dependents if dts.state == "waiting"}
  4072                                           
  4073   1411200     988017.0      0.7      3.5              if not ts.waiting_on:
  4074     34560      26227.0      0.8      0.1                  if self.workers:
  4075     34560      29566.0      0.9      0.1                      recommendations[key] = "processing"
  4076                                                           else:
  4077                                                               self.unrunnable.add(ts)
  4078                                                               ts.state = "no-worker"
  4079                                           
  4080   1411200     889419.0      0.6      3.1              return recommendations
  4081                                                   except Exception as e:
  4082                                                       logger.exception(e)
  4083                                                       if LOG_PDB:
  4084                                                           import pdb
  4085                                           
  4086                                                           pdb.set_trace()
  4087                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_waiting at line 4089

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4089                                               @profiler
  4090                                               def transition_no_worker_waiting(self, key):
  4091                                                   try:
  4092                                                       ts = self.tasks[key]
  4093                                           
  4094                                                       if self.validate:
  4095                                                           assert ts in self.unrunnable
  4096                                                           assert not ts.waiting_on
  4097                                                           assert not ts.who_has
  4098                                                           assert not ts.processing_on
  4099                                           
  4100                                                       self.unrunnable.remove(ts)
  4101                                           
  4102                                                       if ts.has_lost_dependencies:
  4103                                                           return {key: "forgotten"}
  4104                                           
  4105                                                       recommendations = {}
  4106                                           
  4107                                                       for dts in ts.dependencies:
  4108                                                           dep = dts.key
  4109                                                           if not dts.who_has:
  4110                                                               ts.waiting_on.add(dts)
  4111                                                           if dts.state == "released":
  4112                                                               recommendations[dep] = "waiting"
  4113                                                           else:
  4114                                                               dts.waiters.add(ts)
  4115                                           
  4116                                                       ts.state = "waiting"
  4117                                           
  4118                                                       if not ts.waiting_on:
  4119                                                           if self.workers:
  4120                                                               recommendations[key] = "processing"
  4121                                                           else:
  4122                                                               self.unrunnable.add(ts)
  4123                                                               ts.state = "no-worker"
  4124                                           
  4125                                                       return recommendations
  4126                                                   except Exception as e:
  4127                                                       logger.exception(e)
  4128                                                       if LOG_PDB:
  4129                                                           import pdb
  4130                                           
  4131                                                           pdb.set_trace()
  4132                                                       raise

Total time: 53.3423 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 4134

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4134                                               @profiler
  4135                                               def decide_worker(self, ts):
  4136                                                   """
  4137                                                   Decide on a worker for task *ts*.  Return a WorkerState.
  4138                                                   """
  4139   1411200    7283658.0      5.2     13.7          valid_workers = self.valid_workers(ts)
  4140                                           
  4141   1411200     819768.0      0.6      1.5          if not valid_workers and not ts.loose_restrictions and self.workers:
  4142                                                       self.unrunnable.add(ts)
  4143                                                       ts.state = "no-worker"
  4144                                                       return None
  4145                                           
  4146   1411200     911725.0      0.6      1.7          if ts.dependencies or valid_workers is not True:
  4147   2753280   36042312.0     13.1     67.6              worker = decide_worker(
  4148   1376640     670071.0      0.5      1.3                  ts,
  4149   1376640    3353890.0      2.4      6.3                  self.workers.values(),
  4150   1376640     714943.0      0.5      1.3                  valid_workers,
  4151   1376640    1408103.0      1.0      2.6                  partial(self.worker_objective, ts),
  4152                                                       )
  4153     34560      41533.0      1.2      0.1          elif self.idle:
  4154       240        293.0      1.2      0.0              if len(self.idle) < 20:  # smart but linear in small case
  4155       240       1470.0      6.1      0.0                  worker = min(self.idle, key=operator.attrgetter("occupancy"))
  4156                                                       else:  # dumb but fast in large case
  4157                                                           worker = self.idle[self.n_tasks % len(self.idle)]
  4158                                                   else:
  4159     34320      21859.0      0.6      0.0              if len(self.workers) < 20:  # smart but linear in small case
  4160     68640     282473.0      4.1      0.5                  worker = min(
  4161     34320      65862.0      1.9      0.1                      self.workers.values(), key=operator.attrgetter("occupancy")
  4162                                                           )
  4163                                                       else:  # dumb but fast in large case
  4164                                                           worker = self.workers.values()[self.n_tasks % len(self.workers)]
  4165                                           
  4166   1411200    1001641.0      0.7      1.9          if self.validate:
  4167                                                       assert worker is None or isinstance(worker, WorkerState), (
  4168                                                           type(worker),
  4169                                                           worker,
  4170                                                       )
  4171                                                       assert worker.address in self.workers
  4172                                           
  4173   1411200     722670.0      0.5      1.4          return worker

Total time: 220.805 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4175

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4175                                               @profiler
  4176                                               def transition_waiting_processing(self, key):
  4177   1411200    1054329.0      0.7      0.5          try:
  4178   1411200    1182006.0      0.8      0.5              ts = self.tasks[key]
  4179                                           
  4180   1411200    1047337.0      0.7      0.5              if self.validate:
  4181                                                           assert not ts.waiting_on
  4182                                                           assert not ts.who_has
  4183                                                           assert not ts.exception_blame
  4184                                                           assert not ts.processing_on
  4185                                                           assert not ts.has_lost_dependencies
  4186                                                           assert ts not in self.unrunnable
  4187                                                           assert all(dts.who_has for dts in ts.dependencies)
  4188                                           
  4189   1411200   67324358.0     47.7     30.5              ws = self.decide_worker(ts)
  4190   1411200    1032261.0      0.7      0.5              if ws is None:
  4191                                                           return {}
  4192   1411200    1101500.0      0.8      0.5              worker = ws.address
  4193                                           
  4194   1411200    2903142.0      2.1      1.3              duration = self.get_task_duration(ts)
  4195   1411200    4966270.0      3.5      2.2              comm = self.get_comm_cost(ts, ws)
  4196                                           
  4197   1411200    1815638.0      1.3      0.8              ws.processing[ts] = duration + comm
  4198   1411200    1136592.0      0.8      0.5              ts.processing_on = ws
  4199   1411200    1471093.0      1.0      0.7              ws.occupancy += duration + comm
  4200   1411200    1403016.0      1.0      0.6              self.total_occupancy += duration + comm
  4201   1411200    3668669.0      2.6      1.7              ts.state = "processing"
  4202   1411200    2020312.0      1.4      0.9              self.consume_resources(ts, ws)
  4203   1411200   38115185.0     27.0     17.3              self.check_idle_saturated(ws)
  4204   1411200    1539829.0      1.1      0.7              self.n_tasks += 1
  4205                                           
  4206   1411200    1125811.0      0.8      0.5              if ts.actor:
  4207                                                           ws.actors.add(ts)
  4208                                           
  4209                                                       # logger.debug("Send job to worker: %s, %s", worker, key)
  4210                                           
  4211   1411200   86838579.0     61.5     39.3              self.send_task_to_worker(worker, key)
  4212                                           
  4213   1411200    1059109.0      0.8      0.5              return {}
  4214                                                   except Exception as e:
  4215                                                       logger.exception(e)
  4216                                                       if LOG_PDB:
  4217                                                           import pdb
  4218                                           
  4219                                                           pdb.set_trace()
  4220                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_memory at line 4222

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4222                                               @profiler
  4223                                               def transition_waiting_memory(self, key, nbytes=None, worker=None, **kwargs):
  4224                                                   try:
  4225                                                       ws = self.workers[worker]
  4226                                                       ts = self.tasks[key]
  4227                                           
  4228                                                       if self.validate:
  4229                                                           assert not ts.processing_on
  4230                                                           assert ts.waiting_on
  4231                                                           assert ts.state == "waiting"
  4232                                           
  4233                                                       ts.waiting_on.clear()
  4234                                           
  4235                                                       if nbytes is not None:
  4236                                                           ts.set_nbytes(nbytes)
  4237                                           
  4238                                                       self.check_idle_saturated(ws)
  4239                                           
  4240                                                       recommendations = {}
  4241                                           
  4242                                                       self._add_to_memory(ts, ws, recommendations, **kwargs)
  4243                                           
  4244                                                       if self.validate:
  4245                                                           assert not ts.processing_on
  4246                                                           assert not ts.waiting_on
  4247                                                           assert ts.who_has
  4248                                           
  4249                                                       return recommendations
  4250                                                   except Exception as e:
  4251                                                       logger.exception(e)
  4252                                                       if LOG_PDB:
  4253                                                           import pdb
  4254                                           
  4255                                                           pdb.set_trace()
  4256                                                       raise

Total time: 234.745 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_memory at line 4258

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4258                                               @profiler
  4259                                               def transition_processing_memory(
  4260                                                   self,
  4261                                                   key,
  4262                                                   nbytes=None,
  4263                                                   type=None,
  4264                                                   typename=None,
  4265                                                   worker=None,
  4266                                                   startstops=None,
  4267                                                   **kwargs,
  4268                                               ):
  4269   1411200    1315659.0      0.9      0.6          try:
  4270   1411200    1513048.0      1.1      0.6              ts = self.tasks[key]
  4271   1411200    1257852.0      0.9      0.5              assert worker
  4272   1411200    1563241.0      1.1      0.7              assert isinstance(worker, str)
  4273                                           
  4274   1411200    1305042.0      0.9      0.6              if self.validate:
  4275                                                           assert ts.processing_on
  4276                                                           ws = ts.processing_on
  4277                                                           assert ts in ws.processing
  4278                                                           assert not ts.waiting_on
  4279                                                           assert not ts.who_has, (ts, ts.who_has)
  4280                                                           assert not ts.exception_blame
  4281                                                           assert ts.state == "processing"
  4282                                           
  4283   1411200    1683447.0      1.2      0.7              ws = self.workers.get(worker)
  4284   1411200    1267902.0      0.9      0.5              if ws is None:
  4285                                                           return {key: "released"}
  4286                                           
  4287   1411200    3554700.0      2.5      1.5              if ws != ts.processing_on:  # someone else has this task
  4288                                                           logger.info(
  4289                                                               "Unexpected worker completed task, likely due to"
  4290                                                               " work stealing.  Expected: %s, Got: %s, Key: %s",
  4291                                                               ts.processing_on,
  4292                                                               ws,
  4293                                                               key,
  4294                                                           )
  4295                                                           return {}
  4296                                           
  4297   1411200    1300367.0      0.9      0.6              if startstops:
  4298   1411200    1588921.0      1.1      0.7                  L = list()
  4299   2865828    2789074.0      1.0      1.2                  for startstop in startstops:
  4300   1454628    1484264.0      1.0      0.6                      stop = startstop["stop"]
  4301   1454628    1300149.0      0.9      0.6                      start = startstop["start"]
  4302   1454628    1307001.0      0.9      0.6                      action = startstop["action"]
  4303   1454628    1426672.0      1.0      0.6                      if action == "compute":
  4304   1411200    1670338.0      1.2      0.7                          L.append((start, stop))
  4305                                           
  4306                                                               # record timings of all actions -- a cheaper way of
  4307                                                               # getting timing info compared with get_task_stream()
  4308   1454628    2983319.0      2.1      1.3                      ts.prefix.all_durations[action] += stop - start
  4309                                           
  4310   1411200    1621061.0      1.1      0.7                  if len(L) > 0:
  4311   1411200    1670403.0      1.2      0.7                      compute_start, compute_stop = L[0]
  4312                                                           else:  # This is very rare
  4313                                                               compute_start = compute_stop = None
  4314                                                       else:
  4315                                                           compute_start = compute_stop = None
  4316                                           
  4317                                                       #############################
  4318                                                       # Update Timing Information #
  4319                                                       #############################
  4320   1411200    2584478.0      1.8      1.1              if compute_start and ws.processing.get(ts, True):
  4321                                                           # Update average task duration for worker
  4322   1411200    1512375.0      1.1      0.6                  old_duration = ts.prefix.duration_average or 0
  4323   1411200    1294463.0      0.9      0.6                  new_duration = compute_stop - compute_start
  4324   1411200    1286496.0      0.9      0.5                  if not old_duration:
  4325         4          3.0      0.8      0.0                      avg_duration = new_duration
  4326                                                           else:
  4327   1411196    1635660.0      1.2      0.7                      avg_duration = 0.5 * old_duration + 0.5 * new_duration
  4328                                           
  4329   1411200    1523568.0      1.1      0.6                  ts.prefix.duration_average = avg_duration
  4330   1411200    1932548.0      1.4      0.8                  ts.group.duration += new_duration
  4331                                           
  4332   1413282    2035504.0      1.4      0.9                  for tts in self.unknown_durations.pop(ts.prefix.name, ()):
  4333      2082       2796.0      1.3      0.0                      if tts.processing_on:
  4334      2082       2372.0      1.1      0.0                          wws = tts.processing_on
  4335      2082       3326.0      1.6      0.0                          old = wws.processing[tts]
  4336      2082       6819.0      3.3      0.0                          comm = self.get_comm_cost(tts, wws)
  4337      2082       2684.0      1.3      0.0                          wws.processing[tts] = avg_duration + comm
  4338      2082       2849.0      1.4      0.0                          wws.occupancy += avg_duration + comm - old
  4339      2082       2793.0      1.3      0.0                          self.total_occupancy += avg_duration + comm - old
  4340                                           
  4341                                                       ############################
  4342                                                       # Update State Information #
  4343                                                       ############################
  4344   1411200    1365600.0      1.0      0.6              if nbytes is not None:
  4345   1411200    5435622.0      3.9      2.3                  ts.set_nbytes(nbytes)
  4346                                           
  4347   1411200    1367638.0      1.0      0.6              recommendations = {}
  4348                                           
  4349   1411200   61232589.0     43.4     26.1              self._remove_from_processing(ts)
  4350                                           
  4351   1411200  116218732.0     82.4     49.5              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4352                                           
  4353   1411200    1488070.0      1.1      0.6              if self.validate:
  4354                                                           assert not ts.processing_on
  4355                                                           assert not ts.waiting_on
  4356                                           
  4357   1411200    1205944.0      0.9      0.5              return recommendations
  4358                                                   except Exception as e:
  4359                                                       logger.exception(e)
  4360                                                       if LOG_PDB:
  4361                                                           import pdb
  4362                                           
  4363                                                           pdb.set_trace()
  4364                                                       raise

Total time: 70.0916 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_released at line 4366

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4366                                               @profiler
  4367                                               def transition_memory_released(self, key, safe=False):
  4368   1382400    1130453.0      0.8      1.6          try:
  4369   1382400    1240403.0      0.9      1.8              ts = self.tasks[key]
  4370                                           
  4371   1382400    1108371.0      0.8      1.6              if self.validate:
  4372                                                           assert not ts.waiting_on
  4373                                                           assert not ts.processing_on
  4374                                                           if safe:
  4375                                                               assert not ts.waiters
  4376                                           
  4377   1382400    1199390.0      0.9      1.7              if ts.actor:
  4378                                                           for ws in ts.who_has:
  4379                                                               ws.actors.discard(ts)
  4380                                                           if ts.who_wants:
  4381                                                               ts.exception_blame = ts
  4382                                                               ts.exception = "Worker holding Actor was lost"
  4383                                                               return {ts.key: "erred"}  # don't try to recreate
  4384                                           
  4385   1382400    1037677.0      0.8      1.5              recommendations = {}
  4386                                           
  4387   1382400    1443032.0      1.0      2.1              for dts in ts.waiters:
  4388                                                           if dts.state in ("no-worker", "processing"):
  4389                                                               recommendations[dts.key] = "waiting"
  4390                                                           elif dts.state == "waiting":
  4391                                                               dts.waiting_on.add(ts)
  4392                                           
  4393                                                       # XXX factor this out?
  4394   3522208    3016257.0      0.9      4.3              for ws in ts.who_has:
  4395   2139808    2311909.0      1.1      3.3                  ws.has_what.remove(ts)
  4396   2139808    3638810.0      1.7      5.2                  ws.nbytes -= ts.get_nbytes()
  4397   2139808    3260473.0      1.5      4.7                  ts.group.nbytes_in_memory -= ts.get_nbytes()
  4398   4279616   18621939.0      4.4     26.6                  self.worker_send(
  4399   2139808    2398023.0      1.1      3.4                      ws.address, {"op": "delete-data", "keys": [key], "report": False}
  4400                                                           )
  4401   1382400    1360437.0      1.0      1.9              ts.who_has.clear()
  4402                                           
  4403   1382400    3211796.0      2.3      4.6              ts.state = "released"
  4404                                           
  4405   1382400   19110338.0     13.8     27.3              self.report({"op": "lost-data", "key": key})
  4406                                           
  4407   1382400    1386485.0      1.0      2.0              if not ts.run_spec:  # pure data
  4408                                                           recommendations[key] = "forgotten"
  4409   1382400    1149398.0      0.8      1.6              elif ts.has_lost_dependencies:
  4410                                                           recommendations[key] = "forgotten"
  4411   1382400    1324792.0      1.0      1.9              elif ts.who_wants or ts.waiters:
  4412                                                           recommendations[key] = "waiting"
  4413                                           
  4414   1382400    1154593.0      0.8      1.6              if self.validate:
  4415                                                           assert not ts.waiting_on
  4416                                           
  4417   1382400     987061.0      0.7      1.4              return recommendations
  4418                                                   except Exception as e:
  4419                                                       logger.exception(e)
  4420                                                       if LOG_PDB:
  4421                                                           import pdb
  4422                                           
  4423                                                           pdb.set_trace()
  4424                                                       raise
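
With this many tables it can be easier to rank the hot lines programmatically than to read them by eye. Below is a minimal sketch, assuming the six-column layout shown in these tables (Line #, Hits, Time, Per Hit, % Time, Line Contents) and the 1e-06 s timer unit stated at the top of the profile. The names `ROW`, `parse_rows`, and `sample` are made up for illustration and are not part of line_profiler; the sample rows are copied from the table just above.

```python
import re

# One executed row: five numeric columns followed by the source text.
# Rows that were never hit have no numeric columns after the line number,
# so they do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.*)$")


def parse_rows(text):
    """Return (line_no, hits, time_us, pct_time, source) for executed rows."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            line_no, hits, time_us, _per_hit, pct, src = m.groups()
            rows.append((int(line_no), int(hits), float(time_us), float(pct), src.strip()))
    return rows


# A few rows copied from the table above (time is in microseconds).
sample = """\
  4398   4279616   18621939.0      4.4     26.6                  self.worker_send(
  4403   1382400    3211796.0      2.3      4.6              ts.state = "released"
  4405   1382400   19110338.0     13.8     27.3              self.report({"op": "lost-data", "key": key})
  4412                                                           recommendations[key] = "waiting"
"""

rows = parse_rows(sample)
# Sort by the % Time column, largest first.
hot = sorted(rows, key=lambda r: r[3], reverse=True)
for line_no, hits, time_us, pct, src in hot:
    print(f"{pct:5.1f}%  {time_us / 1e6:8.2f} s  line {line_no}: {src}")
```

On these rows the `self.report(...)` and `self.worker_send(...)` calls come out on top, which matches what the table shows: together they account for a bit over half of the time in this function.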

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_erred at line 4426

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4426                                               @profiler
  4427                                               def transition_released_erred(self, key):
  4428                                                   try:
  4429                                                       ts = self.tasks[key]
  4430                                           
  4431                                                       if self.validate:
  4432                                                           with log_errors(pdb=LOG_PDB):
  4433                                                               assert ts.exception_blame
  4434                                                               assert not ts.who_has
  4435                                                               assert not ts.waiting_on
  4436                                                               assert not ts.waiters
  4437                                           
  4438                                                       recommendations = {}
  4439                                           
  4440                                                       failing_ts = ts.exception_blame
  4441                                           
  4442                                                       for dts in ts.dependents:
  4443                                                           dts.exception_blame = failing_ts
  4444                                                           if not dts.who_has:
  4445                                                               recommendations[dts.key] = "erred"
  4446                                           
  4447                                                       self.report(
  4448                                                           {
  4449                                                               "op": "task-erred",
  4450                                                               "key": key,
  4451                                                               "exception": failing_ts.exception,
  4452                                                               "traceback": failing_ts.traceback,
  4453                                                           }
  4454                                                       )
  4455                                           
  4456                                                       ts.state = "erred"
  4457                                           
  4458                                                       # TODO: waiting data?
  4459                                                       return recommendations
  4460                                                   except Exception as e:
  4461                                                       logger.exception(e)
  4462                                                       if LOG_PDB:
  4463                                                           import pdb
  4464                                           
  4465                                                           pdb.set_trace()
  4466                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_erred_released at line 4468

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4468                                               @profiler
  4469                                               def transition_erred_released(self, key):
  4470                                                   try:
  4471                                                       ts = self.tasks[key]
  4472                                           
  4473                                                       if self.validate:
  4474                                                           with log_errors(pdb=LOG_PDB):
  4475                                                               assert all(dts.state != "erred" for dts in ts.dependencies)
  4476                                                               assert ts.exception_blame
  4477                                                               assert not ts.who_has
  4478                                                               assert not ts.waiting_on
  4479                                                               assert not ts.waiters
  4480                                           
  4481                                                       recommendations = {}
  4482                                           
  4483                                                       ts.exception = None
  4484                                                       ts.exception_blame = None
  4485                                                       ts.traceback = None
  4486                                           
  4487                                                       for dep in ts.dependents:
  4488                                                           if dep.state == "erred":
  4489                                                               recommendations[dep.key] = "waiting"
  4490                                           
  4491                                                       self.report({"op": "task-retried", "key": key})
  4492                                                       ts.state = "released"
  4493                                           
  4494                                                       return recommendations
  4495                                                   except Exception as e:
  4496                                                       logger.exception(e)
  4497                                                       if LOG_PDB:
  4498                                                           import pdb
  4499                                           
  4500                                                           pdb.set_trace()
  4501                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_released at line 4503

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4503                                               @profiler
  4504                                               def transition_waiting_released(self, key):
  4505                                                   try:
  4506                                                       ts = self.tasks[key]
  4507                                           
  4508                                                       if self.validate:
  4509                                                           assert not ts.who_has
  4510                                                           assert not ts.processing_on
  4511                                           
  4512                                                       recommendations = {}
  4513                                           
  4514                                                       for dts in ts.dependencies:
  4515                                                           s = dts.waiters
  4516                                                           if ts in s:
  4517                                                               s.discard(ts)
  4518                                                               if not s and not dts.who_wants:
  4519                                                                   recommendations[dts.key] = "released"
  4520                                                       ts.waiting_on.clear()
  4521                                           
  4522                                                       ts.state = "released"
  4523                                           
  4524                                                       if ts.has_lost_dependencies:
  4525                                                           recommendations[key] = "forgotten"
  4526                                                       elif not ts.exception_blame and (ts.who_wants or ts.waiters):
  4527                                                           recommendations[key] = "waiting"
  4528                                                       else:
  4529                                                           ts.waiters.clear()
  4530                                           
  4531                                                       return recommendations
  4532                                                   except Exception as e:
  4533                                                       logger.exception(e)
  4534                                                       if LOG_PDB:
  4535                                                           import pdb
  4536                                           
  4537                                                           pdb.set_trace()
  4538                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_released at line 4540

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4540                                               @profiler
  4541                                               def transition_processing_released(self, key):
  4542                                                   try:
  4543                                                       ts = self.tasks[key]
  4544                                           
  4545                                                       if self.validate:
  4546                                                           assert ts.processing_on
  4547                                                           assert not ts.who_has
  4548                                                           assert not ts.waiting_on
  4549                                                           assert self.tasks[key].state == "processing"
  4550                                           
  4551                                                       self._remove_from_processing(
  4552                                                           ts, send_worker_msg={"op": "release-task", "key": key}
  4553                                                       )
  4554                                           
  4555                                                       ts.state = "released"
  4556                                           
  4557                                                       recommendations = {}
  4558                                           
  4559                                                       if ts.has_lost_dependencies:
  4560                                                           recommendations[key] = "forgotten"
  4561                                                       elif ts.waiters or ts.who_wants:
  4562                                                           recommendations[key] = "waiting"
  4563                                           
  4564                                                       if recommendations.get(key) != "waiting":
  4565                                                           for dts in ts.dependencies:
  4566                                                               if dts.state != "released":
  4567                                                                   s = dts.waiters
  4568                                                                   s.discard(ts)
  4569                                                                   if not s and not dts.who_wants:
  4570                                                                       recommendations[dts.key] = "released"
  4571                                                           ts.waiters.clear()
  4572                                           
  4573                                                       if self.validate:
  4574                                                           assert not ts.processing_on
  4575                                           
  4576                                                       return recommendations
  4577                                                   except Exception as e:
  4578                                                       logger.exception(e)
  4579                                                       if LOG_PDB:
  4580                                                           import pdb
  4581                                           
  4582                                                           pdb.set_trace()
  4583                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_erred at line 4585

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4585                                               @profiler
  4586                                               def transition_processing_erred(
  4587                                                   self, key, cause=None, exception=None, traceback=None, **kwargs
  4588                                               ):
  4589                                                   try:
  4590                                                       ts = self.tasks[key]
  4591                                           
  4592                                                       if self.validate:
  4593                                                           assert cause or ts.exception_blame
  4594                                                           assert ts.processing_on
  4595                                                           assert not ts.who_has
  4596                                                           assert not ts.waiting_on
  4597                                           
  4598                                                       if ts.actor:
  4599                                                           ws = ts.processing_on
  4600                                                           ws.actors.remove(ts)
  4601                                           
  4602                                                       self._remove_from_processing(ts)
  4603                                           
  4604                                                       if exception is not None:
  4605                                                           ts.exception = exception
  4606                                                       if traceback is not None:
  4607                                                           ts.traceback = traceback
  4608                                                       if cause is not None:
  4609                                                           failing_ts = self.tasks[cause]
  4610                                                           ts.exception_blame = failing_ts
  4611                                                       else:
  4612                                                           failing_ts = ts.exception_blame
  4613                                           
  4614                                                       recommendations = {}
  4615                                           
  4616                                                       for dts in ts.dependents:
  4617                                                           dts.exception_blame = failing_ts
  4618                                                           recommendations[dts.key] = "erred"
  4619                                           
  4620                                                       for dts in ts.dependencies:
  4621                                                           s = dts.waiters
  4622                                                           s.discard(ts)
  4623                                                           if not s and not dts.who_wants:
  4624                                                               recommendations[dts.key] = "released"
  4625                                           
  4626                                                       ts.waiters.clear()  # do anything with this?
  4627                                           
  4628                                                       ts.state = "erred"
  4629                                           
  4630                                                       self.report(
  4631                                                           {
  4632                                                               "op": "task-erred",
  4633                                                               "key": key,
  4634                                                               "exception": failing_ts.exception,
  4635                                                               "traceback": failing_ts.traceback,
  4636                                                           }
  4637                                                       )
  4638                                           
  4639                                                       cs = self.clients["fire-and-forget"]
  4640                                                       if ts in cs.wants_what:
  4641                                                           self.client_releases_keys(client="fire-and-forget", keys=[key])
  4642                                           
  4643                                                       if self.validate:
  4644                                                           assert not ts.processing_on
  4645                                           
  4646                                                       return recommendations
  4647                                                   except Exception as e:
  4648                                                       logger.exception(e)
  4649                                                       if LOG_PDB:
  4650                                                           import pdb
  4651                                           
  4652                                                           pdb.set_trace()
  4653                                                       raise

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_no_worker_released at line 4655

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4655                                               @profiler
  4656                                               def transition_no_worker_released(self, key):
  4657                                                   try:
  4658                                                       ts = self.tasks[key]
  4659                                           
  4660                                                       if self.validate:
  4661                                                           assert self.tasks[key].state == "no-worker"
  4662                                                           assert not ts.who_has
  4663                                                           assert not ts.waiting_on
  4664                                           
  4665                                                       self.unrunnable.remove(ts)
  4666                                                       ts.state = "released"
  4667                                           
  4668                                                       for dts in ts.dependencies:
  4669                                                           dts.waiters.discard(ts)
  4670                                           
  4671                                                       ts.waiters.clear()
  4672                                           
  4673                                                       return {}
  4674                                                   except Exception as e:
  4675                                                       logger.exception(e)
  4676                                                       if LOG_PDB:
  4677                                                           import pdb
  4678                                           
  4679                                                           pdb.set_trace()
  4680                                                       raise

Total time: 6.05581 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: remove_key at line 4682

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4682                                               @profiler
  4683                                               def remove_key(self, key):
  4684   1411200     844700.0      0.6     13.9          ts = self.tasks.pop(key)
  4685   1411200     992526.0      0.7     16.4          assert ts.state == "forgotten"
  4686   1411200     741772.0      0.5     12.2          self.unrunnable.discard(ts)
  4687   1411200     661518.0      0.5     10.9          for cs in ts.who_wants:
  4688                                                       cs.wants_what.remove(ts)
  4689   1411200     652394.0      0.5     10.8          ts.who_wants.clear()
  4690   1411200     659333.0      0.5     10.9          ts.processing_on = None
  4691   1411200     806417.0      0.6     13.3          ts.exception_blame = ts.exception = ts.traceback = None
  4692                                           
  4693   1411200     697154.0      0.5     11.5          if key in self.task_metadata:
  4694                                                       del self.task_metadata[key]

Total time: 20.4003 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _propagate_forgotten at line 4696

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4696                                               @profiler
  4697                                               def _propagate_forgotten(self, ts, recommendations):
  4698   1411200    2428875.0      1.7     11.9          ts.state = "forgotten"
  4699   1411200     788181.0      0.6      3.9          key = ts.key
  4700   1411200     924636.0      0.7      4.5          for dts in ts.dependents:
  4701                                                       dts.has_lost_dependencies = True
  4702                                                       dts.dependencies.remove(ts)
  4703                                                       dts.waiting_on.discard(ts)
  4704                                                       if dts.state not in ("memory", "erred"):
  4705                                                           # Cannot compute task anymore
  4706                                                           recommendations[dts.key] = "forgotten"
  4707   1411200     887062.0      0.6      4.3          ts.dependents.clear()
  4708   1411200     875744.0      0.6      4.3          ts.waiters.clear()
  4709                                           
  4710   3864960    1972497.0      0.5      9.7          for dts in ts.dependencies:
  4711   2453760    1791557.0      0.7      8.8              dts.dependents.remove(ts)
  4712   2453760    1478297.0      0.6      7.2              s = dts.waiters
  4713   2453760    1536905.0      0.6      7.5              s.discard(ts)
  4714   2453760    1416522.0      0.6      6.9              if not dts.dependents and not dts.who_wants:
  4715                                                           # Task not needed anymore
  4716   1382400     680453.0      0.5      3.3                  assert dts is not ts
  4717   1382400    1025178.0      0.7      5.0                  recommendations[dts.key] = "forgotten"
  4718   1411200     844955.0      0.6      4.1          ts.dependencies.clear()
  4719   1411200     849306.0      0.6      4.2          ts.waiting_on.clear()
  4720                                           
  4721   1411200     742415.0      0.5      3.6          if ts.who_has:
  4722     28800      56961.0      2.0      0.3              ts.group.nbytes_in_memory -= ts.get_nbytes()
  4723                                           
  4724   1440000     838015.0      0.6      4.1          for ws in ts.who_has:
  4725     28800      29621.0      1.0      0.1              ws.has_what.remove(ts)
  4726     28800      38811.0      1.3      0.2              ws.nbytes -= ts.get_nbytes()
  4727     28800      19088.0      0.7      0.1              w = ws.address
  4728     28800      43754.0      1.5      0.2              if w in self.workers:  # in case worker has died
  4729     57600     309972.0      5.4      1.5                  self.worker_send(
  4730     28800      27570.0      1.0      0.1                      w, {"op": "delete-data", "keys": [key], "report": False}
  4731                                                           )
  4732   1411200     793922.0      0.6      3.9          ts.who_has.clear()

Total time: 5.16369 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_forgotten at line 4734

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4734                                               @profiler
  4735                                               def transition_memory_forgotten(self, key):
  4736     28800      18052.0      0.6      0.3          try:
  4737     28800      20911.0      0.7      0.4              ts = self.tasks[key]
  4738                                           
  4739     28800      18032.0      0.6      0.3              if self.validate:
  4740                                                           assert ts.state == "memory"
  4741                                                           assert not ts.processing_on
  4742                                                           assert not ts.waiting_on
  4743                                                           if not ts.run_spec:
  4744                                                               # It's ok to forget a pure data task
  4745                                                               pass
  4746                                                           elif ts.has_lost_dependencies:
  4747                                                               # It's ok to forget a task with forgotten dependencies
  4748                                                               pass
  4749                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4750                                                               # It's ok to forget a task that nobody needs
  4751                                                               pass
  4752                                                           else:
  4753                                                               assert 0, (ts,)
  4754                                           
  4755     28800      16515.0      0.6      0.3              recommendations = {}
  4756                                           
  4757     28800      17453.0      0.6      0.3              if ts.actor:
  4758                                                           for ws in ts.who_has:
  4759                                                               ws.actors.discard(ts)
  4760                                           
  4761     28800    4121132.0    143.1     79.8              self._propagate_forgotten(ts, recommendations)
  4762                                           
  4763     28800     611865.0     21.2     11.8              self.report_on_key(ts=ts)
  4764     28800     324557.0     11.3      6.3              self.remove_key(key)
  4765                                           
  4766     28800      15176.0      0.5      0.3              return recommendations
  4767                                                   except Exception as e:
  4768                                                       logger.exception(e)
  4769                                                       if LOG_PDB:
  4770                                                           import pdb
  4771                                           
  4772                                                           pdb.set_trace()
  4773                                                       raise
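
As a sanity check on how to read the columns: Per Hit is Time divided by Hits, and % Time is Time divided by the function's total time. A small sketch using the three most expensive rows of `transition_memory_forgotten` above; the variable names are only for illustration, and the total is the rounded "Total time: 5.16369 s" from the header, so the percentages are approximate.

```python
# Total time for transition_memory_forgotten, converted to microseconds.
total_us = 5.16369 * 1e6

# (hits, time in microseconds) copied from the table above.
measured = {
    "_propagate_forgotten": (28800, 4121132.0),
    "report_on_key": (28800, 611865.0),
    "remove_key": (28800, 324557.0),
}

derived = {}
for name, (hits, time_us) in measured.items():
    per_hit = time_us / hits          # average microseconds per call
    pct = 100.0 * time_us / total_us  # share of this function's total time
    derived[name] = (round(per_hit, 1), round(pct, 1))
    print(f"{name:22s} per hit {per_hit:7.1f} us   {pct:5.1f} % of total")
```

The derived values agree with the Per Hit and % Time columns printed by the profiler for those rows.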

Total time: 74.7099 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_forgotten at line 4775

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4775                                               @profiler
  4776                                               def transition_released_forgotten(self, key):
  4777   1382400     658170.0      0.5      0.9          try:
  4778   1382400     795466.0      0.6      1.1              ts = self.tasks[key]
  4779                                           
  4780   1382400     718609.0      0.5      1.0              if self.validate:
  4781                                                           assert ts.state in ("released", "erred")
  4782                                                           assert not ts.who_has
  4783                                                           assert not ts.processing_on
  4784                                                           assert not ts.waiting_on, (ts, ts.waiting_on)
  4785                                                           if not ts.run_spec:
  4786                                                               # It's ok to forget a pure data task
  4787                                                               pass
  4788                                                           elif ts.has_lost_dependencies:
  4789                                                               # It's ok to forget a task with forgotten dependencies
  4790                                                               pass
  4791                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4792                                                               # It's ok to forget a task that nobody needs
  4793                                                               pass
  4794                                                           else:
  4795                                                               assert 0, (ts,)
  4796                                           
  4797   1382400     676034.0      0.5      0.9              recommendations = {}
  4798   1382400   35295797.0     25.5     47.2              self._propagate_forgotten(ts, recommendations)
  4799                                           
  4800   1382400   23184222.0     16.8     31.0              self.report_on_key(ts=ts)
  4801   1382400   12737155.0      9.2     17.0              self.remove_key(key)
  4802                                           
  4803   1382400     644416.0      0.5      0.9              return recommendations
  4804                                                   except Exception as e:
  4805                                                       logger.exception(e)
  4806                                                       if LOG_PDB:
  4807                                                           import pdb
  4808                                           
  4809                                                           pdb.set_trace()
  4810                                                       raise

Total time: 1011.34 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 4812

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4812                                               @profiler
  4813                                               def transition(self, key, finish, *args, **kwargs):
  4814                                                   """Transition a key from its current state to the finish state
  4815                                           
  4816                                                   Examples
  4817                                                   --------
  4818                                                   >>> self.transition('x', 'waiting')
  4819                                                   {'x': 'processing'}
  4820                                           
  4821                                                   Returns
  4822                                                   -------
  4823                                                   Dictionary of recommendations for future transitions
  4824                                           
  4825                                                   See Also
  4826                                                   --------
  4827                                                   Scheduler.transitions: transitive version of this function
  4828                                                   """
  4829   7027200    5896993.0      0.8      0.6          try:
  4830   7027200    5547026.0      0.8      0.5              try:
  4831   7027200    7985148.0      1.1      0.8                  ts = self.tasks[key]
  4832                                                       except KeyError:
  4833                                                           return {}
  4834   7027200    7677432.0      1.1      0.8              start = ts.state
  4835   7027200    5675244.0      0.8      0.6              if start == finish:
  4836                                                           return {}
  4837                                           
  4838   7027200    5921361.0      0.8      0.6              if self.plugins:
  4839   7027200    8400656.0      1.2      0.8                  dependents = set(ts.dependents)
  4840   7027200    7692475.0      1.1      0.8                  dependencies = set(ts.dependencies)
  4841                                           
  4842   7027200    7438767.0      1.1      0.7              if (start, finish) in self._transitions:
  4843   7027200    6873481.0      1.0      0.7                  func = self._transitions[start, finish]
  4844   7027200  800501210.0    113.9     79.2                  recommendations = func(key, *args, **kwargs)
  4845                                                       elif "released" not in (start, finish):
  4846                                                           func = self._transitions["released", finish]
  4847                                                           assert not args and not kwargs
  4848                                                           a = self.transition(key, "released")
  4849                                                           if key in a:
  4850                                                               func = self._transitions["released", a[key]]
  4851                                                           b = func(key)
  4852                                                           a = a.copy()
  4853                                                           a.update(b)
  4854                                                           recommendations = a
  4855                                                           start = "released"
  4856                                                       else:
  4857                                                           raise RuntimeError(
  4858                                                               "Impossible transition from %r to %r" % (start, finish)
  4859                                                           )
  4860                                           
  4861   7027200    8672530.0      1.2      0.9              finish2 = ts.state
  4862   7027200   12393645.0      1.8      1.2              self.transition_log.append((key, start, finish2, recommendations, time()))
  4863   7027200    6125193.0      0.9      0.6              if self.validate:
  4864                                                           logger.debug(
  4865                                                               "Transitioned %r %s->%s (actual: %s).  Consequence: %s",
  4866                                                               key,
  4867                                                               start,
  4868                                                               finish2,
  4869                                                               ts.state,
  4870                                                               dict(recommendations),
  4871                                                           )
  4872   7027200    6033792.0      0.9      0.6              if self.plugins:
  4873                                                           # Temporarily put back forgotten key for plugin to retrieve it
  4874   7027200    8095530.0      1.2      0.8                  if ts.state == "forgotten":
  4875   1411200    1014380.0      0.7      0.1                      try:
  4876   1411200    1209828.0      0.9      0.1                          ts.dependents = dependents
  4877   1411200    1131806.0      0.8      0.1                          ts.dependencies = dependencies
  4878                                                               except KeyError:
  4879                                                                   pass
  4880   1411200    1437118.0      1.0      0.1                      self.tasks[ts.key] = ts
  4881  14054400   14992839.0      1.1      1.5                  for plugin in list(self.plugins):
  4882   7027200    5649004.0      0.8      0.6                      try:
  4883   7027200   45738804.0      6.5      4.5                          plugin.transition(key, start, finish2, *args, **kwargs)
  4884                                                               except Exception:
  4885                                                                   logger.info("Plugin failed with exception", exc_info=True)
  4886   7027200    8198496.0      1.2      0.8                  if ts.state == "forgotten":
  4887   1411200    1262689.0      0.9      0.1                      del self.tasks[ts.key]
  4888                                           
  4889   7027200    7902757.0      1.1      0.8              if ts.state == "forgotten" and ts.group.name in self.task_groups:
  4890                                                           # Remove TaskGroup if all tasks are in the forgotten state
  4891   1411200    1125767.0      0.8      0.1                  tg = ts.group
  4892   1411200    5223630.0      3.7      0.5                  if not any(tg.states.get(s) for s in ALL_TASK_STATES):
  4893       200        322.0      1.6      0.0                      ts.prefix.groups.remove(tg)
  4894       200        174.0      0.9      0.0                      del self.task_groups[tg.name]
  4895                                           
  4896   7027200    5518333.0      0.8      0.5              return recommendations
  4897                                                   except Exception as e:
  4898                                                       logger.exception("Error transitioning %r from %r to %r", key, start, finish)
  4899                                                       if LOG_PDB:
  4900                                                           import pdb
  4901                                           
  4902                                                           pdb.set_trace()
  4903                                                       raise

Total time: 829.475 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transitions at line 4905

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4905                                               @profiler
  4906                                               def transitions(self, recommendations):
  4907                                                   """Process transitions until none are left
  4908                                           
  4909                                                   This includes feedback from previous transitions and continues until we
  4910                                                   reach a steady state
  4911                                                   """
  4912   1440025    1188887.0      0.8      0.1          keys = set()
  4913   1440025    1187111.0      0.8      0.1          recommendations = recommendations.copy()
  4914   7056025    2888905.0      0.4      0.3          while recommendations:
  4915   5616000    3514573.0      0.6      0.4              key, finish = recommendations.popitem()
  4916   5616000    3106260.0      0.6      0.4              keys.add(key)
  4917   5616000  812842346.0    144.7     98.0              new = self.transition(key, finish)
  4918   5616000    3983896.0      0.7      0.5              recommendations.update(new)
  4919                                           
  4920   1440025     762934.0      0.5      0.1          if self.validate:
  4921                                                       for key in keys:
  4922                                                           self.validate_key(key)

Total time: 48.7669 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: check_idle_saturated at line 4957

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4957                                               @profiler
  4958                                               def check_idle_saturated(self, ws, occ=None):
  4959                                                   """Update the status of the idle and saturated state
  4960                                           
  4961                                                   The scheduler keeps track of workers that are ..
  4962                                           
  4963                                                   -  Saturated: have enough work to stay busy
  4964                                                   -  Idle: do not have enough work to stay busy
  4965                                           
  4966                                                   They are considered saturated if they both have enough tasks to occupy
  4967                                                   all of their threads, and if the expected runtime of those tasks is
  4968                                                   large enough.
  4969                                           
  4970                                                   This is useful for load balancing and adaptivity.
  4971                                                   """
  4972   2874113   19993488.0      7.0     41.0          if self.total_nthreads == 0 or ws.status == Status.closed:
  4973                                                       return
  4974   2874113    2030165.0      0.7      4.2          if occ is None:
  4975   2848559    1882541.0      0.7      3.9              occ = ws.occupancy
  4976   2874113    1800450.0      0.6      3.7          nc = ws.nthreads
  4977   2874113    2279959.0      0.8      4.7          p = len(ws.processing)
  4978                                           
  4979   2874113    2500894.0      0.9      5.1          avg = self.total_occupancy / self.total_nthreads
  4980                                           
  4981   2874113    2779122.0      1.0      5.7          if p < nc or occ / nc < avg / 2:
  4982    133219     327651.0      2.5      0.7              self.idle.add(ws)
  4983    133219     163475.0      1.2      0.3              self.saturated.discard(ws)
  4984                                                   else:
  4985   2740894    6615269.0      2.4     13.6              self.idle.discard(ws)
  4986                                           
  4987   2740894    2674225.0      1.0      5.5              pending = occ * (p - nc) / p / nc
  4988   2740894    2325307.0      0.8      4.8              if p > nc and pending > 0.4 and pending > 1.9 * avg:
  4989     31658      39552.0      1.2      0.1                  self.saturated.add(ws)
  4990                                                       else:
  4991   2709236    3354807.0      1.2      6.9                  self.saturated.discard(ws)

Total time: 20.4733 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 5548

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5548                                           @profiler
  5549                                           def decide_worker(ts, all_workers, valid_workers, objective):
  5550                                               """
  5551                                               Decide which worker should take task *ts*.
  5552                                           
  5553                                               We choose the worker that has the data on which *ts* depends.
  5554                                           
  5555                                               If several workers have dependencies then we choose the less-busy worker.
  5556                                           
  5557                                               Optionally provide *valid_workers* of where jobs are allowed to occur
  5558                                               (if all workers are allowed to take the task, pass True instead).
  5559                                           
  5560                                               If the task requires data communication because no eligible worker has
  5561                                               all the dependencies already, then we choose to minimize the number
  5562                                               of bytes sent between workers.  This is determined by calling the
  5563                                               *objective* function.
  5564                                               """
  5565   1376640    1023459.0      0.7      5.0      deps = ts.dependencies
  5566   1376640    4361734.0      3.2     21.3      assert all(dts.who_has for dts in deps)
  5567   1376640    1021277.0      0.7      5.0      if ts.actor:
  5568                                                   candidates = set(all_workers)
  5569                                               else:
  5570   1376640    4462003.0      3.2     21.8          candidates = {ws for dts in deps for ws in dts.who_has}
  5571   1376640     929015.0      0.7      4.5      if valid_workers is True:
  5572   1376640     900195.0      0.7      4.4          if not candidates:
  5573                                                       candidates = set(all_workers)
  5574                                               else:
  5575                                                   candidates &= valid_workers
  5576                                                   if not candidates:
  5577                                                       candidates = valid_workers
  5578                                                       if not candidates:
  5579                                                           if ts.loose_restrictions:
  5580                                                               return decide_worker(ts, all_workers, True, objective)
  5581                                                           else:
  5582                                                               return None
  5583   1376640     834019.0      0.6      4.1      if not candidates:
  5584                                                   return None
  5585                                           
  5586   1376640    1176111.0      0.9      5.7      if len(candidates) == 1:
  5587   1286508    1220457.0      0.9      6.0          return first(candidates)
  5588                                           
  5589     90132    4545008.0     50.4     22.2      return min(candidates, key=objective)

Total time: 0 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 5795

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5795                                               @profiler
  5796                                               def transition(self, key, start, finish, *args, **kwargs):
  5797                                                   if finish == "memory" or finish == "erred":
  5798                                                       ts = self.scheduler.tasks.get(key)
  5799                                                       if ts is not None and ts.key in self.keys:
  5800                                                           self.metadata[key] = ts.metadata
  5801                                                           self.state[key] = finish
  5802                                                           self.keys.discard(key)

@mrocklin
Copy link
Contributor

I haven't looked deeply at these numbers yet. If you don't mind my being lazy, I'm broadly curious about two questions:

  1. Have the recent changes had a significant effect? (My historical experience was that profiling and then optimizing helped, but not as much as I hoped.)
  2. Do you have a sense for what the next bottlenecks are?

@jakirkham
Copy link
Collaborator Author

Sure. I've also made some updates to the summary above.

At a high level, the transition_ functions seem to have seen a ~6% improvement (more or less, depending on the function). So an improvement, but not as big as we would like.

The work on report seems to be paying off. It is running ~33% faster and report_on_key is ~20% faster. So that's a nice improvement. This is particularly helpful for transition_processing_memory (the slowest transition), which spends a good chunk of its time in report.

The hashing improvements seem to have helped. We can tell because we haven't done anything to help transition_waiting_processing (the second slowest transition) specifically, yet we have seen a ~5% improvement there. As transition_processing_memory was mainly blocked by report, it makes sense that its improvement is at the higher end (~7.5%). There's another hashing improvement in PR ( dask/distributed#4278 ), which I think is the last of these.
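When comparing runs like this, it can help to rank the profiled lines by their share of time rather than reading each table by eye. Below is a minimal sketch, assuming the column layout printed in the profiles above (Line #, Hits, Time, Per Hit, % Time, Line Contents). The `hotspots` helper and the `ROW` pattern are hypothetical names for illustration and are not part of line_profiler; the sample rows are copied from the `transition` profile above.

```python
import re

# Columns as printed in the profiles: Line #, Hits, Time, Per Hit, % Time, code.
# Lines that were never hit carry no numeric columns and do not match.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.*)$")


def hotspots(profile_text, top=3):
    """Return the `top` profiled lines as (percent, line_no, code), highest first."""
    rows = []
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank line, or a line that was never executed
        line_no, _hits, _time, _per_hit, pct, code = m.groups()
        rows.append((float(pct), int(line_no), code.strip()))
    return sorted(rows, reverse=True)[:top]


SAMPLE = """\
  4843   7027200    6873481.0      1.0      0.7                  func = self._transitions[start, finish]
  4844   7027200  800501210.0    113.9     79.2                  recommendations = func(key, *args, **kwargs)
  4845                                                       elif "released" not in (start, finish):
  4862   7027200   12393645.0      1.8      1.2              self.transition_log.append((key, start, finish2, recommendations, time()))
  4883   7027200   45738804.0      6.5      4.5                          plugin.transition(key, start, finish2, *args, **kwargs)
"""

for pct, line_no, code in hotspots(SAMPLE):
    print(f"{pct:5.1f}%  line {line_no}: {code}")
```

Running the same helper over two profile files and diffing the top entries would give a rough view of which lines moved between runs.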

It would be good to understand what is going on in BatchedSend.send. Several transitions rely on calling send, but it appears to be slow for them. Interestingly report does too, but doesn't appear to be blocked there. Maybe there's a difference between sending to the Client vs. Workers? Haven't had much luck profiling send (maybe because it is a coroutine?). So any tips you have there would be welcome 🙂
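One possible workaround when a line-level profiler gives no usable numbers for a call is to accumulate wall-clock time around it by hand. The sketch below is only an illustration under stated assumptions: `timed`, `TIMINGS`, and `FakeComm` are hypothetical names, and `FakeComm` is a stand-in that merely buffers messages; it is not the `BatchedSend` class from distributed. The wrapper handles both plain functions and coroutine functions, and for coroutines the measured time includes any time spent suspended at an `await`.

```python
import asyncio
import functools
import inspect
import time
from collections import defaultdict

# name -> [number of calls, total seconds]
TIMINGS = defaultdict(lambda: [0, 0.0])


def timed(func):
    """Accumulate wall-clock time for plain functions and coroutine functions."""
    name = func.__qualname__

    def record(start):
        entry = TIMINGS[name]
        entry[0] += 1
        entry[1] += time.perf_counter() - start

    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            finally:
                record(start)  # includes time spent suspended at awaits

    else:

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                record(start)

    return wrapper


class FakeComm:
    """Hypothetical stand-in for a batched comm; not the distributed class."""

    def __init__(self):
        self.buffer = []

    @timed
    def send(self, msg):
        self.buffer.append(msg)

    @timed
    async def flush(self):
        await asyncio.sleep(0)  # stand-in for the real network write
        sent, self.buffer = self.buffer, []
        return len(sent)


async def main():
    comm = FakeComm()
    for i in range(1000):
        comm.send({"op": "compute-task", "key": i})
    return await comm.flush()


flushed = asyncio.run(main())
for name, (calls, seconds) in TIMINGS.items():
    print(f"{name}: {calls} calls, {seconds * 1e6 / calls:.2f} us/call")
```

Splitting the numbers by call site (for example, one wrapped object for client comms and one for worker comms) would be one way to check whether sending to the Client and sending to Workers really behave differently.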

@jakirkham
Copy link
Collaborator Author

Separately, would add that when it comes to Cythonization (as it seems like we are getting closer to that stage), we might want to look at check_idle_saturated first. It's not the absolute slowest function, but it is pretty slow and does pop up in the two slowest transitions either directly or indirectly. As its code is pretty well contained and the work it does is well suited for C compilation (arithmetic, branching, etc.), I expect this one to have the most notable effect without needing to write a lot of code. That said, this function is also pretty heavily affected by the __eq__ in Status ( dask/distributed#4270 ), spending 40% of its time there.
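To illustrate why a Python-level `__eq__` on an enum can show up in a hot path, here is a small micro-benchmark sketch. It is an assumption-laden illustration, not the actual `Status` class: `CustomEqStatus` is a hypothetical enum whose `__eq__` also accepts plain strings, and `PlainStatus` uses the default enum comparison. Absolute timings will vary by machine and Python version, so no particular numbers are claimed.

```python
import timeit
from enum import Enum


class PlainStatus(Enum):
    running = "running"
    closed = "closed"


class CustomEqStatus(Enum):
    """Hypothetical enum whose __eq__ also accepts plain strings."""

    running = "running"
    closed = "closed"

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.value == other.value
        if isinstance(other, str):
            return self.value == other
        return NotImplemented

    # Defining __eq__ disables the inherited __hash__, so restore it explicitly.
    def __hash__(self):
        return hash(self.value)


def bench(stmt, status):
    """Best-of-3 time in seconds for 200,000 evaluations of `stmt`."""
    env = {"s": status, "closed": type(status).closed}
    return min(timeit.repeat(stmt, globals=env, number=200_000, repeat=3))


t_custom = bench("s == closed", CustomEqStatus.running)
t_plain = bench("s == closed", PlainStatus.running)
t_is = bench("s is closed", PlainStatus.running)

print(f"custom __eq__ : {t_custom:.4f} s")
print(f"default ==    : {t_plain:.4f} s")
print(f"identity (is) : {t_is:.4f} s")
```

Since enum members are singletons, an identity check (`is`) is one option for hot paths when both sides are known to be enum members.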

Other functions tend to use one of the State objects. So we probably need to start Cythonizing State objects before delving into them. This might be the sort of task that parallelizes well over a few devs (cc @quasiben) 😉
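By contrast, the arithmetic in check_idle_saturated only needs a handful of plain numbers, so it could in principle be separated from the State objects before those are Cythonized. The following is a rough sketch of that idea, following the branches shown in the profile above. `classify_worker` is a hypothetical helper that does not exist in distributed, and it leaves out the `ws.status == Status.closed` check and the updates to `self.idle` and `self.saturated`, which would stay in the caller.

```python
def classify_worker(occ, nthreads, n_processing, total_occupancy, total_nthreads):
    """Classify one worker from plain numbers.

    Returns "idle", "saturated", "neither", or None when the cluster has
    no threads (mirroring the early return in check_idle_saturated).
    """
    if total_nthreads == 0:
        return None

    # Average occupancy per thread across the whole cluster.
    avg = total_occupancy / total_nthreads

    if n_processing < nthreads or occ / nthreads < avg / 2:
        return "idle"

    # Here n_processing >= nthreads >= 1, so the divisions are safe.
    pending = occ * (n_processing - nthreads) / n_processing / nthreads
    if n_processing > nthreads and pending > 0.4 and pending > 1.9 * avg:
        return "saturated"
    return "neither"


# Fewer tasks than threads: the worker counts as idle.
print(classify_worker(0.5, 4, 1, 2.0, 8))
# Many queued tasks with high expected runtime: saturated.
print(classify_worker(10.0, 2, 10, 10.0, 8))
# Exactly as many tasks as threads, nothing pending: neither.
print(classify_worker(1.0, 2, 2, 4.0, 8))
```

A function of this shape takes only ints and floats, which is the kind of code that typed compilation tends to handle well.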

@mrocklin
Copy link
Contributor

mrocklin commented Nov 25, 2020 via email

@jakirkham
Copy link
Collaborator Author

As send effectively turns into write later on, we have made a few improvements there already ( dask/distributed#4254 ) ( dask/distributed#4257 ). Also have included another improvement in PR ( dask/distributed#4281 ). Additionally had tried another simplification ( dask/distributed#4258 ), but it wound up being more-or-less a wash unfortunately.

@jakirkham
Copy link
Collaborator Author

Reran with line_profiler using commit ( dask/distributed@5c56503 ) now that PR ( dask/distributed#4281 ) is included. Should add that I have dropped @profiler from any of the transition functions that did not show any usage. Results below:

Results from prof_38688.lstat:

Timer unit: 1e-06 s

Total time: 22.2487 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report at line 2613

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2613                                               @profiler
  2614                                               def report(self, msg, ts=None, client=None):
  2615                                                   """
  2616                                                   Publish updates to all listening Queues and Comms
  2617                                           
  2618                                                   If the message contains a key then we only send the message to those
  2619                                                   comms that care about the key.
  2620                                                   """
  2621   4233600    3605846.0      0.9     16.2          if ts is None and "key" in msg:
  2622   2822400    2882170.0      1.0     13.0              ts = self.tasks.get(msg["key"])
  2623                                           
  2624   4233600    2620233.0      0.6     11.8          if ts is None:
  2625                                                       # Notify all clients
  2626                                                       client_keys = list(self.client_comms)
  2627   4233600    2611566.0      0.6     11.7          elif client is None:
  2628                                                       # Notify clients interested in key
  2629   4233600    6795293.0      1.6     30.5              client_keys = [c.client_key for c in ts.who_wants]
  2630                                                   else:
  2631                                                       # Notify clients interested in key (including `client`)
  2632                                                       client_keys = [c.client_key for c in ts.who_wants if c.client_key != client]
  2633                                                       client_keys.append(client)
  2634                                           
  2635   4291200    3215840.0      0.7     14.5          for k in client_keys:
  2636     57600      35712.0      0.6      0.2              try:
  2637     57600      55773.0      1.0      0.3                  c = self.client_comms[k]
  2638                                                       except KeyError:
  2639                                                           continue
  2640     57600      35350.0      0.6      0.2              try:
  2641     57600     390930.0      6.8      1.8                  c.send(msg)
  2642                                                           # logger.debug("Scheduler sends message to client %s", msg)
  2643                                                       except CommClosedError:
  2644                                                           if self.status == Status.running:
  2645                                                               logger.critical("Tried writing to closed comm: %s", msg)

Total time: 47.7492 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: send_task_to_worker at line 2726

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2726                                               @profiler
  2727                                               def send_task_to_worker(self, worker, key):
  2728                                                   """ Send a single computational task to a worker """
  2729   1413219    1314588.0      0.9      2.8          try:
  2730   1413219    1548044.0      1.1      3.2              ts = self.tasks[key]
  2731                                           
  2732   1413219    1621890.0      1.1      3.4              msg = {
  2733   1413219    1219069.0      0.9      2.6                  "op": "compute-task",
  2734   1413219    1163967.0      0.8      2.4                  "key": key,
  2735   1413219    1443711.0      1.0      3.0                  "priority": ts.priority,
  2736   1413219    2703755.0      1.9      5.7                  "duration": self.get_task_duration(ts),
  2737                                                       }
  2738   1413219    1297739.0      0.9      2.7              if ts.resource_restrictions:
  2739                                                           msg["resource_restrictions"] = ts.resource_restrictions
  2740   1413219    1273498.0      0.9      2.7              if ts.actor:
  2741                                                           msg["actor"] = True
  2742                                           
  2743   1413219    1281117.0      0.9      2.7              deps = ts.dependencies
  2744   1413219    1291263.0      0.9      2.7              if deps:
  2745   2757318    7474169.0      2.7     15.7                  msg["who_has"] = {
  2746   1378659    1164166.0      0.8      2.4                      dep.key: [ws.address for ws in dep.who_has] for dep in deps
  2747                                                           }
  2748   1378659    3245371.0      2.4      6.8                  msg["nbytes"] = {dep.key: dep.nbytes for dep in deps}
  2749                                           
  2750   1413219    1409605.0      1.0      3.0              if self.validate and deps:
  2751                                                           assert all(msg["who_has"].values())
  2752                                           
  2753   1413219    1544567.0      1.1      3.2              task = ts.run_spec
  2754   1413219    1582941.0      1.1      3.3              if type(task) is dict:
  2755   1407459    2381986.0      1.7      5.0                  msg.update(task)
  2756                                                       else:
  2757      5760       5346.0      0.9      0.0                  msg["task"] = task
  2758                                           
  2759   1413219   12782457.0      9.0     26.8              self.worker_send(worker, msg)
  2760                                                   except Exception as e:
  2761                                                       logger.exception(e)
  2762                                                       if LOG_PDB:
  2763                                                           import pdb
  2764                                           
  2765                                                           pdb.set_trace()
  2766                                                       raise

Total time: 15.7014 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: worker_send at line 2896

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2896                                               @profiler
  2897                                               def worker_send(self, worker, msg):
  2898                                                   """Send message to worker
  2899                                           
  2900                                                   This also handles connection failures by adding a callback to remove
  2901                                                   the worker on the next cycle.
  2902                                                   """
  2903   3574143    1486289.0      0.4      9.5          try:
  2904   3574143   14215140.0      4.0     90.5              self.stream_comms[worker].send(msg)
  2905                                                   except (CommClosedError, AttributeError):
  2906                                                       self.loop.add_callback(self.remove_worker, address=worker)

Total time: 17.2748 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: report_on_key at line 3710

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3710                                               @profiler
  3711                                               def report_on_key(self, key=None, ts=None, client=None):
  3712   1411200     904526.0      0.6      5.2          assert (key is None) + (ts is None) == 1, (key, ts)
  3713   1411200     696981.0      0.5      4.0          if ts is None:
  3714                                                       try:
  3715                                                           ts = self.tasks[key]
  3716                                                       except KeyError:
  3717                                                           self.report({"op": "cancelled-key", "key": key}, client=client)
  3718                                                           return
  3719                                                   else:
  3720   1411200     723210.0      0.5      4.2              key = ts.key
  3721   1411200    1091874.0      0.8      6.3          if ts.state == "forgotten":
  3722   1411200   13858188.0      9.8     80.2              self.report({"op": "cancelled-key", "key": key}, ts=ts, client=client)
  3723                                                   elif ts.state == "memory":
  3724                                                       self.report({"op": "key-in-memory", "key": key}, ts=ts, client=client)
  3725                                                   elif ts.state == "erred":
  3726                                                       failing_ts = ts.exception_blame
  3727                                                       self.report(
  3728                                                           {
  3729                                                               "op": "task-erred",
  3730                                                               "key": key,
  3731                                                               "exception": failing_ts.exception,
  3732                                                               "traceback": failing_ts.traceback,
  3733                                                           },
  3734                                                           ts=ts,
  3735                                                           client=client,
  3736                                                       )

Total time: 48.4139 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _remove_from_processing at line 3986

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  3986                                               @profiler
  3987                                               def _remove_from_processing(self, ts, send_worker_msg=None):
  3988                                                   """
  3989                                                   Remove *ts* from the set of processing tasks.
  3990                                                   """
  3991   1411200     827009.0      0.6      1.7          ws = ts.processing_on
  3992   1411200     797808.0      0.6      1.6          ts.processing_on = None
  3993   1411200     714121.0      0.5      1.5          w = ws.address
  3994   1411200    1033489.0      0.7      2.1          if w in self.workers:  # may have been removed
  3995   1411200    1480163.0      1.0      3.1              duration = ws.processing.pop(ts)
  3996   1411200     677038.0      0.5      1.4              if not ws.processing:
  3997       224        211.0      0.9      0.0                  self.total_occupancy -= ws.occupancy
  3998       224        104.0      0.5      0.0                  ws.occupancy = 0
  3999                                                       else:
  4000   1410976    1048409.0      0.7      2.2                  self.total_occupancy -= duration
  4001   1410976     932058.0      0.7      1.9                  ws.occupancy -= duration
  4002   1411200   38448200.0     27.2     79.4              self.check_idle_saturated(ws)
  4003   1411200    1755324.0      1.2      3.6              self.release_resources(ts, ws)
  4004   1411200     699982.0      0.5      1.4              if send_worker_msg:
  4005                                                           self.worker_send(w, send_worker_msg)

Total time: 69.9074 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _add_to_memory at line 4007

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4007                                               @profiler
  4008                                               def _add_to_memory(
  4009                                                   self, ts, ws, recommendations, type=None, typename=None, **kwargs
  4010                                               ):
  4011                                                   """
  4012                                                   Add *ts* to the set of in-memory tasks.
  4013                                                   """
  4014   1411200    1210579.0      0.9      1.7          if self.validate:
  4015                                                       assert ts not in ws.has_what
  4016                                           
  4017   1411200    1829109.0      1.3      2.6          ts.who_has.add(ws)
  4018   1411200    1898703.0      1.3      2.7          ws.has_what.add(ts)
  4019   1411200    2470581.0      1.8      3.5          ws.nbytes += ts.get_nbytes()
  4020                                           
  4021   1411200    1028448.0      0.7      1.5          deps = ts.dependents
  4022   1411200    1317079.0      0.9      1.9          if len(deps) > 1:
  4023    103680    1091394.0     10.5      1.6              deps = sorted(deps, key=operator.attrgetter("priority"), reverse=True)
  4024   3864960    2878164.0      0.7      4.1          for dts in deps:
  4025   2453760    2057961.0      0.8      2.9              s = dts.waiting_on
  4026   2453760    2594680.0      1.1      3.7              if ts in s:
  4027   2453760    2535527.0      1.0      3.6                  s.discard(ts)
  4028   2453760    1699707.0      0.7      2.4                  if not s:  # new task ready to run
  4029   1376640    1388993.0      1.0      2.0                      recommendations[dts.key] = "processing"
  4030                                           
  4031   3864960    2902393.0      0.8      4.2          for dts in ts.dependencies:
  4032   2453760    2067228.0      0.8      3.0              s = dts.waiters
  4033   2453760    2667237.0      1.1      3.8              s.discard(ts)
  4034   2453760    1841714.0      0.8      2.6              if not s and not dts.who_wants:
  4035   1382400    1353523.0      1.0      1.9                  recommendations[dts.key] = "released"
  4036                                           
  4037   1411200    1129651.0      0.8      1.6          if not ts.waiters and not ts.who_wants:
  4038                                                       recommendations[ts.key] = "released"
  4039                                                   else:
  4040   1411200    1394085.0      1.0      2.0              msg = {"op": "key-in-memory", "key": ts.key}
  4041   1411200    1000947.0      0.7      1.4              if type is not None:
  4042   1411200    1080898.0      0.8      1.5                  msg["type"] = type
  4043   1411200   20553655.0     14.6     29.4              self.report(msg)
  4044                                           
  4045   1411200    3878407.0      2.7      5.5          ts.state = "memory"
  4046   1411200    1137377.0      0.8      1.6          ts.type = typename
  4047   1411200    1794015.0      1.3      2.6          ts.group.types.add(typename)
  4048                                           
  4049   1411200    1221412.0      0.9      1.7          cs = self.clients["fire-and-forget"]
  4050   1411200    1883948.0      1.3      2.7          if ts in cs.wants_what:
  4051                                                       self.client_releases_keys(client="fire-and-forget", keys=[ts.key])

Total time: 29.5135 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_waiting at line 4053

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4053                                               @profiler
  4054                                               def transition_released_waiting(self, key):
  4055   1411200     944982.0      0.7      3.2          try:
  4056   1411200    1021014.0      0.7      3.5              ts = self.tasks[key]
  4057                                           
  4058   1411200     949978.0      0.7      3.2              if self.validate:
  4059                                                           assert ts.run_spec
  4060                                                           assert not ts.waiting_on
  4061                                                           assert not ts.who_has
  4062                                                           assert not ts.processing_on
  4063                                                           assert not any(dts.state == "forgotten" for dts in ts.dependencies)
  4064                                           
  4065   1411200     983350.0      0.7      3.3              if ts.has_lost_dependencies:
  4066                                                           return {key: "forgotten"}
  4067                                           
  4068   1411200    2434188.0      1.7      8.2              ts.state = "waiting"
  4069                                           
  4070   1411200     923084.0      0.7      3.1              recommendations = {}
  4071                                           
  4072   3864960    2704440.0      0.7      9.2              for dts in ts.dependencies:
  4073   2453760    1636913.0      0.7      5.5                  if dts.exception_blame:
  4074                                                               ts.exception_blame = dts.exception_blame
  4075                                                               recommendations[key] = "erred"
  4076                                                               return recommendations
  4077                                           
  4078   3864960    2603189.0      0.7      8.8              for dts in ts.dependencies:
  4079   2453760    1684904.0      0.7      5.7                  dep = dts.key
  4080   2453760    1721045.0      0.7      5.8                  if not dts.who_has:
  4081   2453760    2569478.0      1.0      8.7                      ts.waiting_on.add(dts)
  4082   2453760    2195807.0      0.9      7.4                  if dts.state == "released":
  4083                                                               recommendations[dep] = "waiting"
  4084                                                           else:
  4085   2453760    2485468.0      1.0      8.4                      dts.waiters.add(ts)
  4086                                           
  4087   1411200    2727458.0      1.9      9.2              ts.waiters = {dts for dts in ts.dependents if dts.state == "waiting"}
  4088                                           
  4089   1411200     991857.0      0.7      3.4              if not ts.waiting_on:
  4090     34560      25773.0      0.7      0.1                  if self.workers:
  4091     34560      28478.0      0.8      0.1                      recommendations[key] = "processing"
  4092                                                           else:
  4093                                                               self.unrunnable.add(ts)
  4094                                                               ts.state = "no-worker"
  4095                                           
  4096   1411200     882079.0      0.6      3.0              return recommendations
  4097                                                   except Exception as e:
  4098                                                       logger.exception(e)
  4099                                                       if LOG_PDB:
  4100                                                           import pdb
  4101                                           
  4102                                                           pdb.set_trace()
  4103                                                       raise

Total time: 51.6245 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 4149

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4149                                               @profiler
  4150                                               def decide_worker(self, ts):
  4151                                                   """
  4152                                                   Decide on a worker for task *ts*.  Return a WorkerState.
  4153                                                   """
  4154   1411200    6950235.0      4.9     13.5          valid_workers = self.valid_workers(ts)
  4155                                           
  4156   1411200     823270.0      0.6      1.6          if not valid_workers and not ts.loose_restrictions and self.workers:
  4157                                                       self.unrunnable.add(ts)
  4158                                                       ts.state = "no-worker"
  4159                                                       return None
  4160                                           
  4161   1411200     942099.0      0.7      1.8          if ts.dependencies or valid_workers is not True:
  4162   2753280   34966143.0     12.7     67.7              worker = decide_worker(
  4163   1376640     663632.0      0.5      1.3                  ts,
  4164   1376640    3068443.0      2.2      5.9                  self.workers.values(),
  4165   1376640     708768.0      0.5      1.4                  valid_workers,
  4166   1376640    1368706.0      1.0      2.7                  partial(self.worker_objective, ts),
  4167                                                       )
  4168     34560      40737.0      1.2      0.1          elif self.idle:
  4169       240        458.0      1.9      0.0              if len(self.idle) < 20:  # smart but linear in small case
  4170       240       1452.0      6.0      0.0                  worker = min(self.idle, key=operator.attrgetter("occupancy"))
  4171                                                       else:  # dumb but fast in large case
  4172                                                           worker = self.idle[self.n_tasks % len(self.idle)]
  4173                                                   else:
  4174     34320      22120.0      0.6      0.0              if len(self.workers) < 20:  # smart but linear in small case
  4175     68640     284848.0      4.1      0.6                  worker = min(
  4176     34320      65678.0      1.9      0.1                      self.workers.values(), key=operator.attrgetter("occupancy")
  4177                                                           )
  4178                                                       else:  # dumb but fast in large case
  4179                                                           worker = self.workers.values()[self.n_tasks % len(self.workers)]
  4180                                           
  4181   1411200     998029.0      0.7      1.9          if self.validate:
  4182                                                       assert worker is None or isinstance(worker, WorkerState), (
  4183                                                           type(worker),
  4184                                                           worker,
  4185                                                       )
  4186                                                       assert worker.address in self.workers
  4187                                           
  4188   1411200     719895.0      0.5      1.4          return worker

Total time: 214.028 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_waiting_processing at line 4190

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4190                                               @profiler
  4191                                               def transition_waiting_processing(self, key):
  4192   1411200    1015003.0      0.7      0.5          try:
  4193   1411200    1151378.0      0.8      0.5              ts = self.tasks[key]
  4194                                           
  4195   1411200    1039907.0      0.7      0.5              if self.validate:
  4196                                                           assert not ts.waiting_on
  4197                                                           assert not ts.who_has
  4198                                                           assert not ts.exception_blame
  4199                                                           assert not ts.processing_on
  4200                                                           assert not ts.has_lost_dependencies
  4201                                                           assert ts not in self.unrunnable
  4202                                                           assert all(dts.who_has for dts in ts.dependencies)
  4203                                           
  4204   1411200   65662031.0     46.5     30.7              ws = self.decide_worker(ts)
  4205   1411200    1011036.0      0.7      0.5              if ws is None:
  4206                                                           return {}
  4207   1411200    1078485.0      0.8      0.5              worker = ws.address
  4208                                           
  4209   1411200    2788221.0      2.0      1.3              duration = self.get_task_duration(ts)
  4210   1411200    4767624.0      3.4      2.2              comm = self.get_comm_cost(ts, ws)
  4211                                           
  4212   1411200    2475662.0      1.8      1.2              ws.processing[ts] = duration + comm
  4213   1411200    1062961.0      0.8      0.5              ts.processing_on = ws
  4214   1411200    1446193.0      1.0      0.7              ws.occupancy += duration + comm
  4215   1411200    1331571.0      0.9      0.6              self.total_occupancy += duration + comm
  4216   1411200    3576395.0      2.5      1.7              ts.state = "processing"
  4217   1411200    1970378.0      1.4      0.9              self.consume_resources(ts, ws)
  4218   1411200   36097575.0     25.6     16.9              self.check_idle_saturated(ws)
  4219   1411200    1476605.0      1.0      0.7              self.n_tasks += 1
  4220                                           
  4221   1411200    1115330.0      0.8      0.5              if ts.actor:
  4222                                                           ws.actors.add(ts)
  4223                                           
  4224                                                       # logger.debug("Send job to worker: %s, %s", worker, key)
  4225                                           
  4226   1411200   83900102.0     59.5     39.2              self.send_task_to_worker(worker, key)
  4227                                           
  4228   1411200    1061320.0      0.8      0.5              return {}
  4229                                                   except Exception as e:
  4230                                                       logger.exception(e)
  4231                                                       if LOG_PDB:
  4232                                                           import pdb
  4233                                           
  4234                                                           pdb.set_trace()
  4235                                                       raise

Total time: 234.312 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_processing_memory at line 4272

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4272                                               @profiler
  4273                                               def transition_processing_memory(
  4274                                                   self,
  4275                                                   key,
  4276                                                   nbytes=None,
  4277                                                   type=None,
  4278                                                   typename=None,
  4279                                                   worker=None,
  4280                                                   startstops=None,
  4281                                                   **kwargs,
  4282                                               ):
  4283   1411200    1278733.0      0.9      0.5          try:
  4284   1411200    1483233.0      1.1      0.6              ts = self.tasks[key]
  4285   1411200    1259483.0      0.9      0.5              assert worker
  4286   1411200    1549686.0      1.1      0.7              assert isinstance(worker, str)
  4287                                           
  4288   1411200    1296993.0      0.9      0.6              if self.validate:
  4289                                                           assert ts.processing_on
  4290                                                           ws = ts.processing_on
  4291                                                           assert ts in ws.processing
  4292                                                           assert not ts.waiting_on
  4293                                                           assert not ts.who_has, (ts, ts.who_has)
  4294                                                           assert not ts.exception_blame
  4295                                                           assert ts.state == "processing"
  4296                                           
  4297   1411200    1758508.0      1.2      0.8              ws = self.workers.get(worker)
  4298   1411200    1249815.0      0.9      0.5              if ws is None:
  4299                                                           return {key: "released"}
  4300                                           
  4301   1411200    3586286.0      2.5      1.5              if ws != ts.processing_on:  # someone else has this task
  4302                                                           logger.info(
  4303                                                               "Unexpected worker completed task, likely due to"
  4304                                                               " work stealing.  Expected: %s, Got: %s, Key: %s",
  4305                                                               ts.processing_on,
  4306                                                               ws,
  4307                                                               key,
  4308                                                           )
  4309                                                           return {}
  4310                                           
  4311   1411200    1284340.0      0.9      0.5              if startstops:
  4312   1411200    1609767.0      1.1      0.7                  L = list()
  4313   2863435    2782399.0      1.0      1.2                  for startstop in startstops:
  4314   1452235    1448715.0      1.0      0.6                      stop = startstop["stop"]
  4315   1452235    1287043.0      0.9      0.5                      start = startstop["start"]
  4316   1452235    1291953.0      0.9      0.6                      action = startstop["action"]
  4317   1452235    1402346.0      1.0      0.6                      if action == "compute":
  4318   1411200    1628003.0      1.2      0.7                          L.append((start, stop))
  4319                                           
  4320                                                               # record timings of all actions -- a cheaper way of
  4321                                                               # getting timing info compared with get_task_stream()
  4322   1452235    2758818.0      1.9      1.2                      ts.prefix.all_durations[action] += stop - start
  4323                                           
  4324   1411200    1611300.0      1.1      0.7                  if len(L) > 0:
  4325   1411200    1625412.0      1.2      0.7                      compute_start, compute_stop = L[0]
  4326                                                           else:  # This is very rare
  4327                                                               compute_start = compute_stop = None
  4328                                                       else:
  4329                                                           compute_start = compute_stop = None
  4330                                           
  4331                                                       #############################
  4332                                                       # Update Timing Information #
  4333                                                       #############################
  4334   1411200    3318797.0      2.4      1.4              if compute_start and ws.processing.get(ts, True):
  4335                                                           # Update average task duration for worker
  4336   1411200    1523835.0      1.1      0.7                  old_duration = ts.prefix.duration_average or 0
  4337   1411200    1288761.0      0.9      0.6                  new_duration = compute_stop - compute_start
  4338   1411200    1283990.0      0.9      0.5                  if not old_duration:
  4339         4          4.0      1.0      0.0                      avg_duration = new_duration
  4340                                                           else:
  4341   1411196    1597758.0      1.1      0.7                      avg_duration = 0.5 * old_duration + 0.5 * new_duration
  4342                                           
  4343   1411200    1533053.0      1.1      0.7                  ts.prefix.duration_average = avg_duration
  4344   1411200    2192198.0      1.6      0.9                  ts.group.duration += new_duration
  4345                                           
  4346   1413186    2171529.0      1.5      0.9                  for tts in self.unknown_durations.pop(ts.prefix.name, ()):
  4347      1986       2221.0      1.1      0.0                      if tts.processing_on:
  4348      1986       2208.0      1.1      0.0                          wws = tts.processing_on
  4349      1986       3403.0      1.7      0.0                          old = wws.processing[tts]
  4350      1986       5556.0      2.8      0.0                          comm = self.get_comm_cost(tts, wws)
  4351      1986       2926.0      1.5      0.0                          wws.processing[tts] = avg_duration + comm
  4352      1986       2404.0      1.2      0.0                          wws.occupancy += avg_duration + comm - old
  4353      1986       2502.0      1.3      0.0                          self.total_occupancy += avg_duration + comm - old
  4354                                           
  4355                                                       ############################
  4356                                                       # Update State Information #
  4357                                                       ############################
  4358   1411200    1364708.0      1.0      0.6              if nbytes is not None:
  4359   1411200    5138487.0      3.6      2.2                  ts.set_nbytes(nbytes)
  4360                                           
  4361   1411200    1343424.0      1.0      0.6              recommendations = {}
  4362                                           
  4363   1411200   60037063.0     42.5     25.6              self._remove_from_processing(ts)
  4364                                           
  4365   1411200  116589360.0     82.6     49.8              self._add_to_memory(ts, ws, recommendations, type=type, typename=typename)
  4366                                           
  4367   1411200    1509406.0      1.1      0.6              if self.validate:
  4368                                                           assert not ts.processing_on
  4369                                                           assert not ts.waiting_on
  4370                                           
  4371   1411200    1205832.0      0.9      0.5              return recommendations
  4372                                                   except Exception as e:
  4373                                                       logger.exception(e)
  4374                                                       if LOG_PDB:
  4375                                                           import pdb
  4376                                           
  4377                                                           pdb.set_trace()
  4378                                                       raise

Total time: 68.5131 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_released at line 4380

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4380                                               @profiler
  4381                                               def transition_memory_released(self, key, safe=False):
  4382   1382400    1093511.0      0.8      1.6          try:
  4383   1382400    1203731.0      0.9      1.8              ts = self.tasks[key]
  4384                                           
  4385   1382400    1082299.0      0.8      1.6              if self.validate:
  4386                                                           assert not ts.waiting_on
  4387                                                           assert not ts.processing_on
  4388                                                           if safe:
  4389                                                               assert not ts.waiters
  4390                                           
  4391   1382400    1109163.0      0.8      1.6              if ts.actor:
  4392                                                           for ws in ts.who_has:
  4393                                                               ws.actors.discard(ts)
  4394                                                           if ts.who_wants:
  4395                                                               ts.exception_blame = ts
  4396                                                               ts.exception = "Worker holding Actor was lost"
  4397                                                               return {ts.key: "erred"}  # don't try to recreate
  4398                                           
  4399   1382400    1016819.0      0.7      1.5              recommendations = {}
  4400                                           
  4401   1382400    1403900.0      1.0      2.0              for dts in ts.waiters:
  4402                                                           if dts.state in ("no-worker", "processing"):
  4403                                                               recommendations[dts.key] = "waiting"
  4404                                                           elif dts.state == "waiting":
  4405                                                               dts.waiting_on.add(ts)
  4406                                           
  4407                                                       # XXX factor this out?
  4408   3514520    2942603.0      0.8      4.3              for ws in ts.who_has:
  4409   2132120    3150306.0      1.5      4.6                  ws.has_what.remove(ts)
  4410   2132120    3529445.0      1.7      5.2                  ws.nbytes -= ts.get_nbytes()
  4411   2132120    3251992.0      1.5      4.7                  ts.group.nbytes_in_memory -= ts.get_nbytes()
  4412   4264240   17842964.0      4.2     26.0                  self.worker_send(
  4413   2132120    2266780.0      1.1      3.3                      ws.address, {"op": "delete-data", "keys": [key], "report": False}
  4414                                                           )
  4415   1382400    1336448.0      1.0      2.0              ts.who_has.clear()
  4416                                           
  4417   1382400    3094435.0      2.2      4.5              ts.state = "released"
  4418                                           
  4419   1382400   18308152.0     13.2     26.7              self.report({"op": "lost-data", "key": key})
  4420                                           
  4421   1382400    1364405.0      1.0      2.0              if not ts.run_spec:  # pure data
  4422                                                           recommendations[key] = "forgotten"
  4423   1382400    1138079.0      0.8      1.7              elif ts.has_lost_dependencies:
  4424                                                           recommendations[key] = "forgotten"
  4425   1382400    1294829.0      0.9      1.9              elif ts.who_wants or ts.waiters:
  4426                                                           recommendations[key] = "waiting"
  4427                                           
  4428   1382400    1128707.0      0.8      1.6              if self.validate:
  4429                                                           assert not ts.waiting_on
  4430                                           
  4431   1382400     954547.0      0.7      1.4              return recommendations
  4432                                                   except Exception as e:
  4433                                                       logger.exception(e)
  4434                                                       if LOG_PDB:
  4435                                                           import pdb
  4436                                           
  4437                                                           pdb.set_trace()
  4438                                                       raise

Total time: 6.45733 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: remove_key at line 4690

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4690                                               @profile
  4691                                               def remove_key(self, key):
  4692   1411200     844324.0      0.6     13.1          ts = self.tasks.pop(key)
  4693   1411200     985107.0      0.7     15.3          assert ts.state == "forgotten"
  4694   1411200    1124671.0      0.8     17.4          self.unrunnable.discard(ts)
  4695   1411200     679898.0      0.5     10.5          for cs in ts.who_wants:
  4696                                                       cs.wants_what.remove(ts)
  4697   1411200     650127.0      0.5     10.1          ts.who_wants.clear()
  4698   1411200     640211.0      0.5      9.9          ts.processing_on = None
  4699   1411200     822192.0      0.6     12.7          ts.exception_blame = ts.exception = ts.traceback = None
  4700                                           
  4701   1411200     710804.0      0.5     11.0          if key in self.task_metadata:
  4702                                                       del self.task_metadata[key]

Total time: 22.1209 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: _propagate_forgotten at line 4704

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4704                                               @profile
  4705                                               def _propagate_forgotten(self, ts, recommendations):
  4706   1411200    2524663.0      1.8     11.4          ts.state = "forgotten"
  4707   1411200     791275.0      0.6      3.6          key = ts.key
  4708   1411200     901457.0      0.6      4.1          for dts in ts.dependents:
  4709                                                       dts.has_lost_dependencies = True
  4710                                                       dts.dependencies.remove(ts)
  4711                                                       dts.waiting_on.discard(ts)
  4712                                                       if dts.state not in ("memory", "erred"):
  4713                                                           # Cannot compute task anymore
  4714                                                           recommendations[dts.key] = "forgotten"
  4715   1411200     903820.0      0.6      4.1          ts.dependents.clear()
  4716   1411200     888673.0      0.6      4.0          ts.waiters.clear()
  4717                                           
  4718   3864960    2076673.0      0.5      9.4          for dts in ts.dependencies:
  4719   2453760    2557095.0      1.0     11.6              dts.dependents.remove(ts)
  4720   2453760    1494149.0      0.6      6.8              s = dts.waiters
  4721   2453760    2084819.0      0.8      9.4              s.discard(ts)
  4722   2453760    1469916.0      0.6      6.6              if not dts.dependents and not dts.who_wants:
  4723                                                           # Task not needed anymore
  4724   1382400     687383.0      0.5      3.1                  assert dts is not ts
  4725   1382400    1003554.0      0.7      4.5                  recommendations[dts.key] = "forgotten"
  4726   1411200     887282.0      0.6      4.0          ts.dependencies.clear()
  4727   1411200     877860.0      0.6      4.0          ts.waiting_on.clear()
  4728                                           
  4729   1411200     766537.0      0.5      3.5          if ts.who_has:
  4730     28800      59992.0      2.1      0.3              ts.group.nbytes_in_memory -= ts.get_nbytes()
  4731                                           
  4732   1440000     845619.0      0.6      3.8          for ws in ts.who_has:
  4733     28800      37784.0      1.3      0.2              ws.has_what.remove(ts)
  4734     28800      42757.0      1.5      0.2              ws.nbytes -= ts.get_nbytes()
  4735     28800      21123.0      0.7      0.1              w = ws.address
  4736     28800      33648.0      1.2      0.2              if w in self.workers:  # in case worker has died
  4737     57600     326948.0      5.7      1.5                  self.worker_send(
  4738     28800      29989.0      1.0      0.1                      w, {"op": "delete-data", "keys": [key], "report": False}
  4739                                                           )
  4740   1411200     807894.0      0.6      3.7          ts.who_has.clear()

Total time: 5.82323 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_memory_forgotten at line 4742

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4742                                               @profile
  4743                                               def transition_memory_forgotten(self, key):
  4744     28800      18926.0      0.7      0.3          try:
  4745     28800      22296.0      0.8      0.4              ts = self.tasks[key]
  4746                                           
  4747     28800      19927.0      0.7      0.3              if self.validate:
  4748                                                           assert ts.state == "memory"
  4749                                                           assert not ts.processing_on
  4750                                                           assert not ts.waiting_on
  4751                                                           if not ts.run_spec:
  4752                                                               # It's ok to forget a pure data task
  4753                                                               pass
  4754                                                           elif ts.has_lost_dependencies:
  4755                                                               # It's ok to forget a task with forgotten dependencies
  4756                                                               pass
  4757                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4758                                                               # It's ok to forget a task that nobody needs
  4759                                                               pass
  4760                                                           else:
  4761                                                               assert 0, (ts,)
  4762                                           
  4763     28800      17623.0      0.6      0.3              recommendations = {}
  4764                                           
  4765     28800      19738.0      0.7      0.3              if ts.actor:
  4766                                                           for ws in ts.who_has:
  4767                                                               ws.actors.discard(ts)
  4768                                           
  4769     28800    4672777.0    162.2     80.2              self._propagate_forgotten(ts, recommendations)
  4770                                           
  4771     28800     679284.0     23.6     11.7              self.report_on_key(ts=ts)
  4772     28800     356189.0     12.4      6.1              self.remove_key(key)
  4773                                           
  4774     28800      16466.0      0.6      0.3              return recommendations
  4775                                                   except Exception as e:
  4776                                                       logger.exception(e)
  4777                                                       if LOG_PDB:
  4778                                                           import pdb
  4779                                           
  4780                                                           pdb.set_trace()
  4781                                                       raise

Total time: 76.9497 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition_released_forgotten at line 4783

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4783                                               @profile
  4784                                               def transition_released_forgotten(self, key):
  4785   1382400     643289.0      0.5      0.8          try:
  4786   1382400     742005.0      0.5      1.0              ts = self.tasks[key]
  4787                                           
  4788   1382400     696688.0      0.5      0.9              if self.validate:
  4789                                                           assert ts.state in ("released", "erred")
  4790                                                           assert not ts.who_has
  4791                                                           assert not ts.processing_on
  4792                                                           assert not ts.waiting_on, (ts, ts.waiting_on)
  4793                                                           if not ts.run_spec:
  4794                                                               # It's ok to forget a pure data task
  4795                                                               pass
  4796                                                           elif ts.has_lost_dependencies:
  4797                                                               # It's ok to forget a task with forgotten dependencies
  4798                                                               pass
  4799                                                           elif not ts.who_wants and not ts.waiters and not ts.dependents:
  4800                                                               # It's ok to forget a task that nobody needs
  4801                                                               pass
  4802                                                           else:
  4803                                                               assert 0, (ts,)
  4804                                           
  4805   1382400     651811.0      0.5      0.8              recommendations = {}
  4806   1382400   36808645.0     26.6     47.8              self._propagate_forgotten(ts, recommendations)
  4807                                           
  4808   1382400   23702470.0     17.1     30.8              self.report_on_key(ts=ts)
  4809   1382400   13073145.0      9.5     17.0              self.remove_key(key)
  4810                                           
  4811   1382400     631679.0      0.5      0.8              return recommendations
  4812                                                   except Exception as e:
  4813                                                       logger.exception(e)
  4814                                                       if LOG_PDB:
  4815                                                           import pdb
  4816                                           
  4817                                                           pdb.set_trace()
  4818                                                       raise

Total time: 1000.16 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transition at line 4820

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4820                                               @profile
  4821                                               def transition(self, key, finish, *args, **kwargs):
  4822                                                   """Transition a key from its current state to the finish state
  4823                                           
  4824                                                   Examples
  4825                                                   --------
  4826                                                   >>> self.transition('x', 'waiting')
  4827                                                   {'x': 'processing'}
  4828                                           
  4829                                                   Returns
  4830                                                   -------
  4831                                                   Dictionary of recommendations for future transitions
  4832                                           
  4833                                                   See Also
  4834                                                   --------
  4835                                                   Scheduler.transitions: transitive version of this function
  4836                                                   """
  4837   7027200    5709607.0      0.8      0.6          try:
  4838   7027200    5402490.0      0.8      0.5              try:
  4839   7027200    7729242.0      1.1      0.8                  ts = self.tasks[key]
  4840                                                       except KeyError:
  4841                                                           return {}
  4842   7027200    7708527.0      1.1      0.8              start = ts.state
  4843   7027200    5630351.0      0.8      0.6              if start == finish:
  4844                                                           return {}
  4845                                           
  4846   7027200    5799880.0      0.8      0.6              if self.plugins:
  4847   7027200    8286459.0      1.2      0.8                  dependents = set(ts.dependents)
  4848   7027200    7527204.0      1.1      0.8                  dependencies = set(ts.dependencies)
  4849                                           
  4850   7027200    7504276.0      1.1      0.8              if (start, finish) in self._transitions:
  4851   7027200    6725502.0      1.0      0.7                  func = self._transitions[start, finish]
  4852   7027200  793254887.0    112.9     79.3                  recommendations = func(key, *args, **kwargs)
  4853                                                       elif "released" not in (start, finish):
  4854                                                           func = self._transitions["released", finish]
  4855                                                           assert not args and not kwargs
  4856                                                           a = self.transition(key, "released")
  4857                                                           if key in a:
  4858                                                               func = self._transitions["released", a[key]]
  4859                                                           b = func(key)
  4860                                                           a = a.copy()
  4861                                                           a.update(b)
  4862                                                           recommendations = a
  4863                                                           start = "released"
  4864                                                       else:
  4865                                                           raise RuntimeError(
  4866                                                               "Impossible transition from %r to %r" % (start, finish)
  4867                                                           )
  4868                                           
  4869   7027200    8510810.0      1.2      0.9              finish2 = ts.state
  4870   7027200   11770715.0      1.7      1.2              self.transition_log.append((key, start, finish2, recommendations, time()))
  4871   7027200    6004088.0      0.9      0.6              if self.validate:
  4872                                                           logger.debug(
  4873                                                               "Transitioned %r %s->%s (actual: %s).  Consequence: %s",
  4874                                                               key,
  4875                                                               start,
  4876                                                               finish2,
  4877                                                               ts.state,
  4878                                                               dict(recommendations),
  4879                                                           )
  4880   7027200    5881056.0      0.8      0.6              if self.plugins:
  4881                                                           # Temporarily put back forgotten key for plugin to retrieve it
  4882   7027200    7992841.0      1.1      0.8                  if ts.state == "forgotten":
  4883   1411200    1022709.0      0.7      0.1                      try:
  4884   1411200    1206869.0      0.9      0.1                          ts.dependents = dependents
  4885   1411200    1137310.0      0.8      0.1                          ts.dependencies = dependencies
  4886                                                               except KeyError:
  4887                                                                   pass
  4888   1411200    1452184.0      1.0      0.1                      self.tasks[ts.key] = ts
  4889  14054400   14360653.0      1.0      1.4                  for plugin in list(self.plugins):
  4890   7027200    5508030.0      0.8      0.6                      try:
  4891   7027200   45854805.0      6.5      4.6                          plugin.transition(key, start, finish2, *args, **kwargs)
  4892                                                               except Exception:
  4893                                                                   logger.info("Plugin failed with exception", exc_info=True)
  4894   7027200    8070193.0      1.1      0.8                  if ts.state == "forgotten":
  4895   1411200    1282591.0      0.9      0.1                      del self.tasks[ts.key]
  4896                                           
  4897   7027200    7894107.0      1.1      0.8              if ts.state == "forgotten" and ts.group.name in self.task_groups:
  4898                                                           # Remove TaskGroup if all tasks are in the forgotten state
  4899   1411200    1136694.0      0.8      0.1                  tg = ts.group
  4900   1411200    4508934.0      3.2      0.5                  if not any(tg.states.get(s) for s in ALL_TASK_STATES):
  4901       200        293.0      1.5      0.0                      ts.prefix.groups.remove(tg)
  4902       200        168.0      0.8      0.0                      del self.task_groups[tg.name]
  4903                                           
  4904   7027200    5287323.0      0.8      0.5              return recommendations
  4905                                                   except Exception as e:
  4906                                                       logger.exception("Error transitioning %r from %r to %r", key, start, finish)
  4907                                                       if LOG_PDB:
  4908                                                           import pdb
  4909                                           
  4910                                                           pdb.set_trace()
  4911                                                       raise

Total time: 817.688 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: transitions at line 4913

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4913                                               @profile
  4914                                               def transitions(self, recommendations):
  4915                                                   """Process transitions until none are left
  4916                                           
  4917                                                   This includes feedback from previous transitions and continues until we
  4918                                                   reach a steady state
  4919                                                   """
  4920   1440025    1168714.0      0.8      0.1          keys = set()
  4921   1440025    1147518.0      0.8      0.1          recommendations = recommendations.copy()
  4922   7056025    2859935.0      0.4      0.3          while recommendations:
  4923   5616000    3438861.0      0.6      0.4              key, finish = recommendations.popitem()
  4924   5616000    3042869.0      0.5      0.4              keys.add(key)
  4925   5616000  801454540.0    142.7     98.0              new = self.transition(key, finish)
  4926   5616000    3846267.0      0.7      0.5              recommendations.update(new)
  4927                                           
  4928   1440025     729271.0      0.5      0.1          if self.validate:
  4929                                                       for key in keys:
  4930                                                           self.validate_key(key)

Total time: 46.1939 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: check_idle_saturated at line 4965

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4965                                               @profile
  4966                                               def check_idle_saturated(self, ws, occ=None):
  4967                                                   """Update the status of the idle and saturated state
  4968                                           
  4969                                                   The scheduler keeps track of workers that are ..
  4970                                           
  4971                                                   -  Saturated: have enough work to stay busy
  4972                                                   -  Idle: do not have enough work to stay busy
  4973                                           
  4974                                                   They are considered saturated if they both have enough tasks to occupy
  4975                                                   all of their threads, and if the expected runtime of those tasks is
  4976                                                   large enough.
  4977                                           
  4978                                                   This is useful for load balancing and adaptivity.
  4979                                                   """
  4980   2870428   18726772.0      6.5     40.5          if self.total_nthreads == 0 or ws.status == Status.closed:
  4981                                                       return
  4982   2870428    1926510.0      0.7      4.2          if occ is None:
  4983   2846678    1856335.0      0.7      4.0              occ = ws.occupancy
  4984   2870428    1735161.0      0.6      3.8          nc = ws.nthreads
  4985   2870428    2180278.0      0.8      4.7          p = len(ws.processing)
  4986                                           
  4987   2870428    2376943.0      0.8      5.1          avg = self.total_occupancy / self.total_nthreads
  4988                                           
  4989   2870428    2758911.0      1.0      6.0          if p < nc or occ / nc < avg / 2:
  4990    147248     336067.0      2.3      0.7              self.idle.add(ws)
  4991    147248     174726.0      1.2      0.4              self.saturated.discard(ws)
  4992                                                   else:
  4993   2723180    6058109.0      2.2     13.1              self.idle.discard(ws)
  4994                                           
  4995   2723180    2567536.0      0.9      5.6              pending = occ * (p - nc) / p / nc
  4996   2723180    2277894.0      0.8      4.9              if p > nc and pending > 0.4 and pending > 1.9 * avg:
  4997     33700      39976.0      1.2      0.1                  self.saturated.add(ws)
  4998                                                       else:
  4999   2689480    3178696.0      1.2      6.9                  self.saturated.discard(ws)

Total time: 20.1571 s
File: /Users/jkirkham/Developer/distributed/distributed/scheduler.py
Function: decide_worker at line 5556

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  5556                                           @profile
  5557                                           def decide_worker(ts, all_workers, valid_workers, objective):
  5558                                               """
  5559                                               Decide which worker should take task *ts*.
  5560                                           
  5561                                               We choose the worker that has the data on which *ts* depends.
  5562                                           
  5563                                               If several workers have dependencies then we choose the less-busy worker.
  5564                                           
  5565                                               Optionally provide *valid_workers* of where jobs are allowed to occur
  5566                                               (if all workers are allowed to take the task, pass True instead).
  5567                                           
  5568                                               If the task requires data communication because no eligible worker has
  5569                                               all the dependencies already, then we choose to minimize the number
  5570                                               of bytes sent between workers.  This is determined by calling the
  5571                                               *objective* function.
  5572                                               """
  5573   1376640     976544.0      0.7      4.8      deps = ts.dependencies
  5574   1376640    4267110.0      3.1     21.2      assert all(dts.who_has for dts in deps)
  5575   1376640     972274.0      0.7      4.8      if ts.actor:
  5576                                                   candidates = set(all_workers)
  5577                                               else:
  5578   1376640    4627572.0      3.4     23.0          candidates = {ws for dts in deps for ws in dts.who_has}
  5579   1376640     933553.0      0.7      4.6      if valid_workers is True:
  5580   1376640     879762.0      0.6      4.4          if not candidates:
  5581                                                       candidates = set(all_workers)
  5582                                               else:
  5583                                                   candidates &= valid_workers
  5584                                                   if not candidates:
  5585                                                       candidates = valid_workers
  5586                                                       if not candidates:
  5587                                                           if ts.loose_restrictions:
  5588                                                               return decide_worker(ts, all_workers, True, objective)
  5589                                                           else:
  5590                                                               return None
  5591   1376640     814480.0      0.6      4.0      if not candidates:
  5592                                                   return None
  5593                                           
  5594   1376640    1122617.0      0.8      5.6      if len(candidates) == 1:
  5595   1286498    1162310.0      0.9      5.8          return first(candidates)
  5596                                           
  5597     90142    4400894.0     48.8     21.8      return min(candidates, key=objective)
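
For anyone skimming a dump this long, here is a minimal sketch of pulling the hottest lines out of a line_profiler table. The `ROW` pattern and the `hotspots` helper are hypothetical names made up for this illustration (they are not part of line_profiler), and the sample rows are copied from the `transition_memory_released` table above.

```python
import re

# Matches only rows that carry timing data:
# line number, hits, time, per-hit, % time, then the source text.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.*)$"
)


def hotspots(report, top=3):
    """Return the `top` rows as (percent, line_no, source), largest % Time first."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # rows without hits (never executed) do not match and are skipped
            line_no, _hits, _time, _per_hit, pct, src = m.groups()
            rows.append((float(pct), int(line_no), src.strip()))
    return sorted(rows, reverse=True)[:top]


# A few rows copied from the transition_memory_released table.
SAMPLE = """\
  4408   3514520    2942603.0      0.8      4.3              for ws in ts.who_has:
  4409   2132120    3150306.0      1.5      4.6                  ws.has_what.remove(ts)
  4412   4264240   17842964.0      4.2     26.0                  self.worker_send(
  4415   1382400    1336448.0      1.0      2.0              ts.who_has.clear()
  4419   1382400   18308152.0     13.2     26.7              self.report({"op": "lost-data", "key": key})
  4420
"""

for pct, line_no, src in hotspots(SAMPLE, top=2):
    print(f"{pct:5.1f}%  line {line_no}: {src}")
```

On these sample rows the two entries that come out on top are the `self.report(...)` and `self.worker_send(...)` calls, which together account for a little over half of the time in that function.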

@jakirkham
Collaborator Author

Briefly explored some light type annotations of extract_serialize over the break. This was an easier one to profile and see the effects from changes in IPython since we don't need to run the full profile. Was able to get a nearly 2x speed up just doing that. Submitted this as PR ( dask/distributed#4283 ), which has since been reviewed and merged.

Also tried the same thing with check_idle_saturated today. This required a bit of guessing as to what the types were, which took a few tries to nail down. Seem to have got something working in PR ( dask/distributed#4289 ). There was at least one custom object from an external library, which couldn't be typed. It's a little tricky to see how much that helps atm as the Status.__eq__ call still takes a fair bit of time ( dask/distributed#4270 ).
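
To make the typing question a bit more concrete, below is a rough sketch that restates the branch logic of check_idle_saturated (as it appears in the profile above) as a pure function. The function name `classify_worker` and its parameter names are hypothetical. Treating the occupancy values as floats and the thread and task counts as ints is my reading of the profiled code, not something taken from the PR. The real method mutates `self.idle` and `self.saturated` and also checks `ws.status`, which is left out here.

```python
def classify_worker(occ, nthreads, n_processing, total_occupancy, total_nthreads):
    """Return "idle", "saturated", "neither", or None when there are no threads.

    Mirrors the arithmetic in check_idle_saturated; assumes nthreads > 0,
    as the original does when it divides by the worker's thread count.
    """
    if total_nthreads == 0:
        return None  # the original returns early without touching either set

    avg = total_occupancy / total_nthreads

    # Fewer tasks than threads, or well below the average occupancy per thread.
    if n_processing < nthreads or occ / nthreads < avg / 2:
        return "idle"

    # Expected work queued beyond what the threads are already running.
    pending = occ * (n_processing - nthreads) / n_processing / nthreads
    if n_processing > nthreads and pending > 0.4 and pending > 1.9 * avg:
        return "saturated"
    return "neither"


print(classify_worker(0.0, 4, 2, 10.0, 8))
print(classify_worker(10.0, 1, 5, 10.0, 8))
```

Written this way, the only non-numeric piece left in the original method is the `ws.status == Status.closed` comparison, which is where the Status.__eq__ cost comes in.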

Additionally Ben and I discussed earlier how we want to handle Cythonization in the benchmarks. Put together PR ( #33 ). Though a few follow-on PRs were needed ( #34 ) ( #35 ) ( #36 ), it seems like we have something working on that front now.

@jakirkham
Collaborator Author

Have taken a pass at annotating one of the *State objects as well, ClientState ( dask/distributed#4290 ). As these are pretty heavily used in all of the transition_* functions, any further Cythonization improvements will depend on well-optimized implementations of the *State objects.
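For readers unfamiliar with this style of annotation, here is a rough sketch of what typing a small state class in Cython's pure-Python mode can look like. It is not the code from the PR: the class name `ClientStateSketch` and its fields are hypothetical, and the fallback branch is an assumption added so the module also runs as ordinary Python when Cython is absent or the file is not compiled. Whether it compiles cleanly will depend on the Cython version in use.

```python
try:
    from cython import compiled  # True only inside a compiled extension
except ImportError:
    compiled = False

if compiled:
    from cython import cclass, double
else:
    # Plain-Python stand-ins so the module still imports without Cython.
    double = float

    def cclass(cls):
        return cls


@cclass
class ClientStateSketch:
    # Hypothetical fields, chosen only for illustration.
    _client_key: str
    _last_seen: double
    _wants_what: set

    __slots__ = ("_client_key", "_last_seen", "_wants_what")

    def __init__(self, client_key: str):
        self._client_key = client_key
        self._last_seen = 0.0
        self._wants_what = set()

    @property
    def client_key(self):
        return self._client_key

    @property
    def wants_what(self):
        return self._wants_what


cs = ClientStateSketch("client-1")
cs.wants_what.add("task-a")
```

The idea is that typed attributes give compiled callers direct field access, while the properties keep the class usable from untyped Python code.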

@jakirkham
Collaborator Author

Just to update this thread a bit, we have since gone through and annotated WorkerState ( dask/distributed#4294 ) and all Task* objects (including TaskState) ( dask/distributed#4302 ). So this covers all classes we can meaningfully annotate atm.

@jakirkham
Collaborator Author

Have started looking at how best to split the Scheduler class into two separate classes: one that we can Cythonize, managing the transitions and relevant state, and another retaining most of the Scheduler's current content (async methods, interactions with Client(s) and Workers, communication, etc.).

Think most of the transitions can be moved over readily, with the exception of transition_waiting_processing. Since transition_waiting_processing is a notably slow transition, looked into how we might speed this up in the interim. Based on previous profiling, we have a good sense of where to target optimizations to improve this path. Submitted PRs to optimize get_comm_cost ( dask/distributed#4328 ), transition_waiting_processing ( dask/distributed#4330 ), valid_workers ( dask/distributed#4329 ), send_task_to_worker ( dask/distributed#4331 ), and decide_worker ( dask/distributed#4332 ). All except the last one are merged. The only other things to cover here are check_idle_saturated, which we addressed previously ( dask/distributed#4289 ), and Status.__eq__ ( dask/distributed#4270 ), which is waiting on a dask-jobqueue release. So that seems like as good as we can do there for the moment.

Will update once we have a rough implementation of the refactored Scheduler to look at.
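As a rough picture of the split described above, the sketch below separates synchronous state handling from async communication. Everything in it is hypothetical: the class names, the `transition` signature, and the use of inheritance (composition would work equally well) are assumptions for illustration, not the eventual design.

```python
import asyncio


class SchedulerStateSketch:
    """Synchronous bookkeeping only: the part that could be Cythonized."""

    def __init__(self):
        self.tasks = {}  # key -> state name

    def transition(self, key, finish):
        # Record the new state and report the (start, finish) pair.
        start = self.tasks.get(key, "released")
        self.tasks[key] = finish
        return start, finish


class SchedulerSketch(SchedulerStateSketch):
    """Async methods and communication stay in ordinary Python."""

    def __init__(self):
        super().__init__()
        self.sent = []  # stands in for a comm to clients and workers

    async def handle_task_finished(self, key):
        start, finish = self.transition(key, "memory")
        await asyncio.sleep(0)  # placeholder for real network I/O
        self.sent.append({"op": "key-in-memory", "key": key})
        return start, finish


scheduler = SchedulerSketch()
result = asyncio.run(scheduler.handle_task_finished("x"))
```

Keeping the state class free of `async` methods is what would let it be compiled as an extension class, with the outer class left as plain Python.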

@jakirkham
Collaborator Author

Did a bit more annotation and optimization along the communication pathway with PR ( dask/distributed#4341 ), though I think that might be as good as that particular path gets for the moment. As much of the communication is written in higher-level Python, it is not particularly amenable to Cythonization. Instead, I think our best bet there is to defer communication until it is needed and try to handle as much of it together as possible.

In an attempt to both handle the communication problem mentioned above and to make it easier to refactor out the transition functions into a separate Cython extension class for more thorough optimization, submitted PR ( dask/distributed#4343 ). This tries to move all communication to before and after transition functions as opposed to within them.
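One way to picture moving communication out of the transitions is to have each transition return the messages it would have sent, and let the caller batch and send them afterwards. The sketch below assumes exactly that shape. The function names, the message format, and the `send` callback are all hypothetical and are not taken from the PR.

```python
def transition_processing_memory_sketch(state, key, client):
    """Pure state update: returns messages instead of sending them."""
    state[key] = "memory"
    client_msgs = {client: [{"op": "key-in-memory", "key": key}]}
    return client_msgs


def run_transitions(state, keys, client, send):
    """Apply every transition first, then communicate once per client."""
    pending = {}
    for key in keys:
        client_msgs = transition_processing_memory_sketch(state, key, client)
        for recipient, msgs in client_msgs.items():
            pending.setdefault(recipient, []).extend(msgs)
    # All communication happens here, after the state changes are done.
    for recipient, msgs in pending.items():
        send(recipient, msgs)


state = {}
outbox = []
run_transitions(
    state,
    ["x", "y"],
    "client-1",
    lambda recipient, msgs: outbox.append((recipient, msgs)),
)
```

With this shape the transition function touches only in-memory state, so it contains nothing that would prevent compiling it, and the two messages go out in a single batched send.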
