Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The anonymous-telemetry subsystem ~doubles Pants no-op latency when non-async callbacks are enabled. #11833

Closed
jsirois opened this issue Apr 2, 2021 · 10 comments · Fixed by #12209 or #12392
Assignees

Comments

@jsirois
Copy link
Contributor

jsirois commented Apr 2, 2021

$ hyperfine -w 2 './pants --no-anonymous-telemetry-enabled -V' './pants -V'
Benchmark #1: ./pants --no-anonymous-telemetry-enabled -V
  Time (mean ± σ):     697.6 ms ±   9.0 ms    [User: 486.5 ms, System: 43.7 ms]
  Range (min … max):   684.8 ms … 710.4 ms    10 runs
 
Benchmark #2: ./pants -V
  Time (mean ± σ):      1.317 s ±  0.087 s    [User: 485.8 ms, System: 50.3 ms]
  Range (min … max):    1.257 s …  1.553 s    10 runs
 
Summary
  './pants --no-anonymous-telemetry-enabled -V' ran
    1.89 ± 0.13 times faster than './pants -V'

Since user and system times did not budge, this is all IO latency.

@jsirois
Copy link
Contributor Author

jsirois commented Apr 2, 2021

There is a tangle of issues making this slow.

Even though we mark the anonymous telemetry workunit callback as having an async finish:

@property
def can_finish_async(self) -> bool:
# Because we don't log anything, it's safe to finish in the background.
return True

We still block on workunit callback completion since there are other callbacks in the system - not all capable of async finish:

if self.block_until_complete:
super().join()

# TODO: Have a thread per callback so that some callbacks can always finish async even
# if others must be finished synchronously.
self.block_until_complete = not pantsd or any(
callback.can_finish_async is False for callback in self.callbacks
)

That aside, the anonymous telemetry callback is not actually asynchronous anyhow since:

# We don't want to spawn a thread in the engine, and we're
# already running in a background thread in pantsd.
mode=Modes.SYNCHRONOUS,

Some added timing measurement shows this join takes ~550ms with telemetry turned on.

@Eric-Arellano
Copy link
Contributor

Are you using BuildSense / Toolchain plugin? If so, that is currently not async, so you are right that it will cause this to be sync.

I use this to disable the plugin:

export PANTS_VERIFY_CONFIG=false
export PANTS_PLUGINS="-['toolchain.pants.plugin==0.7.0']"
export PANTS_BACKEND_PACKAGES="-['toolchain.pants.auth', 'toolchain.pants.buildsense', 'toolchain.pants.common']"

@jsirois
Copy link
Contributor Author

jsirois commented Apr 2, 2021

Are you using BuildSense / Toolchain plugin?

No. See the hyperfine, but the slowdown is isolated with just toggling the --no-anonymous-telemetry-enabled flag.

@jsirois
Copy link
Contributor Author

jsirois commented Apr 2, 2021

And - there is plenty more than the tc plugin in our WorkunitCallback list that is not async.

@Eric-Arellano
Copy link
Contributor

No. See the hyperfine, but the slowdown is isolated with just toggling the --no-anonymous-telemetry-enabled flag.

Yes, but whether BuildSense is enabled impacts if --anonymous-telemetry will be async or not. Note that BuildSense is enabled by default in pantsbuild/pants, so you would have to go out of your way to disable it.

there is plenty more than the tc plugin in our WorkunitCallback list that is not async.

Nack, the only other one should be --stats-log, which is set to be async if the option is not configured. Are you seeing other things making this not true?

--

It would be good to try this again with BuildSense disabled to see the impact when it can complete async. We knew it would impact non-async times - the more important case is when async is possible.

@jsirois
Copy link
Contributor Author

jsirois commented Apr 2, 2021

OK. What you say appears to be true. I am using BuildSense since these are measurements in the Pants repo where that is enabled. So I think my conclusion and this bug still stand. I'll highlight from the tangle I pointed out above:

# TODO: Have a thread per callback so that some callbacks can always finish async even
# if others must be finished synchronously.
self.block_until_complete = not pantsd or any(
callback.can_finish_async is False for callback in self.callbacks
)

That TODO is on-target. Not having it implemented has this real-world impact.

@Eric-Arellano
Copy link
Contributor

Agreed. Although note that the upcoming Toolchain plugin is marked async, so this won't be a problem then.

@stuhood
Copy link
Member

stuhood commented Apr 16, 2021

#11618 is fixed (and I'll cherrypick it to 2.4.x), so the TC plugin should be unblocked to be async in the next release.

It's possible that when that lands we should close this ticket, and open a new one for the TODO above.

@stuhood stuhood changed the title The anonymous-telemetry subsystem ~doubles Pants no-op latency. The anonymous-telemetry subsystem ~doubles Pants no-op latency when non-async callbacks are enabled. Apr 16, 2021
@stuhood stuhood linked a pull request Jun 16, 2021 that will close this issue
@stuhood
Copy link
Member

stuhood commented Jun 16, 2021

#12209 will fix this issue: I've updated the title of #10092 to account for the remainder of the above TODO.

@asherf
Copy link
Member

asherf commented Jul 20, 2021

When running in CI, and a particular check fails (lint/typecheck) the pants process exits when a non-zero exist code, which causes CI to kill the container and not allowing pantsd to continue to run and allow plugins to complete async.

stuhood added a commit that referenced this issue Jul 21, 2021
… by default in containers (#12392)

When run inside of a container that exits as soon as the client returns, async completion of workunit handlers can lead to loss of metrics. Since async completion has significant performance benefits, we disable it conditionally based on whether it is likely that we are running inside of a container.

Fixes #11833.

[ci skip-rust]
stuhood added a commit to stuhood/pants that referenced this issue Jul 22, 2021
… by default in containers (pantsbuild#12392)

When run inside of a container that exits as soon as the client returns, async completion of workunit handlers can lead to loss of metrics. Since async completion has significant performance benefits, we disable it conditionally based on whether it is likely that we are running inside of a container.

Fixes pantsbuild#11833.

[ci skip-rust]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Jul 22, 2021
… by default in containers (pantsbuild#12392)

When run inside of a container that exits as soon as the client returns, async completion of workunit handlers can lead to loss of metrics. Since async completion has significant performance benefits, we disable it conditionally based on whether it is likely that we are running inside of a container.

Fixes pantsbuild#11833.

[ci skip-rust]
Eric-Arellano pushed a commit that referenced this issue Jul 22, 2021
… by default in containers (Cherry-pick of #12392) (#12398)

When run inside of a container that exits as soon as the client returns, async completion of workunit handlers can lead to loss of metrics. Since async completion has significant performance benefits, we disable it conditionally based on whether it is likely that we are running inside of a container.

Fixes #11833.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano pushed a commit that referenced this issue Jul 22, 2021
… by default in containers (Cherry-pick of #12392) (#12399)

When run inside of a container that exits as soon as the client returns, async completion of workunit handlers can lead to loss of metrics. Since async completion has significant performance benefits, we disable it conditionally based on whether it is likely that we are running inside of a container.

Fixes #11833.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants