
Port to tokio 0.2, and to stdlib futures for fs and task_executor #9071

Merged · 7 commits · Mar 23, 2020

Conversation

@stuhood (Member) commented Feb 5, 2020

Problem

We're on an older version of tokio, which doesn't smoothly support usage of async/await.

Solution

Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the fs and task_executor crates to stdlib futures.

Finally, transitively fixup for the new APIs: in particular, since both task_executor and tokio now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use the tokio::main and tokio::test macros, which annotate async methods and spawn a runtime to allow for awaiting futures inline.

Result

Progress toward more usage of async/await!

@stuhood force-pushed the stuhood/fs-async-await branch 2 times, most recently from d86f592 to 5c53e80 on February 5, 2020 06:47
@illicitonion (Contributor) left a comment

Very exciting!!


scheduler.core.executor.spawn_on_io_pool(work_future).wait()
})
.wait()
Contributor:

Why is this changing from blocking to not? Is it because the blocking-ness is pushed down onto the actual-blocking parts? IIRC last time I tried to do that, there was a massive perf regression, which is why this is how it was... 🤞

@stuhood (Member, Author):

This callsite was redundant: wait() on a 0.1 future is itself a blocking method that uses the current thread as the executor for the future. It's gone in stdlib futures in favor of explicit calls to ${MyRuntime}.block_on(..).

@stuhood (Member, Author):

Oooor I haven't run tests that have covered this yet, and it will complain about one of the nested spawn calls not occurring in the context of a Runtime! We shall see, heh. The patch currently runs ./pants list successfully, but some of the rust tests haven't been ported, and there are likely to be a few "no Runtime" or "too many Runtimes" type errors at various callsites.

}))
self
.executor
.spawn_blocking(move || vfs.scandir_sync(&dir_relative_to_root))
Contributor:

It will be interesting to see if this regresses perf in the way that when I tried to use the blocking stuff last time it did... 🤞

@stuhood (Member, Author):

spawn_blocking is a different API, afaict. It's basically: "please literally dedicate me a thread". So it should be a 1:1 replacement for the CpuPool we used to have here.

The tokio-fs blocking APIs are definitely still a thing, but having removed one moving piece is nice.

format!("Failed to read file {:?}: {}", path_abs, e),
)
let executor = self.executor.clone();
Box::pin(async move {
Contributor:

I guess if it's how it is, it's how it is, but I'm kind of surprised at the amount of boilerplate to run a closure asynchronously - Box::pin(async move { spawn_blocking( move || { seems like more boilerplate than I'd expect :(

@stuhood (Member, Author) commented Feb 5, 2020:

This is the boilerplate required to "box" an async block (note: not a closure). The .boxed() combinator was reintroduced, so I think that this could maybe also be written:

```rust
let f = async move { .. };
f.boxed()
```

...but chaining a combinator off a block requires a local variable... etc.

See the note in ASYNC.md (and all of the callsites in the patch that reference it, this one included): the explicit use of BoxFuture is a stopgap to smooth the futures 0.1/stdlib futures boundary. An async fn can take references without cloning them, but that requires the references to stay alive on the caller's stack, which is not the case with 0.1 futures. Once callers have been ported, we should be able to remove almost all explicit usage of BoxFuture.

@@ -788,25 +808,35 @@ impl PosixFS {
}
}

- pub fn stat_sync(&self, relative_path: PathBuf) -> Result<Stat, io::Error> {
+ pub fn stat_sync(&self, relative_path: PathBuf) -> Result<Option<Stat>, io::Error> {
Contributor:

This seems like a surprising API change - what about asyncifying things makes this easier to couple in than to do separately?

@stuhood (Member, Author) commented Feb 5, 2020:

Very thorough review! Heh.

I think that before I committed to async-await-ifying the calling methods, this simplified the combinators. Could undo it. But it also avoided leaking implementation details (avoiding having callers interpret errors where possible is good, imo).

src/rust/engine/task_executor/src/lib.rs (outdated; resolved)
@stuhood stuhood marked this pull request as ready for review February 11, 2020 07:01
@pierrechevalier83 (Contributor) left a comment

I love seeing these changes moving us ever closer to a desirable future 😄
I'd like an answer to my comment in collect_from before approving in case it's unintentional.

}

///
/// Recursively expands PathGlobs into PathStats while applying excludes.
///
fn expand(&self, path_globs: PathGlobs) -> BoxFuture<Vec<PathStat>, E> {
GlobMatchingImplementation::expand(self, path_globs)
/// TODO: See the note on references in ASYNC.md.
Contributor:

Maybe add a summary to let the reader decide whether it's worth checking out ASYNC.md or not.
Something like this:

TODO: Turn into an async fn once all call sites are ready. See note on references in ASYNC.md for more details.

Also applies to other places in the commit.


self
Contributor:

The diff is a bit hard to read, but it's nice to see that async/await is already used to make the code incrementally a tiny bit more readable than prior to the change (breaking massive chains into multiple statements with `.await?` and dropping some levels of nesting). It's a taste of more refactorings that will be possible with this. 😄 Same comment for a few diffs in the rest of the commit.

) -> futures::future::BoxFuture<Result<ChildResults, String>> {
let mut stdout = BytesMut::with_capacity(8192);
let mut stderr = BytesMut::with_capacity(8192);
let mut exit_code = 1;
Contributor:

Are you changing the default exit code from 0 to 1 here? Is it on purpose? If so, I would probably make it its own commit to explain the intent in the commit message.

@stuhood (Member, Author):

Yes, this changes the default... good eye. While converting this, I noticed that if for whatever reason we exited this loop without ever receiving an exit code from the process, we would consider that to be a success. That felt error prone, but not worth a separate PR.

@pierrechevalier83 (Contributor) commented Feb 13, 2020:

> not worth a separate PR

In general, when doing a large refactoring like this, I would bias for doing unrelated cleanups at least in their own commits (but maybe in their own PRs) so they're more visible to reviewers. If this change causes any unforeseen regression, this is more likely to cause it than the rest of the PR.
Anyway, I'm happy with your reasoning that it's better to fail by default 😄


@pierrechevalier83 (Contributor) left a comment

Looks great to me 😍

stuhood added a commit that referenced this pull request Feb 13, 2020
### Problem

It has been a long time since we've updated some of our dependencies! Additionally, during the course of #9071, I encountered the need to run `cargo update`, but didn't want to land that update at the same time as the change to a new runtime.

### Solution

Run `cargo update`, and fix one resulting library incompatibility.

### Result

Modern(-ish) rust deps: `cargo update` will only update within the ranges specified in `Cargo.toml`.
@stuhood stuhood merged commit 20e9db1 into pantsbuild:master Mar 23, 2020
@stuhood stuhood deleted the stuhood/fs-async-await branch March 23, 2020 02:25
@stuhood stuhood mentioned this pull request Apr 4, 2020
stuhood added a commit that referenced this pull request Apr 5, 2020
### Problem

#9395 occurs within our `grpcio` dependency, which is quite stale. Although #9395 is more likely to be related to changes in our executor (#9071) or to our transitive rust dependencies (#9122), getting on a more recent version of the `grpcio` crate _might_ resolve the issue, or make it easier to report an issue.

### Solution

Bump to `0.5.1` with one patch (tikv/grpc-rs#457) pulled forward from our previous fork. 

[ci skip-jvm-tests]  # No JVM changes made.
hrfuller pushed a commit to twitter/pants that referenced this pull request May 11, 2020
…ntsbuild#9071)

hrfuller pushed a commit that referenced this pull request May 15, 2020
* Port to tokio 0.2, and to stdlib futures for fs and task_executor (#9071)


* Add notify fs watcher to engine. (#9318)

* Use the notify crate to implement an `InvalidationWatcher` for Graph operations.

* Make watch async, and watcher log to pantsd.log.
Relativize paths returned from notify to the build_root.
Refactor invalidate method to be an associated method on the
InvalidationWatcher.

* Respond to feedback.
* Use spawn on io pool instead of custom future impl
* Write python fs tests
* Relativize paths to invalidate to build root
* invalidate nodes with parent paths.
* Comments

* Add rust tests.
Make some things public so we can use them in tests.
Use canonical path to build root for relativizing changed paths.

* Refactor Python tests.
Return watch errors as core::Failure all the way to user.
Move task executor onto invalidation watcher.
Move test_support trait impl into test_support mod.

* use futures lock on watcher

* Platform specific watching behavior. On Darwin recursively watch the
build root at startup. On Linux watch individual directory roots.

Co-authored-by: Stu Hood <stuhood@gmail.com>

* Ignore notify events for pants_ignore patterns. (#9406)

* Create a git ignorer on the context object. Adjust all call sites which
create a posix fs to pass in an ignorer.

* Ignore fsevents from files that match pants_ignore patterns.

* Always pass is_dir = false to ignorer to avoid stat-ing every path the
event watch thread sees.

* Add a feature gate to disable the engine fs watcher introduced in #9318 (#9416)

* Add a feature gate to disable the engine fs watcher introduced in #9318
by default, to mitigate issues seen in #9415 until a fix is in place.

* Don't rerun uncacheable nodes if they are dirtied while running. (#9452)

* Don't rerun uncacheable nodes if they are dirtied while running.
- Retry dependencies of uncacheable nodes a few times to get a result
  until we are exhausted from trying too many times.
- Bubble uncacheable node retry errors up to the user, telling them that
  things were changing too much.
- Don't propagate dirtiness past uncacheable nodes when invalidating
  from changed roots. Otherwise dirty dependents of uncacheable nodes
  will need to re-run.

* enable the engine fs watcher by default, now that it won't cause issues.
Remove execution option override from tests.

* use reference to self in stop_walk_predicate closure

* invalidate often enough that test doesn't flake

* Add a flag to prevent the FsEventService and watchman from starting (#9487)

* add --watchman-enable flag
* disable watchman when flag is false
* Don't wait for the initial watchman event if we aren't using watchman.
* check invalidation watcher liveness from scheduler service

* Extract a `watch` crate. (#9635)

The `watch` module directly accesses the `engine` crate's `Graph`, which makes it more challenging to test.

Extract a `watch` crate which is used via an `trait Invalidatable` which is implemented for the engine's `Graph`, as well as independently in tests.

[ci skip-jvm-tests]

* Simplify Scheduler::execute and unify Graph retry (#9674)

Both the `Graph` and the `Scheduler` implemented retry for requested Nodes, but the `Scheduler` implementation was pre-async-await and much more complicated.

Unify the retry implementations into `Graph::get` (either roots or uncacheable nodes are retried), and simplify the `Scheduler`'s loop down to:
```
let maybe_display_handle = Self::maybe_display_initialize(&session);
let result = loop {
  if let Ok(res) = receiver.recv_timeout(refresh_interval) {
    break Ok(res);
  } else if let Err(e) = Self::maybe_display_render(&session, &mut tasks) {
    break Err(e);
  }
};
Self::maybe_display_teardown(session, maybe_display_handle);
result
```

A single, more modern retry implementation (thanks @hrfuller!), and a cleaner `Scheduler::execute` loop.

* Move file invalidation handling to rust (#9636)

A few different kinds of file watching span the boundary between the `SchedulerService` and `FSEventService`:
1. pantsd invalidation globs - how `pantsd` detects that its implementing code or config has changed
2. pidfile - watches `pantsd`'s pidfile to ensure that the daemon exits if it loses exclusivity
3. graph invalidation - any files changing in the workspace should invalidate the engine's `Graph`
4. `--loop` - implemented directly in the `SchedulerService`

Because of the semi-cyclic nature of the relationship between the `SchedulerService` and `FSEventService`, it's challenging to understand the interplay of these use cases. And, unsurprisingly, that led to the `notify` crate implementation only satisfying one of them.

The fundamental change in this PR is to add support for two new parameters to engine executions which are implemented by the `Graph`:
* `poll: bool` - When `poll` is enabled, a `product_request` will wait for the requested Nodes to have changed from their last-observed values before returning. When a poll waits, an optional `poll_delay` is applied before it returns to "debounce" polls.
* `timeout: Optional[Duration]` - When a `timeout` is set, a `product_request` will wait up to the given duration for the requested Node(s) to complete (including any time `poll`ing).

These features are then used by:
* `--loop`: uses `poll` (with a `poll_delay`, but without a `timeout`) to immediately re-run a `Goal` when its inputs have changed.
* invalidation globs and pidfile watching: use `poll` (with no `poll_delay`) and `timeout` to block their `SchedulerService` thread and watch for changes to those files.

The `FSEventService` and `SchedulerService` are decoupled, and each now interacts only with the `Scheduler`: `FSEventService` to push `watchman` events to the `Graph`, and the `SchedulerService` to pull invalidation information from the `Graph`.

Because all events now flow through the `Graph`, the `notify` crate has reached feature parity with `watchman`.

In followup changes we can remove the experimental flag, disable `watchman` (and thus the `FSEventService`) by default, and remove the dependency between `--loop` and `pantsd`.

* pin tokio at exactly 0.2.13

* fix lint issues

* fix mypy typing issues

* Move away from the debounced notify watcher #9754

* Remove test that has raced file invalidation ever since the notify backend was added, but which will now fairly consistently lose that race.
# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]

* As explained in the comment: we can no longer create duplicate parallel BUILD files and hope that pants does not notice them before we scan the directory again!
# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]

Co-authored-by: Stu Hood <stuhood@gmail.com>
Co-authored-by: Stu Hood <stuhood@twitter.com>