-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port to tokio 0.2, and to stdlib futures for fs and task_executor #9071
Conversation
d86f592
to
5c53e80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting!!
|
||
scheduler.core.executor.spawn_on_io_pool(work_future).wait() | ||
}) | ||
.wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changing from blocking to not? Is it because the blocking-ness is pushed down onto the actual-blocking parts? IIRC last time I tried to do that, there was a massive perf regression, which is why this is how it was... 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This callsite was redundant: wait()
on a 0.1 future is itself a blocking method that uses the current thread as the executor for the future. It's gone in stdlib
futures in favor of explicit calls to ${MyRuntime}.block_on(..)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooor I haven't run tests that have covered this yet, and it will complain about one of the nested spawn
calls not occurring in the context of a Runtime
! We shall see, heh. The patch currently runs ./pants list
successfully, but some of the rust tests haven't been ported, and there are likely to be a few "no Runtime" or "too many Runtimes" type errors at various callsites.
})) | ||
self | ||
.executor | ||
.spawn_blocking(move || vfs.scandir_sync(&dir_relative_to_root)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be interesting to see if this regresses perf in the way that when I tried to use the blocking stuff last time it did... 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spawn_blocking
is a different API, afaict. It's basically: "please literally dedicate me a thread". So it should be a 1:1 replacement for the CpuPool
we used to have here.
The tokio-fs
blocking APIs are definitely still a thing, but having removed one moving piece is nice.
format!("Failed to read file {:?}: {}", path_abs, e), | ||
) | ||
let executor = self.executor.clone(); | ||
Box::pin(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if it's how it is, it's how it is, but I'm kind of surprised at the amount of boilerplate to run a closure asynchronously - Box::pin(async move { spawn_blocking( move || {
seems like more boilerplate than I'd expect :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the boilerplate required to "box" an async block (note: not a closure). The .boxed()
combinator was reintroduced, so I think that this could maybe also be written:
let f = async move { .. };
f.boxed()
...but chaining a combinator off a block requires a local variable... etc.
See the note in ASYNC.md
(and all of the callsites in the patch that reference it: this one included): the explicit use of BoxFuture
is a stopgap to improve the futures 0.1
/stdlib futures boundary (because an async fn
can actually take references without cloning them, but that requires them to stay alive on the caller's stack (not the case with 0.1
futures): once callers have been ported, we should be able to remove almost all explicit usage of BoxFuture
.
@@ -788,25 +808,35 @@ impl PosixFS { | |||
} | |||
} | |||
|
|||
pub fn stat_sync(&self, relative_path: PathBuf) -> Result<Stat, io::Error> { | |||
pub fn stat_sync(&self, relative_path: PathBuf) -> Result<Option<Stat>, io::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a surprising API change - what about asyncifying things means this is easier to couple than do separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thorough review! Heh.
I think that before I committed to async-await-ifying the calling methods, this simplified the combinators. Could undo it. But it also avoided leaking implementation details (avoiding having callers interpret errors where possible is good, imo).
9bfd382
to
b818f54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love seeing these changes moving us ever closer to a desirable future 😄
I'd like an answer to my comment in collect_from
before approving in case it's unintentional.
} | ||
|
||
/// | ||
/// Recursively expands PathGlobs into PathStats while applying excludes. | ||
/// | ||
fn expand(&self, path_globs: PathGlobs) -> BoxFuture<Vec<PathStat>, E> { | ||
GlobMatchingImplementation::expand(self, path_globs) | ||
/// TODO: See the note on references in ASYNC.md. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a summary to let the reader decide whether it's worth checking out ASYNC.md or not.
Something like this:
TODO: Turn into an async fn once all call sites are ready. See note on references in ASYNC.md for more details.
Also applies to other places in the commit.
|
||
self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is a bit hard to read, but it's nice to see that async/await is already used to make the code incrementally a tiny bit more readable than prior to the change (breaking massive chains into multiple statements with.await?
and dropping some levels of nesting). It's a taste of more refactorings that will be possible with this. 😄 Same comment for a few diffs in the rest of the commit.
) -> futures::future::BoxFuture<Result<ChildResults, String>> { | ||
let mut stdout = BytesMut::with_capacity(8192); | ||
let mut stderr = BytesMut::with_capacity(8192); | ||
let mut exit_code = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you changing the default exit code from 0 to 1 here? Is it on purpose? If so, I would probably make it its own commit to explain the intent in the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this changes the default... good eye. While converting this, I noticed that if for whatever reason we exited this loop without ever receiving an exit code from the process, we would consider that to be a success. That felt error prone, but not worth a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worth a separate PR
In general, when doing a large refactoring like this, I would bias for doing unrelated cleanups at least in their own commits (but maybe in their own PRs) so they're more visible to reviewers. If this change causes any unforeseen regression, this is more likely to cause it than the rest of the PR.
Anyway, I'm happy with your reasoning that it's better to fail by default 😄
b818f54
to
98d7c21
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me 😍
### Problem It has been a long time since we've updated some of our dependencies! Additionally, during the course of #9071, I encountered the need to run `cargo update`, but didn't want to land that update at the same time as the change to a new runtime. ### Solution Run `cargo update`, and fix one resulting library incompatibility. ### Result Modern(-ish) rust deps: `cargo update` will only update within the ranges specified in `Cargo.toml`.
98d7c21
to
45eee35
Compare
45eee35
to
30b4a5b
Compare
9b07300
to
af3f661
Compare
### Problem #9395 occurs within our `grpcio` dependency, which is quite stale. Although #9395 is more likely to be related to changes in our executor (#9071) or to our transitive rust dependencies (#9122), getting on a more recent version of the `grpcio` crate _might_ resolve the issue, or make it easier to report an issue. ### Solution Bump to `0.5.1` with one patch (tikv/grpc-rs#457) pulled forward from our previous fork. [ci skip-jvm-tests] # No JVM changes made.
…ntsbuild#9071) We're on an older version of tokio, which doesn't smoothly support usage of async/await. Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the `fs` and `task_executor` crates to stdlib futures. Finally, transitively fixup for the new APIs: in particular, since both `task_executor` and `tokio` now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use the `tokio::main` and `tokio::test` macros, which annotate async methods and spawn a runtime to allow for `await`ing futures inline. Progress toward more usage of async/await!
* Port to tokio 0.2, and to stdlib futures for fs and task_executor (#9071) We're on an older version of tokio, which doesn't smoothly support usage of async/await. Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the `fs` and `task_executor` crates to stdlib futures. Finally, transitively fixup for the new APIs: in particular, since both `task_executor` and `tokio` now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use the `tokio::main` and `tokio::test` macros, which annotate async methods and spawn a runtime to allow for `await`ing futures inline. Progress toward more usage of async/await! * Add notify fs watcher to engine. (#9318) * Use the notify crate to implement an `InvalidationWatcher` for Graph operations. * Make watch async, and watcher log to pantsd.log. Relativize paths returned from notify to the build_root. Refactor invalidate method to be an associated method on the InvalidationWatcher. * Respond to feedback. * Use spawn on io pool instead of custom future impl * Write python fs tests * Relativize paths to invalidate to build root * invalidate nodes with parent paths. * Comments * Add rust tests. Make some things public so we can use them in tests. Use canonical path to build root for relativizing changed paths. * Refactor Python tests. Return watch errors as core::Failure all the way to user. Move task executor onto invalidation watcher. Move test_support trait impl into test_support mod. * use futures lock on watcher * Platform specific watching behavior. On Darwin recursively watch the build root at startup. On Linux watch individual directory roots. Co-authored-by: Stu Hood <stuhood@gmail.com> * Ignore notify events for pants_ignore patterns. (#9406) * Create a git ignorer on the context object. Adjust all call sites which create a posix fs to pass in an ignorer. * Ignore fsevents from files that match pants_ignore patterns. * Always pass is_dir = false to ignorer to avoid stat-ing every path the event watch thread sees. * Add a feature gate to disable the engine fs watcher introduced in #9318 (#9416) * Add a feature gate to disable the engine fs watcher introduced in #9318 by default, to mitigate issues seen in #9415 until a fix is in place. * Don't rerun uncachable nodes if they are dirtied while running. (#9452) * Don't rerun uncachable nodes if they are dirtied while running. - Retry dependencies of uncacheable nodes a few times to get a result until we are exhausted from trying too many times. - Bubble uncacheable node retry errors up to the user, tell them things were chaning to much. - Don't propagate dirtiness past uncacheable nodes when invalidating from changed roots. Otherwise dirty dependents of uncacheable nodes will need to re-run. * enable the engine fs watcher by default, now that it won't cause issues. Remove execution option override from tests. * use reference to self in stop_walk_predicate closure * invalidate often enough that test doesn't flake * Add a flag to prevent the FsEventService and watchman from starting (#9487) * add --watchman-enable flag * disable watchman when flag is false * Don't wait for the initial watchman event if we aren't using watchman. * check invalidation watcher liveness from scheduler service * Extract a `watch` crate. (#9635) The `watch` module directly accesses the `engine` crate's `Graph`, which makes it more challenging to test. Extract a `watch` crate which is used via an `trait Invalidatable` which is implemented for the engine's `Graph`, as well as independently in tests. [ci skip-jvm-tests] * Simplify Scheduler::execute and unify Graph retry (#9674) Both the `Graph` and the `Scheduler` implemented retry for requested Nodes, but the `Scheduler` implementation was pre-async-await and much more complicated. Unify the retry implementations into `Graph::get` (either roots or uncacheable nodes are retried), and simplify the `Scheduler`'s loop down to: ``` let maybe_display_handle = Self::maybe_display_initialize(&session); let result = loop { if let Ok(res) = receiver.recv_timeout(refresh_interval) { break Ok(res); } else if let Err(e) = Self::maybe_display_render(&session, &mut tasks) { break Err(e); } }; Self::maybe_display_teardown(session, maybe_display_handle); result ``` A single, more modern retry implementation (thanks @hrfuller!), and a cleaner `Scheduler::execute` loop. * Move file invalidation handling to rust (#9636) A few different kinds of file watching span the boundary between the `SchedulerService` and `FSEventService`: 1. pantsd invalidation globs - how `pantsd` detects that its implementing code or config has changed 2. pidfile - watches `pantsd`'s pidfile to ensure that the daemon exits if it loses exclusivity 3. graph invalidation - any files changing in the workspace should invalidate the engine's `Graph` 4. `--loop` - implemented directly in the `SchedulerService` Because of the semi-cyclic nature of the relationship between the `SchedulerService` and `FSEventService`, it's challenging to understand the interplay of these usecases. And, unsurprisingly, that lead to the `notify` crate implementation only satisfying one of them. The fundamental change in this PR is to add support for two new parameters to engine executions which are implemented by the `Graph`: * `poll: bool` - When `poll` is enabled, a `product_request` will wait for the requested Nodes to have changed from their last-observed values before returning. When a poll waits, an optional `poll_delay` is applied before it returns to "debounce" polls. * `timeout: Optional[Duration]` - When a `timeout` is set, a `product_request` will wait up to the given duration for the requested Node(s) to complete (including any time `poll`ing). These features are then used by: * `--loop`: uses `poll` (with a `poll_delay`, but without a `timeout`) to immediately re-run a `Goal` when its inputs have changed. * invalidation globs and pidfile watching: use `poll` (with no `poll_delay`) and `timeout` to block their `SchedulerService` thread and watch for changes to those files. The `FSEventService` and `SchedulerService` are decoupled, and each now interacts only with the `Scheduler`: `FSEventService` to push `watchman` events to the `Graph`, and the `SchedulerService` to pull invalidation information from the `Graph`. Because all events now flow through the `Graph`, the `notify` crate has reached feature parity with `watchman`. In followup changes we can remove the experimental flag, disable `watchman` (and thus the `FSEventService`) by default, and remove the dependency between `--loop` and `pantsd`. * pin tokio at exactly 0.2.13 * fix lint issues * fix mypy typing issues * Move away from the debounced notify watcher #9754 * Remove test that has raced file invalidation ever since the notify backend was added, but which will now fairly consistently lose that race. # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] * As explained in the comment: we can no longer create duplicate parallel BUILD files and hope that pants does not notice them before we scan the directory again! # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] Co-authored-by: Stu Hood <stuhood@gmail.com> Co-authored-by: Stu Hood <stuhood@twitter.com>
Problem
We're on an older version of tokio, which doesn't smoothly support usage of async/await.
Solution
Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the
fs
andtask_executor
crates to stdlib futures.Finally, transitively fixup for the new APIs: in particular, since both
task_executor
andtokio
now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use thetokio::main
andtokio::test
macros, which annotate async methods and spawn a runtime to allow forawait
ing futures inline.Result
Progress toward more usage of async/await!