Pick notify watcher series #9741
Conversation
…ntsbuild#9071) We're on an older version of tokio, which doesn't smoothly support usage of async/await. Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the `fs` and `task_executor` crates to stdlib futures. Finally, transitively fix up for the new APIs: in particular, since both `task_executor` and `tokio` now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use the `tokio::main` and `tokio::test` macros, which annotate async methods and spawn a runtime to allow for `await`ing futures inline. Progress toward more usage of async/await!
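For readers unfamiliar with the tokio 0.2 macros this commit refers to, a minimal sketch (not code from this PR; requires tokio 0.2 with the `macros` feature) of the pattern: `#[tokio::test]` annotates an async test and spawns a runtime so the body can `await` stdlib futures directly.

```
#[cfg(test)]
mod tests {
  #[tokio::test]
  async fn spawn_and_await() {
    // `tokio::spawn` accepts a stdlib future and returns a `JoinHandle`,
    // which is itself a future that can be awaited inline.
    let handle = tokio::spawn(async { 40 + 2 });
    assert_eq!(42, handle.await.unwrap());
  }
}
```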
* Use the notify crate to implement an `InvalidationWatcher` for Graph operations.
* Make watch async, and have the watcher log to pantsd.log. Relativize paths returned from notify to the build_root. Refactor the invalidate method to be an associated method on the `InvalidationWatcher`.
* Respond to feedback.
* Use spawn on the io pool instead of a custom future impl.
* Write python fs tests.
* Relativize paths to invalidate to the build root.
* Invalidate nodes with parent paths.
* Comments.
* Add rust tests. Make some things public so we can use them in tests. Use the canonical path to the build root for relativizing changed paths.
* Refactor Python tests. Return watch errors as `core::Failure` all the way to the user. Move the task executor onto the invalidation watcher. Move the test_support trait impl into a test_support mod.
* Use a futures lock on the watcher.
* Platform-specific watching behavior: on Darwin, recursively watch the build root at startup; on Linux, watch individual directory roots.

Co-authored-by: Stu Hood <stuhood@gmail.com>
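For context, a minimal standalone sketch of the notify 4.x API this series builds on (the path, debounce interval, and event handling are illustrative; the real `InvalidationWatcher` additionally filters events through the ignorer and an executor):

```
use std::sync::mpsc::channel;
use std::time::Duration;

use notify::{watcher, RecursiveMode, Watcher};

fn main() -> notify::Result<()> {
  let (tx, rx) = channel();
  // A debounced watcher: raw fs events are coalesced over the interval.
  let mut w = watcher(tx, Duration::from_millis(50))?;
  // Per the commit: on Darwin the build root is watched recursively at
  // startup, while on Linux individual directory roots are watched.
  w.watch("/path/to/build_root", RecursiveMode::Recursive)?;
  for event in rx {
    // Events carry absolute paths; the watcher relativizes them to the
    // build root before invalidating Graph nodes.
    println!("{:?}", event);
  }
  Ok(())
}
```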
* Create a git ignorer on the context object. Adjust all call sites which create a posix fs to pass in an ignorer.
* Ignore fsevents from files that match pants_ignore patterns.
* Always pass is_dir = false to the ignorer to avoid stat-ing every path the event watch thread sees.
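The `is_dir = false` trade-off can be sketched with the `ignore` crate's gitignore matcher (patterns and paths are illustrative, and it is an assumption that `GitignoreStyleExcludes` wraps exactly this API): matching without knowing whether the path is a directory avoids a stat per event, at the cost of directory-only patterns (those ending in `/`) possibly not matching.

```
use ignore::gitignore::GitignoreBuilder;

fn main() {
  let mut builder = GitignoreBuilder::new("/repo");
  // Illustrative pants_ignore-style patterns.
  builder.add_line(None, "*.pyc").unwrap();
  builder.add_line(None, ".pids/").unwrap();
  let ignorer = builder.build().unwrap();

  // is_dir = false: no stat is needed for the flood of paths the event
  // watch thread sees, but directory-only patterns like ".pids/" may miss.
  let m = ignorer.matched("/repo/foo.pyc", /* is_dir */ false);
  assert!(m.is_ignore());
}
```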
…tsbuild#9318 (pantsbuild#9416)
* Add a feature gate to disable the engine fs watcher introduced in pantsbuild#9318 by default, to mitigate issues seen in pantsbuild#9415 until a fix is in place.
…sbuild#9452)
* Don't rerun uncacheable nodes if they are dirtied while running:
  - Retry dependencies of uncacheable nodes a few times to get a result, until we are exhausted from trying too many times.
  - Bubble uncacheable node retry errors up to the user, telling them things were changing too much.
  - Don't propagate dirtiness past uncacheable nodes when invalidating from changed roots. Otherwise dirty dependents of uncacheable nodes will need to re-run.
* Enable the engine fs watcher by default, now that it won't cause issues. Remove the execution option override from tests.
* Use a reference to self in the stop_walk_predicate closure.
* Invalidate often enough that the test doesn't flake.
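The bounded-retry idea reads roughly like this self-contained model (all names hypothetical, not the engine's actual code): retry a dependency that was dirtied mid-run a fixed number of times, then surface an error instead of looping forever.

```
struct Value(usize);
struct Invalidated; // the node was dirtied while this attempt was running

fn try_get() -> Result<Value, Invalidated> {
  // Stand-in for requesting a dependency of an uncacheable node.
  Err(Invalidated)
}

fn get_with_retries(attempts: usize) -> Result<Value, String> {
  for _ in 0..attempts {
    match try_get() {
      Ok(v) => return Ok(v),
      // Dirtied while running: retry rather than returning a stale result.
      Err(Invalidated) => continue,
    }
  }
  // Exhausted: bubble the error up so the user sees that the graph was
  // changing too much, instead of hanging or silently rerunning forever.
  Err(format!("graph changed too much after {} attempts", attempts))
}
```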
…antsbuild#9487)
* Add a --watchman-enable flag.
* Disable watchman when the flag is false.
* Don't wait for the initial watchman event if we aren't using watchman.
* Check invalidation watcher liveness from the scheduler service.
The `watch` module directly accesses the `engine` crate's `Graph`, which makes it more challenging to test. Extract a `watch` crate which is used via a `trait Invalidatable`, which is implemented for the engine's `Graph` as well as independently in tests. [ci skip-jvm-tests]
Both the `Graph` and the `Scheduler` implemented retry for requested Nodes, but the `Scheduler` implementation was pre-async-await and much more complicated. Unify the retry implementations into `Graph::get` (either roots or uncacheable nodes are retried), and simplify the `Scheduler`'s loop down to:

```
let maybe_display_handle = Self::maybe_display_initialize(&session);
let result = loop {
  if let Ok(res) = receiver.recv_timeout(refresh_interval) {
    break Ok(res);
  } else if let Err(e) = Self::maybe_display_render(&session, &mut tasks) {
    break Err(e);
  }
};
Self::maybe_display_teardown(session, maybe_display_handle);
result
```

A single, more modern retry implementation (thanks @hrfuller!), and a cleaner `Scheduler::execute` loop.
A few different kinds of file watching span the boundary between the `SchedulerService` and `FSEventService`:
1. pantsd invalidation globs - how `pantsd` detects that its implementing code or config has changed
2. pidfile - watches `pantsd`'s pidfile to ensure that the daemon exits if it loses exclusivity
3. graph invalidation - any files changing in the workspace should invalidate the engine's `Graph`
4. `--loop` - implemented directly in the `SchedulerService`

Because of the semi-cyclic nature of the relationship between the `SchedulerService` and `FSEventService`, it's challenging to understand the interplay of these usecases. And, unsurprisingly, that led to the `notify` crate implementation only satisfying one of them.

The fundamental change in this PR is to add support for two new parameters to engine executions, which are implemented by the `Graph`:
* `poll: bool` - When `poll` is enabled, a `product_request` will wait for the requested Nodes to have changed from their last-observed values before returning. When a poll waits, an optional `poll_delay` is applied before it returns to "debounce" polls.
* `timeout: Optional[Duration]` - When a `timeout` is set, a `product_request` will wait up to the given duration for the requested Node(s) to complete (including any time spent `poll`ing).

These features are then used by:
* `--loop`: uses `poll` (with a `poll_delay`, but without a `timeout`) to immediately re-run a `Goal` when its inputs have changed.
* invalidation globs and pidfile watching: use `poll` (with no `poll_delay`) and `timeout` to block their `SchedulerService` thread and watch for changes to those files.

The `FSEventService` and `SchedulerService` are decoupled, and each now interacts only with the `Scheduler`: the `FSEventService` pushes `watchman` events to the `Graph`, and the `SchedulerService` pulls invalidation information from the `Graph`. Because all events now flow through the `Graph`, the `notify` crate has reached feature parity with `watchman`. In followup changes we can remove the experimental flag, disable `watchman` (and thus the `FSEventService`) by default, and remove the dependency between `--loop` and `pantsd`.
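As a mental model of the `poll` semantics (a self-contained sketch, not the engine's actual `Graph::poll`): each Node carries a generation, and a poll with a last-seen generation parks until the generation differs, then applies the optional `poll_delay` debounce.

```
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;
use tokio::time::delay_for;

// A generation counter that is bumped whenever the node's output changes.
struct Node {
  generation: AtomicU64,
}

// Complete only once the generation differs from the caller's last-seen
// value, then debounce with the optional poll_delay before returning.
async fn poll(node: &Node, last_seen: Option<u64>, poll_delay: Option<Duration>) -> u64 {
  loop {
    let current = node.generation.load(Ordering::SeqCst);
    match last_seen {
      Some(last) if last == current => {
        // Unchanged: the real Graph parks on a notification channel here;
        // a short sleep keeps this model small.
        delay_for(Duration::from_millis(10)).await;
      }
      _ => {
        if let Some(delay) = poll_delay {
          delay_for(delay).await; // debounce repeated polls
        }
        return current;
      }
    }
  }
}
```

A `timeout` then composes naturally on the caller's side, e.g. `tokio::time::timeout(duration, poll(&node, Some(last), None)).await`, which is the shape the tests further down in this thread use.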
Review methodology:
I've reviewed the code that is changed in the Daemon and the new `watch` crate. The daemon code looks good, I'm fairly confident in it, but another pair of eyes on the `watch` crate would be great.
Thanks for picking this, good work!
@@ -413,6 +414,46 @@ def _write_named_sockets(self, socket_map):
    for socket_name, socket_info in socket_map.items():
      self.write_named_socket(socket_name, socket_info)

  def _initialize_pid(self):
I'm so so happy for this change, this will bring so much stability :D
Yeah I think these changes are really great.
…ckend was added, but which will now fairly consistently lose that race. # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
…er/pants into hfuller/pick-notify-watcher-series
WIP. Sharing comments I have so far
@@ -29,10 +29,12 @@ use hashing;

use petgraph;

mod entry;
// make the entry module public for testing purposes. We use it to contruct mock
typo: contruct -> construct
@@ -29,10 +29,12 @@ use hashing;

use petgraph;

mod entry;
// make the entry module public for testing purposes. We use it to contruct mock
As a general guideline, I'm a bit opposed to changing an API for the purpose of testing. An API should expose what logically makes sense for a user, and test code should emulate a user. That being said, there are practicalities which do need some implementation details leaking, and mocking is one of them; so I'm willing to accept it as a compromise if the tests enabled by this carry their weight. Will comment on the test site if I don't think they do. If I add no comment, consider this a general rambly discussion point and feel free to ignore it 😄
src/rust/engine/graph/src/lib.rs
Outdated
// to use the extra test functions, and they should only be imported into
// test modules.
pub mod test_support {
  use super::{EntryId, EntryState, Graph, Node};
Actually I like this approach much more 😄: you force the user to declare they want this functionality for tests. Do you think it would be possible to move the `pub mod entry` to inside this module and maybe, instead of going from `pub(crate)` to `pub`, go to `pub(in test_support)`?
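For readers skimming the thread, the pattern under discussion looks roughly like this self-contained sketch (names invented): test-only helpers live in an explicitly named child module, which can still see the parent's private items, so importing `test_support` signals intent at the use site.

```
mod graph {
  pub struct Graph {
    nodes: Vec<u32>, // private: not part of the public API
  }

  impl Graph {
    pub fn new() -> Graph {
      Graph { nodes: vec![] }
    }
  }

  // Test-only surface: callers opt in by importing graph::test_support.
  pub mod test_support {
    use super::Graph;

    pub fn node_count(g: &Graph) -> usize {
      // A child module may access its parent's private fields.
      g.nodes.len()
    }
  }
}

fn main() {
  assert_eq!(0, graph::test_support::node_count(&graph::Graph::new()));
}
```

(For reference, `pub(in <path>)` restricts visibility to an ancestor module path, so it would be spelled e.g. `pub(in crate::graph)`.)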
let http_client = reqwest::Client::new();
let rule_graph = RuleGraph::new(tasks.as_map(), root_subject_types);

Ok(Core {
  graph: Graph::new(),
  graph: graph,
I'm kind of being clippy, but can't it be omitted here?
src/rust/engine/src/watch.rs
Outdated
///
/// TODO: Need the above polling
///
/// TODO: To simplify testing the InvalidationWatcher we could create a trait which
This indirectly answers my question on exposing parts of the graph entry publicly. When we drop support for watchman, it should be possible to tighten the visibility of these functions and types too. It may be worth mentioning that in this comment, so that when this TODO gets enacted, the implementer remembers to tighten these APIs again.
src/rust/engine/src/watch.rs
Outdated
@@ -49,6 +50,7 @@ impl InvalidationWatcher {
  executor: Executor,
  build_root: PathBuf,
  ignorer: Arc<GitignoreStyleExcludes>,
  enabled: bool,
It feels to me that creating a disabled `InvalidationWatcher` is a weird pattern. Would it be possible to replace this logic with the caller holding an `Option<InvalidationWatcher>` and avoid the need for any change to this class?
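The suggestion, sketched with the watcher type stubbed out and a hypothetical `watch` method (this is the reviewer's proposed shape, not the PR's code):

```
use std::path::Path;

struct InvalidationWatcher; // stand-in for the engine's watcher

impl InvalidationWatcher {
  fn watch(&self, _path: &Path) { /* hypothetical */ }
}

struct Core {
  // None when the experimental watcher is disabled, instead of constructing
  // an inert watcher that carries an `enabled: bool`.
  watcher: Option<InvalidationWatcher>,
}

impl Core {
  fn maybe_watch(&self, path: &Path) {
    if let Some(w) = &self.watcher {
      w.watch(path);
    }
  }
}

fn main() {
  let core = Core { watcher: None }; // watcher disabled: calls are no-ops
  core.maybe_watch(Path::new("src"));
}
```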
@@ -287,12 +291,18 @@ impl<N: Node> InnerGraph<N> {
/// The Walk will iterate over all nodes that descend from the roots in the direction of
Probably worth updating the doc comment to add: "until meeting a node that matches the `stop_walking_predicate`".
type Item = EntryId;

fn next(&mut self) -> Option<Self::Item> {
  while let Some(id) = self.deque.pop_front() {
    if !self.walked.insert(id) {
      // Visit this node and it neighbors if this node has not yet be visited and we aren't
typo: be -> been
// Visit this node and it neighbors if this node has not yet be visited and we aren't
// stopping our walk at this node, based on if it satifies the stop_walking_predicate.
// This mechanism gives us a way to selectively dirty parts of the graph respecting node boundaries
// like uncacheable nodes, which sholdn't be dirtied.
typo: sholdn't -> shouldn't
…el BUILD files and hope that pants does not notice them before we scan the directory again! # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
Ok @wisechengyi This is as green as it will get!
I've left comments as I reviewed the code, but no blocker. Happy with this change. It's quite the heavy lifting that you and Stu did there! Thanks both for your work on this 😄
if generation == last_seen_generation && result.poll_should_wait(context) {
  // The Node is currently clean with the observed generation: add a poller on the
  // Completed node that will be notified when it is dirtied or dropped. If the Node moves
  // to another state, the received will be notified that the sender was dropped, and it
typo: received -> receiver
// But polling with the previous token should wait, since nothing has changed.
let request = graph.poll(TNode::new(2), Some(token2), None, &context);
match timeout(Duration::from_millis(1000), request).await {
Am I right in interpreting this as: this test will actually wait a full second before convincing itself that the poll timed out? Does it need to be that high? The drawback is that this test won't be responsive. What poses a lower bound? I'm guessing the polling interval of 100 ms does pose one, and we want to allow a few polling intervals. Would it even be worth making that configurable and using a low value in the context of tests?
// Polling with the previous token (in the same session) should wait, since nothing has changed.
let request = graph.poll(TNode::new(2), Some(token1), None, &context);
match timeout(Duration::from_millis(1000), request).await {
Same comment on full second wait. These add up pretty fast.
@@ -23,7 +23,7 @@ use tokio::task_local;
use ui::EngineDisplay;
use uuid::Uuid;

const TIME_FORMAT_STR: &str = "%H:%M:%S";
const TIME_FORMAT_STR: &str = "%H:%M:%S:%3f";
Could you add a comment to explain what this format means? I don't know it by heart and it may help other readers too.
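Assuming this is chrono's strftime syntax (which the `%3f` specifier suggests), the requested comment might read:

```
// %H:%M:%S is hour:minute:second; %3f is the fractional second truncated to
// three digits, so timestamps render like "14:03:07:123" (ms after a colon).
const TIME_FORMAT_STR: &str = "%H:%M:%S:%3f";
```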
Problem
To release notify watcher changes we need to pick the notify watcher series from upstream.
Solution
Cherry-pick the notify watcher change series.
Result
Running `./pants --v2-ui --v2 --no-v1 --experimental-fs-watcher --no-watchman-enable --enable-pantsd fmt2 src/python/pants/::` works.