refactor: replace deprecated runloop in fsevents #8304

Merged: 1 commit, Oct 15, 2023

kevinji
Contributor

kevinji commented Jul 31, 2023

Replace the use of the deprecated FSEventStreamScheduleWithRunLoop() with FSEventStreamSetDispatchQueue().

Since a dispatch queue spawns new threads, we need to call caml_c_thread_register() and caml_c_thread_unregister() for these threads. This is done indirectly via a pthread_key_t so that each thread registers once, and the functions only need to be called again if the dispatch queue switches which thread runs the callback.
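
A minimal sketch of the per-thread registration pattern described above, assuming POSIX threads. stub_thread_register() and stub_thread_unregister() are hypothetical stand-ins for caml_c_thread_register() and caml_c_thread_unregister() (the real ones need the OCaml runtime and <caml/threads.h>), and ensure_thread_registered() is an illustrative name, not a function from this PR. The key's destructor runs when a thread that registered exits, so each thread registers at most once and unregisters exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for caml_c_thread_register() and
   caml_c_thread_unregister(); here they only count calls.
   The counters are not thread-safe: this demo only reads them
   after joining the worker thread. */
static int register_count = 0;
static int unregister_count = 0;
static void stub_thread_register(void) { register_count++; }
static void stub_thread_unregister(void) { unregister_count++; }

static pthread_key_t registered_key;
static pthread_once_t registered_key_once = PTHREAD_ONCE_INIT;

/* Key destructor: POSIX calls it at thread exit only if the thread
   stored a non-NULL value under the key, i.e. only if it registered. */
static void unregister_at_thread_exit(void *value) {
  (void)value;
  stub_thread_unregister();
}

static void create_registered_key(void) {
  int rc = pthread_key_create(&registered_key, unregister_at_thread_exit);
  assert(rc == 0);
  (void)rc;
}

/* Call at the start of every callback: registers the current thread the
   first time it runs a callback and is a no-op on later calls. */
static void ensure_thread_registered(void) {
  pthread_once(&registered_key_once, create_registered_key);
  if (pthread_getspecific(registered_key) == NULL) {
    stub_thread_register();
    pthread_setspecific(registered_key, (void *)1);
  }
}

/* Simulates a worker thread that runs two callbacks, then exits. */
static void *simulated_worker(void *arg) {
  (void)arg;
  ensure_thread_registered();
  ensure_thread_registered(); /* second callback: no new registration */
  return NULL;
}
```

If a serial queue keeps reusing the same worker thread, only the first callback pays for registration; a new register/unregister pair happens only when the queue moves to a different thread.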

We replicate the blocking nature of the existing code using a mutex and condition variable.
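
A minimal sketch of how a mutex and condition variable can reproduce a blocking "run until stopped" call, assuming POSIX threads. The struct and function names are hypothetical and not taken from the PR's stubs: stop_signal_wait() plays the role of the blocking wait, and stop_signal_stop() is what a stop request would call. In real bindings the waiting thread would presumably also need to release the OCaml runtime lock before blocking; that part is omitted here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; names are illustrative. */
struct stop_signal {
  pthread_mutex_t mutex;
  pthread_cond_t finished;
  bool stopped;
};

static void stop_signal_init(struct stop_signal *s) {
  int rc = pthread_mutex_init(&s->mutex, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&s->finished, NULL);
  assert(rc == 0);
  (void)rc;
  s->stopped = false;
}

/* Blocks the calling thread until stop_signal_stop() has been called. */
static void stop_signal_wait(struct stop_signal *s) {
  pthread_mutex_lock(&s->mutex);
  while (!s->stopped) /* re-check: pthread_cond_wait may wake spuriously */
    pthread_cond_wait(&s->finished, &s->mutex);
  pthread_mutex_unlock(&s->mutex);
}

/* Sets the flag under the mutex and wakes every waiter. */
static void stop_signal_stop(struct stop_signal *s) {
  pthread_mutex_lock(&s->mutex);
  s->stopped = true;
  pthread_cond_broadcast(&s->finished);
  pthread_mutex_unlock(&s->mutex);
}

static void stop_signal_destroy(struct stop_signal *s) {
  pthread_cond_destroy(&s->finished);
  pthread_mutex_destroy(&s->mutex);
}

/* Example thread body: requests a stop from another thread. */
static void *stopper(void *arg) {
  stop_signal_stop(arg);
  return NULL;
}
```

Setting the flag while holding the mutex, and re-checking it in a loop, means a stop that arrives before the wait begins is not lost and spurious wakeups are harmless.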

See git/git@b022600 for more details about this approach.

Fixes #7352.

kevinji changed the title from "feat: replace deprecate runloop in fsevents" to "feat: replace deprecated runloop in fsevents" on Jul 31, 2023
@kevinji
Contributor Author

kevinji commented Jul 31, 2023

This is my first time interfacing with C from OCaml and I'm getting the following error in CI:

+  ../watching/helpers.sh: line 3: 85234 Illegal instruction: 4  ( dune build "$@" --passive-watch-mode > .#dune-output 2>&1 )

Some guidance around how I can reproduce this error locally and how I could go about debugging this would be helpful!

@Alizter
Collaborator

Alizter commented Jul 31, 2023

@kevinji That is a very strange error. What is your OS / arch? What kind of file system are you using?

@kevinji
Contributor Author

kevinji commented Jul 31, 2023

@Alizter This is the macOS CI build error. I’m having trouble replicating it on a local M1 machine, so I’m also asking for some help setting up my dev environment: I’ve run make bootstrap, make dev, and then make test, but I’m getting other errors that aren’t present in CI.

I’ve modified the C bindings to fsevents so I’m wondering if I need to pass a flag for pthread support, or if I messed something up when doing FFI between C and OCaml.

@Alizter
Collaborator

Alizter commented Jul 31, 2023

I wonder if the C optimizer is doing some funky things. What C toolchain versions are you using?

@kevinji
Contributor Author

kevinji commented Jul 31, 2023

On my machine both gcc and clang refer to Apple's clang:

~
❯ gcc --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

~
❯ clang --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

@kevinji
Contributor Author

kevinji commented Jul 31, 2023

For reference, on my local machine, I get warnings that look like

File "test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t b/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t.corrected
index bfdebcb23..fcc113691 100644
--- a/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t
+++ b/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t.corrected
@@ -48,3 +48,10 @@ https://github.com/ocaml/dune/issues/7027
   > EOF

   $ dune exec ./foo.exe
+  /var/folders/5d/8_6n7qx13nnbr5nmmr096gqw0000gn/T/build_1858d6_dune/build_874b91_dune/camlobj3924a9.c:1525:14: warning: a function declaration without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
+  extern value caml_get_public_method();
+               ^
+  /var/folders/5d/8_6n7qx13nnbr5nmmr096gqw0000gn/T/build_1858d6_dune/build_874b91_dune/camlobj3924a9.c:1727:14: warning: a function declaration without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
+  extern value caml_set_oo_id();
+               ^
+  2 warnings generated.

and failed expect tests like

File "test/expect-tests/persistent_tests.ml", line 1, characters 0-0:
diff --git a/_build/default/test/expect-tests/persistent_tests.ml b/_build/.sandbox/11d2f2745866f56ea24aef378f7a7bc4/default/test/expect-tests/persistent_tests.ml.corrected
index c87263c42..3042d3c43 100644
--- a/_build/default/test/expect-tests/persistent_tests.ml
+++ b/_build/.sandbox/11d2f2745866f56ea24aef378f7a7bc4/default/test/expect-tests/persistent_tests.ml.corrected
@@ -28,7 +28,7 @@ let%expect_test "persistent digests" =
     ---

     DIGEST-DB version 6
-    a4ae8e07cf52a9fb38c47c32b6d59fa6
+    a6df9e528c50debc9264b7a95489392e
     ---

     INSTALL-COOKIE version 1
@@ -40,7 +40,7 @@ let%expect_test "persistent digests" =
     ---

     COPY-LINE-DIRECTIVE-MAP version 1
-    7e311b06ebde9ff1708e4c3a1d3f5633
+    7dac5b11f6f654bb6f230392493b363f
     ---

     merlin-conf version 4
@@ -48,5 +48,5 @@ let%expect_test "persistent digests" =
     ---

     INCREMENTAL-DB version 5
-    fa67cc9b60c9f3a1b9b1ad93a56df691
+    1cc656a4502ef88e70adab1f3c9a868e
     --- |}]

as well as an error that seems relevant:

File "test/blackbox-tests/test-cases/watching/path-pwd.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t b/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t.corrected
index 2152be750..6c993817e 100644
--- a/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t
+++ b/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t.corrected
@@ -9,6 +9,7 @@ Reproduce #6907
   $ echo "(lang dune 2.0)" > dune-project

   $ start_dune
+  ./helpers.sh: line 3: 41395 Trace/BPT trap: 5       ( dune build "$@" --passive-watch-mode > .#dune-output 2>&1 )

   $ cat > x <<EOF
   > original-contents
@@ -29,3 +30,4 @@ Reproduce #6907
   $ stop_dune
   Success, waiting for filesystem changes...
   Success, waiting for filesystem changes...
+  exit 133

@Alizter
Collaborator

Alizter commented Jul 31, 2023

Just to confirm, the test is OK before this PR?

@anmonteiro
Collaborator

anmonteiro commented Jul 31, 2023

I ran ./dune.exe build @test/blackbox-tests/test-cases/watching/path-pwd and that only fails with this PR (fine in main)

EDIT: it only fails with ./dune.exe build @test/blackbox-tests/test-cases/watching/runtest, but the failure isn't present in main nonetheless.

@anmonteiro
Collaborator

Actually it doesn't repro anymore after the latest force-push.

@kevinji
Contributor Author

kevinji commented Jul 31, 2023

I pushed a new version that fixes the C logic. The original code freed the dispatch queue fields in the wrong place (dune_fsevents_stop); the updated commit moves that cleanup to the end of dune_fsevents_dispatch_queue_run instead. I also removed some comments in fsevents.mli and some unused functions in fsevents_stubs.c so that they reflect what the current code actually does.

kevinji marked this pull request as ready for review July 31, 2023 23:08
rgrinberg requested a review from gridbugs August 1, 2023 09:24
@emillon
Collaborator

emillon commented Aug 1, 2023

This area had some issues with memory safety before (#6151). We were not convinced of the soundness of the bindings, so it's not completely unexpected that touching this code reveals issues.

@kevinji
Contributor Author

kevinji commented Aug 1, 2023

That's helpful to know. I pushed a new commit with the following small changes:

  • Dispatch queue cleanup (including for the mutex/condvar) is now moved to the finalize function.
  • The custom_operations identifiers are now prefixed with build., following the Java-style naming recommended in the OCaml docs and matching the dispatch queue name.
  • dune_fsevents_dispatch_queue_current is renamed to dune_fsevents_dispatch_queue_create to better reflect what it does: unlike the original runloop code, it does not get the current thread's run loop but creates a new dispatch queue.

@kevinji
Contributor Author

kevinji commented Aug 2, 2023

New updates:

  • The function name dune_fsevents_dispatch_queue_run is now dune_fsevents_dispatch_queue_wait_until_stopped to better reflect that it no longer "runs" anything directly.
  • pthread_cond_broadcast(&t->dq->dq_finished); is now also called in dune_fsevents_stop, so that a thread blocked in dune_fsevents_dispatch_queue_wait_until_stopped is woken up and can return gracefully.

@Alizter
Collaborator

Alizter commented Aug 5, 2023

When does the dispatch queue spawn new threads?

@kevinji
Contributor Author

kevinji commented Aug 5, 2023

According to the macOS documentation: "Work submitted to dispatch queues executes on a pool of threads managed by the system." I'm not sure at which specific point the system actually creates the threads though.

}
CAMLdrop;
caml_release_runtime_system();
caml_c_thread_unregister();
Member

We are calling unregister in every single callback invocation. Are you sure this call is cheap enough for that?

Contributor Author

Is there a good metric for what you mean here? From what I can tell caml_c_thread_register creates a thread info block and then attaches it to an existing linked list of thread info blocks, and caml_c_thread_unregister reverses those changes. However, I'm not sure how expensive that is.

Member

I'm not so concerned about the linked list or the block, but rather the lock that has to be acquired to do this. Given that this callback in our case doesn't do much work (it just reports which memoization nodes are out of date, thereby invalidating the build), the cost of acquiring this lock can easily add up and affect latency in watch mode.

I'd be more eager if this PR added some feature in exchange for the worse performance. Are there any concrete benefits to using dispatch queues that I'm perhaps unaware of?

Contributor Author

The original reason is that FSEventStreamScheduleWithRunLoop is deprecated, with a suggestion to use FSEventStreamSetDispatchQueue instead. In practice, since we're using a serial queue, I think the background thread running the callback should remain the same, but I couldn't find an easy way to manually manage the threads used by the queue, which would let us run caml_c_thread_register only when needed.

Alternatively, we could use dispatch_get_main_queue, which returns the serial dispatch queue associated with the main thread. This would behave roughly like the original code that used CFRunLoopGetCurrent, assuming there was originally only one thread, but I think we would also need to run the main dispatch queue somehow, with either dispatch_main or CFRunLoopRun.

Contributor Author

As an alternative, I've introduced some thread-local storage that keeps track of whether caml_c_thread_register has been called already by the same thread, and only unregisters once the thread exits. Since a serial dispatch queue often uses the same thread, this should in principle reduce the number of register/unregister calls needed, but additional testing is probably needed.

Member

rgrinberg Aug 28, 2023

Yeah, that's a bit better. If you could do some testing to confirm that we aren't spamming caml_c_thread_register/unregister then this should be good to go.

Another alternative, a little more bulletproof but requiring a bit more code, is to have the code in the dispatch queue populate some sort of queue with events, and have the OCaml side poll this queue for events in a loop. This will require some synchronization and the use of condition variables (CVs) to avoid busy polling, but it's probably the best we can do here.

cc @patricoferris who discussed this issue with me.
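
A minimal sketch of the queue-based alternative suggested above (not what this PR merged), assuming POSIX threads and reducing each event to a path string; every name is hypothetical. The dispatch queue callback would only call event_queue_push(), which never touches the OCaml runtime, while an OCaml-created thread would loop on event_queue_pop(), presumably with the runtime lock released while it blocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical unbounded FIFO of changed paths. */
struct event { char *path; struct event *next; };

struct event_queue {
  pthread_mutex_t mutex;
  pthread_cond_t nonempty;
  struct event *head, *tail;
  bool closed;
};

static void event_queue_init(struct event_queue *q) {
  int rc = pthread_mutex_init(&q->mutex, NULL);
  assert(rc == 0);
  rc = pthread_cond_init(&q->nonempty, NULL);
  assert(rc == 0);
  (void)rc;
  q->head = q->tail = NULL;
  q->closed = false;
}

/* Producer side (dispatch queue thread): copies the path and appends it. */
static void event_queue_push(struct event_queue *q, const char *path) {
  struct event *e = malloc(sizeof *e);
  if (e == NULL) return; /* a real binding would report the failure */
  size_t len = strlen(path) + 1;
  e->path = malloc(len);
  if (e->path == NULL) { free(e); return; }
  memcpy(e->path, path, len);
  e->next = NULL;
  pthread_mutex_lock(&q->mutex);
  if (q->tail != NULL) q->tail->next = e; else q->head = e;
  q->tail = e;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->mutex);
}

/* Consumer side: blocks without busy polling until an event arrives.
   Returns a malloc'd path the caller must free, or NULL once the queue
   has been closed and drained. */
static char *event_queue_pop(struct event_queue *q) {
  pthread_mutex_lock(&q->mutex);
  while (q->head == NULL && !q->closed)
    pthread_cond_wait(&q->nonempty, &q->mutex);
  struct event *e = q->head;
  if (e != NULL) {
    q->head = e->next;
    if (q->head == NULL) q->tail = NULL;
  }
  pthread_mutex_unlock(&q->mutex);
  if (e == NULL) return NULL;
  char *path = e->path;
  free(e);
  return path;
}

/* Wakes all consumers so they can observe shutdown. */
static void event_queue_close(struct event_queue *q) {
  pthread_mutex_lock(&q->mutex);
  q->closed = true;
  pthread_cond_broadcast(&q->nonempty);
  pthread_mutex_unlock(&q->mutex);
}

static void event_queue_destroy(struct event_queue *q) {
  struct event *e = q->head;
  while (e != NULL) {
    struct event *next = e->next;
    free(e->path);
    free(e);
    e = next;
  }
  pthread_cond_destroy(&q->nonempty);
  pthread_mutex_destroy(&q->mutex);
}
```

With this design only threads created by OCaml ever run OCaml code, so the dispatch queue's worker threads would not need caml_c_thread_register at all; the cost moves to one mutex acquisition per event on the C side.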

Contributor Author

@rgrinberg Sorry, I just got some time to come back to this. By running the test/blackbox-tests/test-cases/watching tests, I can confirm that the same background thread is being reused and caml_c_thread_register is only called once. Are there some longer workflows I can test to make sure the behavior is also as expected?

Member

You could try running longer builds on more serious projects. But your test seems enough to me.

Member

If you have some spare cycles, I would suggest looking at the queue-based workaround I suggested in previous comments. I think that should guarantee this is well behaved.

Replace the use of the deprecated `FSEventStreamScheduleWithRunLoop()`
with `FSEventStreamSetDispatchQueue()`.

Since a dispatch queue spawns new threads, we need to call
`caml_c_thread_register()` and `caml_c_thread_unregister()` for these
threads. This is done indirectly via a `pthread_key_t` so that each
thread registers once, and the functions only need to be called again
if the dispatch queue switches which thread runs the callback.

We replicate the blocking nature of the existing code using a mutex and
condition variable.

See git/git@b022600 for more details
about this approach.

Signed-off-by: Kevin Ji <1146876+kevinji@users.noreply.github.com>
rgrinberg changed the title from "feat: replace deprecated runloop in fsevents" to "refactor: replace deprecated runloop in fsevents" on Oct 15, 2023
rgrinberg merged commit e6a5199 into ocaml:main on Oct 15, 2023
20 checks passed
kevinji deleted the replace-runloop-with-dispatch-queue branch on December 5, 2023 00:12
Development

Successfully merging this pull request may close these issues.

Not a bug: some functions are deprecated on MacOS >=13 (Ventura)
6 participants