Simplify internal commit notification #7031

tgoyne · 2023-10-04T23:50:51Z

The concrete bug which this fixes is that await realm.subscriptions.update {}; await realm.refresh() would hang forever. SubscriptionStore writes failed to notify RealmCoordinator of the write, so the async refresh would see that the Realm is not on the latest version, register a handler to be called when the autorefresh happened, and then nothing would ever schedule the autorefresh.
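
A heavily simplified toy model of the hang, assuming that the coordinator only schedules an autorefresh when a write tells it about the new version. `ToyCoordinator`, `commit_without_notify()`, `commit_and_notify()`, `async_refresh()` and `run_event_loop_once()` are all hypothetical names, not Realm's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model (not Realm's real classes): the coordinator only
// learns about new versions when a write explicitly notifies it.
class ToyCoordinator {
public:
    std::uint64_t db_version = 1;     // latest committed version in the file
    std::uint64_t realm_version = 1;  // version the Realm is currently reading
    bool autorefresh_scheduled = false;

    // A write that bypasses the coordinator (as SubscriptionStore writes did).
    void commit_without_notify() { ++db_version; }

    // A write that notifies the coordinator, which schedules an autorefresh.
    void commit_and_notify()
    {
        ++db_version;
        autorefresh_scheduled = true;
    }

    // Async refresh: if the Realm is behind, park the callback until the
    // next autorefresh runs.
    void async_refresh(std::function<void()> done)
    {
        if (realm_version == db_version) {
            done();
            return;
        }
        m_waiters.push_back(std::move(done));
    }

    // One turn of the event loop: the autorefresh only runs if something
    // scheduled it. If nothing did, parked callbacks wait forever.
    void run_event_loop_once()
    {
        if (!autorefresh_scheduled)
            return;
        autorefresh_scheduled = false;
        realm_version = db_version;
        auto waiters = std::move(m_waiters);
        m_waiters.clear();
        for (auto& w : waiters)
            w();
    }

private:
    std::vector<std::function<void()>> m_waiters;
};
```

With `commit_without_notify()`, the refresh callback is parked and no event loop turn ever runs it; with `commit_and_notify()` it completes on the next turn.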

The sync client needs to be notified of non-sync writes and notify non-sync components when it performs writes. When it was first written, DB did not exist yet and so this was orchestrated via RealmCoordinator. However, that's a very awkward place to do it: not all writes go via RealmCoordinator, and the lifetime of sync sessions isn't actually tied to a coordinator. Nowadays we do have DB, and handling commit notifications there greatly simplifies everything.
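
A minimal sketch of why owning the notification on the database object helps: if every write path goes through one `commit()` that invokes registered listeners, no writer (sync client, subscription store, user transaction) can forget to notify. `ToyDB`, `add_commit_listener()` and `commit()` are hypothetical names for illustration, not the actual `DB` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch: the database object owns the commit listeners, so
// every writer notifies every interested component without extra plumbing.
class ToyDB {
public:
    using Listener = std::function<void(std::uint64_t new_version)>;

    void add_commit_listener(Listener l)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_listeners.push_back(std::move(l));
    }

    // All writes funnel through here, whoever performs them.
    std::uint64_t commit()
    {
        std::vector<Listener> listeners;
        std::uint64_t version;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            version = ++m_version;
            listeners = m_listeners; // copy so callbacks run without the lock held
        }
        for (auto& l : listeners)
            l(version);
        return version;
    }

private:
    std::mutex m_mutex;
    std::uint64_t m_version = 0;
    std::vector<Listener> m_listeners;
};
```

In this model, the sync client and the object-store side each register a listener once, and neither needs to know who performed a given write.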

There was also a second mechanism for notifying the sync client of writes which modified the subscription store. This appears to have been mostly redundant and unnecessary: the only additional information it conveyed was a number used only in some assertions.

Sync progress notifications relied somewhat on some of the sync client's internal writes not triggering them, so this change caused some useless notifications to be sent. To fix this, I made commits trigger notifications only if they change the uploadable bytes, i.e. empty changesets don't produce notifications.
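
A minimal sketch of that filtering rule, assuming upload progress is tracked as a running total of uploadable bytes and each commit reports the size of the changeset it produced. `ToyUploadProgress` and `on_commit()` are hypothetical names, not the sync client's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch: a commit only produces a progress notification if it
// changed the number of uploadable bytes. Empty changesets add nothing.
class ToyUploadProgress {
public:
    explicit ToyUploadProgress(std::function<void(std::uint64_t)> notify)
        : m_notify(std::move(notify))
    {
    }

    // Called after each local commit with the size of its changeset.
    void on_commit(std::uint64_t changeset_bytes)
    {
        if (changeset_bytes == 0)
            return; // uploadable bytes unchanged, so stay quiet
        m_uploadable += changeset_bytes;
        m_notify(m_uploadable);
    }

    std::uint64_t uploadable() const { return m_uploadable; }

private:
    std::uint64_t m_uploadable = 0;
    std::function<void(std::uint64_t)> m_notify;
};
```

Internal bookkeeping writes that produce empty changesets therefore no longer reach progress callbacks.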

Ideally ExternalCommitHelper would live on DB rather than RealmCoordinator and nonsync_transact_notify() could go away entirely, but that looks like it'd be a pretty complicated change.

coveralls-official · 2023-10-05T17:57:13Z

Pull Request Test Coverage Report for Build github_pull_request_279324

425 of 432 (98.38%) changed or added relevant lines in 20 files are covered.
59 unchanged lines in 14 files lost coverage.
Overall coverage increased (+0.02%) to 91.586%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
test/object-store/sync/flx_sync.cpp	7	8	87.5%
test/test_transform.cpp	19	20	95.0%
test/test_sync.cpp	230	235	97.87%

Files with Coverage Reduction	New Missed Lines	%
test/object-store/sync/flx_sync.cpp	1	98.36%
test/test_index_string.cpp	1	94.13%
src/realm/sync/network/http.hpp	2	80.87%
src/realm/table_view.cpp	2	94.18%
test/fuzz_group.cpp	2	54.4%
src/realm/sort_descriptor.cpp	3	93.7%
src/realm/util/future.hpp	3	95.81%
test/test_thread.cpp	3	66.67%
src/realm/util/file.cpp	4	81.25%
src/realm/sync/network/websocket.cpp	5	74.74%

Totals
Change from base Build 1745:	0.02%
Covered Lines:	230468
Relevant Lines:	251641

💛 - Coveralls

tgoyne · 2023-10-06T23:06:59Z

src/realm/object-store/impl/realm_coordinator.cpp

-    // Ensure the notifiers aren't holding on to Transactions after we destroy
-    // the History object the DB depends on
+    // If there's any active NotificationTokens they'll keep the notifiers alive,
+    // so tell the notifiers to release their Transactions so that the DB can
+    // be closed immediately.


This comment was stale: the reason we originally needed release_data() no longer applies, but it is still required for other reasons.
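
A toy sketch of the situation the new comment describes, assuming a notifier is shared-owned by live notification tokens and pins a read transaction through a `std::shared_ptr`. `ToyTransaction`, `ToyNotifier` and its `release_data()` are hypothetical stand-ins, not the object store's real types.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a read transaction that keeps the DB open.
struct ToyTransaction {
};

// Hypothetical notifier: tokens can keep it alive indefinitely, and while
// alive it pins a transaction.
class ToyNotifier {
public:
    explicit ToyNotifier(std::shared_ptr<ToyTransaction> tr)
        : m_transaction(std::move(tr))
    {
    }

    // Drop the transaction without destroying the notifier, so outstanding
    // tokens no longer prevent the DB from being closed.
    void release_data() { m_transaction.reset(); }

    bool holds_transaction() const { return m_transaction != nullptr; }

private:
    std::shared_ptr<ToyTransaction> m_transaction;
};
```

Destroying the coordinator's own reference is not enough while a token still shares ownership of the notifier; explicitly releasing the transaction is what lets the DB close immediately.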

tgoyne · 2023-10-06T23:13:31Z

src/realm/sync/noinst/client_impl_base.hpp

-    // In general, `m_upload_target_version` follows `m_last_version_available`
-    // as it is increased, but in some cases, `m_upload_target_version` will be
-    // kept fixed for a while in order to constrain the uploading process.


I think this was once true, but now the only time m_upload_target_version wasn't the same as m_last_version_available was at times where we couldn't be uploading changesets anyway (such as while in the process of applying a client reset recovery on the sync worker thread).

tgoyne · 2023-10-06T23:15:09Z

src/realm/sync/noinst/client_impl_base.cpp

-        if (!m_pending_flx_sub_set || m_pending_flx_sub_set->snapshot_version < m_upload_progress.client_version) {
-            m_pending_flx_sub_set = get_flx_subscription_store()->get_next_pending_version(
-                m_last_sent_flx_query_version, m_upload_progress.client_version);
-        }


This code was dead: the single caller of send_upload_message() (send_message()) ensures m_pending_flx_sub_set is up to date as part of deciding whether it should call send_upload_message() in the first place, so it never needs to be refreshed here.

tgoyne · 2023-10-06T23:19:02Z

test/test_client_reset.cpp

@@ -231,25 +219,24 @@ TEST(ClientReset_InitialLocalChanges)
    ClientServerFixture fixture(dir, test_context);
    fixture.start();

-    Session session_1 = fixture.make_session(path_1, server_path);
+    DBRef db_1 = DB::create(make_client_replication(), path_1);


This test predates core 6 and was doing something which is now quite weird (writing to a Realm via a second DB not linked to the sync session and not via a RealmCoordinator). It would have kept working unchanged if it continued to call nonsync_transact_notify(), but since it isn't actually trying to test multiprocess behavior, I rewrote it to work the normal way instead.

tgoyne · 2023-10-06T23:20:00Z

test/test_sync.cpp

-    // NOTE: There was a race condition with `write_transaction_notifying_session` where session_2
-    // was completing sync before the write transaction was completed, leading to a
-    // `realm::TableNameInUse` exception. Broke up this function and moved the call to
-    // `nonsync_transact_notify()` to after the write transactions.
-    auto version_1 = perform_write_transaction(db_1, std::move(fn_1));
-    auto version_2 = perform_write_transaction(db_2, std::move(fn_2));
-    session_1.nonsync_transact_notify(version_1);
-    session_2.nonsync_transact_notify(version_2);


This race goes away by just moving the writes to before binding.

tgoyne · 2023-10-06T23:20:41Z

test/test_sync.cpp

-        else {
-            CHECK_GREATER(progress_version, 0);
-            CHECK_GREATER(snapshot_version, 3);
+        switch (entry_1) {


This more precise test also passes on master (with some nonsync_transact_notify()s added).

tgoyne · 2023-10-06T23:21:00Z

test/test_sync.cpp

@@ -3783,6 +3771,76 @@ TEST(Sync_UploadDownloadProgress_7)
    // down the session that is in the process of being created.
 }

+TEST(Sync_UploadProgress_EmptyCommits)


This test doesn't pass on master, but I think the new behavior is sensible.

There seems to be no issue with the test.

danieltabacaru · 2023-10-10T07:33:17Z

src/realm/sync/noinst/client_impl_base.cpp

-        if (m_pending_flx_sub_set && m_pending_flx_sub_set->snapshot_version < m_upload_target_version) {
-            target_upload_version = m_pending_flx_sub_set->snapshot_version;
-        }
+    version_type target_upload_version = get_db()->get_version_of_latest_snapshot();


Shouldn't the target be m_last_version_available? Or is that no longer the case because subscriptions don't report their snapshot version anymore?

AFAICT there's no reason to limit it to m_last_version_available. If a commit happens on another thread while we're enqueued to send, it's fine to upload that changeset while the notification is still waiting in the event loop's queue.
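
A toy model of that reasoning, assuming commit notifications are delivered through an event-loop queue so that the "last version we were told about" can lag behind the file's real latest version. `ToyUploader` and its members are hypothetical and only loosely mirror m_last_version_available and get_version_of_latest_snapshot().

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model: notifications about commits made on other threads sit
// in a queue, so last_version_available lags behind db_latest.
struct ToyUploader {
    std::uint64_t db_latest = 0;              // what the DB reports right now
    std::uint64_t last_version_available = 0; // advanced only when a queued notification runs
    std::deque<std::uint64_t> event_queue;    // pending commit notifications

    void commit_on_other_thread() { event_queue.push_back(++db_latest); }

    void drain_one()
    {
        if (event_queue.empty())
            return;
        last_version_available = event_queue.front();
        event_queue.pop_front();
    }

    // Target everything committed so far, including versions whose
    // notification is still queued; that notification will later find
    // nothing new to upload.
    std::uint64_t target_upload_version() const { return db_latest; }
};
```

Capping the target at `last_version_available` would only delay uploading a changeset that is already committed and readable.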

danieltabacaru

LGTM

michael-wb

LGTM - Nice, this simplifies some of the coordination around realm updates.

ironage

Any simplification to the notification system is a win from my perspective 👍

The documentation suggests there was once a mechanism for uploading up to a specific version and then stopping, but this is now only used for sending QUERY messages at the correct time, and that can be done more directly. This cuts down on the amount of state that needs to be tracked and sometimes (very insignificantly) improves upload latency.
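
One hypothetical way to express "send the QUERY message at the correct time" directly, assuming (as the removed code capping the upload target at the pending subscription set's snapshot version suggests) that a QUERY must follow the upload of every local changeset up to its snapshot version. `plan_messages()` is an illustrative helper, not the sync client's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical planner: given local changeset versions awaiting upload
// (assumed sorted ascending) and an optional pending query snapshot version,
// return the order in which UPLOAD and QUERY messages would be sent.
std::vector<std::string> plan_messages(const std::vector<std::uint64_t>& changeset_versions,
                                       std::optional<std::uint64_t> query_snapshot)
{
    std::vector<std::string> out;
    bool query_sent = !query_snapshot.has_value();
    for (std::uint64_t v : changeset_versions) {
        // Send the QUERY once everything up to its snapshot has been uploaded.
        if (!query_sent && v > *query_snapshot) {
            out.push_back("QUERY@" + std::to_string(*query_snapshot));
            query_sent = true;
        }
        out.push_back("UPLOAD@" + std::to_string(v));
    }
    if (!query_sent)
        out.push_back("QUERY@" + std::to_string(*query_snapshot));
    return out;
}
```

Uploads after the snapshot are not held back in this model; only the position of the QUERY within the stream is constrained, which needs no separate "upload target" state.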

tgoyne self-assigned this Oct 4, 2023

tgoyne force-pushed the tg/commit-notify branch from 1900351 to 4091ad5 October 5, 2023 17:21

tgoyne force-pushed the tg/commit-notify branch 2 times, most recently from 747823f to 24f055f October 6, 2023 23:01

tgoyne commented Oct 6, 2023

tgoyne force-pushed the tg/commit-notify branch 2 times, most recently from 30c2109 to 7fb0ce1 October 9, 2023 18:25

tgoyne marked this pull request as ready for review October 9, 2023 21:52

tgoyne requested review from ironage, danieltabacaru and michael-wb October 9, 2023 21:52

danieltabacaru reviewed Oct 10, 2023

danieltabacaru approved these changes Oct 10, 2023

michael-wb approved these changes Oct 10, 2023

test/test_client_reset.cpp (outdated, resolved)

ironage approved these changes Oct 11, 2023

src/realm/object-store/impl/realm_coordinator.hpp (outdated, resolved)

tgoyne added 4 commits October 11, 2023 10:45

Fix data races in the c api websocket test

a0a5d50

Consistently pass test_context.logger to compare_group()

52b4f97

tgoyne force-pushed the tg/commit-notify branch from 5c83687 to 52b4f97 October 11, 2023 17:46

tgoyne merged commit 8f4f990 into master Oct 11, 2023
26 of 29 checks passed

tgoyne deleted the tg/commit-notify branch October 11, 2023 20:13

sync-by-unito bot mentioned this pull request Oct 19, 2023

Random CI failures due to timeouts #7071

Closed

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
