This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make background updates controllable via a plugin #11306

Merged
merged 23 commits into from
Nov 29, 2021
Changes from 20 commits
4a1a832
Store whether a BG update is oneshot or not
erikjohnston Nov 10, 2021
957da6f
Add a `BackgroundUpdateController` class.
erikjohnston Nov 9, 2021
0c3ba88
Add a `register_background_update_controller`
erikjohnston Nov 10, 2021
0ace9f8
Expose a `sleep(..)` func on `ModuleApi`
erikjohnston Nov 10, 2021
dccddf1
Add tests
erikjohnston Nov 11, 2021
c7f1498
Newsfile
erikjohnston Nov 11, 2021
c77bad8
Convert API to use callbacks
erikjohnston Nov 16, 2021
dddfdca
Merge remote-tracking branch 'origin/develop' into erikj/bg_update_co…
erikjohnston Nov 16, 2021
3dcda89
Merge branch 'develop' into erikj/bg_update_controller
babolivier Nov 18, 2021
4a1d77e
Remove callback wrapping
babolivier Nov 18, 2021
31a4897
Lint and docstrings
babolivier Nov 18, 2021
d89cadd
Don't ignore module callbacks if we don't want to sleep
babolivier Nov 18, 2021
df863fb
Rename update handler to avoid name clashes
babolivier Nov 19, 2021
66aae92
Add docs
babolivier Nov 19, 2021
f5d551a
Fixup changelog
babolivier Nov 19, 2021
82e880e
Let more time for the update to complete
babolivier Nov 19, 2021
99ef30c
Merge branch 'develop' of github.com:matrix-org/synapse into erikj/bg…
babolivier Nov 23, 2021
26a61b4
Fix test
babolivier Nov 23, 2021
1a99abe
Allow modules to run a function in a thread
babolivier Nov 23, 2021
08bb9b1
Lint
babolivier Nov 23, 2021
8c7ec0f
Retune test_do_background_update
babolivier Nov 26, 2021
589d8ea
Incorporate review comments
babolivier Nov 29, 2021
e05809d
Merge branch 'develop' of github.com:matrix-org/synapse into erikj/bg…
babolivier Nov 29, 2021
1 change: 1 addition & 0 deletions changelog.d/11306.feature
@@ -0,0 +1 @@
Add plugin support for controlling database background updates.
68 changes: 68 additions & 0 deletions docs/modules/background_update_controller_callbacks.md
@@ -0,0 +1,68 @@
# Background update controller callbacks

Background update controller callbacks allow module developers to control (e.g. rate-limit)
how database background updates are run. A database background update is an operation
Synapse runs on its database in the background after it starts. It's usually used for
database operations that would delay Synapse's startup too much if they were run alongside
schema updates (which are run on startup), such as populating a table with a large amount
of data or adding an index on a big table.

Background update controller callbacks can be registered using the module API's
`register_background_update_controller_callbacks` method. Only the first module (in order
of appearance in Synapse's configuration file) calling this method can register background
update controller callbacks; subsequent calls are ignored.
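
A minimal sketch of what registration might look like, assuming a module class whose constructor receives the module API object as `api`. `ExampleController` and `_StubApi` are hypothetical names, and the stub only stands in for Synapse's `ModuleApi` so that the snippet runs on its own; the batch sizes and duration are arbitrary.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict


class ExampleController:
    """Hypothetical module registering background update controller callbacks."""

    def __init__(self, config: Dict[str, Any], api: Any) -> None:
        # Callbacks are passed as keyword arguments; only `on_update` is required.
        api.register_background_update_controller_callbacks(
            on_update=self.on_update,
            default_batch_size=self.default_batch_size,
            min_batch_size=self.min_batch_size,
        )

    def on_update(self, update_name: str, database_name: str, one_shot: bool):
        # Synchronous: returns an async context manager instead of awaiting.
        return self._iteration()

    @asynccontextmanager
    async def _iteration(self) -> AsyncIterator[int]:
        yield 100  # target duration of the iteration, in milliseconds

    async def default_batch_size(self, update_name: str, database_name: str) -> int:
        return 50

    async def min_batch_size(self, update_name: str, database_name: str) -> int:
        return 10


class _StubApi:
    """Stand-in for Synapse's ModuleApi (an assumption, for illustration only)."""

    def __init__(self) -> None:
        self.registered: Dict[str, Any] = {}

    def register_background_update_controller_callbacks(self, **callbacks: Any) -> None:
        self.registered = callbacks


api = _StubApi()
module = ExampleController(config={}, api=api)
```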

Contributor

Does the plugin author get to know if their callback was ignored?

Contributor

Not currently.

The available background update controller callbacks are:

### `on_update`

_First introduced in Synapse v1.49.0_

```python
def on_update(update_name: str, database_name: str, one_shot: bool) -> AsyncContextManager[int]
```

Called when about to do an iteration of a background update. The module is given the name
of the update, the name of the database, and a flag to indicate whether the background
update will happen in one go and may take a long time (e.g. creating indices). If this last
argument is set to `False`, the update will be run in batches.

The module must return an async context manager that returns the desired duration of the
iteration, in milliseconds, and will be exited when the iteration completes. Note that the
duration returned by the context manager is a target, and an iteration may take
substantially longer or shorter. If the `one_shot` flag is set to `True`, the duration
returned is ignored.

__Note__: Unlike most module callbacks in Synapse, this one is _synchronous_. This is
because asynchronous operations are expected to be run by the async context manager.

This callback is required when registering any other background update controller callback.
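
As an illustration of the contract described above, the following sketch builds the context manager with `contextlib.asynccontextmanager`. The constants, the `events` list and `run_one_iteration` are hypothetical, and `asyncio.sleep` is used as a stand-in for the module API's `sleep` method so that the example runs without Synapse.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List

TARGET_DURATION_MS = 100  # hypothetical target duration for each iteration
PAUSE_SECONDS = 0.01  # hypothetical pause used to rate-limit iterations

events: List[str] = []  # records the order of operations, for illustration


def on_update(update_name: str, database_name: str, one_shot: bool):
    """Synchronous callback: builds and returns the async context manager."""

    @asynccontextmanager
    async def iteration() -> AsyncIterator[int]:
        # Rate-limit by pausing before the iteration starts. A real module
        # would more likely await the module API's `sleep` method here.
        await asyncio.sleep(PAUSE_SECONDS)
        events.append(f"start {update_name}")
        try:
            # The yielded value is the desired duration, in milliseconds.
            yield TARGET_DURATION_MS
        finally:
            # Runs when the context manager is exited, i.e. once the
            # iteration has completed.
            events.append(f"end {update_name}")

    return iteration()


async def run_one_iteration() -> int:
    # Mimics how a caller might consume the callback's return value.
    async with on_update("example_update", "master", one_shot=False) as duration_ms:
        events.append("iteration body")
        return duration_ms


duration = asyncio.run(run_one_iteration())
```

Keeping `on_update` itself synchronous means all awaiting happens inside the context manager, on entry and on exit.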

### `default_batch_size`

_First introduced in Synapse v1.49.0_

```python
async def default_batch_size(update_name: str, database_name: str) -> int
```

Called before the first iteration of a background update, with the name of the update and
of the database. The module must return the number of elements to process in this first
iteration.

If this callback is not defined, Synapse will use a default value of 100.
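
A possible implementation, sketched under the assumption that a module wants a smaller first batch for specific updates. The update names and the `FIRST_BATCH_SIZES` table are hypothetical.

```python
import asyncio
from typing import Dict

# Hypothetical per-update overrides; the update name is illustrative only.
FIRST_BATCH_SIZES: Dict[str, int] = {"example_heavy_update": 10}
FALLBACK_BATCH_SIZE = 100  # same value as the documented default


async def default_batch_size(update_name: str, database_name: str) -> int:
    """Returns the number of elements to process in the first iteration."""
    return FIRST_BATCH_SIZES.get(update_name, FALLBACK_BATCH_SIZE)


heavy = asyncio.run(default_batch_size("example_heavy_update", "master"))
other = asyncio.run(default_batch_size("some_other_update", "master"))
```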

### `min_batch_size`

_First introduced in Synapse v1.49.0_

```python
async def min_batch_size(update_name: str, database_name: str) -> int
```

Called before running a new batch for a background update, with the name of the update and
of the database. The module must return the minimum number of elements to process in this
iteration. This number must be greater than 0, and is used to ensure that progress is
always made.

If this callback is not defined, Synapse will use a default value of 100.
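
To illustrate why the minimum must be greater than 0, the sketch below shows one way a batch size could be derived from a target duration and then clamped to the minimum. The `next_batch_size` function and its throughput-based formula are assumptions made for illustration, not a description of Synapse's internals.

```python
def next_batch_size(
    items_per_ms: float, target_duration_ms: int, min_batch_size: int
) -> int:
    """Estimates a batch size from past throughput, never going below the minimum."""
    if min_batch_size <= 0:
        raise ValueError("min_batch_size must be greater than 0")
    # With very low throughput the estimate rounds down to 0, which would
    # stall the update if it were not clamped to the minimum.
    estimated = int(items_per_ms * target_duration_ms)
    return max(estimated, min_batch_size)


slow = next_batch_size(items_per_ms=0.001, target_duration_ms=100, min_batch_size=5)
fast = next_batch_size(items_per_ms=2.0, target_duration_ms=100, min_batch_size=5)
```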
12 changes: 6 additions & 6 deletions docs/modules/writing_a_module.md
@@ -71,15 +71,15 @@ Modules **must** register their web resources in their `__init__` method.
## Registering a callback

Modules can use Synapse's module API to register callbacks. Callbacks are functions that
Synapse will call when performing specific actions. Callbacks must be asynchronous, and
are split in categories. A single module may implement callbacks from multiple categories,
and is under no obligation to implement all callbacks from the categories it registers
callbacks for.
Synapse will call when performing specific actions. Callbacks must be asynchronous (unless
specified otherwise), and are split into categories. A single module may implement callbacks
from multiple categories, and is under no obligation to implement all callbacks from the
categories it registers callbacks for.

Modules can register callbacks using one of the module API's `register_[...]_callbacks`
methods. The callback functions are passed to these methods as keyword arguments, with
the callback name as the argument name and the function as its value. This is demonstrated
in the example below. A `register_[...]_callbacks` method exists for each category.
the callback name as the argument name and the function as its value. A
`register_[...]_callbacks` method exists for each category.

Callbacks for each category can be found on their respective page of the
[Synapse documentation website](https://matrix-org.github.io/synapse).
4 changes: 3 additions & 1 deletion setup.py
@@ -119,7 +119,9 @@ def exec_file(path_segments):
# Tests assume that all optional dependencies are installed.
#
# parameterized_class decorator was introduced in parameterized 0.7.0
CONDITIONAL_REQUIREMENTS["test"] = ["parameterized>=0.7.0"]
#
# We use the `mock` library as it backports `AsyncMock` to Python 3.6
CONDITIONAL_REQUIREMENTS["test"] = ["parameterized>=0.7.0", "mock>=4.0.0"]

CONDITIONAL_REQUIREMENTS["dev"] = (
    CONDITIONAL_REQUIREMENTS["lint"]
57 changes: 56 additions & 1 deletion synapse/module_api/__init__.py
@@ -24,6 +24,7 @@
    List,
    Optional,
    Tuple,
    TypeVar,
    Union,
)

@@ -81,10 +82,19 @@
)
from synapse.http.servlet import parse_json_object_from_request
from synapse.http.site import SynapseRequest
from synapse.logging.context import make_deferred_yieldable, run_in_background
from synapse.logging.context import (
    defer_to_thread,
    make_deferred_yieldable,
    run_in_background,
)
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.rest.client.login import LoginResponse
from synapse.storage import DataStore
from synapse.storage.background_updates import (
    DEFAULT_BATCH_SIZE_CALLBACK,
    MIN_BATCH_SIZE_CALLBACK,
    ON_UPDATE_CALLBACK,
)
from synapse.storage.database import DatabasePool, LoggingTransaction
from synapse.storage.databases.main.roommember import ProfileInfo
from synapse.storage.state import StateFilter
@@ -104,6 +114,8 @@
from synapse.app.generic_worker import GenericWorkerSlavedStore
from synapse.server import HomeServer

TV = TypeVar("TV")

"""
This package defines the 'stable' API which can be used by extension modules which
are loaded into Synapse.
@@ -307,6 +319,24 @@ def register_password_auth_provider_callbacks(
            auth_checkers=auth_checkers,
        )

    def register_background_update_controller_callbacks(
        self,
        on_update: ON_UPDATE_CALLBACK,
        default_batch_size: Optional[DEFAULT_BATCH_SIZE_CALLBACK] = None,
        min_batch_size: Optional[MIN_BATCH_SIZE_CALLBACK] = None,
    ) -> None:
        """Registers background update controller callbacks.

        Added in Synapse v1.49.0.
        """

        for db in self._hs.get_datastores().databases:
            db.updates.register_update_controller_callbacks(
                on_update=on_update,
                default_batch_size=default_batch_size,
                min_batch_size=min_batch_size,
            )

    def register_web_resource(self, path: str, resource: Resource):
        """Registers a web resource to be served at the given path.

@@ -970,6 +1000,11 @@ def looping_background_call(
            f,
        )

    async def sleep(self, seconds: float) -> None:
        """Sleeps for the given number of seconds."""

        await self._clock.sleep(seconds)

    async def send_mail(
        self,
        recipient: str,
@@ -1124,6 +1159,26 @@ async def get_room_state(

        return {key: state_events[event_id] for key, event_id in state_ids.items()}

    async def defer_to_thread(
        self,
        f: Callable[..., TV],
        *args: Any,
        **kwargs: Any,
    ) -> TV:
        """Runs the given function in a separate thread from Synapse's thread pool.

        Added in Synapse v1.49.0.

        Args:
            f: The function to run.
            args: The function's arguments.
            kwargs: The function's keyword arguments.

        Returns:
            The return value of the function once run in a thread.
        """
        return await defer_to_thread(self._hs.get_reactor(), f, *args, **kwargs)

Contributor

While not directly related to the background update work in this PR, it's going to be used by https://gitlab.matrix.org/new-vector/ems-synapse-background-update-controller and it's a small change, so I thought I'd include it here. I'd understand if people think this is too much scope creep and would prefer me to open a separate PR for it.
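
For context, a module might call the new method roughly as sketched below. `_StubApi` is a hypothetical stand-in that uses asyncio's default executor rather than Synapse's thread pool, and `expensive_digest` is an illustrative blocking function; only the `defer_to_thread(f, *args, **kwargs)` call shape is taken from the diff above.

```python
import asyncio
import functools
import hashlib
from typing import Any, Callable, TypeVar

TV = TypeVar("TV")


class _StubApi:
    """Stand-in for ModuleApi, backed by asyncio's default executor (an assumption)."""

    async def defer_to_thread(
        self, f: Callable[..., TV], *args: Any, **kwargs: Any
    ) -> TV:
        loop = asyncio.get_running_loop()
        # run_in_executor does not accept keyword arguments, hence the partial.
        return await loop.run_in_executor(None, functools.partial(f, *args, **kwargs))


def expensive_digest(data: bytes, rounds: int = 1000) -> str:
    """Blocking, CPU-bound work that should stay off the main thread."""
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


async def main() -> str:
    api = _StubApi()
    # Positional and keyword arguments are forwarded to the function.
    return await api.defer_to_thread(expensive_digest, b"payload", rounds=10)


digest = asyncio.run(main())
```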


class PublicRoomListManager:
    """Contains methods for adding to, removing from and querying whether a room