Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Commit

Permalink
Add documentation for cancellation of request processing
Browse files Browse the repository at this point in the history
Signed-off-by: Sean Quah <seanq@matrix.org>
  • Loading branch information
Sean Quah committed May 17, 2022
1 parent 6d8d121 commit c09f777
Show file tree
Hide file tree
Showing 3 changed files with 389 additions and 0 deletions.
1 change: 1 addition & 0 deletions changelog.d/12761.doc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add documentation for cancellation of request processing.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@
- [Database Schemas](development/database_schema.md)
- [Experimental features](development/experimental_features.md)
- [Synapse Architecture]()
- [Cancellation](cancellation.md)
- [Log Contexts](log_contexts.md)
- [Replication](replication.md)
- [TCP Replication](tcp_replication.md)
Expand Down
387 changes: 387 additions & 0 deletions docs/cancellation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,387 @@
# Cancellation
Sometimes, requests take a long time to service and clients disconnect
before Synapse produces a response. To avoid wasting resources, Synapse
can cancel request processing for select endpoints with the `@cancelled`
decorator.

Synapse makes use of Twisted's `Deferred.cancel()` feature to make
cancellation work.

## Enabling cancellation for an endpoint
1. Check that the endpoint method, and any `async` functions in its call
tree handle cancellation correctly. See
[Handling cancellation correctly](#handling-cancellation-correctly)
for a list of things to look out for.
2. Apply the `@cancellable` decorator to the `on_GET/POST/PUT/DELETE`
method. It's not recommended to make non-`GET` methods cancellable,
since cancellation midway through some database updates is less
likely to be handled correctly.

## Mechanics
There are two stages to cancellation: downward propagation of a
`cancel()` call, followed by upwards propagation of a `CancelledError`
out of a blocked `await`.
Both Twisted and asyncio have a cancellation mechanism.

| | Method | Exception | Exception inherits from |
|---------------|---------------------|-----------------------------------------|-------------------------|
| Twisted | `Deferred.cancel()` | `twisted.internet.defer.CancelledError` | `Exception` (!) |
| asyncio | `Task.cancel()` | `asyncio.CancelledError` | `BaseException` |

### Deferred.cancel()
When Synapse starts handling a request, it runs the async method
responsible for handling it using `defer.ensureDeferred`, which returns
a `Deferred`.

```python
def do_something() -> Deferred[None]:
...

async def on_GET() -> Tuple[int, JsonDict]:
d = make_deferred_yieldable(do_something())
await d
return 200, {}

request = defer.ensureDeferred(on_GET())
```

During cancellation, `Deferred.cancel()` is called on the `Deferred`
from `defer.ensureDeferred`, `request`. Twisted knows which `Deferred`
`request` is waiting on and passes the `cancel()` call on to `d`.

The `Deferred` being waited on, `d`, may have its own handling for
`cancel()` and pass the call on to other `Deferred`s.

Eventually, a `Deferred` handles the `cancel()` call by resolving itself
with a `CancelledError`.

### CancelledError
The `CancelledError` gets raised out of the `await` and bubbles up, as
per normal Python exception handling.

## Handling cancellation correctly
In general, when writing code that might be subject to cancellation, two
things must be considered:
* The effect of `CancelledError`s raised out of `await`s.
* The effect of `Deferred`s being `cancel()`ed.

Examples of code that handles cancellation incorrectly include:
* `try-except` blocks which swallow `CancelledError`s.
* Code that shares the same `Deferred`, which may be cancelled, between
multiple requests.
* Code that starts some processing that's exempt from cancellation, but
uses a logging context from cancellable code. The logging context
will be finished upon cancellation, while the uncancelled processing
is still using it.

Some common patterns are listed below in more detail.

### `async` function calls
Most functions in Synapse are relatively straightforward from a
cancellation standpoint: they don't do anything with `Deferred`s and
purely call and `await` other `async` functions.

An `async` function handles cancellation correctly if its own code
handles cancellation correctly and all the async function it calls
handle cancellation correctly. For example:
```python
async def do_two_things() -> None:
check_something()
await do_something()
await do_something_else()
```
`do_two_things` handles cancellation correctly if `do_something` and
`do_something_else` handle cancellation correctly.

That is, when checking whether a function handles cancellation
correctly, its implementation and all its `async` function calls need to
be checked, recursively.

As `check_something` is not `async`, it does not need to be checked.

### CancelledErrors
Because Twisted's `CancelledError`s are `Exception`s, it's easy to
accidentally catch and suppress them. Care must be taken to ensure that
`CancelledError`s are allowed to propagate upwards.

<table width="100%">
<tr>
<td width="50%" valign="top">

**Bad**:
```python
try:
await do_something()
except Exception:
# `CancelledError` gets swallowed here.
logger.info(...)
```
</td>
<td width="50%" valign="top">

**Good**:
```python
try:
await do_something()
except CancelledError:
raise
except Exception:
logger.info(...)
```
</td>
</tr>
<tr>
<td width="50%" valign="top">

**OK**:
```python
try:
check_something()
# A `CancelledError` won't ever be raised here.
except Exception:
logger.info(...)
```
</td>
<td width="50%" valign="top">

**Good**:
```python
try:
await do_something()
except ValueError:
logger.info(...)
```
</td>
</tr>
</table>

#### defer.gatherResults
`defer.gatherResults` produces a `Deferred` which:
* broadcasts `cancel()` calls to every `Deferred` being waited on.
* wraps the first exception it sees in a `FirstError`.

Together, this means that `CancelledError`s will be wrapped in
a `FirstError` unless unwrapped. Such `FirstError`s are liable to be
swallowed, so they must be unwrapped.

<table width="100%">
<tr>
<td width="50%" valign="top">

**Bad**:
```python
async def do_something() -> None:
await make_deferred_yieldable(
defer.gatherResults([...], consumeErrors=True)
)

try:
await do_something()
except CancelledError:
raise
except Exception:
# `FirstError(CancelledError)` gets swallowed here.
logger.info(...)
```

</td>
<td width="50%" valign="top">

**Good**:
```python
async def do_something() -> None:
await make_deferred_yieldable(
defer.gatherResults([...], consumeErrors=True)
).addErrback(unwrapFirstError)

try:
await do_something()
except CancelledError:
raise
except Exception:
logger.info(...)
```
</td>
</tr>
</table>

### Creation of `Deferred`s
If a function creates a `Deferred`, the effect of cancelling it must be considered. `Deferred`s that get shared are likely to have unintended behaviour when cancelled.

<table width="100%">
<tr>
<td width="50%" valign="top">

**Bad**:
```python
cache: Dict[str, Deferred[None]] = {}

def wait_for_room(room_id: str) -> Deferred[None]:
deferred = cache.get(room_id)
if deferred is None:
deferred = Deferred()
cache[room_id] = deferred
# `deferred` can have multiple waiters.
# All of them will observe a `CancelledError`
# if any one of them is cancelled.
return make_deferred_yieldable(deferred)

# Request 1
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
# Request 2
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
```
</td>
<td width="50%" valign="top">

**Good**:
```python
cache: Dict[str, Deferred[None]] = {}

def wait_for_room(room_id: str) -> Deferred[None]:
deferred = cache.get(room_id)
if deferred is None:
deferred = Deferred()
cache[room_id] = deferred
# `deferred` will never be cancelled now.
# A `CancelledError` will still come out of
# the `await`.
# `delay_cancellation` may also be used.
return make_deferred_yieldable(stop_cancellation(deferred))

# Request 1
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
# Request 2
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
```
</td>
</tr>
<tr>
<td width="50%" valign="top">
</td>
<td width="50%" valign="top">

**Good**:
```python
cache: Dict[str, List[Deferred[None]]] = {}

def wait_for_room(room_id: str) -> Deferred[None]:
if room_id not in cache:
cache[room_id] = []
# Each request gets its own `Deferred` to wait on.
deferred = Deferred()
cache[room_id]].append(deferred)
return make_deferred_yieldable(deferred)

# Request 1
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
# Request 2
await wait_for_room("!aAAaaAaaaAAAaAaAA:matrix.org")
```
</td>
</table>

### Uncancelled processing
Some `async` functions may kick off some `async` processing which is
intentionally protected from cancellation, by `stop_cancellation` or
other means. If the `async` processing inherits the logcontext of the
request which initiated it, care must be taken to ensure that the
logcontext is not finished before the `async` processing completes.

<table width="100%">
<tr>
<td width="50%" valign="top">

**Bad**:
```python
cache: Optional[ObservableDeferred[None]] = None

async def do_something_else(
to_resolve: Deferred[None]
) -> None:
await ...
logger.info("done!")
to_resolve.callback(None)

async def do_something() -> None:
if not cache:
to_resolve = Deferred()
cache = ObservableDeferred(to_resolve)
# `do_something_else` will never be cancelled and
# can outlive the `request-1` logging context.
run_in_background(do_something_else, to_resolve)

await make_deferred_yieldable(cache.observe())

with LoggingContext("request-1"):
await do_something()
```
</td>
<td width="50%" valign="top">

**Good**:
```python
cache: Optional[ObservableDeferred[None]] = None

async def do_something_else(
to_resolve: Deferred[None]
) -> None:
await ...
logger.info("done!")
to_resolve.callback(None)

async def do_something() -> None:
if not cache:
to_resolve = Deferred()
cache = ObservableDeferred(to_resolve)
run_in_background(do_something_else, to_resolve)
# We'll wait until `do_something_else` is
# done before raising a `CancelledError`.
await make_deferred_yieldable(
delay_cancellation(cache.observe())
)
else:
await make_deferred_yieldable(cache.observe())

with LoggingContext("request-1"):
await do_something()
```
</td>
</tr>
<tr>
<td width="50%">

**OK**:
```python
cache: Optional[ObservableDeferred[None]] = None

async def do_something_else(
to_resolve: Deferred[None]
) -> None:
await ...
logger.info("done!")
to_resolve.callback(None)

async def do_something() -> None:
if not cache:
to_resolve = Deferred()
cache = ObservableDeferred(to_resolve)
# `do_something_else` will get its own independent
# logging context. `request-1` will not count any
# metrics from `do_something_else`.
run_as_background_process(
"do_something_else",
do_something_else,
to_resolve,
)

await make_deferred_yieldable(cache.observe())

with LoggingContext("request-1"):
await do_something()
```
</td>
<td width="50%">
</td>
</tr>
</table>

0 comments on commit c09f777

Please sign in to comment.