-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add a dockerfile for running a set of Synapse worker processes #9162
Conversation
285920b
to
3eb9fec
Compare
I've found that the |
4aae6f3
to
c52ee68
Compare
Still not sure what's causing this. I propose we continue on with it disabled for now. @MatMaul will investigate further in the coming day. I've written up an issue for this as well: #9192 |
This was causing worker_listener blocks to get generated with Nonetype resources, crashing things. This was only a problem for worker types that don't have any configured resources.
c52ee68
to
1373a76
Compare
1373a76
to
e66f9e7
Compare
Hm, and now media traffic isn't reaching the media worker... some more debugging needed. |
MatMaul and I were able to figure out why the media repository wasn't reachable. Turns out Complement was sending its traffic directly to the main process instead of nginx. The main process doesn't have the media resource available by default, so that tipped us off. Combined with the changes in matrix-org/synapse#9162 outside requests to 8008 now actually get routed to workers if necessary. With this, the media tests now work - and we can switch on all available worker types!
Turns out this was only a problem when using the Complement Dockerfile, and was a result of traffic being sent to the wrong port. Things have been fixed on the Complement side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should restrict this to complement, I don't think its worth having a docker image that sort-of-but-not-really can be used to run workers.
This will also need to support running multiple workers as a given type, e.g. event_persisters
, so that we can test sharding. If we're doing this as a complement specific docker image then I'd vote we just support "monolith" or "all the workers" modes.
(I think its probably to ask whether it would have been easier to make complement support docker compose?)
docker/conf/homeserver.yaml
Outdated
@@ -40,7 +40,7 @@ listeners: | |||
compress: false | |||
{% endif %} | |||
|
|||
- port: 8008 | |||
- port: {{ SYNAPSE_PORT or 8008 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we making this configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that the reverse proxy can listen on 8008 and route traffic instead. Complement will only contact the homeserver on 8008/8448 (and making it configurable on Complement's end has been rejected).
I can place a comment down if you'd like?
proxy_pass http://localhost:8080; | ||
proxy_set_header X-Forwarded-For $remote_addr; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why nginx? We use haproxy everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nginx was the solution that was reached for first, back in the design doc, mostly due to its simple config format. I can replace it with HAProxy instead, though it'd be a bit of a pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I will note to myself that there is quite a lot we can crib from https://github.com/matrix-org/sytest/blob/c75ff0449b5b62bdebfdb9024170988b519628d9/lib/SyTest/Homeserver/Synapse.pm#L1145-L1305.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if we want to support more than just the basic load-balancing techniques, then we'll likely need to switch over to HAProxy.
I'll check how big of a task this will be, but at the moment I'm leaning towards just supporting round-robin LB for now with nginx, and then switch to HAProxy after the fact. Doing so shouldn't require any changes in Synapse/Complement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So nginx does actually support load-balancing by request IP, or by hashing something in the request: https://nginx.org/en/docs/http/ngx_http_upstream_module.html#hash
But actually thinking about it again I don't think we really want any load-balancing techniques other than round-robin for the Complement usecase. We certainly don't want to load-balance by IP, as all requests from our test infra would be coming from localhost.
In fact I think what we really want is consistency in load-balancing between test runs. And round-robin gives us exactly that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason to use haproxy rather than nginx is that haproxy is kinda what we "officially" support. It doesn't matter too much but its nice to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should move some of the stuff specific to the worker docker to conf-workers/
instead? It's a bit confusing what is being used for what etc.
I also think this would have been a lot simpler if instead of making it all configurable we just had the majority of it as static config, but I'm also conscious that this has been open for ages now.
I'd quite like to get thoughts from other people here about whether this is good enough for now. |
Good idea.
I think generating config files is still the way to go, rather than including a separate config for each worker. The added complexity on top of that which allows for customising which workers are in use isn't too substantial I don't think. Plus as mentioned earlier it can be handy to easily be able to specify your own setup of workers, such as when Kegan did for his project recently. |
Co-authored-by: Erik Johnston <erik@matrix.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd quite like to get thoughts from other people here about whether this is good enough for now.
Generally, I think it's fine. I've mentioned a few nits below, but I think the main thing I'd like to see changed is to move the docs out of the main docker/README, both because it's a separate docker image, but also because it helps point out the fact that this probably isn't something you want to be using unless you know what you're doing.
docker/README.md
Outdated
You can read about jemalloc by reading the Synapse [README](../README.md) | ||
|
||
### Running all worker processes in a single container for testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be a separate file. This file is copied to https://hub.docker.com/r/matrixdotorg/synapse, and it's not really relevant to the main matrixdotorg/synapse
image. (indeed, it would be rather confusing for people reading it at https://hub.docker.com/r/matrixdotorg/synapse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh indeed, I completely forgot that it ends up on Docker Hub. Good call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm struggling to come up with a good place to put it... Perhaps README-testing.md? Or maybe it should go in docs/dev/
or CONTRIBUTING instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with a README-testing.md
in the docker dir, since it's mostly describing how the docker image works. We can always add references to it from elsewhere (eg it would be good to stick a comment in the the dockerfile itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now done this.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
…apse_worker_docker
…apse_worker_docker
As we already have a SYNAPSE_WORKER env var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm once you fix the nits below
Synapse 1.33.0rc1 (2021-04-28) ============================== Features -------- - Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](#9800), [\#9814](#9814)) - Add experimental support for handling presence on a worker. ([\#9819](#9819), [\#9820](#9820), [\#9828](#9828), [\#9850](#9850)) - Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](#9832)) Bugfixes -------- - Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](#9726)) - Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](#9788)) - Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](#9802)) - Limit the size of HTTP responses read over federation. ([\#9833](#9833)) - Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](#9867)) - Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](#9868)) Improved Documentation ---------------------- - Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](#9801)) Internal Changes ---------------- - Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](#9162)) - Apply `pyupgrade` across the codebase. ([\#9786](#9786)) - Move some replication processing out of `generic_worker`. ([\#9796](#9796)) - Replace `HomeServer.get_config()` with inline references. ([\#9815](#9815)) - Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](#9816)) - Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](#9817)) - Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](#9821)) - Small speed up for joining large remote rooms. ([\#9825](#9825)) - Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](#9838)) - Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](#9845)) - Limit length of accepted email addresses. ([\#9855](#9855)) - Remove redundant `synapse.types.Collection` type definition. ([\#9856](#9856)) - Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](#9858)) - Disable invite rate-limiting by default when running the unit tests. ([\#9871](#9871)) - Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](#9874)) - Make `DomainSpecificString` an `attrs` class. ([\#9875](#9875)) - Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](#9876)) - Remove redundant `_PushHTTPChannel` test class. ([\#9878](#9878)) - Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](#9879)) - Small performance improvement around handling new local presence updates. ([\#9887](#9887))
…worker configuration (#105) This PR updates Complement's Synapse worker-flavoured docker image in a few ways: * Updates the base image name from `synapse:workers` to `synapse-workers`, as it's built from a separate Dockerfile from the base `matrixdotorg/synapse` image. * Update the `SYNAPSE_WORKERS` env var name to `SYNAPSE_WORKER_TYPES` * Switches to an explicit list of worker names. We dropped the `*` functionality after adding sharding capabilities. The worker configuration mirrors [that of sytest](https://github.com/matrix-org/sytest/blob/7d297956d6c5a177d2ea492ce6561b2e2500b291/lib/SyTest/Homeserver/Synapse.pm#L657-L1051). These changes grew out of [this PR on Synapse](matrix-org/synapse#9162) which changed the base image. I'd ideally like feedback from the Synapse team on the chosen worker configuration.
Synapse 1.33.0 (2021-05-05) =========================== Features -------- - Build Debian packages for Ubuntu 21.04 (Hirsute Hippo). ([\#9909](matrix-org/synapse#9909)) Synapse 1.33.0rc2 (2021-04-29) ============================== Bugfixes -------- - Fix tight loop when handling presence replication when using workers. Introduced in v1.33.0rc1. ([\#9900](matrix-org/synapse#9900)) Synapse 1.33.0rc1 (2021-04-28) ============================== Features -------- - Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](matrix-org/synapse#9800), [\#9814](matrix-org/synapse#9814)) - Add experimental support for handling presence on a worker. ([\#9819](matrix-org/synapse#9819), [\#9820](matrix-org/synapse#9820), [\#9828](matrix-org/synapse#9828), [\#9850](matrix-org/synapse#9850)) - Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](matrix-org/synapse#9832)) Bugfixes -------- - Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](matrix-org/synapse#9726)) - Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](matrix-org/synapse#9788)) - Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](matrix-org/synapse#9802)) - Limit the size of HTTP responses read over federation. ([\#9833](matrix-org/synapse#9833)) - Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](matrix-org/synapse#9867)) - Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](matrix-org/synapse#9868)) Improved Documentation ---------------------- - Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](matrix-org/synapse#9801)) Internal Changes ---------------- - Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](matrix-org/synapse#9162)) - Apply `pyupgrade` across the codebase. ([\#9786](matrix-org/synapse#9786)) - Move some replication processing out of `generic_worker`. ([\#9796](matrix-org/synapse#9796)) - Replace `HomeServer.get_config()` with inline references. ([\#9815](matrix-org/synapse#9815)) - Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](matrix-org/synapse#9816)) - Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](matrix-org/synapse#9817)) - Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](matrix-org/synapse#9821)) - Small speed up for joining large remote rooms. ([\#9825](matrix-org/synapse#9825)) - Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](matrix-org/synapse#9838)) - Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](matrix-org/synapse#9845)) - Limit length of accepted email addresses. ([\#9855](matrix-org/synapse#9855)) - Remove redundant `synapse.types.Collection` type definition. ([\#9856](matrix-org/synapse#9856)) - Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](matrix-org/synapse#9858)) - Disable invite rate-limiting by default when running the unit tests. ([\#9871](matrix-org/synapse#9871)) - Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](matrix-org/synapse#9874)) - Make `DomainSpecificString` an `attrs` class. ([\#9875](matrix-org/synapse#9875)) - Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](matrix-org/synapse#9876)) - Remove redundant `_PushHTTPChannel` test class. ([\#9878](matrix-org/synapse#9878)) - Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](matrix-org/synapse#9879)) - Small performance improvement around handling new local presence updates. ([\#9887](matrix-org/synapse#9887))
This PR adds a Dockerfile and some supporting files to the
docker/
directory. The Dockerfile's intention is to spin up a container with:SYNAPSE_WORKER_TYPES
environment variable supplied at runtime.Note that this is not currently intended to be used in production. If you'd like to use Synapse workers with Docker, instead make use of the official image, with one worker per container. The purpose of this dockerfile is currently to allow testing Synapse in worker mode with the Complement test suite.
configure_workers_and_start.py
is where most of the magic happens in this PR. It reads from environment variables (documented in the file) and creates all necessary config files for the processes. It is the entrypoint of the Dockerfile, and thus is run any time the docker container is spun up, recreating all config files in case you want to use a different set of workers. One can specify which workers they'd like to use by setting theSYNAPSE_WORKER_TYPES
environment variable (as a comma-separated list of arbitrary worker names) or by setting it to*
for all worker processes. We will be using the latter in CI.Huge thanks to @MatMaul for helping get this all working 🎉 This PR is paired with its equivalent on the Complement side: matrix-org/complement#62.
Note, for the purpose of testing this PR before it's merged: You'll need to (re)build the base Synapse docker image for everything to work (
matrixdotorg/synapse:latest
). Then build the worker-based docker image on top (matrixdotorg/synapse:workers
).TODO:
Take off the /dev/null's from postgres and caddy. It can hide startup errors, and they seem to mostly be quiet after startup.This is probably best reviewed with all commits at once.