Add a dockerfile for running a set of Synapse worker processes #9162

anoadragon453 · 2021-01-19T17:52:36Z

This PR adds a Dockerfile and some supporting files to the docker/ directory. The Dockerfile's intention is to spin up a container with:

A Synapse main process.
Any desired worker processes, defined by a SYNAPSE_WORKER_TYPES environment variable supplied at runtime.
A redis for worker communication.
A nginx for routing traffic.
A supervisord to start all worker processes and monitor them if any go down.

Note that this is not currently intended to be used in production. If you'd like to use Synapse workers with Docker, instead make use of the official image, with one worker per container. The purpose of this dockerfile is currently to allow testing Synapse in worker mode with the Complement test suite.

configure_workers_and_start.py is where most of the magic happens in this PR. It reads from environment variables (documented in the file) and creates all necessary config files for the processes. It is the entrypoint of the Dockerfile, and thus is run any time the docker container is spun up, recreating all config files in case you want to use a different set of workers. One can specify which workers they'd like to use by setting the SYNAPSE_WORKER_TYPES environment variable (as a comma-separated list of arbitrary worker names) or by setting it to * for all worker processes. We will be using the latter in CI.

Huge thanks to @MatMaul for helping get this all working 🎉 This PR is paired with its equivalent on the Complement side: matrix-org/complement#62.

Note, for the purpose of testing this PR before it's merged: You'll need to (re)build the base Synapse docker image for everything to work (matrixdotorg/synapse:latest). Then build the worker-based docker image on top (matrixdotorg/synapse:workers).

TODO:

~~Take off the /dev/null's from postgres and caddy. It can hide startup errors, and they seem to mostly be quiet after startup.~~

This is probably best reviewed with all commits at once.

anoadragon453 · 2021-01-20T11:02:38Z

I've found that the federation_sender worker currently causes some tests (TestOutboundFederationSend) to fail. I'm currently investigating this.

anoadragon453 · 2021-01-21T14:51:04Z

I've found that the federation_sender worker currently causes some tests (TestOutboundFederationSend) to fail. I'm currently investigating this.

Still not sure what's causing this. I propose we continue on with it disabled for now. @MatMaul will investigate further in the coming day. I've written up an issue for this as well: #9192

This was causing worker_listener blocks to get generated with Nonetype resources, crashing things. This was only a problem for worker types that don't have any configured resources.

anoadragon453 · 2021-01-22T14:44:36Z

Hm, and now media traffic isn't reaching the media worker... some more debugging needed.

MatMaul and I were able to figure out why the media repository wasn't reachable. Turns out Complement was sending its traffic directly to the main process instead of nginx. The main process doesn't have the media resource available by default, so that tipped us off. Combined with the changes in matrix-org/synapse#9162 outside requests to 8008 now actually get routed to workers if necessary. With this, the media tests now work - and we can switch on all available worker types!

anoadragon453 · 2021-01-25T15:04:06Z

Hm, and now media traffic isn't reaching the media worker... some more debugging needed.

Turns out this was only a problem when using the Complement Dockerfile, and was a result of traffic being sent to the wrong port. Things have been fixed on the Complement side.

docker/Dockerfile-workers

erikjohnston

I think we should restrict this to complement, I don't think its worth having a docker image that sort-of-but-not-really can be used to run workers.

This will also need to support running multiple workers as a given type, e.g. event_persisters, so that we can test sharding. If we're doing this as a complement specific docker image then I'd vote we just support "monolith" or "all the workers" modes.

(I think its probably to ask whether it would have been easier to make complement support docker compose?)

erikjohnston · 2021-01-27T16:27:39Z

docker/conf/homeserver.yaml

@@ -40,7 +40,7 @@ listeners:
        compress: false
  {% endif %}

-  - port: 8008
+  - port: {{ SYNAPSE_PORT or 8008 }}


Why are we making this configurable?

So that the reverse proxy can listen on 8008 and route traffic instead. Complement will only contact the homeserver on 8008/8448 (and making it configurable on Complement's end has been rejected).

I can place a comment down if you'd like?

docker/Dockerfile-workers

docker/configure_workers_and_start.py

erikjohnston · 2021-01-27T16:29:14Z

docker/configure_workers_and_start.py

+        proxy_pass http://localhost:8080;
+        proxy_set_header X-Forwarded-For $remote_addr;
+    }
+}


Why nginx? We use haproxy everywhere else.

Nginx was the solution that was reached for first, back in the design doc, mostly due to its simple config format. I can replace it with HAProxy instead, though it'd be a bit of a pain.

Although I will note to myself that there is quite a lot we can crib from https://github.com/matrix-org/sytest/blob/c75ff0449b5b62bdebfdb9024170988b519628d9/lib/SyTest/Homeserver/Synapse.pm#L1145-L1305.

It looks like if we want to support more than just the basic load-balancing techniques, then we'll likely need to switch over to HAProxy.

I'll check how big of a task this will be, but at the moment I'm leaning towards just supporting round-robin LB for now with nginx, and then switch to HAProxy after the fact. Doing so shouldn't require any changes in Synapse/Complement.

So nginx does actually support load-balancing by request IP, or by hashing something in the request: https://nginx.org/en/docs/http/ngx_http_upstream_module.html#hash

But actually thinking about it again I don't think we really want any load-balancing techniques other than round-robin for the Complement usecase. We certainly don't want to load-balance by IP, as all requests from our test infra would be coming from localhost.

In fact I think what we really want is consistency in load-balancing between test runs. And round-robin gives us exactly that.

The main reason to use haproxy rather than nginx is that haproxy is kinda what we "officially" support. It doesn't matter too much but its nice to be consistent

erikjohnston

I wonder if we should move some of the stuff specific to the worker docker to conf-workers/ instead? It's a bit confusing what is being used for what etc.

I also think this would have been a lot simpler if instead of making it all configurable we just had the majority of it as static config, but I'm also conscious that this has been open for ages now.

docker/README.md

erikjohnston · 2021-04-06T09:26:01Z

I'd quite like to get thoughts from other people here about whether this is good enough for now.

anoadragon453 · 2021-04-06T09:36:27Z

I wonder if we should move some of the stuff specific to the worker docker to conf-workers/ instead? It's a bit confusing what is being used for what etc.

Good idea.

I also think this would have been a lot simpler if instead of making it all configurable we just had the majority of it as static config, but I'm also conscious that this has been open for ages now.

I think generating config files is still the way to go, rather than including a separate config for each worker. The added complexity on top of that which allows for customising which workers are in use isn't too substantial I don't think.

Plus as mentioned earlier it can be handy to easily be able to specify your own setup of workers, such as when Kegan did for his project recently.

Co-authored-by: Erik Johnston <erik@matrix.org>

richvdh

I'd quite like to get thoughts from other people here about whether this is good enough for now.

Generally, I think it's fine. I've mentioned a few nits below, but I think the main thing I'd like to see changed is to move the docs out of the main docker/README, both because it's a separate docker image, but also because it helps point out the fact that this probably isn't something you want to be using unless you know what you're doing.

docker/README.md

richvdh · 2021-04-06T13:31:28Z

docker/README.md

+You can read about jemalloc by reading the Synapse [README](../README.md)
+
+### Running all worker processes in a single container for testing


I feel like this should be a separate file. This file is copied to https://hub.docker.com/r/matrixdotorg/synapse, and it's not really relevant to the main matrixdotorg/synapse image. (indeed, it would be rather confusing for people reading it at https://hub.docker.com/r/matrixdotorg/synapse)

Oh indeed, I completely forgot that it ends up on Docker Hub. Good call.

Although I'm struggling to come up with a good place to put it... Perhaps README-testing.md? Or maybe it should go in docs/dev/ or CONTRIBUTING instead?

I'd go with a README-testing.md in the docker dir, since it's mostly describing how the docker image works. We can always add references to it from elsewhere (eg it would be good to stick a comment in the the dockerfile itself).

I've now done this.

docker/README.md

docker/conf/log.config

docker/configure_workers_and_start.py

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

…apse_worker_docker

…nt testing

docker/README-testing.md

…apse_worker_docker

As we already have a SYNAPSE_WORKER env var.

richvdh

lgtm once you fix the nits below

docker/README.md

docker/README-testing.md

docker/configure_workers_and_start.py

…apse_worker_docker

@rkfg

Synapse 1.33.0rc1 (2021-04-28) ============================== Features -------- - Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](#9800), [\#9814](#9814)) - Add experimental support for handling presence on a worker. ([\#9819](#9819), [\#9820](#9820), [\#9828](#9828), [\#9850](#9850)) - Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](#9832)) Bugfixes -------- - Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](#9726)) - Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](#9788)) - Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](#9802)) - Limit the size of HTTP responses read over federation. ([\#9833](#9833)) - Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](#9867)) - Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](#9868)) Improved Documentation ---------------------- - Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](#9801)) Internal Changes ---------------- - Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](#9162)) - Apply `pyupgrade` across the codebase. ([\#9786](#9786)) - Move some replication processing out of `generic_worker`. ([\#9796](#9796)) - Replace `HomeServer.get_config()` with inline references. ([\#9815](#9815)) - Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](#9816)) - Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](#9817)) - Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](#9821)) - Small speed up for joining large remote rooms. ([\#9825](#9825)) - Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](#9838)) - Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](#9845)) - Limit length of accepted email addresses. ([\#9855](#9855)) - Remove redundant `synapse.types.Collection` type definition. ([\#9856](#9856)) - Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](#9858)) - Disable invite rate-limiting by default when running the unit tests. ([\#9871](#9871)) - Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](#9874)) - Make `DomainSpecificString` an `attrs` class. ([\#9875](#9875)) - Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](#9876)) - Remove redundant `_PushHTTPChannel` test class. ([\#9878](#9878)) - Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](#9879)) - Small performance improvement around handling new local presence updates. ([\#9887](#9887))

…worker configuration (#105) This PR updates Complement's Synapse worker-flavoured docker image in a few ways: * Updates the base image name from `synapse:workers` to `synapse-workers`, as it's built from a separate Dockerfile from the base `matrixdotorg/synapse` image. * Update the `SYNAPSE_WORKERS` env var name to `SYNAPSE_WORKER_TYPES` * Switches to an explicit list of worker names. We dropped the `*` functionality after adding sharding capabilities. The worker configuration mirrors [that of sytest](https://github.com/matrix-org/sytest/blob/7d297956d6c5a177d2ea492ce6561b2e2500b291/lib/SyTest/Homeserver/Synapse.pm#L657-L1051). These changes grew out of [this PR on Synapse](matrix-org/synapse#9162) which changed the base image. I'd ideally like feedback from the Synapse team on the chosen worker configuration.

@rkfg

Synapse 1.33.0 (2021-05-05) =========================== Features -------- - Build Debian packages for Ubuntu 21.04 (Hirsute Hippo). ([\#9909](matrix-org/synapse#9909)) Synapse 1.33.0rc2 (2021-04-29) ============================== Bugfixes -------- - Fix tight loop when handling presence replication when using workers. Introduced in v1.33.0rc1. ([\#9900](matrix-org/synapse#9900)) Synapse 1.33.0rc1 (2021-04-28) ============================== Features -------- - Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](matrix-org/synapse#9800), [\#9814](matrix-org/synapse#9814)) - Add experimental support for handling presence on a worker. ([\#9819](matrix-org/synapse#9819), [\#9820](matrix-org/synapse#9820), [\#9828](matrix-org/synapse#9828), [\#9850](matrix-org/synapse#9850)) - Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](matrix-org/synapse#9832)) Bugfixes -------- - Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](matrix-org/synapse#9726)) - Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](matrix-org/synapse#9788)) - Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](matrix-org/synapse#9802)) - Limit the size of HTTP responses read over federation. ([\#9833](matrix-org/synapse#9833)) - Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](matrix-org/synapse#9867)) - Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](matrix-org/synapse#9868)) Improved Documentation ---------------------- - Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](matrix-org/synapse#9801)) Internal Changes ---------------- - Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](matrix-org/synapse#9162)) - Apply `pyupgrade` across the codebase. ([\#9786](matrix-org/synapse#9786)) - Move some replication processing out of `generic_worker`. ([\#9796](matrix-org/synapse#9796)) - Replace `HomeServer.get_config()` with inline references. ([\#9815](matrix-org/synapse#9815)) - Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](matrix-org/synapse#9816)) - Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](matrix-org/synapse#9817)) - Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](matrix-org/synapse#9821)) - Small speed up for joining large remote rooms. ([\#9825](matrix-org/synapse#9825)) - Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](matrix-org/synapse#9838)) - Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](matrix-org/synapse#9845)) - Limit length of accepted email addresses. ([\#9855](matrix-org/synapse#9855)) - Remove redundant `synapse.types.Collection` type definition. ([\#9856](matrix-org/synapse#9856)) - Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](matrix-org/synapse#9858)) - Disable invite rate-limiting by default when running the unit tests. ([\#9871](matrix-org/synapse#9871)) - Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](matrix-org/synapse#9874)) - Make `DomainSpecificString` an `attrs` class. ([\#9875](matrix-org/synapse#9875)) - Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](matrix-org/synapse#9876)) - Remove redundant `_PushHTTPChannel` test class. ([\#9878](matrix-org/synapse#9878)) - Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](matrix-org/synapse#9879)) - Small performance improvement around handling new local presence updates. ([\#9887](matrix-org/synapse#9887))

This was referenced Jan 19, 2021

Add a new dockerfile for worker-mode synapse matrix-org/complement#62

Merged

Implement knock feature #6739

Merged

anoadragon453 force-pushed the anoa/synapse_worker_docker branch from 285920b to 3eb9fec Compare January 19, 2021 18:56

anoadragon453 force-pushed the anoa/synapse_worker_docker branch from 4aae6f3 to c52ee68 Compare January 21, 2021 14:09

anoadragon453 added 2 commits January 21, 2021 15:00

Add docker image for Synapse with Workers

31e9579

Prevent an empty resources: list if no listener_resources defined

d61dc37

This was causing worker_listener blocks to get generated with Nonetype resources, crashing things. This was only a problem for worker types that don't have any configured resources.

anoadragon453 force-pushed the anoa/synapse_worker_docker branch from c52ee68 to 1373a76 Compare January 21, 2021 15:00

anoadragon453 requested a review from a team January 21, 2021 15:00

This was referenced Jan 21, 2021

Synapse: Add a step for running Complement tests in worker mode matrix-org/pipelines#122

Closed

Complement: Add Synapse Workers build step matrix-org/pipelines#123

Closed

anoadragon453 added 2 commits January 21, 2021 16:41

Add documentation for the dockerfile

3523080

Changelog

e66f9e7

anoadragon453 force-pushed the anoa/synapse_worker_docker branch from 1373a76 to e66f9e7 Compare January 21, 2021 16:41

anoadragon453 removed the request for review from a team January 22, 2021 14:44

MatMaul added 6 commits January 22, 2021 20:45

Missing media path for media worker

2e3134b

Actually launch redis through supervisord

f6571ee

Add worker_extra_conf for media repo

3f1b649

Keep redis in foreground

8b41fdf

Reacts on keyboard signals

2d6af88

Move nginx to 8008 and master to 8080

cb0b272

lint

842dc50

eMPee584 mentioned this pull request Jan 24, 2021

Synapse workers spantaleev/matrix-docker-ansible-deploy#456

Merged

anoadragon453 requested a review from a team January 25, 2021 15:03

anoadragon453 commented Jan 25, 2021

View reviewed changes

docker/Dockerfile-workers Outdated Show resolved Hide resolved

erikjohnston reviewed Jan 27, 2021

View reviewed changes

erikjohnston reviewed Apr 6, 2021

View reviewed changes

docker/README.md Outdated Show resolved Hide resolved

erikjohnston requested a review from a team April 6, 2021 09:24

anoadragon453 and others added 2 commits April 6, 2021 10:37

Clarify this is intended for testing purposes

5a77614

Co-authored-by: Erik Johnston <erik@matrix.org>

Move worker setup-specific config template files to a separate dir

69a6399

richvdh reviewed Apr 6, 2021

View reviewed changes

anoadragon453 and others added 5 commits April 6, 2021 17:34

Apply suggestions from code review

e3fbd62

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

Update worker docker image name to matrixdotorg/synapse-workers

d844d97

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

Move config block section templates outside of 'generate_worker_files

f218ae9

Merge branch 'develop' of github.com:matrix-org/synapse into anoa/syn…

a7666d1

…apse_worker_docker

Move worker docker documentation to README-testing, document Compleme…

486d5d1

…nt testing

anoadragon453 commented Apr 9, 2021

View reviewed changes

docker/README-testing.md Outdated Show resolved Hide resolved

anoadragon453 requested a review from richvdh April 9, 2021 18:28

anoadragon453 added 2 commits April 13, 2021 16:23

Merge branch 'develop' of github.com:matrix-org/synapse into anoa/syn…

e47b390

…apse_worker_docker

SYNAPSE_WORKERS -> SYNAPSE_WORKER_TYPES

700c203

As we already have a SYNAPSE_WORKER env var.

richvdh approved these changes Apr 14, 2021

View reviewed changes

anoadragon453 added 3 commits April 14, 2021 12:35

Apply suggestions from code review

314473f

Merge branch 'develop' of github.com:matrix-org/synapse into anoa/syn…

030b1f4

…apse_worker_docker

Remove python text encoding delcaration

1a37f25

anoadragon453 merged commit 7e460ec into develop Apr 14, 2021

anoadragon453 deleted the anoa/synapse_worker_docker branch April 14, 2021 12:54

anoadragon453 mentioned this pull request Apr 23, 2021

Update the Synapse Worker configuration with new base image name and worker configuration matrix-org/complement#105

Merged

This was referenced May 24, 2021

Default generated logger config causes "Permission denied: '/homeserver.log'" for non-root containers #9970

Closed

Fix docker image to not log at /homeserver.log #10045

Merged

richvdh mentioned this pull request Apr 27, 2022

Use supervisord to supervise Postgres and Caddy in the Complement image. #12480

Merged

dklimpel mentioned this pull request Oct 17, 2022

Add all Stream Writer worker types to configure_workers_and_start.py #14197

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a dockerfile for running a set of Synapse worker processes #9162

Add a dockerfile for running a set of Synapse worker processes #9162

anoadragon453 commented Jan 19, 2021 •

edited

Loading

anoadragon453 commented Jan 20, 2021 •

edited

Loading

anoadragon453 commented Jan 21, 2021

anoadragon453 commented Jan 22, 2021

anoadragon453 commented Jan 25, 2021

erikjohnston left a comment

erikjohnston Jan 27, 2021

anoadragon453 Feb 1, 2021 •

edited

Loading

erikjohnston Jan 27, 2021

anoadragon453 Feb 1, 2021

anoadragon453 Mar 8, 2021

anoadragon453 Mar 22, 2021

anoadragon453 Mar 24, 2021

erikjohnston Apr 6, 2021 •

edited

Loading

erikjohnston left a comment

erikjohnston commented Apr 6, 2021

anoadragon453 commented Apr 6, 2021

richvdh left a comment

richvdh Apr 6, 2021

anoadragon453 Apr 6, 2021

anoadragon453 Apr 6, 2021

richvdh Apr 7, 2021

anoadragon453 Apr 9, 2021

richvdh left a comment

		You can read about jemalloc by reading the Synapse [README](../README.md)

		### Running all worker processes in a single container for testing

Add a dockerfile for running a set of Synapse worker processes #9162

Add a dockerfile for running a set of Synapse worker processes #9162

Conversation

anoadragon453 commented Jan 19, 2021 • edited Loading

anoadragon453 commented Jan 20, 2021 • edited Loading

anoadragon453 commented Jan 21, 2021

anoadragon453 commented Jan 22, 2021

anoadragon453 commented Jan 25, 2021

erikjohnston left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anoadragon453 Feb 1, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erikjohnston Apr 6, 2021 • edited Loading

Choose a reason for hiding this comment

erikjohnston left a comment

Choose a reason for hiding this comment

erikjohnston commented Apr 6, 2021

anoadragon453 commented Apr 6, 2021

richvdh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richvdh left a comment

Choose a reason for hiding this comment

anoadragon453 commented Jan 19, 2021 •

edited

Loading

anoadragon453 commented Jan 20, 2021 •

edited

Loading

anoadragon453 Feb 1, 2021 •

edited

Loading

erikjohnston Apr 6, 2021 •

edited

Loading