-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
send SNI for federation requests #3439
send SNI for federation requests #3439
Conversation
Can one of the admins verify this patch? |
1 similar comment
Can one of the admins verify this patch? |
This will probably also resolve part of #1691. |
The documentation still needs to be adapted. The current documentation on reverse-proxying the federation port states:
This pull request would change this, requiring an SSL certificate signed by a valid CA in order to establish a connection. Is this acceptable? With Let's Encrypt, certificate issuing and renewing is free and easy. Requiring certificate validation also enhances security so to me it makes sense to enable this, but if you want to keep the policy from the documentation then certificate validation should be disabled. |
The validation of the certificates is based on views. So the certificates will only be accepted if other known servers see the same fingerprint. If this is aimed to be changed it needs a Spec Proposal as it would change the federation api (e. G. here). So just adding SNI would be OK for this PR but requiring a valid signed certificate would need a spec change and would be too much for this one PR |
Certificate validation is a standard feature of the twisted API (https://twistedmatrix.com/documents/18.4.0/api/twisted.internet.ssl.optionsForClientTLS.html), so it was added as a side effect to the support for SNI, which is included in the same API. As it is not compatible with the spec, I'll disable this feature, because my primary aim was to implement support for SNI. Originally I added a configuration option to enable / disable certificate validation, but as validation should be disabled anyway I suppose there is no point in keeping it so I'll revert this change as well. |
@vojeroen thanks for this! as @krombel says, we do application-layer certificate validation (which i'm hoping this preserves). we're currently in a bit of a security crunch but will try to review this asap; lots of people need SNI (especially anyone behind cloudflare who doesn't want to spend $$$ for a non-SNI listener) |
@ara4n I hope it checks out for you. I'm running this patch now on two servers (with CA-signed certificates and self-signed certificates) behind a reverse proxy that forwards to the correct server based on SNI, and it works fine. When reviewing you'll notice that I used some private methods from the twisted library, but as the public API automatically does certificate validation using CA, I had to reimplement some functions. |
# Conflicts: # synapse/http/endpoint.py
…eration_requests # Conflicts: # synapse/crypto/context_factory.py
@matrixbot: test this please |
@matrixbot: test this please |
@vojeroen To fix the issue with |
thanks for the pointer, the issue with |
@matrixbot test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo a few nits as mentioned below, this looks amazing - thank you for doing all this work.
It would also be helpful if you could update the description of the PR - in particular to note that it no longer validates certificates.
One thing I will say is that the interaction with the SSL libs has historically been a big tale of woe - see #2350 and the related https://twistedmatrix.com/trac/ticket/9210#comment:15 for some painful reading.
I'm going to try running this on my personal server for a bit to see what breaks.
synapse/crypto/context_factory.py
Outdated
|
||
def get_options(self, host): | ||
return ClientTLSOptions( | ||
unicode(host), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're in the process of porting to python 3, and this won't work. Maybe you want host.encode('ascii')
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClientTLSOptions should get a unicode string, so the python 3 equivalent of unicode(host)
would be str(host)
. If the same code should run in python 2 and 3, assigning str
to unicode
for python 3 is a straightforward solution. Is this OK for you?
if sys.version_info.major >= 3:
unicode = str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I was being stupid here. Let's not go defining unicode
though.
I think you actually want host.decode('ascii')
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason you want to use decode('ascii')
, otherwise I would suggest decode('utf-8')
to keep it more generic?
Just quickly looking at synapse.http.endpoint.matrix_federation_endpoint
(where ClientTLSOptionsFactory
is actually used), it seems to me that this function will have more problems with bytes vs str when running in python 3 than just this one. Probably to be looked at in another PR, as this would require some more investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason you want to use decode('ascii'), otherwise I would suggest decode('utf-8') to keep it more generic?
Well, I think we're expecting a plain ascii hostname here. If it's not plain ascii, something is going wrong.
And yes, there are currently other python 3 problems in this area. @hawkowl is working on python 3 support...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you already merged this. I did the commit with utf-8, so if you prefer ascii it might be best to change it then.
synapse/crypto/context_factory.py
Outdated
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class ServerContextFactory(ssl.ContextFactory): | ||
class ServerContextFactory(ContextFactory): | ||
"""Factory for PyOpenSSL SSL contexts that are used to handle incoming | ||
connections and to make connections to remote servers.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in thinking this is no longer used for making outbound connections? Could you update the docstring if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, I'll update it.
def __init__(self, hostname, ctx): | ||
self._ctx = ctx | ||
self._hostname = hostname | ||
self._hostnameBytes = _idnaBytes(hostname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to copy _idnaBytes
and _tolerateErrors
(both of which seem to be trivial) rather than using the private impls from twisted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not best practice to use private implementations. But, if I just copy _tolerateErrors, the pep8 test fails as it contains a bare "except:" statement. I'm not familiar enough with twisted / openssl to know which errors this try/except clause should catch. I can change this to "except Exception:" to both catch all errors and pass the pep8 test, but that doesn't seem right either. However it is the easiest solution. What do you think?
This is the _tolerateErrors code:
def _tolerateErrors(wrapped):
def infoCallback(connection, where, ret):
try:
return wrapped(connection, where, ret)
except:
f = Failure()
logger.exception("Error during info_callback")
connection.get_app_data().failVerification(f)
return infoCallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write # noqa: E722
to disable the pep8 test. See https://github.com/matrix-org/synapse/blob/develop/synapse/handlers/message.py#L583 for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an even easier solution. I'll add this comment.
(btw it's been running fine for me on my HS, so let's get the niggles cleaned up and we can merge this \o/) |
I'm glad it's running fine on your HS. I haven't encountered problems yet either when running this on mine. Let me know if you have any remarks after these commits. |
@matrixbot: test this please |
ok, let's ship this :) thank you! |
next challenge: add a test to sytest to check that it doesn't regres... |
Features -------- - Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439)) - Add /_media/r0/config ([\#3184](#3184)) - speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568)) - implement `summary` block in /sync response as per MSC688 ([\#3574](#3574)) - Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589)) - Add ability to limit number of monthly active users on the server ([\#3633](#3633)) - Support more federation endpoints on workers ([\#3653](#3653)) - Basic support for room versioning ([\#3654](#3654)) - Ability to disable client/server Synapse via conf toggle ([\#3655](#3655)) - Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662)) - Add some metrics for the appservice and federation event sending loops ([\#3664](#3664)) - Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670)) - set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687)) - Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694)) - For resource limit blocked users, prevent writing into rooms ([\#3708](#3708)) Bugfixes -------- - Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658)) - Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661)) - Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676)) - Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677)) - Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681)) - Fix mau blocking calulation bug on login ([\#3689](#3689)) - Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692)) - Improve HTTP request logging to include all requests ([\#3700](#3700)) - Avoid timing out requests while we are streaming back the response ([\#3701](#3701)) - Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713)) - Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710)) - Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719)) - Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723)) - Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732)) Deprecations and Removals ------------------------- - The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703)) Internal Changes ---------------- - The test suite now can run under PostgreSQL. ([\#3423](#3423)) - Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632)) - Tests now correctly execute on Python 3. ([\#3647](#3647)) - Sytests can now be run inside a Docker container. ([\#3660](#3660)) - Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668)) - Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669)) - Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678)) - Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679)) - Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684)) - Rename MAU prometheus metrics ([\#3690](#3690)) - add new error type ResourceLimit ([\#3707](#3707)) - Logcontexts for replication command handlers ([\#3709](#3709)) - Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
I'm afraid I can't really help extending sytest, I'm not familiar with perl. |
This probably needs to wait for #3439 to land to avoid conflicting with it.
Since I need synapse to be available behind a reverse proxy, I added SNI support to outgoing federation requests. This should resolve #1491.
As a side effect, server certificates for federation requests are now validated as well (and connections will fail if validation fails).To bypass the server certificate validation in development environments, where these are mostly self-signed so validation will fail, I added an option to the config file to disable certificate validation. The bypass for certificate validation could be done better, but with the current twisted API (and unfortunately I'm not an expert on twisted), I couldn't find a better solution.UPDATE: as discussed below, self-signed certificates will still work as expected by the federation API. The default behavior of twisted to validate SSL certificates has been bypassed.
Signed-off-by: Jeroen van Os vo.jeroen@gmail.com