Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typescriptify & use service worker for MSC3916 authentication #27326

Merged
merged 27 commits into from
May 14, 2024

Conversation

turt2live
Copy link
Member

@turt2live turt2live commented Apr 11, 2024

Requires matrix-org/matrix-react-sdk#12414
Requires matrix-org/matrix-js-sdk#4169

Notes: Support authenticated media downloads.

Checklist

  • Tests written for new code (and old code if feasible).
  • New or updated public/exported symbols have accurate TSDoc documentation.
  • Linter and other CI checks pass.
  • Sign-off given on the changes (see CONTRIBUTING.md).

@turt2live
Copy link
Member Author

I don't think it's really possible to write tests for this at the moment. It's best tested through integration tests, and those use Synapse, which doesn't currently support MSC3916.

This PR was manually tested against MMR using t2bot/matrix-media-repo#509, and is confirmed working when not using MMR as well.

@turt2live
Copy link
Member Author

I've been informed that Playwright can mock the endpoints required to see if this is working, so I'll add that to my list.

// @ts-expect-error - service worker types are not available. See 'fetch' event handler.
skipWaiting();

self.addEventListener("message", (event) => {

Check failure

Code scanning / SonarCloud

Origins should be verified during cross-origin communications

<!--SONAR_ISSUE_KEY:AY7PWlWeuP8G6MXDcCy--->Verify the origin of the received message. <p>See more on <a href="https://sonarcloud.io/project/issues?id=element-web&issues=AY7PWlWeuP8G6MXDcCy-&open=AY7PWlWeuP8G6MXDcCy-&pullRequest=27326">SonarCloud</a></p>
@turt2live
Copy link
Member Author

I believe the SonarCloud check will need bypassing/resolving for this PR. I can't reasonably write tests at this layer, Math.random() is deliberate, and postMessage usage is scoped to the service worker and browser channels.

@turt2live
Copy link
Member Author

The layered build also appears to not check out the react-sdk?? Feels like something not broken by my PR.

@turt2live turt2live marked this pull request as ready for review April 22, 2024 21:02
@turt2live turt2live requested a review from a team as a code owner April 22, 2024 21:02
@turt2live turt2live requested review from richvdh and robintown April 22, 2024 21:02
@t3chguy
Copy link
Member

t3chguy commented Apr 23, 2024

I believe the SonarCloud check will need bypassing/resolving for this PR

Dismiss them then :D

image

src/vector/platform/WebPlatform.ts Outdated Show resolved Hide resolved
res/sw.js Show resolved Hide resolved
src/serviceworker/index.ts Outdated Show resolved Hide resolved
Comment on lines 91 to 96
// If we have server support (and a means of authentication), rewrite the URL to use MSC3916 endpoints.
if (serverSupportMap[csApi].supportsMSC3916 && accessToken) {
// Currently unstable only.
// TODO: Support stable endpoints when available.
url = url.replace(/\/media\/v3\/(.*)\//, "/client/unstable/org.matrix.msc3916/media/$1/");
} // else by default we make no changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewriting the URL in the depths of the serviceworker feels very, very magical. Shouldn't this happen in the js-sdk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the js-sdk? Our URL generation for image elements and etc is synchronous, and making it async would require massive refactoring. This prevents us from doing a /versions check.

Instead, the idea is we let the service worker rewrite the url and if that fails or is unsupported, the natural fallback behaviour exists instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of a method on MatrixClient or something for turning mxc URIs into http URLs. Cache the versions check early on, I guess.

Instead, the idea is we let the service worker rewrite the url and if that fails or is unsupported, the natural fallback behaviour exists instead.

Seems like we can't rely on this anyway, because we're going to need authentication. Or rather: if it fails often enough that we need that fallback, then the whole approach seems doomed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions which convert MXC URIs to HTTP URLs are not async-capable, unfortunately, and there's nowhere in the app where we can realistically cache /versions early enough. By the time we load the MatrixClientPeg for instance, the user's own avatar is already shown - service workers allow us to intercept this (relatively) synchronous call with an async one.

While the MSC is unstable I think we can (and should) maintain the fallback, though as we get closer to freezing the unauthenticated endpoints mentioned in MSC3916, we can change the js-sdk to use the authenticated endpoints by default (or always?). The service worker would detect that route and append an access token rather than rewriting it at that point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well: we can't use the authenticated endpoint until we get hold of an access token, and getting hold of an access token is an async operation. That seems to me a good time to decide whether we ought to use that access token in media requests.

I guess that's a bunch more work, though :(.

The fact we might be able to get rid of this code in future does little to quell my concerns. Experience strongly suggests that this code will still be here in 8 years time.

I think my concerns here are twofold:

  • We're basically saying the js-sdk is useless without this service worker. That's actually inherent in the fact we have to add authorization, but the less the service worker has to do the better. Anyway, at the very least we need to add documentation to the js-sdk so that people using it understand what their service worker has to do.

  • I feel like rewriting URLs in a service worker is subtle, unexpected behaviour. A future generation of developers is going to be extremely confused by this. For reasons I can't entirely explain, it feels worse than "just" adding an Authorization header. Hence, I'm pretty keen we do our best to find alternatives.

At the end of the day, if we can't do better then we can't do better. We can at least do our best to document our way out of it though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tldr - js-sdk PR for documentation is here: matrix-org/matrix-js-sdk#4185

Well: we can't use the authenticated endpoint until we get hold of an access token, and getting hold of an access token is an async operation. That seems to me a good time to decide whether we ought to use that access token in media requests.

I guess that's a bunch more work, though :(.

The issue is that the call stack is not capable of having any async code in it. The getHttpUriForMxc function in the js-sdk (and client.mxcUrlToHttp, which just calls getHttpUriForMxc) is not async - introducing a promise of any kind would be a breaking change. We could add secondary functions (getAuthenticatedHttpUriForMxc, for example), though then we reach the react-sdk layer where the Media class exists and is engrained throughout the entire layer, and of course doesn't support async operations in its general cases either. This same problem propagates outwards to the remaining components, functions, classes, etc of the react-sdk.

It is indeed a bunch more work to fix this, and matrix-org/matrix-react-sdk@5cdd706 only covers some of it (as an alternative I wrote to avoid using service workers). Even if we replace the URL rewrite with a js-sdk function call, we're still collecting blobs and filling up our memory buffers. The hardest things to fix are the hidden uses of non-image elements, like in the case of the composer where avatars are applied using CSS.

Service workers cover all of these nuanced cases, though ideally I think the js-sdk would indeed be producing authentication-first URLs (which the service worker can detect and append an Authorization header to). This is however a pretty breaking change to the app, so at least while things are unstable I think it's best to keep the fallback.

The fact we might be able to get rid of this code in future does little to quell my concerns. Experience strongly suggests that this code will still be here in 8 years time.

This is my experience too, fwiw. I'm hopeful our rollout plan (internal only) will help actually make this code temporary and work to eradicate the unauthenticated option, though the service worker itself will need to be maintained. We can at least remove/reduce some of the magic as the MSC progresses towards total stability.

  • We're basically saying the js-sdk is useless without this service worker. That's actually inherent in the fact we have to add authorization, but the less the service worker has to do the better. Anyway, at the very least we need to add documentation to the js-sdk so that people using it understand what their service worker has to do.

I'm not sure I'd classify it as "useless" given servers aren't likely to turn on an unstable feature just yet, but it is something worth documenting. I've done so here: matrix-org/matrix-js-sdk#4185

I suspect it's possible for the js-sdk to support these sorts of operations when it's running in a NodeJS environment, though in a browser environment it's almost certainly a service worker problem given the consumer's (react-sdk) usage.

  • I feel like rewriting URLs in a service worker is subtle, unexpected behaviour. A future generation of developers is going to be extremely confused by this. For reasons I can't entirely explain, it feels worse than "just" adding an Authorization header. Hence, I'm pretty keen we do our best to find alternatives.

Mentioned above, it is pretty magical. With the browser devtools making it clear what's happening and the rollout strategy though, I think we'll be okay. At the very least we can switch to using stable endpoints in the js-sdk pretty quickly by default (once the spec process allows us to), and convert the service worker to 'just' adding the Authorization header in place. This would break compatibility with older servers, but at least on T&S we consider that to be a feature (once the MSC is stable and/or released - doing so right now would require servers to advertise unstable functionality, which effectively bypasses the MSC process and leads to defacto spec).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tldr - js-sdk PR for documentation is here: matrix-org/matrix-js-sdk#4185

Thanks for starting to document this. It certainly helps quell some of my concerns here.

The issue is that the call stack is not capable of having any async code in it. The getHttpUriForMxc function in the js-sdk (and client.mxcUrlToHttp, which just calls getHttpUriForMxc) is not async - introducing a promise of any kind would be a breaking change.

I don't really follow this. Why can't we make the /versions request before we construct a MatrixClient, just as we currently make a /login request? That would mean that client.mxcUrlToHttp could still be synchronous, but do the right thing.

So is the problem that there are some places that we construct media links where there is no access to a MatrixClient? Or before a MatrixClient is even constructed? It feels like such things probably need to work without authentication anyway?

... which is not to say it's not a bunch of work, but I'm still keen to understand if we're doing this for pragmatic reasons or because it's the only way it can possibly work.

At the very least we can switch to using stable endpoints in the js-sdk pretty quickly by default (once the spec process allows us to), and convert the service worker to 'just' adding the Authorization header in place.

Well, I'll believe it when I see it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving comment into thread

@t3chguy says:

I don't really follow this. Why can't we make the /versions request before we construct a MatrixClient, just as we currently make a /login request? That would mean that client.mxcUrlToHttp could still be synchronous, but do the right thing.

This would break the offline mode in Element Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really follow this. Why can't we make the /versions request before we construct a MatrixClient, just as we currently make a /login request? That would mean that client.mxcUrlToHttp could still be synchronous, but do the right thing.

So is the problem that there are some places that we construct media links where there is no access to a MatrixClient? Or before a MatrixClient is even constructed? It feels like such things probably need to work without authentication anyway?

... which is not to say it's not a bunch of work, but I'm still keen to understand if we're doing this for pragmatic reasons or because it's the only way it can possibly work.

/login I think is trivial to support. If a user has an existing session though, we seem to show their avatar very early in the cycle, making some code a bit racy. The earliest we can check /versions is sometimes too late, at least as far as my browser is concerned: matrix-org/matrix-react-sdk@0a1d45c (this commit has other problems, like lack of re-checking server support, linting, etc).

The service worker avoids the lack of MatrixClient by reaching into idb directly for the token. The postMessage for user information also reaches directly into localstorage, avoiding MatrixClient too.

This is out of pragmatism imo, though I'm still open to suggestions/hints on where to put this code otherwise.

src/serviceworker/index.ts Show resolved Hide resolved
src/vector/platform/WebPlatform.ts Outdated Show resolved Hide resolved
@turt2live
Copy link
Member Author

I believe the SonarCloud check will need bypassing/resolving for this PR

Dismiss them then :D

image

I can't for some reason. I could on the react-sdk though.

@turt2live turt2live requested a review from richvdh April 23, 2024 20:11
src/vector/platform/WebPlatform.ts Outdated Show resolved Hide resolved
src/vector/platform/WebPlatform.ts Outdated Show resolved Hide resolved
Comment on lines 91 to 96
// If we have server support (and a means of authentication), rewrite the URL to use MSC3916 endpoints.
if (serverSupportMap[csApi].supportsMSC3916 && accessToken) {
// Currently unstable only.
// TODO: Support stable endpoints when available.
url = url.replace(/\/media\/v3\/(.*)\//, "/client/unstable/org.matrix.msc3916/media/$1/");
} // else by default we make no changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well: we can't use the authenticated endpoint until we get hold of an access token, and getting hold of an access token is an async operation. That seems to me a good time to decide whether we ought to use that access token in media requests.

I guess that's a bunch more work, though :(.

The fact we might be able to get rid of this code in future does little to quell my concerns. Experience strongly suggests that this code will still be here in 8 years time.

I think my concerns here are twofold:

  • We're basically saying the js-sdk is useless without this service worker. That's actually inherent in the fact we have to add authorization, but the less the service worker has to do the better. Anyway, at the very least we need to add documentation to the js-sdk so that people using it understand what their service worker has to do.

  • I feel like rewriting URLs in a service worker is subtle, unexpected behaviour. A future generation of developers is going to be extremely confused by this. For reasons I can't entirely explain, it feels worse than "just" adding an Authorization header. Hence, I'm pretty keen we do our best to find alternatives.

At the end of the day, if we can't do better then we can't do better. We can at least do our best to document our way out of it though.

src/serviceworker/index.ts Outdated Show resolved Hide resolved
@t3chguy

This comment was marked as duplicate.

@robintown robintown removed their request for review May 2, 2024 15:54
@turt2live turt2live requested a review from richvdh May 13, 2024 17:08
@turt2live
Copy link
Member Author

@richvdh please take a look - if there's a blocking concern I missed, let me know.

src/serviceworker/index.ts Outdated Show resolved Hide resolved
src/serviceworker/index.ts Outdated Show resolved Hide resolved
turt2live and others added 3 commits May 14, 2024 13:02
@turt2live
Copy link
Member Author

I'm force-merging this to bypass the SonarCloud test requirements check. All other required checks are passing, and SonarCloud's report has been reviewed for solvable problems.

Tests for this would likely not appear in this layer anyways.

@turt2live turt2live merged commit bcd5c83 into develop May 14, 2024
22 of 23 checks passed
@turt2live turt2live deleted the travis/msc3916 branch May 14, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants