Media in the content repo is not authed #870
This is no more security through obscurity than any other key-based authentication mechanism; this is called URL-based authentication. The key in your example, bSRWdHBFqtVzowZDhwRGbzDq, is 24 characters long and uses upper and lower case, so the keyspace is 52^24, which is more than 128 bits. But let's work through your concern. Imagine we write your trivial app and start it running...

Assuming the CDN can store 1 PB (petabyte) and an average image size of 1 KB, that's a trillion images (10^12, or 1,000,000,000,000). Let's assume that you have a really high-speed Internet link and the CDN will let you do 10^8 (100,000,000) queries per second. tcpdump says that a single query is 4.7 KB, so we're doing 470 GB of traffic every second, and apparently both your link and the server are able to handle 3760 Gbps. Let's say that no one notices that the server is getting hit by a denial-of-service attack 6 times larger than anything ever seen before, and they let you keep going for 10 years (60*60*24*365*10 seconds):

52^24 / 10^12 / 10^8 / (60*60*24*365*10)

At this point we can determine that you have a 1 in 4,844,775,310,744 chance of getting a random cat pic... Meanwhile you have a better chance of getting struck by lightning... while drowning, at 1 in 183 million.

Personally I would be more concerned about someone walking up to the server and stealing it... or the server getting hacked due to a bug somewhere... which is why you should be using encrypted chat. This is what an image looks like when it is sent to a group using encrypted chat: https://matrix.org/_matrix/media/v1/download/matrix.org/qctIqdoPymLbqdNpOkWZGtvo If you grab this file (which was a jpeg of a cat) you will notice that it is encrypted.
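For anyone who wants to check the arithmetic, here is a quick sketch reproducing the numbers above (same illustrative assumptions: 24-character mixed-case key, 10^12 stored images, 10^8 queries per second sustained for ten years):

```python
# Back-of-the-envelope reproduction of the brute-force odds quoted above.
keyspace = 52 ** 24                 # 24-char keys, upper + lower case letters
images = 10 ** 12                   # 1 PB of storage at ~1 KB per image
queries_per_second = 10 ** 8        # wildly optimistic sustained guess rate
seconds = 60 * 60 * 24 * 365 * 10   # ten years of uninterrupted guessing

odds = keyspace / (images * queries_per_second * seconds)
print(f"keyspace ~ 2^{keyspace.bit_length() - 1}")        # roughly 2^136, i.e. > 128 bits
print(f"chance of hitting any image: 1 in {odds:,.0f}")   # ~1 in 4.8 trillion
```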
It's still less secure than Hangouts et al though, because it only requires correctly guessing one key rather than two or more. To access a privately shared image via Hangouts, you'd have to gain access to a whole account that has been granted permission to view the image, so you'd have to know both the username and the password, which is much harder to randomly guess. Moreover, some accounts are configured with 2FA, further increasing security. This implementation is far from that, and I think addressing this would be worth doing at some point.
My understanding of your concern was that the media-ids being generated by Synapse left users of Synapse open to a brute-force keyspace attack using a simple app (an understandable concern). The Matrix specification does not provide details on the media-id keyspace, so the keyspace for the media-id can easily be increased to improve security, if required. However, a keyspace attack against the Synapse content repository API implementation is already infeasible, so no change is necessary.

Synapse is the reference implementation for the Matrix specification, and adding user authentication to the content repository API would require a change to the Matrix specification. To propose changes to the Matrix specification, see: https://github.com/matrix-org/matrix-doc/blob/master/CONTRIBUTING.rst

PS If you are concerned about privacy, use encryption.
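To illustrate how cheaply the keyspace could be widened (this is only a sketch, not the media-id generation code Synapse actually uses):

```python
import secrets

def new_media_id() -> str:
    # 32 random bytes -> 256 bits of entropy, rendered as ~43 URL-safe characters,
    # comfortably more than the 24-character example discussed above.
    return secrets.token_urlsafe(32)

print(new_media_id())  # e.g. a string safe to embed in an mxc:// URL
```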
Encryption is nice, but if I have your file, I could deploy infinite time and resources to brute force that encryption. What you want is to make it as hard as possible for me to get your file in the first place, then encrypt it on top of that. That's why people are so reticent to hand over their phones or laptops to border patrol even when they use full disk encryption. Physical security matters perhaps even more than encryption. As such, what concerns me here is that it's so easy to gain physical access (in a sense) to random people's files by stumbling on a random file just by guessing a single key, rather than having to match at least two pairs. In other image sharing services, there are similar long, unique keys to access the image itself, but in addition to that you need to present valid account credentials, and that account has to have been given explicit permission to view the image. I do think it would be prudent to add those additional layers of security here.
I totally agree with kethinov. I imagine deploying fail2ban on the server to monitor 404 errors would slow down an attacker, but it still does not solve the main issue.
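Something along these lines, for the sake of concreteness (a rough, untested sketch assuming nginx's combined access-log format; the filter name, thresholds, and paths are all illustrative):

```ini
# /etc/fail2ban/filter.d/matrix-media-404.conf  (illustrative)
[Definition]
failregex = ^<HOST> .* "(GET|HEAD) /_matrix/media/[^"]*" 404

# /etc/fail2ban/jail.local  (illustrative)
[matrix-media-404]
enabled  = true
port     = http,https
filter   = matrix-media-404
logpath  = /var/log/nginx/access.log
findtime = 60
maxretry = 50
bantime  = 3600
```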
dup of matrix-org/synapse#1403
See also https://github.com/matrix-org/matrix-doc/issues/701 for the spec issue here.
It is highly unlikely someone could guess the media URL; the key in each media link is long enough to make guessing impractical. The more likely attack vector would be obtaining the URL directly somehow: perhaps it is accidentally posted into a channel, someone who already has the link shares it without permission, your browser has a toolbar that is scraping your URL entries without your knowledge, some other person in the channel has malware on their machine that is sending away data it collects from a channel they are participating in with you, etc.
Crossposting for the purposes of visibility (source):
So this comes up on a regular basis, especially from corporate security folks who don't like the idea that a URL leaked in HTTP logs (or proxy logs) etc could then be simply curl'd by any random user to access the content. It's not a matter of the chances of guessing the URL correctly (or the chances of being hit by lightning), but instead whether an attacker who does manage to get the URL automagically gets access to the content too.

One thing we could do is to auth access to the content itself, but this means tracking the event(s) that the content is referenced by, and in turn which users have access to those events and so can view the content. This is a potentially nasty leak of metadata for e2e attachments which we don't currently have otherwise. (It's possible we might need this for quotas as per matrix-org/synapse#3339, but hopefully not.) It's also quite heavy for the media repo to have to check auth rules for a room for every piece of content that is viewed (and is a bit unfortunate if the media repo is otherwise independent of the room server).

An alternative naive solution could be to just track a random bearer token alongside each mxc:// URL for each piece of content, stored in the event and in the repo. Clients would then submit this bearer token as part of the download request.

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Edit: we could of course also mandate that the user has a valid access_token for the server too when they are accessing the media repo, although that doesn't lock access to any particular piece of content.
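To make the "random bearer token alongside each mxc:// URL" idea concrete, here is a minimal sketch (hypothetical helper names; not an existing Synapse or spec API):

```python
import hmac
import secrets

_media_tokens = {}  # media_id -> per-content bearer token (a DB table in practice)

def on_upload(media_id: str) -> str:
    """Mint a random token for this piece of content; the uploader embeds it
    in the event next to the mxc:// URL so recipients can present it later."""
    token = secrets.token_urlsafe(32)
    _media_tokens[media_id] = token
    return token

def may_download(media_id: str, presented_token: str) -> bool:
    """Allow the download only if the presented token matches the stored one."""
    expected = _media_tokens.get(media_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected, presented_token)  # constant-time compare
```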
@turt2live did you have any ideas on how this should/could work?
Not too much beyond the verbose spiel above (which ends with "I have no idea"). In any case, we should consider having a way for users/bridges/bots to say "this is supposed to be unauthed" via the API for things like the IRC bridge. How insane would it be to always end-to-end encrypt media regardless of room?
on second thought, encrypting everything doesn't really help. The authorization token probably makes the most sense, although I'm curious as to how the HMAC stuff would work.
For bridges, I suspect that users will end up having to request the file using a URL from the bridge, and the bridge would have to do the auth dance. Maybe we could add an endpoint that returns a time-limited download URL that the bridge can 302 the user to, so that it won't have to proxy the whole file. But this would also allow checking that the original event hasn't been redacted.
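A sketch of what such a time-limited URL could look like, assuming an HMAC over the media id and an expiry timestamp as the signing scheme (the endpoint and parameter names here are made up, not part of the spec):

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"bridge-local secret, never sent to clients"  # illustrative

def make_download_url(media_id: str, ttl_seconds: int = 300) -> str:
    """Build a short-lived URL the bridge can 302 a user to."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{media_id}:{expires}".encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    # Hypothetical bridge-hosted endpoint; not a Matrix spec endpoint.
    return f"https://bridge.example.org/media/{media_id}?expires={expires}&sig={sig}"

def check_download(media_id: str, expires: int, sig: str) -> bool:
    """Verify signature and expiry before serving or redirecting to the file."""
    if time.time() > expires:
        return False
    payload = f"{media_id}:{expires}".encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```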
Maybe investigate how this is done in Hangouts?
alternatively, the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public. (In addition, we /could/ track in the media repo DB whether a given MXC should be world-readable or not, or whether it should require an access_token for access (in addition to the secret).)
It's worth noting that we probably want to support being able to open media in a separate window, e.g. to view large images or PDFs etc, and I don't think you can make the browser add auth headers in those cases.
there are ways of fixing that - e.g. have the client download the content itself with the right headers and then expose it to the user as a blob URL, which can then be viewed in separate windows/tabs etc.
Turns out that the way we used to do it was to never send access_tokens in requests at all, but send an HMAC(method, url, access_token) and then use the access_token as a shared secret, so that a leaked URL wouldn't leak an individual user's access_token. I assume we didn't do this for Matrix because calculating that HMAC would be too onerous for trivial HTTP clients, hence passing raw access_tokens around. In practice it doesn't buy us anything in this instance, as the resulting URL could still be passed blindly around anyway; we might as well create a new random secret for each URL and use that instead.
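For illustration, the scheme described would look roughly like this (a sketch of the general idea, not the actual pre-Matrix code; the names and token value are made up):

```python
import hashlib
import hmac

def request_signature(method: str, url: str, access_token: str) -> str:
    """Sign a request using the access token as a shared secret,
    so the token itself never appears in the URL or in server logs."""
    message = f"{method.upper()} {url}".encode()
    return hmac.new(access_token.encode(), message, hashlib.sha256).hexdigest()

# The client would send e.g. ?sig=<value> (or a header) instead of the raw token;
# the server recomputes the HMAC with the token it has on file and compares.
sig = request_signature("GET",
                        "/_matrix/media/v1/download/example.org/abc123",
                        "example_access_token")
```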
(cf https://github.com/matrix-org/matrix-doc/issues/1043 for "access tokens suck")
What if each user got their own unique link to the media, or maybe a common link with a personal auth token based on their ID? When accessing the media, the server could check that the access token is correct for the user and that the user is authenticated.
In reply to @ara4n:
The reason that I suggested having the bridge do the auth dance, rather than forwarding the secret in the querystring, was so that a file that's redacted Matrix-side would become inaccessible to bridged users.
I would just say that a file can be uploaded with a token or without a token. If it's uploaded with a token, then downloads need to be authed; if it's uploaded without a token, then it's a free-for-all.
In reply to @user318:
That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.
I do not actually know how it works with e2e. I thought files were embedded there as a base64-encoded message and not stored as media.
Messages have a size limit, so you can't store files within the message itself. You also don't want to send the whole file to everyone until they request it. e2e file events are basically just pointers to an encrypted blob in the media store, along with the decryption key.
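Roughly, the content of such an event looks like this (a sketch following the spec's EncryptedFile schema from memory; the values are placeholders and the exact fields may differ):

```python
# Sketch of an e2e file message's content: a pointer to an encrypted blob
# in the media repo plus the material needed to decrypt it client-side.
encrypted_file_event_content = {
    "msgtype": "m.file",
    "body": "cat.jpg",
    "file": {
        "v": "v2",
        "url": "mxc://example.org/qctIqdoPymLbqdNpOkWZGtvo",  # opaque encrypted blob
        "key": {                      # JSON Web Key for AES-256-CTR
            "kty": "oct",
            "alg": "A256CTR",
            "ext": True,
            "key_ops": ["encrypt", "decrypt"],
            "k": "<base64url-encoded key>",
        },
        "iv": "<base64-encoded initialisation vector>",
        "hashes": {"sha256": "<base64-encoded hash of the ciphertext>"},
    },
}
```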
I've written a spec proposal for solving this over at https://github.com/matrix-org/matrix-doc/issues/701, review welcome on the googledoc.
Is matrix-org/synapse#1263 going to be taken care of with this change as well? I'm only seeing concerns about GDPR erasure, which I presume means when someone deactivates and deletes their account. Right now it's fairly easy to have a tragedy if an inappropriate attachment link goes out over a bridge.
Reading this thread, it appears most people mentioned brute-force attacks or someone providing the URL to other people. What I'm really concerned about is Google or other search engines ending up indexing these images, because they are, after all, public URLs. If someone posts the URL in public (like the OP of this thread), the image may potentially become indexed. This issue is an important one that needs to be resolved, especially on a project that treats encryption and privacy as high priorities :)
I'm going to move this to the matrix-doc repo since this would need to be specced before synapse can implement anything.
And now that we've transferred it, it seems that matrix-org/matrix-spec-proposals#3796 is the duplicate for this.
A conversation in a public space is still public, even if the conversation is between three people. If the conversation should be secret, or the participants always want privacy, they can choose to encrypt all the communication. That renders the media URL useless, as all you can get from the link is an encrypted blob - as pointed out with the linked cat picture. In many ways the conclusion is simple:
There's no reason to trust the server's implementation (or lack thereof) of anything if there's E2EE involved anyway...
matrix-org/matrix-spec-proposals#3796 is a proposal to fix it; this is the canonical issue.
The assertions in this thread seem to assume the following (please correct me if I am wrong):
This "too negligible for most people to actually communicate it properly" approach is personally making me feel uneasy, even if it were possibly more likely for me to get struck by lighting, considering that there are opportunities (in the future) to actually bring the chance of anyone ever receiving anything down to an absolute zero. |
A comment on this: as far as I can see, this will break media being shared across bridges, unless those bridges relay the binary data directly. But in turn, that will defeat the purpose of protecting the media, since it will be directly available on another platform, maybe without the original poster being aware of this.
Correct me if I'm wrong, but: this makes Matrix a great filesharing host. Just create an anonymous account and an unencrypted non-public room, upload whatever you want in chunks as big as the server allows, then let the world know about the URLs to be consumed by tools like JDownloader. With some more effort on the client side, having public access to encrypted chunks is even more perfidious. And the server operator is probably liable for any illegal content (hello DMCA takedown, or worse).
@Iruwen In most (many?) legal regimes, you are only liable for things you know to be hosting, and become liable once you've been informed of the case (and often, the material must also be "manifestly illegal" or similar). Simply having something "bad" on your server doesn't automatically make you liable. A lot of large services (such as YouTube) will automatically take something down as soon as they are notified that it could be problematic, because that's when their legal liability starts. But most of the time they don't care to check whether it is actually problematic, especially for copyright matters & fair use/dealing (hence DMCA takedown requests being weaponised).
Until this is resolved, I added a Lua script in my nginx reverse proxy which only allows media access for IP addresses that successfully accessed the /capabilities or /sync endpoints, which seem to be two authenticated ones that are reliably accessed first.
Could you please share the config? How can one achieve this? (A full example would be great, including the Docker commands if possible; AFAIK nginx doesn't ship with the Lua engine anymore, so I'd need to do something extra to have Lua alongside nginx.)
I can do that later, yeah. In my case it's integrated with https://github.com/spantaleev/matrix-docker-ansible-deploy and thus involves Traefik as well, but it should be easy to adapt.
Be aware that this will break federation: it will mean that users on other servers will be unable to view media uploaded on your server.
Yeah I'm not federating, thanks for pointing that out. I guess if you're looking for some extra privacy without aiming for the obvious solution that is encryption, you'll have a specific reason for that tradeoff.
@turt2live thanks!
@turt2live the original link in my original post can still be viewed without authentication. Has this change gone live yet on the matrix.org homeserver? And will it apply to all previous media, or only to new media shared after the change goes live?
The matrix.org homeserver's rollout is being worked out following the spec change - there should be more detail in a few weeks (watch the matrix.org blog for updates). The spec change does not add authentication to existing endpoints, but rather introduces new ones. Servers are being advised to freeze the unauthenticated endpoints, like the one linked above, rather than add authentication retroactively. Media from before the freeze will remain accessible on the old endpoints while new media will only be accessible on the new endpoints. This is what matrix.org plans to do as well.
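In practice, downloading through the new authenticated endpoints looks roughly like this (a sketch: the path follows the authenticated-media spec change and a given server's rollout may differ; the token is a placeholder and the media id is the one from the original post):

```python
import requests

HOMESERVER = "https://matrix.org"
ACCESS_TOKEN = "<your access token>"  # placeholder; required on the new endpoints
SERVER_NAME, MEDIA_ID = "matrix.org", "bSRWdHBFqtVzowZDhwRGbzDq"  # the OP's example

# Authenticated download via the new-style endpoint, unlike the frozen
# unauthenticated /_matrix/media/... paths discussed above.
resp = requests.get(
    f"{HOMESERVER}/_matrix/client/v1/media/download/{SERVER_NAME}/{MEDIA_ID}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
with open("media.bin", "wb") as f:
    f.write(resp.content)
```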
Example: this was shared in a private three-person chat, but anyone can view it: https://matrix.org/_matrix/media/v1/download/matrix.org/bSRWdHBFqtVzowZDhwRGbzDq
Most people I've recruited into Matrix are Google Hangouts refugees looking for an open platform. On Hangouts, you cannot view the web URL of an image in this way unless you're authenticated with the server and the user has shared it with you in a chat.
Would it be possible to support moving past security through obscurity at some point? Or, failing that, at least expire the images after a week or so?
This is concerning because it would be rather trivial for someone to write a simple app querying random alphanumeric strings to harvest images people have shared in private conversations.