This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Documentation for media_storage_providers #7140

Open
djschilling opened this issue Mar 25, 2020 · 12 comments
Labels
A-Docs things relating to the documentation A-Media-Repository Uploading, downloading images and video, thumbnailing T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution

Comments

@djschilling

I want to use the s3_storage_provider.S3StorageProviderBackend.

I configured it, and when images are sent via my Synapse server they are also stored in the S3 bucket.
Here is how I configured it:

media_storage_providers:
- module: s3_storage_provider.S3StorageProviderBackend
  store_local: False
  store_remote: True
  store_synchronous: True
  config:
    bucket: synapse1
    endpoint_url: $HIDDEN_ENDPOINT
    access_key_id: $HIDDEN_KEY
    secret_access_key: $HIDDEN_KEY

But the images are still also stored locally on disk at /var/lib/matrix-synapse/media, and if I remove the folder /var/lib/matrix-synapse/media the images are no longer shown, although they are still in the S3 bucket. So it seems to me that s3_storage_provider.S3StorageProviderBackend does nothing but store the data as an additional backup?

I also could not find any documentation about media_storage_providers in this repo.
Can someone please explain what they are and how they are supposed to work?

My use case is that I only want to use the S3 bucket and no local storage. Is this possible with media_storage_providers?

@clokep clokep added A-Docs things relating to the documentation A-Media-Repository Uploading, downloading images and video, thumbnailing labels Apr 2, 2020
@tristanlins
Contributor

I'm also interested in what the media storage providers are for.

For me too, the file is stored locally in the file system AND on S3.
The strange thing is that store_local: False means that the file is only saved in the file system and NOT on S3!

But if I delete the local directory, all files are still available to me.

I use a Minio server as S3 storage.

@tristanlins
Contributor

What I just noticed is that all pre-generated thumbnails are ONLY stored on S3, as expected. 🤔

@clokep
Member

clokep commented Apr 13, 2020

> I also could not find any documentation about media_storage_providers in this repo.

The documentation is pretty light, but seems to be at https://github.com/matrix-org/synapse/blob/develop/docs/media_repository.md; the default config could certainly offer a bit more info:

# Media storage providers allow media to be stored in different
# locations.
#
#media_storage_providers:
#  - module: file_system
#    # Whether to write new local files.
#    store_local: false
#    # Whether to write new remote media
#    store_remote: false
#    # Whether to block upload requests waiting for write to this
#    # provider to complete
#    store_synchronous: false
#    config:
#      directory: /mnt/some/other/directory

Looking at some of the code it seems that the "local" vs. "remote" in those configurations is whether the media was uploaded directly to this server vs. whether it was received over federation (it does not mean whether the data is stored "locally" on the server vs. "remote on S3", which is how I originally read it). See

"store_local", # Whether to store newly uploaded local files
"store_remote", # Whether to store newly downloaded remote files
"store_synchronous", # Whether to wait for successful storage for local uploads
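To make that local/remote distinction concrete, here is a minimal sketch of how these flags gate whether a provider receives a file. All class and field names here are illustrative, not Synapse's actual implementation; the only assumption taken from the source is that "local" means uploaded directly to this homeserver and "remote" means fetched over federation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    # None => media uploaded directly to this homeserver ("local");
    # otherwise the name of the origin server it was fetched from ("remote").
    server_name: Optional[str]
    file_id: str


class ProviderWrapper:
    """Hypothetical wrapper applying store_local / store_remote to a backend."""

    def __init__(self, backend, store_local: bool, store_remote: bool):
        self.backend = backend
        self.store_local = store_local
        self.store_remote = store_remote

    def store_file(self, path: str, file_info: FileInfo) -> None:
        is_local = file_info.server_name is None
        if is_local and not self.store_local:
            return  # skip media uploaded to this server
        if not is_local and not self.store_remote:
            return  # skip media received over federation
        self.backend.store_file(path, file_info)


class RecordingBackend:
    """Stand-in backend that just records which paths it was given."""

    def __init__(self):
        self.stored = []

    def store_file(self, path: str, file_info: FileInfo) -> None:
        self.stored.append(path)


backend = RecordingBackend()
wrapper = ProviderWrapper(backend, store_local=False, store_remote=True)
wrapper.store_file("local_content/abc", FileInfo(None, "abc"))
wrapper.store_file("remote_content/example.org/def", FileInfo("example.org", "def"))
print(backend.stored)  # → ['remote_content/example.org/def']
```

So with store_local: False, a direct upload never reaches the provider at all, while federated media still does; neither flag says anything about where the file lives on disk.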

What's your configuration for backup_media_store_path? It looks like a file system store gets enabled if backup_media_store_path is set, see

storage_providers = [
    {
        "module": "file_system",
        "store_local": True,
        "store_synchronous": synchronous_backup_media_store,
        "store_remote": True,
        "config": {"directory": backup_media_store_path},
    }
]
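If that reading is right, backup_media_store_path is just legacy shorthand for a file_system storage provider, and the equivalent explicit configuration would be roughly the following (the directory path is illustrative; store_synchronous mirrors the synchronous_backup_media_store option in the snippet above):

```yaml
media_storage_providers:
  - module: file_system
    store_local: true
    store_remote: true
    store_synchronous: true
    config:
      directory: /mnt/backup/media
```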

@tristanlins
Contributor

I have not specified backup_media_store_path; I will take a deeper look into this...

@tristanlins
Contributor

The backup_media_store_path is not the problem; it cannot be used together with media_storage_providers:

if backup_media_store_path:
    if storage_providers:
        raise ConfigError(
            "Cannot use both 'backup_media_store_path' and 'storage_providers'"
        )

@tristanlins
Contributor

Okay, I think I understand how the media storage providers are meant to work.
The bad news: it is not possible to replace the local storage (media_store_path) with the remote storage.

The storage of the media is primarily managed by the MediaStorage and not the StorageProvider.

fname = yield self.media_storage.store_file(content, file_info)

(plus three further call sites of the form output_path = yield self.media_storage.store_file(...))

This in turn always saves the file first in the local directory:

with self.store_into_file(file_info) as (f, fname, finish_cb):
    # Write to the main repository
    yield defer_to_thread(
        self.hs.get_reactor(), _write_file_synchronously, source, f
    )
    yield finish_cb()

And then this locally stored file is passed on to the StorageProviders:

@defer.inlineCallbacks
def finish():
    for provider in self.storage_providers:
        yield provider.store_file(path, file_info)

I'm slowly beginning to understand what the s3_media_upload.py cleanup job is all about.
https://github.com/matrix-org/synapse-s3-storage-provider#regular-cleanup-job

However, it should not be a problem to simply delete the local files from media_store_path at regular intervals. In that case the file is then delivered by the first StorageProvider that still holds it:
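As a minimal sketch of that "delete local files periodically" idea (paths and the age threshold are illustrative, and this is only an approximation: the real s3_media_upload cleanup job also consults the Synapse database for last-access times, whereas this uses only file mtimes):

```python
import os
import time


def prune_local_media(media_store_path: str, max_age_days: float) -> list:
    """Delete regular files under media_store_path whose mtime is older
    than max_age_days; return the list of deleted paths."""
    cutoff = time.time() - max_age_days * 86400
    deleted = []
    for root, _dirs, files in os.walk(media_store_path):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                deleted.append(path)
    return deleted
```

Any file removed this way would afterwards be served by the first configured storage provider that still holds it, per the fetch fallback quoted in this comment.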

path = self._file_info_to_path(file_info)
local_path = os.path.join(self.local_media_directory, path)
if os.path.exists(local_path):
    return FileResponder(open(local_path, "rb"))
for provider in self.storage_providers:
    res = yield provider.fetch(path, file_info)
    if res:
        return res

I hope I have analyzed it correctly.
Perhaps a maintainer can confirm whether this is right?

@babolivier
Contributor

Hey @tristanlins, yes this seems correct to me.

@nemani

nemani commented Jul 16, 2020

Can we add a link to this issue at the relevant place in the sample config file?

Thanks!

@clokep clokep added the Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution label Aug 4, 2020
@ftpmorph

ftpmorph commented Feb 3, 2021

Note there is an alternative solution which in theory should be easy to configure so that the only media directory is an S3 bucket. I'm going to try it myself when I set my server up within the next few days.

You install and configure Goofys, make sure it mounts an S3 bucket directory on boot, and then you can simply change the media directory in the Matrix config to that one and turn off the S3 storage provider.

As far as Matrix is concerned it is writing media to the local filesystem but it'll be transparently using your S3 bucket instead.

This is the S3 config from the popular Ansible Docker deploy.

@sorcer1122

> Note there is an alternative solution which in theory should be easy to config so that the only media directory is an S3 bucket. I'm going to try it myself when I set my server up within the next few days.
>
> You install and configure Goofys, make sure it mounts an S3 bucket directory on boot, and then you can simply change the media directory in the Matrix config to that one and turn off the S3 storage provider.
>
> As far as Matrix is concerned it is writing media to the local filesystem but it'll be transparently using your S3 bucket instead.
>
> This is the S3 config from the popular Ansible Docker deploy.

Any luck with this? I am looking for the same solution on how to replace local media storage with S3.

@djmaze

djmaze commented Jan 8, 2023

I bet using goofys works worse and is more brittle than just using the s3 storage provider. The local storage even has the advantage of being a cache for recently uploaded files.

Why not just run the clean-up script regularly? (Or run synapse in ephemeral containers, which, at least for me, is already sufficient?)

@sorcer1122

> I bet using goofys works worse and is more brittle than just using the s3 storage provider. The local storage even has the advantage of being a cache for recently uploaded files.
>
> Why not just run the clean-up script regularly? (Or run synapse in ephemeral containers, which, at least for me, is already sufficient?)

A couple of reasons:

  1. I want everything to go into S3 for speed and reliability reasons
  2. Much easier for me to clean up the S3 bucket


9 participants