
Automatic cleanup for expired files #6687

Closed
shishouyuan opened this issue Jul 1, 2023 · 14 comments
Labels
Category:Enhancement Add new functionality

Comments

@shishouyuan

shishouyuan commented Jul 1, 2023

Why do the expired files have to be cleaned up manually?

There was a previous issue, #2622, asking for this feature, but it ended with only a command-line tool being provided in #4256.
As described in the documentation:

> In case of expired uploads, the blob and blob.info files will not be removed automatically. Thus a lot of data can pile up over time wasting storage space.

What is the reason that the expired files have to be cleaned manually? Will the expired files in the trash bin be cleaned automatically? The documentation doesn't say. Should I periodically use the command-line tool to do the cleanup, or leave it to oCIS?

Describe the solution you'd like

Automatically do the cleanup.

Describe alternatives you've considered

At least provide suggestions about this in the documentation.

Additional context

None.

@shishouyuan
Author

Does it mean I have to call `ocis storage-users trash-bin purge-expired` for every user, and there is no way to do the cleanup system-wide? I am quite confused as to why the default behavior of oCIS is to never do the cleanup.

> STORAGE_USERS_PURGE_TRASH_BIN_USER_ID is used to obtain space trash-bin information and takes the system admin user as the default which is the OCIS_ADMIN_USER_ID but can be set individually. It should be noted, that the OCIS_ADMIN_USER_ID is only assigned automatically when using the single binary deployment and must be manually assigned in all other deployments. The command only considers spaces to which the assigned user has access and delete permission.

@stale

stale bot commented Sep 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status:Stale label Sep 16, 2023
@ethan-tqa

Not sure if you have found a solution for yourself, but in case you haven't, here is a hint from the documentation. In essence, you must manually remove the stale uploads by executing the command `ocis storage-users uploads clean`. Files that have been stale for more than 24 hours will be removed. That means you can set up a simple cron job that runs the command once every 24 hours or so.
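A sketch of such a cron entry (the schedule, the `/etc/cron.d` path, and the binary location are illustrative assumptions, not taken from the oCIS docs):

```shell
# /etc/cron.d/ocis-uploads-clean -- illustrative path and schedule
# Remove stale upload artifacts once a day at 03:00, running as root.
0 3 * * *  root  /usr/bin/ocis storage-users uploads clean
```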

@wkloucek
Contributor

> Not sure if you have found a solution for yourself, but in case you haven't, here is a hint from the documentation. In essence, you must manually remove the stale uploads by executing the command `ocis storage-users uploads clean`. Files that have been stale for more than 24 hours will be removed. That means you can set up a simple cron job that runs the command once every 24 hours or so.

Thanks for summarizing this!

When it comes to the trash bin cleanup: you basically need to do the same with `ocis storage-users trash-bin purge-expired`. You don't need to run it for every user / project space, since it is a global cleanup (for all personal / project spaces).

> What is the reason that the expired files have to be cleaned manually?

oCIS doesn't have something like a cron scheduler, so the responsibility for scheduling these maintenance jobs is currently delegated to the deployment. One reason for not having a cron scheduler in oCIS is that oCIS is a scalable distributed system, which would then need some form of locking so that cron jobs only run once.

In the oCIS Kubernetes Helm Chart we support this out of the box by implementing Kubernetes CronJobs (https://github.com/owncloud/ocis-charts/blob/ebb85842659e0bdb951be58867f8e4ce9f33bba2/charts/ocis/values.yaml#L1382-L1403, https://github.com/owncloud/ocis-charts/blob/ebb85842659e0bdb951be58867f8e4ce9f33bba2/charts/ocis/templates/storageusers/jobs.yaml).
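For non-Kubernetes deployments, the trash-bin cleanup can be scheduled the same way; a sketch (path, schedule, and binary location are illustrative assumptions; the environment variable behavior is the one quoted from the docs earlier in this thread):

```shell
# /etc/cron.d/ocis-trash-purge -- illustrative path and schedule
# Purge expired trash-bin items across all personal/project spaces.
# STORAGE_USERS_PURGE_TRASH_BIN_USER_ID defaults to OCIS_ADMIN_USER_ID;
# outside the single-binary deployment it must be set explicitly.
30 3 * * *  root  /usr/bin/ocis storage-users trash-bin purge-expired
```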

@stale stale bot removed the Status:Stale label Oct 23, 2023

stale bot commented Mar 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status:Stale label Mar 17, 2024
@micbar micbar added the Category:Enhancement Add new functionality label Mar 18, 2024
@stale stale bot removed the Status:Stale label Mar 18, 2024
@micbar
Contributor

micbar commented Jun 4, 2024

Considering this solved. Closing.

@micbar micbar closed this as completed Jun 4, 2024
@JonnyBDev

JonnyBDev commented Aug 10, 2024

Hello @micbar ,

I am personally not really considering this "solved". We are currently running oCIS version 5.x on a system with a 1 TB SSD. We are using S3 as the storage backend for the files. As of now we have around 9 TB in our bucket.

Yesterday, we received alarms from our monitoring because the SSD (1 TB) reached 100% usage due to "failed uploads" as described above. A lot of them were past their expiration date. I couldn't even get into the oCIS container. I was surprised that oCIS stores files like this on the disk itself and not in S3. I'm not sure what the intent behind this is (probably an upload buffer?), but that doesn't really matter as long as those files get deleted.

In our case, they did not get deleted. After taking a look at the documentation (https://doc.owncloud.com/ocis/next/deployment/services/s-list/storage-users.html), the command `ocis storage-users uploads clean` shouldn't be used anymore because it's deprecated.

I used it anyway because I did not see any other way to get rid of the files besides deleting them via `rm`. Could you give me feedback on this? How am I supposed to clean up the files if the command is deprecated and oCIS is not handling this on its own?

@micbar
Contributor

micbar commented Aug 10, 2024

There is a new command. It should be documented. And yes, this is just a buffer to speed things up. Direct upload to S3 has been evaluated, and it was far too slow.

I consider this solved because there is a new command. We are running a large SaaS service in a similar configuration with S3.

@JonnyBDev

JonnyBDev commented Aug 12, 2024

Thanks for your fast reply. Unfortunately, I couldn't find a working command to clean the sessions. I found that you could use the sessions command with a flag to clean the items, at least that's what the documentation says.

```shell
ocis storage-users uploads sessions \
     --expired=true \
     --clean
```

Unfortunately, running this command returns an error that the `--clean` flag does not exist. The old command `ocis storage-users uploads clean` still works. It's the only command that works, but it's marked as `deprecated, do not use`, which is an uncomfortable situation.

```
Incorrect Usage: flag provided but not defined: -clean

NAME:
   ocis storage-users uploads sessions - Print a list of upload sessions

USAGE:
   ocis storage-users uploads sessions command [command options]

COMMANDS:
   help, h  Shows a list of commands or help for one command

OPTIONS:
   --id value    filter sessions by upload session id (default: unset)
   --processing  filter sessions by processing status (default: unset)
   --expired     filter sessions by expired status (default: unset)
   --json        output as json (default: false)
   --restart     send restart event for all listed sessions (default: false)
   --help, -h    show help
flag provided but not defined: -clean
```

I would appreciate a heads-up on what I am doing wrong here. Additionally, could you give me a small explanation of what this list is actually for? If I understand correctly, an incomplete / failed upload will be stored here. What could cause something like this? An interruption of the uploading client's internet connection?

One last question I have is why there is no automated task implemented here. I understand your reasoning that there are a lot of microservices involved and you want to keep it scalable. But there should be no particular local disk size requirement if you are using the S3 storage backend. It makes sense to buffer files locally and then upload them to S3. We are running the oCIS instance on a machine with a 1 TB SSD. I believe others run theirs on much smaller hosts because they are using the S3 backend too. This kind of defeats the purpose for me, because I have to scale the machine anyway, since I don't know which of my users' uploads will eventually fail. For example, right now I have 4 videos with around 200 GB stuck, and our alarms are firing for > 85% disk usage.

Something like an automated job for cleaning inactive and expired uploads would make a lot of sense. One thing I could imagine is a cron job on the underlying system that execs into the container. But the command is not working as documented. That's just my two cents on this topic.

@micbar
Contributor

micbar commented Aug 12, 2024

@butonic @mmattel I am out of office for the next 2 weeks. Please help out.

@mmattel
Contributor

mmattel commented Aug 13, 2024

@JonnyBDev, to bring some light into this, regarding the `--clean` option not existing:

You are referring to the documentation for the next release. There, the option is already implemented, but next is not the production release, which I suppose you are using. You can switch the documentation to released versions via the dropdown at the bottom left, or just replace next in the URL with (currently) 5.0. As you are running a production release, you should not take the docs for the upcoming release as a reference. Though the topic is not solved by this, you at least now know where to find the correctly versioned documentation.

@butonic @kobergj maybe you can continue to help from the problem side.

@kobergj
Collaborator

kobergj commented Aug 13, 2024

Which version are you using? The `ocis storage-users uploads clean` command should not be deprecated in 5.x. Only `ocis storage-users uploads list` is.

From oCIS 6.x on (rolling release), both commands are deprecated in favor of the `--clean` flag on the sessions command. This allows filtering uploads before removing them.
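Summarizing the version split as commands (derived from the statements above; verify against the docs for your release):

```shell
# oCIS 5.x: the dedicated command (not deprecated on this branch)
ocis storage-users uploads clean

# oCIS 6.x and later: filter the sessions, then clean the matches
ocis storage-users uploads sessions --expired=true --clean
```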

> I would appreciate a heads-up on what I am doing wrong here.

If you are still on ocis 5.x: nothing. You did everything correctly.

> If I understand correctly, an incomplete / failed upload will be stored here. What could cause something like this? An interruption of the uploading client's internet connection?

Yes, for example. Also, if the server crashes during postprocessing, the upload is still there. In general, it should mostly be aborted uploads.

> Something like an automated job for cleaning inactive and expired uploads would make a lot of sense. One thing I could imagine is a cron job on the underlying system that execs into the container.

We do it the same way for our deployments. Maybe it would be a benefit to integrate this into oCIS directly? We could start a goroutine that checks for expired uploads every x minutes. We would need to be careful when running multiple instances, though. Anyway, this would require a feature request.
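Until such a routine exists, one way to guard a host-level cleanup job against overlapping runs is `flock`. A minimal sketch, assuming all instances can see the same lock file (the lock path and the `echo` placeholder are illustrative; a real job would invoke `ocis storage-users uploads clean` in place of the echo):

```shell
#!/bin/sh
# Run the cleanup under an exclusive, non-blocking lock so that a
# second invocation (e.g. the same cron job firing again) skips
# instead of overlapping. This only works when all instances share
# the lock file's filesystem; a truly distributed deployment would
# need a distributed lock instead.
LOCKFILE=/tmp/ocis-cleanup.lock

if flock -n "$LOCKFILE" -c 'echo running cleanup'; then
    echo "cleanup finished"
else
    echo "another instance holds the lock, skipping" >&2
fi
```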

@JonnyBDev

Thank you @mmattel and @kobergj for your answers.

@mmattel You are absolutely right. Sometimes in the ownCloud universe it's kind of hard to keep track of the correct documentation for the software you're running. The only indication here was a small dropdown in the lower-left sidebar showing "next". But that's 100% on me; I should have checked this before. As you mentioned, the command is not deprecated in the version we are using.

@kobergj Thanks for your input. As described above, this is on me; I read the wrong documentation. The `--clean` flag is only valid once we've installed 6.x.

I would appreciate a routine to clear stuck local upload files. I don't know anything about the implications of integrating such a routine while running multiple instances; I guess that's something you as the actual maintainers have to figure out. I'd open a feature request for this if you could give me a small hint on where to open one.

Thanks for your help, and sorry for the confusion caused by me.

@kobergj
Collaborator

kobergj commented Aug 13, 2024

No problem. Means our documentation wasn't clear enough 😉

Just open a feature request in this repo. Then PO can decide on priority. https://github.com/owncloud/ocis/issues/new?assignees=&labels=Category%3AFeature%2C+Topic%3Aneeds-triage&projects=&template=feature_request.md&title=
