-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't allow executing cli if cache backend is unavailable #25770
Conversation
Ah so now it all makes and I think the patch is fine |
I guess we should drop support for APCu. cli does not share the cache with fpm/cgi/mod_php. This leads to weird results as web and cli may work with different data. |
well it's a cache not a data store. but yeah could result in problems when e.g. web is trying to wipe it. So let's add a setup deprecation warning for 22 and kill it in 23? |
Thanks @st3iny for sending a patch 👍 Looks good to me. An alternative approach could be to check if apcu is configured + cli request and then return a null cache instance instead. For redis and memcache it's indeed a problem and unexpected behaviour if the cache is not available on cli. For apcu it's expected that the cache is not available on cli. Still I'm not to happy about this change.
We run into a loop on logging "Memcache {class} not available for {use} cache". The logger requests an instance of IRequest, for IRequest we need SecureRandom, IConfig and CsrfTokenManager, CsrfTokenManager needs SessionStorage, SessionStorage needs ISession. ISession needs IUserSession, IUserSession needs IUserManager, IUserManager needs ICacheFactory and all starts over again. It's a bit sad that IRequest has so heavy dependencies. We could refactor Another approach could be to move the user backend cache into a CachedUserManager wrapping the UserManager. But that comes with some other difficulties. Of course. That's all to complicated to be backported 😞 |
cli requests always start in a clear state. technically it might be an issue on long running processes when some stat has changed outside… but that's rather a distributed cache topic and apcu should not be used there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bkessel's concerns: #25770 (comment)
Signed-off-by: Richard Steinmetz <richard@steinmetz.cloud>
2d6f391
to
0f949ea
Compare
Shouldn't this be backported? |
This comment has been minimized.
This comment has been minimized.
@skjnldsv backport to stable21? |
If you have APCu configured and enable_cli off, Nextcloud will not work anymore on CLI with this PR. |
Yes, this is certainly possible from a performance perspective. However, you still have to take a few things into consideration. You don't run occ in a loop. Thus it is very likely that every time you start the script, the opcache entries in the file are too old and have to be recreated again. In this case you have 2 performance hits: 1) opcache has to be created every time (which takes time) 2) it takes even longer, because it is writing to a file instead of memory.
Please, I really hope you don't consider not using opcache as the reason for Nextcloud crashing. The fact that it is disabled might bring another issue to light but it cannot be the root cause. Please note that APCu and opcache are two completely different things. Please don't mix them up. APCu is also an optional feature of the language. Update: Please note that I also replied to a question regarding file_cache 2 years ago: nextcloud/documentation#1439 (comment) |
You can't be serious. Think what you are saying here. This PR must be reverted. You are trying to work around someting, but this is insane. You have to fix the root cause. You apparently don't understand how PHP works. e.g. I have APCu configured, but I certainly do not enable it for cli. Because I actually know what I am doing. I am a performane expert. So now what? I can't use occ anymore? I seriously doubt you understand the implications and I am amazed how illogical this approach is. P.S.: I don't want to be hostile or harsh, but please think about what you are doing. This PR makes no sense. You don't have to believe me, you can talk to PHP developers or people who know how PHP works. This really shows me once again that design decisions are put into the wrong hands. |
Yes guys, please don't mix OPcache with the For APCu it is since ages that the docs state via warning that it should be enabled for CLI calls, and disabling it, always triggered a log entry on each The issue is that APCu can be easily (mis)used as data store, as it simply depends on the PHP script how it is read and written. And I agree that it makes much more sense to use it for what it is intended, as a cache that is used, when available for reading volatile data and else scripts must be able to generate this data freshly. And in case of writing data, if the configured memory cache is not available, a fallback to no cache (or custom SHM/ But hey, does it make sense to discuss all of this in a new issue? Currently this PR doesn't break anything for anyone, as, without it, the CLI call would still fail with current Nextcloud, or does it somehow still work for you @tessus? EDIT: Another thing: While I agree that building an APCu cache for each CLI call doesn't make sense (generally, I don't know how it is currently used on detail), we're talking about some milli seconds here while all Nextcloud CLI calls I'm aware of take orders of magnitude longer to finish. So the actual downside of having APCu enabled for CLI calls is marginal, at least the way I faced it. Of course this may be different when the same PHP instance is used by other applications with much more frequent small CLI calls, where initiating the APCu significantly delays/slows down those or increases the server load in general in a significant way (also thinking about additionally reserved memory and such). |
I haven't upgraded nextcloud for a while, because new releases always break my working environment. My current setup works. I am more than happy to dicuss this in a separate topic, but I am afraid the devs don't care. They misuse a cache and tell the world it's our fault, if we think that a cache should be a cache. There is nothing I can do to convince them otherwise. If bad design is the mantra of the developers I won't be able to change their minds. (I probably will get a ton of backlash for this comment, but it's a fact. It's not my fault, if overly sensitive people can't handle truth and facts.) |
Okay so then this PR makes sense as a workaround until at least the loop introduced with #25440 has been fixed, or better a general fallback for CLI calls in the cache factory itself, when the cache is not available. Everything is better than an OOM-killed instance, e.g. in the middle of an upgrade. Actually the code, prior to this commit, looks like it does quite what we want:
|
I didn't check the docs link from Jos, I thought he linked to the APCu cli_enable one, so no mixup there from my side.
Yes I am. I started caching something in the user backend (significantly reducing load) and it turns out to create an infinite loop in case APCu is configured but not available: because user backend > caching > logging > user backend > repeat
I don't have enough time to fix the circular dependency in my free time. But I don't have time to untangle those, so I took the shorter route of pointing out the misconfiguration: "APCu configured but not available" If you do so please have a quick thought how from within a CLI/OCC/cron call a
🙄 |
Thanks for clarifying and I agree that it is better to have such a workaround than the quite major issues users otherwise run into. To discuss a better solution I opened #27608. Interesting is, after having a deeper look into the code, that the old code seems to do exactly what I suggest. But then no loop would be possible, so obviously I am wrong. Would be great if you (whoever read this and is good in interpreting Nextcloud's PHP code) could have a look at it and point me to the misinterpretation I do, why the loop exists 🙂. |
What about all the paying customers? Don't you think they would consider this a priority? What I don't understand is that this is an essential issue. It is part of the core design, yet you want to build a workaround. I don't know why the devs always react negative when I point that out. Ignore me as the one with the problem right now. Just think of your business and your product. Forget the buts and ifs. You don't think it should be a priority to get the core design right and fix issues that are the result of either bad design or coding? |
@MichaIng #25770 (comment) the logger is causing the loop. |
How could I have overseen the explanation right there, thanks for for the anchor 👍. Wouldn't it then be the easiest and obvious solution to first assign the NullCache (which is already done anyway) and then do the log afterwards?
So the logger can pull in another cache factory but it won't enter this condition again (class exists and |
As merged PRs are not the best place to discuss things, |
The Nextcloud VM broke down due to this PR (the backported one). We run PHP-FPM, have all the stuff correct as per doumentation, but with 21.0.3 I'm quite baffled that you make such a large change without notice for a maintenance release. I just merged a panic PR to revert to Redis for local cache, so problem solved. But still... |
@enoch85 |
@MichaIng It was enabled but stopped working after this was merged. As I didn't have time to investigate further, and a new release was made (several hundreds of downloads per day) I was in quite the panic to get this fixed. So basically, Redis worked, APCu didn't, hence the revert. Though I agree that the best solution would be to actually run APCu for local cache, but since @kesselb mentioend this, I was like "ok, let's ditch it". I don't want to rely on something that might be ditched in 1 or even 10 versions. It should just work, no questions asked. The alternative is to run without any cache at all, that would be really fool proof. But since I aim to give the Nextcloud VM users the best possible performance / stability, Redis it is for now at least. |
Okay despite
In general yes, but for CLI, reading through related discussions, it does not seem to make sense to initiate any cache for a single CLI call, after which it is in case destroyed again. So best IMHO, after reading through all related discussion, is to not use any cache in the first place for CLI calls. Redis, as dedicated server, of course allows a shared cache across all PHP instances, but I'm not sure whether it's worth it for CLI, as Redis otherwise is slower than APCu (?). Also I'm not sure whether cache keys are not otherwise instantiated anyway so that CLI calls cannot access the existing keys from the web/PHP-FPM server? |
Yes indeed.
Yes, APCu is faster. Probably because the Redis traffic needs to travel a further distance than APCu, which is more nested into PHP. Anyway, don't want to start a discussion here. We have now reverted to Redis which we used for several years in the VM and it works. So I see no reason to change back, especially since APCu seems to be depreciated in coming Nextcloud versions for PHP-FPM. |
I don't think so, as it does not make any sense to remove support for the fasted local file cache solution, and relying on a shared cache would mean basically a misuse of a cache. I'll open a PR which disables local and distributed cache for all CLI calls in the first place, which after all I think is the only correct solution and makes all workarounds and But more discussion on that on the PR when it's up. |
but fallback to NullCache. This can be the case if APCu is used without apc.enable_cli=1. APCu (and ArrayCache) however runs within the PHP instance and hence cannot be shared between CLI and web or used as distributed cache. The CLI call can hence only create or invalildate entries for itself. For short-living CLI calls, this is theoretically a downsides regarding performance and resource usage, and Nextcloud must not have any issues using the dummy NullCache instead of an isolated freshly created and destroyed APCu or ArrayCache instance. This partly reverts #25770. The fallback was removed, because CLI calls started to hang after #25440. The reason however was not that a cache is generally required for CLI calls, but because the previously logged warning invoked the user backend, which invoked again the cacking backend, causing a loop. This commit re-adds the fallback without logging a warning. For mentioned reasons, it is okay to fallback to NullCache silently. The motivation is to make apc.enable_cli=1 optional, and that hence the documentation about it can be removed. We should not enforce admins to enable APCu for CLI calls, which is reasonably disabled by default. This also reduces requirements for hosting providers to support Nextcloud. Signed-off-by: MichaIng <micha@dietpi.com>
but fallback to NullCache. This can be the case if APCu is used without apc.enable_cli=1. APCu however runs within the PHP instance and hence cannot be shared between CLI and web or used as distributed cache. The CLI call can hence only create or invalidate entries for itself. For short-living CLI calls, this is theoretically a downsides regarding performance and resource usage, and Nextcloud must not have any issues using the dummy NullCache instead of an isolated freshly created and destroyed APCu instance. This partly reverts #25770. The fallback was removed, because CLI calls started to hang after #25440. The reason however was not that a cache is generally required for CLI calls, but because the previously logged warning invoked the user backend, which invoked again the caching backend, causing a loop. This commit re-adds the fallback without logging a warning, and for APCu only. For mentioned reasons, it is okay to fallback to NullCache silently. If Redis or memcached are configured but not available, then the web UI would fail as well, and it makes sense to abort and inform CLI calls as well then. The motivation is to make apc.enable_cli=1 optional, and that hence the documentation about it can be removed. We should not enforce admins to enable APCu for CLI calls, which is reasonably disabled by default. This also reduces requirements for hosting providers to support Nextcloud. Signed-off-by: MichaIng <micha@dietpi.com>
but fallback to NullCache. This can be the case if APCu is used without apc.enable_cli=1. APCu however runs within the PHP instance and hence cannot be shared between CLI and web or used as distributed cache. The CLI call can hence only create or invalidate entries for itself. For short-living CLI calls, this is theoretically a downsides regarding performance and resource usage, and Nextcloud must not have any issues using the dummy NullCache instead of an isolated freshly created and destroyed APCu instance. This partly reverts #25770. The fallback was removed, because CLI calls started to hang after #25440. The reason however was not that a cache is generally required for CLI calls, but because the previously logged warning invoked the user backend, which invoked again the caching backend, causing a loop. This commit re-adds the fallback without logging a warning, and for APCu only. For mentioned reasons, it is okay to fallback to NullCache silently. If Redis or memcached are configured but not available, then the web UI would fail as well, and it makes sense to abort and inform CLI calls as well then. The motivation is to make apc.enable_cli=1 optional, and that hence the documentation about it can be removed. We should not enforce admins to enable APCu for CLI calls, which is reasonably disabled by default. This also reduces requirements for hosting providers to support Nextcloud. Signed-off-by: MichaIng <micha@dietpi.com>
but fallback to NullCache. This can be the case if APCu is used without apc.enable_cli=1. APCu however runs within the PHP instance and hence cannot be shared between CLI and web or used as distributed cache. The CLI call can hence only create or invalidate entries for itself. For short-living CLI calls, this is theoretically a downsides regarding performance and resource usage, and Nextcloud must not have any issues using the dummy NullCache instead of an isolated freshly created and destroyed APCu instance. This partly reverts #25770. The fallback was removed, because CLI calls started to hang after #25440. The reason however was not that a cache is generally required for CLI calls, but because the previously logged warning invoked the user backend, which invoked again the caching backend, causing a loop. This commit re-adds the fallback without logging a warning, and for APCu only. For mentioned reasons, it is okay to fallback to NullCache silently. If Redis or memcached are configured but not available, then the web UI would fail as well, and it makes sense to abort and inform CLI calls as well then. The motivation is to make apc.enable_cli=1 optional, and that hence the documentation about it can be removed. We should not enforce admins to enable APCu for CLI calls, which is reasonably disabled by default. This also reduces requirements for hosting providers to support Nextcloud. Signed-off-by: MichaIng <micha@dietpi.com>
but fallback to NullCache. This can be the case if APCu is used without apc.enable_cli=1. APCu however runs within the PHP instance and hence cannot be shared between CLI and web or used as distributed cache. The CLI call can hence only create or invalidate entries for itself. For short-living CLI calls, this is theoretically a downsides regarding performance and resource usage, and Nextcloud must not have any issues using the dummy NullCache instead of an isolated freshly created and destroyed APCu instance. This partly reverts #25770. The fallback was removed, because CLI calls started to hang after #25440. The reason however was not that a cache is generally required for CLI calls, but because the previously logged warning invoked the user backend, which invoked again the caching backend, causing a loop. This commit re-adds the fallback without logging a warning, and for APCu only. For mentioned reasons, it is okay to fallback to NullCache silently. If Redis or memcached are configured but not available, then the web UI would fail as well, and it makes sense to abort and inform CLI calls as well then. The motivation is to make apc.enable_cli=1 optional, and that hence the documentation about it can be removed. We should not enforce admins to enable APCu for CLI calls, which is reasonably disabled by default. This also reduces requirements for hosting providers to support Nextcloud. Signed-off-by: MichaIng <micha@dietpi.com>
Fix #25742
Bug
Currently, cli code can be executed even if the configured cache backend is unavailable. In this case, the cache factory is logging a warning instead of throwing an exception.
Problem: The cache factory is pulling a logger dependency which pulls the user manager and since #25440 it is pulling a cache factory. Now the the whole thing starts over and repeats until the process runs out of memory.
Solution
I propose the cli to fail if a cache backend is unavailable because some pieces of the server code expect a cache backend to be initialized in any case. Furthermore, there is an explicit warning regarding APCu setups in the admin documentation because it needs some extra configuration to be available in cli environments.
Debugging
I added a print statement to
ServerContainer.php/query()
that prints the name of the requested dependency.Here is an excerpt showing the recursion
Look for OC\Log -> OC\User\Manager -> OC\Memcache\Factory -> OC\Log -> ...
This is a breaking change because occ and cron jobs won't be working anymore on some (misconfigured) ACPu setups. If someone can come up with a better solution or this is too much of a breaking change, feel free to close the PR.