[Bug]: Server hangs as php-fpm reach pm.max_children limit #39063

ThibautPlg · 2023-06-29T13:18:40Z

⚠️ This issue respects the following points: ⚠️

This is a bug, not a question or a configuration/webserver/proxy issue.
This issue is not already reported on Github OR Nextcloud Community Forum (I've searched it).
Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
I agree to follow Nextcloud's Code of Conduct.

Bug description

Hello,
I administer multiple Nextcloud 25 instances, and I'm slowly upgrading them to Nextcloud 26. However, after some days (sometimes only a few hours), each and every instance upgraded to Nextcloud 26 crashes because the php-fpm pm.max_children limit is reached.
I then need to restart php-fpm, and everything works normally until the next crash.

Additional context:

Each server hosts only one Nextcloud instance, with PHP 8.1
php-fpm max_children and other configuration values have been customized to match the available RAM of the host (between 32 and 64 children allowed)
The php-fpm max children limit has never been a problem as far as I recall (some instances are older than NC20). The problem also occurs on newly installed testing instances.
RAM consumption is average; we are not maxed out
I've tried with PHP 8.2, same results

systemctl status php-fpm.service output when server is down:

● php-fpm.service - The PHP FastCGI Process Manager
   Loaded: loaded (/usr/lib/systemd/system/php-fpm.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-06-28 14:30:57 CEST; 24h ago
  Process: 3345857 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 462753 (php-fpm)
   Status: "Processes active: 64, idle: 2, Requests: 127550, slow: 0, Traffic: 0req/sec"
    Tasks: 67 (limit: 47993)
   Memory: 1.7G
   CGroup: /system.slice/php-fpm.service
           ├─ 462753 php-fpm: master process (/etc/php-fpm.conf)
           ├─ 613484 php-fpm: pool nextcloud
           ├─ 613485 php-fpm: pool nextcloud
           ├─ 613486 php-fpm: pool nextcloud
           ├─ 613487 php-fpm: pool nextcloud
           ├─ 613488 php-fpm: pool nextcloud
           ├─ 613489 php-fpm: pool nextcloud
(the list goes on)

I feel like some processes are idle (hung?) and never stopped.
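To track how close the pool is to its limit over time, the active worker count can be pulled out of the systemd status line quoted above. This is only a minimal sketch: fpm_active_count is a hypothetical helper name, and it assumes the status line keeps the "Processes active: N" wording shown in this report.

```shell
# Hypothetical helper: read php-fpm status text on stdin and print the
# number that follows "Processes active: ".
fpm_active_count() {
    sed -n 's/.*Processes active: \([0-9][0-9]*\).*/\1/p'
}

# Example with the status line from this report:
echo 'Status: "Processes active: 64, idle: 2, Requests: 127550, slow: 0, Traffic: 0req/sec"' | fpm_active_count
```

On a live host, the same function could be fed from systemctl status php-fpm.service and the result compared against the configured pm.max_children.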

Am I the only one facing this issue? Why does it occur only with NC26? What has changed regarding PHP processes?

Best regards,

Steps to reproduce

Upgrade to NC26 or install a fresh server
Wait a little bit
Observe php-fpm being overwhelmed

Expected behavior

Same as before, Nextcloud (php-fpm?) should remove children.

Installation method

None

Nextcloud Server version

26

Operating system

RHEL/CentOS

PHP engine version

PHP 8.1

Web server

Nginx

Database engine version

None

Is this bug present after an update or on a fresh install?

None

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Default user-backend (database)
LDAP/ Active Directory
SSO - SAML
Other

Configuration report

No response

List of activated Apps

No response

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

joshtrichards · 2023-06-30T16:54:00Z

Hi @ThibautPlg:

There are a lot of possibilities, but you didn't provide a complete Issue form. :-)

The most important items to check off the top of my head that might provide clues:

your php-fpm logs
your nextcloud.log
your php-fpm status page (particularly in full mode)
your php-fpm pool configuration
your nginx configuration (also worth comparing against the one in the NC manual since it's periodically updated in-between NC versions)
which NC apps are active

Are all of these instances essentially similarly configured/built? While anything is possible, chances are this is some sort of local environment interaction.

I would suggest posting this in the Nextcloud Help Forum first. NC26 has been out awhile and I haven't seen rampant reports of new php-fpm related issues personally.

bjo81 · 2023-08-14T08:19:01Z

@ThibautPlg Does this issue disappear when you disable previews? We have an instance where php-fpm is stuck with requests like GET /core/preview?fileId=1151626&c=25fa3ba9d97519e2dd8f7ef2595bdf02&x=500&y=500&forceIcon=0&a=1 HTTP/2.0. Even running the preview generator app gets stuck on generating a preview of a PDF like "Nextcloud Flyer.pdf", so it seems the whole preview generation is stuck. All php-fpm processes hang at semop(1, [{0, -1, SEM_UNDO}],
The issue first appeared with 26.0.0, so it could be related to the new preview generation code. With < 26.0.0, 15 php-fpm workers were fine, but now even 270 are not enough, as they always hang and never get killed.
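As a rough way to check whether many workers are sleeping in the same place without attaching strace to each one, the kernel wait channel in /proc/<pid>/wchan can be listed per process. This is a sketch under assumptions: list_wchan is a hypothetical helper, the process name php-fpm may differ on other distributions, and wchan may read as 0 or be unreadable without sufficient privileges.

```shell
# Print "<pid> <wchan>" for every process whose command name equals $1.
# $2 is the proc root (defaults to /proc); it is a parameter only so the
# function can be exercised against a fake directory tree.
list_wchan() {
    name=$1
    root=${2:-/proc}
    for dir in "$root"/[0-9]*; do
        # Skip entries that vanished or cannot be read
        [ -r "$dir/comm" ] || continue
        [ "$(cat "$dir/comm")" = "$name" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/wchan" 2>/dev/null)"
    done
}
```

Running list_wchan php-fpm as root while the pool is saturated would show whether the workers share one wait channel, which is consistent with (but does not prove) the semaphore wait seen in strace.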

ThibautPlg · 2023-08-21T09:10:31Z

Hi
Sorry for not answering @joshtrichards sooner; I had a lot on my plate lately and this subject faded into the background.

This is an example of a regular work day for my users: as you can see, the fpm processes suddenly spike until I manually reload them, and then everything works fine until the next time the server has a mood change.
I haven't noticed anything linked to the previews, requests are quite random and the slow.log contains entries for all kinds of scripts and nothing rings a bell for me.

In the end we "fixed" the problem by adding the following line to our php-fpm config: request_terminate_timeout = 5m.

marc4s · 2023-08-29T05:34:36Z

On my instance this started with 25.0.9 or 25.0.10. Today I upgraded to 26.0.5; I will check over the next few days whether the issue persists...

Previews were disabled the whole time.

diego-treitos · 2023-10-11T07:53:11Z

I am experiencing this issue too, and I also noticed it after the upgrade to 25.x.x (not sure which version). I administer several NC servers and all of them show the same behavior. While request_terminate_timeout = 5m is a working workaround, I think it is only a patch, and I guess it might have an impact on performance? Anyway, this looks like a bug in NC, as it started happening after an upgrade.

diego-treitos · 2023-10-17T10:21:17Z

Actually setting request_terminate_timeout = 5m creates a problem when syncing big files. If you check the documentation for uploading big files (https://docs.nextcloud.com/server/20/admin_manual/configuration_files/big_file_upload_configuration.html) you see that they recommend to raise timeouts even to 1 hour. This means that if your requests terminate after 5 minutes, your big files won't sync.
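The conflict can be made concrete with a little arithmetic. The sketch below assumes the simplest model, a single request that transfers the whole file at a steady rate; exceeds_timeout and the example numbers (a 4096 MB file at 5 MB/s) are hypothetical, and real clients may split uploads into smaller requests.

```shell
# Succeed (exit 0) if one request moving SIZE_MB megabytes at RATE_MBPS
# megabytes per second would still be running after TIMEOUT_S seconds.
exceeds_timeout() {
    size_mb=$1
    rate_mbps=$2
    timeout_s=$3
    # size / rate > timeout, rewritten without division
    [ "$size_mb" -gt $(( rate_mbps * timeout_s )) ]
}

# 5m = 300 s: at 5 MB/s only 1500 MB fit, so a 4096 MB transfer is cut off.
if exceeds_timeout 4096 5 300; then
    echo "4096 MB at 5 MB/s does not fit in 300 s"
fi
# 1 hour = 3600 s: 18000 MB fit, so the same transfer completes.
if ! exceeds_timeout 4096 5 3600; then
    echo "4096 MB at 5 MB/s fits in 3600 s"
fi
```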

I think the problem might be related to the Nextcloud server either not closing database connections or not reusing them in later requests, because I've observed that database connections increase at the same pace as workers.

This issue is causing a lot of trouble on all my instances. I am surprised that it is not getting more attention.

ThibautPlg · 2023-10-17T13:10:59Z

The request_terminate_timeout is indeed only a workaround. We've also noticed a high increase in mariadb connections prior to a total overflow of php-fpm processes.
Quite hard to debug. Thanks for your comment though, glad (kind of) to not be the only one affected by this behavior.
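One way to confirm that workers and database connections grow together is to sample both numbers side by side. The following is only a sketch: threads_connected is a hypothetical helper, and the client name (mysql) and its credentials setup are assumptions about the host.

```shell
# Extract the value from the tab-separated output of
#   SHOW GLOBAL STATUS LIKE 'Threads_connected'
threads_connected() {
    awk '$1 == "Threads_connected" { print $2 }'
}

# Hypothetical usage on a live host (client name and credentials are assumptions):
#   workers=$(pgrep -c php-fpm)
#   conns=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'" | threads_connected)
#   echo "$(date +%T) workers=$workers db_connections=$conns"
```

Logging that line every minute from cron would show whether the two curves rise in step before the pool is exhausted.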

robert-scheck · 2023-11-08T14:23:43Z

I see exactly the same result as @ThibautPlg reported, however on a slightly different system:

CentOS 7 (fully up-to-date)
PHP 8.1 from Remi's Safe RPM repository
Nextcloud 27.1.2

However, it is mod_php instead of PHP-FPM, but all Apache webserver processes are stuck at semop() as well when this issue occurs. Using strace, I unfortunately can only gather this:

semop(12, [{0, -1, SEM_UNDO}], 1

And on this system, the pm.max_children limit isn't hit (because no PHP-FPM), but instead the maximum of Apache webserver processes (httpd) or the maximum of MariaDB connections (depending on where you set a higher limit) at mysqld.

szaimen · 2023-11-08T16:17:08Z

#41263

Githopp192 · 2023-12-08T18:51:56Z

I had a similar issue a week ago: I increased pm.max_children, and then some process ate all my memory (php-fpm was set to "ondemand").

I experimented with the "static" and "dynamic" settings, but both consumed too much memory.

So I switched back to "ondemand".

The calculation below helped me find the right values (with "ondemand" you only need pm.max_children).
Additionally I set:

pm.process_idle_timeout = 10
pm.max_requests = 500

(see:
; ondemand - no children are created at startup. Children will be forked when
;            new requests will connect. The following parameter are used:
;   pm.max_children         - the maximum number of children that
;                             can be alive at the same time.
;   pm.process_idle_timeout - The number of seconds after which
;                             an idle process will be killed.
)

So far so good: no memory issues with php-fpm at all.

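# Available RAM in MB
AvailableRAM=$(awk '/MemAvailable/ {printf "%d", $2/1024}' /proc/meminfo)
# Average resident set size of the running php-fpm processes in MB
AverageFPM=$(ps --no-headers -o rss -C php-fpm | awk '{ sum+=$1 } END { if (NR) printf "%d", sum/NR/1024 }')
# Abort when no php-fpm process is running, to avoid a division by zero
[ "${AverageFPM:-0}" -gt 0 ] || { echo "no php-fpm processes found" >&2; exit 1; }
# Number of average-sized workers that fit into the available RAM
FPMS=$((AvailableRAM/AverageFPM))
PMaxSS=$((FPMS*2/3))
PMinSS=$((PMaxSS/2))
PStartS=$(((PMaxSS+PMinSS)/2))
echo "-------------------------"
echo "AvailableRAM:$AvailableRAM"
echo "AverageFPM:$AverageFPM"
echo "pm.max_children:$FPMS"
echo "pm.start_servers:$PStartS"
echo "pm.min_spare_servers:$PMinSS"
echo "pm.max_spare_servers:$PMaxSS"
echo "-------------------------"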

Calculation PHP-FPM-Tweaks:

AvailableRAM:6457
AverageFPM:120
pm.max_children:53
pm.start_servers:26
pm.min_spare_servers:17
pm.max_spare_servers:35
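The numbers above follow from integer arithmetic on the two measured inputs. As a sketch, the same formulas can be wrapped in a function that takes the inputs as arguments so the result is reproducible; fpm_sizing is a hypothetical name, and the 2/3 and 1/2 ratios are simply the ones used in the script above, not official recommendations.

```shell
# Recompute the sizing from two inputs, both in MB:
#   $1 = available RAM, $2 = average memory per php-fpm process
fpm_sizing() {
    available_mb=$1
    average_mb=$2
    max_children=$(( available_mb / average_mb ))   # workers that fit in RAM
    max_spare=$(( max_children * 2 / 3 ))           # two thirds of the maximum
    min_spare=$(( max_spare / 2 ))                  # half of max_spare
    start=$(( (max_spare + min_spare) / 2 ))        # midpoint of the two
    echo "pm.max_children:$max_children"
    echo "pm.start_servers:$start"
    echo "pm.min_spare_servers:$min_spare"
    echo "pm.max_spare_servers:$max_spare"
}

# Inputs taken from the output above
fpm_sizing 6457 120
```

All divisions truncate, which is why 6457/120 yields 53 rather than 54.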

diego-treitos · 2023-12-13T17:02:11Z

For me this was fixed in 27.1.4. If you are experiencing this problem, please be sure to have Nextcloud upgraded to at least that version before reporting the problem.

Githopp192 · 2023-12-14T01:49:57Z

I'm on 27.1.4

metafarion · 2024-02-14T01:25:39Z

I'm still observing this on one of my instances running 27.1.6. I've been incrementally increasing pm.max_children, 12, 20, 30, 40, 50. Each time as soon as WARNING: [pool www] server reached pm.max_children setting (40), consider raising it shows up in the log, the site cannot load anything further. Looking in top at this time reveals no active threads at all and a load average of near zero. It really just seizes up and can't do any more work until php-fpm is restarted.

Using PHP-FPM 8.2.7 and Apache 2.4.57

diego-treitos · 2024-02-20T15:45:01Z

I agree; I have started observing a similar behavior again. This time the processes disappear after a few minutes, but it still creates the problem of having many processes doing nothing and the service becoming unavailable.

metafarion · 2024-02-20T15:59:36Z

I don't want to tempt fate here, but I MAY have resolved my instance by adjusting a different php.ini parameter. I'm kicking myself now for not specifically noting which one, but I was in a hurry at the time. I can say that I wouldn't have known to do it except for a suggestion that showed up in the Nextcloud Administration Settings > Overview panel under Security & setup warnings ONLY after the pm.max_children warning appeared in the system PHP log, but before the Nextcloud web interface became unresponsive. It was a pretty brief window, but still catchable if you set up some kind of trigger to watch the log file.

It could also be total coincidence :-P
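A trigger of the kind described could look like the sketch below. It assumes the warning text quoted earlier in this thread ("reached pm.max_children"); on_max_children is a hypothetical helper, and the log path in the usage note is an assumption that varies by distribution.

```shell
# Read log lines on stdin; the first time a line reports that
# pm.max_children was reached, run the given command and stop.
# Returns 1 if the input ends without such a line.
on_max_children() {
    while IFS= read -r line; do
        case $line in
            *"reached pm.max_children"*)
                "$@"
                return 0
                ;;
        esac
    done
    return 1
}
```

For example, tail -F /var/log/php-fpm/error.log | on_max_children logger "php-fpm pool saturated" would write a syslog entry the moment the warning appears, leaving time to open the settings overview before the interface stops responding.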

Githopp192 · 2024-02-22T09:27:28Z

As I wrote above, I switched back to the "ondemand" setting, and all problems were solved.

metafarion · 2024-03-18T02:59:21Z

Alright, so my earlier victory was ultimately short-lived, and probably coincidental. The thing that ACTUALLY seems to have fixed this for me was installing php-smbclient. My NC instance is entirely a mounted SMB external share, and without php-smbclient, any file transfer that took too long or was larger than 512MB would hang and lock up a child process until there were no more available and the server would stop processing requests.

MrRinkana · 2024-08-29T13:30:53Z

> I'm still observing this on one of my instances running 27.1.6. I've been incrementally increasing pm.max_children, 12, 20, 30, 40, 50. Each time as soon as WARNING: [pool www] server reached pm.max_children setting (40), consider raising it shows up in the log, the site cannot load anything further. Looking in top at this time reveals no active threads at all and a load average of near zero. It really just seizes up and can't do any more work until php-fpm is restarted.
>
> Using PHP-FPM 8.2.7 and Apache 2.4.57

Just for reference, similar symptoms can appear if you use keepalive between Apache and php-fpm. I'm not sure why, but keepalive is probably unnecessary either way if you run php-fpm on the same machine, especially when using Unix sockets.

ThibautPlg added the "0. Needs triage" (pending check for reproducibility or if it fits our roadmap) and "bug" labels Jun 29, 2023

szaimen added the 26-feedback label Jul 10, 2023

th0rgall mentioned this issue Aug 9, 2023

App breaks when php max children reached nextcloud/all-in-one#3116

Closed

ThibautPlg closed this as completed Aug 21, 2023

ThibautPlg reopened this Oct 17, 2023

joshtrichards added the performance 🚀 label Nov 8, 2023

szaimen closed this as completed Nov 23, 2023

maxhoesel mentioned this issue Sep 17, 2024

SMB nextcloud/docker#2145

Open

7 tasks
