
PM2/Vizion: Daemon out-of-heap memory issue when running with Vizion activated in large codebase with 50+ processes #5051

Open
hmmmsausages opened this issue Apr 16, 2021 · 6 comments


hmmmsausages commented Apr 16, 2021

What's going wrong?

We're using PM2 to run around 50+ Node.js processes on several quite performant servers on AWS (EC2 | c4.4xlarge | 16vCPU | 30GB RAM) with code from a fairly large Git repository (118.7MB w/o node_modules and a history of ~38,000 commits).

This has been working without any performance issues as long as the .git folder isn't present when running pm2 start with an ecosystem YAML file: the PM2 daemon and the spawned processes run without CPU or memory issues.

If the .git directory is present, however, and we run the same pm2 start command with the same ecosystem YAML file, the PM2 daemon crashes after roughly 40s with the following stack trace:

PM2        | <--- Last few GCs --->
PM2        | art of marking 772 ms) (average mu = 0.344, current mu = 0.296) allocation[12945:0x373d7d0]   116063 ms: Mark-sweep 1391.7 (1409.9) -> 1385.0 (1404.6) MB, 734.1 / 0.0 ms  (+ 8.0 ms in 2 steps since start of marking, biggest step 8.0 ms, walltime since start of marking 761 ms) (average mu = 0.229, current mu = 0.074) allocation [12945:0x373d7d0]   116876 ms: Mark-sweep 1387.9 (1408.1) -> 1385.5 (1408.1) MB, 750.1 / 0.0 ms  (average mu = 0.159, current mu = 0.077) allocation failure GC in old space requested
PM2        |
PM2        |
PM2        | <--- JS stacktrace --->
PM2        |
PM2        | ==== JS stack trace =========================================
PM2        |
PM2        |     0: ExitFrame [pc: 0x2591f025bf1d]
PM2        | Security context: 0x33bd10d9e6c1 <JSObject>
PM2        |     1: parseIndex(aka parseIndex) [0x19f5c43b7dd9] [/usr/lib/node_modules/pm2/node_modules/js-git/mixins/fs-db.js:~258] [pc=0x2591f099d457](this=0x2ecfe85026f1 <undefined>,buffer=0x0a5ad102edf1 <Uint8Array map = 0x1fc8f7bd00c9>)
PM2        |     2: /* anonymous */(aka /* anonymous */) [0x3d40bd782a69] [/usr/lib/node_modules/pm2/node_modules/js-git/mixins/fs-db.js:196] [b...
PM2        |
PM2        | FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
PM2        |  1: 0x8fb090 node::Abort() [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  2: 0x8fb0dc  [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  3: 0xb0336e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  4: 0xb035a4 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  5: 0xef7602  [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  6: 0xef7708 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  7: 0xf037e2 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  8: 0xf04114 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        |  9: 0xf06d81 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 10: 0xecfe76 v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 11: 0xed06fa v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 12: 0xed07a0 v8::internal::Handle<v8::internal::FixedArray> v8::internal::Factory::NewFixedArrayWithMap<v8::internal::FixedArray>(v8::internal::Heap::RootListIndex, int, v8::internal::PretenureFlag) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 13: 0x100fc89 v8::internal::HashTable<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::New(v8::internal::Isolate*, int, v8::internal::PretenureFlag, v8::internal::MinimumCapacity) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 14: 0x1034b82 v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::EnsureCapacity(v8::internal::Handle<v8::internal::NameDictionary>, int) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 15: 0x1034d50 v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::AddNoUpdateNextEnumerationIndex(v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, int*) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 16: 0x1034ffb v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add(v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, int*) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 17: 0x1198588 v8::internal::Runtime_AddDictionaryProperty(int, v8::internal::Object**, v8::internal::Isolate*) [PM2 v4.5.6: God Daemon (/home/ubuntu/.pm2)]
PM2        | 18: 0x2591f025bf1d
PM2        | 2021-04-15T09:16:34: PM2 log: ===============================================================================
PM2        | 2021-04-15T09:16:34: PM2 log: --- New PM2 Daemon started ----------------------------------------------------
PM2        | 2021-04-15T09:16:34: PM2 log: Time                 : Thu Apr 15 2021 09:16:34 GMT+0000 (Coordinated Universal Time)
PM2        | 2021-04-15T09:16:34: PM2 log: PM2 version          : 4.5.6
PM2        | 2021-04-15T09:16:34: PM2 log: Node.js version      : 10.24.1
PM2        | 2021-04-15T09:16:34: PM2 log: Current arch         : x64
PM2        | 2021-04-15T09:16:34: PM2 log: PM2 home             : /home/ubuntu/.pm2
PM2        | 2021-04-15T09:16:34: PM2 log: PM2 PID file         : /home/ubuntu/.pm2/pm2.pid
PM2        | 2021-04-15T09:16:34: PM2 log: RPC socket file      : /home/ubuntu/.pm2/rpc.sock
PM2        | 2021-04-15T09:16:34: PM2 log: BUS socket file      : /home/ubuntu/.pm2/pub.sock
PM2        | 2021-04-15T09:16:34: PM2 log: Application log path : /home/ubuntu/.pm2/logs
PM2        | 2021-04-15T09:16:34: PM2 log: Worker Interval      : 30000
PM2        | 2021-04-15T09:16:34: PM2 log: Process dump file    : /home/ubuntu/.pm2/dump.pm2
PM2        | 2021-04-15T09:16:34: PM2 log: Concurrent actions   : 2
PM2        | 2021-04-15T09:16:34: PM2 log: SIGTERM timeout      : 1600
PM2        | 2021-04-15T09:16:34: PM2 log: ===============================================================================

This looks like the PM2 daemon itself is exhausting the default Node.js heap memory limit of ~1.4GB.

Some research then led us to issues like #3403 and similar, which prompted us to try running PM2 with Vizion disabled by annotating the ecosystem entries with vizion: false (see the sketch below). This kept the PM2 daemon from crashing, even when the .git folder was present in the project's directory.
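For clarity, this is roughly what the annotation looks like in an ecosystem YAML entry; the app name and script path below are placeholders, not our actual configuration:

apps:
  - name: example-app      # placeholder name
    script: ./server.js    # placeholder entry point
    vizion: false          # disables Vizion's git metadata collection for this app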

From our observations, this issue only happens at a certain scale, i.e. when the list of processes is fairly big (50+) and the Git repository itself is large as well. Running PM2 with fewer processes caused the PM2 daemon to run out of memory more slowly (e.g. crashing after 30 minutes instead of 40 seconds).

Question:

Are there any known performance issues with Vizion when using PM2 with large Git repositories and a large number of processes?
Perhaps it would make sense to either disable Vizion by default (I guess that would be a breaking change, though) or add a warning or some guidance to the PM2 documentation, as this side effect is not easy to spot, and what Vizion does in PM2 is not well documented. In fact, I wasn't aware of Vizion at all before we hit this issue.

How could we reproduce this issue?

Unfortunately this is not easy to reproduce, as it only seems to happen at a certain scale:

  • Run PM2 with 50+ processes in a large codebase (~118MB without node_modules / ~4500 files without node_modules / ~38k commits) with the .git directory present and Vizion enabled (default).
  • Wait a couple of minutes until PM2 becomes unresponsive and crashes due to reaching the heap memory limit.

Supporting information

--- PM2 report ----------------------------------------------------------------
Date                 : Fri Apr 16 2021 09:03:52 GMT+0000 (Coordinated Universal Time)
===============================================================================
--- Daemon -------------------------------------------------
pm2d version         : 4.5.6
node version         : 10.24.1
node path            : /usr/bin/pm2
argv                 : /usr/bin/node,/usr/lib/node_modules/pm2/lib/Daemon.js
argv0                : node
user                 : ubuntu
uid                  : 1000
gid                  : 1000
uptime               : 1233min
===============================================================================
--- CLI ----------------------------------------------------
local pm2            : 4.5.6
node version         : 10.24.1
node path            : /usr/bin/pm2
argv                 : /usr/bin/node,/usr/bin/pm2,report
argv0                : node
user                 : ubuntu
uid                  : 1000
gid                  : 1000
===============================================================================
--- System info --------------------------------------------
arch                 : x64
platform             : linux
type                 : Linux
cpus                 : Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
cpus nb              : 16
freemem              : 17694937088
totalmem             : 31594016768
home                 : /home/ubuntu
===============================================================================
--- PM2 list -----------------------------------------------
# 54 processes - details redacted

Module
┌────┬───────────────────────────────────────┬────────────────────┬───────┬──────────┬──────┬──────────┬──────────┬──────────┐
│ id │ module                                │ version            │ pid   │ status   │ ↺    │ cpu      │ mem      │ user     │
├────┼───────────────────────────────────────┼────────────────────┼───────┼──────────┼──────┼──────────┼──────────┼──────────┤
│ 0  │ pm2-logrotate                         │ 2.7.0              │ 27692 │ online   │ 0    │ 0.3%     │ 72.2mb   │ ubuntu   │
└────┴───────────────────────────────────────┴────────────────────┴───────┴──────────┴──────┴──────────┴──────────┴──────────┘
===============================================================================
@mgabeler-lee-6rs

Some colleagues of mine are seeing the same issue when using pm2 to manage our development runtime, though we don't use an "ecosystem" file. PM2 4.4.0 doesn't have the issue, so I'd wager that this is at least partially related to the update of the vizion dep in PM2 4.5.3.

@shizheng163

I also encountered this problem. I found that disabling vizion had no effect on my watch operation, but I am not sure what the vizion module does. Early on, when I just needed to start a Redis process, it was already taking up 3GB of memory.
I don't know what impact turning off vizion will have on my program.

@shizheng163

I found that vizion is used for parsing git repositories, and it seems the git information is used for automated deployment operations such as CI/CD. My use case doesn't need this, and everything works after turning the option off:
pm2 start <app> --no-vizion

@KoteKotlyarov

Same issue but different story.

  • Node: 16
  • PM2: 4

PM2 runs ~20 applications, each with max_memory_restart set to 500M. Everything worked like a charm 24/7 for more than a year, then suddenly one day pm2 began to crash completely with:


[42292:000002BD3D129F60] 11802315 ms: Scavenge 62.6 (70.4) -> 61.9 (70.4) MB, 0.50 / 0.01 ms  (average mu = 1.000, current mu = 1.000) task; 
[42292:000002BD3D129F60] 11803487 ms: Scavenge 62.8 (70.4) -> 62.0 (70.4) MB, 0.41 / 0.01 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[42292:000002BD3D129F60] 11805167 ms: Scavenge 62.9 (70.4) -> 61.9 (70.4) MB, 0.31 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 


<--- JS stacktrace --->

FATAL ERROR: Committing semi space failed. Allocation failed - JavaScript heap out of memory

The interesting part is that this error appears in a random app's log, and sometimes the error message is truncated. Running pm2 l again shows that PM2 was not running, and starts it again with 0 apps.

The apps themselves can't produce this error message, because pm2 checks that their memory usage stays below the 500 MB limit; if an app uses more memory, pm2 restarts it. So PM2 itself produces this error? But why does it log this error in a random application's log?
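For context, this per-app memory limit can be set with the --max-memory-restart CLI flag or in an ecosystem file; the entry below is only an illustrative sketch, not the actual configuration used here:

apps:
  - name: example-app            # placeholder
    script: ./server.js          # placeholder
    max_memory_restart: 500M     # pm2 restarts the app once it exceeds ~500 MB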

I have updated Node.js to 20.17.0 and PM2 to 5.4.2, and set up a separate script to log the memory usage of the PM2 apps (a sketch of such a script is shown after the list below). The workflow is the following:

  • Run PM2, resurrect all applications, and set up a script that periodically logs the response from the pm2 l command;
  • After several minutes the apps settle at their average memory usage, around 100-150 MB per app;
  • For the following 5-20 hours all apps work fine, and the script shows normal pm2 l output;
  • Then from time to time pm2 l produces a response where the MEM column shows 0b, while the 'status' column still displays 'online';
  • After another several hours, pm2 starts to produce this output more often;
  • And then suddenly the response from pm2 l shows 0 applications; only the pm2-logrotate module is present.

The last valid output shows normal memory usage.
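Not the exact script used above, just a minimal sketch of the idea: periodically capture pm2 jlist (the JSON equivalent of pm2 l) and append each app's memory and status to a log file, so the 0b/offline window can be spotted afterwards. The file path and interval are illustrative.

// sketch only: log each PM2 app's memory usage and status once per minute
const { execFile } = require('child_process');
const fs = require('fs');

const LOG_FILE = '/tmp/pm2-mem.log';   // illustrative path
const INTERVAL_MS = 60 * 1000;         // illustrative interval

function sample() {
  execFile('pm2', ['jlist'], { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
    const ts = new Date().toISOString();
    if (err) {
      fs.appendFileSync(LOG_FILE, `${ts} pm2 jlist failed: ${err.message}\n`);
      return;
    }
    let procs;
    try {
      procs = JSON.parse(stdout);
    } catch (e) {
      fs.appendFileSync(LOG_FILE, `${ts} could not parse pm2 jlist output\n`);
      return;
    }
    // each entry of `pm2 jlist` exposes name, monit.memory (bytes) and pm2_env.status
    const line = procs
      .map(p => `${p.name}=${Math.round(p.monit.memory / (1024 * 1024))}MB(${p.pm2_env.status})`)
      .join(' ');
    fs.appendFileSync(LOG_FILE, `${ts} ${line}\n`);
  });
}

sample();
setInterval(sample, INTERVAL_MS);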

There were no changes in the codebase, the modules, or anything else for months, and then this issue started happening. Even after updating Node and PM2, it still happens on an almost daily basis.

Maybe this story will provide some clues to other readers. Otherwise, this is pointless info.

@Unitech
Owner

Unitech commented Nov 15, 2024

I have disabled vizion by default in the next release of PM2.
Could you give it a try and let me know if that works for you:

npm install pm2-beta -g
pm2 update

Or with bun:

bun install pm2-beta -g
pm2 update

Thanks

@hmmmsausages
Author

Thanks @Unitech - though I'm the original poster who raised this issue a few years ago, I unfortunately no longer have the means to give this a spin. I hope someone else who commented on this post can verify it.
