instance.close() let zombie process #615

gumieri · 2017-08-30T12:48:50Z

I am using puppeteer inside Docker as an 'HTML to PDF' converter API.
Even after calling browser.close() (where browser is the instance returned by puppeteer.launch()), Chrome processes remain as zombies inside my instance:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 pastor    20   0  922068  26900  11404 S   0.0  0.7   0:00.66 node
   10 pastor    20   0   11740   1896   1512 S   0.0  0.1   0:00.04 bash
   33 pastor    20   0   51892   2008   1428 R   0.0  0.1   0:00.05 top
   40 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.01 chrome
   62 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.02 chrome
   69 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.74 chrome

I tried to call page.close() too but nothing changed.

michaeljho · 2017-09-08T17:25:38Z

Hi, I am still having this issue (zombies left from puppeteer inside Docker). Running on Debian 8.9, using puppeteer 0.10.2

gumieri · 2017-09-08T20:48:53Z

The scenario that I reported above is still happening too.

aslushnikov · 2017-09-14T05:27:24Z

@michaeljho @gumieri do you guys just run puppeteer scripts in docker and let them finish successfully?

I can't reproduce this behavior with puppeteer 0.11.0-alpha. Do you do anything special?

michaeljho · 2017-09-14T05:49:54Z

Our use case is to screengrab to pdf or png. We open a page, wait for a particular CSS selector to show up, then generate pdf or screenshot. Then in finally block, page.close().then(browser.close()). Consoles show that these are being executed "successfully".

This is an oversimplification, can give more direct src if it helps. But I think it's a pretty straightforward flow?

Thanks for taking the time to track this down. For now, we're working around this by launching puppeteer inside our own forked processes, then killing the forked process upon completion. This allows us to clear zombies without having to shut down our main listening node or restart the Docker container.
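
The forked-process workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: `workerSource` and `renderInChild` are hypothetical names, and the worker sends a placeholder result instead of actually launching puppeteer. It shows the parent starting a disposable child, waiting for its result, and then killing it; it does not verify whether leftover Chrome descendants are reaped afterwards.

```javascript
const { spawn } = require('child_process');

// Hypothetical worker body. In a real service this is where puppeteer would
// launch, render, and report back; here it only sends a placeholder result
// and then stays alive, like a worker that still has leftover children.
const workerSource = `
  process.send({ fileName: 'example.pdf' });
  setInterval(() => {}, 1000);
`;

// Runs one job in a disposable child process and kills the child afterwards.
function renderInChild() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', workerSource], {
      // The 'ipc' entry gives the child a process.send() channel to the parent.
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    let result;
    child.once('message', message => {
      result = message;
      child.kill('SIGKILL'); // tear the worker down once it has reported
    });
    child.once('error', reject);
    child.once('exit', (code, signal) => {
      if (result) {
        resolve({ result, signal });
      } else {
        reject(new Error(`worker exited early (code ${code})`));
      }
    });
  });
}

const pending = renderInChild();
pending.then(({ result }) => console.log('worker produced', result.fileName));
```

The main listening process never launches a browser itself, so it can keep serving requests while each job's process tree is thrown away.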

aslushnikov · 2017-09-14T05:54:26Z

@michaeljho Can you reproduce this reliably? Any chance you can share a snippet with me?

michaeljho · 2017-09-14T06:12:36Z

return Promise.resolve()
    .then(() =>
      puppeteer
        .launch({ args: ['--no-sandbox'] }) // TODO: we cannot sandbox for now because we're running as root in prod, but we want to remove this eventually?
        .then(pBrowser => {
          logger.info('Puppeteer successfully launched, asking for new page...');
          browser = pBrowser;
          return browser.newPage();
        })
        .then(pPage => {
          logger.info('New page retrieved, beginning export...');
          page = pPage;
          return page;
        })
        .then(() => {
          logger.info('Setting extra HTTP headers', headers);
          return page.setExtraHTTPHeaders(headers);
        })
        .then(() => {
          const url = `${config.baseUrl}${uri}`;
          logger.info('Navigating to', url);
          page.on('console', (...args) => {
            logger.info(`page console: ${args.join(' ')}`);
          });
          page.on('request', request => logger.info('Page request sent:', request.url));
          page.on('error', error => logger.error('Chromium Tab CRASHED', error));
          page.on('pageerror', pageerror => logger.warn('Page encountered unhandled exception:', pageerror));
          return page.goto(url);
        })
        .then(() => {
          logger.info('Page loaded, waiting for CSS selector', selector);
          return page.waitForSelector(selector, { timeout: 300000 }); // 5 minute cap for page to fully load
        })
        .then(() => {
          logger.info('Selector', selector, 'detected, doing a sleep to allow painting to finish...');
          return sleep(sleepDuration);
        })
        .then(() => {
          logger.info('ready for export, attempting raster...');
          if (format === 'png') {
            return page.screenshot({ fullPage: true, omitBackground: true });
          }
          return page.pdf();
        })
        .then(file => {
          const fileName = `${id}.${format}`;
          return {
            fileName,
            file
          };
        })
    )
    .finally(() => {
      if (page) {
        logger.info('Closing puppeteer page');
        page.close().then(() => {
          if (browser) {
            logger.info('Closing puppeteer browser');
            browser.close();
          }
        });
      }
    });

I realize now that we are also using a bunch of page listeners too, forgot to mention that. Every time this script runs, we get a zombie.

aslushnikov · 2017-09-14T06:21:25Z

@michaeljho thanks, trying now. I noticed the .finally() method, which is not part of native promises.
Do you use a third-party promise library, or did you polyfill .finally for native promises?

michaeljho · 2017-09-14T07:05:39Z

Ahh yeah good catch ... we're using bluebird here. We see both of those consoles, so I believe at least the page is closing "successfully".

gumieri · 2017-09-14T12:40:06Z

I ran my code outside Docker, in a CentOS VM, using Puppeteer v0.10.2.
The issue does not happen there, so maybe it should not be reported as a Puppeteer bug.

If you think it is worth keeping this issue open and want to investigate, feel free to use my code.
The latest version (https://github.com/tecnospeed/pastor) still creates zombies.

michaeljho · 2017-09-18T19:44:48Z

So after some more investigation, it appears that it's the zygote and renderer forks that become defunct. The root puppeteer process terminates properly, but leaves its zygote subprocess (and subsequently the renderer sub of the zygote process) around.

Still digging more on our end. Just wanted to post info here as we find it in case it's helpful.

michaeljho · 2017-09-18T19:53:00Z

Okay -- some more info now. Sending a SIGTERM or even SIGKILL signal to the zygote process leaves it in a defunct state. My test case was this, not sure if valid here:

launch puppeteer via REPL
via shell, kill -15 the renderer process. terminates properly
via shell, kill -15 the zygote process, leaves defunct process
tried the same with SIGKILL and same symptoms.

My guess is that kill signals are not being handled properly in zygote, which is exacerbated by pid1 reassignment/orphaning that happens in docker, meaning killing the original puppeteer process has no effect.

michaeljho · 2017-09-18T21:01:52Z

I was able to work around this issue by adding an init system to my docker image. I chose https://github.com/Yelp/dumb-init. Works perfectly, no more zombies.

@gumieri maybe you want to try this out yourself?

@aslushnikov I still think it's strange that sending any kill signal to the zygote leaves things defunct. But the lack of an init system in Docker + re-parenting orphans to pid 1 was definitely exacerbating the issue.

frko · 2017-10-13T15:48:28Z

We've been running puppeteer in the google cloud with kubernetes in an attempt to unify performance and load testing of our website - which involves starting 1000+ containers.

We're running the default node8 slim container with puppeteer which fetches 'abstract user journeys' and executes those using puppeteer.

To overcome overhead of starting the rather large containers for every session we keep the containers alive and close the browser and re-open it for every session as we need a clean context.

Depending on the size of the executed sessions, the number of Chrome processes grows more slowly or more quickly, but all containers will eventually 'hang' after the following unhandled promise rejection.

(node:27) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Page crashed!
(node:27) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

We're trying to figure out what is causing this issue, but we currently think it is related to the immense number of processes.

We're trying to fix this using the above 'fix' with dumb-init, but I would really like to understand how I can catch this unhandled promise rejection. As far as we can see, all exposed promises are guarded with a catch, but we've not been able to stop the system from hanging and failing to recover from the failure.

Might this indeed be caused by the processes? And how do I guard against this failure so we can keep the containers alive.

Regards, Frank

michaeljho · 2017-10-13T15:58:21Z

Frank,

Have you tried starting your entire promise chain with a Promise.resolve(), then .catch() anything that falls through? This is a good practice especially when trying to avoid unhandled exceptions.

Also, you can use this event handler to detect page crashes:

page.on('error', error => {
  logger.error('Chromium Tab CRASHED', error);
});

Hopefully these two things help.
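
The two suggestions above can be combined so that a crash reported through the page's 'error' event rejects the same promise chain that the final .catch() guards. The following is a minimal sketch, not Puppeteer's own API: `runGuarded` is a hypothetical helper, and the page is a plain EventEmitter standing in for a real Puppeteer page.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: runs job(page) and turns any failure, including a crash
// reported through the page's 'error' event, into a rejection of one promise.
function runGuarded(page, job) {
  let onCrash;
  const crashed = new Promise((resolve, reject) => {
    onCrash = error => reject(error);
    page.once('error', onCrash);
  });
  crashed.catch(() => {}); // the race below reports it; avoid a stray warning
  return Promise.resolve()
    .then(() => Promise.race([job(page), crashed]))
    .finally(() => page.removeListener('error', onCrash));
}

// Usage with stand-ins: the job never settles, as if the render had hung.
const fakePage = new EventEmitter();
const neverFinishes = () => new Promise(() => {});

const outcome = runGuarded(fakePage, neverFinishes)
  .catch(error => `recovered: ${error.message}`);

fakePage.emit('error', new Error('Page crashed!'));
outcome.then(message => console.log(message));
```

Without the race, a job that hangs after a crash would never settle, and the trailing .catch() would never get a chance to run.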

frko · 2017-10-16T06:17:49Z

Thanks for the best practice; we will apply it shortly. Today we will verify whether dumb-init indeed solves the defunct processes. Will keep you posted about the results. Regards

frko · 2017-10-16T11:53:41Z

Using dumb-init did resolve the defunct process issue, which we're very happy with.

Unfortunately the promise best practice doesn't suffice to catch the 'page crashed' error thrown as an unhandled promise rejection, which makes me wonder where the process actually hangs (it seems it's stuck inside puppeteer and never returns to the application code).

I will now try to catch it using the page.on event handler and try to revive the browser when the process crashes. The odd thing about the whole situation is that the page crashes but the tab this has happened in is still very much alive.

frko · 2017-10-16T12:13:44Z

Just added the page.on('error', ...) handler, which is indeed called when the page crashes. Exiting the node process in the error handler with a non-zero exit code allows us to restart the containers and maintain a solid number of puppeteer instances to set our servers on fire.

Thanks for the cooperation!
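
A rough sketch of that fail-fast handler follows. It is an assumption-laden illustration: `exitOnCrash` is a hypothetical name, the exit function is injectable only so the behaviour can be exercised without terminating the process, and the page is again an EventEmitter stand-in.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper. By default it calls process.exit, so the container's
// main process ends and the orchestrator can start a fresh container.
function exitOnCrash(page, exit = code => process.exit(code)) {
  page.on('error', error => {
    console.error('Chromium tab crashed:', error.message);
    exit(1); // non-zero, so the container is treated as failed
  });
}

// Usage with a stand-in page and a recording exit function.
const exitCodes = [];
const fakePage = new EventEmitter();
exitOnCrash(fakePage, code => exitCodes.push(code));
fakePage.emit('error', new Error('Page crashed!'));
console.log('requested exit codes:', exitCodes);
```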

ebidel · 2017-10-16T18:31:05Z

Nice @frko. Curious if you're running on a cloud service? Are you waiting for a threshold of crashes to happen, then restarting the vm? Asking for a friend (https://github.com/ebidel/try-puppeteer) :)

rosskevin · 2017-11-06T15:11:35Z

On that same note @ebidel and @frko I'm also interested in reliability and autoscaling for a kubernetes pod (puppeteer in docker). @CesarLanderos pointed me to his kubernetes deployment config, but I'm also interested in bullet-proofing the scaling of an internal rendering service. It is similar to try-puppeteer but on kubernetes, perhaps more like puppetron and further customized.

I've added dumb init and the page error handler to kill dead processes so kubernetes can start new containers, but I'm worried that the horizontal autoscaler won't properly trigger on CPU and memory will blow it up with a bunch of concurrent requests opening up new pages. I'm curious about the right custom metrics in kubernetes to monitor for puppeteer reliability. Any thoughts?

gumieri · 2017-11-08T02:06:41Z

Hi,

IMHO, a better approach for ensuring solid auto scaling would be to isolate the "API using puppeteer" container from Chrome, creating another container for running Chrome and exposing its WS to puppeteer.

I'm just not sure whether it is possible to separate the Chrome instances, place a load balancer in front of their WS, and balance these requests. Would it fail by breaking the "session"? Would a request to Chrome to "go to" a page and a request to print it be different requests?

edit:

@rosskevin, I am using tecnospeed/pastor in production, with several hundred requests per hour with no problem. It is running at least 2 instances (auto-scaling) behind a load-balancer.

rosskevin · 2017-11-08T13:49:29Z

@gumieri thanks for your comments

> IMHO, a better approach for ensuring solid auto scaling would be to isolate the "API using puppeteer" container from Chrome, creating another container for running Chrome and exposing its WS to puppeteer.

This doesn't remove the need to scale the Chrome container reliably, e.g. when too many requests on a single container blow up memory (I'm guessing memory is likely to be the constraint).

> I'm just not sure whether it is possible to separate the Chrome instances, place a load balancer in front of their WS, and balance these requests.

I have my customized puppetron behind a Service, and the entire browser goto/render is done in one script, so no session issues.

> I am using tecnospeed/pastor in production, with several hundred requests per hour with no problem. It is running at least 2 instances (auto-scaling) behind a load-balancer.

Good to note your success there, how many pages are your PDF files (just curious)? The notable difference between pastor vs puppetron is that puppetron is opening a new page (tab) on the same instance for different requests. I am not sure yet if this is better resource-wise and I also don't know the cookie sharing implications.

aslushnikov · 2018-01-12T09:52:53Z

We've added a reference to dumb-init in troubleshooting.md.

For the issues with page crashes, please upvote #1454.

dcharbonnier · 2018-03-23T10:09:00Z

I don't think it's related to puppeteer; I sometimes get Chrome zombies even on my Linux desktop.
The solution for me was to use tini. My entrypoint is now:
exec /tini -- node --harmony_async_iteration dist/bin/run.js
I preferred tini over dumb-init because it's the official Docker solution, but it's similar (see Yelp/dumb-init#63).

alecdwm · 2018-07-17T09:13:55Z

@michaeljho Not sure if this is related, but in your snippet:

page.close().then(browser.close())

The code is executed as follows:

page.close()
browser.close()

// later on
undefined // or whatever is returned by browser.close()

It looks like instead you're trying to do:

page.close().then(browser.close)
// or
page.close().then(() => browser.close())

This code is instead executed as follows:

page.close()

// later on
browser.close()
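
The difference can be observed with a small runnable sketch. The page and browser here are hypothetical fakes that only record when close() is called, not Puppeteer objects. Note also that passing an unbound method, as in .then(browser.close), may lose its `this` binding depending on how close() is implemented, so the arrow-function form is the safer of the two.

```javascript
// Fakes standing in for a Puppeteer page and browser (not the real API).
// page.close() resolves on a later tick so that ordering becomes visible.
function makeFakes(log) {
  const page = {
    close() {
      log.push('page.close called');
      return new Promise(resolve => setTimeout(() => {
        log.push('page.close finished');
        resolve();
      }, 10));
    },
  };
  const browser = {
    close() {
      log.push('browser.close called');
      return Promise.resolve();
    },
  };
  return { page, browser };
}

// As written in the snippet: browser.close() runs immediately, and its return
// value (a promise, not a function) is ignored by .then().
function closeEagerly(log) {
  const { page, browser } = makeFakes(log);
  return page.close().then(browser.close()).then(() => log);
}

// As intended: browser.close() runs only after page.close() has resolved.
function closeInOrder(log) {
  const { page, browser } = makeFakes(log);
  return page.close().then(() => browser.close()).then(() => log);
}

closeEagerly([]).then(log => console.log('eager:   ', log.join(' -> ')));
closeInOrder([]).then(log => console.log('in order:', log.join(' -> ')));
```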

aslushnikov mentioned this issue Aug 30, 2017

Kill child process group #621

Merged

aslushnikov added the bug label Aug 30, 2017

JoelEinbinder closed this as completed in #621 Aug 30, 2017

aslushnikov reopened this Sep 8, 2017

aslushnikov mentioned this issue Sep 25, 2017

Consolidate best practices in Docker section of troubleshooting document #809

Closed

2 tasks

aslushnikov mentioned this issue Oct 25, 2017

Puppeteer spawns a runaway amount of Chromium instances #1047

Closed

misenhower added a commit to misenhower/splatoon2.ink that referenced this issue Oct 30, 2017

Fix an issue with zombie Chrome processes in the app container

c5e2ce9

puppeteer/puppeteer#615

This was referenced Oct 31, 2017

Additional Docker tips #1235

Merged

Expose underlying chrome process #1238

Closed

aslushnikov closed this as completed Jan 12, 2018

SunXinFei mentioned this issue Aug 13, 2019

Docker + Node + Pm2 + Redis + Puppeteer SunXinFei/sunxinfei.github.io#20

Open

This was referenced Mar 3, 2020

Memory leak with puppeteer #4831

Closed

Node process not releasing the memory result of that getting OOM issues #5041

Closed

abeloin mentioned this issue Nov 21, 2020

Add dumb-init to prevent zombie process NoahCardoza/CloudProxy#38

Merged
