Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

instance.close() let zombie process #615

Closed
gumieri opened this issue Aug 30, 2017 · 24 comments
Closed

instance.close() let zombie process #615

gumieri opened this issue Aug 30, 2017 · 24 comments
Labels

Comments

@gumieri
Copy link

gumieri commented Aug 30, 2017

I am using puppeteer inside Docker as 'html to pdf' converter API.
Even calling browser.close() (which browser is the instance from puppeteer.launch()) I have chrome processes as zombie inside my instance:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 pastor    20   0  922068  26900  11404 S   0.0  0.7   0:00.66 node
   10 pastor    20   0   11740   1896   1512 S   0.0  0.1   0:00.04 bash
   33 pastor    20   0   51892   2008   1428 R   0.0  0.1   0:00.05 top
   40 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.01 chrome
   62 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.02 chrome
   69 pastor    20   0       0      0      0 Z   0.0  0.0   0:00.74 chrome

I tried to call page.close() too but nothing changed.

@michaeljho
Copy link

michaeljho commented Sep 8, 2017

Hi, I am still having this issue (zombies left from puppeteer inside Docker). Running on Debian 8.9, using puppeteer 0.10.2

@aslushnikov aslushnikov reopened this Sep 8, 2017
@gumieri
Copy link
Author

gumieri commented Sep 8, 2017

The scenery that I reported above still happening too.

@aslushnikov
Copy link
Contributor

@michaeljho @gumieri do you guys just run puppeteer scripts in docker and let them finish successfully?

I can't reproduce this behavior with puppeteer 0.11.0-alpha. Do you do anything special?

@michaeljho
Copy link

Our use case is to screengrab to pdf or png. We open a page, wait for a particular CSS selector to show up, then generate pdf or screenshot. Then in finally block, page.close().then(browser.close()). Consoles show that these are being executed "successfully".

This is an oversimplification, can give more direct src if it helps. But I think it's a pretty straightforward flow?

Thanks for taking the time to track this down. For now, we're working around this by launching puppeteer inside our own forked processes, then killing the forked process upon completion. This allows us to clear zombies without having to shut down our main listening node or restart the Docker container.

@aslushnikov
Copy link
Contributor

@michaeljho Can you reproduce this reliably? Any chance you can share a snippet with me?

@michaeljho
Copy link

michaeljho commented Sep 14, 2017

return Promise.resolve()
    .then(() =>
      puppeteer
        .launch({ args: ['--no-sandbox'] }) // TODO: we cannot sandbox for now because of we're running as root in prod, but we want to remove this eventually?
        .then(pBrowser => {
          logger.info('Puppeteer successfully launched, asking for new page...');
          browser = pBrowser;
          return browser.newPage();
        })
        .then(pPage => {
          logger.info('New page retrieved, beginning export...');
          page = pPage;
          return page;
        })
        .then(() => {
          logger.info('Setting extra HTTP headers', headers);
          return page.setExtraHTTPHeaders(headers);
        })
        .then(() => {
          const url = `${config.baseUrl}${uri}`;
          logger.info('Navigating to', url);
          page.on('console', (...args) => {
            logger.info(`page console: ${args.join(' ')}`);
          });
          page.on('request', request => logger.info('Page request sent:', request.url));
          page.on('error', error => logger.error('Chromium Tab CRASHED', error));
          page.on('pageerror', pageerror => logger.warn('Page encountered unhandled exception:', pageerror));
          return page.goto(url);
        })
        .then(() => {
          logger.info('Page loaded, waiting for CSS selector', selector);
          return page.waitForSelector(selector, { timeout: 300000 }); // 5 minute cap for page to fully load
        })
        .then(() => {
          logger.info('Selector', selector, 'detected, doing a sleep to allow painting to finish...');
          return sleep(sleepDuration);
        })
        .then(() => {
          logger.info('ready for export, attempting raster...');
          if (format === 'png') {
            return page.screenshot({ fullPage: true, omitBackground: true });
          }
          return page.pdf();
        })
        .then(file => {
          const fileName = `${id}.${format}`;
          return {
            fileName,
            file
          };
        })
    )
    .finally(() => {
      if (page) {
        logger.info('Closing puppeteer page');
        page.close().then(() => {
          if (browser) {
            logger.info('Closing puppeteer browser');
            browser.close();
          }
        });
      }
    });

I realize now that we are also using a bunch of page listeners too, forgot to mention that. Every time this script runs, we get a zombie.

@aslushnikov
Copy link
Contributor

@michaeljho thanks, trying now. I noticed .finally() method that is not a part of native promises.
Do you use a third-party promise library or you polyfilled .finally for native promises?

@michaeljho
Copy link

michaeljho commented Sep 14, 2017

Ahh yeah good catch ... we're using bluebird here. We see both of those consoles, so I believe at least the page is closing "successfully".

@gumieri
Copy link
Author

gumieri commented Sep 14, 2017

I executed my code out of a Docker, in a CentOS VM using the Puppeteer v10.2.
This issue does not happens, so may it should not me pointed as a bug for the Puppeteer.

If you consider viable to let this Issue open and want to investigate, feel free to use my code.
In the latest version (https://github.com/tecnospeed/pastor) still creating zombie.

@michaeljho
Copy link

So after some more investigation, it appears that it's the zygote and renderer forks that become defunct. The root puppeteer process terminates properly, but leaves its zygote subprocess (and subsequently the renderer sub of the zygote process) around.

Still digging more on our end. Just wanted to post info here as we find it in case it's helpful.

@michaeljho
Copy link

michaeljho commented Sep 18, 2017

Okay -- some more info now. Sending a SIGTERM or even SIGKILL signal to the zygote process leaves it in a defunct state. My test case was this, not sure if valid here:

  • launch puppeteer via REPL
  • via shell, kill -15 the renderer process. terminates properly
  • via shell, kill -15 the zygote process, leaves defunct process
  • tried the same with SIGKILL and same symptoms.

My guess is that kill signals are not being handled properly in zygote, which is exacerbated by pid1 reassignment/orphaning that happens in docker, meaning killing the original puppeteer process has no effect.

@michaeljho
Copy link

I was able to work around this issue by adding an init system to my docker image. I chose https://github.com/Yelp/dumb-init. Works perfectly, no more zombies.

@gumieri maybe you want to try this out yourself?

@aslushnikov I still think it's strange that sending any kill signal to the zygote leaves things defunct. But lack of init system in Docker + re-parenting orphans to pid 1 was definitely exaggerating the issue.

@frko
Copy link

frko commented Oct 13, 2017

We've been running puppeteer in the google cloud with kubernetes in an attempt to unify performance and load testing of our website - which involves starting 1000+ containers.

We're running the default node8 slim container with puppeteer which fetches 'abstract user journeys' and executes those using puppeteer.

To overcome overhead of starting the rather large containers for every session we keep the containers alive and close the browser and re-open it for every session as we need a clean context.

Depending on the amount of the size of the executed sessions the amount of chrome processes grows slower or faster but all containers eventually will 'hang' after the following unhandled promise rejection.

(node:27) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Page crashed!
(node:27) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

We're trying to figure out what is causing this issue but we currently think this is related to the immense amount of processes.

We're trying to fix this using the above 'fix' with dumb-init but I would really like to understand how I can catch this unhandled promise rejection. As far as we can see all exposed promises are guarded with an catch but we've not been able to stop the system from hanging and failing to recover from the failure.

Might this indeed be caused by the processes? And how do I guard against this failure so we can keep the containers alive.

Regards, Frank

@michaeljho
Copy link

michaeljho commented Oct 13, 2017

Frank,

Have you tried starting your entire promise chain with a Promise.resolve(), then .catch() anything that falls through? This is a good practice especially when trying to avoid unhandled exceptions.

Also, you can use this event handler to detect page crashes:

page.on('error', error => {
  logger.error('Chromium Tab CRASHED', error);
});

Hopefully these two things help.

@frko
Copy link

frko commented Oct 16, 2017 via email

@frko
Copy link

frko commented Oct 16, 2017

Using dumb-init did resolve the defunct process issue, which we're very happy with.

Unfortunately the best promise practice doesnt suffice to catch the 'page crashed' error thrown as an unhandled promise exception. Which makes me wonder where the process actually hangs ( it seems its stuck internally to puppeteer and never returning to the application code ).

I will now try to catch it using the page.on event handler and try to revive the browser when the process crashes. The odd thing about the whole situation is that the page crashes but the tab this has happened in is still very much alive.

@frko
Copy link

frko commented Oct 16, 2017

Just added the page.on('error', ...) handler which indeed is called when the page crashes. Exiting the node process in the error handler with an non zero exit code allows us to restart the containers and maintain a solid amount of puppeteer instances to set our servers on fire.

Thanks for the cooperation!

@ebidel
Copy link
Contributor

ebidel commented Oct 16, 2017

Nice @frko. Curious if you're running on a cloud service? Are you waiting for a threshold of crashes to happen, then restarting the vm? Asking for a friend (https://github.com/ebidel/try-puppeteer) :)

@rosskevin
Copy link

On that same note @ebidel and @frko I'm also interested in reliability and autoscaling for a kubernetes pod (puppeteer in docker). @CesarLanderos pointed me to his kubernetes deployment config, but I'm also interested in bullet-proofing the scaling of an internal rendering service. It is similar to try-puppeteer but on kubernetes, perhaps more like puppetron and further customized.

I've added dumb init and the page error handler to kill dead processes so kubernetes can start new containers, but I'm worried that the horizontal autoscaler won't properly trigger on CPU and memory will blow it up with a bunch of concurrent requests opening up new pages. I'm curious about the right custom metrics in kubernetes to monitor for puppeteer reliability. Any thoughts?

@gumieri
Copy link
Author

gumieri commented Nov 8, 2017

Hi,

IMHO, a better approach for assuring a solid auto scaling would be isolate the "API using puppeteer" container from the chrome, creating another container for running Chrome and exposing its WS to the puppeteer.

I'm just not sure if it is possible to separate the chromes instances placing a load balancer to its WS and balance these requests. Would it fail by breaking the "session"? A request to the chrome to "go-to" a page and print it would be different requests?

edit:

@rosskevin, I am using tecnospeed/pastor in production, with some hundred of requests per hour with no problem. It is running at least 2 instances (auto-scaling) behind a load-balancer.

@rosskevin
Copy link

@gumieri thanks for your comments

IMHO, a better approach for assuring a solid auto scaling would be isolate the "API using puppeteer" container from the chrome, creating another container for running Chrome and exposing its WS to the puppeteer.

This doesn't solve the need to still scale the chrome container reliably e.g. too many requests on a single container that blows up memory (I'm guessing memory is likely to be the constraint)

I'm just not sure if it is possible to separate the chromes instances placing a load balancer to its WS and balance these requests.

I have my customized puppetron behind a Service, and the entire browser goto/render is done in one script, so no session issues.

I am using tecnospeed/pastor in production, with some hundred of requests per hour with no problem. It is running at least 2 instances (auto-scaling) behind a load-balancer.

Good to note your success there, how many pages are your PDF files (just curious)? The notable difference between pastor vs puppetron is that puppetron is opening a new page (tab) on the same instance for different requests. I am not sure yet if this is better resource-wise and I also don't know the cookie sharing implications.

@aslushnikov
Copy link
Contributor

We've added the reference to the dumb-init in the troubleshooting.md.

For the issues with page crashes, please upvote the #1454.

@dcharbonnier
Copy link

I don't think it's related to the puppeteer, I have some chrome zombie even on my linux desktop sometimes.
The solution for me was to use tini, my entrypoint is now :
exec /tini -- node --harmony_async_iteration dist/bin/run.js
I preferred tini over dump-init because it's the official docker solution but it's similar Yelp/dumb-init#63

@alecdwm
Copy link

alecdwm commented Jul 17, 2018

@michaeljho Not sure if this is related, but in your snippet:

page.close().then(browser.close())

The code is executed as follows:

page.close()
browser.close()

// later on
undefined // or whatever is returned by browser.close()

It looks like instead you're trying to do:

page.close().then(browser.close)
// or
page.close().then(() => browser.close())

This code is instead executed as follows:

page.close()

// later on
browser.close()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants