Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Node 16.x #98650

Closed
tylersmalley opened this issue Apr 28, 2021 · 20 comments · Fixed by #110684
Closed

Upgrade to Node 16.x #98650

tylersmalley opened this issue Apr 28, 2021 · 20 comments · Fixed by #110684
Assignees
Labels
Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team

Comments

@tylersmalley
Copy link
Contributor

Node v16, the next LTS, was released on April 20th. Active LTS starts on October 18th. With this, I feel it would be best to target 7.16.

The primary driver for this upgrade at this time is to support ARM on macOS.

Node.js v16.0.0 will be the first release where we ship prebuilt binaries for Apple Silicon. While we’ll be providing separate tarballs for the Intel (darwin-x64) and ARM (darwin-arm64) architectures the macOS installer (.pkg) will be shipped as a ‘fat’ (multi-architecture) binary.

Related to #83946

@tylersmalley tylersmalley added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team labels Apr 28, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@tylersmalley
Copy link
Contributor Author

Also, cc @watson since you have done the recent upgrades.

Is everyone OK targeting 7.16? 7.15 would probably ship before the active LTS of v16 starts, which is not recommended for production applications.

@rudolf
Copy link
Contributor

rudolf commented Sep 22, 2021

I think the comments on #77469 (comment) are still relevant. Although I'm in favour of moving towards crashing (and adopting v16) I believe this is a huge stability risk. We first need to get unhandled promise rejections in production to a reasonable number before we can crash as the default.

@watson
Copy link
Contributor

watson commented Sep 22, 2021

I'm surprised if we still have a lot of unhandled promise rejections in our code base. I implemented a trap for this which should crash the process with a good error message in case of an unhandled promise rejection in both development and CI a while back:

if (process.noProcessWarnings !== true) {
process.on('warning', function (warn) {
if (shouldIgnore(warn)) return;
if (process.traceProcessWarnings === true) {
console.error('Node.js process-warning detected - Terminating process...');
} else {
console.error('Node.js process-warning detected:');
console.error();
console.error(warn.stack);
console.error();
console.error('Terminating process...');
}
process.exit(1);
});
// While the above warning listener would also be called on
// unhandledRejection warnings, we can give a better error message if we
// handle them separately:
process.on('unhandledRejection', function (reason) {
console.error('Unhandled Promise rejection detected:');
console.error();
console.error(reason);
console.error();
console.error('Terminating process...');
process.exit(1);
});
}

The idea with this was to find all these places so we could fix them before turning this behavior on in production as well.

If we still have a lot of these, it could mean that this code is no longer working as intended. Could it be that part of our CI isn't triggering this code path?

@rudolf
Copy link
Contributor

rudolf commented Sep 22, 2021

When looking over the previous 4 weeks there's still a huge number of unhandled rejections:

Screenshot 2021-09-22 at 14 11 32

https://overview.elastic-cloud.com/app/dashboards#/view/cf546190-1b9a-11ec-b8ee-754395ac06ab?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-4w%2Cto%3Anow))

Now if these 30% of clusters all only crashed once in a 4 week period that would be acceptable. I'm not exactly sure how to break down the data to get an idea of how often the average cluster would crash. But I think this is probably enough to postpone making it the default.

We don't currently have enough context to make these unhandled rejections actionable e.g. an log message like:

Detected an unhandled Promise rejection. ResponseError: [parent] Data too large, data for [] would be [1032302040/984.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1032302040/984.4mb], new bytes reserved: [0/0b], usages [request=16440/16kb, fielddata=4477154/4.2mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=23035828/21.9mb]: circuit_breaking_exception

Could come from all 2k+ places we're making calls to Elasticsearch

@rudolf
Copy link
Contributor

rudolf commented Sep 22, 2021

If we still have a lot of these, it could mean that this code is no longer working as intended. Could it be that part of our CI isn't triggering this code path?

Unfortunately my guess is that it's neither. Most of Kibana and our tests are written for the happy path, we rarely test network errors or any circuit_breaker_exceptions which Elasticsearch returns.

@mshustov
Copy link
Contributor

mshustov commented Sep 22, 2021

Detected an unhandled Promise rejection. ResponseError: [parent] Data too large, data for [] would be [1032302040/984.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1032302040/984.4mb], new bytes reserved: [0/0b], usages [request=16440/16kb, fielddata=4477154/4.2mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=23035828/21.9mb]: circuit_breaking_exception

Note that this is logged by the Core

process.on('unhandledRejection', (reason) => {

It's interesting the message doesn't contain an error stack. I guess the root problem is in an ES client limitation that will be fixed in @elastic/elasticsearch@v8.0 #108387 is planned for the next week.

From elastic/elasticsearch-js#1542

Drop callback-style API
Maintaining both API styles is not a problem per se, but it makes error handling more convoluted due to async stack traces.
Moving to a full-promise API will solve this issue.

@rudolf
Copy link
Contributor

rudolf commented Sep 22, 2021

I think we would also have to log the stack trace explicitly (or change ResponseError's .toString()) https://github.com/elastic/elasticsearch-js/blob/acb77fa3d1badc15091bd330e2fcf87246851799/lib/errors.js#L124

@watson
Copy link
Contributor

watson commented Sep 22, 2021

The nice thing about the reason argument to the unhandledRejection callback is that it points to the place where the promise was rejected - not where the rejected error object was created. This makes tracking down the promise much easier.

@mshustov
Copy link
Contributor

I think we would also have to log the stack trace explicitly (or change ResponseError's .toString())

toString() doesn't contain the original error message? 😱 @delvedor

@delvedor
Copy link
Member

I think we would also have to log the stack trace explicitly (or change ResponseError's .toString())

I have no problem changing the toString method, what would you like to see there?

toString() doesn't contain the original error message? 😱 @delvedor

@mshustov argh! that can be fixed! :)

@rudolf
Copy link
Contributor

rudolf commented Sep 22, 2021

I have no problem changing the toString method, what would you like to see there?

I feel like not including the stack in the toString() is more inline with the rest of Nodejs. So I think we should just log the stack explicitly.

@mshustov
Copy link
Contributor

I feel like not including the stack in the toString() is more inline with the rest of Nodejs. So I think we should just log the stack explicitly.

@rudolf you are absolutely right, the stack is logged by the console automatically. Our logging system logs it as well when receives Error, but here we wrap the object in a string

this.log.warn(`Detected an unhandled Promise rejection.\n${reason}`);

we should do something like:

this.log.warn(`Detected an unhandled Promise rejection.\n${reason}\n${reason.stack}`);

@jbudz
Copy link
Member

jbudz commented Sep 22, 2021

How does this sound?

  1. Not crashing in production by default
    echo --unhandled-rejections=warn >> config/node.options
    
  2. Fixing CI for tested code
  3. Letting development crash on untested code
  4. Documenting a workaround for developers unnecessarily impacted:
    export NODE_OPTIONS=--unhandled-rejections=warn
    
  5. If we can find a way, some form of static code analysis and another meta issue

Upgrading in 7.16.x when Node 14 LTS ends (2023-04-30) will delay ARM macOS support for development and production, so I'm hoping to avoid that if we can. It'll put more pressure on unresolved cases too.

@watson
Copy link
Contributor

watson commented Sep 23, 2021

Upgrading in 7.16.x when Node 14 LTS ends (2023-04-30) will delay ARM macOS support for development and production, so I'm hoping to avoid that if we can. It'll put more pressure on unresolved cases too.

As we have learned the hard way from 6.8, we can't upgrade the previous-major branch to a new major Node.js version without so much work that it's not worth it. So we will/have to upgrade 7.16 to Node.js 16 before v8.0.0 is released - preferable as soon as it's LTS (which is in about a month). From a security perspective it's not advisable to stay on Node.js 14 🚨

@rudolf
Copy link
Contributor

rudolf commented Oct 1, 2021

The nice thing about the reason argument to the unhandledRejection callback is that it points to the place where the promise was rejected - not where the rejected error object was created. This makes tracking down the promise much easier.

Unfortunately this doesn't seem to be the case:

given a file test.js:

const err = new Error();

process.on('unhandledRejection', (reason, promise) => {
  console.log(reason);
  console.log(promise);
});

new Promise((res, rej) => rej(err));

node test.js (using node v16.10.0) produces the following log:

Error
    at Object.<anonymous> (/Users/rudolf/dev/kibana/test.js:1:13)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    ...
Promise {
  <rejected> Error
      at Object.<anonymous> (/Users/rudolf/dev/kibana/test.js:1:13)
      at Module._compile (node:internal/modules/cjs/loader:1101:14)
      ...
}

So you can identify where the error originates from but not where the unhandled promise was constructed 😢

https://github.com/grantila/trace-unhandled aims to solve this problem by instrumenting the promise constructor, but doesn't recommend running it in production.

@watson
Copy link
Contributor

watson commented Oct 6, 2021

@rudolf You're right. If would have sworn it was like I said - I'm not sure how I got that idea. But I agree that the trace-unhandled package shouldn't be run in production. I think it will severely decrease performance as V8 will not be able to optimize performance of these overloaded promises as they can with the native - at least that's my gut feeling based on how much optimization that V8 currently does for promises.

@kobelb
Copy link
Contributor

kobelb commented Oct 13, 2021

Did we decide to do what @jbudz suggested in #98650 (comment) and no longer crash on unhandled promise rejections for 7.16 because we continue to see unhandled project rejections in production?

If so, is there anything else that we need to worry about?

@rudolf
Copy link
Contributor

rudolf commented Oct 18, 2021

@kobelb yes. No further concerns from my side 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants