Retry ES API calls that fail with 410/Gone to prevent kibana from crashing at startup #56950

Merged
merged 2 commits into from
Feb 7, 2020

Conversation

rudolf
Contributor

@rudolf rudolf commented Feb 6, 2020

Summary

For Saved Object migrations, retry ES API calls that fail with a 410/Gone status code instead of treating the failure as fatal. This prevents Kibana from crashing on startup under the following conditions:

  1. Kibana passes the ES version check by successfully querying ES nodes.info.
  2. Shortly after, just as Kibana attempts to start its migrations, the ES proxy becomes unhealthy and starts returning 410/Gone status codes.
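
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StatusError`, `isRetryable`, and `retryOnGone` are hypothetical names, the `statusCode` property is an assumed error shape, and the bounded attempt count and fixed delay are assumptions made to keep the example short.

```typescript
// Hypothetical error shape: errors from the real ES client may look different.
interface StatusError extends Error {
  statusCode?: number;
}

// Status codes this sketch treats as transient. 410/Gone is the code discussed in this PR.
const RETRYABLE_STATUS_CODES = new Set<number>([410]);

// Returns true when the error carries a status code that should be retried.
function isRetryable(error: unknown): boolean {
  const status = (error as StatusError | undefined)?.statusCode;
  return typeof status === "number" && RETRYABLE_STATUS_CODES.has(status);
}

// Calls `apiCall` until it succeeds, a non-retryable error is thrown,
// or `maxAttempts` is reached (both limits are assumptions of this sketch).
async function retryOnGone<T>(
  apiCall: () => Promise<T>,
  maxAttempts = 5,
  delayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await apiCall();
    } catch (error) {
      // Give up on errors that are not transient, or once the attempts run out.
      if (!isRetryable(error) || attempt >= maxAttempts) throw error;
      await sleep(delayMs);
    }
  }
}
```

Injecting `sleep` keeps the helper easy to test without real delays; a caller would wrap each migration request, for example `retryOnGone(() => someEsCall())`, where `someEsCall` stands for any promise-returning ES request.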

@rudolf rudolf added the blocker, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc), Feature:Saved Objects, v8.0.0, v7.7.0, and v7.6.0 labels Feb 6, 2020
@rudolf rudolf requested a review from a team as a code owner February 6, 2020 08:55
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@rudolf rudolf added the bug label (Fixes for quality problems that affect the customer experience) Feb 6, 2020
@rudolf rudolf removed the blocker label Feb 6, 2020
@bhavyarm
Contributor

bhavyarm commented Feb 6, 2020

@rudolf cc @LeeDr can you please let us know how you found this issue? We have a smoke test for it but I can't reproduce it. Thanks!

@rudolf
Contributor Author

rudolf commented Feb 6, 2020

@bhavyarm We noticed it by coincidence: an ES proxy was misconfigured and returned a 410 status code for every request. The only way to test this would be to point Kibana at a proxy that lets us control the response status codes. I've used a tool like MITMProxy to do something like this before: https://docs.mitmproxy.org/stable/addons-scripting/. If we don't have any existing tools, let me know if I can help set something up.
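
As a rough illustration of controlling the response status code, here is a minimal sketch of a stub HTTP server. It is not the MITMProxy approach mentioned above but a swapped-in, simpler stand-in: `createStatusStub` is a hypothetical helper, and the port in the usage comment is an arbitrary assumption. Because the stub does not forward anything to a real ES cluster, it cannot by itself reproduce the "healthy first, then 410" sequence; that would need a real proxy.

```typescript
import * as http from "http";

// Hypothetical helper: a stand-in for a misbehaving ES proxy.
// Every request is answered with whatever status code `getStatus` currently returns.
function createStatusStub(getStatus: () => number): http.Server {
  return http.createServer((req, res) => {
    req.resume(); // discard any request body
    const status = getStatus();
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status }));
  });
}

// Usage sketch (port 9201 is an arbitrary choice):
//   let currentStatus = 410;
//   createStatusStub(() => currentStatus).listen(9201, "127.0.0.1");
// Kibana's `elasticsearch.hosts` setting would then point at http://127.0.0.1:9201.
```

Passing a function rather than a fixed number lets a test flip the status while the server is running.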

@elasticmachine
Contributor

💔 Build Failed


Test Failures

Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/legacy/plugins/uptime/server/lib/helper/__test__.getHistogramIntervalFormatted specifies the interval necessary to divide a given timespan into equal buckets, rounded to the nearest integer, expressed in ms

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches


Stack Trace

Error: expect(received).toBeTruthy()

Received: false
    at Object.it (/var/lib/jenkins/workspace/elastic+kibana+pipeline-pull-request/kibana/x-pack/legacy/plugins/uptime/server/lib/helper/__test__/get_histogram_interval_formatted.test.ts:18:50)
    at Object.asyncJestTest (/var/lib/jenkins/workspace/elastic+kibana+pipeline-pull-request/kibana/node_modules/jest-jasmine2/build/jasmineAsyncInstall.js:102:37)
    at resolve (/var/lib/jenkins/workspace/elastic+kibana+pipeline-pull-request/kibana/node_modules/jest-jasmine2/build/queueRunner.js:43:12)
    at new Promise (<anonymous>)
    at mapper (/var/lib/jenkins/workspace/elastic+kibana+pipeline-pull-request/kibana/node_modules/jest-jasmine2/build/queueRunner.js:26:19)
    at promise.then (/var/lib/jenkins/workspace/elastic+kibana+pipeline-pull-request/kibana/node_modules/jest-jasmine2/build/queueRunner.js:73:41)
    at process._tickCallback (internal/process/next_tick.js:68:7)

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@elastic elastic deleted a comment from kibanamachine Feb 6, 2020
@rudolf
Contributor Author

rudolf commented Feb 7, 2020

@elasticmachine merge upstream

@elasticmachine
Contributor

💚 Build Succeeded

@rudolf rudolf merged commit 4efd26a into elastic:master Feb 7, 2020
@rudolf rudolf deleted the ignore-es-gone-errors branch February 7, 2020 09:23
rudolf added a commit to rudolf/kibana that referenced this pull request Feb 7, 2020
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
rudolf added a commit to rudolf/kibana that referenced this pull request Feb 7, 2020
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
rudolf added a commit that referenced this pull request Feb 7, 2020
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Feb 7, 2020
…b.com:jloleysens/kibana into console/feature/text-objects-in-saved-objects

* 'console/feature/text-objects-in-saved-objects' of github.com:jloleysens/kibana: (103 commits)
  fix auto closing new vis modal when navigating to lens or when navigating away with browser history (elastic#56998)
  TS of esKuery\node_types  (elastic#56857)
  Kibana app migration: Move static code dependencies into kibana_legacy plugin, part 1 (elastic#56408)
  Retry ES API calls that fail with 410/Gone (elastic#56950)
  [APM] Show missing permissions message to the user on the Services overview (elastic#56374)
  Fixing flaky CI tests for custom appRoutes (elastic#55763)
  [State Management][Docs] State syncing utils docs (elastic#56479)
  [Index management] Remove index mapper setting in tests (elastic#57066)
  Exposed common EuiExpressions to separate components be able to reuse for building new for Alert Types  (elastic#56466)
  [SIEM] update url state between page if date is relative (elastic#56813)
  fix for chart_types test (elastic#57056)
  chore(NA): remove compress from dll minimizer (elastic#57023)
  [File upload] Migrate routing to NP & add route validation (elastic#52313)
  Adding docs for grouped nav advanced setting (elastic#57013)
  Use i18n titles for field formatters, human names for numeral locales (elastic#56348)
  [Maps] Remove EMS catalogue url from docs (elastic#57020)
  [Endpoint] ERT-82 ERT-83 ERT-84: Alert list API with pagination (elastic#56538)
  [DOCS] Adds Apple notarization info to install doc (elastic#57042)
  [ML] New Platform server shim: update results service routes to use new platform router (elastic#56886)
  Fix typo on detection engine rule (elastic#56993)
  ...
rudolf added a commit that referenced this pull request Feb 7, 2020
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Labels
bug (Fixes for quality problems that affect the customer experience), Feature:Saved Objects, release_note:fix, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc), v7.6.1, v7.7.0, v8.0.0
6 participants