Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[telemetry] Adds cloud provider metadata. #95131

Merged
merged 15 commits into from
Apr 12, 2021

Conversation

lukeelmers
Copy link
Member

@lukeelmers lukeelmers commented Mar 23, 2021

Summary

This PR revives some old logic from the monitoring plugin to detect the current cloud environment Kibana is running in (AWS, GCP, or Azure), and provide some basic stats for telemetry purposes.

Changes

  • Relocates the cloud detection logic from monitoring to kibana_usage_collection
  • Creates a cloud_provider usage collector that fetches the metrics (see sample data below)
  • Converts everything to TS
  • I purposefully avoided rewriting all of the detector logic in this PR as the old code was still working fine after some initial testing, however there is plenty of cleanup we could still do in followup PRs. Open to any suggestions folks have here.

Sample Data

{
  name: 'aws',
  vm_type: 't2.micro',
  region: 'us-west-2',
  zone: 'us-west-2a',
}

Testing

Testing is a pain because the only way to verify the collector works is by actually deploying Kibana to each of the cloud providers. Since a proper functional test which deploys to each provider isn't possible with the FTR, we've agreed that for now we'll stick with unit tests that have those APIs mocked.

Instead, I'm manually testing this works by deploying this branch to each cloud provider using these Terraform configs. If you'd like to test as well, you can do so following the instructions in the repo, just be sure to configure your repo to be https://github.com/lukeelmers/kibana and your branch to be feat/cloud-telemetry.

To confirm everything's working, I inspect the sample telemetry payload which you can get from scrolling to the bottom of Stack Management > Advanced Settings. The cloud_provider should be provided in the payload.

@lukeelmers lukeelmers force-pushed the feat/cloud-telemetry branch from 0cff981 to 7d9ea76 Compare March 23, 2021 19:00
@lukeelmers lukeelmers self-assigned this Mar 23, 2021
@lukeelmers lukeelmers added Feature:Telemetry v7.13.0 v8.0.0 release_note:skip Skip the PR/issue when compiling release notes labels Mar 23, 2021
@lukeelmers lukeelmers changed the title Add cloud metadata telemetry. [telemetry] Add cloud provider metadata. Mar 23, 2021
@lukeelmers lukeelmers changed the title [telemetry] Add cloud provider metadata. [telemetry] Adds cloud provider metadata. Mar 23, 2021
@lukeelmers lukeelmers force-pushed the feat/cloud-telemetry branch 4 times, most recently from 513377f to 33358d3 Compare March 30, 2021 19:54
@lukeelmers lukeelmers force-pushed the feat/cloud-telemetry branch from 33358d3 to 8adf9eb Compare March 31, 2021 17:58
Copy link
Member Author

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Github is showing some of the files I converted to TS as new, but really I just copied the contents of the old js files & only converted to TS with minor formatting updates. The main new thing introduced in this PR is the collector itself.

There's still plenty of cleanup we could do on the old code, just a question of what's worth taking the time to do now. I resisted the urge to overhaul everything, but open to any feedback folks may have 🙂

@lukeelmers
Copy link
Member Author

Just tested the latest again on GCP, Azure, AWS -- all looks good:

          "cloud_provider": {
            "name": "aws",
            "vm_type": "i3.xlarge",
            "region": "us-west-2",
            "zone": "us-west-2b"
          },
          "cloud_provider": {
            "name": "azure",
            "vm_type": "Standard_D4ds_v4",
            "region": "westus2"
          },
          "cloud_provider": {
            "name": "gcp",
            "vm_type": "e2-standard-4",
            "region": "us-west1",
            "zone": "us-west1-a"
          },

@lukeelmers lukeelmers marked this pull request as ready for review March 31, 2021 21:04
@lukeelmers lukeelmers requested review from a team as code owners March 31, 2021 21:04
@lukeelmers lukeelmers added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Mar 31, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers lukeelmers requested a review from afharo April 1, 2021 17:51
@afharo
Copy link
Member

afharo commented Apr 8, 2021

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

@pgayvallet @afharo This should be ready for a final review. I'm trying to strike a balance here between getting in the functionality we want, while avoiding blowing the PR up too much by refactoring the old code.

I addressed the comments that didn't require too much refactoring, but have a few enhancements I'd prefer to do as follow-ups:

  1. getting rid of the constructor args which are only there for testing purposes (request, fs), and mocking those in the tests
  2. getting rid of the use of the request lib in favor of something like node-fetch which we use more widely in kibana (request looks like it's mostly used in tests atm, not many other places)
  3. adding some retry logic in the event of a failure
  4. update GCP logic to still return values even if a request for one of the fields fails, instead of using an all-or-nothing approach

Let me know if you feel strongly that we should include any of these in the first iteration, otherwise I will create a follow-up task for us to prioritize.

I think overall (3) is my biggest concern here, however AFAICT we did the same thing (only making 1 attempt) in our previous cloud telemetry implementation. So worst case scenario this has "feature parity" with what we used to report.

@lukeelmers lukeelmers requested review from pgayvallet and afharo April 8, 2021 21:50
@afharo
Copy link
Member

afharo commented Apr 9, 2021

@elasticmachine merge upstream

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compromises seem fair to me! LGTM :)

Thank you @lukeelmers!

@lukeelmers
Copy link
Member Author

Thanks @afharo! Follow up issue created here: #96711

I'm looking at the remaining test failure so we can get this back to green.

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you feel strongly that we should include any of these in the first iteration, otherwise I will create a follow-up task for us to prioritize.

Opening a follow-up SGTM.

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers lukeelmers added the auto-backport Deprecated - use backport:version if exact versions are needed label Apr 12, 2021
@lukeelmers lukeelmers enabled auto-merge (squash) April 12, 2021 14:57
@lukeelmers lukeelmers disabled auto-merge April 12, 2021 15:31
@lukeelmers lukeelmers enabled auto-merge (squash) April 12, 2021 16:33
@lukeelmers
Copy link
Member Author

Just did one last test deployment to GCP, Azure, AWS... looks like we are good to go 🚀

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lukeelmers

@lukeelmers lukeelmers merged commit c9cd4a0 into elastic:master Apr 12, 2021
@lukeelmers lukeelmers deleted the feat/cloud-telemetry branch April 12, 2021 16:56
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Apr 12, 2021
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Apr 12, 2021
Co-authored-by: Luke Elmers <luke.elmers@elastic.co>
@alexfrancoeur
Copy link

@lukeelmers @afharo do we need to follow up with a mapping issue in the infra repo?

@afharo
Copy link
Member

afharo commented Apr 14, 2021

For now, we have to :)

cc @elastic/infra-telemetry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.13.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants