Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of [HCP Telemetry] Periodic Refresh for Dynamic Telemetry Configuration into release/1.15.x #18349

Conversation

hc-github-team-consul-core
Copy link
Collaborator

Backport

This PR is auto-generated from #18168 to be assessed for backporting due to the inclusion of the label backport/1.15.

🚨

Warning automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@Achooo
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Description

New feature for HCP Telemetry that allows us to dynamically re-configure metrics export to HCP.

Background
Currently, we need to restart the Consul server to bring in any updates to filters, a new endpoint or new labels from the Consul Telemetry Gateway in HCP.

Ideally, we are able to modify these dynamically, which requires logic to continuously fetch new configuration and update it as necessary.

Implementation
This PR enables this functionality, with changes to :

  1. New object: TelemetryConfigProvider:
  • Periodically fetches telemetry configuration from HCP, using a time.Ticker.
  • Holds configuration and exposes it via methods : GetEndpoint(), GetFilters() and GetLabels()
  • Updates configuration, ONLY if it has changed. Comparisons are done using a hash function on the structs holding the config.
  • Handles concurrency access to read the config and modify the config.
  1. OTELSink: uses a new ConfigProvider interface to access filters and labels.
  • Since labels are now dynamic, we can't add them to the OTEL resource (immutable) so a new function is added to compute OTEL labels dynamically.
  • Handle dynamic filters
  1. OTELExporter: uses a new EndpointProvider interface to access the endpoint.

** Cleanup / Refactor ***

  1. The client package was cleaned to separate out the TelemetryConfig object to parse and convert objects.
  2. A logger was added to the client to log bad filter information

This refactor was necessary since now the TelemetryConfigProvider also fetches telemetry config. This avoid duplication of logic for the parsing of the endpoint, labels ,etc.

Testing & Reproduction steps

  • End to end tests:
  • Unit tests:

Links

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

zalimeni and others added 30 commits June 12, 2023 14:28
- Update changelog to include new entries from release
- Update submodule versions to latest published
* port from enterprise branch

* Apply suggestions from code review

Co-authored-by: shanafarkas <105076572+shanafarkas@users.noreply.github.com>

* Update website/content/docs/connect/cluster-peering/usage/create-sameness-groups.mdx

* next steps

* Update website/content/docs/connect/cluster-peering/usage/create-sameness-groups.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/k8s/connect/cluster-peering/usage/create-sameness-groups.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

---------

Co-authored-by: shanafarkas <105076572+shanafarkas@users.noreply.github.com>
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
* trimmed CRD step and reqs from installation

* updated tech specs

* Apply suggestions from code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>

* added upgrade instruction

* removed tcp port req

* described downtime and DT-less upgrades

* applied additional review feedback

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>
* additional feedback

* Update website/content/docs/api-gateway/upgrades.mdx

Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>

---------

Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>
* Initial page/nav creation

* configuration entry reference page

* Usage + fixes

* service intentions page

* usage

* description

* config entry updates

* formatting fixes

* Update website/content/docs/connect/config-entries/service-intentions.mdx

Co-authored-by: Paul Glass <pglass@hashicorp.com>

* service intentions review fixes

* Overview page review fixes

* Apply suggestions from code review

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

---------

Co-authored-by: Paul Glass <pglass@hashicorp.com>
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
* Fixes

* service intentions fixes
* Merge pull request #5773 from hashicorp/docs/rate-limiting-from-ip-addresses-1.16

updated docs for rate limiting for IP addresses - 1.16

* Merge pull request #5609 from hashicorp/docs/enterprise-utilization-reporting

Add docs for enterprise utilization reporting

* Merge pull request #5734 from hashicorp/docs/envoy-ext-1.16

Docs/envoy ext 1.16

* Add release notes for 1.16-rc

* Add consul-e license utlization reporting

* Update with rc absolute links

* Update with rc absolute links

* fix typo

* Apply suggestions from code review

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update to use callout component

* address typo

* docs: FIPS 140-2 Compliance (#17668)

* Page + nav + formatting

* link fix

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

* link fix

* Apply suggestions from code review

Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>

* Update website/content/docs/enterprise/fips.mdx

---------

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>

* fix apigw install values file

* fix typos in release notes

---------

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Jeff Apple <79924108+Jeff-Apple@users.noreply.github.com>
* adding redirects

* Apply suggestions from code review
* fix release notes links

* fix typos on fips docs
… 5m for consul-debug (#17596)

* changed duration to 5 mins and log level to trace

* documentation update

* change log
This includes prioritize by localities on disco chain targets rather than
resolvers, allowing different targets within the same partition to have
different policies.
…te management (#17075)

* agent: remove agent cache dependency from service mesh leaf certificate management

This extracts the leaf cert management from within the agent cache.

This code was produced by the following process:

1. All tests in agent/cache, agent/cache-types, agent/auto-config,
   agent/consul/servercert were run at each stage.

    - The tests in agent matching .*Leaf were run at each stage.

    - The tests in agent/leafcert were run at each stage after they
      existed.

2. The former leaf cert Fetch implementation was extracted into a new
   package behind a "fake RPC" endpoint to make it look almost like all
   other cache type internals.

3. The old cache type was shimmed to use the fake RPC endpoint and
   generally cleaned up.

4. I selectively duplicated all of Get/Notify/NotifyCallback/Prepopulate
   from the agent/cache.Cache implementation over into the new package.
   This was renamed as leafcert.Manager.

    - Code that was irrelevant to the leaf cert type was deleted
      (inlining blocking=true, refresh=false)

5. Everything that used the leaf cert cache type (including proxycfg
   stuff) was shifted to use the leafcert.Manager instead.

6. agent/cache-types tests were moved and gently replumbed to execute
   as-is against a leafcert.Manager.

7. Inspired by some of the locking changes from derek's branch I split
   the fat lock into N+1 locks.

8. The waiter chan struct{} was eventually replaced with a
   singleflight.Group around cache updates, which was likely the biggest
   net structural change.

9. The awkward two layers or logic produced as a byproduct of marrying
   the agent cache management code with the leaf cert type code was
   slowly coalesced and flattened to remove confusion.

10. The .*Leaf tests from the agent package were copied and made to work
    directly against a leafcert.Manager to increase direct coverage.

I have done a best effort attempt to port the previous leaf-cert cache
type's tests over in spirit, as well as to take the e2e-ish tests in the
agent package with Leaf in the test name and copy those into the
agent/leafcert package to get more direct coverage, rather than coverage
tangled up in the agent logic.

There is no net-new test coverage, just coverage that was pushed around
from elsewhere.
* add enterprise notes for IP-based rate limits

* Apply suggestions from code review

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
Co-authored-by: David Yu <dyu@hashicorp.com>

* added bolded 'Enterprise' in list items.

---------

Co-authored-by: Tu Nguyen <im2nguyen@users.noreply.github.com>
Co-authored-by: David Yu <dyu@hashicorp.com>
* Update terminating-gateway.mdx
* Update exported-services.mdx
* Update mesh.mdx
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
* Update Dockerfile

* Create 17719.txt
Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
…consul operator raft list-peers' (#17582)

* init

* fix tests

* added -detailed in docs

* added change log

* fix doc

* checking for entry in map

* fix tests

* removed detailed flag

* removed detailed flag

* revert unwanted changes

* removed unwanted changes

* updated change log

* pr review comment changes

* pr comment changes single API instead of two

* fix change log

* fix tests

* fix tests

* fix test operator raft endpoint test

* Update .changelog/17582.txt

Co-authored-by: Semir Patel <semir.patel@hashicorp.com>

* nits

* updated docs

---------

Co-authored-by: Semir Patel <semir.patel@hashicorp.com>
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/cc-4960/hcp-telemetry-periodic-refresh/namely-joint-tarpon branch 2 times, most recently from bca3df7 to 49d2fdc Compare August 1, 2023 21:21
@hc-github-team-consul-core hc-github-team-consul-core force-pushed the backport/cc-4960/hcp-telemetry-periodic-refresh/namely-joint-tarpon branch from 49d2fdc to bca3df7 Compare August 1, 2023 21:21
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto approved Consul Bot automated PR

@github-actions github-actions bot added type/docs Documentation needs to be created/updated/clarified theme/api Relating to the HTTP API interface theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading theme/ui Anything related to the UI theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication type/ci Relating to continuous integration (CI) tooling for testing or releases pr/dependencies PR specifically updates dependencies of project theme/envoy/xds Related to Envoy support theme/contributing Additions and enhancements to community contributing materials theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/agent-cache Agent Cache theme/consul-terraform-sync Relating to Consul Terraform Sync and Network Infrastructure Automation labels Aug 1, 2023
@hc-github-team-consul-core
Copy link
Collaborator Author

🤔 This PR has changes in the website/ directory but does not have a type/docs-cherrypick label. If the changes are for the next version, this can be ignored. If they are updates to current docs, attach the label to auto cherrypick to the stable-website branch after merging.

@Achooo
Copy link
Contributor

Achooo commented Aug 2, 2023

Closing in favour of manual backport : #18360

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/dependencies PR specifically updates dependencies of project theme/agent-cache Agent Cache theme/api Relating to the HTTP API interface theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/consul-terraform-sync Relating to Consul Terraform Sync and Network Infrastructure Automation theme/contributing Additions and enhancements to community contributing materials theme/envoy/xds Related to Envoy support theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication theme/ui Anything related to the UI type/ci Relating to continuous integration (CI) tooling for testing or releases type/docs Documentation needs to be created/updated/clarified
Projects
None yet
Development

Successfully merging this pull request may close these issues.