Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add reload listener for apm tracing config #13514

Merged
merged 7 commits into from
Jul 2, 2024

Conversation

kyungeunni
Copy link
Contributor

@kyungeunni kyungeunni commented Jun 28, 2024

Motivation/summary

depending on elastic/beats#40030

Subscribe to the APMConfig changes from the reloader and hot-reload the runner with the updated tracing configuration.
This will enable self-instrumentation for APM Server in managed mode.

Checklist

  • Update CHANGELOG.asciidoc
  • Documentation has been updated
  • Test with elastic-agent with the configuration written in elastic-agent.yml => will be done in separately

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Follow instructions in Building an Elastic Agent container image with a locally built APM Server and Running an Elastic Agent container with a locally built APM Server section in TESTING.md, update the elastic-agnet.yml to see the config is received and it's passed down to tracing correctly.

Related issues

Copy link
Contributor

mergify bot commented Jun 28, 2024

This pull request does not have a backport label. Could you fix it @kyungeunni? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Jun 28, 2024
@kyungeunni kyungeunni self-assigned this Jul 1, 2024
Copy link
Contributor

mergify bot commented Jul 1, 2024

This pull request is now in conflicts. Could you fix it @kyungeunni? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b apm-tracing-config-from-ea upstream/apm-tracing-config-from-ea
git merge upstream/main
git push upstream apm-tracing-config-from-ea

@kyungeunni kyungeunni marked this pull request as ready for review July 1, 2024 16:12
@kyungeunni kyungeunni requested a review from a team as a code owner July 1, 2024 16:12
@carsonip carsonip changed the title wip: add reload listner for apm tracing config wip: add reload listener for apm tracing config Jul 1, 2024
@kyungeunni kyungeunni marked this pull request as draft July 1, 2024 16:23
@kyungeunni kyungeunni changed the title wip: add reload listener for apm tracing config feat: add reload listener for apm tracing config Jul 2, 2024
@kyungeunni kyungeunni marked this pull request as ready for review July 2, 2024 01:50
Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's follow up with some manual testing before closing the issue please.

There's a couple of fields in APMConfig which I don't think we'll honour (tls, global_labels), but those should probably be handled in libbeat: https://github.com/elastic/elastic-agent-client/blob/dd646033eae6ef77cf6ff5d297e0d5513d08ac7f/pkg/proto/elastic-agent-client.pb.go#L1200-L1205

@kyungeunni
Copy link
Contributor Author

I don't think we'll honour (tls, global_labels), but those should probably be handled in libbeat

Yes, instrumentation in beats need to be updated to accept those fields. I made some changes in beats repository accepting those so I will make a PR for it as a follow up.

@kyungeunni kyungeunni merged commit dfb1735 into elastic:main Jul 2, 2024
14 checks passed
@kyungeunni kyungeunni deleted the apm-tracing-config-from-ea branch July 2, 2024 06:02
@kyungeunni
Copy link
Contributor Author

kyungeunni commented Jul 3, 2024

Follow up: Manual testing

I've done the manual testing by running a locally built elastic-agent image with the APM server binary that contains #13514 which subscribes to the APMConfig reload and applies the given APMConfig for the self-instrumentation.

Setup

  1. Run Elasticsearch and Kibana locally by running docker-compose up -d
  2. Build the Elastic Agent image with the local APM Server:
Keep the image's name and tag
referring it as `$LOCAL_IMAGE` in the later example.(looks like `elastic-agent-systemtest:${snapshotversion}`)

$ cd systemtest/cmd/buildapm
$ go run main.go -arch arm64
  1. Go to local Kibana(http://localhost:5601/app/fleet/agents), create agent policy by clicking Add Agent button. Grab Enrollment token and policy ID from the snippet to install Elastic Agent
  2. Create another docker-compose.yml file in a temp dir to run the Elastic Agent with the image built from above:
touch ~/temp/docker-compose.yaml

paste below, with replaced value for highlighted texts (**$LOCAL_IMAGE**, **Enrollment token**, **fleet-server-policy**)

version: '3.9'
services:
  elastic-agent:
    image: **$LOCAL_IMAGE**
    networks:
      - network1
    ports:
      - 8220:8220
      - 8200:8200
    user: root
    environment:
      FLEET_SERVER_ENABLE: 'true'
      FLEET_SERVER_ELASTICSEARCH_HOST: http://elasticsearch:9200
      FLEET_SERVER_SERVICE_TOKEN: **Enrollment token**
      FLEET_SERVER_POLICY_ID: **fleet-server-policy**
networks:
  network1:
    name: apm-server_default
    external: true
  1. run docker-compose up -d
  2. see Elastic Agent is running in Health state, with the confirmation from Fleet UI in Kiabana. Once done, enable the APM Integration in Kibana UI. Confirm the logs to see APM Server is running successfully.

Check for the Self Instrumentation

I recommend using docker-desktop, and I'm sharing this instructions based on the session I've done with docker-desktop.

  1. In the logs, search for keyword APM Instrumentation is enabled, none should show up. The instrumentation is disabled by default.
  2. Update the local configuration. APM Config can only be modified by updating elastic-agent.yml and the container needs to be restarted.
    update /usr/shared/elastic-agent/state/elastic-agent.yml to enable the tracing:
agent.monitoring:
  # enabled turns on monitoring of running processes
  enabled: true
  traces: true
  apm:
    environment: "testenv"

should look like this:
image

NOTE:
Mounting modified elastic-agent.yml to /usr/shared/elastic-agent/elastic-agent.yml isn't honored, the configs are ignored and a new config is created in /usr/shared/elastic-agent/state/elastic-agent.yml. seems to be happening when in managed mode.

  1. save the change and restart the container to take this change in place.
  2. In the logs, search for the keyword APM Instrumentation, the latest entry should show APM Instrumentation is enabled.
2024-07-03 15:19:30 
{
  "log.level": "info",
  "@timestamp": "2024-07-03T06:19:30.660Z",
  "message": "APM instrumentation is enabled",
  "component": {
    "binary": "apm-server",
    "dataset": "elastic_agent.apm_server",
    "id": "apm-default",
    "type": "apm"
  },
  "log": { "source": "apm-default" },
  "service.name": "apm-server",
  "ecs.version": "1.6.0",
  "log.logger": "tracing",
  "log.origin": {
    "file.line": 171,
    "file.name": "instrumentation/instrumentation.go",
    "function": "github.com/elastic/beats/v7/libbeat/instrumentation.initTracer"
  },
  "ecs.version": "1.6.0"
}

  1. You should also see the request logs sent internally (source IP 127.0.0.1). You can also see the reloader gets the configs correctly in the logs. Also, you can search for the APMConfig in the logs to preview which values are received.

Observation

I was observing some auth error when enabled the self instrumentation with above config with the error message showing "cannot handle stream: cannot process batch: unauthorized: anonymous access not permitted for agent \"go\""
It should be confirmed but it seems anonymous authenticator is enabled in the trace server which doesn't match with what we've configured in tracing.go:

authenticator, err := auth.NewAuthenticator(config.AgentAuth{})

Conclusion

Auth error prevented me from seeing the ingested traces. However, it is out of the scope for this PR and should be addressed separately. I was able to see the APMConfig is received in the APM Server successfully via reloader and the instrumentation is turned on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify test-plan v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants