Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Instrument beat pipeline #17938

Merged
merged 37 commits into from
May 4, 2020
Merged

[APM] Instrument beat pipeline #17938

merged 37 commits into from
May 4, 2020

Conversation

jalvz
Copy link
Contributor

@jalvz jalvz commented Apr 23, 2020

What does this PR do?

Instruments beats publishing output and the elasticsearch output.

Why is it important?

To gain insight into performance of libbeat based applications.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Run any beat with ELASTIC_APM_ACTIVE=true set in the environment

Related issues

@jalvz jalvz changed the title Publisher apm Instrument beat pipeline with APM Apr 23, 2020
@jalvz jalvz changed the title Instrument beat pipeline with APM [APM] Instrument beat pipeline Apr 24, 2020
@jalvz jalvz requested a review from a team April 24, 2020 15:16
@jalvz
Copy link
Contributor Author

jalvz commented Apr 27, 2020

Hey, can anyone have a look at this?
Test failures seem unrelated and show up in other PR's too.

@andresrc andresrc added [zube]: In Review Team:apm APM Server team issues/PRs labels Apr 27, 2020
@jalvz jalvz requested a review from axw April 29, 2020 13:51
Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added some food-for-thought comments.

@@ -94,8 +104,7 @@ func (w *clientWorker) run() {
continue
}
w.observer.outBatchSend(len(batch.Events()))

if err := w.client.Publish(batch); err != nil {
if err := w.client.Publish(context.TODO(), batch); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps leave a // TODO comment as well, explaining what needs to be done?

data []publisher.Event,
) ([]publisher.Event, error) {
func (client *Client) publishEvents(ctx context.Context, data []publisher.Event) ([]publisher.Event, error) {
span, ctx := apm.StartSpan(ctx, "publishEvents", "output")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if we didn't create this span, and just relied on the "publish" transaction, but this is probably necessary for now if we really want the labels.

Ideally, the code for recording the labels (events_original, etc.) should be done in a single location, rather than in specific output implementations. This would at lease give high-level information for all outputs. Specific outputs could optionally extend those spans with output-specific information, or add sub-spans (e.g. the apmelasticsearch span).

I think to do that we'd need to change the Batch and/or Observer interfaces (to report batch-specific stats) and/or the Publish signature (to return the stats).

@jalvz
Copy link
Contributor Author

jalvz commented Apr 30, 2020

@urso @ycombinator any more comments?
I'd like a final 👍 from either of you before merging

@ycombinator
Copy link
Contributor

Hey @jalvz, the libbeat CI failure looks related to this PR? https://travis-ci.org/github/elastic/beats/jobs/681381592#L2209

@ycombinator
Copy link
Contributor

ycombinator commented Apr 30, 2020

One thing that just occurred to me (sorry!) is that we're instrumenting code in the eslegclient package. This package is meant to be a generic Elasticsearch client, which means we shouldn't assume that it's only the libbeat Elasticsearch output or the libbeat Elasticsearch monitoring reporter code is using it. At some point in the future we plan to replace this eslegclient with the official Go Elasticsearch client.

So now I'm wondering if it's a good idea to add instrumentation inside this package or if we should try to instrument around the calls into this package (which would primarily be the Bulk method call).

@jalvz
Copy link
Contributor Author

jalvz commented Apr 30, 2020

Ah, I see. I'm happy to move up the the instrumentation if you prefer... that said, without ELASTIC_APM_ACTIVE the tracer is disabled and has no impact. On the other hand, if there are other users of eslegclient outside libbeat that means that they can get instrumentation out of the box for free :)

@ycombinator
Copy link
Contributor

ycombinator commented Apr 30, 2020

Yeah, as things stand today I think the changes in this PR are not really a problem. That's because it looks like the only point of instrumentation (correct me if I'm reading this wrong, btw!) is inside the Bulk method. And that method is only called from either the ES output or the ES monitoring reporter code in libbeat. There are other parts of the Beats codebase that use other eslegclient APIs but not Bulk. And, to your point, this does mean that any code using eslegclient gets APM instrumentation for free which is convenient indeed!

I'm more wondering what this means going forward, when we try to pull out eslegclient (the leg in the name stands for "legacy" , BTW 😉) and replace it with the official Go Elasticsearch client. Would we first need to instrument the latter similarly? Thinking about it more, I don't think this should necessarily block progress on this PR today but perhaps we could simply add a comment near the instrumentation points inside eslegclient about "moving" the instrumentation to the official Go client when the migration happens so we don't lose the instrumentation by accident. Alternatively we could modify this PR to keep the instrumentation around the calls into eslegclient but not inside it (not sure how much that buys us, though?).

WDYT?

@jalvz
Copy link
Contributor Author

jalvz commented Apr 30, 2020

Would we first need to instrument the latter similarly?

Yes, exactly. Like, nothing doesn't break or anything if not done, but we would get the value there in doing it. I added a comment about that (great idea).
Moving forward, it would be great to add instrumentation to other parts (mem queue, kafka/logstash outputs, etc), and I think is totally fine that things get moved around and as a result we instrument slightly different things too over time.

@@ -130,6 +130,8 @@ func NewConnection(s ConnectionSettings) (*Connection, error) {
}

var httpClient esHTTPClient
// when dropping the legacy client in favour of the official Go client, it should be instrumented
// eg, like in https://github.com/elastic/apm-server/blob/7.7/elasticsearch/client.go
Copy link
Contributor

@ycombinator ycombinator Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; merge when CI looks good.

Looking forward to seeing the results of this instrumentation — thanks for doing it!

@jalvz
Copy link
Contributor Author

jalvz commented Apr 30, 2020

fantastic, thanks a lot!

@jalvz
Copy link
Contributor Author

jalvz commented May 4, 2020

Merging this now, thanks all again for review!

@jalvz jalvz merged commit 2d9ac92 into elastic:master May 4, 2020
jalvz added a commit to jalvz/beats that referenced this pull request May 4, 2020
This allows beat users to instrument the publishing pipeline by setting ELASTIC_APM_ACTIVE=true in the environment.

Co-authored-by: Gil Raphaelli <gil@elastic.co>
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
jalvz added a commit that referenced this pull request May 5, 2020
* [APM] Instrument beat pipeline (#17938)

This allows beat users to instrument the publishing pipeline by setting ELASTIC_APM_ACTIVE=true in the environment.

Co-authored-by: Gil Raphaelli <gil@elastic.co>
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement release-note:dev_docs Contains a Dev Docs section Team:apm APM Server team issues/PRs Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants