Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log an error when the agent responds with a non-success HTTP status #260

Merged
merged 7 commits into from
Jun 2, 2023

Conversation

dgoffredo
Copy link
Contributor

Currently, the AgentWriter passes all response bodies to the AgentHttpEncoder. The AgentHttpEncoder then interprets the body as a JSON object. However, when the HTTP response status does not indicate success, the agent does not return a JSON response body.

This revision logs an error whenever a non-success HTTP response status is delivered by the agent.

@dgoffredo dgoffredo requested a review from cgilmour May 31, 2023 16:04
@dgoffredo dgoffredo force-pushed the david.goffredo/agent-response-logging branch from 3a725ec to 9fa92c3 Compare May 31, 2023 16:09
Copy link
Contributor

@cgilmour cgilmour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as approved, with some comments to consider.

"following body of length "
<< body.size() << ": " << body;
logger_->Log(LogLevel::error, diagnostic.str());
} else if (response_status < 200 || response_status >= 300) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to be more specific about valid statuses here. Pretty sure the agent would only normally send a 200, and anything else would be surprising.

Copy link
Contributor Author

@dgoffredo dgoffredo Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code agrees with you: https://github.com/DataDog/datadog-agent/blob/main/pkg/trace/api/api.go

However, the status is not part of the documented contract between the agent and tracer libraries (what little contract there is: I did some documentation here).

I don't know how likely the de facto "only 200" is to change (probably 0%), but let's accept all of the 2xx anyway.

The java tracer checks [200, 300) as we do here. References:

I'll leave it.

edit: One of the links above is for the Intake writer, not the Agent writer. I'm having trouble finding the corresponding logic in the Agent writer. Also, Doug claims that the Agent will sometimes return 200 when it is overloaded. Perhaps that's what was going on in the issue that inspired this pull request. I'll look into it more. Regardless, this logging could be valuable.

edit: I'll be damned, they do check only 200: https://github.com/DataDog/dd-trace-java/blob/b25b4321c847f4206ab4970c8b6a9ab4f35ee838/dd-trace-core/src/main/java/datadog/trace/common/writer/ddagent/DDAgentApi.java#L116-L122. I'll change it here and make a note to change it in dd-trace-cpp as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the Agent might return 200 without a response body in cases that should be 429 "too many requests," now I'm wondering if we should ignore empty response bodies in all cases.

edit: In fact, it will still return the sample rates JSON body even when overloaded.

The question now is whether the Agent will ever return an empty response body in a truly success-indicating response.

It will not. Here is what I will implement:

David Goffredo
 Here's my plan:
 - transport issue: log error
 - non-200 response: log error
 - 200 response without body: log error
 - 200 response with body: assume JSON

 yeah?

Kyle Nusbaum
 Sounds good.
 I would distinguish between totally empty body and an empty json object. {} seems like a valid response.

David Goffredo
 correct

Kyle Nusbaum
 Then looks good.

David Goffredo
 by "empty," I mean length zero

 coolio, thanks a third time

Kyle Nusbaum
 No problem.

std::ostringstream diagnostic;
diagnostic << "Datadog Agent returned response without an HTTP status and with the "
"following body of length "
<< body.size() << ": " << body;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the body would be empty, but if it weren't, it could be pretty long.
So maybe just the first 100 bytes or so instead of the entire body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that. I agree that the response will most likely be empty; but if it's not empty, I don't think it will be long. What might it contain? In the 200 case, it's a JSON object containing one entry per root service seen recently by the Agent, but that's not relevant here. In the error case, it might be... a stack trace? This is Go we're talking about. Logging the body is kind of silly, but I don't think that excessive length will be a problem. Let's leave it.


writer.flush(std::chrono::seconds(10));
REQUIRE(logger->records.size() != 0);
// The logged error diagnostic will say that there was not response status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not -> no

@dgoffredo dgoffredo merged commit b682d83 into master Jun 2, 2023
@dgoffredo dgoffredo deleted the david.goffredo/agent-response-logging branch June 2, 2023 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants