-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log an error when the agent responds with a non-success HTTP status #260
Conversation
3a725ec
to
9fa92c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking as approved, with some comments to consider.
src/agent_writer.cpp
Outdated
"following body of length " | ||
<< body.size() << ": " << body; | ||
logger_->Log(LogLevel::error, diagnostic.str()); | ||
} else if (response_status < 200 || response_status >= 300) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to be more specific about valid statuses here. Pretty sure the agent would only normally send a 200, and anything else would be surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source code agrees with you: https://github.com/DataDog/datadog-agent/blob/main/pkg/trace/api/api.go
However, the status is not part of the documented contract between the agent and tracer libraries (what little contract there is: I did some documentation here).
I don't know how likely the de facto "only 200" is to change (probably 0%), but let's accept all of the 2xx anyway.
The java tracer checks . References:[200, 300)
as we do here
- https://github.com/DataDog/dd-trace-java/blob/b25b4321c847f4206ab4970c8b6a9ab4f35ee838/dd-trace-core/src/main/java/datadog/trace/common/writer/ddintake/DDEvpProxyApi.java#LL124C6-L124C6
- https://square.github.io/okhttp/4.x/okhttp/okhttp3/-response/is-successful/
I'll leave it.
edit: One of the links above is for the Intake writer, not the Agent writer. I'm having trouble finding the corresponding logic in the Agent writer. Also, Doug claims that the Agent will sometimes return 200 when it is overloaded. Perhaps that's what was going on in the issue that inspired this pull request. I'll look into it more. Regardless, this logging could be valuable.
edit: I'll be damned, they do check only 200: https://github.com/DataDog/dd-trace-java/blob/b25b4321c847f4206ab4970c8b6a9ab4f35ee838/dd-trace-core/src/main/java/datadog/trace/common/writer/ddagent/DDAgentApi.java#L116-L122. I'll change it here and make a note to change it in dd-trace-cpp
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the Agent might return 200 without a response body in cases that should be 429 "too many requests," now I'm wondering if we should ignore empty response bodies in all cases.
edit: In fact, it will still return the sample rates JSON body even when overloaded.
The question now is whether the Agent will ever return an empty response body in a truly success-indicating response.
It will not. Here is what I will implement:
David Goffredo
Here's my plan:
- transport issue: log error
- non-200 response: log error
- 200 response without body: log error
- 200 response with body: assume JSON
yeah?
Kyle Nusbaum
Sounds good.
I would distinguish between totally empty body and an empty json object. {} seems like a valid response.
David Goffredo
correct
Kyle Nusbaum
Then looks good.
David Goffredo
by "empty," I mean length zero
coolio, thanks a third time
Kyle Nusbaum
No problem.
std::ostringstream diagnostic; | ||
diagnostic << "Datadog Agent returned response without an HTTP status and with the " | ||
"following body of length " | ||
<< body.size() << ": " << body; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect the body would be empty, but if it weren't, it could be pretty long.
So maybe just the first 100 bytes or so instead of the entire body?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that. I agree that the response will most likely be empty; but if it's not empty, I don't think it will be long. What might it contain? In the 200 case, it's a JSON object containing one entry per root service seen recently by the Agent, but that's not relevant here. In the error case, it might be... a stack trace? This is Go we're talking about. Logging the body is kind of silly, but I don't think that excessive length will be a problem. Let's leave it.
test/agent_writer_test.cpp
Outdated
|
||
writer.flush(std::chrono::seconds(10)); | ||
REQUIRE(logger->records.size() != 0); | ||
// The logged error diagnostic will say that there was not response status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not -> no
Currently, the
AgentWriter
passes all response bodies to theAgentHttpEncoder
. TheAgentHttpEncoder
then interprets the body as a JSON object. However, when the HTTP response status does not indicate success, the agent does not return a JSON response body.This revision logs an error whenever a non-success HTTP response status is delivered by the agent.