Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] - Make Jaeger Tracer OpenTelemetry Agent aware. #6537

Merged
merged 9 commits into from
Jun 2, 2023

Conversation

dalexandrov
Copy link
Contributor

Resolves #6174 for Jager Tracer

@dalexandrov dalexandrov added tracing 3.x Issues for 3.x version branch labels Mar 31, 2023
@dalexandrov dalexandrov self-assigned this Mar 31, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 31, 2023
Copy link
Member

@tjquinno tjquinno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my suggestion for a more general solution.

Also, even though this is a simple change, it would be good to add a test that shows this solves the reported problem when the agent is present.


private static boolean checkAgent() {
try {
Class.forName("io.opentelemetry.javaagent.bootstrap.AgentStarter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed yesterday, this approach handles the single situation in which the OpenTelemetry agent is present (and has established the global tracer itself) but there could be various other cases in which other code (perhaps user code), for whatever reason, has assigned the global tracer before our code has a chance to do so. The OTel agent being present is just one such case.

Is there a way to create a more general solution?

From my quick look I did not see a way to find out simply if the global OpenTelemetry has been set. We can get it (OTel automatically creates a no-op instance if set had not previously been invoked, which does not work focus) or set it (OTel throws an exception if set had previously been invoked).

But could we use that behavior to our advantage? I have not tried this, but instead of checking for the presence of the agent, what if the Helidon code which invokes GlobalOpenTelemetry#set caught the IllegalStateException, logged a message indicating that the global OTel has already been set, and invokes get to retrieve that previously-set instance?

That would tell the user that any configuration she had used which Helidon works would not have been used in setting up OT and would allow our code to continue, using either the OTel instance we had built and set or the previously-set one.

The exception which OT throws contains the stack trace captured when GlobalOpenTelemetry#set was invoked and we could include that in our log message (perhaps at FINE level) so the user understand what code had set the global OTel instance.

This might be a little more code than what's in the PR currently, but

  • it would be less fragile (OTel could change the agent class name whereas it seems less likely they would change the signature and behavior of GlobalOpenTelemetry#set, and
  • it would handle any case, not just the agent case, in which other code has set the global OT instance before our code does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please never ever use Class.forName("") in Helidon.
We should handle this type of requirements differently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currenty using fast solution with indirect checks of properties and instrumented Context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dalexandrov dalexandrov marked this pull request as draft March 31, 2023 13:00

private static boolean checkAgent() {
try {
Class.forName("io.opentelemetry.javaagent.bootstrap.AgentStarter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please never ever use Class.forName("") in Helidon.
We should handle this type of requirements differently

@bogedal
Copy link

bogedal commented Apr 4, 2023

Could I suggest this AgentDetector class instead? @dalexandrov @tomas-langer

package io.helidon.tracing.jaeger;

import java.util.logging.Logger;

import io.opentelemetry.context.Context;

class OpenTelemetryAgentDetector {

    private static final Logger LOGGER = Logger.getLogger(OpenTelemetryAgentDetector.class.getName());
    private static final String AGENT_CLASS_NAME = "agent";
    private static final String AGENT_PROPERTY_PREFIX = "io.opentelemetry.javaagent";

    private OpenTelemetryAgentDetector () {
        // Empty constructor for a utility class.
    }

    static boolean isOpenTelemetryAgentPresent() {
        if (isAgentInCurrentContext() || isAgentInSystemProperties()) {
            LOGGER.info("OpenTelemetry Agent detected.");
            return true;
        }

        return false;
    }

    private static boolean isAgentInCurrentContext() {
        return Context.current().getClass().getName().contains(AGENT_CLASS_NAME);
    }

    private static boolean isAgentInSystemProperties() {
        return System.getProperties().stringPropertyNames()
                .stream()
                .anyMatch(property -> property.contains(AGENT_PROPERTY_PREFIX));
    }
}

@dalexandrov
Copy link
Contributor Author

Nice, but do we really want to use this type of detection. I would consider it as a temporary solution.

@bogedal
Copy link

bogedal commented Apr 4, 2023

True - it still isn't the best.

Using something like this isn't much better either

OpenTelemetryAgentDetector.class.getClassLoader().getResource("io/opentelemetry/<...>") // Not sure what the resource path is :)

I'll think on a better solution, and should I come up with one I'll post it here.

@dalexandrov
Copy link
Contributor Author

Depends on #6493 for the Agent Detector

@dalexandrov dalexandrov marked this pull request as ready for review May 3, 2023 07:52
@dalexandrov dalexandrov requested a review from tjquinno May 3, 2023 07:52
@dalexandrov dalexandrov added this to the 3.2.2 milestone May 3, 2023
Comment on lines 425 to 427
return HelidonOpenTelemetry.create(GlobalOpenTelemetry.get(),
GlobalOpenTelemetry.getTracer(this.serviceName),
tags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formatting seems wrong. Should not this be on the level where GlobalOpenTelemetry.get() parameter is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what Idea did....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dalexandrov dalexandrov requested a review from Verdent May 4, 2023 16:43
@tjquinno
Copy link
Member

tjquinno commented May 4, 2023

I have been out of the office and am catching up on this PR now.

@dalexandrov Did you ever try the approach I described in my 31 March comment?

That would avoid any implementation details (such as the name of the agent class, expressed either as a class name or a resource name) which could change from one release to another. It would also support any way that the global tracer might be set--not just by the agent but by any code--before the Helidon code runs and Helidon would then properly work with the established global tracer.

@dalexandrov
Copy link
Contributor Author

Hello @tjquinno,
After a lot of investigation we found out, that unfortunately your proposed approach is not possible. OpenTelemetry SDK does not allow it, since all getters there also do setting to no-op, and they do it only once. There is no way to check before setting, as checking momentarily sets the object to no-op. Other ways like checking for secondary exceptions also did not work. We have explored the code with @tomas-langer and came to the conclusion, that the proposed in this PR approach is the most suitable compromise.

@tjquinno
Copy link
Member

tjquinno commented May 5, 2023

@dalexandrov We already knew that the Otel API did not permit a get-without-set, as I described in my earlier comment.

The OTel code simply throws an IllegalStateException if the caller attempts to invoke #set(OpenTelemetry) a second time. It seems that it would be straightforward for Helidon to catch that and then invoke #get to obtain the previously-set instance, or if no exception was thrown then Helidon's attempt to set the value was successful.

Can you please explain why checking for the exception when invoking #set when it had been already invoked did not work? If you still have the code where you tried this a pointer would be great.

@tjquinno
Copy link
Member

tjquinno commented May 5, 2023

Apparently Mitia has had off-line discussions with Tomas about this. Mitia and I will speak once more about this early next week.

@tjquinno
Copy link
Member

tjquinno commented May 9, 2023

Mitia walked me through some code that shows my suggestion about catching the exception does not work. The exception is thrown not from the thread which invokes the set method but from elsewhere where Helidon cannot easily catch it and respond.

@dalexandrov dalexandrov requested review from spericas and tjquinno and removed request for Verdent, tomas-langer and tjquinno May 25, 2023 07:56
Copy link
Member

@Verdent Verdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just formatting issues. Otherwise it looks good from what I can judge. Fix these please and I will approve this.

Comment on lines 425 to 426
return HelidonOpenTelemetry.create(GlobalOpenTelemetry.get(),
GlobalOpenTelemetry.getTracer(this.serviceName), tags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this formatting is not correct.
It should look like this:

if (HelidonOpenTelemetry.AgentDetector.isAgentPresent(config)) {
    return HelidonOpenTelemetry.create(GlobalOpenTelemetry.get(),
                                       GlobalOpenTelemetry.getTracer(this.serviceName), 
                                       tags);
}

Every method paramter should be on the new line. It improves readablity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok...
(#6537 (comment))

Copy link
Member

@Verdent Verdent Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was wrong at that point as well. since it was at the level of return instead of create method and that was what I was talking about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed :)

@dalexandrov dalexandrov requested a review from Verdent June 2, 2023 12:25
@dalexandrov dalexandrov merged commit 3da80ae into helidon-io:helidon-3.x Jun 2, 2023
@dalexandrov dalexandrov deleted the 6174_3.x_OTEL_Agent branch June 2, 2023 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Issues for 3.x version branch OCA Verified All contributors have signed the Oracle Contributor Agreement. tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken span hierarchy when using opentelemetry java agent
5 participants