Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove grpc-context dependency from the api package #575

Closed
codefromthecrypt opened this issue Sep 29, 2019 · 66 comments · Fixed by #1751
Closed

remove grpc-context dependency from the api package #575

codefromthecrypt opened this issue Sep 29, 2019 · 66 comments · Fixed by #1751
Assignees
Labels
API API related issues priority:p1 Critical issues and bugs. Highest priority; breaking API changes. release:required-for-ga Required for 1.0 GA release

Comments

@codefromthecrypt
Copy link
Contributor

I understand why opentelemetry has a bias towards google's gRPC library. This codebase primarily borrows from work that started internally, became google's instrumentation-java, later renamed to opencensus, and now mostly copy/pasted from that. The original primary customer was not just gRPC protocol, but specifically google's library. For example, gRPC is the most primary user of the prior work, and it is helpful to have a user to guide.

Time has passed, though, and what started as an end-to-end instrumentation library has split into a Api + SDK approach, as facilitated by the weaving in of OpenTracing. Yet, code opinons towards google's gRPC implementation still exist in the form of a strict dependency in the api package, and helper code.

While it is understandable that alliances exists, and having good integration with Google's gRPC implementation is important, even gRPC itself has options. For example, Armeria has gRPC protocol support without the need of Google's library. We have already encountered problems where other google projects drag in gRPC because of Census. This should stop now. OpenTelemetry as a body of people, sure that can advocate for Google. However, the instrumentation api must not pick winners and losers in RPC frameworks.

Please remove all traces of grpc-context from the api library and move the support functions to a context integration package.

This is a follow-up from here: #571 (comment)

@codefromthecrypt
Copy link
Contributor Author

here are some related issues from grpc-context:

@codefromthecrypt
Copy link
Contributor Author

It may also not be obvious why removing it from the api package is important besides bloat. grpc-context is not even an optional dependency. strict dependencies not only cause bloat and pin versions, but they also obscure what apis are actually powering the software. This means it hides abstraction leaks. For example, you can't tell if an instrumentation site requires grpc-context or not by looking at its dependencies: you have to scan all of its code.

TL;DR; Currently, anything marked as opentelemetry-api might as well be assumed to only work when grpc-context is present.

$ ./gradlew :opentelemetry-api:dependencies
--snip--

compileClasspath - Compile classpath for source set 'main'.
+--- com.google.auto.value:auto-value-annotations:1.6.2
+--- com.google.errorprone:error_prone_annotations:2.3.2
+--- com.google.code.findbugs:jsr305:3.0.2
\--- io.grpc:grpc-context:1.20.0

--snip--

runtimeClasspath - Runtime classpath of source set 'main'.
+--- com.google.auto.value:auto-value-annotations:1.6.2
+--- com.google.errorprone:error_prone_annotations:2.3.2
+--- com.google.code.findbugs:jsr305:3.0.2
\--- io.grpc:grpc-context:1.20.0

@codefromthecrypt
Copy link
Contributor Author

See also #576 as the SDK package is far worse

@trask
Copy link
Member

trask commented Apr 2, 2020

@carlosalberto @bogdandrutu @jkwatson I'm worried that we went in the opposite direction of removing the grpc-context dependency in 0.3.0, and it is now part of the public API surface in 0.3.0, with OpenTelemetry API users interacting directly with gRPC Context.

Summarizing my thoughts below about the grpc-context dependency, and I will join the Java SIG meeting tomorrow morning so we can discuss in person.

  1. The dependency on grpc-context makes it less clear what the OpenTelemetry API surface really is.

    For example, the OpenTelemetry API's current span can be manipulated directly via gRPC Context. Is this part of OpenTelemetry API surface?

    For another example, the way to create an OpenTelemetry scope listener currently is by creating a class with a very special name io.grpc.override.ContextStorageOverride. This seems like a very odd public API surface that we are exposing. (and what happens if there is already one of these classes on the classpath?)

  2. The dependency on grpc-context causes a particular problem for the auto-instrumentation project.

    We instrument the OpenTelemetry API itself, so that telemetry sent from an application/library to OpenTelemetry API will seamlessly interoperate with telemetry captured by auto-instrumentation. If gRPC Context is part of the OpenTelemetry API surface, we will have to interoperate with all features (and probably multiple versions) of gRPC Context, in order to keep the "application-space" gRPC Context in sync with our embedded SDK's gRPC Context.

  3. I think we will have better success getting popular libraries to emit telemetry using opentelemetry-api if it does not pull in grpc-context.

    Many libraries are super conservative with introducing new dependencies. I already heard resistance from the Azure SDK team this week when they went to upgrade to 0.3.0 and saw the hard dependency on gRPC Context (see [FEATURE REQ][Tracing] Update opentelemetry-api dependency to latest version Azure/azure-sdk-for-java#9695). I don't think this will be an uncommon concern.

@tedsuo
Copy link
Contributor

tedsuo commented Apr 2, 2020

I would prefer this dependency be removed. I believe the gPRC team was open to moving to a shared context dependency. I feel this would be the right move - both projects sharing a stand-alone context package. This was one of the intentions behind the move to separate context propagation from tracing.

The other approach would be to ask the grpc team to take on an opentelemetry dependency.

@bogdandrutu I know you were interested in this, and already had it on your radar. Is now the time?

@Oberon00
Copy link
Member

Oberon00 commented Apr 2, 2020

See #720 (comment)

@trask
Copy link
Member

trask commented Apr 4, 2020

We should also consider Project Reactor's Context, which is used by Spring WebFlux.

Project Reactor and Spring WebFlux are also very popular projects in the Java world.

So I don't think that gRPC Context is the "clear winner" in the context propagation space, which seems to be the requirement for us to pick an external library for context propagation.

And I love the idea of us working with gRPC, Project Reactor and others to help create a "clear winner" in the Java context propagation space that we can all take a dependency on.

But in the meantime, let's replace gRPC Context with our own Context implementation, and let's see if it's possible to avoid exposing our Context object directly in our public API surface so that we can swap out our Context implementation with a "clear winner" some time in the future.

@codefromthecrypt
Copy link
Contributor Author

copying in @carlosalberto who seems to be the one who decided to make a hard dependency based on bogdan's advice.

Note: I think the maintainers of this repo should be revisited, as it should really be led by folks who are conservative about things like this by default.

@codefromthecrypt
Copy link
Contributor Author

by revisit, I mean that whoever is a maintainer acknowledges that they won't add any hard dependency on a 3rd party library for paths required for usage. And especially this point..

when you do decide to do the opposite of a pending issue, you must comment on that issue. Avoiding conflict by not doing that isn't good for the project as it leads to surprise.

@jkwatson
Copy link
Contributor

jkwatson commented Apr 4, 2020

Hi @adriancole . Newly minted maintainer here, trying to make sure this gets addressed. I wholeheartedly agree that we need to minimize dependencies in the API layer, and that removing grpc-context from the API surface is absolutely our goal. Also, getting rid of grpc-context as a hard API dependency is also needed.

The other desire, I believe, is to have a battle-hardened context implementation in opentelemetry, so that we can launch with it.

Do you have a constructive suggestion on what we should be using for our context implementation?

I can see several options:

  1. bring in an existing production-ready, appropriately licensed context implementation into the codebase as source-code, repackaged under an opentelemetry namespace.
  2. build a brand-new context implementation into opentelemetry, which isn't production-hardened.
  3. design an context abstraction, with no implementation, for the API, continuing to use a wrapped grpc-context as the implementation in the SDK.

I'm sure there are other options. I'd love to hear what your approach would be.

@codefromthecrypt
Copy link
Contributor Author

codefromthecrypt commented Apr 5, 2020

@jkwatson thanks for asking! honestly, this is a good turn in itself that you are asking outsiders advice. bravo! Here's some experience from the zipkin brave project.

separate api for sure. you will need to make some decisions, particularly if it is immutable only. only add the apis you need, nothing more. here are some hints

  • consider whether or not you want a strongly typed key. this seems nice but also limiting
    • strongly typed key at the surface of your api does not imply a strongly typed one underneath.
  • probably stick to string values, and push any conversion into the users hand via decoration
    • very few propagated keys are numbers, and there's cost mostly on update which is likely infrequent.
  • make a read-only api available. most call sites only need to see the current value (ex parent)
  • consider impact of reactive, as immutable contexts are gnarly especially you can't set a value late.
  • consider sometimes the context is embedded in a message. It is hard to add this api later.

You will need an implementation plug-in that forwards minimally to a stacked thread local, but could also plug into other context storage apis. For example, gRPC has one (so does finagle). Armeria recently designed one also. Note.. some context storage is request scoped!

@codefromthecrypt
Copy link
Contributor Author

@jkwatson since I have your ear, I'll give you unsolicited feedback on how to make friends with other projects. Some preceding work ripped off designs or even large amounts of code copy/pasted with no thanks or attribution. This really made people mad. When you do make a design or implementation, please in the PR thank them, and also cite somewhere in at least comments if not javadoc or a RATIONALE the inspiration. You can turn this culture around.

@jkwatson
Copy link
Contributor

jkwatson commented Apr 5, 2020

So, it sounds like your general approach would be to build something new, rather than work with something already battle-hardened?

@codefromthecrypt
Copy link
Contributor Author

@jkwatson I assume you mean grpc context? It isn't a separate library, and also the feature scope is broader in some ways and more narrow than other context libraries. I think the api surface should be not tied to grpc context, but if the community decide to make that the default implementation they could (as long as the library is extracted, has no deps and doesn't change the JDK requirements). Basically there's still a decision about the api surface area even if you use gRPC by default, there are a lot of api decisions nested in grps, like timeout listeners etc that may not be appropriate here, and you also have to consider most RPC libraries control their own context and are not using grpc underneath.

hope this helps.

@codefromthecrypt
Copy link
Contributor Author

the best way out is to be diverse in testing.. ex don't only test interop with grpc. test interop with several libraries (ex reactor-netty, armeria, non-rpc). This will show what I'm telling about library bias.

@codefromthecrypt
Copy link
Contributor Author

here's a really old discussion that might be interesting DistributedTracing/continuation-local-storage-jvm#1

also, if you need more wrenches for the clockwork, ask someone about akka :)

@codefromthecrypt
Copy link
Contributor Author

Last unsolicited.. I might be wrong, but I think grpc has a bias towards static initial context lookup.

please do not expect everything to be static lookups. this was a problem in previous designs to make everything static registries.. "static only" is not an appropriate solution for DI or modular (OSGi environments) even if it can make sense for agents.

I can't find it now, but @jkschneider had a neat google doc or something on how micrometer allows choice of static or injected registries.

@jkwatson
Copy link
Contributor

jkwatson commented Apr 5, 2020

@jkwatson I assume you mean grpc context?

No, I was just trying figure out between the 2 BIG options: build something new, or use something that already exists, with proper attribution, etc.

@carlosalberto
Copy link
Contributor

Hey @adriancole

No need to freak out here, as we are still before 1.0 and this issue has been closed ;) But thanks for the additional information. There have been interesting reasons on the advantages and disadvantages of grpc.Context, and we kept the dependency for now, while we figure out the way to go (either keep it or abstract in a specific way).

My initial prototype considered abstracting this part, and my slight personal preference would be to make grpc.Context and independent module/artifact that, so it would become the de facto context solution for Java (something similar to what contextvars is for Python).

In any case, as John suggested we will evaluate our options now. But definitely this will happen before 1.0 ;)

@jkwatson
Copy link
Contributor

jkwatson commented Apr 6, 2020

Document being used to capture a more dynamic discussion: https://docs.google.com/document/d/1dKXHYH1CQm4IQuRl5NZaoh2vETjRlBfhcsciz_Xs5hQ/edit?usp=sharing

@codefromthecrypt
Copy link
Contributor Author

codefromthecrypt commented Apr 6, 2020 via email

@carlosalberto
Copy link
Contributor

pre 1.0 libraries when marketed to masses end up in people's code

Disagree on this point, at least for OTel - observe we just released beta a few days ago, and as such, the term beta should signal people they can still expect changes.

Lets definitely focus on the technical discussion. So I suggest commenting on the document, so a decision can be done with all the advantages/disadvantages of each approach taken into account. I have no strong attachment to any alternative, so I'm open to what is agreed ;)

@Oberon00
Copy link
Member

Oberon00 commented Apr 6, 2020

pre 1.0 libraries when marketed to masses end up in people's code

Can confirm. Planning with the initial release dates + some not generous enough buffer can lead to that. EDIT: To be clear, this was not entirely unexpected so it's not that bad; we are not re-exposing any interfaces and are prepared for breaking changes / sticking with old versions for some time.

@codefromthecrypt
Copy link
Contributor Author

I noticed my notes didn't move into the doc so I added them

Knowing opentracing was pushed with marketing and its final version was 0.33.. and that census also was a dep of grpc and also google's http library also <1.0, I think folks dismissing the concern about 1.0 aren't maybe taking this seriously enough.

@codefromthecrypt
Copy link
Contributor Author

the google doc seems to have a lot of active doubt that there isn't enough manpower etc to do context. Speaking from experience on a project which has never had more than one fulltime engineer, and most integrations are done by volunteers, I would pretty much say don't limit yourself.

Here's an example of integrating with an existing context using a bridge. there are edge cases around this, pros and cons, what to do if there's no request (ex fallback to a thread local?), but if a volunteer project can do this, so can open telemetry. give it a shot!

https://github.com/line/armeria/blob/master/brave/src/main/java/com/linecorp/armeria/common/brave/RequestContextCurrentTraceContext.java

@iNikem
Copy link
Contributor

iNikem commented Aug 9, 2020

@FroMage This is very interesting for us in auto instrumentation for Java. Can you please take a look at this use-case? How does MP-CP handle such "run-away" contexts?

@jkwatson
Copy link
Contributor

@FroMage Super interesting. I'm going to set aside from time to read the spec document. Thanks for linking this here!

@sjoerdtalsma
Copy link

@FroMage thanks for the link. It sounds like a perfect substitute for my hand-rolled api. I'll definitely take a look at it.

@jkwatson jkwatson added API API related issues priority:p1 Critical issues and bugs. Highest priority; breaking API changes. labels Aug 12, 2020
@FroMage
Copy link

FroMage commented Aug 13, 2020

Can you please take a look at this use-case? How does MP-CP handle such "run-away" contexts?

You can either inject a ManagedExecutor which clears all contexts for async actions if the context is meant to be closed before they terminate, or if the context is meant to be kept open and used until all such async actions are completed, then it's up to your application to keep track of when those async actions terminate before you can close your context.

@jkwatson
Copy link
Contributor

@FroMage Is Microprofile Context Propagation locked to java 8+, or could it be used with java 7?

@carlosalberto
Copy link
Contributor

Assigning this to @malafeev as he will do an initial digging & comparison of MicroProfile and grpc's Context.

@malafeev
Copy link
Contributor

malafeev commented Aug 18, 2020

Java 7 is not supported by MP-CP API. It's based on Java 8 features.
When I tried to compile

 ManagedExecutor executor = ManagedExecutor.builder()
        .maxAsync(5)
        .propagated(ThreadContext.CDI, ThreadContext.APPLICATION)
        .build();

    ThreadContext threadContext = ThreadContext.builder()
        .propagated(ThreadContext.SECURITY)
        .build();

I got

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project test: Compilation failure: Compilation failure: 
[ERROR] /Users/malafes/projects/test/src/main/java/Main.java:[8,47] static interface method invocations are not supported in -source 1.7
[ERROR]   (use -source 8 or higher to enable static interface method invocations)
[ERROR] /Users/malafes/projects/test/src/main/java/Main.java:[13,48] static interface method invocations are not supported in -source 1.7
[ERROR]   (use -source 8 or higher to enable static interface method invocations)

@FroMage
Copy link

FroMage commented Aug 18, 2020

Yeah, it's JDK8+

@codefromthecrypt
Copy link
Contributor Author

I would summarize this as another key consumer/integration besides grpc-context. There is also https://github.com/alibaba/transmittable-thread-local which focuses on some of this even if we formerly mostly discussed RPC and reactive.

The main idea here was keeping the deps clear and knowing that while solving for gRPC is important, so is making other things possible, which share different dependency characteristics, etc.

@malafeev
Copy link
Contributor

from my MP CP review:

  • MP CP is a set of interfaces to implement.
  • MP CP is not a solution which we can take and use without doing own implementation of interfaces.
  • MP CP is coupled to Executors. Threads should be managed by container...
  • MP CP is java 8+. It’s based on java 8 concurrency api: completable future is a first citizen.
  • MP CP must to be compatible with Java EE Concurrency model/api, therefore it may have unnecessary functionality.
  • I was not able to create an example to propagate value like a string from one thread to another thread reading documentation. So it’s still a black box for me.

@FroMage
Copy link

FroMage commented Aug 31, 2020

MP CP is not a solution which we can take and use without doing own implementation of interfaces.

Well, there's SmallRye-ContextPropagation which implements it that you can use. I don't see the point in re-implementing it.

MP CP is coupled to Executors. Threads should be managed by container...

That's only the case for ManagedExecutor but not for ThreadContext which is what you will want to use.

MP CP is java 8+. It’s based on java 8 concurrency api: completable future is a first citizen.

Yes, that's the case.

MP CP must to be compatible with Java EE Concurrency model/api, therefore it may have unnecessary functionality.

This is only true of ManagedExecutor: a single class.

I was not able to create an example to propagate value like a string from one thread to another thread reading documentation. So it’s still a black box for me.

Do you want me to provide a complete example here?

@malafeev
Copy link
Contributor

Do you want me to provide a complete example here?

@FroMage it would be nice, thanks

@FroMage
Copy link

FroMage commented Aug 31, 2020

So, there's your test class:

package whatever;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.eclipse.microprofile.context.ThreadContext;
import org.eclipse.microprofile.context.spi.ContextManager;
import org.eclipse.microprofile.context.spi.ContextManagerProvider;
import org.junit.Assert;
import org.junit.Test;

public class DemoTest {
    @Test
    public void test() throws InterruptedException, ExecutionException {
        ContextManager contextManager = ContextManagerProvider.instance().getContextManager();
        ThreadContext threadContext = contextManager.newThreadContextBuilder().build();
        
        ExecutorService executorService = Executors.newFixedThreadPool(2);
        MyContext.clear();
        // start with no context
        Assert.assertNull(MyContext.get());
        
        // verify that new threads also get no context
        executorService.submit(() -> {
            Assert.assertNull(MyContext.get());
        }).get();
        
        // now set the context
        MyContext ctx = new MyContext();
        MyContext.set(ctx);
        // verify that we have a context
        Assert.assertEquals(ctx, MyContext.get());
        
        // verify that new threads still get no context
        executorService.submit(() -> {
            Assert.assertNull(MyContext.get());
        }).get();

        // verify that new threads with context propagation do get context
        executorService.submit(threadContext.contextualRunnable(() -> {
            Assert.assertEquals(ctx, MyContext.get());
        })).get();
    }
}

And here's your context class:

package whatever;

public class MyContext {

    private static ThreadLocal<MyContext> context = new ThreadLocal<MyContext>();

    public static void init() {
        context.set(new MyContext());
    }

    public static void clear() {
        context.remove();
    }

    public static MyContext get() {
        return context.get();
    }

    public static void set(MyContext newContext) {
        context.set(newContext);
    }

// put your context properties here
    private String reqId;

    public void set(String reqId) {
        this.reqId = reqId;
    }

    public String getReqId() {
        return reqId;
    }
}

This is typically where you'd put whatever context values you need.

You'll need to have this src/main/resources/META-INF/services/org.eclipse.microprofile.context.spi.ThreadContextProvider file too:

whatever.MyThreadContextProvider

You'll want to use SmallRye-CP:

        <dependency>
            <groupId>io.smallrye</groupId>
            <artifactId>smallrye-context-propagation</artifactId>
            <version>1.0.16</version>
        </dependency>

That's all. Now you can both propagate any number of contexts, and make sure that your context will be propagated to any CP user.

Does that help?

@jkwatson
Copy link
Contributor

In the 9/15[16] java meeting, it was agreed that @anuraaga will create a new module and start building our own context implementation. Thanks @anuraaga for spearheading this work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API API related issues priority:p1 Critical issues and bugs. Highest priority; breaking API changes. release:required-for-ga Required for 1.0 GA release
Projects
None yet
Development

Successfully merging a pull request may close this issue.