
provide strongly-typed option 3 #1

Open · wants to merge 1 commit into master
Conversation

@euroelessar:

Add a third option with:

  1. Strong typing
  2. A context object for timeout propagation across RPC calls, as well as for providing access to inbound metadata and the ability to specify outgoing metadata
  3. A way to intermix sync and async stubs for migration purposes

@euroelessar (Author):

@lidizheng Any thoughts?
Introducing the context is a somewhat dramatic change, but it maps to the state of the stubs we've arrived at at Dropbox after a few years of iterating on the API, based on user feedback and the required feature set (propagation of cancellation, deadlines, tracing info, etc. across RPC stacks; easier read access to inbound metadata, client info, and auth info; and append access to outbound metadata).

@lidizheng (Owner) left a comment:

Sorry for the late reply; I failed to notice the first email.

  1. Strong typing is definitely a plus, and will be added no matter which option we pursue;
  2. An explicit context object is a good idea! I have to think about how it fits into the existing API, but it looks better for passing metadata to a call;
  3. You are right about the sync/async stub separation. If we want to provide an easy migration experience, we need to supply an async stub creation function in the generated files.

```Python
ctx = grpc.Context() \
    .with_timeout_secs(5.0) \
    .append_outgoing_metadata('key', 'value')
# or: ctx.with_deadline(time.time() + 5.0)
```
@lidizheng (Owner):

Looks good, and it's like the pattern the other gRPC languages use.

```Python
server = grpc.server()
server.add_insecure_port(':50051')
helloworld_pb2_grpc.add_AsyncGreeterServicer_to_server(Greeter(), server)
```
@lidizheng (Owner):

I can understand the value of an async stub, but what is the benefit of an async servicer? We can support sync and async handlers in the same servicer.

@euroelessar (Author):

Can you clarify, please? Are you suggesting introducing some proto extension to mark individual methods within a service as sync or async?

@lidizheng (Owner):

Sorry, I wasn't expressing myself clearly; let me add context.

In the short term, we want the generated gRPC code to be one file that can be used for both Python 2 and Python 3. In that case, typing info isn't included in the generated file, so from the servicer's point of view, as long as the handler name is the same, it works whether the implementation is async or sync.

In the long term, if we decide to change the streaming API, then we should duplicate individual methods as you suggested, so users can opt in to the new API if they want to.

```Python
class Greeter(...):

    def StreamingHi(self, request_iterator, context):
        for request in request_iterator:
            yield response

    async def AsyncStreamingHi(self, stream: helloworld_grpc_pb2.HiStream) -> None:
        while True:
            request = await stream.receive()
            if request.needs_respond:
                await stream.send(response)
```

@euroelessar (Author):

> In the short term, we want the generated gRPC code to be one file that can be used for both Python 2 and Python 3.

Why not generate async stubs in a separate py3-only file? What would that break?

> In that case, typing info isn't included in the generated file

Mypy supports typing via `# type:` comments on both Python 2 and Python 3, so either kind of method can be typed.
Both also support putting Python 3-style typing into a separate .pyi file.
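For illustration, a sketch of the two styles side by side (the `SayHello` signature here is hypothetical):

```Python
# Style 1: comment annotations, checked by mypy on both Python 2 and Python 3
def SayHello(self, request, timeout=None):
    # type: (helloworld_pb2.HelloRequest, Optional[float]) -> helloworld_pb2.HelloReply
    ...

# Style 2: the same signature in a separate helloworld_pb2_grpc.pyi stub,
# written with Python 3 annotation syntax:
# def SayHello(self, request: helloworld_pb2.HelloRequest,
#              timeout: Optional[float] = ...) -> helloworld_pb2.HelloReply: ...
```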

It feels complicated to allow both StreamingHi and AsyncStreamingHi within a single servicer implementation:

  1. How would typing work?
  2. How would the gRPC server determine which one to call?

@lidizheng (Owner):

I misinterpreted your first comment; I thought the snippet was what you were suggesting. What is your suggestion for the servicer interface design? How can we help users migrate from the existing API to the async API?

If we want to introduce strong typing, sure, the separate Python 3-only file is the only way. I will try to add a section to the gRFC.

@euroelessar (Author):

I have an example of the server's servicer stub in the pull request:
https://github.com/lidizheng/grpc-api-examples/blob/71955b583783be2f1f90233a07acb6a00b6a6632/Examples.md#option-2-2

The client should be symmetric, though Unified Stub Call may somewhat complicate it (but let's discuss that in a separate thread).

> If we want to introduce strong typing, sure, the separate Python 3-only file is the only way. I will try to add a section to the gRFC.

We can either generate separate .pyi files or put special comments with the `# type:` prefix; either will work.


```Python
async def StreamingHi(self, stream: helloworld_grpc_pb2.HiStream) -> None:
    pass
```
@lidizheng (Owner):

TIL. typing.Protocol

```Python
logging.info('Connecting to scraper server [%s]', scraper_url)
# Create gRPC Channel
self._channel = grpc.insecure_channel(scraper_url)
self._stub = scraping_pb2_async_grpc.ScraperStub(self._channel)
```
@lidizheng (Owner):

AsyncScraperStub?

@euroelessar (Author):

Please note the async package.
While writing this, I've become more convinced that it's cleaner to separate sync and async stubs into separate packages, due to the issues outlined in Examples.md, so the "option_three" example sticks to that.

@lidizheng (Owner):

I personally don't like the previous design of different invocation methods. I think the async API is our chance to unify them. My rationale: https://github.com/lidizheng/proposal/blob/grpc-python-async-api/L58-python-async-api.md#unified-stub-call

@euroelessar (Author):

How would "Unified Stub Call" look like in typed world? Can you provide an example of generated typing for stubs?

@lidizheng (Owner):

Supporting strong typing requires a separate file for Python 3. For Python 2, the existing generated code stays the same.

In Python 2, without typing, it doesn't matter:

```Python
class GreeterStub(object):
  def __init__(self, channel):
    self.SayHello = channel.unary_unary(
        '/helloworld.Greeter/SayHello',
        request_serializer=helloworld__pb2.HelloRequest.SerializeToString,
        response_deserializer=helloworld__pb2.HelloReply.FromString,
        )
```

In Python 3, with typing, we can restrict the type of the channel. self.SayHello is an instance of the grpc.aio.UnaryUnaryMultiCallable class.

```Python
class GreeterStub(object):
  def __init__(self, channel: grpc.aio.Channel):
    self.SayHello = channel.unary_unary(
        '/helloworld.Greeter/SayHello',
        request_serializer=helloworld__pb2.HelloRequest.SerializeToString,
        response_deserializer=helloworld__pb2.HelloReply.FromString,
        )
```

@lidizheng (Owner), Aug 22, 2019:

I'm a little confused about what you are asking for; can you be more straightforward?

As I understand it, my original question here was:

  1. Should we generate a Python 3-only file?
  2. Should we inject "async" methods into the MultiCallable class? Or some other mechanism that allows one stub object to call both the sync and async API.

Question 1 is resolved.

My position on question 2 is that the existing MultiCallable is over-complicated, and all the methods can be merged into one. The usage of the new API fits the asyncio context. Also, for the async stub, I don't think it should provide a sync API.

Then, for question 2, I raised the question about strong typing in the MultiCallable class. Since you are familiar with gRPC Python, you can understand why that might be a problem. So, in the previous comment, I was proposing the direction of removing the MultiCallable abstraction layer.

```Python
# Generated Async Stub
class GreeterAsyncStub:
    def __init__(self, ...):
        pass

    async def SayHello(self,
                       request: helloworld_pb2.GreetingRequest,
                       timeout: int=None,
                       metadata: Sequence=None,
                       credentials: grpc.CallCredentials=None,
                       wait_for_ready: bool=None,
                       compression: grpc.Compression=None) -> Future[helloworld_pb2.GreetingResponse]:
        ...
        return grpc.aio.Call(...)

# grpc.aio.Call class
class Call(asyncio.Task, grpc.Context):
    async def initial_metadata(self) -> Sequence:
        pass
```

@euroelessar (Author), Aug 22, 2019:

Sure. Let's consider the definition of SayHello from the snippet above:

```Python
async def SayHello(self, ...) -> Future[helloworld_pb2.GreetingResponse]:
  pass

call = stub.SayHello(request)
call.initial_metadata()  # <- type failure: `call` does not have an "initial_metadata" method
call.time_remaining()  # <- type failure: unknown method
```

I guess changing it to something like this should help (note the removal of async, and returning a Call object instead of an asyncio.Future):

```Python
class GreeterAsyncStub:
    def __init__(self, ...):
        pass

    def SayHello(self,
                 request: helloworld_pb2.GreetingRequest,
                 timeout: int=None,
                 metadata: Sequence=None,
                 credentials: grpc.CallCredentials=None,
                 wait_for_ready: bool=None,
                 compression: grpc.Compression=None) -> grpc.aio.Call[helloworld_pb2.GreetingResponse]:
        ...
        return grpc.aio.Call(...)

# grpc.aio.Call class
T = TypeVar('T')
class Call(typing.Awaitable[T], grpc.Context):
  pass
```
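With this shape, the failing lines above type-check; a usage sketch (assuming `grpc.aio.Call` also exposes the `grpc.Context` surface, as in the snippet above):

```Python
call = stub.SayHello(request)      # typed as grpc.aio.Call[helloworld_pb2.GreetingResponse]
remaining = call.time_remaining()  # OK: Call inherits the grpc.Context interface
response = await call              # Awaitable[T] resolves to GreetingResponse
```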

@lidizheng (Owner):

I think your version utilizes the type annotations better. Looks like we have an agreement on the function signature?

@euroelessar (Author):

Yes, the return type looks good.

Comment:

Good point about this, @euroelessar.

> .. note removing async and returning Call object instead of asyncio.Future

I'm in the middle of implementing the unified call object, and I need to change the method from an async one to a sync one; otherwise, the await used by the caller will be applied to the coroutine instead of the Call object.

My main concern with this is the friction it would imply for interceptors, since they would by nature need to be coroutines, but would need to be added between the caller and the final call to the stub, which would not work.

In any case, I'm working on it and will try to come up with something that works.

@lidizheng (Owner) left a comment:

Wait... typing.Protocol is a Python 3.8 feature???

I just realized that when trying to execute an example snippet. It seems too early to consider using this feature, and the generated code won't be portable. This could be a feature request that adds an additional experimental generation mode to the protoc plugin.


@euroelessar (Author) left a comment:

> Wait... typing.Protocol is a Python 3.8 feature???

Ah, it's actually in the typing_extensions package; we're using it today with both Python 2.7 and Python 3.7, so it's portable.


@lidizheng (Owner) left a comment:

Thanks for the input again!
@gnossen @pfreixes, can you also take a look at these examples? Which option do you think we should go after, especially for the streaming API?



```Python
ctx = grpc.Context() \
    .with_timeout_secs(5.0) \
```
Comment:

How about a datetime.timedelta object instead?

@euroelessar (Author):

I don't have a strong preference here, though datetime.timedelta is a bit more verbose.
Also, datetime.datetime itself is harder to use for the "deadline" side of the API, so it would make the API non-symmetric between timeouts and deadlines.
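For comparison, the three spellings side by side; `with_timeout_secs`/`with_deadline` are from this thread, while the `with_timeout(timedelta)` variant is hypothetical:

```Python
import datetime
import time

ctx1 = grpc.Context().with_timeout_secs(5.0)                       # proposed
ctx2 = grpc.Context().with_deadline(time.time() + 5.0)             # proposed
ctx3 = grpc.Context().with_timeout(datetime.timedelta(seconds=5))  # hypothetical
```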


```Python
async with grpc.insecure_channel('localhost:50051') as channel:
    stub = helloworld_pb2_grpc.AsyncGreeterClient(channel)
    response = await stub.Hi(ctx, helloworld_pb2.GreetRequest(...))
```
Comment:

What's the rationale behind making the context a non-optional positional argument? Surely there are sane defaults for all of these options.

@euroelessar (Author):

In the vast majority of cases, the proper context is the one received as an argument by the server handler.
For other cases, either Context.BACKGROUND or Context.TODO should be used.

Using the background/todo context should be explicit, since a missing context object has the downside of not propagating deadline/cancellation/tracing info, so a reviewer should have the ability to catch it.

@gnossen, Aug 21, 2019:

Assuming Context.BACKGROUND is a thread-local/contextvar or something similar, why would that not be an appropriate choice for a default? It seems that you could get away with explicitly supplying the context only when hopping between threads/coroutines.

@euroelessar (Author):

Thread-local variables don't play nice with concurrency models. E.g., what is the behavior if one wants to create an intermediate context for a custom tracing span?
If we allow users to specify/override the thread-local context, we face the same problems as with regular thread locals (state leaks across coroutines and isn't propagated to other threads; avoiding either problem requires both carefully writing the code and carefully reviewing it).
If we don't allow users to do it, there is simply no way to create custom intermediate spans.

Also, an explicit Context has a nice benefit: it's clear from a function's definition whether it does any I/O, based on the presence of a context argument (though that may be less important with asyncio, due to the async keyword).

@euroelessar (Author):

Internally, we've taken the approach of making Context.TODO use thread-local state, which can only be set by an interceptor-like layer. It makes it explicit when a user relies on thread-local state, and by itself can be a warning to a code reviewer that some other code changes may be required.

@pfreixes:

My main concerns with not doing the extra plumbing, i.e. implicitly using the available ctx, are:

  • It opens the door to future bugs, since not only user data but also gRPC data is implicitly inherited and used.
  • IMO the testing and reviewing process is harder, since the context is implicit.

@lidizheng (Owner):

I echo the concern raised by Pau. The semantics of this feature would need to be more carefully thought out. It won't be a blocker for the existing set of async APIs, but it would potentially be a nice-to-have feature.

@euroelessar (Author):

I agree with @gnossen here: contextvars provides sane semantics and extensibility, so I'm fine with using it instead of an explicit Context object and instead of keyword arguments.
It's a common Python 3 concept, so it will not be alien alongside other libraries/frameworks, and it will be understandable to engineers familiar with the gRPC libraries of other languages (C++, Go, Java, etc.).

Comment:

Correct me if I'm wrong, but C++ [1] and Go [2] - and I would say in some way Java too - use explicit contexts. Am I wrong?

[1] https://github.com/grpc/grpc/blob/master/examples/cpp/helloworld/greeter_client.cc#L58
[2] https://github.com/grpc/grpc-go/blob/master/examples/helloworld/greeter_client/main.go#L53

Comment:

C++'s context is not for propagation. It's closer to keyword arguments for configuring wait-for-ready and deadline. Java uses a scoped, implicit, immutable context (io.grpc.Context) via ThreadLocal. I've been told C++ uses an explicit context, but I'm not sure where the API is, and I'd generally consider that a mistake for C++ (although given how fragmented C++ is, there are some technical reasons implicit can be difficult). In Go it is wonderful that it is explicit, but you need the entire language ecosystem to get behind one context for that to work out. Without knowing too much about them, the implicit contextvars seem to be the Python equivalent.

The problem with explicit is that it can be very difficult to plumb. For languages with lots of abstractions (e.g., Java), I'd consider explicit a death sentence for the feature. It'd be impossible to use in practice. Go as an ecosystem has decided to plumb the context, which means it will be naturally plumbed through abstractions.


```Python
async def _async_parse_response(self, state: ScrapingState):
    response = await self._streaming_call.receive()
    if response is None:
```
@gnossen, Aug 20, 2019:

I don't have a huge preference between returning a sentinel value, as you do here, and raising an exception. It certainly is nice that the whole block doesn't have to be indented. How would you feel about replacing None with something like grpc.EOF, though? As a first-time viewer, it's not immediately obvious to me what None means in this context.

@euroelessar (Author):

Should be fine, as long as it doesn't hurt the typing (and a special grpc.EOF should not, as Optional[X] itself is just Union[X, None]).
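A sketch of how such a sentinel could stay typed (names here are hypothetical, not an agreed API):

```Python
from typing import Union

class _EOFType:
    """Type of the grpc.EOF sentinel, so receive() can be annotated precisely."""
    def __bool__(self):
        return False  # lets callers keep writing `if not response:` as with None

EOF = _EOFType()

# receive() would then be annotated as:
# async def receive(self) -> Union[helloworld_pb2.HiRequest, _EOFType]: ...
```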

```Python
class Greeter(object):
    async def Hi(self,
                 ctx: grpc.Context,
```
Comment:

How do you view the responsibilities of this Context object? Is it merely a configuration object or is it also used for carrying tracing information between remote machines?

@euroelessar (Author):

It's used for propagating deadline, cancellation, and tracing information, as well as scoped configuration (outgoing metadata, wait_for_ready, etc.).

Most of these concepts (deadline/tracing) are not present in the current Python 2 gRPC API, and internally we have our own stubs to propagate them properly. The asyncio API feels like a good opportunity to fix this upstream as well.

Comment:

I really do like the idea of making context a first-class citizen of the API.

Comment:

So we will have gRPC and non-gRPC parameters living within the same object? How will this object meet the requirement of storing well-known attributes, like deadlines, while at the same time giving the user full freedom to store other data, like tracing data?

@euroelessar (Author):

It looks like mixing gRPC and non-gRPC parameters is not a concern if we use contextvars.

Comment:

IMO whether or not to use contextvars is not the point. Most likely, if we were to use contextvars - to which I have my objections - we wouldn't store the attributes in a flat way, but would have everything bound to an object, like:

```Python
ctx = grpc_context.get()
print(ctx.deadline)
```

Regarding the mixture of gRPC and non-gRPC parameters, correct me if I'm wrong, but the implementations provide a boundary between both worlds by asking the user to use the metadata API [1] [2].

[1] https://github.com/grpc/grpc-go/blob/master/examples/features/metadata/client/main.go#L46
[2] https://github.com/grpc/grpc/blob/master/src/cpp/client/client_context.cc#L99

```Python
ctx = grpc.Context() \
    .with_timeout_secs(5.0) \
    .append_outgoing_metadata('key', 'value')
```
Comment:

Any reason for this builder-pattern style over kwargs? It seems they'd accomplish the same thing.

@euroelessar (Author):

grpc.Context is supposed to be passed through method boundaries; each method implementation can augment it by creating a new context. Example:

```Python
class BarServicer(object):
  async def foo1(self, ctx: grpc.Context) -> foo_pb.Response:
    return await self.client.Foo(ctx.with_timeout_secs(5.0), foo_pb.Request())

  async def foo2(self, ctx: grpc.Context) -> foo_pb.Response:
    if 'request_id' not in ctx.inbound_metadata():
      ctx = ctx.append_outgoing_metadata('request_id', request_id.random_new())
    return await self.foo1(ctx)

  async def BarHandler(self, ctx: grpc.Context, req: bar_pb.Request) -> bar_pb.Response:
    return self.construct_bar_response_from_foo(await self.foo1(ctx))
```

It's designed after Go's Context and provides roughly similar functionality.

Comment:

Have you taken a look at https://docs.python.org/3/library/contextvars.html? Any thoughts on that approach versus this one?

@euroelessar (Author):

This API didn't exist 2-3 years ago, when we were adding the notion of a Context internally.

Looking at it today, it could be viable, though I don't see any protection against mutating the same contextvars.Context object from concurrent asyncio coroutines within the same server call, which makes it dangerous/hard to troubleshoot.

For example:

  1. The server receives a new RPC.
  2. The handler calls 2 concurrent async methods and awaits them using asyncio.gather.
  3. Each of these concurrent calls somehow alters the context (e.g. to tighten the deadline, add some outgoing metadata, etc.), but they alter the same context, so state leaks across coroutines.
  4. Some engineer spends hours trying to troubleshoot why the deadline is different or some metadata is wrong.

With an immutable, append-only Context this situation is impossible: each coroutine passes along its own Context object, and altering it just creates a new object, keeping the parent unmodified.
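A sketch of that property, using the Context API proposed in this thread (method and stub names follow the earlier examples):

```Python
async def BarHandler(self, ctx: grpc.Context, req: bar_pb.Request) -> bar_pb.Response:
    # each branch derives its own child context; the parent ctx is never mutated
    resp_a, resp_b = await asyncio.gather(
        self.client.Foo(ctx.with_timeout_secs(1.0), foo_pb.Request()),
        self.client.Foo(ctx.append_outgoing_metadata('k', 'v'), foo_pb.Request()),
    )
    return self.combine(resp_a, resp_b)  # combine() is a hypothetical helper
```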

Comment:

I believe contextvars handles the situation you described well. Context objects are copy-on-write, with a concept of parents that can be, and are, restored by the asyncio event loop. Consider the following snippet:

```Python
import asyncio
import contextvars
import random


cv = contextvars.ContextVar('grpc.deadline')
cv.set(0)

async def task1():
    await asyncio.sleep(random.random())
    print(f"Task 1 deadline before: {cv.get()}")
    cv.set(1)
    await asyncio.sleep(random.random())
    print(f"Task 1 deadline after: {cv.get()}")

async def task2():
    await asyncio.sleep(random.random())
    print(f"Task 2 deadline before: {cv.get()}")
    cv.set(2)
    await asyncio.sleep(random.random())
    print(f"Task 2 deadline after: {cv.get()}")

async def task3():
    await asyncio.sleep(random.random())
    print(f"Task 3 deadline before: {cv.get()}")
    cv.set(3)
    await asyncio.sleep(random.random())
    print(f"Task 3 deadline after: {cv.get()}")

async def main():
    await asyncio.gather(
        task1(),
        task2(),
        task3(),
    )
    print(f"Final deadline: {cv.get()}")

asyncio.run(main())
```

The result is what you would hope:

```
Task 3 deadline before: 0
Task 3 deadline after: 3
Task 1 deadline before: 0
Task 2 deadline before: 0
Task 1 deadline after: 1
Task 2 deadline after: 2
Final deadline: 0
```

@euroelessar (Author):

This actually looks promising.

Comment:

I have the feeling that we are entirely replacing the traditional arguments with a whole context object. What's the rationale for doing so? Moving information/values across gRPC services that are also used by the gRPC library?

Also, regarding the context itself, leaving aside my initially skeptical opinion on using it, I have two comments:

  • Requiring it explicitly as a parameter is IMO a good idea; it would prevent future bugs caused by the user not remembering that the context was implicitly used.
  • Regarding the usage of contextvars, it would IMO make sense for giving any piece of code access to the request context. But the user would still be forced to explicitly pass the parameter when it is required by a gRPC function.

I have the feeling that where you say context, it really means request, where non-gRPC data can be placed within an attribute of the request, for example request.user_data.

This would bring us to something like:

```Python
class BarServicer(object):
  async def foo1(self, request: grpc.Request) -> foo_pb.Response:
    foo_request = grpc.Request.from_request(request)
    return await self.client.Foo(foo_request.with_timeout_secs(5.0), foo_pb.Request())
```

Which IMO is another way of implementing the current pattern that uses arguments.

@euroelessar (Author):

gRPC has no concept of request params, but it does have a concept of a context.
Please check the gRPC APIs in other languages (Go, Java, C++, etc.); this object is always named context.

> I have the feeling that we are entirely replacing the traditional arguments with a whole context object. What's the rationale for doing so? Moving information/values across gRPC services that are also used by the gRPC library?

There are a few reasons:

  1. It matches the semantics of the gRPC implementations in all other languages.
  2. It aligns better with the nature/usage of these properties. E.g., a timeout by itself has little value on a per-method basis; services are usually written with a deadline in mind, where the deadline is a property of the parent context and engineers rely on its automatic propagation across the stack. Similar logic applies to tracing, outbound metadata, etc.

Comment:

Thanks for the explanation; it seems that going for something that makes the API more aligned with the other languages is the way to go.

I have my objections about having an implicit context; I will join the other discussion where this is being discussed.

@gnossen, Aug 21, 2019:

@euroelessar Thank you for the PR! It's great to get feedback from someone else who's thought deeply about this.

@euroelessar (Author) left a comment:

So let's clarify the next steps. It looks like we generally agree on:

  1. Stubs with explicit types;
  2. Usage of contextvars for timeouts/outbound metadata/etc. instead of keyword args;
  3. A stream-object-based streaming API instead of an iterator-based one;
     3.1. Don't introduce any notion of an iterator-based streaming API.

Questions we have not resolved yet:

  1. How to intermix async and sync handler implementations within the same server;
  2. How to intermix async and sync calls within the same client channel.


@lidizheng (Owner):

@euroelessar I was working on a prototype of the AsyncIO server for the past few weeks. Hands-on coding and debugging made me understand the problems better. Let's start with your last comment.

> It looks like we generally agree on:
>
> 1. Stubs with explicit types;

Yep. This sounds great.

> 2. Usage of contextvars for timeouts/outbound metadata/etc. instead of keyword args;

I'm not in favor of adding a context concept. "Explicit is better than implicit." Users who want to set up a similar config for a set of APIs can use stub.Hi(request, **config) to supply keyword args.

I would need a stronger use case to add an extra concept.
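For concreteness, a sketch of that spelling (stub and message names follow earlier examples in this thread):

```Python
# one plain dict reused across a set of calls; no new concept involved
config = dict(timeout=5.0, metadata=(('key', 'value'),))
reply1 = await stub.Hi(helloworld_pb2.GreetRequest(...), **config)
reply2 = await stub.Hi(helloworld_pb2.GreetRequest(...), **config)
```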

> 3. A stream-object-based streaming API instead of an iterator-based one;
>    3.1. Don't introduce any notion of an iterator-based streaming API.

For the client side, I agree. Both the request iterator and the response iterator are against the logic flow of the application. Especially the request iterator: to send a request, I need to write a generator somewhere above and implicitly interact with the received messages.

For the server side, I feel even the example I wrote in this repo is too simple to make the stream-object-based streaming API obviously superior to the iterator-based one. Can you provide an example that demonstrates the benefit more clearly? It would help me convince others, since this will be a big change.

Also, asyncio coroutines allow us to yield from functions without creating another thread, which eliminates one of the drawbacks.

> Questions we did not resolve yet:
>
> 1. How to intermix async and sync handler implementations within the same server;

I would still propose the same thing: async and sync handlers can be supplied to the same server, but the server will have to be grpc.aio.Server instead of the old one. We'll try hard to maintain backward compatibility for the new server, but it might take time and iterations.

> 2. How to intermix async and sync calls within the same client channel.

For now, I would propose only offering async stubs for grpc.aio.Channel. It is simple and reduces confusion.

Mechanically, non-async functions are not coroutines; they cannot pause and yield execution to another coroutine. So the sync stub will only be useful for multi-threaded applications that want to gradually migrate to the new API.

  1. We can provide a helper function to convert the async call into a future, and block waiting for its result (a sketch follows this list);
  2. Recommend users create different Channel instances;
  3. Create two stub files, one for async and one for sync.
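A sketch of the option-1 helper (the name is hypothetical; it assumes the event loop runs in a different thread than the blocking caller):

```Python
import asyncio

async def _consume(awaitable):
    return await awaitable

def blocking_result(awaitable, loop):
    # schedule the awaitable on the asyncio loop and block the calling
    # (non-loop) thread until the result is ready
    return asyncio.run_coroutine_threadsafe(_consume(awaitable), loop).result()
```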

Requiring users to change their code is okay with me, since AsyncIO is different from normal Python. However, option 3 creates additional files and may touch users' build tool chains. That might be painful.

What do you think?

@euroelessar (Author):

@lidizheng Responses are inline.

> > 2. Usage of contextvars for timeouts/outbound metadata/etc. instead of keyword args;
>
> I'm not in favor of adding a context concept. "Explicit is better than implicit." Users who want to set up a similar config for a set of APIs can use stub.Hi(request, **config) to supply keyword args.
>
> I would need a stronger use case to add an extra concept.

First of all, there is already a concept of a context object on the server side. The suggestion is to extend it to the client and make it more similar to the other gRPC implementations. It is also consistent with the Java/Go/C++ APIs. Inline arguments are "unique" to the Python stack and, in our experience, confuse engineers who have to work on multiple stacks at a time.

A few questions:

  1. How does one propagate deadlines and cancellations in this case?
  2. How do we ensure that engineers don't forget to propagate them if it's manual labor?
  3. How do we propagate deadlines and cancellation across various third-party/multi-owner libraries? Adding the full variety of variables to every method that may make gRPC calls doesn't look like an appealing option.

gRPC adds new arguments over time (e.g. the recent wait_for_ready), and all stubs have to be updated for all types of calls (sync/async/unary/streaming/interceptors/etc.). Interceptor modification in fact breaks user code every time a new method is added.
Putting these arguments in a context allows gRPC to more easily modify and extend per-call options, as well as control propagation of these variables (e.g. deadline/cancellation).

> For the server side, I feel even the example I wrote in this repo is too simple to make the stream-object-based streaming API obviously superior to the iterator-based one. Can you provide an example that demonstrates the benefit more clearly? It would help me convince others, since this will be a big change.
>
> Also, asyncio coroutines allow us to yield from functions without creating another thread, which eliminates one of the drawbacks.

  1. There is value in itself in having a consistent set of APIs on the client and server side. Making them different will only increase engineers' confusion.
  2. It's easy to implement an "iterator-based API" on top of a stream-object one (e.g. using decorators/helper functions). It's hard/impossible to correctly do it the other way around.
  3. Imagine a server-streaming endpoint: how does one automatically clean up allocated resources using with or finally statements? If response streaming is an iterator (the current sync API), there is (currently) no way to pass execution back to the handler in case of errors (cancellation, reset from the client, etc.), so resources leak. With the stream-object API the flow is cleaner: a stream.send(...) call can raise an exception which bubbles up through the stack, triggering cleanups (see the sketch after this list).
  4. It's hard/impossible to do any work while execution is blocked in an async yield on push back from the client. Imagine a proxy that needs to do pipelining and wants to keep a buffer of 3-4 blobs in memory to ensure optimal performance, limited only by the client's push back.
  5. In the general case of bidi-streaming RPCs, it is necessary to be able to have a pending read simultaneously with a pending write, and to resume execution when only one of the operations finishes. In the response-as-generator design, once you write, you yield execution from that stack frame with no easy way to resume. If send and receive are simple async operations returning task objects, you can easily compose them, async-yield until one of them finishes, and resume execution.
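A sketch of point 3 under the stream-object API (`acquire_resource`/`make_response` are hypothetical helpers):

```Python
class Greeter:
    async def StreamingHi(self, stream: helloworld_grpc_pb2.HiStream) -> None:
        resource = acquire_resource()
        try:
            while True:
                # send() can raise on cancellation/reset from the client, so the
                # handler's stack unwinds and `finally` reliably runs
                await stream.send(make_response(resource))
        finally:
            resource.close()
```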

At a high level, generator-returning handlers look easy to use only for contrived examples where you return a sequence of values you already know. In practice, that is not the use case for streaming RPCs (you don't really return an array of integers that way; you want to perform side effects asynchronously, and your logic is to respond appropriately by sending/receiving whenever it makes sense). The "functional sequence" abstraction, without regard for timing, does not make sense for a service handler.

@ejona86 Can you chime in here with your grpc-java expertise, please?

> > Questions we did not resolve yet:
> >
> > 1. How to intermix async and sync handler implementations within the same server;
>
> I would still propose the same thing: async and sync handlers can be supplied to the same server, but the server will have to be grpc.aio.Server instead of the old one. We'll try hard to maintain backward compatibility for the new server, but it might take time and iterations.

This question is relevant to both the server and the channel:
What's the reason to have an entirely new channel implementation instead of extending the existing one?
Is it the simplicity of the initial implementation, or is it considered acceptable to support two independent stacks in the long run?

It looks like this increases the longer-term development/support cost by cutting some corners today.

> > 2. How to intermix async and sync calls within the same client channel.
>
> For now, I would propose only offering async stubs for grpc.aio.Channel. It is simple and reduces confusion.
>
> Mechanically, non-async functions are not coroutines; they cannot pause and yield execution to another coroutine. So the sync stub will only be useful for multi-threaded applications that want to gradually migrate to the new API.

Yes, having two sets of APIs is necessary for migration purposes, though realistically this migration will take years for many companies, due to the amount of code involved. That makes it important to support this scenario well.

> 1. We can provide a helper function to convert the async call into a future, and block waiting for its result;

If we go with a world of a single grpc.Channel implementation which provides both async and sync interfaces, this step will not be necessary. The old sync methods can be gradually rewritten on top of the async ones to unify the implementation, while sharing all the state (subconnections, LB, list of handlers, etc.).

> 2. Recommend users to create different `Channel` instances;

This option is often not acceptable, due to doubling resource costs for an unpredictable amount of time.

> 3. Create two stub files, one for async and one for sync.

There are going to be two different stubs for the sync and async APIs anyway (due to the different method signatures); does this point refer to a third stub API?

> Requiring users to change their code is okay with me, since AsyncIO is different from normal Python. However, option 3 creates additional files and may touch users' build tool chains. That might be painful.
>
> What do you think?

@pfreixes:

Regarding the context discussion, I'm in favor of having contexts be explicit, which gives us:

  • Alignment with other languages.
  • Functions like FromServerContext [1] would become easier to implement and use, which makes the API more readable; with the current keyword-argument implementation this kind of feature would be more painful.

[1] https://github.com/grpc/grpc/blob/master/src/cpp/client/client_context.cc#L91

@gnossen, Sep 12, 2019:

Chiming back in on the context discussion. Let me start by clarifying that this context should be used only for attributes which should, by default, be propagated from ~~client to server~~ server to client. Namely:

  • timeout
  • cancellation status
  • tracing instrumentation (e.g. trace ID, span ID)

I'm in favor of an implicit option because we would like this functionality to work by default, without the application author having to do any extra work to make it happen. Suppose we didn't make this plumbing work without application-author intervention. What would happen then?

A Case Study in Scarlet

In the most common scenario, the application author becomes frustrated that timeouts and cancellation don't work as expected. Sure, there's some opaque object called a "context" that they're obligated to decorate every usage of the gRPC API with, but they don't actually know what it is for, and they certainly don't know that they have to call an arcane function on the server side in order to retrieve the appropriate context from the client making the call (Pau's equivalent of FromServerContext).

So they spend potentially hours learning what contexts are in the distributed-systems sphere and realize they need to grab the context from the server in order to propagate it to upstream client calls made from the server. But then they realize that their timeouts and cancellations aren't making it through to their database calls, which continue running even after their RPC has been cancelled, eating up precious CPU cycles for nothing. They use a client library for their database that uses gRPC under the hood. All they have to do is manually pass the context along to the database client. But wait... the library author didn't include it in the API surface... Our intrepid hero now gives up completely, because changing someone else's API just doesn't have priority for them.

Example

The alternative I have in mind looks more like this:

```Python
...
success = await stub.CreateUser(user_request, timeout=datetime.timedelta(seconds=5))

import grpcdb  # DB client library using gRPC under the hood
...
class UserService:
    async def CreateUser(...):
        # request-derived context is automatically added to contextvars.
        if not validate_user(request.user):
            return user_pb2.Response(success=False)
        await grpcdb.CommitRecord(request.user)
        return user_pb2.Response(success=True)
```

As for the argument for uniformity with other languages: yes, Go uses an explicit context object. But it's part of the larger Go ecosystem that many developers have collectively agreed to add to their API surfaces (though not in every case). Python developers have made no such collective agreement. Instead, the standard library offers an implicit context optimized specifically for coroutines: contextvars. I say we use it.

Caveats

There are some cases in which you don't want your context propagated or you want to adopt a different context entirely (e.g. in the case of thread/coroutine hops). These cases are (thankfully) much less common. I believe manual intervention on the part of the application author is acceptable in these cases.

Divorcing a Coroutine from the Propagated Context

Suppose part of your RPC kicks off a long, uninterruptible process on a remote system. Once you start it, there's no going back, timeouts be damned. You should be able to choose not to propagate your timeout and cancellation status.

```Python
async def CreateUser(...):
    user = await grpcdb.GetUser(...)
    if is_existing_user(user):
        ...  # Abort! You're creating too many clones!
    with grpc.divorce_context():
        await cloning_machine_stub.LiterallyCreateUser()
```

However, you wouldn't want this function to remove tracing instrumentation. Even when you don't want timeouts to be propagated, you still want the upstream call to be associated with the original client call in your tracing tooling.

Adopting another Context

Suppose your RPC requires coordinating computations between multiple coroutines (threads are much less likely in Python because of the performance hit from the GIL). Propagation needs to happen in the direction of causality, which is a philosophical concept that really needs to be determined by a human judge. It just so happens that, very often, a single coroutine/thread basically corresponds to a single chain of causality (thread pools destroy this assumption, btw). In those cases, we should let a human determine causality.

In this case, we'd require that a context object be stored alongside the object representing the chain of causality. Consider this:

```Python
# Coroutine 1
context = contextvars.copy_context()
intercoroutine_queue.put((request, context))

# Coroutine 2
request, context = intercoroutine_queue.get()
def send_request():
    stub.MakeRequest(request)
context.run(send_request)
```

Edit: Made this example grpc-agnostic, since contextvars supports this out of the box.

@ejona86, Sep 12, 2019:

> Let me start by clarifying that this context should be used only for attributes which should, by default, be propagated from client to server.

@gnossen, I don't think that is right. Those things should be in metadata. The context is for things propagated from the server to the client. The things you listed still apply.

We use metadata to propagate across a network from client to server. We use context (whatever the name is, however it works) to propagate within a process from server to client.

I strongly agree that context should not be used for things like wait-for-ready. Or, put another way, the thing used for cancellation/deadline/trace propagation should be a different thing from the one used for wait-for-ready. Yes, both things may be called "context" by some people, but they aren't the same thing and we shouldn't combine them. I tend to call the propagation object a "context" and the configuration object a "call", "options", or "config". (In Java, they are io.grpc.Context and io.grpc.CallOptions, respectively.)

@pfreixes:

Thanks for the detailed explanation @gnossen; the following is a good point:

> They use a client library for their database that uses gRPC under the hood.

But I have some questions about having an implicit context:

1 - Not all of the data that belongs to the context might have to travel - implicitly - across all of the services. For example, some metadata might only target the server and should never be forwarded to downstream dependencies. In a really bad scenario, this data might be Personal Information which is treated in the right way by the server but leaked in some way by the downstream dependencies.

How do we avoid this situation if the context is implicit? Can we narrow the fields that will be inherited? I think this is related to what @gnossen said above:

> this context should be used only for attributes which should, by default, be propagated

How does it fit with the metadata field?

2 - I'm a bit lost with the tracing: is the tracing information a piece of custom data that is serialized as a regular metadata field, or is there an ad-hoc field for it in the context object? @euroelessar, could you expand a bit on what the requirements are and how the implicit context helps improve things?

Also, just to add more context: in our organization, tracing travels across microservices using regular HTTP metadata and traverses API boundaries within the same application by exposing the tracing information as coroutine context. As an example, take a look at the AWS X-Ray middleware and client instrumentation for aiohttp.

Yes, in that example the plumbing is done by the application, not by the framework.

3 - Is the most common pattern to re-use the upstream timeout for calling the downstream dependencies? If so, could you tell me the benefits of doing that instead of using a specific, independent deadline for each downstream call?

@gnossen, Sep 17, 2019:

Question 1
Metadata should not be added to the context by default. If metadata is used as a mechanism for transmitting data, it should be done in an explicit manner. In the case of timeout and cancellation, that will be gRPC's responsibility. In the case of tracing, it will be the responsibility of interceptors for that particular tracing system. I think Go's stance on what should go in context is great:

// Use context values only for request-scoped data that transits
// processes and API boundaries, not for passing optional parameters to
// functions.

Question 2
I was (intentionally) a bit nebulous on this point. Tracing needs to be pluggable because there are so many different systems in use for it.

These systems tend to use a model where tracing data is serialized into the metadata, then deserialized and removed via interceptors on the other side. There should be some way to store arbitrary data in contextvars which can be serialized/deserialized by interceptors in a tracing-system-dependent manner. I didn't do a full design in my comment above, but I imagine the flow would go something like this.

  • Process A's client-side interceptor reads tracing data from contextvars and serializes it into request metadata.
  • Process B's server-side interceptor reads and strips its metadata and injects it into contextvars.
  • Process B's client-side interceptor reads tracing data from contextvars and serializes it into request metadata.
  • And so on...

At some point in the above sequence of steps, the tracing information is reported out-of-band to a tracing server somewhere in the system.
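A minimal sketch of that flow, assuming contextvars-based storage; the metadata key and function names are hypothetical, not any real tracing system's API:

import contextvars
from typing import List, Tuple

_trace_id: contextvars.ContextVar[str] = contextvars.ContextVar("trace_id", default="")
_TRACE_KEY = "x-trace-id"  # assumed metadata key

def server_side_extract(metadata: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Process B's server-side interceptor: strip the tracing metadata and
    # inject it into contextvars for the rest of the request handling.
    remaining = []
    for key, value in metadata:
        if key == _TRACE_KEY:
            _trace_id.set(value)
        else:
            remaining.append((key, value))
    return remaining

def client_side_inject(metadata: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Process B's client-side interceptor: read the tracing data back out of
    # contextvars and serialize it into the outgoing request metadata.
    trace_id = _trace_id.get()
    if trace_id:
        return metadata + [(_TRACE_KEY, trace_id)]
    return metadata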

Question 3
Within Google, this is certainly the pattern. Though our API may use the term "timeout", it might make more sense to speak in terms of "deadlines." The deadline is the point in time after which the originator of the request (perhaps many hops from the most upstream server) either is no longer interested in the response or has given up on receiving one. After this point, any other computation done in service of that request anywhere in the entire distributed system is wasted effort. And so, even three hops away, computation pertaining to it should stop. If the server ignores the requested deadline from the originator and instead asserts its own opinion on the deadline, then either the server is ignoring the desires of the originator and aborting prematurely, or it's wasting resources after the client has decided to give up on the RPC.
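A toy illustration (plain Python, not gRPC API) of why an absolute deadline composes across hops while a relative timeout does not:

import time

def remaining_budget(deadline: float) -> float:
    # Seconds until the originator gives up; <= 0 means further work is wasted.
    return deadline - time.time()

# The originator allows 2 seconds end-to-end.
deadline = time.time() + 2.0
# One hop and some work later, the next hop re-derives its timeout from the
# same absolute deadline instead of restarting a fresh 2-second budget.
timeout_for_next_hop = remaining_budget(deadline)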

@pfreixes

pfreixes commented Sep 17, 2019

Hi @gnossen, thanks for your response; responses to your comments are inline.

Question 2

There should be some way to store arbitrary data in contextvars which can be serialized/deserialized with interceptors in a tracing system-dependent manner.

The question here is: who is responsible for providing this interface? Does it need to be the gRPC library, or is it something that falls into the application domain?

I have the feeling that this is the point of discrepancy between your arguments - which I agree with - and the ones expressed by @euroelessar - please correct me if I'm wrong. From what I understood, @euroelessar is proposing that having a context with all of the gRPC data available - metadata too - at any place and inherited by default is the responsibility of the gRPC library, while what you are suggesting is the opposite - regardless of whether both use the same technology behind the scenes, aka contextvars - it is the application that needs to implement the plumbing for forwarding the trace information.

Question 3

I was more interested in the benefits, considering that automatically inheriting the deadline might have some drawbacks, for example:

  • The client might issue a call without a deadline; in practice it would mean that, for protecting the calls to your downstream dependencies - not being at the mercy of the caller - you will be forced to double-check the deadline explicitly in your service. Which would make the implicit context useless.
  • Deadlines used for calling your service might not make sense for your downstream dependencies; does it make sense to re-use a 10s timeout for calling a memory storage which has a p99 SLO of 10ms?

This seems to be automatically solved once the context is inherited explicitly, and I have the feeling that Go took this into account when they implemented the WithDeadline method, where the deadline is only inherited if it exists and is lower than the explicit value.

TBH I'm still thinking that explicit is better than implicit. If it worked for Go, why can we not replicate almost the same interface?

@gnossen

gnossen commented Sep 17, 2019

Thanks for the response @pfreixes. I appreciate the specific use cases called out.

Question 2

proposing that having a context with all of the gRPC data available - metadata too - at any place and inherited by default is the responsibility of the gRPC library

I can't agree with a proposal like that, for exactly the reasons that you pointed out earlier. I guess we need to reconcile between @euroelessar 's conception of implicit context and my own. I'll follow up with a PR on this repo with a full example.

Just to clarify my position on this, as @ejona86 pointed out above, context is really about propagating information from a server to a client living within the same process. The question of how that context is translated to bits on the wire should be left to middleware (i.e. interceptors). This broadens the capabilities of the system dramatically by making it completely pluggable.

The sole exceptions to this are timeout and cancellation status, which should be handled by gRPC Python directly.

Question 3

The client might issue a call without a deadline; in practice it would mean that, for protecting the calls to your downstream dependencies - not being at the mercy of the caller - you will be forced to double-check the deadline explicitly in your service. Which would make the implicit context useless.

I'm reading "for protecting calls to your downstream dependencies" as "ensuring that client RPCs do not monopolize server resources". That is, a client should not be able to set an infinite timeout and use 100% of a server's resources indefinitely. @pfreixes Please correct me if I'm wrong with that reading.

Servers themselves should be the ones to enforce time-limit and resource-limit policies, not clients. I do not consider this truly double checking. The two checks happen at different layers and for different reasons. A deadline check happens within the RPC layer, not the application layer. It is done to determine whether or not there is any point in continuing computation.

Authors of servers needing to impose resource constraints can additionally choose to fail an RPC after some set period of time or after a certain amount of resources are consumed. We actually have an example of that here. But this is a cross-cutting concern that requires changes to your application logic. It's not sufficient to simply hope that every client that calls you will set an appropriate deadline. The server must enforce it itself.
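A hedged sketch of that kind of server-side enforcement; the policy constant and the do_work coroutine are stand-ins, not gRPC API:

import asyncio

SERVER_POLICY_TIMEOUT_SECS = 5.0  # assumed per-service policy

async def handler(request):
    try:
        # Bound the work by the server's own policy, independent of whatever
        # deadline (or lack of one) the client chose to send.
        return await asyncio.wait_for(do_work(request), timeout=SERVER_POLICY_TIMEOUT_SECS)
    except asyncio.TimeoutError:
        raise  # a real servicer would abort the RPC here instead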

Deadlines used for calling your service might not make sense for your downstream dependencies; does it make sense to re-use a 10s timeout for calling a memory storage which has a p99 SLO of 10ms?

Yes. I absolutely do think this is reasonable. As I made clear in my previous post, deadlines indicate the point in time after which the originator of the request is no longer expecting a response. That information is useful regardless of the performance capabilities of any upstream server.

Suppose your data path looks like this:

+-+  +-+  +-+
|A|->|B|->|C|
+-+  +-+  +-+

where A is the originator, B is an intermediary server, and C is a server with a 10ms SLO. Now, suppose C is in a failure mode where it is accepting connections but hanging indefinitely. In this case, it might make sense for B to override the deadline sent by A and degrade gracefully after some shorter period, say 500 ms. Implicit context does not preclude this possibility. It simply makes it non-default. I expect that this is the right choice, since I doubt the majority of microservices in the world have such tight SLOs.

But let's consider the downsides of that approach. Suppose C enters a failure mode where it responds after 600ms. B is now giving up after 500ms and degrading its response to A on every single call. Had it instead waited the 600ms, it could have given a full response in slightly longer than usual.

In either case, I can't help but wonder if deadlines are the wrong tool for this problem. Perhaps circuit breakers would be a better pattern?

This seems to be automatically solved once the context is inherited explicitly... If it worked for Go, why can we not replicate almost the same interface?

All of the use cases that you have mentioned so far are also served by the implicit option I have outlined, but with a simpler, less surprising API, and turned on by default for all new users. What's more, explicitly propagated Golang Context objects are a standard from the Go language team. They work because the Go community has rallied around them and included them in their API surfaces, so that context may be propagated through applications' entire stacks.

Python has a similar construct chosen by the community: contextvars, which were created for the same purpose and have already been blessed by inclusion into the standard library. By forging our own path, I fear we'd end up with very few people actually being able to plumb context all the way through their application stack. This is a problem I've already faced on a previous project. We eventually just gave up on context because manually plumbing context through our entire system was just too much work to be able to justify. I'd prefer that everyone be able to enjoy the benefits of distributed context without having to move mountains.
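For readers unfamiliar with them, a self-contained example of the implicit propagation contextvars already provide, with no gRPC involved:

import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default="-")

async def db_query():
    # Several frames down the async call stack: no parameter threading needed.
    print("querying with request_id =", request_id.get())

async def handle_request(rid: str):
    request_id.set(rid)  # visible to everything awaited below, and only there
    await db_query()

async def main():
    # Each task runs in its own copy of the context, so the two ids never mix.
    await asyncio.gather(handle_request("a"), handle_request("b"))

asyncio.run(main())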

It's also worth noting that implicit thread-local context is how gRPC C++ works (at least internally to Google).

A full context example PR is forthcoming.

@euroelessar
Author

Question 1

Metadata should not be added to the context by default. If metadata is used as a mechanism for transmitting data, it should be done in an explicit manner.

It's important to differentiate inbound and outbound metadata. gRPC must not implicitly pass inbound metadata from the server as outbound metadata to the client.
However, it's important to be able to inspect inbound metadata on the server in an interceptor, populate some context (e.g. outbound metadata or a tracing-specific structure), and use this data from an interceptor on the client to populate outbound metadata.

It is worth noting:

  1. For some cases, like request_id propagation, it's somewhat more convenient to be able to add it to the outbound metadata directly from the server's interceptor.
  2. There is value in putting "inbound metadata" into the standard gRPC context for ease of consumption by applications/libraries. Please note that the separation between inbound/outbound metadata implies that "inbound metadata" is never implicitly put on the wire by the client.

It's also important to be able to propagate both the tracing-related pieces & gRPC's own cancellation/deadline bits via the same mechanism (e.g. contextvars), to ensure that they are in sync and consistent (e.g. to reduce the chance of an engineer preserving the deadline while moving execution to another thread, but forgetting to also preserve the tracing/metadata bits).

Question 2
Tracing is usually a combination of server/client interceptors and some high-level API for creating intermediate spans.
The server interceptor consumes inbound metadata and populates the context with some notion of a "span".
Application code can create new intermediate spans to cover expensive logic (e.g. requests to a non-grpc database, reading data from local disk, expensive computation) or for background tasks (e.g. detached from the main execution, like a best-effort async transaction rollback on context cancellation).
The client interceptor consumes the latest "span" from the context and generates outbound metadata.

That said, it's the responsibility of the tracing middleware to convert inbound metadata into a span object and back into outbound metadata. And it's always an explicit action by the user/middleware to add anything to outbound metadata.

I have the feeling that this is the point of discrepancy between your arguments - which I agree with - and the ones expressed by @euroelessar - please correct me if I'm wrong. From what I understood, @euroelessar is proposing that having a context with all of the gRPC data available - metadata too - at any place and inherited by default is the responsibility of the gRPC library, while what you are suggesting is the opposite - regardless of whether both use the same technology behind the scenes, aka contextvars - it is the application that needs to implement the plumbing for forwarding the trace information.

I can't agree with a proposal like that, for exactly the reasons that you pointed out earlier. I guess we need to reconcile between @euroelessar 's conception of implicit context and my own. I'll follow up with a PR on this repo with a full example.

It feels acceptable if gRPC uses some standard context transport, e.g. contextvars.
In this case tracing/app-specific libraries can use contextvars as well, and it provides us a nice guarantee that app-specific and gRPC contexts are in sync and are propagated together (via standard contextvars interface).

What I'm suggesting is that gRPC's "context" contains not just deadline/cancellation, but also separate inbound/outbound metadata (without implicit promotion from inbound to outbound) for applications/libraries' convenience.

@gnossen

gnossen commented Sep 17, 2019

@euroelessar It sounds like we're actually mostly aligned. There's just one point that I think we need to clarify:

What I'm suggesting is that gRPC's "context" contains not just deadline/cancellation, but also separate inbound/outbound metadata (without implicit promotion from inbound to outbound) for applications/libraries' convenience.

My proposal is that, with the exception of deadline and cancellation status, the context will be a dumb bag of data that is interpreted by pluggable middleware interceptors, whose job it is to connect incoming bits on the wire, to coroutine-local context, to outgoing bits on the wire:

+-------------+                       +------------------+                       +-------------+
|incoming bits+-+server interceptor+->+in-process context+-+client interceptor+->+outgoing bits|
+-------------+                       +------------------+                       +-------------+

This means that, while propagation of the context through the application process is implicit, serialization and deserialization (and hence propagation to upstream services) are explicit (but generally handled by pre-written third-party middleware).

I can see both positives and negatives to "implicit promotion from inbound (metadata) to outbound".

Positives:

  • Proxied requests retain metadata. So in A->B->C where A and C are in Java, the Java middleware doesn't have to be rewritten in Python just for the (e.g.) tracing data to be preserved.

Negatives:

  • Metadata could be used to transmit personally identifiable information. Authors might intend for this metadata to terminate at the first hop after the originator of the request. If they're unaware of implicit propagation, this could result in something as disastrous as violating GDPR or other data protection legislation.
  • This behavior would not be consistent with the other gRPC language implementations.

Looking at those two lists, I can't help but feel that the negatives outweigh the positives. @euroelessar Is there anything that I missed?

@euroelessar
Author

@gnossen I don't think our reasoning is different here.

I agree that gRPC should never implicitly promote inbound metadata to the outbound one.
Among other negative points, there is also the chance of the metadata growing with every hop, eventually ballooning over the limit (8KB by default).

I'm just asking for:

  1. Inbound metadata to be available as "dumb bag of data" for consumption by middleware/application
  2. An ability to explicitly add new metadata to "dumb bag of data" for outbound metadata.

We can craft these bits internally in our infrastructure, so it's not a hard blocker, but it feels like it's beneficial in general for other users as well. One possible shape is sketched below.
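A hypothetical contextvars-based sketch of those two asks; all names here are illustrative, not gRPC API:

import contextvars
from typing import Mapping, Tuple

# 1. Inbound metadata: a read-only bag, populated once by the server machinery.
inbound_metadata: contextvars.ContextVar[Mapping[str, str]] = contextvars.ContextVar(
    "inbound_metadata", default={})

# 2. Outbound metadata: appended to only by explicit calls from the application
#    or middleware; never auto-copied from the inbound bag.
outbound_metadata: contextvars.ContextVar[Tuple[Tuple[str, str], ...]] = contextvars.ContextVar(
    "outbound_metadata", default=())

def append_outbound_metadata(key: str, value: str) -> None:
    outbound_metadata.set(outbound_metadata.get() + ((key, value),))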

@pfreixes

pfreixes commented Sep 17, 2019

Let me try to summarize and answer the different discussions that are open

Metadata and context

I agree with this

context is really about propagating information from a server to a client living within the same process

I guess the point here is: can this supposed context provided by gRPC be used at the same time by the application for moving stuff around? For example, moving tracing information? Regarding this @euroelessar says:

...to ensure that they are in sync and consistent (e.g. to reduce the chance of an engineer preserving the deadline while moving execution to another thread, but forgetting to also preserve the tracing/metadata bits).

So what you are suggesting is having a context providing the ability to save any arbitrary data besides the gRPC attributes like deadline and cancellation.

I guess that writing this data into the metadata will be the responsibility of the interceptors, which will serialize it as new metadata items.

Having implicit deadlines

My concerns about having an implicit deadline, @gnossen, can be summarized in the following snippets, where a server handler that uses one downstream dependency explicitly checks whether there is an active deadline - sent by the client of that server - to avoid calling its dependency without any timeout, like:

class Server:
    async def foo(self, context, request):
        with context.WithTimeout(0.1) as new_context:  # 100ms
            await client_stub.bar(new_context)

The WithTimeout checks whether the context deadline value exists and is lower than now() + 100ms; if not, now() + 100ms is set. With this, the call to bar is always done using a timeout, regardless of whether the client was using one for calling the foo method.
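A sketch of those WithTimeout semantics (hypothetical API, mirroring Go's context.WithDeadline: the stricter of the inherited and the requested deadline wins):

import time

class Context:
    def __init__(self, deadline=None):
        self.deadline = deadline  # absolute time, or None for "no deadline"

    def WithTimeout(self, timeout_secs: float) -> "Context":
        candidate = time.time() + timeout_secs
        if self.deadline is not None and self.deadline < candidate:
            return Context(self.deadline)  # the inherited deadline is stricter
        return Context(candidate)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False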

With the implicit context, that first snippet could be translated to something like:

from grpc import context

class Server:
    async def foo(self, request):
        with context.WithTimeout(0.1):  # 100ms
            await client_stub.bar()

The difference between the first and the second might seem negligible, but IMO it is not: inheriting deadlines explicitly forces the developer to think about what deadline will be used, while having it implicitly inherited might end up making calls without any deadline.

I'm still considering that SLOs between services can differ enough to make the feature of having a default implicit deadline inherited a suboptimal solution.

Regarding the degraded state of the server that you were describing, what I would expect is the backpressure from server C kicking in and rejecting some of the requests to maintain the SLO for the non-rejected requests. Yes, eventually, in a fully degraded state, Circuit Breakers will help the client to use an alternative server, which at the same time will alleviate the one that is having issues.

Maybe all of this relates to your comment

Servers themselves should be the ones to enforce time-limit and resource-limit policies, not clients.

So it would mean that, regardless of the deadline used by the client, if the resources used by the server for calling a downstream dependency are critical, they would need to be explicitly restricted by the server, by doing something like the second example:

from grpc import context

class Server:
    async def foo(self, request):
        with context.WithTimeout(0.1):  # 100ms
            await client_stub.bar()

@gnossen

gnossen commented Sep 18, 2019

@pfreixes Thanks for the response! Sorry to keep you up so late with long technical discussions. :)

The difference between the first and the second might seem negligible, but IMO it is not: inheriting deadlines explicitly forces the developer to think about what deadline will be used, while having it implicitly inherited might end up making calls without any deadline.

I don't disagree. But I think the argument against having to add an explicit context object to the surface of every library making use of gRPC is stronger.

I'm still considering that SLOs between services can differ enough to make the feature of having a default implicit deadline inherited a suboptimal solution.

Only tangentially related to the discussion of context, but FWIW, the timeout here is based on the SLO of the server and should therefore belong to the server's code or configuration, not the client's. grpclbv1 service configurations allow these to be published into DNS records so that this information doesn't have to be duplicated in every client.

I'll get a full context example up in the next day or two.

@lidizheng
Owner

lidizheng commented Sep 18, 2019

@pfreixes By reading the comments, I think there might be a misunderstanding. The implicit context is not going to be directly used by our users. @gnossen and I had an offline discussion about this topic, and the goals of implicit context are:

  1. Supporting tracing libraries (we all agree it is a valid use case);
  2. Supporting deadline propagation along a chain of requests.

Should we support deadline propagation? Deadline propagation alone is a useful feature: it saves server resources and helps flow control. If our library can offer that by default, do you think that's a good idea?

Explicit or implicit? I supported the explicit context argument because I thought it was used to pass call options (e.g. wait_for_ready, compression, timeout) in batches. It turned out it is not: the context is in charge of passing information across the boundary of servers. If users had to explicitly set up that information propagation themselves, it would be error-prone (forgetting one of the many clients), especially when the actual gRPC call is hidden under another library, like a connection to a database, or a cloud client. In that case, I think implicit context is better as long as:

  • Users don't need to touch context to make it work;
  • Application logic has higher priority (e.g. setting timeout explicitly will override the propagated value);
  • Exceptions have a good error string to indicate that the timeout was triggered by an upstream server.

EDIT: Before we dive into the details of context, we need to bear in mind that this is a 3.7 feature. Many of our dependents may not want to accept that as their minimum version. That means this feature is very likely to be enabled only for 3.7-and-up users. It lowers the priority of this feature.

@pfreixes

Or we might consider having Python 3.7 as the minimum version.

@euroelessar
Author

It looks like we have a general agreement on the streaming API and the usage of an implicit Context for deadline/cancellation propagation, plus read access to inbound metadata.

@lidizheng Can the proposal be updated based on the prior discussion, please?

There are a few unresolved topics:

1. CallOptions
What's the consensus on attributes which do not fit into implicitly-propagated context? For example, wait_for_ready.

Options are:

  1. Define a separate CallOptions object to allow per-call configuration, and pass it as an optional argument to method stubs:
stub.Call(req, options=CallOptions(wait_for_ready=True))
  2. List individual options as separate kw arguments:
stub.Call(req, wait_for_ready=True)

Benefits of call options object:

  1. Adding new options does not break the API of stubs (important in the case of strong typing and mocking).
  2. Interceptors already have a notion of call options, so it's more consistent.

Downsides of call options object:

  1. Two extra words (options=CallOptions) while using it.

Do we agree on making it a separate CallOptions object?
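A sketch of why a separate options object is friendlier to typed stubs and mocks; CallOptions here is the hypothetical class from option 1, with placeholder message types:

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CallOptions:
    wait_for_ready: bool = False
    compression: Optional[str] = None  # added later: no stub signatures change

class GreeterStub:
    # This signature stays stable no matter how many options are added; with
    # kwargs, every new option would alter the typed (and mocked) signature.
    def Call(self, request: bytes, options: CallOptions = CallOptions()) -> bytes:
        raise NotImplementedError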

2. Outbound metadata
Should it be part of the implicit context, or should the API only allow its modification through direct arguments to calls & interceptors?

3. grpc.aio
What's the mid/long-term plan for grpc-python?

Addition of isolated grpc.aio package implies that:

  1. There are two independent grpc-python implementations for an unbounded period of time.
  2. There is no promise to provide a type- and behavior-compatible "sync" API on top of the grpc.aio implementation.
  3. There is no supported migration flow from one grpc-python implementation to another.

The last point is of great interest to us, as I'm not sure how to perform a graceful migration of our multi-million-line codebase to another implementation under these conditions.

@lidizheng
Owner

@euroelessar I have updated the proposal. PTAL at grpc/proposal#155.

CallOptions

Generally, I like keyword arguments, and the stub.call(..., options=some_options) can be stub.call(..., **some_options). But you made a good point that the interceptor already has this concept.

  1. Since we already have grpc.ClientCallDetails, do you think we can reuse this object for this purpose? Drawback is that the naming is a bit weird.
  2. We create a new CallOptions class and mark existing one as deprecated.

WDYT?

Outbound metadata

We provide APIs on both the client side and the server side to modify outbound metadata. As far as I understand, the implicit context propagation will be limited to tracing metadata and deadline metadata.

grpc.aio

We do plan to support two stacks for a long period of time:

  • It guarantees backward compatibility for users who don't want to adopt asyncio.
  • It makes the new API opt-in, which actually eases the transition, since you can opt in part by part.
  • Both stacks use the latest Callback C-Core API underneath, which is the future.

We try hard to make the async API interface match the "sync" API, so that if you strip away the type annotations, the code should work just fine after adding the "grpc.aio" prefix. Even mixing normal server handlers and async handlers.

In an ideal world, the async API should be able to work with servers as soon as the "grpc.aio" prefix is added. For the server, it is easier to execute handlers in another thread (the only way to make it compatible). But for the channel, it requires more boilerplate changes to await on the RPC.

On the other hand, if the application blocks inside an async function, many other components might break, and it will be quite challenging to debug. Async IO is great, but it is a different threading model than a normal multi-threading application. If you plan to migrate, there is no simple way to upgrade where everything just works magically.

Can you explain more about your ideal migration flow?

@euroelessar
Author

CallOptions

Generally, I like keyword arguments, and the stub.call(..., options=some_options) can be stub.call(..., **some_options). But you made a good point that the interceptor already has this concept.

1. Since we already have `grpc.ClientCallDetails`, do you think we can reuse this object for this purpose? Drawback is that the naming is a bit weird.

2. We create a new `CallOptions` class and mark existing one as deprecated.

WDYT?

I'd probably vote for using a new CallOptions.
We may also want to reconsider its fields, e.g. replace the relative timeout with an absolute deadline. Otherwise, if an interceptor takes a prolonged period of time (e.g. if it makes an async request to some backend auth server), the "timeout" field will start lying, which would pollute metrics/tracing/visibility/etc.

Having a deadline also better matches the underlying grpc-core API.
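A toy demonstration of the "lying timeout" in plain Python; the 0.4s sleep stands in for a slow auth interceptor:

import time

start = time.time()
timeout = 1.0            # relative: one second from... which moment, exactly?
deadline = start + 1.0   # absolute: a fixed point in time

time.sleep(0.4)          # the interceptor burns 0.4s before hitting the wire

# The deadline yields the truthful remaining budget (~0.6s) on its own, while
# forwarding `timeout` unchanged would still claim a full second.
print(max(0.0, deadline - time.time()))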

Outbound metadata

We provide APIs on both the client side and the server side to modify outbound metadata. As far as I understand, the implicit context propagation will be limited to tracing metadata and deadline metadata.

Assuming there is a read-only API for accessing inbound metadata, having the API for "modifying outbound metadata" is good enough.

grpc.aio

We do plan to support two stacks for a long period of time:

* It guarantees backward compatibility for users who don't want to adopt `asyncio`.

* It makes the new API opt-in, which actually eases the transition, since you can opt in part by part.

* Both stacks use the latest Callback C-Core API underneath, which is the future.

We try hard to make the async API interface match the "sync" API, so that if you strip away the type annotations, the code should work just fine after adding the "grpc.aio" prefix. Even mixing normal server handlers and async handlers.

Let me ask a few questions:

  1. If the async server supports both async and sync handlers, why is there a need for a sync server?
  2. Can sync server methods be implemented as a loop.run_until_complete wrapper around the asyncio one? Similar question for the async channel.

In an ideal world, the async API should be able to work with servers as soon as the "grpc.aio" prefix is added. For the server, it is easier to execute handlers in another thread (the only way to make it compatible). But for the channel, it requires more boilerplate changes to await on the RPC.

On the other hand, if the application blocks inside an async function, many other components might break, and it will be quite challenging to debug. Async IO is great, but it is a different threading model than a normal multi-threading application. If you plan to migrate, there is no simple way to upgrade where everything just works magically.

Can you explain more about your ideal migration flow?

Let's talk about it for different cases:

Server migration
An individual server has tens of servicers with tens of methods each.
Some of these servicers tend to be owned by different teams (e.g. health, profiler, channelz, etc are part of every server). There can also be some shared lambda-like setups, where all servicers are owned by different teams; in that case the number of servicers can easily reach hundreds per server.

All of this implies that individual servicers will have to be migrated to the asyncio-based API on different cadences, likely over a prolonged period of time (months/years).

Ideally this migration could be summarized like this:

import asyncio
import grpc
from dropbox.proto.service_foo.server.pb2 import add_FooServicer_to_grpc_server
# Note `async_server` module:
from dropbox.proto.service_boo.async_server.pb2 import add_BooServicer_to_grpc_server

# All sync handlers are executed in the same thread pool.
thread_pool = CustomThreadPoolExecutor(...)
server = grpc.server(thread_pool, ...)
port = server.add_secure_port("[::]:0", creds)

# FooImpl() is a synchronous handler.
add_FooServicer_to_grpc_server(server, FooImpl())
# BooImpl() is an asynchronous handler.
add_BooServicer_to_grpc_server(server, BooImpl())

loop = asyncio.get_event_loop()
loop.run_until_complete(server.async_serve())

Client migration
The client migration process is a victim of the same issues as the server one.
An individual server will take many months to fully migrate to asyncio-based handlers, and it will have to serve mixed traffic in the meantime.

A typical Python gRPC server has tens/hundreds of different clients, and these clients are used from both sync and async server handlers. The cost of an individual client is high (due to TLS, associated metrics, etc), so it's generally not acceptable cost-wise to double the number of underlying grpc channels (and subchannels).

All of this implies that an individual grpc.Channel has to be usable from both sync and async contexts for the duration of the migration. This can be summarized in the following snippet:

import grpc
# Eventually the sync version will be removed, once all callsites are migrated.
from dropbox.proto.service_bar.client.pb2 import BarClient as SyncBarClient
from dropbox.proto.service_bar.async_client.pb2 import BarClient as AsyncBarClient

bar_channel = grpc.secure_channel(target, creds)
bar_sync_stub = SyncBarClient(bar_channel)
bar_async_stub = AsyncBarClient(bar_channel)

# FooImpl() is a synchronous handler.
add_FooServicer_to_grpc_server(server, FooImpl(bar_stub=bar_sync_stub))
# BooImpl() is an asynchronous handler.
add_BooServicer_to_grpc_server(server, BooImpl(bar_stub=bar_async_stub))

Common
The possibility of either of these migrations implies that there is a zero-risk (currently missing) step of moving to a world where all current sync handlers/sync client stubs are served by an async-compatible channel/server. All the "dangerous" migrations are then done manually on a case-by-case basis for individual servicers, and are verified by tests and a proper release process.

@lidizheng
Owner

CallOptions

So, do we agree on calling the new API CallOptions and deprecating ClientCallDetails? Ideally, this is the explicit context object we have been discussing. I think it provides value for complex applications, and can be used in calls and interceptors, which makes the surface cleaner.

WDYT? @gnossen @pfreixes

grpc.aio

Sync handlers in grpc.aio.Server

If the async server supports both async and sync handlers, why is there a need for a sync server?

The sync server should be for compatibility only.

Ideally, we would like to remove the sync handlers from the async server. But it would be quite difficult for large applications to migrate. So, we allow users to pass in sync handlers as a temporary solution for the sake of migration.

The new async server interface allows us to change the surface API, and make async handlers the first-class citizen while serving. The internal implementations of the two will be quite different.

Can sync server methods be implemented as a loop.run_until_complete wrapper around the asyncio one? Similar question for the async channel.

For simple cases, yes; for complex applications, no. In server handlers, we have seen many blocking calls to various systems, and synchronizations. Each of them holds up an entire thread, and can cause the application to deadlock. So, unfortunately, the user still needs to provide their ThreadPoolExecutor to make sync handlers work.

As for the async channel, you mean simulating a sync blocking call using run_until_complete, did I understand it correctly? If the user is calling from a coroutine, he or she shall see RuntimeError: Cannot run the event loop while another loop is running or RuntimeError: This event loop is already running. If the user is calling from a dedicated thread, then it can be part of the application logic.

Migrations

Thank you for providing snippets, and use cases.

Server migration

To me, the snippet you provided seems to fit our current design. The port-adding logic and the servicer-adding logic stay the same. The only difference should be changing grpc.server to grpc.aio.server.

Channel migration

The underlying C-Core has a subchannel re-use mechanism and subchannel pooling. A Python-layer Channel object does not imply duplication of underlying TCP connections. Subchannels can be shared across Channels. Also, C-Core will smartly create a new subchannel if it detects that the channel arguments are different.

So, the real cost here is the Python Channel wrapper objects in the application. I expect interceptors will need to be reimplemented to support async/await keywords. What do you think the peripheral objects around Channel will be?

Overall strategy

In my mind, the automatic conversion between the sync stack and the async stack is challenging and error-prone. It would be better to convert at the per-handler or per-call level.

  1. Cosmetic changes to the channel and server creation;
  2. Adding async / await keywords to existing code by small baby steps;
  3. Implement new business logic in Async IO;
  4. Keep the application functional with any ratio of the two stacks.

WDYT? @euroelessar

@gnossen gnossen mentioned this pull request Oct 4, 2019
@euroelessar
Author

grpc.aio

Sync handlers in grpc.aio.Server

If the async server supports both async and sync handlers, why is there a need for a sync server?

The sync server should be for compatibility only.

Ideally, we would like to remove the sync handlers from the async server. But it would be quite difficult for large applications to migrate. So, we allow users to pass in sync handlers as a temporary solution for the sake of migration.

Okay, so we agree that we need a server which is capable of serving both sync and async handlers.

The new async server interface allows us to change the surface API, and make async handlers the first-class citizen while serving. The internal implementations of the two will be quite different.

Why should an implementation of "sync handlers" in grpc.aio.server be different from one in grpc.server?
If they provide the same API and the same behavior, it's reasonable to share the implementation as well.

Can sync server methods be implemented as a loop.run_until_complete wrapper around the asyncio one? Similar question for the async channel.

For simple cases, yes; for complex applications, no. In server handlers, we have seen many blocking calls to various systems, and synchronizations. Each of them holds up an entire thread, and can cause the application to deadlock. So, unfortunately, the user still needs to provide their ThreadPoolExecutor to make sync handlers work.

Oh, you're right. Yeah, on the server side we need a ThreadPoolExecutor, but it should be trivial to build the old-style sync API on top of a thread pool & the asyncio API.

As for the async channel, you mean simulating a sync blocking call using run_until_complete, did I understand it correctly? If the user is calling from a coroutine, he or she shall see RuntimeError: Cannot run the event loop while another loop is running or RuntimeError: This event loop is already running. If the user is calling from a dedicated thread, then it can be part of the application logic.

This sounds like a solvable technical issue. One possible scenario is to have a dedicated background event loop/thread pool and communicate with it using concurrent futures. Performance-wise it will likely be no worse than the current grpc-python implementation.
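A minimal sketch of that scenario; the global_sync_loop name is ours, chosen to match the snippet later in this thread:

import asyncio
import threading

_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def global_sync_loop() -> asyncio.AbstractEventLoop:
    # The single background loop that powers every "sync" stub in the process.
    return _loop

def blocking_call(coro):
    # Bridge: schedule on the background loop, then block the calling thread
    # on the returned concurrent.futures.Future until the result is ready.
    return asyncio.run_coroutine_threadsafe(coro, global_sync_loop()).result()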

Migrations

Thank you for providing snippets, and use cases.

Server migration

To me, the snippet you provided seems to fit our current design. The port-adding logic and the servicer-adding logic stay the same. The only difference should be changing grpc.server to grpc.aio.server.

Is grpc.aio.server guaranteed to behave exactly the same as grpc.server for sync handlers?
If not, how are users supposed to replace one with the other?
If yes, can grpc.server be a thin compatibility layer on top of grpc.aio.server?

Please note, this question is not about the development process. The question is about the end goal.
It's okay if they are distinct originally during development and converge later (when we prove by rollouts/tests/etc that they are the same from an observable-behavior point of view).

Channel migration

The underlying C-Core has a subchannel re-use mechanism and subchannel pooling. A Python-layer Channel object does not imply duplication of underlying TCP connections. Subchannels can be shared across Channels. Also, C-Core will smartly create a new subchannel if it detects that the channel arguments are different.

So, the real cost here is the Python Channel wrapper objects in the application. I expect interceptors will need to be reimplemented to support async/await keywords. What do you think the peripheral objects around Channel will be?

Subchannel is not the only state or cost. While it's true that the underlying TCP connection may be re-used (hopefully, if all conditions are met), we still have distinct resolver and balancer instances.
Balancers and resolvers are not free, as they usually imply 1-2 dedicated gRPC streams, especially in the context of XDS (which adds ≈3 gRPC streams per channel instance).
This means that special care has to be taken to convince all the surrounding infrastructure and ecosystem that these two channels are still the same one.

Overall strategy

In my mind, the automatic conversion between the sync stack and the async stack is challenging and error-prone. It would be better to convert at the per-handler or per-call level.

1. Cosmetic changes to the channel and server creation;
2. Adding `async` / `await` keywords to existing code by small baby steps;

These two steps are only applicable if grpc.aio.{Channel,Server} are behavior-compatible with grpc.{Channel,Server} for synchronous stubs/handlers. Is that the case?

3. Implement new business logic in Async IO;

4. Keep the application functional with any ratio of the two stacks.

WDYT? @euroelessar

@lidizheng
Owner

Several quick responses, and one important question.

Why should an implementation of "sync handlers" in grpc.aio.server be different from one in grpc.server?

grpc.aio.server allows us to introduce a new API, and works with async interceptors. It also allows us to introduce proper Python 3 code.

Is grpc.aio.server guaranteed to behave exactly the same as grpc.server for sync handlers?

Yes. In the long term, yes.

Subchannel is not the only state or cost. While it's true that the underlying TCP connection may be re-used (hopefully, if all conditions are met), we still have distinct resolver and balancer instances.
Balancers and resolvers are not free, as they usually imply 1-2 dedicated gRPC streams, especially in the context of XDS (which adds ≈3 gRPC streams per channel instance).

By gRPC streams I assume you mean bi-di calls, so those are real costs that can't be avoided. My question is: can your resolver and balancer instances be shared across channels?

Resolvers and balancers inherently should react to each individual incoming call, and the behavior should be the same if the calls come from channels that have exactly the same settings.

@pfreixes

pfreixes commented Oct 5, 2019

So, do we agree on calling the new API CallOptions and deprecating ClientCallDetails? Ideally, this is the explicit context object we have been discussing. I think it provides value for complex applications, and can be used in calls and interceptors, which makes the surface cleaner.

@lidizheng No strong opinion on that; I'm ok with implementing a new object, which would give us the freedom to make it better.

For simple cases, yes; for complex applications, no. In server handlers, we have seen many blocking calls to various systems, and synchronizations. Each of them holds up an entire thread, and can cause the application to deadlock. So, unfortunately, the user still needs to provide their ThreadPoolExecutor to make sync handlers work.

As for the async channel, you mean simulating a sync blocking call using run_until_complete, did I understand it correctly? If the user is calling from a coroutine, he or she shall see RuntimeError: Cannot run the event loop while another loop is running or RuntimeError: This event loop is already running. If the user is calling from a dedicated thread, then it can be part of the application logic.

@euroelessar For having synchronous code running in another thread within a process that is already running the asynchronous version of gRPC, without interfering with the asynchronous loop, we would first need to address this issue [1]; correct me if I'm wrong @lidizheng.

[1] grpc/grpc#19955

The underlying C-Core has a subchannel re-use mechanism and subchannel pooling. A Python-layer Channel object does not imply duplication of underlying TCP connections. Subchannels can be shared across Channels. Also, C-Core will smartly create a new subchannel if it detects that the channel arguments are different.

Awesome, I didn't know that. @lidizheng do you have any documentation regarding this?

An individual server has tens of servicers with tens of methods each.
Some of these servicers tend to be owned by different teams (e.g. health, profiler, channelz, etc are part of every server). There can also be some shared lambda-like setups, where all servicers are owned by different teams; in that case the number of servicers can easily reach hundreds per server.

@euroelessar this has surprised me a lot. I know this is a bit off-topic, but I'm interested in how Dropbox is deploying its services. From what I can understand from your comment, the way you are deploying services - correct me if I'm wrong - is quite the opposite of what other organizations do - like mine, where teams have ownership of the whole microservice and usually have just a few endpoints - which theoretically would make the transition from sync to async easier compared to having a big service running many endpoints implemented by other teams.

What are the rationale and benefits of the model you are describing?

@lidizheng
Owner

@pfreixes

For having synchronous code running in another thread within a process that is already running the asynchronous version of gRPC, without interfering with the asynchronous loop, we would first need to address this issue [1]; correct me if I'm wrong @lidizheng.

Yes, grpc/grpc#19955 will be a blocker.

@lidizheng do you have any documentation regarding this?

I could not find any open source documentation about this behavior, but instead I found the comments in the implementation of SubchannelWrapper helpful.

@euroelessar
Author

@lidizheng Please see responses and one question inline.

Why should an implementation of "sync handlers" in grpc.aio.server be different from one in grpc.server?

grpc.aio.server allows us to introduce a new API, and works with async interceptors. It also allows us to introduce proper Python 3 code.

Why would these statements contradict the possibility of sharing the implementation?

Is grpc.aio.server guaranteed to behave exactly the same as grpc.server for sync handlers?

Yes. In the long term, yes.

Moreover, this goal suggests sharing the implementation; it looks like, behavior-wise, the new grpc.aio.server is a superset of the old grpc.server.

By gRPC streams I assume you mean bi-di calls, so those are real costs that can't be avoided. My question is: can your resolver and balancer instances be shared across channels?

The internal implementation and design of grpc-core imply that resolver/balancer state/instances are not shared across different channel objects. It looks like changing these semantics in grpc-core is a more complicated problem than providing both sync/async APIs in the same channel object in grpc-python.

Resolvers and balancers inherently should react to each individual incoming call, and the behavior should be the same if the calls come from channels that have exactly the same settings.

In addition to the per-call logic there is also per-channel logic. For example, in centralized LB each channel has to receive an individual subset of weighted endpoints from the LB backend, and at the same time periodically send load reports back.

In addition to that, some other LB methods (e.g. ones based on least active requests) take the number of active calls to individual endpoints into account. If we split calls across two balancer objects, each of them will have significantly less information to react on, and therefore the quality of the LB will degrade.

@euroelessar For having synchronous code running in another thread within a process that is already running the asynchronous version of gRPC, without interfering with the asynchronous loop, we would first need to address this issue [1]; correct me if I'm wrong @lidizheng.

[1] grpc/grpc#19955

My suggestion is to build the "sync" API on top of the "asyncio" one, with the following properties:

  1. Have a single background thread with an "asyncio" loop to power all "sync" stubs.
  2. Make each sync_stub.Call(...) call a thin wrapper around the async one, e.g. something like this:
import asyncio
from typing import Awaitable, Callable

def sync_unary_unary_adapter(async_call: Callable[[], Awaitable[Response]]) -> Response:
  async def invoke() -> Response:
    # Runs on the background loop; awaits the async stub's call object.
    return await async_call()

  # Schedule on the dedicated background loop, then block the calling thread
  # on the returned concurrent.futures.Future until the result is ready.
  return asyncio.run_coroutine_threadsafe(invoke(), global_sync_loop()).result()

def Call(self, request: Request) -> Response:
  return sync_unary_unary_adapter(lambda: self._async_stub.Call(request))

In this case there is no "sync" implementation anymore, only the "async" one. Calling "sync" handlers from within an existing asyncio loop will block the current loop until the call is done (which is expected), but will not depend on the loop itself; all I/O execution will happen in another, background one.

@euroelessar this has surprised me a lot. I know this is a bit off-topic, but I'm interested in how Dropbox is deploying its services. From what I can understand from your comment, the way you are deploying services - correct me if I'm wrong - is quite the opposite of what other organizations do - like mine, where teams have ownership of the whole microservice and usually have just a few endpoints - which theoretically would make the transition from sync to async easier compared to having a big service running many endpoints implemented by other teams.

What are the rationale and benefits of the model you are describing?

We use a hybrid approach: there are some smallish services here and there, as well as some big ones.
Having big services allows us to decouple operational expertise from feature development, which by itself allows us to achieve better service reliability and resource utilization while maintaining high development velocity.

@lidizheng
Owner

lidizheng commented Oct 10, 2019

@euroelessar

Two Stacks or One Stack

Why would these statements contradict the possibility of sharing the implementation?
Moreover, this goal suggests sharing the implementation; it looks like, behavior-wise, the new grpc.aio.server is a superset of the old grpc.server.

Yes, achieving backward compatibility with sync handlers is our final goal. But it is difficult to get it right while developing new features.

The Async API implementation is a large project. To reach feature parity with the existing API, there is a significant amount of work to do. For me, I prefer to keep shipping features to the new API, and welcome users to dogfood it (after it graduates from the "experimental" folder).

Duplicated Resolver / Balancer

The internal implementation and design of grpc-core imply that resolver/balancer state/instances are not shared across different channel objects. It looks like changing these semantics in grpc-core is a more complicated problem than providing both sync/async APIs in the same channel object in grpc-python.

In addition to the per-call logic there is also per-channel logic. For example, in centralized LB each channel has to receive an individual subset of weighted endpoints from the LB backend, and at the same time periodically send load reports back.

In addition to that, some other LB methods (e.g. ones based on least active requests) take the number of active calls to individual endpoints into account. If we split calls across two balancer objects, each of them will have significantly less information to react on, and therefore the quality of the LB will degrade.

These are great points. TIL. The actual details are persuasive. I'm on board with the idea that there should be only one Channel object.

Combined with the last section in your comment: you are using a hybrid approach. Smaller services can maybe be migrated all at once to avoid using two Channel objects. For larger services, since two Channel objects are not an option, the services might need to wait until the Async API is mature enough.

Even with one implementation (no "aio" prefix), it would still need to wait for the Async API to be properly injected into the current implementation, which might be slower if we take that path.

Sync API Wrapper

The sync API wrapper was discussed before. There are two reasons we didn't think it is the best solution:

  • gRPC Python needs to work for users not using AsyncIO, so we need to keep the old stack in this way (there will be 3 stacks then);
  • The amount of work for the sync API wrapper is not trivial. Making the async calls block is not that hard, but that requires large volumes of transition and mapping code.

@euroelessar
Author

Two Stacks or One Stack

Yes, achieving backward compatibility with sync handlers is our final goal. But it is difficult to get it right while developing new features.

The Async API implementation is a large project. To reach feature parity with the existing API, there is a significant amount of work to do. For me, I prefer to keep shipping features to the new API, and welcome users to dogfood it (after it graduates from the "experimental" folder).

I agree that it's at least non-trivial to provide correct, behavior-compliant support for sync server handlers/channel stubs on day 1. My concerns and questions are about the longer-term goal.

If the proposal says something different from "we will have 2 different implementations forever" - it would suit us.

Duplicated Resolver / Balancer

These are great points. TIL. The actual details are persuasive. I'm on board with the idea that there should be only one Channel object.

Thanks.

Combined with the last section in your comment: you are using a hybrid approach. Smaller services can maybe be migrated all at once to avoid using two Channel objects. For larger services, since two Channel objects are not an option, the services might need to wait until the Async API is mature enough.

If small projects can afford the migration in one step, they are lucky and should totally do it.

Even with one implementation (no "aio" prefix), it would still need to wait for the Async API to be properly injected into the current implementation, which might be slower if we take that path.

Can we iterate on the Async API inside the experimental package to ship something early & merge it into the main one after reaching some level of maturity?

Sync API Wrapper

The sync API wrapper was discussed before. There are two reasons we didn't think it is the best solution:

* gRPC Python needs to work for users not using AsyncIO, so we need to keep the old stack in this way (there will be 3 stacks then);

Are we talking about Py2 users? All Py3 users should have the AsyncIO stack, and it should not hurt to have a background thread with an asyncio loop in it (assuming the observable behavior is the same, which was already stated as the goal in the thread above). It feels strictly better compared to the current approach with a thread or two per channel.

* The amount of work for the sync API wrapper is not trivial. Making the async calls block is not that hard, but that requires large volumes of transition and mapping code.

A few questions for either of the worlds:

  • Is it more or less code compared to actual python implementation on top of grpc-core?
  • How much logic is duplicated between sync/async implementation?
  • How often would we need to adjust both of the implementations on bugs/grpc-core changes?
  • How much mental overhead is added during troubleshooting? (we add an extra question: "do you use async or sync implementation?" in addition to "do you use sync or async stubs?")

Also please note that the suggestion is not just about adding new logic, it's about replacing one piece of logic with another. It may not sound optimal in the shorter term, or during the original iteration on the implementation, but it should save resources in the longer run.
Moreover, it's totally fine to avoid/bypass some chunks of the logic (e.g. sync stubs on top of experimental async channels) during the original implementation, if it's stated as one of the goals/invariants in the proposal.

@lidizheng
Owner

@euroelessar

Duplicated Resolver / Balancer

Can we iterate on the Async API inside the experimental package to ship something early & merge it into the main one after reaching some level of maturity?

Does this mean you are willing to experiment with the new API? It is still in the experimental stage, and as we are discussing in another thread (the gRFC), the API might change. If that is acceptable, the folder is grpc/experimental/aio. As for current progress: a bare-bone implementation for channel unary-unary is done, and the server side is under review at grpc/grpc#20598.

Sync API Wrapper

Are we talking about Py2 users?

I'm talking about several user groups:

  • Users of other asynchronous libraries, who use gevent, twisted or some other magic stuff.
  • Py3 users, some of whom might not want to introduce AsyncIO into their application, since it takes over an entire thread and the event loop run needs to be the entry point of their application.

Is it more or less code compared to actual python implementation on top of grpc-core?

Counting the AsyncIO implementation, yes, it is more code.

How much logic is duplicated between sync/async implementation?

The sync-over-async stack will have different challenges than the sync stack and the async stack, like making async calls block, or converting readers/writers into iterators. I think it would be non-trivial work.

How often would we need to adjust both of the implementations on bugs/grpc-core changes?

I can speak for the past year: C-Core has been quite stable, and didn't propagate obscure bugs to the Python layer. If it stays stable, the frequency will not be high.

How much mental overhead is added during troubleshooting? (we add an extra question: "do you use async or sync implementation?" in addition to "do you use sync or async stubs?")

I am not sure which "troubleshooting" you refer to. But if you mean "troubleshooting" during development of the sync-over-async stack, I think it is not trivial, mostly due to the introduction of a new stack.

Imagine a new gRPC Python user who posts a failure on GitHub. We have to ask which stack they are using. Even if the problem is solved, we need to check whether the same bug affects the other two stacks.

I think the core conflict here is whether we should build the wrapper or make the aio API compatible with existing code. Essentially they are the same: the former approach makes the wrapper compatible with AsyncIO, and the latter approach makes the aio API compatible with sync code.

I personally vote for the latter one, because:

  1. It helps users realize that they are using the new API during our development phase (which may last a long time);
  2. It helps maintainers of gRPC Python prioritize the AsyncIO code path instead of the existing code path if trade-offs are needed;
  3. If the project goes perfectly, the only migration work is typing "grpc.aio" instead of "grpc", which is acceptable to me.
