Support for gRPC protocol #441

Closed
robingustafsson opened this issue Jan 5, 2018 · 34 comments · Fixed by #1623

Comments

robingustafsson (Member) commented Jan 5, 2018

Will happily take suggestions for what a gRPC JS API should look like. I guess https://grpc.io/docs/tutorials/basic/node.html and https://github.com/grpc/grpc-node would be good starting points.


[Added on May 29th, 2019]

To enable testing of more parts of modern software systems, microservices to be more specific, k6 needs to support gRPC. The implementation should support both simple RPC (request/response calls, "part/iteration 1") as well as streaming (client-side, server-side and bi-directional, "part/iteration 2").

Authentication

The implementation should support the following authentication mechanisms:

Transport:

  • Insecure: no authentication
  • TLS: make use of the APIs implemented as part of the PKI crypto issue for loading keys.

Per-RPC:

  • Google OAuth2

Request/Response RPC

The expected JS API would look something like this for the request/response RPC part:

import grpc from "k6/grpc";
import { check } from "k6";

let proto = open("EchoService.proto");
let client = new grpc.Client(proto, {server: "localhost:50051", credentials: {...}});

export default function() {
    let res = client.ping({ message: "Hello gRPC World!" });
    check(res, {
        "is successful": (r) => r.status === grpc.STATUS_OK
    });
}

Additional changes

This would require the following changes to k6:

  • Add support for a new protocol "grpc" as a JS module "k6/grpc"
  • Add support for metrics tracking of key gRPC statistics:
    • DNS lookup, Connect, TLS Handshake, Latency & Response time: Trend
      • Tags:
        • "service": the name of the service called
        • "method": the service method called
        • "rpc_type": the type of RPC call (one of "simple", "server-streaming", "client-streaming" or "bi-streaming")
        • "status": the response status code of a RPC call
        • "request_message": the name of the request message
        • "response_message": the name of the response message
    • Requests/second: Rate
    • Active streams: Gauge
  • The response object of a RPC call should contain the following properties:
    • "service": the name of the service called
    • "method": the service method called
    • "rpc_type": the type of RPC call (one of "simple", "server-streaming", "client-streaming" or "bi-streaming")
    • "status": the response status code of a RPC call
    • "message": the response message as a JS Object
    • "request.message": the request message as a JS Object

Resources

@KaoruDev

Sorry newb question:
Is the feature to bake gRPC functionality into the k6 module? Or is it to add documentation on how to accomplish it, or does this involve more work?

na-- (Member) commented May 13, 2018

Not at the moment, @KaoruDev, but we're planning to add gRPC support in the future and this is the tracking issue for that effort.

@KaoruDev

Sorry, maybe I wasn't clear...what is actually involved with adding gRPC support?

na-- (Member) commented May 13, 2018

Ah, my mistake, I misread your question, sorry!

Adding support for gRPC (or for other protocols) would generally involve two related things:

  1. Deciding what the JavaScript API would be (that's what the original message in this issue alluded to)
  2. Writing the actual implementation in Go and exposing it to the goja JavaScript runtime. You can see the code of the currently exposed modules here.

It's also very important to measure the different relevant metrics (response times, data sent/received, etc.) when implementing a new protocol and to emit the measurements so the different collectors can work with them. For example, this is used for measuring the different phases of an HTTP request.
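
For illustration, a rough sketch of what emitting one such measurement from a hypothetical gRPC module could look like, using k6's internal stats package; the metric name grpc_req_duration, the emitDuration helper, and the chosen tags are assumptions for the example, not actual k6 code:

package grpc

import (
    "context"
    "time"

    "github.com/loadimpact/k6/lib"
    "github.com/loadimpact/k6/stats"
)

// Hypothetical metric; the name and value type are illustrative only.
var grpcReqDuration = stats.New("grpc_req_duration", stats.Trend, stats.Time)

// emitDuration pushes a single duration sample with a tag, mirroring how the
// HTTP and WS modules emit their per-request metrics to the collectors.
func emitDuration(ctx context.Context, state *lib.State, method string, d time.Duration) {
    stats.PushIfNotDone(ctx, state.Samples, stats.Sample{
        Metric: grpcReqDuration,
        Time:   time.Now(),
        Tags:   stats.IntoSampleTags(&map[string]string{"method": method}),
        Value:  stats.D(d), // duration converted to k6's float64 milliseconds
    })
}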

seime commented Jan 18, 2019

Any updates about adding gRPC support?

na-- (Member) commented Jan 20, 2019

@seime we still haven't made any substantial progress on this issue. For now, to be able to fully implement gRPC support, it seems like we should first add actual event loops in k6 (see #882). Otherwise we'd have to settle for either partial support, or suboptimal support with a localized event loop, like how the websockets are currently implemented.

@seguidor777

I would like this feature too, is there any update?

na-- (Member) commented Jul 5, 2019

We (or anyone else, to the best of our knowledge) still haven't started working on this, sorry. But we sort of have a plan about how we'll proceed. We will likely implement gRPC support in two steps:

  • First, we'll implement basic support for dealing with gRPC and the synchronous parts of the protocol. Async operations will either be completely unsupported, or they will be awkward, due to the lack of global event loops in k6 (Global JS event loops in every VU #882). This would hopefully happen in the next few months.
  • Then, we'll work on global event loops in VUs, after which we'll implement proper support for the rest of the gRPC features.

@0UserName

Is there any progress? It seems that all linked issues have status Open.

na-- (Member) commented Dec 13, 2019

@0UserName not yet 😞

na-- added the high prio label on Dec 13, 2019
0UserName commented Dec 14, 2019

@na--, do you have a deadline for solving the problem, or a date when you plan to start on it (the event loop and its sub-tasks)?

na-- (Member) commented Dec 16, 2019

@0UserName, we don't have a "deadline", but we roughly order issues according to their perceived priority, and gRPC is one of the very top items in the priority list... We just released k6 v0.26.0 and our current priority is finishing the new executors (#1007), but gRPC would be one of the items we tackle right after that, probably very early next year. I can't give any promises regarding the exact timing though 😞

We'd be willing to accept PRs for this if someone can implement it faster, they should just discuss their implementation approach in this issue first, before starting to actually implement it...

0UserName commented Dec 16, 2019

We'd be willing to accept PRs for this if someone can implement it faster, they should just discuss their implementation approach in this issue first, before starting to actually implement it...

@na--, I thought about it :D, but I'm not sure of my skills to do it...

As I understand it, there will be a new gRPC module that manages connections under the hood: Go code providing the request and response functionality. This needs to be mapped to some JavaScript API through a goja implementation. Here the event loop should be implemented, probably as some generic implementation for all modules (through inheritance or composition), i.e. as a result we should get something similar to the loop for WS? But I do not really understand what comes next, and I'm not sure the understanding above is correct.

na-- (Member) commented Dec 18, 2019

@0UserName, you've mentioned some of the main things, but I'll try to give a broader explanation than my initial one in #441 (comment):

  1. We need to decide what the user-facing JavaScript API for gRPC load testing should look like. The actual Go code would probably be in some new grpc or proto/grpc package, but the exposed JS API would be tricky:
    • One of the chief causes of the complication is that k6 would need some JS API that takes in a gRPC protobuf service definition and returns some JS object (or something) that you can use in the script to call particular services. This needs to be reasonably efficient, and some research into the available Go libraries would be needed (@robingustafsson linked https://github.com/jhump/protoreflect in the issue description, but there might be better ones).
    • Another reason is the need for asynchronous support. Even if we decide to initially only implement synchronous service calls and deal with event loops later (Support for gRPC protocol #441 (comment)), we still need to consider how we'll handle them in the API.
    • We want to learn from the mistakes we've made with the current k6 http API (issue about those hopefully coming soon) and avoid the problems of having global-only configuration for some things. We're aiming to have fully configurable and composable abstractions (for advanced use cases), but sensible and easy-to-use defaults that cover most cases. What that looks like when load testing gRPC I still have no idea...
  2. We need to measure everything gRPC-related, since after all, that's a big part of what k6 is. That means figuring out the important metrics and system metric tags and how to correctly and reliably measure them with whatever Go gRPC library we end up choosing to use.
  3. As you've mentioned, proper event loops (Global JS event loops in every VU #882) might be necessary for asynchronous gRPC support. We're still not completely sure we can implement that separately, at a future point, but we'd like to try, so we can at least have some gRPC support ASAP.

na-- (Member) commented Mar 3, 2020

Something that might affect this issue: https://blog.golang.org/a-new-go-api-for-protocol-buffers

@rajanadar

Hello k6 team, can you please provide an update on this? Last real update was 7 months ago.

@mstoykov (Contributor)

There is no update - we have worked on v0.27.0 for most of the past 7 months.

Also, again, this is somewhat blocked on k6 having an event loop, which, given the current changes in goja (the JS VM k6 uses), is likely to become easier (and more standard).

As far as I know, there isn't any work on this planned for the next version - see milestone v0.28.0. We still intend to add it, but given that this will likely be a pretty big change and will take a while, we prefer to catch up on some easy/small issues after v0.27.0, and also to wait a bit and fix internals so this will land in a better state.

You are of course welcome to make PRs, or (better) discuss what you want to see as functionality and API.

The plugin support, for example (planned for v0.28.0), is very likely to be useful for the early stages of testing it.

rogchap (Contributor) commented Sep 3, 2020

I've done a lot of work in this area, so I've had a quick stab at adding gRPC support to k6; there is still quite a lot of work to do, but I thought it would be good to get early feedback on the JS API.

I'm not sure if I should be adding state to the main k6/grpc module; is this a no-no? Should I be creating a NewClient() for example?

This is what I have so far:

import grpc from 'k6/grpc';

export function setup() {
    grpc.load([/*any proto import paths*/], "samples/grpc_server/route_guide.proto" /*, "file2.proto", "file3.proto", etc */)
    grpc.connect("localhost:10000")
}

export default function() {
    const resp = grpc.invokeRPC("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    })
    console.log(resp)
}

My plan was to make grpc.connect() a blocking call that would return errors on any connection errors like TLS errors etc.
It would also be straightforward to add support for the Reflection API, so that connect could then get all the RPC schema from the server without the grpc.load() function (if supported by the server).

Ps. The event loop work would be great for gRPC streaming, but I think unary requests will/can work as-is.

My fork can be viewed here (very messy right now; just a quick start) master...SafetyCulture:grpc

na-- (Member) commented Sep 3, 2020

Hey, @rogchap, thanks for working on this! 🎉

I'm not sure if I should be adding state to the main k6/grpc module; is this a no-no? Should I be creating a NewClient() for example?

Yes, it's a no-no 😅 We've had a lot of issues with per-module state and global options before... Such design choices in the current k6 HTTP API have seriously tied our hands in multiple ways, so much so that we plan to make a completely new composable HTTP API soon™️. But this will be even worse for gRPC, so yes, definitely avoid everything global, if at all possible. Not to mention, we should try to avoid concurrency issues and races with such shared state, given that VUs run in parallel. A NewClient() function that returns an object with connect() and the proto-derived methods, or something like that, seems like a good design.

I actually haven't used gRPC from within JS, only from Go, so I don't know what an idiomatic JS gRPC API would look like... Please share if you have some ideas on the topic!

My plan was to make grpc.connect() a blocking call that would return errors on any connection errors like TLS errors etc.

This seems fine, as long as connect() is not a global function, as mentioned above.

It would also be straightforward to add support for the Reflection API, so that connect could then get all the RPC schema from the server without the grpc.load() function (if supported by the server).

I have to admit I didn't even know about this gRPC feature 😊 But it'd be awesome if we don't require users to have .proto files to load test their services!
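
For reference, a rough sketch of what resolving the schema via server reflection could look like on the Go side, assuming the jhump/protoreflect library mentioned earlier in this issue; the resolveServices helper is purely illustrative, not part of k6 or the branch being discussed:

package grpc

import (
    "context"

    "github.com/jhump/protoreflect/desc"
    "github.com/jhump/protoreflect/grpcreflect"
    "google.golang.org/grpc"
    reflectpb "google.golang.org/grpc/reflection/grpc_reflection_v1alpha"
)

// resolveServices asks the server for its own schema via the gRPC reflection
// service, so no local .proto files would be needed (only works if the
// server has reflection enabled).
func resolveServices(ctx context.Context, conn *grpc.ClientConn) ([]*desc.ServiceDescriptor, error) {
    refClient := grpcreflect.NewClient(ctx, reflectpb.NewServerReflectionClient(conn))
    defer refClient.Reset()

    names, err := refClient.ListServices()
    if err != nil {
        return nil, err
    }
    services := make([]*desc.ServiceDescriptor, 0, len(names))
    for _, name := range names {
        sd, err := refClient.ResolveService(name)
        if err != nil {
            return nil, err
        }
        services = append(services, sd)
    }
    return services, nil
}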

Ps. The event loop work would be great for gRPC streaming, but I think unary requests will/can work as-is.

Yes, unary requests are all that we aim for, right now, until we have event loops.

rogchap (Contributor) commented Sep 4, 2020

Thanks for the feedback @na--

So I refactored to this:

import grpc from 'k6/grpc';

let client = grpc.newClient();
client.load([], "samples/grpc_server/route_guide.proto")
client.connect("localhost:10000"))

export default function() {
    const resp = client.invokeRPC("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    })
    console.log(resp)
}

This was all fine until I started to add stats for the request, due to the state being nil in the Context 😞

When I do grpc.newClient() I save a reference to the Context; when I call client.invokeRPC I was hoping that the Context would by then have state added to it, but it doesn't.

If I move all the code into the default VU code, then everything works as expected; but this means the load and connect will be part of each iteration, which is not ideal. In my sample code I only have one proto file, but my production system has hundreds of protos that need parsing, and it would be a waste to do this on every iteration.
One possible solution would be to make the load and connect a singleton, so you only get the "hit" on the first iteration, but that's not ideal either.

Is there a better way to handle the Context from the init context? Or is that a no-no too? Any advice would be appreciated.

Ps. If there is a better forum to have these discussions, let me know.

mstoykov (Contributor) commented Sep 4, 2020

First - thanks for your work @rogchap 🎉

I do think this is probably the best place, given that it will be nice to have a permanent and easily accessible record of why things are the way they are - we (me specifically, for example) sometimes don't actually document why certain features exist or why some decisions (sometimes not obvious ones) were made.

Because of ... security and distribution reasons, there are a couple of things that need to be ... true (original explanation):

  1. any opening of external files must happen in the init context (this includes the load in your case). The biggest reason for that is that if it doesn't happen there, there is no easy way to know which files are required for the script. Also, it is in general not great to do IO operations on each iteration ;)
  2. emitting metrics in the init context ... probably doesn't work architecturally currently, or if it does it has problems. And as mentioned in the comment above it will mean that generating metrics can happen outside the one place where it must happen - the default function (or any call from it).

Given these two things I would think that it will be required that:

  1. load is only doable in the init context and should return an object that can then be provided to the client.
  2. I would expect that connect will reconnect on each iteration. If this must include the parsing of the protos, I can see how that could be a problem, but I don't have a solution that won't break in a lot of the other cases. Arguably, the same object that is generated by load could be accessible through the client once it is connected, in order to be saved as a global VU variable, and maybe there could be a global(ish) cache - but I would argue this needs to be tested and benchmarked. I would expect that while it will be slow(ish), it will be nothing compared to the actual calls, and arguably won't be a problem for protos that aren't big, for which I would argue the above workaround (of caching it in a global VU variable) will work ... ;)

rogchap (Contributor) commented Sep 4, 2020

Thanks for the info @mstoykov, that makes sense. When I had the load within the iterations, it resulted in a "too many open files" error, because the file Close is within a defer that is within a loop, so files do not close soon enough 😬

I've currently ended up with this API:

import grpc from 'k6/grpc';

export let options = {
  vus: 100,
  duration: '10s',
};

grpc.load([], "samples/grpc_server/route_guide.proto")

export default function() {
    let client = grpc.newClient();
    client.connect("localhost:10000")

    const resp = client.invokeRPC("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    })
    // console.log(resp)
   
    client.close()
}

This is working great, but it does use a global variable to store the file descriptors from the grpc.load function, so that they are visible to the client. I think this is a fair compromise.

I'm using sync.Once to make sure load can only be called once; but I don't think it's really necessary, now that I have it within the init section.

Currently, I'm just storing the slice of file descriptors, but I plan to change this to a map of method names to method descriptors so that within the iterations we can do an O(1) lookup, rather than O(n).
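
For illustration, a sketch of what such an index could look like, assuming the jhump/protoreflect descriptor types; the buildMethodIndex helper is hypothetical, not the actual branch code:

package grpc

import "github.com/jhump/protoreflect/desc"

// buildMethodIndex flattens the parsed file descriptors into a map keyed by
// the "package.Service/Method" form used when invoking an RPC, so lookups
// inside the iteration are O(1) instead of scanning every file and service.
func buildMethodIndex(fds []*desc.FileDescriptor) map[string]*desc.MethodDescriptor {
    index := make(map[string]*desc.MethodDescriptor)
    for _, fd := range fds {
        for _, sd := range fd.GetServices() {
            for _, md := range sd.GetMethods() {
                index[sd.GetFullyQualifiedName()+"/"+md.GetName()] = md
            }
        }
    }
    return index
}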

Example of my current output for gRPC (sample server has a random sleep between 0-1s):
[screenshot of the k6 metrics output]

na-- (Member) commented Sep 4, 2020

Yes, client.load() should happen in the init context, for the reasons @mstoykov explained. And, for a bunch of reasons, network connections of any kind probably shouldn't.

If you don't want to connect() on every script iteration, you can easily do something like this:

import grpc from 'k6/grpc';

let client = grpc.newClient();
client.load([], "samples/grpc_server/route_guide.proto")

export default function() {
    if (__ITER == 0) {
        client.connect("localhost:10000"))
    }

    const resp = client.invokeRPC("main.RouteGuide/GetFeature", {
        latitude: 410248224,
        longitude: -747127767
    })
    console.log(resp)
}

We are probably going to add a per-VU init() lifecycle function eventually, as mentioned in #785, but you can emulate it even now with the per-vu-iterations executor, as described at the bottom of that issue.

This is working great, but it does use a global variable to store the file descriptors from the grpc.load function, so that they are visible to the client. I think this is a fair compromise.
I'm using sync.Once to make sure load can only be called once; but I don't think it's really necessary, now that I have it within the init section.
Currently, I'm just storing the slice of file descriptors, but I plan to change this to a map of method names to method descriptors so that within the iterations we can do an O(1) lookup, rather than O(n).

This still seems too stateful... Maybe I'm missing some important gRPC detail, but if you want maximum speed, what's stopping you from having a central cache for all load("foo.proto") operations, based on the path name, while still having load() be a method of the client, not a global one?
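
For illustration, the kind of path-keyed cache being suggested could look roughly like this on the Go side, assuming the jhump/protoreflect protoparse package; the loadCached helper and the cache key format are assumptions, not a concrete proposal:

package grpc

import (
    "strings"
    "sync"

    "github.com/jhump/protoreflect/desc"
    "github.com/jhump/protoreflect/desc/protoparse"
)

// protoCache is a process-wide cache of parsed descriptors, keyed by import
// paths plus file names, so load() can stay a method on the client while
// each set of .proto files is only parsed once.
var protoCache sync.Map // key string -> []*desc.FileDescriptor

func loadCached(importPaths []string, filenames ...string) ([]*desc.FileDescriptor, error) {
    key := strings.Join(importPaths, "|") + "||" + strings.Join(filenames, "|")
    if cached, ok := protoCache.Load(key); ok {
        return cached.([]*desc.FileDescriptor), nil
    }
    parser := protoparse.Parser{ImportPaths: importPaths}
    fds, err := parser.ParseFiles(filenames...)
    if err != nil {
        return nil, err
    }
    protoCache.Store(key, fds)
    return fds, nil
}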

rogchap (Contributor) commented Sep 4, 2020

So from my initial attempts I had let client = grpc.newClient(); in the init section, but I found that if declared here, the state object via state := lib.GetState(c.ctx) within func (c *Client) InvokeRPC() would be nil.

I could not see a way to declare my client outside of the default function and still have access to the state. I initially did have the central cache per client.

I was initially creating a reference to the ctx from NewClient().

Any advice here?

na-- (Member) commented Sep 4, 2020

Why do you need the state in grpc.newClient()? You don't need to save the context (and a reference to the state) in there; you can get the context in each call of your methods, if you just specify a context.Context as their first parameter.

It's not well documented, but you can find how that works in the k6 code. For example, look at how custom metrics are implemented. They have constructors in the init context that explicitly require that there's no context. But then, their Add() method checks the reverse, that the context exists and gets the state.

All of this black reflection magic is managed by the bridge: https://github.com/loadimpact/k6/blob/master/js/common/bridge.go

It can tell when a Go function wants to receive the context and can pass the context to it when called from JS. No need to save references, just get it in every function call.
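
A minimal sketch of that pattern, modeled on how the custom metrics module does it; the GRPC and Client type names and the method signatures here are illustrative, not the code in the gRPC branch:

package grpc

import (
    "context"
    "errors"

    "github.com/loadimpact/k6/lib"
)

// GRPC is the root module object that common.Bind would expose as "k6/grpc".
type GRPC struct{}

// Client holds the per-VU connection and descriptors; no context is saved here.
type Client struct{ /* conn, method descriptors, ... */ }

// XClient is the constructor exposed to JS; taking *context.Context lets the
// bridge pass the context in, and the absence of a State marks the init context.
func (*GRPC) XClient(ctxPtr *context.Context) (*Client, error) {
    if lib.GetState(*ctxPtr) != nil {
        return nil, errors.New("gRPC clients must be created in the init context")
    }
    return &Client{}, nil
}

// InvokeRPC receives the per-call context as its first parameter, so the VU
// state can be fetched on every call instead of being saved at construction.
func (c *Client) InvokeRPC(ctx context.Context, method string, req map[string]interface{}) (interface{}, error) {
    state := lib.GetState(ctx)
    if state == nil {
        return nil, errors.New("invoking an RPC is only allowed inside the default function")
    }
    // ... perform the call and emit metrics using state ...
    return nil, nil
}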

rogchap (Contributor) commented Sep 4, 2020

Hmmm... interesting

if you just specify a context.Context as their first parameter.

That's what I thought, but when I initially tried that, I had a reflection error:

TypeError: could not convert function call parameter main.RouteGuide/GetFeature to context.Context

So I thought (wrongly) that context is only passed to the main root module (k6/grpc in my case)

Thanks for the references @na-- , that will indeed help. Will let you know how I go.

rogchap (Contributor) commented Sep 4, 2020

Aha... success. common.Bind() was the missing function I needed. Thanks again @na-- for the help.

FWIW: I find it weird to use a pointer to an interface, i.e. ctxPtr *context.Context, as an interface is already a pointer. 🤷‍♂️

na-- (Member) commented Sep 4, 2020

FWIW: I find it weird to use a pointer to an interface, i.e. ctxPtr *context.Context, as an interface is already a pointer. 🤷‍♂️

Fair 😅 I think whenever the context pointer is used, it's to explicitly check that it's nil, i.e. that there's no context. Though I haven't dug into that code recently, and I wasn't the one that originally wrote it, so I can't be quite sure - there might have been some other considerations at the time 🤷‍♂️

rogchap (Contributor) commented Sep 7, 2020

Quick(ish) question: I'm just adding TLS support, and I've noticed that the tls.Config is part of the state, which is great; I can re-use that for my gRPC TLS. But there does not seem to be an option to set (or add to the system pool) Root CAs, only client certs (mTLS); is this not something that you've run into for HTTP (self-signed server certificates)?
For HTTP, I guess you can just rely on the protocol (http vs https) for system certs; the underlying transport will take care of this for you; but for gRPC I need to add the credentials manually. I also need to add support for plaintext.

I can easily add support for plaintext as an option to the gRPC client (as it's not a k6-wide option); but for Root CAs it makes more sense to add this to the main TLS options. What do you reckon?
As a stop-gap, I can add the system cert pool; on Unix, SSL_CERT_FILE and SSL_CERT_DIR can be used to override the default "system" certs with a self-signed cert; but long term I think adding direct support for Root CAs makes sense (unless it already exists and I've missed it).

na-- (Member) commented Sep 8, 2020

To be honest, I'd prefer if we don't add another global option. Rather, the gRPC client should have its own TLS options that you can pass in its constructor/connect() method/etc., and only use the global ones as the defaults. This way you'd be able to have different gRPC clients with different Root CAs. It's easy to think of a potential use case - someone might want to test both an internal and an external gRPC endpoint as different scenarios of the same load test.

We've had issues with global options and the HTTP module before, so I'd try to avoid them with any new protocols: #936, #970, #1045 among many others. If you have global options for a protocol, you lose a ton of flexibility - if they exist, they should just be the defaults.
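
To illustrate the "per-client options with global defaults" idea, a rough Go sketch of a dial helper that starts from the test-wide tls.Config in the VU state and lets a script override just the root CAs (or switch to plaintext) for one client; the dialWithTLS helper and its parameters are assumptions for the example, not the branch code:

package grpc

import (
    "context"
    "crypto/tls"
    "crypto/x509"
    "errors"

    "github.com/loadimpact/k6/lib"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

// dialWithTLS clones the global TLS config from the state as the default and
// applies per-client overrides on top of it.
func dialWithTLS(ctx context.Context, state *lib.State, addr string, plaintext bool, rootCAPEM []byte) (*grpc.ClientConn, error) {
    opts := []grpc.DialOption{grpc.WithBlock()}
    if plaintext {
        opts = append(opts, grpc.WithInsecure())
    } else {
        tlsCfg := &tls.Config{}
        if state.TLSConfig != nil {
            tlsCfg = state.TLSConfig.Clone() // global defaults
        }
        if len(rootCAPEM) > 0 { // per-client root CA override
            pool := x509.NewCertPool()
            if !pool.AppendCertsFromPEM(rootCAPEM) {
                return nil, errors.New("invalid root CA PEM")
            }
            tlsCfg.RootCAs = pool
        }
        opts = append(opts, grpc.WithTransportCredentials(credentials.NewTLS(tlsCfg)))
    }
    return grpc.DialContext(ctx, addr, opts...)
}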

rogchap (Contributor) commented Sep 8, 2020

So, quick update: things are looking good. I have most of my "TODOs" done for supporting unary gRPC requests; I'm currently working on the unit tests.

One outstanding TODO is the standard gRPC tags; would these class as "System Tags"?

One of the tags is method; this makes sense for the gRPC world, but would it be confusing if I re-used stats.TagMethod, as this is for the HTTP method (GET, POST etc.), or would it not matter as they are two separate protocols and therefore make sense to the caller?

I was also going to add some documentation; are the website docs a separate repo?

Would you like me to submit a PR; so that you can more easily review my branch; or would you prefer to wait until I get the unit tests and tags done?

Sorry for the 21 questions.

na-- (Member) commented Sep 8, 2020

would it not matter as they are two separate protocols and therefore make sense to the caller?

Yes, I think it shouldn't matter if you reuse method - if it makes sense, reuse a tag, if it doesn't - create a new one

I was also going to add some documentation; is the website docs a separate repo?

Yes - https://github.com/loadimpact/k6-docs/

Would you like me to submit a PR; so that you can more easily review my branch; or would you prefer to wait until I get the unit tests and tags done?

Sure, submit a [WIP] PR. But to warn you, as I've mentioned in a recent PR (#1619 (comment)), the earliest someone can look at your code is sometime next week, since we're a bit busy with the v0.28.0 release now.

rogchap (Contributor) commented Sep 11, 2020

This is now ready for review whenever you get the chance @na--, @mstoykov (I know you are busy and can't get to this till later, np).

In the meantime, I'll add some stuff to the docs.

Ps. Gatsby/NodeJS for the docs server!? 😱 Not keen on Hugo? (just joking...not joking 😉)

na-- added this to the v0.29.0 milestone on Sep 11, 2020
na-- (Member) commented Sep 11, 2020

Thanks! I am on vacation next week, but @imiric and @mstoykov will review the code right after they release k6 v0.28.0. And yeah, regarding the docs, I hear you! That was close to my exact reaction when @ppcano explained the complexity of the new docs 😅 But, to be fair, the new docs are still leagues better than the previous versions that had 2 different documentation websites (one for the cloud stuff and one for k6) with significant content overlap, and much better than even the k6 docs that were hosted on readme.io.
