gRPC Load Balancing between http-adapter and things #387

Closed
nwest1 opened this issue Sep 5, 2018 · 8 comments

@nwest1
Contributor

nwest1 commented Sep 5, 2018

BUG REPORT

  1. What were you trying to achieve?
    Targeted TPS load testing against http-adapter and a failure scenario

  2. What are the expected results?
    balanced, self-healing connections between mainflux components

  3. What are the received results?
    gRPC not being load balanced, and when a pod is killed, these connections are not rebalanced

  4. What are the steps to reproduce the issue?

  • Run a load test in kubernetes with multiple replicas (http-adapter is using the things service name as url)
  • Note that the majority of transactions are running through a single things pod and not load balanced
  • Kill the things pod that's receiving the majority of transactions
  • Note the error rate spikes and connections from adapter to things remain in error and are not rebalanced.
  5. In what environment did you encounter the issue?
    kubernetes

  6. Additional information you deem important:
    This might be slightly out of scope, as it's possible to solve this with Istio/Linkerd or some k8s ingress that is aware of gRPC. But I want to bring it up, as it looks like you're exploring similar replacements for nginx.

@drasko drasko added the bug label Sep 5, 2018
@drasko drasko added this to the 0.6.0 milestone Sep 5, 2018
@drasko
Contributor

drasko commented Sep 5, 2018

@nwest1 thanks for filing this one.

@anovakovic01 anovakovic01 changed the title gRPC Load Balancing between http-adapter and users gRPC Load Balancing between http-adapter and things Sep 6, 2018
@drasko
Contributor

drasko commented Sep 11, 2018

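@nwest1 we've analyzed this one, and there is indeed a missing LB part for gRPC. The problem is that although k8s adds an LB in front of the services, that LB works at the connection level (L4). gRPC runs over HTTP/2 and multiplexes all requests over one long-lived connection, more like a stream, so an L4 LB picks a pod once and every later request follows it. For gRPC we need request-level (L7) LB, and the most commonly used approach is Istio with Envoy.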

This is explained well in this video: https://www.youtube.com/watch?v=F2znfxn_5Hg
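
To make the skew concrete, here is a minimal, self-contained simulation (not Mainflux code, and not how any particular balancer is implemented). It assumes one adapter holding a single long-lived connection to three backends, and compares a balancer that chooses a backend once per connection with one that chooses per request. The function names and the request count are made up for illustration.

```go
package main

import "fmt"

// connectionLevel models a balancer that picks a backend once, when the
// connection is opened. Every request on that connection goes to the same
// backend, which is what happens to multiplexed gRPC traffic.
func connectionLevel(backends, conns, requestsPerConn int) []int {
	counts := make([]int, backends)
	for c := 0; c < conns; c++ {
		picked := c % backends // decision made once per connection
		counts[picked] += requestsPerConn
	}
	return counts
}

// requestLevel models a balancer that picks a backend for every request,
// regardless of which connection the request arrived on.
func requestLevel(backends, conns, requestsPerConn int) []int {
	counts := make([]int, backends)
	total := conns * requestsPerConn
	for r := 0; r < total; r++ {
		counts[r%backends]++ // decision made once per request
	}
	return counts
}

func main() {
	// Hypothetical scenario: 1 connection, 3 backends, 9000 requests.
	fmt.Println("per-connection:", connectionLevel(3, 1, 9000))
	fmt.Println("per-request:", requestLevel(3, 1, 9000))
}
```

With a single connection, the per-connection model sends all 9000 requests to one backend, while the per-request model gives each backend 3000. This matches the reported symptom that most transactions run through a single things pod.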

Additional info:

We already wanted to go this route, please consult issue https://github.com/mainflux/mainflux/issues/352.

Although it is possible to avoid Envoy and load-balance gRPC on the client side, we feel that the Envoy approach will be better.
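
For reference, the client-side alternative amounts to the client knowing every backend address and choosing one per call. The following is only a rough sketch of that idea using the standard library; `Picker`, `MarkDown`, and the `things-N:8183` addresses are hypothetical names, not part of Mainflux or of any gRPC library. In a real gRPC client this role would normally be played by the library's own balancing policy, fed by a resolver that returns all pod addresses, rather than by hand-written code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoBackends is returned when every known backend is marked down.
var ErrNoBackends = errors.New("no healthy backends")

// Picker is a hypothetical client-side round-robin balancer that skips
// backends the caller has reported as failed.
type Picker struct {
	mu       sync.Mutex
	backends []string
	down     map[string]bool
	next     int
}

// NewPicker builds a Picker over a fixed list of backend addresses.
func NewPicker(backends ...string) *Picker {
	return &Picker{backends: backends, down: make(map[string]bool)}
}

// MarkDown records that a backend failed, e.g. because its pod was killed.
func (p *Picker) MarkDown(addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.down[addr] = true
}

// Pick returns the next healthy backend in round-robin order.
func (p *Picker) Pick() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := 0; i < len(p.backends); i++ {
		addr := p.backends[p.next]
		p.next = (p.next + 1) % len(p.backends)
		if !p.down[addr] {
			return addr, nil
		}
	}
	return "", ErrNoBackends
}

func main() {
	// Assumed addresses, for illustration only.
	p := NewPicker("things-0:8183", "things-1:8183", "things-2:8183")
	for i := 0; i < 3; i++ {
		addr, _ := p.Pick()
		fmt.Println("call goes to", addr)
	}
	p.MarkDown("things-1:8183") // simulate a killed pod
	for i := 0; i < 4; i++ {
		addr, _ := p.Pick()
		fmt.Println("after failure, call goes to", addr)
	}
}
```

The sketch never re-adds a recovered backend and does not discover new pods, which is part of why delegating this to a sidecar proxy can be the simpler option operationally.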

@janko-isidorovic has already started integrating Istio/Envoy into our k8s scripts, and we should have something working before the end of the week.

One more approach would be to use NATS and its Request-Reply mode, with the internal NATS LB in Queue mode plus the new Drain feature for downscaling: https://medium.com/@derekcollison/nats-resilient-systems-and-drain-mode-a764d4968711. This could eventually make the system simpler, but we are not sure it would be easier to debug (i.e. to figure out where the messages come from in case of an error), and we are not sure about the performance of NATS with all these modes enabled. Let's try the Istio/Envoy solution first.

@nmarcetic
Collaborator

#378 Closes this issue.

@drasko
Contributor

drasko commented Oct 1, 2018

@nmarcetic it is Envoy/Istio that will close the issue (https://github.com/mainflux/mainflux/issues/352).

@nmarcetic nmarcetic modified the milestones: 0.6.0, 0.7.0 Oct 1, 2018
@nmarcetic
Collaborator

@drasko Yes, and it's related to #378

@janko-isidorovic
Contributor

@nwest1 we have been able to reproduce the issue in Mainflux Lab. Adding Istio to Kubernetes and an Istio sidecar to the http-adapter pod resolves the issue and enables load balancing of the gRPC connections.
We need to decide whether the solution is in the scope of Mainflux, as it is more related to deployment strategy.

@drasko
Contributor

drasko commented Oct 1, 2018

I agree that this depends only on the deployment strategy (k8s in this case) and is not a Mainflux issue at all. Let's see what the best way to approach this is. We should stay generic and not impose a specific deployment strategy, and we need to understand how this falls within the Mainflux project scope.

@anovakovic01
Contributor

Resolved with #378.


9 participants