
Add client pool to measurementlink-labview #262

Merged: 7 commits into main on Aug 29, 2023

Conversation

pbirkhol-ni (Contributor)

What does this Pull Request accomplish?

This PR adds a gRPC client pool.

  • Create a GrpcClientPool class. The main API for this class is a LV2 style global that allows the pool to be shared across all measurements. Note that this could also cause problems, which I will detail below.
  • Add a Location to Insecure Address.vi VI. This VI has been written before, but now it is part of a reusable library. It is marked as community scoped for now, but could be made public if desired.
  • Create some new tests. Also put the new and existing tests into a library so that the entire library can be marked as a friend of the GrpcClientPool class.

Currently, the GrpcClientPool is accessed via an LV2-style global, which means it is shared across all measurements. This generally shouldn't be a problem, but it could be an issue when/if Destroy is called, because that destroys the clients for all measurements. However, even this isn't necessarily a problem: if the Session Manager service crashes, it has crashed for all measurements, so it makes sense to destroy every measurement's client.
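
For readers less familiar with LV2-style globals, here is a minimal Python sketch of the behavior being described (an analogy only; the real implementation is a set of LabVIEW VIs, and every name below is invented): one process-wide cache of clients shared by every measurement, where Destroy tears down all of them at once.

```python
# Python analogy of the LV2-style global client pool; all names are invented.
import threading


class GrpcClientPool:
    """One process-wide cache of gRPC clients, shared by every measurement."""

    _lock = threading.Lock()
    _clients = {}  # key (e.g. "SessionManager") -> client instance

    @classmethod
    def get_client(cls, key, create_client):
        """Return the cached client for `key`, creating it on first use."""
        with cls._lock:
            if key not in cls._clients:
                cls._clients[key] = create_client()
            return cls._clients[key]

    @classmethod
    def destroy(cls):
        """Close and drop the clients for *all* measurements (see discussion below)."""
        with cls._lock:
            for client in cls._clients.values():
                client.close()  # assumes the clients expose a close() method
            cls._clients.clear()
```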

The real issue is how we call Destroy. Currently it is called by the measurementlink-labview framework whenever a measurement throws any error, even if the error is unrelated to gRPC. That can result in clients being destroyed even though they are still valid. That might not matter much for the measurement that errored, but it will result in longer run times for other measurements. Below are a few ways we could fix this:

  • Be more selective about when we call Destroy. Don't call it for every error; instead, call it only for errors that indicate the client needs to be recreated (sketched below). This has a few potential issues:
    • Can we anticipate all errors where it should be called? If we forget an error, the customer will be left with a bad client from which they won't be able to recover.
    • What if only a single client goes bad? Destroying all clients could still adversely affect other measurements.
  • Stop using an LV2-style global and instead create a client pool per measurement. This results in extra clients, but should otherwise work.

Finally, all of this depends on proper error handling by the customer. If the customer's measurement swallows errors and a gRPC client goes bad, the customer will be stuck.
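
To make the first fix option above concrete, here is a hedged sketch, in the same Python analogy, of "be more selective about when we call Destroy": only clear the pool for gRPC status codes that suggest the client or channel is actually unusable. The particular set of codes is an assumption, not a vetted list, which is exactly the first risk noted above.

```python
# Illustrative only: clear the client pool only for errors that look like a
# dead client or channel.  The chosen status codes are an assumption.
import grpc

CODES_THAT_INVALIDATE_CLIENTS = {
    grpc.StatusCode.UNAVAILABLE,  # server unreachable / connection lost
    grpc.StatusCode.UNKNOWN,      # possibly a broken channel
}


def handle_measurement_error(error, pool):
    """Destroy cached clients only when the error indicates they went bad."""
    # .code() is available on RpcErrors raised from completed calls.
    if isinstance(error, grpc.RpcError) and error.code() in CODES_THAT_INVALIDATE_CLIENTS:
        pool.destroy()  # clients will be recreated on the next get_client call
    # Any other error leaves the cached clients alone.
```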

Why should this Pull Request be merged?

This is something we should have done a long time ago, but it will be especially important with the upcoming session management work.

What testing has been done?

New automated tests. I also modified the DCPower shipping example to use the new VIs and verified that it worked as expected.

@pbirkhol-ni (Contributor, Author)

There are a lot of new VIs so I am not going to post screenshots unless requested.

@dixonjoel (Collaborator)

> I also modified the DCPower shipping example to use the new VIs and verified that it worked as expected.

@pbirkhol-ni I assume this will be updated in a follow-up PR once we have a pre-release with the client pool?

@dixonjoel (Collaborator) left a comment

Waiting for an update where the tests library is not missing files.

@pbirkhol-ni (Contributor, Author)

> > I also modified the DCPower shipping example to use the new VIs and verified that it worked as expected.
>
> @pbirkhol-ni I assume this will be updated in a follow-up PR once we have a pre-release with the client pool?

I thought about updating the examples, but I am hoping that Measurement Service Helpers.lvlib, which currently does session management, just goes away once we are done with the session management work.

@dixonjoel (Collaborator)

> I am hoping that Measurement Service Helpers.lvlib, which currently does session management, just goes away once we are done with the session management work.

So where do we expect to use this client pool first? Only internally, in the session manager VIs we rewrite? Do we expect customers to use them?

@pbirkhol-ni (Contributor, Author)

> > I am hoping that Measurement Service Helpers.lvlib, which currently does session management, just goes away once we are done with the session management work.
>
> So where do we expect to use this client pool first? Only internally, in the session manager VIs we rewrite? Do we expect customers to use them?

For now, we would use the client pool internally; it would not be directly exposed to customers. That is why all of the VIs are either private or community scoped. We could consider making it public, but I think what we would do instead is provide an API through the MeasurementContext class: maybe something like Get gRPC Client, where you pass in an enum saying which client you want. I have a (perhaps misplaced) goal that the MeasurementContext class is our only API and that anything we want customers to use is exposed through it.
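
As a purely hypothetical illustration of that idea (no such API exists in this PR, and every name here is invented), a MeasurementContext-level accessor could look roughly like this in the same Python analogy, with an enum selecting which cached client you get back:

```python
# Hypothetical sketch of a customer-facing accessor; all names are invented.
from enum import Enum


class GrpcClientType(Enum):
    SESSION_MANAGER = "SessionManager"
    PIN_MAP = "PinMapService"


class MeasurementContext:
    """Hypothetical: the single API surface customers interact with."""

    def __init__(self, client_pool, client_factories):
        self._pool = client_pool
        self._factories = client_factories  # GrpcClientType -> callable that builds a client

    def get_grpc_client(self, client_type: GrpcClientType):
        # Customers ask the context for a client; the pool itself stays
        # private/community scoped behind this method.
        return self._pool.get_client(client_type.value, self._factories[client_type])
```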

@bkeryan (Collaborator) commented on Aug 25, 2023

> The real issue is how we call Destroy. Currently it is called by the measurementlink-labview framework whenever a measurement throws any error, even if the error is unrelated to gRPC. That can result in clients being destroyed even though they are still valid. That might not matter much for the measurement that errored, but it will result in longer run times for other measurements.

Does grpc-labview's pointer manager prevent this case from crashing? I have seen crashes due to gRPC ID lifetime bugs, but none of them specifically involved calling CloseClient while another VI was using the client.

@jasonmreding (Contributor)

Regarding when to destroy clients and clear the cache:

  • Do you know if grpc-labview provides a standard set of errors from the client similar to other languages, where the error maps to one of the 16 predefined status codes?
  • Do you know what errors clients produce if used after they are destroyed or how well grpc-labview tolerates that condition? Trying to use a gRPC ID for a client that never existed seems to just return -1 invalid argument. However, I don't know if the behavior is different if the client existed at one time.
  • You mention a couple of different times the potential impact to other measurements. To be precise, you are referring to other measure RPCs for the same measurement and not other measurements, correct?
  • Unless we make the client pool public, I think we should be more conservative with errors initially and always clear the cache (what you are doing here). If users fail to chain errors properly, I don't think it's unreasonable to expect them to fix that. If we're not comfortable with that, then we should probably just make the API public so they can force-clear things themselves. However, from a dev workflow perspective, I suspect it will be easier to just stop/restart the service rather than inject cleanup code into the service to work around poor error chaining while developing the service logic. It would be nice to have the LV equivalent of feature toggles for this sort of thing, which would allow users to opt in/out of behavior like this. We could probably rig up something similar with conditional compilation symbols and a little bit of glue code (see the sketch below).
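
A minimal sketch of that opt-in idea, assuming the LabVIEW equivalent would be a conditional compilation symbol rather than the environment variable used here (all names invented):

```python
# Feature-toggle analogy: conservative behavior (clear the pool on any error)
# by default, with an opt-out for users who only want gRPC errors to clear it.
# In LabVIEW this could be a conditional compilation symbol instead.
import os

import grpc

CLEAR_POOL_ON_ANY_ERROR = os.environ.get("ML_CLEAR_POOL_ON_ANY_ERROR", "1") == "1"


def on_measurement_error(error, pool):
    """Clear all cached clients, or only do so for gRPC errors when opted out."""
    if CLEAR_POOL_ON_ANY_ERROR or isinstance(error, grpc.RpcError):
        pool.destroy()
```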

@pbirkhol-ni (Contributor, Author)

> > The real issue is how we call Destroy. Currently it is called by the measurementlink-labview framework whenever a measurement throws any error, even if the error is unrelated to gRPC. That can result in clients being destroyed even though they are still valid. That might not matter much for the measurement that errored, but it will result in longer run times for other measurements.
>
> Does grpc-labview's pointer manager prevent this case from crashing? I have seen crashes due to gRPC ID lifetime bugs, but none of them specifically involved calling CloseClient while another VI was using the client.

It appears to handle it correctly. I ran the following VI for a few minutes, and while it alternated between working correctly and throwing an error, it never crashed.

[image: screenshot of the test VI]
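
Since the screenshot is not reproduced here, the shape of that test in Python terms (a rough analogy only; the real test is a LabVIEW VI exercising grpc-labview, and the address below is a placeholder) is: one thread keeps using a shared channel while another thread repeatedly closes and recreates it; calls may error, but the process should not crash.

```python
# Rough Python analogy of the stress test described above; illustrative only.
import threading
import time

import grpc

ADDRESS = "localhost:50051"  # placeholder address
stop = threading.Event()
lock = threading.Lock()
channel = grpc.insecure_channel(ADDRESS)


def caller():
    """Keep using the shared channel, tolerating errors from concurrent closes."""
    while not stop.is_set():
        try:
            with lock:
                ch = channel
            grpc.channel_ready_future(ch).result(timeout=0.1)
        except Exception:
            pass  # errors are expected when the channel is closed underneath us


def closer():
    """Repeatedly close and recreate the shared channel."""
    global channel
    while not stop.is_set():
        time.sleep(0.05)
        with lock:
            channel.close()
            channel = grpc.insecure_channel(ADDRESS)


threads = [threading.Thread(target=caller), threading.Thread(target=closer)]
for t in threads:
    t.start()
time.sleep(5)
stop.set()
for t in threads:
    t.join()
```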

@pbirkhol-ni merged commit c033f1f into main on Aug 29, 2023 (1 check passed).
@dixonjoel deleted the users/pbirkhol/create-client-pool branch on September 26, 2023.