-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make GetAvailablePackageSummaries more robust to potential plugin failures #4692
Comments
Thanks Dimitri. Yeah, I think But what you're proposing is I think trying to get the best of both worlds, where we return the results from the plugins without errors and also somehow notify that there was an issue with some plugins? It's tricky to see what the status would be for such a call (did it succeed? what if the plugin error was temporary, so the call for the next page then includes some results while the first didn't etc.) I need to think about it more, but I still think I'd prefer to return an error if there was an error retrieving the aggregated results. What do others think? |
yes, the goal is just to provide more robustness. |
I have also brought this stuff up in the past, even if I'm 100% for returning any error back to the user I still think everything should fail just because of a tiny error in a single plugin. For example, a real escanrio I faced recently: I installed Kubeapps and then upgraded it via helm plugin but I wanted to manage carvel packages. Due to some incompatibilities with my chart values, the helm plugin was failing. Well, so as a result, I wouldn't be able to neither manage the installed packages via helm nor installing new packages with carvel. |
I think you mean "should not" (from your argument below) :)
Yep, aborted could be an option if it's possible to return that status code with a valid result, but I'm not sure it is with grpc? Either the call succeeds and returns your response message, or you instead get an error response. You can have richer errors for more details about the error itself, but I don't think it's normal to return something like "partially ok", though we could enable that in our own message format (though I'm not personally convinced).
Yes, I think that's where I'd want to see a blocking error. There was a misconfiguration with your upgrade, and if you didn't see that because you still saw a catalog with carvel packages and went ahead, you might think everything is fine and start letting users use it. The correct action to take at that point would be either to disable the misbehaving plugin or fix the configuration, IMO, not use Kubeapps in a half-working state and getting different results for the same request. That said, I'm happy to consider alternatives for tolerance with plugin failures if you both feel strongly about it. Worth considering the priority of this too (ie. seems like something that could affect users more when we have community plugins in the future, not necessarily yet) If we could return results as well as an error code or information that got exposed in the client, that could be nice, but it's still |
+1 to showing available results together with any plugin error when it failed (if that is technically possible with Grpc) Perhaps we could dedicate some time to investigate a proper error framework in Grpc-web?
Something missing here at the end of the sentence @absoludity ? |
I think I just meant to delete "but it's still" (I had been going to say that it's still going to leave the results inconsistent, since the next page of results may then have some for the problem plugin, but it's not a big deal - having out of order results for a misconfigured/misbehaving plugin). So I think a majority here agree that being able to return the results for the correctly behaving plugins while also ensuring the client is aware of an issue with certain plugins would be a good thing? So let's move towards that then. Regarding the status, I think we may be restricted to returning a gRPC OK, but with a modified message that includes some extra info indicating which plugin is erroring? Let's see when someone gets to look more deeply. |
there are cases where one plugin can fail when calling GetAvailablePackageSummaries (for example postgres issue for helm plugin, bug in carvel plugin for empty result...), and in that case, the GetAvailablePackageSummaries call itself fails.
it should be possible for the core layer to catch those errors and report them along with the results from other plugins, rather than fail altogether.
The text was updated successfully, but these errors were encountered: