-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: temporarily ignore sanity checks when creating traceflow via UI #1097
Conversation
Thanks for your PR. The following commands are available:
|
/test-all |
2 similar comments
/test-all |
/test-all |
The change LGTM, however, the description is a little confusing. In order to avoid misunderstanding, the patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria. Since octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later. |
/test-hw-offload |
Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail. |
I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed". |
Currently we cannot send alerts to users in Octant plugins if we do not update Octant. |
More details can be seen in this issue: vmware-archive/octant#902. |
@tnqn @gran-vmv It has been reported that invalid Traceflows can not be created by UI, however, it can be created via kubectl and yaml file, e.g. source port to be empty. Users thought it was an UI issue, but it was actually because UI blocked invalid creation without showing enough information. The best solution should be both UI and Kubectl can provide more information for users to identify invalid input. |
@ZhangYW18 If the alert change is not big and we plan to enhance input check, I think we will not need this temporary change since it will bring more confusions. |
Thanks for Mengdie's explanation. One thing I have to add: Sometimes it is allowed to input an empty string as the source port, and the traceflow CRD would be generated successfully with Phase "Success" via kubectl and yaml file under such circumstance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused that what change users would expect after applying patch. I think the CRD validation will still fail the creation and how users get the failure information before the octant change you mentioned is included.
And this patch seems removing some necessary "return nil" wrongly.
Before this change, with the illegal input submitted, users will see the UI work as usual (jump back to the home page of traceflow plugin), while the backend does not submit requests to Antrea indeed. After the change, the request for creating traceflow would be submitted to Antrea, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree that this is a slight improvement over the current behavior, until we can upgrade Octant to a version that supports alerts. At least the behavior will be more consistent across the UI and creating the CRD directly. So I am personally ok with merging this.
Please consider doing the following though:
- add a TODO in the code to express that this code will be improved when we upgrade Octant to a version that supports alerts (upcoming v0.16).
- add a more detailed commit message, to match the PR description
- in the PR description, "judges" doesn't make sense for what you are trying to express. Try: "The plugin contains multiple sanity checks for the user input."
I also agree with @tnqn that the return nil
needs to be preserved for the case where CRD creation fails.
I am afraid that the current log messages for the sanity checks may also be confusing since they imply a failure, yet we continue with CRD creation. Maybe instead of something like this Failed to get srcNamespace as string: ...
, we should wrap the error so that the final log message looks like this: Invalid user input, CRD creation or Traceflow request may fail: failed to get srcNamespace as string: ...
f693da2
to
1576d8f
Compare
Thanks for Antonin's inspiring suggestions. I have updated the logs, the commit message, etc. |
f69fe64
to
acf4205
Compare
/test-all |
@@ -281,8 +282,8 @@ func (p *antreaOctantPlugin) actionHandler(request *service.ActionRequest) error | |||
case showGraphAction: | |||
name, err := request.Payload.String(traceNameCol) | |||
if err != nil { | |||
log.Printf("Failed to get name at string : %w", err) | |||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it fails to get the trace name, should it return? otherwise the graph would be wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter whether we return or not here. If the name is empty, then the plugin gets a traceflow CRD with all fields being empty value, and then generate an empty graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even the result is same, the readability and maintainability is different, returning here ensures the following code has a valid traceflow name, otherwise people would need to worry what's the behavior if getting traceflow with an empty name, and what would the graph look like with an empty string.
I think this patch shouldn't make the change to worse the readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil added.
log.Printf("Failed to get name at string : %w", err) | ||
return nil | ||
log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+ | ||
"failed to get name at string : %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"failed to get name at string : %s", err) | |
"failed to get name at string: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space removed.
92c2b04
to
731eed1
Compare
/test-all |
/test-hw-offload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
log.Printf("Duplicate traceflow \"%s\": same source pod and destination pod in less than one second"+ | ||
": %+v. ", tfName, tfOld) | ||
return nil | ||
log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When iterating this feature, I suggest adding a suffix of a random string to save the step, or maybe we could just set "generateName" instead of "name" to ask apiserver to create a unique name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I fix it in this patch or fix it later?
/skip-conformance |
@mengdie-song @antoninbas do you have more comments? will merge it if not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The plugin contains multiple sanity checks to the user input: whether the IP can be converted to a standard IPv4 IP, whether the namespace/pod name meets the requirement of Kubernetes naming rule, etc. Currently, when one of those checks fails with a specific user's input submitted, the plugin work as usual (jump back to the home page of traceflow plugin) and users cannot be notified due to the restrictions of Octant. This may confuse the users a lot.
The patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria.
Users' requests for creating traceflow would be submitted to Antrea despite the results of sanity checks, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD. Since Octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later after upgrading Octant.