Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

ZhangYW18 · 2020-08-17T02:38:53Z

The plugin contains multiple sanity checks to the user input: whether the IP can be converted to a standard IPv4 IP, whether the namespace/pod name meets the requirement of Kubernetes naming rule, etc. Currently, when one of those checks fails with a specific user's input submitted, the plugin work as usual (jump back to the home page of traceflow plugin) and users cannot be notified due to the restrictions of Octant. This may confuse the users a lot.

The patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria.
Users' requests for creating traceflow would be submitted to Antrea despite the results of sanity checks, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD. Since Octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later after upgrading Octant.

antrea-bot · 2020-08-17T02:39:08Z

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

/test-e2e: to trigger e2e tests.
/skip-e2e: to skip e2e tests.
/test-conformance: to trigger conformance tests.
/skip-conformance: to skip conformance tests.
/test-whole-conformance: to trigger all conformance tests on linux.
/skip-whole-conformance: to skip all conformance tests on linux.
/test-networkpolicy: to trigger networkpolicy tests.
/skip-networkpolicy: to skip networkpolicy tests.
/test-windows-conformance: to trigger windows conformance tests.
/skip-windows-conformance: to skip windows conformance tests.
/test-windows-networkpolicy: to trigger windows networkpolicy tests.
/skip-windows-networkpolicy: to skip windows networkpolicy tests.
/test-hw-offload: to trigger ovs hardware offload test.
/skip-hw-offload: to skip ovs hardware offload test.
/test-all: to trigger all tests (except whole conformance).
/skip-all: to skip all tests (except whole conformance).

ZhangYW18 · 2020-08-17T02:51:52Z

/test-all

ZhangYW18 · 2020-08-17T03:11:16Z

/test-all

lzhecheng · 2020-08-17T03:13:30Z

/test-all

mengdie-song · 2020-08-17T03:29:33Z

The change LGTM, however, the description is a little confusing. In order to avoid misunderstanding, the patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria. Since octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later.

lzhecheng · 2020-08-17T03:35:30Z

/test-hw-offload

tnqn · 2020-08-17T06:43:47Z

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

gran-vmv · 2020-08-17T06:59:57Z

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

ZhangYW18 · 2020-08-17T07:14:44Z

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

Currently we cannot send alerts to users in Octant plugins if we do not update Octant.

ZhangYW18 · 2020-08-17T07:16:51Z

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

More details can be seen in this issue: vmware-archive/octant#902.

mengdie-song · 2020-08-17T08:12:07Z

@tnqn @gran-vmv It has been reported that invalid Traceflows can not be created by UI, however, it can be created via kubectl and yaml file, e.g. source port to be empty. Users thought it was an UI issue, but it was actually because UI blocked invalid creation without showing enough information. The best solution should be both UI and Kubectl can provide more information for users to identify invalid input.

mengdie-song · 2020-08-17T08:14:08Z

@ZhangYW18 If the alert change is not big and we plan to enhance input check, I think we will not need this temporary change since it will bring more confusions.

ZhangYW18 · 2020-08-17T08:33:36Z

@tnqn @gran-vmv It has been reported that invalid Traceflows can not be created by UI, however, it can be created via kubectl and yaml file, e.g. source port to be empty. Users thought it was an UI issue, but it was actually because UI blocked invalid creation without showing enough information. The best solution should be both UI and Kubectl can provide more information for users to identify invalid input.

Thanks for Mengdie's explanation. One thing I have to add: Sometimes it is allowed to input an empty string as the source port, and the traceflow CRD would be generated successfully with Phase "Success" via kubectl and yaml file under such circumstance.

tnqn

I'm still confused that what change users would expect after applying patch. I think the CRD validation will still fail the creation and how users get the failure information before the octant change you mentioned is included.
And this patch seems removing some necessary "return nil" wrongly.

plugins/octant/cmd/antrea-octant-plugin/traceflow.go

ZhangYW18 · 2020-08-17T15:26:44Z

I'm still confused that what change users would expect after applying patch. I think the CRD validation will still fail the creation and how users get the failure information before the octant change you mentioned is included.
And this patch seems removing some necessary "return nil" wrongly.

Before this change, with the illegal input submitted, users will see the UI work as usual (jump back to the home page of traceflow plugin), while the backend does not submit requests to Antrea indeed. After the change, the request for creating traceflow would be submitted to Antrea, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD.

antoninbas

I kind of agree that this is a slight improvement over the current behavior, until we can upgrade Octant to a version that supports alerts. At least the behavior will be more consistent across the UI and creating the CRD directly. So I am personally ok with merging this.

Please consider doing the following though:

add a TODO in the code to express that this code will be improved when we upgrade Octant to a version that supports alerts (upcoming v0.16).
add a more detailed commit message, to match the PR description
in the PR description, "judges" doesn't make sense for what you are trying to express. Try: "The plugin contains multiple sanity checks for the user input."

I also agree with @tnqn that the return nil needs to be preserved for the case where CRD creation fails.

I am afraid that the current log messages for the sanity checks may also be confusing since they imply a failure, yet we continue with CRD creation. Maybe instead of something like this Failed to get srcNamespace as string: ..., we should wrap the error so that the final log message looks like this: Invalid user input, CRD creation or Traceflow request may fail: failed to get srcNamespace as string: ...

ZhangYW18 · 2020-08-18T02:34:58Z

I kind of agree that this is a slight improvement over the current behavior, until we can upgrade Octant to a version that supports alerts. At least the behavior will be more consistent across the UI and creating the CRD directly. So I am personally ok with merging this.

Please consider doing the following though:

add a TODO in the code to express that this code will be improved when we upgrade Octant to a version that supports alerts (upcoming v0.16).

add a more detailed commit message, to match the PR description

in the PR description, "judges" doesn't make sense for what you are trying to express. Try: "The plugin contains multiple sanity checks for the user input."

I also agree with @tnqn that the return nil needs to be preserved for the case where CRD creation fails.

I am afraid that the current log messages for the sanity checks may also be confusing since they imply a failure, yet we continue with CRD creation. Maybe instead of something like this Failed to get srcNamespace as string: ..., we should wrap the error so that the final log message looks like this: Invalid user input, CRD creation or Traceflow request may fail: failed to get srcNamespace as string: ...

Thanks for Antonin's inspiring suggestions. I have updated the logs, the commit message, etc.

ZhangYW18 · 2020-08-18T09:33:21Z

/test-all

tnqn · 2020-08-18T11:13:01Z

plugins/octant/cmd/antrea-octant-plugin/traceflow.go

@@ -281,8 +282,8 @@ func (p *antreaOctantPlugin) actionHandler(request *service.ActionRequest) error
 	case showGraphAction:
 		name, err := request.Payload.String(traceNameCol)
 		if err != nil {
-			log.Printf("Failed to get name at string : %w", err)
-			return nil


If it fails to get the trace name, should it return? otherwise the graph would be wrong?

It doesn't matter whether we return or not here. If the name is empty, then the plugin gets a traceflow CRD with all fields being empty value, and then generate an empty graph.

Even the result is same, the readability and maintainability is different, returning here ensures the following code has a valid traceflow name, otherwise people would need to worry what's the behavior if getting traceflow with an empty name, and what would the graph look like with an empty string.
I think this patch shouldn't make the change to worse the readability.

return nil added.

tnqn · 2020-08-18T11:17:16Z

plugins/octant/cmd/antrea-octant-plugin/traceflow.go

-			log.Printf("Failed to get name at string : %w", err)
-			return nil
+			log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+
+				"failed to get name at string : %s", err)


Suggested change

"failed to get name at string : %s", err)

"failed to get name at string: %s", err)

Extra space removed.

ZhangYW18 · 2020-08-19T09:01:04Z

/test-all

abdallahyas · 2020-08-19T12:24:13Z

/test-hw-offload

tnqn

LGTM, thanks.

tnqn · 2020-08-19T15:16:26Z

plugins/octant/cmd/antrea-octant-plugin/traceflow.go

-			log.Printf("Duplicate traceflow \"%s\": same source pod and destination pod in less than one second"+
-				": %+v. ", tfName, tfOld)
-			return nil
+			log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+


When iterating this feature, I suggest adding a suffix of a random string to save the step, or maybe we could just set "generateName" instead of "name" to ask apiserver to create a unique name.

Should I fix it in this patch or fix it later?

tnqn · 2020-08-19T15:20:29Z

/skip-conformance
/skip-networkpolicy
/skip-windows-conformance
/skip-windows-networkpolicy
/test-e2e

tnqn · 2020-08-20T05:19:22Z

@mengdie-song @antoninbas do you have more comments? will merge it if not.

mengdie-song

LGTM

…ntrea-io#1097)

…1097)

…ntrea-io#1097)

gran-vmv requested review from tnqn, gran-vmv, antoninbas, jianjuns and mengdie-song August 17, 2020 03:48

tnqn requested changes Aug 17, 2020

View reviewed changes

plugins/octant/cmd/antrea-octant-plugin/traceflow.go Outdated Show resolved Hide resolved

antoninbas reviewed Aug 17, 2020

View reviewed changes

ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from f693da2 to 1576d8f Compare August 18, 2020 02:22

ZhangYW18 changed the title ~~Fix Antrea Octant plugin no notification bug~~ Fix: temporarily ignore sanity checks when creating traceflow via UI Aug 18, 2020

ZhangYW18 force-pushed the tf-ui-fix branch from 1576d8f to a792d76 Compare August 18, 2020 02:26

ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from f69fe64 to acf4205 Compare August 18, 2020 03:53

tnqn reviewed Aug 18, 2020

View reviewed changes

ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from 92c2b04 to 731eed1 Compare August 19, 2020 01:02

Fix: temporarily ignore sanity checks when creating traceflow via UI

b26968a

ZhangYW18 force-pushed the tf-ui-fix branch from 731eed1 to b26968a Compare August 19, 2020 01:12

tnqn approved these changes Aug 19, 2020

View reviewed changes

mengdie-song approved these changes Aug 20, 2020

View reviewed changes

antoninbas approved these changes Aug 20, 2020

View reviewed changes

tnqn merged commit a4a49a1 into antrea-io:master Aug 21, 2020

antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Aug 21, 2020

Fix: temporarily ignore sanity checks when creating traceflow via UI (a…

8b5d587

…ntrea-io#1097)

antoninbas pushed a commit that referenced this pull request Aug 21, 2020

Fix: temporarily ignore sanity checks when creating traceflow via UI (#…

4af8518

…1097)

tnqn pushed a commit to tnqn/antrea that referenced this pull request Aug 27, 2020

Fix: temporarily ignore sanity checks when creating traceflow via UI (a…

77642c3

…ntrea-io#1097)

GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020

Fix: temporarily ignore sanity checks when creating traceflow via UI (a…

48bd6dd

…ntrea-io#1097)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

ZhangYW18 commented Aug 17, 2020 •

edited

Loading

antrea-bot commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

lzhecheng commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

lzhecheng commented Aug 17, 2020

tnqn commented Aug 17, 2020

gran-vmv commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020 •

edited

Loading

ZhangYW18 commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

tnqn left a comment

ZhangYW18 commented Aug 17, 2020

antoninbas left a comment •

edited

Loading

ZhangYW18 commented Aug 18, 2020

ZhangYW18 commented Aug 18, 2020

tnqn Aug 18, 2020

ZhangYW18 Aug 18, 2020

tnqn Aug 18, 2020

ZhangYW18 Aug 19, 2020

tnqn Aug 18, 2020

ZhangYW18 Aug 19, 2020

ZhangYW18 commented Aug 19, 2020

abdallahyas commented Aug 19, 2020

tnqn left a comment

tnqn Aug 19, 2020

ZhangYW18 Aug 20, 2020

tnqn commented Aug 19, 2020

tnqn commented Aug 20, 2020

mengdie-song left a comment

	"failed to get name at string : %s", err)
	"failed to get name at string: %s", err)

Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

Conversation

ZhangYW18 commented Aug 17, 2020 • edited Loading

antrea-bot commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

lzhecheng commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

lzhecheng commented Aug 17, 2020

tnqn commented Aug 17, 2020

gran-vmv commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020 • edited Loading

ZhangYW18 commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

mengdie-song commented Aug 17, 2020

ZhangYW18 commented Aug 17, 2020

tnqn left a comment

Choose a reason for hiding this comment

ZhangYW18 commented Aug 17, 2020

antoninbas left a comment • edited Loading

Choose a reason for hiding this comment

ZhangYW18 commented Aug 18, 2020

ZhangYW18 commented Aug 18, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZhangYW18 commented Aug 19, 2020

abdallahyas commented Aug 19, 2020

tnqn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tnqn commented Aug 19, 2020

tnqn commented Aug 20, 2020

mengdie-song left a comment

Choose a reason for hiding this comment

ZhangYW18 commented Aug 17, 2020 •

edited

Loading

ZhangYW18 commented Aug 17, 2020 •

edited

Loading

antoninbas left a comment •

edited

Loading