-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orchagent validates mirror session queue parameter against maximum value from SAI #1957
Conversation
…lue from SAI Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
… if vendor SAI does not specify limit Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
@@ -39,6 +39,11 @@ | |||
#define MIRROR_SESSION_DSCP_MIN 0 | |||
#define MIRROR_SESSION_DSCP_MAX 63 | |||
|
|||
// 15 is a typical value, but if vendor's SAI does not supply the maximum value, | |||
// allow all 8-bit numbers, effectively cancelling validation by orchagent. | |||
#define MIRROR_SESSION_DEFAULT_NUM_TC 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if 15 is the typical value, I would suggest having 15 as the default limit. By having 255 and if user supply a value like 100, this still can fail in SAI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure of some automatic tests has shown me that the question is more complex than that. Firstly, let me describe our internal discussions. My first inclination was to use 15 as a typical default, but our architect pointed out that if some vendor supports a higher number and does not implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, then a user configuring that higher number will be prevented from doing so. On the other hand, if we set the default high, then a smart user of that vendor's switch could go up to its maximum, and only a dumb user would enter a value that is too high and crash the switch. So I changed it to 255, which would allow all values up to 254. Of course, you may take the position that if a vendor wants to allow use of higher numbered queues, that vendor must implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES.
Now I would like to consult about another aspect. Changing to 255 exposed a bug in the test which I missed because internally I ran the wrong test version, but the failed tests in github show it. The vs test uses setReadOnlyAttr to simulate a value of 15 for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, expects passing 14 to succeed and 15 to fail. This worked as long as the default in orchagent was 15. But after I changed it to 255, creating a session with queue=15 succeeded, and the test failed. I believe the reason is that for efficiency, orchagent only queries SAI once for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, caches the value, and compares against the cached value whenever a session is created. You can see that code in this PR. But that probably means that when the test tries to set the limit with setReadOnlyAttr, it has no effect, because orchagent already completed its initialization, found no implementation of SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES in the vs environment, and set the limit to the default of 255. This theory is supported by additional tests I did today, where 254 succeeds and 255 fails. That means that the test cannot simulate its own limit, and must assume the same default as orchagent.
To sum up:
- In the light of the discussion above, do you still think I should change the default back to 15?
- Do you agree that I should get rid of setReadOnlyAttr in the test, and align with whatever default orchagent sets for testing the limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. Approving from my side. Lets wait for @bingwang-ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -39,6 +39,11 @@ | |||
#define MIRROR_SESSION_DSCP_MIN 0 | |||
#define MIRROR_SESSION_DSCP_MAX 63 | |||
|
|||
// 15 is a typical value, but if vendor's SAI does not supply the maximum value, | |||
// allow all 8-bit numbers, effectively cancelling validation by orchagent. | |||
#define MIRROR_SESSION_DEFAULT_NUM_TC 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. Approving from my side. Lets wait for @bingwang-ms
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
…eter against maximum value from SAI (sonic-net#1957) Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
…lue from SAI (#1957) * Orchagent validates mirror session queue parameter against maximum value from SAI Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
Signed-off-by: Raphael Tryster raphaelt@nvidia.com
What I did
On initialization, orchagent queries SAI for the maximum value of the queue parameter in mirroring sessions. When the user configures a session, if the value requested by the user is invalid, it writes an error in the log and does not pass the request to SAI/SDK.
Why I did it
Fixes https://github.com/Azure/sonic-buildimage/issues/8189[mirroring] Missing per-vendor validation of mirror session queue parameter. Without this change, orchagent passes the invalid data to SAI/SDK which returns failure, causing orchagent to exit, effectively a system crash.
How I verified it
Manual and unit test: Create 3 mirror sessions with queue = 0, maximum valid value, lowest invalid value. First two cases, sessions are created. For last case, session not created in state DB and error in log.
Details if related