Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic Buffer Calc] Enhance the logic to check maximum headroom exceeding to cover corner scenarios #2763

Merged
merged 13 commits into from
May 22, 2023

Conversation

stephenxs
Copy link
Collaborator

@stephenxs stephenxs commented May 5, 2023

What I did
Enhance the logic to check maximum headroom exceeding to cover corner scenarios

Currently, the logic to check the maximum headroom exceeding works well when a user changes any buffer configuration in the dynamic buffer model, preventing all problematic configurations from being applied to the ASIC.
However, it can fail when a problematic configuration is config_db.json and config reload is executed. To cover this scenario, the following actions need to be done:

  1. Take the pending PG keys and buffer profiles into account when calculating the maximum headroom
    • Existing buffer PGs and buffer profiles can be in the pending queue since there are a large number of notifications needed to be handled during system initialization, which takes time.
  2. Take the lossy PG into account when calculating the maximum headroom.
    • Non-default lossy PG can be added by the user in config_db.json
  3. Pass the PG to the Lua plugin when refreshing PGs for a port

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

Cover corner scenarios.

How I verified it

Manual test, vs test, regression test (dynamic buffer)

Details if related

@stephenxs stephenxs changed the title Fix issue in maximum headroom checking [QoS] Enhance the logic to check maximum headroom exceeding to cover corner scenarios May 8, 2023
@stephenxs
Copy link
Collaborator Author

All test cases are failed due to the following error which should be fixed by sonic-net/sonic-utilities#2830

2023-05-08T08:09:14.5589099Z         # conn.request() calls http.client.*.request, not the method in
2023-05-08T08:09:14.5589452Z         # urllib3.request. It also calls makefile (recv) on the socket.
2023-05-08T08:09:14.5589655Z         try:
2023-05-08T08:09:14.5589756Z >           conn.request(
2023-05-08T08:09:14.5589871Z                 method,
2023-05-08T08:09:14.5590134Z                 url,
2023-05-08T08:09:14.5590232Z                 body=body,
2023-05-08T08:09:14.5590350Z                 headers=headers,
2023-05-08T08:09:14.5590462Z                 chunked=chunked,
2023-05-08T08:09:14.5590591Z                 preload_content=preload_content,
2023-05-08T08:09:14.5590721Z                 decode_content=decode_content,
2023-05-08T08:09:14.5590869Z                 enforce_content_length=enforce_content_length,
2023-05-08T08:09:14.5590986Z             )
2023-05-08T08:09:14.5591229Z E           TypeError: request() got an unexpected keyword argument 'chunked'

@stephenxs
Copy link
Collaborator Author

cherry-pickable to 202211 based on the lastest commit a484ab8684c45d56340bbb3a704ae6ca039e9282

@liat-grozovik
Copy link
Collaborator

/azp run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

The same error as before

2023-05-09T04:05:18.8163888Z         # conn.request() calls http.client.*.request, not the method in
2023-05-09T04:05:18.8164229Z         # urllib3.request. It also calls makefile (recv) on the socket.
2023-05-09T04:05:18.8164485Z         try:
2023-05-09T04:05:18.8164690Z >           conn.request(
2023-05-09T04:05:18.8164887Z                 method,
2023-05-09T04:05:18.8165086Z                 url,
2023-05-09T04:05:18.8165277Z                 body=body,
2023-05-09T04:05:18.8165491Z                 headers=headers,
2023-05-09T04:05:18.8165720Z                 chunked=chunked,
2023-05-09T04:05:18.8165963Z                 preload_content=preload_content,
2023-05-09T04:05:18.8166233Z                 decode_content=decode_content,
2023-05-09T04:05:18.8166525Z                 enforce_content_length=enforce_content_length,
2023-05-09T04:05:18.8166774Z             )
2023-05-09T04:05:18.8167228Z E           TypeError: request() got an unexpected keyword argument 'chunked'
2023-05-09T04:05:18.8170916Z 
2023-05-09T04:05:18.8171542Z /usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:496: TypeError

stephenxs added 3 commits May 10, 2023 09:38
1. Take the pending PG keys into consideration when calculating the maximum headroom
2. Pass the PG to the Lua plugin when refreshing PGs for a port

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs force-pushed the fix-headroom-check branch from 5b9dd0f to f1b589e Compare May 10, 2023 01:38
@keboliu
Copy link
Collaborator

keboliu commented May 10, 2023

@stephenxs maybe this PR can help to unblock the checker? #2767

@stephenxs
Copy link
Collaborator Author

stephenxs commented May 10, 2023

@stephenxs maybe this PR can help to unblock the checker? #2767

thanks, @keboliu but I have already done it in the morning and the test failed in my case. checking.

…seconds

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

stephenxs added 3 commits May 11, 2023 03:50
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Collaborator Author

It turns out to be that the test failed on the Azure pipeline because the config_db updates were received in a different order in which they were received in my local vs testbed.
Add checkers to guarantee the order.

stephenxs added 5 commits May 11, 2023 12:04
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
This reverts commit c12a9e5.
@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

Failed due to irrelevant test cases. Retriggered

2023-05-12T05:33:32.4509401Z test_virtual_chassis.py::TestVirtualChassis::test_chassis_add_remove_ports FAILED [ 65%]
2023-05-12T05:35:34.6375172Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED                   [ 80%]

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs changed the title [QoS] Enhance the logic to check maximum headroom exceeding to cover corner scenarios [Dynamic Buffer Calc] Enhance the logic to check maximum headroom exceeding to cover corner scenarios May 19, 2023
@neethajohn neethajohn merged commit fe8c395 into sonic-net:master May 22, 2023
@stephenxs stephenxs deleted the fix-headroom-check branch May 22, 2023 20:29
@stephenxs
Copy link
Collaborator Author

@StormLiangMS could you please cherry-pick the PR to 202211? thanks.

StormLiangMS pushed a commit that referenced this pull request May 25, 2023
…eeding to cover corner scenarios (#2763)

What I did
Enhance the logic to check maximum headroom exceeding to cover corner scenarios

Currently, the logic to check the maximum headroom exceeding works well when a user changes any buffer configuration in the dynamic buffer model, preventing all problematic configurations from being applied to the ASIC.
However, it can fail when a problematic configuration is config_db.json and config reload is executed. To cover this scenario, the following actions need to be done:

Take the pending PG keys and buffer profiles into account when calculating the maximum headroom
Existing buffer PGs and buffer profiles can be in the pending queue since there are a large number of notifications needed to be handled during system initialization, which takes time.
Take the lossy PG into account when calculating the maximum headroom.
Non-default lossy PG can be added by the user in config_db.json
Pass the PG to the Lua plugin when refreshing PGs for a port
Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it
Cover corner scenarios.

How I verified it
Manual test, vs test, regression test (dynamic buffer)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants