-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynamic buffer calc] Support dynamic buffer calculation #1338
[Dynamic buffer calc] Support dynamic buffer calculation #1338
Conversation
d55a7e9
to
dd3b836
Compare
1f33d6d
to
6c60a7d
Compare
15b8235
to
af2872a
Compare
af2872a
to
07ef101
Compare
retest this please |
dd13ec8
to
9907703
Compare
This pull request introduces 1 alert when merging bfc28a9 into a1d6300 - view on LGTM.com new alerts:
|
1. Extend the CLI options for buffermgrd: -a: asic_table provided, -p: peripheral_table provided The buffermgrd will start with dynamic headroom calculation mode With -a provided Otherwise it will start the legacy mode (pg_headroom_profile looking up) 2. A new class is provided for dynamic buffer calculation while the old one remains. The daemon will instantiate the corresponding class according to the CLI option when it starts. 3. In both mode, the buffermgrd will copy BUFFER_XXX tables from CONFIG_DB to APPL_DB and the bufferorch will consume BUFFER_XXX tables from APPL_DB The following points are for dynamic buffer calculation mode 4. In the dynamic buffer calculation mode, there are 3 lua plugins are provided for vendor-specific operations: - buffer_headroom_<vendor>.lua, for calculationg headroom size. - buffer_pool_<vendor>.lua, for calculating buffer pool size. - buffer_check_headroom_<vendor>.lua, for checking whether headroom exceeds the limit 5. During initialization, The daemon will: - load asic_table and peripheral_table from the given json file, parse them and push them into STATE_DB.ASIC_TABLE and STATE_DB.PERIPHERAL_TABLE respectively - load all plugins - try to load the STATE_DB.BUFFER_MAX_PARAM.mmu_size which is used for updating buffer pool size - a timer will be started for periodic buffer pool size audit 6. The daemon will listen to and handle the following tables from CONFIG_DB The tables will be cached internally in the damon for the purpose of saving access time - BUFFER_POOL: - if size is provided: insert the entry to APPL_DB - otherwise: cache them and push to APPL_DB after the size is calculated by lua plugin - BUFFER_PROFILE and BUFFER_PG: - items for ingress lossless headroom need to be cached and handled (according to the design) - other items will be inserted to the APPL_DB directly - PORT_TABLE, for ports' speed and MTU update - CABLE_LENGTH, for ports' cable length 7. Other tables will be copied to APPL_DB directly: - BUFFER_QUEUE - BUFFER_PORT_INGRESS_PROFILE_LIST - BUFFER_PORT_EGRESS_PROFILE_LIST As the names of tables in APPL_DB differ from that in CONFIG_DB, all references should be adjusted accordingly 8. BufferOrch modified accordingly: Consume buffer relavent tables from APPL_DB instead of CONFIG_DB 9. Warm reboot: - db_migrator is responsible for copying the data from CONFIG_DB to APPL_DB if switch is warm-rebooted from an old image to the new image for the first time - no specific handling in the daemon side 10.Provide vstest script Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Very the profile in ASIC_DB when possible: in case of a new profile is created, we can get the OID of the profile by comparing SAI OID set before and after the creation - Add function which can switch buffer model dynamically - Add testcases for mtu update and non-default alpha Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
0700551
to
5b89593
Compare
@stephenxs , Dynamic buffer tests are failing. can you check? |
|
Yes, @neethajohn , it’s expected because the sonic build image and sonic utilities haven’t been merged yet. We need the vsimage supporting dynamic buffer to pass the test.
|
…retry In case the head element in m_toSync needs to retry, the doTask will move to the following items and call process<TableName> for each of the items in the m_toSync. However, as the process<TableName> always handle the first element in m_toSync, it results in the m_toSync being blocked if the head needs retry. Solution: pass the element to which doTask move to process<TableName> so that it's able to handle the following element in m_toSync Signed-off-by: Stephen Sun <stephens@nvidia.com>
@lguohan and @neethajohn can you please merge this PR so we can cont with the merge flow as suggested above? i dont have the ability to do so if vs is failing and in this case it should fail until we finish the merge flow. |
retest vs please |
1 similar comment
retest vs please |
retest this please |
The vs test failed only on the dynamic buffer test cases, which is expected. I think we are safe to merge it. |
**- Why I did it** To support dynamic buffer calculation. This PR also depends on the following PRs for sub modules - [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](sonic-net/sonic-swss#1338) - [sonic-swss-common: Dynamic buffer calculation #361](sonic-net/sonic-swss-common#361) - [sonic-utilities: Support dynamic buffer calculation #973](sonic-net/sonic-utilities#973) **- How I did it** 1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently: - `dynamic` for the dynamic buffer calculation model - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used 2. Add the tables required for the feature: - ASIC_TABLE in platform/\<vendor\>/asic_table.j2 - PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2 - PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed. - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2 - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2 3. Copy the newly introduced j2 files into the image and rendering them when the system starts 4. Update the CLI options for buffermgrd so that it can start with dynamic mode 5. Fetches the ASIC vendor name in orchagent: - fetch the vendor name when creates the docker and pass it as a docker environment variable - `buffermgrd` can use this passed-in variable 6. Clear buffer related tables from STATE_DB when swss docker starts 7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2 8. Remove buffer pool sizes for ingress pools and egress_lossy_pool Update the buffer settings for dynamic buffer calculation
1. Initialize DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN when enable the dynamic buffer and remove them when disable it. Originally they were enabled by default regardless whether dynamic buffer model is enabled, which has been removed according to the latest review comments from Qi. So move the initialization here. 2. Remove all the dependency to buffer commands in order to make the vstest pass without sonic-utilities merged 3. Some enhancement: make the test more stable: - before test starting, waiting all dynamic generated buffer profiles removed - adjust the order in which related tables removed, to make sure no leftover after buffer model switched - as no dynamic profile existing before test starting, it's possible to check asic db for speed change testcase Signed-off-by: Stephen Sun <stephens@nvidia.com>
c36930f
retest this please |
@stephenxs, can you look at the buffer test failures and fix them? |
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Hi @neethajohn done. All tests passed |
@stepanblyschak this commit is causing sonic-util-build-pr failure. Can you please check ? |
@abdosi I'm checking it. |
@abdosi @stephenxs it looks like it is fixed based on the latest few PRs that are still running. I think what happened is that those PRs that failed were based on swss build #2369, which didn't have the code changes in yet, and the later ones are using https://sonic-jenkins.westus2.cloudapp.azure.com/job/vs/job/sonic-swss-build/2370/ or later, which does have the code change. I will look into updating the utilities test job to pull the bins and the tests from the same build rather than cloning the tests from the swss repo directly. (example: https://sonic-jenkins.westus2.cloudapp.azure.com/job/common/job/sonic-utilities-build-pr/3481/console, which uses the swss binaries from 2371) |
Thanks @daall . Just checked it. You are correct. |
Remove mst dump Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
…nic-net#1407) This reverts commit b10622e. **What I did** revert changes to call sdkdump and replace with old call to mstdump **How I did it** reverting a previous commit [Mellanox] Add FW dump with new SAI implementation and remove mst dump sonic-net#1338 **How to verify it** run techsupport
What I did
Support dynamic buffer calculation
Extend the CLI options for buffermgrd:
The
buffermgrd
will start the dynamic headroom calculation mode with -a provided.Otherwise, it will start the legacy mode (pg_headroom_profile looking up)
A new class is provided for dynamic buffer calculation while the old one remains.
The daemon will instantiate the corresponding class according to the CLI option when it starts.
In both modes, the
buffermgrd
will copy BUFFER_XXX tables from CONFIG_DB to APPL_DB and thebufferorch
will consume BUFFER_XXX tables from APPL_DBBackward compatibility
For legacy mode, the backward compatibility is provided. As mentioned above,
buffermgrd
will check whether the json file representing theASIC_TABLE
exists when it starts.This logic is in
cfgmgr/buffermgrd.cpp
.The logic of buffer handling in
buffermgrd
isn't changed in the legacy mode. The differences are:APPL_DB
. All tables are inCONFIG_DB
.buffermgrd
listens toPORT
andCABLE_LENGTH
tables inCONFIG_DB
and inserts the buffer profiles intoBUFFER_PROFILE
table.bufferorch
listens to buffer related tables inCONFIG_DB
and call SAI API correspondingly.buffermgrd
listens to tables inCONFIG_DB
and copies them intoAPPL_DB
buffermgrd
PORT
andCABLE_LENGTH
tables inCONFIG_DB
and inserts the buffer profiles intoBUFFER_PROFILE
table inCONFIG_DB
(not changed)CONFIG_DB
and copies them intoAPPL_DB
bufferorch
listens toAPPL_DB
and call SAI API correspondingly. (the difference is the db it listens to).db_migrator
is responsible to copy the buffer related tables fromCONFIG_DB
toAPPL_DB
when system is warmbooted from the old image to the new image for the first time.The compatible code is in
cfgmgr/buffermgr.cpp
,orchagent/bufferorch.cpp
anddb_migrator
(in the sonic-utilities PR).Why I did it
How I verified it
Dynamic buffer details
The tables will be cached internally in the daemon for the purpose of saving access time