[supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9043
@judyjoseph @arlakshm Please help review this PR. Thanks.
@@ -56,6 +56,10 @@ def get_expected_running_containers():
else:
expected_running_containers.add(container_name)

if container_name == "database" and device_info.is_supervisor():
IMO, it is better to add database-chassis to the feature table for the supervisor card instead of adding the check here.
Currently, the feature table is in the init_cfg.json file, which is generated while building the image. To add the database-chassis docker to the feature table, we would have to make init_cfg.json a J2 template and then have the database-chassis container rendered into the feature table for the Supervisor card at runtime during system boot-up. Is this what you are suggesting, or did you have another method in mind?
init_cfg.json is generated from init_cfg.json.j2. A recent change made the dhcp_relay feature enabled or disabled based on an attribute in DEVICE_METADATA.
https://github.com/Azure/sonic-buildimage/blob/51a0aed02af5e2586b207b2493de28180dd22bed/files/build_templates/init_cfg.json.j2#L38
I was wondering if we can follow the same approach and enable database-chassis when switch_type in DEVICE_METADATA is fabric.
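For readers following the thread, the proposed rule can be sketched in plain Python rather than Jinja. This is only an illustration of the decision being suggested, not code from the repository: the function name `database_chassis_state` is hypothetical, and the `localhost`/`switch_type` layout is assumed from the `DEVICE_METADATA|localhost|switch_type` key mentioned later in this conversation.

```python
def database_chassis_state(device_metadata):
    """Return "enabled" when switch_type is "fabric", otherwise "disabled".

    Hypothetical helper: it mirrors the suggestion in the comment above and
    assumes DEVICE_METADATA is a dict keyed by "localhost".
    """
    localhost = (device_metadata or {}).get("localhost", {})
    return "enabled" if localhost.get("switch_type") == "fabric" else "disabled"


print(database_chassis_state({"localhost": {"switch_type": "fabric"}}))
print(database_chassis_state({"localhost": {"switch_type": "voq"}}))
```

Missing metadata is treated as "disabled" in this sketch; a real template would need to decide that case explicitly.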
Weeks ago, when I looked at init_cfg.json.j2 with this change, it looked like there was an issue with it. During image build, init_cfg.json is generated with the following value for "dhcp_relay":
"dhcp_relay": {
"state": "{% if not (DEVICE_METADATA is defined and DEVICE_METADATA['localhost'] is defined and DEVICE_METADATA['localhost']['type'] is defined and DEVICE_METADATA['localhost']['type'] != 'ToRRouter') %}enabled{% else %}disabled{% endif %}", false, "enabled")) %}",
"has_timer" : false,
"has_global_scope": false,
"has_per_asic_scope": false,
"auto_restart": "enabled",
"high_mem_alert": "disabled"
},
And at boot-up after a first-time installation, the dhcp_relay state value is never rendered to either "enabled" or "disabled" when the default configuration is generated. From what I saw, the "sonic-cfggen" script does not render the init_cfg.json file because it is invoked with option "-j"; with option "-t" (treating the file as a template), it would work. Am I missing anything with this change? Or do you know what renders it before "sonic-cfggen" combines it into config_db.json?
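The symptom described above can be checked with a small script. Loading a file as plain JSON leaves any template expression untouched inside the string value, so a scan for leftover Jinja markers shows which fields were never rendered. This is a minimal sketch using only the standard library: the helper `find_unrendered`, the shortened template condition, and the top-level `FEATURE` key are assumptions made for illustration, and it does not call sonic-cfggen.

```python
import json

# Opening delimiters of Jinja statements and expressions.
JINJA_MARKERS = ("{%", "{{")


def find_unrendered(node, path=""):
    """Yield the paths of string values that still contain Jinja markers."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_unrendered(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_unrendered(value, f"{path}/{index}")
    elif isinstance(node, str) and any(marker in node for marker in JINJA_MARKERS):
        yield path


# Shortened stand-in for the generated init_cfg.json content quoted above.
raw = (
    '{"FEATURE": {"dhcp_relay": {'
    '"state": "{% if cond %}enabled{% else %}disabled{% endif %}", '
    '"auto_restart": "enabled"}}}'
)
config = json.loads(raw)  # parsed as data, so the template text survives as-is
print(list(find_unrendered(config)))
```

Running the same scan over the final config_db.json would show whether some later step renders the value.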
@arlakshm When is the init_cfg.json value ' "dhcp_relay": {"state": "{% if not (DEVICE_META....." ' rendered to either "enabled" or "disabled"? I cannot find where the code renders the value of "state" for dhcp_relay to "enabled" or "disabled". Please help me understand it so I can apply the same design to database-chassis and test it. Thanks.
@abdosi - will you be able to help answer this? Thanks.
After investigating, I don't think we can use DEVICE_METADATA|localhost|switch_type to decide whether to enable database-chassis. The dependency in database.service requires 'database-chassis' to be started before 'database'. Unless database-chassis is started first, 'database' cannot start, and so DEVICE_METADATA and the feature table cannot be loaded.
In file database.service:
Wants=database-chassis.service
After=database-chassis.service
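The ordering constraint can be illustrated with a topological sort. In systemd, `After=` orders this unit behind the named unit and `Wants=` pulls that unit in. The sketch below models only the `After=` edge quoted above, using Python's standard `graphlib` (Python 3.9+); the variable names are illustrative and nothing here reads real unit files.

```python
from graphlib import TopologicalSorter

# Each unit maps to the set of units it must start after (the After= lines).
after = {"database.service": {"database-chassis.service"}}

# static_order() lists every unit with its predecessors first.
start_order = list(TopologicalSorter(after).static_order())
print(start_order)
```

Because database-chassis comes first in the start order, any decision about starting it cannot depend on data that only becomes readable once 'database' is up.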
Hi @mlok-nokia, I think we can follow the same approach and add the
We can do that. The logic needs to be as below, since a Supervisor card can be multi-ASIC too:
@arlakshm Please take a look at my reply to your suggestion: adding database-chassis to the init_cfg.json.j2 file using the logic I described in that reply. Is that correct? Thanks.
@mlok-nokia The feature table may not have the 'database-chassis' container. I think we can add the supervisor check at the end, something like this?
OK. I will make the change as you suggested: always add it for the Supervisor card. Thanks.
…e-chassis" docker running sonic-net#9042 -- Added database-chassis to the expected docker list for the monit to check. Signed-off-by: mlok <marty.lok@nokia.com>
@arlakshm I have made the change. Please review it. Thanks.
… "database-chassis" docker running #9042 (#9043) Why I did it Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes #9042 How I did it Added database-chassis to the always running docker list if platform is supervisor card. How to verify it Execute the CLI command "sudo monit status container_checker" Signed-off-by: mlok <marty.lok@nokia.com>
Why I did it
Fixed the monit container_checker failure caused by the unexpected "database-chassis" docker running on the Supervisor card in the VOQ chassis. Fixes #9042
How I did it
Added database-chassis to the always-running docker list if the platform is a supervisor card.
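The approach can be sketched as follows. This is a simplified illustration and not the code merged in this PR: the real `get_expected_running_containers()` takes no arguments and obtains the supervisor flag from `device_info.is_supervisor()`, whereas here the feature table and the flag are passed in as plain arguments (an assumption) so the example is self-contained.

```python
def get_expected_running_containers(feature_table, is_supervisor):
    """Return the set of container names that monit should expect to be running.

    Simplified sketch: feature_table maps container name -> attributes, and
    only entries whose "state" is "enabled" are expected.
    """
    expected = set()
    for container_name, attrs in feature_table.items():
        if attrs.get("state") == "enabled":
            expected.add(container_name)

    # database-chassis may be absent from the feature table, so it is added
    # unconditionally at the end when running on a supervisor card.
    if is_supervisor:
        expected.add("database-chassis")
    return expected


features = {"database": {"state": "enabled"}, "dhcp_relay": {"state": "disabled"}}
print(sorted(get_expected_running_containers(features, is_supervisor=True)))
```

Adding the container after the loop keeps the supervisor case independent of the feature table contents, which matches the reviewer's point that the table may not list database-chassis.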
How to verify it
Execute the CLI command "sudo monit status container_checker"