config qos reload followed by config save commands leads to config save command failing with traceback #8888

Closed
dgsudharsan opened this issue Oct 1, 2021 · 3 comments · Fixed by sonic-net/sonic-utilities#1859

Comments

@dgsudharsan
Collaborator

dgsudharsan commented Oct 1, 2021

Description

Sometimes, running config qos reload followed by config save causes the config save command to fail with a traceback.

The cause is that config qos reload writes to the config DB, and any write to the config DB triggers hostcfgd to update kdump (there is no check for duplicate updates): https://github.com/Azure/sonic-buildimage/blob/b2659dcdbc454ceee5128a597fb34f041b1aeb9f/src/sonic-host-services/scripts/hostcfgd#L742
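To illustrate what a duplicate check could look like, here is a minimal sketch. It is not the hostcfgd code: `KdumpHandler`, `on_update` and `apply_count` are hypothetical names, and the real kdump update is replaced by a counter.

```python
class KdumpHandler:
    """Hypothetical handler that acts only when the KDUMP data really changed."""

    def __init__(self):
        self._last_seen = None  # last KDUMP data that was applied
        self.apply_count = 0    # how many times the update actually ran

    def on_update(self, data):
        # Ignore notifications that carry the same data as last time.
        if data == self._last_seen:
            return False
        self._last_seen = dict(data)
        self.apply_count += 1   # stand-in for the real kdump reconfiguration
        return True


handler = KdumpHandler()
handler.on_update({"enabled": "false", "num_dumps": "3"})
handler.on_update({"enabled": "false", "num_dumps": "3"})  # duplicate, ignored
handler.on_update({"enabled": "true", "num_dumps": "3"})   # real change, applied
```

With a check like this, an unrelated config DB write would not cause the kdump path to run again, although it would not by itself remove the second writer of config_db.json.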

As part of kdump processing, sonic-net/sonic-utilities#1284 introduced changes that save to the startup config file.

While this is executing and writing to the config file in the background, the config save command also writes to the config file. As part of the flow that writes config_db.json, the file is first read using read_json_file and then sorted.
If this happens at the same time as kdump writing directly to the config file, the traceback is observed. Since no file protection mechanism is implemented, the read returns incomplete data from the middle of the write.
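The failure mode can be reproduced deterministically without any concurrency. The sketch below assumes a made-up config dictionary and a temporary file standing in for config_db.json; it writes only the first half of the JSON text, as a reader would see it mid-write, and then tries to parse it.

```python
import json
import os
import tempfile

# Made-up content standing in for a real config_db.json.
config = {"PORT": {"Ethernet%d" % i: {"mtu": "9100"} for i in range(100)}}
text = json.dumps(config, indent=4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config_db.json")
    # Simulate a writer that has flushed only the first half of the file.
    with open(path, "w") as f:
        f.write(text[: len(text) // 2])
    # A reader arriving at this moment sees syntactically incomplete JSON.
    try:
        with open(path) as f:
            json.load(f)
        partial_read_failed = False
    except json.JSONDecodeError as e:
        partial_read_failed = True
        error_text = str(e)
```

The exact error message depends on where the data is cut off, so it will not necessarily match the "Expecting ':' delimiter" message in the traceback.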

Ideally the config save command alone should write to config_db.json. If multiple writers are required, then file protection mechanisms should be implemented to avoid reading incorrect data during writes.
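One possible protection mechanism is to write to a temporary file and rename it over the target, so that a reader sees either the old file or the new one. This is a sketch under that assumption, not what sonic-utilities implements; `write_json_atomically` is a hypothetical helper.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data to path so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=4, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "config_db.json")
    write_json_atomically(target, {"KDUMP": {"config": {"enabled": "false"}}})
    write_json_atomically(target, {"KDUMP": {"config": {"enabled": "true"}}})
    with open(target) as f:
        final = json.load(f)
    leftovers = os.listdir(tmp)
```

This only protects readers from partial data. It does not coordinate multiple writers (the last rename wins), so a lock such as `fcntl.flock` would still be needed if more than one process must update the file.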

Below is an example of the traceback from a failed run.

config qos reload

/usr/local/bin/sonic-cfggen  -d --write-to-db -t /usr/share/sonic/device/x86_64-mlnx_msn3800-r0/ACS-MSN3800/buffers_dynamic.json.j2,config-db -t /usr/share/sonic/device/x86_64-mlnx_msn3800-r0/ACS-MSN3800/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml
Buffer calculation model updated, restarting swss is required to take effect

config save -y

Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/config/main.py", line 116, in read_json_file
    result = json.load(f)
  File "/usr/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 3723 column 24 (char 131228)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/config", line 8, in <module>
    sys.exit(config())
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/config/main.py", line 1048, in save
    config_db = sort_dict(read_json_file(file))
  File "/usr/local/lib/python3.7/dist-packages/config/main.py", line 118, in read_json_file
    raise Exception(str(e))
Exception: Expecting ':' delimiter: line 3723 column 24 (char 131228)

Steps to reproduce the issue:

  1. config qos reload
  2. config save

Describe the results you received:

config save fails with a crash

Describe the results you expected:

config save shouldn't fail.

Output of show version:

SONiC Software Version: SONiC.202106.17-4536f35f2_Internal
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 4536f35f2
Build date: Wed Sep 22 08:24:08 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: Undefined.
Model Number: Undefined.
Hardware Revision: N/A
Uptime: 09:48:31 up 0 min,  1 user,  load average: 0.43, 0.10, 0.03

Docker images:
REPOSITORY                    TAG                            IMAGE ID            SIZE
docker-syncd-mlnx             202106.17-4536f35f2_Internal   0c04f10f2c2c        997MB
docker-syncd-mlnx             latest                         0c04f10f2c2c        997MB
docker-platform-monitor       202106.17-4536f35f2_Internal   227fb1ee97f4        746MB
docker-platform-monitor       latest                         227fb1ee97f4        746MB
docker-dhcp-relay             latest                         2d2618c074d5        420MB
docker-snmp                   202106.17-4536f35f2_Internal   bd7b6e158f3e        455MB
docker-snmp                   latest                         bd7b6e158f3e        455MB
docker-teamd                  202106.17-4536f35f2_Internal   e4d76906805a        425MB
docker-teamd                  latest                         e4d76906805a        425MB
docker-lldp                   202106.17-4536f35f2_Internal   ac6ef41594e4        453MB
docker-lldp                   latest                         ac6ef41594e4        453MB
docker-database               202106.17-4536f35f2_Internal   0c00992e77ea        413MB
docker-database               latest                         0c00992e77ea        413MB
docker-router-advertiser      202106.17-4536f35f2_Internal   5b4d531fb734        413MB
docker-router-advertiser      latest                         5b4d531fb734        413MB
docker-orchagent              202106.17-4536f35f2_Internal   95b72c53b4ee        443MB
docker-orchagent              latest                         95b72c53b4ee        443MB
docker-nat                    202106.17-4536f35f2_Internal   95d2ccfbf36c        428MB
docker-nat                    latest                         95d2ccfbf36c        428MB
docker-macsec                 202106.17-4536f35f2_Internal   2d103b4b2d0f        428MB
docker-macsec                 latest                         2d103b4b2d0f        428MB
docker-sonic-telemetry        202106.17-4536f35f2_Internal   563dc1a9b412        502MB
docker-sonic-telemetry        latest                         563dc1a9b412        502MB
docker-sonic-mgmt-framework   202106.17-4536f35f2_Internal   58b3da73944d        570MB
docker-sonic-mgmt-framework   latest                         58b3da73944d        570MB
docker-fpm-frr                202106.17-4536f35f2_Internal   7258659b2c3d        443MB
docker-fpm-frr                latest                         7258659b2c3d        443MB
docker-sflow                  202106.17-4536f35f2_Internal   79306ace497b        426MB
docker-sflow                  latest                         79306ace497b        426MB


@dgsudharsan
Collaborator Author

@rajendra-dendukuri Can you please look into this?

@lguohan
Collaborator

lguohan commented Oct 1, 2021

I agree with the analysis. kdump should not save to the startup config directly.

@moshemos

moshemos commented Oct 6, 2021

@lguohan Can you give an update on the status? What are the next steps here? We need this fix for 202106.

lguohan pushed a commit to sonic-net/sonic-utilities that referenced this issue Oct 7, 2021
)

Fix sonic-net/sonic-buildimage#8888

Warn user to save the config instead of saving the kdump config
in config_db.json under the covers. Having more than one actors
operate on config_db.json can result in an exception.
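
The approach described in this commit message could be sketched roughly as follows. This is an illustration only, not the code from the referenced pull request: `set_kdump_enabled` is a hypothetical function, a plain dictionary stands in for the config DB, and the warning text is made up.

```python
def set_kdump_enabled(config_db, enabled, echo=print):
    """Hypothetical command body: update the DB only, never config_db.json."""
    kdump_config = config_db.setdefault("KDUMP", {}).setdefault("config", {})
    kdump_config["enabled"] = "true" if enabled else "false"
    # Persistence is left to the single writer, 'config save'.
    echo("KDUMP configuration changed; run 'config save' to persist it.")


messages = []
db = {}  # stand-in for the config DB
set_kdump_enabled(db, True, echo=messages.append)
```

Because the command no longer touches the file, config save remains the only writer of config_db.json and the race described above cannot occur through this path.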
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this issue Aug 3, 2023
…859)