[multiasic][supervisor] sonic-db-cli crashes at boot up when execute sonic-db-cli PING command in database.sh on multiasic platform #12047

Closed
mlok-nokia opened this issue Sep 12, 2022 · 8 comments
Assignees
Labels
Chassis 🤖 Modular chassis support MSFT Triaged this issue has been triaged

Comments

@mlok-nokia
Contributor

Description

On the supervisor card, sonic-db-cli crashes when database.sh executes the sonic-db-cli PING command. The new implementation of sonic-db-cli handles the PING command by calling initializeGlobalConfig(), which checks the redis#/sonic-db/database_config.json file of every ASIC. These files are not ready yet at this point, which causes the crash and the error log below. This check is used to wait until all databases are ready, so whenever sonic-db-cli tries to access redis#/sonic-db/database_config.json files that do not exist yet, it fails.

Sep  9 23:21:15 sonic sonic-db-cli: :- parseDatabaseConfig: Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep  9 23:21:15 sonic database.sh[4739]: terminate called after throwing an instance of 'std::runtime_error'
Sep  9 23:21:15 sonic database.sh[4739]:   what():  Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep  9 23:21:15 sonic sonic-db-cli: :- initializeGlobalConfig: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json

There are 16 ASICs on this supervisor card. This issue is similar to issue #10105. If the sonic-db-cli behavior has changed, we may need to change waitForAllInstanceDatabaseConfigJsonFilesReady.
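
The idea of waiting for every per-ASIC config file can be sketched as follows. This is only an illustration, not the actual waitForAllInstanceDatabaseConfigJsonFilesReady code: the function name `all_asic_db_configs_ready` and its arguments are hypothetical, and the `redisN/sonic-db/database_config.json` layout is inferred from the paths in the log above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from database.sh.
# Returns 0 only when every ASIC instance has its database_config.json.
#   $1: base directory that holds redis0, redis1, ... (assumed layout)
#   $2: number of ASICs (16 on the supervisor card described above)
all_asic_db_configs_ready() {
    local base_dir=$1 num_asics=$2 i
    for ((i = 0; i < num_asics; i++)); do
        # One missing file means the whole set is not ready yet.
        [[ -f "$base_dir/redis$i/sonic-db/database_config.json" ]] || return 1
    done
    return 0
}
```

A caller could poll `all_asic_db_configs_ready /var/run 16` before issuing any command that loads the global config, so that the config parser never sees a missing file.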

Steps to reproduce the issue:

  1. Reboot the system with the new image.

Describe the results you received:

There are core files and the following error logs:

Sep  9 23:21:15 sonic sonic-db-cli: :- parseDatabaseConfig: Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep  9 23:21:15 sonic database.sh[4739]: terminate called after throwing an instance of 'std::runtime_error'
Sep  9 23:21:15 sonic database.sh[4739]:   what():  Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json
Sep  9 23:21:15 sonic sonic-db-cli: :- initializeGlobalConfig: Sonic database config file syntax error >> Sonic database config file doesn't exist at /var/run/redis/sonic-db/../../redis0/sonic-db/database_config.json

Describe the results you expected:

There should be no core files and no error logs from sonic-db-cli.

Output of show version:

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@zhangyanzhao zhangyanzhao added Triaged this issue has been triaged MSFT labels Sep 14, 2022
@zhangyanzhao
Collaborator

@qiluo-msft can you please help check the sonic-db-cli behavior change and see how to fix it? It looks like a scalability issue. Thanks.

@rlhui rlhui added the Chassis 🤖 Modular chassis support label Sep 14, 2022
@rlhui
Contributor

rlhui commented Sep 14, 2022

@SuvarnaMeenakshi - would we please check if multi-asic vs tests would catch this? Thanks.

@anamehra
Contributor

@abdosi , This is the same as we are observing on 202205 based image.

@SuvarnaMeenakshi
Contributor

parseDatabaseConfig

@SuvarnaMeenakshi - would we please check if multi-asic vs tests would catch this? Thanks.

As this error is seen during boot-up, the multi-asic VS test suite we have today in the PR checker will not be able to flag it.
This might be the case for any boot-up exception seen in syslog.
If there is a reboot test case, a post-reboot exception seen in syslog will be flagged by the log analyzer.

This specific issue is seen only on the supervisor, not on multi-asic VS or multi-asic LC.

@liuh-80
Contributor

liuh-80 commented Oct 24, 2022

Created the following PR to fix this issue:
sonic-net/sonic-swss-common#701

According to the database.sh code, it waits until the database is ready by checking the sonic-db-cli return value; when the database is not ready, sonic-db-cli should return 1:

https://github.com/sonic-net/sonic-buildimage/blob/master/files/build_templates/docker_image_ctl.j2

        until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) &&
                 ($(docker exec -i database$DEV sonic-db-cli PING | grep -c PONG) -gt 0) ]]; do
          sleep 1;
        done

However, because of a code regression in sonic-db-cli, sonic-db-cli crashes instead.
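
The contract the loop relies on can be illustrated with a small, self-contained sketch. It is not the code from docker_image_ctl.j2: `wait_for_pong`, `fake_db_cli`, `POLL_INTERVAL`, and `STATE_FILE` are hypothetical names, the retry limit is an addition (the original loop retries forever), and `fake_db_cli` merely stands in for a sonic-db-cli that cleanly reports "not ready" twice before answering PONG.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a command until its output contains PONG,
# giving up after a fixed number of attempts.
#   $1: maximum attempts; remaining arguments: the command to run
wait_for_pong() {
    local max_attempts=$1; shift
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # A non-zero exit status and a missing PONG both mean "not ready".
        if "$@" 2>/dev/null | grep -q PONG; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}

# Stand-in for sonic-db-cli (assumption for this demo only): it fails
# cleanly with status 1 on the first two calls, then prints PONG.
fake_db_cli() {
    local n
    n=$(cat "$STATE_FILE")
    echo $((n + 1)) > "$STATE_FILE"
    if (( n < 2 )); then
        echo "config file not ready" >&2
        return 1
    fi
    echo PONG
}

# Demo: the third call succeeds, so the wait ends without any crash.
STATE_FILE=$(mktemp)
echo 0 > "$STATE_FILE"
POLL_INTERVAL=0
wait_for_pong 5 fake_db_cli && echo "database ready"
rm -f "$STATE_FILE"
```

The loop only works if a not-ready database produces an ordinary failure. A process that aborts also prints no PONG, so the loop still retries, but each aborted attempt can leave a core file behind.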

liuh-80 added a commit to sonic-net/sonic-swss-common that referenced this issue Oct 31, 2022
…eady issue. (#701)

#### Why I did it
Fix the sonic-db-cli PING/SAVE/FLUSHALL command crash when the database config file is not ready:
sonic-net/sonic-buildimage#12047

#### How I did it
When running the PING/SAVE/FLUSHALL commands, catch the database initialization failure exception and return 1.

#### How to verify it
Pass all existing UT and E2E tests.
Add a new UT to cover the changed code.
Manual test: sonic-db-cli returns 1 when it runs the PING command and can't find the config file:

azureuser@a7f66d2b794c:/sonic/src/sonic-swss-common$ ./sonic-db-cli/sonic-db-cli PING
An exception of type Sonic database config file doesn't exist at /var/run/redis/sonic-db/database_config.json occurred. Arguments:
/sonic/src/sonic-swss-common/sonic-db-cli/.libs/sonic-db-cli PING
azureuser@a7f66d2b794c:/sonic/src/sonic-swss-common$ echo $?
1
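
When verifying a fix like this from a shell, it helps to tell a clean failure (status 1, as above) apart from an abort. The sketch below assumes only the usual shell convention that a process killed by signal N is reported as exit status 128 + N (SIGABRT is signal 6, so an abort shows up as 134). The function name `classify_exit` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe an exit status as seen in $?.
classify_exit() {
    local status=$1
    if (( status == 0 )); then
        echo "ok"
    elif (( status > 128 )); then
        # The shell reports death by signal N as 128 + N.
        echo "killed by signal $((status - 128))"
    else
        echo "failed with status $status"
    fi
}
```

For example, `sonic-db-cli PING; classify_exit $?` would print `failed with status 1` for the behavior shown above and `killed by signal 6` for an abort.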


#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111
- [x] 202205

#### Description for the changelog
Fix the sonic-db-cli PING/SAVE/FLUSHALL command crash when the database config file is not ready.

yxieca pushed a commit to sonic-net/sonic-swss-common that referenced this issue Nov 3, 2022
…eady issue. (#701)
@rlhui
Contributor

rlhui commented Nov 11, 2022

fix available, please confirm if this can be closed @mlok-nokia

@mlok-nokia
Contributor Author

I checked the changes in the 202205 branch. They don't fix all the issues. Although the change avoids the crash and allows the database to load the configuration file, core files are still generated.

admin@supervisor:~$ ls /var/core -al
total 376
drwxr-xr-x 1 root root 4096 Nov 22 22:00 .
drwxr-xr-x 1 root root 4096 Nov 22 20:50 ..
-rw-r--r-- 1 root root 88525 Nov 22 21:42 sonic-db-cli.1669153338.6192.core.gz
-rw-r--r-- 1 root root 93392 Nov 22 21:42 sonic-db-cli.1669153339.6757.core.gz
-rw-r--r-- 1 root root 93413 Nov 22 21:42 sonic-db-cli.1669153339.6886.core.gz
-rw-r--r-- 1 root root 93284 Nov 22 21:42 sonic-db-cli.1669153339.7072.core.gz
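
A quick way to check for this after a reboot is to count the core files left by one program. The sketch below is illustrative only: `count_core_files` is a hypothetical name, and the `<program>.<timestamp>.<pid>.core.gz` naming is taken from the listing above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count compressed core files left by one program.
#   $1: core directory (for example /var/core), $2: program name
count_core_files() {
    local core_dir=$1 program=$2 f count=0
    for f in "$core_dir/$program".*.core.gz; do
        # With no match the glob stays literal, so test that the file exists.
        [[ -e "$f" ]] && count=$((count + 1))
    done
    echo "$count"
}
```

Running `count_core_files /var/core sonic-db-cli` on the supervisor shown above would report 4.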

@rlhui rlhui moved this to In Progress in SONiC Chassis Dec 14, 2022
rlhui pushed a commit that referenced this issue Feb 21, 2023
…ic platform after the c++ implementation of sonic-db-cli (#13207)

Fixes #12047. After the C++ implementation of sonic-db-cli, the sonic-db-cli PING command tries to initialize the global database for all database instances as they start. If the database_config.json files of all instances are not ready yet, it crashes and generates a core file. PR sonic-net/sonic-swss-common#701 only fixes the crash and the process abort.

Signed-off-by: mlok <marty.lok@nokia.com>
@liuh-80
Contributor

liuh-80 commented Feb 23, 2023

@mlok-nokia, now that PR #13207 has been merged, could you please confirm that we can close this issue and #13740?

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this issue Mar 6, 2023
…ic platform after the c++ implementation of sonic-db-cli (sonic-net#13207)
mssonicbld pushed a commit that referenced this issue Mar 7, 2023
…ic platform after the c++ implementation of sonic-db-cli (#13207)
@github-project-automation github-project-automation bot moved this from In Progress to Done in SONiC Chassis Mar 15, 2023
StormLiangMS pushed a commit to StormLiangMS/sonic-buildimage that referenced this issue Mar 28, 2023
Related work items: sonic-net#276, sonic-net#305, sonic-net#332, sonic-net#338, sonic-net#339, sonic-net#1188, sonic-net#1192, sonic-net#1197, sonic-net#1206, sonic-net#1685, sonic-net#1690, sonic-net#1696, sonic-net#1699, sonic-net#1709, sonic-net#1727, sonic-net#1737, sonic-net#1741, sonic-net#1742, sonic-net#2511, sonic-net#2512, sonic-net#2532, sonic-net#2559, sonic-net#2626, sonic-net#2638, sonic-net#2645, sonic-net#2649, sonic-net#2660, sonic-net#2669, sonic-net#2670, sonic-net#2678, sonic-net#10084, sonic-net#11442, sonic-net#11873, sonic-net#12047, sonic-net#12110, sonic-net#12207, sonic-net#12529, sonic-net#12678, sonic-net#13235, sonic-net#13287, sonic-net#13372, sonic-net#13395, sonic-net#13456, sonic-net#13497, sonic-net#13522, sonic-net#13545, sonic-net#13547, sonic-net#13552, sonic-net#13569, sonic-net#13572, sonic-net#13578, sonic-net#13591, sonic-net#13611, sonic-net#13647, sonic-net#13649, sonic-net#13660, sonic-net#13710, sonic-net#13716, sonic-net#13724, sonic-net#13726, sonic-net#13732, sonic-net#13735, sonic-net#13739, sonic-net#13757, sonic-net#13786, sonic-net#13792, sonic-net#13800, sonic-net#13801, sonic-net#13802, sonic-net#13805, sonic-net#13806, sonic-net#13812, sonic-net#13814, sonic-net#13822, sonic-net#13831, sonic-net#13834, sonic-net#13847, sonic-net#13870, sonic-net#13882, sonic-net#13884, sonic-net#13885, sonic-net#13894, sonic-net#13895, sonic-net#13926, sonic-net#13932, sonic-net#13935, sonic-net#13942, sonic-net#13951, sonic-net#13953, sonic-net#13964