Fix for fast/cold-boot: call db_migrator only after old config is loaded #14933
Tests for this PR are in progress. I am waiting for the PR tests to generate a build that I can use for testing.
Tested on a physical platform. Rebooted from 202012 to the master image (new install): db_migrator is now delayed during database service bring-up:
```bash
# Perform DB schema migration after loading backup config from previous image
do_db_migration()
{
    if [[ -x /usr/local/bin/db_migrator.py ]]; then
        # (excerpt from the diff under review; the rest of the function is not shown)
```
@judyjoseph, can you please review this part of the change from a multi-DB namespace point of view?
With this change, db_migrator will always be called. Earlier, you skipped migration as part of #4477.
Why would db_migrator be skipped for multi-DB?
@vaibhavhd, db_migrator should run fine for multi-ASIC config_db's too. I see the VERSIONS table with the DATABASE version in the multi-ASIC config_db's, so it should work just as it does for a single-ASIC config_db.
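One way to confirm this on a multi-ASIC box is to read the schema version from each CONFIG_DB. The following is a minimal sketch, assuming that db_migrator records the version in key `VERSIONS|DATABASE`, field `VERSION`, and that `sonic-db-cli -n <namespace>` targets an ASIC namespace; the `show_db_versions` helper and the `DB_CLI` override are hypothetical, not part of SONiC.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DB schema version recorded in each CONFIG_DB.
# Assumptions: version is stored at key "VERSIONS|DATABASE", field "VERSION",
# and `sonic-db-cli -n <ns>` selects an ASIC namespace. DB_CLI may be
# overridden (for example with a stub function) when testing off-box.
DB_CLI=${DB_CLI:-sonic-db-cli}

# Prints "host <version>" followed by "<namespace> <version>" per argument.
show_db_versions() {
    local ns version
    version=$("$DB_CLI" CONFIG_DB HGET "VERSIONS|DATABASE" VERSION)
    echo "host ${version:-<none>}"
    for ns in "$@"; do
        version=$("$DB_CLI" -n "$ns" CONFIG_DB HGET "VERSIONS|DATABASE" VERSION)
        echo "$ns ${version:-<none>}"
    done
}
```

For example, `show_db_versions asic0 asic1` prints one line for the host DB and one per listed namespace; a `<none>` entry would indicate that no version was recorded in that DB.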
Thanks, Judy! @yxieca, we should be good to go ahead with this change. Can you please re-review?
…ded (#14933) Why I did it: Fix the issue where db_migrator is called before the DB is loaded with config. This leads to db_migrator finding nothing and proceeding to incorrectly migrate every missing config. This is not expected: migration should happen after the old config is loaded, and only new schema changes need migration. Since the DB is empty when the migrator is called, db_migrator fails when some APIs return None. The reason for the incorrect call is that the database service starts db_migrator as part of its startup sequence, while the config-setup service loads data from the old config/minigraph. Because config-setup has Requires=database.service, it starts only when the database service has started, and the database service is started only when db_migrator has completed. Fixed by: checking whether this is the first boot via the pending_config_migration flag. If pending_config_migration is enabled, db_migrator is not called as part of database service startup. The database service then starts, which triggers the config-setup service to start, and db_migrator is called after config-setup loads the old config/minigraph.
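The ordering fix described in this commit message can be modelled in a few lines of bash. This is a sketch under stated assumptions, not the actual database.sh or config-setup code: `PENDING_CONFIG_MIGRATION` stands in for the pending_config_migration flag, an associative array stands in for CONFIG_DB, and `run_db_migrator`, `database_start`, and `config_setup_start` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: models the boot ordering, not the real SONiC scripts.
declare -A CONFIG_DB=()   # in-memory stand-in for CONFIG_DB
MIGRATED_KEYS=-1          # how many keys the migrator saw (-1 = never ran)

# Stand-in for db_migrator.py: only records how much config it found.
run_db_migrator() {
    MIGRATED_KEYS=${#CONFIG_DB[@]}
}

# database service startup: on first boot the old config is not loaded yet,
# so migration is deferred instead of running against an empty DB.
database_start() {
    if [[ "${PENDING_CONFIG_MIGRATION:-0}" -eq 1 ]]; then
        echo "Delaying db_migrator until config migration is over"
    else
        run_db_migrator
    fi
}

# config-setup starts only after database_start returns (Requires=database.service).
# It loads the config carried over from the old image, then runs the deferred migration.
config_setup_start() {
    CONFIG_DB["DEVICE_METADATA|localhost"]="loaded-from-old-image"
    if [[ "${PENDING_CONFIG_MIGRATION:-0}" -eq 1 ]]; then
        run_db_migrator
        PENDING_CONFIG_MIGRATION=0
    fi
}

PENDING_CONFIG_MIGRATION=1
database_start
config_setup_start
echo "db_migrator saw ${MIGRATED_KEYS} key(s)"
```

Run as-is, it prints the delay message during database startup and then reports that the migrator saw the one key loaded by config-setup. With `PENDING_CONFIG_MIGRATION=0` and an empty DB, `database_start` migrates immediately and sees zero keys, which mirrors what happened on first boot before this change.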
Cherry-pick PR to 202205: #15316
Cherry-pick PR to 202211: #15317
…ld config is loaded (sonic-net#14933)" (sonic-net#15464)" This reverts commit 9649a44.
…ld config is loaded (sonic-net#14933)" (sonic-net#15464)" (sonic-net#15684) This reverts commit 9649a44.
…sku is None (#2896) Cherry-pick of #2821. MSFT ADO: 17972494

Fix errors in db migration when hwsku is not detected. This PR adds a better-error-handling fix for the issue that is fixed by sonic-net/sonic-buildimage#14933:

```
May 2 20:35:04 sonic database.sh[649]: Creating new database container
May 2 20:35:04 sonic database.sh[663]: 99e8edba01ed0c7581f0d61dd2fa78374fa4f23e636a957004dd03a6f68eea86
May 2 20:35:04 sonic root: Starting database service...
May 2 20:35:06 sonic database.sh[690]: database
May 2 20:35:10 sonic database.sh[926]: True
May 2 20:35:10 sonic database.sh[928]: File "/usr/local/bin/db_migrator.py", line 714, in common_migration_ops
May 2 20:35:10 sonic database.sh[928]: File "/usr/local/bin/db_migrator.py", line 741, in migrate
May 2 20:35:10 sonic database.sh[928]: File "/usr/local/bin/db_migrator.py", line 782, in main
May 2 20:35:10 sonic database.sh[928]: Traceback (most recent call last):
May 2 20:35:10 sonic database.sh[928]: TypeError: argument of type 'NoneType' is not iterable
May 2 20:35:10 sonic database.sh[928]: argument of type 'NoneType' is not iterable
May 2 20:35:10 sonic database.sh[928]: optional arguments:
May 2 20:35:10 sonic database.sh[928]: usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version]
May 2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'COPP_TABLE:*'!
May 2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'INTF_TABLE:*'!
May 2 20:35:10 sonic db_migrator: :- operator(): Key 'BUFFER_MAX_PARAM_TABLE|global' field 'mmu_size' unavailable in database 'STATE_DB'
May 2 20:35:10 sonic db_migrator: :- operator(): Key 'WARM_RESTART_ENABLE_TABLE|system' field 'enable' unavailable in database 'STATE_DB'
May 2 20:35:10 sonic db_migrator: Caught exception: argument of type 'NoneType' is not iterable
May 2 20:35:11 sonic config-setup[935]: Copying SONiC configuration minigraph.xml ...
May 2 20:35:11 sonic config-setup[935]: Reloading minigraph...
May 2 20:35:11 sonic config-setup[935]: Use minigraph.xml from old system...
May 2 20:35:11 sonic root: Started database service...
```

How I did it: Convert hwsku's type to str before checking for a substring. Add error logs when hwsku and ASIC type information is not obtained.

How to verify it: Tested on a physical device.
…mboot (#15685) (#16217) Cherry-pick of #15685. MSFT ADO: 24274591

Why I did it: two changes.

1 Fix a day-1 issue where the check that waits until `CONFIG_DB_INITIALIZED` is set is incorrect. The same incorrect logic is used in multiple places. The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) always passes, irrespective of the result of the GET operation:

```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
```

Fix this logic by checking that the value of the flag is "1":

```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```

This gap in logic was highlighted when another fix was merged: #14933. The issue being fixed here caused warmboot-finalizer to not wait until CONFIG_DB is initialized.

2 Set and unset `CONFIG_DB_INITIALIZED` for the warm-reboot case. Currently, during warm shutdown, the value of `CONFIG_DB_INITIALIZED` is stored in the redis DB backup and is restored when the dump is loaded during warm recovery. So the value of `CONFIG_DB_INITIALIZED` does not depend on CONFIG_DB's state; it remains whatever it was before the reboot. Fix this by setting `CONFIG_DB_INITIALIZED` to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.

Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
…g for warmboot (#16225) Cherry-pick of #15685. MSFT ADO: 24274591

#### Why I did it

Two changes:

### 1 Fix a day-1 issue where the check that waits until `CONFIG_DB_INITIALIZED` is set is incorrect

There are multiple places where the same incorrect logic is used. The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) always passes, irrespective of the result of the GET operation.

```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
```

Fix this logic by checking that the value of the flag is "1".

```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```

This gap in logic was highlighted when another fix was merged: #14933. The issue being fixed here caused warmboot-finalizer to not wait until CONFIG_DB is initialized.

### 2 Set and unset `CONFIG_DB_INITIALIZED` for the warm-reboot case

Currently, during warm shutdown, the value of `CONFIG_DB_INITIALIZED` is stored in the redis DB backup. It is restored when the dump is loaded during warm recovery. So the value of `CONFIG_DB_INITIALIZED` does not depend on CONFIG_DB's state; it remains whatever it was before the reboot. Fix this by setting `CONFIG_DB_INITIALIZED` to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
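The difference between the two checks can be reproduced without a switch. In this sketch, `get_config_db_initialized` is a hypothetical stub standing in for `sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"`; the point is that `[[ STRING ]]` only tests for a non-empty string, so "0" passes just like "1".

```bash
#!/usr/bin/env bash
# Hypothetical stub standing in for: sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
FLAG_VALUE="0"
get_config_db_initialized() {
    echo "$FLAG_VALUE"
}

# Original check: true for ANY non-empty output, including "0".
old_check() {
    [[ $(get_config_db_initialized) ]]
}

# Fixed check: true only when the value is numerically 1.
new_check() {
    [[ $(get_config_db_initialized) -eq 1 ]]
}

for v in 0 1; do
    FLAG_VALUE=$v
    old_check && old=pass || old=wait
    new_check && new=pass || new=wait
    echo "flag=$v old=$old new=$new"
done
```

Expected output is `flag=0 old=pass new=wait` followed by `flag=1 old=pass new=pass`. An empty reply (key not yet present) also keeps the fixed check waiting, because bash evaluates an empty operand of `-eq` as 0.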
…g is loaded (sonic-net#14933)" (sonic-net#15464) This reverts commit 02b1783. Reverts sonic-net#14933. The earlier commit caused a race condition that particularly broke cross-branch warm upgrade. The issue happens when db_migrator is still migrating the DB while the finalizer is checking the DB for the list of components to reconcile. If migration is not complete, the finalizer gets an empty list to wait for. As a result, the finalizer concludes warmboot (deletes the system-wide warmboot flag) and causes all the services to do a cold restart. ADO: 24274591
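The race in this revert message, and the `CONFIG_DB_INITIALIZED` handshake described in the cherry-picks of #15685 (#16217, #16225), can be sketched as follows. Everything here is a hypothetical in-memory model: `DB` stands in for CONFIG_DB, and the `COMPONENTS` key and the function names are illustrative, not the real finalizer or db_migrator interfaces.

```bash
#!/usr/bin/env bash
# Hypothetical model of the warm-reboot handshake; not the real SONiC scripts.
declare -A DB=()   # in-memory stand-in for CONFIG_DB

# Loading the pre-reboot dump brings back the stale flag value, so the flag
# is reset to 0 immediately, before anything else reads the DB.
load_warm_dump() {
    DB[CONFIG_DB_INITIALIZED]=1   # value carried over in the pre-reboot dump
    DB[CONFIG_DB_INITIALIZED]=0   # reset as soon as the DB is loaded
}

# db_migrator stand-in: writes migrated data, then marks the DB as ready.
run_migration() {
    DB[COMPONENTS]="swss bgp teamd"
    DB[CONFIG_DB_INITIALIZED]=1
}

# Finalizer guard: refuse to read the component list until the flag is 1,
# instead of treating a not-yet-migrated (empty) list as "nothing to wait for".
finalizer_read_components() {
    if [[ ${DB[CONFIG_DB_INITIALIZED]:-0} -ne 1 ]]; then
        echo "not ready"
        return 1
    fi
    echo "${DB[COMPONENTS]}"
}

load_warm_dump
finalizer_read_components || echo "retry later"
run_migration
finalizer_read_components
```

Without the guard, the first call would return an empty component list, which is the condition under which the finalizer concluded warmboot too early.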
Related work items: sonic-net#94, sonic-net#13789, sonic-net#14149, sonic-net#14515, sonic-net#14788, sonic-net#14922, sonic-net#14933, sonic-net#15284, sonic-net#15383, sonic-net#15464, sonic-net#15519, sonic-net#15521, sonic-net#15575, sonic-net#15636, sonic-net#15652, sonic-net#15684, sonic-net#15708, sonic-net#15725, sonic-net#15739, sonic-net#15755, sonic-net#15756, sonic-net#15757
Why I did it
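Fix the issue where db_migrator is called before the DB is loaded with config. This leads to db_migrator finding nothing and proceeding to incorrectly migrate every missing config.
This is not expected: migration should happen after the old config is loaded, and only new schema changes need migration. Since the DB is empty when the migrator is called, db_migrator fails when some APIs return None.
The reason for the incorrect call is that the database service starts db_migrator as part of its startup sequence, while the config-setup service loads data from the old config/minigraph. Because config-setup has `Requires=database.service`, it starts only after the database service has started, and the database service is started only when db_migrator has completed.
Fixed by: checking whether this is the first boot via the `pending_config_migration` flag. If `pending_config_migration` is enabled, db_migrator is not called as part of database service startup. The database service then starts, which triggers the config-setup service to start, and db_migrator is called after config-setup loads the old config/minigraph.
Error that's being fixed: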
Work item tracking
How I did it
How to verify it
Tested on a physical platform. Rebooted from 202012 to the master image (new install):
db_migrator is now delayed during database service bring-up: `Delaying db_migrator until config migration is over`
db_migrator is now called after the config-setup service loads the DB. The issue of migrating on an empty DB is fixed, as evidenced by the previous error (hwsku being None) no longer appearing:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)