feat(server): Multiprocess migration and `db_cleanup` #4175

whisperity · 2024-02-28T16:15:24Z

Closes #4171.

This patch allows the server to run the schema migration and db_cleanup actions in parallel across the configured products, to speed up these operations. Previously, the migration and cleanup ran in a sequential job on all products, forfeiting the benefit of having multiple CPUs on a server. As each product is a separate database and there must not be any shared resource between products, it is safe to run each migration in a separate process, in parallel.

Migrations and cleanup is prepared for scheduling in a deterministic order, ORDER BY endpoint. (Previously it was done by the ROWID.)

The connection to the "config" database is released early on to prevent a timeout on the unused and not changing configuration database from crashing the server during a longer-running product migration.

Added a facility to create beautiful logging output in migration scripts and for the cleanup routines. This log output will now necessarily include the product's endpoint identifier, as the log messages are no longer sequential.

whisperity · 2024-02-28T16:25:14Z

Example output of a 2-product server's startup

From the import of a database created under 6.22 to 6.23 + this patch.

[INFO 2024-02-28 17:18] - Checking configuration database ...
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
[WARNING 2024-02-28 17:18] - Please note after migration only newer CodeChecker versions can be used to start the server!
[WARNING 2024-02-28 17:18] - It is advised to make a full backup of your configuration database!
[WARNING 2024-02-28 17:18] - /CodeChecker/config.sqlite
Do you want to upgrade to the new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - Upgrading schema ...
[INFO][2024-02-28 17:18:51] {migration/config} [config] - 000000000001:25 - Hello 'config' migration script!
[INFO 2024-02-28 17:18] - Database is up to date.
[INFO 2024-02-28 17:18] - Status of products:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Product endpoint | Database status                                | Database location           | Schema version in the database | Schema version in the package
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Default          | Database schema mismatch! Possible to upgrade. | /CodeChecker/Default.sqlite | 9d956a0fae8d                   | 000000000000
Test             | Database schema mismatch! Possible to upgrade. | /CodeChecker/Test.sqlite    | 9d956a0fae8d                   | 000000000000
----------------------------------------------------------------------------------------------------------------------------------------------------------------
[WARNING 2024-02-28 17:18] - Multiple products can be upgraded, make a backup!
[INFO 2024-02-28 17:18] - Preparing schema upgrade for '<all products>'
[WARNING 2024-02-28 17:18] - Please note after migration only newer CodeChecker versions can be used to start the server!
[WARNING 2024-02-28 17:18] - It is advised to make a full backup of your run databases.
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Checking: Default
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
Do you want to upgrade 'Default' to new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - [Default] Schema will be upgraded...
[INFO 2024-02-28 17:18] - Checking: Test
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
Do you want to upgrade 'Test' to new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - [Test] Schema will be upgraded...
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Initialising/upgrading products using 2 concurrent jobs...
[INFO 2024-02-28 17:18] - [Default] Upgrading...
[INFO 2024-02-28 17:18] - [Test] Upgrading...
[INFO][2024-02-28 17:18:53] {migration/report} [Default] - 000000000000:25 - Hello 'product' migration script!
[INFO][2024-02-28 17:18:53] {migration/report} [Test] - 000000000000:25 - Hello 'product' migration script!
[INFO 2024-02-28 17:18] - [Test] Done upgrading. Database is up to date.
[INFO 2024-02-28 17:18] - [Default] Done upgrading. Database is up to date.
[INFO 2024-02-28 17:18] - Schema initialisation(s)/upgrade(s) executed successfully.
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Status of products:
-----------------------------------------------------------------------------------------------------------------------------------------
Product endpoint | Database status         | Database location           | Schema version in the database | Schema version in the package
-----------------------------------------------------------------------------------------------------------------------------------------
Default          | Database is up to date. | /CodeChecker/Default.sqlite | 000000000000 (up to date)      | 000000000000
Test             | Database is up to date. | /CodeChecker/Test.sqlite    | 000000000000 (up to date)      | 000000000000
-----------------------------------------------------------------------------------------------------------------------------------------
[INFO 2024-02-28 17:18] - Performing database cleanup using 2 concurrent jobs...
[INFO 2024-02-28 17:18] - [Default] Garbage collection for started...
[INFO 2024-02-28 17:18] - [Test] Garbage collection for started...
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-empty-catch' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-empty-catch' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-incorrect-enable-if' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-incorrect-enable-if' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-non-zero-enum-to-bool-conversion' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-non-zero-enum-to-bool-conversion' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-optional-value-conversion' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-optional-value-conversion' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-use-after-move' checker from CRITICAL to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-use-after-move' checker from CRITICAL to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'core.BitwiseShift' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'core.BitwiseShift' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-misleading-capture-default-by-value' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-misleading-capture-default-by-value' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-noexcept-move-operations' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-noexcept-move-operations' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'performance-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'performance-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'readability-reference-to-constructed-temporary' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'readability-reference-to-constructed-temporary' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'unix.Errno' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'unix.Errno' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'unix.StdCLibraryFunctions' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'unix.StdCLibraryFunctions' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:19] - [Test] Garbage collection finished.
[INFO 2024-02-28 17:19] - [Default] Garbage collection finished.
[INFO 2024-02-28 17:19] - Server waiting for client requests on [[::]:8001]

Szelethus · 2024-03-12T13:00:54Z

web/server/codechecker_server/cmd/server.py

+                                                       interactive=False,
+                                                       env=environ)
+        if init_instead_of_upgrade:
+            LOG.info("[%s] Initialising...", endpoint)


I think we are pretty firmly embedded in US english, certainly for "initialization".

bruntib · 2024-03-21T22:51:32Z

web/server/codechecker_server/migrations/config/env.py

-    run_migrations_offline()
-else:
-    run_migrations_online()
+    raise NotImplementedError(f"Offline '{schema}' migration is not possible!")


The offline migration has been deleted. I'm not sure, what it was, but if it's removed, do we need this NotImplementedError? I mean, will it ever be implemented?

Offline migrations are an official Alembic feature (alembic -c [config file] -n [schema] upgrade/downgrade [from-revision]:[to-revision] --sql) in which it does not actually perform migration on an actual database, but generates an SQL script which, when fed to a real database, does the migration. This is great... as long as your migrations scripts only ever touch schema (DDL) but become impossible when the migration scripts want to do something with real data.
If we had been pedantic enough in the past, the removal of this function should have happened at the first moment a schema migration script was introduced which actually queries the database.

Because we have scripts that require live (online) migration, it is impossible to implement offline migrations going forward. No, I do not think we ever used or documented this feature ourselves. However, the lack of the run_migrations_offline() function (which contents were the automatically generated one from the very beginning of the project) does not prevent the user from exercising the --sql flag on alembic. This means that we have no way of actually preventing this bad workflow prior to this moment in this script.

We are in the post-truth AI era where people will eventually go and do unfathomably misguided things, after which thoughtless issues will be reported, which makes our lives harder when we have to decipher bad-quality exception traces, see below. And even if misguided for this specific project, the "industry standard" way of using schema migrations in the "web world" is that you do that with a separate command that looks somewhat or completely unlike your normal commands: either calling alembic ... upgrade directly; or through a wrapper, such as manage.py migrate/sqlmigrate or flask db upgrade. The fact that CodeChecker does this (and even enforces this, but gracefully) within the normal workflow makes us the odd one out.

Without this change, an attempt at running an offline migration results in a harder-to-understand exception somewhere deep inside the migration scripts, pointing to the location where the first actual database access would need to happen, but can not happen, because offline mode was requested. I think this exception popping out is scary because it talks about things that essentially "point inside the standard library", so to say, and the end of the exception claims that a library type lacks a very much keyworded, special function???

(Also note that some SQL statements still get emitted, before the error happens.)

PRAGMA foreign_keys=OFF; DROP INDEX ix_reports_checker_id; ALTER TABLE reports RENAME checker_id TO checker_id_lookup; PRAGMA foreign_keys=ON; ALTER TABLE reports RENAME "analyzer_name_MOVED_TO_checkers" TO analyzer_name; ALTER TABLE reports RENAME "checker_name_MOVED_TO_checkers" TO checker_id; ALTER TABLE reports RENAME severity_moved_to_checkers TO severity; ALTER TABLE reports RENAME "checker_cat_UNUSED" TO checker_cat; ALTER TABLE reports RENAME "bug_type_UNUSED" TO bug_type; Traceback (most recent call last): File "./venv_dev/bin/alembic", line 8, in <module> sys.exit(main()) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main CommandLine(prog=prog).main(argv=argv) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main self.run_cmd(cfg, options) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd fn( File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade script.run_env() File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env util.load_python_file(self.dir, "env.py") File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file module = load_module_py(module_id, path) File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py spec.loader.exec_module(module) File "<frozen importlib._bootstrap_external>", line 848, in exec_module File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "./web/server/codechecker_server/migrations/report/env.py", line 126, in <module> run_migrations_offline() File "./web/server/codechecker_server/migrations/report/env.py", line 99, in run_migrations_offline context.run_migrations() File "<string>", line 8, in run_migrations File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/environment.py", line 813, in run_migrations self.get_context().run_migrations(**kw) File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/migration.py", line 560, in run_migrations step.migration_fn(**kw) File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 563, in downgrade downgrade_reports() File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 472, in downgrade_reports Base.prepare(conn, reflect=True) File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/ext/automap.py", line 786, in prepare cls.metadata.reflect( File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 4558, in reflect with bind.connect() as conn: AttributeError: __enter__

With this change, the exception is thrown earlier (before any SQL statement printouts could happen), the stack trace is much more trivial, and even without a stack trace, the error message is plain in telling us what is going wrong.

Traceback (most recent call last): File "./venv_dev/bin/alembic", line 8, in <module> sys.exit(main()) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main CommandLine(prog=prog).main(argv=argv) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main self.run_cmd(cfg, options) File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd fn( File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade script.run_env() File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env util.load_python_file(self.dir, "env.py") File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file module = load_module_py(module_id, path) File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py spec.loader.exec_module(module) File "<frozen importlib._bootstrap_external>", line 848, in exec_module File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "./web/server/codechecker_server/migrations/report/env.py", line 60, in <module> raise NotImplementedError(f"Offline '{schema}' migration is not possible!") NotImplementedError: Offline 'report' migration is not possible!

This patch allows the server to run the schema migration and `db_cleanup` actions in parallel across the configured products, to speed up these operations. Previously, the migration and cleanup ran in a sequential job on all products, forfeiting the benefit of having multiple CPUs on a server. As each product is a separate database and there must not be any shared resource between products, it is safe to run each migration in a separate process, in parallel. Migrations and cleanup is prepared for scheduling in a deterministic order, `ORDER BY endpoint`. (Previously it was done by the `ROWID`.) The connection to the "config" database is released early on to prevent a timeout on the unused and not changing configuration database from crashing the server during a longer-running product migration. Added a facility to create beautiful logging output in migration scripts and for the cleanup routines. This log output will now necessarily include the product's `endpoint` identifier, as the log messages are no longer sequential.

whisperity · 2024-03-25T11:24:54Z

@bruntib I rebased the patch, GitHub misrenders the changes from previous patch snapshot. I did the following changes:

Fixed a typo in db_cleanup
Refactored the getting of the to-be-migrated product list. I realised that "not keeping alive the CONFIG connection during PRODUCT migration" does not mean "close and reopen the CONFIG connection for every product". I made it so that one CONFIG connection is opened, the products are checked, the list of to-migrate products is created, and then the CONFIG connection is closed. So no timeouts, but the entire migration process might get a little bit faster... (It's still very fast nevertheless! 😁)

bruntib

Thanks so much, this is a great work and a huge improvement!

whisperity added enhancement 🌟 database 🗄️ Issues related to the database schema. server 🖥️ performance 🏃 labels Feb 28, 2024

whisperity added this to the release 6.24.0 milestone Feb 28, 2024

whisperity requested review from dkrupp, bruntib and vodorok as code owners February 28, 2024 16:15

whisperity force-pushed the feat/server/multiprocess-migrations branch 4 times, most recently from a69b946 to 4e9001a Compare March 2, 2024 18:24

Szelethus reviewed Mar 12, 2024

View reviewed changes

bruntib reviewed Mar 22, 2024

View reviewed changes

whisperity force-pushed the feat/server/multiprocess-migrations branch from 4e9001a to 1a0be78 Compare March 25, 2024 11:21

bruntib approved these changes Apr 16, 2024

View reviewed changes

bruntib merged commit 5228362 into Ericsson:master Apr 16, 2024
7 of 8 checks passed

whisperity mentioned this pull request Apr 25, 2024

feat(script): Verify the existence of checker config doc_url pages and find appropriate older releases for gone (removed, dealpha, etc.) checkers #4207

Merged

whisperity deleted the feat/server/multiprocess-migrations branch April 30, 2024 12:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(server): Multiprocess migration and `db_cleanup` #4175

feat(server): Multiprocess migration and `db_cleanup` #4175

whisperity commented Feb 28, 2024

whisperity commented Feb 28, 2024

Szelethus Mar 12, 2024

bruntib Mar 21, 2024

whisperity Mar 22, 2024

whisperity commented Mar 25, 2024

bruntib left a comment

feat(server): Multiprocess migration and db_cleanup #4175

feat(server): Multiprocess migration and db_cleanup #4175

Conversation

whisperity commented Feb 28, 2024

whisperity commented Feb 28, 2024

Example output of a 2-product server's startup

Szelethus Mar 12, 2024

Choose a reason for hiding this comment

bruntib Mar 21, 2024

Choose a reason for hiding this comment

whisperity Mar 22, 2024

Choose a reason for hiding this comment

whisperity commented Mar 25, 2024

bruntib left a comment

Choose a reason for hiding this comment

feat(server): Multiprocess migration and `db_cleanup` #4175

feat(server): Multiprocess migration and `db_cleanup` #4175