Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server): Multiprocess migration and db_cleanup #4175

Merged

Conversation

whisperity
Copy link
Contributor

Closes #4171.

This patch allows the server to run the schema migration and db_cleanup actions in parallel across the configured products, to speed up these operations. Previously, the migration and cleanup ran in a sequential job on all products, forfeiting the benefit of having multiple CPUs on a server. As each product is a separate database and there must not be any shared resource between products, it is safe to run each migration in a separate process, in parallel.

Migrations and cleanup is prepared for scheduling in a deterministic order, ORDER BY endpoint. (Previously it was done by the ROWID.)

The connection to the "config" database is released early on to prevent a timeout on the unused and not changing configuration database from crashing the server during a longer-running product migration.

Added a facility to create beautiful logging output in migration scripts and for the cleanup routines. This log output will now necessarily include the product's endpoint identifier, as the log messages are no longer sequential.

@whisperity
Copy link
Contributor Author

Example output of a 2-product server's startup

From the import of a database created under 6.22 to 6.23 + this patch.

[INFO 2024-02-28 17:18] - Checking configuration database ...
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
[WARNING 2024-02-28 17:18] - Please note after migration only newer CodeChecker versions can be used to start the server!
[WARNING 2024-02-28 17:18] - It is advised to make a full backup of your configuration database!
[WARNING 2024-02-28 17:18] - /CodeChecker/config.sqlite
Do you want to upgrade to the new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - Upgrading schema ...
[INFO][2024-02-28 17:18:51] {migration/config} [config] - 000000000001:25 - Hello 'config' migration script!
[INFO 2024-02-28 17:18] - Database is up to date.
[INFO 2024-02-28 17:18] - Status of products:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Product endpoint | Database status                                | Database location           | Schema version in the database | Schema version in the package
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Default          | Database schema mismatch! Possible to upgrade. | /CodeChecker/Default.sqlite | 9d956a0fae8d                   | 000000000000
Test             | Database schema mismatch! Possible to upgrade. | /CodeChecker/Test.sqlite    | 9d956a0fae8d                   | 000000000000
----------------------------------------------------------------------------------------------------------------------------------------------------------------
[WARNING 2024-02-28 17:18] - Multiple products can be upgraded, make a backup!
[INFO 2024-02-28 17:18] - Preparing schema upgrade for '<all products>'
[WARNING 2024-02-28 17:18] - Please note after migration only newer CodeChecker versions can be used to start the server!
[WARNING 2024-02-28 17:18] - It is advised to make a full backup of your run databases.
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Checking: Default
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
Do you want to upgrade 'Default' to new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - [Default] Schema will be upgraded...
[INFO 2024-02-28 17:18] - Checking: Test
[INFO 2024-02-28 17:18] - Database schema mismatch! Possible to upgrade.
Do you want to upgrade 'Test' to new schema? Y(es)/n(o) y
[INFO 2024-02-28 17:18] - [Test] Schema will be upgraded...
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Initialising/upgrading products using 2 concurrent jobs...
[INFO 2024-02-28 17:18] - [Default] Upgrading...
[INFO 2024-02-28 17:18] - [Test] Upgrading...
[INFO][2024-02-28 17:18:53] {migration/report} [Default] - 000000000000:25 - Hello 'product' migration script!
[INFO][2024-02-28 17:18:53] {migration/report} [Test] - 000000000000:25 - Hello 'product' migration script!
[INFO 2024-02-28 17:18] - [Test] Done upgrading. Database is up to date.
[INFO 2024-02-28 17:18] - [Default] Done upgrading. Database is up to date.
[INFO 2024-02-28 17:18] - Schema initialisation(s)/upgrade(s) executed successfully.
[INFO 2024-02-28 17:18] - ========================
[INFO 2024-02-28 17:18] - Status of products:
-----------------------------------------------------------------------------------------------------------------------------------------
Product endpoint | Database status         | Database location           | Schema version in the database | Schema version in the package
-----------------------------------------------------------------------------------------------------------------------------------------
Default          | Database is up to date. | /CodeChecker/Default.sqlite | 000000000000 (up to date)      | 000000000000
Test             | Database is up to date. | /CodeChecker/Test.sqlite    | 000000000000 (up to date)      | 000000000000
-----------------------------------------------------------------------------------------------------------------------------------------
[INFO 2024-02-28 17:18] - Performing database cleanup using 2 concurrent jobs...
[INFO 2024-02-28 17:18] - [Default] Garbage collection for started...
[INFO 2024-02-28 17:18] - [Test] Garbage collection for started...
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-empty-catch' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-empty-catch' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-incorrect-enable-if' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-incorrect-enable-if' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-non-zero-enum-to-bool-conversion' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-non-zero-enum-to-bool-conversion' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-optional-value-conversion' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-optional-value-conversion' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'bugprone-use-after-move' checker from CRITICAL to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'bugprone-use-after-move' checker from CRITICAL to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'core.BitwiseShift' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'core.BitwiseShift' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-misleading-capture-default-by-value' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-misleading-capture-default-by-value' checker from UNSPECIFIED to LOW
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-noexcept-move-operations' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-noexcept-move-operations' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'cppcoreguidelines-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'cppcoreguidelines-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'performance-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'performance-noexcept-swap' checker from UNSPECIFIED to MEDIUM
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'readability-reference-to-constructed-temporary' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'readability-reference-to-constructed-temporary' checker from UNSPECIFIED to STYLE
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'unix.Errno' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'unix.Errno' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Test] Upgrading severity level of 'unix.StdCLibraryFunctions' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:18] - [Default] Upgrading severity level of 'unix.StdCLibraryFunctions' checker from UNSPECIFIED to HIGH
[INFO 2024-02-28 17:19] - [Test] Garbage collection finished.
[INFO 2024-02-28 17:19] - [Default] Garbage collection finished.
[INFO 2024-02-28 17:19] - Server waiting for client requests on [[::]:8001]

@whisperity whisperity force-pushed the feat/server/multiprocess-migrations branch 4 times, most recently from a69b946 to 4e9001a Compare March 2, 2024 18:24
interactive=False,
env=environ)
if init_instead_of_upgrade:
LOG.info("[%s] Initialising...", endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are pretty firmly embedded in US english, certainly for "initialization".

run_migrations_offline()
else:
run_migrations_online()
raise NotImplementedError(f"Offline '{schema}' migration is not possible!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offline migration has been deleted. I'm not sure, what it was, but if it's removed, do we need this NotImplementedError? I mean, will it ever be implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline migrations are an official Alembic feature (alembic -c [config file] -n [schema] upgrade/downgrade [from-revision]:[to-revision] --sql) in which it does not actually perform migration on an actual database, but generates an SQL script which, when fed to a real database, does the migration. This is great... as long as your migrations scripts only ever touch schema (DDL) but become impossible when the migration scripts want to do something with real data.
If we had been pedantic enough in the past, the removal of this function should have happened at the first moment a schema migration script was introduced which actually queries the database.

Because we have scripts that require live (online) migration, it is impossible to implement offline migrations going forward. No, I do not think we ever used or documented this feature ourselves. However, the lack of the run_migrations_offline() function (which contents were the automatically generated one from the very beginning of the project) does not prevent the user from exercising the --sql flag on alembic. This means that we have no way of actually preventing this bad workflow prior to this moment in this script.

We are in the post-truth AI era where people will eventually go and do unfathomably misguided things, after which thoughtless issues will be reported, which makes our lives harder when we have to decipher bad-quality exception traces, see below. And even if misguided for this specific project, the "industry standard" way of using schema migrations in the "web world" is that you do that with a separate command that looks somewhat or completely unlike your normal commands: either calling alembic ... upgrade directly; or through a wrapper, such as manage.py migrate/sqlmigrate or flask db upgrade. The fact that CodeChecker does this (and even enforces this, but gracefully) within the normal workflow makes us the odd one out.


Without this change, an attempt at running an offline migration results in a harder-to-understand exception somewhere deep inside the migration scripts, pointing to the location where the first actual database access would need to happen, but can not happen, because offline mode was requested. I think this exception popping out is scary because it talks about things that essentially "point inside the standard library", so to say, and the end of the exception claims that a library type lacks a very much keyworded, special function???

(Also note that some SQL statements still get emitted, before the error happens.)

PRAGMA foreign_keys=OFF;

DROP INDEX ix_reports_checker_id;

ALTER TABLE reports RENAME checker_id TO checker_id_lookup;

PRAGMA foreign_keys=ON;

ALTER TABLE reports RENAME "analyzer_name_MOVED_TO_checkers" TO analyzer_name;

ALTER TABLE reports RENAME "checker_name_MOVED_TO_checkers" TO checker_id;

ALTER TABLE reports RENAME severity_moved_to_checkers TO severity;

ALTER TABLE reports RENAME "checker_cat_UNUSED" TO checker_cat;

ALTER TABLE reports RENAME "bug_type_UNUSED" TO bug_type;

Traceback (most recent call last):
  File "./venv_dev/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main
    CommandLine(prog=prog).main(argv=argv)
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main
    self.run_cmd(cfg, options)
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd
    fn(
  File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade
    script.run_env()
  File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
    util.load_python_file(self.dir, "env.py")
  File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    module = load_module_py(module_id, path)
  File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./web/server/codechecker_server/migrations/report/env.py", line 126, in <module>
    run_migrations_offline()
  File "./web/server/codechecker_server/migrations/report/env.py", line 99, in run_migrations_offline
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
    self.get_context().run_migrations(**kw)
  File "./venv_dev/lib/python3.8/site-packages/alembic/runtime/migration.py", line 560, in run_migrations
    step.migration_fn(**kw)
  File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 563, in downgrade
    downgrade_reports()
  File "./web/server/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 472, in downgrade_reports
    Base.prepare(conn, reflect=True)
  File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/ext/automap.py", line 786, in prepare
    cls.metadata.reflect(
  File "./venv_dev/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 4558, in reflect
    with bind.connect() as conn:
AttributeError: __enter__

With this change, the exception is thrown earlier (before any SQL statement printouts could happen), the stack trace is much more trivial, and even without a stack trace, the error message is plain in telling us what is going wrong.

Traceback (most recent call last):
  File "./venv_dev/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in main
    CommandLine(prog=prog).main(argv=argv)
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 553, in main
    self.run_cmd(cfg, options)
  File "./venv_dev/lib/python3.8/site-packages/alembic/config.py", line 530, in run_cmd
    fn(
  File "./venv_dev/lib/python3.8/site-packages/alembic/command.py", line 335, in downgrade
    script.run_env()
  File "./venv_dev/lib/python3.8/site-packages/alembic/script/base.py", line 490, in run_env
    util.load_python_file(self.dir, "env.py")
  File "./venv_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
    module = load_module_py(module_id, path)
  File "./venv_dev/lib/python3.8/site-packages/alembic/util/compat.py", line 182, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "./web/server/codechecker_server/migrations/report/env.py", line 60, in <module>
    raise NotImplementedError(f"Offline '{schema}' migration is not possible!")
NotImplementedError: Offline 'report' migration is not possible!

This patch allows the server to run the schema migration and
`db_cleanup` actions in parallel across the configured products, to
speed up these operations. Previously, the migration and cleanup ran in
a sequential job on all products, forfeiting the benefit of having
multiple CPUs on a server. As each product is a separate database and
there must not be any shared resource between products, it is safe to
run each migration in a separate process, in parallel.

Migrations and cleanup is prepared for scheduling in a deterministic
order, `ORDER BY endpoint`. (Previously it was done by the `ROWID`.)

The connection to the "config" database is released early on to
prevent a timeout on the unused and not changing configuration
database from crashing the server during a longer-running product
migration.

Added a facility to create beautiful logging output in migration
scripts and for the cleanup routines. This log output will now
necessarily include the product's `endpoint` identifier, as the log
messages are no longer sequential.
@whisperity whisperity force-pushed the feat/server/multiprocess-migrations branch from 4e9001a to 1a0be78 Compare March 25, 2024 11:21
@whisperity
Copy link
Contributor Author

@bruntib I rebased the patch, GitHub misrenders the changes from previous patch snapshot. I did the following changes:

  • Fixed a typo in db_cleanup
  • Refactored the getting of the to-be-migrated product list. I realised that "not keeping alive the CONFIG connection during PRODUCT migration" does not mean "close and reopen the CONFIG connection for every product". I made it so that one CONFIG connection is opened, the products are checked, the list of to-migrate products is created, and then the CONFIG connection is closed. So no timeouts, but the entire migration process might get a little bit faster... (It's still very fast nevertheless! 😁)

Copy link
Contributor

@bruntib bruntib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much, this is a great work and a huge improvement!

@bruntib bruntib merged commit 5228362 into Ericsson:master Apr 16, 2024
7 of 8 checks passed
@whisperity whisperity deleted the feat/server/multiprocess-migrations branch April 30, 2024 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[server] Multiprocessed schema upgrades for -j products in parallel
3 participants