[Issue #1270] Modify DB session logic to allow for multiple schemas #1520
Conversation
```diff
@@ -38,11 +38,12 @@ def upgrade():
         nullable=False,
     ),
     sa.PrimaryKeyConstraint("opportunity_id", name=op.f("topportunity_pkey")),
+    schema="api",
```
Note that modifying existing database migrations is generally not a great idea and usually won't do anything. It was necessary here in order to get our local setup working properly. This should have no effect on anything non-local.
```diff
 from src.util import datetime_util

 # Override the default naming of constraints
 # to use suffixes instead:
 # https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names/4108266#4108266
 metadata = MetaData(
     naming_convention={
-        "ix": "%(column_0_label)s_idx",
+        "ix": "%(table_name)s_%(column_0_name)s_idx",
```
This config details how SQLAlchemy + Alembic name indexes (and other constraints) in our database. The convention we used before would include the schema in the generated name, which no longer matched the index names we already had. That mismatch would have required Alembic to drop and recreate our indexes just to make the names line up. I preferred to leave the existing indexes alone and instead adjust the naming convention to not include the schema.
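As a rough illustration of how this kind of naming convention behaves (a minimal sketch with a made-up table and column, not the project's actual models):

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateIndex

# Naming convention as in the diff above: index names are built from the
# table and column names, with no schema component.
metadata = MetaData(
    naming_convention={
        "ix": "%(table_name)s_%(column_0_name)s_idx",
    }
)

# Hypothetical table; schema="api" as described in this PR.
opportunity = Table(
    "opportunity",
    metadata,
    Column("opportunity_id", Integer, primary_key=True),
    Column("category", Integer, index=True),
    schema="api",
)

# The generated DDL qualifies the table with the schema, but the index
# name itself stays schema-free, matching the pre-existing index names.
index = next(iter(opportunity.indexes))
ddl = str(CreateIndex(index))
print(ddl)
```

With the old `%(column_0_label)s` token the schema could leak into the generated name, which is exactly the mismatch described above.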
Hmm, the check for whether our DB migrations are correct is failing. Locally I didn't have issues; it seems to fail when run inside Docker but not outside, which is very weird. Looking into it.
To be honest, I'm not 100% confident that this won't impact prod in some quirky way? But the changes all make sense, especially since we know prod is already using the API schema.
Guess the only way to find out is via testing in dev!
```diff
@@ -47,6 +47,7 @@ def get_conn() -> Any:
     "postgresql+psycopg://",
     pool=conn_pool,
     hide_parameters=db_config.hide_sql_parameter_logs,
+    execution_options={"schema_translate_map": db_config.get_schema_translate_map()},
```
(non-blocking q) What's this for?
Tables have a schema attached to them (in this case `api`), but what happens if you have different schemas depending on the environment? This map lets you tell SQLAlchemy to generate its queries with an alias instead. We only use this for tests, where we create schemas like `test_12345678_api` and run the tests in those to avoid conflicting with any data you might have locally.
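For illustration, here is a minimal self-contained sketch of `schema_translate_map` (using SQLite and a made-up table so it runs anywhere; in the tests described above the map would instead point `api` at a per-run `test_<uuid>_api` schema):

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, insert, select

metadata = MetaData()

# Hypothetical table defined against the "api" schema, as in this PR.
opportunity = Table(
    "opportunity",
    metadata,
    Column("opportunity_id", Integer, primary_key=True),
    schema="api",
)

# The translate map rewrites "api" at execution time. Mapping it to None
# drops the qualifier so plain SQLite can run the example; the tests would
# map it to a schema name like "test_12345678_api" instead.
engine = create_engine(
    "sqlite://",
    execution_options={"schema_translate_map": {"api": None}},
)

metadata.create_all(engine)  # DDL is emitted against the translated schema

with engine.begin() as conn:
    conn.execute(insert(opportunity).values(opportunity_id=1))
    rows = conn.execute(select(opportunity)).all()
print(rows)
```

The table definition never changes; only the schema name it resolves to at execution time does.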
Note that the way we handled this before this change was by never specifying the schema, and instead setting the `search_path` when querying the DB, effectively changing what the queries could see. I removed the `search_path` logic as it actually breaks a lot when trying to use schemas this way (see latest commits).
I've been holding off on merging this - I want to think through whether there is a smaller set of changes I could make first to be potentially less risky. While I think all of this works uneventfully, some of the oddities I ran into getting it working had me second-guessing myself.
Looks good, thanks for implementing! Some minor suggestions in the test code.
`api/tests/lib/db_testing.py` (Outdated)
```python
monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", schema_prefix)
monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")

# To improve test performance, don't check the database connection
# when initializing the DB client.
db_client = db.PostgresDBClient()
```
IMHO a slightly cleaner solution without monkeypatching would be to build a `PostgresDBConfig` directly and then pass it to `PostgresDBClient()`. Something like this:
```diff
-monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", schema_prefix)
-monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")
-# To improve test performance, don't check the database connection
-# when initializing the DB client.
-db_client = db.PostgresDBClient()
+db_config = PostgresDBConfig(schema_prefix_override=schema_prefix)
+monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")
+# To improve test performance, don't check the database connection
+# when initializing the DB client.
+db_client = db.PostgresDBClient(db_config)
```
I can reconfigure the tests to set up the way you're suggesting, but we still require the `monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", db_schema_prefix)` line. We need to set that so the Postgres DB config that the app client initializes also has the prefix, so that it can set up its schema translate map correctly.
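A toy sketch of why the env var is still needed; `read_config` here is a hypothetical stand-in for the real `PostgresDBConfig`, which builds itself from the environment:

```python
import os
import uuid

def read_config() -> dict:
    # Stand-in for a config class that reads its settings from the
    # environment, the way the app's DB config does on startup.
    return {"schema_prefix": os.environ.get("SCHEMA_PREFIX_OVERRIDE", "")}

# What monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", ...) accomplishes:
# any later, independent read of the environment sees the same prefix.
db_schema_prefix = f"test_{uuid.uuid4().int}_"
os.environ["SCHEMA_PREFIX_OVERRIDE"] = db_schema_prefix

app_config = read_config()  # e.g. built independently inside the app client
assert app_config["schema_prefix"] == db_schema_prefix
```

Passing a config object to the test's own `PostgresDBClient` doesn't reach any other component that constructs its config from scratch, which is why the env var has to be set too.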
`api/tests/conftest.py` (Outdated)
```python
@pytest.fixture
def test_api_schema(db_client):
    db_config = PostgresDBConfig()
    return f"{db_config.schema_prefix_override}{Schemas.API}"
```
How about adding a new fixture that creates the prefix string, which both this fixture and `db_client()` depend on? In `db_client()` it would be passed to `create_isolated_db()` as an argument. Suggested implementation:
```diff
-@pytest.fixture
-def test_api_schema(db_client):
-    db_config = PostgresDBConfig()
-    return f"{db_config.schema_prefix_override}{Schemas.API}"
+@pytest.fixture(scope="session")
+def db_schema_prefix():
+    return f"test_{uuid.uuid4().int}_"
+
+@pytest.fixture
+def test_api_schema(db_schema_prefix):
+    return f"{db_schema_prefix}{Schemas.API}"
```
## Summary

Fixes #1446

## Changes proposed

- Fix `make login-db` to set the schema search path.

## Context for reviewers

Follow up to #1444 and #1520 so that `make login-db` sets the schema search path to `api`.

## Additional information

Example of `make login-db` followed by `\dt` (list tables):

![Screenshot 2024-04-03 at 17 24 38](https://github.com/HHS/simpler-grants-gov/assets/3811269/222996da-02d2-4283-9290-35caf3e55a9b)
## Summary

Fixes #1270

Time to review: 10 mins

## Changes proposed

Modify our SQLAlchemy logic to allow for multiple schemas to be set up. This includes the `api` schema.

## Context for reviewers

Non-locally this change does not actually change anything yet - locally it does, by making local development more similar to non-local.

This does not actually set up any new schemas, and every table we create still lives in a single schema, the `api` schema.

This change looks far larger than it actually is. Before, all of our tables had their schema set implicitly by the `DB_SCHEMA` environment variable. Locally this value was set to `public`, and non-locally it was set to `api`. These changes make it so locally it also uses `api`; however, in order for that to work, the Alembic migrations need to explicitly say `api` (in case we add more schemas later). There is a flag in the Alembic configuration that tells it to generate with the schemas, but we had that disabled. I enabled it so future migrations just work. But in order to make everything work locally, I had to manually fix all of the past migrations to have the `api` schema.

Non-locally the schema already was `api`, so changing already-run migrations won't matter, as they already ran as if they had that value set.

## Additional information

This change requires you to run `make db-recreate` locally in order to use the updated schemas.

To test this, I manually ran the database migrations one step at a time, fixing any issues. I then ran the down migrations and made sure they also worked correctly, undoing the up migrations. I then ran a few of our local scripts to make sure everything still worked properly and didn't find any issues.