[Issue #1270] Modify DB session logic to allow for multiple schemas #1520
Conversation
```diff
@@ -38,11 +38,12 @@ def upgrade():
         nullable=False,
     ),
     sa.PrimaryKeyConstraint("opportunity_id", name=op.f("topportunity_pkey")),
+    schema="api",
```
Note that modifying existing database migrations is generally not a great idea and usually won't do anything. It was necessary here in order to get our local setup working properly. This should have no effect on anything non-local.
```diff
 from src.util import datetime_util

 # Override the default naming of constraints
 # to use suffixes instead:
 # https://stackoverflow.com/questions/4107915/postgresql-default-constraint-names/4108266#4108266
 metadata = MetaData(
     naming_convention={
-        "ix": "%(column_0_label)s_idx",
+        "ix": "%(table_name)s_%(column_0_name)s_idx",
```
This config details how SQLAlchemy + Alembic name indexes (and other constraints) in our database. The convention we used before would include the schema in the generated name, which no longer matched the index names we already had. That mismatch would have required Alembic to drop and recreate our indexes just to make the names line up. I preferred to leave the existing indexes alone and instead adjust the naming convention to not include the schema.
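As a rough illustration of how this kind of naming convention behaves (a minimal sketch with a made-up table and column, not the project's actual models):

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.schema import CreateIndex

# Naming convention as in the diff above: index names are built from the
# table and column names, with no schema component.
metadata = MetaData(
    naming_convention={
        "ix": "%(table_name)s_%(column_0_name)s_idx",
    }
)

# Hypothetical table; schema="api" as described in this PR.
opportunity = Table(
    "opportunity",
    metadata,
    Column("opportunity_id", Integer, primary_key=True),
    Column("category", Integer, index=True),
    schema="api",
)

# The generated DDL qualifies the table with the schema, but the index
# name itself stays schema-free, matching the pre-existing index names.
index = next(iter(opportunity.indexes))
ddl = str(CreateIndex(index))
print(ddl)
```

With the old `%(column_0_label)s` token the schema could leak into the generated name, which is exactly the mismatch described above.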
Hmm, the check for whether our DB migrations are correct is failing. Locally I didn't have issues; it seems to fail when run inside Docker but not outside, which is very weird. Looking into it.
To be honest, I'm not 100% confident that this won't impact prod in some quirky way? But the changes all make sense, especially since we know prod is already using the API schema.
Guess the only way to find out is via testing in dev!
```diff
@@ -47,6 +47,7 @@ def get_conn() -> Any:
     "postgresql+psycopg://",
     pool=conn_pool,
     hide_parameters=db_config.hide_sql_parameter_logs,
+    execution_options={"schema_translate_map": db_config.get_schema_translate_map()},
```
(non-blocking q) What's this for?
Tables have a schema attached to them (in this case `api`), but what happens if you have different schemas depending on the environment? This map lets you tell SQLAlchemy to generate its queries with an alias instead. We only use this for tests, where we create schemas like `test_12345678_api` and run the tests in those to avoid conflicting with any data you might have locally.
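For illustration, here is a minimal self-contained sketch of `schema_translate_map` (using SQLite and a made-up table so it runs anywhere; in the tests described above the map would instead point `api` at a per-run `test_<uuid>_api` schema):

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, insert, select

metadata = MetaData()

# Hypothetical table defined against the "api" schema, as in this PR.
opportunity = Table(
    "opportunity",
    metadata,
    Column("opportunity_id", Integer, primary_key=True),
    schema="api",
)

# The translate map rewrites "api" at execution time. Mapping it to None
# drops the qualifier so plain SQLite can run the example; the tests would
# map it to a schema name like "test_12345678_api" instead.
engine = create_engine(
    "sqlite://",
    execution_options={"schema_translate_map": {"api": None}},
)

metadata.create_all(engine)  # DDL is emitted against the translated schema

with engine.begin() as conn:
    conn.execute(insert(opportunity).values(opportunity_id=1))
    rows = conn.execute(select(opportunity)).all()
print(rows)
```

The table definition never changes; only the schema name it resolves to at execution time does.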
Note that the way we handled this before this change was by never specifying the schema, and instead setting the `search_path` when querying the DB, effectively changing what the queries could see. I removed the `search_path` logic as it actually breaks a lot when trying to use schemas this way (see latest commits).
I've been holding off on merging this - I want to think through whether there is a smaller set of changes I could make first to be potentially less risky. While I think all of this works uneventfully, some of the oddities I ran into getting it working had me second-guessing myself.
Looks good, thanks for implementing! Some minor suggestions in the test code.
`api/tests/lib/db_testing.py` (Outdated)
```python
monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", schema_prefix)
monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")

# To improve test performance, don't check the database connection
# when initializing the DB client.
db_client = db.PostgresDBClient()
```
IMHO a slightly cleaner solution without monkeypatching would be to build a `PostgresDBConfig` directly and then pass it to `PostgresDBClient()`. Something like this:
```diff
-monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", schema_prefix)
-monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")
-# To improve test performance, don't check the database connection
-# when initializing the DB client.
-db_client = db.PostgresDBClient()
+db_config = PostgresDBConfig(schema_prefix_override=schema_prefix)
+monkeypatch.setenv("DB_CHECK_CONNECTION_ON_INIT", "False")
+# To improve test performance, don't check the database connection
+# when initializing the DB client.
+db_client = db.PostgresDBClient(db_config)
```
I can reconfigure the tests to set up the way you're suggesting, but we still require the `monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", db_schema_prefix)` line. We need to set that so the Postgres DB config that the app client initializes also has the prefix, so that it can set up its schema translate map correctly.
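A toy sketch of why the env var is still needed; `read_config` here is a hypothetical stand-in for the real `PostgresDBConfig`, which builds itself from the environment:

```python
import os
import uuid

def read_config() -> dict:
    # Stand-in for a config class that reads its settings from the
    # environment, the way the app's DB config does on startup.
    return {"schema_prefix": os.environ.get("SCHEMA_PREFIX_OVERRIDE", "")}

# What monkeypatch.setenv("SCHEMA_PREFIX_OVERRIDE", ...) accomplishes:
# any later, independent read of the environment sees the same prefix.
db_schema_prefix = f"test_{uuid.uuid4().int}_"
os.environ["SCHEMA_PREFIX_OVERRIDE"] = db_schema_prefix

app_config = read_config()  # e.g. built independently inside the app client
assert app_config["schema_prefix"] == db_schema_prefix
```

Passing a config object to the test's own `PostgresDBClient` doesn't reach any other component that constructs its config from scratch, which is why the env var has to be set too.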
`api/tests/conftest.py` (Outdated)
```python
@pytest.fixture
def test_api_schema(db_client):
    db_config = PostgresDBConfig()
    return f"{db_config.schema_prefix_override}{Schemas.API}"
```
How about adding a new fixture that creates the prefix string, which both this fixture and `db_client()` depend on? In `db_client()` it would be passed to `create_isolated_db()` as an argument. Suggested implementation:
```diff
-@pytest.fixture
-def test_api_schema(db_client):
-    db_config = PostgresDBConfig()
-    return f"{db_config.schema_prefix_override}{Schemas.API}"
+@pytest.fixture(scope="session")
+def db_schema_prefix():
+    return f"test_{uuid.uuid4().int}_"
+
+@pytest.fixture
+def test_api_schema(db_schema_prefix):
+    return f"{db_schema_prefix}{Schemas.API}"
```
## Summary

Fixes #1446

## Changes proposed

- Fix `make login-db` to set the schema search path.

## Context for reviewers

Follow up to #1444 and #1520 so that `make login-db` sets the schema search path to `api`.

## Additional information

Example of `make login-db` followed by `\dt` (list tables):

![Screenshot 2024-04-03 at 17 24 38](https://github.com/HHS/simpler-grants-gov/assets/3811269/222996da-02d2-4283-9290-35caf3e55a9b)
## Summary

Fixes #1270

Time to review: 10 mins

## Changes proposed

Modify our SQLAlchemy logic to allow for multiple schemas to be set up. This includes the `api` schema.

## Context for reviewers

Non-locally this change does not actually change anything yet - locally it does, by making local development more similar to non-local.

This does not actually set up any new schemas, and every table we create still lives in a single schema, the `api` schema.

This change looks far larger than it actually is. Before, all of our tables had their schema set implicitly by the `DB_SCHEMA` environment variable. Locally this value was set to `public`, and non-locally it was set to `api`. These changes make it so locally it also uses `api`; however, in order for that to work, the Alembic migrations need to explicitly say `api` (in case we add more schemas later). There is a flag in the Alembic configuration that tells it to generate with the schemas, but we had that disabled. I enabled it so future migrations just work. But in order to make everything work locally, I had to manually fix all of the past migrations to have the `api` schema.

Non-locally the schema already was `api`, so changing already-run migrations won't matter, as they already ran as if they had that value set.

## Additional information

This change requires you to run `make db-recreate` locally in order to use the updated schemas.

To test this, I manually ran the database migrations one step at a time, fixing any issues. I then ran the down migrations and made sure they also worked correctly, undoing the up migrations. I then ran a few of our local scripts to make sure everything still worked properly and didn't find any issues.