Merge pull request #294 from lsst-sqre/tickets/DM-46034a

DM-46034: Reorganize database documentation
lsst-sqre · Sep 3, 2024 · 94991e0
2 parents b8ba5b7 + a3f7f76
commit 94991e0
Showing 12 changed files with 540 additions and 488 deletions.
diff --git a/docs/user-guide/database.rst b/docs/user-guide/database.rst
diff --git a/docs/user-guide/database/datetime.rst b/docs/user-guide/database/datetime.rst
@@ -0,0 +1,61 @@
+#####################################
+Handling datetimes in database tables
+#####################################
+
+When a database column is defined through the SQLAlchemy ORM with the `~sqlalchemy.types.DateTime` generic type, it cannot store a timezone.
+The SQL standard type `~sqlalchemy.types.DATETIME` may include a timezone with some database backends, but it is database-specific.
+It is therefore normally easier to store times in the database in UTC without timezone information.
+
+However, `~datetime.datetime` objects in regular Python code should always be timezone-aware and use the UTC timezone.
+Timezone-naive datetime objects are often interpreted as being in the local timezone, whatever that happens to be.
+Keeping all datetime objects as timezone-aware in the UTC timezone will minimize surprises from unexpected timezone conversions.
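+
+As a small illustration of the difference between the two kinds of object, here is a sketch using only the Python standard library.
+The variable names are illustrative and are not part of Safir.
+
```python
from datetime import datetime, timedelta, timezone

# A timezone-aware datetime in UTC carries an explicit zero offset.
aware = datetime(2024, 9, 3, 12, 0, tzinfo=timezone.utc)

# A timezone-naive datetime has no offset, so any code that receives it
# has to guess which timezone was meant.
naive = datetime(2024, 9, 3, 12, 0)

# Converting the aware datetime to another timezone is unambiguous.
plus_two = timezone(timedelta(hours=2))
converted = aware.astimezone(plus_two)

print(aware.utcoffset())  # a zero offset
print(naive.utcoffset())  # None
print(converted.hour)  # 14
```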
+
+This unfortunately means that the code for storing and retrieving datetime objects from the database needs a conversion layer.
+asyncpg_ wisely declines to convert datetime objects.
+It therefore returns timezone-naive objects from the database, and raises an exception if a timezone-aware datetime object is stored in a `~sqlalchemy.types.DateTime` field.
+The conversion must therefore be done in the code making SQLAlchemy calls.
+
+Safir provides `~safir.database.datetime_to_db` and `~safir.database.datetime_from_db` helper functions to convert from a timezone-aware datetime to a timezone-naive datetime suitable for storing in a DateTime column, and vice versa.
+These helper functions should be used wherever `~sqlalchemy.types.DateTime` columns are read or updated.
+
+`~safir.database.datetime_to_db` ensures the provided datetime object is timezone-aware and in UTC and converts it to a timezone-naive UTC datetime for database storage.
+It raises `ValueError` if passed a timezone-naive datetime object.
+
+`~safir.database.datetime_from_db` ensures the provided datetime object is either timezone-naive or in UTC and returns a timezone-aware UTC datetime object.
+
+Both raise `ValueError` if passed datetime objects in some other timezone.
+Both return `None` if passed `None`.
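+
+The behavior described above can be approximated with the standard library alone.
+The following is a minimal sketch, not Safir's implementation: ``sketch_to_db`` and ``sketch_from_db`` are hypothetical names, and testing for UTC by comparing the offset to zero is an assumption about how such a check could be written.
+
```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sketch_to_db(time: Optional[datetime]) -> Optional[datetime]:
    """Convert a timezone-aware UTC datetime to a naive one for storage."""
    if time is None:
        return None
    offset = time.utcoffset()
    if offset is None:
        raise ValueError(f"datetime {time} is timezone-naive")
    if offset != timedelta(0):
        raise ValueError(f"datetime {time} is not in UTC")
    # Drop the timezone; the wall-clock value is already UTC.
    return time.replace(tzinfo=None)


def sketch_from_db(time: Optional[datetime]) -> Optional[datetime]:
    """Convert a naive (or UTC) datetime from storage to an aware one."""
    if time is None:
        return None
    offset = time.utcoffset()
    if offset is None:
        # Naive values from the database are assumed to be in UTC.
        return time.replace(tzinfo=timezone.utc)
    if offset != timedelta(0):
        raise ValueError(f"datetime {time} is not in UTC")
    return time


aware = datetime(2024, 9, 3, 12, 0, tzinfo=timezone.utc)
stored = sketch_to_db(aware)
restored = sketch_from_db(stored)
```
+
+Here ``stored`` is timezone-naive and ``restored`` compares equal to the original ``aware`` value, so a value survives the round trip through the database unchanged.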
+
+Examples
+========
+
+Here is an example of reading an object from the database that includes DateTime columns:
+
+.. code-block:: python
+
+   from safir.database import datetime_from_db
+   from sqlalchemy import select
+
+
+   stmt = select(SQLJob).where(SQLJob.id == job_id)
+   result = (await session.execute(stmt)).scalar_one()
+
+   # Convert each timezone-naive database value to a timezone-aware datetime.
+   job = Job(
+       job_id=result.id,
+       # ...
+       creation_time=datetime_from_db(result.creation_time),
+       start_time=datetime_from_db(result.start_time),
+       end_time=datetime_from_db(result.end_time),
+       destruction_time=datetime_from_db(result.destruction_time),
+       # ...
+   )
+
+Here is an example of updating a DateTime field in the database:
+
+.. code-block:: python
+
+   from safir.database import datetime_to_db
+   from sqlalchemy import select
+
+
+   async with session.begin():
+       stmt = select(SQLJob).where(SQLJob.id == job_id)
+       job = (await session.execute(stmt)).scalar_one()
+       # Strip the timezone before storing the value in the DateTime column.
+       job.destruction_time = datetime_to_db(destruction_time)
diff --git a/docs/user-guide/database/dependency.rst b/docs/user-guide/database/dependency.rst
@@ -0,0 +1,77 @@
+############################################
+Using a database session in request handlers
+############################################
+
+For FastAPI applications, Safir provides a FastAPI dependency that creates a database session for each request.
+This uses the `SQLAlchemy async_scoped_session <https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html#using-asyncio-scoped-session>`__ to transparently manage a separate session per running task.
+
+Initialize the dependency
+=========================
+
+To use the database session dependency, it must first be initialized during application startup.
+Generally this is done inside the application lifespan function.
+You must also close the dependency during application shutdown.
+
+.. code-block:: python
+
+   from collections.abc import AsyncIterator
+   from contextlib import asynccontextmanager
+
+   from fastapi import FastAPI
+   from safir.dependencies.db_session import db_session_dependency
+
+   from .config import config
+
+
+   @asynccontextmanager
+   async def lifespan(app: FastAPI) -> AsyncIterator[None]:
+       await db_session_dependency.initialize(
+           config.database_url, config.database_password
+       )
+       yield
+       await db_session_dependency.aclose()
+
+
+   app = FastAPI(lifespan=lifespan)
+
+This assumes the application has a ``config`` object with the application settings, including the database URL and password.
+
+Using the dependency
+====================
+
+Any handler that needs a database session can depend on the `~safir.dependencies.db_session.db_session_dependency`:
+
+.. code-block:: python
+
+   from typing import Annotated
+
+   from fastapi import Depends
+   from safir.dependencies.db_session import db_session_dependency
+   from sqlalchemy.ext.asyncio import async_scoped_session
+
+
+   @app.get("/")
+   async def get_index(
+       session: Annotated[
+           async_scoped_session, Depends(db_session_dependency)
+       ],
+   ) -> dict[str, str]:
+       async with session.begin():
+           # ... do something with session here ...
+           return {}
+
+Transaction management
+======================
+
+The application must manage transactions when using the Safir database dependency.
+SQLAlchemy will automatically start a transaction if you perform any database operation using a session (including read-only operations).
+If that transaction is not explicitly ended, `asyncpg`_ may leave it open, which will cause database deadlocks and other problems.
+
+Generally it's best to manage the transaction in the handler function (see the ``get_index`` example, above).
+Wrap all code that may make database calls in an ``async with session.begin()`` block.
+This will open a transaction, commit the transaction at the end of the block, and roll back the transaction if the block raises an exception.
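+
+The commit-or-roll-back behavior of such a block can be modeled with a toy context manager.
+This is only a sketch of the semantics: ``FakeSession`` is a hypothetical stand-in that records events, not a SQLAlchemy session, and it makes no database calls.
+
```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeSession:
    """Stand-in for a database session that only records what happened."""

    def __init__(self) -> None:
        self.events: list[str] = []

    @asynccontextmanager
    async def begin(self) -> AsyncIterator[None]:
        self.events.append("begin")
        try:
            yield
        except Exception:
            # The block raised, so undo its work and re-raise.
            self.events.append("rollback")
            raise
        else:
            # The block finished normally, so commit its work.
            self.events.append("commit")


async def demo() -> list[str]:
    session = FakeSession()
    async with session.begin():
        pass  # ... database calls would go here ...
    try:
        async with session.begin():
            raise RuntimeError("handler failed")
    except RuntimeError:
        pass
    return session.events


events = asyncio.run(demo())
print(events)  # ['begin', 'commit', 'begin', 'rollback']
```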
+
+.. note::
+
+   Due to an as-yet-unexplained interaction with FastAPI 0.74 and later, managing the transaction inside the database session dependency does not work.
+   Calling ``await session.commit()`` there, either explicitly or implicitly via a context manager, immediately fails by raising ``asyncio.CancelledError`` and the transaction is not committed or closed.
diff --git a/docs/user-guide/database/index.rst b/docs/user-guide/database/index.rst
@@ -0,0 +1,28 @@
+######################
+Using the database API
+######################
+
+Safir-based applications that use a SQL database can use Safir to initialize that database and acquire a database session.
+Safir-based applications that use FastAPI can also use the Safir-provided FastAPI dependency to manage per-request database sessions.
+The Safir database support is based on `SQLAlchemy`_ and assumes use of PostgreSQL (possibly via `Cloud SQL <https://cloud.google.com/sql>`__) as the underlying database.
+
+Safir is an asyncio framework and requires using SQLAlchemy's asyncio API.
+Safir uses the `asyncpg`_ PostgreSQL database driver.
+
+Database support in Safir is optional.
+To use it, depend on ``safir[db]`` in your pip requirements.
+
+Also see :ref:`pydantic-dsns` for Pydantic types that help with configuring the PostgreSQL DSN.
+
+Guides
+======
+
+.. toctree::
+   :titlesonly:
+
+   initialize
+   dependency
+   session
+   datetime
+   retry
+   testing
diff --git a/docs/user-guide/database/initialize.rst b/docs/user-guide/database/initialize.rst
@@ -0,0 +1,130 @@
+#######################
+Initializing a database
+#######################
+
+Safir supports simple initialization of a database with a schema provided by the application.
+By default, this only adds any declared but missing tables, indices, or other objects, and thus does nothing if the database is already initialized.
+The application may also request a database reset, which will drop and recreate all of the tables in the current schema.
+
+More complex database schema upgrades are not supported by Safir.
+If those are required, consider using `Alembic <https://alembic.sqlalchemy.org/en/latest/>`__.
+
+Define the database schema
+==========================
+
+Database initialization in Safir assumes that the application has defined the database schema via the SQLAlchemy ORM.
+The recommended way to do this is to add a ``schema`` directory to the application containing the table definitions.
+
+In the file :file:`schema/base.py`, define the SQLAlchemy declarative base:
+
+.. code-block:: python
+
+   from sqlalchemy.orm import DeclarativeBase
+
+
+   class Base(DeclarativeBase):
+       """Declarative base for the application's database schema."""
+
+In other files in that directory, define the database tables using the normal SQLAlchemy ORM syntax, one table per file.
+Each database table definition must inherit from ``Base``, imported from ``.base``.
+
+In :file:`schema/__init__.py`, import the table definitions from all of the files in the directory, as well as the ``Base`` variable, and export them using ``__all__``.
+
+Using non-default PostgreSQL schemas
+------------------------------------
+
+Sometimes it is convenient for multiple applications to share the same database but use separate collections of tables.
+PostgreSQL supports serving multiple schemas (in the sense of a namespace of tables) from the same database, and SQLAlchemy supports specifying the PostgreSQL schema for a given collection of tables.
+
+The normal way to do this in SQLAlchemy is to modify the `~sqlalchemy.orm.DeclarativeBase` subclass used by the table definitions to specify a non-default schema.
+For example:
+
+.. code-block:: python
+
+   from sqlalchemy import MetaData
+   from sqlalchemy.orm import DeclarativeBase
+
+   from ..config import config
+
+
+   class Base(DeclarativeBase):
+       metadata = MetaData(schema=config.database_schema)
+
+If ``config.database_schema`` is `None`, the default schema will be used; otherwise, SQLAlchemy will use the specified schema instead of the default one.
+
+Safir supports this in database initialization by creating a non-default schema if one is set.
+If the ``schema`` attribute is set (via code like the above) on the SQLAlchemy metadata passed to the ``schema`` parameter of `~safir.database.initialize_database`, it will create that schema in the PostgreSQL database if it does not already exist.
+
+Add a CLI command to initialize the database
+============================================
+
+The recommended approach to add database initialization to an application is to add an ``init`` command to the command-line interface that runs the database initialization code.
+For applications using Click_ (the recommended way to implement a command-line interface), this can be done with code like:
+
+.. code-block:: python
+
+   import click
+   import structlog
+   from safir.asyncio import run_with_asyncio
+   from safir.database import create_database_engine, initialize_database
+
+   from .config import config
+   from .schema import Base
+
+
+   # Definition of main omitted.
+
+
+   @main.command()
+   @click.option(
+       "--reset", is_flag=True, help="Delete all existing database data."
+   )
+   @run_with_asyncio
+   async def init(reset: bool) -> None:
+       logger = structlog.get_logger(config.logger_name)
+       engine = create_database_engine(
+           config.database_url, config.database_password
+       )
+       await initialize_database(
+           engine, logger, schema=Base.metadata, reset=reset
+       )
+       await engine.dispose()
+
+This code assumes that ``main`` is the Click entry point and ``.config`` provides a ``config`` object that contains the settings for the application, including the database URL and password as well as the normal Safir configuration settings.
+
+The database URL may be a Pydantic ``Url`` type or a `str`.
+The database password may be a ``pydantic.SecretStr``, a `str`, or `None` if no password is required by the database.
+
+If it receives a connection error from the database, Safir will attempt the initialization five times, two seconds apart, to allow time for networking or a database proxy to start.
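+
+The retry pattern described here can be sketched as follows.
+This is not Safir's code: ``retry_on_connection_error`` and ``flaky_init`` are hypothetical names, and the built-in ``ConnectionError`` stands in for whatever exception the database driver actually raises.
+
```python
import asyncio
from collections.abc import Awaitable, Callable


async def retry_on_connection_error(
    operation: Callable[[], Awaitable[None]],
    *,
    max_tries: int = 5,
    delay: float = 2.0,
) -> int:
    """Run ``operation``, retrying when it raises ConnectionError.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_tries + 1):
        try:
            await operation()
            return attempt
        except ConnectionError:
            # Give up after the last attempt; otherwise wait and retry.
            if attempt == max_tries:
                raise
            await asyncio.sleep(delay)
    raise RuntimeError("max_tries must be at least 1")


calls = 0


async def flaky_init() -> None:
    """Pretend initialization that fails until the third call."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("database proxy not ready")


# Use a zero delay so the demonstration finishes immediately.
attempts = asyncio.run(retry_on_connection_error(flaky_init, delay=0))
print(attempts)  # 3
```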
+
+To drop and recreate all of the tables, pass the ``reset=True`` option to `~safir.database.initialize_database`.
+
+Run database initialization on pod startup
+==========================================
+
+The recommended pattern for Safir-based applications that use a database but do not use Alembic is to initialize the database every time the pod has been restarted.
+
+Since initialization does nothing if the schema already exists, this is safe to do.
+It only wastes a bit of time during normal startup.
+This allows the application to be deployed on a new cluster without any special initialization step.
+
+The easiest way to do this is to add a script (conventionally located in ``scripts/start-frontend.sh``) that runs the ``init`` command and then starts the application with Uvicorn_:
+
+.. code-block:: sh
+
+   #!/bin/bash
+
+   set -eu
+
+   application init
+   uvicorn application.main:app --host 0.0.0.0 --port 8080
+
+Replace ``application`` with the application entry point (in the ``application init`` line) and with the Python module (in the ``uvicorn`` line).
+(These may be different if the application name contains dashes.)
+
+Then, use this as the default command for the Docker image:
+
+.. code-block:: docker
+
+   COPY scripts/start-frontend.sh /start-frontend.sh
+   CMD ["/start-frontend.sh"]
+
+As a side effect, this tests database connectivity during pod startup and waits for the network or a database proxy to be ready if needed, so the application does not need its own connectivity check during startup.
diff --git a/docs/user-guide/database/retry.rst b/docs/user-guide/database/retry.rst
@@ -0,0 +1,66 @@
+##############################
+Retrying database transactions
+##############################
+
+To aid in retrying transactions, Safir provides a decorator function, `safir.database.retry_async_transaction`.
+Retrying transactions is often useful in conjunction with a custom transaction isolation level.
+
+Setting an isolation level
+==========================
+
+If you have multiple simultaneous database writers and need to coordinate their writes to ensure consistent results, you may have to set a custom isolation level, such as ``REPEATABLE READ``.
+In this case, transactions that attempt to modify an object that was modified by a different connection will raise an exception and can be retried.
+
+`~safir.database.create_database_engine` and the ``initialize`` method of `~safir.dependencies.db_session.db_session_dependency` take an optional ``isolation_level`` argument that can be used to set a non-default isolation level.
+If given, this parameter is passed through to the underlying SQLAlchemy engine.
+
+See `the SQLAlchemy isolation level documentation <https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#setting-transaction-isolation-levels-dbapi-autocommit>`__ for more information.
+See :doc:`dependency` and :doc:`session` for more information about initializing the database engine.
+
+Retrying transactions
+=====================
+
+To retry failed transactions in a function or method, decorate that function or method with `~safir.database.retry_async_transaction`.
+
+If the decorated function or method raises `sqlalchemy.exc.DBAPIError` (the parent exception for exceptions raised by the underlying database API), it will be rerun with the same arguments.
+This will be repeated a configurable number of times (three by default).
+The decorated function or method must therefore be idempotent and safe to run repeatedly.
+
+Here's a simplified example from the storage layer of a Safir application:
+
+.. code-block:: python
+
+   from datetime import datetime
+
+   from safir.database import datetime_to_db, retry_async_transaction
+   from sqlalchemy.ext.asyncio import async_scoped_session
+
+
+   class Storage:
+       def __init__(self, session: async_scoped_session) -> None:
+           self._session = session
+
+       @retry_async_transaction
+       async def mark_start(self, job_id: str, start: datetime) -> None:
+           async with self._session.begin():
+               job = await self._get_job(job_id)
+               if job.phase in ("PENDING", "QUEUED"):
+                   job.phase = "EXECUTING"
+               job.start_time = datetime_to_db(start)
+
+If this method races with other methods updating the same job, the custom isolation level will force this update to fail with an exception, and it will then be retried by the decorator.
+
+Changing the retry delay
+------------------------
+
+The decorator waits half a second between attempts and, by default, attempts the method three times.
+Both can be changed with the ``delay`` and ``max_tries`` parameters to the decorator, such as:
+
+.. code-block:: python
+   :emphasize-lines: 2
+
+   class Storage:
+       @retry_async_transaction(max_tries=5, delay=2.5)
+       async def mark_start(self, job_id: str, start: datetime) -> None:
+           async with self._session.begin():
+               ...
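+
+To make the retry behavior concrete, here is a minimal sketch of a decorator that can be used both bare and with parameters.
+It is not the Safir implementation: ``retry_sketch`` is a hypothetical name, and ``FakeDBAPIError`` stands in for `sqlalchemy.exc.DBAPIError` so that the example runs without a database.
+
```python
import asyncio
from functools import wraps


class FakeDBAPIError(Exception):
    """Stand-in for sqlalchemy.exc.DBAPIError."""


def retry_sketch(func=None, *, max_tries=3, delay=0.5):
    """Retry an async function when it raises FakeDBAPIError."""

    def decorator(f):
        @wraps(f)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    # Rerun the function with the same arguments.
                    return await f(*args, **kwargs)
                except FakeDBAPIError:
                    if attempt == max_tries:
                        raise
                    await asyncio.sleep(delay)

        return wrapper

    # Support both @retry_sketch and @retry_sketch(max_tries=..., delay=...).
    return decorator if func is None else decorator(func)


attempts = 0


@retry_sketch(max_tries=5, delay=0)
async def mark_start() -> str:
    """Pretend transaction that fails twice before succeeding."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise FakeDBAPIError("serialization failure")
    return "EXECUTING"


result = asyncio.run(mark_start())
print(result, attempts)  # EXECUTING 3
```
+
+Because the wrapped function runs again from the top on each retry, the sketch only behaves correctly if the function is idempotent, which is the same requirement stated above for the real decorator.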