Reorg0.7.0 (#160)
* Change directory structure to put pypgstac and pgstac under /src

* switch pypgstac to use hatch

* move migrations to the pgstac tree

* make symlink in pypgstac for migrations

* move pgstac.sql to src/pgstac/

* update scripts and docker setup

* Cleanup unused files. Adjust tests to work with new scripts

* update sql with partitioning rework and maintenance tooling

* fix: allow missing aws credential in pre-commit

* switch from methodtools to cachetools, remove commented out code

* add fix for #156

---------

Co-authored-by: Matt McFarland <mmcfarland@microsoft.com>
bitner and mmcfarland authored Feb 21, 2023
1 parent f42e233 commit f39c1c1
Showing 187 changed files with 7,938 additions and 2,354 deletions.
30 changes: 5 additions & 25 deletions .devcontainer/devcontainer.json
@@ -1,26 +1,6 @@
{
"name": "Ubuntu",
"build": {
"dockerfile": "../docker/Dockerfile",
},

// Set *default* container specific settings.json values on container create.
"settings": {
"terminal.integrated.shell.linux": "/bin/bash"
},

// Add the IDs of extensions you want installed when the container is created.
"extensions": [],

// Use 'forwardPorts' to make a list of ports inside the container available locally.
"forwardPorts": [5432],

// Use 'postCreateCommand' to run commands after the container is created.
//"postCreateCommand": "/docker-entrypoint.sh postgres",
"overrideCommand": false,

"containerEnv": {"POSTGRES_HOST_AUTH_METHOD": "trust","POSTGRES_USER":"postgres"},

// Comment out connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
"remoteUser": "root"
}
"name": "PGStac",
"dockerComposeFile": "../docker-compose.yml",
"service": "pgstac",
"workspaceFolder": "/opt/src"
}
11 changes: 11 additions & 0 deletions .dockerignore
@@ -5,3 +5,14 @@
*.eggs
venv/*
*/.direnv/*
*/.ruff_cache/*
*/.vscode/*
*/.mypy_cache/*
*/.pgadmin/*
*/.ipynb_checkpoints/*
*/.git/*
*/.github/*
*/env/*
Dockerfile
docker-compose.yml
*/.devcontainer/*
4 changes: 0 additions & 4 deletions .flake8

This file was deleted.

22 changes: 19 additions & 3 deletions .github/workflows/continuous-integration.yml
@@ -6,11 +6,27 @@ on:
- main
pull_request:

env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
DOCKER_BUILDKIT: 1

jobs:
test:
name: test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Execute linters and test suites
run: ./scripts/cibuild
- uses: actions/checkout@v3
- uses: docker/setup-buildx-action@v1
- name: builder
id: builder
uses: docker/build-push-action@v2
with:
context: .
load: true
push: false
cache-from: type=gha
cache-to: type=gha, mode=max

- name: Run tests
run: docker run --rm ${{ steps.builder.outputs.imageid }} test
15 changes: 12 additions & 3 deletions .github/workflows/release.yml
@@ -24,14 +24,23 @@ jobs:
- name: Install release dependencies
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
pip install setuptools wheel twine build
- name: Build and publish package
- name: Build pypgstac release
run: |
pushd src/pypgstac
rm -rf dist
python -m build --sdist --wheel
popd
- name: Publish pypgstac release
env:
TWINE_USERNAME: ${{ secrets.PYPI_STACUTILS_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_STACUTILS_PASSWORD }}
run: |
scripts/cipublish
pushd src/pypgstac
twine upload dist/*
popd
- name: Tag Release
uses: "marvinpinto/action-automatic-releases@v1.2.1"
61 changes: 61 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,61 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: check-yaml
- id: check-added-large-files
- id: check-toml
- id: detect-aws-credentials
args: [--allow-missing-credential]
- id: detect-private-key
- id: check-json
- id: mixed-line-ending
- id: check-merge-conflict

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: 'v0.0.231'
hooks:
- id: ruff
files: pypgstac\/.*\.py$

- repo: local
hooks:
- id: sql
name: sql
entry: scripts/test
args: [--basicsql, --pgtap]
language: script
pass_filenames: false
verbose: true
fail_fast: true
files: sql\/.*\.sql$
- id: formatting
name: formatting
entry: scripts/test
args: [--formatting]
language: script
pass_filenames: false
verbose: true
fail_fast: true
always_run: true
- id: pypgstac
name: pypgstac
entry: scripts/test
args: [--pypgstac]
language: script
pass_filenames: false
verbose: true
fail_fast: true
files: pypgstac\/.*\.py$
- id: migrations
name: migrations
entry: scripts/test
args: [--migrations]
language: script
pass_filenames: false
verbose: true
fail_fast: true
files: migrations\/.*\.sql$
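
The `files` patterns on the local hooks decide which test groups run for a given change. The sketch below mimics that selection in Python, assuming pre-commit applies each pattern as a regular-expression search against the repository-relative path. The `hooks_for` helper and the sample paths are hypothetical and only illustrate the matching; they are not part of this commit.

```python
import re

# Patterns copied from the local hooks in the config above.
# None marks a hook without a `files` filter (the formatting hook uses always_run).
LOCAL_HOOK_PATTERNS = {
    "sql": r"sql\/.*\.sql$",
    "formatting": None,
    "pypgstac": r"pypgstac\/.*\.py$",
    "migrations": r"migrations\/.*\.sql$",
}


def hooks_for(changed_files):
    """Return the local hook ids that would be selected for the changed files."""
    selected = []
    for hook, pattern in LOCAL_HOOK_PATTERNS.items():
        # Assumption: a hook is selected when any changed path matches its pattern.
        if pattern is None or any(re.search(pattern, path) for path in changed_files):
            selected.append(hook)
    return selected


if __name__ == "__main__":
    # Hypothetical paths under the new /src layout.
    print(hooks_for(["src/pgstac/sql/001_core.sql"]))
    print(hooks_for(["src/pypgstac/pypgstac/load.py"]))
```

Because the patterns are unanchored at the start, they keep matching after the move to `/src`, as long as the path still contains the `sql/`, `pypgstac/`, or `migrations/` directory segment.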
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [v0.7.0]

### Added
- Reorganize code base to create clearer separation between pgstac sql code and pypgstac.
- Move Python tooling to use hatch with all python project configuration in pyproject.toml
- Rework testing framework to not rely on pypgstac or migrations. This allows tests to be run on any code updates without creating a version first. If a new version has been staged, the tests will still run through all incremental migrations to make sure they pass as well.
- Add pre-commit to run formatting as well as the tests appropriate for which files have changed.
- Add a query queue to allow for deferred processing of steps that do not change the ability to get results, but enhance performance. The query queue allows pg_cron or a similar scheduler to run tasks that are placed in the queue.
- Modify triggers to allow the use of the query queue for building indexes, adding constraints that are used solely for constraint exclusion, and updating partition and collection spatial and temporal extents. The use of the queue is controlled by the new configuration parameter "use_queue" which can be set as the pgstac.use_queue GUC or by setting in the pgstac_settings table.
- Reorganize how partitions are created and updated to maintain more metadata about partition extents and better tie the constraints to the actual temporal extent of a partition.
- Add "partitions" view that shows stats about number of records, the partition range, constraint ranges, actual date range and spatial extent of each partition.
- Add ability to automatically update the extent object on a collection using the partition metadata via triggers. This is controlled by the new configuration parameter "update_collection_extent" which can be set as the pgstac.update_collection_extent GUC or by setting in the pgstac_settings table. This can be combined with "use_queue" to defer the processing.
- Add many new tests.
- Migrations now make sure that all objects in the pgstac schema are owned by the pgstac_admin role. Functions marked as "SECURITY DEFINER" have been moved to the lower level functions responsible for creating/altering partitions and adding records to the search/search_wheres tables. This should open the door for approaches to using Row Level Security.
- Allow pypgstac loader to load data on pgstac databases that have the same major version even if minor version differs. [162](https://github.com/stac-utils/pgstac/issues/162) Cherry picked from https://github.com/stac-utils/pgstac/pull/164.

### Fixed
- Allow empty strings in datetime intervals
- Set search_path and application_name upon connection rather than as kwargs for compatibility with RDS [156](https://github.com/stac-utils/pgstac/issues/156)
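
The #156 fix above can be pictured as issuing ordinary SQL once the connection is open, instead of passing the settings as connection keyword arguments. This is a minimal sketch of that idea, not the pypgstac implementation: `session_setup_statements` is a hypothetical helper and the example values are assumptions. It relies only on PostgreSQL's built-in `set_config(setting_name, new_value, is_local)` function.

```python
def session_setup_statements(search_path, application_name):
    """Build (sql, params) pairs to run right after a connection is opened.

    Hypothetical helper: the values are sent as query parameters so the
    driver handles quoting. `false` makes each setting last for the session.
    """
    return [
        ("SELECT set_config('search_path', %s, false);", (search_path,)),
        ("SELECT set_config('application_name', %s, false);", (application_name,)),
    ]


if __name__ == "__main__":
    # Example values are assumptions, not taken from the commit.
    for sql, params in session_setup_statements("pgstac,public", "pypgstac"):
        print(sql, params)
```

With a psycopg 3 connection `conn`, each pair could be passed to `conn.execute(sql, params)` immediately after connecting.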


## [v0.6.13]

### Fixed
44 changes: 31 additions & 13 deletions CONTRIBUTING.md
@@ -55,24 +55,42 @@ scripts/stageversion 0.2.8
This will create a base migration for the new version and will create incremental migrations between any existing base migrations. The incremental migrations that are automatically generated by this script will have the extension ".staged" on the file. You must manually review (and make any modifications necessary) this file and remove the ".staged" extension to enable the migration.

### Making Changes to SQL
All changes to SQL should only be made in the `/sql` directory. SQL Files will be run in alphabetical order.
All changes to SQL should only be made in the `/src/pgstac/sql` directory. SQL Files will be run in alphabetical order.

### Adding Tests
PGStac uses PGTap to test SQL. Tests can be found in tests/pgtap.sql and are run using `scripts/test`
PGStac tests can be written using PGTap or basic SQL output comparisons. Additional testing is available using PyTest in the PyPgSTAC module. Tests can be run using the `scripts/test` command.

PGTap tests can be written using [PGTap](https://pgtap.org/) syntax. Tests should be added to the `/src/pgstac/tests/pgtap` directory. Any new sql files added to this directory must be added to `/src/pgstac/tests/pgtap.sql`.

The Basic SQL tests will run any file ending in '.sql' in the `/src/pgstac/tests/basic` directory and will compare the exact results to the corresponding '.sql.out' file.
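
The exact-comparison rule for the basic SQL tests can be sketched as follows. This is an illustration only, not the logic in `scripts/test`: `diff_against_expected` is a hypothetical helper, and how the actual output is captured (for example from `psql`) is left out.

```python
import difflib
from pathlib import Path


def diff_against_expected(actual, expected_path):
    """Compare captured output with a '.sql.out' file.

    Returns a unified diff as a list of lines. An empty list means the
    output matched exactly, including trailing newlines.
    """
    expected = Path(expected_path).read_text()
    if actual == expected:
        return []
    # keepends=True so that whitespace-only differences at line ends show up.
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile=str(expected_path),
            tofile="actual",
        )
    )
```

Since the comparison is exact, any change in whitespace or row order in the query output counts as a failure, so the matching '.sql.out' file has to be regenerated whenever a test's SQL changes.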

PyPgSTAC tests are located in `/src/pypgstac/tests`.

All tests can be found in tests/pgtap.sql and are run using `scripts/test`

Individual tests can be run with any combination of the following flags "--formatting --basicsql --pgtap --migrations --pypgstac". If pre-commit is installed, tests will be run on commit based on which files have changed.


### To make a PR
1) Make any changes.
2) Make sure there are tests if appropriate.
3) Update Changelog using "### Unreleased" as the version.
4) Make any changes necessary to the docs.
5) Ensure all tests pass (pre-commit will take care of this if installed and the tests will also run on CI)
6) Create PR against the "main" branch.



### Release Process
1) Make sure all your code is added and committed
2) Create a PR against the main branch
3) Once the PR has been merged, start the release process.
4) Update the version in `pypgstac/pypgstac/version.py`
5) Use `scripts/stageversion VERSION` as documented in migrations section above making sure to rename any files ending in ".staged" in the migrations section
6) Add details for release to the CHANGELOG
7) Add/Commit any changes
8) Run tests `scripts/test`
9) Create a git tag `git tag v0.2.8` using new version number
10) Push the git tag `git push origin v0.2.8`
11) The CI process will push pypgstac to PyPi, create a docker image on ghcr.io, and create a release on github.
1) Run "scripts/stageversion VERSION" (where VERSION is the next version using semantic versioning, e.g. 0.7.0).
2) Check the incremental migration created in the /src/pgstac/migrations directory with the .staged extension to make sure that the generated SQL looks appropriate.
3) Run the tests against the incremental migrations "scripts/test --migrations"
4) Move any "Unreleased" changes in the CHANGELOG.md to the new version.
5) Open a PR for the version change.
6) Once the PR has been merged, start the release process.
7) Create a git tag `git tag v0.2.8` using new version number
8) Push the git tag `git push origin v0.2.8`
9) The CI process will push pypgstac to PyPi, create a docker image on ghcr.io, and create a release on github.


### Get Involved
74 changes: 30 additions & 44 deletions Dockerfile
@@ -1,55 +1,41 @@
FROM postgres:13 as pg

LABEL maintainer="David Bitner"

FROM postgres:15-bullseye as pg
ENV PGSTACDOCKER=1
ENV POSTGIS_MAJOR 3
ENV PGUSER postgres
ENV PGDATABASE postgres
ENV PGHOST localhost
ENV \
PYTHONUNBUFFERED=1 \
PYTHONFAULTHANDLER=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=off \
PIP_DISABLE_PIP_VERSION_CHECK=on \
PIP_DEFAULT_TIMEOUT=100

RUN \
apt-get update \
ENV POSTGIS_VERSION 3.3.2+dfsg-1.pgdg110+1
ENV PYTHONPATH=/opt/src/pypgstac:/opt/python:${PYTHONPATH}
ENV PATH=/opt/bin:${PATH}
ENV PYTHONWRITEBYTECODE=1
ENV PYTHONBUFFERED=1

RUN set -ex \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
gnupg \
apt-transport-https \
debian-archive-keyring \
software-properties-common \
ca-certificates \
python3 python-is-python3 python3-pip \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR=$POSTGIS_VERSION \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR-scripts \
postgresql-$PG_MAJOR-pgtap \
postgresql-$PG_MAJOR-partman \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR \
postgresql-$PG_MAJOR-postgis-$POSTGIS_MAJOR-scripts \
build-essential \
python3 \
python3-pip \
python3-setuptools \
&& pip3 install -U pip setuptools packaging \
&& pip3 install -U psycopg2-binary \
&& pip3 install -U psycopg[binary] \
&& pip3 install -U migra[pg] \
&& apt-get remove -y apt-transport-https \
&& apt-get -y autoremove \
&& rm -rf /var/lib/apt/lists/*
&& apt-get clean && apt-get -y autoremove \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir -p /opt/src/pypgstac/pypgstac \
&& touch /opt/src/pypgstac/pypgstac/__init__.py \
&& touch /opt/src/pypgstac/README.md \
&& echo '__version__ = "0.0.0"' > /opt/src/pypgstac/pypgstac/version.py

EXPOSE 5432
COPY ./src/pypgstac/pyproject.toml /opt/src/pypgstac/pyproject.toml

RUN mkdir -p /docker-entrypoint-initdb.d
RUN echo "#!/bin/bash \n unset PGHOST \n pypgstac migrate" >/docker-entrypoint-initdb.d/initpgstac.sh && chmod +x /docker-entrypoint-initdb.d/initpgstac.sh

RUN mkdir -p /opt/src/pypgstac

WORKDIR /opt/src/pypgstac

COPY pypgstac /opt/src/pypgstac
RUN \
pip3 install --upgrade pip \
&& pip3 install /opt/src/pypgstac[dev,test,psycopg]

RUN pip3 install -e /opt/src/pypgstac[psycopg]
COPY ./src /opt/src
COPY ./scripts/bin /opt/bin

ENV PYTHONPATH=/opt/src/pypgstac:${PYTHONPATH}
RUN \
echo "initpgstac" > /docker-entrypoint-initdb.d/999_initpgstac.sh \
&& chmod +x /docker-entrypoint-initdb.d/999_initpgstac.sh \
&& chmod +x /opt/bin/*

WORKDIR /opt/src
25 changes: 0 additions & 25 deletions Dockerfile.dev

This file was deleted.

16 changes: 11 additions & 5 deletions README.md
@@ -24,7 +24,11 @@

---

**PgSTAC** is a set of SQL functions and schema for building a highly performant database for Spatio-Temporal Asset Catalogs (STAC). The project also provides the **pypgstac** Python module to help with database migrations and document ingestion (collections and items).
**PgSTAC** is a set of SQL functions and schema for building a highly performant database for Spatio-Temporal Asset Catalogs ([STAC](https://stacspec.org)). The project also provides the **pypgstac** Python module to help with database migrations and document ingestion (collections and items).

PgSTAC provides functionality for STAC Filters and CQL2 search along with utilities to help manage indexing and partitioning of STAC Collections and Items.

PgSTAC is used in production to scale to hundreds of millions of STAC items. PgSTAC implements core data models and functions to provide a STAC API from a PostgreSQL database. As PgSTAC is fully within the database, it does not provide an HTTP-facing API. The [Stac FastAPI](https://github.com/stac-utils/stac-fastapi) PgSTAC backend and [Franklin](https://github.com/azavea/franklin) can be used to expose a PgSTAC catalog. It is also possible to integrate PgSTAC with any other language that has PostgreSQL drivers.

PgSTAC Documentation: https://stac-utils.github.io/pgstac/pgstac

@@ -36,10 +40,12 @@ pyPgSTAC Documentation: https://stac-utils.github.io/pgstac/pypgstac

```
/
├── pypgstac/ - pyPgSTAC python module
├── scripts/ - scripts to set up the environment
├── sql/ - PgSTAC SQL code
└── test/ - test suite
├── src/pypgstac - pyPgSTAC python module
├── src/pypgstac/tests/ - pyPgSTAC tests
├── scripts/ - scripts to set up the environment, create migrations, and run tests
├── src/pgstac/sql/ - PgSTAC SQL code
├── src/pgstac/migrations/ - Migrations for incremental upgrades
└── src/pgstac/tests/ - test suite
```

## Contribution & Development