Dockerize analytics folder #1538

Merged 19 commits on Mar 27, 2024
115 changes: 115 additions & 0 deletions analytics/Dockerfile
@@ -0,0 +1,115 @@
# Use the official python3 image based on Debian 11 "Bullseye".
Collaborator Author
@coilysiren coilysiren Mar 25, 2024
Mostly copied from api/Dockerfile

# https://hub.docker.com/_/python

# The build stage that will be used to deploy to the various environments
# needs to be called `release` in order to integrate with the repo's
# top-level Makefile
FROM python:3-slim AS base
Collaborator
@widal001 widal001 Mar 27, 2024
Does this use 3.12 by default? If so, we might want to either update the minimum required Python version in pyproject.toml and fix the warning around the use of 3.11-based timezone syntax, or change this to use 3.11.

Collaborator Author
Yeah, it's 3.12 right now
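
On the version question in this thread: Poetry reads a caret constraint such as `python = "^3.11"` (the value in this PR's pyproject.toml) as `>=3.11,<4.0`, so a 3.12 base image still falls inside it. A minimal sketch of that range check, assuming a hypothetical helper `satisfies_caret` that is not part of Poetry or this repo and only handles `^MAJOR.MINOR` with a non-zero major:

```python
import sys


def satisfies_caret(version: tuple[int, int], constraint: str) -> bool:
    """Return True if (major, minor) satisfies a caret constraint like "^3.11".

    Hypothetical helper for illustration: it only supports "^MAJOR.MINOR"
    with MAJOR > 0, which means >=MAJOR.MINOR and <(MAJOR+1).0.
    """
    major, minor = (int(part) for part in constraint.lstrip("^").split("."))
    return (major, minor) <= version < (major + 1, 0)


if __name__ == "__main__":
    # Check the interpreter that is actually running inside the image.
    current = sys.version_info[:2]
    print(f"Python {current[0]}.{current[1]} within ^3.11:", satisfies_caret(current, "^3.11"))
```

This only covers the version range; it says nothing about the warning mentioned in the review comment.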


# Install poetry, the package manager.
# https://python-poetry.org
RUN pip install --no-cache-dir poetry --upgrade

RUN apt-get update \
# Install security updates
# https://pythonspeed.com/articles/security-updates-in-docker/
&& apt-get upgrade --yes \
&& apt-get install --no-install-recommends --yes \
build-essential \
libpq-dev \
postgresql \
Collaborator
Some of these apt-get packages, like this one, might be removable if the analytics code never hits postgres

Collaborator Author
It is going to use postgres

Collaborator Author
  • build-essential: a good all-around package to keep
  • libpq-dev: needed for postgres
  • postgresql: same as above

So, nothing to remove

wget \
# Reduce the image size by clearing the cached apt lists
# Complies with https://github.com/codacy/codacy-hadolint/blob/master/codacy-hadolint/docs/description/DL3009.md
&& rm -fr /var/lib/apt/lists/* \
&& rm /etc/ssl/private/ssl-cert-snakeoil.key

# Install gh CLI
# docs: https://github.com/cli/cli/blob/trunk/docs/install_linux.md
RUN mkdir -p -m 755 /etc/apt/keyrings && wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& apt update \
&& apt install gh -y \
&& gh --version
Comment on lines +27 to +34
Collaborator Author
This is new versus the api dockerfile

Collaborator
@widal001 widal001 Mar 27, 2024
Awesome thanks for catching and handling the need to install the GitHub CLI


ARG RUN_UID
ARG RUN_USER

# The following logic creates the RUN_USER home directory and the directory where
# we will be storing the application in the image. This runs when the user is not root
RUN : "${RUN_USER:?RUN_USER and RUN_UID need to be set and non-empty.}" && \
[ "${RUN_USER}" = "root" ] || \
(useradd --create-home --create --user-group --home "/home/${RUN_USER}" --uid ${RUN_UID} "${RUN_USER}" \
&& mkdir /analytics \
&& chown -R ${RUN_UID} "/home/${RUN_USER}" /analytics)

# Set PYTHONPATH so that the tests can find the source code.
ENV PYTHONPATH="/analytics/src/:$PYTHONPATH"
Comment on lines +47 to +48
Collaborator Author
I had to add this line

Collaborator
Non-blocking request, but can we look into this further? If we're installing the analytics/ package in the docker container correctly, I don't think we should need to set the PYTHONPATH directly. See this Stack Overflow post for more info.

Collaborator Author
Thanks for the feedback, but I'm not sure how valuable that would be right now, given the large amount of work on my plate. I will make a ticket.

Collaborator Author
Here's the ticket: #1565

Collaborator
Yeah, I'm not super worried about it, mainly just want to flag it in case it resurfaces in some other error.
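
One way to look into the PYTHONPATH question is to ask the interpreter where it would load `analytics` from. The sketch below is an illustration only, and `module_origin` is a hypothetical helper name. Note that the dev stage runs `poetry install --no-root`, which skips installing the project itself, so that may be one reason the extra path was needed there.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None.

    None means the module cannot be found on the current sys.path
    (or that it is a namespace package with no single origin file).
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Inside the container, a path under /analytics/src suggests the import
    # resolves through PYTHONPATH (or an editable install) rather than
    # through a wheel installed into site-packages.
    print(module_origin("analytics"))
```

Running this once with and once without the `ENV PYTHONPATH` line should show whether the import depends on it.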


#-----------
# Dev image
#-----------

FROM base AS dev
ARG RUN_USER

# In between ARG RUN_USER and USER ${RUN_USER}, the user is still root
# If there is anything that needs to be run as root, this is the spot

USER ${RUN_USER}
WORKDIR /analytics

COPY pyproject.toml poetry.lock ./
# Explicitly create a new virtualenv to avoid getting overridden by mounted .venv folders
RUN poetry config virtualenvs.in-project false && poetry env use python
# Install all dependencies including dev dependencies
RUN poetry install --no-root --with dev

COPY . /analytics

#---------
# Release
#---------

FROM base AS release
ARG RUN_USER

# Gunicorn requires this workaround to create writable temporary directory in
# our readonly root file system. https://github.com/aws/containers-roadmap/issues/736
RUN mkdir -p /tmp
VOLUME ["/tmp"]

# TODO(https://github.com/navapbc/template-application-flask/issues/23) Productionize the Docker image

WORKDIR /analytics

COPY . /analytics

# Remove any existing virtual environments that might exist. This
# might happen if testing out building the release image from a local machine
# that has a virtual environment within the project analytics folder.
RUN rm -fr /analytics/.venv

# Set virtualenv location to be in project to be easy to find
# This will create a virtualenv in /analytics/.venv/
# See https://python-poetry.org/docs/configuration/#virtualenvsin-project
# See https://python-poetry.org/docs/configuration/#using-environment-variables
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

# Install production runtime dependencies only
RUN poetry install --no-root --only main

# Build the application binary (python wheel) defined in pyproject.toml
# Note that this will only copy over python files, and files stated in the
# include section in pyproject.toml. Also note that if you change the name or
# version section in pyproject.toml, you will need to change the dist/... to match
# or the application will not build
RUN poetry build --format wheel && poetry run pip install 'dist/simpler_grants_gov_analytics-0.1.0-py3-none-any.whl'

# Add project's virtual env to the PATH so we can directly run poetry scripts
# defined in pyproject.toml
ENV PATH="/analytics/.venv/bin:$PATH"


USER ${RUN_USER}
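
The hard-coded wheel path in the release stage follows from the `name` and `version` fields in pyproject.toml. A rough sketch of how that filename is derived, assuming the common wheel naming rule where runs of `-`, `_` and `.` in the project name become `_`, and assuming the `py3-none-any` tag seen in the Dockerfile. `wheel_filename` is a hypothetical helper, not a Poetry API:

```python
import re


def wheel_filename(name: str, version: str) -> str:
    """Build the expected filename for a pure-Python wheel.

    Assumption: the project name is already lowercase, and runs of "-",
    "_" and "." are replaced by a single "_" in the filename.
    """
    normalized = re.sub(r"[-_.]+", "_", name)
    return f"{normalized}-{version}-py3-none-any.whl"


if __name__ == "__main__":
    # Values taken from the pyproject.toml in this PR.
    print(wheel_filename("simpler-grants-gov-analytics", "0.1.0"))
    # prints "simpler_grants_gov_analytics-0.1.0-py3-none-any.whl"
```

This is why renaming the package or bumping the version requires touching the `pip install 'dist/...'` line as well.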
99 changes: 82 additions & 17 deletions analytics/Makefile
@@ -1,5 +1,7 @@
-POETRY ?= poetry
-GITHUB ?= gh
+#############
+# Constants #
+#############
+
ORG ?= HHS
REPO ?= simpler-grants-gov
SPRINT_PROJECT ?= 13
@@ -9,57 +11,120 @@ SPRINT ?= @current
UNIT ?= points
ACTION ?= show-results
MIN_TEST_COVERAGE ?= 80
APP_NAME ?= grants-analytics

# Required for CI to work properly
SHELL = /bin/bash -o pipefail

ifdef CI
DOCKER_EXEC_ARGS := -T -e CI -e GH_TOKEN -e ANALYTICS_SLACK_BOT_TOKEN -e ANALYTICS_REPORTING_CHANNEL_ID
else
DOCKER_EXEC_ARGS := -e GH_TOKEN
endif

# By default, all python/poetry commands run inside the docker container.
# If you wish to run this natively, add PY_RUN_APPROACH=local to your environment vars.
# You can set this either by running `export PY_RUN_APPROACH=local` in your shell or by
# adding it to your ~/.zshrc file (and running `source ~/.zshrc`).
ifeq "$(PY_RUN_APPROACH)" "local"
POETRY := poetry run
GITHUB := gh
else
POETRY := docker-compose run $(DOCKER_EXEC_ARGS) --rm $(APP_NAME) poetry run
GITHUB := docker-compose run $(DOCKER_EXEC_ARGS) --rm $(APP_NAME) gh
endif

Comment on lines +25 to +35
Collaborator Author
Important to read the documentation here

Collaborator
Thanks for calling this out! Works like a charm.

@coilysiren can we update the development.md docs to reference this as well?

Collaborator Author
I added the same text verbatim to development.md in 77139c1
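
The selection logic in the Makefile block above can be restated as a small function. This is only a sketch for reading along, not code from the PR. `poetry_prefix` is a hypothetical name, and it treats any non-empty `CI` value as set, which approximates make's `ifdef`.

```python
import os


def poetry_prefix(env: dict[str, str], app_name: str = "grants-analytics") -> list[str]:
    """Mirror the Makefile's choice of command prefix for $(POETRY)."""
    if env.get("PY_RUN_APPROACH") == "local":
        # Native run: no container involved.
        return ["poetry", "run"]
    if env.get("CI"):
        # In CI the Makefile disables the TTY and forwards the extra tokens.
        exec_args = [
            "-T", "-e", "CI", "-e", "GH_TOKEN",
            "-e", "ANALYTICS_SLACK_BOT_TOKEN",
            "-e", "ANALYTICS_REPORTING_CHANNEL_ID",
        ]
    else:
        # Locally only GH_TOKEN is forwarded into the container.
        exec_args = ["-e", "GH_TOKEN"]
    return ["docker-compose", "run", *exec_args, "--rm", app_name, "poetry", "run"]


if __name__ == "__main__":
    # Show the command that `make lint` would use for black in this shell.
    print(" ".join(poetry_prefix(dict(os.environ)) + ["black", "src", "tests"]))
```

Because the prefix already ends in `poetry run`, the recipes further down call `$(POETRY) black ...` rather than `$(POETRY) run black ...`.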

# Docker user configuration
# This logic is to avoid issues with permissions and mounting local volumes,
# which should be owned by the same UID for Linux distros. Mac OS can use root,
# but it is best practice to run things with the least permission where possible

# Can be set by adding user=<username> and/or uid=<id> after the make command
# If variables are not set explicitly: try looking up values from current
# environment, otherwise fixed defaults.
# uid= defaults to 0 if user= set (which makes sense if user=root, otherwise you
# probably want to set uid as well).
ifeq ($(user),)
RUN_USER ?= $(or $(strip $(USER)),nodummy)
RUN_UID ?= $(or $(strip $(shell id -u)),4000)
else
RUN_USER = $(user)
RUN_UID = $(or $(strip $(uid)),0)
endif

export RUN_USER
export RUN_UID
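
A sketch of how these defaults resolve, written as a plain function to make the branches easier to see. `resolve_run_user` and its parameters are hypothetical. It ignores the case where `RUN_USER` or `RUN_UID` is already set in the environment, which `?=` would leave untouched.

```python
def resolve_run_user(user: str = "", uid: str = "",
                     env_user: str = "", current_uid: str = "") -> tuple[str, str]:
    """Approximate the Makefile's RUN_USER / RUN_UID resolution.

    user, uid    -- values passed on the make command line (user=..., uid=...)
    env_user     -- the $USER environment variable
    current_uid  -- the output of `id -u`
    """
    if user == "":
        # Nothing passed explicitly: look up the current environment,
        # then fall back to the fixed defaults.
        run_user = env_user.strip() or "nodummy"
        run_uid = current_uid.strip() or "4000"
    else:
        # user= was given: uid defaults to 0 unless uid= is given too.
        run_user = user
        run_uid = uid.strip() or "0"
    return run_user, run_uid


if __name__ == "__main__":
    print(resolve_run_user(env_user="alice", current_uid="1000"))
    print(resolve_run_user(user="root"))
```

The second call shows the case the Makefile comment warns about: passing `user=` without `uid=` yields UID 0.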

##################
# Build Commands #
##################

check-prereqs:
@echo "=> Checking for pre-requisites"
-@if ! $(POETRY) --version; then echo "=> Poetry isn't installed"; fi
-@if ! $(GITHUB) --version; then echo "=> GitHub CLI isn't installed"; fi
+@if ! poetry --version; then echo "=> Poetry isn't installed"; fi
+@if ! gh --version; then echo "=> GitHub CLI isn't installed"; fi
@echo "=> All pre-requisites satisfied"

install: check-prereqs
@echo "=> Installing python dependencies"
-$(POETRY) install
+poetry install

-setup: install
+login:
$(GITHUB) auth login

build:
docker-compose build

lint:
@echo "=> Running code quality checks"
@echo "============================="
-$(POETRY) run black src tests
-$(POETRY) run ruff src tests --fix
-$(POETRY) run pylint src tests
-$(POETRY) run mypy src
+$(POETRY) black src tests
+$(POETRY) ruff src tests --fix
+$(POETRY) pylint src tests
+$(POETRY) mypy src
@echo "============================="
@echo "=> All checks succeeded"

unit-test:
@echo "=> Running unit tests"
@echo "============================="
-$(POETRY) run pytest --cov=src
+$(POETRY) pytest --cov=src

e2e-test:
@echo "=> Running end-to-end tests"
@echo "============================="
-$(POETRY) run pytest tests/integrations --cov=src --cov-append
+$(POETRY) pytest tests/integrations --cov=src --cov-append

test-audit: unit-test e2e-test
@echo "=> Running test coverage report"
@echo "============================="
-$(POETRY) run coverage report --show-missing --fail-under=$(MIN_TEST_COVERAGE)
+$(POETRY) coverage report --show-missing --fail-under=$(MIN_TEST_COVERAGE)

release-build:
docker buildx build \
--target release \
--platform=linux/amd64 \
--build-arg RUN_USER=$(RUN_USER) \
--build-arg RUN_UID=$(RUN_UID) \
$(OPTS) \
.

#################
# Data Commands #
#################

sprint-data-export:
@echo "=> Exporting project data from the sprint board"
@echo "====================================================="
-$(POETRY) run analytics export gh_project_data \
+$(POETRY) analytics export gh_project_data \
--owner $(ORG) \
--project $(SPRINT_PROJECT) \
--output-file $(SPRINT_FILE)

issue-data-export:
@echo "=> Exporting issue data from the repository"
@echo "====================================================="
-$(POETRY) run analytics export gh_issue_data \
+$(POETRY) analytics export gh_issue_data \
--owner $(ORG) \
--repo $(REPO) \
--output-file $(ISSUE_FILE)
@@ -69,7 +134,7 @@ gh-data-export: sprint-data-export issue-data-export
sprint-burndown:
@echo "=> Running sprint burndown report"
@echo "====================================================="
-poetry run analytics calculate sprint_burndown \
+$(POETRY) analytics calculate sprint_burndown \
--sprint-file $(SPRINT_FILE) \
--issue-file $(ISSUE_FILE) \
--sprint "$(SPRINT)" \
@@ -79,7 +144,7 @@ sprint-burndown:
percent-complete:
@echo "=> Running percent complete deliverable"
@echo "====================================================="
-poetry run analytics calculate deliverable_percent_complete \
+$(POETRY) analytics calculate deliverable_percent_complete \
--sprint-file $(SPRINT_FILE) \
--issue-file $(ISSUE_FILE) \
--unit $(UNIT) \
15 changes: 15 additions & 0 deletions analytics/docker-compose.yml
@@ -0,0 +1,15 @@
version: '3'

services:

  grants-analytics:
    build:
      context: .
      target: dev
      args:
        - RUN_UID=${RUN_UID:-4000}
        - RUN_USER=${RUN_USER:-analytics}
    container_name: grants-analytics
    volumes:
      - .:/analytics
      - ~/.ssh:/home/${RUN_USER:-analytics}/.ssh
Collaborator Author
The ssh key is for gh
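
Compose's `${RUN_USER:-analytics}` form falls back to the default when the variable is unset or empty, so the mount target above depends on what the Makefile exports. A minimal sketch of that substitution, using hypothetical helper names:

```python
def expand_with_default(env: dict[str, str], name: str, default: str) -> str:
    """Emulate ${NAME:-default}: use the default when NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else default


def ssh_mount_target(env: dict[str, str]) -> str:
    """Container path that the compose file mounts ~/.ssh onto."""
    run_user = expand_with_default(env, "RUN_USER", "analytics")
    return f"/home/{run_user}/.ssh"


if __name__ == "__main__":
    print(ssh_mount_target({}))                     # /home/analytics/.ssh
    print(ssh_mount_target({"RUN_USER": "alice"}))  # /home/alice/.ssh
```

The build arg and the volume path use the same expression, so they stay in step as long as both see the same `RUN_USER` value.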

33 changes: 15 additions & 18 deletions analytics/pyproject.toml
@@ -1,8 +1,8 @@
[tool.poetry]
authors = ["widal001 <billy.daly@agile6.com>"]
description = "Python package for analyzing data related to the Simpler Grants Project"
-name = "analytics"
-packages = [{include = "analytics", from = "src"}]
+name = "simpler-grants-gov-analytics"
Collaborator Author
The poetry run pip install line in the Dockerfile references this name

+packages = [{ include = "analytics", from = "src" }]
readme = "README.md"
version = "0.1.0"

@@ -12,14 +12,14 @@ analytics = "analytics.cli:app"
[tool.poetry.dependencies]
dynaconf = "^3.2.4"
kaleido = "0.2.1"
-notebook = "^7.0.0" # Goal is to replace this with another method of presenting charts
+notebook = "^7.0.0" # Goal is to replace this with another method of presenting charts
pandas = "^2.0.3"
pandas-stubs = "^2.0.2.230605"
plotly = "^5.15.0"
pydantic = "^2.0.3"
python = "^3.11"
slack-sdk = "^3.23.0"
-typer = {extras = ["all"], version = "^0.9.0"}
+typer = { extras = ["all"], version = "^0.9.0" }

[tool.poetry.group.dev.dependencies]
black = "^23.7.0"
@@ -39,10 +39,7 @@ python_version = "3.11"

[[tool.mypy.overrides]]
ignore_missing_imports = true
-module = [
-"plotly.*",
-"dynaconf.*",
-]
+module = ["plotly.*", "dynaconf.*"]

[tool.pylint."MESSAGE CONTROL"]
disable = [
@@ -55,17 +52,17 @@ disable = [

[tool.ruff]
ignore = [
-"ANN101", # missing type annotation for self
-"ANN102", # missing type annotation for cls
-"D203", # no blank line before class
-"D212", # multi-line summary first line
-"FIX002", # line contains TODO
-"PD901", # pandas df variable name
+"ANN101", # missing type annotation for self
+"ANN102", # missing type annotation for cls
+"D203", # no blank line before class
+"D212", # multi-line summary first line
+"FIX002", # line contains TODO
+"PD901", # pandas df variable name
"PLR0913", # Too many arguments to function call
-"PTH123", # `open()` should be replaced by `Path.open()`
-"RUF012", # Mutable class attributes should be annotated with `typing.ClassVar`
-"TD003", # missing an issue link on TODO
-"T201", # use of `print` detected
+"PTH123", # `open()` should be replaced by `Path.open()`
+"RUF012", # Mutable class attributes should be annotated with `typing.ClassVar`
+"TD003", # missing an issue link on TODO
+"T201", # use of `print` detected
]
line-length = 100
select = ["ALL"]
16 changes: 16 additions & 0 deletions documentation/analytics/development.md
@@ -40,11 +40,27 @@ Once you follow the steps above, check that you meet the prerequisites with: `ma

1. Set up the project: `make setup` -- This will install the required packages and prompt you to authenticate with GitHub
2. Create a `.secrets.toml` with the following details, see the next section to discover where these values can be found:

```toml
reporting_channel_id = "<REPLACE_WITH_CHANNEL_ID>"
slack_bot_token = "<REPLACE_WITH_SLACKBOT_TOKEN_ID>"
```

3. Set a GitHub token in your terminal via `export GH_TOKEN=...`. Acquiring the token is a multi-step process:

   - Go to https://github.com/settings/tokens
   - Create a token
   - Give it the following scopes:
     - repo
     - read:org
     - admin:public_key
     - project
   - Add `export GH_TOKEN=...` to your `.zshrc` or similar
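
To double-check a token against the scope list above, one option is to compare the scopes GitHub reports for it (classic tokens expose them in the `X-OAuth-Scopes` response header) with the required set. The sketch below only does the comparison. `missing_scopes` is a hypothetical helper, it makes no network call, and it matches scope names exactly, so a broader scope that implies a narrower one is not recognized.

```python
# Scopes listed in the setup steps above.
REQUIRED_SCOPES = {"repo", "read:org", "admin:public_key", "project"}


def missing_scopes(scopes_header: str) -> set[str]:
    """Return the required scopes absent from a comma-separated scopes string.

    Exact-match only: for example, a token holding admin:org but not
    read:org is still reported as missing read:org.
    """
    granted = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    return REQUIRED_SCOPES - granted


if __name__ == "__main__":
    # Example header value; paste the real one from an API response.
    print(sorted(missing_scopes("repo, project")))
```

An empty result means every scope from the list is present on the token.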

### Docker vs Native

This project runs inside of Docker by default. If you wish to run it natively, add `PY_RUN_APPROACH=local` to your environment variables. You can set this either by running `export PY_RUN_APPROACH=local` in your shell or by adding it to your `~/.zshrc` file (and running `source ~/.zshrc`).

### Configuring secrets

#### Prerequisites