Dockerize analytics folder #1538

Merged
merged 19 commits into from
Mar 27, 2024

Collaborator

@coilysiren coilysiren commented Mar 25, 2024

Summary

Fixes #1540

Time to review: 10 mins

Changes proposed

Changes the analytics application to run primarily inside Docker.

You can still run the code natively by setting PY_RUN_APPROACH=local, as described in the Makefile, but Docker is now the default way to run it.

⚠️ Breaking Changes ⚠️

If you previously authenticated to GitHub via the web browser login flow, you will now have to authenticate with a personal access token instead. This is because I couldn't figure out how to get the web browser login flow working, but I could get the personal access token working.

To utilize the personal access token method:

  1. Go to https://github.com/settings/tokens
  2. Create a token
  3. Give it the following scopes:
    • repo
    • read:org
    • admin:public_key
    • project
  4. Set it in your terminal via export GH_TOKEN=...
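
A minimal sketch of a pre-flight check for the token step above, assuming a script wants to fail early when GH_TOKEN has not been exported. The helper name require_gh_token is hypothetical and is not part of this PR.

```python
import os


def require_gh_token(env=None):
    """Return the GitHub token from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    token = env.get("GH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GH_TOKEN is not set; create a personal access token and "
            "export GH_TOKEN before running the make targets."
        )
    return token
```

Calling require_gh_token() with no arguments reads the real environment, so it could run before any command that shells out to gh.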

Testing

$ make sprint-reports-with-latest-data ACTION=post-results

=> Exporting project data from the sprint board
=====================================================
docker-compose run -e GH_TOKEN --rm grants-analytics poetry run analytics export gh_project_data \
	--owner HHS \
	--project 13 \
	--output-file data/sprint-data.json

@github-actions github-actions bot removed the python label Mar 25, 2024
@@ -0,0 +1,113 @@
# Use the official python3 image based on Debian 11 "Bullseye".
Collaborator Author

@coilysiren coilysiren Mar 25, 2024

Mostly copied from api/Dockerfile

@@ -1,8 +1,8 @@
[tool.poetry]
authors = ["widal001 <billy.daly@agile6.com>"]
description = "Python package for analyzing data related to the Simpler Grants Project"
name = "analytics"
packages = [{include = "analytics", from = "src"}]
name = "simpler-grants-gov-analytics"
Collaborator Author

The poetry run pip install line in the Dockerfile references this name

container_name: grants-analytics
volumes:
- .:/analytics
- ~/.ssh:/home/${RUN_USER:-analytics}/.ssh
Collaborator Author

The ssh key is for gh

Comment on lines +25 to +35
# By default, all python/poetry commands will run inside of the docker container
# if you wish to run this natively, add PY_RUN_APPROACH=local to your environment vars
# You can set this by either running `export PY_RUN_APPROACH=local` in your shell or add
# it to your ~/.zshrc file (and run `source ~/.zshrc`)
ifeq "$(PY_RUN_APPROACH)" "local"
POETRY := poetry run
GITHUB := gh
else
POETRY := docker-compose run $(DOCKER_EXEC_ARGS) --rm $(APP_NAME) poetry run
GITHUB := docker-compose run $(DOCKER_EXEC_ARGS) --rm $(APP_NAME) gh
endif
Collaborator Author

Important to read the documentation here
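
For readers who want the switch spelled out, here is a rough Python sketch of the same selection logic as the Makefile block above. The function name command_prefix is hypothetical, and the default values for app_name and docker_exec_args are assumptions inferred from the command echoed in the Testing section, not read from the Makefile itself.

```python
import os
import shlex


def command_prefix(tool, env=None, app_name="grants-analytics",
                   docker_exec_args="-e GH_TOKEN"):
    """Build the argv prefix for `tool`, natively or through docker-compose.

    Mirrors the Makefile: PY_RUN_APPROACH=local runs the tool directly,
    anything else wraps it in `docker-compose run ... --rm <app>`.
    """
    env = os.environ if env is None else env
    tool_args = shlex.split(tool)
    if env.get("PY_RUN_APPROACH") == "local":
        return tool_args
    return [
        "docker-compose", "run",
        *shlex.split(docker_exec_args),
        "--rm", app_name,
        *tool_args,
    ]
```

With PY_RUN_APPROACH unset, command_prefix("poetry run") yields the docker-compose form; with it set to local, the same call yields just poetry run.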

Collaborator

Thanks for calling this out! Works like a charm:

Screenshot 2024-03-27 at 2 36 38 PM

@coilysiren can we update the development.md docs to reference this as well?

Collaborator Author

I added the same text verbatim to development.md in 77139c1

Comment on lines +47 to +48
# Set PYTHONPATH so that the tests can find the source code.
ENV PYTHONPATH="/analytics/src/:$PYTHONPATH"
Collaborator Author

I had to add this line

Collaborator

Non-blocking request, but can we look into this further? If we're installing the analytics/ package in the Docker container correctly, I don't think we should need to set PYTHONPATH directly. See this Stack Overflow post for more info.

Collaborator Author

Thanks for the feedback, but I'm not sure how valuable that would be right now, given the large amount of work on my plate. I will make a ticket.

Collaborator Author

Here's the ticket: #1565

Collaborator

Yeah, I'm not super worried about it, mainly just want to flag it in case it resurfaces in some other error.
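
One possible way to investigate the PYTHONPATH question in this thread, sketched under the assumption that the importable module is named analytics (taken from the packages line in pyproject.toml). The helper describe_import is hypothetical; it only reports where Python would load a top-level module from.

```python
import importlib.util


def describe_import(module_name):
    """Return where a top-level module would be imported from, or None.

    Regular modules and packages report their origin file; namespace
    packages have no single file, so their search paths are returned.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    return list(spec.submodule_search_locations or [])
```

Running describe_import("analytics") inside the container with and without the ENV PYTHONPATH line would show whether the module resolves through the installed package or only through the extra path entry.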

Comment on lines +27 to +34
# Install gh CLI
# docs: https://github.com/cli/cli/blob/trunk/docs/install_linux.md
RUN mkdir -p -m 755 /etc/apt/keyrings && wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& apt update \
&& apt install gh -y \
&& gh --version
Collaborator Author

This is new versus the api dockerfile

Collaborator

@widal001 widal001 Mar 27, 2024

Awesome, thanks for catching and handling the need to install the GitHub CLI.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 26, 2024
@coilysiren coilysiren marked this pull request as ready for review March 26, 2024 22:22
@coilysiren coilysiren requested a review from sumiat as a code owner March 26, 2024 22:22
Collaborator

@chouinar chouinar left a comment

Not at all familiar with this code, just a few minor notes

&& apt-get install --no-install-recommends --yes \
build-essential \
libpq-dev \
postgresql \
Collaborator

Some of these apt-get packages, like this one, might be removable if the analytics code never hits Postgres.

Collaborator Author

It is going to use postgres

Collaborator Author

  • build-essential: a good all-around package to keep
  • libpq-dev: needed for Postgres
  • postgresql: same as above

So, nothing to remove

Comment on lines 71 to 76
# Set the host to 0.0.0.0 to make the server available external
# to the Docker container that it's running in.
ENV HOST=0.0.0.0

# Run the application.
CMD ["poetry", "run", "python", "-m", "src"]
Collaborator

What is this running / what is the entrypoint file to this code?

Collaborator Author

The entrypoints are all of these commands here:

sprint-data-export:
	@echo "=> Exporting project data from the sprint board"
	@echo "====================================================="
	$(POETRY) run analytics export gh_project_data \
	--owner $(ORG) \
	--project $(SPRINT_PROJECT) \
	--output-file $(SPRINT_FILE)

issue-data-export:
	@echo "=> Exporting issue data from the repository"
	@echo "====================================================="
	$(POETRY) run analytics export gh_issue_data \
	--owner $(ORG) \
	--repo $(REPO) \
	--output-file $(ISSUE_FILE)

gh-data-export: sprint-data-export issue-data-export

sprint-burndown:
	@echo "=> Running sprint burndown report"
	@echo "====================================================="
	poetry run analytics calculate sprint_burndown \
	--sprint-file $(SPRINT_FILE) \
	--issue-file $(ISSUE_FILE) \
	--sprint "$(SPRINT)" \
	--unit $(UNIT) \
	--$(ACTION)

percent-complete:
	@echo "=> Running percent complete deliverable"
	@echo "====================================================="
	poetry run analytics calculate deliverable_percent_complete \
	--sprint-file $(SPRINT_FILE) \
	--issue-file $(ISSUE_FILE) \
	--unit $(UNIT) \
	--$(ACTION)

sprint-reports: sprint-burndown percent-complete

sprint-reports-with-latest-data: gh-data-export sprint-reports

I don't plan on specifying the entrypoint inside of this Dockerfile

Collaborator Author

I removed the lines you commented on

585c40e

Collaborator

@widal001 widal001 left a comment

LGTM!!

Tests

I ran it both with Docker and natively, and it works well.

Natively:

Screenshot 2024-03-27 at 2 36 38 PM

With docker:

Screenshot 2024-03-27 at 2 51 36 PM Screenshot 2024-03-27 at 2 51 47 PM

Non-blocking flag

At some point in the future, if our Docker image just uses the latest Python version, we'll need to fix that deprecation warning about accessing the timezone via the 3.11 syntax.

# The build stage that will be used to deploy to the various environments
# needs to be called `release` in order to integrate with the repo's
# top-level Makefile
FROM python:3-slim AS base
Collaborator

@widal001 widal001 Mar 27, 2024

Does this use 3.12 by default? If so, we might want to update the minimum required Python version in pyproject.toml and fix the warning around the use of 3.11-based timezone syntax, or change this to use 3.11.

Screenshot 2024-03-27 at 2 51 47 PM

Collaborator Author

Yeah, it's 3.12 right now.
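
The warning itself is only visible in the screenshot, so the following is a guess rather than the actual fix. Assuming it is the Python 3.12 deprecation of naive UTC helpers such as datetime.utcnow(), one version-tolerant sketch is to build a timezone-aware timestamp with timezone.utc, which avoids depending on the datetime.UTC alias that only exists from Python 3.11 onward. The function name utc_now is hypothetical.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware datetime in UTC."""
    # timezone.utc is available on all Python 3 versions; datetime.UTC is an
    # alias for the same object that was added in Python 3.11.
    return datetime.now(timezone.utc)
```

If the project raises its minimum Python version to 3.11 instead, datetime.now(datetime.UTC) would be the equivalent spelling.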

@coilysiren coilysiren merged commit 0a25d0b into main Mar 27, 2024
2 checks passed
@coilysiren coilysiren deleted the analytics-docker branch March 27, 2024 19:29
Labels
analytics ci/cd documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

[Task]: Delivery Dashboard - Dockerize the analytics application
3 participants