This repo contains the backend code for analytics.pulpproject.org.

## Analytics Data Flow

At a high level, the metrics data flows like this:

1. Each Pulpcore installation gathers analytics data and posts it daily.
2. The analytics site receives and stores the data without summarization.
3. Once a day, a cron job runs a Django management command that summarizes the data. This command also deletes raw data after a retention period.
4. The charts on the site are rendered from the summary data.

## Gathering and Submitting Data

Pulpcore installations gather the metrics and submit them to either the dev or prod site, depending on the version strings of the installed Pulp components. If all version strings are GA releases, the data is sent to the production site; otherwise it's sent to the dev site. See the get_analytics_posting_url() code.
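The routing decision can be sketched as below. The function name, regex, and notion of "GA" here are illustrative assumptions, not pulpcore's actual implementation; see get_analytics_posting_url() for the real logic.

```python
import re

PRODUCTION_URL = "https://analytics.pulpproject.org/"
DEV_URL = "https://dev.analytics.pulpproject.org/"

# Treat a plain X.Y.Z version with no dev/alpha/beta/rc suffix as GA.
# This pattern is an illustration, not pulpcore's actual check.
GA_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def get_posting_url(component_versions):
    """Return the prod URL only if every component version is a GA release."""
    if all(GA_VERSION.match(v) for v in component_versions):
        return PRODUCTION_URL
    return DEV_URL
```

For example, `get_posting_url(["3.22.0", "1.0.1"])` would route to prod, while a single pre-release string like `"3.23.0.dev"` in the list routes everything to dev.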

The analytics payload is submitted to the server as a Protocol Buffer message, whose definition is here. The pulpcore code gathers the analytics data and constructs the payload in this module.

The protocol buffer definition is compiled locally with the commands below and checked in here in this repo and here in pulpcore.

```
sudo dnf install protobuf  # Install it any way you want
cd analytics.pulpproject.org  # The commands below assume you are in the root dir
protoc --python_out pulpanalytics/ ./analytics.proto  # Copy the output to pulpcore also
```

## Storing Analytics

The analytics data POST is handled here using the protobuf object. The pieces are then saved as model instances, all of which have a foreign key to a single System object that stores the datetime of submission.
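The storage layout can be pictured with plain dataclasses. The real code uses Django models, and the field and class names below (other than System) are illustrative stand-ins:

```python
import datetime
from dataclasses import dataclass


@dataclass
class System:
    # One row per submission; every other saved piece points back at this.
    created: datetime.datetime


@dataclass
class Component:
    # Illustrative stand-in for the per-metric model instances, each of
    # which holds a foreign key to the System it was submitted with.
    system: System
    name: str
    version: str


def store_payload(components):
    """Save one System plus one row per piece of the payload."""
    system = System(created=datetime.datetime(2023, 1, 1, 12, 0, 0))
    return [Component(system=system, name=n, version=v) for n, v in components]
```

The key point is the shape: every saved piece shares the same System, so a whole submission can be reassembled (or aged out) via that single parent row.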

## Summarization

Summarization occurs when an OpenShift cron job in the dev or prod site calls the following command every 24 hours: `./manage.py summarize`. This executes this code.

The summarize command uses a separate protobuf definition, which can be compiled with the commands below and is stored here.

```
sudo dnf install protobuf  # Install it any way you want
cd analytics.pulpproject.org  # The command below assumes you are in the root dir
protoc --python_out pulpanalytics/ ./summary.proto  # This only lives on the server side (this repo)
```

A summary is produced for each 24-hour period and stored as JSON data in a DailySummary instance. How each analytics metric is summarized is beyond the scope of this document; look at the code and the proposals for each analytics metric (which should outline its summarization).
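As a rough illustration of the shape of a daily summary (the actual fields come from summary.proto, which this sketch does not reproduce, and the metric names are made up):

```python
import json
from collections import Counter

# Hypothetical raw rows for one 24-hour period.
raw_rows = [
    {"component": "core", "version": "3.22.0"},
    {"component": "core", "version": "3.22.0"},
    {"component": "core", "version": "3.21.0"},
]


def summarize_day(rows):
    """Collapse raw rows into one JSON blob, as a DailySummary would store."""
    versions = Counter(r["version"] for r in rows)
    return json.dumps({"systems": len(rows), "core_versions": dict(versions)})
```

The idea is that each metric is reduced to counts (or similar aggregates) per day, so the raw rows can later be deleted without losing the charts.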

## Visualizing Summarized Data

Visualization is done using Chart.js and is handled by this GET view, which uses this template. The goal of this code is to read all summary data and collate it into Chart.js data structures.
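Collating summaries into the structure Chart.js consumes looks roughly like this; the dataset label and summary fields are invented for illustration:

```python
# Hypothetical DailySummary rows as (date string, summary dict) pairs.
summaries = [
    ("2023-01-01", {"systems": 10}),
    ("2023-01-02", {"systems": 12}),
]


def to_chartjs(rows):
    """Build the {labels, datasets} structure a Chart.js line chart expects."""
    return {
        "labels": [date for date, _ in rows],
        "datasets": [
            {
                "label": "systems",
                "data": [summary["systems"] for _, summary in rows],
            }
        ],
    }
```

One date per label, one parallel list of values per dataset: that is essentially all the view has to produce for each chart.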

## Setting up a Dev Env

1. Create (or activate) a virtualenv for your work to live in:

   ```
   python3 -m venv analytics.pulpproject.org
   source analytics.pulpproject.org/bin/activate
   ```

2. Clone the repo and install its dependencies:

   ```
   git clone https://github.com/pulp/analytics.pulpproject.org.git
   cd analytics.pulpproject.org
   pip install -r requirements.txt
   ```
3. Start the database

I typically use the official postgres container with podman to provide the database locally, using the commands below (taken from this article).

Fetch the container with `podman pull docker.io/library/postgres`. Afterwards you can see it listed with `podman images`.

Start the container with `podman run -dt --name my-postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres`.

Connect to the db with psql using `podman exec -it my-postgres bash`, then connect as the user `postgres`, which is the default user of the postgres container. Here's a full example:

```
[bmbouter@localhost analytics.pulpproject.org]$ podman exec -it my-postgres bash
root@f70daa2ab15f:/# psql --user postgres
psql (14.5 (Debian 14.5-1.pgdg110+1))
Type "help" for help.

postgres=# \dt
Did not find any relations.
postgres=#
```
4. Set the APP_KEY

The app uses the environment variable APP_KEY to specify the Django SECRET_KEY here. You need to set a random string as the APP_KEY:

```
export APP_KEY="ceb0c58c-5789-499a-881f-410aec5e1003"
```

Note: The APP_KEY is just a random string here.
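One way to generate a suitable random value (this helper is a suggestion, not something the repo requires):

```python
import secrets

# Generate a URL-safe random string suitable for use as APP_KEY.
app_key = secrets.token_urlsafe(32)
print(f'export APP_KEY="{app_key}"')
```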

If you use the default values of the postgresql image this isn't needed, but you can optionally specify db connection info, which is also read from environment variables here. If you do want to set them, you could do it like:

```
export DB_DATABASE="postgres"
export DB_USERNAME="postgres"
export DB_PASSWORD="postgres"
export DB_HOST="localhost"
```
5. Apply Migrations

Apply migrations with `./manage.py migrate`.

6. Create a superuser (if you want to use the Admin site)

```
./manage.py createsuperuser
```

7. Run the server

```
./manage.py runserver 0.0.0.0:8000
```

You can then load the page at http://127.0.0.1:8000/ or the Admin site at http://127.0.0.1:8000/admin/.

Note: the `0.0.0.0:8000` is optional if you only want to receive requests on localhost, but with Pulp typically running in an oci_env environment, you likely want the server to listen on all interfaces.

## Testing with a Local Pulp Dev Install

Right now, to test against a local Pulp installation you need to modify Pulp's code so it posts data to your local telemetry installation. This is done by applying this diff:

```diff
diff --git a/pulpcore/app/tasks/telemetry.py b/pulpcore/app/tasks/telemetry.py
index 3ca9c0fb4..e4c2c30f1 100644
--- a/pulpcore/app/tasks/telemetry.py
+++ b/pulpcore/app/tasks/telemetry.py
@@ -19,7 +19,8 @@ logger = logging.getLogger(__name__)
 
 
 PRODUCTION_URL = "https://analytics.pulpproject.org/"
-DEV_URL = "https://dev.analytics.pulpproject.org/"
+# DEV_URL = "https://dev.analytics.pulpproject.org/"
+DEV_URL = "http://host.containers.internal:8000/"
 
 
 def get_telemetry_posting_url():
```

Additionally, ensure your telemetry environment is listening on all interfaces by including `0.0.0.0:8000` in your runserver command, e.g. `./manage.py runserver 0.0.0.0:8000`.

## Summarizing Data

Summarize data by calling `./manage.py summarize`.

This will not summarize data posted "today", because today's 24-hour period is not yet complete, so for testing it can be helpful to backdate data.
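Backdating just means shifting stored submission datetimes into a previous day so the summarize command will pick them up. A minimal sketch of the idea (in the real app the timestamp lives on the System model; the dict here is a stand-in):

```python
import datetime

def backdate(record, days=1):
    """Shift a record's created datetime back by the given number of days."""
    record["created"] -= datetime.timedelta(days=days)
    return record

# A hypothetical raw submission recorded "today".
submission = backdate({"created": datetime.datetime(2023, 1, 2, 9, 30)})
```

In practice you could apply the same shift to System rows via the Django shell or the admin interface.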

## Deleting the DB and Reapplying Migrations

Stop and delete the container with the commands below. Then restart the container and reapply migrations.

```
podman stop my-postgres
podman rm my-postgres
```

## Submitting and deploying PRs

The normal workflow is:

1. Develop your changes locally and open a PR against the dev branch.
2. Merge the PR; after about 5 minutes, your changes should show up at https://dev.analytics.pulpproject.org/.
3. Test your changes on the https://dev.analytics.pulpproject.org/ site.
4. Open a PR that merges dev into main. After that is merged, in about 5 minutes your changes should show up on https://analytics.pulpproject.org/.

## Exporting/Importing the database

It can be useful to export data from the production or development sites into a local development environment. This is especially useful when developing summarization against production raw data, or visualization against production summary data. This is a two-step process: 1) export the data from the production site; 2) import it into your local dev environment.

### Exporting data from a site

This works for either analytics.pulpproject.org (prod) or dev.analytics.pulpproject.org (dev). You will need OpenShift access to the `./manage.py` environment to do this.

1. Log in to OpenShift with the oc client.
2. Select the site you want to use, e.g. production, by running `oc project prod-analytics-pulpproject-org`.
3. Log in to the production pod with `oc exec dc/pulpanalytics-app -ti -- bash`.
4. Export the database using `./manage.py dumpdata --output /tmp/data.json pulpanalytics`.
5. Move the file to your local machine with something like `oc rsync pulpanalytics-app-12-kxttd:/tmp/data.json /tmp/`. Note that the pod name changes each time, so you'll need to get it from OpenShift when you go to run this command.

### Importing data from a site

1. Apply migrations to the same point as the remote DB using `./manage.py migrate`.
2. Import the data using `./manage.py loaddata /tmp/data.json`.

If testing summarization, you might want to go into the admin interface and delete some recent DailySummary objects so that `./manage.py summarize` runs your local summarization code.