Welcome to 8Knot!
8Knot is a Dash data web-app built by Red Hat's Open Source Program Office (OSPO).
The number of repositories and projects that GitHub hosts is staggering. Even more impressive is the rate at which this number is growing. Projects on GitHub range from student projects to technological monoliths like the Linux Kernel. Understanding the growth trajectory, contributor behavior, progress blockers, etc. intuitively is no longer feasible for individual large projects, and is even less so for groups of projects. It is necessary that community architects and contributors alike have an accessible, correct, and extendable resource that can aggregate and interpret the data describing these communities.
The goal of this application is to serve Open Source community mangers, stakeholders, contributors, and enthusiasts by providing an insightful and convenient interface to the Open Source project data set collected by the project Augur (CHAOSS).
Please see our guide to contributing to this project at the following site: CONTRIBUTORS.md
Once you've read that, please follow our guidance on how to add additional figures and pages to the application: new_viz_guidance.md
The file-structure of our application is intuitive. Non-application files are omitted from this overview:
8Knot +-- pages/ | +-- index/ | +-- index_callbacks.py | +-- index_layout.py +-- welcome/ | +-- welcome.py | +-- visualizations/ | +-- name_of_visualization_1.py | ... | +-- name_of_visualization_n.py +-- contributions/ | +-- contributions.py | +-- visualizations/ | ... +-- contributors/ | +-- cntrb_behavior.py | +-- cntrb_type.py | +-- visualizations/ | ... +-- chaoss/ | +-- chaoss.py | +-- visualizations/ | ... +-- affiliation/ | +-- affiliation.py | +-- visualizations/ | ... +-- codebase/ | +-- codebase.py | +-- visualizations/ | ... +-- visualization_template/ | +-- viz_template.py +-- utils/ +-- app.py +-- _celery.py +-- _login.py +-- _bots.py . . . +-- ~other files~
The application 'Dash' instance is defined in the 'app.py' file, as is the app.server object that our WSGI server uses, and the manager for the task-queue.
The 'Dash' application instance imports the application's base layout from the '/pages/index/index_layout.py' file. The logic to process user input to components is laid out in this file (search bar, page selectors) is defined in '/pages/index/index_callbacks.py.'
Each page of the application is in its own 'pages' folder. On each page a variety of metrics and figures are rendered. These, for each page, are in the 'page_name/visualizations/' folder, and are imported into the file 'page_name.py.'
If one were to add a figure or a metric to a page, they would add it to that page's 'visualizations' folder and import the visualization into the page's respective 'page_name.py' file.
Open Source software is everywhere, yet it is difficult to find data about Open Source projects. Project community managers, advocates, contributors, and enthusiasts ought to be able to see high-level behavioral, health, and growth trends in their repository that assist their own intuition.
Augur, a project in the CHAOSS Foundation, is closing this gap by collecting structured data about Free and Open Source Software (FOSS).
Quoting Augur's own README.md:
"Augur’s main focus is to measure the overall health and sustainability of open source projects, as these types of projects are system critical for nearly every software organization or company. We do this by gathering data about project repositories and normalizing that into our data model to provide useful metrics about your project’s health."
8Knot's contribution to further closing this gap is to provide an interface to the data collected and organized by Augur as a data web-app with both essential statistical figures, and higher-order machine learning and data science-informed insights.
In the insightful words of Karl Fogel from his book "Producing Open Source Software":
"This is alpha software with known bugs. It runs, and works at least some of the time, but use at your own risk."
We are incredibly happy to have people try our application in any state, and we doubly welcome any thorough bug reports. We would seriously recommend, however, that any conclusions drawn from this app, either realized from our deployed application or from a local instance, be scrutinized heavily until we make a proper, stable, >1.0 release.
Please feel free to join our Matrix channel!
We would prefer any initial communication be through Matrix but if you would prefer to talk to one of our maintainers, please feel free to peruse our AUTHORS.md file where you can find contact details.
If you would like to see the current state of our application, we would love any user-stories or bug-reports from visiting our alpha-deployment!
If you would prefer to look at our most up-to-date work, please check out the following section.
Our demonstration application run in Oregon (AWS us-west2), while the Augur database that we use is in Columbus, MO. To improve user experience, we implement a cache using PostgreSQL as a side-car to our application.
This has added significant application complexity that will be rebuilt and removed in the near future once we've replication our Augur database to the same data-center as our application.
If you experience trouble with the existing caching architecture, please log an issue or reach out to us directly.
We've tried to make it as easy as possible to run your own 8Knot instance.
You will need credentials of the following form, named env.list
, at the top-level of the 8Knot directory that you clone.
The credentials below are valid, so you can copy and use them to access a testing instance of an Augur database.
These credentials are suitable for development, but please replace any secrets with different values in production.
AUGUR_DATABASE=graphish
AUGUR_HOST=chaoss.tv
AUGUR_PASSWORD=cableTV99!
AUGUR_PORT=5432
AUGUR_SCHEMA=augur_data
AUGUR_USERNAME=coup
8KNOT_DEBUG=True
REDIS_PASSWORD=1234
DEFAULT_SEARCHBAR_LABEL=chaoss
POSTGRES_PASSWORD=somepassword
SECRET_KEY=somethingsecret
8Knot doesn't handle user accounts or data collection requests on its own. To support these features, you'll need to add the
following additional configuration to your env.list
file.
You'll need to register your 8Knot instance with an Augur front-end application to get an AUGUR_APP_ID
and an AUGUR_CLIENT_SECRET
.
All parameters wrapped in < >
will need to be replaced below. <endpoint>
variables refer to the host of the Augur front-end that
you're using.
8Knot uses these parameters to manage user accounts via an OAuth2.0 flow, so user passwords are never handled directly.
AUGUR_LOGIN_ENABLED=True
AUGUR_APP_ID=<id>
AUGUR_CLIENT_SECRET=<secret>
AUGUR_SESSION_GENERATE_ENDPOINT=<endpoint>/api/unstable/user/session/generate
AUGUR_USER_GROUPS_ENDPOINT=<endpoint>/api/unstable/user/groups/repos?columns=repo_id,repo_git
AUGUR_USER_ACCOUNT_ENDPOINT=<endpoint>/account/settings
AUGUR_USER_AUTH_ENDPOINT=<endpoint>/user/authorize?client_id=<AUGUR_APP_ID>response_type=code
AUGUR_ADMIN_NAME_ENDPOINT=<endpoint>/api/unstable/application/
AUGUR_ADMIN_GROUP_NAMES_ENDPOINT=<endpoint>/api/unstable/application/groups/names
AUGUR_ADMIN_GROUPS_ENDPOINT=<endpoint>/api/unstable/application/group/repos
OAUTH_CLIENT_NAME=augur
OAUTH_CLIENT_ID=<id>
OAUTH_CLIENT_SECRET=<secret>
OAUTH_AUTHORIZE_URL=<endpoint>/user/authorize
OAUTH_TOKEN_URL=<endpoint>/api/unstable/user/session/generate
OAUTH_REDIRECT_URI = localhost:8080/
Note: You'll have to manually fill in the <AUGUR_APP_ID> in the AUGUR_USER_AUTH_ENDPOINT environment variable.
In-depth instructions for enabling 8Knot + Augur integration is available in AUGUR_LOGIN.md. An overview of OAuth's implementation in 8Knot can be found here: user-accounts-in-8knot.md.
We use containers to minimize the installation requirements for development and single-machine production deployments. If you do not have Docker on your system, please follow the following guide: Install Docker
We use Docker's Compose feature to spin up all application resources together. Please make sure you have Docker Compose installed on your system. You can find documentation on doing so here: Docker Compose
If the following commands return sensible results then Docker and Docker Compose are installed:
docker && docker compose || docker-compose
(above just runs docker and docker-compose and checks if both work)
NOTE: As of 3/29/24 we recommend using Podman
and Podman Desktop
instead of Docker
and Docker Desktop
. It will be our default development environment going forward.
There are many guides to transitioning from Docker
(Desktop) to Podman
(Desktop), but here's a rough outline of our "golden path."
- Uninstall
Docker Desktop
. This will require a GUI uninstall and looking through your apps + filesystem for remnants. - Install
Podman
andPodman Desktop
. You'll also explicitly have to provision aPodman Machine
if you're on a Mac. - Enable the "Docker compatibility add-on" available in
Podman Desktop
. This will route traffic headed for a Docker machine to the running Podman machine. (Under the hood, this points Podman at the Docker socket." - Install
docker-compose
. This is a standalone, open source tool thatPodman Compose
delegates compose responsibilities to.
At this point, the Podman
docs claim that one should have moved over to Podman
as a drop-in replacement for Docker
. However, here are two steps that we noticed were necessary in some cases.
- In
$HOME/.docker/config.json
replace "credsStore" with "credStore" (minus an 's') to solve registry credentialing problems. - Set
export DOCKER_HOST=<your_podman_machine_socket_path>
to thePodman machine
's socket on your system, which you can find in theResources
tab ofPodman Desktop
. The path starts withunix://
.
8Knot is a multi-container application.
The app-server, worker-pools, redis-cache, and postgres-cache containers communicate with one another via docker networking.
All of the build/tear-down is done with docker compose
.
To start the application, run at the top-level of the 8Knot directory:
docker compose up --build
Due to a known deadlock, we recommend scaling-up the number of worker-pool pods deployed. There need to be (#visualizations + 1) celery threads available for the callback_worker pool.
A concrete example: I have 6 CPU's allocated to my Docker runtime, so Celery workers will default to a concurrency of 6 processes. However, there are 7 visualizations on the Overview page. Therefore, I will scale the 'callback_worker' pod to 2 instances, guaranteeing that there are (2 * #CPUs = 12) available processing celery threads, ensuring that the known deadlock will be avoided.
docker compose up --build --scale worker-query=2 --scale worker-callback=2
To stop the application:
ctrl-c
To clean up the stopped containers, run:
docker compose down --volumes
The application should be available locally at 'http://localhost:8080'
We use pre-commit to handle our code quality checks. Before you make a PR please make sure to install pre-commit and pass all of the checks that it requires.