This guide will help you understand our infrastructure stack and how we maintain our platforms. While it does not cover every operation exhaustively, it can serve as a reference for understanding the systems.

Let us know if you have feedback or queries, and we will be happy to clarify.
This repository is continuously built, tested and deployed to separate sets of infrastructure (Servers, Databases, CDNs, etc.).
This involves three steps to be followed in sequence:

1. New changes (both fixes and features) are merged into our primary development branch (`main`) via pull requests.
2. These changes are run through a series of automated tests.
3. Once the tests pass, we release the changes (or update them if needed) to deployments on our infrastructure.
Typically, `main` (the default development branch) is merged into the `prod-staging` branch once a day and is released into an isolated infrastructure.

This is an intermediate release for our developers and volunteer contributors. It is also known as our "staging" or "beta" release.

It is identical to our live production environment at freeCodeCamp.org, other than it using a separate set of databases, servers, web-proxies, etc. This isolation lets us test ongoing development and features in a "production"-like scenario, without affecting regular users of freeCodeCamp.org's main platforms.
Once the developer team (@freeCodeCamp/dev-team) is happy with the changes on the staging platform, these changes are moved every few days to the `prod-current` branch.
This is the final release that moves changes to our production platforms on freeCodeCamp.org.
We employ various levels of integration and acceptance testing to check on the quality of the code. All our tests are run through software like GitHub Actions CI and Azure Pipelines.

We have unit tests for testing our challenge solutions, server APIs, and client user interfaces. These help us test the integration between different components.
> [!NOTE]
> We are also in the process of writing end-user tests, which will help replicate real-world scenarios like updating an email or making a call to the API or third-party services.
Together, these tests help prevent issues from repeating themselves and ensure we do not introduce a bug while working on another bug or a feature.
We have configured continuous delivery software to push changes to our development and production servers.
Once the changes are pushed to the protected release branches, a build pipeline is automatically triggered for the branch. The build pipelines are responsible for building artifacts and keeping them in cold storage for later use.

The build pipeline goes on to trigger a corresponding release pipeline if it completes successfully. The release pipelines are responsible for collecting the build artifacts, moving them to the servers, and going live.
The status of builds and releases is available here.
Currently, only members of the developer team can push to the production branches. Changes to the `prod-*` branches can land only via a fast-forward merge to the `upstream`.
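A fast-forward is possible only when the target branch's tip is an ancestor of the commits being pushed. As a hedged illustration (a throwaway local repository, not the real freeCodeCamp remotes), you can check this with `git merge-base --is-ancestor`:

```shell
# Hypothetical sketch: verify that merging main into prod-staging would be a
# fast-forward, i.e. prod-staging's tip is an ancestor of main's tip.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "one"
git branch prod-staging        # prod-staging starts at the same commit
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "two"

# Exit status 0 means a plain `git merge` would fast-forward.
if git merge-base --is-ancestor prod-staging main; then
  echo "fast-forward possible"
fi
```

If the branches had diverged (e.g. history was rewritten), the check would fail, matching the push being rejected upstream.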
> [!NOTE]
> In the upcoming days, we will improve this flow to be done via pull requests, for better access management and transparency.
1. Configure your remotes correctly.

   ```
   git remote -v
   ```

   Results:

   ```
   origin    git@github.com:raisedadead/freeCodeCamp.git (fetch)
   origin    git@github.com:raisedadead/freeCodeCamp.git (push)
   upstream  git@github.com:freeCodeCamp/freeCodeCamp.git (fetch)
   upstream  git@github.com:freeCodeCamp/freeCodeCamp.git (push)
   ```

2. Make sure your `main` branch is pristine and in sync with the upstream.

   ```
   git checkout main
   git fetch --all --prune
   git reset --hard upstream/main
   ```

3. Check that the GitHub CI is passing on the `main` branch for upstream.

   The continuous integration tests should be green and PASSING for the `main` branch. Click the green check mark next to the commit hash when viewing the `main` branch code.

   If this is failing, you should stop and investigate the errors.

4. Confirm that you are able to build the repository locally.

   ```
   npm run clean-and-develop
   ```

5. Move changes from `main` to `prod-staging` via a fast-forward merge.

   ```
   git checkout prod-staging
   git merge main
   git push upstream
   ```
> [!NOTE]
> You will not be able to force push, and if you have rewritten the history in any way, these commands will error out.
>
> If they do, you may have done something incorrectly and you should just start over.
The above steps will automatically trigger a run on the build pipeline for the `prod-staging` branch. Once the build is complete, the artifacts are saved as `.zip` files in cold storage to be retrieved and used later.
The release pipeline is triggered automatically when a fresh artifact is available from the connected build pipeline. For staging platforms, this process does not involve manual approval and the artifacts are pushed to the Client CDN and API servers.
The process is mostly the same as for the staging platforms, with a few extra checks in place. This is just to make sure we do not break anything on freeCodeCamp.org, which can see hundreds of users using it at any moment.
> [!WARNING]
> Do NOT execute these commands unless you have verified that everything is working on the staging platform. You should not bypass or skip any testing on staging before proceeding further.
1. Make sure your `prod-staging` branch is pristine and in sync with the upstream.

   ```
   git checkout prod-staging
   git fetch --all --prune
   git reset --hard upstream/prod-staging
   ```

2. Move changes from `prod-staging` to `prod-current` via a fast-forward merge.

   ```
   git checkout prod-current
   git merge prod-staging
   git push upstream
   ```
> [!NOTE]
> You will not be able to force push, and if you have rewritten the history in any way, these commands will error out.
>
> If they do, you may have done something incorrectly and you should just start over.
The above steps will automatically trigger a run on the build pipeline for the `prod-current` branch. Once a build artifact is ready, it will trigger a run on the release pipeline.
Additional Steps for Staff Action
Once a release run is triggered, members of the developer staff team will receive an automated manual intervention email. They can either approve or reject the release run.

If the changes are working nicely and have been tested on the staging platform, the release can be approved. Approval must be given within 4 hours of the release being triggered, or it will be rejected automatically. A staff member can re-trigger the release run manually for rejected runs, or wait for the next release cycle.
For staff use: check your email for a direct link, or go to the release dashboard after the build run is complete.
Once one of the staff members approves a release, the pipeline will push the changes live to freeCodeCamp.org's production CDN and API servers.
Here is the current test, build and deployment status of the codebase.

| Branch | Unit Tests | Integration Tests | Builds & Deployments |
| --- | --- | --- | --- |
| `main` | | | - |
| `prod-staging` | | | Azure Pipelines |
| `prod-current` | | | Azure Pipelines |
| `prod-next` (experimental, upcoming) | - | - | - |
We welcome you to test these releases in a "public beta testing" mode and get early access to upcoming features on the platforms. Sometimes these features/changes are referred to as next, beta, staging, etc. interchangeably.

Your contributions via feedback and issue reports will help us make the production platforms at freeCodeCamp.org more resilient, consistent, and stable for everyone.

We thank you for reporting the bugs that you encounter and for helping make freeCodeCamp.org better. You rock!
Currently, a public beta testing version is available at:

| Application | Language | URL |
| --- | --- | --- |
| Learn | English | https://www.freecodecamp.dev |
| | Espanol | https://www.freecodecamp.dev/espanol |
| | Chinese | https://chinese.freecodecamp.dev |
| News | English | https://www.freecodecamp.dev/news |
| Forum | English | https://forum.freecodecamp.dev |
| | Chinese | https://chinese.freecodecamp.dev/forum |
| API | - | https://api.freecodecamp.dev |
> [!NOTE]
> The domain name is different from freeCodeCamp.org. This is intentional to prevent search engine indexing and avoid confusion for regular users of the platform.
The above list is not exhaustive of all the applications that we provision. Also, not all language variants are deployed in staging, to conserve resources.
The current version of the platform is always available at freeCodeCamp.org.

The dev-team merges changes from the `prod-staging` branch to `prod-current` when they release changes. The top commit should be what you see live on the site.
You can identify the exact version deployed by visiting the build and deployment logs available in the status section. Alternatively, you can also ping us in the contributors chat room for confirmation.
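The "top commit" idea can be sketched with a local simulation (hypothetical throwaway repositories, not the real freeCodeCamp remotes): whatever commit the remote's `prod-current` branch points at is the commit that should be live.

```shell
# Hypothetical local simulation of checking the tip of prod-current.
set -e
tmp=$(mktemp -d)

# Stand-in for the upstream repository with a prod-current branch.
git init -q -b prod-current "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=ops@example.com -c user.name=ops \
  commit -q --allow-empty -m "latest release"

# Clone and inspect the branch tip, as you would against the real upstream
# (e.g. `git fetch upstream prod-current && git log -1 upstream/prod-current`).
git clone -q "$tmp/upstream" "$tmp/local"
git -C "$tmp/local" log -1 --format=%s origin/prod-current
```

Against the real repository, comparing that commit hash with the deployment logs confirms which version is live.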
There are some known limitations and tradeoffs when using the beta version of the platform.
- All data / personal progress on these beta platforms **will NOT be saved or carried over** to production.

  Users on the beta version will have a separate account from production. The beta version uses a physically separate database from production. This gives us the ability to prevent any accidental loss of data or modifications. The dev team may purge the database on this beta version as needed.
- Deployment is expected to be frequent and in rapid iterations, sometimes multiple times a day. As a result, there will be unexpected downtime at times or broken functionality on the beta version.
- The beta site is, and always has been, meant to augment local development and testing, nothing else. It's not a promise of what's coming, but a glimpse of what is being worked upon.
- We use a test tenant for freeCodeCamp.dev on Auth0, and hence do not have the ability to set a custom domain. This makes all the redirect callbacks and the login page appear at a default domain like `https://freecodecamp-dev.auth0.com/`. This does not affect the functionality, and is as close to production as we can get.
Please open fresh issues for discussions and reporting bugs.
You may send an email to `dev[at]freecodecamp.org` if you have any queries. As always, all security vulnerabilities should be reported to `security[at]freecodecamp.org` instead of the public tracker and forum.
> [!WARNING]
>
> - This guide applies to freeCodeCamp staff members only.
> - These instructions should not be considered exhaustive; please use caution.
As a member of the staff, you may have been given access to our cloud service providers like Azure, Digital Ocean, etc.
Here are some handy commands that you can use to work on the Virtual Machines (VM), for instance performing maintenance updates or doing general housekeeping.
> [!NOTE]
> While you may already have SSH access to the VMs, that alone will not let you list VMs unless you have been granted access to the cloud portals as well.
Install Azure CLI `az`: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli

(One-time) Install on macOS with `homebrew`:

```
brew install azure-cli
```

(One-time) Login:

```
az login
```

Get the list of VM names and IP addresses:

```
az vm list-ip-addresses --output table
```
Install Digital Ocean CLI `doctl`: https://github.com/digitalocean/doctl#installing-doctl

(One-time) Install on macOS with `homebrew`:

```
brew install doctl
```

(One-time) Login:

Authentication and context switching: https://github.com/digitalocean/doctl#authenticating-with-digitalocean

```
doctl auth init
```

Get the list of VM names and IP addresses:

```
doctl compute droplet list --format "ID,Name,PublicIPv4"
```
We are working on creating our IaC setup, and while that is in the works, you can use the Azure portal or the Azure CLI to spin up new virtual machines and other resources.
> [!TIP]
> No matter how you choose to spin up resources, we have a few handy cloud-init config files to help you do some of the basic provisioning, like installing docker or adding SSH keys, etc.
You should keep the VMs up to date by performing updates and upgrades. This will ensure that the virtual machine is patched with the latest security fixes.
> [!WARNING]
> Before you run these commands:
>
> - Make sure that the VM has been provisioned completely and that there are no post-install steps running.
> - If you are updating packages on a VM that is already serving an application, make sure the app has been stopped / saved. Package updates will cause network bandwidth, memory and/or CPU usage spikes, leading to outages on running applications.
Update package information

```
sudo apt update
```

Upgrade installed packages

```
sudo apt upgrade -y
```

Cleanup unused packages

```
sudo apt autoremove -y
```
We are running load balanced (Azure Load Balancer) instances for our web servers. These servers run NGINX, which reverse proxies all of the traffic to freeCodeCamp.org from various applications running on their own infrastructure.
The NGINX config is available on this repository.
Provisioning VMs with the Code
1. Install NGINX and configure from repository.

   ```
   sudo su

   cd /var/www/html
   git clone https://github.com/freeCodeCamp/error-pages

   cd /etc/
   rm -rf nginx
   git clone https://github.com/freeCodeCamp/nginx-config nginx

   cd /etc/nginx
   ```

2. Install Cloudflare origin certificates and upstream application config.

   Get the Cloudflare origin certificates from the secure storage and install at required locations.

   OR

   Move over existing certificates:

   ```
   # Local
   scp -r username@source-server-public-ip:/etc/nginx/ssl ./
   scp -pr ./ssl username@target-server-public-ip:/tmp/

   # Remote
   rm -rf ./ssl
   mv /tmp/ssl ./
   ```

   Update Upstream Configurations:

   ```
   vi configs/upstreams.conf
   ```

   Add/update the source/origin application IP addresses.

3. Setup networking and firewalls.

   Configure Azure firewalls and `ufw` as needed for ingress origin addresses.

4. Add the VM to the load balancer backend pool.

   Configure and add rules to the load balancer if needed. You may also need to add the VMs to the load balancer backend pool.

5. Check the status of the NGINX service using the below command:

   ```
   sudo systemctl status nginx
   ```

6. Logging and monitoring for the servers are available at:

   NGINX Amplify: https://amplify.nginx.com, our current basic monitoring dashboard. We are working on more granular metrics for better observability.
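For reference, the `upstreams.conf` edited above defines the origin application addresses that NGINX proxies traffic to. A hypothetical fragment is sketched below; the pool names and IP addresses are made up (the real file lives in the freeCodeCamp/nginx-config repository), and the ports mirror the client placeholder instances described later in this guide:

```nginx
# Hypothetical upstream pools -- names and addresses are illustrative only.
upstream client-pool {
    server 10.0.0.4:50505;
    server 10.0.0.5:52525;
}

upstream api-pool {
    server 10.0.0.6:3000;
}
```

A `server {}` block would then `proxy_pass http://client-pool;` (or `http://api-pool;`) for the matching routes.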
Config changes to our NGINX instances are maintained on GitHub; these should be deployed on each instance like so:

1. SSH into the instance and enter sudo

   ```
   sudo su
   ```

2. Get the latest config code.

   ```
   cd /etc/nginx
   git fetch --all --prune
   git reset --hard origin/main
   ```

3. Test and reload the config with Signals.

   ```
   nginx -t
   nginx -s reload
   ```
- Install build tools for node binaries (`node-gyp`) etc.

  ```
  sudo apt install build-essential
  ```
Provisioning VMs with the Code
1. Install Node LTS.

2. Update `npm`, install PM2, and setup `logrotate` and startup on boot

   ```
   npm i -g npm@6
   npm i -g pm2
   pm2 install pm2-logrotate
   pm2 startup
   ```

3. Clone freeCodeCamp, setup env and keys.

   ```
   git clone https://github.com/freeCodeCamp/freeCodeCamp.git
   cd freeCodeCamp
   git checkout prod-current # or any other branch to be deployed
   ```

4. Create the `.env` from the secure credentials storage.

5. Create the `google-credentials.json` from the secure credentials storage.

6. Install dependencies

   ```
   npm ci
   ```

7. Build the server

   ```
   npm run ensure-env && npm run build:curriculum && npm run build:server
   ```

8. Start Instances

   ```
   cd api-server
   pm2 start ./lib/production-start.js -i max --max-memory-restart 600M --name org
   ```

Logging and monitoring:

```
pm2 logs
```

```
pm2 monit
```
Code changes need to be deployed to the API instances from time to time. It can be a rolling update or a manual update. The latter is essential when changing dependencies or adding environment variables.
> [!ATTENTION]
> The automated pipelines are not handling dependency updates at the moment. We need to do a manual update before any deployment pipeline runs.
- Stop all instances

  ```
  pm2 stop all
  ```

- Install dependencies

  ```
  npm ci
  ```

- Build the server

  ```
  npm run ensure-env && npm run build:curriculum && npm run build:server
  ```

- Start Instances

  ```
  pm2 start all --update-env && pm2 logs
  ```

For rolling updates:

```
pm2 reload all --update-env && pm2 logs
```
> [!NOTE]
> We are handling rolling updates to code and logic via pipelines. You should not need to run these commands; they are here for documentation.
- Install build tools for node binaries (`node-gyp`) etc.

  ```
  sudo apt install build-essential
  ```
Provisioning VMs with the Code
1. Install Node LTS.

2. Update `npm`, install PM2, and setup `logrotate` and startup on boot

   ```
   npm i -g npm@6
   npm i -g pm2
   npm install -g serve
   pm2 install pm2-logrotate
   pm2 startup
   ```

3. Clone client config, setup env and keys.

   ```
   git clone https://github.com/freeCodeCamp/client-config.git client
   cd client
   ```

   Start placeholder instances for the web client; these will be updated with artifacts from the Azure pipeline.

   > Todo: This setup needs to move to S3 or Azure Blob storage

   ```
   echo "serve -c ../../serve.json www -p 50505" >> client-start-primary.sh
   chmod +x client-start-primary.sh
   pm2 delete client-primary
   pm2 start ./client-start-primary.sh --name client-primary

   echo "serve -c ../../serve.json www -p 52525" >> client-start-secondary.sh
   chmod +x client-start-secondary.sh
   pm2 delete client-secondary
   pm2 start ./client-start-secondary.sh --name client-secondary
   ```

Logging and monitoring:

```
pm2 logs
```

```
pm2 monit
```
Code changes need to be deployed to the client instances from time to time. It can be a rolling update or a manual update. The latter is essential when changing dependencies or adding environment variables.
> [!ATTENTION]
> The automated pipelines are not handling dependency updates at the moment. We need to do a manual update before any deployment pipeline runs.
- Stop all instances

  ```
  pm2 stop all
  ```

- Install or update dependencies

- Start Instances

  ```
  pm2 start all --update-env && pm2 logs
  ```

For rolling updates:

```
pm2 reload all --update-env && pm2 logs
```
> [!NOTE]
> We are handling rolling updates to code and logic via pipelines. You should not need to run these commands; they are here for documentation.
Our chat servers are available with an HA configuration recommended in the Rocket.Chat docs. The `docker-compose` file for this is available here.

We provision redundant NGINX instances, which are themselves load balanced (Azure Load Balancer), in front of the Rocket.Chat cluster. The NGINX configuration files are available here.
Provisioning VMs with the Code
NGINX Cluster:
1. Install NGINX and configure from repository.

   ```
   sudo su

   cd /var/www/html
   git clone https://github.com/freeCodeCamp/error-pages

   cd /etc/
   rm -rf nginx
   git clone https://github.com/freeCodeCamp/chat-nginx-config nginx

   cd /etc/nginx
   ```

2. Install Cloudflare origin certificates and upstream application config.

   Get the Cloudflare origin certificates from the secure storage and install at required locations.

   OR

   Move over existing certificates:

   ```
   # Local
   scp -r username@source-server-public-ip:/etc/nginx/ssl ./
   scp -pr ./ssl username@target-server-public-ip:/tmp/

   # Remote
   rm -rf ./ssl
   mv /tmp/ssl ./
   ```

   Update Upstream Configurations:

   ```
   vi configs/upstreams.conf
   ```

   Add/update the source/origin application IP addresses.

3. Setup networking and firewalls.

   Configure Azure firewalls and `ufw` as needed for ingress origin addresses.

4. Add the VM to the load balancer backend pool.

   Configure and add rules to the load balancer if needed. You may also need to add the VMs to the load balancer backend pool.
Docker Cluster:
1. Install Docker and configure from the repository

   ```
   git clone https://github.com/freeCodeCamp/chat-config.git chat
   cd chat
   ```

2. Configure the required environment variables and instance IP addresses.

3. Run rocket-chat server

   ```
   docker-compose config
   docker-compose up -d
   ```
- Check the status of the NGINX service using the below command:

  ```
  sudo systemctl status nginx
  ```

- Check the status of the running docker instances with:

  ```
  docker ps
  ```
NGINX Cluster:
Config changes to our NGINX instances are maintained on GitHub; these should be deployed on each instance like so:
1. SSH into the instance and enter sudo

   ```
   sudo su
   ```

2. Get the latest config code.

   ```
   cd /etc/nginx
   git fetch --all --prune
   git reset --hard origin/main
   ```

3. Test and reload the config with Signals.

   ```
   nginx -t
   nginx -s reload
   ```
Docker Cluster:
1. SSH into the instance and navigate to the chat config path

   ```
   cd ~/chat
   ```

2. Get the latest config code.

   ```
   git fetch --all --prune
   git reset --hard origin/main
   ```

3. Pull down the latest docker image for Rocket.Chat

   ```
   docker-compose pull
   ```

4. Update the running instances

   ```
   docker-compose up -d
   ```

5. Validate the instances are up

   ```
   docker ps
   ```

6. Cleanup extraneous resources

   ```
   docker system prune --volumes
   ```

   Output:

   ```
   WARNING! This will remove:
     - all stopped containers
     - all networks not used by at least one container
     - all volumes not used by at least one container
     - all dangling images
     - all dangling build cache

   Are you sure you want to continue? [y/N] y
   ```

   Select yes (y) to remove everything that is not in use. This will remove all stopped containers, all networks and volumes not used by at least one container, and all dangling images and build caches.
List currently installed node & npm versions

```
nvm -v
node -v
npm -v
nvm ls
```

Install the latest Node.js LTS, and reinstall any global packages

```
nvm install 'lts/*' --reinstall-packages-from=default
```

Verify installed packages

```
npm ls -g --depth=0
```

Alias the default Node.js version to the current LTS

```
nvm alias default lts/*
```

(Optional) Uninstall old versions

```
nvm uninstall <version>
```
> [!WARNING]
> If using PM2 for processes, you will also need to bring up the applications and save the process list for automatic recovery on restarts.
Quick commands for PM2 to list, resurrect saved processes, etc.

```
pm2 ls
pm2 resurrect
pm2 save
pm2 logs
```
> [!ATTENTION]
> For client applications, the shell script can't be resurrected between Node.js versions with `pm2 resurrect`. Deploy processes from scratch instead. This should become nicer when we move to a docker-based setup.
See: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/v2-linux?view=azure-devops and follow the instructions to stop, remove and reinstall agents. Broadly, you can follow the steps listed here.
You will need a PAT, which you can grab from here: https://dev.azure.com/freeCodeCamp-org/_usersSettings/tokens
Navigate to Azure Devops and register the agent from scratch in the requisite deployment groups.
> [!NOTE]
> You should run the scripts in the home directory, and make sure no other `azagent` directory exists.
Currently, updating agents requires them to be removed and reconfigured. This is required for them to correctly pick up `PATH` values and other system environment variables. We need to do this, for instance, when updating Node.js on our deployment target VMs.
1. Navigate and check the status of the service

   ```
   cd ~/azagent
   sudo ./svc.sh status
   ```

2. Stop the service

   ```
   sudo ./svc.sh stop
   ```

3. Uninstall the service

   ```
   sudo ./svc.sh uninstall
   ```

4. Remove the agent from the pipeline pool

   ```
   ./config.sh remove
   ```

5. Remove the config files

   ```
   cd ~
   rm -rf ~/azagent
   ```
Once you have completed the steps above, you can follow the same steps as installing the agent.
We use a CLI tool to send out the weekly newsletter. To spin this up and begin the process:
1. Sign in to DigitalOcean, and spin up new droplets under the `Sendgrid` project. Use the Ubuntu Sendgrid snapshot with the most recent date. This comes pre-loaded with the CLI tool and the script to fetch emails from the database. With the current volume, three droplets are sufficient to send the emails in a timely manner.

2. Set up the script to fetch the email list.

   ```
   cd /home/freecodecamp/scripts/emails
   cp sample.env .env
   ```

   You will need to replace the placeholder values in the `.env` file with your credentials.

3. Run the script.

   ```
   node get-emails.js emails.csv
   ```

   This will save the email list in an `emails.csv` file.

4. Break the emails down into multiple files, depending on the number of droplets you need. This is easiest to do by using `scp` to pull the email list locally and using your preferred text editor to split them into multiple files. Each file will need the `email,unsubscribeId` header.

5. Switch to the CLI directory with `cd /home/sendgrid-email-blast` and configure the tool per the documentation.

6. Run the tool to send the emails, following the usage documentation.

7. When the email blast is complete, verify that no emails have failed before destroying the droplets.
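The splitting step can also be scripted instead of done in a text editor. Here is a hedged sketch (the sample data, file names, and the two-rows-per-chunk size are assumptions; tune the chunk size to your droplet count) that splits the data rows and re-adds the required `email,unsubscribeId` header to every chunk:

```shell
# Hypothetical sketch: split emails.csv into header-preserving chunks.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Example input; the real emails.csv comes from get-emails.js.
printf '%s\n' 'email,unsubscribeId' \
  'a@example.com,1' 'b@example.com,2' \
  'c@example.com,3' 'd@example.com,4' > emails.csv

header=$(head -n 1 emails.csv)

# Split the data rows (header excluded) into files of at most 2 rows each.
tail -n +2 emails.csv | split -l 2 - chunk-

# Re-add the header to every chunk.
for f in chunk-*; do
  printf '%s\n' "$header" | cat - "$f" > "$f.csv" && rm "$f"
done

head -n 1 chunk-aa.csv   # every chunk starts with: email,unsubscribeId
```

Each resulting `chunk-*.csv` can then be copied to a droplet with `scp` as described above.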