diff --git a/docs/contributors/code/testing-overview.md b/docs/contributors/code/testing-overview.md
index 3be1b6b935bff2..520c95bc600223 100644
--- a/docs/contributors/code/testing-overview.md
+++ b/docs/contributors/code/testing-overview.md
@@ -622,7 +622,7 @@ To run unit tests only, without the linter, use `npm run test:unit:php` instead.
 
 ## Performance testing
 
-To ensure that the editor stays performant as we add features, we monitor the impact pull requests and releases can have on some key metrics:
+To ensure that the editor stays performant as we add features, we monitor the impact pull requests and releases can have on some key metrics, including:
 
 - The time it takes to load the editor.
 - The time it takes for the browser to respond when typing.
diff --git a/docs/explanations/architecture/performance.md b/docs/explanations/architecture/performance.md
index e631ceb01fa705..de3b7b5dcdbd5b 100644
--- a/docs/explanations/architecture/performance.md
+++ b/docs/explanations/architecture/performance.md
@@ -6,7 +6,9 @@ Performance is a key feature for editor applications and the Block editor is not
 
 To ensure the block editor stays performant across releases and development, we monitor some key metrics using [performance benchmark job](#the-performance-benchmark-job).
 
-- **Loading Time:** The time it takes to load an editor page. This includes time the server takes to respond, times to first paint, first contentful paint, DOM content load complete, load complete and first block render.
+The most important metrics are:
+
+- **Loading Time:** The time it takes to load an editor page. This includes the time the server takes to respond, the times to first paint, first contentful paint, DOM content load complete, load complete and first block render (in both the post and site editors).
 - **Typing Time:** The time it takes for the browser to respond while typing on the editor.
 - **Block Selection Time:** The time it takes for the browser to respond after a user selects block.
 (Inserting a block is also equivalent to selecting a block. Monitoring the selection is sufficient to cover both metrics).
@@ -57,11 +59,33 @@ Once the directory above is in place, the performance command loop over the perf
 2. Run the performance test for the current suite
 3. Stop the environment for `branch1`
 4. Repeat the first 3 steps for all other branches
-5. Repeat the previous 4 steps 3 times.
-6. Compute medians for all the performance metrics of the current suite.
+5. Compute medians for all the performance metrics of the current suite.
 
 Once all the test suites are executed, a summary report is printed.
 
+## Tracking performance using CodeVitals
+
+The performance results for each commit are pushed to CodeVitals and can be seen on the [Gutenberg dashboard](https://www.codevitals.run/project/gutenberg). The graphs allow us to track the evolution of a given metric over time.
+
+It's thus very important to ensure that the computed metrics are stable: if you run the same test twice with the same code and environment, you should get results that are close.
+
+Our performance job runs on GitHub CI, which means we can't rely on the numbers being consistent between two similar job runs: GitHub CI may, for instance, allocate different CPU and memory resources for us over time. To alleviate this problem, each time we run the performance job on the trunk branch, we compare the current commit's performance to that of a fixed reference commit. This allows us to track the relative difference between the current commit and the reference commit consistently, regardless of environment changes.
+
+### Updating the reference commit
+
+Gutenberg supports only two WP versions, which impacts the performance job in two ways:
+
+ - The base WP version used to run the performance job needs to be updated when the minimum version supported by Gutenberg changes. To do that, we rely on the `Tested up to` flag of the plugin's `readme.txt` file.
+So each time that flag is changed, the version used for the performance job is changed as well.
+
+ - Updating the WP version used for the performance job means there's a high chance that the reference commit used for performance-test stability becomes incompatible with that WP version. So every time the `Tested up to` flag in `readme.txt` is changed, we also have to update the reference commit used in `.github/workflows/performance.yml`.
+
+The new reference commit hash needs to meet the following requirements:
+
+ - Be compatible with the new WP version used in the `Tested up to` flag.
+ - Already be tracked on codevitals.run for all existing metrics.
+
+**A simple way to choose this commit is to pick a very recent commit on trunk with a passing performance job.**
+
 ## Going further
 
 - [Journey towards a performant editor](https://riad.blog/2020/02/14/a-journey-towards-a-performant-web-editor/)
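The CodeVitals tracking described in the `performance.md` changes above rests on two small computations: taking the median of repeated samples for a metric, and expressing the current commit's result relative to the fixed reference commit. A rough sketch of those two steps follows; the helper names are hypothetical and this is not the actual Gutenberg tooling:

```javascript
// Median of repeated samples. Medians are used rather than means so a
// single unusually slow run does not skew the reported metric.
function median( values ) {
	const sorted = [ ...values ].sort( ( a, b ) => a - b );
	const mid = Math.floor( sorted.length / 2 );
	return sorted.length % 2
		? sorted[ mid ]
		: ( sorted[ mid - 1 ] + sorted[ mid ] ) / 2;
}

// Relative difference between the current commit's metric and the
// reference commit's metric. Both are measured in the same job run,
// so CI resource fluctuations affect both numbers alike.
function relativeDiff( currentMs, referenceMs ) {
	return ( currentMs - referenceMs ) / referenceMs;
}

const typingSamples = [ 52, 48, 50 ]; // ms, one value per test round
console.log( median( typingSamples ) ); // → 50
console.log( relativeDiff( 55, 50 ) ); // → 0.1, i.e. 10% slower than the reference
```

Because the reference commit is benchmarked in the same job run as the current commit, changes in the CI environment largely cancel out of the ratio, which is what makes the metric comparable across runs.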