docs: Add charts to Performance Insights #7557

Merged 1 commit on Dec 19, 2023
5 changes: 4 additions & 1 deletion docs/pages/product/deployment/cloud.mdx
@@ -35,6 +35,7 @@ In Cube Cloud, you can:
API endpoints for the source code in the main branch, any other branch,
or any user-specific [development mode][ref-dev-mode] branch.
* Assign a [custom domain][ref-domains] to API endpoints of any deployment.
* Review [performance insights][ref-performance] and fine-tune deployments for better [scalability][ref-scalability].
* Set up account-wide [budgets][ref-budgets] to control resource consumption
and use [auto-suspension][ref-auto-sus] to reduce resource consumption of
non-production deployments.
@@ -51,4 +52,6 @@ In Cube Cloud, you can:
[ref-dev-mode]: /product/workspace/dev-mode
[ref-domains]: /product/deployment/cloud/custom-domains
[ref-auto-sus]: /product/deployment/cloud/auto-suspension
[ref-budgets]: /product/workspace/budgets
[ref-budgets]: /product/workspace/budgets
[ref-performance]: /product/workspace/performance
[ref-scalability]: /product/deployment/cloud/scalability
1 change: 1 addition & 0 deletions docs/pages/product/deployment/cloud/_meta.js
@@ -4,6 +4,7 @@ module.exports = {
"continuous-deployment": "Continuous deployment",
"custom-domains": "Custom domains",
"auto-suspension": "Auto-suspension",
"scalability": "Scalability",
"pricing": "Pricing",
"support": "Support",
"limits": "Limits"
58 changes: 3 additions & 55 deletions docs/pages/product/deployment/cloud/deployment-types.mdx
@@ -64,8 +64,8 @@ Production Clusters are designed to support high-availability production
workloads. Each cluster consists of several key components, starting with 2 Cube
API instances, 1 Cube Refresh Worker, and 2 Cube Store Routers, all of which run
on dedicated infrastructure. The cluster can automatically scale to meet the
needs of your workload by adding more components as necessary; check the
[Scalability section](#scalability) below.
needs of your workload by adding more components as necessary; check the page on
[scalability][ref-scalability] to learn more.

## Production multi-cluster

@@ -96,59 +96,6 @@ Cube Cloud routes traffic between clusters based on
Each cluster is billed separately, and all clusters can use auto-scaling to
match demand.

## Scalability

Cube Cloud also allows adding additional infrastructure to your deployment to
increase scalability and performance beyond what is available with each
Production Deployment.

### Cube Store Worker

Cube Store Workers are used to build and persist pre-aggregations. Each Worker
has a **maximum of 150GB** of storage; [additional Cube Store
workers][ref-limits] can be added to your deployment to both increase storage
space and improve pre-aggregation performance. A **minimum of 2** Cube Store
Workers is required for pre-aggregations; this can be adjusted. For a rough
estimate, it will take approximately 2 Cube Store Workers per 4 GB of
pre-aggregated data per day.

<InfoBox>

Idle workers will automatically hibernate after 10 minutes of inactivity, and
will not consume CCUs until they are resumed. Workers are resumed automatically
when Cube receives a query that should be accelerated by a pre-aggregation, or
when a scheduled refresh is triggered.

</InfoBox>

To change the number of Cube Store Workers in a deployment, go to the
deployment’s <Btn>Settings</Btn> screen, and open the <Btn>Configuration</Btn>
tab. From this screen, you can set the number of Cube Store Workers from the
dropdown:

<Screenshot
alt="Cube Cloud Deployment Settings page showing auto-scaling configuration options"
src="https://ucarecdn.com/3b39c56f-d553-4612-b4f0-07084cc4b742/"
/>

### Cube API Instance

With a Production Deployment, 2 Cube API Instances are included. That said, it
is very common to use more, and [additional API instances][ref-limits] can be
added to your deployment to increase the throughput of your queries. A rough
estimate is that 1 Cube API Instance is needed for every 5-10
requests-per-second served. Cube API Instances can also auto-scale as needed.

To change how many Cube API instances are available in the Production Cluster,
go to the deployment’s <Btn>Settings</Btn> screen, and open
the <Btn>Configuration</Btn> tab. From this screen, you can set the minimum and
maximum number of Cube API instances for a deployment:

<Screenshot
alt="Cube Cloud Deployment Settings page showing auto-scaling configuration options"
src="https://ucarecdn.com/3b39c56f-d553-4612-b4f0-07084cc4b742/"
/>

## Switching between deployment types

To switch a deployment's type, go to the deployment's <Btn>Settings</Btn> screen
@@ -161,3 +108,4 @@ and select from the available options:

[ref-conf-ref-ctx-to-app-id]: /reference/configuration/config#contexttoappid
[ref-limits]: /product/deployment/cloud/limits#resources
[ref-scalability]: /product/deployment/cloud/scalability
55 changes: 55 additions & 0 deletions docs/pages/product/deployment/cloud/scalability.mdx
@@ -0,0 +1,55 @@
# Scalability

Cube Cloud allows you to add infrastructure to your deployment to increase
scalability and performance beyond what is included with each Production
Deployment.

## Auto-scaling of API instances

With a Production Cluster, 2 Cube API Instances are included. That said, it
is very common to use more, and [additional API instances][ref-limits] can be
added to your deployment to increase query throughput. As a rough estimate,
1 Cube API Instance is needed for every 5-10 requests per second served.
Cube API Instances can also auto-scale as needed.
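As a back-of-the-envelope check, this estimate can be sketched as follows. This is illustrative arithmetic only, not an official sizing formula: `estimateApiInstances` is a hypothetical helper, and real throughput depends on query complexity and cache hit rate.

```javascript
// Rough capacity estimate: one Cube API Instance per 5-10 requests
// per second, never fewer than the 2 instances included with a
// Production Cluster.
function estimateApiInstances(requestsPerSecond, rpsPerInstance = 5) {
  // Default to the conservative end of the 5-10 rps range.
  return Math.max(2, Math.ceil(requestsPerSecond / rpsPerInstance));
}

console.log(estimateApiInstances(25)); // 25 rps at 5 rps/instance -> 5
```

Use the result as a starting point for the minimum auto-scaling limit, then adjust based on the <Btn>API instances</Btn> chart in Performance Insights.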

To change how many Cube API instances are available in the Production Cluster,
go to the deployment’s <Btn>Settings</Btn> screen, and open
the <Btn>Configuration</Btn> tab. From this screen, you can set the minimum and
maximum number of Cube API instances for a deployment:

<Screenshot
alt="Cube Cloud Deployment Settings page showing auto-scaling configuration options"
src="https://ucarecdn.com/3b39c56f-d553-4612-b4f0-07084cc4b742/"
/>

## Sizing Cube Store workers

Cube Store Workers are used to build and persist pre-aggregations. Each Worker
has a **maximum of 150 GB** of storage; [additional Cube Store
workers][ref-limits] can be added to your deployment to both increase storage
space and improve pre-aggregation performance. A **minimum of 2** Cube Store
Workers is required for pre-aggregations; this can be adjusted. As a rough
estimate, you will need approximately 2 Cube Store Workers per 4 GB of
pre-aggregated data built per day.
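This sizing guideline can be sketched as simple arithmetic. The helper below is hypothetical, not an official sizing formula; validate the result against the Cube Store saturation charts in Performance Insights.

```javascript
// Roughly 2 Cube Store Workers per 4 GB of pre-aggregated data built
// per day, with the required minimum of 2 workers. Illustrative only.
function estimateCubeStoreWorkers(preAggregatedGbPerDay) {
  const workersPer4Gb = 2;
  const estimate = Math.ceil(preAggregatedGbPerDay / 4) * workersPer4Gb;
  return Math.max(2, estimate);
}

console.log(estimateCubeStoreWorkers(10)); // 10 GB/day -> 6 workers
```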

<InfoBox>

Idle workers will automatically hibernate after 10 minutes of inactivity, and
will not consume CCUs until they are resumed. Workers are resumed automatically
when Cube receives a query that should be accelerated by a pre-aggregation, or
when a scheduled refresh is triggered.

</InfoBox>

To change the number of Cube Store Workers in a deployment, go to the
deployment’s <Btn>Settings</Btn> screen, and open the <Btn>Configuration</Btn>
tab. From this screen, you can set the number of Cube Store Workers from the
dropdown:

<Screenshot
alt="Cube Cloud Deployment Settings page showing auto-scaling configuration options"
src="https://ucarecdn.com/3b39c56f-d553-4612-b4f0-07084cc4b742/"
/>


[ref-limits]: /product/deployment/cloud/limits#resources
2 changes: 1 addition & 1 deletion docs/pages/product/workspace/_meta.js
@@ -8,7 +8,7 @@ module.exports = {
"pre-aggregations": "Pre-Aggregations",
"performance": "Performance Insights",
"access-control": "Access Control",
"sso": "Single Sign-On",
"sso": "Single Sign-on",
"budgets": "Budgets",
"preferences": "Preferences",
"cli": "CLI"
150 changes: 113 additions & 37 deletions docs/pages/product/workspace/performance.mdx
@@ -1,20 +1,16 @@
# Performance Insights

<WarningBox>

This page is work-in-progress.

</WarningBox>

The&nbsp;<Btn>Performance</Btn> page in Cube Cloud displays charts that help
analyze the performance of your deployment and fine-tune its configuration.
It's recommended to review Performance Insights when your workload changes
or when you face performance-related issues with your deployment.

<SuccessBox>

Performance Insights are available in Cube Cloud on
[all tiers](https://cube.dev/pricing).
Performance Insights are available in Cube Cloud on [Premium and above
tiers](https://cube.dev/pricing). Please contact us through the in-product
chat or check with your dedicated CSM to enable Performance Insights in
your account.

</SuccessBox>

@@ -25,60 +21,140 @@ Charts provide insights into different aspects of your deployment.
### API instances

The&nbsp;<Btn>API instances</Btn> chart shows the number of API instances
that served queries to the deployment over time.
that served queries to the deployment.

You can use this chart to **fine-tune the
[auto-scaling][ref-scalability-api] configuration of API instances**, e.g.,
increase the minimum and maximum number of API instances.

For example, the following chart shows a deployment with sensible auto-scaling
limits that don't need adjusting. The deployment only needs to sustain a few
infrequent load bursts per day, and auto-scaling to 3 API instances handles
them just fine:

<Screenshot src="https://ucarecdn.com/71de8978-8d3f-42cd-a32f-03daa73ad561/"/>

The next chart shows a deployment with auto-scaling limits that clearly need
adjustment. The load is so high that this deployment has to use at least
4-6 API instances most of the time, so it would be wise to increase the
minimum auto-scaling limit to 6 API instances:

<Screenshot src="https://ucarecdn.com/e5c074b0-e4d4-442e-af48-e50ec0f61963/"/>

When in doubt, consider using a higher minimum auto-scaling limit: when an
additional API instance starts, it needs some time to compile the data model
before it can serve requests. Over-provisioning API instances with a higher
minimum auto-scaling limit reduces the number of requests that have to wait
for the [data model compilation](#data-model-compilation).

Also, you can use this chart to **fine-tune the
[auto-suspension][ref-auto-sus] configuration**, e.g., by turning
auto-suspension off or increasing the auto-suspension threshold.
For example, the following chart shows a [Development
Instance][ref-dev-instance] deployment that is only accessed a few times
a day and automatically suspends after a short period of inactivity:

{/* TODO: Add screenshot */}
<Screenshot src="https://ucarecdn.com/9bf6760b-805c-413c-85fb-9402b48718cb/"/>

You can use this chart to fine-tune the auto-scaling configuration of API
instances, e.g., increase the minimum and maximum number of API instances.
The next chart shows a misconfigured [Production Cluster][ref-prod-cluster]
deployment that serves requests throughout the whole day but was configured
to auto-suspend with a very low threshold:

Also, you can use this chart to fine-tune the auto-suspension configuration,
e.g., by turning auto-suspension off or increasing the auto-suspension
threshold.
<Screenshot src="https://ucarecdn.com/2938ff51-0699-4f60-bba6-03a0132774f0/"/>

### Data sources
### Cache type

The&nbsp;<Btn>Requests by data source</Btn> chart shows the number of API
requests that were fulfilled by using cache or querying the upstream data
source over time. The&nbsp;<Btn>Avg. response time by data source</Btn>
shows the difference in the response time for
requests that hit the cache or go to the upstream data source.
The&nbsp;<Btn>Requests by cache type</Btn> chart shows the number of API
requests that were fulfilled by using pre-aggregations, in-memory cache,
or no cache (i.e., by querying the upstream data source). For example, the
following chart shows a deployment that fulfills about 50% of requests by
using pre-aggregations:

{/* TODO: Add screenshot (x2) */}
<Screenshot src="https://ucarecdn.com/fe784a74-edd5-44c0-803f-267237219b1d/"/>

The&nbsp;<Btn>Avg. response time by cache type</Btn> chart shows the difference
in the response time for requests that hit pre-aggregations, in-memory cache,
or no cache (i.e., the upstream data source). The next chart shows that
pre-aggregations usually provide sub-second response times while queries to
the data source take much longer:

<Screenshot src="https://ucarecdn.com/94ac15b6-a59c-4474-ba68-e07657d55d78/"/>

You can use these charts to see whether more requests could hit the cache
and get lower response times. In that case, **consider adding more
[pre-aggregations][ref-pre-aggregations] in Cube Store** or fine-tuning the
existing ones.
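For reference, here is a minimal sketch of what adding a pre-aggregation looks like in a JavaScript data model. The `orders` cube and all of its members are hypothetical; see the pre-aggregations documentation for the full syntax.

```javascript
// Hypothetical cube with a rollup pre-aggregation that lets Cube Store
// serve daily counts by status instead of querying the data source.
cube(`orders`, {
  sql_table: `orders`,

  measures: {
    count: { type: `count` },
  },

  dimensions: {
    status: { sql: `status`, type: `string` },
    created_at: { sql: `created_at`, type: `time` },
  },

  pre_aggregations: {
    orders_by_status: {
      measures: [CUBE.count],
      dimensions: [CUBE.status],
      time_dimension: CUBE.created_at,
      granularity: `day`,
    },
  },
});
```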

### Data model compilation

The&nbsp;<Btn>Requests by data model compilation</Btn> chart shows the
number of API requests that did or did not have to wait for the data model
compilation. The&nbsp;<Btn>Wait time for data model compilation</Btn> chart
compilation. For example, the following chart shows a deployment where only
a tiny fraction of requests require the data model to be compiled:

<Screenshot src="https://ucarecdn.com/022a6a71-121a-4b45-ba97-1b0fd2571556/"/>

The&nbsp;<Btn>Wait time for data model compilation</Btn> chart
shows the total time requests had to wait for the data model compilation.
The next chart shows that at certain points in time, requests had to wait
dozens of seconds while the data model was being compiled:

{/* TODO: Add screenshot */}
<Screenshot src="https://ucarecdn.com/520d7e4b-3838-48ae-b0aa-c988f588c3d7/"/>

You can use these charts to identify multitenancy misconfiguration,
fine-tune the auto-suspension configuration, or consider using a
multi-cluster deployment.
You can use these charts to **fine-tune the [auto-suspension][ref-auto-sus]
configuration** (e.g., turn it off or increase the threshold so that API
instances suspend less frequently), **identify [multitenancy][ref-multitenancy]
misconfiguration** (e.g., suboptimal bucketing via
[`context_to_app_id`][ref-context-to-app-id]), or
**consider using a [multi-cluster deployment][ref-multi-cluster]** to
distribute requests from different tenants across a number of Production
Cluster deployments.

### Cube Store

The&nbsp;<Btn>Saturation for queries by Cube Store workers</Btn> chart
shows if Cube Store workers are overloaded with serving **queries**. High
saturation for queries prevents Cube Store workers from fulfilling requests
and results in wait time displayed at the&nbsp;<Btn>Wait time for queries
by Cube Store workers</Btn> chart.
shows whether Cube Store workers are overloaded with serving **queries**.
High saturation for queries prevents Cube Store workers from fulfilling
requests and results in wait time displayed on the&nbsp;<Btn>Wait time for
queries by Cube Store workers</Btn> chart.

For example, the following chart shows a deployment that uses 4 Cube Store
workers and almost never lets them reach saturation, resulting in no wait
time for queries:

{/* TODO: Add screenshot */}
<Screenshot src="https://ucarecdn.com/9f33377e-ebf4-4227-9f49-a30b7f5bc04b/"/>

Similarly, the&nbsp;<Btn>Saturation for jobs by Cube Store workers</Btn>
and <Btn>Wait time for jobs by Cube Store workers</Btn> charts show whether
Cube Store workers are overloaded with serving **jobs**, i.e., building
pre-aggregations or performing internal tasks such as data compaction.

{/* TODO: Add screenshot */}
For example, the following chart shows a misconfigured deployment that uses
8 Cube Store workers and keeps them at full saturation for prolonged
intervals, resulting in huge wait times and, in the case of jobs, delayed
refresh of pre-aggregations:

<Screenshot src="https://ucarecdn.com/eb3f8897-5358-4e5b-8507-b10c122d6206/"/>

The next chart shows that oversaturated Cube Store workers might yield
hours of wait time for queries and jobs:

<Screenshot src="https://ucarecdn.com/14edcb1d-a22c-47f8-aef4-636c0d726fb2/"/>

You can use these charts to **fine-tune the [number of Cube Store
workers][ref-scalability-cube-store]** used by your deployment, e.g.,
increasing it until there's no saturation and no wait time for queries
and jobs.


You can use these charts to consider fine-tuning the number of Cube Store
workers used by your deployment.
[ref-scalability-api]: /product/deployment/cloud/scalability#auto-scaling-of-api-instances
[ref-scalability-cube-store]: /product/deployment/cloud/scalability#sizing-cube-store-workers
[ref-auto-sus]: /product/deployment/cloud/auto-suspension
[ref-dev-instance]: /product/deployment/cloud/deployment-types#development-instance
[ref-prod-cluster]: /product/deployment/cloud/deployment-types#production-cluster
[ref-multi-cluster]: /product/deployment/cloud/deployment-types#production-multi-cluster
[ref-pre-aggregations]: /product/caching/using-pre-aggregations
[ref-multitenancy]: /product/configuration/advanced/multitenancy
[ref-context-to-app-id]: /reference/configuration/config#context_to_app_id
2 changes: 1 addition & 1 deletion docs/pages/product/workspace/sso.mdx
@@ -3,7 +3,7 @@ redirect_from:
- /workspace/sso/
---

# Single Sign-On
# Single Sign-on

As an account administrator, you can manage how your team accesses Cube Cloud.
There are options to log in using email and password, a GitHub account, or a