docs: Update references #406

Merged
merged 1 commit into from
Sep 21, 2023
20 changes: 11 additions & 9 deletions docs/modules/superset/pages/getting_started/installation.adoc
@@ -1,21 +1,21 @@
= Installation

On this page you will install the Stackable Superset Operator as well as the commons and secret Operator which are required by all Stackable Operators.
On this page you will install the Stackable Superset Operator as well as the commons and secret Operator which are
required by all Stackable Operators.

== Stackable Operators

There are 2 ways to run Stackable Operators

1. Using xref:stackablectl::index.adoc[]

2. Using Helm
. Using xref:management:stackablectl:index.adoc[]
. Using Helm

=== stackablectl

stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
Follow the xref:stackablectl::installation.adoc[installation steps] for your platform.
`stackablectl` is the command line tool to interact with Stackable operators and our recommended way to install
Operators. Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

After you have installed stackablectl run the following command to install all Operators necessary for Superset:
After you have installed `stackablectl`, run the following command to install all Operators necessary for Superset:

[source,bash]
----
@@ -30,7 +30,8 @@ The tool will show
[INFO ] Installing superset operator
----

TIP: Consult the xref:stackablectl::quickstart.adoc[] to learn more about how to use stackablectl. For example, you can use the `-k` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`. For
example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].

=== Helm

@@ -46,7 +47,8 @@ Then install the Stackable Operators:
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----

Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Superset service (as well as the CRDs for the required operators). You are now ready to deploy Superset in Kubernetes.
Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Superset service (as well as the
CRDs for the required operators). You are now ready to deploy Superset in Kubernetes.

== What's next

62 changes: 48 additions & 14 deletions docs/modules/superset/pages/index.adoc
@@ -2,55 +2,89 @@
:description: The Stackable Operator for Apache Superset is a Kubernetes operator that can manage Apache Superset clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Superset versions.
:keywords: Stackable Operator, Apache Superset, Kubernetes, operator, data science, data exploration, SQL, engineer, big data, CRD, StatefulSet, ConfigMap, Service, Druid, Trino, S3, demo, version

The Stackable Operator for Apache Superset is an operator that can deploy and manage https://superset.apache.org/[Apache Superset] clusters on Kubernetes. Superset is a data exploration and visualization tool that connects to data sources via SQL. Store your data in Apache Druid or Trino, and manage your Druid and Trino instances with the Stackable Operators for xref:druid:index.adoc[Apache Druid] or xref:trino:index.adoc[Trino]. This operator helps you manage your Superset instances on Kubernetes efficiently.
The Stackable Operator for Apache Superset is an operator that can deploy and manage https://superset.apache.org/[Apache
Superset] clusters on Kubernetes. Superset is a data exploration and visualization tool that connects to data sources
via SQL. Store your data in Apache Druid or Trino, and manage your Druid and Trino instances with the Stackable
Operators for xref:druid:index.adoc[Apache Druid] or xref:trino:index.adoc[Trino]. This operator helps you manage your
Superset instances on Kubernetes efficiently.

== Getting started

Get started using Superset with Stackable Operator by following the xref:getting_started/index.adoc[]. It guides you through installing the Operator alongside a PostgreSQL database, connecting to your Superset instance and analyzing some preloaded example data.
Get started using Superset with Stackable Operator by following the xref:getting_started/index.adoc[]. It guides you
through installing the Operator alongside a PostgreSQL database, connecting to your Superset instance and analyzing some
preloaded example data.

== Resources

The Operator manages three https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/[custom resources]: The _SupersetCluster_, _SupersetDB_ and _DruidConnection_. It creates a number of different Kubernetes resources based on the custom resources.
The Operator manages three https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/[custom
resources]: The _SupersetCluster_, _SupersetDB_ and _DruidConnection_. It creates a number of different Kubernetes
resources based on the custom resources.

=== Custom resources

The SupersetCluster is the main resource for the configuration of the Superset instance. The resource defines only one xref:concepts:roles-and-role-groups.adoc[role], the `node`. The various configuration options are explained in the xref:usage-guide/index.adoc[]. It helps you tune your cluster to your needs by configuring xref:usage-guide/storage-resource-configuration.adoc[resource usage], xref:usage-guide/security.adoc[security], xref:usage-guide/logging.adoc[logging] and more.
The SupersetCluster is the main resource for the configuration of the Superset instance. The resource defines only one
xref:concepts:roles-and-role-groups.adoc[role], the `node`. The various configuration options are explained in the
xref:usage-guide/index.adoc[]. It helps you tune your cluster to your needs by configuring
xref:usage-guide/storage-resource-configuration.adoc[resource usage], xref:usage-guide/security.adoc[security],
xref:usage-guide/logging.adoc[logging] and more.

When a SupersetCluster is first deployed, a SupersetDB resource is created. The SupersetDB resource is a wrapper resource for the SQL database that is used by Superset for its metadata. The resource contains some configuration but also keeps track of whether the database has been initialized or not. It is not deleted automatically if a SupersetCluster is deleted, and so can be reused.
When a SupersetCluster is first deployed, a SupersetDB resource is created. The SupersetDB resource is a wrapper
resource for the SQL database that is used by Superset for its metadata. The resource contains some configuration but
also keeps track of whether the database has been initialized or not. It is not deleted automatically if a
SupersetCluster is deleted, and so can be reused.

DruidConnection resources link a Superset and Druid instance. It lets you define this connection in the familiar way of deploying a resource (instead of configuring the connection via the Superset UI or API). The operator configures the connection between Druid and the Superset instance.
DruidConnection resources link a Superset and Druid instance. It lets you define this connection in the familiar way of
deploying a resource (instead of configuring the connection via the Superset UI or API). The operator configures the
connection between Druid and the Superset instance.

=== Kubernetes resources

Based on the custom resources you define, the Operator creates ConfigMaps, StatefulSets and Services.

image::superset_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The Jobs created for the SupersetDB and DruidConnnection resources are not shown.
The diagram above depicts all the Kubernetes resources created by the operator, and how they relate to each other. The
Jobs created for the SupersetDB and DruidConnnection resources are not shown.

For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a StatefulSet with the amount of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the main container running Superset and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The Operator creates a Service for the `node` role as well as a single service per role group.
For every xref:concepts:roles-and-role-groups.adoc#_role_groups[role group] you define, the Operator creates a
StatefulSet with the amount of replicas defined in the RoleGroup. Every Pod in the StatefulSet has two containers: the
main container running Superset and a sidecar container gathering metrics for xref:operators:monitoring.adoc[]. The
Operator creates a Service for the `node` role as well as a single service per role group.

ConfigMaps are created, one per RoleGroup and also one for the SupersetDB. Both ConfigMaps contains two files: `log_config.py` and `superset_config.py` which contain logging and general Superset configuration respectively.
ConfigMaps are created, one per RoleGroup and also one for the SupersetDB. Both ConfigMaps contains two files:
`log_config.py` and `superset_config.py` which contain logging and general Superset configuration respectively.

== Required external component: Metastore SQL database

Superset requires an SQL database in which to store its metadata, dashboards and users. The xref:getting_started/index.adoc[] guides you through installing an example database with a Superset instance that you can use to get started, but is not suitable for production use. Follow the setup instructions for one of the xref:required-external-components.adoc[supported databases] for a production database.
Superset requires an SQL database in which to store its metadata, dashboards and users. The
xref:getting_started/index.adoc[] guides you through installing an example database with a Superset instance that you
can use to get started, but is not suitable for production use. Follow the setup instructions for one of the
xref:required-external-components.adoc[supported databases] for a production database.

== Connecting to data sources

Superset does not store its own data, instead it connects to other products where data is stored. On the Stackable Platform the two commonly used choices are xref:druid:index.adoc[Apache Druid] and xref:trino:index.adoc[Trino]. For Druid there is a way to xref:usage-guide/connecting-druid.adoc[connect a Druid instance declaratively] with a custom resource. For Trino this is on the roadmap. Have a look at the demos linked <<demos, below>> for examples of using Superset with Druid or Trino.
Superset does not store its own data, instead it connects to other products where data is stored. On the Stackable
Platform the two commonly used choices are xref:druid:index.adoc[Apache Druid] and xref:trino:index.adoc[Trino]. For
Druid there is a way to xref:usage-guide/connecting-druid.adoc[connect a Druid instance declaratively] with a custom
resource. For Trino this is on the roadmap. Have a look at the demos linked <<demos, below>> for examples of using
Superset with Druid or Trino.

== [[demos]]Demos

Many of the Stackable xref:stackablectl::demos/index.adoc[demos] use Superset in the stack for data visualization and explaration. The demos come in two main variants.
Many of the Stackable xref:demos:index.adoc[demos] use Superset in the stack for data visualization and explaration. The
demos come in two main variants.

=== With Druid

The xref:stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc[] and xref:stackablectl::demos/nifi-kafka-druid-water-level-data.adoc[] demos show Superset connected to xref:druid:index.adoc[Druid], exploring earthquake and water level data respectively.
The xref:demos:nifi-kafka-druid-earthquake-data.adoc[] and xref:demos:nifi-kafka-druid-water-level-data.adoc[] demos
show Superset connected to xref:druid:index.adoc[Druid], exploring earthquake and water level data respectively.

=== With Trino

The xref:stackablectl::demos/spark-k8s-anomaly-detection-taxi-data.adoc[], xref:stackablectl::demos/trino-taxi-data.adoc[], xref:stackablectl::demos/trino-iceberg.adoc[] and xref:stackablectl::demos/data-lakehouse-iceberg-trino-spark.adoc[] demos all use a xref:trino:index.adoc[Trino] instance on top of S3 storage that hold data to analyze. Superset is connected to Trino to analyze a variety of different datasets.
The xref:demos:spark-k8s-anomaly-detection-taxi-data.adoc[], xref:demos:trino-taxi-data.adoc[],
xref:demos:trino-iceberg.adoc[] and xref:demos:data-lakehouse-iceberg-trino-spark.adoc[] demos all use a
xref:trino:index.adoc[Trino] instance on top of S3 storage that hold data to analyze. Superset is connected to Trino to
analyze a variety of different datasets.

== Supported Versions

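The reference updates in this diff appear to follow two prefix rewrites: `xref:stackablectl::demos/` becomes `xref:demos:`, and every other `xref:stackablectl::` becomes `xref:management:stackablectl:`. The following is a minimal sketch of that pattern, inferred only from the lines shown above. The names `XREF_REWRITES` and `rewrite_xrefs` are hypothetical and are not part of the repository or of this PR.

```python
# Prefix rewrites inferred from this diff only (an assumption, not a
# documented migration rule). Order matters: the more specific demos
# prefix must be handled before the general stackablectl prefix.
XREF_REWRITES = [
    ("xref:stackablectl::demos/", "xref:demos:"),
    ("xref:stackablectl::", "xref:management:stackablectl:"),
]


def rewrite_xrefs(text: str) -> str:
    """Apply each prefix rewrite in order and return the updated text."""
    for old, new in XREF_REWRITES:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    # Sample lines copied from the old side of the diff.
    samples = [
        "1. Using xref:stackablectl::index.adoc[]",
        "Many of the Stackable xref:stackablectl::demos/index.adoc[demos] use Superset",
    ]
    for line in samples:
        print(rewrite_xrefs(line))
```

Running the function twice gives the same result as running it once, because neither replacement string contains either of the old prefixes. It only covers the xref targets; the other changes in this PR (line rewrapping, list markers, the `--cluster kind` flag) were edited separately.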