Open Data Discovery (ODD) is a new way for data teams to discover, understand, trust, and collaborate on data assets. ODD serves as a tool for putting Data Governance strategies into practice. This guide shows an easy way to get Open Data Discovery up and running on Amazon EKS.
This environment consists of:
- ODD Platform – an application that ingests, structures, indexes, and exposes the collected metadata via REST API and UI
- PostgreSQL sample database
- Before you start, ensure that you have an AWS account; if you do not, create one.
- Provision an EKS Cluster
- Install and deploy PostgreSQL
- Deploy and run Open Data Discovery (ODD)
- Step 1. Click Quick launch, and you’ll be redirected to the CloudFormation stack in the AWS account where you are logged in. Check that you are in one of the supported regions: us-west-2, us-west-1, us-east-2, us-east-1.
- Step 2. You’ll be guided through several setup stages, including the following:
  - Cluster Setup
    - Cluster Name: Supply a unique and descriptive name for your EKS cluster, such as “MyEKS-Cluster”. The default name is ODD-EKS.
  - Node Group
    - Instance Types: Choose the EC2 instance types for your worker nodes. The default type is t3.large.
    - Desired Capacity: Specify the number of worker nodes you want in the node group. The default is 1.
    - SSH Key Pair: Select an existing key pair or create a new one for secure access to the worker nodes.
  - Role
    - Provide an existing role with sufficient privileges, or create and assign a new one.
- Step 3. Review all your settings to confirm that they are correct.
- Step 4. Click “Create Stack” to start creating the EKS cluster.
To begin, authenticate kubectl with your EKS cluster. AWS offers a convenient command:
aws eks --region <region> update-kubeconfig --name <cluster-name>
Replace <region> with the AWS region where your EKS cluster is deployed and <cluster-name> with the name of your EKS cluster, so that the command looks similar to the following:
aws eks --region us-east-1 update-kubeconfig --name ODD-EKS
Currently, only the following regions are available:
- us-west-2
- us-west-1
- us-east-2
- us-east-1
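Because only these four regions are supported, it can help to check the region before building the `update-kubeconfig` command. The following is a minimal sketch, not part of the ODD tooling: the function names `is_supported_region` and `kubeconfig_command` are made up for illustration, and the command is printed rather than executed so it can be reviewed first.

```shell
# Regions listed in this guide as supported by the quick-launch stack.
SUPPORTED_REGIONS="us-west-2 us-west-1 us-east-2 us-east-1"

# is_supported_region REGION: succeeds only if REGION is in the list above.
# (Hypothetical helper name, used only in this sketch.)
is_supported_region() {
  for r in $SUPPORTED_REGIONS; do
    [ "$r" = "$1" ] && return 0
  done
  return 1
}

# kubeconfig_command REGION CLUSTER: prints the aws command from this guide
# with the placeholders filled in, or fails for an unsupported region.
kubeconfig_command() {
  if ! is_supported_region "$1"; then
    echo "unsupported region: $1" >&2
    return 1
  fi
  printf 'aws eks --region %s update-kubeconfig --name %s\n' "$1" "$2"
}

kubeconfig_command us-east-1 ODD-EKS
```

Once the printed command looks right, it can be copied into the terminal and run as is.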
Confirm that your kubectl configuration is correctly set by listing the available nodes in your cluster:
kubectl get nodes
Visit the Helm GitHub releases page and download the suitable Helm binary, or install Helm with the following command:
sudo yum install -y openssl && curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
To verify that the installation succeeded, use the command:
helm version --short
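The remaining steps rely on `aws`, `kubectl`, and `helm` all being available in the same shell. As a rough preflight sketch (the helper name `missing_tools` is an assumption introduced here, not an existing command), the following reports which of them cannot be found on `PATH`:

```shell
# missing_tools NAME...: prints each name that is not found on PATH.
# (Hypothetical helper, used only in this sketch.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools aws kubectl helm)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Not found on PATH:" $missing
else
  echo "aws, kubectl and helm are all available"
fi
```

This only checks that the commands exist; it does not verify their versions or that they are authenticated against your cluster.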
Add a repository to access pre-built charts:
helm repo add bitnami https://charts.bitnami.com/bitnami
Install PostgreSQL with the command:
helm install postgresql bitnami/postgresql --set primary.persistence.enabled=false --set global.postgresql.auth.database=odd-platform
This basic deployment can be tailored by adjusting values in the Helm chart to meet your specific requirements.
To check the status of your deployment after the installation is done, use:
kubectl get pods
After PostgreSQL is installed successfully, an auto-generated password becomes available. It is good practice to store this password in an environment variable and use it when working with the ODD platform.
To do that, execute the following command:
export POSTGRES_PASSWORD=$(kubectl get secret --namespace default postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
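Kubernetes returns Secret values base64-encoded, which is why the command above pipes the value through `base64 -d`. The sketch below illustrates that decoding step on a made-up sample value and adds a guard against an empty result; `decode_secret` and `require_nonempty` are hypothetical helper names, not part of kubectl or ODD.

```shell
# decode_secret VALUE: decodes a base64-encoded value, in the form that
# kubectl returns for Secret data.
decode_secret() {
  printf '%s' "$1" | base64 -d
}

# require_nonempty NAME VALUE: fails with a message when VALUE is empty,
# for example when the secret was not found.
require_nonempty() {
  if [ -z "$2" ]; then
    echo "$1 is empty; check that the postgresql secret exists" >&2
    return 1
  fi
}

# Made-up sample, not a real password: "c2VjcmV0" is base64 for "secret".
decode_secret "c2VjcmV0"; echo
```

After running the export command from this guide, something like `require_nonempty POSTGRES_PASSWORD "$POSTGRES_PASSWORD"` can catch an empty variable before it is passed to Helm.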
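To deploy the ODD platform, first add its Helm repository, then install the chart, restricting access to the Load Balancer to the IP address of your local workstation: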
helm repo add opendatadiscovery https://opendatadiscovery.github.io/charts
helm install odd-platform opendatadiscovery/odd-platform --set config.yaml.spring.datasource.username=postgres --set config.yaml.spring.datasource.password="$POSTGRES_PASSWORD" --set config.yaml.spring.datasource.url="jdbc:postgresql://postgresql:5432/odd-platform" --set service.type=LoadBalancer --set service.annotations."service\.beta\.kubernetes\.io/load-balancer-source-ranges"="<IPAddressOfYourLocalStationHere>/32"
To find your IP address, follow these instructions.
- For Windows OS, you can search for “What is my IP” in your preferred search engine.
- For MacOS and Linux, use the command:
wget -qO- ipecho.net/plain
Your public IP address will be displayed in the terminal output. If you are behind a router or firewall, the address you retrieve is the public IP assigned to your router by your ISP.
For example,
helm install odd-platform opendatadiscovery/odd-platform --set config.yaml.spring.datasource.username=postgres --set config.yaml.spring.datasource.password="$POSTGRES_PASSWORD" --set config.yaml.spring.datasource.url="jdbc:postgresql://postgresql:5432/odd-platform" --set service.type=LoadBalancer --set service.annotations."service\.beta\.kubernetes\.io/load-balancer-source-ranges"="83.3.12.58/32"
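Since the source range is built from whatever address lookup you used, it may be worth checking that the value really is an IPv4 address before appending `/32`. The following is a minimal sketch under that assumption; `is_ipv4` and `to_source_range` are illustrative names, not existing commands.

```shell
# is_ipv4 ADDR: succeeds if ADDR is four dot-separated numbers, each 0-255.
is_ipv4() {
  case "$1" in
    *[!0-9.]*|""|*..*|.*|*.) return 1 ;;   # stray characters or empty fields
  esac
  old_ifs=$IFS; IFS=.
  set -- $1                                # split the address on dots
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
}

# to_source_range ADDR: prints ADDR as a single-host /32 range, or fails.
to_source_range() {
  is_ipv4 "$1" || { echo "not an IPv4 address: $1" >&2; return 1; }
  printf '%s/32\n' "$1"
}

# Same example address as in the command above.
to_source_range 83.3.12.58
```

The printed value is what goes into the `load-balancer-source-ranges` annotation in the `helm install` command.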
If you wish to allow connections from multiple IPs, run the following command instead:
helm upgrade odd-platform opendatadiscovery/odd-platform --set config.yaml.spring.datasource.username=postgres --set config.yaml.spring.datasource.password="$POSTGRES_PASSWORD" --set config.yaml.spring.datasource.url="jdbc:postgresql://postgresql:5432/odd-platform" --set service.type=LoadBalancer --set service.annotations."service\.beta\.kubernetes\.io/load-balancer-source-ranges"="<YourIPAddressHere>/32\,<AnotherIPAddressHere>/32"
Do not forget to replace the strings <YourIPAddressHere> and <AnotherIPAddressHere> in this command with your IP addresses, separated with escaped commas (\,) and enclosed in double quotation marks.
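Writing the escaped, comma-separated list by hand is easy to get wrong, so here is a small sketch that builds it from a list of addresses. The helper name `join_source_ranges` is hypothetical, and 203.0.113.7 is a placeholder from a documentation-only address range.

```shell
# join_source_ranges ADDR...: prints the addresses as /32 ranges joined by "\,".
# The backslash keeps Helm's --set parser from treating the comma as a
# separator between values.
join_source_ranges() {
  out=""
  for addr in "$@"; do
    if [ -n "$out" ]; then
      out="$out\\,"
    fi
    out="$out$addr/32"
  done
  printf '%s\n' "$out"
}

# Prints: 83.3.12.58/32\,203.0.113.7/32
join_source_ranges 83.3.12.58 203.0.113.7
```

The printed string can then be pasted between the double quotes of the `load-balancer-source-ranges` value in the `helm upgrade` command.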
To check that the pods and services are up and running, use the following commands:
kubectl get pods
kubectl get svc
After completing the setup and ensuring everything is up and running, you can start using the ODD platform through your web browser. To do this, obtain the hostname of your Load Balancer and use it to connect to your EKS cluster:
kubectl get svc odd-platform -o=custom-columns=EXTERNAL-IP:.status.loadBalancer.ingress[0].hostname | tail -n 1
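The Load Balancer hostname is usually not assigned immediately, so the command above may print an empty value or `<none>` for a short while after installation. The sketch below retries a command until it yields a usable value; `wait_for_value` and `lb_hostname` are hypothetical helpers, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
# wait_for_value ATTEMPTS DELAY COMMAND...: re-runs COMMAND until it prints
# something other than an empty string or "<none>", then prints that value.
wait_for_value() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    value="$("$@" 2>/dev/null)"
    if [ -n "$value" ] && [ "$value" != "<none>" ]; then
      printf '%s\n' "$value"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# lb_hostname: wraps the command from this guide; the column spec is quoted
# so the shell does not try to expand the square brackets.
lb_hostname() {
  kubectl get svc odd-platform \
    -o=custom-columns='EXTERNAL-IP:.status.loadBalancer.ingress[0].hostname' | tail -n 1
}

# Usage (requires access to the cluster):
#   wait_for_value 30 10 lb_hostname
```

With the values in the usage comment, the function would poll for up to about five minutes before giving up with a non-zero exit status.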
If the setup is successful, you will be able to access the platform demo page directly from your web browser.
With platform versions >= 0.18.0, you can get acquainted with the platform API by visiting Swagger UI. For example, for the Load Balancer host a1e67ff8befc54b75969f9834a6e329a-948212351.us-east-1.elb.amazonaws.com, you can visit http://a1e67ff8befc54b75969f9834a6e329a-948212351.us-east-1.elb.amazonaws.com/api/v3/webjars/swagger-ui/index.html.
This setup does not create any certificates for encrypted communication, so only the HTTP protocol is supported. For example, http://a1e67ff8befc54b75969f9834a6e329a-948212351.us-east-1.elb.amazonaws.com/
This protocol is not secure, so do not send any sensitive information over this connection. The setup is intended for demonstration purposes only. For production use, configure HTTPS.
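To make the two addresses concrete, the following sketch derives the UI and Swagger UI URLs from a Load Balancer hostname. The function names are made up for illustration; the paths and the example hostname are the ones used in this guide.

```shell
# platform_url HOST: prints the plain-HTTP address of the platform UI.
platform_url() {
  printf 'http://%s/\n' "$1"
}

# swagger_url HOST: prints the Swagger UI address (platform versions >= 0.18.0).
swagger_url() {
  printf 'http://%s/api/v3/webjars/swagger-ui/index.html\n' "$1"
}

# Example hostname from this guide; yours will differ.
host="a1e67ff8befc54b75969f9834a6e329a-948212351.us-east-1.elb.amazonaws.com"
platform_url "$host"
swagger_url "$host"
```

Both URLs deliberately use the `http://` scheme, since this demonstration setup has no certificates.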
Deletion starts with uninstalling the platform:
helm uninstall odd-platform
To avoid incurring additional charges, or once you are confident that you no longer need these resources, you can delete your CloudFormation stack.
Setting up the Collector involves several steps.
- Begin by accessing your ODD landing page and heading to the Management section, where you can initiate the configuration process. Provide a Name for the collector and save the settings.
- Make sure to securely copy and store the token generated by the platform for future use; otherwise, the token will need to be regenerated for your next session.
- Once you have completed the initial setup, open your AWS CloudShell and enter the following command:
sudo yum install -y openssl && curl -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
Executing this command installs Helm 3, a Kubernetes package manager, directly from its GitHub repository onto your system.
To verify successful installation, you can use the command:
helm version --short
- Next, add the ODD repository and download the collector values file by executing the following commands in the specified order.
helm repo add opendatadiscovery https://opendatadiscovery.github.io/charts
wget https://raw.githubusercontent.com/opendatadiscovery/charts/master/cloudformation/collector-values.yaml
Note: before running the following command, replace <Generated token> with the token you copied earlier.
sed -i 's/odd-token/<Generated token>/g' collector-values.yaml
export POSTGRES_PASSWORD=$(kubectl get secret --namespace default postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
helm install odd-collector opendatadiscovery/odd-collector --set nameOverride=odd-collector --set passwordSecretsEnvs.POSTGRES_PASSWORD=$POSTGRES_PASSWORD -f collector-values.yaml
If you’ve followed the instructions correctly, you should see output in your CloudShell informing you that ODD Collector is up and running.
You can also include additional plugins if desired.
To do that, manually update the collector_config.yaml file with your chosen plugins and then run the following command in CloudShell:
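One caveat with the `sed` substitution used for the token: if the token contains `/`, `&`, or `\`, an unescaped replacement may fail or produce the wrong result. The sketch below escapes those characters first. It is an illustration only: `escape_sed_replacement` and `replace_token` are hypothetical helpers, the `token:` key in the demo file is a made-up stand-in rather than the real layout of collector-values.yaml, and tokens containing newlines are not handled.

```shell
# escape_sed_replacement TEXT: escapes "\", "/" and "&", which are special in
# the replacement half of a sed s/.../.../ command.
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# replace_token FILE TOKEN: prints FILE with every "odd-token" replaced by
# TOKEN; the file itself is left unchanged.
replace_token() {
  escaped="$(escape_sed_replacement "$2")"
  sed "s/odd-token/$escaped/g" "$1"
}

# Demo on a throwaway file with a made-up token.
demo="$(mktemp)"
printf 'token: odd-token\n' > "$demo"
replace_token "$demo" 'abc/DEF&123'
rm -f "$demo"
```

Because `replace_token` writes to standard output, its result can be redirected to a new file and inspected before it is passed to `helm install` with `-f`.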
helm upgrade --install odd-collector opendatadiscovery/odd-collector --set nameOverride=odd-collector --set passwordSecretsEnvs.POSTGRES_PASSWORD=$POSTGRES_PASSWORD -f collector-values.yaml