
Use a Terraform script to deploy Apache DolphinScheduler on Amazon EKS container service with one click, and output the DolphinScheduler access UI for immediate use.


SEZ9/terraform-dolphinscheduler


Self-managed Apache DolphinScheduler deployment for EKS

Apache DolphinScheduler is a distributed and extensible open-source workflow orchestration platform with powerful visual DAG interfaces. For an analysis of its architecture and technical details, see the official Apache DolphinScheduler documentation.

Core Architecture Details of the Deployment Script:

Architecture on EKS: The deployment runs in containers on EKS and exposes DolphinScheduler through a publicly accessible ELB domain name.

Core Components for DolphinScheduler:

Database: Uses Amazon Aurora Serverless v2, which lets you configure an elastic scaling range (2-16 ACUs by default) so that database capacity scales with load.

Persistent Storage: Uses Amazon EFS for serverless, elastic persistent storage.

Task Logging: Remote task logging is enabled by default, and task logs are stored in Amazon S3, where you can review the logs of individual tasks.

Dependency and File Management: Dependencies and file management are stored in S3.

Namespace and Node Scaling: The DolphinScheduler namespace is created by Terraform, and node scaling for its workloads is managed automatically by Karpenter.

Architecture Preview

Architecture Diagram

Deployment Steps:

Install Basic Environment:

Install Terraform, Helm, kubectl, and the AWS CLI (the AWS CLI is used below to update your kubeconfig).
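As a quick preflight check, the sketch below reports any required CLI that is missing from your PATH. `check_tools` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical preflight helper (not part of this repository).
# Prints each tool that is NOT found on PATH, one per line,
# and returns 1 if any tool is missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `check_tools terraform helm kubectl aws || exit 1` stops a wrapper script early and lists whatever still needs to be installed.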

Execute Installation Script:

cd ${terraform}/
sh install.sh

Wait for Completion: The process will take approximately 30 minutes.
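Because the installation runs for about 30 minutes, it can help to keep a log of the run. A minimal sketch, assuming a POSIX shell; `run_logged` is a hypothetical helper, not something install.sh provides:

```shell
# Hypothetical helper: run a long command, mirror its output to the
# terminal and to LOGFILE, and report the elapsed time on stderr.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  local log="$1" start status
  shift
  start=$(date +%s)
  # A pipeline's exit status is tee's, so record the command's own
  # status in a side file before tee finishes.
  { "$@" 2>&1; echo "$?" > "$log.status"; } | tee "$log"
  status=$(cat "$log.status")
  rm -f "$log.status"
  echo "finished with status $status after $(( ($(date +%s) - start) / 60 )) min" >&2
  return "$status"
}
```

Example, from the same directory as the installation step: `run_logged install.log sh install.sh`. The function returns the script's own exit status, so it can still be chained with `&&`.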

Retrieve Ingress Address:

Obtain the ingress address after deployment.

# Create or update a kubeconfig file for your cluster. Replace us-east-1 with the AWS Region your cluster is in
# and dolphinscheduler with the name of your cluster (the configure_kubectl Terraform output prints the exact command). For example:

aws eks --region us-east-1 update-kubeconfig --name dolphinscheduler

kubectl get ingress -n dolphinscheduler
NAME               CLASS   HOSTS                  ADDRESS                                                  PORTS   AGE
dolphinscheduler   alb     dolphinscheduler.org   k8s-dolphins-dolphins-xxxx.us-east-1.elb.amazonaws.com   80      3m30s
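If the ADDRESS column is still empty, the load balancer may not have finished provisioning yet. The sketch below polls a command until it prints something; `wait_for_output` is a hypothetical helper, and the ingress name and namespace in the usage line are taken from the output above.

```shell
# Hypothetical helper: run COMMAND until it prints non-empty output.
# Usage: wait_for_output MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Prints the first non-empty output and returns 0, or returns 1 on timeout.
wait_for_output() {
  local attempts="$1" interval="$2" i=1 out
  shift 2
  while [ "$i" -le "$attempts" ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Example (up to 20 attempts, 30 seconds apart):
`wait_for_output 20 30 kubectl get ingress dolphinscheduler -n dolphinscheduler -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'`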

Log in to DolphinScheduler through the ELB address and port with the default username and password: http://your-elb-address:12345/dolphinscheduler/ui
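A small sketch that builds the login URL from the ELB hostname. `dolphinscheduler_ui_url` is a hypothetical helper. The port defaults to 12345 as stated above; since the ingress listing shows port 80, passing an empty second argument produces a URL without an explicit port, in case your load balancer only listens on 80.

```shell
# Hypothetical helper: print the DolphinScheduler UI URL for an ELB hostname.
# Usage: dolphinscheduler_ui_url HOST [PORT]
# PORT defaults to 12345 when omitted; an explicit empty PORT omits the port.
dolphinscheduler_ui_url() {
  local host="$1"
  local port="${2-12345}"   # ${2-...} substitutes only when $2 is unset
  if [ -n "$port" ]; then
    printf 'http://%s:%s/dolphinscheduler/ui\n' "$host" "$port"
  else
    printf 'http://%s/dolphinscheduler/ui\n' "$host"
  fi
}
```

To check that the UI answers before opening a browser: `curl -sS -o /dev/null -w '%{http_code}\n' "$(dolphinscheduler_ui_url your-elb-address)"`.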

Cleanup Steps

Execute Cleanup Script:

sh cleanup.sh

Requirements

Name Version
terraform >= 1.0.0
aws >= 5.0
helm >= 2.9.0
kubectl >= 1.30
kubernetes >= 2.20.0
random 3.5.1
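The provider constraints above are enforced by `terraform init` itself, but the Terraform CLI version (>= 1.0.0) can be checked up front. A minimal sketch, assuming a `sort` that supports `-V` (such as GNU coreutils); `version_ge` and `terraform_cli_version` are hypothetical helpers.

```shell
# Hypothetical helper: return 0 when version $1 >= version $2,
# using sort's version ordering (-V) so that 1.10.0 > 1.9.0.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the installed Terraform CLI version, e.g. "1.5.7".
# `terraform version` prints "Terraform vX.Y.Z" on its first line.
terraform_cli_version() {
  terraform version | head -n 1 | sed 's/^Terraform v//'
}
```

Example: `version_ge "$(terraform_cli_version)" 1.0.0 || echo "Terraform >= 1.0.0 is required" >&2`.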

Providers

Name Version
aws >= 5.0
aws.ecr >= 5.0
kubectl >= 1.14
kubernetes >= 2.20.0
random 3.5.1

Modules

Name Source Version
dolphinScheduler_irsa_scheduler aws-ia/eks-blueprints-addon/aws ~> 1.0
dolphinScheduler_irsa_webserver aws-ia/eks-blueprints-addon/aws ~> 1.0
dolphinScheduler_irsa_worker aws-ia/eks-blueprints-addon/aws ~> 1.0
dolphinScheduler_s3_bucket terraform-aws-modules/s3-bucket/aws ~> 3.0
amp_ingest_irsa aws-ia/eks-blueprints-addon/aws ~> 1.0
db terraform-aws-modules/rds-aurora/aws ~> 5.0
ebs_csi_driver_irsa terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks ~> 5.34
eks terraform-aws-modules/eks/aws ~> 19.15
eks_blueprints_addons aws-ia/eks-blueprints-addons/aws ~> 1.2
eks_data_addons aws-ia/eks-data-addons/aws ~> 1.2.9
fluentbit_s3_bucket terraform-aws-modules/s3-bucket/aws ~> 3.0
security_group terraform-aws-modules/security-group/aws ~> 5.0
spark_logs_s3_bucket terraform-aws-modules/s3-bucket/aws ~> 3.0
spark_team_a_irsa aws-ia/eks-blueprints-addon/aws ~> 1.0
vpc terraform-aws-modules/vpc/aws ~> 5.0

Resources

Name Type
aws_efs_file_system.efs resource
aws_efs_mount_target.efs_mt resource
aws_iam_policy.dolphinScheduler_scheduler resource
aws_iam_policy.dolphinScheduler_webserver resource
aws_iam_policy.dolphinScheduler_worker resource
aws_iam_policy.fluentbit resource
aws_iam_policy.grafana resource
aws_iam_policy.spark resource
aws_prometheus_workspace.amp resource
aws_s3_object.this resource
aws_secretsmanager_secret.dolphinScheduler_webserver resource
aws_secretsmanager_secret.postgres resource
aws_secretsmanager_secret_version.dolphinScheduler_webserver resource
aws_secretsmanager_secret_version.postgres resource
aws_security_group.efs resource
kubectl_manifest.dolphinScheduler_webserver resource
kubectl_manifest.efs_pvc resource
kubectl_manifest.efs_sc resource
kubernetes_cluster_role.spark_role resource
kubernetes_cluster_role_binding.dolphinScheduler_worker_spark_role_binding resource
kubernetes_cluster_role_binding.spark_role_binding resource
kubernetes_namespace_v1.dolphinScheduler resource
kubernetes_namespace_v1.spark_team_a resource
kubernetes_secret_v1.dolphinScheduler_scheduler resource
kubernetes_secret_v1.dolphinScheduler_webserver resource
kubernetes_secret_v1.dolphinScheduler_worker resource
kubernetes_secret_v1.spark_team_a resource
kubernetes_service_account_v1.dolphinScheduler_scheduler resource
kubernetes_service_account_v1.dolphinScheduler_webserver resource
kubernetes_service_account_v1.dolphinScheduler_worker resource
kubernetes_service_account_v1.spark_team_a resource
random_id.dolphinScheduler_webserver resource
random_password.postgres resource
aws_availability_zones.available data source
aws_caller_identity.current data source
aws_ecrpublic_authorization_token.token data source
aws_eks_cluster_auth.this data source
aws_iam_policy_document.dolphinScheduler_s3_logs data source
aws_iam_policy_document.fluent_bit data source
aws_iam_policy_document.grafana data source
aws_iam_policy_document.spark_operator data source
aws_partition.current data source
aws_region.current data source
aws_secretsmanager_secret_version.admin_password_version data source

Inputs

Name Description Type Default Required
db_private_subnets Private subnet CIDRs for the DolphinScheduler database (62 IPs per subnet/AZ). list(string) ["10.0.20.0/26", "10.0.21.0/26"] no
eks_cluster_version EKS cluster version string "1.29" no
eks_data_plane_subnet_secondary_cidr Secondary CIDR blocks for EKS nodes and pods (32766 IPs per subnet/AZ). list(string) ["100.64.0.0/17", "100.64.128.0/17"] no
enable_dolphinScheduler Enable Apache DolphinScheduler bool true no
enable_dolphinScheduler_spark_example Enable the Apache DolphinScheduler and Spark Operator example bool false no
enable_amazon_prometheus Enable Amazon Managed Service for Prometheus bool true no
name Name of the VPC and EKS cluster string "self-managed-dolphinScheduler" no
private_subnets Private subnet CIDRs for private NAT, NLB, DolphinScheduler, EC2 jump host, etc. (254 IPs per subnet/AZ). list(string) ["10.0.1.0/24", "10.0.2.0/24"] no
public_subnets Public subnet CIDRs (62 IPs per subnet/AZ). list(string) ["10.0.0.0/26", "10.0.0.64/26"] no
region AWS Region string "us-west-2" no
secondary_cidr_blocks Secondary CIDR blocks to attach to the VPC list(string) ["100.64.0.0/16"] no
vpc_cidr VPC CIDR string "10.0.0.0/16" no
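Terraform automatically loads a `terraform.tfvars` file from the directory it runs in, so one way to override the defaults above is to write that file before running install.sh. This sketch assumes install.sh runs Terraform from that same directory; `write_tfvars` is a hypothetical helper and the values are only examples.

```shell
# Hypothetical helper: write a terraform.tfvars that overrides the
# `region` and `name` input variables listed in the Inputs table.
# Usage: write_tfvars REGION NAME [FILE]
write_tfvars() {
  local region="$1" name="$2" file="${3:-terraform.tfvars}"
  cat > "$file" <<EOF
region = "${region}"
name   = "${name}"
EOF
}
```

Example: `write_tfvars us-east-1 dolphinscheduler` matches the region and cluster name used in the kubeconfig command above. Terraform also reads `TF_VAR_<name>` environment variables, e.g. `TF_VAR_region=us-east-1 sh install.sh`.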

Outputs

Name Description
configure_kubectl Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig
s3_bucket_id_dolphinScheduler_logs dolphinScheduler logs S3 bucket ID
s3_bucket_id_fluentbit_logs FluentBit logs S3 bucket ID
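The outputs can be read with `terraform output -raw`. The sketch below turns the logs bucket ID into an S3 URI so task logs can be browsed from the command line; `logs_bucket_uri` is a hypothetical helper and must run in the directory that holds the Terraform state.

```shell
# Hypothetical helper: print the S3 URI of the DolphinScheduler logs bucket,
# read from the s3_bucket_id_dolphinScheduler_logs Terraform output.
# Returns 1 if the output cannot be read or is empty.
logs_bucket_uri() {
  local bucket
  bucket="$(terraform output -raw s3_bucket_id_dolphinScheduler_logs)" || return 1
  [ -n "$bucket" ] || return 1
  printf 's3://%s/\n' "$bucket"
}
```

Example: `aws s3 ls "$(logs_bucket_uri)" --recursive` lists the stored task logs.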
