Clustering
Note: clustering support is available since
Since Titanoboa is based on functional principles such as immutability, it is only natural that its cluster is master-less. Thanks to this design choice you can easily add or remove nodes at runtime, with no configuration changes and zero downtime, and you can scale to hundreds of nodes very easily.
Also, since the Titanoboa server is basically just a jar, you can run cluster nodes anywhere: on bare metal, VMs, Docker or K8s.
See, for example, how I added 20 extra nodes in a few seconds during my recent talk at the re:Clojure conference:
In the video above you can see that it is possible to alter the cluster just by running a Titanoboa workflow within that cluster. This way you can add nodes - see e.g. here. In a similar way, you can even build a separate cluster using a Titanoboa workflow.
In a clustered environment, titanoboa nodes use a message broker to communicate. Instead of re-inventing the wheel, titanoboa simply relies on the broker's ability to deliver messages, so different types of brokers and different broker settings fit different use cases: use non-persisted, in-memory queues for best performance, or use persisted queues on a highly available broker to ensure failover and high availability. Likewise, if you use a broker with virtually unlimited scalability that spans multiple availability zones (such as AWS SQS or Kafka), your titanoboa cluster can scale in the same way and span multiple availability zones.
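Which broker a node talks to is selected via the system definitions in its server configuration. A minimal sketch, reusing the RabbitMQ-backed system vars from the full config further down this page (other brokers would plug in via analogous system definitions):

```clojure
;; minimal sketch: a node whose core system runs on RabbitMQ-backed queues;
;; the system vars below are the same ones used in the full config on this page
{:systems-catalogue
 {:core {:system-def #'titanoboa.system.rabbitmq/distributed-core-system
         :worker-def #'titanoboa.system.rabbitmq/distributed-worker-system
         :autostart  true}}}
```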
Using a shared file system is not mandatory, though it makes a Titanoboa cluster easier to configure and control. One or more of the following can reside on the shared file system (whether it is a physical drive, S3, EFS, HDFS, etc.) - see the sketch after this list:
- workflow repository
- job folders (each workflow job can optionally create a unique job folder to make sharing files/data between workflow steps easier)
- server configuration file
- external dependencies file
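For instance, a node's configuration could point the repositories and job folders at a shared mount. A minimal sketch - the /mnt/shared path is a hypothetical mount point, while the config keys are the same ones used in the full config below:

```clojure
;; hypothetical shared mount (e.g. an EFS/NFS volume mounted on every node):
{:jobs-repo-path  "/mnt/shared/repo/"         ;; workflow repository
 :steps-repo-path "/mnt/shared/step-repo/"    ;; step repository
 :job-folder-path "/mnt/shared/job-folders/"} ;; per-job folders
```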
Since I always love to steal ideas from sci-fi movies and various areas of physics, T-boa's Cluster is based on the following two principles:
- In Space No One Can Hear You Scream
- When You Are Not Looking The Universe Does Not Exist
For each Titanoboa cluster, there is a "heartbeat" exchange set up in the underlying message broker. Each node periodically broadcasts its internal state to this exchange. These broadcasts may be received by some consumer, or they may very well not be; there is no guarantee.
Any cluster node can become "Cluster Aware" and subscribe to the State Broadcast Exchange. It will then start receiving state broadcasts from the other nodes, and based on these snapshots (or puzzle pieces) of state it can start building its own understanding of what is happening in the entire cluster.
By default, nodes are not cluster aware, but any node can become one - e.g. when you open the GUI of a particular node in the browser, that node becomes cluster aware and you can use it as a "master" node.
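In the server configuration, this behaviour is driven by the cluster-related keys. An excerpt from the full example below; the comments reflect what the key names suggest:

```clojure
{:enable-cluster          true
 :heartbeat-exchange-name "heartbeat" ;; exchange the state broadcasts go to
 :cluster-state-broadcast #'titanoboa.system.rabbitmq/cluster-broadcast-system
 :cluster-state-subs      #'titanoboa.system.rabbitmq/cluster-subscription-system
 :cluster-state-fn        #'titanoboa.cluster/process-broadcast}
```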
Also, each node contains a simple proxy module, so you can use its API to send commands to other nodes as well.
As with the Hubble telescope - when you look at stars you see only their ancient snapshots, since the light has been traveling for billions of years - the same applies when you open the GUI and try to make sense of what is happening in the cluster.
Since titanoboa servers do not serialize over a single database but rather leverage a particular message broker (e.g. RabbitMQ, Kafka, JMS or SQS), they are built for massive scale; you can run hundreds of nodes easily. The underlying MQ broker is transactional, so you can rest assured that workflow jobs will get processed, failovers will be handled, etc.
But scale always comes with a trade-off: the state of your cluster at any given point in time (e.g. which jobs are running and their state) will always be limited literally by the speed of light - i.e. by latency, and also by the fact that nodes broadcast their state only periodically.
So when watching workflow jobs in the GUI, always be mindful that there may be latency in play.
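Below is a sample server configuration file for a RabbitMQ-based cluster (this is the boa_server_config_rmq.clj file referenced by the start script at the end of this page):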
(in-ns 'titanoboa.server)
(log/info "Hello, I am RMQ server-config and I am being loaded...")

;; RMQ setup:
(alter-var-root #'server-config
  (constantly
    {;; systems each node starts automatically:
     :systems-catalogue {:core {:system-def #'titanoboa.system.rabbitmq/distributed-core-system
                                :worker-def #'titanoboa.system.rabbitmq/distributed-worker-system
                                :autostart true
                                :worker-count 2
                                :restart-workers-on-error true}
                         :archival-system {:system-def #'titanoboa.system.rabbitmq/archival-system
                                           :autostart true}}
     ;; repositories and job folders (ideally on a shared file system):
     :jobs-repo-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/repo/"
     :steps-repo-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/step-repo/"
     :job-folder-path "/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/job-folders/"
     :log-file-path "titanoboa.log"
     ;; clustering: heartbeat broadcasts, command exchanges, eviction of stale nodes:
     :enable-cluster true
     :heartbeat-exchange-name "heartbeat"
     :cmd-exchange-name "command"
     :jobs-cmd-exchange-name "jobs-command"
     :cluster-eviction-interval (* 1000 30) ;; 30 s
     :cluster-eviction-age (* 1000 60)      ;; 60 s
     :cluster-state-broadcast #'titanoboa.system.rabbitmq/cluster-broadcast-system
     :cluster-state-subs #'titanoboa.system.rabbitmq/cluster-subscription-system
     :cluster-state-fn #'titanoboa.cluster/process-broadcast
     :archive-ds-ks [:archival-system :system :db-pool]
     ;; per-system configuration (queue names, eviction, archival DB):
     :systems-config {:core {:new-jobs-queue "titanoboa-bus-queue-0"
                             :jobs-queue "titanoboa-bus-queue-1"
                             :archival-queue "titanoboa-bus-archive"
                             :eviction-age (* 1000 60 5)
                             :eviction-interval (* 1000 30)}
                      :archival-system {:jdbc-url "jdbc:postgresql://localhost:5432/mydb?currentSchema=titanoboa"
                                        :archival-queue "titanoboa-bus-archive"}}}))
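The corresponding external dependencies file pulls in the RabbitMQ client (langohr) and the PostgreSQL JDBC driver, and requires the namespaces used by the config above: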
{:coordinates [[org.postgresql/postgresql "9.4.1208"]
               [com.novemberain/langohr "3.5.0" :exclusions [cheshire]]]
 :require [[titanoboa.system.rabbitmq]
           [titanoboa.database.postgres]
           [titanoboa.cluster]]
 :repositories {"clojars" "https://clojars.org/repo"
                "central" "https://repo1.maven.org/maven2/"}
 :import [[org.postgresql.Driver]]}
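A start script then just points the server at these two files: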
#!/bin/bash
java -Dboa.server.port=3001 \
     -Dboa.server.config.path=/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/boa_server_config_rmq.clj \
     -Dboa.server.dependencies.path=/home/miro/work/titanoboa/titanoboa-lite/release-tests/cluster-test/ext-dependencies.clj \
     -cp "./build/titanoboa.jar:./lib/*" titanoboa.server
Note the -Dboa.server.port property in the script above: this way you can easily test the cluster setup locally, changing the port number for each instance (e.g. -Dboa.server.port=3002 for a second node).