
bhatti/formicary


formicary


The formicary is a distributed orchestration engine that allows you to execute batch jobs, workflows or CI/CD pipelines based on Docker, Kubernetes, shell, HTTP or messaging executors.


Overview

The formicary is a distributed orchestration engine for executing background jobs and workflows remotely using Docker, Kubernetes, Shell, HTTP, Websocket, Messaging or other protocols. A job comprises a directed acyclic graph (DAG) of tasks, where each task defines a unit of work. The formicary architecture is based on the Leader-Follower (or master/worker), Pipes-and-Filters, Fork-Join and SEDA design patterns. The queen-leader schedules and orchestrates the graph of tasks, and the ant-workers execute the work. Task work is distributed among ant-workers based on tags and executor protocols such as Kubernetes, Docker, Shell, HTTP, etc.

The formicary uses an object-store to persist or stage intermediate or final artifacts from tasks, which other tasks can then use as input for their work. This allows building stages of tasks using the Pipes-and-Filters and SEDA patterns, where artifacts and variables are passed from one task to another so that the output of one task becomes the input of the next. The Fork/Join pattern allows executing work in parallel and then joining the results at the end. The main use-cases for formicary include:

  • Processing directed acyclic graphs of tasks
  • Batch jobs such as ETL, data imports and other offline processing
  • Scheduled batch processing such as clearing, settlement, etc.
  • Data pipelines such as processing large datasets in the background
  • CI/CD Pipelines for building, testing and deploying code
  • Automation for repetitive tasks
  • Building workflows of tasks that have complex dependencies and can interact with a variety of protocols
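The DAG execution model described in the overview can be illustrated with a small Go sketch. It is not formicary's API (formicary jobs are declared in YAML and executed by ant workers); the `Task` type and the `RunDAG` and `depsDone` functions are hypothetical names chosen for illustration, and artifacts are plain strings standing in for the object-store. Tasks whose dependencies have finished run concurrently (fork), each level is awaited before the next one starts (join), and each task receives the outputs of its dependencies as input (Pipes and Filters). Task names are assumed to be unique.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work: it reads the artifacts produced by
// its dependencies and returns its own artifact.
type Task struct {
	Name      string
	DependsOn []string
	Run       func(inputs map[string]string) string
}

// depsDone reports whether every dependency of t has completed.
func depsDone(t Task, done map[string]bool) bool {
	for _, d := range t.DependsOn {
		if !done[d] {
			return false
		}
	}
	return true
}

// RunDAG executes tasks level by level and returns every task's artifact.
func RunDAG(tasks []Task) (map[string]string, error) {
	known := make(map[string]bool, len(tasks))
	for _, t := range tasks {
		known[t.Name] = true
	}
	for _, t := range tasks {
		for _, d := range t.DependsOn {
			if !known[d] {
				return nil, fmt.Errorf("task %q depends on unknown task %q", t.Name, d)
			}
		}
	}
	outputs := make(map[string]string)
	done := make(map[string]bool)
	for len(done) < len(tasks) {
		var ready []Task
		for _, t := range tasks {
			if !done[t.Name] && depsDone(t, done) {
				ready = append(ready, t)
			}
		}
		if len(ready) == 0 {
			return nil, fmt.Errorf("dependency cycle among remaining tasks")
		}
		// Snapshot inputs before forking so no goroutine touches the shared map.
		inputs := make([]map[string]string, len(ready))
		for i, t := range ready {
			inputs[i] = make(map[string]string, len(t.DependsOn))
			for _, d := range t.DependsOn {
				inputs[i][d] = outputs[d]
			}
		}
		// Fork: run all ready tasks concurrently, each writing its own slot.
		results := make([]string, len(ready))
		var wg sync.WaitGroup
		for i, t := range ready {
			wg.Add(1)
			go func(i int, t Task) {
				defer wg.Done()
				results[i] = t.Run(inputs[i])
			}(i, t)
		}
		// Join: wait for the whole level, then publish its artifacts.
		wg.Wait()
		for i, t := range ready {
			outputs[t.Name] = results[i]
			done[t.Name] = true
		}
	}
	return outputs, nil
}

func main() {
	tasks := []Task{
		{Name: "extract", Run: func(map[string]string) string { return "raw" }},
		{Name: "clean", DependsOn: []string{"extract"}, Run: func(in map[string]string) string { return in["extract"] + "+clean" }},
		{Name: "stats", DependsOn: []string{"extract"}, Run: func(in map[string]string) string { return in["extract"] + "+stats" }},
		{Name: "report", DependsOn: []string{"clean", "stats"}, Run: func(in map[string]string) string {
			return in["clean"] + "|" + in["stats"]
		}},
	}
	out, err := RunDAG(tasks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out["report"]) // raw+clean|raw+stats
}
```

Here `clean` and `stats` run in parallel after `extract`, and `report` joins both outputs. The level-by-level loop keeps the sketch short; a production scheduler would more likely start each task as soon as its own dependencies finish instead of waiting for a whole level.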

Features:

  • Declarative definition of a job consisting of a directed acyclic graph (DAG) of tasks using a simple yaml configuration file.
  • Go-based templates for job definitions so that you can define customized variables and actions.
  • Persistence of artifacts from tasks that can be used by other tasks or used as output of jobs.
  • Extensible Method abstraction for supporting a variety of execution protocols such as Docker, Kubernetes, HTTP, Websocket, Messaging or other customized protocols.
  • Caching of dependencies such as npm, maven, gradle, python, etc.
  • Encryption of secured configuration, both when stored in the database and in transit over the network.
  • Cron based scheduled processing where jobs can be executed at specific times or run periodically.
  • Optional tasks that can fail without failing entire job.
  • Finalized or always-run tasks that are executed regardless of whether the job fails or succeeds.
  • Child jobs using fork/await so that a job can spawn other jobs that are executed asynchronously and then join the results later in the job workflow.
  • Job/Task retries where a failed job or task can be rerun for a specified number of times or based on error/exit codes. The job retry supports partial restart so that only failed tasks are rerun upon retries.
  • Filtering of jobs/task execution based on user-defined conditions or parameters.
  • Job priority, where higher priority jobs are executed before the low priority jobs.
  • Job cancellation that can cleanly stop job and task execution.
  • Applies CPU/Memory/Disk quota to tasks for managing available computing resources.
  • Provides reports and statistics on job outcomes and resource usage such as CPU, memory and storage.
  • Resource-constraint-based scheduling and routing, where ants register with tags that support special annotations and tasks are routed based on the tags defined in the job definition.
  • Ant executors support multiple protocols with which ants can register with the queen node, such as queue, HTTP, websocket, Docker, Kubernetes, etc.
  • Pub/sub based events are used to propagate real-time updates of job/task executions to the UI or other parts of the system.
  • Streaming of real-time Logs to the UI as job/tasks are processed.
  • Provides email notifications on job completion or failures.
  • Authentication and authorization using OAuth, JWT and RBAC standards.
  • Graceful shutdown of the queen server and ant workers: on receiving a shutdown signal, the server/worker processes stop accepting new work but wait until in-progress work completes. Abrupt shutdown of the queen server is also supported, so that jobs can be resumed from the task that was in progress. As the task work is handled by the ant worker, no work is lost.
  • Metrics/auditing/usage of jobs and user actions.

Requirements:

3rd party Libraries

Version

  • 0.1

License

  • AGPLv3 (GNU Affero General Public License)

Docs

Operations

Installation

Installing formicary

Running

Running formicary

Queen/Ants Configuration

Configuration for Queen (server) and Ants (workers)

User Guides

Getting Started

Building Pipelines

Parallel Pipelines with parent/child

CI/CD Pipelines

Simple ETL Job

ETL Examples

Public Plugins

Developing Public Plugins

Kubernetes

Kubernetes Examples

How-To Guides

Job / Task Definition Configuration Options

Job / Task Definition Configuration

API Docs

API Docs

Comparison

Comparison with other frameworks and solutions

Migrating from Airflow

Apache Airflow

Code and Design

Architecture

Formicary Architecture

Executors

Ant Executors

Development

Formicary Development Guide
