Boxin Wang edited this page Aug 20, 2018 · 10 revisions

Overview

Databus is a low-latency change capture system that has become an integral part of LinkedIn’s data processing pipeline. Databus addresses a fundamental requirement: to reliably capture, flow, and process primary data changes. Databus provides the following features:

  1. Isolation between sources and consumers
  2. Guaranteed in-order, at-least-once delivery with high availability
  3. Consumption from an arbitrary point in time in the change stream, including full bootstrap of the entire data set
  4. Partitioned consumption
  5. Source consistency preservation
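
At-least-once delivery means a consumer may see the same change event more than once, so consumer callbacks are usually written to be idempotent. The following is a minimal sketch of that idea, not Databus code. It assumes each event carries a unique, monotonically increasing sequence number; the names `IdempotentConsumer`, `on_event`, and `seq` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each change once, even if the stream redelivers it."""

    last_applied: int = -1                      # highest sequence number applied so far
    state: dict = field(default_factory=dict)   # stand-in for the consumer's own store

    def on_event(self, seq: int, key: str, value: str) -> bool:
        # With in-order delivery, a redelivered event never has a sequence
        # number above the last one already applied, so it can be skipped.
        if seq <= self.last_applied:
            return False
        self.state[key] = value
        self.last_applied = seq
        return True


consumer = IdempotentConsumer()
consumer.on_event(1, "member:1", "v1")
consumer.on_event(2, "member:1", "v2")
consumer.on_event(2, "member:1", "v2")  # redelivery is ignored
```

Tracking only the last applied sequence number is enough here because delivery is in order; without that guarantee the consumer would need to remember every event it has seen.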

Architecture

The main components of the Databus architecture are as follows:

Databus Relays

  1. Read changed rows from the Databus sources in the source database and serialize them as Databus data change events in an in-memory buffer
  2. Listen for requests from Databus Clients (including Bootstrap Producers) and transport new Databus data change events
  3. More documentation is available at Databus 2.0 Relay.
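
The relay's in-memory buffer can be pictured as a bounded, ordered window over the change stream. The sketch below is an illustration under stated assumptions, not the relay's actual implementation: events carry contiguous integer sequence numbers, and `RelayBuffer`, `append`, and `events_since` are hypothetical names.

```python
from collections import deque


class RelayBuffer:
    """Bounded in-memory window of (seq, payload) change events."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest event when a new one arrives
        # and the buffer is full.
        self._events = deque(maxlen=capacity)

    def append(self, seq, payload):
        self._events.append((seq, payload))

    def events_since(self, seq):
        """Return events newer than seq, or None if some were already evicted."""
        if not self._events:
            return []
        oldest = self._events[0][0]
        if seq < oldest - 1:
            return None  # the caller needs events the buffer no longer holds
        return [event for event in self._events if event[0] > seq]


buffer = RelayBuffer(capacity=3)
for seq in range(1, 6):
    buffer.append(seq, f"row{seq}")
recent = buffer.events_since(3)   # events 4 and 5
too_old = buffer.events_since(1)  # None: event 2 has been evicted
```

A `None` result corresponds to the case where a client has fallen too far behind the relay and has to look elsewhere for the missing events.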

Databus Clients

  1. Check for new data change events on relays and execute business-logic specific callbacks.
  2. If they fall too far behind the relays, run a catchup query against a bootstrap server.
  3. New Databus clients run a bootstrap query to a bootstrap server and then switch to a relay for recent data change events.
  4. A single client can process an entire Databus stream, or it can be part of a cluster in which each consumer processes only a portion of the stream.
  5. Please find more documentation at Databus 2.0 Client.
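
The client behaviour listed above comes down to two decisions: where to read from next, and, in a cluster, which events belong to this consumer. The sketch below illustrates both under stated assumptions; it does not use the Databus client API. The function names, the string return values, and the hash-based partitioning scheme are all hypothetical.

```python
import zlib
from typing import Optional


def choose_source(last_seq: Optional[int], relay_oldest_seq: int) -> str:
    """Pick where a client should read from next."""
    if last_seq is None:
        return "bootstrap"  # new client: bootstrap first, then switch to a relay
    if last_seq < relay_oldest_seq - 1:
        return "catchup"    # the relay no longer buffers the next event needed
    return "relay"


def owns_event(key: str, consumer_index: int, cluster_size: int) -> bool:
    """Assign each key to exactly one consumer in the cluster."""
    # crc32 is stable across processes, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode("utf-8")) % cluster_size == consumer_index


source = choose_source(last_seq=50, relay_oldest_seq=100)  # "catchup"
mine = owns_event("member:42", consumer_index=0, cluster_size=4)
```

Hashing the key is only one way to split a stream; the point of the sketch is that every event maps to exactly one consumer, so the cluster as a whole still processes the full stream.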

Databus Bootstrap Producers

  1. Just a special kind of Databus client.
  2. Check for new data change events on relays.
  3. Store those events in a MySQL database.
  4. The MySQL database is used for bootstrap and catchup for clients.

Databus Bootstrap Servers

  1. Listen for requests from Databus Clients and return long look-back data change events for bootstrapping and catchup.
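
The division of labour between the bootstrap producer (which writes events to a database) and the bootstrap server (which reads them back) can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `change_log` table, its columns, and all function names are assumptions rather than the Databus schema. Treating "bootstrap" as the latest value per key and "catchup" as a replay of events after a given sequence number is one plausible reading of the description above.

```python
import sqlite3


def open_store():
    """Create an in-memory store (sqlite3 standing in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE change_log ("
        "seq INTEGER PRIMARY KEY, row_key TEXT NOT NULL, payload TEXT)"
    )
    return db


def store_event(db, seq, row_key, payload):
    """Producer side: persist one change event."""
    # INSERT OR IGNORE keeps a redelivered event from being stored twice.
    db.execute(
        "INSERT OR IGNORE INTO change_log (seq, row_key, payload) VALUES (?, ?, ?)",
        (seq, row_key, payload),
    )
    db.commit()


def catchup(db, since_seq):
    """Server side: replay every stored event newer than since_seq, in order."""
    return db.execute(
        "SELECT seq, row_key, payload FROM change_log WHERE seq > ? ORDER BY seq",
        (since_seq,),
    ).fetchall()


def bootstrap_snapshot(db):
    """Server side: return the latest payload for every key."""
    return db.execute(
        "SELECT row_key, payload FROM change_log AS c "
        "WHERE seq = (SELECT MAX(seq) FROM change_log WHERE row_key = c.row_key) "
        "ORDER BY row_key"
    ).fetchall()
```

Because the store is durable and not bounded like the relay's in-memory buffer, it can serve look-back requests that reach further into the past than any relay can.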

More detailed documentation can be found at https://github.com/linkedin/databus/wiki/_pages.