troy ding edited this page May 23, 2014 · 11 revisions

Welcome to the resa wiki!

Resa is a general framework for robust, elastic and realtime data stream processing over the cloud.

#Overview

<img src="https://raw.githubusercontent.com/wiki/troyding/resa/architecture.png" alt="architecture" width="80%"/>

Modules

Allocation Optimize

The Allocation Optimize module takes a topology's execution details as input and calculates a suggested parallelism for each component in the topology. It then decides whether the newly calculated suggestion should be submitted to Nimbus. It guarantees that any submitted parallelism suggestion is the best allocation under the topology's current workload.

Implementation

##Package overview

  • resa.metrics: Provides support for metrics collection
  • resa.optimize: Provides classes and interfaces for topology optimization
  • resa.topology: Classes for building user topologies
  • resa.util: Utilities such as the config utility, topology helper and so on

Allocation Optimize

To run this module, the first step is to retrieve the topology's execution details, which are called metrics in Storm. The collected metrics are then forwarded to a TopologyOptimizer instance, which calculates an optimized allocation and decides whether that allocation should be put into effect.
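The final decision step can be illustrated with a minimal sketch. The class `AllocationDecision` and its method `shouldSubmit` are hypothetical names introduced here, and the rule used (submit only when at least one component's parallelism changes) is an assumption for illustration, not necessarily the criterion TopologyOptimizer applies.

```java
import java.util.Map;

/** Hypothetical helper: decides whether a suggested allocation should be submitted. */
final class AllocationDecision {

    private AllocationDecision() {}

    /**
     * Returns true when the suggestion changes the parallelism of at least
     * one component. Both maps go from component name to executor count.
     * Assumption: any difference is worth submitting; a real optimizer may
     * also weigh the cost of rebalancing the topology.
     */
    static boolean shouldSubmit(Map<String, Integer> current,
                                Map<String, Integer> suggested) {
        return !current.equals(suggested);
    }
}
```

With this rule, a suggestion identical to the running allocation is dropped, so Nimbus is contacted only when something would actually change.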

How metrics are collected

We collect a topology's execution details through Storm's metrics system. In Storm, a system bolt named the MetricsConsumer bolt is started to collect a topology's metrics, and the metrics produced by other components are sent to it. Storm provides an API for customizing a topology's metrics consumer: any class that acts as a metrics consumer must implement the IMetricsConsumer interface. Our implementation is resa.metrics.ResaMetricsCollector. For efficiency, ResaMetricsCollector starts a TopologyOptimizer instance itself, so that metrics data can be passed to it through a simple queue.
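The queue hand-off between collector and optimizer might look roughly like the sketch below. It uses only `java.util.concurrent` and no Storm classes. The names `MetricsHandOff`, `Sample`, `push` and `drain` are hypothetical, as are the fields of a sample, and none of them are taken from the resa source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: an in-process queue between a metrics collector and an optimizer. */
final class MetricsHandOff {

    /** One metric sample; the fields are illustrative only. */
    static final class Sample {
        final String component;
        final String metric;
        final double value;

        Sample(String component, String metric, double value) {
            this.component = component;
            this.metric = metric;
            this.value = value;
        }
    }

    // Unbounded queue, so the collector side never blocks on add().
    private final BlockingQueue<Sample> queue = new LinkedBlockingQueue<>();

    /** Collector side: enqueue a sample as soon as it arrives. */
    void push(Sample sample) {
        queue.add(sample);
    }

    /** Optimizer side: remove and return everything queued so far, in arrival order. */
    List<Sample> drain() {
        List<Sample> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }
}
```

Because both sides live in the same process, no serialization or network transfer is needed between collecting a data point and making it available to the optimizer.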

By default, Storm collects several types of metrics, called built-in metrics, such as process-latency, execute-latency, execute-count and so on. Their definitions can be found in the source file builtin-metrics.clj. Unfortunately, these metrics alone do not meet our needs, so we extend Storm's metrics API to collect more types of metrics. The class resa.metrics.MeasurableBolt collects bolt metrics, while resa.metrics.MeasurableSpout collects spout metrics. The names of the metrics collected by Resa are listed in the class resa.metrics.MetricNames.
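As an illustration of what a custom metric can look like, the sketch below accumulates values and reports their mean once per reporting window. The `Metric` interface is a local stand-in that mirrors the single `getValueAndReset()` method of Storm's IMetric, so the example compiles without Storm on the classpath. `MeanMetric` is a hypothetical name and is not one of the metrics defined in resa.metrics.MetricNames.

```java
/** Local stand-in with the same shape as Storm's IMetric, so no Storm dependency is needed. */
interface Metric {
    Object getValueAndReset();
}

/** Hypothetical custom metric: the mean of all values recorded in one reporting window. */
final class MeanMetric implements Metric {
    private double sum;
    private long count;

    /** Record one observation, for example the latency of a single tuple. */
    synchronized void record(double value) {
        sum += value;
        count++;
    }

    /** Return the mean for the current window (null if nothing was recorded) and start a new window. */
    @Override
    public synchronized Object getValueAndReset() {
        Double mean = null;
        if (count > 0) {
            mean = sum / count;
        }
        sum = 0.0;
        count = 0;
        return mean;
    }
}
```

Resetting on every read keeps each reported value tied to a single window, which is what lets a consumer track how the workload changes over time.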

Calculate optimized allocation

The main class for calculating the optimized allocation is ResaMetricsCollector, which we mentioned above.

Submit to Nimbus