This repository has been archived by the owner on May 15, 2019. It is now read-only.

Pure Hadoop

emilymatthews edited this page Sep 27, 2016 · 4 revisions

Apache Spot (Incubating) can be installed on a new or existing Hadoop cluster. Its components are treated as services and distributed across the cluster according to common node roles. One approach is to follow the recommended CDH deployment (see diagram below).

This approach is recommended for customers who have a cluster dedicated to the solution or to a security data lake; it takes advantage of existing investment in hardware and software. The disadvantage is that it requires installing software on Hadoop nodes that is not managed by systems such as Cloudera Manager.

[Diagram: Pure Hadoop deployment]

In the Pure Hadoop deployment scenario, the ingest component runs on an edge node, which is an expected use of that role. Some non-Hadoop software must be installed to make the ingest component work. The Operational Analytics component runs on a node intended for browser-based management and user applications, so that all user interfaces are located on a node or nodes with the same role. The Machine Learning (ML) component is installed on worker nodes, because resource management for an ML pipeline is similar for functions inside and outside Hadoop.
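The placement described above can be written down as a small lookup table. The following is only an illustrative sketch, not part of Apache Spot: the role labels (`edge`, `management`, `worker`), the component keys, the function names, and the hostnames are all hypothetical names chosen for this example.

```python
# Component-to-role placement for the Pure Hadoop scenario.
# The role labels and component keys are hypothetical; only the mapping
# itself (ingest -> edge, OA -> management/UI node, ML -> workers)
# comes from the description above.
PLACEMENT = {
    "ingest": "edge",
    "operational_analytics": "management",
    "machine_learning": "worker",
}


def components_by_role(placement):
    """Group component names by the node role that hosts them."""
    by_role = {}
    for component, role in placement.items():
        by_role.setdefault(role, []).append(component)
    return by_role


def plan_for_cluster(placement, nodes):
    """Return {hostname: [components]} for a {hostname: role} inventory.

    Nodes whose role hosts no Spot component get an empty list.
    """
    by_role = components_by_role(placement)
    return {host: sorted(by_role.get(role, [])) for host, role in nodes.items()}


if __name__ == "__main__":
    # Hypothetical inventory; replace with the roles used in your cluster.
    inventory = {
        "edge01": "edge",
        "mgmt01": "management",
        "worker01": "worker",
        "worker02": "worker",
        "master01": "master",
    }
    for host, components in plan_for_cluster(PLACEMENT, inventory).items():
        print(host, components)
```

Keeping the mapping in one table makes it easy to check which nodes need the additional non-Hadoop software for ingest (here, only nodes with the hypothetical `edge` role).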

Although both of these deployment options are validated and supported, other scenarios that combine these approaches are possible.