Hybrid Hadoop
On existing Hadoop installations, an alternative approach is to add virtual machines that interact with the Hadoop components (Spark, HDFS) through a gateway node. This approach is recommended for customers whose Hadoop environment hosts heterogeneous use cases and who want minimal deviation from existing node roles. The disadvantage is that the virtual machines must be sized properly for their workloads.
In addition to the services deployed on the existing cluster, additional virtual machines (VMs) are required to host the non-Hadoop functions of the solution. Some of these VMs require the gateway service so that they can interact with Spark, Hive, and HDFS.
Note: While the above layout is recommended for production, pilot deployments may combine the above roles into fewer VMs. Each component of the Open Network Insight solution interacts closely with Hadoop, but its non-Hadoop processing and memory requirements can be separated out with this approach.
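As a rough illustration of the gateway requirement described above, the following sketch checks whether the usual Hadoop client commands can be found on a VM's PATH. It is not part of Open Network Insight: the function names `check_gateway_clients` and `missing_clients` are hypothetical, and the command list (`hdfs`, `spark-submit`, `hive`) is an assumption about what a gateway role typically installs. Adjust it to match the distribution in use.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: client commands a gateway node is expected to expose.
GATEWAY_CLIENTS = ("hdfs", "spark-submit", "hive")


def check_gateway_clients(
    clients: Iterable[str] = GATEWAY_CLIENTS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each client command to its resolved path, or None if not found."""
    return {name: which(name) for name in clients}


def missing_clients(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of the commands that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    result = check_gateway_clients()
    for name, path in result.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if missing_clients(result):
        print("This VM does not appear to have the full gateway client set.")
```

Running the script on a VM that needs to reach Spark, Hive, and HDFS gives a quick indication of whether the gateway clients are present. It only confirms that the commands are on the PATH, not that they are configured to reach the cluster.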