This repository has been archived by the owner on May 15, 2019. It is now read-only.

Install and Configure Ingest

natedogs911 edited this page Apr 21, 2016 · 1 revision

First copy the solution code (ingest.tar.gz) to the /home/ directory.

git clone https://github.com/Open-Network-Insight/oni-ingest.git
cd oni-ingest
...

Two configuration files must be edited:

  • The master configuration (etc/master_ingest.json), which controls where and when the ingest component looks for new files.
  • The worker configuration (etc/worker_ingest.json), which tells ingest workers where to get file data and how to decode it.

You only need to edit the configuration sections (flow, dns) for the data sources that the solution will use.

Master Configuration

Open the master_ingest.json file for editing.

 cd etc
 vi master_ingest.json

Here is a sample of the master_ingest.json file, followed by an explanation of the configuration variables. We recommend changing only the collector_path and pcap_split_staging variables.

cat master_ingest.json
{
    "dns":{
        "collector_path":"/mnt/sec_shared.nfs/dns",
        "pkt_num":"650000",
        "pcap_split_staging":"/mnt/sec_shared.nfs/dns",
        "time_to_wait":3600,
        "queue_name":"dns_ingest_queue"
    },
    "flow":{
        "collector_path":"/mnt/sec_shared.nfs/nfcapd",
        "queue_name":"flow_ingest_queue"
    }
}
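Before starting the master, the edited file can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the solution code: the function name check_master_config is hypothetical, and treating the keys seen in the sample above as required is an assumption.

```python
import json

# Keys seen in the sample above; treating them as required is an assumption.
EXPECTED_KEYS = {
    "dns": {"collector_path", "pkt_num", "pcap_split_staging",
            "time_to_wait", "queue_name"},
    "flow": {"collector_path", "queue_name"},
}


def check_master_config(text, sources):
    """Return a list of problems found for the given data sources."""
    config = json.loads(text)  # raises ValueError on malformed JSON
    problems = []
    for source in sources:
        section = config.get(source)
        if section is None:
            problems.append("missing section: " + source)
            continue
        # Report every expected key that the section does not define.
        for key in sorted(EXPECTED_KEYS[source] - set(section)):
            problems.append("missing key: %s.%s" % (source, key))
    return problems
```

Pass only the data sources in use, for example check_master_config(open("master_ingest.json").read(), ["dns"]). An empty list means nothing was flagged.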

collector_path: a path on the local file system where binary files are staged. This is either the directory written by the nfcapd service (flow) or a staging area for pcap files (dns).

pkt_num: (packet only) the number of packets per file (passed to editcap when splitting pcap files that are larger than 1GB).

pcap_split_staging: (packet only) this is where split files will be placed on the local file system before they are loaded into HDFS.

time_to_wait: (packet only) the polling interval of the master ingest process on the local file system.

queue_name: the name of the queue that will appear in the queue list.
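To show how pkt_num and pcap_split_staging fit together, here is a sketch of an editcap command line for an oversized capture. The -c option of editcap splits output by packet count. The function name, the output file naming, and reading "1GB" as 1024**3 bytes are all assumptions; the ingest code may invoke editcap differently.

```python
import os

ONE_GB = 1024 ** 3  # assumption: "1GB" is taken to mean 2**30 bytes


def build_split_command(pcap_path, size_bytes, dns_config):
    """Return an editcap argument list, or None if the file needs no split."""
    if size_bytes <= ONE_GB:
        return None
    # Hypothetical naming: split files go under pcap_split_staging and
    # keep the original base name as their prefix.
    out_prefix = os.path.join(dns_config["pcap_split_staging"],
                              os.path.basename(pcap_path))
    # editcap -c <n> writes at most n packets to each output file.
    return ["editcap", "-c", str(dns_config["pkt_num"]), pcap_path, out_prefix]
```

The returned list can be handed to subprocess.call as-is. In practice size_bytes would come from os.path.getsize(pcap_path).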

Worker Configuration

Open the worker_ingest.json file for editing.

vi worker_ingest.json

Here is a sample of the worker_ingest.json file, with an explanation of the configuration variables. We recommend only changing the rabbitmq_server variable. Modifying the process_opt variable can require breaking changes to the solution code (skilled developers only!).

cat worker_ingest.json
{
    "rabbitmq_server":"10.10.10.10",
    "dns":{
        "queue_name":"dns_ingest_queue",
        "process_opt":"-E separator=, -E header=y -T fields -e frame.time -e frame.len -e ip.src -e ip.dst -e dns.resp.name -e dns.resp.type -e dns.resp.class -e dns.flags -e dns.flags.rcode -e dns.a 'dns.flags.response == 1'"
    },
    "flow":{
        "queue_name":"flow_ingest_queue",
        "process_opt":""
    }
}
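The dns process_opt string in this sample uses tshark-style options (-T fields, -e, -E) and ends with a quoted display filter. As a sketch of how such a string could become an argument list, Python's shlex.split keeps the quoted filter together as one argument. That the decoder is tshark, the placement of its -r input option, and the function name are assumptions here; the solution code may build the command differently.

```python
import shlex


def build_tshark_args(process_opt, input_file):
    """Split process_opt shell-style and prepend the binary and input file."""
    # shlex.split honours the single quotes around the display filter,
    # so 'dns.flags.response == 1' stays a single argument.
    return ["tshark", "-r", input_file] + shlex.split(process_opt)
```

Splitting on whitespace alone would break the display filter into three separate arguments, which is one reason edits to process_opt need care.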

rabbitmq_server: this is the IP address of the edge node used for the ingest component (in most cases, the master and workers are all on the same node).

process_opt: the flags passed to the binary decoders.

queue_name: the name of the queue that will appear in the queue list.
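In both samples, each data source uses the same queue_name in the master and worker files. Presumably the master publishes to that queue and the workers consume from it, so a mismatch would leave files unprocessed. That reading is an assumption, as is the helper name in this sketch, which lists the sources whose queue names differ.

```python
import json


def mismatched_queues(master_text, worker_text):
    """Return the data sources whose queue_name differs between the files."""
    master = json.loads(master_text)
    worker = json.loads(worker_text)
    mismatched = []
    for source, section in sorted(master.items()):
        worker_section = worker.get(source)
        # Skip sources the worker file does not configure, and any
        # top-level values that are not sections.
        if not isinstance(section, dict) or not isinstance(worker_section, dict):
            continue
        if section.get("queue_name") != worker_section.get("queue_name"):
            mismatched.append(source)
    return mismatched
```

Run it against the contents of master_ingest.json and worker_ingest.json after editing either file. An empty list means the queue names agree.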