### Introduction

Enterprise Log Search and Archive is a solution to achieve the following:

  • Normalize, store, and index logs at unlimited volumes and rates
  • Provide a simple and clean search interface and API
  • Provide an infrastructure for alerting, reporting and sharing logs
  • Control user actions with local or LDAP/AD-based permissions
  • Plugin system for taking actions with logs
  • Exist as a completely free and open-source project

ELSA accomplishes these goals by harnessing the highly-specialized strengths of other open-source projects: Perl provides the glue to asynchronously tie the log receiver (Syslog-NG) together with storage (MySQL) and indexing (Sphinx Search) and serves this over a web interface provided either by Apache or any other web server, including a standalone pure-Perl server for a lighter footprint.


### Why ELSA?

I wrote ELSA because commercial tools were both lacking and cost-prohibitive. The only tool that provided the features I needed was Splunk. Unfortunately, it was too expensive and too slow to receive the log volume I wanted on the hardware I had available. ELSA is inspired by Splunk but is focused on speed rather than dashboards and presentation.

In designing ELSA, I tried the following components but found them too slow. Here they are ordered from fastest to slowest for indexing speeds (non-scientifically tested):

  1. Tokyo Cabinet
  2. MongoDB
  3. TokuDB MySQL plugin
  4. Elastic Search (Lucene)
  5. Splunk
  6. HBase
  7. CouchDB
  8. MySQL Fulltext
### Capabilities

ELSA achieves n-node scalability by allowing every log-receiving node to operate completely independently of the others. Queries from a client through the API are sent to the nodes in parallel, so a query takes only as long as the slowest node's response. Query results are aggregated by the API before being sent to the client as a response. Response times vary depending on the number of query terms and their selectivity, but a given node on modest hardware takes about half a second per billion log entries.

Log reception rates greater than 50,000 events per second per node are achieved through the use of a fast pattern parser in Syslog-NG called PatternDB. The pattern parser allows Syslog-NG to normalize logs without resorting to computationally expensive regular expressions. This allows for sustained high log reception rates in Syslog-NG which are piped directly to a Perl program which further normalizes the logs and prepares large text files for batch inserting into MySQL. MySQL is capable of inserting over 100,000 rows per second when batch loading like this. After each batch is loaded, Sphinx indexes the newly inserted rows in temporary indexes, then again in larger batches every few hours in permanent indexes.

Sphinx can create temporary indexes at a rate of 50,000 logs per second and consolidate these temporary indexes at around 35,000 logs per second, which becomes the sustained terminal rate for a given node. The effective bursting rate is around 100,000 logs per second, which is the upper bound of Syslog-NG on most platforms. If indexing cannot keep up, a backlog of raw text files will accumulate. In this way, peaks of several hours or more can be endured without log loss but with an indexing delay.

The overall flow diagram looks like this:

Live, continuously:

Network → Syslog-NG (PatternDB) → Raw text file

or

HTTP upload → Raw text file

Batch load (by default every minute):

Raw text file → MySQL → Sphinx

### Installation

Installation is done by running the install.sh file, obtained either by downloading it from the sources online or by extracting it from the install tarball featured on the ELSA Google Code home page. When install.sh runs, it will check for the existence of /etc/elsa_vars.sh to see if there are any local customizations, such as passwords, file locations, etc. to apply. The install.sh script will update itself if it finds a newer version online, so be sure to store any changes in /etc/elsa_vars.sh. The install.sh script should be run separately for a node install and a web install. You can install both like this: sh install.sh node && sh install.sh web. Installation will attempt to download and install all prerequisites and initialize databases and folders. It does not require any interaction.

Currently, Linux and FreeBSD 8.x are supported, with Linux distros based on Debian (including Ubuntu), RedHat (including CentOS), and SuSE tested. install.sh should run and succeed on these distributions, assuming that the defaults are chosen and that no existing configurations will conflict.

### Updates

Updating an installation is done via the install.sh file (assuming your ELSA directory is /usr/local/elsa): sh /usr/local/elsa/contrib/install.sh node update && sh /usr/local/elsa/contrib/install.sh web update. This will check the web for any updates and apply them locally, taking into account local customizations in /etc/elsa_vars.sh.

### Plugins

ELSA ships with several plugins:

These plugins tell the web server what to do when a user clicks the "Info" link next to each log. A plugin can do anything, but it is designed for returning useful information in a dialog panel in ELSA with an actions menu. As an example that ships with ELSA, if a StreamDB (or OpenFPC) URL is configured, any log containing an IP address will have a "getPcap" option which autofills pcap request parameters for one-click access to the traffic related to the log being viewed.

New plugins can be added easily by subclassing the "Info" Perl class and editing the elsa_web.conf file to include them. Contributions are welcome!

### File Locations

The main ELSA configuration files are /etc/elsa_node.conf and /etc/elsa_web.conf. All configuration is controlled through these files, except for query permissions, which are stored in the database and administered through the web interface. Nodes read in the elsa_node.conf file on every batch load, so changes may be made to it without having to restart Syslog-NG.

Most Linux distributions do not ship recent versions of Syslog-NG. Therefore, the install compiles it from source and installs it to $BASE_DIR/syslog-ng with the configuration file in $BASE_DIR/syslog-ng/etc/, where it will be read by default. By default, $BASE_DIR is /usr/local and $DATA_DIR is /data. Syslog-NG writes raw files to $DATA_DIR/elsa/tmp/buffers/ and loads them into the index and archive tables at an interval configured in the elsa_node.conf file, which is 60 seconds by default. The files are deleted upon successful load. When the logs are bulk inserted into the database, Sphinx is called to index the new rows. When indexing is complete, the loader notes the new index in the database which will make it available to the next query. Indexes are stored in $DATA_DIR/sphinx and comprise about as much space as the raw data stored in MySQL.
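The batch-load behavior described above is driven by elsa_node.conf. As a rough sketch, assuming the buffer directory directive is named "buffer_dir" (verify the exact key name against your own elsa_node.conf; sphinx/index_interval is the interval setting referenced elsewhere on this page):

	"buffer_dir": "/data/elsa/tmp/buffers",
	"sphinx": {
		"index_interval": 60
	}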

Archive tables typically compress at about a 10:1 ratio, and therefore use only about 5% of the total space allocated to logs compared with the index tables and the Sphinx indexes themselves. The index tables are necessary because Sphinx searches return only the IDs of matching logs, not the logs themselves, so a primary-key lookup is required to retrieve the raw log for display. For this reason, archive tables alone are insufficient because they do not contain a primary key.

If desired, MySQL database files can be stored in a specified directory by adding the "mysql_dir" directive to elsa_node.conf and pointing it to a folder created which has proper permissions and SELinux/apparmor security settings.
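A minimal elsa_node.conf fragment for this, with a hypothetical directory:

	"mysql_dir": "/data/elsa/mysqldata"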

### Hosting all files locally

If your ELSA web server will not have Internet access, you will need to host the Javascript for the web pages locally. To do this, after installing:

	cd /usr/local/elsa/web/inc
	wget "http://yuilibrary.com/downloads/yui2/yui_2.9.0.zip"
	unzip yui_2.9.0.zip

Edit the elsa_web.conf file and set yui/local to be "inc" and comment out "version" and "modifier."
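Based on those settings, the yui section of elsa_web.conf should end up looking roughly like this:

	"yui": {
		"local": "inc"
	}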

### Caveats for Local File Hosting

If Internet access is not available, some plugins will not function correctly. In particular, the whois plugin uses an external web service to do lookups, and these will not be possible without Internet connectivity. In addition, dashboards will not work if the client's browser does not have connectivity to Google to pull down their graphing library.

### Web Server

The web frontend is typically served with Apache, but the Plack Perl module allows for any web server to be used, including a standalone server called Starman which can be downloaded from CPAN. Any implementation will still have all authentication features available because they are implemented in the underlying Perl.

The web server is backed by the ELSA web database (elsa_web by default), which stores user information including permissions, the query log, stored results, and query schedules for alerting.

Admins are designated by configuration variables in the elsa_web.conf file, either by system group when using local auth, or by LDAP/AD group when using LDAP auth. To designate a group as an admin, add the group to the array in the configuration. Under the “none” auth mode, all users are admins because they are all logged in under a single pseudo-username.
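As a sketch, assuming the directive is named "admin_groups" (check your elsa_web.conf for the exact key), granting admin rights to two groups might look like:

	"admin_groups": [ "root", "elsa_admins" ]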

The web server is required for both log collectors and log searchers (node and web) because searches query nodes (peers) using a web services API.

### Configuration

Most settings in the elsa_web.conf and elsa_node.conf files should be fine with the defaults, but there are a few important settings which need to be changed depending on the environment.

### elsa_web.conf

  • Nodes: Contains the connection information to the log node databases which hold the actual data.
  • Auth_method: Controls how authentication and authorization occur. For LDAP, the ldap settings must also be filled out.
  • Link_key: Should be changed to something other than the default. It is used to salt the auth hashes for permalinks.
  • Email: For alerts and archive query notifications, you need to set up the email server to use. If you wish to get the actual results from an alert, in addition to a link to the results, add the following config to the email section:

	"email": {
		"include_data": 1
	}

  • Meta_db: Should point to the database which stores the web management information. This can reside on a node, but probably shouldn't. The performance won't be much of a factor, so running this locally on the web server should be fine.

  • Excluded_classes: If you want to remove some classes from the menus and searches altogether, configure the config entry for excluded_classes like this:

      "excluded_classes": {
      	"BRO_SSL": 1
      },
    

  • APIKeys: The "apikeys" hash holds all known username/apikey combinations, such as:

	"apikeys": { "elsa": "abc" }

  • Peers: Configuration for how this ELSA node will talk to other ELSA nodes. Note that a configuration for itself (127.0.0.1) is required for any query to complete. An example configuration is:

	"peers": {
		"127.0.0.1": {
			"url": "http://127.0.0.1/",
			"user": "elsa",
			"apikey": "abc"
		}
	}
  • Default OR: By default, all search terms must be found in the event to constitute a match (AND). If you wish, you can set the config value "default_or" to a true value to change the default behavior so that a search matches if any of the given terms is found:

      "default_or": 1
    
### elsa_node.conf

  • Database: Edit the connection settings for the local database, if non-default.
  • Log_size_limit: Total size in bytes allowed for all logs and indexes.
  • Sphinx/perm_index_size: This setting must be tweaked so that perm_index_size logs come into the system before (num_indexes * sphinx/index_interval) seconds pass.
  • Archive/percentage: Percentage of log_size_limit reserved for the archive.
  • Archive/days: Maximum number of days to retain logs in the archive.
  • Sphinx/days: Maximum number of days to retain logs in the indexes.
  • forwarding/forward_only: This node will only forward logs and not index them.
  • forwarding/destinations: An array of hashes of forwarders, as detailed in the Forwarding section.

### Forwarding Logs

ELSA can be set up to forward (replicate) logs to an unlimited number of destinations in several ways:
| Method    | Config Directive |
|-----------|------------------|
| File Copy | cp |
| SSH       | scp |
| HTTP/S    | url |

File Copy

Configuration options:

| Option | Meaning | Required |
|--------|---------|----------|
| dir | Directory to copy the file to. This can be a destination where a backup agent reads from or an NFS mount. | Yes |
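A hypothetical destination entry using this method (the directory shown is illustrative):

	{ "method": "cp", "dir": "/mnt/elsa_backup" }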

SSH

Configuration options:

| Option   | Meaning | Required |
|----------|---------|----------|
| user     | Username for SSH | Yes |
| password | Password for the user | If no key_path |
| key_path | Path for RSA/DSA keypair files (.pub) | If no password |
| host     | IP or DNS name of host to forward to | Yes |
| dir      | Remote directory to copy to | Yes |
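A hypothetical destination entry using this method (user, key path, host, and directory are illustrative):

	{ "method": "scp", "user": "elsa", "key_path": "/home/elsa/.ssh/id_rsa", "host": "archive.example.com", "dir": "/data/elsa_copies" }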

URL

Configuration items:

| Option | Meaning | Required |
|--------|---------|----------|
| url | Full URL (including https://) of where to send logs | Yes |
| verify_mode | Boolean indicating whether strict SSL certificate checking is to be enforced. Use zero for certificates that don't have a trusted certificate authority on the forwarder (default self-signed, for instance) | No |
| timeout | Number of seconds to issue a timeout on. Defaults to zero (no timeout) | No |
| ca_file | SSL certificate authority file to use to verify the remote server's certificate | No |
| cert_file | Client-side SSL certificate the server may require to verify the client's identity | No |
| key_file | Key corresponding with cert_file | No |

An example forwarding configuration may look like this:

	"forwarding": {
		"forward_only": "1",
		"destinations": [
			{ "method": "url", "url": "http://example.com/API/upload" },
			{ "method": "url", "url": "https://secure.example.com/API/upload", "ca_file": "/etc/mycafile.pem" }
		]
	}

Low volume configuration tuning

If your ELSA node isn't receiving many logs (less than a few hundred per minute), you may need to tune your setup so that permanent indexes aren't underutilized. There are at most num_indexes permanent indexes, and if a free one isn't available, the oldest one will be overwritten. If this happens before log_size_limit has been reached, logs are being rolled earlier than you want, and you need to tweak some settings in elsa_node.conf:

  • Increase num_indexes to something larger, like 400
  • Increase allowed_temp_percent to 80

This should give you 0.8 x 400 x 60 seconds of time before temp indexes get rolled into a perm index, and it should give you more perm indexes before they get rolled. With 400 perm indexes, that should be more than 88 days of possible index time. If that's still not enough, raise index_interval from 60 seconds to something larger (this will extend the "lifetime" of a temp index).
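In elsa_node.conf, these tuned values might look like the sketch below (the exact nesting of num_indexes and allowed_temp_percent may differ in your config; index_interval is shown under sphinx as referenced elsewhere on this page):

	"num_indexes": 400,
	"allowed_temp_percent": 80,
	"sphinx": {
		"index_interval": 60
	}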

If you set num_indexes to be larger than 200, you should increase the open files limit for searchd (Sphinx). You can do this on Linux by editing /etc/security/limits.conf and adding:

	root soft nofile 100000
	root hard nofile 200000

Then log out, log back in, and restart searchd.

Changing num_indexes

If you change the num_indexes setting in /etc/elsa_node.conf, you will need to regenerate the /usr/local/etc/sphinx.conf file. To do so, either delete or move the existing sphinx.conf file and then run:

	echo "" | perl /usr/local/elsa/node/elsa.pl -on
	pkill searchd
	/usr/local/sphinx/bin/searchd --config /usr/local/etc/sphinx.conf

This will regenerate the config file using the new num_indexes value. One last step remains: instantiate the actual Sphinx index files by running indexer on the previously non-existent indexes. Which indexes to create depends on the new value of num_indexes. In this example, num_indexes has been changed from 200 to 400, so indexes 201 through 400 need to be instantiated:

	for COUNTER in `seq 201 400`; do
		/usr/local/sphinx/bin/indexer --config /usr/local/etc/sphinx.conf temp_$COUNTER perm_$COUNTER
	done