Home
LogWatcher is a Python daemon that gathers metrics from the access logs of web applications (Apache and Tomcat have been tested) and sends them downstream to either a Graphite server or a Ganglia gmond listener in near real time. Metric names carry the prefix "LW_" for easy identification. One LogWatcher instance is required for each access log to be watched. The log file can be either statically named or pre-rotated with a timestamp in the filename. Metrics are typically collected and averaged per minute, but this is configurable.
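This page doesn't spell out how LogWatcher decides that a pre-rotated log has been superseded; the test output further down only shows "FOUND A NEW LOGFILE, we should switch (after finishing)". The following is a minimal sketch of one way to do it. It assumes the timestamp in the filename sorts lexicographically (for example access.2015-09-26-15). newest_logfile and should_switch are hypothetical helpers, not LogWatcher functions.

```python
import glob


def newest_logfile(pattern):
    """Return the newest file matching a glob pattern, or None if none match.

    Assumption: the timestamp embedded in the filename sorts lexicographically
    (e.g. access.2015-09-26-15 sorts after access.2015-09-26-14).
    """
    candidates = glob.glob(pattern)
    return max(candidates) if candidates else None


def should_switch(current_path, pattern):
    """True when a file newer than current_path matches the pattern.

    A caller would finish reading current_path before opening the newer file.
    """
    newest = newest_logfile(pattern)
    return newest is not None and newest > current_path
```

For a statically named log, the pattern is simply the fixed path, so should_switch never fires.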
Requirements:
- Python 2.6+ (with the time, os, re, sys, atexit, ConfigParser, getopt, and string modules)
- Ganglia or Graphite
LogWatcher can be installed from the package published on PyPI (https://pypi.python.org/pypi) by running:
pip install LogWatcher
You can also build and deploy your own Python package from the GitHub source. A sample spec file is also provided for building and deploying LogWatcher as an RPM.
A sample init (start/stop) script and a sample INI configuration file are provided here as well; they are not part of the Python package. These are typically deployed with a configuration management tool such as Puppet or Chef.
A test access log and a test INI file are available in the test directory. To check basic functionality after installing the LogWatcher package, run the following from the command line:
python2.7 /usr/lib/python2.7/site-packages/logwatcher/logwatcher.py -D -V -c /app/logwatcher/etc/test.ini -b
The output should look like:
DEBUG: FOUND A NEW LOGFILE, we should switch (after finishing)
DEBUG: Last line was None (try 1)
DEBUG: opening logfile /tmp/access_test
DEBUG: log count = 0
DEBUG: current position is 0
DEBUG: readlines() returned 650 lines
DEBUG: Found new count metric: return_code_404
DEBUG: Found new count metric: isCust_NotSet
.
.
.
DEBUG: readlines() returned 0 lines
You may see errors about gmetric not being found if the Ganglia gmetric binary is not installed on your system.
Go here for details about the LogWatcher configuration file. A sample INI is provided that should produce the default LogWatcher metrics with very little customization.
LogWatcher can generate metrics from anything in the access logs that a regular expression can match. Some basic log format suggestions for Tomcat and Apache follow.
Tomcat (AccessLogValve):
<Valve className="org.apache.catalina.valves.AccessLogValve"
directory="/var/log/access"
prefix="access"
resolveHosts="false"
checkExists="true"
rotatable="false"
pattern="%a %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-agent}i&quot; &quot;%{REQUEST_DETAILS}r&quot; %D"
/>
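Apache:
LogFormat "%h %{Host}i %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" custom_fmt
Note that Apache logs %D in microseconds, whereas Tomcat's %D has traditionally been in milliseconds; set processing_time_units to match your server.
Additional custom metrics can be added to the log line and extracted with regular expressions defined in the config. The recommended format for these values is: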
[key=value]
Example Tomcat Log Line with Additional Custom Metrics:
1.23.45.678 logwather.com - [26/Sep/2015:15:59:16 -0700] "GET /profile HTTP/1.1" 200 2989 "http://referrer.com/restaurants" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" " [wsTime=7] [isCust=0] [ver=2] [showAds=true] [daoTimelisting=2.678338] [daoTimecgmdblisting=2.678338] [oTime=0] [pTime=1] [daoTimecontent=2.511219] [daoTimecgmdbcontent=2.511219] [clientIp=12.3.4.5.6]" 8
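The following standalone sketch shows how values in this line can be pulled out with regular expressions. The three patterns were written for illustration only; they are not LogWatcher's shipped configuration, and the config options that hold such regexes are described with the configuration file. Python's group(1) corresponds to $1.

```python
import re

# The example Tomcat line above, split across string literals for readability.
LINE = (
    '1.23.45.678 logwather.com - [26/Sep/2015:15:59:16 -0700] '
    '"GET /profile HTTP/1.1" 200 2989 "http://referrer.com/restaurants" '
    '"Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" '
    '" [wsTime=7] [isCust=0] [ver=2] [showAds=true] '
    '[daoTimelisting=2.678338] [daoTimecgmdblisting=2.678338] [oTime=0] '
    '[pTime=1] [daoTimecontent=2.511219] [daoTimecgmdbcontent=2.511219] '
    '[clientIp=12.3.4.5.6]" 8'
)

# Status code: the three digits right after the quoted request line.
RETURN_CODE_RE = re.compile(r'"[A-Z]+ [^"]*" (\d{3}) ')
# Custom values in the recommended [key=value] format.
# The [26/Sep/...] timestamp does not match because it contains no "key=" part.
KEY_VALUE_RE = re.compile(r'\[(\w+)=([^\]]*)\]')
# Processing time: %D is the last field on the line.
PROC_TIME_RE = re.compile(r' (\d+)$')


def parse(line):
    """Extract the status code, processing time and [key=value] pairs."""
    code = RETURN_CODE_RE.search(line)
    proc = PROC_TIME_RE.search(line)
    return {
        "return_code": code.group(1) if code else None,
        "processing_time": int(proc.group(1)) if proc else None,
        "custom": dict(KEY_VALUE_RE.findall(line)),
    }


if __name__ == "__main__":
    result = parse(LINE)
    print(result["return_code"], result["processing_time"], result["custom"]["isCust"])
```

Run as a script, this prints the status code, the %D value, and the isCust value from the line above.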
LogWatcher can currently send metrics either directly to a Graphite server or to Ganglia.
Ganglia is the default destination for metrics. LogWatcher expects and uses the /etc/gmond.conf file on your system.
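Ganglia metrics are normally published with the gmetric command-line tool, which is why a missing gmetric binary produces errors during the test run. The sketch below builds a gmetric argument list using gmetric's standard --conf, --name, --value, --type and --units options. It is an assumption that LogWatcher invokes gmetric this way, and LW_site_Queries uses a hypothetical distinguisher, "site".

```python
import shutil
import subprocess


def gmetric_command(name, value, metric_type="double", units="",
                    conf="/etc/gmond.conf"):
    """Build (but do not run) a gmetric argument list for one metric."""
    return [
        "gmetric",
        "--conf=" + conf,
        "--name=" + name,
        "--value=" + str(value),
        "--type=" + metric_type,
        "--units=" + units,
    ]


if __name__ == "__main__":
    cmd = gmetric_command("LW_site_Queries", 650, units="count")
    if shutil.which("gmetric"):
        subprocess.run(cmd, check=True)  # publish via the local gmond config
    else:
        print("gmetric not found; would run:", " ".join(cmd))
```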
To send metrics to Graphite, use one of the following runtime options. Using either one disables the default behavior of sending metrics to Ganglia.
-g --graphite-server <s> Use graphite, with server <s>
-G --use-graphite Use graphite, find server in /etc/graphite.conf
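With either option, metrics go to a Graphite (Carbon) server. Carbon's plaintext protocol accepts one "<metric path> <value> <unix timestamp>" line per metric, conventionally on TCP port 2003. Here is a minimal sketch of that wire format. It assumes nothing about how LogWatcher itself names or batches metrics, and graphite.example.com is a placeholder host.

```python
import socket
import time


def graphite_lines(metrics, timestamp=None):
    """Format a {name: value} dict in Carbon's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join("%s %s %d\n" % (name, value, ts)
                   for name, value in sorted(metrics.items()))


def send_to_graphite(server, payload, port=2003, timeout=5.0):
    """Send an already formatted payload to a Carbon plaintext listener."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))


# Example (requires a reachable Carbon listener):
# send_to_graphite("graphite.example.com", graphite_lines({"LW_site_Queries": 650}))
```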
There are three primary metric types, plus ratios derived from counts and sums, and special-use priming metrics. Most types use a regular expression whose first capture group ($1) supplies the value.
Counts: The value captured as $1 by the regex is counted, and a separate metric is created for each distinct $1 value found, as well as a _NotSet metric that counts lines not matching the regex. Note that metric names are generated dynamically from the values found. These metrics persist for the run time of the LogWatcher instance, which is typically months.
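The following is a minimal sketch of that counting behavior. It assumes that "persisted" means a name, once seen, keeps being reported (as 0 in intervals with no matches). CountMetric and its methods are hypothetical and are not LogWatcher's internals.

```python
import re
from collections import Counter


class CountMetric:
    """Count each distinct $1 value; non-matching lines go to <prefix>_NotSet."""

    def __init__(self, prefix, regex):
        self.prefix = prefix
        self.pattern = re.compile(regex)
        self.known = set()      # every name seen so far, kept for the process lifetime
        self.counts = Counter()  # counts for the current notify interval

    def process(self, line):
        m = self.pattern.search(line)
        name = "%s_%s" % (self.prefix, m.group(1) if m else "NotSet")
        self.known.add(name)
        self.counts[name] += 1

    def flush(self):
        """Report every known name (0 if unseen this interval), then reset."""
        report = {name: self.counts[name] for name in sorted(self.known)}
        self.counts.clear()
        return report
```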
Sums: The values captured as $1 since the last notify event (based on the notify_schedule) are added together and reported as a single metric.
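A sum only needs a running total that is reported and reset at each notify event. Here is a sketch under that assumption. SumMetric and the metric name wsTime_total are hypothetical.

```python
import re


class SumMetric:
    """Add up $1 values between notify events (hypothetical helper)."""

    def __init__(self, name, regex):
        self.name = name
        self.pattern = re.compile(regex)
        self.total = 0.0

    def process(self, line):
        m = self.pattern.search(line)
        if m:  # lines without a match contribute nothing
            self.total += float(m.group(1))

    def flush(self):
        """Return the interval total and start a new interval at zero."""
        value, self.total = self.total, 0.0
        return {self.name: value}
```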
Ratios: These are derived from either counts or sums. A ratio is the value of the original metric divided by the Queries metric (unfiltered requests per minute). Ratios are useful for alerting because they vary little with traffic changes over the course of the day.
Expression-based ratios: These are derived from counts and/or sums using user-defined expressions. They can be used to configure ratio-style metrics on specific segments of traffic instead of all requests.
Distributions: Each of these is a collection of counts showing the distribution of values over N buckets of size M. They are used primarily to provide data for processing-time histograms (typically 11 buckets of 100 ms each, with the last bucket counting any value over 1000 ms).
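The following is a small worked sketch of why ratios suit alerting. If 2% of requests return 404, the ratio stays at 0.02 whether the interval saw 650 or 1300 queries. ratio_metrics and its "_ratio" suffix are hypothetical naming choices, not LogWatcher's.

```python
def ratio_metrics(values, queries):
    """Divide each count/sum by the Queries value for the same interval.

    Returns an empty dict when there were no queries, to avoid dividing by zero.
    """
    if queries == 0:
        return {}
    return {name + "_ratio": value / float(queries) for name, value in values.items()}


quiet = ratio_metrics({"return_code_404": 13}, 650)   # off-peak interval
busy = ratio_metrics({"return_code_404": 26}, 1300)   # peak interval, same error rate
```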
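The expression syntax LogWatcher accepts is documented with the configuration options, not here. The following sketch assumes plain arithmetic (+, -, *, /) over metric names that are valid Python identifiers. It walks the parsed expression instead of calling eval(), so only those operations are allowed. evaluate is a hypothetical helper.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expression, metrics):
    """Evaluate e.g. 'a / (a + b)' against a {metric_name: value} dict.

    Unknown metric names count as 0; division by zero returns None.
    Anything other than numbers, names and + - * / raises ValueError.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return float(metrics.get(node.id, 0))
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        raise ValueError("unsupported expression element: %s" % ast.dump(node))

    try:
        return walk(ast.parse(expression, mode="eval"))
    except ZeroDivisionError:
        return None


# Error rate within a segment: 500s out of (200s + 500s) only.
segment_error_rate = evaluate("return_code_500 / (return_code_200 + return_code_500)",
                              {"return_code_200": 95, "return_code_500": 5})
```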
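The following sketch shows the bucketing arithmetic for the typical 11 x 100 ms layout. value // 100 selects the bucket, clamped to the last one, so the final bucket is open-ended. This sketch puts exactly 1000 ms into the last bucket; whether LogWatcher treats that boundary the same way is an assumption. bucket_index and histogram are hypothetical names.

```python
def bucket_index(value_ms, bucket_size_ms=100, num_buckets=11):
    """Map a processing time to a bucket; the last bucket catches everything larger."""
    return min(int(value_ms // bucket_size_ms), num_buckets - 1)


def histogram(values_ms, bucket_size_ms=100, num_buckets=11):
    """Count how many values fall into each bucket."""
    counts = [0] * num_buckets
    for v in values_ms:
        counts[bucket_index(v, bucket_size_ms, num_buckets)] += 1
    return counts
```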
Metric Name | Reported Units | Units | Description |
---|---|---|---|
LW_<distinguisher>_Total_Processing_Time | seconds | seconds | The sum of the processing time values from every log line, in ms, since the last notify event (based on the notify_schedule). Requires processing_time_regex and processing_time_units to be set (see the Config Options table for suggested settings). |
LW_<distinguisher>_Avg_Processing_Time | seconds | seconds | LW_<distinguisher>_Total_Processing_Time / LW_<distinguisher>_Queries |
LW_<distinguisher>_Max_Processing_Time | seconds | seconds | The maximum value matching processing_time_regex since the last notify event (based on the notify_schedule option). |
LW_<distinguisher>_exceeding_SLA | percent | percent | The percentage of not-ignored log lines (see the ignore_pattern option) processed since the last notify event (based on the notify_schedule option) whose processing time exceeds the sla_ms value. |
LW_<distinguisher>_exceeding_SLA_ct | count | decimal | The count of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms. |
LW_<distinguisher>_Queries | count | decimal | The total number of not-ignored log lines (see ignore_pattern option) processed since the last notify event (based on the notify_schedule). |
LW_<distinguisher>_QPS | qps | decimal | The (average?) QPS, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule). |
LW_LW_Version | string | string | The version of LogWatcher. |
LW_<distinguisher>_ignored | count | decimal | A count of the log lines ignored (matching ignore_pattern) since the last notify event (based on the notify_schedule). |
LW_<distinguisher>QPS | qps | decimal | The (average?) QPS for log lines matching brand_regex, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule). |
LW_<distinguisher>_QPS_NULL_brand | qps | decimal | The (average?) QPS for log lines that do not match brand_regex, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule). |
LW_LW_LogTime | seconds | decimal | |
LW_LW_NewMetrics | float | decimal | Apparently a count of new metrics that were not sent on the previous cycle (unconfirmed); it seems to exclude some or all of the built-in metrics. |
LW_LW_TotalMetrics | float | decimal | A count of the number of metrics LogWatcher is sending, not counting this metric. |
LW_LW_NotifyTime | seconds | decimal | |
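The following sketch shows how several of the per-interval metrics in this table relate to one another for a single notify cycle. Avg is Total / Queries, exceeding_SLA is the SLA count as a percentage of Queries, and QPS here is Queries divided by the interval length (the table itself marks QPS as "(average?)"). Assumptions: summarize is a hypothetical function, units_per_second stands in for processing_time_units (1000 for millisecond values, 1000000 for microsecond values), and every name carries the distinguisher.

```python
import re


def summarize(lines, distinguisher, time_regex, ignore_regex, sla_ms,
              interval_s=60.0, units_per_second=1000.0):
    """Compute summary metrics for one notify interval (hypothetical sketch)."""
    time_re, ignore_re = re.compile(time_regex), re.compile(ignore_regex)
    p = "LW_%s_" % distinguisher
    queries = ignored = over_sla = 0
    total_s = max_s = 0.0
    for line in lines:
        if ignore_re.search(line):  # ignored lines are counted but not processed
            ignored += 1
            continue
        queries += 1
        m = time_re.search(line)
        if not m:
            continue
        seconds = float(m.group(1)) / units_per_second
        total_s += seconds
        max_s = max(max_s, seconds)
        if seconds > sla_ms / 1000.0:
            over_sla += 1
    return {
        p + "Queries": queries,
        p + "Total_Processing_Time": total_s,
        p + "Avg_Processing_Time": total_s / queries if queries else 0.0,
        p + "Max_Processing_Time": max_s,
        p + "exceeding_SLA_ct": over_sla,
        p + "exceeding_SLA": 100.0 * over_sla / queries if queries else 0.0,
        p + "QPS": queries / interval_s,
        p + "ignored": ignored,
    }


example = summarize(
    ["GET /a 200 8", "GET /b 200 120", "GET /c 500 2500", "GET /health 200 1"],
    distinguisher="site", time_regex=r" (\d+)$", ignore_regex=r"/health",
    sla_ms=1000)
```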
LogWatcher supports very simple LinePlugins. Plugins can modify log lines, compute complex metrics, or even send some or all of the lines to a separate log file or another system (such as Kafka). Note that lines excluded by the exclude filter are not sent to plugins. Plugin details are available here.