Agent Developer Mode

Aaditya Talwai edited this page May 18, 2015 · 12 revisions

Agent Developer Mode lets you collect a wide array of metrics about the performance of the agent itself. Enable it by adding developer_mode: yes to your datadog.conf. In developer mode, the agent gains the following functionality:

  1. Metrics for collection time and emit time are sent to Datadog on every collector run.
  2. An additional check, agent_metrics, runs at the end of every collector loop. This check collects a variety of metrics about the collector's performance and is configured through the same interface as regular AgentChecks. Its source code is in checks.d/agent_metrics.py.

Configuring the Agent Metrics Check

Here is an example configuration for the agent_metrics check:

init_config:
    process_metrics:
        - name: get_memory_info
          type: gauge
          active: yes
        - name: get_io_counters
          type: rate
          active: yes
        - name: get_connections
          type: gauge
          active: no

instances:
    [{}]

Each element in the process_metrics list represents a single psutil.Process method that will be executed against the running collector process. The name field specifies the name of the method, the type field specifies the metric type (currently only gauge and rate are supported), and the active field is a utility flag that activates or deactivates a method call during the check. Note that the method specified in name is executed only when both of the following hold:

  1. The method is available on the psutil.Process class as of psutil==2.1.1.
  2. The underlying OS supports the execution of that method (e.g., get_io_counters is not available for OS X processes).

If the agent_metrics check cannot execute a particular method, it logs a warning and moves on to the next one.
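The lookup-and-skip behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in checks.d/agent_metrics.py: run_process_methods and FakeProcess are hypothetical names, and FakeProcess stands in for psutil.Process so the sketch runs without psutil installed.

```python
import logging

log = logging.getLogger(__name__)


def run_process_methods(process, process_metrics):
    """Call each active method on `process`, skipping any that cannot run.

    Returns a dict mapping method name -> (metric type, raw result).
    This helper is hypothetical and only mirrors the behavior described above.
    """
    results = {}
    for entry in process_metrics:
        if not entry.get("active"):
            continue  # the flag switches a method off without deleting its entry
        name = entry["name"]
        method = getattr(process, name, None)
        if method is None:
            log.warning("%s is not available on this process object", name)
            continue
        try:
            results[name] = (entry["type"], method())
        except Exception as exc:  # e.g. not supported on this OS
            log.warning("could not execute %s: %s", name, exc)
    return results


class FakeProcess(object):
    """Hypothetical stand-in for psutil.Process."""

    def get_memory_info(self):
        return {"rss": 1024, "vms": 4096}

    def get_io_counters(self):
        raise NotImplementedError("not supported on this platform")


# Equivalent of the process_metrics list from the YAML config, plus one
# made-up method name to exercise the "not available" path.
config = [
    {"name": "get_memory_info", "type": "gauge", "active": True},
    {"name": "get_io_counters", "type": "rate", "active": True},
    {"name": "get_connections", "type": "gauge", "active": False},
    {"name": "get_nonexistent", "type": "gauge", "active": True},
]
results = run_process_methods(FakeProcess(), config)
```

Only get_memory_info produces a result here: get_io_counters raises, get_connections is inactive, and get_nonexistent is missing, so each of those is skipped (with a warning where a call was attempted).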

Metrics collected via these methods are parsed and placed in a namespace derived from the method name. For example, get_memory_info yields datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms. The logic for this lives here and here. These metrics are then aggregated and forwarded to Datadog as with any other AgentCheck.
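One way to picture the naming scheme is the sketch below. It assumes the get_ prefix is stripped and that each field of a namedtuple result becomes its own metric; the only case confirmed above is get_memory_info, so the actual check may handle other result types differently. metric_names and MemInfo are hypothetical names used for illustration.

```python
from collections import namedtuple

NAMESPACE = "datadog.agent.collector"


def metric_names(method_name, result):
    """Map a method name and its result to {metric name: value}.

    Hypothetical helper: strips a leading "get_" and expands namedtuple
    fields into one metric each.
    """
    prefix = "get_"
    if method_name.startswith(prefix):
        stem = method_name[len(prefix):]
    else:
        stem = method_name
    base = "%s.%s" % (NAMESPACE, stem)
    if hasattr(result, "_fields"):  # namedtuple-style result
        return dict(
            ("%s.%s" % (base, field), getattr(result, field))
            for field in result._fields
        )
    return {base: result}  # plain scalar result


# Stand-in for the namedtuple a memory-info call would return.
MemInfo = namedtuple("MemInfo", ["rss", "vms"])
metrics = metric_names("get_memory_info", MemInfo(rss=1024, vms=4096))
```

With this input, metrics contains datadog.agent.collector.memory_info.rss and datadog.agent.collector.memory_info.vms, matching the example in the text.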

Profiling an individual check

Individual checks can be profiled by adding the --profile flag to the standard agent.py check command, for example: python agent.py check network --profile.

Profiling information consists of the following:

  1. Check runtime
  2. Memory and I/O before/after if available
  3. pstats output, restricted to 20 calls.
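For a sense of what the runtime and pstats parts of that output involve, here is a minimal sketch using the standard-library cProfile and pstats modules. It is written for Python 3 and is not the agent's own implementation of --profile; profile_call and fake_check are hypothetical names, and the sort order ("cumulative") is an assumption.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, elapsed seconds, report text)."""
    profiler = cProfile.Profile()
    start = time.time()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    elapsed = time.time() - start

    # Write the stats to a string instead of stdout, limited to 20 entries.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(20)
    return result, elapsed, stream.getvalue()


def fake_check():
    """Hypothetical stand-in for a check's run."""
    return sum(i * i for i in range(10000))


result, elapsed, report = profile_call(fake_check)
print("runtime: %.4fs" % elapsed)
print(report)
```

Memory and I/O figures before and after the run are not covered by this sketch; they would need something like psutil in addition.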

[Here] is an example run for profiling the network check.