User Manual (endure)

Marcin Siodelski edited this page Dec 19, 2024 · 18 revisions

endure is a DHCP diagnostics utility that runs side by side with your DHCP server or relay, gathering various metrics and chasing issues. This page describes how to install and use the program. Thank you for choosing this software.

System and Software Requirements

Endure has been tested on Ubuntu 22 and macOS Sonoma 14.1.1. It requires the libpcap library.

To compile the program on Ubuntu 22, first install libpcap-dev:

$ apt install libpcap-dev

Building with Cargo

Endure is written in Rust and can be compiled using the cargo utility. The minimum required rustc version is 1.74.

$ cd endure
$ cargo build --release

The resulting binary can be found in the endure/target/release directory.

Installing from Packages

A .deb package is provided with each release on the Releases Page. Other package formats will be available shortly.

Running the Utility

Suppose your DHCP server is responding to the traffic on the interfaces bridge101 and bridge102. You can start monitoring the server with the following command:

$ endure collect -i bridge101 -i bridge102 -c stdout

The -c stdout argument configures the program to periodically print the collected metrics to the console.

Listening on the local loopback interface has no practical application in production, but it may be useful for testing. The --loopback switch is an alias for -i [loopback name] (e.g., -i lo) and cannot be combined with -i.

$ endure collect --loopback -c stdout

Given a capture file named capture.pcap, the same metrics can be gathered using the read command:

$ endure read --pcap capture.pcap --json

It will produce a report in the JSON format containing the metrics computed from all the DHCP packets in the capture file. The same report can be produced in the CSV format using the --csv switch.

Finally, using the --stream switch it is possible to generate a CSV output with several rows, each row presenting metrics for the last 100 packets. For example:

$ endure read --pcap capture.pcap --csv --stream

Supported Metrics

Endure can gather and report the following metrics for the DHCPv4 service.

Metric Description
bootp_opcode_boot_requests_count A total number of BootRequest messages
bootp_opcode_boot_replies_count A total number of BootReply messages
bootp_opcode_invalid_count A total number of invalid messages (having invalid OpCode)
bootp_opcode_boot_requests_percent A percentage of BootRequest messages
bootp_opcode_boot_replies_percent A percentage of the BootReply messages
bootp_opcode_invalid_percent A percentage of invalid messages (neither BootRequest nor BootReply)
bootp_retransmit_percent Percentage of retransmissions
bootp_retransmit_secs_avg Average number of seconds the DHCP clients have been retrying to acquire a lease
bootp_retransmit_longest_trying_client MAC address of a client who has been trying to get the lease the longest
dhcpv4_roundtrip_dora_milliseconds_avg Average time in milliseconds to complete a successful 4-way (DORA) exchange
dhcpv4_roundtrip_dora_do_milliseconds_avg Average time in milliseconds to complete a Discover/Offer exchange during the 4-way (DORA) exchange
dhcpv4_roundtrip_dora_ra_milliseconds_avg Average time in milliseconds to complete a Request/Ack exchange during the 4-way (DORA) exchange

Metrics Reporting Channels

Endure can report the metrics in several different ways, both periodically and on demand. The reporting channels described below can be used exclusively or combined together.

CSV File or Console

The CSV format is the easiest to generate and consume. The -c stdout argument configures periodic metrics reports to the console. The -r argument specifies the interval in seconds between consecutive reports. The default interval is 5 seconds.

$ endure collect -i bridge101 -c stdout -r 3

In order to direct the output to a specific file, specify the file path instead of the stdout keyword.

$ endure collect -i bridge101 -c /tmp/report.csv
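A report file written this way can be consumed with any CSV reader. The following Python sketch assumes that the file starts with a header row whose column names match the metric names. This manual does not spell out that layout, so check it against a real report. The helper names and the sample data are hypothetical.

```python
import csv
import io


def read_report_rows(csv_text):
    """Parse CSV report text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def latest_value(rows, column):
    """Return `column` from the last row as a float, or None if there are no rows."""
    if not rows:
        return None
    return float(rows[-1][column])


# Hypothetical sample: the real header and column order may differ.
sample = (
    "time,opcode_boot_requests_count,retransmit_percent\n"
    "2024-03-11T18:34:41+01:00,10,0.0\n"
    "2024-03-11T18:34:46+01:00,25,12.5\n"
)
rows = read_report_rows(sample)
print(latest_value(rows, "retransmit_percent"))
```

To process a real report, pass the contents of the file (for example /tmp/report.csv) to read_report_rows instead of the sample string.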

Export to Prometheus

Prometheus is a popular metrics collection and monitoring solution. Endure implements a Prometheus exporter, making it possible to benefit from the graphical data presentation and alarms supported by Prometheus. Start the exporter with the following command:

$ endure collect -i bridge101 -a 127.0.0.1:8080 --prometheus

This command makes the metrics available to Prometheus on the http://127.0.0.1:8080/metrics endpoint. To test the endpoint with curl, run the following command from the terminal. Example output follows the command.

$ curl 127.0.0.1:8080/metrics

# HELP opcode_boot_requests_total Total number of the BootRequest messages.
# TYPE opcode_boot_requests_total gauge
opcode_boot_requests_total 0
# HELP opcode_boot_replies_total Total number of the BootReply messages.
# TYPE opcode_boot_replies_total gauge
opcode_boot_replies_total 0
# HELP opcode_invalid_total Total number of the invalid messages.
# TYPE opcode_invalid_total gauge
opcode_invalid_total 0
# HELP opcode_boot_requests_percent Percentage of the BootRequest messages.
# TYPE opcode_boot_requests_percent gauge
opcode_boot_requests_percent 0.0
# HELP opcode_boot_replies_percent Percentage of the BootReply messages.
# TYPE opcode_boot_replies_percent gauge
opcode_boot_replies_percent 0.0
# HELP opcode_invalid_percent Percentage of the invalid messages.
# TYPE opcode_invalid_percent gauge
opcode_invalid_percent 0.0
# HELP retransmit_percent Percentage of the retransmissions in the messages sent by clients.
# TYPE retransmit_percent gauge
retransmit_percent 0.0
# HELP retransmit_secs_avg Average retransmission time (i.e. average time in retransmissions to acquire a new lease).
# TYPE retransmit_secs_avg gauge
retransmit_secs_avg 0.0
# EOF
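When Prometheus itself is not at hand, the text output above is simple enough to parse directly. The sketch below is a minimal Python reader for the label-free "name value" lines shown in the example. It is not a complete parser for the Prometheus exposition format, and the function name is hypothetical.

```python
def parse_exposition(text):
    """Parse label-free samples from Prometheus-style text output.

    Comment lines starting with '#' (HELP, TYPE, EOF) and blank lines
    are skipped. Only the metric name and its value are read.
    """
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        samples[parts[0]] = float(parts[1])
    return samples


# A shortened copy of the example output above.
sample = """\
# HELP opcode_boot_requests_total Total number of the BootRequest messages.
# TYPE opcode_boot_requests_total gauge
opcode_boot_requests_total 0
# HELP retransmit_percent Percentage of the retransmissions.
# TYPE retransmit_percent gauge
retransmit_percent 0.0
# EOF
"""
print(parse_exposition(sample))
```

The input can be the output captured from the curl command above.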

Server Sent Events (SSE)

SSE is a popular and easy-to-use mechanism to subscribe to and receive the events from the monitored system. Endure utilizes this mechanism to expose the periodic metrics reports to the subscribers, such as web services.

$ endure collect -i bridge101 -a 127.0.0.1:8080 --sse

The periodic output over the SSE can also be tested with curl:

$ curl 127.0.0.1:8080/sse

data: {"event_type":"PeriodicReport","payload":{"time":"2024-03-11T18:34:41.240978+01:00","opcode_boot_requests_count":0,"opcode_boot_replies_count":0,"opcode_invalid_count":0,"opcode_boot_requests_percent":0.0,"opcode_boot_replies_percent":0.0,"opcode_invalid_percent":0.0,"retransmit_percent":0.0,"retransmit_secs_avg":0.0,"retransmit_longest_trying_client":""}}
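A subscriber only needs to pick out the "data:" lines and decode the JSON they carry. The Python sketch below assumes that each event arrives as a single "data:" line holding one JSON object, as in the example above. Multi-line events and other SSE fields are not handled, and the function names are hypothetical.

```python
import json


def parse_sse_lines(lines):
    """Yield the decoded JSON object from each 'data:' line of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def periodic_payloads(lines):
    """Yield the payload of every event whose event_type is 'PeriodicReport'."""
    for event in parse_sse_lines(lines):
        if event.get("event_type") == "PeriodicReport":
            yield event["payload"]


# A shortened version of the event shown above.
sample = [
    'data: {"event_type":"PeriodicReport","payload":{"retransmit_percent":0.0}}',
    "",
]
for payload in periodic_payloads(sample):
    print(payload["retransmit_percent"])
```

The same functions accept any iterable of text lines, for example the lines of a saved curl output.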

REST API

The future goal is to expose a REST API to manage the running endure program. Right now, the REST API is limited to a single endpoint to receive the gathered metrics in the JSON format.

$ endure collect -i bridge101 -a 127.0.0.1:8080 --api

Try the API with curl:

$ curl 127.0.0.1:8080/api/metrics

{"time":"2024-03-11T18:38:59.178268+01:00","opcode_boot_requests_count":0,"opcode_boot_replies_count":0,"opcode_invalid_count":0,"opcode_boot_requests_percent":0.0,"opcode_boot_replies_percent":0.0,"opcode_invalid_percent":0.0,"retransmit_percent":0.0,"retransmit_secs_avg":0.0,"retransmit_longest_trying_client":""}
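The endpoint can also be polled from a script. The following Python sketch fetches the JSON document with the standard library and applies a simple threshold check. The function names and the 50 percent threshold are illustrative assumptions, and the default URL is the one used in the curl example above.

```python
import json
import urllib.request


def fetch_metrics(url="http://127.0.0.1:8080/api/metrics", timeout=5.0):
    """Fetch and decode the JSON metrics document from the REST endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def retransmit_exceeds(report, threshold_percent):
    """Return True when retransmit_percent in the report is above the threshold."""
    return float(report.get("retransmit_percent", 0.0)) > threshold_percent


# Offline check using field names from the example response above.
example = {"retransmit_percent": 65.9, "retransmit_longest_trying_client": "00:0c:01:02:04:73"}
if retransmit_exceeds(example, 50.0):
    print("high retransmission rate:", example["retransmit_longest_trying_client"])
```

With endure running with the --api switch, calling fetch_metrics() returns the live report, which can be passed to retransmit_exceeds in place of the example dictionary.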

Working with Traffic Captures

Traffic captures are useful for traffic analysis and are often gathered with tools such as tcpdump or Wireshark. Endure can also generate them while analyzing traffic on the selected interfaces. Include the optional -p command line switch to specify the directory for the generated .pcap files:

$ endure collect -i bridge101 -i bridge102 -c stdout -p /tmp/pcap

In the example above, the program creates two capture files in the specified directory, one for each interface. The directory must exist before running the tool; otherwise, the tool exits with an error.

The traffic captures can be analyzed offline using the following command:

$ endure read --pcap capture.pcap --json

This command processes the entire file and reports the relevant metrics in JSON format. For example:

{
  "opcode_boot_replies_count": 370,
  "opcode_boot_replies_percent": 40.5,
  "opcode_boot_requests_count": 543,
  "opcode_boot_requests_percent": 59.4,
  "opcode_invalid_count": 0,
  "opcode_invalid_percent": 0.0,
  "retransmit_longest_trying_client": "00:0c:01:02:04:73",
  "retransmit_percent": 65.9,
  "retransmit_secs_avg": 3.7
}
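A report like this is easy to post-process. The Python sketch below runs the read command and condenses the report into a few fields. It assumes that endure is on the PATH and writes the JSON report to standard output. The helper names are hypothetical, and total_messages is simply the sum of the three opcode counters.

```python
import json
import subprocess


def run_read(pcap_path):
    """Run `endure read --pcap <path> --json` and return its standard output."""
    result = subprocess.run(
        ["endure", "read", "--pcap", pcap_path, "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize(report_json):
    """Condense a JSON report into a small summary dictionary."""
    report = json.loads(report_json)
    total = (
        report["opcode_boot_requests_count"]
        + report["opcode_boot_replies_count"]
        + report["opcode_invalid_count"]
    )
    return {
        "total_messages": total,
        "retransmit_percent": report["retransmit_percent"],
        "longest_trying_client": report["retransmit_longest_trying_client"],
    }


# Counters and values copied from the example report above.
sample = json.dumps({
    "opcode_boot_replies_count": 370,
    "opcode_boot_requests_count": 543,
    "opcode_invalid_count": 0,
    "retransmit_longest_trying_client": "00:0c:01:02:04:73",
    "retransmit_percent": 65.9,
})
print(summarize(sample))
```

For a real capture, summarize(run_read("capture.pcap")) combines both steps.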

The same report can be returned in the CSV format:

$ endure read --pcap capture.pcap --csv

Note that the returned reports contain a snapshot of the metrics at the end of the file processing. The metrics are computed from all packets in the capture file. For long traffic captures, however, it may be more practical to report partial metrics similar to the ones returned during the online traffic analysis on the network interfaces. The partial metrics are returned when the --stream switch is present. For example:

$ endure read --pcap capture.pcap --csv --stream

The partial metrics are generated after every 100 analyzed packets. For example, if a capture contains 1000 packets, the command above will return 10 rows, each with the partial metrics collected for the preceding 100 packets.

The --stream switch is incompatible with the --json switch.

Integration with Prometheus and Grafana

The endure source code includes a Docker Compose configuration that launches two containers, one with a Prometheus instance and one with Grafana. This setup requires that the host network driver is enabled in Docker.

Launch the containers using the following commands:

$ cd docker
$ docker compose up

Prometheus is configured to scrape the metrics from http://localhost:9100. Make sure that endure makes the metrics available on that address and port. For example:

$ ./endure collect --loopback --prometheus --http-address 127.0.0.1:9100

Open the Grafana dashboard in a browser at http://localhost:3000. Navigate to Dashboards to monitor the metrics exported by endure.