User Manual (endure)
endure is a DHCP diagnostics utility running side-by-side with your DHCP server or relay, gathering various metrics and chasing issues. This page contains helpful information about the program installation and usage. Thank you for choosing this software.
- System and Software Requirements
- Building with Cargo
- Installing from Packages
- Running the Utility
- Supported Metrics
- Metrics Reporting Channels
- Working with Traffic Captures
- Integration with Prometheus and Grafana
Endure has been tested on Ubuntu 22 and macOS Sonoma 14.1.1. It requires the libpcap library.
To compile the program on Ubuntu 22, first install libpcap-dev:
$ apt install libpcap-dev
Endure is written in Rust and can be compiled using the cargo utility. The minimum required rustc version is 1.74.
$ cd endure
$ cargo build --release
The resulting binary can be found in the endure/target/release directory.
A .deb package is provided with each release on the Releases Page. Other package formats will be available shortly.
Suppose your DHCP server is responding to the traffic on the interfaces bridge101 and bridge102. You can start monitoring the server with the following command:
$ endure collect -i bridge101 -i bridge102 -c stdout
The -c stdout argument configures the program to periodically output the collected metrics to the console.
Listening on the local loopback interface has no practical application in production, but it can be useful for testing. The --loopback switch is an alias for -i [loopback name] (e.g., -i lo). However, --loopback cannot be combined with -i.
$ endure collect --loopback -c stdout
Given a capture file named capture.pcap, the same metrics can be gathered using the read command:
$ endure read --pcap capture.pcap --json
It produces a report in JSON format containing the metrics computed from all the DHCP packets in the capture file. The same report can be produced in CSV format using the --csv switch.
Finally, the --stream switch generates CSV output with several rows, each row presenting metrics for the last 100 packets. For example:
$ endure read --pcap capture.pcap --csv --stream
Endure can gather and report the following metrics for the DHCPv4 service.
Metric | Description |
---|---|
bootp_opcode_boot_requests_count | Total number of BootRequest messages |
bootp_opcode_boot_replies_count | Total number of BootReply messages |
bootp_opcode_invalid_count | Total number of invalid messages (having an invalid OpCode) |
bootp_opcode_boot_requests_percent | Percentage of BootRequest messages |
bootp_opcode_boot_replies_percent | Percentage of BootReply messages |
bootp_opcode_invalid_percent | Percentage of messages that are neither requests nor replies |
bootp_retransmit_percent | Percentage of retransmissions |
bootp_retransmit_secs_avg | Average number of seconds the DHCP clients have been retrying to acquire a lease |
bootp_retransmit_longest_trying_client | MAC address of the client that has been trying to get a lease the longest |
dhcpv4_roundtrip_dora_milliseconds_avg | Average time in milliseconds to complete a successful 4-way (DORA) exchange |
dhcpv4_roundtrip_dora_do_milliseconds_avg | Average time in milliseconds to complete a Discover/Offer exchange during the 4-way (DORA) exchange |
dhcpv4_roundtrip_dora_ra_milliseconds_avg | Average time in milliseconds to complete a Request/Ack exchange during the 4-way (DORA) exchange |
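To make the three round-trip metrics more concrete, the sketch below derives them from the timestamps of individual DORA exchanges. It is only an illustration, assuming each metric is the elapsed time between the first and the last message of the named sub-exchange. The `DoraExchange` type and its field names are hypothetical and are not part of endure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DoraExchange:
    """Timestamps (in seconds) of the four messages of one successful exchange.

    Hypothetical type for illustration; endure's internals may differ.
    """
    discover: float
    offer: float
    request: float
    ack: float

    def do_ms(self) -> float:
        # Discover -> Offer leg, in milliseconds.
        return (self.offer - self.discover) * 1000.0

    def ra_ms(self) -> float:
        # Request -> Ack leg, in milliseconds.
        return (self.ack - self.request) * 1000.0

    def dora_ms(self) -> float:
        # Whole exchange, Discover -> Ack, in milliseconds.
        return (self.ack - self.discover) * 1000.0


def average_roundtrips(exchanges):
    """Return the average DORA, DO and RA times in milliseconds."""
    exchanges = list(exchanges)
    return {
        "dora": mean(e.dora_ms() for e in exchanges),
        "do": mean(e.do_ms() for e in exchanges),
        "ra": mean(e.ra_ms() for e in exchanges),
    }
```

Under this assumption the DORA time also includes the gap between the Offer and the client's Request, so it can exceed the sum of the DO and RA times.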
Endure can report the metrics in several different ways, both periodically and on demand. The reporting channels described below can be used exclusively or combined together.
The CSV format is the easiest to generate and consume. The -c stdout argument enables periodic metrics reports on the console. The -r argument specifies the interval in seconds between consecutive reports. The default interval is 5 seconds.
$ endure collect -i bridge101 -c stdout -r 3
To direct the output to a specific file, specify the file path instead of the stdout keyword.
$ endure collect -i bridge101 -c /tmp/report.csv
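A report written to a file can be consumed with any CSV reader. The sketch below returns the most recent row of such a file. It assumes that the file begins with a header row naming the columns, which this manual does not confirm; `latest_report` is a hypothetical helper name.

```python
import csv


def latest_report(path):
    """Return the last row of a CSV report as a dict, or None if there are no rows.

    Assumes the first line of the file is a header row naming the columns.
    """
    last = None
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            last = row
    return last
```

For example, `latest_report("/tmp/report.csv")` would return the newest periodic report once endure has written at least one row.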
Prometheus is a popular metrics collection and monitoring solution. Endure implements a Prometheus exporter, making it possible to benefit from the graphical data presentation and alerting supported by Prometheus. Start the exporter with the following command:
$ endure collect -i bridge101 -a 127.0.0.1:8080 --prometheus
The above command makes the metrics available to Prometheus on the following endpoint: http://127.0.0.1:8080. To test the endpoint with curl, try the following command from the terminal. Example output follows the command.
$ curl 127.0.0.1:8080/metrics
# HELP opcode_boot_requests_total Total number of the BootRequest messages.
# TYPE opcode_boot_requests_total gauge
opcode_boot_requests_total 0
# HELP opcode_boot_replies_total Total number of the BootReply messages.
# TYPE opcode_boot_replies_total gauge
opcode_boot_replies_total 0
# HELP opcode_invalid_total Total number of the invalid messages.
# TYPE opcode_invalid_total gauge
opcode_invalid_total 0
# HELP opcode_boot_requests_percent Percentage of the BootRequest messages.
# TYPE opcode_boot_requests_percent gauge
opcode_boot_requests_percent 0.0
# HELP opcode_boot_replies_percent Percentage of the BootReply messages.
# TYPE opcode_boot_replies_percent gauge
opcode_boot_replies_percent 0.0
# HELP opcode_invalid_percent Percentage of the invalid messages.
# TYPE opcode_invalid_percent gauge
opcode_invalid_percent 0.0
# HELP retransmit_percent Percentage of the retransmissions in the mssages sent by clients.
# TYPE retransmit_percent gauge
retransmit_percent 0.0
# HELP retransmit_secs_avg Average retransmission time (i.e. average time in retransmissions to acquire a new lease).
# TYPE retransmit_secs_avg gauge
retransmit_secs_avg 0.0
# EOF
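Outside of Prometheus, the same endpoint can be read from a script. The sketch below is a minimal parser for output shaped like the sample above, with plain `name value` samples and no labels or timestamps. It is not a complete implementation of the exposition format, and the function names are hypothetical.

```python
from urllib.request import urlopen


def parse_exposition(text):
    """Parse plain `name value` samples, skipping comments and blank lines.

    Labels and timestamps are not handled; the sample output has neither.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split(None, 1)
        samples[name] = float(value)
    return samples


def scrape(url="http://127.0.0.1:8080/metrics"):
    """Fetch the endpoint and return the parsed samples as a dict."""
    with urlopen(url, timeout=5) as response:
        return parse_exposition(response.read().decode("utf-8"))
```

Calling `scrape()` while endure runs with the --prometheus switch on 127.0.0.1:8080 would return a dictionary keyed by metric name.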
Server-Sent Events (SSE) is a popular and easy-to-use mechanism for subscribing to and receiving events from a monitored system. Endure uses this mechanism to expose the periodic metrics reports to subscribers, such as web services.
$ endure collect -i bridge101 -a 127.0.0.1:8080 --sse
The periodic output over SSE can also be tested with curl:
$ curl 127.0.0.1:8080/sse
data: {"event_type":"PeriodicReport","payload":{"time":"2024-03-11T18:34:41.240978+01:00","opcode_boot_requests_count":0,"opcode_boot_replies_count":0,"opcode_invalid_count":0,"opcode_boot_requests_percent":0.0,"opcode_boot_replies_percent":0.0,"opcode_invalid_percent":0.0,"retransmit_percent":0.0,"retransmit_secs_avg":0.0,"retransmit_longest_trying_client":""}}
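A subscriber only needs to pick out the `data:` lines and decode their JSON payload. The sketch below does that for events shaped like the sample above, where each event carries a single `data:` line. Multi-line data fields and other SSE fields are ignored, and `iter_sse_events` is a hypothetical helper name.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON objects from an iterable of SSE lines.

    Accepts str or bytes lines. Only single-line `data:` fields are handled.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())
```

The response object returned by `urllib.request.urlopen("http://127.0.0.1:8080/sse")` can be passed directly as `lines`, since iterating over it yields one line at a time.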
The future goal is to expose a REST API to manage the running endure program. Right now, the REST API is limited to a single endpoint to receive the gathered metrics in the JSON format.
$ endure collect -i bridge101 -a 127.0.0.1:8080 --api
Try the API with curl:
$ curl 127.0.0.1:8080/api/metrics
{"time":"2024-03-11T18:38:59.178268+01:00","opcode_boot_requests_count":0,"opcode_boot_replies_count":0,"opcode_invalid_count":0,"opcode_boot_requests_percent":0.0,"opcode_boot_replies_percent":0.0,"opcode_invalid_percent":0.0,"retransmit_percent":0.0,"retransmit_secs_avg":0.0,"retransmit_longest_trying_client":""}
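The same request can be issued from a script. The following sketch fetches the endpoint and decodes the JSON body. The `opener` parameter is a hypothetical addition that lets the function be exercised without a running endure instance.

```python
import json
from urllib.request import urlopen


def fetch_metrics(base_url="http://127.0.0.1:8080", opener=urlopen):
    """Fetch the current metrics from the /api/metrics endpoint as a dict.

    `opener` must accept a URL and return a context manager with a read() method.
    """
    with opener(base_url.rstrip("/") + "/api/metrics") as response:
        return json.loads(response.read().decode("utf-8"))
```

With endure started using the --api switch as shown above, `fetch_metrics()["retransmit_percent"]` would return the current retransmission percentage.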
Traffic captures are useful for traffic analysis and are often gathered with tools such as tcpdump or wireshark. Endure can also generate them while analyzing the traffic on selected interfaces. Include the optional -p command line switch to specify the location of the generated .pcap files:
$ endure collect -i bridge101 -i bridge102 -c stdout -p /tmp/pcap
In the example above, the program will create two capture files, one for each interface in the specified directory. The directory must exist before running the tool, or it will exit with an error.
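When endure is launched from a script, the directory requirement can be handled before the program starts. The sketch below creates the capture directory if it is missing and builds the argument list for the command shown above. `collect_command` is a hypothetical helper name.

```python
import os


def collect_command(interfaces, pcap_dir):
    """Create the capture directory if needed and build the argument list."""
    # endure exits with an error when the directory does not exist.
    os.makedirs(pcap_dir, exist_ok=True)
    command = ["endure", "collect"]
    for name in interfaces:
        command += ["-i", name]
    command += ["-c", "stdout", "-p", pcap_dir]
    return command
```

The returned list can be passed to `subprocess.run`, for example `subprocess.run(collect_command(["bridge101", "bridge102"], "/tmp/pcap"), check=True)`.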
The traffic captures can be analyzed offline using the following command:
$ endure read --pcap capture.pcap --json
This command processes the entire file and reports the relevant metrics in JSON format. For example:
{
"opcode_boot_replies_count": 370,
"opcode_boot_replies_percent": 40.5,
"opcode_boot_requests_count": 543,
"opcode_boot_requests_percent": 59.4,
"opcode_invalid_count": 0,
"opcode_invalid_percent": 0.0,
"retransmit_longest_trying_client": "00:0c:01:02:04:73",
"retransmit_percent": 65.9,
"retransmit_secs_avg": 3.7
}
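The JSON report is convenient for post-processing in scripts. The sketch below runs the read command and decodes the result. It assumes that the report is written to standard output and that nothing else is printed there. The `runner` parameter is a hypothetical addition that allows substituting the process call in tests.

```python
import json
import subprocess


def read_capture(pcap_path, runner=subprocess.run):
    """Run `endure read` on a capture file and return the report as a dict.

    Assumes the JSON report is the only content written to standard output.
    """
    result = runner(
        ["endure", "read", "--pcap", pcap_path, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For instance, `read_capture("capture.pcap")["retransmit_percent"]` would give the share of retransmissions in the capture.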
The same report can be returned in the CSV format:
$ endure read --pcap capture.pcap --csv
Note that the returned reports contain a snapshot of the metrics at the end of the file processing. The metrics are computed from all packets in the capture file. For long traffic captures, however, it may be more practical to report partial metrics similar to the ones returned during the online traffic analysis on the network interfaces. The partial metrics are returned when the --stream switch is present. For example:
$ endure read --pcap capture.pcap --csv --stream
The partial metrics are generated after every 100 analyzed packets. For example, if a capture contains 1000 packets, the command above returns 10 rows, each with the partial metrics collected for the preceding 100 packets.
The --stream switch is incompatible with the --json switch.
The Endure source code includes a Docker Compose configuration that launches two containers, one with a Prometheus instance and one with Grafana. This setup requires the host network driver to be enabled in Docker.
Launch the containers using the following commands:
$ cd docker
$ docker compose up
Prometheus is configured to scrape the metrics from http://localhost:9100. Make sure that endure makes the metrics available on that address and port. For example:
$ ./endure collect --loopback --prometheus --http-address 127.0.0.1:9100
Open the Grafana dashboard in the browser at http://localhost:3000. Navigate to Dashboards to monitor the metrics exported by endure.
© Marcin Siodelski 2023-2024