chore(sync): merged autoware.iv/pull/2362 (#761) #134

Merged
merged 2 commits into from
Dec 7, 2021
5 changes: 5 additions & 0 deletions system/system_monitor/CMakeLists.txt
@@ -127,6 +127,10 @@ ament_auto_add_executable(hdd_reader
reader/hdd_reader/hdd_reader.cpp
)

ament_auto_add_executable(traffic_reader
reader/traffic_reader/traffic_reader.cpp
)

find_library(NL3 nl-3 REQUIRED)
find_library(NLGENL3 nl-genl-3 REQUIRED)
list(APPEND NL_LIBS ${NL3} ${NLGENL3})
@@ -148,6 +152,7 @@ target_link_libraries(process_monitor_lib ${LIBRARIES})
target_link_libraries(gpu_monitor_lib ${GPU_LIBRARY} ${Boost_LIBRARIES} ${LIBRARIES})
target_link_libraries(msr_reader ${Boost_LIBRARIES} ${LIBRARIES})
target_link_libraries(hdd_reader ${Boost_LIBRARIES} ${LIBRARIES})
target_link_libraries(traffic_reader ${Boost_LIBRARIES} ${LIBRARIES})

rclcpp_components_register_node(cpu_monitor_lib
PLUGIN "CPUMonitor"
5 changes: 2 additions & 3 deletions system/system_monitor/config/cpu_monitor.param.yaml
@@ -1,8 +1,7 @@
/**:
ros__parameters:
temp_warn: 90.0
temp_error: 95.0
usage_warn: 0.90
usage_warn: 0.96
usage_error: 1.00
usage_count: 2
usage_avg: true
msr_reader_port: 7634
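The `usage_warn`, `usage_error`, and `usage_count` parameters suggest that a CPU must stay over a usage threshold for several consecutive checks before a WARN or ERROR is raised. The sketch below shows one way such a per-CPU counter could work. The class and method names are hypothetical, usage is assumed to be a 0.0 to 1.0 fraction as in this YAML, and the level constants only stand in for the `diagnostic_msgs::msg::DiagnosticStatus` values.

```cpp
#include <string>
#include <unordered_map>

// Stand-ins for diagnostic_msgs::msg::DiagnosticStatus::OK / WARN / ERROR.
enum Level : int { OK = 0, WARN = 1, ERROR = 2 };

// Hypothetical checker (not the actual CPUMonitorBase code): a CPU is reported
// as WARN or ERROR only after its usage has been at or above usage_warn for
// usage_count consecutive checks.
class CpuUsageChecker
{
public:
  CpuUsageChecker(float usage_warn, float usage_error, int usage_count)
  : usage_warn_(usage_warn), usage_error_(usage_error), usage_count_(usage_count)
  {
  }

  // usage is a fraction (0.0 .. 1.0), matching the YAML thresholds.
  int cpuUsageToLevel(const std::string & cpu_name, float usage)
  {
    int & count = over_count_[cpu_name];  // starts at 0 for a new CPU name
    if (usage < usage_warn_) {
      count = 0;  // streak broken: back to OK
      return OK;
    }
    ++count;
    if (count < usage_count_) {
      return OK;  // over threshold, but not for long enough yet
    }
    return usage >= usage_error_ ? ERROR : WARN;
  }

private:
  float usage_warn_;
  float usage_error_;
  int usage_count_;
  std::unordered_map<std::string, int> over_count_;  // per-CPU consecutive counter
};
```

With `usage_warn: 0.96`, `usage_error: 1.00`, and `usage_count: 2`, a single 97% sample stays OK and a second consecutive one raises WARN; any sample below the warning threshold resets the streak.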
6 changes: 3 additions & 3 deletions system/system_monitor/config/hdd_monitor.param.yaml
@@ -4,8 +4,8 @@
num_disks: 1
disks: # Until multi type lists are allowed, name N the disks as disk0...disk{N-1}
disk0:
name: /dev/sda
name: /dev/sda3
temp_warn: 55.0
temp_error: 70.0
usage_warn: 0.95
usage_error: 0.99
free_warn: 5120 # MB(8hour)
free_error: 100 # MB(last 1 minute)
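This change replaces the disk `usage_warn`/`usage_error` ratios with absolute free-space thresholds in MB. A minimal sketch of how such thresholds could map to diagnostic levels follows. It assumes strict less-than comparisons (whether the monitor uses `<` or `<=` is not shown here), and `freeSpaceToLevel` is a hypothetical name.

```cpp
// Stand-ins for diagnostic_msgs::msg::DiagnosticStatus::OK / WARN / ERROR.
enum Level : int { OK = 0, WARN = 1, ERROR = 2 };

// Hypothetical mapping from free disk space to a diagnostic level, assuming
// a level is raised once free space drops below the configured threshold.
int freeSpaceToLevel(int free_mb, int free_warn_mb, int free_error_mb)
{
  if (free_mb < free_error_mb) {
    return ERROR;  // below free_error (100 MB in the YAML above)
  }
  if (free_mb < free_warn_mb) {
    return WARN;  // below free_warn (5120 MB in the YAML above)
  }
  return OK;
}
```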
3 changes: 1 addition & 2 deletions system/system_monitor/config/mem_monitor.param.yaml
@@ -1,4 +1,3 @@
/**:
ros__parameters:
usage_warn: 0.95
usage_error: 0.99
available_size: 1024 # MB
3 changes: 2 additions & 1 deletion system/system_monitor/config/net_monitor.param.yaml
@@ -1,4 +1,5 @@
/**:
ros__parameters:
devices: ["*"]
usage_warn: 0.95
traffic_reader_port: 7636
monitor_program: "greengrass"
2 changes: 0 additions & 2 deletions system/system_monitor/docs/topics_cpu_monitor.md
@@ -9,8 +9,6 @@
| level | message |
| ----- | ------- |
| OK | OK |
| WARN | warm |
| ERROR | hot |

<b>[values]</b>

28 changes: 16 additions & 12 deletions system/system_monitor/docs/topics_mem_monitor.md
@@ -14,15 +14,19 @@

<b>[values]</b>

| key | value (example) |
| ------------ | --------------- |
| Mem: usage | 18.99% |
| Mem: total | 31G |
| Mem: used | 5.9G |
| Mem: free | 15G |
| Swap: total | 2.0G |
| Swap: used | 0B |
| Swap: free | 2.0G |
| Total: total | 33G |
| Total: used | 5.9G |
| Total: free | 17G |
| key | value (example) |
| --------------- | --------------- |
| Mem: usage | 29.72% |
| Mem: total | 31.2G |
| Mem: used | 6.0G |
| Mem: free | 20.7G |
| Mem: shared | 2.9G |
| Mem: buff/cache | 4.5G |
| Mem: available | 21.9G |
| Swap: total | 2.0G |
| Swap: used | 218M |
| Swap: free | 1.8G |
| Total: total | 33.2G |
| Total: used | 6.2G |
| Total: free | 22.5G |
| Total: used+ | 9.1G |
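The `Total:` rows in the example are consistent with summing the `Mem:` and `Swap:` rows, and `Total: used+` matches `Total: used` plus `Mem: shared` (6.2G + 2.9G, about 9.1G). The sketch below reproduces that arithmetic. The struct and function names are hypothetical, the `used+` formula is inferred from the example values only, and the `available_size` check assumes an error is raised when available memory drops below the configured MB value.

```cpp
// Hypothetical container for the free-style figures in the table above (GiB).
struct MemInfo
{
  double mem_total, mem_used, mem_free, mem_shared;
  double swap_total, swap_used, swap_free;
};

struct MemTotals
{
  double total, used, free, used_plus;
};

// Sums that reproduce the "Total:" rows of the example; "used+" is assumed to
// be total used plus shared memory, which matches the example values.
MemTotals computeTotals(const MemInfo & m)
{
  const double used = m.mem_used + m.swap_used;
  return {m.mem_total + m.swap_total, used, m.mem_free + m.swap_free, used + m.mem_shared};
}

// Hypothetical reading of available_size (MB): an error when available
// memory falls below it.
bool isAvailableTooLow(double available_mb, double available_size_mb)
{
  return available_mb < available_size_mb;
}
```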
76 changes: 55 additions & 21 deletions system/system_monitor/docs/topics_net_monitor.md
@@ -2,30 +2,64 @@

## <u>Network Usage</u>

/diagnostics/cpu_monitor: Network Usage
/diagnostics/net_monitor: Network Usage

<b>[summary]</b>

| level | message |
| ----- | --------- |
| OK | OK |
| WARN | high load |
| ERROR | down |
| level | message |
| ----- | ------- |
| OK | OK |

<b>[values]</b>

| key | value (example) |
| ----------------------------- | --------------------- |
| Network [0-9]: status | OK / high load / down |
| Network [0-9]: interface name | wlp82s0 |
| Network [0-9]: rx_usage | 0.00% |
| Network [0-9]: tx_usage | 0.00% |
| Network [0-9]: rx_traffic | 0.00 MB/s |
| Network [0-9]: tx_traffic | 0.00 MB/s |
| Network [0-9]: capacity | 400.0 MB/s |
| Network [0-9]: mtu | 1500 |
| Network [0-9]: rx_bytes | 58455228 |
| Network [0-9]: rx_errors | 0 |
| Network [0-9]: tx_bytes | 11069136 |
| Network [0-9]: tx_errors | 0 |
| Network [0-9]: collisions | 0 |
| key | value (example) |
| ----------------------------- | --------------- |
| Network [0-9]: status | OK |
| Network [0-9]: interface name | wlp82s0 |
| Network [0-9]: rx_usage | 0.00% |
| Network [0-9]: tx_usage | 0.00% |
| Network [0-9]: rx_traffic | 0.00 MB/s |
| Network [0-9]: tx_traffic | 0.00 MB/s |
| Network [0-9]: capacity | 400.0 MB/s |
| Network [0-9]: mtu | 1500 |
| Network [0-9]: rx_bytes | 58455228 |
| Network [0-9]: rx_errors | 0 |
| Network [0-9]: tx_bytes | 11069136 |
| Network [0-9]: tx_errors | 0 |
| Network [0-9]: collisions | 0 |
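The `rx_traffic`/`tx_traffic` and `rx_usage`/`tx_usage` values presumably come from the change in the cumulative `rx_bytes`/`tx_bytes` counters over a sampling interval, expressed against the interface `capacity`. A sketch of that arithmetic follows. The names are hypothetical, and both the decimal megabyte and the usage-as-fraction-of-capacity relationship are assumptions rather than something this page confirms.

```cpp
#include <cstdint>

// Hypothetical result of one sampling interval for a single interface.
struct TrafficSample
{
  double mb_per_s;  // e.g. the rx_traffic column
  double usage;     // fraction of capacity; x100 gives the rx_usage column
};

// Derives traffic from two readings of a cumulative byte counter (such as
// rx_bytes) taken elapsed_s seconds apart. Assumes 1 MB = 1,000,000 bytes.
TrafficSample computeTraffic(
  std::uint64_t prev_bytes, std::uint64_t curr_bytes, double elapsed_s,
  double capacity_mb_per_s)
{
  if (elapsed_s <= 0.0 || curr_bytes < prev_bytes) {
    return {0.0, 0.0};  // no interval, or the counter was reset
  }
  const double mb_per_s =
    static_cast<double>(curr_bytes - prev_bytes) / elapsed_s / 1'000'000.0;
  const double usage = capacity_mb_per_s > 0.0 ? mb_per_s / capacity_mb_per_s : 0.0;
  return {mb_per_s, usage};
}
```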

## <u>Network Traffic</u>

/diagnostics/net_monitor: Network Traffic

<b>[summary]</b>

| level | message |
| ----- | ------- |
| OK | OK |

<b>[values] program</b>

| key | value (example) |
| -------------------------------- | ------------------------------------------- |
| nethogs [0-9]: PROGRAM | /lambda/greengrassSystemComponents/1384/999 |
| nethogs [0-9]: SENT (KB/Sec) | 1.13574 |
| nethogs [0-9]: RECEIVED (KB/Sec) | 0.261914 |

<b>[values] all</b>

| key | value (example) |
| --------------------- | -------------------------------------------------------------- |
| nethogs: all (KB/Sec) | python3.7/1520/999 0.274414 0.354883 |
| | /lambda/greengrassSystemComponents/1299/999 0.487305 0.0966797 |
| | sshd: muser@pts/5/15917/1002 0.396094 0.0585938 |
| | /usr/bin/python3.7/2371/999 0 0 |
| | /greengrass/ggc/packages/1.10.0/bin/daemon/906/0 0 0 |
| | python3.7/4362/999 0 0 |
| | unknown TCP/0/0 0 0 |

<b>[values] error</b>

| key | value (example) |
| ----- | ----------------------------------------------------- |
| error | [nethogs -t] execve failed: No such file or directory |
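The `[values] all` example suggests each `nethogs -t` line holds `program/pid/uid` followed by the sent and received rates in KB/s, and that program names can contain spaces (`unknown TCP`, `sshd: muser@pts/5`) as well as slashes. A minimal parser sketch under that assumption; `NethogsEntry` and `parseNethogsLine` are hypothetical names, not part of `system_monitor`.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one nethogs trace line.
struct NethogsEntry
{
  std::string program;  // e.g. "/lambda/greengrassSystemComponents"
  int pid = 0;
  int uid = 0;
  double sent_kbps = 0.0;
  double received_kbps = 0.0;
};

std::optional<NethogsEntry> parseNethogsLine(const std::string & line)
{
  std::istringstream iss(line);
  std::vector<std::string> tokens;
  for (std::string tok; iss >> tok;) {
    tokens.push_back(tok);
  }
  if (tokens.size() < 3) {
    return std::nullopt;  // need name, sent, received
  }

  // Program names may contain spaces, so rejoin all but the last two tokens.
  std::string name = tokens[0];
  for (std::size_t i = 1; i + 2 < tokens.size(); ++i) {
    name += " " + tokens[i];
  }

  // name is "<program>/<pid>/<uid>"; split on the last two slashes.
  const std::size_t uid_pos = name.rfind('/');
  if (uid_pos == std::string::npos || uid_pos == 0) {
    return std::nullopt;
  }
  const std::size_t pid_pos = name.rfind('/', uid_pos - 1);
  if (pid_pos == std::string::npos) {
    return std::nullopt;
  }

  try {
    NethogsEntry e;
    e.program = name.substr(0, pid_pos);
    e.pid = std::stoi(name.substr(pid_pos + 1, uid_pos - pid_pos - 1));
    e.uid = std::stoi(name.substr(uid_pos + 1));
    e.sent_kbps = std::stod(tokens[tokens.size() - 2]);
    e.received_kbps = std::stod(tokens.back());
    return e;
  } catch (const std::exception &) {
    return std::nullopt;  // non-numeric pid, uid, or rate
  }
}
```

Splitting from the right keeps paths such as `/lambda/greengrassSystemComponents` intact, since only the last two slashes separate the PID and UID.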
31 changes: 31 additions & 0 deletions system/system_monitor/docs/traffic_reader.md
@@ -0,0 +1,31 @@
# traffic_reader

## Name

traffic_reader - monitor network traffic by process

## Synopsis

traffic_reader [OPTION]

## Description

Monitors network traffic by process.<br>
It runs as a daemon process and listens on a TCP/IP port (7636 by default).

**Options:**<br>
_-h, --help_<br>
&nbsp;&nbsp;&nbsp;&nbsp;Display help<br>
_-p, --port #_<br>
&nbsp;&nbsp;&nbsp;&nbsp;Port number to listen on

**Exit status:**<br>
Returns 0 if OK; non-zero otherwise.

## Notes

`traffic_reader` requires the `nethogs` command.<br>

## Operation confirmed platform

- Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-40-generic x86_64)
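A minimal sketch of parsing the documented options, with the default port taken from `TRAFFIC_READER_PORT` (7636). The `Options` struct and `parseOptions` function are hypothetical; the actual program may use `getopt_long` or another mechanism.

```cpp
#include <cstdlib>
#include <string>

// Default mirrors TRAFFIC_READER_PORT in traffic_reader.hpp.
constexpr int kDefaultPort = 7636;

struct Options
{
  bool help = false;
  int port = kDefaultPort;
  bool ok = true;  // false if an argument could not be parsed
};

// Hypothetical parser for the documented options: -h/--help and -p/--port #.
Options parseOptions(int argc, const char * const argv[])
{
  Options opts;
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (arg == "-h" || arg == "--help") {
      opts.help = true;
    } else if ((arg == "-p" || arg == "--port") && i + 1 < argc) {
      char * end = nullptr;
      const long port = std::strtol(argv[++i], &end, 10);
      if (*end != '\0' || port <= 0 || port > 65535) {
        opts.ok = false;  // not a valid TCP port number
      } else {
        opts.port = static_cast<int>(port);
      }
    } else {
      opts.ok = false;  // unknown option or missing port value
    }
  }
  return opts;
}
```

A caller would print usage and exit with 0 when `help` is set, and exit with a non-zero status when `ok` is false, matching the documented exit status.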
@@ -97,6 +97,14 @@ class CPUMonitorBase : public rclcpp::Node
virtual void checkUsage(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief convert CPU usage to diagnostic level
* @param [in] cpu_name mpstat CPU name
* @param [in] usage CPU usage value
* @return DiagStatus::OK or WARN or ERROR
*/
virtual int CpuUsageToLevel(const std::string & cpu_name, float usage);

/**
* @brief check CPU load average
* @param [out] stat diagnostic message passed directly to diagnostic publish calls
@@ -128,12 +136,12 @@ class CPUMonitorBase : public rclcpp::Node
int num_cores_; //!< @brief number of cores
std::vector<cpu_temp_info> temps_; //!< @brief CPU list for temperature
std::vector<cpu_freq_info> freqs_; //!< @brief CPU list for frequency
std::vector<int> usage_check_cnt_; //!< @brief CPU list for usage over check counter
bool mpstat_exists_; //!< @brief flag if mpstat exists

float temp_warn_; //!< @brief CPU temperature(DegC) to generate warning
float temp_error_; //!< @brief CPU temperature(DegC) to generate error
float usage_warn_; //!< @brief CPU usage(%) to generate warning
float usage_error_; //!< @brief CPU usage(%) to generate error
int usage_count_; //!< @brief consecutive count of CPU usage over threshold
bool usage_avg_; //!< @brief Check CPU usage calculated as averages among all processors

/**
@@ -32,12 +32,12 @@
*/
struct HDDParam
{
float temp_warn_; //!< @brief HDD temperature(DegC) to generate warning
float temp_error_; //!< @brief HDD temperature(DegC) to generate error
float usage_warn_; //!< @brief HDD usage(%) to generate warning
float usage_error_; //!< @brief HDD usage(%) to generate error
float temp_warn_; //!< @brief HDD temperature(DegC) to generate warning
float temp_error_; //!< @brief HDD temperature(DegC) to generate error
int free_warn_; //!< @brief HDD free space(MB) to generate warning
int free_error_; //!< @brief HDD free space(MB) to generate error

HDDParam() : temp_warn_(55.0), temp_error_(70.0), usage_warn_(0.95), usage_error_(0.99) {}
HDDParam() : temp_warn_(55.0), temp_error_(70.0), free_warn_(5120), free_error_(100) {}
};

class HDDMonitor : public rclcpp::Node
@@ -74,6 +74,13 @@ class HDDMonitor : public rclcpp::Node
void checkUsage(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief convert a human-readable size string to MB
* @param [in] str human-readable size string
* @return size in megabytes
*/
int HumanReadableToMegaByte(const std::string & str);

/**
* @brief get HDD parameters
*/
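`HumanReadableToMegaByte` converts a human-readable size string to megabytes, but its implementation is not part of this page. The sketch below is therefore an assumption: it accepts `df -h`-style strings such as `5.0G` or `100M` with binary (1024-based) suffixes, truncates to whole megabytes, and returns -1 on bad input. The name `humanReadableToMegabytes` is hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts "5.0G", "100M", "512K", ... to whole MB,
// assuming 1024-based units. Returns -1 on parse failure.
int humanReadableToMegabytes(const std::string & str)
{
  double value = 0.0;
  std::size_t pos = 0;
  try {
    value = std::stod(str, &pos);
  } catch (const std::exception &) {
    return -1;  // not a number (invalid_argument / out_of_range)
  }

  const std::string suffix = str.substr(pos);
  if (suffix.empty() || suffix == "B") {
    value /= 1024.0 * 1024.0;  // bytes -> MB
  } else if (suffix == "K") {
    value /= 1024.0;  // KB -> MB
  } else if (suffix == "M") {
    // already MB
  } else if (suffix == "G") {
    value *= 1024.0;  // GB -> MB
  } else if (suffix == "T") {
    value *= 1024.0 * 1024.0;  // TB -> MB
  } else {
    return -1;  // unknown unit
  }

  if (value < 0.0 || value > static_cast<double>(INT_MAX)) {
    return -1;  // negative or does not fit in int
  }
  return static_cast<int>(value);  // truncates toward zero
}
```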
@@ -63,8 +63,7 @@ class MemMonitor : public rclcpp::Node

char hostname_[HOST_NAME_MAX + 1]; //!< @brief host name

float usage_warn_; //!< @brief Memory usage(%) to generate warning
float usage_error_; //!< @brief Memory usage(%) to generate error
size_t available_size_; //!< @brief Memory available size(MB) to generate error

/**
* @brief Memory usage status messages
@@ -77,6 +77,15 @@ class NetMonitor : public rclcpp::Node
void checkUsage(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief monitor traffic
* @param [out] stat diagnostic message passed directly to diagnostic publish calls
* @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
* to pass diagnostic message updated in this function to diagnostic publish calls.
*/
void monitorTraffic(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief get wireless speed
* @param [in] ifa_name interface name
@@ -92,7 +101,9 @@ class NetMonitor : public rclcpp::Node
std::vector<std::string> device_params_; //!< @brief list of devices
NL80211 nl80211_; // !< @brief 802.11 netlink-based interface

float usage_warn_; //!< @brief Network usage(%) to generate warning
std::string monitor_program_; //!< @brief nethogs monitor program name
bool nethogs_all_; //!< @brief nethogs result all mode
int traffic_reader_port_; //!< @brief port number to connect to traffic_reader

/**
* @brief Network usage status messages
55 changes: 55 additions & 0 deletions system/system_monitor/include/traffic_reader/traffic_reader.hpp
@@ -0,0 +1,55 @@
// Copyright 2021 Tier IV, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/**
* @file traffic_reader.hpp
* @brief traffic reader definitions
*/

#ifndef TRAFFIC_READER__TRAFFIC_READER_HPP_
#define TRAFFIC_READER__TRAFFIC_READER_HPP_

#include <boost/serialization/serialization.hpp>
#include <boost/serialization/string.hpp>

#include <string>
#include <string_view>

/**
* @brief traffic information
*/
struct TrafficReaderResult
{
int error_code_; //!< @brief error code, 0 on success, otherwise error
std::string str_; //!< @brief nethogs result string

/**
* @brief Load or save data members.
* @param [inout] ar archive reference to load or save the serialized data members
* @param [in] version version for the archive
* @note NOLINT syntax is needed since this is an interface to serialization and
* used inside boost serialization.
*/
template <typename archive>
void serialize(archive & ar, const unsigned /*version*/) // NOLINT(runtime/references)
{
ar & error_code_;
ar & str_;
}
};

constexpr std::string_view GET_ALL_STR{"<All>"}; //!< @brief nethogs result all request string

constexpr int TRAFFIC_READER_PORT = 7636; //!< @brief traffic reader port: 7634-7647 Unassigned

#endif // TRAFFIC_READER__TRAFFIC_READER_HPP_
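`GET_ALL_STR` is the request string for the full nethogs result, while `monitor_program` (`greengrass` in `net_monitor.param.yaml`) names the program to report. One plausible way a reader could answer either request is sketched below, assuming the program name is matched as a substring of each nethogs line. `filterNethogsOutput` is hypothetical, and the real request handling is not shown on this page.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Mirrors the constant in traffic_reader.hpp: request for the whole result.
constexpr std::string_view GET_ALL_STR{"<All>"};

// Hypothetical request handler: given raw nethogs output (one process per
// line), return all of it for "<All>", otherwise only the lines whose text
// contains the requested program name.
std::string filterNethogsOutput(const std::string & output, const std::string & request)
{
  if (std::string_view(request) == GET_ALL_STR) {
    return output;
  }
  std::istringstream iss(output);
  std::string result;
  for (std::string line; std::getline(iss, line);) {
    if (line.find(request) != std::string::npos) {
      result += line;
      result += '\n';
    }
  }
  return result;
}
```

With `monitor_program: "greengrass"`, this would keep lines such as `/lambda/greengrassSystemComponents/1299/999 ...` and drop the rest.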