Instrumentation and monitoring tool
The UCX library provides a tool for analyzing UCX-based applications at runtime. The tool represents each process that uses the UCX library in a virtual filesystem (VFS). The VFS directory hierarchy reflects the relations between UCX library objects. The files grouped in a directory describe the properties of one UCX library object, and the content of each file is the value of a specific property of that object.
The tool is based on the Filesystem in Userspace (FUSE) interface. The FUSE v3 development package is required to build the tool. If the tool was built successfully, its binary is located in the UCX install directory. To enable analysis of UCX-based applications, launch the daemon process with the following command:
$ <path_to_ucx_install_dir>/bin/ucx_vfs
Each running process that uses the UCX library has a corresponding directory, /tmp/ucx/<PID>.
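As an illustration, the per-process directories can be enumerated with a few lines of Python. This is a minimal sketch, not part of UCX: the function name `list_ucx_pids` is hypothetical, and the code assumes only what is stated above, namely that each process appears as a directory named after its PID under /tmp/ucx.

```python
from pathlib import Path


def list_ucx_pids(root="/tmp/ucx"):
    """Return the sorted PIDs that have a directory under the VFS root.

    `root` defaults to the location named in this document; entries that
    are not directories or whose names are not numeric are ignored.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # The daemon is probably not running, or no UCX process exists.
        return []
    return sorted(
        int(entry.name)
        for entry in root_path.iterdir()
        if entry.is_dir() and entry.name.isdigit()
    )
```

Calling `list_ucx_pids()` while the daemon is running should return one PID per UCX-based process. An empty list suggests that the daemon is not running or that no such process exists.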
If you no longer want to analyze your applications, stop the daemon with the following command:
$ <path_to_ucx_install_dir>/bin/ucx_vfs stop
The directory /tmp/ucx/<PID> represents the usage of the UCX library by the corresponding process. It contains three grouping sub-directories: UCP, UCT, and UCS. A directory either represents a UCX library object or groups objects of the same type or properties of an object. A file describes a property of a UCX library object.
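Because every property is an ordinary file, a whole process directory can be captured with a generic directory walk. The following is a minimal sketch under that assumption. The helper name `snapshot_properties` is hypothetical, and the code does not rely on any particular layout below the process directory.

```python
import os


def snapshot_properties(pid_dir):
    """Map each property file (path relative to pid_dir) to its content.

    Directories are treated only as grouping nodes; every regular file
    found below pid_dir is read as text. Files that cannot be read
    (for example because the object disappeared) map to None.
    """
    properties = {}
    for dirpath, _dirnames, filenames in os.walk(pid_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, pid_dir)
            try:
                with open(full_path, "r", errors="replace") as handle:
                    properties[rel_path] = handle.read().strip()
            except OSError:
                properties[rel_path] = None
    return properties
```

The keys of the returned dictionary mirror the object hierarchy, so filtering them by prefix selects the properties of a single object or group.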
File name | Description |
---|---|
mem_address | Memory address of the pointer |
File name | Description |
---|---|
error_mode | Error handling mode |
local_address/IPv[4|6] * | Local address: IP address |
local_address/port * | Local address: port |
mem_address | Memory address of the pointer |
peer_name | Remote worker address name |
remote_address/IPv[4|6] * | Peer address: IP address |
remote_address/port * | Peer address: port |
\* The [local|remote]_address directory is created only for endpoints created in client-server mode.
File name | Description |
---|---|
ip | Listening socket address: IP address |
port | Listening socket address: Port number |
File name | Description |
---|---|
address_name | Worker address name composed of host name and process id |
counters/ep_closures | Number of endpoint closures |
counters/ep_creations | Number of requests to create endpoint |
counters/ep_creation_failures | Number of failed requests to create endpoint |
counters/ep_failures | Number of failed endpoints |
keepalive/ep_count | Keepalive: Number of endpoints processed in current time slot |
keepalive/round_count | Keepalive: Number of rounds done |
mem_address | Memory address of the pointer |
num_all_eps | Number of all endpoints (except internal endpoints) |
thread_mode | Thread safety mode with which the worker and its associated resources are created |
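The counter files listed above lend themselves to periodic sampling. The sketch below reads the `counters` sub-directory of a worker directory and computes the change between two samples. The function names are hypothetical, the path of the worker directory is left to the caller, and the code assumes that each counter file contains a single decimal integer, which this document does not state explicitly.

```python
from pathlib import Path


def read_counters(worker_dir):
    """Read every file in <worker_dir>/counters as an integer.

    Assumption: each counter file holds one decimal integer. Files whose
    content does not parse that way are skipped rather than guessed at.
    """
    counters = {}
    for entry in sorted((Path(worker_dir) / "counters").iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except ValueError:
            continue
    return counters


def counter_delta(before, after):
    """Return how much each counter in `after` grew since `before`."""
    return {name: value - before.get(name, 0) for name, value in after.items()}
```

Sampling `read_counters` twice and passing both results to `counter_delta` shows, for example, how many endpoint creation requests were issued during the interval.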
File name | Description |
---|---|
log_level | Log level above which log messages will be printed |
File name | Description |
---|---|
all | Memory tracking output. Count and size of objects created by the library |
File name | Description |
---|---|
gc_list/length | Number of regions queued for destruction because they could not be destroyed from memhook |
inv_q/length | Number of regions which were invalidated during memory events |
max_regions | Maximum number of regions |
max_size | Maximum total size of regions |
num_regions | Total number of managed regions |
regions_distribution/threshold/count | Number of regions with a size smaller than threshold |
regions_distribution/threshold/total_size | Total size of regions with a size smaller than threshold |
total_size | Total size of registered memory |
File name | Description |
---|---|
qp_num | Queue pair number |
File name | Description |
---|---|
available | Number of available queue pairs |
unsignaled | Number of unsignaled completions |
qp_num | Queue pair number |
sw_pi | Producer index for next work queue entry |
prev_sw_pi | Producer index where last WQE started |
qstart | Pointer to the beginning of the queue |
qend | Pointer to the end of the queue |
bb_max | Maximum building block number |
sig_pi | Producer index for last signaled WQE |
hw_ci | Consumer index |
File name | Description |
---|---|
rx_available | Available credit for rx queue (UD only) |
rx_qp_len | Length of qp rx queue (UD only) |
tx_available | Available credit for tx queue (UD only) |
tx_qp_len | Length of qp tx queue (UD only) |
The presence of a file means that the interface supports the corresponding feature.
File name | Description |
---|---|
am_bcopy | Buffered active message |
am_dup | Active messages may be received with duplicates |
am_short | Short active message |
am_zcopy | Zero-copy active message |
atomic_cpu | Atomic communications are consistent with respect to CPU operations |
atomic_device | Atomic communications are consistent only with respect to other atomics on the same device |
cb_async | Supports setting a callback which will be invoked within a reasonable amount of time if uct_worker_progress() is not being called |
cb_sync | Supports setting a callback which is invoked only from the calling context of uct_worker_progress() |
connect_to_ep | Supports connecting to specific endpoint |
connect_to_iface | Supports connecting to interface |
connect_to_sockaddr | Supports connecting to sockaddr |
ep_check | Endpoint check |
ep_keepalive | Transport endpoint has built-in keepalive feature |
errhandle_am_id | Invalid AM id on remote |
errhandle_bcopy_buf | Invalid buffer for buffered operation |
errhandle_bcopy_len | Invalid length for buffered operation |
errhandle_peer_failure | Remote peer failures/outage |
errhandle_remote_mem | Remote memory access |
errhandle_short_buf | Invalid buffer for short operation |
errhandle_zcopy_buf | Invalid buffer for zero copy operation |
get_bcopy | Buffered get |
get_short | Short get |
get_zcopy | Zero-copy get |
pending | Pending operations |
put_bcopy | Buffered put |
put_short | Short put |
put_zcopy | Zero-copy put |
tag_eager_bcopy | Hardware tag matching buffered eager support |
tag_eager_short | Hardware tag matching short eager support |
tag_eager_zcopy | Hardware tag matching zero-copy eager support |
tag_rndv_zcopy | Hardware tag matching rendezvous zero-copy support |
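Since a capability is signalled by the mere presence of a file, checking for a feature reduces to listing a directory. The sketch below illustrates this. The function names are hypothetical, and the directory that holds the capability files must be supplied by the caller, because its exact location is not given here.

```python
import os


def supported_features(flags_dir):
    """Return the set of feature names present as files in flags_dir."""
    return {
        name
        for name in os.listdir(flags_dir)
        if os.path.isfile(os.path.join(flags_dir, name))
    }


def missing_features(flags_dir, required):
    """Return the required feature names that have no file in flags_dir."""
    return sorted(set(required) - supported_features(flags_dir))
```

For instance, `missing_features(flags_dir, ["am_short", "put_zcopy"])` returns an empty list only if the interface advertises both features.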
File name | Description |
---|---|
align_mtu | MTU used for alignment |
max_bcopy | Total maximum size (including header) for buffered active message |
max_hdr | Maximum header size for zero-copy active message |
max_iov | Maximum number of elements in iov for zero-copy active message |
max_short | Total maximum size (including header) for short active message |
max_zcopy | Total maximum size (including header) for zero-copy active message |
min_zcopy | Minimum size for zero-copy active message |
opt_zcopy_align | Optimal alignment for zero-copy buffer address |
File name | Description |
---|---|
align_mtu | MTU used for alignment |
max_bcopy | Total maximum size (including header) for buffered get |
max_iov | Maximum number of elements in iov for zero-copy get |
max_short | Total maximum size (including header) for short get |
max_zcopy | Total maximum size (including header) for zero-copy get |
min_zcopy | Minimum size for zero-copy get |
opt_zcopy_align | Optimal alignment for zero-copy buffer address |
File name | Description |
---|---|
align_mtu | MTU used for alignment |
max_bcopy | Total maximum size (including header) for buffered put |
max_iov | Maximum number of elements in iov for zero-copy put |
max_short | Total maximum size (including header) for short put |
max_zcopy | Total maximum size (including header) for zero-copy put |
min_zcopy | Minimum size for zero-copy put |
opt_zcopy_align | Optimal alignment for zero-copy buffer address |
File name | Description |
---|---|
local_cpus | Mask of CPUs near the resource |
reg_cost | Memory registration cost estimation (time, seconds) as a linear function of the buffer size |
rkey_packed_size | Size of buffer needed for packed rkey |
File name | Description |
---|---|
access_mem_types | Memory types that Memory Domain can access |
alloc_mem_types | Bitmap of memory types that Memory Domain can allocate memory on |
detect_mem_types | Bitmap of memory types for which Memory Domain can detect whether an address belongs to it |
max_alloc | Maximum allocation size |
max_reg | Maximum registration size |
reg_mem_types | Bitmap of memory types that can be registered with Memory Domain |
File name | Description |
---|---|
advise | Memory advice support |
alloc | Memory allocation support |
fixed | Memory allocation with fixed address support |
invalidate | Memory invalidation support |
need_memh | The transport needs a valid local memory handle for zero-copy operations |
need_rkey | The transport needs a valid remote memory key for remote memory operations |
reg | Memory registration support |
rkey_ptr | Direct access to remote memory via a pointer that is returned by uct_rkey_ptr |
sockaddr | Client-server connection establishment via sockaddr support |