Kata 2.0 Metrics Design

Kata implements the CRI API and supports the ContainerStats and ListContainerStats interfaces to expose container metrics. Users can use these interfaces to get basic metrics about containers.

Unlike runc, Kata is a VM-based runtime and has a different architecture.

Limitations of Kata 1.x and target of Kata 2.0

Kata 1.x has a number of limitations related to observability that may be obstacles to running Kata Containers at scale.

In Kata 2.0, the following components will be able to provide more details about the system:

containerd shim v2 (effectively kata-runtime)
Hypervisor statistics
Agent process
Guest OS statistics

Note: In Kata 1.x, the main user-facing component was the runtime (kata-runtime). From 1.5, Kata introduced the Kata containerd shim v2 (containerd-shim-kata-v2) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.

For Kata 2.0, the main component is the Kata containerd shim v2, although the deprecated kata-runtime binary will be maintained for a period of time.

Any mention of the "Kata runtime" in this document should be taken to refer to the Kata containerd shim v2 unless explicitly noted otherwise (for example by referring to it explicitly as the kata-runtime binary).

Metrics architecture

Kata 2.0 metrics depend strongly on Prometheus, a CNCF graduated project.

Kata Containers 2.0 introduces a new Kata component called kata-monitor which is used to monitor the Kata components on the host. It's shipped with the Kata runtime to provide an interface to:

Get metrics
Get events

At present, kata-monitor supports retrieval of metrics only: this is what will be covered in this document.

This is the architecture overview of metrics in Kata Containers 2.0:

And the sequence diagram is shown below:

For a quick evaluation, you can check out this how-to.

Kata monitor

The kata-monitor management agent should be started on each node where the Kata Containers runtime is installed.

Note: a node running Kata Containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.

kata-monitor will:

Aggregate the metrics of the sandboxes running on the node, adding the sandbox_id label to them.
Attach the additional cri_uid, cri_name and cri_namespace labels to the sandbox metrics, tracking the uid, name and namespace Kubernetes pod metadata.
Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This reduces the number of targets in Prometheus and avoids exposing the shim's metrics by ip:port.
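The labeling step can be illustrated with a minimal sketch. It is not the kata-monitor implementation: it assumes the shim returns metrics in the Prometheus text exposition format, the helper name add_labels is hypothetical, and label values are not escaped.

```python
import re

# Matches a sample line: metric name, optional {labels}, then the value part.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(.*)\})?(\s+.+)$')


def add_labels(metrics_text, extra):
    """Return metrics_text with the labels in `extra` appended to every sample."""
    extra_str = ",".join(f'{k}="{v}"' for k, v in extra.items())
    out = []
    for line in metrics_text.splitlines():
        m = _SAMPLE.match(line)
        if line.startswith("#") or not m:
            out.append(line)  # keep HELP/TYPE comments and blank lines as-is
            continue
        name, _, labels, rest = m.groups()
        merged = f"{labels},{extra_str}" if labels else extra_str
        out.append(f"{name}{{{merged}}}{rest}")
    return "\n".join(out)
```

For example, add_labels('kata_shim_fds 12', {"sandbox_id": "abc"}) yields a sample line carrying the sandbox_id label; the cri_uid, cri_name and cri_namespace labels could be passed in the same dictionary.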

Only one kata-monitor process runs in each node.

kata-monitor uses a different communication channel from the one used by the container engine (containerd/CRI-O) to communicate with the Kata shim. The Kata shim exposes a dedicated socket address reserved for kata-monitor.

The shim's metrics socket file is created under the virtcontainers sandboxes directory, i.e. vc/sbs/${PODID}/shim-monitor.sock.

Note: If there is no Prometheus server configured, i.e., there are no scrape operations, kata-monitor will not collect any metrics.

Kata runtime

The Kata runtime is responsible for:

Gathering metrics about the shim process
Gathering metrics about the hypervisor process
Gathering metrics about the running sandbox
Getting metrics from the Kata agent (through ttrpc)

Kata agent

The Kata agent is responsible for:

Gathering agent process metrics
Gathering guest OS metrics

In Kata 2.0, the agent adds a new interface:

rpc GetMetrics(GetMetricsRequest) returns (Metrics);

message GetMetricsRequest {}

message Metrics {
	string metrics = 1;
}

The metrics field contains Prometheus-encoded content. This avoids defining a fixed structure in protocol buffers.
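As an illustration of why a plain string is sufficient, the sketch below shows how a consumer could read such a payload. It assumes the content uses the Prometheus text exposition format without timestamps; parse_samples is a hypothetical helper, not part of Kata.

```python
import re

# Matches one label pair such as item="rchar", allowing escaped characters.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(metrics):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in metrics.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: no trailing timestamp, so the last token is the value.
        head, value = line.rsplit(None, 1)
        name, _, label_part = head.strip().partition("{")
        labels = dict(_LABEL.findall(label_part))
        samples.append((name, labels, float(value)))
    return samples
```

Because the label set travels inside the string, new metrics or labels can be added on the agent side without changing the protocol buffers definition.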

Performance and overhead

Metrics should not become a bottleneck for the system or degrade its performance: they should be collected with minimal overhead.

Requirements:

Metrics MUST be quick to collect
Metrics MUST be small
Metrics MUST be generated only if there are subscribers to the Kata metrics service
Metrics MUST be stateless

In Kata 2.0, metrics are collected only when needed (pull mode), mainly from the /proc filesystem, and consumed by Prometheus. This means that if the Prometheus collector is not running (so no one cares about the metrics) the overhead will be zero.

The metrics service also doesn't hold any metrics in memory.
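The pull model can be sketched as follows. This is a simplified illustration rather than Kata code: the metric name demo_io_stat and both function names are hypothetical, and the only input is the content of a /proc/<pid>/io file read at request time.

```python
def render_io_stat(proc_io_text, sandbox_id, metric="demo_io_stat"):
    """Render the content of a /proc/<pid>/io file as Prometheus text lines.

    Nothing is cached: each call works only from the text passed in.
    """
    lines = [f"# TYPE {metric} gauge"]
    for row in proc_io_text.splitlines():
        key, sep, value = row.partition(":")
        if not sep:
            continue  # ignore lines that are not "key: value"
        lines.append(
            f'{metric}{{item="{key.strip()}",sandbox_id="{sandbox_id}"}} {int(value)}'
        )
    return "\n".join(lines) + "\n"


def scrape(pid, sandbox_id):
    """Read /proc/<pid>/io on demand; meant to run only when a scrape arrives."""
    with open(f"/proc/{pid}/io") as f:
        return render_io_stat(f.read(), sandbox_id)
```

Since scrape does its work only when called and keeps no state between calls, no cost is incurred while no collector is asking for metrics.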

Metrics size

Item	No Sandbox	1 Sandbox	2 Sandboxes
Metrics count	39	106	173
Metrics size (uncompressed)	9K	144K	283K
Metrics size (`gzipped`)	2K	10K	17K

Metrics size: response size of one Prometheus scrape request, in kilobytes (K).

It's easy to estimate the size of the response to one metrics fetch request issued by Prometheus. Using the sizes measured above (in K), the expected size when no gzip compression is in place is:
9 + (144 - 9) * number of kata sandboxes

Prometheus supports gzip compression. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * number of kata sandboxes

Example
We have 10 sandboxes running on a node. The expected size of the response to one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = 1359K (about 1.36M)

If gzip compression is enabled:
2 + (10 - 2) * 10 = 82K
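The estimate can be written as a small helper. This is only a sketch of the arithmetic: the function name is hypothetical and the default values are the measured sizes quoted in this section, which will differ in other environments.

```python
def estimated_size_k(sandboxes, base_k=9, one_sandbox_k=144):
    """Estimate the scrape response size in K for a number of sandboxes.

    base_k is the size with no sandbox, one_sandbox_k the size with one,
    so (one_sandbox_k - base_k) is the cost added by each sandbox.
    """
    return base_k + (one_sandbox_k - base_k) * sandboxes


# Uncompressed and gzipped estimates for 10 sandboxes.
plain_k = estimated_size_k(10)
gzip_k = estimated_size_k(10, base_k=2, one_sandbox_k=10)
print(plain_k, gzip_k)
```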

Metrics delay

Here is some test data:

End-to-end (from the Prometheus server sending a request to kata-monitor until kata-monitor writes the response back): 20ms (avg)
Agent (RPC call from shim to agent): 3ms (avg)

Test infrastructure:

OS: Ubuntu 20.04
Hardware: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz, 6 Cores, and 16GB memory.

Scrape interval

Prometheus default scrape_interval is 1 minute, but it is usually set to 15 seconds. A smaller scrape_interval causes more overhead, so users should set it depending on their monitoring needs.
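To get a rough feel for the trade-off, the sketch below estimates how much metrics data one node sends per hour for a given scrape_interval. It covers transferred bytes only, not the CPU cost of collection; the function name is hypothetical and the 82K response size is just an example value.

```python
def hourly_volume_k(response_size_k, scrape_interval_s):
    """Return the data volume in K transferred per hour for one target."""
    scrapes_per_hour = 3600 / scrape_interval_s
    return response_size_k * scrapes_per_hour


# Compare a 15 second interval with the 1 minute default.
for interval in (15, 60):
    print(interval, hourly_volume_k(82, interval))
```

With these inputs, a 15 second interval transfers four times as much data per hour as a 60 second interval.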

Metrics list

All the metrics supported by Kata 2.0 are listed here. Some metrics depend on the VM guest kernel, so the available ones may differ based on the environment.

Metrics are categorized by the component from/for which the metrics are collected.

Metric types
Kata agent metrics
Firecracker metrics
Kata guest OS metrics
Hypervisor metrics
Kata monitor metrics
Kata containerd shim v2 metrics

Note:

Labels here do not include the instance and job labels added by Prometheus.

Notes about metrics unit

Kibibytes, abbreviated KiB. 1 KiB equals 1024 B.

For some metrics (like network device statistics from the file /proc/net/dev), the unit depends on the label (for example, recv_bytes and recv_packets have different units).

Most of these metrics are collected from the /proc filesystem, so the unit of each metric matches the unit of the relevant /proc entry. See the proc(5) manual page for further details.

Metric types

Prometheus offers four core metric types.

Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
Gauge: A gauge metric represents a single numerical value that can go up and down, typically used for measured values like current memory usage.
Histogram: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.
Summary: A summary samples observations like a histogram, and it can also calculate configurable quantiles over a sliding time window.

See Prometheus metric types for detailed explanations about these metric types.
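Histogram metrics, such as the RPC and scrape duration metrics listed later, are exposed by Prometheus as cumulative buckets: each bucket counts all observations less than or equal to its upper bound. The sketch below illustrates that bookkeeping; the class is a hypothetical stand-in, not the Prometheus client library.

```python
import bisect


class Histogram:
    """Minimal histogram with fixed upper bounds and a final +Inf bucket."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose upper bound still contains the observation.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs."""
        out, running = [], 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            out.append((bound, running))
        return out
```

The count of the +Inf bucket always equals the total number of observations, which is why a histogram can also report an overall count and sum.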

Kata agent metrics

Agent metrics contain metrics about the agent process.

Metric name	Type	Units	Labels	Introduced in Kata version
`kata_agent_io_stat`: Agent process IO stat.	`GAUGE`		`item` (see `/proc/<pid>/io`) `cancelled_write_byte` `rchar` `read_bytes` `syscr` `syscw` `wchar` `write_bytes` `sandbox_id`	2.0.0
`kata_agent_proc_stat`: Agent process stat.	`GAUGE`		`item` (see `/proc/<pid>/stat`) `cstime` `cutime` `stime` `utime` `sandbox_id`	2.0.0
`kata_agent_proc_status`: Agent process status.	`GAUGE`		`item` (see `/proc/<pid>/status`) `hugetlbpages` `nonvoluntary_ctxt_switches` `rssanon` `rssfile` `rssshmem` `vmdata` `vmexe` `vmhwm` `vmlck` `vmlib` `vmpeak` `vmpin` `vmpte` `vmrss` `vmsize` `vmstk` `vmswap` `voluntary_ctxt_switches` `sandbox_id`	2.0.0
`kata_agent_process_cpu_seconds_total`: Total user and system CPU time spent in seconds.	`COUNTER`	`seconds`	`sandbox_id`	2.0.0
`kata_agent_process_max_fds`: Maximum number of open file descriptors.	`GAUGE`		`sandbox_id`	2.0.0
`kata_agent_process_open_fds`: Number of open file descriptors.	`GAUGE`		`sandbox_id`	2.0.0
`kata_agent_process_resident_memory_bytes`: Resident memory size in bytes.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_agent_process_start_time_seconds`: Start time of the process since `unix` epoch in seconds.	`GAUGE`	`seconds`	`sandbox_id`	2.0.0
`kata_agent_process_virtual_memory_bytes`: Virtual memory size in bytes.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_agent_scrape_count`: Metrics scrape count	`COUNTER`		`sandbox_id`	2.0.0
`kata_agent_total_rss`: Agent process total `rss` size	`GAUGE`		`sandbox_id`	2.0.0
`kata_agent_total_time`: Agent process total time	`GAUGE`		`sandbox_id`	2.0.0
`kata_agent_total_vm`: Agent process total `vm` size	`GAUGE`		`sandbox_id`	2.0.0

Firecracker metrics

Metrics for the Firecracker VMM.

Metric name	Type	Labels	Introduced in Kata version
`kata_firecracker_api_server`: Metrics related to the internal API server.	`GAUGE`	`item` `process_startup_time_cpu_us` `process_startup_time_us` `sync_response_fails` `sync_vmm_send_timeout_count` `sandbox_id`	2.0.0
`kata_firecracker_block`: Block Device associated metrics.	`GAUGE`	`item` `activate_fails` `cfg_fails` `event_fails` `execute_fails` `flush_count` `invalid_reqs_count` `no_avail_buffer` `queue_event_count` `rate_limiter_event_count` `rate_limiter_throttled_events` `read_bytes` `read_count` `update_count` `update_fails` `write_bytes` `write_count` `sandbox_id`	2.0.0
`kata_firecracker_get_api_requests`: Metrics specific to GET API Requests for counting user triggered actions and/or failures.	`GAUGE`	`item` `instance_info_count` `instance_info_fails` `machine_cfg_count` `machine_cfg_fails` `sandbox_id`	2.0.0
`kata_firecracker_i8042`: Metrics specific to the i8042 device.	`GAUGE`	`item` `error_count` `missed_read_count` `missed_write_count` `read_count` `reset_count` `write_count` `sandbox_id`	2.0.0
`kata_firecracker_latencies_us`: Performance metrics related for the moment only to snapshots.	`GAUGE`	`item` `diff_create_snapshot` `full_create_snapshot` `load_snapshot` `pause_vm` `resume_vm` `vmm_diff_create_snapshot` `vmm_full_create_snapshot` `vmm_load_snapshot` `vmm_pause_vm` `vmm_resume_vm` `sandbox_id`	2.0.0
`kata_firecracker_logger`: Metrics for the logging subsystem.	`GAUGE`	`item` `log_fails` `metrics_fails` `missed_log_count` `missed_metrics_count` `sandbox_id`	2.0.0
`kata_firecracker_mmds`: Metrics for the MMDS functionality.	`GAUGE`	`item` `connections_created` `connections_destroyed` `rx_accepted` `rx_accepted_err` `rx_accepted_unusual` `rx_bad_eth` `rx_count` `tx_bytes` `tx_count` `tx_errors` `tx_frames` `sandbox_id`	2.0.0
`kata_firecracker_net`: Network-related metrics.	`GAUGE`	`item` `activate_fails` `cfg_fails` `event_fails` `mac_address_updates` `no_rx_avail_buffer` `no_tx_avail_buffer` `rx_bytes_count` `rx_count` `rx_event_rate_limiter_count` `rx_fails` `rx_packets_count` `rx_partial_writes` `rx_queue_event_count` `rx_rate_limiter_throttled` `rx_tap_event_count` `tap_read_fails` `tap_write_fails` `tx_bytes_count` `tx_count` `tx_fails` `tx_malformed_frames` `tx_packets_count` `tx_partial_reads` `tx_queue_event_count` `tx_rate_limiter_event_count` `tx_rate_limiter_throttled` `tx_spoofed_mac_count` `sandbox_id`	2.0.0
`kata_firecracker_patch_api_requests`: Metrics specific to PATCH API Requests for counting user triggered actions and/or failures.	`GAUGE`	`item` `drive_count` `drive_fails` `machine_cfg_count` `machine_cfg_fails` `network_count` `network_fails` `sandbox_id`	2.0.0
`kata_firecracker_put_api_requests`: Metrics specific to PUT API Requests for counting user triggered actions and/or failures.	`GAUGE`	`item` `actions_count` `actions_fails` `boot_source_count` `boot_source_fails` `drive_count` `drive_fails` `logger_count` `logger_fails` `machine_cfg_count` `machine_cfg_fails` `metrics_count` `metrics_fails` `network_count` `network_fails` `sandbox_id`	2.0.0
`kata_firecracker_rtc`: Metrics specific to the RTC device.	`GAUGE`	`item` `error_count` `missed_read_count` `missed_write_count` `sandbox_id`	2.0.0
`kata_firecracker_seccomp`: Metrics for the seccomp filtering.	`GAUGE`	`item` `num_faults` `sandbox_id`	2.0.0
`kata_firecracker_signals`: Metrics related to signals.	`GAUGE`	`item` `sigbus` `sigsegv` `sandbox_id`	2.0.0
`kata_firecracker_uart`: Metrics specific to the UART device.	`GAUGE`	`item` `error_count` `flush_count` `missed_read_count` `missed_write_count` `read_count` `write_count` `sandbox_id`	2.0.0
`kata_firecracker_vcpu`: Metrics specific to VCPUs' mode of functioning.	`GAUGE`	`item` `exit_io_in` `exit_io_out` `exit_mmio_read` `exit_mmio_write` `failures` `filter_cpuid` `sandbox_id`	2.0.0
`kata_firecracker_vmm`: Metrics specific to the machine manager as a whole.	`GAUGE`	`item` `device_events` `panic_count` `sandbox_id`	2.0.0
`kata_firecracker_vsock`: VSOCK-related metrics.	`GAUGE`	`item` `activate_fails` `cfg_fails` `conn_event_fails` `conns_added` `conns_killed` `conns_removed` `ev_queue_event_fails` `killq_resync` `muxer_event_fails` `rx_bytes_count` `rx_packets_count` `rx_queue_event_count` `rx_queue_event_fails` `rx_read_fails` `tx_bytes_count` `tx_flush_fails` `tx_packets_count` `tx_queue_event_count` `tx_queue_event_fails` `tx_write_fails` `sandbox_id`	2.0.0

Kata guest OS metrics

Metrics of the guest OS running inside the hypervisor.

Metric name	Type	Labels	Introduced in Kata version
`kata_guest_cpu_time`: Guest CPU stat.	`GAUGE`	`cpu` (CPU no. and total for all CPUs) `0` (CPU 0) `1` (CPU 1) `total` (for all CPUs) `item` (Kernel/system statistics, from `/proc/stat`) `guest` `guest_nice` `idle` `iowait` `irq` `nice` `softirq` `steal` `system` `user` `sandbox_id`	2.0.0
`kata_guest_diskstat`: Disks stat in system.	`GAUGE`	`disk` (disk name) `item` (see `/proc/diskstats`) `discards` `discards_merged` `flushes` `in_progress` `merged` `reads` `sectors_discarded` `sectors_read` `sectors_written` `time_discarding` `time_flushing` `time_in_progress` `time_reading` `time_writing` `weighted_time_in_progress` `writes` `writes_merged` `sandbox_id`	2.0.0
`kata_guest_load`: Guest system load.	`GAUGE`	`item` `load1` `load15` `load5` `sandbox_id`	2.0.0
`kata_guest_meminfo`: Statistics about memory usage on the system.	`GAUGE`	`item` (see `/proc/meminfo`) `active` `active_anon` `active_file` `anon_hugepages` `anon_pages` `bounce` `buffers` `cached` `cma_free` `cma_total` `commit_limit` `committed_as` `direct_map_1G` `direct_map_2M` `direct_map_4M` `direct_map_4k` `dirty` `hardware_corrupted` `high_free` `high_total` `hugepages_free` `hugepages_rsvd` `hugepages_surp` `hugepages_total` `hugepagesize` `hugetlb` `inactive` `inactive_anon` `inactive_file` `k_reclaimable` `kernel_stack` `low_free` `low_total` `mapped` `mem_available` `mem_free` `mem_total` `mlocked` `mmap_copy` `nfs_unstable` `page_tables` `per_cpu` `quicklists` `s_reclaimable` `s_unreclaim` `shmem` `shmem_hugepages` `shmem_pmd_mapped` `slab` `swap_cached` `swap_free` `swap_total` `unevictable` `vmalloc_chunk` `vmalloc_total` `vmalloc_used` `writeback` `writeback_tmp` `sandbox_id`	2.0.0
`kata_guest_netdev_stat`: Guest net devices stats.	`GAUGE`	`interface` (network device name) `item` (see `/proc/net/dev`) `recv_bytes` `recv_compressed` `recv_drop` `recv_errs` `recv_fifo` `recv_frame` `recv_multicast` `recv_packets` `sent_bytes` `sent_carrier` `sent_colls` `sent_compressed` `sent_drop` `sent_errs` `sent_fifo` `sent_packets` `sandbox_id`	2.0.0
`kata_guest_tasks`: Guest system load.	`GAUGE`	`item` `cur` `max` `sandbox_id`	2.0.0
`kata_guest_vm_stat`: Guest virtual memory stat.	`GAUGE`	`item` (see `/proc/vmstat`) `allocstall_dma` `allocstall_dma32` `allocstall_movable` `allocstall_normal` `balloon_deflate` `balloon_inflate` `compact_daemon_free_scanned` `compact_daemon_migrate_scanned` `compact_daemon_wake` `compact_fail` `compact_free_scanned` `compact_isolated` `compact_migrate_scanned` `compact_stall` `compact_success` `drop_pagecache` `drop_slab` `htlb_buddy_alloc_fail` `htlb_buddy_alloc_success` `kswapd_high_wmark_hit_quickly` `kswapd_inodesteal` `kswapd_low_wmark_hit_quickly` `nr_active_anon` `nr_active_file` `nr_anon_pages` `nr_anon_transparent_hugepages` `nr_bounce` `nr_dirtied` `nr_dirty` `nr_dirty_background_threshold` `nr_dirty_threshold` `nr_file_pages` `nr_free_cma` `nr_free_pages` `nr_inactive_anon` `nr_inactive_file` `nr_isolated_anon` `nr_isolated_file` `nr_kernel_stack` `nr_mapped` `nr_mlock` `nr_page_table_pages` `nr_shmem` `nr_shmem_hugepages` `nr_shmem_pmdmapped` `nr_slab_reclaimable` `nr_slab_unreclaimable` `nr_unevictable` `nr_unstable` `nr_vmscan_immediate_reclaim` `nr_vmscan_write` `nr_writeback` `nr_writeback_temp` `nr_written` `nr_zone_active_anon` `nr_zone_active_file` `nr_zone_inactive_anon` `nr_zone_inactive_file` `nr_zone_unevictable` `nr_zone_write_pending` `oom_kill` `pageoutrun` `pgactivate` `pgalloc_dma` `pgalloc_dma32` `pgalloc_movable` `pgalloc_normal` `pgdeactivate` `pgfault` `pgfree` `pginodesteal` `pglazyfree` `pglazyfreed` `pgmajfault` `pgmigrate_fail` `pgmigrate_success` `pgpgin` `pgpgout` `pgrefill` `pgrotated` `pgscan_direct` `pgscan_direct_throttle` `pgscan_kswapd` `pgskip_dma` `pgskip_dma32` `pgskip_movable` `pgskip_normal` `pgsteal_direct` `pgsteal_kswapd` `pswpin` `pswpout` `slabs_scanned` `swap_ra` `swap_ra_hit` `unevictable_pgs_cleared` `unevictable_pgs_culled` `unevictable_pgs_mlocked` `unevictable_pgs_munlocked` `unevictable_pgs_rescued` `unevictable_pgs_scanned` `unevictable_pgs_stranded` `workingset_activate` `workingset_nodereclaim` `workingset_refault` `sandbox_id`	2.0.0

Hypervisor metrics

Hypervisor metrics, collected mainly from the /proc filesystem entries of the hypervisor process.

Metric name	Type	Labels	Introduced in Kata version
`kata_hypervisor_fds`: Open FDs for hypervisor.	`GAUGE`	`sandbox_id`	2.0.0
`kata_hypervisor_io_stat`: Process IO statistics.	`GAUGE`	`item` (see `/proc/<pid>/io`) `cancelledwritebytes` `rchar` `readbytes` `syscr` `syscw` `wchar` `writebytes` `sandbox_id`	2.0.0
`kata_hypervisor_netdev`: Net devices statistics.	`GAUGE`	`interface` (network device name) `item` (see `/proc/net/dev`) `recv_bytes` `recv_compressed` `recv_drop` `recv_errs` `recv_fifo` `recv_frame` `recv_multicast` `recv_packets` `sent_bytes` `sent_carrier` `sent_colls` `sent_compressed` `sent_drop` `sent_errs` `sent_fifo` `sent_packets` `sandbox_id`	2.0.0
`kata_hypervisor_proc_stat`: Hypervisor process statistics.	`GAUGE`	`item` (see `/proc/<pid>/stat`) `cstime` `cutime` `stime` `utime` `sandbox_id`	2.0.0
`kata_hypervisor_proc_status`: Hypervisor process status.	`GAUGE`	`item` (see `/proc/<pid>/status`) `hugetlbpages` `nonvoluntary_ctxt_switches` `rssanon` `rssfile` `rssshmem` `vmdata` `vmexe` `vmhwm` `vmlck` `vmlib` `vmpeak` `vmpin` `vmpmd` `vmpte` `vmrss` `vmsize` `vmstk` `vmswap` `voluntary_ctxt_switches` `sandbox_id`	2.0.0
`kata_hypervisor_threads`: Hypervisor process threads.	`GAUGE`	`sandbox_id`	2.0.0

Kata monitor metrics

Metrics about kata-monitor itself.

Metric name	Type	Units	Labels	Introduced in Kata version
`kata_monitor_go_gc_duration_seconds`: A summary of the pause duration of garbage collection cycles.	`SUMMARY`	`seconds`		2.0.0
`kata_monitor_go_goroutines`: Number of goroutines that currently exist.	`GAUGE`			2.0.0
`kata_monitor_go_info`: Information about the Go environment.	`GAUGE`		`version` (golang version) `go1.13.9` (environment dependent variable)	2.0.0
`kata_monitor_go_memstats_alloc_bytes`: Number of bytes allocated and still in use.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_alloc_bytes_total`: Total number of bytes allocated, even if freed.	`COUNTER`	`bytes`		2.0.0
`kata_monitor_go_memstats_buck_hash_sys_bytes`: Number of bytes used by the profiling bucket hash table.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_frees_total`: Total number of frees.	`COUNTER`			2.0.0
`kata_monitor_go_memstats_gc_cpu_fraction`: The fraction of this program's available CPU time used by the GC since the program started.	`GAUGE`			2.0.0
`kata_monitor_go_memstats_gc_sys_bytes`: Number of bytes used for garbage collection system metadata.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_heap_alloc_bytes`: Number of heap bytes allocated and still in use.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_heap_idle_bytes`: Number of heap bytes waiting to be used.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_heap_inuse_bytes`: Number of heap bytes that are in use.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_heap_objects`: Number of allocated objects.	`GAUGE`			2.0.0
`kata_monitor_go_memstats_heap_released_bytes`: Number of heap bytes released to OS.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_heap_sys_bytes`: Number of heap bytes obtained from system.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_last_gc_time_seconds`: Number of seconds since 1970 of last garbage collection.	`GAUGE`	`seconds`		2.0.0
`kata_monitor_go_memstats_lookups_total`: Total number of pointer lookups.	`COUNTER`			2.0.0
`kata_monitor_go_memstats_mallocs_total`: Total number of `mallocs`.	`COUNTER`			2.0.0
`kata_monitor_go_memstats_mcache_inuse_bytes`: Number of bytes in use by `mcache` structures.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_mcache_sys_bytes`: Number of bytes used for `mcache` structures obtained from system.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_mspan_inuse_bytes`: Number of bytes in use by `mspan` structures.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_mspan_sys_bytes`: Number of bytes used for `mspan` structures obtained from system.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_next_gc_bytes`: Number of heap bytes when next garbage collection will take place.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_other_sys_bytes`: Number of bytes used for other system allocations.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_stack_inuse_bytes`: Number of bytes in use by the stack allocator.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_stack_sys_bytes`: Number of bytes obtained from system for stack allocator.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_memstats_sys_bytes`: Number of bytes obtained from system.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_go_threads`: Number of OS threads created.	`GAUGE`			2.0.0
`kata_monitor_process_cpu_seconds_total`: Total user and system CPU time spent in seconds.	`COUNTER`	`seconds`		2.0.0
`kata_monitor_process_max_fds`: Maximum number of open file descriptors.	`GAUGE`			2.0.0
`kata_monitor_process_open_fds`: Number of open file descriptors.	`GAUGE`			2.0.0
`kata_monitor_process_resident_memory_bytes`: Resident memory size in bytes.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_process_start_time_seconds`: Start time of the process since `unix` epoch in seconds.	`GAUGE`	`seconds`		2.0.0
`kata_monitor_process_virtual_memory_bytes`: Virtual memory size in bytes.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_process_virtual_memory_max_bytes`: Maximum amount of virtual memory available in bytes.	`GAUGE`	`bytes`		2.0.0
`kata_monitor_running_shim_count`: Running shim count (running sandboxes).	`GAUGE`			2.0.0
`kata_monitor_scrape_count`: Scrape count.	`COUNTER`			2.0.0
`kata_monitor_scrape_durations_histogram_milliseconds`: Time used to scrape from shims	`HISTOGRAM`	`milliseconds`		2.0.0
`kata_monitor_scrape_failed_count`: Failed scrape count.	`COUNTER`			2.0.0

Kata containerd shim v2 metrics

Metrics about the Kata containerd shim v2 process.

Metric name	Type	Units	Labels	Introduced in Kata version
`kata_shim_agent_rpc_durations_histogram_milliseconds`: RPC latency distributions.	`HISTOGRAM`	`milliseconds`	`action` (RPC actions of Kata agent) `grpc.CheckRequest` `grpc.CloseStdinRequest` `grpc.CopyFileRequest` `grpc.CreateContainerRequest` `grpc.CreateSandboxRequest` `grpc.DestroySandboxRequest` `grpc.ExecProcessRequest` `grpc.GetMetricsRequest` `grpc.GuestDetailsRequest` `grpc.ListInterfacesRequest` `grpc.ListProcessesRequest` `grpc.ListRoutesRequest` `grpc.MemHotplugByProbeRequest` `grpc.OnlineCPUMemRequest` `grpc.PauseContainerRequest` `grpc.RemoveContainerRequest` `grpc.ReseedRandomDevRequest` `grpc.ResumeContainerRequest` `grpc.SetGuestDateTimeRequest` `grpc.SignalProcessRequest` `grpc.StartContainerRequest` `grpc.StatsContainerRequest` `grpc.TtyWinResizeRequest` `grpc.UpdateContainerRequest` `grpc.UpdateInterfaceRequest` `grpc.UpdateRoutesRequest` `grpc.WaitProcessRequest` `grpc.WriteStreamRequest` `sandbox_id`	2.0.0
`kata_shim_fds`: Kata containerd shim v2 open FDs.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_go_gc_duration_seconds`: A summary of the pause duration of garbage collection cycles.	`SUMMARY`	`seconds`	`sandbox_id`	2.0.0
`kata_shim_go_goroutines`: Number of goroutines that currently exist.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_go_info`: Information about the Go environment.	`GAUGE`		`sandbox_id` `version` (golang version) `go1.13.9` (environment dependent variable)	2.0.0
`kata_shim_go_memstats_alloc_bytes`: Number of bytes allocated and still in use.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_alloc_bytes_total`: Total number of bytes allocated, even if freed.	`COUNTER`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_buck_hash_sys_bytes`: Number of bytes used by the profiling bucket hash table.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_frees_total`: Total number of frees.	`COUNTER`		`sandbox_id`	2.0.0
`kata_shim_go_memstats_gc_cpu_fraction`: The fraction of this program's available CPU time used by the GC since the program started.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_go_memstats_gc_sys_bytes`: Number of bytes used for garbage collection system metadata.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_alloc_bytes`: Number of heap bytes allocated and still in use.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_idle_bytes`: Number of heap bytes waiting to be used.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_inuse_bytes`: Number of heap bytes that are in use.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_objects`: Number of allocated objects.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_released_bytes`: Number of heap bytes released to OS.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_heap_sys_bytes`: Number of heap bytes obtained from system.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_last_gc_time_seconds`: Number of seconds since 1970 of last garbage collection.	`GAUGE`	`seconds`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_lookups_total`: Total number of pointer lookups.	`COUNTER`		`sandbox_id`	2.0.0
`kata_shim_go_memstats_mallocs_total`: Total number of `mallocs`.	`COUNTER`		`sandbox_id`	2.0.0
`kata_shim_go_memstats_mcache_inuse_bytes`: Number of bytes in use by `mcache` structures.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_mcache_sys_bytes`: Number of bytes used for `mcache` structures obtained from system.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_mspan_inuse_bytes`: Number of bytes in use by `mspan` structures.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_mspan_sys_bytes`: Number of bytes used for `mspan` structures obtained from system.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_next_gc_bytes`: Number of heap bytes when next garbage collection will take place.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_other_sys_bytes`: Number of bytes used for other system allocations.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_stack_inuse_bytes`: Number of bytes in use by the stack allocator.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_stack_sys_bytes`: Number of bytes obtained from system for stack allocator.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_memstats_sys_bytes`: Number of bytes obtained from system.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_go_threads`: Number of OS threads created.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_io_stat`: Kata containerd shim v2 process IO statistics.	`GAUGE`		`item` (see `/proc/<pid>/io`) `cancelledwritebytes` `rchar` `readbytes` `syscr` `syscw` `wchar` `writebytes` `sandbox_id`	2.0.0
`kata_shim_netdev`: Kata containerd shim v2 network devices statistics.	`GAUGE`		`interface` (network device name) `item` (see `/proc/net/dev`) `recv_bytes` `recv_compressed` `recv_drop` `recv_errs` `recv_fifo` `recv_frame` `recv_multicast` `recv_packets` `sent_bytes` `sent_carrier` `sent_colls` `sent_compressed` `sent_drop` `sent_errs` `sent_fifo` `sent_packets` `sandbox_id`	2.0.0
`kata_shim_pod_overhead_cpu`: Kata Pod overhead for CPU resources (percent).	`GAUGE`	`percent`	`sandbox_id`	2.0.0
`kata_shim_pod_overhead_memory_in_bytes`: Kata Pod overhead for memory resources (bytes).	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_proc_stat`: Kata containerd shim v2 process statistics.	`GAUGE`		`item` (see `/proc/<pid>/stat`) `cstime` `cutime` `stime` `utime` `sandbox_id`	2.0.0
`kata_shim_proc_status`: Kata containerd shim v2 process status.	`GAUGE`		`item` (see `/proc/<pid>/status`) `hugetlbpages` `nonvoluntary_ctxt_switches` `rssanon` `rssfile` `rssshmem` `vmdata` `vmexe` `vmhwm` `vmlck` `vmlib` `vmpeak` `vmpin` `vmpmd` `vmpte` `vmrss` `vmsize` `vmstk` `vmswap` `voluntary_ctxt_switches` `sandbox_id`	2.0.0
`kata_shim_process_cpu_seconds_total`: Total user and system CPU time spent in seconds.	`COUNTER`	`seconds`	`sandbox_id`	2.0.0
`kata_shim_process_max_fds`: Maximum number of open file descriptors.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_process_open_fds`: Number of open file descriptors.	`GAUGE`		`sandbox_id`	2.0.0
`kata_shim_process_resident_memory_bytes`: Resident memory size in bytes.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_process_start_time_seconds`: Start time of the process since `unix` epoch in seconds.	`GAUGE`	`seconds`	`sandbox_id`	2.0.0
`kata_shim_process_virtual_memory_bytes`: Virtual memory size in bytes.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_process_virtual_memory_max_bytes`: Maximum amount of virtual memory available in bytes.	`GAUGE`	`bytes`	`sandbox_id`	2.0.0
`kata_shim_rpc_durations_histogram_milliseconds`: RPC latency distributions.	`HISTOGRAM`	`milliseconds`	`action` (Kata shim v2 actions) `checkpoint` `close_io` `connect` `create` `delete` `exec` `kill` `pause` `pids` `resize_pty` `resume` `shutdown` `start` `state` `stats` `update` `wait` `sandbox_id`	2.0.0
`kata_shim_threads`: Kata containerd shim v2 process threads.	`GAUGE`		`sandbox_id`	2.0.0
