---
title: Using JMX Bridge
owner: PCF Metrics
---
<strong><%= modified_date %></strong>
JMX Bridge is a Java Management Extensions (JMX) tool for Elastic
Runtime.
To help you monitor your installation and assist in troubleshooting, JMX Bridge collects and exposes system data from Cloud Foundry components via a JMX
endpoint.
<p class="note"><strong>Note</strong>: If you use JMX Bridge v1.8 with PCF v1.10, see the recommended <a href="https://docs.pivotal.io/pivotalcf/1-10/monitoring/kpi.html">Key Performance Indicators</a>.</p>
## <a id="cc"></a>Cloud Controller Metrics ##
JMX Bridge reports the number of Cloud Controller API requests
completed and the number of requests sent but not completed.
The number of requests sent but not completed represents the pending activity in
your system, and can be higher under load.
This number varies over time, and its range depends on the specifics of your
environment, such as hardware, OS, processor speed, and load.
In any given environment, though, you can establish a typical range of values
and a typical maximum for this number.
Use the Cloud Controller metrics to ensure that the Cloud Controller is processing API requests in a timely manner.
If the pending activity in your system increases significantly past the typical
maximum and stays at an elevated level, Cloud Controller requests may be failing
and additional troubleshooting may be necessary.
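
The check described above, pending activity rising past the typical maximum and staying there, can be sketched as follows. This is a minimal illustration, not part of JMX Bridge: the function name, the `typical_max` value, and the `min_consecutive` window are assumptions, and the samples are assumed to be readings of `cc.requests.outstanding` that you have already collected from the JMX endpoint.

```python
def sustained_elevation(samples, typical_max, min_consecutive=3):
    """Return True if the most recent readings all exceed the typical maximum.

    samples: chronological readings of cc.requests.outstanding (assumed input).
    typical_max: the maximum you have established for your own environment.
    min_consecutive: hypothetical window size; tune it to your polling interval.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        # Not enough data to call the elevation "sustained".
        return False
    recent = samples[-min_consecutive:]
    return all(value > typical_max for value in recent)


if __name__ == "__main__":
    readings = [5, 40, 45, 50]
    print(sustained_elevation(readings, typical_max=30))
```

Requiring several consecutive readings above the maximum avoids flagging a single short spike under load.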
The following table shows the name of the Cloud Controller metric, what the
metric represents, and the metric type (data type).
<table border="1" class="nice" >
<tr>
<th><strong>METRIC NAME</strong></th>
<th><strong>DEFINITION</strong></th>
<th><strong>METRIC TYPE (DATA TYPE)</strong></th>
</tr>
<tr>
<td>cc.requests.completed</td><td>Number of Cloud Controller API requests completed since this instance of Cloud Controller started</td><td>Counter (Integer)</td>
</tr>
<tr>
<td>cc.requests.outstanding</td><td>Number of Cloud Controller API requests made but not completed since this instance of Cloud Controller started</td><td>Counter (Integer)</td>
</tr>
</table>
See the [Cloud Controller](https://docs.pivotal.io/pivotalcf/concepts/architecture/cloud-controller.html) topic for more information about the Cloud Controller.
## <a id="router"></a>Router Metrics ##
JMX Bridge reports the number of sent requests and the number of
completed requests for each Cloud Foundry component.
The difference between these two metrics is the number of requests made to a
component but not completed, and represents the pending activity for that
component.
The number for each component can vary over time, and is typically higher under
load.
In any given environment, though, you can establish a typical range of values
and maximum for this number for each component.
Use these metrics to ensure that the Router is passing requests to other
components in a timely manner.
If the pending activity for a particular component increases significantly past
the typical maximum and stays at an elevated level, additional troubleshooting
of that component may be necessary.
If the pending activity for most or all components increases significantly and
stays at elevated values, troubleshooting of the router may be necessary.
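
As a rough sketch of the difference described above, the following computes pending activity per component from counter readings. The dictionary shapes and the function name are assumptions made for illustration; they do not reflect how JMX Bridge exposes the `gorouter.requests` and `gorouter.responses` values.

```python
def pending_by_component(requests, responses):
    """Compute requests sent but not completed for each component.

    requests: {component: count}, e.g. readings of gorouter.requests.
    responses: {(status_family, component): count}, e.g. readings of
        gorouter.responses. Both shapes are hypothetical.
    """
    completed = {}
    for (_status, component), count in responses.items():
        # Sum responses across all status families for the component.
        completed[component] = completed.get(component, 0) + count
    return {
        component: sent - completed.get(component, 0)
        for component, sent in requests.items()
    }


if __name__ == "__main__":
    requests = {"CloudController": 120, "route-emitter": 30}
    responses = {
        ("2xx", "CloudController"): 100,
        ("5xx", "CloudController"): 5,
        ("2xx", "route-emitter"): 30,
    }
    print(pending_by_component(requests, responses))
```

Responses are summed across every status family because a request counts as completed regardless of the status it returned.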
The following table shows the name of the Router metric, what the metric
represents, and the metric type (data type).
<table border="1" class="nice" >
<tr>
<th><strong>METRIC NAME</strong></th>
<th><strong>DEFINITION</strong></th>
<th><strong>METRIC TYPE (DATA TYPE)</strong></th>
</tr>
<tr>
<td>gorouter.requests<br />[component=c]</td><td>Number of requests the router has received for component <strong>c</strong> since this instance of the router started<br /><strong>c</strong> can be CloudController or route-emitter</td><td>Counter (Integer)</td>
</tr>
<tr>
<td>gorouter.responses<br />[status=s,component=c]</td><td>Number of requests completed by component <strong>c</strong> since this instance of the router started<br /><strong>c</strong> can be CloudController or route-emitter<br /><strong>s</strong> is the HTTP status family: 2xx, 3xx, 4xx, 5xx, or other</td><td>Counter (Integer)</td>
</tr>
</table>
See the [Router](https://docs.pivotal.io/pivotalcf/concepts/architecture/router.html) topic for more information about the
Router.
## <a id="diego"></a>Diego Metrics ##
Pivotal JMX Bridge reports metrics for the Diego cells and from the Diego Bulletin Board System (BBS). The following tables show the name of the Diego metric, what the metric represents, and the metric type (data type).
For general information about Diego, see the [Diego Architecture](https://docs.pivotal.io/pivotalcf/concepts/diego/diego-architecture.html) topic.
### Diego Cell Metrics
Pivotal JMX Bridge reports the following metrics for each Diego cell. If you have multiple cells, JMX Bridge reports metrics for each cell individually. The metrics are not summed across cells.
Use these metrics to determine the size of your deployment or when to scale up a deployment, and to track the status of Long Running Processes (LRPs) in the Diego life cycle.
<table border="1" class="nice" >
<tr>
<th><strong>METRIC NAME</strong></th>
<th><strong>DEFINITION</strong></th>
<th><strong>METRIC TYPE (DATA TYPE)</strong></th>
</tr>
<tr>
<td>rep.CapacityTotalMemory</td><td>Total amount of memory available for this cell to allocate to containers</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>rep.CapacityRemainingMemory</td><td>Remaining amount of memory available for this cell to allocate to containers</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>rep.CapacityTotalDisk</td><td>Total amount of disk available for this cell to allocate to containers</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>rep.CapacityRemainingDisk</td><td>Remaining amount of disk available for this cell to allocate to containers</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>rep.ContainerCount</td><td>Number of containers hosted on the cell</td><td>Gauge (Integer)</td>
</tr>
</table>
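
One way to use the capacity gauges above when deciding whether to scale is to compute the used fraction of memory and disk for each cell. The sketch below assumes you have already read the metric values per cell into a dictionary. The function names, the data shape, and the 0.8 threshold are illustrative assumptions, not recommendations from this topic.

```python
def used_fraction(total, remaining):
    """Return the fraction of capacity in use, given total and remaining."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return (total - remaining) / total


def cells_over_threshold(cells, threshold=0.8):
    """Return the sorted names of cells whose memory or disk use meets the threshold.

    cells: {cell_name: {metric_name: value}} (hypothetical shape).
    threshold: hypothetical fraction; choose one that suits your deployment.
    """
    flagged = []
    for name, metrics in cells.items():
        memory = used_fraction(
            metrics["rep.CapacityTotalMemory"],
            metrics["rep.CapacityRemainingMemory"],
        )
        disk = used_fraction(
            metrics["rep.CapacityTotalDisk"],
            metrics["rep.CapacityRemainingDisk"],
        )
        if memory >= threshold or disk >= threshold:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    cells = {
        "cell-0": {
            "rep.CapacityTotalMemory": 16384.0,
            "rep.CapacityRemainingMemory": 2048.0,
            "rep.CapacityTotalDisk": 100000.0,
            "rep.CapacityRemainingDisk": 60000.0,
        },
    }
    print(cells_over_threshold(cells))
```

Because JMX Bridge reports these metrics per cell and does not sum them, the sketch evaluates each cell separately.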
### Diego BBS Metrics
Pivotal JMX Bridge reports these metrics from the Diego BBS. They are deployment-wide metrics. Use these metrics to inspect the state of the apps running on the deployment as a whole.
<table border="1" class="nice" >
<tr>
<th><strong>METRIC NAME</strong></th>
<th><strong>DEFINITION</strong></th>
<th><strong>METRIC TYPE (DATA TYPE)</strong></th>
</tr>
<tr>
<td>bbs.CrashedActualLRPs</td><td>Total number of LRP instances that have crashed</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsRunning</td><td>Total number of LRP instances that are running on cells</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsUnclaimed</td><td>Total number of LRP instances that have not yet been claimed by a cell</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsClaimed</td><td>Total number of LRP instances that have been claimed by some cell</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsDesired</td><td>Total number of LRP instances desired across all LRPs</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsExtra</td><td>Total number of LRP instances that are no longer desired but still have a BBS record</td><td>Gauge (Integer)</td>
</tr>
<tr>
<td>bbs.LRPsMissing</td><td>Total number of LRP instances that are desired but have no record in the BBS</td><td>Gauge (Integer)</td>
</tr>
</table>
## <a id="vm"></a>Virtual Machine Metrics ##
JMX Bridge reports data for each virtual machine (VM) in a deployment.
Use these metrics to monitor the health of your VMs.
The following table shows the name of the Virtual Machine metric, what the
metric represents, and the metric type (data type).
<table border="1" class="nice" >
<tr>
<th><strong>METRIC NAME</strong></th>
<th><strong>DEFINITION</strong></th>
<th><strong>METRIC TYPE (DATA TYPE)</strong></th>
</tr>
<tr>
<td>system.cpu.sys</td><td>Amount of CPU spent in system processes</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.cpu.user</td><td>Amount of CPU spent in user processes</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.cpu.wait</td><td>Amount of CPU spent in waiting processes</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.disk.ephemeral.percent</td><td>Percentage of ephemeral disk used on the VM</td><td>Gauge (Float, 0-100)</td>
</tr>
<tr>
<td>system.disk.ephemeral.inode.percent</td><td>Percentage of inodes consumed by the ephemeral disk</td><td>Gauge (Float, 0-100)</td>
</tr>
<tr>
<td>system.disk.persistent.percent</td><td>Percentage of persistent disk used on the VM</td><td>Gauge (Float, 0-100)</td>
</tr>
<tr>
<td>system.disk.persistent.inode.percent</td><td>Percentage of inodes consumed by the persistent disk</td><td>Gauge (Float, 0-100)</td>
</tr>
<tr>
<td>system.disk.system.percent</td><td>Percentage of system disk used on the VM</td><td>Gauge (Float, 0-100)</td>
</tr>
<tr>
<td>system.healthy</td><td>Indicates whether a VM system is healthy. <code>1</code> means the system is healthy, and <code>0</code> means the system is not healthy</td><td>Gauge (Float, 0-1)</td>
</tr>
<tr>
<td>system.load.1m</td><td>Amount of load the system is under, averaged over one minute</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.mem.percent</td><td>Percentage of memory used on the VM</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.swap.kb</td><td>Amount of swap used on the VM in KB</td><td>Gauge (Float)</td>
</tr>
<tr>
<td>system.swap.percent</td><td>Percentage of swap used on the VM</td><td>Gauge (Float, 0-100)</td>
</tr>
</table>
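
A simple health check over the VM metrics above might look like the following sketch. It assumes the metric values for one VM are already available as a dictionary. The function name and the 90 percent limit are hypothetical, and treating a missing `system.healthy` reading as healthy is an assumption of this sketch only.

```python
def vm_warnings(metrics, percent_limit=90.0):
    """Return the names of metrics that suggest a VM needs attention.

    metrics: {metric_name: value} for a single VM (hypothetical shape).
    percent_limit: hypothetical limit applied to every "*.percent" gauge.
    """
    warnings = []
    # system.healthy is 1 when the VM is healthy and 0 when it is not.
    if metrics.get("system.healthy", 1.0) < 1.0:
        warnings.append("system.healthy")
    for name, value in sorted(metrics.items()):
        if name.endswith(".percent") and value >= percent_limit:
            warnings.append(name)
    return warnings


if __name__ == "__main__":
    sample = {
        "system.healthy": 1.0,
        "system.disk.ephemeral.percent": 95.0,
        "system.mem.percent": 40.0,
    }
    print(vm_warnings(sample))
```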