title |
---|
Simulate Network Faults |
This document describes how to simulate network faults using NetworkChaos in Chaos Mesh.
NetworkChaos is a fault type in Chaos Mesh. By creating a NetworkChaos experiment, you can simulate a network fault scenario for a cluster. Currently, NetworkChaos supports the following fault types:
- Partition: network disconnection and partition.
- Net Emulation: poor network conditions, such as high delays, high packet loss rate, packet reordering, and so on.
- Bandwidth: limit the communication bandwidth between nodes.
Before creating NetworkChaos experiments, ensure the following:
- During the network injection process, make sure that the connection between Controller Manager and Chaos Daemon works. Otherwise, the NetworkChaos experiment cannot be recovered.
- If you want to simulate a Net Emulation fault, make sure the NET_SCH_NETEM module is installed in the Linux kernel. If you are using CentOS, you can install the module through the kernel-modules-extra package. Most other Linux distributions install the module by default.
- Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment.
- In the Choose a Target area, choose NETWORK ATTACK and select a specific behavior, such as LOSS. Then fill out the specific configuration.

  For details of specific configuration fields, refer to [Field description](#field-description).
- Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration.
- Submit the experiment information.
- Write the experiment configuration to the `network-delay.yaml` file, as shown below:

  ```yaml
  apiVersion: chaos-mesh.org/v1alpha1
  kind: NetworkChaos
  metadata:
    name: delay
  spec:
    action: delay
    mode: one
    selector:
      namespaces:
        - default
      labelSelectors:
        'app': 'web-show'
    delay:
      latency: '10ms'
      correlation: '100'
      jitter: '0ms'
  ```

  This configuration causes a latency of 10 milliseconds in the network connections of the target Pods. In addition to latency injection, Chaos Mesh supports packet loss and packet reordering injection. For details, see Field description.

- After the configuration file is prepared, use `kubectl` to create an experiment:

  ```bash
  kubectl apply -f ./network-delay.yaml
  ```
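
If you would rather generate the delay manifest from a script than write it by hand, the following is a minimal sketch. It assumes you pass the printed JSON to `kubectl apply -f`, which accepts JSON manifests as well as YAML. The helper name `build_delay_manifest` and its parameters are hypothetical and not part of Chaos Mesh; the field names simply mirror the configuration shown above.

```python
import json


def build_delay_manifest(name, namespace, app_label, latency="10ms",
                         correlation="100", jitter="0ms"):
    """Return the delay NetworkChaos manifest as a plain dict.

    Hypothetical helper for illustration only; the structure copies the
    network-delay.yaml example from this document.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name},
        "spec": {
            "action": "delay",
            "mode": "one",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
            "delay": {
                "latency": latency,
                "correlation": correlation,
                "jitter": jitter,
            },
        },
    }


if __name__ == "__main__":
    # Print JSON so it can be redirected to a file or piped elsewhere.
    manifest = build_delay_manifest("delay", "default", "web-show")
    print(json.dumps(manifest, indent=2))
```

One possible use is to redirect the output to a file such as `network-delay.json` and apply that file in place of the YAML version.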
- Write the experiment configuration to the `network-partition.yaml` file, as shown below:

  ```yaml
  apiVersion: chaos-mesh.org/v1alpha1
  kind: NetworkChaos
  metadata:
    name: partition
  spec:
    action: partition
    mode: all
    selector:
      namespaces:
        - default
      labelSelectors:
        'app': 'app1'
    direction: to
    target:
      mode: all
      selector:
        namespaces:
          - default
        labelSelectors:
          'app': 'app2'
  ```

  This configuration blocks the connection created from `app1` to `app2`. The value for the `direction` field can be `to`, `from`, or `both`. For details, refer to Field description.

- After the configuration file is prepared, use `kubectl` to create the experiment:

  ```bash
  kubectl apply -f ./network-partition.yaml
  ```
- Write the experiment configuration to the `network-bandwidth.yaml` file, as shown below:

  ```yaml
  apiVersion: chaos-mesh.org/v1alpha1
  kind: NetworkChaos
  metadata:
    name: bandwidth
  spec:
    action: bandwidth
    mode: all
    selector:
      namespaces:
        - default
      labelSelectors:
        'app': 'app1'
    bandwidth:
      rate: '1mbps'
      limit: 100
      buffer: 10000
  ```

  This configuration limits the bandwidth of `app1` to 1 mbps.

- After the configuration file is prepared, use `kubectl` to create the experiment:

  ```bash
  kubectl apply -f ./network-bandwidth.yaml
  ```

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
action | string | Indicates the specific fault type. Available types include: `netem`, `delay` (network delay), `loss` (packet loss), `duplicate` (packet duplication), `corrupt` (packet corruption), `partition` (network partition), and `bandwidth` (network bandwidth limit). After you specify the `action` field, refer to Description for `action`-related fields for other necessary field configuration. | None | Yes | partition |
target | Selector | Used in combination with `direction`, making Chaos only effective for some packets. | None | No | |
direction | enum | Indicates the direction of `target` packets. Available values include `from` (the packets from `target`), `to` (the packets to `target`), and `both` (the packets from or to `target`). This parameter makes Chaos only take effect for a specific direction of packets. | to | No | both |
mode | string | Specifies the mode of the experiment. The mode options include `one` (selecting a random Pod), `all` (selecting all eligible Pods), `fixed` (selecting a specified number of eligible Pods), `fixed-percent` (selecting a specified percentage of Pods from the eligible Pods), and `random-max-percent` (selecting the maximum percentage of Pods from the eligible Pods). | None | Yes | one |
value | string | Provides a parameter for the `mode` configuration, depending on `mode`. For example, when `mode` is set to `fixed-percent`, `value` specifies the percentage of Pods. | None | No | 2 |
containerNames | []string | Specifies the name of the container into which the fault is injected. | None | No | ["nginx"] |
selector | struct | Specifies the target Pod. For details, refer to Define the experiment scope. | None | Yes | |
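
To make the interaction between `mode` and `value` concrete, here is a rough Python sketch of how a selection step could work. It is an illustration under stated assumptions, not Chaos Mesh's implementation: the function name `select_pods` is hypothetical, percentages are rounded down, and `random-max-percent` is assumed to draw a percentage between 0 and `value` before sampling.

```python
import math
import random


def select_pods(pods, mode, value=None, rng=random):
    """Pick target Pods from the eligible list according to `mode`.

    Illustrative only: rounding and randomness details are assumptions,
    not taken from the Chaos Mesh source code.
    """
    pods = list(pods)
    if not pods:
        return []
    if mode == "all":
        return pods
    if mode == "one":
        return [rng.choice(pods)]
    if mode == "fixed":
        # `value` is the number of Pods; never ask for more than exist.
        count = min(int(value), len(pods))
        return rng.sample(pods, count)
    if mode == "fixed-percent":
        # `value` is a percentage of the eligible Pods (rounded down here).
        count = math.floor(len(pods) * int(value) / 100)
        return rng.sample(pods, count)
    if mode == "random-max-percent":
        # Assumed: draw a percentage in [0, value], then sample that share.
        percent = rng.randint(0, int(value))
        count = math.floor(len(pods) * percent / 100)
        return rng.sample(pods, count)
    raise ValueError(f"unknown mode: {mode}")
```

For example, `select_pods(["a", "b", "c", "d"], "fixed-percent", "50")` returns two of the four Pods under these assumptions. `value` is passed as a string because the table lists its type as string.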

For the Net Emulation and Bandwidth fault types, you can further configure the `action`-related parameters according to the following description.

- Net Emulation type: `delay`, `loss`, `duplicate`, `corrupt`
- Bandwidth type: `bandwidth`

Setting `action` to `delay` means simulating a network delay fault. You can also configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
latency | string | Indicates the network latency | No | No | 2ms |
correlation | string | Indicates the correlation between the current latency and the previous one. | No | No | 0.5 |
jitter | string | Indicates the range of the network latency | No | No | 1ms |
reorder | [Reorder](#Reorder) | Indicates the status of network packet reordering | | No | |

The computational model for `correlation` is as follows:

- Generate a random number whose distribution is related to the previous value:

  ```
  rnd = value * (1-corr) + last_rnd * corr
  ```

  `rnd` is the random number. `corr` is the `correlation` you fill out before.

- Use this random number to determine the delay of the current packet:

  ```
  ((rnd % (2 * sigma)) + mu) - sigma
  ```

  In the above formula, `sigma` is `jitter` and `mu` is `latency`.
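
The two steps above can be sketched in Python to see how the delays are distributed. This is only an approximation of the model as described here, with several assumptions: `corr` is taken as a fraction between 0 and 1, `value` is a fresh uniform integer sample, `last_rnd` starts at 0, and a zero `jitter` is treated as a constant delay to avoid a modulo by zero. The function name `correlated_delays` is hypothetical.

```python
import random


def correlated_delays(latency_ms, jitter_ms, corr, count, seed=None):
    """Return `count` per-packet delays following the model above.

    Assumptions (not from the source): `corr` is a fraction in [0, 1],
    `value` is a uniform 32-bit integer, and `last_rnd` starts at 0.
    """
    rng = random.Random(seed)
    last_rnd = 0
    delays = []
    for _ in range(count):
        value = rng.randrange(0, 2**32)
        # Step 1: blend the new sample with the previous random number.
        rnd = value * (1 - corr) + last_rnd * corr
        last_rnd = rnd
        if jitter_ms == 0:
            # No jitter: every packet gets exactly the base latency.
            delays.append(latency_ms)
        else:
            # Step 2: map rnd into [latency - jitter, latency + jitter).
            delays.append((rnd % (2 * jitter_ms)) + latency_ms - jitter_ms)
    return delays
```

With `latency_ms=10` and `jitter_ms=2`, every returned delay falls between 8 ms (inclusive) and 12 ms (exclusive), whatever the value of `corr`.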

Setting `action` to `reorder` means simulating a network packet reordering fault. You can also configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
reorder | string | Indicates the probability to reorder | 0 | No | 0.5 |
correlation | string | Indicates the correlation between the length of the current delay and the length of the previous delay | 0 | No | 0.5 |
gap | int | Indicates the gap before and after packet reordering | 0 | No | 5 |

Setting `action` to `loss` means simulating a packet loss fault. You can also configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
loss | string | Indicates the probability of packet loss | 0 | No | 0.5 |
correlation | string | Indicates the correlation between the probability of the current packet loss and that of the previous packet loss | 0 | No | 0.5 |

Setting `action` to `duplicate` means simulating a packet duplication fault. You can also configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
duplicate | string | Indicates the probability of packet duplication | 0 | No | 0.5 |
correlation | string | Indicates the correlation between the probability of the current packet duplication and that of the previous packet duplication | 0 | No | 0.5 |

Setting `action` to `corrupt` means simulating a packet corruption fault. You can also configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
corrupt | string | Indicates the probability of packet corruption | 0 | No | 0.5 |
correlation | string | Indicates the correlation between the probability of the current packet corruption and that of the previous packet corruption | 0 | No | 0.5 |

For occasional events such as `reorder`, `loss`, `duplicate`, and `corrupt`, the `correlation` is more complicated. For a description of the specific model, refer to NetemCLG.

Setting `action` to `bandwidth` means simulating a bandwidth limit fault. You also need to configure the following parameters.

Parameter | Type | Description | Default value | Required | Example |
---|---|---|---|---|---|
rate | string | Indicates the rate of bandwidth limit | | Yes | 1mbps |
limit | string | Indicates the number of bytes waiting in queue | | Yes | 1 |
buffer | uint32 | Indicates the maximum number of bytes that can be sent instantaneously | | Yes | 1 |
peakrate | uint64 | Indicates the maximum consumption of bucket (usually not set) | | No | 1 |
minburst | uint32 | Indicates the size of peakrate bucket (usually not set) | | No | 1 |

For more details about these fields, refer to the tc-tbf documentation.
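
To build some intuition for how `rate`, `buffer`, and `limit` interact, the following toy token bucket may help. It is a simplified sketch rather than the kernel's tbf implementation, and it assumes that `rate` is in bytes per second, `buffer` is the bucket capacity in bytes, and `limit` is the queue capacity in bytes. `peakrate` and `minburst` are not modelled, and the class name `TokenBucket` is hypothetical.

```python
from collections import deque


class TokenBucket:
    """Toy token bucket filter, loosely modelled on the fields above.

    Assumptions: `rate` is bytes per second, `buffer` is the bucket size
    in bytes, `limit` is the maximum number of queued bytes.
    """

    def __init__(self, rate, buffer, limit):
        self.rate = rate
        self.buffer = buffer
        self.limit = limit
        self.tokens = buffer      # the bucket starts full
        self.queue = deque()      # sizes of packets waiting for tokens
        self.queued_bytes = 0

    def enqueue(self, size):
        """Queue a packet; return False if it is dropped for lack of room."""
        if self.queued_bytes + size > self.limit:
            return False
        self.queue.append(size)
        self.queued_bytes += size
        return True

    def tick(self, seconds):
        """Advance time, refill tokens, and return the sizes of packets sent."""
        self.tokens = min(self.buffer, self.tokens + self.rate * seconds)
        sent = []
        # Send packets in order while the head of the queue fits the tokens.
        while self.queue and self.queue[0] <= self.tokens:
            size = self.queue.popleft()
            self.tokens -= size
            self.queued_bytes -= size
            sent.append(size)
        return sent
```

In this model, a burst of up to `buffer` bytes leaves immediately, later packets wait until enough tokens accumulate at `rate`, and packets arriving when more than `limit` bytes are already queued are dropped.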