title |
---|
Simulate GCP Faults |
This document describes how to use Chaos Mesh to inject faults into GCP Pod. Chaos Dashboard and YAML files are provided to create GCPChaos experiments.
GCPChaos is a fault type in Chaos Mesh. By creating a GCPChaos experiment, you can simulate fault scenarios of the specified GCP instance. Currently, GCPChaos supports the following fault types:
- Node Stop: stops the specified GCP instance.
- Node Reset: reboots the specified GCP instance.
- Disk Loss: uninstalls the storage volume from the specified GCP instance.
To easily connect to the GCP cluster, you can create a Kubernetes Secret
file to store the authentication information in advance.
Below is a sample secret
file:
apiVersion: v1
kind: Secret
metadata:
name: cloud-key-secret
namespace: chaos-testing
type: Opaque
stringData:
service_account: your-gcp-service-account-base64-encode
- name defines the name of kubernetes secret.
- namespace defines the namespace of kubernetes secret.
- service_account stores the service account of your GCP cluster. Remember to complete Base64 encoding for your GCP service account.
:::note
Before you create an experiment using Chaos Dashboard, make sure the following requirements are met:
-
Chaos Dashboard is installed.
-
Chaos Dashboard can be accessed using kubectl port-forward command:
kubectl port-forward -n chaos-testing svc/chaos-dashboard 2333:2333
You can then access the dashboard via
http://localhost:2333
in your browser.
:::
-
Open Chaos Dashboard, and click NEW EXPERIMENT on the page to create a new experiment:
-
In the Choose a Target area, choose GCP fault and select a specific behavior, such as STOP NODE:
-
Fill out the experiment information, and specify the experiment scope and the scheduled experiment duration:
-
Submit the experiment information.
-
Write the experiment configuration to the
gcpchaos-node-stop.yaml
, as shown below:apiVersion: chaos-mesh.org/v1alpha1 kind: GCPChaos metadata: name: node-stop-example namespace: chaos-testing spec: action: node-stop secretName: 'cloud-key-secret' project: 'your-project' zone: 'your-zone' instance: 'your-instance' duration: '5m'
Based on this configuration example, Chaos Mesh will inject the
node-stop
fault into the specified GCP instance so that the GCP instance will be unavailable in 5 minutes.For more information about stopping GCP instances, refer to Stop GCP instance.
-
After the configuration file is prepared, use
kubectl
to create an experiment:kubectl apply -f gcpchaos-node-stop.yaml
-
Write the experiment configuration to the
gcpchaos-node-reset.yaml
, as shown below:apiVersion: chaos-mesh.org/v1alpha1 kind: GCPChaos metadata: name: node-reset-example namespace: chaos-testing spec: action: node-reset secretName: 'cloud-key-secret' project: 'your-project' zone: 'your-zone' instance: 'your-instance' duration: '5m'
Based on this configuration example, Chaos Mesh will inject
node-reset
fault into the specified GCP instance so that the GCP instance will be reset.For more information about resetting GCP instances, refer to Resetting a GCP instance.
-
After the configuration file is prepared, use
kubectl
to create an experiment:kubectl apply -f gcpchaos-node-reset.yaml
-
Write the experiment configuration to the
gcpchaos-disk-loss.yaml
, as shown below:apiVersion: chaos-mesh.org/v1alpha1 kind: GCPChaos metadata: name: disk-loss-example namespace: chaos-testing spec: action: disk-loss secretName: 'cloud-key-secret' project: 'your-project' zone: 'your-zone' instance: 'your-instance' deviceNames: ['disk-name'] duration: '5m'
Based on this configuration example, Chaos Mesh will inject a
disk-loss
fault into the specified GCP instance so that the GCP instance is detached from the specified storage volume within 5 minutes.For more information about detaching GCP instances, refer to Detach GCP storage.
-
After the configuration file is prepared, use
kubectl
to create an experiment:kubectl apply -f gcpchaos-disk-loss.yaml
The following table shows the fields in the YAML configuration file.
Parameter | Type | Descpription | Default value | Required | Example |
---|---|---|---|---|---|
action | string | Indicates the specific type of faults. The available fault types include node-stop, node-reset, and disk-loss. | node-stop | Yes | node-stop |
mode | string | Indicates the mode of the experiment. The mode options include one (selecting a Pod at random), all (selecting all eligible Pods), fixed (selecting a specified number of eligible Pods), fixed-percent (selecting a specified percentage of the eligible Pods), and random-max-percent (selecting the maximum percentage of the eligible Pods). |
None | Yes | one |
value | string | Provides parameters for the mode configuration, depending on mode . For example, when mode is set to fixed-percent , value specifies the percentage of pods. |
None | No | 2 |
secretName | string | Indicates the name of the Kubernetes secret that stores the GCP authentication information. | None | No | cloud-key-secret |
project | string | Indicates the name of GCP project. | None | Yes | your-project |
zone | string | Indicates the region of GCP instance. | None | Yes | us-central1-a |
instance | string | Indicates the ID of GCP instance. | None | Yes | your-gcp-instance-id |
deviceNames | []string | This is a required field when the action is disk-loss . This field specifies the machine disk ID. |
None | no | ["your-disk-id"] |
duration | string | Indicates the duration of the experiment. | None | Yes | 30s |