Dogfood e2etest; Client/Server Application #617

Open · wants to merge 13 commits into `main`
39 changes: 39 additions & 0 deletions dogfood/CONTRIBUTING.md
@@ -25,3 +25,42 @@ If your changes don't seem to propagate, you can:
- `make uninstall` and `make install`
or move to the top level directory and run
- `colima delete` and `make colima-start` and redo [dogfood instructions](README.md)

## Testing Datadog Metrics

To deploy the Datadog Agent to your local colima environment, first apply its RBAC manifests:

```
kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrole.yaml"

kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/serviceaccount.yaml"

kubectl apply -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrolebinding.yaml"
```

Then take a look at the file `datadog-agent-all-features.yaml` (feel free to remove the SECURITY feature, as it is
unnecessary for testing). You will notice that it requires an API key and a random string, both encoded in base64. Get
an API key from your Datadog site, think of a random string, then do the following:

```
echo -n '<Your API key>' | base64
# Copy the output and paste it where needed in datadog-agent-all-features.yaml
echo -n '<Your random string>' | base64
# Copy the output and paste it where needed in datadog-agent-all-features.yaml
```

By default, the Datadog site is set to the US site, datadoghq.com. If you're using a different site, edit the
`DD_SITE` environment variable accordingly.
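
For orientation, the base64 values typically land in a Kubernetes Secret inside the manifest. A minimal sketch, assuming the secret is named `datadog-agent` and uses the conventional `api-key`/`token` keys (check your copy of `datadog-agent-all-features.yaml` for the exact names it expects):

```
apiVersion: v1
kind: Secret
metadata:
  name: datadog-agent
type: Opaque
data:
  api-key: <base64-encoded API key>
  token: <base64-encoded random string>
```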

Deploy the DaemonSet:
```
kubectl apply -f datadog-agent-all-features.yaml
```

Verify it is running correctly using `kubectl get daemonset` in the appropriate namespace (`default`, unless you changed it).
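
For example, assuming the DaemonSet kept the manifest's default name `datadog-agent`:

```
kubectl get daemonset datadog-agent -n default
# NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE
# datadog-agent   1         1         1       1            1
```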

Once you've verified the DaemonSet is up and running, you'll need to set up Kubernetes State Metrics with the following steps:
1. Download the kube-state-metrics manifests folder [here](https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard).
2. `kubectl apply -f <NAME_OF_THE_KUBE_STATE_MANIFESTS_FOLDER>` (see the sketch below).
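
A minimal sketch of those two steps, assuming you clone the repository instead of downloading the folder by hand:

```
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/examples/standard
# the standard manifests deploy into kube-system
kubectl get deployment kube-state-metrics -n kube-system
```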

Then you should be set to see metrics for the client and server containers.
3 changes: 2 additions & 1 deletion dogfood/Makefile
@@ -41,7 +41,8 @@ install:
uninstall:
	helm template ./client/chart | kubectl delete -f -
	helm template ./server/chart | kubectl delete -f -
	kubectl delete -f ../examples/namespace.yaml

reinstall: uninstall install

restart-client:
	kubectl -n chaos-demo rollout restart deployment chaos-dogfood-client
7 changes: 7 additions & 0 deletions dogfood/README.md
@@ -104,6 +104,13 @@ x
You can `kubectl apply -f examples/<disruption.yaml>` for any of the `examples/` disruption files.
For gRPC disruption, you can follow these [detailed steps](../docs/grpc_disruption/demo_instructions.md).

### Sending Metrics to Datadog

To test disruptions and workflows, make sure the Datadog Agent is properly installed on the cluster where the client
and server are running. The agent should be reporting CPU, network, and disk metrics, all of which are needed to test
the related disruptions. The client performs work that exercises each of these resources, so its metrics can be used
to verify that the disruptions take effect.

### Clean up

- Run `make uninstall` to `kubectl delete` both charts as well as remove the namespace.
2 changes: 1 addition & 1 deletion dogfood/client/Dockerfile
@@ -1,4 +1,4 @@
FROM ubuntu:focal as client
FROM ubuntu:jammy as client

COPY built_go_client /usr/local/bin/dogfood_client

18 changes: 18 additions & 0 deletions dogfood/client/chart/templates/deployment.yaml
@@ -24,7 +24,22 @@ spec:
      {{- end }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: dogfood-client-pvc
      containers:
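        # Sidecar that creates a 20 KB file once, then re-reads it every second
        # with O_DIRECT (iflag=direct), generating steady disk-read I/O for
        # disk disruption testing.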
        - name: read-file
          image: ubuntu:bionic-20220128
          command: [ "/bin/bash" ]
          args:
            [
              "-c",
              "echo 'create file to read from: /mnt/data/disk-read-file' && dd if=/dev/zero of=/mnt/data/disk-read-file bs=20k count=1; while true; do time dd if=/mnt/data/disk-read-file of=/dev/null iflag=direct; sleep 1; done",
            ]
          volumeMounts:
            - mountPath: /mnt/data
              name: data
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
@@ -39,3 +54,6 @@ spec:
              protocol: TCP
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - mountPath: /mnt/data
              name: data
17 changes: 17 additions & 0 deletions dogfood/client/chart/templates/volumeclaim.yaml
@@ -0,0 +1,17 @@
# Unless explicitly stated otherwise all files in this repository are licensed
# under the Apache License Version 2.0.
# This product includes software developed at Datadog (https://www.datadoghq.com/).
# Copyright 2022 Datadog, Inc.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dogfood-client-pvc
  namespace: chaos-demo
spec:
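  # assumes the Longhorn storage provisioner is installed in the cluster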
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
62 changes: 55 additions & 7 deletions dogfood/client/dogfood_client.go
@@ -7,9 +7,11 @@ package main

import (
"context"
"errors"
"flag"
"fmt"
"log"
"os"
"strconv"
"time"

@@ -56,7 +58,50 @@ func getCatalogWithTimeout(client pb.ChaosDogfoodClient) ([]*pb.CatalogItem, err
	return res.Items, nil
}

// regularly order food for different aniamls
func printAndLog(logLine string) {
	fmt.Println(logLine)

	go func() {
> **Reviewer (Contributor):** I'm confused, why do we want to write all this random data to `/mnt/data/logging` whenever we're trying to print to console? They don't seem to need to be so coupled.
>
> **Author (Contributor):** The point of `printAndLog` is to give the client write I/O. This is the function that will help us test disk write disruptions properly.
>
> **Reviewer (Contributor):** I know why we want to have the logging done, but why put it in `printAndLog`? It seems this could be in goroutines that run separately; I don't see why we kick it off once per print.
>
> **Author (Contributor):** I think this spawned from not knowing what data to write to disk at first. I think you can pull random data from somewhere, but I opted to just use the logs that were already in the code as data to write.

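		// Generate disk-write I/O alongside every console print: read ~500 KB
		// of random bytes from /dev/urandom and write them to /mnt/data/logging
		// so disk-write disruptions have traffic to act on.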
		f, err := os.OpenFile("/dev/urandom", os.O_RDONLY|os.O_SYNC, 0644)
		if err != nil {
			log.Fatal(err)
		}

		logLineBytes := make([]byte, 500000)

		_, err = f.Read(logLineBytes)
		if err != nil {
			log.Fatal(err)
		}

		if err = f.Close(); err != nil {
			log.Fatal(err)
		}

		if _, err = os.Stat("/mnt/data/logging"); errors.Is(err, os.ErrNotExist) {
			f, err = os.Create("/mnt/data/logging")
			if err != nil {
				log.Fatal(err)
			}
		} else {
			f, err = os.OpenFile("/mnt/data/logging", os.O_WRONLY|os.O_SYNC, 0644)
			if err != nil {
				log.Fatal(err)
			}
		}

		_, err = f.Write(logLineBytes)
		if err != nil {
			log.Fatal(err)
		}

		if err = f.Close(); err != nil {
			log.Fatal(err)
		}
	}()
}

// regularly order food for different animals
// note: mouse should return error because food for mice is not in the catalog
func sendsLotsOfRequests(client pb.ChaosDogfoodClient) {
	animals := []string{"dog", "cat", "mouse"}
@@ -66,24 +111,24 @@ func sendsLotsOfRequests(client pb.ChaosDogfoodClient) {

	for {
		// visually mark a new loop in logs
		fmt.Println("x")
		printAndLog("x")

		// grab catalog
		items, err := getCatalogWithTimeout(client)
		if err != nil {
			fmt.Printf("| ERROR getting catalog:%v\n", err.Error())
			printAndLog(fmt.Sprintf("| ERROR getting catalog:%v\n", err.Error()))
		}

		fmt.Printf("| catalog: %v items returned %s\n", strconv.Itoa(len(items)), stringifyCatalogItems(items))
		printAndLog(fmt.Sprintf("| catalog: %v items returned %s\n", strconv.Itoa(len(items)), stringifyCatalogItems(items)))
		time.Sleep(time.Second)

		// make an order
		order, err := orderWithTimeout(client, animals[i])
		if err != nil {
			fmt.Printf("| ERROR ordering food: %v\n", err.Error())
			printAndLog(fmt.Sprintf("| ERROR ordering food: %v\n", err.Error()))
		}

		fmt.Printf("| ordered: %v\n", order)
		printAndLog(fmt.Sprintf("| ordered: %v\n", order))
		time.Sleep(time.Second)

		// iterate
@@ -106,9 +151,10 @@ func stringifyCatalogItems(items []*pb.CatalogItem) string {

func main() {
	// create and eventually close connection
	fmt.Printf("connecting to %v...\n", serverAddr)
	printAndLog(fmt.Sprintf("connecting to %v...\n", serverAddr))

	var opts []grpc.DialOption

	opts = append(opts, grpc.WithInsecure())
	opts = append(opts, grpc.WithBlock())

@@ -126,5 +172,7 @@
	// generate and use client
	client := pb.NewChaosDogfoodClient(conn)

	printAndLog("We successfully generated the client, getting ready to send requests")

	sendsLotsOfRequests(client)
}
Loading