restructuring (watcher stuff last)
Signed-off-by: clux <sszynrae@gmail.com>
clux committed Nov 8, 2023
1 parent b613705 commit 4475038
Showing 2 changed files with 36 additions and 30 deletions.
65 changes: 35 additions & 30 deletions docs/troubleshooting.md
@@ -1,8 +1,34 @@
# Troubleshooting

Problems with [Api] commands failing are often RBAC or naming related and can be identified either in error codes or logs. Some common problems and solutions are explored herein.
Problems with [Api] commands failing are often caused by RBAC or misspelled names (used for URL construction), and can usually be identified via error codes and logs. Some common problems and solutions are explored herein.

See [[observability#adding-logs]] for how to set up tracing subscribers so that `kube` logs get printed.
!!! note "Logs are a prerequisite"

    See [[observability#adding-logs]] for how to set up tracing subscribers properly ([env_logger] also works).

## Request Inspection

If you are replicating `kubectl` behaviour, then you can cross-reference with logs.

Given an [example alpine pod](https://github.com/kube-rs/kube/blob/12bd223e0a7ef49c4ed0420a169e6c1bc3c1e214/examples/pod_exec.rs#L19-L29), we will run `exec` on a shell loop, and use `-v=9` to look for a `curl` expression in a large (and abbreviated) amount of debug output to see what we actually tell the apiserver to do:

```sh
$ kubectl exec example -it -v=9 -- sh -c 'for i in $(seq 1 3); do date; done'

round_trippers.go:466] curl -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" \
'https://0.0.0.0:64262/api/v1/namespaces/kube-system/pods/example/exec?command=sh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+date%3B+done&container=example&stdin=true&stdout=true&tty=true'
```

This URL and its query parameters can be cross-referenced with the logs from `kube_client`.

A very similar call is made [from the `pod_exec` example](https://github.com/kube-rs/kube/blob/12bd223e0a7ef49c4ed0420a169e6c1bc3c1e214/examples/pod_exec.rs#L57-L63), and when running with `RUST_LOG=debug` we can find a "requesting" debug line with the URL used:

```sh
$ RUST_LOG=debug cargo run --example pod_exec
DEBUG HTTP{http.method=GET http.url=https://0.0.0.0:64262/api/v1/namespaces/kube-system/pods/example/exec?&stdout=true&command=sh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+date%3B+done otel.name="exec" otel.kind="client"}: kube_client::client::builder: requesting
```

Then we can investigate whether our query parameters match what is expected (in this case the differences are in the stream and tty parameters).
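The comparison can also be done mechanically rather than by eye. The following is a minimal, std-only sketch; `query_pairs` and `query_diff` are hypothetical helper names (not part of `kube`), and the pairs are compared as raw strings without percent-decoding:

```rust
use std::collections::BTreeSet;

/// Split the query string of a URL into a sorted set of raw `key=value` pairs.
/// Empty segments (such as the one produced by `?&`) are skipped.
/// Assumption: no percent-decoding is needed for the comparison.
fn query_pairs(url: &str) -> BTreeSet<&str> {
    url.split_once('?')
        .map(|(_, query)| query)
        .unwrap_or("")
        .split('&')
        .filter(|pair| !pair.is_empty())
        .collect()
}

/// Return the pairs found only in `a` and the pairs found only in `b`.
fn query_diff<'a>(a: &'a str, b: &'a str) -> (Vec<&'a str>, Vec<&'a str>) {
    let (qa, qb) = (query_pairs(a), query_pairs(b));
    (
        qa.difference(&qb).copied().collect(),
        qb.difference(&qa).copied().collect(),
    )
}
```

Calling `query_diff` with the `kubectl` URL first and the `kube_client` URL second would report `container=example`, `stdin=true` and `tty=true` as present only in the `kubectl` request, and nothing as present only in the `kube_client` request.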

## Access

@@ -18,7 +44,7 @@ And they should be visible directly provided you are actually printing your error
If you turn up logging to `RUST_LOG=kube=debug` you should also see most errors internally.
### Watcher Errors
## Watcher Errors
A [watcher] will expose [watcher::Error] as the error part of its `Stream` items. If these errors are discarded, it might lead to a continuously failing and retrying program.
@@ -38,20 +64,23 @@ If you are not printing the watcher errors yourself, you can get them via logs f
WARN kube_runtime::watcher: watcher error 403: Api(ErrorResponse { status: "Failure", message: "documents.kube.rs is forbidden: User \"system:serviceaccount:default:doc-controller\" cannot watch resource \"documents\" in API group \"kube.rs\" at the cluster scope", reason: "Forbidden", code: 403 })
```
#### Watcher Error Handling
## Stream Errors
Because of the soft-error policy on stream errors, it's useful to consider what to do with errors in general from infinite streams.
The __easiest__ error handling setup is to tear down the application on any error by (say) passing stream errors through a `try_for_each` (à la [pod_watcher](https://github.com/kube-rs/kube/blob/5813ad043e00e7b34de5e22a3fd983419ece2493/examples/pod_watcher.rs#L26-L33)) or a `try_next` loop (à la [event_watcher](https://github.com/kube-rs/kube/blob/5813ad043e00e7b34de5e22a3fd983419ece2493/examples/event_watcher.rs#L39-L43)).
!!! note "Crashing in-cluster"
If you are deployed in-cluster, don't be afraid to exit early on errors you don't expect. Exits are easier to handle than a badly running app in a confusing state. By crashing, you get [retry with backoff](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) for free, plus you often get alerts such as [KubePodCrashLooping](https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodcrashlooping/) triggering (without instrumentation needed).
If you are deployed in-cluster, don't be afraid to exit(1)/crash early on errors you don't expect. Exits are easier to handle than a badly running app in a confusing state. By crashing, you get [retry with backoff](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) for free, plus you often get alerts such as [KubePodCrashLooping](https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodcrashlooping/) triggering (without instrumentation needed).
While easy, early exits are not the best solution:
- __Locally__, having a CLI abruptly exit is a bad user experience.
- __In-cluster__, frequent restarts of a large app with many spurious non-fatal conditions can mask underlying problems.
- Early exits throw cancel-safety and state transaction concerns out the window.
For controllers with multiple watchers, [[observability#Adding Metrics]] is customary, so that you can alert on percentage error rates over a time span.
For controllers with multiple watchers, [[observability#Adding Metrics]] is instead customary, so that you can alert on percentage error rates over a time span (telling the operator to go look at logs for why).
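As an illustration of what an error-rate signal could look like, here is a minimal, std-only sketch. `ErrorWindow` is a hypothetical type (not part of `kube`), and it tracks the last N stream items rather than a real time span; a production controller would more likely export counters to a metrics system and compute the rate there.

```rust
use std::collections::VecDeque;

/// Tracks whether each of the most recent `capacity` stream items was an error.
/// Assumption: a fixed item count stands in for the "time span" of a real alert.
struct ErrorWindow {
    outcomes: VecDeque<bool>, // true = the item was an error
    capacity: usize,
}

impl ErrorWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one outcome");
        Self { outcomes: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record one stream item, evicting the oldest outcome once the window is full.
    fn record<T, E>(&mut self, item: &Result<T, E>) {
        if self.outcomes.len() == self.capacity {
            self.outcomes.pop_front();
        }
        self.outcomes.push_back(item.is_err());
    }

    /// Percentage of errors among the recorded outcomes (0.0 when nothing is recorded).
    fn error_percentage(&self) -> f64 {
        if self.outcomes.is_empty() {
            return 0.0;
        }
        let errors = self.outcomes.iter().filter(|is_err| **is_err).count();
        100.0 * errors as f64 / self.outcomes.len() as f64
    }
}
```

Each watcher loop would call `record` on every item it receives (logging the error as before), and an alert or shutdown decision can then be based on `error_percentage` crossing a threshold instead of on any single failure.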
It is also common to check for **blocker errors** up-front before starting an infinite watch stream:
@@ -68,29 +97,5 @@ watcher(docs, conf).try_for_each(|_| future::ready(Ok(()))).await?;
This is a particularly common error case since CRD installation is often managed out-of-band from the application and thus often neglected.
### Request Inspection
If you are replicating `kubectl` behaviour, then you can cross-reference with logs.
Given an [example alpine pod](https://github.com/kube-rs/kube/blob/12bd223e0a7ef49c4ed0420a169e6c1bc3c1e214/examples/pod_exec.rs#L19-L29), we will run `exec` on a shell loop, and use `-v=9` to look for a `curl` expression in a large (and abbreviated) amount of debug output to see what we actually tell the apiserver to do:
```sh
$ kubectl exec example -it -v=9 -- sh -c 'for i in $(seq 1 3); do date; done'
round_trippers.go:466] curl -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" \
'https://0.0.0.0:64262/api/v1/namespaces/kube-system/pods/example/exec?command=sh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+date%3B+done&container=example&stdin=true&stdout=true&tty=true'
```
This URL and its query parameters can be cross-referenced with the logs from `kube_client`.
A very similar call is made [from the `pod_exec` example](https://github.com/kube-rs/kube/blob/12bd223e0a7ef49c4ed0420a169e6c1bc3c1e214/examples/pod_exec.rs#L57-L63), and when running with `RUST_LOG=debug` we can find a "requesting" debug line with the URL used:
```sh
$ RUST_LOG=debug cargo run --example pod_exec
DEBUG HTTP{http.method=GET http.url=https://0.0.0.0:64262/api/v1/namespaces/kube-system/pods/example/exec?&stdout=true&command=sh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+date%3B+done otel.name="exec" otel.kind="client"}: kube_client::client::builder: requesting
```
Then we can investigate whether our query parameters match what is expected (in this case the differences are in the stream and tty parameters).
--8<-- "includes/abbreviations.md"
--8<-- "includes/links.md"
1 change: 1 addition & 0 deletions includes/links.md
@@ -95,3 +95,4 @@
[Server-Side Apply]: https://kubernetes.io/docs/reference/using-api/server-side-apply/
[k3d]: https://k3d.io/
[JsonSchema]: https://docs.rs/schemars/latest/schemars/trait.JsonSchema.html
[env_logger]: https://docs.rs/env_logger/latest/env_logger/
