
Add common configuration and troubleshooting tasks #408

Merged

merged 8 commits on Feb 12, 2020
43 changes: 43 additions & 0 deletions deploy/docs/Best_Practices.md
@@ -93,3 +93,46 @@ $ helm upgrade collection sumologic/sumologic --reuse-values -f values.yaml
See the following links to official Fluentd buffer documentation:
- https://docs.fluentd.org/configuration/buffer-section
- https://docs.fluentd.org/buffer/file

### Excluding Logs From Specific Components

You can exclude specific logs from being sent to Sumo Logic by setting the following parameters, either in the `values.yaml` file or on the `helm install` command line (a combined example follows the list below).
```
excludeContainerRegex
excludeHostRegex
excludeNamespaceRegex
excludePodRegex
```

- These parameters accept Ruby regular expressions, so all Ruby regex rules apply. Unlike regex in the Sumo Logic collector, you do not need to match the entire line. To match multiple patterns, put them inside parentheses and separate them with a pipe (`|`).
- For dynamic names such as pods and containers, you will need to add `.*` at the end, because the generated suffix changes between deployments. Example:
```bash
excludePodRegex: "(dashboard.*|sumologic.*)"
```
- For static names such as namespaces, you do not need `.*` at the end since there is no dynamic suffix. Example:
```bash
excludeNamespaceRegex: "(sumologic|kube-public)"
```
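
A minimal combined sketch of how these parameters might look in `values.yaml`. The `sumologic:` parent key is an assumption here; check your chart version's default `values.yaml` for the exact placement before copying:

```yaml
# Assumed placement under the `sumologic` key; verify against your chart's default values.yaml.
sumologic:
  excludeContainerRegex: ""
  excludeHostRegex: ""
  excludeNamespaceRegex: "(sumologic|kube-public)"
  excludePodRegex: "(dashboard.*|sumologic.*)"
```

The same values can be passed on the command line, for example `--set sumologic.excludePodRegex="(dashboard.*|sumologic.*)"` (again assuming the `sumologic.` prefix).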

### Adding a Local File to the Fluent-Bit Configuration

If a container writes its logs to a local file instead of stdout, you will need to ensure that file is mounted onto the host so fluent-bit can be configured to tail it from the host path.

Example:
In the fluent-bit overrides file (https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/fluent-bit/overrides.yaml), add a new input to the `rawConfig` section specifying the file path, e.g.

```bash
[INPUT]
Name tail
Path /var/log/syslog
```
Reference: https://fluentbit.io/documentation/0.12/input/tail.html
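
A slightly fuller sketch of the same input using standard `tail` options; the path, tag, and tuning values are illustrative:

```bash
[INPUT]
    Name              tail
    # Illustrative host path; it must be mounted into the fluent-bit pods
    Path              /var/log/syslog
    Tag               host.syslog
    Refresh_Interval  5
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
```

The path must be visible inside the fluent-bit containers, so make sure the corresponding host directory is mounted into the daemonset as described above.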

### Filtering Prometheus Metrics by Namespace in the Remote Write Config
If you want to filter metrics by namespace, it can be done in the Prometheus remote write config. Here is an example that drops kube-state-metrics for specific namespaces.
```bash
- action: drop
regex: kube-state-metrics;(namespace1|namespace2)
sourceLabels: [job, namespace]
```
The above section should be added to each of the kube-state-metrics remote write blocks.
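
For context, a sketch of how this rule might sit inside one of those remote write blocks in the Prometheus overrides; the URL and the surrounding `keep` rule are illustrative and will differ in your configuration:

```yaml
remoteWrite:
  # Illustrative kube-state endpoint; use the URLs already present in your overrides
  - url: http://collection-sumologic.sumologic.svc.cluster.local:9888/prometheus.metrics.state
    writeRelabelConfigs:
      # Existing keep rule for kube-state-metrics series (illustrative)
      - action: keep
        regex: kube-state-metrics;(?:kube_pod_container_status.*|kube_node_status.*)
        sourceLabels: [job, __name__]
      # Drop kube-state-metrics coming from the listed namespaces
      - action: drop
        regex: kube-state-metrics;(namespace1|namespace2)
        sourceLabels: [job, namespace]
```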
10 changes: 10 additions & 0 deletions deploy/docs/Troubleshoot_Collection.md
@@ -257,6 +257,16 @@ helm install stable/prometheus-operator --name prometheus-operator --namespace s

There’s an issue with backwards compatibility in the current version of the prometheus-operator helm chart that requires us to override the selectors for kube-scheduler and kube-controller-manager in order to see metrics from them. If you are not seeing metrics from these two targets, try running the commands in the "Configure Prometheus" section [here](./Non_Helm_Installation.md#missing-metrics-for-controller-manager-or-scheduler).

### Prometheus stuck in `Terminating` state after running `helm del collection`
Delete the pod forcefully by adding `--force --grace-period=0` to the `kubectl delete pod` command.
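For example, assuming the stuck pod is the Prometheus pod in the `sumologic` namespace (both names are illustrative):

```bash
# Force-delete the stuck pod; substitute the actual pod name and namespace
kubectl delete pod prometheus-collection-prometheus-0 \
  --namespace sumologic --force --grace-period=0
```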


### Validation error in helm installation
```bash
Error: validation failed: [unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"]
```
This is a known race condition with Helm that cannot be prevented at this time. If it happens, re-run the `helm install` command with the `--no-crd-hook` flag added.
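
For example, with Helm 2 syntax and illustrative release name, namespace, and values file (reuse whatever arguments you passed to the original install):

```bash
# Re-run the original install with --no-crd-hook added; arguments are illustrative
helm install sumologic/sumologic --name collection \
  --namespace sumologic -f values.yaml --no-crd-hook
```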

### Rancher

If you are running the out-of-the-box Rancher monitoring setup, you cannot run our Prometheus Operator alongside it. The Rancher Prometheus Operator setup will kill and permanently terminate our Prometheus Operator instance and prevent the metrics system from coming up.