Skip to content

Commit

Permalink
feat(new source): Initial kubernetes_logs implementation (vectordot…
Browse files Browse the repository at this point in the history
…dev#2653)

* Add kubernetes-integration-tests feature

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Enable kubernetes tests

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add skaffold for quick development iterations

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add kubernetes mod and cargo feature

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add an HTTP client tailored for k8s API in an in-cluster environment

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a decoder for chained k8s responses

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add block_on_std to test_util

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add tools for processing HTTP bodies as streams of k8s responses

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add Watcher trait

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add ApiWatcher implementation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add MockWatcher implementation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a reflector implementation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a placeholder for kubernetes logs source

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add paths provider implementation based on pods state driven by reflector

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add parser for k8s log formats

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add partial events merger

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add pod metadata annotator

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add kubernetes logs source implementation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Change reflector errors to be less restrictive

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Better error handling

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Improve error handling

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Better error handling for event pipeline

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Abstract API watcher around watch request builder

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add assertions to the reflector internal state in-between the invocations

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix the comment at optional tranform mod

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a to do comment to make transform utils part of core

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix typo at kustomization.yaml

Signed-off-by: MOZGIII <mike-n@narod.ru>

* More flexible interface at resource version

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix the mod comment at resource version

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Remove manual minikube init from scripts/skaffold.sh

Skaffold is capable of detecting minikube itself. It will also work with
Docker Desktop on Windows and macOS out of the box.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Specify shell at scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch scripts/copy-docker-image-to-minikube.sh to use scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch to using device inodes for file fingerprinting

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix grammar at in-cluster config

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add the variable that's missing to the error at in-cluster config

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Move in_cluster mod declaration to the top of the file

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Cut some ununsed code from src/kubernetes/resource_version.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix typo at src/kubernetes/reflector.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Move the file key to the top of the file

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix comment at parsers test

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a comment for Config::self_node_name

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Allow disabling automatic partial merge

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Allow customizing the fields names used by pod metadata annotator

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Abstract reflector around state

Adds an indirection layer while maintaining the same logic.

This unlocks:
- adding debounce to the evmap for advanced state management;
- adding removal delay to mitigate the race condition with pod being
  removed while having a huge log backlog;
Those were doable before, but at the cost of code clarity, reliability and
maintainability.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add support for delayed delete at reflector

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Reimplement the tests to add delayed deletion testing capability

New channel-based mocks implementation allows to properly eliminate race
conditions and achieve the complete determinism and reliability of the
test scenario.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Improve the request preparation code at kubernetes::client::Client

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add reexports at src/kubernetes/mod.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Adjust and use watcher error constructors

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Eliminate unused transform

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a test for stream error behavior

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add tests and derives at transform::util::pick

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Require Watcher::Stream to be Send

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add instrumentation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add state maintenance and move delayed delete logic into to a state wrapper

This unlocks further complicating the logic around state without making
reflector overcomplicated. This better aligns with the goal of building
composable and testable loosely coupled components.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Ignore instrumenting state tests

There no way to assert individual emits, and asserting metrics directly
causes issues:
- these tests break the internal tests at the metrics implementation
  itself, since we end up initializing the metrics controller twice;
- testing metrics introduces unintended coupling between subsystems,
  ideally we only need to assert that we emit, but avoid assumptions on
  what the results of that emit are.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Clean up reflector tests

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add flush debouncing to the evmap state

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Proper delay control at main test flow

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add evmap tests with and without debounce

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix a typo

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Document the controversial join_host_port

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Improve instrumenting watcher events

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Improve api watcher events

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Rename init to new

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Use Strings at parser tests

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Corrected partial message detection hueristics at docker parser

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Hint for where's what at parser test case

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Bump base image at skaffold/docker/Dockerfile to debian:bullseye-slim

Without it local builds don't work host OS' with glibc 2.29

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Better script layout at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Convert an annotation failure warn log to internal event

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Correct the shutdown logic

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Specify STOPSIGNAL at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Set terminationGracePeriodSeconds at distribution/kubernetes/vector-namespaced.yaml

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix paths generation on Windows

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a to do to unignore instrumenting state tests

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add Kubernetes section to the CONTRIBUTING.md

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Solve the issue with the config generation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Better document the intent for the kubernetes_logs source

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Print vector version upon docker build at skaffold/docker/Dockerfile

This is so that we catch the error with inability to exec vector at
container build time, rather than at runtime.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add more build-time output at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add patchelf at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Optimize commands order at skaffold/docker/Dockerfile for layer caching

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add kubectl at CONTRIBUTING.md

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add additional details at CONTRIBUTING.md

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix the data dir description

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Move transform utils to the source mod to avoid introducing global items

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix the typo at CONTRIBUTING.md

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Eliminate the ApiWatcher::invoke_boxed_stream

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add code docs for join_host_port

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a lifecycle system to manage futures sanely and reliably

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Reorganized the mod and use clauses at src/sources/kubernetes_logs/mod.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch fingerprinter to checksum

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Revert "Switch fingerprinter to checksum"

Apparently it breaks everything. E2E tests start failing.
We need a better way.

This reverts commit a4c2a7d.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Invert the condition at Debounce::signal

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix typo at src/internal_events/kubernetes/instrumenting_state.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Remove the rate limit from KubernetesLogsEventReceived

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Adjust the log style at KubernetesLogsEventReceived

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Move pod metadata annotation to access file path directly

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add test to ensure MultiResponseDecoder doesn't leak memory

It was controversial, and so this test is added.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Simplified the picker logic and added tests

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a special case for `\n` at the MultiResponseDecoder::finish

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Workaround for watch API failures

This should've been fixed by now, but it's still causing issues with older
k8s versions that we want to support.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a long line test case for the CRI format parser

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Log the buffer state at response decoding error

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Handle data parsing errors as stream ends

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Ensure we only read logs under container name subdirectory

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Promote trace to an error at src/kubernetes/multi_response_decoder.rs

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Add a test case for bookmark parsing error

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Remove meaningless leading \n from the boorkmark test

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Update k8s-openapi to a branch with a fix for the bookmark parsing issue

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch kubernetes_logs source to Fingerprinter::FirstLineChecksum

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Revert "Handle data parsing errors as stream ends"

This reverts commit c0e1695.

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch k8s-openapi git repo to our fork

No actual changes there compared to upstream, just for preservation

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Use cargo patch instead of per-crate git spec for k8s-openapi

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Switch to the upstream of k8s-openapi

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Fix clippy offences

Signed-off-by: MOZGIII <mike-n@narod.ru>

* Bump k8s-openapi and switch to crates.io version

Signed-off-by: MOZGIII <mike-n@narod.ru>
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
  • Loading branch information
MOZGIII authored and Brian Menges committed Dec 9, 2020
1 parent f32ad73 commit efb1f24
Show file tree
Hide file tree
Showing 60 changed files with 6,078 additions and 25 deletions.
80 changes: 80 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ expanding into more specifics.
1. [Tips and Tricks](#tips-and-tricks)
1. [Benchmarking](#benchmarking)
1. [Profiling](#profiling)
1. [Kubernetes](#kubernetes)
1. [Humans](#humans)
1. [Documentation](#documentation)
1. [Changelog](#changelog)
Expand Down Expand Up @@ -547,6 +548,85 @@ cat stacks.folded | inferno-flamegraph > flamegraph.svg
And that's it! You now have a flamegraph SVG file that can be opened and
navigated in your favorite web browser.

### Kubernetes

There is a special flow for when you develop portions of Vector that are
designed to work with Kubernetes, like `kubernetes_logs` source or the
`deployment/kubernetes/*.yaml` configs.

This flow facilitates building Vector and deploying it into a cluster.

#### Requirements

There are some extra requirements besides what you'd normally need to work on
Vector:

* `linux` system (create an issue if you want to work with another OS and we'll
help);
* [`skaffold`](https://skaffold.dev/)
* [`docker`](https://www.docker.com/)
* [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
* [`kustomize`](https://kustomize.io/)
* [`minikube`](https://minikube.sigs.k8s.io/)-powered or other k8s cluster
* [`cargo watch`](https://github.com/passcod/cargo-watch)

#### The dev flow

Once you have the requirements, use the `scripts/skaffold.sh dev` command.

That's it, just one command should take care of everything!

It will

1. build the `vector` binary in development mode,
2. build a docker image from this binary via `skaffold/docker/Dockerfile`,
3. deploy `vector` into the Kubernetes cluster at your current kubectl context
using the built docker image and a mix of our production deployment
configuration from the `distribution/kubernetes/*.yaml` and the special
dev-flow configuration at `skaffold/manifests/*.yaml`; see
`kustomization.yaml` for the exact specification.

As the result of invoking the `scripts/skaffold.sh dev`, you should see
a `skaffold` process running on your local machine, printing the logs from the
deployed `vector` instance.

To stop the process, press `Ctrl+C`, and wait for `skaffold` to clean up
the cluster state and exit.

`scripts/skaffold.sh` wraps `skaffold`, you can use other `skaffold` subcommands
if it fits you better.

#### Troubleshooting

You might need to tweak `skaffold`, here are some hints:

* `skaffold` will try to detect whether a local cluster is used; if a local
cluster is used, `skaffold` won't push the docker images it builds to a
registry.
See [this page](https://skaffold.dev/docs/environment/local-cluster/)
for how you can troubleshoot and tweak this behavior.

* `skaffold` can rewrite the image name so that you don't try to push a docker
image to a repo that you don't have access to.
See [this page](https://skaffold.dev/docs/environment/image-registries/)
for more info.

* For the rest of the `skaffold` tweaks you might want to apply check out
[this page](https://skaffold.dev/docs/environment/).

#### Going through the dev flow manually

Is some cases `skaffold` may not work. It's possible to go through the dev flow
manually, without `skaffold`.

One of the important thing `skaffold` does is it patches the configuration to
tie things together. If you want to go without it, you'll have to take care of
that yourself, thus some additional knowledge of Kubernetes inner workings is
required.

Essentially, the steps you have to take to deploy manually are the same that
`skaffold` will perform, and they're outlined at the previous section.

## Humans

After making your change, you'll want to prepare it for Vector's users
Expand Down
86 changes: 71 additions & 15 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ strip-ansi-escapes = { version = "0.1.0"}
colored = "1.9"
warp = { package = "warp", version = "0.2", default-features = false, optional = true }
evmap = { version = "7", features = ["bytes"], optional = true }
evmap10 = { package = "evmap", version = "10", features = ["bytes"], optional = true }
logfmt = { version = "0.0.2", optional = true }
notify = "4.0.14"
once_cell = "1.3"
Expand All @@ -152,6 +153,7 @@ pulsar = { version = "1.0.0", default-features = false, features = ["tokio-runti
task-compat = "0.1"
cidr-utils = "0.4.2"
pin-project = "0.4.22"
k8s-openapi = { version = "0.9", features = ["v1_15"], optional = true }

# For WASM
vector-wasm = { path = "lib/vector-wasm", optional = true }
Expand Down Expand Up @@ -228,6 +230,10 @@ leveldb-cmake = ["leveldb", "leveldb/leveldb-sys-3"]
wasm = ["lucetc", "lucet-runtime", "lucet-wasi", "vector-wasm", "anyhow"]
wasm-timings = ["wasm"]

# Enables kubernetes dependencies and shared code. Kubernetes-related sources,
# transforms and sinks should depend on this feature.
kubernetes = ["k8s-openapi", "evmap10"]

# Sources
sources = [
"sources-docker",
Expand All @@ -246,6 +252,7 @@ sources = [
"sources-syslog",
"sources-tls",
"sources-vector",
"sources-kubernetes-logs",
]
sources-docker = ["bollard"]
sources-file = ["bytesize"]
Expand All @@ -263,6 +270,7 @@ sources-stdin = ["bytesize"]
sources-syslog = ["sources-socket", "syslog_loose"]
sources-tls = ["sources-http", "sources-logplex", "sources-socket", "sources-splunk_hec"]
sources-vector = ["sources-socket"]
sources-kubernetes-logs = ["kubernetes", "transforms-merge", "transforms-json_parser", "transforms-regex_parser"]

# Transforms
transforms = [
Expand Down Expand Up @@ -427,6 +435,7 @@ kafka-integration-tests = ["sources-kafka", "sinks-kafka"]
loki-integration-tests = ["sinks-loki"]
pulsar-integration-tests = ["sinks-pulsar"]
splunk-integration-tests = ["sinks-splunk_hec", "warp"]
kubernetes-integration-tests = ["sources-kubernetes-logs"]

shutdown-tests = ["sources","sinks-console","sinks-prometheus","sinks-blackhole","unix","rdkafka","transforms-log_to_metric","transforms-lua"]
disable-resolv-conf = []
Expand Down
9 changes: 5 additions & 4 deletions distribution/kubernetes/vector-namespaced.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,12 @@ data:
# Configuration for vector.
# Docs: https://vector.dev/docs/
# Configure the controlled by the deployment.
# Data dir is location controlled at the `DaemonSet`.
data_dir = "/vector-data-dir"
# Ingest logs from Kubernetes.
[sources.kubernetes]
type = "kubernetes"
[sources.kubernetes_logs]
type = "kubernetes_logs"
---
apiVersion: apps/v1
kind: DaemonSet
Expand All @@ -28,11 +28,11 @@ spec:
metadata:
labels:
name: vector
vector.dev/exclude: "true"
spec:
containers:
- name: vector
image: timberio/vector:latest-alpine
imagePullPolicy: Always
args:
- --config
- /etc/vector/*.toml
Expand Down Expand Up @@ -61,6 +61,7 @@ spec:
- name: config-dir
mountPath: /etc/vector/
readOnly: true
terminationGracePeriodSeconds: 60
tolerations:
# This toleration is to have the daemonset runnable on master nodes.
# Remove it if your masters can't run pods.
Expand Down
10 changes: 10 additions & 0 deletions kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# This is a part of our skaffold setup for development.
# Do not use in production.

namespace: vector

resources:
- distribution/kubernetes/vector-global.yaml
- skaffold/manifests/namespace.yaml
- skaffold/manifests/config.yaml
- distribution/kubernetes/vector-namespaced.yaml
3 changes: 2 additions & 1 deletion scripts/copy-docker-image-to-minikube.sh
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,8 @@ docker save "${IMAGES[@]}" | gzip >"$IMAGES_ARCHIVE"
# Start a subshell to preserve the env state.
(
# Switch to minikube docker.
eval "$(minikube --shell bash docker-env)"
# shellcheck source=minikube-docker-env.sh disable=SC1091
. scripts/minikube-docker-env.sh

# Load images.
docker load -i "$IMAGES_ARCHIVE"
Expand Down
9 changes: 9 additions & 0 deletions scripts/minikube-docker-env.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
set -euo pipefail

if ! COMMANDS="$(minikube --shell bash docker-env)"; then
echo "Unable to obtain docker env from minikube; is minikube started?" >&2
exit 7
fi

eval "$COMMANDS"
Loading

0 comments on commit efb1f24

Please sign in to comment.