RFC 15056 - 2022-10-31 - Tooling revamp

This RFC discusses improving Vector's developer tooling in order to ease maintenance and expand testing possibilities.

Scope

In scope

Begin the standardization of interfacing with the repository (building, testing, releasing, etc.)
Define a version support matrix for each integration
Make developer UX consistent across platforms

Out of scope

Non-integration tests

Pain

Test failures are difficult to debug, esp. since this change
Windows is essentially unsupported since Make takes a great deal of effort to install and most scripts require Bash and utilities like find
Adding tests for new integrations to CI (here and here) is a manual and occasionally forgotten step (see outstanding questions)
Makefiles and scripts can get messy fast and often are hard to scale well

Proposal

User Experience

Interaction with the repository will be done with a new tool called vdev, which will eventually replace most Bash and all Ruby scripts. (PoC in #14990)

Vector's unified dev tool

Usage: vdev [OPTIONS] <COMMAND>

Commands:
  build   Build Vector
  config  Manage the vdev config file
  exec    Execute a command within the repository
  int     Manage integrations
  meta    Collection of useful utilities
  status  Show information about the current environment

Options:
  -v, --verbose...  More output per occurrence
  -q, --quiet...    Less output per occurrence
  -h, --help        Print help information
  -V, --version     Print version information

Implementation

The integration test directory will become nested, with each integration having its own directory, which will be a Rust crate.

Environment management

Each project will expose a CLI binary with 2 commands:

start - will set up the environment, create mock test data (like hitting some endpoint), wait until it's "ready", etc.
stop - will tear down the environment

Both commands receive a single argument representing the JSON configuration of the environment generated by the matrix (see below).

Note that while most environments will continue to be Docker-based, this unlocks the ability to use Kubernetes, Terraform, or do arbitrary things.

Configuration

Each directory will have a test.yaml with the following options:

args - default array of arguments to pass to the test command
env - table of environment variables that will be set during tests
matrix - array of tables used to generate the environments

For example, the Elasticsearch setup might be:

args:
- --features
- es-integration-tests
- --lib
- "::elasticsearch::integration_tests::"

env:
  AWS_ACCESS_KEY_ID: dummy
  AWS_SECRET_ACCESS_KEY: dummy
  ELASTICSEARCH_AWS_ADDRESS: http://localstack:4571
  ELASTICSEARCH_HTTP_ADDRESS: http://elasticsearch:9200
  ELASTICSEARCH_HTTPS_ADDRESS: https://elasticsearch-secure:9200

matrix:
- version: ["7.13.1"]
  type: ["classic"]

We could then extend the support matrices:

matrix:
- version: ["7.13.1", "8.4.3"]
  type: ["classic"]

- version: ["1.3.6", "2.3.0"]
  type: ["opensearch"]

That would indicate the following unique environments:

7.13.1-classic
8.4.3-classic
1.3.6-opensearch
2.3.0-opensearch

If we were testing the 8.4.3-classic environment the management CLI would get the following payload:

{
    "type": "classic",
    "version": "8.4.3"
}

Interface

The vdev int command would have the following sub-commands:

show - accepts an integration name and shows the available/running environments
start - calls the equivalently named command of the management CLI. requires an integration name and environment name. the integration names refer to the names of the integration test directories
stop - ^^
test - executes the tests and will terminate with the same exit code. accepts an integration name, environment name, and arbitrary extra arguments to pass to the test command. if no environment name is provided then all will be tested and any not already started will be torn down
```
vdev int test elasticsearch
vdev int test elasticsearch 7.13.1-classic
vdev int test elasticsearch 8.4.3-classic -- --no-capture
```

Testing container

There will be a single container per integration for running tests and the entry point will be /bin/sleep infinity.

Drawbacks

Slight change in process for all developers which would require communication and docs

Prior Art

ddev is a tool specific to Datadog Agent integrations
Hatch provides environment management and the matrix syntax we borrowed

Alternatives

Status quo

Could ignore all pain and add some more Docker compose test files

Python

Could call environments' start/stop methods through Hatch to avoid maintaining matrix logic
Could write all tooling in Python to theoretically increase development velocity

Since Rust is already a requirement, it's easiest to run a single Cargo command to install the tooling. Introducing a dependency on Python at any point would not just require installing the right version of Python but also installing tools inside of a dedicated virtual environment (to avoid dependency conflicts) and adding its bin (Scripts on Windows) directory to PATH. pipx could help with that.

Python development on macOS can be particularly painful.

Plan Of Attack

Merge the PoC of vdev
Implement the testing logic and add step to CI
For every integration ported remove it from the Makefile
When complete, enable integration tests on all PRs (even for contributors)

Future Improvements

Command to generate the scaffolding for new integrations
Shared test utilities like for running Docker compose
Only run an integration's tests on PRs for changes. The test matrix on PRs could be computed (example) by vdev based on what has changed. This will save somewhere between 6.5-8 hours of billable CI time per commit.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2022-10-31-15056-tooling-revamp.md

2022-10-31-15056-tooling-revamp.md

RFC 15056 - 2022-10-31 - Tooling revamp

Table of Contents

Scope

In scope

Out of scope

Pain

Proposal

User Experience

Implementation

Environment management

Configuration

Interface

Testing container

Drawbacks

Prior Art

Alternatives

Status quo

Python

Plan Of Attack

Future Improvements

Files

2022-10-31-15056-tooling-revamp.md

Latest commit

History

2022-10-31-15056-tooling-revamp.md

File metadata and controls

RFC 15056 - 2022-10-31 - Tooling revamp

Table of Contents

Scope

In scope

Out of scope

Pain

Proposal

User Experience

Implementation

Environment management

Configuration

Interface

Testing container

Drawbacks

Prior Art

Alternatives

Status quo

Python

Plan Of Attack

Future Improvements