[CONSUL-415] Create Scenarios Troubleshooting Docs (#49)
joselo85 committed Dec 21, 2022
1 parent e606698 commit 56c677a
Showing 1 changed file with 59 additions and 52 deletions.
111 changes: 59 additions & 52 deletions test/integration/connect/envoy/WindowsTroubleshooting.md
@@ -1,76 +1,83 @@
# Envoy Integration Tests on Windows

# Windows operation
## Index

## Steps for Windows operation
- [About this Guide](#about-this-guide)
- [Prerequisites](#prerequisites)
- [Running the Tests](#running-the-tests)
- [Troubleshooting](#troubleshooting)
- [About Envoy Integration Tests on Windows](#about-envoy-integration-tests-on-windows)
- [Common Errors](#common-errors)
- [Windows Scripts Changes](#windows-scripts-changes)
- [Volume Issues](#volume-issues)

- GO installation
- Library installation
- Build Images Execution
- From a Bash console execute: `./build-images.sh`
- Execution of the tests
- It is important to execute the CMD or Powershell tests
## About this Guide

In this guide you will find all the information required to run the Envoy integration tests on Windows.

### Common errors
## Prerequisites

If the tests are executed without docker running, the following error will be seen:
```shell
error during connect: This error may indicate that the docker daemon is not running.: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile-bats-windows&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&shmsize=0&t=bats-verify&target=&ulimits=null&version=1": open //./pipe/docker_engine: The system cannot find the file specified.
```
To run the integration tests you will need to have the following installed on your system:

If any of the docker images does not exist or is mistagged, an error similar to the following will be displayed:
```powershell
Error response from daemon: No such container: envoy_workdir_1
```
- Go v1.18 (or later).
- Gotestsum library [installation](https://pkg.go.dev/gotest.tools/gotestsum).
- Docker.

If you run the Windows tests from WSL you will get the following error message:
```powershell
main_test.go:34: command failed: exec: "cmd": executable file not found in $PATH
```
Before running the tests, you will need to build the required Docker images. To do so, you can use the script provided [here](../../../../build-support-windows/build-images.sh):

## Considerations on differences in scripts
- Build Images Script Execution
  - From a Bash console (Git Bash or WSL), execute: `./build-images.sh`
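
The prerequisites listed above can be checked before building the images. The following is a minimal sketch and not part of the repository: `missing_tools` is a hypothetical helper name, and it assumes the tools are available on the `PATH` as `go`, `gotestsum` and `docker`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repository).
# Prints the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Example usage: report anything missing before building the images.
missing=$(missing_tools go gotestsum docker)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing >&2
fi
```

This only confirms that the commands exist. It does not confirm that the Docker daemon is running, which is covered under [Common Errors](#common-errors).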

- Creation of a new directory test case that includes the basic Windows configuration files. These configuration files include the definition of "local_service_address".
- The "http-addr", "grpc-addr" and "admin-access-log-path" flags were added to the creation of the Envoy Bootstrap files.
- The so called "sh" were changed for "Bash" calls in Windows containers.
- The creation of a function that recovers the IP of a docker container mounted.
- The IP address of the consultation is used in the "setup_upsert_l4_intention" function.
- The "config-dir" path of the creation of the images of "envoy_consul" was adapted to adapt to the Windows format.
- The use of the function "stop_and_copy_files" was included after the creation of the bootstrap files to include these among the shared files in the volume.
## Running the Tests

To execute the tests you need to run the following command depending on the shell you are using:
**On PowerShell**:
`go test -v -timeout=30m -tags integration ./test/integration/connect/envoy -run="TestEnvoy/<TEST CASE>" -win=true`
Where **TEST CASE** is the individual test case we want to execute (e.g. case-badauthz).

**On Git Bash**:
`ENVOY_VERSION=<ENVOY VERSION> go test -v -timeout=30m -tags integration ./test/integration/connect/envoy -run="TestEnvoy/<TEST CASE>" -win=true`
Where **TEST CASE** is the individual test case we want to execute (e.g. case-badauthz), and **ENVOY VERSION** is the version which you are currently testing.
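
To run several test cases in a row from Git Bash, the command above can be wrapped in a small loop. This is a sketch under assumptions: `run_envoy_cases` and the `GO_BIN` override are hypothetical names that are not part of the repository, and `ENVOY_VERSION` is expected to be exported by the caller beforehand.

```bash
# Hypothetical wrapper (not part of the repository): runs each named test
# case in turn with the flags shown above and stops at the first failure.
run_envoy_cases() {
  local go_bin="${GO_BIN:-go}"   # override for a dry run, e.g. GO_BIN=echo
  local test_case
  for test_case in "$@"; do
    "$go_bin" test -v -timeout=30m -tags integration \
      ./test/integration/connect/envoy \
      -run="TestEnvoy/${test_case}" -win=true || return 1
  done
}
```

For example, after `export ENVOY_VERSION=<ENVOY VERSION>`, calling `run_envoy_cases case-badauthz` runs that single case. Setting `GO_BIN=echo` prints the arguments instead of running the tests.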

## Difference over network types
> [!TIP]
> When executing the integration tests using **PowerShell** you may need to set the `ENVOY_VERSION` value manually in line 20 of the [run-tests.windows.sh](run-tests.windows.sh) file.
There are fundamental differences in networking between the Linux version and the Windows version. In Linux, a Host type network is used that links all the containers through localhost, but this type of network does not exist in Windows. In Windows, the use of a NAT-type network was chosen. This difference is the cause of many of the problems that exist when running the tests on Windows since, in Host-type networks, any call to localhost from any of the containers would refer to the Docker host. This brings problems in two different categories, on the one hand when it comes to setting up the containers required for the test environment and on the other hand when running the tests themselves.
## Troubleshooting

### Differences when lifting containers
### About Envoy Integration Tests on Windows

When building the test environment in the current architecture running with Windows, we find that there are problems linking the different containers with Consul. Many default settings are used in the Linux scheme. This assumes that the services are running on the same machine, so it checks pointing to "localhost". But, in windows architecture these configurations don't work since each container is considered an independent entity with its own localhost. In this aspect, the registration of the services in consul had to be modified so that they included the address of the sidecar, since without it the connection to the services is not made.
Integration tests on Linux run a multi-container architecture that takes advantage of Docker's host networking feature. With host networking, the container's network stack is not isolated from the Docker host (the container shares the host's networking namespace), and the container does not get its own IP address allocated (read more about this [here](https://docs.docker.com/network/host/)). This feature is only available on Linux, which made migrating the tests to Windows challenging, since replicating the same architecture created more issues. That is why a **single container** architecture was chosen to run the Envoy integration tests.
Using a single container architecture meant that we could use the same tests as on Linux. Moreover, we were able to speed up their execution by replacing the *docker run* commands that started utility containers with *docker exec* commands.

### Common errors

If the tests are executed without docker running, the following error will be seen:

```powershell
services {
connect {
sidecar_service {
proxy {
local_service_address = "s1-sidecar-proxy"
}
}
}
}
error during connect: This error may indicate that the docker daemon is not running.: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile-bats-windows&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&shmsize=0&t=bats-verify&target=&ulimits=null&version=1": open //./pipe/docker_engine: The system cannot find the file specified.
```

### Differences in test calls

The tests are carried out from the **envoy_verify-primary_1** bats container, in all cases pointing to localhost to verify some feature. When pointing to localhost, within the windows network it takes it as if it were pointing to itself and for that reason they fail. To solve it, a function was created that maps each port with a hostname and from there locates the assigned IP and returns the corresponding IP and port.
If any of the docker images does not exist or is mistagged, an error similar to the following will be displayed:

```powershell
@test "s1 proxy admin is up on :19000" {
retry_default curl -f -s localhost:19000/stats -o /dev/null
}
Error response from daemon: No such container: envoy_workdir_1
```

If you run the Windows tests from WSL you will get the following error message:

ADDRESS=$(nslookup envoy_s1-sidecar-proxy_1)
CONTAINER_HOSTPORT="${HOSTPORT/127.0.0.1:19000/"${ADDRESS}:19000"}"
```bash
main_test.go:34: command failed: exec: "cmd": executable file not found in $PATH
```
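
A guard can catch the WSL case before the tests start. The sketch below is an illustration and not part of the repository: `is_wsl` is a hypothetical helper, and it relies on the common heuristic that the kernel version string in `/proc/version` mentions "microsoft" under WSL.

```bash
# Hypothetical helper (not part of the repository).
# Returns 0 when the kernel version string mentions Microsoft, which is the
# usual heuristic sign of WSL. The file path is a parameter to ease testing.
is_wsl() {
  local version_file="${1:-/proc/version}"
  [ -r "$version_file" ] && grep -qi 'microsoft' "$version_file"
}

if is_wsl; then
  echo "WSL detected: run the tests from PowerShell or Git Bash instead." >&2
fi
```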

## Problems with the Workdir
## Windows Scripts Changes

- The "http-addr", "grpc-addr" and "admin-access-log-path" flags were added to the creation of the Envoy Bootstrap files.
- To execute commands, *sh* was replaced with *bash* in our Windows container.
- All paths were updated to use Windows format.
- Created *stop_and_copy_files* function to copy files into the shared volume (see [volume issues](#volume-issues)).
- Changed the *-admin-bind* value from `0.0.0.0` to `127.0.0.1` when generating the Envoy Bootstrap files.
- Removed the `&&` from the *common_run_container_service* docker exec command and replaced it with `\`.

## Volume Issues

A problem was found with the method set for creating the volume. The way the volume is currently created in Windows creates a static volume. This means that every time you want to reflect a change in the containers, it must be deleted and recreated. For this reason, every time a file is required to be modified from outside the application, the **stop_and_copy_files** function must be executed.
Another difference that arose when migrating the tests from Linux to Windows is that file system operations can't be executed while Windows containers are running. Currently, when running the tests, a **named volume** is created and all of the required files are copied into that volume. Because of this constraint, we implemented a workaround: the **stop_and_copy_files** function stops the *kubernetes/pause* container, executes a script to copy the required files, and finally starts the container again.
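
The stop, copy, start sequence described above can be illustrated roughly as follows. This is not the repository's **stop_and_copy_files** implementation. It is a sketch that assumes plain `docker stop`, `docker cp` and `docker start` are sufficient, with a hypothetical function name (`stop_copy_start`), a hypothetical `DOCKER_BIN` override, and placeholder container and path arguments supplied by the caller.

```bash
# Illustrative only: NOT the repository's stop_and_copy_files implementation.
# The container name and paths are placeholders passed in by the caller.
stop_copy_start() {
  local docker_bin="${DOCKER_BIN:-docker}"  # override for a dry run
  local container="$1" src="$2" dest="$3"
  # Stop first, because files cannot be copied while the container runs.
  "$docker_bin" stop "$container" &&
    "$docker_bin" cp "$src" "${container}:${dest}" &&
    "$docker_bin" start "$container"
}
```

Chaining the steps with `&&` means the container is restarted only if the copy succeeded, so a failed copy leaves it stopped and easy to spot.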
