Switch from ovn-nbctl, ovn-sbctl, and ovs-vsctl to using ovsdbapp #136

Status: Closed. putnopvut wants to merge 36 commits.

Commits (36, all by putnopvut):
65d9ece  generate_hosts: Small refactor (Aug 29, 2022)
ac7c7e2  Add get-config-value.py script. (Sep 21, 2022)
ca5ee1e  ovn-tester: Containerize the tester. (Sep 21, 2022)
c1c8996  ovn-tester: Use ovsdbapp in place of ovs-vsctl (Aug 24, 2022)
f830f74  ovn-tester: Use ovsdbapp in place of ovn-nbctl. (Sep 2, 2022)
6134fd6  ovn-tester: Use ovsdbapp in place of ovs-sbctl. (Sep 2, 2022)
7c37a7a  ci: Fix low-scale CI. (Sep 22, 2022)
f4a4ac6  Address PR comments. (Oct 4, 2022)
82a5cb4  Properly quote jinja variable in ansible playbook. (Oct 4, 2022)
476efcd  Fix problems found during PR review. (Oct 5, 2022)
097aa1e  Un-stringify the things I stringified before. (Oct 6, 2022)
d4ace38  Add garbage collection debugging. (Oct 6, 2022)
9d04740  Use full path for SSH key in example deployment file. (Oct 6, 2022)
5aa3e71  Only monitor SB tables we care about. (Oct 6, 2022)
908aac1  Copy HTML files in mine_data(). (Oct 6, 2022)
3310aac  Run process monitor in the ovn-tester container. (Oct 6, 2022)
0805ecb  Use poll instead of select (Oct 6, 2022)
14bcd83  Remove timedelta stuff from latency.py (Oct 6, 2022)
a2da1a2  Fix spurious double insert exception. (Oct 6, 2022)
a1af949  Address some more PR comments. (Oct 11, 2022)
887477b  Start process monitor from ansible. (Oct 11, 2022)
1dc44c1  Use the first IP address in the node_net for the tester. (Oct 11, 2022)
fd5429f  Log in UTC. (Oct 11, 2022)
98d1422  README: Indicate the need to re-install after updating ovn-tester. (Oct 17, 2022)
0653426  process-monitor: Get it working for ovn-tester. (Oct 17, 2022)
6cfcf1a  Move physical deployment to tester container when running test. (Oct 31, 2022)
d54b2b2  Fix incorrect copying of html files. (Oct 31, 2022)
c9bddc4  Use $(hostname) instead of other methods for cirrus CI. (Oct 31, 2022)
1d157dc  Stay in the rundir when installing ovn-tester. (Oct 31, 2022)
1985b31  Die if running the test fails. (Oct 31, 2022)
78fa2fc  Use /usr/bin/env python instead of python3. (Oct 31, 2022)
f64c623  Don't delete ovn-tester ports from br-ovn when deploying. (Oct 31, 2022)
2e19451  Add some comments in places where the logic may be non-obvious. (Oct 31, 2022)
b878247  Do not tag tester image after pulling. (Nov 1, 2022)
67d6407  Account for potential overflow of nb_cfg in the DB. (Nov 2, 2022)
b0d9ade  Fix pep8 issues. (Nov 2, 2022)
.cirrus.yml (7 changes: 4 additions & 3 deletions)

@@ -8,18 +8,18 @@ low_scale_task:

   env:
     DEPENDENCIES: git ansible podman podman-docker
-    PHYS_DEPLOYMENT: ${CIRRUS_WORKING_DIR}/physical-deployments/localhost.yml
+    PHYS_DEPLOYMENT: ${CIRRUS_WORKING_DIR}/physical-deployments/ci.yml

   runtime_cache:
     folder: runtime-cache

   configure_ssh_script:
     - mkdir -p /root/.ssh/
     - ssh-keygen -t rsa -N '' -q -f /root/.ssh/id_rsa
-    - ssh-keyscan localhost >> /root/.ssh/known_hosts
+    - ssh-keyscan $(hostname) >> /root/.ssh/known_hosts
     - cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
     - chmod og-wx /root/.ssh/authorized_keys
-    - ssh root@localhost -v echo Hello
+    - ssh root@$(hostname) -v echo Hello

   install_dependencies_script:
     - dnf install -y ${DEPENDENCIES}

@@ -30,6 +30,7 @@ low_scale_task:

     - tar -xzf runtime-cache/runtime.tar.gz || true

   install_script:
+    - 'sed -i "s/<host>/$(hostname)/g" ${PHYS_DEPLOYMENT}'
     - ./do.sh install

   pack_caches_script:
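The new `install_script` line rewrites a `<host>` placeholder in the committed
`ci.yml` deployment file with the CI runner's real hostname before installing.
Assuming `ci.yml` uses that token wherever a hostname is expected (a sketch,
not the actual file contents), it would contain entries such as:

```
# physical-deployments/ci.yml, hypothetical excerpt before substitution:
registry-node: <host>
central-node:
  name: <host>
```

Unlike `localhost`, the substituted hostname resolves to the same machine for
both the orchestrator and the ovn-tester container, which the README change
below calls out as a requirement.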
Dockerfile (12 changes: 12 additions & 0 deletions)

@@ -0,0 +1,12 @@
+FROM ovn/ovn-multi-node
+
+ARG SSH_KEY
+
+COPY ovn-tester /ovn-tester
+
+RUN mkdir -p /root/.ssh/
+COPY $SSH_KEY /root/.ssh/
+
+COPY ovn-fake-multinode-utils/process-monitor.py /tmp/
+
+RUN pip3 install -r /ovn-tester/requirements.txt
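This image is built and pushed by `install_ovn_tester` in `do.sh`, shown later
in this diff. For reference, a manual build would look roughly like the
following, assuming the SSH private key has already been copied into the build
context (the `runtime/id_rsa` path here is hypothetical):

```
# Build the tester image; SSH_KEY must be a path inside the build context.
docker build -t ovn/ovn-tester --build-arg SSH_KEY=runtime/id_rsa \
    -f Dockerfile .
```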
README.md (58 changes: 41 additions & 17 deletions)

@@ -12,20 +12,22 @@ insecure docker registries, cleanup existing docker containers).

 ## Physical topology

-* TESTER: One machine to run the tests which needs to be able to SSH
-  paswordless (preferably as `root`) to all other machines in the topology.
-  Performs the following:
+* ORCHESTRATOR: One machine that needs to be able to SSH passwordless
+  (preferably as `root`) to all other machines in the topology. Performs the
+  following:
   - prepares the test environment: clone the specified versions of `OVS` and
     `OVN` and build the `ovn-fake-multinode` image to be used by the `OVN`
     nodes.
   - provisions all other `OVN` nodes with the required software packages
     and with the correct version of `ovn-fake-multinode` to run simulated/fake
     `OVN` chassis.
   - runs a docker registry where the `ovn-fake-multinode` (i.e.,
-    `ovn/ovn-multi-node`) image is pushed and from which all other `OVN`
-    nodes will pull the image.
-  - runs the scale test scenarios.
+    `ovn/ovn-multi-node`) and `ovn-tester` images are pushed and from which all
+    other `OVN` nodes will pull the image.
+
+* TESTER: One machine to run the `ovn-tester` container which runs the python
+  ovn-tester code. Like the ORCHESTRATOR, the TESTER also needs to be able to
+  SSH passwordless to all other machines in the topology.
 * OVN-CENTRAL: One machine to run the `ovn-central` container(s) which
   run `ovn-northd` and the `Northbound` and `Southbound` databases.
 * OVN-WORKER-NODE(s): Machines to run `ovn-netlab` container(s), each of
@@ -41,12 +43,12 @@ single L2 switch. This interface will be used for traffic to/from the
 `Northbound` and `Southbound` databases and for tunneled traffic.

 **NOTE**: there's no restriction regarding physical machine roles so for
-debugging issues the TESTER, OVN-CENTRAL and OVN-WORKER-NODEs can all
-be the same physical machine in which case there's no need for the secondary
-Ethernet interface to exist.
+debugging issues the ORCHESTRATOR, TESTER, OVN-CENTRAL and OVN-WORKER-NODEs can
+all be the same physical machine in which case there's no need for the
+secondary Ethernet interface to exist.

 ## Sample physical topology:
-* TESTER: `host01.mydomain.com`
+* ORCHESTRATOR: `host01.mydomain.com`
 * OVN-CENTRAL: `host02.mydomain.com`
 * OVN-WORKER-NODEs:
   - `host03.mydomain.com`
@@ -55,14 +57,22 @@ Ethernet interface to exist.
 OVN-CENTRAL and OVN-WORKER-NODEs all have Ethernet interface `eno1`
 connected to a physical switch in a separate VLAN, as untagged interfaces.

-## Minimal requirements on the TESTER node (tested on Fedora 32)
+**NOTE**: The hostnames specified in the physical topology are used by both
+the ORCHESTRATOR and by the `ovn-tester` container running in the TESTER.
+Therefore, the values need to be resolvable by both of these entities and
+need to resolve to the same host. `localhost` will not work since this does
+not resolve to a unique host.
+
+## Minimal requirements on the ORCHESTRATOR node (tested on Fedora 32)

 ### Install required packages:
 ```
 dnf install -y git ansible
 ```

-### Make docker work with Fedora 32 (disable cgroup hierarchy):
+## Minimal requirements on the TESTER node (tested on Fedora 36)
+
+### Make docker work with Fedora 32+ (disable cgroup hierarchy):

 ```
 dnf install -y grubby
@@ -107,10 +117,16 @@ A sample file written for the deployment described above is available at

 The file should contain the following mandatory sections and fields:
 - `registry-node`: the hostname (or IP) of the node that will store the
-  docker private registry. In usual cases this is should be the TESTER
+  docker private registry. In usual cases this should be the ORCHESTRATOR
   machine.
 - `internal-iface`: the name of the Ethernet interface used by the underlay
   (DB and tunnel traffic). This can be overridden per node if needed.
+- `tester-node`:
+  - `name`: the hostname (or IP) of the node that will run `ovn-tester` (the
+    python code that performs the actual test).
+  - `ssh_key`: an SSH private key to install in the TESTER that can be used
+    to communicate with the other machines in the cluster.
+    Default: `~/.ssh/id_rsa`
 - `central-node`:
   - `name`: the hostname (or IP) of the node that will run `ovn-central`
     (`ovn-northd` and databases).
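For orientation, a deployment file combining these fields with the sample
topology above might look roughly like this (an illustrative sketch only;
field names come from the list above, and sections not described here, such
as the worker-node list, are omitted):

```
# Hypothetical physical-deployment file for the sample topology.
registry-node: host01.mydomain.com
internal-iface: eno1
tester-node:
  name: host01.mydomain.com
  ssh_key: /root/.ssh/id_rsa
central-node:
  name: host02.mydomain.com
```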
@@ -162,6 +178,9 @@ This step will:
   can be enabled by setting the `EXTRA_OPTIMIZE=yes` environment variable
   (`EXTRA_OPTIMIZE=yes ./do.sh install`).
 - push the container image to all other nodes and prepare the test environment.
+- build the `ovn/ovn-tester` container image which will be used by the TESTER
+  node to run the ovn-tester application.
+- push the `ovn/ovn-tester` container image to the TESTER node.

 To override the OVS, OVN or ovn-fake-multinode repos/branches use the
 following environment variables:
@@ -176,6 +195,10 @@ cd ~/ovn-heater
 OVS_REPO=https://github.com/dceara/ovs OVS_BRANCH=tmp-branch OVN_REPO=https://github.com/dceara/ovn OVN_BRANCH=tmp-branch-2 ./do.sh install
 ```

+NOTE: Because the installation step is responsible for deploying the
+ovn-tester container to the TESTER, any changes made to the ovn-tester
+application require re-running the installation step.
+
 ## Perform a reinstallation (e.g., new OVS/OVN versions are needed):

 For OVS, OVN or ovn-fake-multinode code changes to be reflected the
@@ -231,10 +254,11 @@ cd ~/ovn-heater
 ./do.sh run <scenario> <results-dir>
 ```

-This executes `<scenario>` on the physical deployment. Current
-scenarios also cleanup the environment, i.e., remove all docker containers
-from all physical nodes. **NOTE**: If the environment needs to be explictly
-cleaned up, we can also execute before running the scenario:
+This executes `<scenario>` on the physical deployment (specifically on the
+`ovn-tester` container on the TESTER). Current scenarios also clean up the
+environment, i.e., remove all docker containers from all physical nodes.
+**NOTE**: If the environment needs to be explicitly cleaned up, we can also
+execute the following before running the scenario:

 ```
 cd ~/ovn-heater
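Taken together with the installation section above, the typical end-to-end
flow looks like the following (a sketch; the scenario file name is a
placeholder for whatever test scenario YAML the repository ships):

```
cd ~/ovn-heater
./do.sh install
./do.sh run <scenario>.yml my-results
```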
do.sh (66 changes: 52 additions & 14 deletions)

@@ -3,7 +3,8 @@
 set -o errexit -o pipefail

 topdir=$(pwd)
-rundir=${topdir}/runtime
+rundir_name=runtime
+rundir=${topdir}/${rundir_name}
 deployment_dir=${topdir}/physical-deployments
 result_dir=${topdir}/test_results
@@ -17,6 +18,8 @@ ovn_fmn_playbooks=${ovn_fmn_utils}/playbooks
 ovn_fmn_generate=${ovn_fmn_utils}/generate-hosts.py
 ovn_fmn_docker=${ovn_fmn_utils}/generate-docker-cfg.py
 ovn_fmn_podman=${ovn_fmn_utils}/generate-podman-cfg.py
+ovn_fmn_get=${ovn_fmn_utils}/get-config-value.py
+ovn_fmn_ip=${rundir}/ovn-fake-multinode/ip_gen.py
 hosts_file=${rundir}/hosts
 installer_log_file=${rundir}/installer-log
 docker_daemon_file=${rundir}/docker-daemon.json
@@ -26,7 +29,6 @@ log_perf_file=${rundir}/perf.sh
 process_monitor_file=${rundir}/process-monitor.py

 ovn_tester=${topdir}/ovn-tester
-ovn_tester_log_file=test-log

 EXTRA_OPTIMIZE=${EXTRA_OPTIMIZE:-no}
 USE_OVSDB_ETCD=${USE_OVSDB_ETCD:-no}
@@ -92,7 +94,7 @@ function install_venv() {
         python3 -m virtualenv ${ovn_heater_venv}
     fi
     source ${ovn_heater_venv}/bin/activate
-    pip install -r ${ovn_tester}/requirements.txt
+    pip install -r ${topdir}/utils/requirements.txt
     deactivate
     popd
 }
@@ -228,6 +230,17 @@ function install_ovn_fake_multinode() {
     popd
 }

+function install_ovn_tester() {
+    ssh_key=$(${ovn_fmn_get} ${phys_deployment} tester-node ssh_key)
+    # We need to copy the files into a known directory within the Docker
+    # context directory. Otherwise, Docker can't find the files we reference.
+    cp ${ssh_key} .
+    ssh_key_file=${rundir_name}/$(basename ${ssh_key})
+    docker build -t ovn/ovn-tester --build-arg SSH_KEY=${ssh_key_file} -f ${topdir}/Dockerfile ${topdir}
+    docker tag ovn/ovn-tester localhost:5000/ovn/ovn-tester
+    docker push localhost:5000/ovn/ovn-tester
+}
+
 # Prepare OVS bridges and cleanup containers.
 function init_ovn_fake_multinode() {
     echo "-- Initializing ovn-fake-multinode cluster on all nodes"
@@ -240,6 +253,10 @@ function pull_ovn_fake_multinode() {
     ansible-playbook ${ovn_fmn_playbooks}/pull-fake-multinode.yml -i ${hosts_file}
 }

+function pull_ovn_tester() {
+    ansible-playbook ${ovn_fmn_playbooks}/pull-ovn-tester.yml -i ${hosts_file}
+}
+
 function install() {
     pushd ${rundir}
     install_deps
@@ -248,6 +265,8 @@
     install_ovn_fake_multinode
     init_ovn_fake_multinode
     pull_ovn_fake_multinode
+    install_ovn_tester
+    pull_ovn_tester
     popd
 }

@@ -279,11 +298,12 @@ function record_test_config() {

 function mine_data() {
     out_dir=$1
+    tester_host=$2

     echo "-- Mining data from logs in: ${out_dir}"

     pushd ${out_dir}

     mkdir -p mined-data
     for p in ovn-northd ovn-controller ovn-nbctl; do
         logs=$(find ${out_dir}/logs -name ${p}.log)
@@ -301,7 +321,8 @@ function mine_data() {
     grep ovn-installed ${logs} | cut -d ':' -f 2- | tr '|' ' ' \
         | cut -d ' ' -f 1,7 | tr 'T' ' ' | sort > mined-data/ovn-installed.log

-    python3 ${topdir}/utils/latency.py "$(date +%z)" \
+    source ${rundir}/${ovn_heater_venv}/bin/activate
+    python3 ${topdir}/utils/latency.py \
         ./mined-data/ovn-binding.log ./mined-data/ovn-installed.log \
         > mined-data/binding-to-ovn-installed-latency
@@ -321,9 +342,23 @@
         | grep ovn-scale | head -3)
     python3 ${topdir}/utils/process-stats.py \
         resource-usage-report-worker.html ${resource_usage_logs}
+    deactivate

     popd
 }

+function get_tester_ip() {
+    local test_file=$1
+
+    # The tester gets the first IP address in the configured node_net.
+    node_net=$(${ovn_fmn_get} ${test_file} cluster node_net --default=192.16.0.0/16)
[Review thread on the hardcoded node_net default above]

Reviewer (collaborator): Nit: now we hardcode 192.16.0.0/16 in two places:
here and in ovn_tester.py. Do you think there's an easy way to just define
this default in a single place?

putnopvut (author): I was thinking about this, and there are a few ways that
could work.

  1. Use an environment variable to define default values and ensure this
     same environment variable is used both in the orchestrator and the
     tester container.
  2. Define default values in some file, and refer to this file from do.sh
     and ovn-tester.
  3. Write a python module similar to
     ovn-fake-multinode-tools/get-config-value.py, put it in ovn-tester, and
     define defaults in that module. Then do.sh and ovn-tester could both
     call into this module to get configuration values.
  4. Use a template value in ovn_tester.py that can be filled in by do.sh at
     install-time.

  Of these, I think 1 is clearly the worst. 2 is simple but also a bit
  clumsy. For me, it's a toss-up whether 3 or 4 is better. I think I would
  lean more towards option 3 myself. What do you think?

igsilya (contributor): Not sure if that is what you meant in 4, but one thing
I had in mind previously was:

  1. Make ovn-tester fail if at least one variable is not defined in the yml
     file, i.e. no defaults.
  2. Make do.sh generate the full yml from a test scenario yml, filling in
     all the undefined values with defaults.
  3. Run ovn-tester with the generated, fully filled yml.

  This will ensure that we do not have defaults defined in multiple places.
  It will also be easier to track down undefined/misconfigured values if
  ovn-tester just fails instead of proceeding with unexpected config values.

Reviewer (collaborator): Sounds interesting! It also sounds like something we
can do as a follow-up, in my opinion. Shall we open an issue to track this?

putnopvut (author): @igsilya Sorry for the ambiguity on number 4. What I
meant was that ovn_tester.py would have something like this in it:

    DEFAULT_NODE_NET = {{ default_node_net }}
    node_net = netaddr.IPNetwork(cluster_args.get('node_net', DEFAULT_NODE_NET))

  We might also rename it to ovn_tester.py.in to make it clear that it can't
  be run as-is. Then, when running do.sh install, the do.sh code would
  populate {{ default_node_net }} with the proper default. This way, the
  default is defined only in do.sh. It's not my favorite suggestion, but it's
  something I thought of.

Reviewer (collaborator): @putnopvut Hmm, I'm not sure I like the idea of an
ovn_tester.py.in. I think I'd prefer @igsilya's suggestion of do.sh filling
in all required input values in the yml we finally pass to ovn-tester.

putnopvut (author): Yes, that's why, of the 4 suggestions I came up with, 3
was the one I preferred most. Mine and @igsilya's are more-or-less the same,
except that in his, do.sh controls the defaults, and in mine, ovn-tester
controls the defaults.

Reviewer (collaborator): I see, your option 3 is also fine, thanks!

putnopvut (author): I've actually implemented @igsilya's idea at
https://github.com/putnopvut/ovn-heater/tree/default_dumper . I was hoping to
be able to open a PR here, but GitHub does not allow using a PR as the basis
for another PR, so when I created the PR it was way larger and more confusing
than necessary. I'll hold off and post that PR once this one is merged.

[End of review thread]

+    node_cidr=${node_net#*/}
+    node_ip=${node_net%/*}
+    ip_index=1
+    tester_ip=$(${ovn_fmn_ip} ${node_net} ${node_ip} ${ip_index})
+    echo "${tester_ip}/${node_cidr}"
+}
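To make the arithmetic concrete: `get_tester_ip` asks `ip_gen.py` (from
ovn-fake-multinode) for the address at offset 1 from the network's base
address. A minimal sketch of that contract, written under the assumption that
the script simply offsets the base address by the given index (an
illustration, not the actual ovn-fake-multinode implementation):

```
#!/usr/bin/env python
# Hypothetical sketch of the ip_gen.py contract used by get_tester_ip:
#   usage: ip_gen.py <node_net> <base_ip> <index>
import sys

import netaddr  # ovn-tester already uses netaddr (see the thread above)

node_net, base_ip, index = sys.argv[1], sys.argv[2], int(sys.argv[3])
network = netaddr.IPNetwork(node_net)

# Offset the base address by `index`; with the default node_net of
# 192.16.0.0/16 and index 1, this prints 192.16.0.1.
ip = netaddr.IPAddress(int(netaddr.IPAddress(base_ip)) + index)
assert ip in network, "computed IP falls outside node_net"
print(ip)
```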
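As a rough illustration of "option 3" from the review thread above, a single
Python module that owns the defaults and that both do.sh and ovn-tester can
query, something like the following could work (module name, fields, and CLI
are all hypothetical; the actual follow-up lives in the default_dumper branch
linked in the thread):

```
# config_defaults.py -- hypothetical single source of configuration defaults.
import sys

DEFAULTS = {
    ('cluster', 'node_net'): '192.16.0.0/16',
}


def get_default(section, key):
    """Return the default for section.key, or None if there is none."""
    return DEFAULTS.get((section, key))


if __name__ == '__main__':
    # do.sh could invoke: python3 config_defaults.py cluster node_net
    value = get_default(sys.argv[1], sys.argv[2])
    if value is None:
        sys.exit('no default for %s.%s' % (sys.argv[1], sys.argv[2]))
    print(value)
```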

 function run_test() {
     local test_file=$1
     local out_dir=$2

@@ -338,11 +373,14 @@ function run_test() {
     # Perform a fast cleanup by doing a minimal redeploy.
     init_ovn_fake_multinode

-    source ${rundir}/${ovn_heater_venv}/bin/activate
     pushd ${out_dir}
+    tester_ip=$(get_tester_ip ${test_file})
+    if ! ansible-playbook ${ovn_fmn_playbooks}/run-tester.yml -i ${hosts_file} --extra-vars "test_file=${test_file} tester_ip=${tester_ip} phys_deployment=${phys_deployment}" ; then
+        die "-- Failed to set up test!"
+    fi

-    if ! python -u ${ovn_tester}/ovn_tester.py $phys_deployment ${test_file} 2>&1 | tee ${ovn_tester_log_file}; then
-        echo "-- Failed to run test! Check logs at: $PWD/${ovn_tester_log_file}"
+    tester_host=$(${ovn_fmn_get} ${phys_deployment} tester-node name)
+    if ! ssh root@${tester_host} docker exec ovn-tester python3 -u /ovn-tester/ovn_tester.py /physical-deployment.yml /test-scenario.yml 2>&1 | tee ${out_dir}/test-log ; then
+        echo "-- Failed to run test. Check logs at: ${out_dir}/test-log"
     fi

     echo "-- Collecting logs to: ${out_dir}"
     for f in *.tgz; do
         tar xvfz $f
     done
+    popd
+    # Prior to containerization of ovn-tester, HTML files written by
+    # ovn-tester were written directly to ${out_dir}. To make things easier
+    # for tools, we copy the HTML files back to this original location.
+    cp ${tester_host}/ovn-tester/*.html ${out_dir} || true

     # Once we successfully ran the test and collected its logs, the post
     # processing (e.g., data mining) can run in a subshell with errexit
     # disabled. We don't want the whole thing to error out if the post
     # processing fails.
     (
         set +o errexit
-        mine_data ${out_dir}
+        mine_data ${out_dir} ${tester_host}
     )
-
-    popd
-    deactivate
 }
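If a scenario needs to be re-run or debugged by hand, the `ssh ... docker
exec` invocation above can be issued directly (a sketch; the tester hostname
is whatever `tester-node.name` resolves to in your deployment file):

```
# Hypothetical manual re-run of the tester, mirroring run_test above.
ssh root@host01.mydomain.com \
    docker exec ovn-tester python3 -u /ovn-tester/ovn_tester.py \
        /physical-deployment.yml /test-scenario.yml
```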

 function usage() {