test(robot-server): create isolation for tests and run them in parallel #11517

Closed · wants to merge 3 commits
2 changes: 1 addition & 1 deletion robot-server/Makefile
@@ -24,7 +24,7 @@ sdist_file = dist/$(call python_get_sdistname,robot-server,robot_server)
# specified test
tests ?= tests
cov_opts ?= --cov=$(SRC_PATH) --cov-report term-missing:skip-covered --cov-report xml:coverage.xml
test_opts ?=
test_opts ?= -n auto --dist loadscope
@SyntaxColoring (Contributor) commented on Sep 29, 2022:

In your experimentation, does it seem like loadscope has the intended effect? The online docs suggest it doesn't actually follow fixture scope; it's something coarser-grained based on files?

I guess if it misbehaves, the effect would be that expensive fixtures are redundantly set up across processes, which will be slower than expected but should at least be safe if the tests are written properly?

@SyntaxColoring (Contributor) commented on Oct 3, 2022:

Looking into this more, I think we have performance concerns and safety concerns, but it's not loadscope's fault, per se.


On my dual-core 2018 MacBook Air:

                        Serial (old)   Parallel (pytest-xdist)
Run time (wall clock)   3m25s          3m45s
CPU utilization         50%            100%
Worker processes        1              4

The parallel test suite is slower despite higher CPU utilization. This suggests to me that the tests are being distributed badly.

pytest-xdist worker processes can initialize fixtures redundantly if test functions are distributed badly across them. For example, say you have 4 test functions that use the same scope="session" fixture, and pytest-xdist happens to distribute each of those test functions to a different worker process. The fixture will then execute 4 times, doing 4x more work than necessary.
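
As a rough illustration of that mechanism (hypothetical example, not code from this repo):

import time
import pytest

@pytest.fixture(scope="session")
def expensive_resource() -> str:
    time.sleep(30)  # stand-in for an expensive setup step, e.g. starting a dev server
    return "ready"

# If tests/test_a.py and tests/test_b.py both use expensive_resource, then under
# --dist loadscope tests are grouped per module, not per fixture scope. If the two
# modules land on different workers, this 30-second setup runs once per worker.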

My theory is that we're running into this with some of our expensive fixtures, like run_server, and the penalty of doing more work is outweighing the benefit of doing that work in parallel.

If this theory is true, any performance improvements or regressions introduced by this PR would be largely luck-based: how many cores does the machine have, and how did pytest-xdist decide to distribute across them?

To fix this, we could either:

  • Make run_server (and other expensive scope="session" fixtures) truly global across worker processes, so they're not executed redundantly even if pytest-xdist distributes tests badly.
    • This would make scope="session" fixtures behave more closely to how we'd all hope and intuitively expect.
    • However, because run_server is so stateful, I think this is a dangerous path. Our integration tests can and do leak into each other, and if they do that in parallel or in an unpredictable order, it will cause flakiness and confusing failures.
  • Help pytest-xdist distribute tests better.
    • Ideally, pytest-xdist would schedule things intelligently based on which tests use which fixtures. There's a good and very old proposal for this, but it seems to have stalled.
    • So we might have to write a custom pytest-xdist scheduler, which is a quasi-documented part of the pytest-xdist API.

Member Author replied:

Thank you for the insight as always, Max. If Tavern supported parametrized marks other than skipif, we could upgrade pytest-xdist to 2.5 and use --dist loadgroup with @pytest.mark.xdist_group(name="group1"), but it does not.
So I think we can keep the fixture scoping, mark the tests that run well under xdist, and then run some tests with xdist and some without.
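
For reference, a sketch of what the loadgroup approach would look like if Tavern tests could carry the mark (hypothetical; run_server and request_session are the fixtures defined in this conftest):

import pytest

@pytest.mark.xdist_group(name="shared_dev_server")
def test_health(run_server, request_session):
    ...

@pytest.mark.xdist_group(name="shared_dev_server")
def test_home(run_server, request_session):
    ...

# With --dist loadgroup, both tests are guaranteed to run in the same worker
# process, so the session-scoped server they share is started only once.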

Contributor commented:

Nitpick: Can we spell out full-length command line options, for readability?

Suggested change:
- test_opts ?= -n auto --dist loadscope
+ test_opts ?= --numprocesses auto --dist loadscope


# Host key location for buildroot robot
br_ssh_key ?= $(default_ssh_key)
154 changes: 125 additions & 29 deletions robot-server/tests/conftest.py
@@ -2,14 +2,14 @@
import subprocess
import time
import sys
import tempfile
import os
import shutil
import json
import pathlib
import requests
import pytest
import socket

from contextlib import closing
from datetime import datetime, timezone
from fastapi import routing
from mock import MagicMock
@@ -112,28 +112,63 @@ def api_client_no_errors(override_hardware: None) -> TestClient:
return client


@pytest.fixture(scope="session")
def request_session() -> requests.Session:
def _request_session() -> requests.Session:
session = requests.Session()
session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
return session
Comment on lines +115 to 118
Contributor commented:

It looks like this existed before this PR, but a Session is a resource that should be close()d, since it represents a connection pool (among other things), and we don't want to leak TCP connections.

I think we should rewrite this like:

from contextlib import contextmanager
from typing import Generator

@contextmanager
def _request_session() -> Generator[requests.Session, None, None]:
    with requests.Session() as session:
        session.headers.update({API_VERSION_HEADER: LATEST_API_VERSION_HEADER_VALUE})
        yield session

This ties into my other comments about using regular Python functions more and Pytest fixtures less.



@pytest.fixture(scope="session")
def server_temp_directory() -> Iterator[str]:
new_dir = tempfile.mkdtemp()
def request_session() -> requests.Session:
return _request_session()


@pytest.fixture(scope="function")
def function_scope_request_session() -> requests.Session:
return _request_session()
Comment on lines 121 to +128
Contributor commented:

These request_session and function_scope_request_session fixtures are only used in this file, correct? If I'm reading things correctly, they're just helpers to set up run_server and function_scope_run_server, which is what the tests actually care about?

I think our life gets a bit easier if we don't have fixtures for request_session and function_scope_request_session. If something in this file needs a Session, let it call _request_session() directly. No need to use Pytest's dependency injection machinery for it.

Per your review request, this will help deduplicate the code between run_server and function_scope_run_server. See my other comment.



@pytest.fixture(scope="function")
def function_scope_server_temp_directory(tmp_path: Path) -> str:
new_dir: str = str(tmp_path.resolve())
os.environ["OT_API_CONFIG_DIR"] = new_dir
config.reload()
return new_dir


@pytest.fixture(scope="session")
def server_temp_directory(tmp_path_factory: pytest.TempPathFactory) -> str:
new_dir: str = str(tmp_path_factory.mktemp("temp-robot").resolve())
os.environ["OT_API_CONFIG_DIR"] = new_dir
config.reload()
return new_dir

yield new_dir
shutil.rmtree(new_dir)

del os.environ["OT_API_CONFIG_DIR"]
def _free_port() -> str:
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.bind(("localhost", 0))
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
return str(sock.getsockname()[1])
Comment on lines +147 to +151
Contributor commented:

It looks like this came from https://stackoverflow.com/a/45690594/497934, right?

I need to refresh my memory on how port allocation stuff like this works, but does it seem fishy that we close the socket when we return the port number? I worry that we're doing this:

  1. Open a socket and bind it to an automatically-chosen port
  2. Get the port that we just chose automatically
  3. Return the port and close the socket, thereby unintentionally freeing the port?

I'll dig around and see if there's a better way to do this, but I wonder if this whole thing needs to become a context manager.

Contributor followed up:

Here's what I've gathered:

Given all of that, I propose we ditch this port auto-allocation and just use random() port numbers. I figure if things are going to be dodgy no matter what, we can at least keep the code simple.

If random port numbers prove insufficient, we can either:

Member Author replied:

Will run with the random port and see how it goes.
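
A minimal sketch of that random-port alternative (assumption: picking from the IANA dynamic/private range keeps collisions with real services unlikely, and a rare collision simply fails that one test run):

import random

def _random_port() -> str:
    # 49152-65535 is the dynamic/private port range.
    return str(random.randint(49152, 65535))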



@pytest.fixture(scope="session")
def free_port() -> str:
return _free_port()


@pytest.fixture(scope="module")
def module_scope_free_port() -> str:
return _free_port()


@pytest.fixture(scope="function")
def function_scope_free_port() -> str:
return _free_port()


@pytest.fixture(scope="session")
def run_server(
Comment on lines 169 to 170
@SyntaxColoring (Contributor) commented on Oct 3, 2022:

For flakiness reasons, I don't think run_server can remain scope="session" in a parallel world.

My understanding is that run_server is used primarily (or exclusively?) by our Tavern integration tests. Historically, these tests have shared a single run_server per test session, for performance reasons.

Our Tavern tests occasionally leak and affect each other. For example, maybe one test leaves behind a run resource, which affects what subsequent tests see when they do GET /runs.

So far, the combination of these facts hasn't caused any flakiness, because pytest always runs the tests serially and in an order that's consistent in practice.

But in a pytest-xdist parallelized world, tests will be allocated to dev servers unpredictably. An integration test might flakily succeed or fail depending on what other tests happened to get allocated to the same server before it.

I think we either need to:

request_session: requests.Session, server_temp_directory: str
request_session: requests.Session, server_temp_directory: str, free_port: str
) -> Iterator["subprocess.Popen[Any]"]:
"""Run the robot server in a background process."""
# In order to collect coverage we run using `coverage`.
@@ -154,7 +189,7 @@ def run_server(
"--host",
"localhost",
"--port",
"31950",
free_port,
],
env={
"OT_ROBOT_SERVER_DOT_ENV_PATH": "dev.env",
@@ -171,37 +206,96 @@

while True:
try:
request_session.get("http://localhost:31950/health")
request_session.get(f"http://localhost:{free_port}/health")
except ConnectionError:
pass
else:
break
time.sleep(0.5)
request_session.post(
"http://localhost:31950/robot/home", json={"target": "robot"}
f"http://localhost:{free_port}/robot/home", json={"target": "robot"}
)
yield proc
proc.send_signal(signal.SIGTERM)
proc.wait()


@pytest.fixture
def attach_pipettes(server_temp_directory: str) -> Iterator[None]:
@pytest.fixture(scope="function")
def function_scope_run_server(
function_scope_request_session: requests.Session,
function_scope_server_temp_directory: str,
function_scope_free_port: str,
) -> Iterator["subprocess.Popen[Any]"]:
Comment on lines +223 to +228
Contributor commented:

Addressing your review question about deduplication, I think this gets a lot easier if we use regular Python functions more and Pytest fixtures less.

For example, we could have a _run_server() function that's a regular Python context manager that runs the server and returns its URI or something.

And then, if we want to reuse the same server across all tests in a file, we can easily wrap that plain Python function in a 5-line module-scope fixture.

For example, this is what we do in one of our integration tests:

# Module-scope to avoid the overhead of restarting the server between test functions.
# This relies on the test functions only reading, never writing.
@pytest.fixture(scope="module")
def dev_server(module_scope_free_port: str) -> Generator[DevServer, None, None]:
port = module_scope_free_port
with DevServer(
port=port,
persistence_directory=_OLDER_PERSISTENCE_DIR,
) as server:
server.start()
yield server

In other words, the idea is to keep our reusable building blocks defined as plain Python functions, because they're easy to compose; and only wrap them all up in a properly-scoped fixture at the top level.
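
A rough sketch of what that plain _run_server() building block might look like (hypothetical; the names and the health-polling details are illustrative, and the real helper would reuse the coverage/uvicorn invocation from the fixtures in this file):

import subprocess
import sys
import time
from contextlib import contextmanager
from typing import Iterator

import requests

@contextmanager
def _run_server(port: str, config_dir: str) -> Iterator[str]:
    """Run the dev server on `port` and yield its base URL."""
    base_url = f"http://localhost:{port}"
    with subprocess.Popen(
        [sys.executable, "-m", "uvicorn", "robot_server:app", "--host", "localhost", "--port", port],
        env={"OT_ROBOT_SERVER_DOT_ENV_PATH": "dev.env", "OT_API_CONFIG_DIR": config_dir},
    ) as proc:
        try:
            # Poll /health until the server is ready.
            while True:
                try:
                    requests.get(f"{base_url}/health")
                    break
                except requests.ConnectionError:
                    time.sleep(0.5)
            yield base_url
        finally:
            proc.terminate()
            proc.wait()

A session-, module-, or function-scoped fixture then becomes a few lines that simply enter this context manager.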

"""Run the robot server in a background process."""
# In order to collect coverage we run using `coverage`.
# `-a` is to append to existing `.coverage` file.
# `--source` is the source code folder to collect coverage stats on.
with subprocess.Popen(
[
sys.executable,
"-m",
"coverage",
"run",
"-a",
"--source",
"robot_server",
"-m",
"uvicorn",
"robot_server:app",
"--host",
"localhost",
"--port",
function_scope_free_port,
],
env={
"OT_ROBOT_SERVER_DOT_ENV_PATH": "dev.env",
"OT_API_CONFIG_DIR": function_scope_server_temp_directory,
},
stdin=subprocess.DEVNULL,
# The server will log to its stdout or stderr.
# Let it inherit our stdout and stderr so pytest captures its logs.
stdout=None,
stderr=None,
) as proc:
# Wait for the server to start by polling /health
from requests.exceptions import ConnectionError

while True:
try:
function_scope_request_session.get(
f"http://localhost:{function_scope_free_port}/health"
)
except ConnectionError:
pass
else:
break
time.sleep(0.5)
function_scope_request_session.post(
f"http://localhost:{function_scope_free_port}/robot/home",
json={"target": "robot"},
)
yield proc
proc.send_signal(signal.SIGTERM)
proc.wait()


@pytest.fixture(scope="function")
def attach_pipettes(function_scope_server_temp_directory: str) -> None:
import json

pipette = {"dropTipShake": True, "model": "p300_multi_v1"}

pipette_dir_path = os.path.join(server_temp_directory, "pipettes")
pipette_dir_path = os.path.join(function_scope_server_temp_directory, "pipettes")
pipette_file_path = os.path.join(pipette_dir_path, "testpipette01.json")

with open(pipette_file_path, "w") as pipette_file:
json.dump(pipette, pipette_file)
yield
os.remove(pipette_file_path)


@pytest.fixture
def set_up_pipette_offset_temp_directory(server_temp_directory: str) -> None:
@pytest.fixture(scope="function")
def set_up_pipette_offset_temp_directory(
function_scope_server_temp_directory: str,
) -> None:
attached_pip_list = ["123", "321"]
mount_list = [Mount.LEFT, Mount.RIGHT]
definition = labware.get_labware_definition("opentrons_96_filtertiprack_200ul")
@@ -216,8 +310,8 @@ def set_up_pipette_offset_temp_directory(server_temp_directory: str) -> None:
)


@pytest.fixture
def set_up_tip_length_temp_directory(server_temp_directory: str) -> None:
@pytest.fixture(scope="function")
def set_up_tip_length_temp_directory(function_scope_server_temp_directory: str) -> None:
attached_pip_list = ["123", "321"]
tip_length_list = [30.5, 31.5]
definition = labware.get_labware_definition("opentrons_96_filtertiprack_200ul")
@@ -229,8 +323,8 @@
modify.save_tip_length_calibration(pip, cal)


@pytest.fixture
def set_up_deck_calibration_temp_directory(server_temp_directory: str) -> None:
@pytest.fixture(scope="function")
def set_up_deck_calibration_temp_directory(
function_scope_server_temp_directory: str,
) -> None:
attitude = [[1.0008, 0.0052, 0.0], [-0.0, 0.992, 0.0], [0.0, 0.0, 1.0]]
modify.save_robot_deck_attitude(attitude, "pip_1", "fakehash")

@@ -243,18 +339,18 @@ def session_manager(hardware: HardwareControlAPI) -> SessionManager:
)


@pytest.fixture
@pytest.fixture(scope="function")
def set_disable_fast_analysis(
request_session: requests.Session,
function_scope_request_session: requests.Session, function_scope_free_port: str
) -> Iterator[None]:
"""For integration tests that need to set then clear the
enableHttpProtocolSessions feature flag"""
url = "http://localhost:31950/settings"
url = f"http://localhost:{function_scope_free_port}/settings"
data = {"id": "disableFastProtocolUpload", "value": True}
request_session.post(url, json=data)
function_scope_request_session.post(url, json=data)
yield None
data["value"] = None
request_session.post(url, json=data)
function_scope_request_session.post(url, json=data)


@pytest.fixture
1 change: 0 additions & 1 deletion robot-server/tests/integration/common.yaml
@@ -4,4 +4,3 @@ description: Variables and other info used across tests

variables:
host: http://localhost
port: 31950
4 changes: 1 addition & 3 deletions robot-server/tests/integration/dev_server.py
@@ -9,9 +9,7 @@


class DevServer:
def __init__(
self, port: str = "31950", persistence_directory: Optional[Path] = None
) -> None:
def __init__(self, port: str, persistence_directory: Optional[Path] = None) -> None:
"""Initialize a dev server."""
self.server_temp_directory: str = tempfile.mkdtemp()
self.persistence_directory: Path = (
@@ -2,23 +2,24 @@ test_name: home standalone command

marks:
- usefixtures:
- free_port
- run_server

stages:
- name: Delete all runs # Ensure there is no active run
request:
url: '{host:s}:{port:d}/runs'
url: '{host:s}:{free_port:s}/runs'
method: GET
response:
status_code: 200
verify_response_with:
- function: 'tests.integration.fixtures:delete_all_runs'
extra_kwargs:
host: '{host:s}'
port: '{port:d}'
port: '{free_port:s}'
- name: issue home Command params all
request:
url: '{host:s}:{port:d}/commands'
url: '{host:s}:{free_port:s}/commands'
method: POST
params:
waitUntilComplete: true
@@ -45,7 +46,7 @@ stages:
command_id_all: data.id
- name: issue home Command params empty
request:
url: '{host:s}:{port:d}/commands'
url: '{host:s}:{free_port:s}/commands'
method: POST
params:
waitUntilComplete: true
@@ -65,7 +66,7 @@
command_id_empty: data.id
- name: Get command by id
request:
url: '{host:s}:{port:d}/commands/{command_id_empty}'
url: '{host:s}:{free_port:s}/commands/{command_id_empty}'
method: GET
response:
strict:
@@ -77,7 +78,7 @@
status: succeeded
- name: Get commands
request:
url: '{host:s}:{port:d}/commands'
url: '{host:s}:{free_port:s}/commands'
method: GET
response:
strict:
@@ -89,7 +90,7 @@
- id: '{command_id_empty}'
- name: Create Empty Run
request:
url: '{host:s}:{port:d}/runs'
url: '{host:s}:{free_port:s}/runs'
json:
data: {}
method: POST
@@ -107,7 +108,7 @@
run_id: data.id
- name: issue home Command
request:
url: '{host:s}:{port:d}/commands'
url: '{host:s}:{free_port:s}/commands'
method: POST
params:
waitUntilComplete: true
@@ -2,6 +2,7 @@ test_name: loadModule command failure

marks:
- usefixtures:
- free_port
- run_server
Comment on lines +5 to 6
Contributor commented:

I would refactor the free_port and run_server fixtures into a common fixture called something like server_url_base or global_server_url_base.

The value of this fixture would be a string like http://localhost:12345, pointing to a running dev server. The Tavern tests would use it like url: '{server_url_base}/runs'.

The Tavern tests would not explicitly use a run_server fixture.
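
A sketch of that suggested fixture (hypothetical name server_url_base, composing the existing free_port and run_server fixtures from conftest.py):

import pytest

@pytest.fixture(scope="session")
def server_url_base(free_port: str, run_server) -> str:
    # Requesting run_server guarantees a dev server is already listening on free_port.
    return f"http://localhost:{free_port}"

Tavern tests would then list server_url_base under usefixtures and format URLs as url: '{server_url_base}/runs'.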

- parametrize:
key: model
Expand All @@ -11,7 +12,7 @@ marks:
stages:
- name: Create Empty Run
request:
url: '{host:s}:{port:d}/runs'
url: '{host:s}:{free_port:s}/runs'
json:
data: {}
method: POST
Expand All @@ -29,7 +30,7 @@ stages:
run_id: data.id
- name: Create loadModule Command
request:
url: '{host:s}:{port:d}/runs/{run_id}/commands'
url: '{host:s}:{free_port:s}/runs/{run_id}/commands'
method: POST
params:
waitUntilComplete: true