[cuebot/rqd] Add feature to run frames on a containerized environment using docker #1549

Merged


DiegoTavares
Collaborator

@DiegoTavares DiegoTavares commented Oct 18, 2024

Motivation

Running OpenCue in a multi-OS environment requires segregating the farm: each host has to be assigned to a single OS and cannot be shared between shows with different OS requirements. This becomes a problem when shows need to share resources.

Proposed solution

A new execution mode on rqd, runDocker, lives alongside runLinux, runWindows, and runDarwin (macOS). This mode launches the frame command in a docker container chosen according to the frame's expected OS. With this, rqd can run jobs targeting different OSs on the same host.

To make this possible, an rqd host needs to advertise itself not with its own OS code (defined by SP_OS in rqd.conf), but with the OSs of all the images it is capable of running.

Configuration changes

The following sections were added to rqd.conf:

[docker.config]
# Setting this to True requires all the additional "docker.[]" sections to be filled
RUN_ON_DOCKER=True

# This section is only required if RUN_ON_DOCKER=True
# List of volume mounts following docker run's format, but replacing = with :
[docker.mounts]
TEMP=type:bind,source:/tmp,target:/tmp,bind-propagation:slave
NET=type:bind,source:/net,target:/net,bind-propagation:slave

# This section is only required if RUN_ON_DOCKER=True
#  - keys represent OSs this rqd is capable of executing jobs in
#  - values are docker image tags
[docker.images]
centos7=centos7.3:latest
rocky9=rocky9.3:latest

In this case, the rqd host advertises itself with OS=centos7,rocky9, and the dispatch logic has been changed to handle dispatching frames to hosts that support multiple OSs.
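As a rough illustration of how these sections could be read, the sketch below parses the sample configuration with Python's standard `configparser` and derives the advertised OS string. The helper names (`parse_mount`, `load_docker_settings`, `host_supports_os`) are hypothetical and are not taken from rqd or cuebot; the actual implementation in this PR may differ.

```python
import configparser

# Sample configuration copied from the PR description.
SAMPLE_RQD_CONF = """
[docker.config]
RUN_ON_DOCKER=True

[docker.mounts]
TEMP=type:bind,source:/tmp,target:/tmp,bind-propagation:slave
NET=type:bind,source:/net,target:/net,bind-propagation:slave

[docker.images]
centos7=centos7.3:latest
rocky9=rocky9.3:latest
"""


def parse_mount(spec):
    """Turn 'type:bind,source:/tmp' into {'type': 'bind', 'source': '/tmp'}."""
    return dict(item.split(":", 1) for item in spec.split(","))


def load_docker_settings(conf_text):
    """Return images, mounts and the advertised OS string, or None if docker mode is off."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as TEMP in their original case
    parser.read_string(conf_text)
    if not parser.getboolean("docker.config", "RUN_ON_DOCKER", fallback=False):
        return None
    images = dict(parser["docker.images"])
    mounts = {name: parse_mount(spec) for name, spec in parser["docker.mounts"].items()}
    # One advertised OS per configured image, joined with commas.
    return {"images": images, "mounts": mounts, "advertised_os": ",".join(images)}


def host_supports_os(advertised_os, job_os):
    """Hypothetical dispatch-side check: can this host run a job for job_os?"""
    return job_os in advertised_os.split(",")


settings = load_docker_settings(SAMPLE_RQD_CONF)
print(settings["advertised_os"])
```

The mount values use `:` instead of `=` because `=` already separates the key from the value in rqd.conf; splitting each item on the first `:` recovers the key/value pairs that docker run's `--mount` format expects.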

DiegoTavares and others added 29 commits October 16, 2024 15:55
When RUN_ON_DOCKER is set on rqd.conf, each frame will be launched as a docker container using the base image configured as DOCKER_IMAGE.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Logging was added on the wrong scope, which led to a "Frame not found in cache" when a frame was actually found.
New spec is required to allow passing the layer's expected OS.
When rqd is running on docker mode, it can report multiple supported OSs. On rqd.conf, multiple images can be provided under [docker.images] and each image refers to a supported OS.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
…ation#1550)

Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Previously it was safe to use the host's OS when querying for procs. Now the job's OS needs to be used, as a host can have multiple OSs.
To be able to run as the frame's owner, the entrypoint needs to ensure the user exists before running the frame's cmd.
Not having nimby installed is an expected event, not an exception.
…le (AcademySoftwareFoundation#1542)

- Updated `viewComments` method in `MenuActions.py` to wrap single Job
objects in a list.
- This prevents `TypeError` when attempting to iterate over a
non-iterable Job object.
…on#1543)

- Add `rocky9` log root to `render_logs.root` in `cuegui.yaml`
… directly (AcademySoftwareFoundation#1547)

**Summarize your change.**
Have changed most tests to use `-m unittest discover` instead of
`setup.py test`.

The old `setup.py test` doesn't work in newer versions of Python since
it has been deprecated.
unittest was not reporting test failures and interruptions as expected, which caused us to be running with failed unit tests for a long time.

This commit replaces unittest with pytest for rqd and fixes some of the relevant unit tests.
…oundation#1554)

Deleting an item from the dict being iterated over on sanitizeFrames
caused the error: "Dictionary changed size during iteration".
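A minimal sketch of this class of bug and its usual fix, using a hypothetical frame cache rather than rqd's actual sanitizeFrames code: iterating over a snapshot of the keys makes it safe to delete entries inside the loop.

```python
def remove_finished_unsafe(frame_cache, is_finished):
    """Buggy variant: deletes from the dict while iterating over it."""
    for frame_id in frame_cache:
        if is_finished(frame_cache[frame_id]):
            del frame_cache[frame_id]  # triggers RuntimeError on the next iteration step
    return frame_cache


def remove_finished(frame_cache, is_finished):
    """Safe variant: list(...) snapshots the keys before any deletion happens."""
    for frame_id in list(frame_cache):
        if is_finished(frame_cache[frame_id]):
            del frame_cache[frame_id]
    return frame_cache


# Hypothetical cache mapping frame ids to a "finished" flag.
cache = {"frame-a": True, "frame-b": False, "frame-c": True}
print(remove_finished(dict(cache), lambda finished: finished))
```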
…to3 (AcademySoftwareFoundation#1557)

**Link the Issue(s) this Pull Request is related to.**
This is to fix AcademySoftwareFoundation#1555

**Summarize your change.**
Replaces 2to3 with a simple script that adds "from ." in front of pb2
imports.

This is done to support newer versions of python where 2to3 has been
removed.
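The idea can be sketched as a single regular-expression substitution. This is an illustration only, not the script added by the commit; the function name and the sample module names are assumptions.

```python
import re

# Matches top-level "import <name>_pb2" lines as emitted in generated protobuf modules.
_PB2_IMPORT = re.compile(r"^import (\w+_pb2)\b", flags=re.MULTILINE)


def make_pb2_imports_relative(source):
    """Rewrite 'import foo_pb2 ...' as 'from . import foo_pb2 ...'."""
    return _PB2_IMPORT.sub(r"from . import \1", source)


generated = (
    "import grpc\n"
    "import host_pb2 as host__pb2\n"
    "from google.protobuf import descriptor_pb2\n"
)
print(make_pb2_imports_relative(generated))
```

Lines that already start with `from` are left alone, so imports from `google.protobuf` are not affected.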
Since AcademySoftwareFoundation#1308, rqd has not supported stats files containing whitespace or parentheses.
@DiegoTavares
Collaborator Author

This change has been rebased from #1560 to allow running unit tests on rqd.

DiegoTavares and others added 13 commits October 30, 2024 14:14
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Update temporary sync branch

---------

Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Co-authored-by: Ramon Figueiredo <rfigueiredo@imageworks.com>
Co-authored-by: Jimmy Christensen <Lithorus@gmail.com>
For services such as SMTP and others that require direct access to a port, running with network HOST gives frames network access similar to what they had when running outside of a container.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
…demySoftwareFoundation#1570)

Memory properties constantly need to be tuned according to farm
requirements, which makes it a good candidate for becoming a property
instead of a hardcoded constant.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Collaborator

@ramonfigueiredo ramonfigueiredo left a comment

LGTM

Approved with minor changes.

Thanks!

DiegoTavares added a commit to AcademySoftwareFoundation/opencue.io that referenced this pull request Nov 7, 2024
@lithorus
Contributor

Any idea on when this will get merged? I have a PR coming with the loki support which will have several merge conflicts with this branch PR :)

Using the container logs to get the frameId is not reliable. When the container fails quickly, docker doesn't stream the logs, so a new strategy using container.top() was implemented, falling back to the log solution if need be.
Besides that, also add escaping for " on the frame command being sent to docker.
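A hedged sketch of the lookup-with-fallback strategy follows. It assumes the value being recovered is a process ID, and that the result of container.top() is a dict with "Titles" and "Processes" entries whose columns include PID and CMD (the column set depends on the ps arguments used). The function names are hypothetical and the sample data is made up for illustration.

```python
def find_pid_in_top(top_result, command_fragment):
    """Return the PID of the first process whose command contains command_fragment."""
    titles = top_result.get("Titles") or []
    processes = top_result.get("Processes") or []
    if "PID" not in titles or "CMD" not in titles:
        return None
    pid_col, cmd_col = titles.index("PID"), titles.index("CMD")
    for row in processes:
        if command_fragment in row[cmd_col]:
            return int(row[pid_col])
    return None


def resolve_pid(top_result, command_fragment, pid_from_logs):
    """Prefer the process table; fall back to the log-based lookup only when needed."""
    pid = find_pid_in_top(top_result, command_fragment)
    return pid if pid is not None else pid_from_logs()


# Made-up example shaped like a process-table result.
sample_top = {
    "Titles": ["UID", "PID", "PPID", "CMD"],
    "Processes": [
        ["root", "4100", "4080", "/bin/sh /entrypoint.sh"],
        ["render", "4123", "4100", "/bin/sh -c render_frame --frame 12"],
    ],
}
print(resolve_pid(sample_top, "render_frame", pid_from_logs=lambda: None))
```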
Calling psutil's cmdline function raises ZombieProcess, which wasn't being caught and caused an interruption in the rssUpdate loop.
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
@DiegoTavares
Collaborator Author

Any idea on when this will get merged? I have a PR coming with the loki support which will have several merge conflicts with this branch PR :)

Today

@DiegoTavares DiegoTavares merged commit 291b694 into AcademySoftwareFoundation:master Nov 15, 2024
11 checks passed