fix(deadline): configure identity registration settings using RenderQueue backend security group #633
Fixes #632
Problem
See #632.
The root cause of the problem is that the internal `DeploymentInstance` created by the `RenderQueue` to configure Deadline Secrets Management identity registration settings performs a number of actions that require network access to AWS service API endpoints and to PyPI (to fetch the `boto3` package). When this instance is deployed in an isolated subnet (no internet access), PyPI cannot be reached. Also, when the `DeploymentInstance` cannot reach the required VPC endpoints, the user-data commands fail. The user data is then unable to send any signal to CloudFormation, and the signal timeout rolls back the CloudFormation deployment.

Solution
VPC Endpoint Reachability
To deploy the `RenderQueue` into isolated subnets, the VPC supplied to the `RenderQueue` must be configured with sufficient VPC gateway/interface endpoints, and the subnets specified by the `vpcSubnets` prop supplied to the `RenderQueue` must be configured with routes to those endpoints.

In the case of VPC interface endpoints, each VPC interface endpoint has an associated security group. Those security groups must allow ingress from the security group of the `DeploymentInstance`. In RFDK 0.38.0, the `DeploymentInstance` created by the `RenderQueue` was created with its own security group, and there was no public API for RFDK users to access it. To overcome this, the following changes were made:

- Changed the `DeploymentInstance` internal construct to accept a `securityGroup` property and forward it to the `AutoScalingGroup` it creates
- Changed the `RenderQueue` to construct its `DeploymentInstance` using its existing backend security group (the one used by the `RenderQueue`'s Auto Scaling group)

Because the `DeploymentInstance` now shares a security group with the other backend infrastructure of the `RenderQueue` (currently just the Auto Scaling group providing ECS capacity), users can use the `RenderQueue.backendConnections` API to permit network traffic from the `DeploymentInstance` to the security groups associated with the required VPC interface endpoints.

RFDK users who had deployed the `RenderQueue` into isolated subnets prior to RFDK 0.38.0 and used the `RenderQueue.backendConnections` API to permit traffic to their VPC interface endpoints require no further changes. Similarly, users who had instead used `RenderQueue.asg.connections` to permit access to their VPC interface endpoints do not need to change their code either.

PyPI Reachability
To allow the `RenderQueue` to be deployed into private subnets with no route to an internet gateway, the logic in `configure_identity_registration_settings.py` was changed from using `boto3` to launching a subprocess that invokes the equivalent AWS CLI commands.

Testing
I reproduced the issue with the following CDK app: minimal-rfdk-isolated-rq-deadline-sm-reproducer.tar.gz

To reproduce the problem, extract the archive and then, from the extracted directory, run `npm install`, `npm run stage`, `npm run build`, and `npx cdk deploy "*"` in that order.
To test the fix, I built and packaged RFDK from this PR branch. Once it was built and packaged, I installed the package using these instructions. Finally, I re-deployed with `npx cdk deploy "*"` and confirmed that the deployment succeeds and that the identity registration settings are applied correctly.
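The PyPI Reachability change above replaces `boto3` calls with AWS CLI subprocesses. The following is a minimal Python sketch of that general pattern. It is not the code in `configure_identity_registration_settings.py`. The helper names `run_aws_cli_json` and `get_secret_string` are hypothetical, the `secretsmanager get-secret-value` call is only an illustrative example of a command such a script might run, and the sketch assumes the AWS CLI is already installed on the instance.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

# A subprocess.run-compatible callable; injectable so the helper can be tested without the AWS CLI.
Runner = Callable[..., subprocess.CompletedProcess]


def run_aws_cli_json(args: Sequence[str], runner: Runner = subprocess.run) -> Any:
    """Hypothetical helper: run an AWS CLI command and parse its JSON output."""
    # Request JSON explicitly so the output can be parsed with the standard library (no boto3).
    cmd = ["aws", *args, "--output", "json"]
    result = runner(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        # Fail loudly with the CLI's error text instead of continuing silently.
        raise RuntimeError(
            f"{' '.join(cmd)} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return json.loads(result.stdout)


def get_secret_string(secret_id: str, runner: Runner = subprocess.run) -> str:
    """Hypothetical CLI-based stand-in for boto3's
    secretsmanager.get_secret_value(SecretId=secret_id)["SecretString"]."""
    response = run_aws_cli_json(
        ["secretsmanager", "get-secret-value", "--secret-id", secret_id],
        runner=runner,
    )
    return response["SecretString"]
```

The `runner` parameter exists only so the helper can be exercised without the AWS CLI; on the instance, the default `subprocess.run` would be used. Because the CLI is invoked from whatever is already installed on the instance, nothing needs to be fetched from PyPI at deploy time, although the calls still need the VPC endpoints described under VPC Endpoint Reachability.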
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license