aws-batch-alpha: lambda not authorized to perform batch:SubmitJob after upgrading from 2.69.0 to 2.78.0 #25574

Closed
suzhoum opened this issue May 12, 2023 · 6 comments · Fixed by #26729
Assignees
Labels
@aws-cdk/aws-batch Related to AWS Batch bug This issue is a bug. needs-reproduction This issue needs reproduction. p1


suzhoum commented May 12, 2023

Describe the bug

I'm trying to upgrade from 2.69.0 to the latest 2.78.0 and hit an error when calling batch:SubmitJob from a Lambda function. The error message is:

arn:aws:sts::xxx:assumed-role/ag-bench-test-batch-stack-agbenchtestbatchjobfunct-1AQCSFR51GLG7/ag-bench-test-batch-job-function is not authorized to perform: batch:SubmitJob on resource: arn:aws:batch:us-west-2:xxx:job-definition/jobdefinitionED9E5E04-dd5ddb78a49496b
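When triaging a denial like the one above, it can help to split the message into the caller, the action, and the resource the policy has to cover. The following is a minimal sketch that assumes the message always has the shape quoted above; `parse_access_denied` is a hypothetical helper, not part of any AWS SDK.

```python
import re

# Pattern modelled only on the message quoted in this issue (an assumption);
# other services or SDK versions may word the denial differently.
DENIED = re.compile(
    r"(?P<principal>arn:[\w-]+:sts::\S+) is not authorized to perform: "
    r"(?P<action>[\w-]+:\w+) on resource: (?P<resource>arn:\S+)"
)


def parse_access_denied(message: str) -> dict:
    """Return the principal, action, and resource named in the denial message."""
    match = DENIED.search(message)
    if match is None:
        raise ValueError("message does not look like an IAM authorization error")
    return match.groupdict()


message = (
    "arn:aws:sts::xxx:assumed-role/"
    "ag-bench-test-batch-stack-agbenchtestbatchjobfunct-1AQCSFR51GLG7/"
    "ag-bench-test-batch-job-function is not authorized to perform: "
    "batch:SubmitJob on resource: "
    "arn:aws:batch:us-west-2:xxx:job-definition/jobdefinitionED9E5E04-dd5ddb78a49496b"
)

info = parse_access_denied(message)
# The last ARN segment shows which resource type the policy must allow.
resource_kind = info["resource"].split(":")[5].split("/")[0]
print(info["action"], "denied on a", resource_kind, "resource")
```

Here the denied resource is a job definition, which suggests comparing the job definition ARN in the Lambda role's policy with the one the new construct generates.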

Expected Behavior

The Lambda function should be able to perform batch:SubmitJob after the upgrade to v2.79.0.

Current Behavior

I tried my best to update my code so that it generates the same CloudFormation template that 2.69.0 generated, but there are still some major differences.

I'm posting the code snippet that we have changed in this project in order to upgrade:

in v2.69.0:

import aws_cdk as core
from aws_cdk import aws_batch_alpha as batch
from aws_cdk import aws_ec2 as ec2, aws_ecs as ecs, aws_iam as iam

container = batch.JobDefinitionContainer(
            image=docker_container_image,
            gpu_count=container_gpu,
            vcpus=container_vcpu,
            memory_limit_mib=container_memory,
            linux_params=ecs.LinuxParameters(self, f"{prefix}-linux_params", shared_memory_size=container_memory),
        )

job_definition = batch.JobDefinition(
            self,
            "job-definition",
            container=container,
            retry_attempts=3,
            timeout=core.Duration.minutes(1500),
        )

batch_instance_role = iam.Role(
            self,
            f"{prefix}-instance-role",
            assumed_by=iam.CompositePrincipal(
                iam.ServicePrincipal("ec2.amazonaws.com"),
                iam.ServicePrincipal("ecs.amazonaws.com"),
                iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
            ),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AmazonEC2ContainerServiceforEC2Role"),
            ],
        )

batch_instance_profile = iam.CfnInstanceProfile(
            self, 
            f"{prefix}-instance-profile", 
            roles=[batch_instance_role.role_name]
        )

compute_environment = batch.ComputeEnvironment(
            self,
            f"{prefix}-compute-environment",
            compute_resources=batch.ComputeResources(
                allocation_strategy=batch.AllocationStrategy.BEST_FIT_PROGRESSIVE,
                vpc=vpc,
                vpc_subnets=ec2.SubnetSelection(subnets=vpc.private_subnets),
                maxv_cpus=compute_env_maxv_cpus,
                instance_role=batch_instance_profile.profile_arn,
                instance_types=instances,
                security_groups=[sg],
                type=batch.ComputeResourceType.ON_DEMAND,
                launch_template=batch.LaunchTemplateSpecification(
                    launch_template_name=batch_launch_template_name  # LaunchTemplate.launch_template_name returns None
                ),
            ),
        )

job_queue = batch.JobQueue(
            self,
            f"{prefix}-job-queue",
            priority=1,
            compute_environments=[batch.JobQueueComputeEnvironment(compute_environment=compute_environment, order=1)],
        )

in v2.79.0:

import os

import aws_cdk as core
from aws_cdk import aws_batch_alpha as batch
from aws_cdk import aws_ec2 as ec2, aws_iam as iam

container = batch.EcsEc2ContainerDefinition(
                self, 
                f"{prefix}-container-definition",
                image=docker_container_image,
                memory=core.Size.mebibytes(container_memory),
                cpu=container_vcpu,
                gpu=container_gpu,
                environment={
                    "AWS_ACCOUNT": os.environ["CDK_DEPLOY_ACCOUNT"],
                    "AWS_REGION": os.environ["CDK_DEPLOY_REGION"],
                },
                execution_role=None,
                linux_parameters=batch.LinuxParameters(self, f"{prefix}-linux-params", shared_memory_size=core.Size.mebibytes(container_memory))
            )

job_definition = batch.EcsJobDefinition(
            self, 
            f"{prefix}-job-definition",
            container=container,
            retry_attempts=3,
            timeout=core.Duration.minutes(1500)
        )

batch_service_role = iam.Role(
            self,
            f"{prefix}-service-role",
            assumed_by=iam.CompositePrincipal(
                iam.ServicePrincipal("batch.amazonaws.com"),
            ),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSBatchServiceRole"),
            ],
        )

compute_environment = batch.ManagedEc2EcsComputeEnvironment(self, f"{prefix}-compute-environment",
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnets=vpc.private_subnets),
            allocation_strategy=batch.AllocationStrategy.BEST_FIT_PROGRESSIVE,
            maxv_cpus=compute_env_maxv_cpus,
            instance_role=batch_instance_profile,
            instance_types=instances,
            security_groups=[sg],
            launch_template=launch_template,
            service_role=batch_service_role,
            use_optimal_instance_classes=False,
            update_to_latest_image_version=False,
            replace_compute_environment=True,
        )

The key difference I see in the CFN generated from the snippets above is that, in v2.79.0, a containerdefinitionExecutionRole and a containerdefinitionExecutionRoleDefaultPolicy are created:

"agbenchtestcontainerdefinitionExecutionRole0A25AAB3": {
   "Type": "AWS::IAM::Role",
   "Properties": {
    "AssumeRolePolicyDocument": {
     "Statement": [
      {
       "Action": "sts:AssumeRole",
       "Effect": "Allow",
       "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
       }
      }
     ],
     "Version": "2012-10-17"
    },
    "Tags": [
     {
      "Key": "ag-bench-test",
      "Value": "benchmark"
     }
    ]
   },
   "Metadata": {
    "aws:cdk:path": "ag-bench-test-batch-stack/ag-bench-test-container-definition/ExecutionRole/Resource"
   }
  },
  "agbenchtestcontainerdefinitionExecutionRoleDefaultPolicy2B49DF06": {
   "Type": "AWS::IAM::Policy",
   "Properties": {
    "PolicyDocument": {
     "Statement": [
      {
       "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
       ],
       "Effect": "Allow",
       "Resource": {
        "Fn::Join": [
         "",
         [
          "arn:",
          {
           "Ref": "AWS::Partition"
          },
          ":ecr:us-west-2:097403188315:repository/cdk-hnb659fds-container-assets-097403188315-us-west-2"
         ]
        ]
       }
      },
      {
       "Action": "ecr:GetAuthorizationToken",
       "Effect": "Allow",
       "Resource": "*"
      }
     ],
     "Version": "2012-10-17"
    },
    "PolicyName": "agbenchtestcontainerdefinitionExecutionRoleDefaultPolicy2B49DF06",
    "Roles": [
     {
      "Ref": "agbenchtestcontainerdefinitionExecutionRole0A25AAB3"
     }
    ]
   },
   "Metadata": {
    "aws:cdk:path": "ag-bench-test-batch-stack/ag-bench-test-container-definition/ExecutionRole/DefaultPolicy/Resource"
   }
  },

"AWS::Batch::ComputeEnvironment" has two more properties in v2.79.0:

"ComputeResources": {
    "UpdateToLatestImageVersion": false
}
"ReplaceComputeEnvironment": true,

The Lambda function's CFN remained unchanged.
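A comparison like the one described above can be made less error-prone by diffing the logical IDs of the two synthesized templates instead of reading them side by side. This is a minimal sketch; the helper names and the `JobFunction` entry are hypothetical, and only the two IAM logical IDs come from the template quoted above.

```python
import json


def resource_types(template: dict) -> dict:
    """Map each logical ID in a CloudFormation template to its resource type."""
    return {lid: res["Type"] for lid, res in template.get("Resources", {}).items()}


def diff_resources(old: dict, new: dict) -> dict:
    """List the resources that exist in only one of the two templates."""
    old_types, new_types = resource_types(old), resource_types(new)
    return {
        "added": {k: new_types[k] for k in sorted(new_types.keys() - old_types.keys())},
        "removed": {k: old_types[k] for k in sorted(old_types.keys() - new_types.keys())},
    }


# Tiny stand-in templates; "JobFunction" is a made-up logical ID for illustration.
old_template = {"Resources": {"JobFunction": {"Type": "AWS::Lambda::Function"}}}
new_template = {
    "Resources": {
        "JobFunction": {"Type": "AWS::Lambda::Function"},
        "agbenchtestcontainerdefinitionExecutionRole0A25AAB3": {"Type": "AWS::IAM::Role"},
        "agbenchtestcontainerdefinitionExecutionRoleDefaultPolicy2B49DF06": {"Type": "AWS::IAM::Policy"},
    }
}

changes = diff_resources(old_template, new_template)
print(json.dumps(changes, indent=2))
```

In practice the two dictionaries would be loaded with `json.load` from the templates synthesized under each CDK version. Note that a logical-ID diff does not catch changes to the properties of a resource that keeps its ID, such as a renamed job definition referenced by an unchanged Lambda policy.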

Reproduction Steps

See above

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.79.9

Framework Version

No response

Node.js Version

v18.13.0

OS

ubuntu

Language

Python

Language Version

No response

Other information

No response

@suzhoum suzhoum added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 12, 2023
@github-actions github-actions bot added the @aws-cdk/aws-lambda Related to AWS Lambda label May 12, 2023
@pahud pahud self-assigned this May 15, 2023
@pahud pahud changed the title (aws-cdk-alpha): lambda not authorized to perform batch:SubmitJob after upgrading from 2.69.0 to 2.78.0 aws-batch-alpha: lambda not authorized to perform batch:SubmitJob after upgrading from 2.69.0 to 2.78.0 May 15, 2023
@github-actions github-actions bot added the @aws-cdk/aws-batch Related to AWS Batch label May 15, 2023
@comcalvi
Contributor

Can you try passing the role instead of the profile? I'm surprised that even compiles. E.g., turn this:

compute_environment = batch.ManagedEc2EcsComputeEnvironment(self, f"{prefix}-compute-environment",
            instance_role=batch_instance_profile,
            # ...
        )

into this:

compute_environment = batch.ManagedEc2EcsComputeEnvironment(self, f"{prefix}-compute-environment",
            instance_role=batch_instance_role,
            # ...
        )

@suzhoum
Author

suzhoum commented May 15, 2023

@comcalvi thanks for your response! I tried that but still got the same error. We used instance_role=batch_instance_profile.profile_arn in 2.69.0 and it worked, so we kept something similar in the code.

@peterwoodworth peterwoodworth added p1 needs-reproduction This issue needs reproduction. and removed needs-triage This issue or PR still needs to be triaged. @aws-cdk/aws-lambda Related to AWS Lambda labels May 16, 2023
@comcalvi
Contributor

@suzhoum can you share your Lambda function's CDK definition on both versions? How does it relate to the compute environment (CE)?

@suzhoum
Author

suzhoum commented Aug 3, 2023

Still facing the issue as of v2.89.0

@comcalvi
Contributor

comcalvi commented Aug 9, 2023

This is another reason to add grant methods

@mergify mergify bot closed this as completed in #26729 Aug 14, 2023
mergify bot pushed a commit that referenced this issue Aug 14, 2023
Add a new method, `grantSubmitJob`, to the JobDefinition construct. Enables batch users to easily grant `submitJob` permissions to any principal. 

Closes #25574.
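For readers who cannot move to a release that contains this commit, the permission can also be written out by hand. The sketch below builds the kind of IAM statement a submit-job grant needs, assuming `batch:SubmitJob` is authorized against both the job definition and the job queue. The helper name and both ARNs are placeholders, and the exact statement that `grantSubmitJob` generates is not shown in this thread.

```python
import json


def submit_job_statement(job_definition_arn: str, job_queue_arn: str) -> dict:
    """Build an IAM statement allowing batch:SubmitJob on one job definition and one queue."""
    return {
        "Effect": "Allow",
        "Action": "batch:SubmitJob",
        # Assumption: the caller needs access to both resources for the call to pass.
        "Resource": [job_definition_arn, job_queue_arn],
    }


# Placeholder ARNs; substitute the values from the deployed stack.
statement = submit_job_statement(
    "arn:aws:batch:us-west-2:111122223333:job-definition/example-job-definition",
    "arn:aws:batch:us-west-2:111122223333:job-queue/example-job-queue",
)
policy = {"Version": "2012-10-17", "Statement": [statement]}
print(json.dumps(policy, indent=2))
```

A statement like this would be attached to the Lambda function's execution role. If the job definition's name changes between deployments, a hard-coded ARN goes stale, which is one argument for letting the construct compute the ARNs through a grant method.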

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.


4 participants