Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable deletion of SageMaker job style CR from API server if job is complete #16

Merged
merged 1 commit into from
Apr 21, 2021

Conversation

surajkota
Copy link
Member

@surajkota surajkota commented Apr 20, 2021

Description of Changes

Testing

Tested manually - let the job(training, process, transform & hpo) run till completion and ensured running kubectl delete on the CR removes the resource from management. No error on controller side.

Without this PR, the controller gets stuck in backoff because of an error from SageMaker

2021-03-25T07:41:11.356Z	DEBUG	ackrt	starting reconciliation	{"kind": "TrainingJob", "namespace": "default", "name": "xgboost-trainingjob-z61ptgyri5m1", "generation": 2, "account": "XXXX", "role": "", "region": "us-west-2"}
2021-03-25T07:41:11.707Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "trainingjob", "request": "default/xgboost-trainingjob-z61ptgyri5m1", "error": "ValidationException: The request was rejected because the training job is in status Completed.\n\tstatus code: 400, request id: 8e6a1690-9d35-4529-a2b9-3e9e011fcef2"}
github.com/go-logr/zapr.(*zapLogger).Error
	/home/ubuntu/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/ubuntu/go/pkg/mod/k8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/ubuntu/go/pkg/mod/k8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/ubuntu/go/pkg/mod/k8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/home/ubuntu/go/pkg/mod/k8s.io/apimachinery@v0.18.6/pkg/util/wait/wait.go:90

automated tests will be part of a separate follow up of @mbaijal's PR

Related PRs

Both merged
aws-controllers-k8s/code-generator#45
aws-controllers-k8s/code-generator#26

@surajkota surajkota requested review from jkuruba and a team and removed request for RedbackThomson April 20, 2021 00:35
@surajkota surajkota self-assigned this Apr 20, 2021
Copy link
Contributor

@jkuruba jkuruba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the testing to include the specific tests run and the test outputs?

Otherwise, changes look good.

@surajkota surajkota force-pushed the job_delete_fix branch 2 times, most recently from 2460dd0 to 73a67f7 Compare April 20, 2021 18:25
	- reduce custom code by using callback hooks
@surajkota surajkota merged commit 6f42ab9 into aws-controllers-k8s:main Apr 21, 2021
ryansteakley pushed a commit to ryansteakley/sagemaker-controller that referenced this pull request May 17, 2021
- enable deletion of resource if job is complete
- reduce custom code by using callback hooks
ryansteakley pushed a commit to ryansteakley/sagemaker-controller that referenced this pull request May 18, 2021
- enable deletion of resource if job is complete
- reduce custom code by using callback hooks
@mbaijal mbaijal mentioned this pull request Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants