-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow running examples on Apple Silicon M1 and fix image build errors for arm64 #1898
Allow running examples on Apple Silicon M1 and fix image build errors for arm64 #1898
Conversation
f8e8830
to
9e31474
Compare
We should contain these fixes in the next Katib release to resolve image build errors for arm64. |
@@ -36,8 +36,8 @@ if [ -z "$(command -v kubectl)" ]; then | |||
exit 1 | |||
fi | |||
|
|||
# Step 1. Create Kind cluster with Kubernetes v1.22.9 | |||
kind create cluster --image kindest/node:v1.22.9 | |||
# Step 1. Create Kind cluster with Kubernetes v1.23.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason for changing default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I upgraded the KinD Kubernetes version since we have upgraded Kubernetes dependencies to v0.23.
Also, according to this doc, K8s v1.22 reach EoL before we will release after the next Katib major version.
fi | ||
|
||
echo -e "\nBuilding Tensorflow with summaries mnist training container example...\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are arm64 images built only for tf-mnist-with-summaries
and enas-cnn-cifar10
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all trial images for amd64 except tf-mnist-with-summaries
and enas-cnn-cifar10
can work on the M1 Mac since we can emulate x86_64 in docker desktop with Rosetta2.
Would you like to make all images conform to arm64 in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we can fix later as well as it works now. We need to cleanup scripts for building images of different architecture.(rather than having arch checks per image etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. @johnugeorge
It might be better to introduce a multi-arch build after the next Katib release.
ref: https://docs.docker.com/desktop/multi-arch/
I will create an issue to keep tracking this feature.
# If the local machine's CPU architecture is arm64, rewrite mysql image. | ||
if [ "$(uname -m)" = "arm64" ]; then | ||
kubectl patch deployments -n kubeflow katib-mysql --type json -p \ | ||
'[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "arm64v8/mysql:8.0.29-oracle"}]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any better replacement solution with kustomize
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have only 3 requirement tools to work on this KinD cluster example, I added that code not to make increase requirement tools.
Ready to merge? |
Sure! |
Thanks /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
I added codes to change the katib-mysql image to
arm64v8/mysql:8.0.29-oracle
toexamples/v1beta1/kind-cluster/deploy.sh
since we get the following errors when we run KinD Cluster examples on Apple Silicon M1 Mac.Also I modified Dockerfile to fix this TensorFlow issue in
tf-mnist-with-summaries
andenas-cnn-cifar10
.In addition, I fixed some following errors when we build some images for arm64.
error logs for tfevent-metricscollector 1
error logs for tfevent-metricscollector 2
error logs for suggestion-chocolate 1
error logs for suggestion-chocolate 2
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Checklist: