Allow running examples on Apple Silicon M1 and fix image build errors for arm64 #1898
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,8 +36,8 @@ if [ -z "$(command -v kubectl)" ]; then | |
exit 1 | ||
fi | ||
|
||
# Step 1. Create Kind cluster with Kubernetes v1.22.9 | ||
kind create cluster --image kindest/node:v1.22.9 | ||
# Step 1. Create Kind cluster with Kubernetes v1.23.6 | ||
kind create cluster --image kindest/node:v1.23.6 | ||
echo -e "\nKind cluster has been created\n" | ||
|
||
# Step 2. Set context for kubectl | ||
|
@@ -53,6 +53,12 @@ kubectl get nodes | |
echo -e "\nDeploying Katib components\n" | ||
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master" | ||
|
||
# If the local machine's CPU architecture is arm64, rewrite mysql image. | ||
if [ "$(uname -m)" = "arm64" ]; then | ||
kubectl patch deployments -n kubeflow katib-mysql --type json -p \ | ||
'[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "arm64v8/mysql:8.0.29-oracle"}]' | ||
Is there any better replacement solution with
Since this KinD cluster example requires only 3 tools, I added this code to avoid increasing the number of required tools. |
||
fi | ||
|
||
# Wait until all Katib pods are running. | ||
kubectl wait --for=condition=ready --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in (controller,db-manager,mysql,ui)" -n kubeflow pod | ||
|
||
|
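The patch above keys on `uname -m` printing `arm64`, which is what macOS reports on Apple Silicon; Linux on 64-bit ARM typically reports `aarch64` instead. The following is a minimal sketch, not part of this PR, of normalizing both spellings before deciding whether to patch the MySQL image. The helper names `normalize_arch` and `is_arm64` are hypothetical.

```shell
# Map a raw machine name (as printed by `uname -m`) to a Docker-style
# architecture name. Hypothetical helper, not part of the PR.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    arm64|aarch64) echo "arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Succeeds when the given machine name is a 64-bit ARM variant.
is_arm64() {
  [ "$(normalize_arch "$1")" = "arm64" ]
}

# Possible use in the deploy script (left commented out in this sketch):
# if is_arm64 "$(uname -m)"; then
#   kubectl patch deployments -n kubeflow katib-mysql --type json -p "$PATCH"
# fi
```

Here `$PATCH` stands in for the JSON patch shown in the diff; it is an assumed variable, not one the script defines.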
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
scipy>=1.7.2 | ||
tensorflow==2.9.1; platform_machine=="x86_64" | ||
tensorflow-aarch64==2.9.1; platform_machine=="aarch64" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
tensorflow==2.9.1; platform_machine=="x86_64" | ||
tensorflow-aarch64==2.9.1; platform_machine=="aarch64" |
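These requirement lines rely on the `platform_machine` environment marker, which pip evaluates from Python's `platform.machine()`. Inside a Linux arm64 container that value is `aarch64`, while a native macOS Python on Apple Silicon reports `arm64`, so neither line would match there. A rough shell sketch of how the selection resolves follows; the function name `tf_requirement_for` is hypothetical and the function only imitates what pip does.

```shell
# Hypothetical helper: mimic how the two marker lines resolve for a given
# machine name. pip itself evaluates the markers; this is only an illustration.
tf_requirement_for() {
  case "$1" in
    x86_64)  echo "tensorflow==2.9.1" ;;
    aarch64) echo "tensorflow-aarch64==2.9.1" ;;
    *)       echo "" ;;   # no marker matches, so neither package is selected
  esac
}

# Example: show what would be selected for the current interpreter.
# machine="$(python3 -c 'import platform; print(platform.machine())')"
# tf_requirement_for "$machine"
```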
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -112,32 +112,33 @@ echo -e "\nBuilding median stopping rule...\n" | |
docker build --platform "linux/$ARCH" -t "${REGISTRY}/earlystopping-medianstop:${TAG}" -f ${CMD_PREFIX}/earlystopping/medianstop/${VERSION}/Dockerfile . | ||
|
||
# Training container images | ||
echo -e "\nBuilding training container images..." | ||
|
||
if [ ! "$ARCH" = "amd64" ]; then | ||
echo -e "\nTraining container images are supported only amd64." | ||
echo -e "\nSome training container images are supported only amd64." | ||
else | ||
|
||
echo -e "\nBuilding training container images..." | ||
|
||
echo -e "\nBuilding mxnet mnist training container example...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/mxnet-mnist:${TAG}" -f examples/${VERSION}/trial-images/mxnet-mnist/Dockerfile . | ||
|
||
echo -e "\nBuilding Tensorflow with summaries mnist training container example...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/tf-mnist-with-summaries:${TAG}" -f examples/${VERSION}/trial-images/tf-mnist-with-summaries/Dockerfile . | ||
|
||
echo -e "\nBuilding PyTorch mnist training container example...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/pytorch-mnist:${TAG}" -f examples/${VERSION}/trial-images/pytorch-mnist/Dockerfile . | ||
|
||
echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with GPU support...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/enas-cnn-cifar10-gpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.gpu . | ||
|
||
echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with CPU support...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/enas-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.cpu . | ||
|
||
echo -e "\nBuilding PyTorch CIFAR-10 CNN training container example for DARTS with CPU support...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/darts-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/darts-cnn-cifar10/Dockerfile.cpu . | ||
|
||
echo -e "\nBuilding PyTorch CIFAR-10 CNN training container example for DARTS with GPU support...\n" | ||
docker build --platform linux/amd64 -t "${REGISTRY}/darts-cnn-cifar10-gpu:${TAG}" -f examples/${VERSION}/trial-images/darts-cnn-cifar10/Dockerfile.gpu . | ||
|
||
fi | ||
|
||
echo -e "\nBuilding Tensorflow with summaries mnist training container example...\n" | ||
Are arm64 images built only for
Yes, all trial images for amd64 except Would you like to make all images conform to arm64 in this PR?
I think we can fix this later, since it works now. We need to clean up the scripts for building images for different architectures (rather than having arch checks per image, etc.).
That makes sense, @johnugeorge. Ref: https://docs.docker.com/desktop/multi-arch/ I will create an issue to track this feature. |
||
docker build --platform "linux/$ARCH" -t "${REGISTRY}/tf-mnist-with-summaries:${TAG}" -f examples/${VERSION}/trial-images/tf-mnist-with-summaries/Dockerfile . | ||
|
||
echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with CPU support...\n" | ||
docker build --platform "linux/$ARCH" -t "${REGISTRY}/enas-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.cpu . | ||
|
||
echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n" |
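The review thread on this script suggests replacing per-image architecture checks with a cleaner structure. One possible shape, sketched here under assumptions, is a single lookup that records which architectures each trial image supports, with the build loop filtering on it. The function names are hypothetical, and the amd64/arm64 split is inferred from the diff above rather than confirmed by the PR.

```shell
# Hypothetical lookup: image name -> space-separated supported architectures.
# The split below is inferred from the diff, not stated by the PR authors.
supported_archs() {
  case "$1" in
    tf-mnist-with-summaries|enas-cnn-cifar10-cpu) echo "amd64 arm64" ;;
    *) echo "amd64" ;;
  esac
}

# Succeeds when image $1 can be built for architecture $2.
can_build() {
  local arch
  for arch in $(supported_archs "$1"); do
    [ "$arch" = "$2" ] && return 0
  done
  return 1
}

# Prints, one per line, the images from the argument list that can be
# built for the architecture given as the first argument.
buildable_images() {
  local target="$1"; shift
  local image
  for image in "$@"; do
    if can_build "$image" "$target"; then
      echo "$image"
    fi
  done
}
```

A build script could then loop over `buildable_images "$ARCH" ...` and run one `docker build --platform "linux/$ARCH"` per image, keeping the architecture knowledge in one place.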
Any specific reason for changing default?
I upgraded the KinD Kubernetes version since we have upgraded Kubernetes dependencies to v0.23.
Also, according to this doc, K8s v1.22 will reach EoL before the release that follows the next Katib major version.