Merge pull request #3380 from mr0re1/move_tmpl2
Migrate `instance_template` modules from `slurm-gcp` repo
mr0re1 authored Dec 10, 2024
2 parents 1adc1bd + 8d1ae34 commit d3f3ceb
Showing 19 changed files with 1,568 additions and 9 deletions.
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -123,8 +123,9 @@ repos:
       hooks:
         - id: script-must-have-extension
         - id: shellcheck
+          exclude: ".*unlinted"
         - id: shfmt
-          exclude: ".*tpl"
+          exclude: ".*tpl|.*unlinted"
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
     hooks:
@@ -74,7 +74,7 @@ modules. For support with the underlying modules, see the instructions in the

| Name | Source | Version |
|------|--------|---------|
-| <a name="module_slurm_nodeset_template"></a> [slurm\_nodeset\_template](#module\_slurm\_nodeset\_template) | github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template | 6.8.6 |
+| <a name="module_slurm_nodeset_template"></a> [slurm\_nodeset\_template](#module\_slurm\_nodeset\_template) | ../../internal/slurm-gcp-v6/instance_template | n/a |

## Resources
@@ -56,7 +56,7 @@ locals {
}

module "slurm_nodeset_template" {
-  source = "github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template?ref=6.8.6"
+  source = "../../internal/slurm-gcp-v6/instance_template"

project_id = var.project_id
region = var.region
@@ -0,0 +1,80 @@
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.0 |
| <a name="requirement_local"></a> [local](#requirement\_local) | ~> 2.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_local"></a> [local](#provider\_local) | ~> 2.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_instance_template"></a> [instance\_template](#module\_instance\_template) | ../internal_instance_template | n/a |

## Resources

| Name | Type |
|------|------|
| [local_file.startup](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_access_config"></a> [access\_config](#input\_access\_config) | Access configurations, i.e. IPs via which the VM instance can be accessed via the Internet. | <pre>list(object({<br/> nat_ip = string<br/> network_tier = string<br/> }))</pre> | `[]` | no |
| <a name="input_additional_disks"></a> [additional\_disks](#input\_additional\_disks) | List of maps of disks. | <pre>list(object({<br/> disk_name = string<br/> device_name = string<br/> disk_type = string<br/> disk_size_gb = number<br/> disk_labels = map(string)<br/> auto_delete = bool<br/> boot = bool<br/> }))</pre> | `[]` | no |
| <a name="input_additional_networks"></a> [additional\_networks](#input\_additional\_networks) | Additional network interface details for GCE, if any. | <pre>list(object({<br/> network = string<br/> subnetwork = string<br/> subnetwork_project = string<br/> network_ip = string<br/> nic_type = string<br/> access_config = list(object({<br/> nat_ip = string<br/> network_tier = string<br/> }))<br/> ipv6_access_config = list(object({<br/> network_tier = string<br/> }))<br/> }))</pre> | `[]` | no |
| <a name="input_bandwidth_tier"></a> [bandwidth\_tier](#input\_bandwidth\_tier) | Tier 1 bandwidth increases the maximum egress bandwidth for VMs.<br/>Using the `virtio_enabled` setting will only enable VirtioNet and will not enable TIER\_1.<br/>Using the `tier_1_enabled` setting will enable both gVNIC and TIER\_1 higher bandwidth networking.<br/>Using the `gvnic_enabled` setting will only enable gVNIC and will not enable TIER\_1.<br/>Note that TIER\_1 only works with specific machine families & shapes and must be using an image that supports gVNIC. See [official docs](https://cloud.google.com/compute/docs/networking/configure-vm-with-high-bandwidth-configuration) for more details. | `string` | `"platform_default"` | no |
| <a name="input_can_ip_forward"></a> [can\_ip\_forward](#input\_can\_ip\_forward) | Enable IP forwarding, for NAT instances for example. | `bool` | `false` | no |
| <a name="input_disable_smt"></a> [disable\_smt](#input\_disable\_smt) | Disables Simultaneous Multi-Threading (SMT) on instance. | `bool` | `false` | no |
| <a name="input_disk_auto_delete"></a> [disk\_auto\_delete](#input\_disk\_auto\_delete) | Whether or not the boot disk should be auto-deleted. | `bool` | `true` | no |
| <a name="input_disk_labels"></a> [disk\_labels](#input\_disk\_labels) | Labels to be assigned to boot disk, provided as a map. | `map(string)` | `{}` | no |
| <a name="input_disk_size_gb"></a> [disk\_size\_gb](#input\_disk\_size\_gb) | Boot disk size in GB. | `number` | `100` | no |
| <a name="input_disk_type"></a> [disk\_type](#input\_disk\_type) | Boot disk type, can be either pd-ssd, local-ssd, or pd-standard. | `string` | `"pd-standard"` | no |
| <a name="input_enable_confidential_vm"></a> [enable\_confidential\_vm](#input\_enable\_confidential\_vm) | Enable the Confidential VM configuration. Note: the instance image must support this option. | `bool` | `false` | no |
| <a name="input_enable_oslogin"></a> [enable\_oslogin](#input\_enable\_oslogin) | Enables Google Cloud os-login for user login and authentication for VMs.<br/>See https://cloud.google.com/compute/docs/oslogin | `bool` | `true` | no |
| <a name="input_enable_shielded_vm"></a> [enable\_shielded\_vm](#input\_enable\_shielded\_vm) | Enable the Shielded VM configuration. Note: the instance image must support this option. | `bool` | `false` | no |
| <a name="input_gpu"></a> [gpu](#input\_gpu) | GPU information. Type and count of GPU to attach to the instance template. See<br/>https://cloud.google.com/compute/docs/gpus more details.<br/>- type : the GPU type<br/>- count : number of GPUs | <pre>object({<br/> type = string<br/> count = number<br/> })</pre> | `null` | no |
| <a name="input_labels"></a> [labels](#input\_labels) | Labels, provided as a map | `map(string)` | `{}` | no |
| <a name="input_machine_type"></a> [machine\_type](#input\_machine\_type) | Machine type to create. | `string` | `"n1-standard-1"` | no |
| <a name="input_metadata"></a> [metadata](#input\_metadata) | Metadata, provided as a map. | `map(string)` | `{}` | no |
| <a name="input_min_cpu_platform"></a> [min\_cpu\_platform](#input\_min\_cpu\_platform) | Specifies a minimum CPU platform. Applicable values are the friendly names of<br/>CPU platforms, such as Intel Haswell or Intel Skylake. See the complete list:<br/>https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform | `string` | `null` | no |
| <a name="input_name_prefix"></a> [name\_prefix](#input\_name\_prefix) | Prefix for template resource. | `string` | `"default"` | no |
| <a name="input_network"></a> [network](#input\_network) | The name or self\_link of the network to attach this interface to. Use network<br/>attribute for Legacy or Auto subnetted networks and subnetwork for custom<br/>subnetted networks. | `string` | `null` | no |
| <a name="input_network_ip"></a> [network\_ip](#input\_network\_ip) | Private IP address to assign to the instance if desired. | `string` | `""` | no |
| <a name="input_on_host_maintenance"></a> [on\_host\_maintenance](#input\_on\_host\_maintenance) | Instance availability Policy | `string` | `"MIGRATE"` | no |
| <a name="input_preemptible"></a> [preemptible](#input\_preemptible) | Allow the instance to be preempted. | `bool` | `false` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project ID to create resources in. | `string` | n/a | yes |
| <a name="input_region"></a> [region](#input\_region) | Region where the instance template should be created. | `string` | `null` | no |
| <a name="input_resource_policies"></a> [resource\_policies](#input\_resource\_policies) | A list of self\_links of resource policies to attach to the instance.<br/>Currently a max of 1 resource policy is supported. | `list(string)` | `null` | no |
| <a name="input_service_account"></a> [service\_account](#input\_service\_account) | Service account to attach to the instances. See<br/>'main.tf:local.service\_account' for the default. | <pre>object({<br/> email = string<br/> scopes = set(string)<br/> })</pre> | `null` | no |
| <a name="input_shielded_instance_config"></a> [shielded\_instance\_config](#input\_shielded\_instance\_config) | Shielded VM configuration for the instance. Note: not used unless<br/>enable\_shielded\_vm is 'true'.<br/>- enable\_integrity\_monitoring : Compare the most recent boot measurements to the<br/> integrity policy baseline and return a pair of pass/fail results depending on<br/> whether they match or not.<br/>- enable\_secure\_boot : Verify the digital signature of all boot components, and<br/> halt the boot process if signature verification fails.<br/>- enable\_vtpm : Use a virtualized trusted platform module, which is a<br/> specialized computer chip you can use to encrypt objects like keys and<br/> certificates. | <pre>object({<br/> enable_integrity_monitoring = bool<br/> enable_secure_boot = bool<br/> enable_vtpm = bool<br/> })</pre> | <pre>{<br/> "enable_integrity_monitoring": true,<br/> "enable_secure_boot": true,<br/> "enable_vtpm": true<br/>}</pre> | no |
| <a name="input_slurm_bucket_path"></a> [slurm\_bucket\_path](#input\_slurm\_bucket\_path) | GCS Bucket URI of Slurm cluster file storage. | `string` | n/a | yes |
| <a name="input_slurm_cluster_name"></a> [slurm\_cluster\_name](#input\_slurm\_cluster\_name) | Cluster name, used for resource naming. | `string` | n/a | yes |
| <a name="input_slurm_instance_role"></a> [slurm\_instance\_role](#input\_slurm\_instance\_role) | Slurm instance type. Must be one of: controller; login; compute; or null. | `string` | `null` | no |
| <a name="input_source_image"></a> [source\_image](#input\_source\_image) | Source disk image. | `string` | `""` | no |
| <a name="input_source_image_family"></a> [source\_image\_family](#input\_source\_image\_family) | Source image family. | `string` | `""` | no |
| <a name="input_source_image_project"></a> [source\_image\_project](#input\_source\_image\_project) | Project where the source image comes from. If it is not provided, the provider project is used. | `string` | `""` | no |
| <a name="input_spot"></a> [spot](#input\_spot) | Provision as a SPOT preemptible instance.<br/>See https://cloud.google.com/compute/docs/instances/spot for more details. | `bool` | `false` | no |
| <a name="input_subnetwork"></a> [subnetwork](#input\_subnetwork) | The name of the subnetwork to attach this interface to. The subnetwork must<br/>exist in the same region this instance will be created in. Either network or<br/>subnetwork must be provided. | `string` | `null` | no |
| <a name="input_subnetwork_project"></a> [subnetwork\_project](#input\_subnetwork\_project) | The ID of the project in which the subnetwork belongs. If it is not provided, the provider project is used. | `string` | `null` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | Network tag list. | `list(string)` | `[]` | no |
| <a name="input_termination_action"></a> [termination\_action](#input\_termination\_action) | Which action to take when Compute Engine preempts the VM. Value can be: 'STOP', 'DELETE'. The default value is 'STOP'.<br/>See https://cloud.google.com/compute/docs/instances/spot for more details. | `string` | `"STOP"` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_instance_template"></a> [instance\_template](#output\_instance\_template) | Instance template details |
| <a name="output_name"></a> [name](#output\_name) | Name of instance template |
| <a name="output_self_link"></a> [self\_link](#output\_self\_link) | Self\_link of instance template |
| <a name="output_service_account"></a> [service\_account](#output\_service\_account) | Service account object, includes email and scopes. |
| <a name="output_tags"></a> [tags](#output\_tags) | Tags that will be associated with instance(s) |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
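For orientation, a call to this migrated module might look like the following sketch. The source path mirrors the diff above; the input values (cluster name, bucket path, machine type) are illustrative only, not taken from the commit:

```hcl
module "slurm_nodeset_template" {
  # Relative path assumes the caller sits two directory levels above internal/
  source = "../../internal/slurm-gcp-v6/instance_template"

  # Required inputs (see the Inputs table above)
  project_id         = var.project_id
  slurm_cluster_name = "demo"                # used for resource naming
  slurm_bucket_path  = "gs://my-bucket/demo" # hypothetical GCS path

  # Common optional overrides
  region              = var.region
  machine_type        = "n2-standard-4"
  slurm_instance_role = "compute"
  subnetwork          = var.subnetwork_self_link
}
```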
@@ -0,0 +1,147 @@
#!/bin/bash
# Copyright (C) SchedMD LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

SLURM_DIR=/slurm
FLAGFILE=$SLURM_DIR/slurm_configured_do_not_remove
SCRIPTS_DIR=$SLURM_DIR/scripts
if [[ -z "$HOME" ]]; then
# google-startup-scripts.service lacks environment variables
HOME="$(getent passwd "$(whoami)" | cut -d: -f6)"
fi

METADATA_SERVER="metadata.google.internal"
URL="http://$METADATA_SERVER/computeMetadata/v1"
CURL="curl -sS --fail --header Metadata-Flavor:Google"

PING_METADATA="ping -q -w1 -c1 $METADATA_SERVER"
echo "INFO: $PING_METADATA"
for i in $(seq 10); do
[ $i -gt 1 ] && sleep 5;
$PING_METADATA > /dev/null && s=0 && break || s=$?;
echo "ERROR: Failed to contact metadata server, will retry"
done
if [ $s -ne 0 ]; then
echo "ERROR: Unable to contact metadata server, aborting"
wall -n '*** Slurm setup failed in the startup script! see `journalctl -u google-startup-scripts` ***'
exit 1
else
echo "INFO: Successfully contacted metadata server"
fi

PING_GOOGLE="ping -q -w1 -c1 8.8.8.8"
echo "INFO: $PING_GOOGLE"
for i in $(seq 5); do
[ $i -gt 1 ] && sleep 2;
$PING_GOOGLE > /dev/null && s=0 && break || s=$?;
	echo "WARN: Failed to ping Google DNS, will retry"
done
if [ $s -ne 0 ]; then
echo "WARNING: No internet access detected"
else
echo "INFO: Internet access detected"
fi

mkdir -p $SCRIPTS_DIR
UNIVERSE_DOMAIN="$($CURL $URL/instance/attributes/universe_domain)"
BUCKET="$($CURL $URL/instance/attributes/slurm_bucket_path)"
if [[ -z $BUCKET ]]; then
echo "ERROR: No bucket path detected."
exit 1
fi

SCRIPTS_ZIP="$HOME/slurm-gcp-scripts.zip"
export CLOUDSDK_CORE_UNIVERSE_DOMAIN="$UNIVERSE_DOMAIN"
until gcloud storage cp "$BUCKET/slurm-gcp-devel.zip" "$SCRIPTS_ZIP"; do
echo "WARN: Could not download SlurmGCP scripts, retrying in 5 seconds."
sleep 5
done
unzip -o "$SCRIPTS_ZIP" -d "$SCRIPTS_DIR"
rm -rf "$SCRIPTS_ZIP"

# Temporary hack: do not fail the script on TPU VMs
chown slurm:slurm -R "$SCRIPTS_DIR" || true
chmod 700 -R "$SCRIPTS_DIR"


if [ -f $FLAGFILE ]; then
echo "WARNING: Slurm was previously configured, quitting"
exit 0
fi
touch $FLAGFILE

function tpu_setup {
#allow the following command to fail, as this attribute does not exist for regular nodes
docker_image=$($CURL $URL/instance/attributes/slurm_docker_image 2> /dev/null || true)
	if [ -z "$docker_image" ]; then # Not a TPU node, do not do anything
return
fi
if [ "$OS_ENV" == "slurm_container" ]; then #Already inside the slurm container, we should continue starting
return
fi

	# Given an input_string like "WORKER_0:Joseph;WORKER_1:richard;WORKER_2:edward;WORKER_3:john" and a number 1, this function will print richard
parse_metadata() {
local number=$1
local input_string=$2
local word=$(echo "$input_string" | awk -v n="$number" -F ':|;' '{ for (i = 1; i <= NF; i+=2) if ($(i) == "WORKER_"n) print $(i+1) }')
echo "$word"
}

input_string=$($CURL $URL/instance/attributes/slurm_names)
worker_id=$($CURL $URL/instance/attributes/tpu-env | awk '/WORKER_ID/ {print $2}' | tr -d \')
	real_name=$(parse_metadata "$worker_id" "$input_string")

#Prepare to docker pull with gcloud
mkdir -p /root/.docker
cat << EOF > /root/.docker/config.json
{
"credHelpers": {
"gcr.io": "gcloud",
"us-docker.pkg.dev": "gcloud"
}
}
EOF
#cgroup detection
CGV=1
CGROUP_FLAGS="-v /sys/fs/cgroup:/sys/fs/cgroup:rw"
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then #CGV2
CGV=2
fi
if [ $CGV == 2 ]; then
CGROUP_FLAGS="--cgroup-parent=docker.slice --cgroupns=private --tmpfs /run --tmpfs /run/lock --tmpfs /tmp"
if [ ! -f /etc/systemd/system/docker.slice ]; then #In case that there is no slice prepared for hosting the containers create it
printf "[Unit]\nDescription=docker slice\nBefore=slices.target\n[Slice]\nCPUAccounting=true\nMemoryAccounting=true" > /etc/systemd/system/docker.slice
systemctl start docker.slice
fi
fi
#for the moment always use --privileged, as systemd might not work properly otherwise
TPU_FLAGS="--privileged"
# TPU_FLAGS="--cap-add SYS_RESOURCE --device /dev/accel0 --device /dev/accel1 --device /dev/accel2 --device /dev/accel3"
# if [ $CGV == 2 ]; then #In case that we are in CGV2 for systemd to work correctly for the moment we go with privileged
# TPU_FLAGS="--privileged"
# fi

	docker run -d $CGROUP_FLAGS $TPU_FLAGS --net=host --name=slurmd --hostname="$real_name" --entrypoint=/usr/bin/systemd --restart unless-stopped "$docker_image"
exit 0
}

tpu_setup #will do nothing for normal nodes or the container spawned inside TPU

echo "INFO: Running python cluster setup script"
SETUP_SCRIPT_FILE=$SCRIPTS_DIR/setup.py
chmod +x $SETUP_SCRIPT_FILE
exec $SETUP_SCRIPT_FILE
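The `parse_metadata` helper inside `tpu_setup` can be exercised on its own; a minimal standalone sketch (the worker names are illustrative, matching the comment in the script):

```shell
#!/bin/bash
# Given "WORKER_<n>:<name>" pairs separated by ';', print the name for worker n.
# awk splits on ':' and ';', so odd fields are WORKER_<n> keys, even fields are names.
parse_metadata() {
	local number=$1
	local input_string=$2
	echo "$input_string" | awk -v n="$number" -F ':|;' \
		'{ for (i = 1; i <= NF; i += 2) if ($(i) == "WORKER_" n) print $(i + 1) }'
}

parse_metadata 1 "WORKER_0:Joseph;WORKER_1:richard;WORKER_2:edward;WORKER_3:john"
# prints: richard
```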