diff --git a/community/examples/intel/README.md b/community/examples/intel/README.md
index e83bd27391..475c6bbf82 100644
--- a/community/examples/intel/README.md
+++ b/community/examples/intel/README.md
@@ -1,454 +1,3 @@
 # Intel Solutions for the Cluster Toolkit (formerly HPC Toolkit)
-> **_NOTE:_** The [hpc-slurm-daos.yaml](hpc-slurm-daos.yaml) blueprint is not
-> compatible with newer versions of slurm-gcp v6.
-
-
-
-
-- [Intel Solutions for the Cluster Toolkit](#intel-solutions-for-the-cluster-toolkit)
-  - [DAOS Cluster](#daos-cluster)
-    - [Initial Setup for DAOS Cluster](#initial-setup-for-daos-cluster)
-    - [Deploy the DAOS Cluster](#deploy-the-daos-cluster)
-    - [Connect to a client node](#connect-to-a-client-node)
-    - [Verify the DAOS storage system](#verify-the-daos-storage-system)
-    - [Create a DAOS Pool and Container](#create-a-daos-pool-and-container)
-      - [About the DAOS Command Line Tools](#about-the-daos-command-line-tools)
-      - [View Free Space](#view-free-space)
-      - [Create a Pool](#create-a-pool)
-      - [Create a Container](#create-a-container)
-      - [Mount the DAOS Container](#mount-the-daos-container)
-    - [Use DAOS Storage](#use-daos-storage)
-    - [Unmount the DAOS Container](#unmount-the-daos-container)
-    - [Delete the DAOS infrastructure when not in use](#delete-the-daos-infrastructure-when-not-in-use)
-  - [DAOS Server with Slurm cluster](#daos-server-with-slurm-cluster)
-    - [Initial Setup for the DAOS/Slurm cluster](#initial-setup-for-the-daosslurm-cluster)
-    - [Deploy the DAOS/Slurm Cluster](#deploy-the-daosslurm-cluster)
-    - [Connect to the DAOS/Slurm Cluster login node](#connect-to-the-daosslurm-cluster-login-node)
-    - [Create and Mount a DAOS Container](#create-and-mount-a-daos-container)
-    - [Run a Job that uses the DAOS Container](#run-a-job-that-uses-the-daos-container)
-    - [Unmount the Container](#unmount-the-container)
-    - [Delete the DAOS/Slurm Cluster infrastructure when not in use](#delete-the-daosslurm-cluster-infrastructure-when-not-in-use)
-
-## DAOS Cluster
-
-The [pfs-daos.yaml](pfs-daos.yaml) blueprint describes an environment with
-
-- Two DAOS server instances
-- Two DAOS client instances
-
-The [pfs-daos.yaml](pfs-daos.yaml) blueprint uses a Packer template and
-Terraform modules from the [Google Cloud DAOS][google-cloud-daos] repository.
-Please review the [introduction to image building](../../../docs/image-building.md)
-for general information on building custom images using the Toolkit.
-
-Identify a project to work in and substitute its unique ID wherever you see
-`<>` in the instructions below.
-
-[google-cloud-daos]: https://github.com/daos-stack/google-cloud-daos
-[pre-deployment_guide]: https://github.com/daos-stack/google-cloud-daos/blob/main/docs/pre-deployment_guide.md
-[DAOS Yum Repository]: https://packages.daos.io
-
-### Initial Setup for DAOS Cluster
-
-Before provisioning the DAOS cluster you must follow the steps listed in the [Google Cloud DAOS Pre-deployment Guide][pre-deployment_guide].
-
-Skip the "Build DAOS Images" step at the end of the [Pre-deployment Guide][pre-deployment_guide]. The [pfs-daos.yaml](pfs-daos.yaml) blueprint will build the images as part of the deployment.
-
-The Pre-deployment Guide provides instructions for:
-
-- installing the Google Cloud CLI
-- enabling service accounts
-- enabling APIs
-- establishing minimum resource quotas
-- creating a Cloud NAT to allow instances without public IPs to access the [DAOS Yum Repository]
-
-### Deploy the DAOS Cluster
-
-After completing the steps in the [Pre-deployment Guide][pre-deployment_guide] use `gcluster` to provision the blueprint:
-
-```text
-gcluster create community/examples/intel/pfs-daos.yaml \
-  --vars project_id=<> \
-  [--backend-config bucket=]
-```
-
-This will create the deployment directory containing Terraform modules and
-Packer templates. The `--backend-config` option is not required but recommended.
-It will save the Terraform state in a pre-existing [Google Cloud Storage
-bucket][bucket]. For more information see [Setting up a remote terraform
-state][backend]. Use `gcluster deploy` to provision your DAOS storage cluster:
-
-```text
-gcluster deploy pfs-daos --auto-approve
-```
-
-[backend]: ../../../examples/README.md#optional-setting-up-a-remote-terraform-state
-[bucket]: https://cloud.google.com/storage/docs/creating-buckets
-
-### Connect to a client node
-
-1. Open the following URL in a new tab.
-
-   https://console.cloud.google.com/compute
-
-   This will take you to **Compute Engine > VM instances** in the Google Cloud Console.
-
-   Select the project in which the DAOS cluster will be provisioned.
-
-2. Click on the **SSH** button associated with the **daos-client-0001**
-   instance to open a window with a terminal into the first DAOS client instance.
-
-### Verify the DAOS storage system
-
-The `community/examples/intel/pfs-daos.yaml` blueprint does not contain configuration for DAOS pools and containers. Therefore, pools and containers will need to be created manually.
-
-Before pools and containers can be created the storage system must be formatted. Formatting the storage is done automatically by the startup script that runs on the *daos-server-0001* instance. The startup script will run the [dmg storage format](https://docs.daos.io/v2.4/admin/deployment/?h=dmg+storage#storage-formatting) command. It may take a few minutes for all daos-server instances to join.
-
-Verify that the storage system has been formatted and that the daos-server instances have joined.
-
-```bash
-sudo dmg system query -v
-```
-
-The command will not return output until the system is ready.
-
-The output will look similar to
-
-```text
-Rank UUID                                 Control Address   Fault Domain      State  Reason
----- ----                                 ---------------   ------------      -----  ------
-0    225a0a51-d4ed-4ac3-b1a5-04b31c08b559 10.128.0.51:10001 /daos-server-0001 Joined
-1    553ab1dc-99af-460e-a57c-3350611d1d09 10.128.0.43:10001 /daos-server-0002 Joined
-```
-
-Both daos-server instances should show a state of *Joined*.
-
-### Create a DAOS Pool and Container
-
-#### About the DAOS Command Line Tools
-
-The DAOS Management tool `dmg` is used by System Administrators to manage the DAOS storage [system](https://docs.daos.io/v2.4/overview/architecture/#daos-system) and DAOS [pools](https://docs.daos.io/v2.4/overview/storage/#daos-pool). Therefore, `sudo` must be used when running `dmg`.
-
-The DAOS CLI `daos` is used by both users and System Administrators to create and manage [containers](https://docs.daos.io/v2.4/overview/storage/#daos-container). It is not necessary to use `sudo` with the `daos` command.
-
-#### View Free Space
-
-View how much free space is available.
-
-```bash
-sudo dmg storage query usage
-```
-
-#### Create a Pool
-
-Create a single pool owned by root which uses 100% of the available free space.
-
-```bash
-sudo dmg pool create --size=100% --user=root pool1
-```
-
-Set ACLs to allow any user to create a container in *pool1*.
-
-```bash
-sudo dmg pool update-acl -e A::EVERYONE@:rcta pool1
-```
-
-See the [Pool Operations](https://docs.daos.io/v2.4/admin/pool_operations) section of the DAOS Administration Guide for more information about creating pools.
-
-#### Create a Container
-
-At this point it is necessary to determine who will need to access the container
-and how it will be used. The ACLs will need to be set properly to allow users and/or groups to access the container.
-
-For the purpose of this demo create the container without specifying ACLs. The container will be owned by your user account and you will have full access to the container.
-
-```bash
-daos container create --type=POSIX --properties=rf:0 pool1 cont1
-```
-
-See the [Container Management](https://docs.daos.io/v2.4/user/container) section of the DAOS User Guide for more information about creating containers.
-
-#### Mount the DAOS Container
-
-Mount the container with dfuse (DAOS Fuse)
-
-```bash
-mkdir -p "${HOME}/daos/cont1"
-dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${HOME}/daos/cont1"
-```
-
-Verify that the container is mounted
-
-```bash
-df -h -t fuse.daos
-```
-
-### Use DAOS Storage
-
-The `cont1` container is now mounted on `${HOME}/daos/cont1`
-
-Create a 20GiB file which will be stored in the DAOS filesystem.
-
-```bash
-time LD_PRELOAD=/usr/lib64/libioil.so \
-dd if=/dev/zero of="${HOME}/daos/cont1/test20GiB.img" iflag=fullblock bs=1G count=20
-```
-
-**Known Issue:**
-
-When you run `ls -lh "${HOME}/daos/cont1"` you may see that the `test20GiB.img` file shows a size of 0 bytes.
-
-If you unmount the container and mount it again, the file size will show as 20G.
-
-```bash
-fusermount3 -u "${HOME}/daos/cont1"
-dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${HOME}/daos/cont1"
-ls -lh "${HOME}/daos/cont1"
-```
-
-A work-around for this issue is to disable caching when mounting the container.
-
-```bash
-dfuse --singlethread --disable-caching --pool=pool1 --container=cont1 --mountpoint="${HOME}/daos/cont1"
-```
-
-See the [File System](https://docs.daos.io/v2.4/user/filesystem/) section of the DAOS User Guide for more information about DFuse.
-
-### Unmount the DAOS Container
-
-The container will need to be unmounted before you log out. If this is not done it can leave open file handles and prevent the container from being mounted when you log in again.
-
-```bash
-fusermount3 -u "${HOME}/daos/cont1"
-```
-
-Verify that the container is unmounted
-
-```bash
-df -h -t fuse.daos
-```
-
-Logout of the DAOS client instance.
-
-```bash
-logout
-```
-
-See the [DFuse (DAOS FUSE)](https://docs.daos.io/v2.4/user/filesystem/?h=dfuse#dfuse-daos-fuse) section of the DAOS User Guide for more information about mounting POSIX containers.
-
-### Delete the DAOS infrastructure when not in use
-
-> **_NOTE:_** Data stored in the DAOS container will be permanently lost after cluster deletion.
-
-Delete the remaining infrastructure
-
-```bash
-gcluster destroy pfs-daos --auto-approve
-```
-
-## DAOS Server with Slurm cluster
-
-The [hpc-slurm-daos.yaml](hpc-slurm-daos.yaml) blueprint can be used to deploy a Slurm cluster and four DAOS server instances. The Slurm compute instances are configured as DAOS clients.
-
-The blueprint uses modules from
-
-- [google-cloud-daos][google-cloud-daos]
-- [community/modules/compute/schedmd-slurm-gcp-v6-nodeset][schedmd-slurm-gcp-v6-nodeset]
-- [community/modules/compute/schedmd-slurm-gcp-v6-partition][schedmd-slurm-gcp-v6-partition]
-- [community/modules/scheduler/schedmd-slurm-gcp-v6-login][schedmd-slurm-gcp-v6-login]
-- [community/modules/scheduler/schedmd-slurm-gcp-v6-controller][schedmd-slurm-gcp-v6-controller]
-
-The blueprint also uses a Packer template from the [Google Cloud
-DAOS][google-cloud-daos] repository. Please review the [introduction to image
-building](../../../docs/image-building.md) for general information on building
-custom images using the Toolkit.
-
-Substitute your project ID wherever you see `<>` in the instructions below.
-
-### Initial Setup for the DAOS/Slurm cluster
-
-Before provisioning the DAOS cluster you must follow the steps listed in the [Google Cloud DAOS Pre-deployment Guide][pre-deployment_guide].
-
-Skip the "Build DAOS Images" step at the end of the [Pre-deployment Guide][pre-deployment_guide]. The [hpc-slurm-daos.yaml](hpc-slurm-daos.yaml) blueprint will build the DAOS server image as part of the deployment.
-
-The [Pre-deployment Guide][pre-deployment_guide] provides instructions for enabling service accounts and APIs, establishing minimum resource quotas, and other necessary steps to prepare your project for DAOS server deployment.
-
-[google-cloud-daos]: https://github.com/daos-stack/google-cloud-daos
-[pre-deployment_guide]: https://github.com/daos-stack/google-cloud-daos/blob/main/docs/pre-deployment_guide.md
-[packer-template]: https://github.com/daos-stack/google-cloud-daos/blob/main/images/daos.pkr.hcl
-[schedmd-slurm-gcp-v6-nodeset]: ../../modules/compute/schedmd-slurm-gcp-v6-nodeset
-[schedmd-slurm-gcp-v6-partition]: ../../modules/compute/schedmd-slurm-gcp-v6-partition
-[schedmd-slurm-gcp-v6-controller]: ../../modules/scheduler/schedmd-slurm-gcp-v6-controller
-[schedmd-slurm-gcp-v6-login]: ../../modules/scheduler/schedmd-slurm-gcp-v6-login
-
-Follow the Toolkit guidance to enable [APIs][apis] and establish minimum resource [quotas][quotas] for Slurm.
-
-[apis]: ../../../README.md#enable-gcp-apis
-[quotas]: ../../../README.md#gcp-quotas
-
-The following available quota is required in the region used by Slurm:
-
-- Filestore: 2560GB
-- C2 CPUs: 6000 (fully-scaled "compute" partition)
-
-  This quota is not necessary at initial deployment, but will be required to
-  successfully scale the partition to its maximum size.
-
-- C2 CPUs: 4 (login node)
-
-### Deploy the DAOS/Slurm Cluster
-
-Use `gcluster` to provision the blueprint, supplying your project ID:
-
-```text
-gcluster create community/examples/intel/hpc-slurm-daos.yaml \
-  --vars project_id=<> \
-  [--backend-config bucket=]
-```
-
-This will create a set of directories containing Terraform modules and Packer
-templates.
-
-The `--backend-config` option is not required but recommended. It will save the Terraform state in a pre-existing [Google Cloud Storage bucket][bucket]. For more information see [Setting up a remote terraform state][backend].
-
-Follow `gcluster` instructions to deploy the environment
-
-```text
-gcluster deploy hpc-slurm-daos --auto-approve
-```
-
-[backend]: ../../../examples/README.md#optional-setting-up-a-remote-terraform-state
-[bucket]: https://cloud.google.com/storage/docs/creating-buckets
-
-### Connect to the DAOS/Slurm Cluster login node
-
-Once the startup script has completed and Slurm reports readiness, connect to the login node.
-
-1. Open the following URL in a new tab.
-
-   https://console.cloud.google.com/compute
-
-   This will take you to **Compute Engine > VM instances** in the Google Cloud Console.
-
-   Select the project in which the cluster will be provisioned.
-
-2. Click on the `SSH` button associated with the `hpcslurmda-login-login-001`
-   instance.
-
-   This will open a separate pop-up window with a terminal into your newly created
-   Slurm login VM.
-
-### Create and Mount a DAOS Container
-
-The [community/examples/intel/hpc-slurm-daos.yaml](hpc-slurm-daos.yaml) blueprint defines a single DAOS pool named `pool1`. The pool will be created when the *daos-server* instances are provisioned.
-
-You will need to create your own DAOS container in the pool that can be used by your Slurm jobs.
-
-While logged into the login node create a container named `cont1` in the `pool1` pool:
-
-```bash
-daos cont create --type=POSIX --properties=rf:0 pool1 cont1
-```
-
-NOTE: If you encounter an error `daos: command not found`, it's likely that the startup scripts have not finished running yet. Wait a few minutes and try again.
-
-Since the `cont1` container is owned by your account, your Slurm jobs will need to run as your user account to access the container.
-
-Create a mount point for the container and mount it with dfuse (DAOS Fuse)
-
-```bash
-mkdir -p ${HOME}/daos/cont1
-
-dfuse --singlethread \
---pool=pool1 \
---container=cont1 \
---mountpoint=${HOME}/daos/cont1
-```
-
-Verify that the container is mounted
-
-```bash
-df -h -t fuse.daos
-```
-
-### Run a Job that uses the DAOS Container
-
-On the login node create a `daos_job.sh` file with the following content
-
-```bash
-#!/bin/bash
-JOB_HOSTNAME="$(hostname)"
-TIMESTAMP="$(date '+%Y%m%d%H%M%S')"
-
-echo "Timestamp         = ${TIMESTAMP}"
-echo "Date              = $(date)"
-echo "Hostname          = $(hostname)"
-echo "User              = $(whoami)"
-echo "Working Directory = $(pwd)"
-echo ""
-echo "Number of Nodes Allocated = $SLURM_JOB_NUM_NODES"
-echo "Number of Tasks Allocated = $SLURM_NTASKS"
-
-MOUNT_DIR="${HOME}/daos/cont1"
-LOG_FILE="${MOUNT_DIR}/${JOB_HOSTNAME}.log"
-
-echo "${JOB_HOSTNAME} : Creating directory: ${MOUNT_DIR}"
-mkdir -p "${MOUNT_DIR}"
-
-echo "${JOB_HOSTNAME} : Mounting with dfuse"
-dfuse --singlethread --pool=pool1 --container=cont1 --mountpoint="${MOUNT_DIR}"
-sleep 5
-
-echo "${JOB_HOSTNAME} : Creating log file"
-echo "Job ${SLURM_JOB_ID} running on ${JOB_HOSTNAME}" | tee "${MOUNT_DIR}/${TIMESTAMP}_${JOB_HOSTNAME}.log"
-
-echo "${JOB_HOSTNAME} : Unmounting dfuse"
-fusermount3 -u "${MOUNT_DIR}"
-```
-
-Run the `daos_job.sh` script in an interactive Slurm job on 4 nodes
-
-```bash
-srun --nodes=4 \
-     --ntasks-per-node=1 \
-     --time=00:10:00 \
-     --job-name=daos \
-     --output=srunjob_%j.log \
-     --partition=compute \
-     daos_job.sh &
-```
-
-Run `squeue` to see the status of the job. The `daos_job.sh` script will run once on each of the 4 nodes. Each time it runs it creates a log file which is stored in the `cont1` DAOS container.
-
-Wait for the job to complete and then view the files that were created in the `cont1` DAOS container mounted on `${HOME}/daos/cont1`.
-
-```bash
-ls -l ${HOME}/daos/cont1/*.log
-cat ${HOME}/daos/cont1/*.log
-```
-
-### Unmount the Container
-
-The container will need to be unmounted before you log out. If this is not done it can leave open file handles and prevent the container from being mounted when you log in again.
-
-```bash
-fusermount3 -u ${HOME}/daos/cont1
-```
-
-Verify that the container is unmounted
-
-```bash
-df -h -t fuse.daos
-```
-
-See the [DFuse (DAOS FUSE)](https://docs.daos.io/v2.4/user/filesystem/?h=dfuse#dfuse-daos-fuse) section of the DAOS User Guide for more information about mounting POSIX containers.
-
-### Delete the DAOS/Slurm Cluster infrastructure when not in use
-
-> **_NOTE:_**
->
-> - Data on the DAOS file system will be permanently lost after cluster deletion.
-> - If the Slurm controller is shut down before the auto-scale instances are destroyed, those compute instances will be left running.
-
-Open your browser to the VM instances page and ensure that instances named "compute"
-have been shut down and deleted by the Slurm autoscaler.
-
-Delete the remaining infrastructure:
-
-```bash
-gcluster destroy hpc-slurm-daos --auto-approve
-```
+> **_NOTE:_** The DAOS example blueprints (`hpc-slurm-daos.yaml` and `pfs-daos.yaml`) have been removed from the Cluster Toolkit. We recommend migrating to the first-party [Parallelstore](../../../modules/file-system/parallelstore/) module for similar functionality. To help with this transition, see the Parallelstore example blueprints ([pfs-parallelstore.yaml](../../../examples/pfs-parallelstore.yaml) and [ps-slurm.yaml](../../../examples/ps-slurm.yaml)). If the external [Google Cloud DAOS](https://github.com/daos-stack/google-cloud-daos) repository is necessary, we recommend using the last compatible Cluster Toolkit release, [v1.41.0](https://github.com/GoogleCloudPlatform/cluster-toolkit/releases/tag/v1.41.0).
diff --git a/community/examples/intel/hpc-slurm-daos.yaml b/community/examples/intel/hpc-slurm-daos.yaml
deleted file mode 100644
index b3c217474c..0000000000
--- a/community/examples/intel/hpc-slurm-daos.yaml
+++ /dev/null
@@ -1,188 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#      http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
----
-
-blueprint_name: hpc-slurm-daos
-
-vars:
-  project_id: ## Set GCP Project ID Here ##
-  deployment_name: hpc-slurm-daos
-  region: us-central1
-  zone: us-central1-c
-  daos_server_image_family: daos-server-hpc-rocky-8
-  daos_version: "2.4"
-  tags: []
-
-# Note: this blueprint assumes the existence of a default global network and
-# subnetwork in the region chosen above
-
-validators:
-- validator: test_module_not_used
-  inputs: {}
-  skip: true
-
-deployment_groups:
-- group: primary
-  modules:
-  - id: network1
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/network/vpc?ref=v1.33.0&depth=1
-
-  - id: homefs
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/file-system/filestore?ref=v1.33.0&depth=1
-    use: [network1]
-    settings:
-      local_mount: /home
-
-- group: daos-server-image
-  modules:
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/main/images
-  - id: daos-server-image
-    source: "github.com/daos-stack/google-cloud-daos//images?ref=v0.5.0&depth=1"
-    kind: packer
-    settings:
-      daos_version: $(vars.daos_version)
-      daos_repo_base_url: https://packages.daos.io/
-      daos_packages_repo_file: EL8/packages/x86_64/daos_packages.repo
-      use_iap: true
-      enable_oslogin: false
-      machine_type: n2-standard-32
-      source_image_family: hpc-rocky-linux-8
-      source_image_project_id: cloud-hpc-image-public
-      image_guest_os_features: ["GVNIC"]
-      disk_size: "20"
-      state_timeout: "10m"
-      scopes: ["https://www.googleapis.com/auth/cloud-platform"]
-      use_internal_ip: true
-      omit_external_ip: false
-      daos_install_type: server
-      image_family: $(vars.daos_server_image_family)
-
-- group: cluster
-  modules:
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/main/terraform/modules/daos_server
-  - id: daos
-    source: "github.com/daos-stack/google-cloud-daos//terraform/modules/daos_server?ref=v0.5.0&depth=1"
-    use: [network1]
-    settings:
-      labels: {ghpc_role: file-system}
-      machine_type: "n2-standard-16"
-      os_family: $(vars.daos_server_image_family)
-      daos_disk_count: 4
-      tags: $(vars.tags)
-      pools:
-      - name: "pool1"
-        size: "100%"
-        # Do not set value for scm_size when size=100%
-        daos_scm_size:
-        user: "root@"
-        group: "root@"
-        acls:
-        - "A::OWNER@:rwdtTaAo"
-        - "A:G:GROUP@:rwtT"
-        - "A::EVERYONE@:rcta"
-        properties:
-          reclaim: "lazy"
-        containers: []
-
-  - id: daos-client-script
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script?ref=v1.33.0&depth=1
-    settings:
-      runners:
-      - type: data
-        content: $(daos.daos_agent_yml)
-        destination: /etc/daos/daos_agent.yml
-      - type: data
-        content: $(daos.daos_control_yml)
-        destination: /etc/daos/daos_control.yml
-      - type: shell
-        content: $(daos.daos_client_install_script)
-        destination: /tmp/daos_client_install.sh
-      - type: shell
-        content: $(daos.daos_client_config_script)
-        destination: /tmp/daos_client_config.sh
-
-  - id: debug_nodeset
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/compute/schedmd-slurm-gcp-v6-nodeset?ref=v1.33.0&depth=1
-    use: [network1]
-    settings:
-      name: ns1
-      node_count_dynamic_max: 4
-      machine_type: n2-standard-2
-      enable_placement: false # the default is: true
-      service_account_scopes:
-      - "https://www.googleapis.com/auth/monitoring.write"
-      - "https://www.googleapis.com/auth/logging.write"
-      - "https://www.googleapis.com/auth/devstorage.read_only"
-      - "https://www.googleapis.com/auth/cloud-platform"
-
-  - id: debug_partition
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/compute/schedmd-slurm-gcp-v6-partition?ref=v1.33.0&depth=1
-    use: [debug_nodeset]
-    settings:
-      partition_name: debug
-      exclusive: false # allows nodes to stay up after jobs are done
-      is_default: true
-
-  - id: compute_nodeset
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/compute/schedmd-slurm-gcp-v6-nodeset?ref=v1.33.0&depth=1
-    use: [network1]
-    settings:
-      name: ns2
-      node_count_dynamic_max: 20
-      bandwidth_tier: gvnic_enabled
-      service_account_scopes:
-      - "https://www.googleapis.com/auth/monitoring.write"
-      - "https://www.googleapis.com/auth/logging.write"
-      - "https://www.googleapis.com/auth/devstorage.read_only"
-      - "https://www.googleapis.com/auth/cloud-platform"
-
-  - id: compute_partition
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/compute/schedmd-slurm-gcp-v6-partition?ref=v1.33.0&depth=1
-    use: [compute_nodeset]
-    settings:
-      partition_name: compute
-
-  - id: slurm_login
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/scheduler/schedmd-slurm-gcp-v6-login?ref=v1.33.0&depth=1
-    use: [network1]
-    settings:
-      name_prefix: login
-      machine_type: n2-standard-4
-      enable_login_public_ips: true
-      tags: $(vars.tags)
-      service_account_scopes:
-      - "https://www.googleapis.com/auth/monitoring.write"
-      - "https://www.googleapis.com/auth/logging.write"
-      - "https://www.googleapis.com/auth/devstorage.read_only"
-      - "https://www.googleapis.com/auth/cloud-platform"
-
-  - id: slurm_controller
-    source: github.com/GoogleCloudPlatform/hpc-toolkit//community/modules/scheduler/schedmd-slurm-gcp-v6-controller?ref=v1.33.0&depth=1
-    use:
-    - network1
-    - debug_partition
-    - compute_partition
-    - slurm_login
-    - homefs
-    - daos-client-script
-    settings:
-      enable_controller_public_ips: true
-      compute_startup_script: $(daos-client-script.startup_script)
-      controller_startup_script: $(daos-client-script.startup_script)
-      login_startup_script: $(daos-client-script.startup_script)
-      compute_startup_scripts_timeout: 1000
-      controller_startup_scripts_timeout: 1000
-      login_startup_scripts_timeout: 1000
-      tags: $(vars.tags)
diff --git a/community/examples/intel/pfs-daos.yaml b/community/examples/intel/pfs-daos.yaml
deleted file mode 100644
index 3abf5c9778..0000000000
--- a/community/examples/intel/pfs-daos.yaml
+++ /dev/null
@@ -1,109 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#      http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
----
-
-blueprint_name: pfs-daos
-
-vars:
-  project_id: ## Set GCP Project ID Here ##
-  deployment_name: pfs-daos
-  region: us-central1
-  zone: us-central1-c
-  daos_server_image_family: daos-server-hpc-rocky-8
-  daos_client_image_family: daos-client-hpc-rocky-8
-  daos_version: "2.4"
-  tags: []
-
-# Note: this blueprint assumes the existence of a default global network and
-# subnetwork in the region chosen above
-
-deployment_groups:
-- group: primary
-  modules:
-  - id: network1
-    source: modules/network/pre-existing-vpc
-
-- group: daos-server-image
-  modules:
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/main/images
-  - id: daos-server-image
-    source: "github.com/daos-stack/google-cloud-daos//images?ref=v0.5.0&depth=1"
-    kind: packer
-    settings:
-      daos_version: $(vars.daos_version)
-      daos_repo_base_url: https://packages.daos.io
-      daos_packages_repo_file: EL8/packages/x86_64/daos_packages.repo
-      use_iap: true
-      enable_oslogin: false
-      machine_type: n2-standard-32
-      source_image_family: hpc-rocky-linux-8
-      source_image_project_id: cloud-hpc-image-public
-      image_guest_os_features: ["GVNIC"]
-      disk_size: "20"
-      state_timeout: "10m"
-      scopes: ["https://www.googleapis.com/auth/cloud-platform"]
-      use_internal_ip: true
-      omit_external_ip: false
-      daos_install_type: server
-      image_family: $(vars.daos_server_image_family)
-
-- group: daos-client-image
-  modules:
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/v0.5.0/images
-  - id: daos-client-image
-    source: "github.com/daos-stack/google-cloud-daos//images?ref=v0.5.0&depth=1"
-    kind: packer
-    settings:
-      daos_version: $(vars.daos_version)
-      daos_repo_base_url: https://packages.daos.io
-      daos_packages_repo_file: EL8/packages/x86_64/daos_packages.repo
-      use_iap: true
-      enable_oslogin: false
-      machine_type: n2-standard-32
-      source_image_family: hpc-rocky-linux-8
-      source_image_project_id: cloud-hpc-image-public
-      image_guest_os_features: ["GVNIC"]
-      disk_size: "20"
-      state_timeout: "10m"
-      scopes: ["https://www.googleapis.com/auth/cloud-platform"]
-      use_internal_ip: true
-      omit_external_ip: false
-      daos_install_type: client
-      image_family: $(vars.daos_client_image_family)
-
-- group: daos-cluster
-  modules:
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/develop/terraform/modules/daos_server
-  - id: daos-server
-    # source: $(vars.daos_server_module_source_url)
-    source: "github.com/daos-stack/google-cloud-daos//terraform/modules/daos_server?ref=v0.5.0&depth=1"
-    use: [network1]
-    settings:
-      number_of_instances: 2
-      labels: {ghpc_role: file-system}
-      os_family: $(vars.daos_server_image_family)
-      daos_scm_size: "172"
-      tags: $(vars.tags)
-
-  # more info: https://github.com/daos-stack/google-cloud-daos/tree/develop/terraform/modules/daos_client
-  - id: daos-client
-    # source: $(vars.daos_client_module_source_url)
-    source: "github.com/daos-stack/google-cloud-daos//terraform/modules/daos_client?ref=v0.5.0&depth=1"
-    use: [network1, daos-server]
-    settings:
-      number_of_instances: 2
-      labels: {ghpc_role: compute}
-      os_family: $(vars.daos_client_image_family)
-      tags: $(vars.tags)