Description

This module creates:

  • A local file: A Google Cloud Batch job template file is created. See the instructions output for the location of the file and instructions on how to submit it to Batch.
  • An instance template: This instance template defines the compute settings to be used for the Batch job such as network, machine type, image, and startup script. This instance template is automatically referenced from the Batch job template described above.

When this module is used with the batch-login-node module, the generated job template will be placed on the login node.

In some cases the job template can be submitted to the Google Cloud Batch API without modification, but for more complex workloads it is expected that the user will modify the template after running the Cluster Toolkit.
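
As a minimal sketch of the login node pairing described above, the blueprint fragment below has a login node module consume the job template module through `use`. The `batch-login` id and the `modules/scheduler/batch-login-node` source path are assumptions made by analogy with this module's own path; check the batch-login-node documentation for the exact source and settings.

```yaml
- id: batch-job
  source: modules/scheduler/batch-job-template
  use: [network1]
  settings:
    runnable: "echo 'hello world'"
  outputs: [instructions]

# Assumed id and source path for the login node module.
- id: batch-login
  source: modules/scheduler/batch-login-node
  use: [batch-job]
  outputs: [instructions]
```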

Example

- id: batch-job
  source: modules/scheduler/batch-job-template
  use: [network1]
  settings:
    runnable: "echo 'hello world'"
    machine_type: n2-standard-4
  outputs: [instructions]

See the Google Cloud Batch Example for how to use the batch-job-template module with other Cluster Toolkit modules such as filestore and startup-script.
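
The single-script example above can be extended with other inputs listed in the Inputs section. The following sketch, which assumes the same `network1` module as above, uses `runnables` instead of `runnable` to run two scripts in sequence, sets `task_count` and `task_count_per_node`, and sets `submit` so that the generated job file is submitted as part of terraform apply. The script contents are placeholders.

```yaml
- id: batch-job
  source: modules/scheduler/batch-job-template
  use: [network1]
  settings:
    # Use either runnables or runnable, not both.
    runnables:
    - script: "echo 'step 1: prepare inputs'"
    - script: "echo 'step 2: run workload'"
    task_count: 4            # number of parallel tasks
    task_count_per_node: 2   # at most two tasks per VM at a time
    submit: true             # submit during terraform apply
  outputs: [instructions]
```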

Shared VPC

This module supports using a shared VPC with a Batch job. To accomplish this, include a pre-existing-vpc module that references an existing shared VPC and then have the batch-job-template module use the pre-existing-vpc.
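
A sketch of this arrangement is shown below. The `project_id`, `network_name`, and `subnetwork_name` setting names are assumptions about the pre-existing-vpc module and the values are hypothetical placeholders; consult that module's documentation before relying on them.

```yaml
- id: network1
  source: modules/network/pre-existing-vpc
  settings:
    # Assumed setting names; values are placeholders.
    project_id: my-host-project         # project that owns the shared VPC
    network_name: my-shared-vpc
    subnetwork_name: my-shared-subnet

- id: batch-job
  source: modules/scheduler/batch-job-template
  use: [network1]
  settings:
    runnable: "echo 'hello world'"
  outputs: [instructions]
```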

Instance Templates

Many settings for a Google Cloud Batch job, such as machine_type, are set through an instance template. The batch-job-template module handles this by creating an instance template within the module and supplying it to the Google Cloud Batch job.

Alternatively, one can supply an instance template to the batch-job-template module using the instance_template setting. This supplied instance template could be generated outside of the Cluster Toolkit (via the Cloud Console UI for example) or using a separate module within the blueprint. To define an instance template within a blueprint, one can use the Cloud Foundation Toolkit instance template module as shown in the following example. This can be useful when trying to set a property not natively supported in the batch-job-template module.

Example generating instance template using Cloud Foundation Toolkit module

deployment_groups:
- group: primary
  modules:
  - id: network1
    source: modules/network/pre-existing-vpc

  - id: appfs
    source: modules/file-system/filestore
    use: [network1]

  - id: batch-startup-script
    source: modules/scripts/startup-script
    settings:
      runners: ...

  - id: batch-compute-template
    source: github.com/terraform-google-modules/terraform-google-vm//modules/instance_template?ref=v7.8.0
    use: [batch-startup-script]
    settings:
      # Boilerplate to work with Cloud Foundation Toolkit
      network: $(network1.network_self_link)
      service_account: {email: null, scopes: ["https://www.googleapis.com/auth/cloud-platform"]}
      access_config: [{nat_ip: null, network_tier: null}]
      # Batch customization
      machine_type: n2-standard-4
      metadata:
        network_storage: ((jsonencode([module.appfs.network_storage])))
      source_image_family: hpc-rocky-linux-8
      source_image_project: cloud-hpc-image-public

  - id: batch-job
    source: modules/scheduler/batch-job-template
    settings:
      instance_template: $(batch-compute-template.self_link)
    outputs: [instructions]

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.1 |
| google | >= 4.0 |
| local | >= 2.0.0 |
| null | ~> 3.0 |
| random | >= 3.0 |

Providers

| Name | Version |
|------|---------|
| google | >= 4.0 |
| local | >= 2.0.0 |
| null | ~> 3.0 |
| random | >= 3.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| instance_template | terraform-google-modules/vm/google//modules/instance_template | ~> 10.1.1 |
| netstorage_startup_script | github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script | v1.39.0 |

Resources

| Name | Type |
|------|------|
| local_file.job_template | resource |
| local_file.submit_script | resource |
| null_resource.submit_job | resource |
| random_id.submit_job_suffix | resource |
| google_compute_image.compute_image | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| allow_automatic_updates | If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates | bool | true | no |
| deployment_name | Name of the deployment, used for the job_id | string | n/a | yes |
| enable_public_ips | If set to true, instances will have public IPs | bool | true | no |
| gcloud_version | The version of the gcloud CLI being used. Used for output instructions. Valid inputs are "alpha", "beta" and "" (empty string for default version) | string | "" | no |
| image | DEPRECATED: Google Cloud Batch compute node image. Ignored if instance_template is provided. | any | null | no |
| instance_image | Google Cloud Batch compute node image. Ignored if instance_template is provided. Expected fields: name: The name of the image. Mutually exclusive with family. family: The image family to use. Mutually exclusive with name. project: The project where the image is hosted. | map(string) | { "family": "hpc-rocky-linux-8", "project": "cloud-hpc-image-public" } | no |
| instance_template | Compute VM instance template self-link to be used for Google Cloud Batch compute node. If provided, a number of other variables will be ignored, as noted by "Ignored if instance_template is provided" in their descriptions. | string | null | no |
| job_filename | The filename of the generated job template file. Will default to cloud-batch-<job_id>.json if not specified | string | null | no |
| job_id | An id for the Google Cloud Batch job. Used for output instructions and file naming. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all jobs. | string | n/a | yes |
| labels | Labels to add to the Google Cloud Batch compute nodes. Key-value pairs. Ignored if instance_template is provided. | map(string) | n/a | yes |
| log_policy | Create a block to define log policy. When set to CLOUD_LOGGING, logs will be sent to Cloud Logging. When set to PATH, path must be added to generated template. When set to DESTINATION_UNSPECIFIED, logs will not be preserved. | string | "CLOUD_LOGGING" | no |
| machine_type | Machine type to use for Google Cloud Batch compute nodes. Ignored if instance_template is provided. | string | "n2-standard-4" | no |
| mpi_mode | Sets up barriers before and after each runnable. In addition, sets permissiveSsh=true, requireHostsFile=true, and taskCountPerNode=1. taskCountPerNode can be overridden by task_count_per_node. | bool | false | no |
| native_batch_mounting | Batch can mount some fs_type natively using the 'volumes' block in the job file. If set to false, all mounting will happen through Cluster Toolkit startup scripts. | bool | true | no |
| network_storage | An array of network attached storage mounts to be configured. Ignored if instance_template is provided. | list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string, client_install_runner = map(string), mount_runner = map(string) })) | [] | no |
| on_host_maintenance | Describes maintenance behavior for the instance. If left blank, this defaults to MIGRATE, except when GPUs are used, which requires TERMINATE. | string | null | no |
| project_id | Project in which the HPC deployment will be created | string | n/a | yes |
| region | The region in which to run the Google Cloud Batch job | string | n/a | yes |
| runnable | A simplified form of var.runnables that only takes a single script. Use either runnables or runnable. | string | null | no |
| runnables | A list of shell scripts to be executed in sequence as the main workload of the Google Cloud Batch job. These will be used to populate the generated template. | list(object({ script = string })) | null | no |
| service_account | Service account to attach to the Google Cloud Batch compute node. Ignored if instance_template is provided. | object({ email = string, scopes = set(string) }) | { "email": null, "scopes": [ "https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring.write", "https://www.googleapis.com/auth/servicecontrol", "https://www.googleapis.com/auth/service.management.readonly", "https://www.googleapis.com/auth/trace.append" ] } | no |
| startup_script | Startup script run before Google Cloud Batch job starts. Ignored if instance_template is provided. | string | null | no |
| submit | When set to true, the generated job file will be submitted automatically to Google Cloud as part of terraform apply. | bool | false | no |
| subnetwork | The subnetwork that the Batch job should run on. Defaults to 'default' subnet. Ignored if instance_template is provided. | any | null | no |
| task_count | Number of parallel tasks | number | 1 | no |
| task_count_per_node | Max number of tasks that can be run on a VM at the same time. If not specified, Batch will decide a value. | number | null | no |
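
To illustrate how several of these inputs combine, the sketch below sets `mpi_mode`, `task_count`, `instance_image`, and `log_policy` on a single job. It assumes a `network1` module defined elsewhere in the blueprint, and the `hostname` runnable is only a placeholder workload. The image values repeat the documented defaults to show the expected `family` and `project` fields.

```yaml
- id: batch-job
  source: modules/scheduler/batch-job-template
  use: [network1]
  settings:
    runnable: "hostname"     # placeholder workload
    mpi_mode: true           # adds barriers around each runnable; one task per node unless overridden
    task_count: 2
    instance_image:
      family: hpc-rocky-linux-8       # mutually exclusive with name
      project: cloud-hpc-image-public
    log_policy: CLOUD_LOGGING
  outputs: [instructions]
```

Because `instance_image` is ignored when `instance_template` is provided, an image set this way only takes effect when the module creates its own instance template.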

Outputs

| Name | Description |
|------|-------------|
| gcloud_version | The version of gcloud to be used. |
| instance_template | Instance template used by the Batch job. |
| instructions | Instructions for submitting the Batch job. |
| job_data | All data associated with the defined job, typically provided as input to the batch-login-node module. |
| network_storage | An array of network attached storage mounts used by the Batch job. |
| startup_script | Startup script run before Google Cloud Batch job starts. |