Refactor Python Integration Test Script #3331

Merged
@@ -0,0 +1,58 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
blueprint_name: slurm-test

vars:
  project_id: ## Set GCP Project ID Here ##
  deployment_name: slurm-test
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/pre-existing-vpc

  - id: nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use: [network]
    settings:
      bandwidth_tier: gvnic_enabled
      machine_type: c2-standard-4
      node_count_dynamic_max: 3
      allow_automatic_updates: false

  - id: partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [nodeset]
    settings:
      is_default: true
      partition_name: compute

  - id: slurm_login
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
    use: [network]
    settings:
      machine_type: n1-standard-4
      enable_login_public_ips: true

  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, slurm_login, partition]
    settings:
      machine_type: n1-standard-4
      enable_controller_public_ips: true
58 changes: 58 additions & 0 deletions tools/python-integration-tests/blueprints/slurm-simple.yaml
@@ -0,0 +1,58 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
blueprint_name: slurm-test

vars:
  project_id: ## Set GCP Project ID Here ##
  deployment_name: slurm-test
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/pre-existing-vpc

  - id: nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use: [network]
    settings:
      bandwidth_tier: gvnic_enabled
      machine_type: c2-standard-4
      node_count_dynamic_max: 5
      allow_automatic_updates: false

  - id: partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [nodeset]
    settings:
      is_default: true
      partition_name: compute

  - id: slurm_login
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
    use: [network]
    settings:
      machine_type: n1-standard-4
      enable_login_public_ips: true

  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, slurm_login, partition]
    settings:
      machine_type: n1-standard-4
      enable_controller_public_ips: true
61 changes: 61 additions & 0 deletions tools/python-integration-tests/blueprints/topology-test.yaml
@@ -0,0 +1,61 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
blueprint_name: topology-test

vars:
  project_id: ## Set GCP Project ID Here ##
  deployment_name: topology-test
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/pre-existing-vpc

  - id: nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
    use: [network]
    settings:
      bandwidth_tier: gvnic_enabled
      machine_type: n2-standard-4
      node_count_dynamic_max: 0
      node_count_static: 5
      allow_automatic_updates: false
      enable_placement: true

  - id: partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [nodeset]
    settings:
      is_default: true
      partition_name: topo
      exclusive: false

  - id: slurm_login
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
    use: [network]
    settings:
      machine_type: n1-standard-4
      enable_login_public_ips: true

  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, slurm_login, partition]
    settings:
      machine_type: n1-standard-4
      enable_controller_public_ips: true
129 changes: 129 additions & 0 deletions tools/python-integration-tests/deployment.py
@@ -0,0 +1,129 @@
# Copyright 2024 "Google LLC"
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import shutil
import os
import subprocess
import yaml

class Deployment:
    def __init__(self, blueprint: str):
        self.blueprint_yaml = blueprint
        self.state_bucket = "daily-tests-tf-state"
        self.project_id = None
        self.workspace = None
        self.instance_name = None
        self.username = None
        self.deployment_name = None
        self.zone = None

    def run_command(self, cmd: str, err_msg: str = None) -> subprocess.CompletedProcess:
        # Run a shell command and capture stdout/stderr as text.
        # check=True raises CalledProcessError on a non-zero exit status.
        # Note: err_msg is accepted but not used here.
        res = subprocess.run(cmd, shell=True, universal_newlines=True, check=True,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return res

    def parse_blueprint(self, file_path: str):
        # Read deployment_name and zone from the blueprint's vars block
        with open(file_path, 'r') as file:
            content = yaml.safe_load(file)
            self.deployment_name = content["vars"]["deployment_name"]
            self.zone = content["vars"]["zone"]

    def get_posixAccount_info(self):
        # Extract the project ID and username from posixAccounts
        result = self.run_command("gcloud compute os-login describe-profile --format=json").stdout
        posixAccounts = json.loads(result)

        for account in posixAccounts.get('posixAccounts', []):
            if 'accountId' in account:
                self.project_id = account['accountId']
                self.username = account['username']

    def set_deployment_variables(self):
        self.workspace = os.path.abspath(os.getcwd().strip())
        self.parse_blueprint(self.blueprint_yaml)
        self.get_posixAccount_info()
        # Login node name: deployment name without dashes, cut to 10 characters, plus a fixed suffix
        self.instance_name = self.deployment_name.replace("-", "")[:10] + "-slurm-login-001"

    def create_blueprint(self):
        cmd = [
            "./gcluster",
            "create",
            "-l",
            "ERROR",
            self.blueprint_yaml,
            "--backend-config",
            f"bucket={self.state_bucket}",
            "--vars",
            f"project_id={self.project_id}",
            "--vars",
            f"deployment_name={self.deployment_name}",
            "-w"
        ]

        subprocess.run(cmd, check=True, cwd=self.workspace)

    def compress_blueprint(self):
        cmd = [
            "tar",
            "-czf",
            "%s.tgz" % (self.deployment_name),
            "%s" % (self.deployment_name),
        ]

        subprocess.run(cmd, check=True, cwd=self.workspace)

    def upload_deployment(self):
        cmd = [
            "gsutil",
            "cp",
            "%s.tgz" % (self.deployment_name),
            "gs://%s/%s/" % (self.state_bucket, self.deployment_name)
        ]

        subprocess.run(cmd, check=True, cwd=self.workspace)

    def print_download_command(self):
        print("gcloud storage cp gs://%s/%s/%s.tgz ." % (self.state_bucket, self.deployment_name, self.deployment_name))

    def create_deployment_directory(self):
        self.set_deployment_variables()
        self.create_blueprint()
        self.compress_blueprint()
        self.upload_deployment()
        self.print_download_command()

    def deploy(self):
        # Create deployment directory
        self.create_deployment_directory()
        cmd = [
            "./gcluster",
            "deploy",
            self.deployment_name,
            "--auto-approve"
        ]

        subprocess.run(cmd, check=True, cwd=self.workspace)

    def destroy(self):
        cmd = [
            "./gcluster",
            "destroy",
            self.deployment_name,
            "--auto-approve"
        ]

        subprocess.run(cmd, check=True, cwd=self.workspace)
        os.remove(f"{self.deployment_name}.tgz")
        shutil.rmtree(self.deployment_name)
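
Review note on `run_command` in deployment.py: the `err_msg` parameter is declared but never used, so a failing command surfaces only as a bare `CalledProcessError`. The sketch below is one possible way to put it to use, not the code in this PR. It is a standalone function rather than a method, it takes an argument list instead of a shell string, and the choice of `RuntimeError` is an assumption made for illustration.

```python
import subprocess
import sys
from typing import List, Optional


def run_command(cmd: List[str], err_msg: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run cmd and capture its output as text.

    Hypothetical variant: on a non-zero exit status, raise RuntimeError
    carrying err_msg (or a default prefix) followed by the captured stderr.
    """
    try:
        return subprocess.run(cmd, check=True, universal_newlines=True,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        prefix = err_msg or f"Command failed with exit code {e.returncode}"
        raise RuntimeError(f"{prefix}: {e.stderr.strip()}") from e
```

A call such as `run_command([sys.executable, "-c", "print('hello')"], err_msg="probe failed")` returns the completed process on success. Chaining with `from e` keeps the original exception available for debugging.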
47 changes: 47 additions & 0 deletions tools/python-integration-tests/slurm_reconfig_size.py
@@ -0,0 +1,47 @@
# Copyright 2024 "Google LLC"
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ssh import SSHManager
from deployment import Deployment
from test import SlurmTest
import unittest
import time

class SlurmReconfigureSize(SlurmTest):
    # Class to test simple reconfiguration
    def __init__(self, deployment, reconfig_blueprint):
        super().__init__(deployment)
        self.reconfig_blueprint = reconfig_blueprint

    def runTest(self):
        hostname = self.get_login_node()
        self.ssh(hostname)
        self.check_node_size_reconfig()

    def check_node_size_reconfig(self):
        # Check 5 nodes are available
        self.assert_equal(len(self.get_nodes()), 5)

        self.deployment = Deployment(self.reconfig_blueprint)
        self.deployment.deploy()

        print("Wait 90 seconds for reconfig")
        time.sleep(90)

        # Check 3 nodes are available
        self.assert_equal(len(self.get_nodes()), 3)

if __name__ == "__main__":
    deployment = Deployment("tools/python-integration-tests/blueprints/slurm-simple.yaml")
    unittest.TextTestRunner().run(SlurmReconfigureSize(deployment, "tools/python-integration-tests/blueprints/slurm-simple-reconfig.yaml"))
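
Review note on `check_node_size_reconfig`: the fixed `time.sleep(90)` assumes the reconfiguration always finishes within 90 seconds and always waits the full 90 seconds even when it finishes sooner. A polling loop is a possible alternative. The helper below is a hypothetical sketch and is not part of this PR; the name `wait_until` and its default timeout and interval are assumptions.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 90.0, interval: float = 5.0) -> bool:
    """Poll predicate until it returns True or timeout seconds have elapsed.

    Returns True as soon as the predicate holds, False if the deadline passes.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the test above, this could replace the sleep with something like `wait_until(lambda: len(self.get_nodes()) == 3)`, followed by the existing assertion, assuming `get_nodes()` is safe to call repeatedly.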