Version 0.2.0-alpha #25

Merged 51 commits on Nov 9, 2021
Commits
a7c0df4
Add suite of golang precommits
heyealex Oct 14, 2021
838a66a
Adding additional linters to precommit checks
heyealex Oct 14, 2021
9b618e2
Improve error messages
heyealex Oct 19, 2021
0504c1a
Merge pull request #10 from heyealex/update-precommits
heyealex Oct 19, 2021
2374d20
Conform to Terraform best practices
tpdownes Oct 18, 2021
5371f4c
Conform to snake_case Terraform naming convention
tpdownes Oct 18, 2021
f43fc60
Merge pull request #9 from tpdownes/provider_best_practices
tpdownes Oct 20, 2021
949d115
Improved file system support.
cboneti Oct 20, 2021
f600ed9
Merge remote-tracking branch 'upstream/develop' into cboneti/DDN-lustre
cboneti Oct 20, 2021
20130cc
Bring reswriter test coverage up past 80%
heyealex Oct 19, 2021
3fee5b4
project_id to project in terraform google provider
heyealex Oct 20, 2021
543f1d5
Update resources/third-party/file-system/DDN-EXAScaler/README.md
cboneti Oct 20, 2021
64e1f84
Merge pull request #12 from heyealex/reswriter-tests
heyealex Oct 20, 2021
1b4ad79
Merge remote-tracking branch 'upstream/develop' into cboneti/DDN-lustre
cboneti Oct 20, 2021
822e9c3
Fixing variables in the example file.
cboneti Oct 20, 2021
0e43c69
Merge branch 'cboneti/DDN-lustre' of github.com:cboneti/hpc-toolkit i…
cboneti Oct 20, 2021
0ea1b4b
Merge pull request #11 from cboneti/cboneti/DDN-lustre
cboneti Oct 20, 2021
e37bb69
Bring resreader test coverage above 80%
heyealex Oct 20, 2021
9333dc3
Merge pull request #13 from heyealex/resreader-tests
heyealex Oct 21, 2021
cab949c
Bring config package unit tests past 80% coverage
heyealex Oct 25, 2021
db58fed
Address reviewer feedback
heyealex Oct 25, 2021
5484dc1
Merge pull request #14 from heyealex/config-test-coverage
heyealex Oct 25, 2021
373bda8
Add descriptions to resource READMEs
heyealex Oct 25, 2021
a943914
Update resources README, address reviewer feedback
heyealex Oct 26, 2021
e6c2eef
Merge pull request #15 from heyealex/resource-readmes
heyealex Oct 26, 2021
516717b
Update examples README
heyealex Oct 27, 2021
c0d55ee
Add PR template
heyealex Oct 27, 2021
491749b
Merge pull request #16 from heyealex/resource-readmes
heyealex Oct 27, 2021
a772903
Merge pull request #17 from heyealex/pr-checklist
heyealex Oct 27, 2021
1167593
Improved makefile, enforce >= 80% eng coverage
cboneti Oct 27, 2021
d5e202d
enforce_coverage no longer fails at the first pkg.
cboneti Oct 28, 2021
8cb9078
Fixing a few issues with our documentation.
cboneti Oct 28, 2021
d0ec2bb
Workaround for exascaler to work on most networks.
cboneti Oct 28, 2021
fa149ac
Merge pull request #19 from cboneti/cboneti/improved-readmes
cboneti Oct 28, 2021
4eb91d5
Perl directly called in Makefile/path enforce_coverage
cboneti Oct 28, 2021
5d684c4
Merge pull request #20 from cboneti/cboneti/DDN-lustre
cboneti Oct 28, 2021
88fce1f
Merge pull request #18 from cboneti/enforce-coverage
cboneti Oct 28, 2021
770b891
Improved EXAScaler
cboneti Nov 2, 2021
df08a50
Merge pull request #22 from cboneti/cboneti/DDN-lustre
cboneti Nov 2, 2021
b2554a1
Add ability to manage Terraform backends
tpdownes Nov 1, 2021
856c6a6
Merge pull request #21 from tpdownes/remote_backend
tpdownes Nov 4, 2021
86c60c1
Update TFWriter to write files w/hclwrite package
heyealex Nov 3, 2021
4c19d65
Simplify ResWriter interface
heyealex Nov 4, 2021
aceb5c2
Move versions and license to separate files
heyealex Nov 4, 2021
63b1ff8
Move versions and license to separate files
heyealex Nov 4, 2021
e478c55
Incorporate TF backend changes in new reswriter
heyealex Nov 4, 2021
57c2265
Merge pull request #23 from heyealex/hclwrite-resource-writer
heyealex Nov 4, 2021
3be84c5
Updating version number for minor release
heyealex Nov 8, 2021
37b8c2f
Merge pull request #24 from heyealex/minor-version-update
heyealex Nov 8, 2021
56e62dd
General cleanup
heyealex Nov 8, 2021
ab2e91f
Merge pull request #26 from heyealex/minor-version-update
heyealex Nov 9, 2021
13 changes: 13 additions & 0 deletions .pre-commit-config.yaml
@@ -32,3 +32,16 @@ repos:
exclude: \.terraform\/.*$
pass_filenames: true
require_serial: true
- repo: git://github.com/dnephin/pre-commit-golang
rev: v0.4.0
hooks:
- id: go-fmt
- id: go-vet
- id: go-lint
- id: go-imports
- id: go-cyclo
args: [-over=15]
- id: go-critic
- id: go-unit-tests
- id: go-build
- id: go-mod-tidy
13 changes: 11 additions & 2 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: tests fmt vet test-engine test-resources test-examples packer packer-clean packer-check packer-docs
.PHONY: tests fmt vet test-engine test-resources test-examples packer packer-clean packer-check packer-docs add-google-license
RES = ./resources
ENG = ./cmd/... ./pkg/...
SRC = $(ENG) $(RES)/tests/...
@@ -20,7 +20,8 @@ vet:

test-engine:
$(info **************** running ghpc unit tests **************)
go test -cover $(ENG)
go test -cover $(ENG) 2>&1 | perl tools/enforce_coverage.pl


test-resources:
$(info **************** running resources unit tests *********)
@@ -58,3 +59,11 @@ packer-docs:
terraform-docs json $${folder} --config .tfdocs-json.yaml;\
done
endif

ifeq (, $(shell which addlicense))
add-google-license:
$(error "could not find addlicense in PATH, run: go install github.com/google/addlicense@latest")
else
add-google-license:
addlicense -c "Google LLC" -l apache .
endif
6 changes: 3 additions & 3 deletions README.md
@@ -23,7 +23,7 @@ Simply run `make` in the root directory.
## Basic Usage
To create a blueprint, an input YAML file needs to be written or adapted from
the examples under `examples`. A good starting point is
`examples/hpc-cluster-slurm.yaml` which creates a blueprint for a new network,
`examples/hpc-cluster-small.yaml` which creates a blueprint for a new network,
a filestore instance and a slurm login node and controller.
More information on the example configs can be found in the README.md of the
`examples` directory.
@@ -32,15 +32,15 @@ In order to create a blueprint using `ghpc`, first ensure you've updated your
config template to include your GCP project ID then run the following command:

```
ghpc create --config examples/hpc-cluster-slurm.yaml
./ghpc create --config examples/hpc-cluster-small.yaml
```

The blueprint directory, named after the `blueprint_name` field of the input
config, will be created in the same directory as ghpc.

To deploy the blueprint, use terraform in the resource group directory:
```
cd hpc-slurm/primary # From hpc-cluster-slurm.yaml example
cd hpc-cluster-small/primary # From hpc-cluster-small.yaml example
terraform init
terraform apply
```
5 changes: 4 additions & 1 deletion cmd/create.go
@@ -20,14 +20,17 @@ package cmd
import (
"hpc-toolkit/pkg/config"
"hpc-toolkit/pkg/reswriter"
"log"

"github.com/spf13/cobra"
)

func init() {
createCmd.Flags().StringVarP(&yamlFilename, "config", "c", "",
"Configuration file for the new blueprints")
createCmd.MarkFlagRequired("config")
if err := createCmd.MarkFlagRequired("config"); err != nil {
log.Fatalf("error while marking 'config' flag as required: %v", err)
}
rootCmd.AddCommand(createCmd)
}

6 changes: 5 additions & 1 deletion cmd/expand.go
@@ -17,14 +17,18 @@ package cmd

import (
"hpc-toolkit/pkg/config"
"log"

"github.com/spf13/cobra"
)

func init() {
expandCmd.Flags().StringVarP(&yamlFilename, "config", "c", "",
"Configuration file for the new blueprints")
expandCmd.MarkFlagRequired("config")
err := expandCmd.MarkFlagRequired("config")
if err != nil {
log.Fatalf("Error in init for expand command: %v", err)
}
expandCmd.Flags().StringVarP(&outputFilename, "out", "o", "expanded.yaml",
"Output file for the expanded yaml.")
rootCmd.AddCommand(expandCmd)
2 changes: 1 addition & 1 deletion cmd/root.go
@@ -34,7 +34,7 @@ HPC deployments on the Google Cloud Platform.`,
log.Fatalf("cmd.Help function failed: %s", err)
}
},
Version: "v0.1.1-alpha (private preview)",
Version: "v0.2.0-alpha (private preview)",
}
)

18 changes: 15 additions & 3 deletions examples/README.md
@@ -12,10 +12,22 @@ Please note that global variables defined under `vars` are automatically
passed to resources if the resources have an input that matches the variable name.

## Config Descriptions
**hpc-cluster-slurm.yaml**: Creates a basic auto-scaling SLURM cluster with a
single SLURM partition and default settings. The blueprint also creates a new VPC
network, and a filestore instance mounted to `/home`.
**hpc-cluster-small.yaml**: Creates a basic auto-scaling SLURM cluster with a
single SLURM partition and mostly default settings. The blueprint also creates a
new VPC network, and a filestore instance mounted to `/home`.

**hpc-cluster-high-io.yaml**: Creates a slurm cluster with tiered file systems
for higher performance. It connects to the default VPC of the project and
creates two partitions and a login node.

File systems:
* The homefs is mounted at `/home` and is a default "PREMIUM" tier filestore
with 2.5TiB of capacity.
* The projectsfs is mounted at `/projects` and is a high scale SSD filestore
instance with 10TiB of capacity.
* The scratchfs is mounted at `/scratch` and is a [DDN EXAScaler Lustre](../resources/third-party/file-system/DDN-EXAScaler/README.md) file
system designed for high IO performance. The capacity is ~10TiB.

### Experimental
**omnia-cluster-simple.yaml**: Creates a simple omnia cluster, with an omnia-manager node and 8 omnia-compute nodes, on the pre-existing default network. Omnia will be automatically installed after the nodes are provisioned. All nodes mount a filestore instance on `/home`.

## Config Schema
96 changes: 96 additions & 0 deletions examples/hpc-cluster-high-io.yaml
@@ -0,0 +1,96 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

blueprint_name: hpc-cluster-high-io

vars:
project_id: ## Set GCP Project ID Here ##
deployment_name: hpc-slurm-io
region: us-central1
zone: us-central1-a

resource_groups:
- group: primary
resources:
- source: ./resources/network/pre-existing-vpc
kind: terraform
id: network1

- source: ./resources/file-system/filestore
kind: terraform
id: homefs
settings:
local_mount: /home
network_name: $(network1.network_name)

- source: ./resources/file-system/filestore
kind: terraform
id: projectsfs
settings:
filestore_tier: HIGH_SCALE_SSD
size_gb: 10240
local_mount: /projects
network_name: $(network1.network_name)

- source: ./resources/third-party/file-system/DDN-EXAScaler
kind: terraform
id: scratchfs
settings:
local_mount: /scratch
network_name: $(network1.network_name)
subnetwork_name: ((module.network1.primary_subnetwork.name))
subnetwork_address: ((module.network1.primary_subnetwork.ip_cidr_range))

- source: ./resources/third-party/compute/SchedMD-slurm-on-gcp-partition
kind: terraform
id: compute_partition
settings:
max_node_count: 200
partition_name: compute
subnetwork_name: ((module.network1.primary_subnetwork.name))
network_storage:
- $(homefs.network_storage)
- $(scratchfs.network_storage)
- $(projectsfs.network_storage)

- source: ./resources/third-party/scheduler/SchedMD-slurm-on-gcp-controller
kind: terraform
id: slurm_controller
settings:
subnetwork_name: ((module.network1.primary_subnetwork.name))
network_storage:
- $(homefs.network_storage)
- $(scratchfs.network_storage)
- $(projectsfs.network_storage)
login_network_storage:
- $(homefs.network_storage)
- $(scratchfs.network_storage)
- $(projectsfs.network_storage)
partitions:
- $(compute_partition.partition)

- source: ./resources/third-party/scheduler/SchedMD-slurm-on-gcp-login-node
kind: terraform
id: slurm_login
settings:
subnetwork_name: ((module.network1.primary_subnetwork.name))
network_storage:
- $(homefs.network_storage)
- $(scratchfs.network_storage)
- $(projectsfs.network_storage)
login_network_storage:
- $(homefs.network_storage)
- $(scratchfs.network_storage)
- $(projectsfs.network_storage)
controller_name: $(slurm_controller.controller_node_name)
Original file line number Diff line number Diff line change
@@ -12,11 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.

blueprint_name: hpc-slurm
blueprint_name: hpc-cluster-small

vars:
project_id: ## Set GCP Project ID Here ##
deployment_name: hpc-slurm
deployment_name: hpc-slurm-small
region: europe-west4
zone: europe-west4-a

@@ -36,7 +36,7 @@ resource_groups:

- source: ./resources/third-party/compute/SchedMD-slurm-on-gcp-partition
kind: terraform
id: compute-partition
id: compute_partition
settings:
partition_name: compute
max_node_count: 20
@@ -46,7 +46,7 @@ resource_groups:

- source: ./resources/third-party/scheduler/SchedMD-slurm-on-gcp-controller
kind: terraform
id: slurm-controller
id: slurm_controller
settings:
subnetwork_name: ((module.network1.primary_subnetwork.name))
login_node_count: 1
@@ -55,15 +55,15 @@ resource_groups:
login_network_storage:
- $(homefs.network_storage)
partitions:
- $(compute-partition.partition)
- $(compute_partition.partition)

- source: ./resources/third-party/scheduler/SchedMD-slurm-on-gcp-login-node
kind: terraform
id: slurm-login
id: slurm_login
settings:
subnetwork_name: ((module.network1.primary_subnetwork.name))
network_storage:
- $(homefs.network_storage)
login_network_storage:
- $(homefs.network_storage)
controller_name: $(slurm-controller.controller_node_name)
controller_name: $(slurm_controller.controller_node_name)
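The example configs link resources through `$(id.output)` references, and this PR renames the resource ids from hyphenated to snake_case. A minimal Go sketch of how such a reference could be split into its id and output name follows. The regular expression, the `reference` type and `parseReference` are assumptions made for illustration; the diff does not show how ghpc itself parses these values, and it may well accept forms that this sketch rejects.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches a whole value of the form $(id.output), where both
// parts are snake_case identifiers. This strictness is an assumption of
// the sketch, not a documented rule of the toolkit.
var refPattern = regexp.MustCompile(`^\$\(([a-z][a-z0-9_]*)\.([a-z][a-z0-9_]*)\)$`)

// reference holds the two halves of a $(id.output) value.
type reference struct {
	ID     string
	Output string
}

// parseReference reports whether value is a snake_case $(id.output)
// reference and, if so, returns its parts.
func parseReference(value string) (reference, bool) {
	m := refPattern.FindStringSubmatch(value)
	if m == nil {
		return reference{}, false
	}
	return reference{ID: m[1], Output: m[2]}, true
}

func main() {
	values := []string{
		"$(compute_partition.partition)",              // new snake_case id
		"$(compute-partition.partition)",              // old hyphenated id
		"((module.network1.primary_subnetwork.name))", // different syntax, left untouched
	}
	for _, v := range values {
		if ref, ok := parseReference(v); ok {
			fmt.Printf("%s -> id=%s output=%s\n", v, ref.ID, ref.Output)
		} else {
			fmt.Printf("%s -> not a snake_case $(id.output) reference\n", v)
		}
	}
}
```

A check like this could run over every `id` and reference in a config before anything is written, so that a stray hyphenated id is reported early rather than surfacing later as an invalid Terraform name.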
11 changes: 7 additions & 4 deletions examples/omnia-cluster-simple.yaml
@@ -12,6 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# WARNING: This example and the omnia-install resource are still under development
# and experimental! This example is not yet fully supported.

blueprint_name: omnia-cluster

vars:
@@ -50,7 +53,7 @@ resource_groups:

- source: ./resources/compute/simple-instance
kind: terraform
id: omnia-manager
id: omnia_manager
settings:
name_prefix: omnia-manager
network_self_link: $(network1.network_self_link)
@@ -65,7 +68,7 @@ resource_groups:

- source: ./resources/compute/simple-instance
kind: terraform
id: omnia-compute
id: omnia_compute
settings:
instance_count: 8
name_prefix: omnia-compute
@@ -84,5 +87,5 @@ resource_groups:
id: omnia
settings:
depends:
- $(omnia-compute.name)
manager_node: ((module.omnia-manager.name[0]))
- $(omnia_compute.name)
manager_node: ((module.omnia_manager.name[0]))
5 changes: 4 additions & 1 deletion ghpc.go
@@ -17,8 +17,11 @@ package main

import (
"hpc-toolkit/cmd"
"os"
)

func main() {
cmd.Execute()
if err := cmd.Execute(); err != nil {
os.Exit(1)
}
}
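The change to `main` above makes the process exit with status 1 when `cmd.Execute()` returns an error. The same pattern can be sketched with the standard library alone. Here `execute` is a hypothetical stand-in for `cmd.Execute`, and `run` is a helper introduced only for this example; neither exists in the repository under these names.

```go
package main

import (
	"fmt"
	"os"
)

// execute is a hypothetical stand-in for cmd.Execute: it returns an error
// to its caller instead of terminating the process itself.
func execute(args []string) error {
	if len(args) == 0 {
		return nil // nothing to do; a real CLI might print help here
	}
	switch args[0] {
	case "create", "expand":
		return nil
	default:
		return fmt.Errorf("unknown command %q", args[0])
	}
}

// run converts the result of execute into a process exit status.
func run(args []string) int {
	if err := execute(args); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return 1
	}
	return 0
}

func main() {
	// os.Exit is called in exactly one place, at the top level.
	os.Exit(run(os.Args[1:]))
}
```

Keeping `os.Exit` in `main` only means lower layers report failures as ordinary errors, which makes them easier to unit test and lets shell scripts and CI detect a failed invocation from the exit status.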
6 changes: 5 additions & 1 deletion go.mod
@@ -4,10 +4,14 @@ go 1.16

require (
github.com/gruntwork-io/terratest v0.38.0
github.com/hashicorp/hcl/v2 v2.10.1 // indirect
github.com/hashicorp/hcl/v2 v2.10.1
github.com/hashicorp/terraform-config-inspect v0.0.0-20210625153042-09f34846faab
github.com/imdario/mergo v0.3.12 // indirect
github.com/otiai10/copy v1.6.0
github.com/spf13/cobra v1.2.1
github.com/zclconf/go-cty v1.9.1
golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d // indirect
golang.org/x/sys v0.0.0-20211013075003-97ac67df715c // indirect
gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f
gopkg.in/yaml.v2 v2.4.0
)