# Image build
The appliance always uses the Ansible in `ansible/site.yml`, but this can be used in two ways:

- To directly configure nodes (baremetal or VM).
- To build compute or login node images with Packer, which can subsequently be deployed to nodes.
Note that building control node images is not currently supported.
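As a rough sketch, the two approaches might be invoked as follows. The paths, Packer template and variable file names below are assumptions based on a typical appliance layout; check the repository's `packer/` directory and environment definitions:

```shell
# Direct configuration: activate the environment, then run the site playbook
# against the cluster's inventory (paths are assumptions):
. environments/<env>/activate
ansible-playbook ansible/site.yml

# Image build: run Packer, which invokes the same Ansible inside the builder
# (template and variable file names are assumptions):
cd packer
packer build -var-file=<env>.pkrvars.hcl openstack.pkr.hcl
```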
It is recommended that:

- An initial deployment is done by directly configuring nodes, probably on a smaller cluster than the final goal. This enables any bugs in config, networking, DNS etc. to be worked out. The `ansible/adhoc/hpctests.yml` playbook can be run to check that hardware/software stack performance is as expected (see the sketch after this list).
- Once happy, it is strongly recommended that images are built and deployed to the final cluster. This ensures that nodes can always be replaced, and protects against changes in upstream repos (e.g. OpenHPC releases) which would make subsequently directly-configured nodes incompatible with the existing cluster.
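For example, the test playbook can be run like any other adhoc playbook, assuming the relevant environment has been activated:

```shell
# Run the HPC performance tests against the directly-configured cluster:
ansible-playbook ansible/adhoc/hpctests.yml
```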
If using dev/staging/prod environments, consider:

- Using direct configuration in `dev`.
- Testing built images in `staging`.
- Only deploying images in `prod`.
Testing images in a `staging` environment needs to consider the cluster-specific state built into the images. This is typically:

- The address for the `slurmctld`, any filesystem server addresses and monitoring service addresses.
- The contents of the environment's `secrets.yml`, such as the Munge key.
By default, addresses are defined in the appliance config using the `inventory_hostname` for the relevant node. With working DNS it should therefore be possible to create a `staging` cluster with names matching `production` in a separate network, which can run unmodified `production` images. The only differences will then be in the definition of the Slurm partitions (assuming a different cluster size) on the Slurm control node. Alternatively, compute node images could be tested in a temporary partition of a production cluster, and login node images tested in a similar way, just by modifying the partition/login node definitions; a sketch of the temporary-partition approach follows.
By default, `yum update` is enabled in Packer builds but not for direct configuration. This ensures that running Ansible against an existing cluster does not (by default) perform updates. To perform updates during direct configuration (i.e. on a live cluster), override the variables in `environments/common/inventory/group_vars/all/update.yml` - the `ansible/adhoc/update-packages.yml` playbook can also be used to avoid running the whole of `ansible/site.yml`.
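For example, updates might be applied to a live cluster as follows. The actual variable names are defined in `environments/common/inventory/group_vars/all/update.yml`; `update_enable` here is an assumption:

```shell
# Run only the package-update playbook, overriding the default so updates
# are applied during direct configuration (variable name assumed):
ansible-playbook ansible/adhoc/update-packages.yml -e update_enable=true
```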
Alternatively, updates could be performed on a pre-built image, either by:

- Booting the image, making changes (manually or via Ansible) and snapshotting it.
- Using tools such as `virt-customize`:

```shell
$ virt-customize -a <image.qcow2> --run-command "dnf upgrade -y <pkg_name>"
```
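After modifying the image it must be made available to the cloud again. On OpenStack, for example, something like the following would upload it as a new image; the name and format values are assumptions to adjust for your cloud:

```shell
# Upload the modified image as a new Glance image (values are assumptions):
openstack image create --disk-format qcow2 --container-format bare \
  --file <image.qcow2> <new-image-name>
```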