Not sure how I missed this issue. I suggest using Markdown code formatting when pasting console logs.
Coming back to the root cause, the main line that shows the reason for recreating the NFS disk (module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]) is this one from the plan:
~ volume_type = " base template" -> "63272fa4-2a99-4a94-ab1e-2a12fb64b1f8" # forces replacement
It seems the Terraform provider for OpenStack returns the storage template ID when it queries the service, so Terraform detects a change in the template, as the plan output shows. We have not used this feature recently, but it seems something has changed so that only the ID works.
As a workaround, please set the variable volume_storage_template to the value "63272fa4-2a99-4a94-ab1e-2a12fb64b1f8" and run apply again. Terraform should then no longer detect a change that forces replacement of the volume.
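One way to confirm the workaround before answering yes is to filter the plan for the lines Terraform marks as forcing replacement. This is a minimal sketch, assuming a POSIX shell on the machine that runs terraform. forced_replacements is a hypothetical helper name, not part of Terraform, and the var.tfvars file name is taken from the command in this issue.

```shell
# Sketch only: forced_replacements is a hypothetical helper, not a Terraform command.
# It reads plan text on stdin and prints only the lines that Terraform
# annotated with "# forces replacement".
forced_replacements() {
  grep -F '# forces replacement'
}

# Assumed usage, passing the template ID from the workaround on the command line:
#   terraform plan -no-color -var-file var.tfvars \
#     -var 'volume_storage_template=63272fa4-2a99-4a94-ab1e-2a12fb64b1f8' \
#     | forced_replacements
```

When removing the bootstrap node, the triggers line for module.helpernode.null_resource.config is still expected in the filtered output, since bootstrap_count changes from "1" to "0". What should no longer appear is the volume_type line for the NFS storage volume. Setting volume_storage_template in var.tfvars instead of using -var should have the same effect.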
I've tried this three times and it fails each time. Initially I installed a cluster and ran a workload on it, and it was OK. I only saw issues once I tried to remove the bootstrap node, when I started seeing problems accessing PVCs. The bootstrap removal hung (the node was removed from PowerVC, but the run got stuck later on). There was also an odd problem with NFS where I/Os were hung, with no obvious issue on the physical storage. Then I tried removing the bootstrap node immediately after recreating the cluster. This also resulted in issues: in one case the NFS filesystem wasn't mounted, and in another the terraform execution was again stuck after removing the bootstrap node from PowerVC (although the NFS mount was OK that time).
Details from the last attempt are below. We are stuck in the Gathering Facts task of the Ansible ocp4-helpernode playbook.
Output from terraform:
$ terraform apply -var-file var.tfvars[0]: Reading... Reading...[0]: Reading... Read complete after 0s [id=1
ec8928da9e89f9b35deb26dd484665fda91d99d73e31330dce71edf3a4e19cc][0]: Read complete after 0s [id=
7551bfa9e87523c711bf18607b8af5ccfee1657ea6c4817bbc3dd2186602f590][0]: Read complete after 0s [id=
28b9dcc333049039879c9c1e94f95816f0341945047e8ae59674e1233f72be83] Reading... Reading... Reading...
module.bastion.openstack_compute_keypair_v2.key-pair[0]: Refreshing state... [id
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Refreshing st
ate... [id=317b2360-639d-4d8a-8b34-a58f1bb19ee9] Reading... Reading... Read complete after
2s [id=f5e55ae3-c790-4a29-91e1-ce04a1acfc69] Reading... Read complete after
2s [id=1e5b0eed-6681-4305-8bc9-e20afb9f7cca] Read complete a
fter 2s [id=874b188b-074a-4042-b0c8-3a22f04f8302] Read complete after
2s [id=d364331a-9f24-4784-bced-3765e0c097ed] Read complete after 2s
[id=874b188b-074a-4042-b0c8-3a22f04f8302] Read complete after 0
s [id=63011d28-987a-4ae1-a094-595f2e513a23][0]: Refreshing state...
[id=91a5c711-f109-4ec0-91e7-86cd821233cc][0]: Refreshing state.
.. [id=7072aba5-ac95-4b36-994a-1855f2624b55]
module.bastion.openstack_compute_instance_v2.bastion[0]: Refreshing state... [id
=f874bcaf-e8d5-46f6-8088-652ee3b9930a][0]: Refreshing state...
[id=6d0e1c9a-aa11-48c3-80cd-e22c2cbe8abe][0]: Refreshing state...
module.bastion.null_resource.bastion_init[0]: Refreshing state... [id=5535521664
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Refreshin
g state... [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f
module.bastion.null_resource.bastion_register[0]: Refreshing state... [id=390765
module.bastion.null_resource.enable_repos[0]: Refreshing state... [id=8446503032
module.bastion.null_resource.bastion_packages[0]: Refreshing state... [id=756303
module.bastion.null_resource.setup_nfs_disk[0]: Refreshing state... [id=57837008
01307001475][0]: Reading... Reading...[0]: Read complete after 0s [id=85
d98bf1d766507417ab5b578be1abe6f3e6c0a80e57a931862b80f5ff8b4153][0]: Reading...
module.helpernode.null_resource.config: Refreshing state... [id=3876494058890088
587][0]: Read complete after 0s [id=7a
035ac3f88d415956417f73f6ecd986a9d339cdbbea088f5332e0cd8a46de94] Read complete after 0s [id=
module.installconfig.null_resource.pre_install[0]: Refreshing state... [id=14174
module.installconfig.null_resource.install_config: Refreshing state... [id=46832
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Refreshing stat
e... [id=2c289dad-9552-4037-8105-f798406ff623]
module.bootstrapconfig.null_resource.bootstrap_config: Refreshing state... [id=6
module.masternodes.openstack_compute_instance_v2.master[0]: Refreshing state...
module.bootstrapcomplete.null_resource.bootstrap_complete: Refreshing state... [
module.workernodes.openstack_compute_instance_v2.worker[0]: Refreshing state...
module.workernodes.null_resource.remove_worker[0]: Refreshing state... [id=17321
module.install.null_resource.install: Refreshing state... [id=287553626659034692
module.install.null_resource.upgrade[0]: Refreshing state... [id=871900455648840
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0] must be re
-/+ resource "openstack_blockstorage_volume_v3" "storage_volume" {
~ attachment = [
- {
- device = "/dev/sdb"
- id = "317b2360-639d-4d8a-8b34-a58f1bb19ee9"
- instance_id = "f874bcaf-e8d5-46f6-8088-652ee3b9930a"
] -> (known after apply)
~ availability_zone = "nova" -> (known after apply)
~ id = "317b2360-639d-4d8a-8b34-a58f1bb19ee9" -> (known after apply)
~ metadata = {
- "attached_mode" = "rw"
- "volume_wwn" = "60050768028105F5D0000000000002D4"
} -> (known after apply)
name = "merlin2-nfs-storage-vol"
+ region = (known after apply)
~ volume_type = " base template" -> "63272fa4-2a99-4a94-ab1e-2a12fb64b1f8" # forces replacement
# (1 unchanged attribute hidden)
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0] must be replaced
-/+ resource "openstack_compute_volume_attach_v2" "storage_v_attach" {
~ device = "/dev/sdb" -> (known after apply)
~ id = "f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f1bb19ee9" -> (known after apply)
+ region = (known after apply)
~ volume_id = "317b2360-639d-4d8a-8b34-a58f1bb19ee9" # forces replacement -> (known after apply)
# (1 unchanged attribute hidden)
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0] will be dest
(because index [0] is out of range for count)
access_ip_v4 = "" -> null
all_metadata = {
} -> null
all_tags = [] -> null
availability_zone = "Default Group" -> null
created = "2023-05-01 21:09:11 +0000 UTC" -> null
flavor_id = "874b188b-074a-4042-b0c8-3a22f04f8302" -> null
flavor_name = "bastion_bootstrap" -> null
force_delete = false -> null
id = "2c289dad-9552-4037-8105-f798406ff623" -> null
image_id = "a518c74e-cd80-4c67-8724-15b2720b2108" -> null
image_name = "rhcos-new" -> null
name = "merlin2-bootstrap" -> null
power_state = "active" -> null
security_groups = [] -> null
stop_before_destroy = false -> null
updated = "2023-05-01 22:04:38 +0000 UTC" -> null
user_data = "eb7b092f153c6094e6202339c2b0ef36dbc518fd" -> null
network {
module.helpernode.null_resource.config must be replaced
-/+ resource "null_resource" "config" {
~ id = "3876494058890088587" -> (known after apply)
~ triggers = { # forces replacement
~ "bootstrap_count" = "1" -> "0"
# (2 unchanged elements hidden)
}[0] will be destro
(because index [0] is out of range for count)
admin_state_up = true -> null
all_fixed_ips = [
] -> null
all_security_group_ids = [] -> null
all_tags = [] -> null
device_id = "2c289dad-9552-4037-8105-f798406ff623" -> null
device_owner = "compute:Default Group" -> null
dns_assignment = [] -> null
id = "7072aba5-ac95-4b36-994a-1855f2624b55" -> null
mac_address = "fa:16:3e:5c:1d:b7" -> null
name = "merlin2-bootstrap-port" -> null
network_id = "f5e55ae3-c790-4a29-91e1-ce04a1acfc69" -> null
port_security_enabled = false -> null
tags = [] -> null
tenant_id = "e4af56f8139e4418abcb29c723bf15a9" -> null
binding {
fixed_ip {
Plan: 3 to add, 0 to change, 5 to destroy.
Changes to Outputs:
~ bootstrap_ip = "" -> ""
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Destroying... [
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 10s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 20s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 30s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Destruction complete after 34s
module.helpernode.null_resource.config: Destroying... [id=3876494058890088587]
module.helpernode.null_resource.config: Destruction complete after 0s
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Destroying... [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f1bb19ee9]
[0]: Destroying... [id=7072aba5-ac95-4b36-994a-1855f2624b55]
[0]: Destruction complete after 7s
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Destruction complete after 9s
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Destroying...
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Still destroying... [id=317b2360-639d-4d8a-8b34-a58f1bb19ee9, 10s elapsed]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Destruction complete after 11s
module.helpernode.null_resource.config: Creating...
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Creating...
module.helpernode.null_resource.config: Provisioning with 'remote-exec'...
module.helpernode.null_resource.config (remote-exec): Connecting to remote host via SSH...
module.helpernode.null_resource.config (remote-exec): Host:
module.helpernode.null_resource.config (remote-exec): User: root
module.helpernode.null_resource.config (remote-exec): Password: false
module.helpernode.null_resource.config (remote-exec): Private key: true
module.helpernode.null_resource.config (remote-exec): Certificate: false
module.helpernode.null_resource.config (remote-exec): SSH Agent: false
module.helpernode.null_resource.config (remote-exec): Checking Host Key: false
module.helpernode.null_resource.config (remote-exec): Target Platform: unix
module.helpernode.null_resource.config (remote-exec): Connected!
module.helpernode.null_resource.config (remote-exec): Cloning into ocp4-helperno
module.helpernode.null_resource.config (remote-exec): Note: switching to 'adb110
module.helpernode.null_resource.config (remote-exec): You are in 'detached HEAD' state. You can look around, make experimental
module.helpernode.null_resource.config (remote-exec): changes and commit them, and you can discard any commits you make in this
module.helpernode.null_resource.config (remote-exec): state without impacting any branches by switching back to a branch.
module.helpernode.null_resource.config (remote-exec): If you want to create a new branch to retain commits you create, you may
module.helpernode.null_resource.config (remote-exec): do so (now or later) by using -c with the switch command. Example:
module.helpernode.null_resource.config (remote-exec): git switch -c
module.helpernode.null_resource.config (remote-exec): Or undo this operation wit
module.helpernode.null_resource.config (remote-exec): git switch -
module.helpernode.null_resource.config (remote-exec): Turn off this advice by setting config variable advice.detachedHead to false
module.helpernode.null_resource.config (remote-exec): HEAD is now at adb1102 Merge pull request #305 from redhat-cop/devel
module.helpernode.null_resource.config: Provisioning with 'file'...
module.helpernode.null_resource.config: Still creating... [10s elapsed]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Still creating... [10s elapsed]
module.helpernode.null_resource.config: Provisioning with 'file'...
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Creation complete after 12s [id=35ba1876-52b6-4769-9950-eaf3be077eaa]
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Creating.
module.helpernode.null_resource.config: Provisioning with 'file'...
module.helpernode.null_resource.config: Provisioning with 'remote-exec'...
module.helpernode.null_resource.config (remote-exec): Connecting to remote host via SSH...
module.helpernode.null_resource.config (remote-exec): Host:
module.helpernode.null_resource.config (remote-exec): User: root
module.helpernode.null_resource.config (remote-exec): Password: false
module.helpernode.null_resource.config (remote-exec): Private key: true
module.helpernode.null_resource.config (remote-exec): Certificate: false
module.helpernode.null_resource.config (remote-exec): SSH Agent: false
module.helpernode.null_resource.config (remote-exec): Checking Host Key: false
module.helpernode.null_resource.config (remote-exec): Target Platform: unix
module.helpernode.null_resource.config (remote-exec): Connected!
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Creation complete after 7s [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/35ba1876-52b6-4769-99
module.helpernode.null_resource.config (remote-exec): Running ocp4-helpernode pl
module.helpernode.null_resource.config: Still creating... [20s elapsed]
module.helpernode.null_resource.config (remote-exec): Using /root/ocp4-helpernode/ansible.cfg as config file
module.helpernode.null_resource.config (remote-exec): PLAY [all] ***************
module.helpernode.null_resource.config (remote-exec): TASK [Gathering Facts] ***
module.helpernode.null_resource.config: Still creating... [30s elapsed]
module.helpernode.null_resource.config: Still creating... [17h6m25s elapsed]
Initiating node info:
$ ps -ef | grep terraform
gjertsen 3410758 12015 0 May01 pts/1 00:08:32 terraform apply -var-file va
gjertsen 3411154 3410758 0 May01 pts/1 00:00:03 .terraform/providers/registr
bastion node state:
ps -ef | grep ansible
root 67764 67738 7 May01 pts/1 01:21:07 /usr/libexec/platform-python /usr/bin/ansible-playbook -i inventory -e @helpernode_vars.yaml tasks/main.yml -v --become
root 67771 67764 0 May01 pts/1 00:00:00 /usr/libexec/platform-python /usr/bin/ansible-playbook -i inventory -e @helpernode_vars.yaml tasks/main.yml -v --become
root 67782 1 0 May01 ? 00:00:00 ssh: /root/.ansible/cp/08610c3669 [mux]
root 67890 67771 0 May01 pts/1 00:00:00 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="root" -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/08610c3669 -tt /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/ && sleep 0'
root 67891 67783 0 May01 pts/3 00:00:00 /bin/sh -c /usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/ && sleep 0
root 67912 67891 0 May01 pts/3 00:00:04 /usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/
NFS mount looks OK
ls -al /export
total 0
drwxrwxrwx. 3 nobody nobody 92 May 1 17:41 .
dr-xr-xr-x. 19 root root 259 May 1 17:06 ..
drwxrwxrwx. 2 nobody nobody 6 May 1 17:41 openshift-image-registry-registry-pvc-pvc-5b20c6ca-b184-41eb-b145-c5253c26015a