
No way to attach a disk to a Flatcar VM on Azure #1297

Closed
TimoKramer opened this issue Dec 18, 2023 · 21 comments

@TimoKramer

Description

I tried every combination of storage/disks and storage/filesystems configuration I could think of on an Azure VM. VM creation never completes. Unfortunately, I don't know how to debug this or how to get a proper error message back.

Impact

Not being able to easily format and mount a disk makes it impossible for me to use Flatcar right now. Maybe I can script this reliably instead.

Environment and steps to reproduce

  1. Set-up: MS Azure VM deployed with Terraform
  2. Task: Attaching, formatting and mounting a data disk
  3. Action(s):
    a. Requested the start of a new VM with a disk attached
    b. Creation keeps looping
    c. The virtual machine fails to be created
  4. Error: VM is in failed state

Expected behavior

I would like my Azure VM to be deployed properly, with a disk attached, formatted and mounted.

Additional information

Flatcar 3602.2.2

I am using the config below to attach the disk. I also tried it like here, without the disks part, but the result is the same. I then thought the disk might have to be partitioned first (as with fdisk), so I added the disks part. Still no better.

  disks:
    - device: /dev/disk/azure/scsi1/lun10
      partitions:
        - label: odoo
  filesystems:
    - device: /dev/disk/azure/scsi1/lun10
      format: ext4
      wipe_filesystem: true
      label: odoo
@TimoKramer TimoKramer added the kind/bug Something isn't working label Dec 18, 2023
@TimoKramer TimoKramer changed the title disks and filesystem let the creation fail No way to attach a disk to a Flatcar VM on Azure Dec 18, 2023
@tormath1
Contributor

Hi @TimoKramer, can you share the Terraform config you used to deploy this instance?

@TimoKramer
Author

module "naming" {
  source  = "Azure/naming/azurerm"
  version = "0.4.0"
  prefix  = [var.environment]
}

resource "azurerm_resource_group" "rg" {
  name     = module.naming.resource_group.name
  location = var.resource_group_location
}

resource "azurerm_dns_zone" "zone" {
  name                = "${var.dns_zone_name}.germanywestcentral.cloudapp.azure.com"
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_public_ip" "public_ip" {
  name                = module.naming.public_ip.name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  domain_name_label   = var.dns_zone_name

}

resource "azurerm_dns_a_record" "record" {
  name                = "dev-psql"
  resource_group_name = azurerm_resource_group.rg.name
  zone_name           = azurerm_dns_zone.zone.name
  ttl                 = 3600
  target_resource_id  = azurerm_public_ip.public_ip.id
}

resource "azurerm_virtual_network" "test" {
  name                = module.naming.virtual_network.name
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

resource "azurerm_network_security_group" "odoo" {
  name                = "odoo-${module.naming.network_security_group.name}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  security_rule {
    name                       = module.naming.network_security_rule.name
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = azurerm_subnet.odoo.address_prefixes[0]
  }
}

resource "azurerm_network_security_group" "postgres" {
  name                = "psql-${module.naming.network_security_group.name}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  security_rule {
    name                       = "test123"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "5432"
    source_address_prefix      = azurerm_subnet.odoo.address_prefixes[0]
    destination_address_prefix = azurerm_subnet.postgres.address_prefixes[0]
  }
}

resource "azurerm_subnet" "odoo" {
  name                 = "odoo-${module.naming.subnet.name}"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.test.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_subnet" "postgres" {
  name                 = "postgres-${module.naming.subnet.name}"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.test.name
  address_prefixes     = ["10.0.2.0/24"]
  service_endpoints    = ["Microsoft.Storage"]
  delegation {
    name = "fs"
    service_delegation {
      name = "Microsoft.DBforPostgreSQL/flexibleServers"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
      ]
    }
  }
}

resource "azurerm_subnet_network_security_group_association" "odoo" {
  subnet_id                 = azurerm_subnet.odoo.id
  network_security_group_id = azurerm_network_security_group.odoo.id
}

resource "azurerm_subnet_network_security_group_association" "postgres" {
  subnet_id                 = azurerm_subnet.postgres.id
  network_security_group_id = azurerm_network_security_group.postgres.id
}

resource "azurerm_private_dns_zone" "postgres" {
  name                = "odoo-postgres-pdz.postgres.database.azure.com"
  resource_group_name = azurerm_resource_group.rg.name
  depends_on          = [azurerm_subnet_network_security_group_association.postgres]
}

resource "azurerm_private_dns_zone_virtual_network_link" "postgres" {
  name                  = "odoo-postgres-pdzvnetlink.com"
  private_dns_zone_name = azurerm_private_dns_zone.postgres.name
  virtual_network_id    = azurerm_virtual_network.test.id
  resource_group_name   = azurerm_resource_group.rg.name
}

resource "azurerm_postgresql_flexible_server" "postgres" {
  name                   = "odoo-postgres-server"
  resource_group_name    = azurerm_resource_group.rg.name
  location               = azurerm_resource_group.rg.location
  version                = "16"
  delegated_subnet_id    = azurerm_subnet.postgres.id
  private_dns_zone_id    = azurerm_private_dns_zone.postgres.id
  administrator_login    = "core"
  administrator_password = var.postgres_password
  zone                   = "1"
  storage_mb             = 32768
  sku_name               = "B_Standard_B2ms"
  backup_retention_days  = 7

  depends_on = [azurerm_private_dns_zone_virtual_network_link.postgres]
}

resource "azurerm_lb" "public" {
  name                = module.naming.lb.name
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Basic"
  frontend_ip_configuration {
    name                 = "publicIPAddress"
    public_ip_address_id = azurerm_public_ip.public_ip.id
  }
}

resource "azurerm_lb_backend_address_pool" "test" {
  name            = "BackEndAddressPool"
  loadbalancer_id = azurerm_lb.public.id
}

resource "azurerm_lb_probe" "probe" {
  name                = "tcp-probe"
  protocol            = "Tcp"
  port                = 443
  loadbalancer_id     = azurerm_lb.public.id
}

resource "azurerm_lb_rule" "https-rule" {
  name                           = "https-rule"
  protocol                       = "Tcp"
  frontend_port                  = 443
  backend_port                   = 443
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.test.id]
  frontend_ip_configuration_name = azurerm_lb.public.frontend_ip_configuration[0].name
  probe_id                       = azurerm_lb_probe.probe.id
  loadbalancer_id                = azurerm_lb.public.id
}

resource "azurerm_lb_rule" "http-rule" {
  name                           = "http-rule"
  protocol                       = "Tcp"
  frontend_port                  = 80
  backend_port                   = 80
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.test.id]
  frontend_ip_configuration_name = azurerm_lb.public.frontend_ip_configuration[0].name
  probe_id                       = azurerm_lb_probe.probe.id
  loadbalancer_id                = azurerm_lb.public.id
}

resource "azurerm_network_interface" "internal" {
  count               = var.cluster_count
  name                = "${module.naming.network_interface.name}-${random_string.vm.id}-${count.index}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "testConfiguration${count.index}"
    subnet_id                     = azurerm_subnet.odoo.id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_network_interface_backend_address_pool_association" "test" {
  count                   = var.cluster_count
  network_interface_id    = azurerm_network_interface.internal[count.index].id
  ip_configuration_name   = "testConfiguration${count.index}"
  backend_address_pool_id = azurerm_lb_backend_address_pool.test.id
}

resource "azurerm_availability_set" "odoo" {
  name                         = module.naming.availability_set.name
  location                     = azurerm_resource_group.rg.location
  resource_group_name          = azurerm_resource_group.rg.name
  platform_fault_domain_count  = 2
  platform_update_domain_count = 2
  managed                      = true
}

resource "random_string" "vm" {
  length         = 8
  special        = false
}

resource "azurerm_linux_virtual_machine" "odoo" {
  count                 = var.cluster_count
  name                  = "${module.naming.linux_virtual_machine.name}-${random_string.vm.id}-${count.index}"
  location              = azurerm_resource_group.rg.location
  availability_set_id   = azurerm_availability_set.odoo.id
  resource_group_name   = azurerm_resource_group.rg.name
  network_interface_ids = [azurerm_network_interface.internal[count.index].id]
  size                  = var.server_type

  plan {
    name      = "stable-gen2"
    product   = "flatcar-container-linux-free"
    publisher = "kinvolk"
  }

  source_image_reference {
    publisher = "kinvolk"
    offer     = "flatcar-container-linux-free"
    sku       = "stable-gen2"
    version   = var.flatcar_version
  }

  admin_ssh_key {
    username   = var.username
    public_key = file("~/.ssh/core.key.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
    name                 = "osdisk-${random_string.vm.id}-${count.index}"
  }

  computer_name  = "odoo-${count.index}"
  admin_username = var.username
  custom_data = base64encode(data.ct_config.machine-ignitions.rendered)
  # lifecycle {
  #   create_before_destroy = true
  # }
}

resource "azurerm_managed_disk" "datadisk" {
  count                = var.cluster_count
  name                 = "${module.naming.managed_disk.name}-odoo-datadisk-${count.index}"
  location             = azurerm_resource_group.rg.location
  resource_group_name  = azurerm_resource_group.rg.name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = "256"
}

resource "azurerm_virtual_machine_data_disk_attachment" "test" {
  count              = var.cluster_count
  managed_disk_id    = azurerm_managed_disk.datadisk[count.index].id
  virtual_machine_id = azurerm_linux_virtual_machine.odoo[count.index].id
  lun                = "10"
  caching            = "ReadWrite"
}

resource "terraform_data" "machine-ignitions" {
  input = data.ct_config.machine-ignitions.rendered
}

data "ct_config" "machine-ignitions" {
  content  = data.template_file.machine-configs.rendered
  strict   = true
}

data "template_file" "machine-configs" {
  template = file("${path.module}/cl/machine-mynode.yaml.tmpl")

  vars = {
    ssh_keys = jsonencode(var.ssh_keys)
    user     = var.username
    port     = var.ssh_port
    db_host  = azurerm_dns_a_record.record.fqdn #azurerm_private_dns_zone.postgres.name
  }
}

@tormath1
Contributor

Great, thanks. Can you now share your machine-mynode.yaml.tmpl and the versions of the Terraform providers, please?

@TimoKramer
Author

terraform {
  required_version = ">=1.2"
  required_providers {
    ct = {
      source  = "poseidon/ct"
      version = "~>0.13.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>3.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~>3.0"
    }
    template = {
      source  = "hashicorp/template"
      version = "~> 2.2.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0.0"
    }
  }
}
provider "azurerm" {
  features {}
}

@TimoKramer
Author

TimoKramer commented Dec 20, 2023

variant: flatcar
version: 1.0.0

storage:
  ## disks:
  ##   - device: /dev/disk/azure/scsi1/lun10
  ##     partitions:
  ##       - label: odoo
  ## filesystems:
  ##   - device: /dev/disk/azure/scsi1/lun10
  ##     format: ext4
  ##     wipe_filesystem: true
  ##     label: odoo
  directories:
    - path: /etc/traefik/acme
      user:
        name: root
  files:
    ## - path: /usr/local/bin/mount-drive.sh
    ##   mode: 0700
    ##   contents:
    ##     inline: |
    ##       #!/usr/bin/env bash
    ##       set -o errexit
    ##       set -o pipefail
    ##       set -o nounset
    ##
    ##       echo 'type=83' | sfdisk /dev/disk/azure/scsi1/lun10
    ##       mkfs.ext4 -F /dev/disk/azure/scsi1/lun10-part1
    ##       mount /dev/disk/azure/scsi1/lun10-part1 /var/lib/odoo/
    ##       chown -R odoo /var/lib/odoo
    - path: /etc/flatcar/update.conf
      overwrite: true
      mode: 0420
      contents:
        inline: |
          REBOOT_STRATEGY=reboot
          LOCKSMITHD_REBOOT_WINDOW_START="Sun 04:00"
          LOCKSMITHD_REBOOT_WINDOW_LENGTH=1h
    - path: /etc/odoo/odoo.conf
      mode: 0644
      contents:
        inline: |
          [options]
          ; addons_path = /mnt/extra-addons
          data_dir = /var/lib/odoo
          ; init = base,crm,stock,point_of_sale
          admin_passwd = coregoeshard
          ; proxy_mode = True
          ; csv_internal_sep = ,
          ; db_maxconn = 64
          db_name = odoo-db
          db_user = core
          db_password = foobar
          db_host = ${db_host}
          ; db_template = template1
          ; dbfilter = .*
          ; debug_mode = False
          ; email_from = False
          ; limit_memory_hard = 2684354560
          ; limit_memory_soft = 2147483648
          ; limit_request = 8192
          ; limit_time_cpu = 60
          ; limit_time_real = 120
          ; list_db = True
          log_web = True
          ; log_db = False
          ; log_handler = [':INFO']
          ; log_level = info
          ; logfile = None
          ; longpolling_port = 8072
          ; max_cron_threads = 2
          ; osv_memory_age_limit = 1.0
          ; osv_memory_count_limit = False
          ; smtp_password = False
          ; smtp_port = 25
          ; smtp_server = localhost
          ; smtp_ssl = False
          ; smtp_user = False
          ; workers = 0
          ; xmlrpc = True
          ; xmlrpc_interface =
          ; xmlrpc_port = 8069
          ; xmlrpcs = True
          ; xmlrpcs_interface =
          ; xmlrpcs_port = 8071
    - path: /etc/traefik/traefik.yml
      mode: 0644
      contents:
        inline: |
          ## STATIC CONFIGURATION
          log:
            level: INFO
          accessLog: {}

          ## Only for development purpose. Remove it on production environment
          api:
            insecure: true
            dashboard: true

          entryPoints:
            web:
              address: ":80"
              forwardedHeaders:
                insecure: true
              http:
                redirections:
                  entryPoint:
                    to: websecure
                    scheme: https
            websecure:
              address: ":443"
              forwardedHeaders:
                insecure: true

          providers:
            file:
              filename: "/etc/traefik/traefik.yml"

          ## DYNAMIC CONFIGURATION
          http:
            routers:
              https-route-to-local-ip:
                rule: "Host(`core-odoo.germanywestcentral.cloudapp.azure.com`)"
                tls:
                  certResolver: lets-encr
                service: route-to-local-ip-service
                priority: 1000
                entryPoints:
                  - websecure
                middlewares:
                  - test-redirectscheme

              http-route-to-local-ip:
                rule: "Host(`core-odoo.germanywestcentral.cloudapp.azure.com`)"
                service: route-to-local-ip-service
                priority: 1000
                entryPoints:
                  - web


            middlewares:
              test-redirectscheme:
                redirectScheme:
                  scheme: https

            services:
              route-to-local-ip-service:
                loadBalancer:
                  servers:
                    - url: "http://odoo:8069"

          certificatesResolvers:
            lets-encr:
              acme:
                ### Only for development purpose. Remove it on production environment
                ## Uncomment for test the LE certificate
                #caServer: https://acme-staging-v02.api.letsencrypt.org/directory
                storage: "/etc/traefik/acme/acme.json"
                ### Only for development purpose. Remove it on production environment
                email: post@core.de
                tlsChallenge: {}
systemd:
  units:
    ## - name: mount-drive.service
    ##   enabled: true
    ##   contents: |
    ##     [Unit]
    ##     Description="Mount external disk to /var/lib/odoo"
    ##     Before=docker.service
    ##     [Service]
    ##     Type=oneshot
    ##     ExecStart=/usr/local/bin/mount-drive.sh
    ##     [Install]
    ##     WantedBy=multi-user.target
    - name: sshd.socket
      dropins:
      - name: 10-sshd-port.conf
        contents: |
          [Socket]
          ListenStream=
          ListenStream=${port}
    ## - name: net-internal.service
    ##   enabled: true
    ##   contents: |
    ##     [Unit]
    ##     Description=Odoo internal network
    ##     After=docker.service
    ##     Requires=docker.service
    ##     [Service]
    ##     Type=oneshot
    ##     RemainAfterExit=yes
    ##     ExecStartPre=-/usr/bin/sh -c "/usr/bin/docker network rm odoo-net-internal || true"
    ##     ExecStart=/usr/bin/docker network create --internal --subnet 172.16.0.0/24 odoo-net-internal
    ##     ExecStop=/usr/bin/docker network rm odoo-net-internal
    ##     [Install]
    ##     WantedBy=multi-user.target
    - name: traefik.service
      enabled: true
      contents: |
        [Unit]
        Description=Traefik
        After=docker.service
        Requires=docker.service
        [Service]
        TimeoutStartSec=0
        ExecStartPre=/usr/bin/docker rm --force traefik
        ExecStart=/usr/bin/docker run --name traefik \
                                      --publish 443:443 \
                                      --publish 80:80 \
                                      --pull always \
                                      --log-driver=journald \
                                      --volume /etc/traefik:/etc/traefik \
                                      --link odoo:odoo \
                                      docker.io/traefik:v3.0
        ExecStop=/usr/bin/docker stop traefik
        Restart=always
        RestartSec=5s
        [Install]
        WantedBy=multi-user.target
    - name: odoo.service
      enabled: true
      contents: |
        [Unit]
        Description=Odoo
        After=docker.service
        Requires=docker.service
        [Service]
        TimeoutStartSec=0
        ExecStartPre=/usr/bin/docker rm --force odoo
        ExecStart=/usr/bin/docker run --name odoo \
                                      --pull always \
                                      --log-driver=journald \
                                      --volume /var/lib/odoo:/var/lib/odoo \
                                      --volume /etc/odoo:/etc/odoo \
                                      docker.io/odoo:17.0
        ExecStop=/usr/bin/docker stop odoo
        Restart=always
        RestartSec=5s
        [Install]
        WantedBy=multi-user.target
passwd:
  users:
    - name: ${user}
      ssh_authorized_keys:  ${ssh_keys}
      groups:
        - docker
    - name: odoo
      uid: 101
      no_create_home: true

@tormath1
Contributor

Have you tried the following?

variant: flatcar
version: 1.0.0

storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun10
      partitions:
        - label: odoo
  filesystems:
    - device: /dev/disk/by-partlabel/odoo
      format: ext4
      wipe_filesystem: true
      label: odoo
...

@TimoKramer
Author

No, trying it now.

@TimoKramer
Author

Unfortunately, it fails as well. I think I had also tried /dev/disk/azure/scsi1/lun10-part1.

@tormath1
Contributor

tormath1 commented Dec 20, 2023

Do you have console access, so you can reach the emergency target (and get some logs)?

@TimoKramer
Author

I activated boot diagnostics and I'm seeing this:

[   ***] Job ignition-fetch.service/start running (37min 1s / no limit)
[    **] Job ignition-fetch.service/start running (37min 1s / no limit)
[     *] Job ignition-fetch.service/start running (37min 2s / no limit)
[    **] Job ignition-fetch.service/start running (37min 2s / no limit)
[   ***] Job ignition-fetch.service/start running (37min 3s / no limit)
[  *** ] Job ignition-fetch.service/start running (37min 3s / no limit)
[ ***  ] Job ignition-fetch.service/start running (37min 4s / no limit)

and the screenshot (image not preserved).

@TimoKramer
Author

Here is some output from the serial console after a reboot:

         Starting iscsid.service - Open-iSCSI...
[    5.293757] iscsid[720]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
[    5.297681] iscsid[720]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
[    5.312029] iscsid[720]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
[    5.314662] iscsid[720]: If using hardware iscsi like qla4xxx this message can be ignored.
[    5.317466] iscsid[720]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
[    5.320594] iscsid[720]: iscsid: can't open iscsid.safe_logout configuration file /etc/iscsi/iscsid.conf
[  OK  ] Started iscsid.service - Open-iSCSI.
[    5.328692] systemd[1]: Started iscsid.service - Open-iSCSI.
         Starting dracut-initqueue.…ice - dracut initqueue hook...
[    5.337256] systemd[1]: Starting dracut-initqueue.service - dracut initqueue hook...
[  OK  ] Finished dracut-initqueue.…rvice - dracut initqueue hook.
[    5.351027] systemd[1]: Finished dracut-initqueue.service - dracut initqueue hook.
[  OK  ] Reached target remote-fs-p…eparation for Remote File Systems.
[    5.356023] systemd[1]: Reached target remote-fs-pre.target - Preparation for Remote File Systems.
[  OK  ] Reached target remote-cryp…et - Remote Encrypted Volumes.
[    5.362223] systemd[1]: Reached target remote-cryptsetup.target - Remote Encrypted Volumes.
[  OK  ] Reached target remote-fs.target - Remote File Systems.
[    5.370308] systemd[1]: Reached target remote-fs.target - Remote File Systems.
[    5.375245] ignition[708]: GET result: OK
         Starting dracut-pre-mount.…ice - dracut pre-mount hook...
[    5.380479] systemd[1]: Starting dracut-pre-mount.service - dracut pre-mount hook...
[    5.387898] systemd[1]: Finished dracut-pre-mount.service - dracut pre-mount hook.
[  OK  ] Finished dracut-pre-mount.…rvice - dracut pre-mount hook.
[    6.263494] systemd-networkd[703]: eth0: Gained IPv6LL
[     *] Job ignition-fetch.service/start running (1min 26s / no limit)

@tormath1
Contributor

Ah OK, so it fails to fetch the config. Could you try user_data instead of custom_data? Both should work, but let's try.

@TimoKramer
Author

I'm pretty sure I already tried that, since yesterday I noticed that instances are not recreated when using user_data. But I'll try again.

@TimoKramer
Author

TimoKramer commented Dec 20, 2023

It fails, and I can see this in the serial console:
Ignition failed: create partitions failed: failed to wait on disks devs: device unit dev-disk-azure-scsi1-lun10.device timeout

It seems that the disk is not available during boot.

:/root# ls -lahrt /dev/disk/azure/
total 0
drwxr-xr-x  3 root root  60 Dec 20 13:48 .
drwxr-xr-x 11 root root 220 Dec 20 13:48 ..
drwxr-xr-x  2 root root 240 Dec 20 13:48 scsi0
:/root# ls -lahrt /dev/disk/azure/scsi0/
total 0
drwxr-xr-x 3 root root  60 Dec 20 13:48 ..
lrwxrwxrwx 1 root root  12 Dec 20 13:48 lun1 -> ../../../sdb
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun1-part1 -> ../../../sdb1
lrwxrwxrwx 1 root root  12 Dec 20 13:48 lun0 -> ../../../sda
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part2 -> ../../../sda2
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part6 -> ../../../sda6
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part1 -> ../../../sda1
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part9 -> ../../../sda9
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part7 -> ../../../sda7
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part4 -> ../../../sda4
lrwxrwxrwx 1 root root  13 Dec 20 13:48 lun0-part3 -> ../../../sda3
drwxr-xr-x 2 root root 240 Dec 20 13:48 .

EDIT: I can see it in the Azure portal, though.

@tormath1
Contributor

OK, so at least we now have something to debug :D I guess the instance and the disk are created in parallel, and the disk is only attached once the VM has been created, so I think you're hitting this: hashicorp/terraform-provider-azurerm#6117

@TimoKramer
Author

Oh man, so I should try the azurerm_virtual_machine resource instead of the Linux one?! That's bad; it cost me at least two days. Thank you @tormath1, I really appreciate your help here.

@tormath1
Contributor

@TimoKramer you can try with azurerm_virtual_machine (to at least confirm the issue), but since this resource is deprecated, I would rather investigate a systemd service unit that formats the disk once it's available (post-boot, then...).
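A minimal Butane sketch of that post-boot approach, offered as an untested starting point. It assumes the disk is attached on LUN 10 as in the Terraform config above, that udev creates the /dev/disk/azure/scsi1/lun10 link when the disk is hot-attached (the scsi0 links in the listing above suggest the Azure udev rules are present), and that a filesystem on the whole, unpartitioned disk is acceptable. The unit name format-odoo-disk.service and the label odoo are illustrative. Because the data disk is no longer declared under storage, Ignition should not block first boot waiting for it; systemd starts the service when the device unit appears, and the blkid check keeps a reboot from reformatting a disk that already holds data.

```yaml
variant: flatcar
version: 1.0.0

# No storage.disks / storage.filesystems entry for the data disk: Ignition
# would otherwise wait at first boot for a disk that Terraform only attaches
# after the VM has been created.
systemd:
  units:
    # Hypothetical unit name; formats the LUN 10 data disk once it appears.
    - name: format-odoo-disk.service
      enabled: true
      contents: |
        [Unit]
        Description=Create an ext4 filesystem on the odoo data disk if it has none
        # Tie the service to the device unit udev creates for LUN 10.
        BindsTo=dev-disk-azure-scsi1-lun10.device
        After=dev-disk-azure-scsi1-lun10.device
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        # blkid exits non-zero when it finds no signature on the device, so
        # mkfs only runs on a blank disk and a reboot never wipes existing data.
        ExecStart=/usr/bin/sh -c 'blkid /dev/disk/azure/scsi1/lun10 || mkfs.ext4 -F -L odoo /dev/disk/azure/scsi1/lun10'
        [Install]
        # Pull the service in whenever the device shows up, including a
        # hot-attach that happens after boot.
        WantedBy=dev-disk-azure-scsi1-lun10.device
```

Putting the filesystem directly on the disk avoids a separate partitioning step; if a partition is preferred, the ExecStart line would need an sfdisk call first, as in the commented-out mount-drive.sh script.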

@tormath1
Contributor

@TimoKramer if you get a chance to confirm that it works with azurerm_virtual_machine, would you be interested in documenting your findings in the Flatcar documentation?
I think a "known issue" section here would be fine, to warn users about this current limitation: https://www.flatcar.org/docs/latest/installing/cloud/azure/#terraform
Here's the documentation repo: https://github.com/flatcar/flatcar-website/tree/master/content/docs/latest

@tormath1 tormath1 added kind/docs and removed kind/bug Something isn't working labels Dec 21, 2023
@tormath1 tormath1 moved this from 📝 Needs Triage to ⚒️ In Progress in Flatcar tactical, release planning, and roadmap Dec 21, 2023
@TimoKramer
Author

@tormath1 I tried with azurerm_virtual_machine, but creating the storage always failed with an error telling me to set either managed_disk_id or vhd_uri. I don't see why, or how to add that, so I dropped that approach. I'm now going down the route of setting up the disk with systemd units.
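For the mounting side of that route, a hedged Butane sketch (not a tested config) could look like the following. It assumes the data disk already carries an ext4 filesystem labelled odoo and should be mounted at /var/lib/odoo, the path used by the commented-out mount-drive.sh script. The drop-in name 10-require-data-disk.conf is hypothetical, and its dropins key should be merged into the existing odoo.service entry rather than added as a second entry with the same name. A freshly created ext4 root directory is owned by root, so the chown -R odoo step from that script may still be needed for the odoo container. If the disk only appears after systemd's device start timeout has expired, the mount job fails and has to be started again.

```yaml
variant: flatcar
version: 1.0.0

systemd:
  units:
    # Mount unit names must be the escaped mount path:
    # /var/lib/odoo -> var-lib-odoo.mount
    - name: var-lib-odoo.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount the odoo data disk at /var/lib/odoo
        [Mount]
        # Assumes an ext4 filesystem labelled "odoo". systemd adds BindsTo= and
        # After= on the matching device unit, so the mount waits for the disk.
        What=/dev/disk/by-label/odoo
        Where=/var/lib/odoo
        Type=ext4
        [Install]
        WantedBy=multi-user.target
    # Merge this dropins key into the existing odoo.service entry.
    - name: odoo.service
      dropins:
        - name: 10-require-data-disk.conf
          contents: |
            [Unit]
            # Adds Requires= and After= on var-lib-odoo.mount.
            RequiresMountsFor=/var/lib/odoo
```

Using the filesystem label instead of the LUN path keeps the mount unit independent of how the disk was formatted, as long as the label matches.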

@TimoKramer
Author

TimoKramer commented Dec 22, 2023

@TimoKramer
Author

I think this can be closed, since it is an issue with Terraform; someone searching for this problem should still be able to find it as a closed issue.
