Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

additional installer option to wipe all disks before install #660

Merged
merged 1 commit into from
Feb 19, 2024

Conversation

ibrokethecloud
Copy link
Contributor

Problem:

During out internal automation testing it is possible different disks were used to install of harvester.
When the disk is changed the older installation is left intact which can cause unexpected results during reboots.
To ensure this does not happen we need to be able to allow wiping of disks via the installer.

Solution:

PR adds an additional option in harvester config wipeDisks which can be passed via a config url or kernel arguments harvester.install.wipe_disks=true and will result in the harv-install script wiping all disks on the node.

Related Issue:
harvester/harvester#2066
harvester/harvester#2781
harvester/harvester#4527

Test plan:

@ibrokethecloud ibrokethecloud requested review from bk201, Vicente-Cheng and starbops and removed request for bk201 and Vicente-Cheng February 16, 2024 01:43
Copy link
Contributor

@Vicente-Cheng Vicente-Cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@innobead
Copy link
Collaborator

cc @m-ildefons

Copy link
Contributor

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just doing a build to give this a quick test, but LGTM.

Copy link
Member

@starbops starbops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

package/harvester-os/files/usr/sbin/harv-install Outdated Show resolved Hide resolved
@tserong tserong self-requested a review February 16, 2024 08:43
Copy link
Contributor

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hang on... I just tried this in a previously used VM, and hit an error. It was a manual install, not PXE, but I edited the kernel args in the grub menu to add harvester.install.wipe_disks=true to get this PR into play. Here's what I ended up with:

image

Note that the wipe did succeed - previously /dev/vdb and /dev/vdc had a bunch of partitions on them, and now they're gone:

# cat /proc/partitions 
major minor  #blocks  name

   7        0     581632 loop0
 253        0  125829120 vda
 253       16 4294967296 vdb
 253       32  209715200 vdc
 253       48   20971520 vdd
  11        0    5863104 sr0

I'll re-run and see what I can figure out....

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

It's the partprobe:

# cat /proc/partitions
major minor  #blocks  name

   7        0     581632 loop0
 253        0  125829120 vda
 253       16 4294967296 vdb
 253       17       1024 vdb1
 253       18      51200 vdb2
 253       19    8388608 vdb3
 253       20   15728640 vdb4
 253       21 1287651328 vdb5
 253       22 2983144448 vdb6
 253       32  209715200 vdc
 253       33       1024 vdc1
 253       34      65536 vdc2
 253       35    4194304 vdc3
 253       36    8388608 vdc4
 253       37  197063680 vdc5
 253       48   20971520 vdd
  11        0    5863104 sr0

# lsblk -d -n -J -o NAME,TYPE | jq -r '.blockdevices[] | select(.type == "disk") | .name'
vda
vdb
vdc
vdd

# sgdisk -Z /dev/vda
Creating new GPT entries in memory.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
# sgdisk -Z /dev/vdb
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
# sgdisk -Z /dev/vdc
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
# sgdisk -Z /dev/vdd
Creating new GPT entries in memory.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.

# partprobe -s
/dev/vdd: msdos partitions
/dev/vdb: msdos partitions
Error: Can't have a partition outside the disk!
/dev/vdc: msdos partitions
/dev/vda: msdos partitions

# echo $?
1

@Vicente-Cheng
Copy link
Contributor

Vicente-Cheng commented Feb 16, 2024

Hi @tserong,
Could you check the vdb partitions table?
It seems to be the root cause of the issue.

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

In this case /dev/vdb is a 4TiB disk, and I don't know why it purports to have msdos partitions what with having just been zapped... But that'll be where the error is coming from.

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

# fdisk -l /dev/vdb
Disk /dev/vdb: 4 TiB, 4398046511104 bytes, 8589934592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

# parted /dev/vdb print
Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)                                      
Disk /dev/vdb: 4398GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

# cat /proc/partitions 
major minor  #blocks  name

   7        0     581632 loop0
 253        0  125829120 vda
 253       16 4294967296 vdb
 253       32  209715200 vdc
 253       48   20971520 vdd
  11        0    5863104 sr0

There's nothing there. What am I missing?

@Vicente-Cheng
Copy link
Contributor

# fdisk -l /dev/vdb
Disk /dev/vdb: 4 TiB, 4398046511104 bytes, 8589934592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

# parted /dev/vdb print
Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)                                      
Disk /dev/vdb: 4398GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

# cat /proc/partitions 
major minor  #blocks  name

   7        0     581632 loop0
 253        0  125829120 vda
 253       16 4294967296 vdb
 253       32  209715200 vdc
 253       48   20971520 vdd
  11        0    5863104 sr0

There's nothing there. What am I missing?

How about vdc?

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

vdc is the same (no partitions). I did a little more digging, turns out the partprobe of all devices is picking up the CD-ROM:

# for dev in vda vdb vdc vdd sr0 ; do echo $dev ; partprobe -s /dev/$dev ; done
vda
/dev/vda: msdos partitions
vdb
/dev/vdb: msdos partitions
vdc
/dev/vdc: msdos partitions
vdd
/dev/vdd: msdos partitions
sr0
Error: Can't have a partition outside the disk!

So, I've got a couple of ideas:

  1. Move the partprobe inside the above loop (so we only call partprobe -s /dev/$disk for each disk wiped), or,
  2. Keep the existing partprobe but ignore the return value (so change it to partprobe -s || :)

@Vicente-Cheng
Copy link
Contributor

Vicente-Cheng commented Feb 16, 2024

vdc is the same (no partitions). I did a little more digging, turns out the partprobe of all devices is picking up the CD-ROM:

# for dev in vda vdb vdc vdd sr0 ; do echo $dev ; partprobe -s /dev/$dev ; done
vda
/dev/vda: msdos partitions
vdb
/dev/vdb: msdos partitions
vdc
/dev/vdc: msdos partitions
vdd
/dev/vdd: msdos partitions
sr0
Error: Can't have a partition outside the disk!

So, I've got a couple of ideas:

  1. Move the partprobe inside the above loop (so we only call partprobe -s /dev/$disk for each disk wiped), or,
  2. Keep the existing partprobe but ignore the return value (so change it to partprobe -s || :)

nice catch, sr0 can not show the partition table.

If you just run partprobe (w/o display partition), does it still hit error?

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

If you just run partprobe (w/o display partition), does it still hit error?

Yup:

# partprobe
Error: Can't have a partition outside the disk!
# echo $?
1

@tserong
Copy link
Contributor

tserong commented Feb 16, 2024

One other thought - as I mentioned in harvester/harvester#4527 (comment):

Thinking about this further though, some care must be taken before wiping everything that looks like a disk that's attached to a given host. Imagine you had a system that could see a bunch of random disks on a SAN. Wiping all of these might not be a good idea.

I still think this PR is fine, but when we document the harvester.install.wipe_disks=true option, we're going to need to say something like "DO NOT USE THIS UNLESS YOU REALLY REALLY MEAN IT"

Added partprobe based on feedback

partprobe wiped disk only

fixed indentation based on feedback
Copy link
Contributor

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just re-tested with the partprobe -s /dev/$disk change - works like a charm.

@ibrokethecloud ibrokethecloud merged commit 63fd292 into harvester:master Feb 19, 2024
5 checks passed
@bk201
Copy link
Member

bk201 commented Feb 19, 2024

@Mergifyio backport v1.2

Copy link

mergify bot commented Feb 19, 2024

backport v1.2

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants