Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows-Restart Provisioner Should Support Custom Validation #12921

Open
Charles-Stuart opened this issue Apr 12, 2024 · 0 comments
Open

Windows-Restart Provisioner Should Support Custom Validation #12921

Charles-Stuart opened this issue Apr 12, 2024 · 0 comments

Comments

@Charles-Stuart
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

The windows-restart provisioner should support running a custom script to confirm the VM has finished restarting, and that the process that required this restart in the first place has completed its tasks.

This could/should work together with the check_registry parameter to provide a much better user experience for long-running Windows builds.

This appears to have been partially implemented via the restart_check_command parameter, but that parameter's current behavior makes it trip right at the finish line. This may be a bug, or it might be the intended behavior. See "About restart_check_command" in the Additional Details section at the bottom.

I know similar requests in the past have been met with a statement to use AutoUnattend / Sysprep... but I feel I've identified enough use cases and a simple enough implementation mechanism to warrant another review. This is especially true considering how bad the user experience is when trying to orchestrate complex builds with sysprep run_once tasks.

Use Case(s)

  • Wait for installation scripts to complete across multiple restarts, such as:
    • Windows updates management
    • Testing OS upgrades, such as from Windows 10 XXXX -> Windows 10 YYYY
    • Installing agents that deploy filter drivers (Horizon View, AV/AM products, FSLogix, etc)
  • Installing applications that don't respect "IgnoreRestart" installation arguments
  • Support using configuration managers, such as DSC, without needing to add direct support for the tool itself
  • Show progress output to the build console during long-running builds
    • This is a borderline requirement for creating Windows builds in an enterprise environment

Potential configuration

NOTE I'm not a GO developer, I may be completely wrong about the changes that need to be implemented.

If updating the restart_check_command parameter's behavior is acceptable, this seems fairly simple:

The provisioner requires no changes to configuration:

provisioner "windows-restart" {
	check_registry        = true
	restart_check_command = "powershell -file \"C:\\packer\\Block-PackerBuild.ps1\""
	registry_keys         = [
		"HKLM:\\Software\\packer\script-in-progress"
	]
}

Changes to the provisioner's wait loop:

if runCustomRestartCheck {
	// run user-configured restart check
	err := cmdRestartCheck.RunWithUi(ctx, p.comm, p.ui)
	if err != nil {
		log.Printf("Communication connection err: %s", err)
		continue
	}
	log.Printf("Connected to machine")
	
	// Don't disable the restart check users specifically ask to run
	// runCustomRestartCheck = false 
}

Alternative options are below that preserve the original behavior of the provisioner for backwards-compatibility. Note that I don't know how to update the parameters or configs, so I can't provide examples for those changes.

A. Add a boolean param always_rerun_restart_check

This option would just cause the provisioner to re-run the restart check on each run of the loop.

The user of the provisioner could then author their own script to evaluate the state of the VM in a wait loop, and then continue the build after the condition is met.

Configuring the provisioner:

provisioner "windows-restart" {
	always_rerun_restart_check = true
	check_registry             = true
    restart_check_command      = "powershell -file \"C:\\packer\\Block-PackerBuild.ps1\""
	registry_keys              = [
		"HKLM:\\Software\\packer\\script-in-progress"
	]
}

Changes to the provisioner's wait loop:

if runCustomRestartCheck {
	// run user-configured restart check
	err := cmdRestartCheck.RunWithUi(ctx, p.comm, p.ui)
	if err != nil {
		log.Printf("Communication connection err: %s", err)
		continue
	}
	log.Printf("Connected to machine")
	
	// Previous code, always disables the custom restart check
	// runCustomRestartCheck = false 

	// My update - only disable the custom restart check 
	//   if always_run_restart_check is not set to true
	if !p.config.always_run_restart_check {
		runCustomRestartCheck = false
	}
}
B. Evaluate the return code from the restart_check_command parameter as part of the success criteria

This option would cause the provisioner to check the result of the custom restart check script to determine success.

The user of the provisioner could then author a script that exits with non-zero exit codes if it detects the build isn't ready to continue.

Configuring the provisioner:

provisioner "windows-restart" {
	check_registry             = true
    restart_check_command      = "powershell -file \"C:\\packer\\Block-PackerBuild.ps1\""
	registry_keys              = [
		"HKLM:\\Software\\packer\\script-in-progress"
	]
}

Changes to the provisioner's wait loop:

if runCustomRestartCheck {
	// run user-configured restart check
	err := cmdRestartCheck.RunWithUi(ctx, p.comm, p.ui)
	if err != nil {
		log.Printf("Communication connection err: %s", err)
		continue
	}
	log.Printf("Connected to machine")

	// My update - restart the loop if the script exits without an exit code of 0
	if(!cmdRestartCheck.ExitStatus() == 0) {
		// We failed, continue the loop until it returns 0
		log.Printf("Exited with non-zero code: %s", cmdRestartCheck.ExitStatus())
		continue
	}
}
C. Add a string param registry_check_status_script

This option would define a script that should run every time the registry key check detects the keys defined in the registry_keys parameter.

The user could then author a script that outputs the current status to the log.

Configuring the provisioner:

provisioner "windows-restart" {
	check_registry               = true
    registry_check_status_script = "C:\\packer\\Show-BuildProgress.ps1"
	registry_keys                = [
		"HKLM:\\Software\\packer\\script-in-progress"
	]
}
if p.config.CheckKey {
	// Run the specified status check script if we're supposed to check the registry
	if !p.config.RegistryCheckStatusScript == "" {
		RegistryStatusCheckCommand := winrm.Powershell(p.config.RegistryCheckStatusScript)
		cmdRegistryStatusCheck := &packersdk.RemoteCmd{Command: RegistryStatusCheckCommand}
		log.Printf("Invoking status check script %s", p.config.RegistryCheckStatusScript)
		
		// Invoke the script here
		err := cmdRegistryStatusCheck.RunWithUi(ctx, p.comm, p.ui)
		if err != nil {
			log.Printf("Status check script errored: %s", err)
			continue
		}
	}

	// continue to checking individual registry keys
}

Potential References

This is the file I think that needs to be changed:
provisioner.go

Additional Details

Workflow and Sample Block-PackerBuild script

In more detail, the workflow this would allow is as follows:

  1. Powershell/etc provisioner invokes a long-running script that:

    1. Sets a registry key to block the windows-restart provisioner from continuing
    2. Registers itself to re-run on startup
    3. Installs software, then initiates 0 or more restarts as needed
    4. Unregisters startup task and deletes blocking registry key when all operations are done
  2. Windows-Restart provisioner then:

    1. Runs custom check to wait for the startup script to complete
    2. Re-runs custom check script if the VM restarts and the registry values are still present
    3. Only continues when the custom check completes AND the configured registry values are no longer present

Here's an example of a script that could be used in conjunction with the feature:

# Block-PackerBuild.ps1 - registered to run at startup or invoked by the windows-restart provisioner as a status check

# Add the registry key to block packer from continuing
New-Item -ItemType Directory -Path "HKLM:\Software\packer\script-in-progress" -Force

# Wait until the DSC agent is compliant
do
{
	$LCMState = (Get-DSCLocalConfigurationManager -ErrorAction SilentlyContinue).LCMState
	Write-Host "Waiting for lcm state to be idle: [ $LCMState ]"
	Start-Sleep -Seconds 5
}
while($LCMState -ne 'Idle')

# Remove the registry key and exit to allow the provisioner to continue
Remove-Item -Path "HKLM:\Software\packer\script-in-progress" -Force
The Problems With restart_check_command

In theory, this should be supported by the restart_check_command parameter.
However, there are several major problems with the parameter's current setup:

  1. There is no method of signaling a failure to the provisioner using this parameter
    1. All output from the script is ignored, including stderr, stdout, Write-Output, Write-Host, throwing terminating exceptions, etc
    2. Exit/return codes are similarly ignored
  2. The custom check is only run until the first time it succeeds
    1. After first success, the provisioner only runs the "default" check on future iterations of the loop
  3. The default check used afterwards outputs a message to the log every 5 - 10 seconds "$vm-name restarted.", which
    1. Provides no useful information to the user
    2. Clutters up the log, making troubleshooting more difficult
    3. Misleads the user, implying that the VM is being repeatedly restarted every few seconds
Regarding The AutoUnattend/Sysprep Option

While I understand the official stance from others in the past has been to use sysprep / autounattend to run workflows like this prior to enabling WinRM connections, this has several issues:

  1. Not all workflows are suitable for use during an auto-unattend deployment
  2. No status is given to the user during the unattended setup phase, making for a terrible user experience if the process takes more than 5 - 10 minutes
  3. Unattended installations are often difficult to configure and troubleshoot, especially when compared to WinRM-based provisioning steps
  4. Provisioning steps that require credentials would require embedding application secrets in the sysprep XML
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants