Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK Migration - V2 Beta #306

Closed
wants to merge 19 commits into from
Closed

SDK Migration - V2 Beta #306

wants to merge 19 commits into from

Conversation

JenGoldstrich
Copy link
Contributor

@JenGoldstrich JenGoldstrich commented May 31, 2023

Description

This PR switches from the ARM Builder from the "track 1" SDK we've been using to an SDK maintained by the Terraform Azure Provider team, this new SDK is located here. It has drop in support for track 2 SDKs to help us address the deprecation of the existing SDK with fewer breaking changes. As the new SDK adds track 2 support we can migrate them over time

Related to #243

Breaking Changes

  • All ARM builder functionality moved to github.com/hashicorp/go-azure-sdk
  • Address Connect to Azure Storage account using Managed identity and not Account key. #250 for the ARM Builder as we are currently considering removing the Read Only SAS's that are generated during VHD builds. And no longer create a connection to the storage account via an account key
  • Device Code authentication is removed, we are reccomending using az login --use-device-code instead to replicate this functionality

Other Changes

  • Removed 5 minute expiration check on JWT tokens, as OIDC tokens don't work with this expiration check

What's not done

  • Usage of PollingDurationTimeout as the actual polling duration timeout consistently throughout new code
  • DTL Builder and CHroot are not tested or guaranteed to work with this beta release, they will be migrated after we have validated the migration of the ARM builder and gotten feedback from users
  • The new SDK supports has some endpoints on track 2, these endpoints require the creation of new request/response handlers that will have to be added before the full v2 release

if claims.ExpiresAt < time.Now().Add(5*time.Minute).Unix() {
errs = packersdk.MultiErrorAppend(errs, fmt.Errorf("client_jwt will expire within 5 minutes, please use a JWT that is valid for at least 5 minutes"))
}
//if claims.ExpiresAt < time.Now().Add(5*time.Minute).Unix() {
Copy link
Contributor Author

@JenGoldstrich JenGoldstrich May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OIDC Access Tokens have a short validity period (60 seconds), so throwing this error prevents proper OIDC support, removing it fixes this, I have a slack link with some more context from the provider folks if needed

…et it, switch from MeClient to ServicePrincipals client to get objectID
return c.authType
}

// TODO I still need to port this to the new SDK
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is used by every builder, I'll port is towards the end of the process and leave this todo until then

@@ -305,12 +314,13 @@ func (s *StepDeployTemplate) deleteDeploymentResources(ctx context.Context, depl
defer wg.Done()
retryConfig := retry.Config{
Tries: 10,
RetryDelay: (&retry.Backoff{InitialBackoff: 10 * time.Second, MaxBackoff: 600 * time.Second, Multiplier: 2}).Linear,
RetryDelay: (&retry.Backoff{InitialBackoff: 5 * time.Second, MaxBackoff: 60 * time.Second, Multiplier: 1.5}).Linear,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems to speed up builds quite a bit, if the delete failed a few times here we could end up waiting for quite a long time, this cuts off the max backoff at 1 minute, and lowers the multiplier that increases the backoff, anecdotally acceptance tests feel way faster, and the OIDC test build only took 5 minutes.

@@ -107,11 +107,11 @@ func (s *StepDeployTemplate) Cleanup(state multistep.StateBag) {
// delete the disk as the image request will return a 404
computeName := state.Get(constants.ArmComputeName).(string)
isManagedDisk := state.Get(constants.ArmIsManagedImage).(bool)
imageType, imageName, err := s.disk(context.TODO(), subscriptionId, resourceGroupName, computeName)
imageType, imageName, err := s.disk(ctx, subscriptionId, resourceGroupName, computeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

@@ -65,7 +65,7 @@ func (b *Builder) Run(ctx context.Context, ui packersdk.Ui, hook packersdk.Hook)

ui.Say("Running builder ...")

ctx, cancel := context.WithCancel(ctx)
ctx, cancel := context.WithTimeout(ctx, time.Minute*60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make this timeout configurable via some configuration option?

if b.config.ClientConfig.ObjectID == "" {
b.config.ClientConfig.ObjectID = getObjectIdFromToken(ui, spnCloud)
b.config.ClientConfig.ObjectID = *objectID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a case where objectID is nil and this throws a nil pointer deference error?

if err == nil {
err = f.WaitForCompletionRef(ctx, azureClient.ImagesClient.Client)
}
err := azureClient.ImagesClient.DeleteThenPoll(ctx, imageId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

images, err := client.ImagesClient.ListByResourceGroupComplete(context.TODO(), resourceGroupName)
func findManagedImageByName(client *AzureClient, name, subscriptionId, resourceGroupName string) (*hashiImagesSDK.Image, error) {
id := commonids.NewResourceGroupID(subscriptionId, resourceGroupName)
images, err := client.ImagesClient.ListByResourceGroupComplete(context.TODO(), id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of the review but I'm curious why the code has context.TODO() as opposed to reusing the ctx created during runtime? Guessing there is no context being passed in would be my first guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! So the context for the clients is mostly used for canceling and polling deadlines, I have not passed it through to here yet since it was only one get request used outside of a step, happy to add a real context here if you think it'd be better

go.mod Outdated Show resolved Hide resolved
Copy link
Contributor

@lbajolet-hashicorp lbajolet-hashicorp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed ~half for now, will continue this tomorrow, overall there's a ton of changes and I left more questions than proper comments, but this looks good to me at first glance.
I notice that only the arm builder is updated now, will you refactor the rest later?

@@ -56,7 +53,7 @@ type Artifact struct {
StateData map[string]interface{}
}

func NewManagedImageArtifact(osType, resourceGroup, name, location, id, osDiskSnapshotName, dataDiskSnapshotPrefix string, generatedData map[string]interface{}, keepOSDisk bool, template *CaptureTemplate, getSasUrl func(name string) string) (*Artifact, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very ignorant of this plugin, but what's a SAS URL, and why do we remove that here? I presume this is not necessary anymore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related but not too much, I see the only other place where there's a reference to a getSasUrl is in the NewArtifact function for the dtl builder (in builder/azure/dtl/artifact.go line 56), which is never used in the code according to gopls, so I presume we could remove this as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, a SAS URL is a link to an Azure Storage account resource with an embedded token, we create SAS tokens in VHD builds and when using managed image builds without deleting the disk I believe, however we've been asked by a lot of our users to remove this here #250, because of the security concerns of using these they are not recommended by Microsoft, we are doing them in every build. To top it off the new SDK does not provide the same helper we currently use to generate these tokens.

So for v2 after chatting with Wilken, and given the above issue, I am proposing removing them, they can still be created by users outside of Packer, and this was never used in the resulting Azure Artifact, but was simply printed in the terminal output of the build.

Yes we should remove it from the DTL builder but I would like to hold of any changes to the DTL builder until we have signed off on the way we've handling this for the ARM builder

azureClient.DisksClient.UserAgent = fmt.Sprintf("%s %s", useragent.String(version.AzurePluginVersion.FormattedVersion()), azureClient.DisksClient.UserAgent)
// Clients that have been ported to hashicorp/go-azure-sdk
azureClient.DisksClient = hashiDisksSDK.NewDisksClientWithBaseURI(cloud.ResourceManagerEndpoint)
azureClient.DisksClient.Client.Authorizer = authWrapper.AutorestAuthorizer(resourceManagerAuthorizer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that's the same package referred to here, but from our discussions I remember autorest being a contentious choice, is this the same package referred to here? Won't we still experience crashes because of this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So autorest is an intermediary library used in track 1 SDKs, hashicorp/go-azure-sdk is still migrating from auto-rest to its own transit layer.

This code has been converted to use a different auth layer than Autorest, but track 1 SDKs still rely on autorest, so we can use the helper method in autoWrapper to convert this for SDKs not converted, then each time a client is moved to track 2 we will just replace authWrapper.AutorestAuthorizer(resourceManagerAuthorizer) with resourceManagerAuthorizer, as that helper function just turns the "new" auth object into the old one

}

// Returns an Azure Client used for the Azure Resource Manager
// Also returns the Azure object ID for the authentication method used in the build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, why is the object ID returned as a separate value here? Shouldn't it be embedded in the AzureClient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be defined at the builder level and then passed into the client so I just kept it that way, also historically users could add the ObjectID in their templates instead of relying on us to grab it from the token claims

return "", err
}
return claims["oid"].(string), nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: superfluous empty line

@@ -80,28 +80,33 @@ func (b *Builder) Run(ctx context.Context, ui packersdk.Ui, hook packersdk.Hook)
b.stateBag.Put(constants.ArmManagedImageSubscription, b.config.ClientConfig.SubscriptionID)
}

b.stateBag.Put(constants.ArmSubscription, b.config.ClientConfig.SubscriptionID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is part of the config, do we need to keep this in the state? Are the steps that use this unable to access the full config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I see some other config options are being stored in the state so if that's a convention for this plugin feel free to disregard my comment.

@@ -56,6 +57,7 @@ func TestBuilderAcc_SharedImageGallery_ARM64SpecializedLinuxSIG(t *testing.T) {
Setup: func() error {
createSharedImageGalleryDefinition(t, CreateSharedImageGalleryDefinitionParameters{
galleryImageName: "arm-linux-specialized-sig",
subscriptionId: os.Getenv("ARM_SUBSCRIPTION_ID"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this is part of the environment rather than a constant for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These acceptance tests rely on env variables for most of the Azure related fields that are in interpreted in the templates, this just grabs them to use in these functions

}

return nil, fmt.Errorf("Cannot find an image named '%s' in the resource group '%s'", name, resourceGroupName)
}

func findVirtualNetworkResourceGroup(client *AzureClient, name string) (string, error) {
virtualNetworks, err := client.VirtualNetworksClient.ListAllComplete(context.TODO())
func findVirtualNetworkResourceGroup(client *AzureClient, subscriptionId, name string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fuel my curiosity, I see that a bunch of calls now include the subscriptionId, for those of us unitiated to Azure's offering, what's that and why do we need it now in so many places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure Subscriptions are user created groups where resources are organized. Every resource (VM, Managed Image) has a Subscription, an a Resource Group which are sub-divisions of subscriptions.

In the old SDK we passed in the subscription ID into the Azure Client, in the new SDK they ask for the subscription ID on every call but the resulting API call is the same regardless.


return multistep.ActionHalt
// HACK(chrboum): I do not like this. The capture method should be returning this value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who's cherboum and paulmey? Are they working on the SDK?
Oh nevermind that's an older comment that was moved, still, is this hack still relevant with the refactor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I haven't fully vetted that, there are two helper methods we use in azure client that still rely on autorest specific libraries, these are working on the new SDK currently but I need to migrate them still

s.say(s.client.LastError.Error())
}

return exists.Response.StatusCode != 404, err
return exists.HttpResponse.StatusCode != 404, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that happen? If the error might be linked to a 404, should we do the check in both places?

if err != nil {
if exists.HttpResponse.StatusCode == 404 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance that HttpResponse might be nil here? If so this will panic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but this was true in the old SDK and it never happened, definitely worth guarding against but unlikely to happen I believe

@@ -1,6 +1,3 @@
// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: MPL-2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken, this should stay part of the file for compliance

"github.com/hashicorp/packer-plugin-azure/builder/azure/common/constants"
"github.com/hashicorp/packer-plugin-sdk/multistep"
packersdk "github.com/hashicorp/packer-plugin-sdk/packer"
)

type StepPublishToSharedImageGallery struct {
client *AzureClient
publish func(ctx context.Context, sourceID string, sharedImageGallery SharedImageGalleryDestination, miSGImageVersionEndOfLifeDate string, miSGImageVersionExcludeFromLatest bool, miSigReplicaCount int32, location string, diskEncryptionSetId string, tags map[string]*string) (string, error)
publish func(ctx context.Context, subscriptionID string, sourceID string, sharedImageGallery SharedImageGalleryDestination, miSGImageVersionEndOfLifeDate string, miSGImageVersionExcludeFromLatest bool, miSigReplicaCount int64, location string, diskEncryptionSetId string, tags map[string]string) (string, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what's the mi in miSGImageVersion... referring to?
Machine image maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, not 100% sure tbh

@@ -130,7 +130,7 @@
"listeners": [
{
"certificateUrl": "",
"protocol": "https"
"protocol": "Https"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird casing here, is this voluntary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the new SDK sends, they resolve the same but we do a file compare to check in the test that causes failures cause https != https, Azure interprets them the same

package common

// StringPtr returns a pointer to the passed string.
func StringPtr(s string) *string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised we have to make our own, isn't there something like this in the Azure SDK already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to push it to the Hashicorp Azure SDK if not, especially if there's a bunch of string pointers in the API calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in the old SDK as an annoying helper tbh, and I didn't want to have to have a bunch of duplicated code. The new SDK does not have this and tbh I wasn't sure how it'd be received as a PR so I felt this was simpler

@@ -4,8 +4,14 @@
package template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity again, what is being templated here? I'm not sure I understand what's the use for this package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Azure ARM templates are kinda like packer/terraform templates, in that they are codied infrastructure, but its an Azure concept for defining certain types of deployments https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/overview

The way the Azure Packer plugin works is that it first deploys a template to create a VM, and then it captures that VM into an image artifact

"CHINACLOUD": "china",
"AZURECHINACLOUD": "china",

// TODO Implement AzureGermanCloud
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI Azure Germany doesn't exist anymore since October 2021. There are Azure regions in Germany but they are part of the general public cloud. Ref: Azure Germany

@JenGoldstrich
Copy link
Contributor Author

rc1 with ARM builder support available here: https://github.com/JenGoldstrich/packer-plugin-azure/releases/tag/v2.0.0-betarc1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants