
Bug: asoctl import-ing a ManagedCluster produces an invalid resource #3805

Closed
nojnhuh opened this issue Feb 20, 2024 · 8 comments · Fixed by #3880 or #4109
Labels
bug 🪲 Something isn't working high-priority Issues we intend to prioritize (security, outage, blocking bug)
Comments

@nojnhuh
Member

nojnhuh commented Feb 20, 2024

Version of Azure Service Operator

v2.5.0

Describe the bug
asoctl import-ing an AKS cluster created via az aks create with only the required parameters produces invalid values that trigger validation errors from the AKS API.

I had to make the following tweaks to the generated YAML to get the ManagedCluster to reconcile successfully:

spec.networkProfile.loadBalancerProfile.backendPoolType: nodeIPConfiguration -> NodeIPConfiguration

spec.networkProfile.loadBalancerSku: Standard -> standard
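In context, the corrected fields sit in the generated managedcluster.yaml roughly as follows (a hypothetical fragment; surrounding fields elided):

```yaml
spec:
  networkProfile:
    loadBalancerProfile:
      backendPoolType: NodeIPConfiguration  # was: nodeIPConfiguration
    loadBalancerSku: standard               # was: Standard
```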

To Reproduce
Steps to reproduce the behavior:

az group create -n adopt-test -l eastus && az aks create -g adopt-test -n adopt1
asoctl import azure-resource /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/adopt-test -o managedcluster.yaml
# Delete all but the ResourceGroup and ManagedCluster resources, then
kubectl apply -f managedcluster.yaml

Expected behavior
asoctl import produces valid YAML that doesn't require any additional changes to reconcile successfully.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@theunrepentantgeek
Member

I suspect this is an example of a wider issue.

In theory, Azure Resource Providers are supposed to be case-insensitive and case-preserving: it doesn't matter what letter case you use, and the system preserves what you send it.

In practice, this is a goal, but one that isn't achieved by all RPs, including, it seems, ContainerService.

I suspect that what we need is a case-correcting conversion when populating the Spec based on the Status.

@matthchr
Member

We can work around this now in ASO (if we want), but FWIW this is actually on my list to fix in the AKS RP. Just not sure when that fix will land.

@matthchr matthchr added bug 🪲 Something isn't working and removed needs-triage 🔍 labels Feb 26, 2024
@matthchr matthchr added this to the v2.8.0 milestone Feb 26, 2024
@theunrepentantgeek
Member

To handle this purely within ASO - which I suspect we want to do to avoid upstream delays with RP changes - we'd need to change our conversion logic for enum values.

Currently we convert enum values using direct casts:

backendPoolType := string(source.BackendPoolType)
destination.BackendPoolType = BackendPoolType(backendPoolType)

This is case-preserving, but it assumes the casing in the source is correct.

We could instead code-generate a conversion map indexed by lowercased string values, allowing case correction as we go.

// Package global: conversion map indexed by lowercased string values
var backendPoolTypeConversion = map[string]BackendPoolType{
        "nodeip":              BackendPoolType_NodeIP,
        "nodeipconfiguration": BackendPoolType_NodeIPConfiguration,
}

// Conversion code: lowercase the incoming value for the lookup,
// falling back to a case-preserving cast for unrecognized values
backendPoolType := string(source.BackendPoolType)
if v, ok := backendPoolTypeConversion[strings.ToLower(backendPoolType)]; ok {
        destination.BackendPoolType = v
} else {
        destination.BackendPoolType = BackendPoolType(backendPoolType)
}

@nojnhuh
Member Author

nojnhuh commented Apr 29, 2024

This still seems to be an issue for me with asoctl v2.7.0. In addition to the changes listed in the description above, I found I also had to change spec.type on each of two ManagedClustersAgentPools from Microsoft.ContainerService/managedClusters/agentPools to VirtualMachineScaleSets. (I don't think I had tried to apply the agent pool YAMLs before.)

The errors I saw this time were k8s webhook errors. I think when I first ran into this, the webhooks were allowing the values but they were being rejected by the RP.

@matthchr
Member

matthchr commented Apr 29, 2024

Are you using the latest asoctl? I think the fix requires the latest asoctl, which performs normalization when it writes the YAML. We didn't change the fact that ASO only accepts the one casing; we fixed asoctl to always export the right case.

The spec.type thing seems surprising... is that just because you had set the wrong type?

@nojnhuh
Member Author

nojnhuh commented Apr 30, 2024

Yes, this is with asoctl v2.7.0. One difference I see is that asoctl imports an AKS cluster created with the az aks create command with the correct "standard" but for a cluster created via the portal, it pulls in "Standard" instead. Both get the invalid "nodeIPConfiguration" value.

The agent pool spec.type value came straight from asoctl, so that seems like a separate bug.

@matthchr matthchr reopened this Apr 30, 2024
@matthchr matthchr modified the milestones: v2.7.0, v2.8.0 Apr 30, 2024
@matthchr
Member

I've reopened this since it seems like the fix isn't working.

The az CLI path works now because I fixed AKS itself to normalize the case from standard -> Standard internally. I suspect the portal is passing Standard rather than standard, and AKS now just preserves whatever case the user gives us. I'll see if I can track that down and fix it, but it doesn't change the fact that asoctl isn't behaving how I thought it would. @theunrepentantgeek can you take a look at this as well and see if anything jumps out at you?

We also should track down the issue with spec.type. My suspicion is that we have colliding properties (.properties.type and .type).

@theunrepentantgeek theunrepentantgeek added the high-priority Issues we intend to prioritize (security, outage, blocking bug) label May 2, 2024
@theunrepentantgeek
Member

On the AgentPool status types, we have both type (the generic ARM type of the resource) and properties_type (the type of agent pool) and we're incorrectly copying type across onto the spec when doing the import.
