-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPIKE] Split types StackCollection, ClusterProvider etc. and increase cohesion #4655
Comments
#2931 Has lot of details around the problems with clusterProvider |
StackCollectionThe type StackManager interface {
AppendNewClusterStackResource(plan, supportsManagedNodes bool) (bool, error)
CreateStack(name string, stack builder.ResourceSet, tags, parameters map[string]string, errs chan error) error
DeleteStackBySpec(s *Stack) (*Stack, error)
DeleteStackBySpecSync(s *Stack, errs chan error) error
DeleteStackSync(s *Stack) error
DeleteTasksForDeprecatedStacks() (*tasks.TaskTree, error)
DescribeClusterStack() (*Stack, error)
DescribeIAMServiceAccountStacks() ([]*Stack, error)
DescribeNodeGroupStack(nodeGroupName string) (*Stack, error)
DescribeNodeGroupStacks() ([]*Stack, error)
DescribeNodeGroupStacksAndResources() (map[string]StackInfo, error)
DescribeStack(i *Stack) (*Stack, error)
DescribeStackChangeSet(i *Stack, changeSetName string) (*ChangeSet, error)
DescribeStackEvents(i *Stack) ([]*cloudformation.StackEvent, error)
DescribeStacks() ([]*Stack, error)
DoCreateStackRequest(i *Stack, templateData TemplateData, tags, parameters map[string]string, withIAM bool, withNamedIAM bool) error
DoWaitUntilStackIsCreated(i *Stack) error
EnsureMapPublicIPOnLaunchEnabled() error
FixClusterCompatibility() error
GetAutoScalingGroupName(s *Stack) (string, error)
GetClusterStackIfExists() (*Stack, error)
GetFargateStack() (*Stack, error)
GetIAMAddonName(s *Stack) string
GetIAMAddonsStacks() ([]*Stack, error)
GetIAMServiceAccounts() ([]*v1alpha5.ClusterIAMServiceAccount, error)
GetKarpenterStack() (*Stack, error)
GetManagedNodeGroupTemplate(options GetNodegroupOption) (string, error)
GetNodeGroupName(s *Stack) string
GetNodeGroupStackType(options GetNodegroupOption) (v1alpha5.NodeGroupType, error)
GetStackTemplate(stackName string) (string, error)
GetUnmanagedNodeGroupSummaries(name string) ([]*NodeGroupSummary, error)
HasClusterStackUsingCachedList(clusterStackNames []string, clusterName string) (bool, error)
ListClusterStackNames() ([]string, error)
ListIAMServiceAccountStacks() ([]string, error)
ListNodeGroupStacks() ([]NodeGroupStack, error)
ListStacks(statusFilters ...string) ([]*Stack, error)
ListStacksMatching(nameRegex string, statusFilters ...string) ([]*Stack, error)
LookupCloudTrailEvents(i *Stack) ([]*cloudtrail.Event, error)
MakeChangeSetName(action string) string
MakeClusterStackName() string
NewClusterCompatTask() tasks.Task
NewManagedNodeGroupTask(nodeGroups []*v1alpha5.ManagedNodeGroup, forceAddCNIPolicy bool, importer vpc.Importer) *tasks.TaskTree
NewTaskToDeleteAddonIAM(wait bool) (*tasks.TaskTree, error)
NewTaskToDeleteUnownedNodeGroup(clusterName, nodegroup string, eksAPI eksiface.EKSAPI, waitCondition *DeleteWaitCondition) tasks.Task
NewTasksToCreateClusterWithNodeGroups(nodeGroups []*v1alpha5.NodeGroup, managedNodeGroups []*v1alpha5.ManagedNodeGroup, supportsManagedNodes bool, postClusterCreationTasks ...tasks.Task) *tasks.TaskTree
NewTasksToCreateIAMServiceAccounts(serviceAccounts []*v1alpha5.ClusterIAMServiceAccount, oidc *iamoidc.OpenIDConnectManager, clientSetGetter kubernetes.ClientSetGetter) *tasks.TaskTree
NewTasksToDeleteClusterWithNodeGroups(stack *Stack, stacks []NodeGroupStack, deleteOIDCProvider bool, oidc *iamoidc.OpenIDConnectManager, clientSetGetter kubernetes.ClientSetGetter, wait bool, cleanup func(chan error, string) error) (*tasks.TaskTree, error)
NewTasksToDeleteIAMServiceAccounts(serviceAccounts []string, clientSetGetter kubernetes.ClientSetGetter, wait bool) (*tasks.TaskTree, error)
NewTasksToDeleteNodeGroups(stacks []NodeGroupStack, shouldDelete func(_ string) bool, wait bool, cleanup func(chan error, string) error) (*tasks.TaskTree, error)
NewTasksToDeleteOIDCProviderWithIAMServiceAccounts(oidc *iamoidc.OpenIDConnectManager, clientSetGetter kubernetes.ClientSetGetter) (*tasks.TaskTree, error)
NewUnmanagedNodeGroupTask(nodeGroups []*v1alpha5.NodeGroup, forceAddCNIPolicy bool, importer vpc.Importer) *tasks.TaskTree
RefreshFargatePodExecutionRoleARN() error
StackStatusIsNotReady(s *Stack) bool
StackStatusIsNotTransitional(s *Stack) bool
UpdateNodeGroupStack(nodeGroupName, template string, wait bool) error
UpdateStack(options UpdateStackOptions) error
} This interface works quite well, and is easily mocked out throughout the codebase. However the interface itself is quite large
ProposalI propose we move all of the following:
a rough outline of what the new interface might look like (just guess work): type StackManager interface {
CreateStack(name string, stack builder.ResourceSet, tags, parameters map[string]string, errs chan error) error
DeleteStack(d DeleteOpts) (*Stack, error)
GetClusterStack() (*Stack, error)
GetIAMServiceAccountStacks() ([]*Stack, error)
GetNodeGroupStack(nodeGroupName string) (*Stack, error)
GetNodeGroupStacks() ([]*Stack, error)
GetNodeGroupStacksAndResources() (map[string]StackInfo, error)
GetStack(i *Stack) (*Stack, error)
GetStackChangeSet(i *Stack, changeSetName string) (*ChangeSet, error)
GetStackEvents(i *Stack) ([]*cloudformation.StackEvent, error)
GetStacks() ([]*Stack, error)
GetAutoScalingGroupName(s *Stack) (string, error)
GetClusterStackIfExists() (*Stack, error)
GetFargateStack() (*Stack, error)
GetIAMAddonName(s *Stack) string
GetIAMAddonsStacks() ([]*Stack, error)
GetIAMServiceAccounts() ([]*v1alpha5.ClusterIAMServiceAccount, error)
GetKarpenterStack() (*Stack, error)
GetManagedNodeGroupTemplate(options GetNodegroupOption) (string, error)
GetNodeGroupName(s *Stack) string
GetNodeGroupStackType(options GetNodegroupOption) (v1alpha5.NodeGroupType, error)
GetStackTemplate(stackName string) (string, error)
GetUnmanagedNodeGroupSummaries(name string) ([]*NodeGroupSummary, error)
GetClusterStackNames() ([]string, error)
GetIAMServiceAccountStacks() ([]string, error)
GetNodeGroupStacks() ([]NodeGroupStack, error)
GetStacks(statusFilters ...string) ([]*Stack, error)
GetStacksMatching(nameRegex string, statusFilters ...string) ([]*Stack, error)
HasClusterStackUsingCachedList(clusterStackNames []string, clusterName string) (bool, error)
MakeChangeSetName(action string) string
MakeClusterStackName() string
NewClusterCompatTask() tasks.Task
NewManagedNodeGroupTask(nodeGroups []*v1alpha5.ManagedNodeGroup, forceAddCNIPolicy bool, importer vpc.Importer) *tasks.TaskTree
StackStatusIsNotReady(s *Stack) bool
UpdateStack(options UpdateStackOptions) error
} |
GoalsReduce the complexity/split out The problemClusterProviderThe Part of
Mocking out this interface is often how we achieve a lot of mocking of API calls across the codebase. The Before deciding what should/shouldn't live with this package, we need to establish what we want this package to be for. Given the package is called func (c *ClusterProvider) DescribeControlPlane(meta *api.ClusterMeta) (*awseks.Cluster, error)
func (c *ClusterProvider) SupportsManagedNodes(clusterConfig *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) ControlPlaneVersion() string
func ValidateBottlerocketSupport(controlPlaneVersion string, kubeNodeGroups []KubeNodeGroup) error
func (c *ClusterProvider) UpdatePublicAccessCIDRs(clusterConfig *api.ClusterConfig) error All of these functions provide an example of where the What can stay and what should be moved
$ ll
api.go
client.go
compatibility.go
eks.go
fargate.go
logging_retryer.go
nodegroup.go
nodegroup_service.go
tasks.go
update.go Lets take a look at each file and see whats in it api.goContains:
On the face of it a lot of this isn't really EKS specific but more general AWS related. We should move this out of this package (:heavy_check_mark: added to proposal at the end of the comment) The rest of file is made up of:
None of these functions seem related to EKS specifically, so lets move them. client.goContains code for setting up the aws-sdk-go sessions. Should be moved along side the AWS APIs code. compatibility.goContains methods for validating a clusters configuration. I'm unsure after a first pass through if this really belows in // ValidateClusterForCompatibility looks at the cluster stack and check if it's
// compatible with current nodegroup configuration, if it find issues it returns an error
func (c *ClusterProvider) ValidateClusterForCompatibility(cfg *api.ClusterConfig, stackManager manager.StackManager) error { eks.goContains methods for interacting with clusters. Seems good for this package ✔️ func (c *ClusterProvider) DescribeControlPlane(meta *api.ClusterMeta) (*awseks.Cluster, error)
func (c *ClusterProvider) RefreshClusterStatus(spec *api.ClusterConfig) error
func (c *ClusterProvider) SupportsManagedNodes(clusterConfig *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) SupportsFargate(clusterConfig *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) RefreshClusterStatusIfStale(spec *api.ClusterConfig) error
func (c *ClusterProvider) CanOperate(spec *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) CanOperateWithRefresh(spec *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) CanUpdate(spec *api.ClusterConfig) (bool, error)
func (c *ClusterProvider) ControlPlaneVersion() string
func (c *ClusterProvider) ControlPlaneVPCInfo() awseks.VpcConfigResponse
func (c *ClusterProvider) GetCluster(clusterName string) (*awseks.Cluster, error)
func (c *ClusterProvider) WaitForControlPlane(meta *api.ClusterMeta, clientSet *kubernetes.Clientset) error
func ClusterSupportsManagedNodes(cluster *awseks.Cluster) (bool, error)
func ClusterSupportsFargate(cluster *awseks.Cluster) (bool, error)
func PlatformVersion(platformVersion string) (int, error) Also contains some stuff not-so related to EKS that should be moved ❎ func (c *ClusterProvider) NewOpenIDConnectManager(spec *api.ClusterConfig) (*iamoidc.OpenIDConnectManager, error)
func (c *ClusterProvider) LoadClusterIntoSpecFromStack(spec *api.ClusterConfig, stackManager manager.StackManager) error
func (c *ClusterProvider) LoadClusterVPC(spec *api.ClusterConfig, stackManager manager.StackManager) error fargate.goContains funcs for managing fargate. Definitely doesn't belong here ❎ func (fpt *fargateProfilesTask) Describe() string
func (fpt *fargateProfilesTask) Do(errCh chan error) error
func DoCreateFargateProfiles(config *api.ClusterConfig, fargateClient FargateClient) error
func ScheduleCoreDNSOnFargateIfRelevant(config *api.ClusterConfig, ctl *ClusterProvider, clientSet kubernetes.Interface) error logging_retryer.goCode for implementing our nodegroup.goMethods for validating that managed nodegroup configurations are supported by the EKS versions. Seems reasonable & EKS related ✔️ func ValidateFeatureCompatibility(clusterConfig *api.ClusterConfig, kubeNodeGroups []KubeNodeGroup) error
func ValidateBottlerocketSupport(controlPlaneVersion string, kubeNodeGroups []KubeNodeGroup) error
func ValidateManagedNodesSupport(clusterConfig *api.ClusterConfig) error
func VersionSupportsManagedNodes(controlPlaneVersion string) (bool, error)
func ValidateWindowsCompatibility(kubeNodeGroups []KubeNodeGroup, controlPlaneVersion string) error
func ValidateKMSSupport(clusterConfig *api.ClusterConfig, eksVersion string) error
func SupportsWindowsWorkloads(nodeGroups []KubeNodeGroup) bool
func LogWindowsCompatibility(nodeGroups []KubeNodeGroup, clusterMeta *api.ClusterMeta) Also contains a random stack-related func that should not be here ❎ func (c *ClusterProvider) GetNodeGroupIAM(stackManager manager.StackManager, ng *api.NodeGroup) error nodegroup_service.goMethods for translating managed nodegroup configuration into real information, e.g. taking func NewNodeGroupService(provider api.ClusterProvider, instanceSelector InstanceSelector) *NodeGroupService
func (m *NodeGroupService) NewAWSSelectorSession(provider api.ClusterProvider)
func (m *NodeGroupService) Normalize(nodePools []api.NodePool, clusterMeta *api.ClusterMeta) error
func (m *NodeGroupService) ExpandInstanceSelectorOptions(nodePools []api.NodePool, clusterAZs []string) error
func (m *NodeGroupService) ValidateLegacySubnetsForNodeGroups(spec *api.ClusterConfig, provider api.ClusterProvider) error Also contains some stray validation logic that should/code be in func (m *NodeGroupService) ValidateExistingNodeGroupsForCompatibility(cfg *api.ClusterConfig, stackManager manager.StackManager) error And a task related method :shakes-fist: ❎ lets move this into func (m *NodeGroupService) DoAllNodegroupStackTasks(taskTree *tasks.TaskTree, region, name string) error task.go:panic: tasks shouldn't be here. Move to own packages/inside update.goMethods for updating cluster settings and fetching information from the cluster ( func (c *ClusterProvider) GetCurrentClusterConfigForLogging(spec *api.ClusterConfig) (sets.String, sets.String, error)
func (c *ClusterProvider) UpdateClusterConfigForLogging(cfg *api.ClusterConfig) error
func (c *ClusterProvider) GetCurrentClusterVPCConfig(spec *api.ClusterConfig) (*ClusterVPCConfig, error)
func (c *ClusterProvider) UpdateClusterConfigForEndpoints(cfg *api.ClusterConfig) error
func (c *ClusterProvider) UpdatePublicAccessCIDRs(clusterConfig *api.ClusterConfig) error
func (c *ClusterProvider) EnableKMSEncryption(ctx context.Context, clusterConfig *api.ClusterConfig) error
func (c *ClusterProvider) UpdateClusterVersion(cfg *api.ClusterConfig) (*eks.Update, error)
func (c *ClusterProvider) UpdateClusterVersionBlocking(cfg *api.ClusterConfig) error RefactorsAs you can see from above a lot happens in here and a lot of it is :magic:. Tackling this will take a lot of worth. A first pass:
Once that is done we can split up the package as whole. The package is called
|
This is a good survey, and really any step towards refactoring this monster is a good start. You set out some good initial steps that can be taken. One suggestion that I have on a first pass is to add those to the list and write specific action items (e.g. |
thanks!
I've added some emojis to the different sections to try to make it clear what should be moved. I think rather than re-listing all the funcs to be moved, perhaps sub-tasks for tackling each file might be a easier way to split the work up? |
What I would do and I think this is something that you are trying to do as well, is the following:
Doing this, should make mocking easy and specific. You don't have to deal with each and every provider on every turn, you could mock what you need specifically. As you said, this is a lot of work. Figuring out which belongs where. I think you made a good effort of starting this work. :) |
Closing this ticket now. The next steps are to turn the proposal into actionable tickets |
Types
StackCollection
andClusterProvider
have a lot of methods, many of which do not necessarily belong in them, thereby reducing cohesion and adding too many responsibilities to a single type. Both types should be split into smaller and more focused types.Related: #4213
As part of this spike ticket, investigate a few possible options for implementation and then we'll discuss as a team what is the best path to take.
Timebox: 2 days
The text was updated successfully, but these errors were encountered: