[Fix] Fix computation of excess workload and available capacity #371

Merged
merged 2 commits into from
Jun 1, 2023

Conversation

pdk27
Collaborator

@pdk27 pdk27 commented Jun 1, 2023

Description

The plugin computes current demand and available capacity to identify excess workload and provision capacity for it. This code runs when jobs are added to queues.
Problems with the current computations:

  • Available capacity includes both connectingExecutors and plannedCapacitySnapshot. Because plannedCapacitySnapshot already includes nodes in the ‘connecting’ state, availableCapacity can be much higher than reality, especially for huge fleets and a busy Jenkins with numerous builds running or queued.
  • Current demand changes as capacity is provisioned, while the queue length stays the same. This is misleading and makes the code and logs harder to read, especially when current demand goes negative, as in the following logs:
com.amazon.jenkins.ec2fleet.NoDelayProvisionStrategy apply In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=0}
currentDemand 1 availableCapacity 0 (availableExecutors 0 connectingExecutors 0 plannedCapacitySnapshot 0 additionalPlannedCapacity 0)

com.amazon.jenkins.ec2fleet.NoDelayProvisionStrategy apply In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=2, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=2, additionalPlannedCapacity=0}
currentDemand -3 availableCapacity 4 (availableExecutors 0 connectingExecutors 2 plannedCapacitySnapshot 2 additionalPlannedCapacity 0)
  • Logs are misleading: they show provisioning-completed messages after provisioning has only been planned.
Context:

The plugin tracks the new capacity it plans to provision with plannedCapacitySnapshot, which is required to avoid over-provisioning. The plugin also controls when the Java futures for planned nodes are resolved: after Jenkins has established a connection to the new node, or after the connection fails due to a timeout. In other words, plannedCapacitySnapshot includes nodes in the ‘connecting’ state.
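
The double counting can be illustrated with a minimal sketch. It is based only on the log lines in this description and is not the plugin's code: the class and method names are hypothetical, and the formulas are inferred from the logged values (for example, queueLength 1 with availableCapacity 4 giving currentDemand -3).

```java
// Illustrative sketch only: class and method names are hypothetical,
// and the formulas are inferred from the log lines in this PR description.
final class CapacitySketch {

    // Earlier computation: connecting executors are added on their own and are
    // also part of plannedCapacitySnapshot, so they are counted twice.
    static int availableCapacityBefore(int availableExecutors, int connectingExecutors,
                                       int plannedCapacitySnapshot, int additionalPlannedCapacity) {
        return availableExecutors + connectingExecutors
                + plannedCapacitySnapshot + additionalPlannedCapacity;
    }

    // Computation after this change: connectingExecutors is left out.
    static int availableCapacityAfter(int availableExecutors,
                                      int plannedCapacitySnapshot, int additionalPlannedCapacity) {
        return availableExecutors + plannedCapacitySnapshot + additionalPlannedCapacity;
    }

    // Workload the queue holds beyond what is available or already planned.
    // A value of zero or less means nothing needs to be provisioned.
    static int excessWorkload(int queueLength, int availableCapacity) {
        return queueLength - availableCapacity;
    }

    public static void main(String[] args) {
        // Values from the second "problem" log: 2 connecting executors, 2 planned, queue of 1.
        int before = availableCapacityBefore(0, 2, 2, 0);
        int after = availableCapacityAfter(0, 2, 0);
        System.out.println("before: availableCapacity " + before
                + " demand " + excessWorkload(1, before));
        System.out.println("after:  availableCapacity " + after
                + " excessWorkload " + excessWorkload(1, after));
    }
}
```

With the values from the second log above, the earlier formula reports an available capacity of 4 for two nodes that are still connecting, while the revised formula reports 2.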

Reference of other plugins:

Related Issues:

Testing done

Tested that results match expectations with a snapshot version of the plugin with:

  • various configurations
  • changing plugin config
  • scaling ASG directly in AWS console
  • multiple Jenkins (freestyle) projects

Sample logs:

  • excess workload:
>> Provisioning:
In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=2, onlineExecutors=2, connectingExecutors=0, busyExecutors=2, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=0}  
label [spot-workers]: queueLength 1 availableCapacity 0 (availableExecutors 0 plannedCapacitySnapshot 0 additionalPlannedCapacity 0)  
EC2FleetCloud [spot-workers] excessWorkload 1  
EC2FleetCloud [spot-workers] to provision = 1  
Planned 1 new nodes  
Started provisioning FleetNode-EC2FleetCloud-6 from EC2FleetCloud with 1 executors. Remaining excess workload: 0  
Provisioning completed 

>> Update cycle (actual provisioning using EC2 APIs):
EC2FleetCloud [spot-workers] start cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@2a1fd5ad  
EC2FleetCloud [spot-workers] Set target capacity to '3'  

>> ... After successful connection...

[hudson.slaves.NodeProvisioner lambda$update$6] FleetNode-EC2FleetCloud-3 provisioning successfully completed. We have now 4 computer(s)  

Before:

In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=3, onlineExecutors=3, connectingExecutors=0, busyExecutors=3, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=0}  
label [spot-workers]: currentDemand 1 availableCapacity 0 (availableExecutors 0 connectingExecutors 0 plannedCapacitySnapshot 0 additionalPlannedCapacity 0)  
EC2FleetCloud [spot-workers] excessWorkload 1  
EC2FleetCloud [spot-workers] to provision = 1  
Planned 1 new nodes  
After provisioning currentDemand=-1  
label [spot-workers]: currentDemand is less than 1, not provisioning  
Provisioning completed  
  • no excess workload:
In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=5, onlineExecutors=3, connectingExecutors=0, busyExecutors=3, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=2, additionalPlannedCapacity=0}  
label [spot-workers]: queueLength 1 availableCapacity 2 (availableExecutors 0 plannedCapacitySnapshot 2 additionalPlannedCapacity 0)  
label [spot-workers]: No excess workload, provisioning not needed. 

Before:

In NodeProvisioner.StrategyDecision -> apply StrategyState{label=spot-workers, snapshot=LoadStatisticsSnapshot{definedExecutors=3, onlineExecutors=2, connectingExecutors=0, busyExecutors=2, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=1, additionalPlannedCapacity=0}  
label [spot-workers]: currentDemand 0 availableCapacity 1 (availableExecutors 0 connectingExecutors 0 plannedCapacitySnapshot 1 additionalPlannedCapacity 0)  
label [spot-workers]: currentDemand is less than 1, not provisioning  
Provisioning completed  
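
The flow visible in the new sample logs (compute excess workload once, then reduce it as each node is planned, leaving queueLength untouched) might look roughly like the sketch below. This is an assumption-laden illustration, not the plugin's implementation: the class name, the `plan` method, the `executorsPerNode` input, and the clamping at zero are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical and not taken from the plugin.
final class ProvisionLoopSketch {

    // Returns the log lines a provisioning pass might emit.
    // executorsPerNode: executor count of each node that could be planned (assumed input).
    static List<String> plan(int queueLength, int availableCapacity, int[] executorsPerNode) {
        List<String> log = new ArrayList<>();
        // queueLength is never modified; only the remaining excess workload shrinks.
        int remaining = queueLength - availableCapacity;
        if (remaining <= 0) {
            log.add("No excess workload, provisioning not needed.");
            return log;
        }
        log.add("excessWorkload " + remaining);
        for (int executors : executorsPerNode) {
            if (remaining <= 0) {
                break; // workload already covered by the nodes planned so far
            }
            // Assumption: clamp at zero so the logged value never goes negative.
            remaining = Math.max(0, remaining - executors);
            log.add("Started provisioning node with " + executors
                    + " executors. Remaining excess workload: " + remaining);
        }
        return log;
    }

    public static void main(String[] args) {
        plan(1, 0, new int[] {1}).forEach(System.out::println); // excess workload case
        plan(1, 2, new int[] {1}).forEach(System.out::println); // no excess workload case
    }
}
```

Tracking a separate remaining-workload counter keeps the logged queue length stable, which avoids the negative currentDemand values seen in the "Before" logs.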

@pdk27 pdk27 requested a review from jillmon June 1, 2023 17:41
Collaborator

@cjerad cjerad left a comment

lgtm 👍

@pdk27 pdk27 removed the request for review from jillmon June 1, 2023 18:32
…tegy.java

Co-authored-by: Jerad C <jeradc@amazon.com>
@pdk27 pdk27 added the do not merge Don't merge this (at least not yet) label Jun 1, 2023
@pdk27 pdk27 merged commit 6bce254 into jenkinsci:master Jun 1, 2023