[CI] End-of-year CI budget report: cost exceeded projection #5176
Comments
There is a Jenkins GitHub plugin that triggers automated tests only after a committer comments "ok to test": https://wiki.jenkins.io/display/JENKINS/GitHub+pull+request+builder+plugin
@CodingCat Nice! Thanks for the link. Let me take a look at it. This should let us save some CI runs caused by WIP commits.
One idea is to add a script that throttles the provisioning of EC2 workers to stay within the monthly budget limit. If the limit is breached, no new EC2 worker would be launched.
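A minimal sketch of what such a throttle check could look like, assuming the script already knows the month-to-date spend and the hourly price of the worker it is about to launch. The names `BudgetState` and `may_launch_worker` are hypothetical and only illustrate the decision; fetching the actual spend and launching instances are left out.

```python
from dataclasses import dataclass


@dataclass
class BudgetState:
    """Hypothetical snapshot of the CI budget for the current month."""
    monthly_limit_usd: float
    spent_usd: float


def may_launch_worker(state: BudgetState,
                      hourly_rate_usd: float,
                      expected_hours: float = 1.0) -> bool:
    """Return True if launching one more worker keeps us within the limit.

    The projected spend is the month-to-date spend plus the estimated cost
    of the new worker (hourly rate times expected running time).
    """
    projected = state.spent_usd + hourly_rate_usd * expected_hours
    return projected <= state.monthly_limit_usd


if __name__ == "__main__":
    state = BudgetState(monthly_limit_usd=1000.0, spent_usd=990.0)
    if may_launch_worker(state, hourly_rate_usd=5.0):
        print("within budget: launch worker")
    else:
        print("budget exhausted: keep job queued")
```

Jobs refused by the check would simply stay queued until the next billing period or until someone raises the limit by hand.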
I agree. Even internally, we do not provision CI workers without limit; instead, we tend to queue up the tests (to prevent premature commits from taking too many resources).
@CodingCat I installed the GitHub Pull Request Builder plugin. I will watch the repository closely over the next few days to see whether the plugin is working. EDIT: This seems to clash with the GitHub Branch Source plugin. Will investigate.
For personal projects I've also used the [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/) feature to at least send email warnings, and I'm now looking into auto-shutdown of instances as well. I think the budget warnings, at least, should be implemented (it's easy), so you can intervene manually in case something goes wrong, like the wrong instance type eating up the whole budget.
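As a rough sketch of the email warning idea, the following builds a request for the `create_budget` call of the boto3 `budgets` client. The budget name, the 80% threshold, the account ID, and the email address are all placeholders chosen for illustration, not values from this project; check the request shape against the current boto3 documentation before relying on it.

```python
def build_budget_request(account_id: str,
                         monthly_limit_usd: float,
                         email: str,
                         threshold_pct: float = 80.0) -> dict:
    """Build keyword arguments for boto3's budgets.create_budget.

    All concrete values (budget name, threshold) are illustrative assumptions.
    """
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "ci-monthly-budget",  # hypothetical name
            "BudgetLimit": {"Amount": f"{monthly_limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                # Notify when actual spend exceeds threshold_pct of the limit.
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email},
                ],
            }
        ],
    }


def create_budget(request: dict) -> None:
    """Send the request to AWS; requires boto3 and valid credentials."""
    import boto3  # third-party; imported here so the builder works without it

    boto3.client("budgets").create_budget(**request)
```

Calling `create_budget(build_budget_request("123456789012", 1000.0, "maintainer@example.com"))` would register the budget; both arguments shown there are placeholders.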
I would also recommend that we look at alternatives. For example, GitHub Actions and Azure Pipelines are both good options for CPU CI (dask, Spark, Windows), and we could use Jenkins only for GPU CI, which could be triggered optionally.
@dmlc/xgboost-committer
In March 2019, AWS graciously granted us 12k USD worth of AWS credits to maintain our CI server (https://xgboost-ci.net). In addition, sponsors including NVIDIA have committed to monthly donations through Open Source Collective. Since then, we revamped our CI (#4234) and added test coverage for more platforms and targets (CUDA, multi-GPU, dask, Spark).
Now here is the end-of-year report. I had estimated the cloud cost to be 1000 USD/month, but thanks to active contributions over the year, the average cost has been higher, at 1600 USD/month. As a result, the 12k AWS credit ran out 3 months earlier than I expected. See the following table:
(*) There isn't actually enough balance in the donation account to cover this amount. For now, I will personally cover the difference.
I am reaching out to AWS for another round of donations. Absent additional funding, we will need to take drastic cost-saving measures. Let us find ways to keep the server running. I personally donated USD 3042 to keep the server going for another month, so that we can push out the 1.0 release.
Cost by EC2 instance type:
(**) I accidentally assigned the C5.9xlarge instance type to Windows workers, which inflated the cost by 700 USD. The issue has been fixed by downgrading them to C5.4xlarge.
Cost by AWS service: