The bot is running out of CPU credits #7261

Closed
HebaruSan opened this issue Jun 19, 2019 · 15 comments

Comments

@HebaruSan
Member

Problem

The status page claims that all sorts of things have been indexed over the past two days:

image

But that doesn't square with the commits in CKAN-meta:

image

On the forum the specific examples of PAWS and Trajectories were given:
https://forum.kerbalspaceprogram.com/index.php?/topic/154922-ckan-the-comprehensive-kerbal-archive-network-v1262-dragon%EF%BB%BF/&do=findComment&comment=3621485

And indeed, both have new releases on their repos, and the bot says it indexed them when they came out, but there's no new .ckan file.

So the bot is running, and it's finding new releases and updating its status file, but they're not getting indexed. Is there something weird with its git config that's stopping changes from being pushed to GitHub?

@techman83
Member

We're reaching another capacity limit for our brute force method of indexing.
2019-06-19_16:43:03
The box is out of CPU credits, so each run takes longer than the allowed timeout and is terminated before it has finished.

I have been pondering lately how to re-architect this, because whilst it has been a pretty robust service, it's expensive to spin up the Mono runtime ~1500 times and do all the processing around it. Ideally it would be really nice to write some Python bindings like you would with C/C++, but I'm unsure how doable that is with C#, or how much would even be possible. At the least, translation/validation/final output would ideally be done by netkan, even if Python did all the web calls etc.

Also the deployment is very manual, so making small changes and deploying them is time-consuming, which makes it hard to iterate quickly.

TLDR

I've disabled the indexer and I'll turn it back on tonight, along with changing the runs to every 6 hours.

@HebaruSan
Member Author

Huh, I always assumed it would be RAM or disk space that sank us.

Would it help if we had a netkan --bot command, which performed one whole bot pass in one instance of Mono?

@DasSkelett
Member

That lite_index option would be handy too probably.

@techman83
Member

I got carried away, as I've kinda wanted to re-architect it for some time: KSP-CKAN/CKAN#2789

HebaruSan pinned this issue Jun 19, 2019
@techman83
Member

techman83 commented Jun 20, 2019

Adjusting the schedule to every 8 hours gives the credits just enough time to recover before the next run. The box has been running with no spare credits for several weeks, so runs must have only recently started taking longer than the 3-hour limit (which I've just monkey-patched to 6 hours).

2019-06-20_16:15:17

@HebaruSan
Member Author

The status quo would be fine with one simple change: Suspend/resume when credits run out instead of terminating. Is there by any chance a server option for that?

HebaruSan changed the title from "Is the bot having trouble pushing changes?" to "The bot is running out of CPU credits" Jun 20, 2019
@techman83
Member

@HebaruSan - That's not an entirely silly idea. I could write a little awscli script that checks if there are more than X credits available, and the last run was outside the total run time. I'll check what IAM permissions the instance needs to pull that information.
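
A minimal sketch of what such a credit check could look like, written in Python with boto3 rather than as an awscli script. The function names, the threshold, and the reading of "the last run was outside the total run time" as "the last run started longer ago than the maximum run time" are all assumptions for illustration, not the bot's actual code. `CPUCreditBalance` is the CloudWatch metric that burstable EC2 instances report, and reading it needs the `cloudwatch:GetMetricStatistics` permission.

```python
from datetime import datetime, timedelta, timezone


def fetch_credit_balance(instance_id, region):
    """Return the latest CPUCreditBalance datapoint for an instance, or None.

    Hypothetical helper: needs boto3 and the cloudwatch:GetMetricStatistics
    permission on the instance role.
    """
    import boto3  # imported lazily so the decision logic below runs without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - timedelta(minutes=30),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


def should_start_run(credit_balance, last_run_started, now, min_credits, max_run_time):
    """Decide whether a new indexing run may start.

    Assumption: a run is allowed when the balance is above `min_credits` and
    the previous run started longer ago than `max_run_time`, so it cannot
    still be in progress.
    """
    if credit_balance is None or credit_balance <= min_credits:
        return False
    if last_run_started is None:
        return True  # no previous run recorded
    return now - last_run_started > max_run_time
```

A cron wrapper could call `fetch_credit_balance` and then `should_start_run`, and exit without starting the indexer when the answer is `False`.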

I also had a think about the queuing. Essentially it would be a case of adding the SDK (that might be hard or easy), setting the auth environment variables, and then opening a long-polling receive-message request in a loop and doing something with the results.

https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/EnableLongPolling.html
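
The receive loop described above can be sketched as follows. This is an illustration in Python with a boto3-style SQS client, not the .NET SDK the link refers to; `poll_queue`, `handle`, and `max_polls` are hypothetical names, and the queue URL is whatever the deployment would provide. Setting `WaitTimeSeconds` above zero is what makes the request a long poll.

```python
def poll_queue(sqs, queue_url, handle, max_polls=None, wait_seconds=20):
    """Long-poll an SQS queue and pass each message body to `handle`.

    `sqs` is any object with boto3's SQS client interface, for example
    boto3.client("sqs"), which picks up AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY from the environment.
    `max_polls=None` loops forever; a number bounds the loop (useful in tests).
    Returns the count of messages processed.
    """
    polls = 0
    processed = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        response = sqs.receive_message(
            QueueUrl=queue_url,
            WaitTimeSeconds=wait_seconds,  # > 0 enables long polling (max 20)
            MaxNumberOfMessages=10,
        )
        # An empty poll returns no "Messages" key at all.
        for message in response.get("Messages", []):
            handle(message["Body"])
            # Delete only after handling, so a crash leaves the message queued.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            processed += 1
    return processed
```

Deleting a message only after `handle` returns means a failed run leaves it on the queue to be retried once its visibility timeout expires.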

@HebaruSan
Member Author

This issue was about the bot not working, and it is functioning now. There are still things we would like to improve, but that's what KSP-CKAN/CKAN#2789 is for.

@DasSkelett
Member

Well actually, now it's the other way around: the commits are pushed, but it appears that the status page isn't updated?
The status page claims the last indexed mod is TarsierSpaceTechnologyWithGalaxies, but since then there have been more commits:
https://github.com/KSP-CKAN/CKAN-meta/commits/master

Sorting for last inflated / last checked tells me that it stopped updating the page at the end of the run (ZZZRadioTelescope).
Has this something to do with KSP-CKAN/NetKAN-status#8 @techman83?

@HebaruSan
Member Author

Those commits came from the SpaceDock web hook rather than the bot.
The part of KSP-CKAN/NetKAN-bot#80 where the web hook was supposed to update the status page does not work, I think because the web hook does not have access to the same status file.

@techman83
Member

Hmm, I thought that did work? We'll fix it properly with the re-architecture.

@HebaruSan
Member Author

> it is functioning now

I spoke too soon: the bot hasn't gotten to PoodsCalmNebulaSkybox in 2 days.

HebaruSan reopened this Jun 27, 2019
@techman83
Member

/me sighs, yup. I've disabled it and killed the current run. I'll whip up a credit checker over one of my coffee breaks today.

@techman83
Member

2019-06-28_09:48:21

Think I just found a problem with NetKAN; I'll raise an issue.

@HebaruSan
Member Author

Working again thanks to #7269.
KSP-CKAN/CKAN#2816 should fix the underlying issue (this time).

HebaruSan unpinned this issue Jul 1, 2019