This repository has been archived by the owner on Nov 23, 2023. It is now read-only.

Cull CI log groups periodically #2319

Closed
Jimlinz opened this issue Nov 17, 2022 · 4 comments
Labels
enabler story Enable to team to improve

Comments

@Jimlinz
Contributor

Jimlinz commented Nov 17, 2022

Our CI uses seed prefixes. This is useful in many ways, especially for troubleshooting and for setting the stage for running pytest tests in parallel. However, one downside is that each seed / CI run creates a handful of new log groups (one per lambda per CI run). This adds up very quickly.

While there is an option to make individual log entries expire, there isn't an option to delete the log group itself within the geostore stack. AFAIK log group RemovalPolicy.DESTROY isn't an option (see https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_lambda_python_alpha/PythonFunction.html).

This isn't a problem in production, but it is an issue in CI, where CloudWatch logs are pretty much unusable past 10,000 groups.

We should look at various options to keep the number of log groups in check in CI. This would not only keep things tidy and keep costs down (fewer logs), but also allow us to actually view and search the logs when troubleshooting.

Options:

  • GitHub actions to cull logs on a schedule
  • Lambda function to cull logs on a schedule
  • Find a way for the stack to clean up after itself (delete its own log groups) during teardown (but this would make the logging less useful, as the logs would be gone before anyone could look at them for troubleshooting purposes)
  • ???
  • Profit
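
As a rough illustration of the scheduled-cull options above, here is a minimal Python sketch that deletes log groups under a given name prefix once they are older than a cutoff. It is not the approach that was eventually merged; the prefix `/aws/lambda/ci`, the seven-day cutoff, and the function names `is_stale`, `cull_log_groups` and `handler` are all assumptions made for illustration. It relies on the boto3 CloudWatch Logs client calls `describe_log_groups` (via a paginator) and `delete_log_group`.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def is_stale(log_group: dict, cutoff: datetime) -> bool:
    """Return True if the log group was created before the cutoff."""
    # CloudWatch Logs reports creationTime in milliseconds since the Unix epoch.
    created = datetime.fromtimestamp(
        log_group["creationTime"] / 1000, tz=timezone.utc
    )
    return created < cutoff


def cull_log_groups(
    logs_client,
    prefix: str,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Delete log groups under `prefix` older than `max_age`; return their names."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age

    # Collect candidates first so deletion does not interfere with pagination.
    stale_names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            if is_stale(group, cutoff):
                stale_names.append(group["logGroupName"])

    for name in stale_names:
        logs_client.delete_log_group(logGroupName=name)
    return stale_names


def handler(event, context):
    """Hypothetical entry point for a scheduled Lambda function."""
    import boto3  # imported here so the helpers above can be tested without AWS

    # Assumed prefix and retention period; adjust to the real CI naming scheme.
    deleted = cull_log_groups(
        boto3.client("logs"), "/aws/lambda/ci", timedelta(days=7)
    )
    return {"deleted": deleted}
```

The same `cull_log_groups` function could be called from a script in a scheduled GitHub Actions workflow instead of a Lambda function. Keeping a cutoff rather than deleting everything leaves recent logs available for troubleshooting.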

Please feel free to add your thoughts and suggestions below

@Jimlinz Jimlinz added the enabler story Enable to team to improve label Nov 17, 2022
@billgeo billgeo moved this to 📋 Backlog in Data Infrastructure Squad Nov 18, 2022
@billgeo billgeo moved this from 📋 Backlog to 🔖 Ready in Data Infrastructure Squad Nov 18, 2022
@l0b0
Contributor

l0b0 commented Nov 18, 2022

We should ideally have tests for what goes into the logs (and we do for a bunch of things), so the logs in the CI instances shouldn't really provide any new info we couldn't gather from the CI output. So I'd be in favour of tearing these down after the tests.

@billgeo billgeo moved this from 🔖 Ready to 📋 Backlog in Data Infrastructure Squad Nov 18, 2022
@billgeo
Contributor

billgeo commented Nov 18, 2022

I agree. And if there are any artifacts we need after CI, they should be kept as GitHub artifacts, which can then be configured to be deleted after a set period.

@billgeo billgeo moved this from 📋 Backlog to 🏗 Doing / Implementing in Data Infrastructure Squad Nov 24, 2022
@Jimlinz Jimlinz moved this from 🏗 Doing / Implementing to 👀 Reviewing in Data Infrastructure Squad Nov 24, 2022
@Jimlinz
Contributor Author

Jimlinz commented Nov 24, 2022

Merged: #2338

@Jimlinz Jimlinz moved this from 👀 Reviewing to 🏗 Doing / Implementing in Data Infrastructure Squad Nov 24, 2022
@Jimlinz Jimlinz self-assigned this Nov 24, 2022
@billgeo
Contributor

billgeo commented Nov 29, 2022

This has been done. And all old log groups will eventually be removed.

@billgeo billgeo closed this as completed Nov 29, 2022
Repository owner moved this from 🏗 Doing / Implementing to 👀 Reviewing in Data Infrastructure Squad Nov 29, 2022
@billgeo billgeo moved this from 👀 Reviewing to ✅ Closed in Data Infrastructure Squad Nov 29, 2022

3 participants