
fix ExpiringDict caching solution #1200

Merged
ccostino merged 7 commits into main from grrr on Jul 22, 2024
Conversation

@terrazoon (Contributor) commented on Jul 19, 2024

Description

For various reasons we are currently forced to use an in-memory cache to store the job information used to populate the reports. Although we increased the cache time to 7 days and the maximum number of objects to 20,000, there are still intermittent problems retrieving the data being reported.
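
For context, a minimal sketch of the kind of cache configuration described above (illustrative only; the variable name is not taken from this repo's code):

```python
from expiringdict import ExpiringDict

# Illustrative only: an in-memory cache capped at 20,000 entries,
# each entry expiring after 7 days.
JOB_CACHE = ExpiringDict(max_len=20_000, max_age_seconds=7 * 24 * 60 * 60)
```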

We found two separate issues:

  1. Sometimes we cannot download a csv file (404) even though we know that csv was successfully uploaded and the message in the csv was successfully sent. This should not happen, and the reason it does is currently a mystery (it seems to happen on Fridays!). We are going to set this issue aside for the moment.

  2. Every time the app restarts, the in-memory cache is obviously wiped out, and it turns out the app restarts more frequently, with more dire consequences for report generation, than we anticipated. In addition, no in-memory solution can be shared across processes, so we will need a separate cache for every gunicorn worker.

So the solution proposed here is to run a 'repair' task periodically that checks the cache to make sure it contains all jobs currently in the bucket.

NOTE: The 'best practice' here would be to use redis or memcached, but until or unless we decide to go that route, an ExpiringDict that we rebuild on a schedule seems like a workable solution. It potentially requires more memory (although how much memory one instance of the cache uses is unknown, and probably not much).
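
As a rough illustration of that idea (a sketch only, not the actual implementation in app/aws/s3.py; the bucket name and the way job data is read back are placeholders), the scheduled repair task could look something like this:

```python
import boto3

s3 = boto3.client("s3")


def regenerate_job_cache(cache, bucket="example-csv-upload-bucket"):
    """Scheduled 'repair' task (sketch): make sure every job object in the
    bucket has an entry in the in-memory cache (e.g. the ExpiringDict above),
    repopulating anything that was lost when the app restarted."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key not in cache:
                # Cache miss: reload the object from the bucket.
                body = s3.get_object(Bucket=bucket, Key=key)["Body"]
                cache[key] = body.read().decode("utf-8")
```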

Security Considerations

N/A

@terrazoon self-assigned this Jul 19, 2024
@terrazoon marked this pull request as draft July 19, 2024 21:40
@terrazoon marked this pull request as ready for review July 22, 2024 14:21
@terrazoon requested review from ccostino and a team July 22, 2024 14:24
@xlorepdarkhelm (Contributor) left a comment


See my question below.

app/aws/s3.py (review thread, resolved)
app/aws/s3.py (review thread, outdated, resolved)
@xlorepdarkhelm (Contributor) left a comment


LGTM

@ccostino (Contributor) left a comment


Thanks, @terrazoon!

Let's see if this helps alleviate some of the pain at least and buys us more time and space to figure out our next steps with all of this.

@ccostino merged commit 26c6d39 into main on Jul 22, 2024
7 checks passed
@ccostino deleted the grrr branch July 22, 2024 21:15