Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add redis and a task queue #35

Closed
mlissner opened this issue Jan 28, 2023 · 10 comments · Fixed by #53 or #62
Closed

Add redis and a task queue #35

mlissner opened this issue Jan 28, 2023 · 10 comments · Fixed by #53 or #62
Milestone

Comments

@mlissner
Copy link
Member

mlissner commented Jan 28, 2023

I think we're going to need this:

  • If we can wait a few seconds from when we get a webhook event, the doc might become available, if it's popular. Right now we don't have a way to delay doing something.
  • We don't have a good way to store idempotency keys. We could make a DB table for them, but redis with an expiration date would be better.
  • Receiving webhook events from CL has to be fast. There's a short timeout. That means that we need to store the webhook event, and give it a 200 response. Then, asynchronously, we should do things.

We use celery for our task queue in Django, and I've never liked it very much. Four years ago I did a review of rq, but it wasn't good enough and I had to abandon it. Since then, it has slowly matured, and I think it'll have all the things we need for this project while being a LOT simpler than Celery.

@mlissner mlissner added this to the pre-launch milestone Jan 28, 2023
@mlissner
Copy link
Member Author

I marked this for the pre-launch milestone. I'd be happy to push it back, if we think we can launch without it, but I don't think we really can. Can we?

@anseljh
Copy link
Member

anseljh commented Jan 28, 2023 via email

@mlissner
Copy link
Member Author

FWIW, here's my comparison from a few years ago between rq, huey, datamatiq, and Celery: https://docs.google.com/spreadsheets/d/1Q_v__QtYzDtIDOkfmeQqVAw_jiksBz-GbJn-m-pDM34/edit#gid=0

@mlissner
Copy link
Member Author

I think a lot of the "noes" and blanks in the RQ column are now yeses and knowns.

@mlissner
Copy link
Member Author

mlissner commented Feb 1, 2023

I think there are two parts to this, or maybe three:

  1. We need redis set up in docker-compose and in prod (where we use Elasticache).

  2. We need a worker set up in docker-compose and deployed in k8s.

  3. There's a Sentry integration for rq. We'll want that set up via the Sentry settings.

  4. Once Sentry is configured properly (and rq installed properly), we should set up a URL that triggers a failing rq task. CL has this as celery_fail, but it's the same idea.

When all of that is wired up properly, you can hit the URL to trigger the failing rq task. Django will load that URL, which enqueues an rq task. The worker will pick that up and it'll fail. Sentry will pick up the failure, and we'll know it worked top to bottom.

@mlissner
Copy link
Member Author

mlissner commented Feb 1, 2023

OK, sorry, that's four parts. 🤦

@mlissner
Copy link
Member Author

mlissner commented Feb 1, 2023

Something else to do (five parts):

  • Look at CL's settings and see which parts need updating to work with redis. Two that I noticed today are setting the cache to use Redis and setting up sessions to use it (we don't use sessions much yet, but we will!)

@mlissner
Copy link
Member Author

mlissner commented Feb 3, 2023

Sorry, one more thing to do (six parts):

  • Add redis to the health check endpoint

@mlissner
Copy link
Member Author

mlissner commented Feb 7, 2023

Hm, looks like we can't reach the redis host from our servers, and that when this happens our health check fails. When our health check fails, it restarts our pods over and over, so I tweaked our k8s config to stop doing that and I'm debugging now.

@mlissner mlissner reopened this Feb 7, 2023
@mlissner
Copy link
Member Author

mlissner commented Feb 7, 2023

I always bang my shins with encryption and today is no exception. Instead of connecting with redis://, we need to connect with rediss://, which uses encryption in transit. That's needed because I set up the node to require it when I created the node. It's fixed and deployed: https://bots.law/monitoring/health-check/

@mlissner mlissner closed this as completed Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
2 participants