Database Errors When Running rqworker-pool #650
To add to this: `worker-pool` works fine as long as the worker count is only 1. E.g., running the pool with a single worker works, but running it with more than one worker fails.

A relevant hint may lie in how the pool forks a separate OS process for each worker.
Another set of hints is found in this article: https://medium.com/@philamersune/fixing-ssl-error-decryption-failed-or-bad-record-mac-d668e71a5409

> "The SSL error: decryption failed or bad record mac occurs either when the certificate is invalid or the message hash value has been tampered; in our case it’s because of the latter. Django creates a single database connection when it tries to query for the first time. Any subsequent calls to the database will use this existing connection until it is expired or closed, in which it will automatically create a new one the next time you query. The PostgreSQL engine in Django uses psycopg to talk to the database; according to the document it is level 2 thread safe. Unfortunately, the timeout() method is using multiprocessing module and therefore tampers the SSL MAC. There are different ways to fix this. We can either (1) use basic threads instead of spawning a new process or (2) use a new database connection in the timeout() method. We can also (3) scrap the timeout() method altogether and handle the async task properly via Celery."
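The failure mode the article describes can be mimicked without a database: a connection-like object created in the parent is inherited by every forked child, and the remedy is for each child to discard it and open its own. A minimal pure-Python sketch of that idea (`FakeConnection` is a hypothetical stand-in for Django's lazily created connection, not a real rq or Django API):

```python
import multiprocessing as mp
import os


class FakeConnection:
    """Hypothetical stand-in for Django's lazily created DB connection."""
    def __init__(self):
        self.owner_pid = os.getpid()  # the process that "opened" it

    def close(self):
        pass  # a real connection would tear down the socket/SSL session here


conn = FakeConnection()  # created in the parent, inherited by forked children


def worker(q):
    global conn
    # The remedy from the article: discard the inherited connection so each
    # child talks to the database over its own (SSL) session.
    conn.close()
    conn = FakeConnection()
    q.put(conn.owner_pid)


def run(n=2):
    ctx = mp.get_context("fork")  # worker-pool forks its workers
    q = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(q,)) for _ in range(n)]
    for p in procs:
        p.start()
    owner_pids = [q.get() for _ in range(n)]
    for p in procs:
        p.join()
    # True iff every worker opened its own connection, none kept the parent's
    return len(set(owner_pids)) == n and os.getpid() not in owner_pids
```

Without the reopen step, every `owner_pid` would be the parent's PID — the shared-session situation that corrupts the SSL MAC in the real setup.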
I found a solution, but it's unclear how to integrate it into this set of libraries.
The change would be to modify the start-up code for each new process, i.e. https://github.com/rq/rq/blob/3ad86083c33ec28b81a07f94dafdcf1cd56429ea/rq/worker_pool.py#L243. The change is to insert the following lines at that position (i.e. at the start of the function each forked process runs):

```python
from django.db import connections

# Another complication arises if someone is using a DB alias that is not
# "default"... I guess this would need to be configurable.
connections["default"].close()
```
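The alias concern above could be handled by making the cleanup step iterate over whichever aliases the deployment uses. A pure-Python sketch of that configurable start-up step, with hypothetical `FakeConn` objects standing in for Django's `connections` mapping (real code would use `django.db.connections`):

```python
import multiprocessing as mp
import os


class FakeConn:
    """Hypothetical stand-in for one Django database connection."""
    def __init__(self):
        self.closed_in = None  # pid of the process that closed us, if any

    def close(self):
        self.closed_in = os.getpid()


# stand-in for django.db.connections: alias -> connection, built in the parent
connections = {"default": FakeConn(), "replica": FakeConn()}


def child_entry(q, aliases):
    # the proposed start-up code, made configurable over DB aliases
    for alias in aliases:
        connections[alias].close()
    q.put(sorted(a for a, c in connections.items()
                 if c.closed_in == os.getpid()))


def run(aliases):
    ctx = mp.get_context("fork")  # mirrors how the pool forks workers
    q = ctx.Queue()
    p = ctx.Process(target=child_entry, args=(q, aliases))
    p.start()
    closed = q.get()
    p.join()
    return closed
```

Because fork gives the child copy-on-write memory, closing connections in the child leaves the parent's copies untouched — which is exactly why the cleanup has to happen inside each worker process.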
Perhaps one API design solution would be to provide a way for users to override the `WorkerPool` class? Another could be to provide a hook.
Ah ok. Django-RQ's
I can think of two options:
I can provide hooks to make it easier, but it'd be better to have a working proof of concept so I know which hooks to provide. I was thinking we could do this via
I was thinking about this a bit more and wanted to share more information:
As for specific code, I ran something like this. I'm sure there are better ways to reduce duplication and make better use of subclassing etc., but it's a POC so you understand the direction:

```python
from multiprocessing import Process
from typing import Optional
from uuid import uuid4

from django.db import connections
from rq.worker_pool import WorkerData, WorkerPool, run_worker


def run_worker_with_new_db_connection(*args, **kwargs):
    # Close the connection inherited from the parent so this forked worker
    # opens a fresh one (and a fresh SSL session) on its first query.
    # Another complication arises if someone is using a DB alias that is
    # not "default"... I guess this would need to be configurable.
    alias = "default"
    connections[alias].close()
    run_worker(*args, **kwargs)


class Psycopg2CompatibleWorkerPool(WorkerPool):
    def start_worker(
        self,
        count: Optional[int] = None,
        burst: bool = True,
        _sleep: float = 0,
        logging_level: str = "INFO",
    ):
        """
        Starts a worker and adds the data to worker_datas.

        * sleep: waits for X seconds before creating worker, for testing purposes
        """
        name = uuid4().hex
        process = Process(
            # the only change from WorkerPool.start_worker: a wrapped target
            target=run_worker_with_new_db_connection,
            args=(name, self._queue_names, self._connection_class, self._pool_class, self._pool_kwargs),
            kwargs={
                '_sleep': _sleep,
                'burst': burst,
                'logging_level': logging_level,
                'worker_class': self.worker_class,
                'job_class': self.job_class,
                'serializer': self.serializer,
            },
            name=f'Worker {name} (WorkerPool {self.name})',
        )
        process.start()
        worker_data = WorkerData(name=name, pid=process.pid, process=process)  # type: ignore
        self.worker_dict[name] = worker_data
        self.log.debug('Spawned worker: %s with PID %d', name, process.pid)
```
@jackkinsella take a look at https://github.com/rq/rq/pull/2052/files. I think having this PR merged into RQ would provide a reasonable place for django-rq and other frameworks to hook into.
@selwin Would the idea be that django_rq (or individual user code) could override `get_worker_process` and add their own implementation without needing to mess with the larger method?
Yes, that's the general idea. I thought about making a method that returns the callable to run the worker, but I think returning the actual process would give more flexibility.
So what would be some pseudocode if someone using the library wanted to override that function? I'm wondering if we need to provide an easy hook, e.g. some way for people to modify their Django settings file to specify a location for the override function?
Well actually, I guess the Django RQ library would be the ideal place to override it. All Django users will have this problem for all DBs, I believe, since this is due to how SSL works after forking.
@jackkinsella I released RQ 1.16.1 with this change.
Sure, I can take this on!
Added feedback here instead: https://github.com/rq/django-rq/pull/655/files
I'm trying to switch from multiple workers for different queues to `worker-pool`, using the command below.

I'm getting database errors:

Other database errors appear if I use the `SimpleWorker` class.
This happens on Heroku (Ubuntu OS); the database is Heroku Postgres 13.13.
python 3.11.0
django==4.2.8
django-redis==5.4.0
rq==1.15.1
django-rq==2.10.1
RQ works fine as a single worker for each queue, and these errors happen only when switching to `worker-pool`. Is there anything I need to change in the database configuration?