Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker process killed during run_maintenance_tasks() #726

Closed
tdruez opened this issue May 12, 2023 · 3 comments
Closed

Worker process killed during run_maintenance_tasks() #726

tdruez opened this issue May 12, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@tdruez
Copy link
Contributor

tdruez commented May 12, 2023

make log SERVICE=worker
scancodeio-worker-1  | INFO 2023-05-11 16:26:44.90 Progress: 90% (307,711/341,901) ETA: 17 seconds
scancodeio-worker-1  | 16:27:09 Sent heartbeat to prevent worker timeout. Next one should arrive within 90 seconds.
scancodeio-worker-1  | 16:28:05 Received message: {'type': 'message', 'pattern': None, 'channel': b'rq:pubsub:42ab127f8898470e95b6db849e3db6e3', 'data': b'{"command": "stop-job", "job_id": "7959f197-87f8-4a0e-ac5d-9c06c41edd3c"}'}
scancodeio-worker-1  | 16:28:05 Received command to stop job 7959f197-87f8-4a0e-ac5d-9c06c41edd3c
scancodeio-worker-1  | 16:28:05 Killed horse pid 29
scancodeio-worker-1  | 16:28:05 Job stopped by user, moving job to FailedJobRegistry
scancodeio-worker-1  | 16:28:05 Handling failed execution of job 7959f197-87f8-4a0e-ac5d-9c06c41edd3c
scancodeio-worker-1  | 16:28:05 Sent heartbeat to prevent worker timeout. Next one should arrive within 480 seconds.
scancodeio-worker-1  | 16:28:05 Cleaning registries for queue: default
scancodeio-worker-1  | INFO Synchronizing QUEUED and RUNNING Run with their related Jobs...
scancodeio-worker-1  | 16:28:06 Worker rq:worker:42ab127f8898470e95b6db849e3db6e3: found an unhandled exception, quitting...
scancodeio-worker-1  | Traceback (most recent call last):
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 98, in inner
scancodeio-worker-1  |     return func(*args, **kwargs)
scancodeio-worker-1  |            ^^^^^^^^^^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 829, in fetchmany
scancodeio-worker-1  |     self._check_result_for_fetch()
scancodeio-worker-1  | 16:28:06 Registering death
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 606, in _check_result_for_fetch
scancodeio-worker-1  |     raise e.ProgrammingError("the last operation didn't produce a result")
scancodeio-worker-1  | psycopg.ProgrammingError: the last operation didn't produce a result
scancodeio-worker-1  |
scancodeio-worker-1  | The above exception was the direct cause of the following exception:
scancodeio-worker-1  |
scancodeio-worker-1  | Traceback (most recent call last):
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 780, in work
scancodeio-worker-1  |     self.run_maintenance_tasks()
scancodeio-worker-1  |   File "/app/scancodeio/worker.py", line 54, in run_maintenance_tasks
scancodeio-worker-1  |     scanpipe_app.sync_runs_and_jobs()
scancodeio-worker-1  |   File "/app/scanpipe/apps.py", line 201, in sync_runs_and_jobs
scancodeio-worker-1  |     if queued_or_running:
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 412, in __bool__
scancodeio-worker-1  |     self._fetch_all()
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1881, in _fetch_all
scancodeio-worker-1  |     self._result_cache = list(self._iterable_class(self))
scancodeio-worker-1  |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
scancodeio-worker-1  |     results = compiler.execute_sql(
scancodeio-worker-1  |               ^^^^^^^^^^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1593, in execute_sql
scancodeio-worker-1  |     return list(result)
scancodeio-worker-1  |            ^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 2091, in cursor_iter
scancodeio-worker-1  |     for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 2091, in <lambda>
scancodeio-worker-1  |     for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
scancodeio-worker-1  |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 97, in inner
scancodeio-worker-1  |     with self:
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
scancodeio-worker-1  |     raise dj_exc_value.with_traceback(traceback) from exc_value
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 98, in inner
scancodeio-worker-1  |     return func(*args, **kwargs)
scancodeio-worker-1  |            ^^^^^^^^^^^^^^^^^^^^^
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 829, in fetchmany
scancodeio-worker-1  |     self._check_result_for_fetch()
scancodeio-worker-1  |   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 606, in _check_result_for_fetch
scancodeio-worker-1  |     raise e.ProgrammingError("the last operation didn't produce a result")
scancodeio-worker-1  | django.db.utils.ProgrammingError: the last operation didn't produce a result
scancodeio-worker-1  | 16:28:06 Unsubscribing from channel rq:pubsub:42ab127f8898470e95b6db849e3db6e3

It seems that a pipeline Run was manually stopped. The run_maintenance_tasks is eventually called by the worker, triggering the scanpipe_app.sync_runs_and_jobs().
The exception is raised when evaluating the Django QuerySet:

run_model.objects.queued_or_running()
@tdruez tdruez added the bug Something isn't working label May 12, 2023
@tdruez
Copy link
Contributor Author

tdruez commented May 12, 2023

To go further and look at the RQ failed Jobs:

docker compose run worker bash
from rq import Queue
from rq.job import Job
from rq.registry import FailedJobRegistry
import django_rq

redis = django_rq.get_connection()
queue = Queue(connection=redis)
registry = FailedJobRegistry(queue=queue)

# Show all failed job IDs and the exceptions they caused during runtime
for job_id in registry.get_job_ids():
    job = Job.fetch(job_id, connection=redis)
    print("-" * 98)
    print(job_id, job.exc_info)
59aa9157-c63f-4547-8173-be9386955483 Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
psycopg.OperationalError: consuming input failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 1359, in perform_job
    rv = job.perform()
         ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1178, in perform
    self._result = self._execute()
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rq/job.py", line 1215, in _execute
    result = self.func(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scanpipe/tasks.py", line 52, in execute_pipeline_task
    run = get_run_instance(run_pk)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scanpipe/tasks.py", line 37, in get_run_instance
    return run_model.objects.get(pk=run_pk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 633, in get
    num = len(clone)
          ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 380, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1881, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 723, in execute
    raise ex.with_traceback(None)
django.db.utils.OperationalError: consuming input failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 331, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 840, in cursor
    self._check_connection_ok()
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 479, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
psycopg.OperationalError: the connection is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 1380, in perform_job
    self.execute_failure_callback(job, *exc_info)
  File "/usr/local/lib/python3.11/site-packages/rq/worker.py", line 1335, in execute_failure_callback
    job.failure_callback(job, self.connection, *exc_info)
  File "/app/scanpipe/tasks.py", line 45, in report_failure
    run = get_run_instance(run_pk=job.id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scanpipe/tasks.py", line 37, in get_run_instance
    return run_model.objects.get(pk=run_pk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 633, in get
    num = len(clone)
          ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 380, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1881, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1558, in execute_sql
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 307, in _cursor
    with self.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 331, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 840, in cursor
    self._check_connection_ok()
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 479, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
django.db.utils.OperationalError: the connection is closed

@tdruez
Copy link
Contributor Author

tdruez commented May 12, 2023

This may be related to the database connection persistence in the context of long-running pipeline jobs.
https://docs.djangoproject.com/en/4.2/ref/databases/#persistent-connections

tdruez added a commit that referenced this issue May 12, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 12, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 12, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 12, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 15, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 18, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 26, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez
Copy link
Contributor Author

tdruez commented May 30, 2023

This appears to be fixed with changes from #748

@tdruez tdruez closed this as completed May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant