Cancel queued and running pipeline task #176

Closed
tdruez opened this issue May 11, 2021 · 7 comments

Comments

@tdruez
Contributor Author

tdruez commented May 12, 2021

See also #134

@tdruez
Contributor Author

tdruez commented Aug 9, 2021

https://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks

Calling app.control.revoke(task_id, terminate=True) would be the best approach, but the terminate option is only supported by the prefork pool, while we are using the thread pool.

revoke: Revoking tasks
pool support: all, terminate only supported by prefork

@tdruez
Contributor Author

tdruez commented Aug 9, 2021

@pombredanne Note that we picked the "thread" pool instead of the default "prefork" one because we were encountering the following error when entering the multiprocessing section of the code:
daemonic processes are not allowed to have children

This issue is related to a conflict between Celery, which uses billiard, and the toolkit, which uses multiprocessing, if I remember correctly.
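For readers who have not met this error, here is a minimal sketch that reproduces the message with the standard library alone. It assumes a POSIX system where the "fork" start method is available, and the function names are hypothetical. Whether this is exactly what happens inside a Celery prefork worker is an assumption based on the comment above, not something this sketch proves.

```python
import multiprocessing as mp


def _noop():
    """Target for the grandchild process; it never actually runs."""


def _try_to_start_child(conn):
    """Runs inside a daemonic process and reports what happens."""
    try:
        child = mp.Process(target=_noop)
        child.start()  # refused: the current process is daemonic
        child.join()
        conn.send("child started")
    except AssertionError as exc:
        conn.send(str(exc))
    finally:
        conn.close()


def reproduce_daemonic_error():
    # Assumption: "fork" is available (POSIX only). It keeps the sketch
    # free of the __main__ guard that the "spawn" start method requires.
    ctx = mp.get_context("fork")
    reader, writer = ctx.Pipe(duplex=False)
    daemon = ctx.Process(target=_try_to_start_child, args=(writer,), daemon=True)
    daemon.start()
    writer.close()  # the parent only reads
    message = reader.recv() if reader.poll(10) else "no answer"
    daemon.join(timeout=10)
    return message
```

The check lives in the standard multiprocessing module, so any code path that starts a process from inside a daemonic worker would hit it, regardless of what the child is supposed to do.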

We may want to revisit this since the "thread" pool does not support several Celery features like terminate and timeouts.
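To illustrate why terminate and timeouts are hard to offer on top of threads, here is a small sketch using only concurrent.futures, which the thread pool is based on according to this thread. The function name is hypothetical and this is not Celery code. It shows that a timeout only stops the caller from waiting, and that a future that is already running cannot be cancelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demo_thread_limits():
    started = threading.Event()
    release = threading.Event()

    def long_task():
        started.set()
        release.wait(timeout=5)  # stands in for a long-running pipeline
        return "finished"

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(long_task)
        started.wait(timeout=5)  # make sure the worker thread picked it up

        timed_out = False
        try:
            future.result(timeout=0.1)
        except FutureTimeout:
            # Only the wait is abandoned; the thread keeps running.
            timed_out = True

        cancelled = future.cancel()  # a running future cannot be cancelled
        still_running = future.running()

        release.set()  # let the task end on its own
        result = future.result(timeout=5)

    return timed_out, cancelled, still_running, result
```

Python has no supported way to kill a thread from the outside, so the task above only ends because it is allowed to finish.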

@tdruez tdruez added this to the 2021-08 milestone Aug 9, 2021
@tdruez tdruez self-assigned this Aug 9, 2021
@pombredanne
Member

@tdruez

We may want to revisit this since the "thread" pool does not support several Celery features like terminate and timeouts.

Let's do it.
One thing we could do on the SCTK side is accept callables and pool-like objects as arguments, so that it uses a SCIO-provided one when available.
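A rough sketch of what accepting a pool-like object could look like is shown here. The function name scan_resources, its parameters, and the fallback to ThreadPoolExecutor are all assumptions made for illustration; this is not the toolkit's actual API. The only contract assumed for the injected pool is a map(func, iterable) method.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_resources(paths, scan_func, pool=None, max_workers=2):
    """Apply scan_func to each path, using the caller's pool if given.

    `pool` is any object exposing map(func, iterable), for example a
    concurrent.futures executor owned by the calling application.
    """
    if pool is not None:
        # The caller owns the pool and its lifecycle; do not shut it down.
        return list(pool.map(scan_func, paths))

    # Fallback: create and clean up a private pool.
    with ThreadPoolExecutor(max_workers=max_workers) as own_pool:
        return list(own_pool.map(scan_func, paths))
```

With this shape, the caller decides which concurrency primitive is used, and the library code no longer needs to import one directly.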

@tdruez
Contributor Author

tdruez commented Aug 10, 2021

@pombredanne The problem is not only on the TK side: the multiprocessing code in ScanCode.io is entirely based on the concurrent.futures module.

@tdruez
Contributor Author

tdruez commented Aug 16, 2021

Current concurrency setup:

  • Celery workers with pool=thread (based on concurrent.futures.ThreadPoolExecutor)
    -> Handles the parallelisation of Pipeline executions
  • ScanCode.io pipes based on concurrent.futures.ProcessPoolExecutor
    -> Handles the distribution of scanning processes over the multiple CPUs (multiprocessing)
  • ScanCode scan_resource based on multiprocessing and _thread modules
    -> Handles the timeouts of scanning functions

We decided to use the thread pool for the Celery workers because the default prefork pool (based on billiard) does not work with the multiprocessing Python module that is used in ScanCode-toolkit and ScanCode.io.

The issue: The thread pool does not support several Celery features like terminate and timeouts.

Possible solutions:

  1. Keep everything as-is and implement custom methods for terminate and timeouts on the TaskPool class.
    -> There are likely good reasons why Celery does not support this though...

  2. Switch back to pool=prefork for Celery workers, rewrite all the pipes code that is based on ProcessPoolExecutor, and tweak the toolkit to replace its multiprocessing dependencies.
    -> This will likely make the code very dependent on billiard, which is too specific to Celery

  3. Simplify the whole concurrency setup to be only based on Celery, where everything down to single resource scanning would be an async task.
    -> We may need to deal with multiple queues to keep Pipeline execution separate from resource scanning. Also, cancelling a Pipeline execution for which thousands of resource scan tasks have been created may be even worse to handle.

  4. Explore a new queue library:
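Going back to option 1 in the list above, one common way to stop work running in a thread is cooperative cancellation: the task checks a flag between steps and exits by itself. The sketch below is an illustration only. The class name CancellableRun and its methods are hypothetical, and nothing in this thread says this is the approach that was finally implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CancellableRun:
    """Runs a list of step callables, checking a stop flag between steps."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.completed = []
        self._stop = threading.Event()

    def cancel(self):
        # Safe to call from any thread; takes effect before the next step.
        self._stop.set()

    def execute(self):
        for step in self.steps:
            if self._stop.is_set():
                return "stopped"
            self.completed.append(step())
        return "success"


def run_and_cancel_after_second_step():
    run = CancellableRun([])
    run.steps = [
        lambda: "step-1",
        lambda: run.cancel() or "step-2",  # simulate a stop request mid-run
        lambda: "step-3",
    ]
    with ThreadPoolExecutor(max_workers=1) as executor:
        status = executor.submit(run.execute).result(timeout=5)
    return status, run.completed
```

The limitation is visible in the design: a step that is already running is never interrupted, so a single long scan would still run to completion before the stop takes effect.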

tdruez added a series of commits that referenced this issue between Sep 8, 2021 and Oct 8, 2021 (Signed-off-by: Thomas Druez <tdruez@nexb.com>). Commit messages shown in the thread:

  • Sep 16, 2021: The interruptible system in the toolkit does not play nice with ProcessPool
  • Sep 27, 2021: … mode #176
  • Sep 30, 2021: …ings #176

@tdruez
Contributor Author

tdruez commented Oct 8, 2021

The ability to stop a running pipeline and delete a queued pipeline is available in the latest release: https://github.com/nexB/scancode.io/releases/tag/v30.0.0
