Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the calculation for the number of threads to be used #39

Merged
merged 4 commits into from
Sep 26, 2024
Merged

Conversation

haoyu-zc
Copy link
Collaborator

@haoyu-zc haoyu-zc commented Sep 24, 2024

There's a bug when n_jobs is provided as a negative number. Here's the original contract and code:

Docstring:
n_jobs: number of CPU cores to use for parallelization. The value
None will use all available cores (os.cpu_count()), and negative
values will use os.cpu_count() - n_jobs. Default is 1.

Code:

# get the number of cores to use
n_jobs = os.cpu_count() if n_jobs is None else n_jobs
default_n_threads = (os.cpu_count() - n_jobs) if n_jobs < 0 else n_jobs

It's clear that we should fix the subtraction to an addition:

default_n_threads = (os.cpu_count() + n_jobs) if n_jobs < 0 else n_jobs

@haoyu-zc haoyu-zc requested a review from miltondp September 24, 2024 17:24
Copy link
Member

@miltondp miltondp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @haoyu-zc. I left some comments requesting a few more changes. See if it makes sense for you.

@@ -640,7 +640,7 @@ def ccc(

# get number of cores to use
n_jobs = os.cpu_count() if n_jobs is None else n_jobs
default_n_threads = (os.cpu_count() - n_jobs) if n_jobs < 0 else n_jobs
default_n_threads = (os.cpu_count() + n_jobs) if n_jobs < 0 else n_jobs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the problem described in the PR. However, if n_jobs has an absolute value equal or larger than os.cpu_count() it will be zero or negative, right? We should check if the final value for default_n_threads is within a valid range.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think the docstring of this function needs to be updated, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I missed! I'll update the code and add test cases. Thanks!

@haoyu-zc
Copy link
Collaborator Author

Changes made:

  • Renamed default_n_threads to n_workers since now we can use both processes and threads
  • Wrote a function to compute the number of workers, covered by unit tests
  • Replaced two assertions with exceptions

@haoyu-zc haoyu-zc requested a review from miltondp September 26, 2024 03:50
Copy link
Member

@miltondp miltondp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Should we also update the version in setup.py for a new release, or you plan to do that in another PR? (whatever you prefer is fine for me).

Also, after installing this version (pip install -e .), I tested again the last issue #37 (with Python 3.12, not Python 3.9) and now everything works as expected.

Another issue that seems to be fixed now is #38 :) I wonder what happened.

Comment on lines +537 to +541
n_workers = os.cpu_count() + n_jobs if n_jobs < 0 else n_jobs

if n_workers < 1:
raise ValueError(f"The number of threads/processes to use must be greater than 0. Got {n_workers}."
"Please check the n_jobs argument provided")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. If a user specifies more threads than actual CPU cores, that's a problem for the user (using more resources than actual ones) and we should not check for that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! As ThreadPoolExecutor(max_workers=n_workers) also only complains when n_workers < 0. We don't need to check for that case

@haoyu-zc
Copy link
Collaborator Author

Looks great. Should we also update the version in setup.py for a new release, or you plan to do that in another PR? (whatever you prefer is fine for me).

Also, after installing this version (pip install -e .), I tested again the last issue #37 (with Python 3.12, not Python 3.9) and now everything works as expected.

Another issue that seems to be fixed now is #38 :) I wonder what happened.

I think it might be some changes with Python itself. I don't have a clue now either...

I'll do a new PR for the new release to clarify things.

@haoyu-zc haoyu-zc merged commit f9568cf into main Sep 26, 2024
6 checks passed
@haoyu-zc haoyu-zc deleted the fix-njobs branch September 26, 2024 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants