Add threads flag to parallel apply #1171
Hey Alex, looks really handy!
- Would a quick explainer on the differences fit into any of the existing notebooks?
- Can you please update the date modified up top?
Added the last modified date. I don't think this is used in Notebooks. It is used in Coastlines and other places. From more testing, I'm not sure it makes a massive difference, but I think it's worth having as an option still.
This is fantastic @alexgleith! As someone from a non-ICT background I don't really have as good an understanding of this stuff as I should - any chance you could add one additional sentence to the doc string explaining why/when using threads vs processes might be a better option? (just something not super technical to help beginner users make a more informed choice)
No worries, @robbibt. Done. The longer explanation from my (still kind of lay person) view is that a process is a whole new operation, whereas a thread is running in the existing operation. Think of it as like spawning a new machine (process) to do some work, instead of doing another piece of work on the existing machine (thread). A process is very much separated, and inter-process communication is hard, whereas threads share a single process, and so can share memory and communicate directly that way. Being "thread safe" is important too, to stop tasks stomping on each other's memory... I think most of what this function is being used for will be thread safe. Most stuff in Python is thread safe, these days, but it does need consideration. My $0.02!
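To make the shared-memory point concrete, here is a minimal sketch (not code from this PR) of several threads updating one object in the same process. The names `counter`, `add_many` and `run_threads` are hypothetical, and the lock is one common way to keep the update thread safe.

```python
import threading

# Shared state: every thread in this process sees the same dictionary
counter = {"value": 0}
lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times."""
    for _ in range(n):
        # The lock stops two threads interleaving the read-modify-write step
        with lock:
            counter["value"] += 1


def run_threads(num_threads=4, n=10_000):
    """Start num_threads threads, wait for them, and return the final count."""
    counter["value"] = 0
    threads = [
        threading.Thread(target=add_many, args=(n,)) for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Separate processes would each get their own copy of `counter`, so the same pattern would need explicit inter-process communication to combine results.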
Thanks, good to have the extra explanation!
Ok cool! I can't merge it in. I have no power here 😆 No rush. Maybe @omad has an opinion still.
Proposed changes
The `parallel_apply` function uses a process pool to do work in parallel, which is not necessarily the best way to do parallel work. This PR keeps that as the default but also provides an option to use threading.
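As a rough illustration of the idea (not the actual `parallel_apply` implementation), the sketch below switches between a process pool and a thread pool with a single flag. The function name `parallel_map_sketch`, the helper `square` and the `use_threads` parameter name are assumptions made for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(x):
    # Defined at module level so a process pool can pickle it
    return x * x


def parallel_map_sketch(func, items, use_threads=False):
    """Apply func to each item in parallel, returning results in input order."""
    # Processes stay the default; threads are used only when requested
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls() as executor:
        return list(executor.map(func, items))
```

Calling `parallel_map_sketch(square, [1, 2, 3], use_threads=True)` returns `[1, 4, 9]`. With the default process pool, the function being applied must be picklable, and on platforms that spawn new interpreters the call is best placed under an `if __name__ == "__main__":` guard.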
Closes issues (optional)
Checklist (replace `[ ]` with `[x]` to check off)
- `Load packages`
- `General advice`
- `jupyterlab_code_formatter` tool can be used to format code cells to a consistent style: select each code cell, then click `Edit` and then one of the `Apply X Formatter` options (`YAPF` or `Black` are recommended).
- `NCI` and `DEA Sandbox` (flag if not working as part of PR and ask for help to solve if needed)
- `Notebook currently compatible with the NCI|DEA Sandbox environment only` line below the notebook title to reflect the environments the notebook is compatible with