Add threads flag to parallel apply #1171


alexgleith
Contributor

Proposed changes

The parallel_apply function uses a process pool to do work in parallel, which is not necessarily
the best way to do parallel work. This PR keeps that as the default but also provides an option
to use threading instead.
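To illustrate the kind of option described above, here is a minimal, hypothetical sketch built on Python's standard `concurrent.futures` module. It is not the dea_tools implementation: the name `parallel_apply_sketch`, the `use_threads` flag and the plain-list input (rather than an xarray object) are assumptions made for illustration. A process pool stays the default, and the flag swaps in a thread pool.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional


def parallel_apply_sketch(
    items: Iterable[Any],
    func: Callable[..., Any],
    *args: Any,
    use_threads: bool = False,
    max_workers: Optional[int] = None,
    **kwargs: Any,
) -> List[Any]:
    """Apply ``func`` to each item in parallel, returning results in input order.

    Hypothetical sketch: a process pool is the default, and
    ``use_threads=True`` switches to a thread pool. Threads tend to suit
    I/O-bound work (reading files, waiting on network requests) or code in
    libraries that release the GIL; processes tend to suit CPU-bound pure
    Python code, because in standard CPython only one thread runs Python
    bytecode at a time.
    """
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # One task per item; extra args/kwargs are passed through unchanged.
        futures = [executor.submit(func, item, *args, **kwargs) for item in items]
        # Collect in submission order so outputs line up with inputs.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # pow(item, 2) for each item, run on a thread pool
    print(parallel_apply_sketch([1, 2, 3], pow, 2, use_threads=True))  # [1, 4, 9]
```

Because both executors share the same `submit`/`result` interface, switching between them is a one-line choice, which is what makes a simple flag practical. When using processes, keep the call under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.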

Closes issues (optional)

  • n/a

Checklist (replace [ ] with [x] to check off)

  • Notebook created using the DEA-notebooks template
  • Remove any unused Python packages from Load packages
  • Remove any unused/empty code cells
  • Remove any guidance cells (e.g. General advice)
  • [ ] Ensure that all code cells follow the PEP8 standard for code. The jupyterlab_code_formatter tool can be used to format code cells to a consistent style: select each code cell, then click Edit and then one of the Apply X Formatter options (YAPF or Black are recommended).
  • Include relevant tags in the final notebook cell (refer to the DEA Tags Index, and re-use tags if possible)
  • Clear all outputs, run notebook from start to finish, and save the notebook in the state where all cells have been sequentially evaluated
  • Test notebook on both the NCI and DEA Sandbox (flag if not working as part of PR and ask for help to solve if needed)
  • If applicable, update the "Notebook currently compatible with the NCI|DEA Sandbox environment only" line below the notebook title to reflect the environments the notebook is compatible with

Collaborator

@BexDunn left a comment


Hey Alex, looks really handy!

  • Would a quick explainer on the differences fit into any of the existing notebooks?
  • Can you please update the date modified up top?

@alexgleith
Contributor Author

Added the last modified date.

I don't think this is used in Notebooks. It is used in Coastlines and other places.

After further testing, I'm not sure it makes a big difference, but I think it's still worth having as an option.

@robbibt
Member

robbibt commented Jan 9, 2024

This is fantastic @alexgleith! As someone from a non-ICT background, I don't understand this stuff as well as I should. Any chance you could add one more sentence to the docstring explaining why/when using threads vs processes might be the better option? (Just something not super technical, to help beginner users make a more informed choice.)

@alexgleith
Contributor Author

No worries, @robbibt. Done.

The longer explanation, from my (still kind of lay person) view, is that a process is a whole new operation, whereas a thread runs within the existing one. Think of it like spawning a new machine (process) to do some work, instead of doing another piece of work on the existing machine (thread).

A process is very much separate, and inter-process communication is hard (data has to be copied between processes), whereas threads share a single process, so they can share memory and communicate directly that way.
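The difference in memory can be shown with the standard library alone. This is an illustrative sketch with made-up helper names (`record_square`, `can_send_to_process`): threads append to one shared list that the main thread can read directly, whereas anything sent to a process pool, including the function itself, must first be pickled. That is why, for example, a lambda can be submitted to a thread pool but not to a process pool.

```python
import pickle
import threading

# Threads run inside the current process, so they all see the same list object.
shared_results = []


def record_square(x):
    # Append to the shared list; the main thread sees every append afterwards.
    shared_results.append(x * x)


threads = [threading.Thread(target=record_square, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every worker thread has finished

print(sorted(shared_results))  # [0, 1, 4, 9]


def can_send_to_process(obj):
    """Return True if ``obj`` can be pickled, i.e. copied to another process."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_send_to_process([1, 2, 3]))    # True
print(can_send_to_process(lambda x: x))  # False
```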

Being "thread safe" is important too, to stop tasks stomping on each other's memory. I think most of what this function is used for will be thread safe. Most stuff in Python is thread safe these days, but it does need consideration.
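As a small illustration of the thread-safety point, this hypothetical sketch has several threads update one shared counter. In CPython many single operations on built-in types (such as `list.append`) are effectively atomic, but a compound update like `self.value += 1` is a read, an add and a write, and threads can interleave between those steps. A `threading.Lock` makes the update safe.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A counter that several threads can safely increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read-modify-write, not one atomic step,
        # so the lock stops two threads interleaving inside it.
        with self._lock:
            self.value += 1


def count_many(counter, n):
    for _ in range(n):
        counter.increment()


counter = Counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(count_many, counter, 10_000) for _ in range(8)]
    for f in futures:
        f.result()  # re-raise any exception from a worker

print(counter.value)  # 80000
```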

My $0.02!

Collaborator

@BexDunn left a comment


Thanks, good to have the extra explanation!

@alexgleith
Contributor Author

Ok cool!

I can't merge it in. I have no power here 😆

No rush. Maybe @omad has an opinion still.

@robbibt merged commit ad849c4 into GeoscienceAustralia:develop Jan 22, 2024
0 of 2 checks passed
@alexgleith deleted the add-threads-flag-to-parallel-apply branch May 28, 2024 23:16
3 participants