Queue jobs in internal queue instead of dumping all jobs on cluster at once #46

Open
wants to merge 4 commits into
base: master

Conversation

sandeepklr

Hi Dan,

I have modified the code to add an internal queue that caps the number of jobs running on the cluster at any one time.

Please find below a summary of the changes:

  • Use the max_processes parameter as the maximum number of cluster jobs to run at once.
  • Create a Session object before starting the JobMonitor and embed it in the JobMonitor.
    - Everywhere that used a session_id now uses the Session object embedded in the JobMonitor.
  • The function _submit_jobs() is no longer used. All jobs are submitted from the JobMonitor using _append_job_to_session().
  • The check_alive() function has been refactored into two functions, check_alive() and check_job_status():
    - check_alive() is still called every time the local heartbeat is received.
    - check_alive() goes through the queue and looks for jobs to remove, either because they have finished or because they have hit the maximum number of resubmits after errors. New jobs are then spun up to fill however many slots are empty.
  • all_jobs_done() is now simplified to just check that ALL jobs have been processed on the cluster.
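
For readers skimming the summary above, the queueing behaviour can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the Job and BoundedJobQueue classes, the poll() method, the status strings, and the max_resubmits default are all hypothetical, and no real cluster or DRMAA call is made. Only the names max_processes and all_jobs_done are borrowed from the description.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for a cluster job (not the project's Job class)."""
    name: str
    status: str = "pending"  # pending | running | finished | error | failed
    num_resubmits: int = 0


class BoundedJobQueue:
    """Keeps at most max_processes jobs running; the rest wait in a queue."""

    def __init__(self, jobs, max_processes, max_resubmits=3, submit=None):
        self.pending = deque(jobs)
        self.running = []
        self.processed = []
        self.max_processes = max_processes
        self.max_resubmits = max_resubmits  # assumed limit, for illustration
        # submit is a placeholder for whatever actually sends a job to the cluster.
        self.submit = submit or (lambda job: None)

    def poll(self):
        """Drop finished or exhausted jobs, requeue errors, then fill empty slots."""
        still_running = []
        for job in self.running:
            if job.status == "finished":
                self.processed.append(job)
            elif job.status == "error":
                if job.num_resubmits >= self.max_resubmits:
                    job.status = "failed"  # give up on this job
                    self.processed.append(job)
                else:
                    job.num_resubmits += 1
                    self.pending.appendleft(job)  # retry before untouched jobs
            else:
                still_running.append(job)
        self.running = still_running

        # Spin up new jobs for however many slots are free.
        while self.pending and len(self.running) < self.max_processes:
            job = self.pending.popleft()
            job.status = "running"
            self.submit(job)
            self.running.append(job)

    def all_jobs_done(self):
        """True once nothing is waiting and nothing is running."""
        return not self.pending and not self.running
```

In this sketch poll() would be called on each heartbeat, so the number of running jobs never exceeds max_processes, and all_jobs_done() reduces to checking that both the pending queue and the running list are empty.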

@landscape-bot

Code Health
Repository health increased by 25% when pulling 174ab9b on sandeepklr:master into c291881 on pygridtools:master.

@landscape-bot

Code Health
Repository health increased by 26% when pulling a3a0b7b on sandeepklr:master into c291881 on pygridtools:master.

@desilinguist
Contributor

Hi @sandeepklr, can you please refresh this PR if you are still interested in merging it? Thanks!


3 participants