[DEFECT] too many threads modifying variable in DistributedMemoryRunner #1881

Closed
7 of 13 tasks
joshua-cogliati-inl opened this issue Jul 8, 2022 · 2 comments · Fixed by #1899

Comments

@joshua-cogliati-inl
Contributor

joshua-cogliati-inl commented Jul 8, 2022

Thank you for the defect report

Defect Description

In JobHandler, terminateJobs modifies the run list (__running) on a different thread than cleanJobQueue and fillJobQueue, both of which assume that only the loop thread can modify the run list.
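
One possible way to remove this kind of race is to guard every access to the run list with a single lock, so it no longer matters which thread calls which method. The sketch below is illustrative only: `RunListSketch` and its method names are hypothetical stand-ins, not the actual JobHandler code, and this is not necessarily how the linked pull requests addressed the problem.

```python
import threading


class RunListSketch:
    """Hypothetical stand-in for the run list handling in JobHandler."""

    def __init__(self, size):
        self._running = [None] * size   # one slot per concurrently running job
        self._lock = threading.RLock()  # guards every read and write of _running

    def fill(self, run):
        """Place run in the first free slot; return the slot index or None."""
        with self._lock:
            for i, slot in enumerate(self._running):
                if slot is None:
                    self._running[i] = run
                    return i
        return None

    def terminate(self, ids):
        """Safe to call from any thread: drop the runs listed in ids."""
        removed = []
        with self._lock:
            for i, run in enumerate(self._running):
                if run is not None and run in ids:
                    removed.append(run)
                    self._running[i] = None
        return removed

    def clean(self, is_done):
        """Free the slots of runs for which is_done(run) is true."""
        with self._lock:
            for i, run in enumerate(self._running):
                if run is not None and is_done(run):
                    self._running[i] = None

    def snapshot(self):
        """Return a copy so callers never iterate the live list."""
        with self._lock:
            return list(self._running)
```

An alternative design with the same effect would be to have other threads only enqueue termination requests and let the loop thread apply them, which keeps the original single-writer assumption intact.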

In DistributedMemoryRunner, isDone checks self.thread.finished, but kill calls del self.thread.
If isDone is called after del self.thread, the attribute no longer exists and the lookup of self.thread raises an AttributeError.

Traceback (most recent call last):
  File "/home/fred/miniconda3/envs/raven_libraries_heron_newer_ray_and_h5py/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/fred/miniconda3/envs/raven_libraries_heron_newer_ray_and_h5py/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/projects/raven_db/fred/raven/ravenframework/JobHandler.py", line 442, in startLoop
    self.cleanJobQueue()
  File "/projects/raven_db/fred/raven/ravenframework/JobHandler.py", line 922, in cleanJobQueue
    if run is not None and run.isDone():
  File "/projects/raven_db/fred/raven/ravenframework/Runners/DistributedMemoryRunner.py", line 69, in isDone
    if self.thread is None:
AttributeError: 'DistributedMemoryRunner' object has no attribute 'thread'
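
The AttributeError above can be reproduced in isolation with a minimal sketch. `FragileRunner` mimics the pattern described in this report (kill deletes the attribute), and `SaferRunner` shows one possible alternative: reset the attribute to None under a lock and read it into a local variable before use. Both classes are hypothetical simplifications, not the actual DistributedMemoryRunner, and treating a killed runner as done is an assumption made for the example.

```python
import threading


class FragileRunner:
    """Hypothetical: kill() deletes the attribute that isDone() reads."""

    def __init__(self):
        self.thread = None

    def isDone(self):
        # Raises AttributeError if kill() has already run.
        if self.thread is None:
            return True
        return self.thread.finished

    def kill(self):
        del self.thread


class SaferRunner:
    """Hypothetical: the attribute always exists and is guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.thread = None

    def isDone(self):
        # Copy the reference under the lock, then work on the local copy.
        with self._lock:
            thread = self.thread
        if thread is None:
            return True  # assumption: a killed or unstarted runner counts as done
        return thread.finished

    def kill(self):
        # Reset instead of deleting, so later lookups still succeed.
        with self._lock:
            self.thread = None
```

With these definitions, calling `kill()` and then `isDone()` on a `FragileRunner` raises AttributeError, while the same sequence on a `SaferRunner` returns True.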

Steps to Reproduce

Run a long optimizer that uses terminateJobs.

Expected Behavior

The run should complete without crashing.

Screenshots and Input Files

No response

OS

Linux

OS Version

No response

Dependency Manager

CONDA

For Change Control Board: Issue Review

  • Is it tagged with a type: defect or task?
  • Is it tagged with a priority: critical, normal or minor?
  • If it will impact requirements or requirements tests, is it tagged with requirements?
  • If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
  • Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)

For Change Control Board: Issue Closure

  • If the issue is a defect, is the defect fixed?
  • If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
  • If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
  • If the issue is a defect, does it impact the latest release branch? If yes, is there any issue tagged with release (create if needed)?
  • If the issue is being closed without a pull request, has an explanation of why it is being closed been provided?
@joshua-cogliati-inl changed the title from "[DEFECT]" to "[DEFECT] too many threads modifying variable in DistributedMemoryRunner" on Jul 8, 2022
@PaulTalbot-INL
Collaborator

Approved to close via #1879 and/or #1883, noting that it's possible new issues may be discovered that will require re-addressing this issue.

@joshua-cogliati-inl mentioned this issue on Jul 19, 2022 (9 tasks)
@PaulTalbot-INL
Collaborator

Also approved to close via #1899.
