Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handles host bounce after crash #89

Merged

Conversation

tereshch-aws
Copy link
Collaborator

@tereshch-aws tereshch-aws commented Feb 18, 2023

Fixing issue that host was removed from the sim state after host crash. With this fix, we change the way how we track state of the host during simulation. Crashed and hosts that have finished it's execution are no longer removed from simulation's state, that allowed us to bounce them. We might need to add explicit way to remove hosts from simulation in the future but there is no need to do it right now.

This change also encapsulates host's state management and it's operations(e.g. crash/bounce) into Role type that allowed to simplify Sim::step function code and make task's state management code more clean.

Update: @th7nder submitted #89 in parallel, with the same approach to the fix but without refactoring. I don't mind to go with it if my refactoring is too much.

Special focus:

  • Fixes Re-starting a crashed host with bounce panics #88
  • Role enum -> struct refactoring. Role could be renamed now but I can't come up good name.
  • I'm not happy finished_err function. It feels weird to check is_finished and then call finished_err()? but I was not able to come up with better way to do it.
  • My initial idea to track hosts state as Index<addr, Option<Role>> didn't work out well. It polluted code with handling option is none and expects. After moving state into Role code become more clean.

Fixing issue that host was removed from the sim state
after host crash. With this fix, we change the way how we track state of
the host during simulation. Crashed and hosts that have finished it's
execution are no longer removed from simulation's state, that allowed us
to bounce them. We might need to add explicit way to remove hosts from
simulation in the future but there is no need to do it right now.

This change also encapsulates host's state management and it's
operations(e.g. crash/bounce) into Role type that allowed to simplify
Sim::step function code and make task's state management code more
clean.
@mcches
Copy link
Contributor

mcches commented Feb 20, 2023

I agree Role has probably run its course. Maybe that should be called Host and the current one becomes Os since it is just per-host software that is always there.

src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
@tereshch-aws
Copy link
Collaborator Author

I agree Role has probably run its course. Maybe that should be called Host and the current one becomes Os since it is just per-host software that is always there.

I've stuck with renaming. I like current Host name and it seems appropriate. I'm not huge fun of OS name. Host seems like combination of hardware and OS.

Current Role seems like represent a software we run on the Host. I've tried a few options, I'm not super happy with all of them, first one is my favorite.

  • Role -> Application
  • Role -> Service
  • Role -> HostRuntime
  • Role -> Soft

RoleType could be renamed to Kind. Kind::Client or Kind::Server or Kind::Service

Copy link
Contributor

@mcches mcches left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor idea for tick. Comments are mostly around naming and docs.

src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/sim.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
src/role.rs Outdated Show resolved Hide resolved
Role turn out to become mostly wrapper around Rt, so this change merges
it into Rt, combining software status tracking and runtime ticks.
src/rt.rs Show resolved Hide resolved
src/top.rs Show resolved Hide resolved
Copy link
Contributor

@mcches mcches left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes. Comments are all doc suggestions.

src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Show resolved Hide resolved
src/rt.rs Outdated Show resolved Hide resolved
src/rt.rs Show resolved Hide resolved
src/top.rs Show resolved Hide resolved
Co-authored-by: Brett McChesney <39924297+mcches@users.noreply.github.com>
@mcches mcches merged commit 55d4f2f into tokio-rs:main Mar 1, 2023
th7nder pushed a commit to th7nder/turmoil that referenced this pull request Mar 13, 2023
Hosts were made fallible in tokio-rs#67, which surfaced a bug in the
sim's event loop when hosts were crashed.

This fixes that issue by extracting the software result on completion
and includes some refactoring to consolidate `Role` and `Rt`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-starting a crashed host with bounce panics
2 participants