Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot #3880

Merged
merged 7 commits into from
Dec 14, 2019

Conversation

stephenxs
Copy link
Collaborator

- What I did
Fix issue reboot cause test failed if warm-reboot follows any kind of hardware-caused reboot #3871

- How I did it

  1. check whether /proc/cmdline indicates warm/fast reboot.
    if yes the software reboot cause file will be treated as the reboot cause.
    finish
  2. check whether platform api returns a reboot cause.
    if yes it is treated as the reboot cause.
    finish.
  3. check whether /hosts/reboot-cause contains a cause.
    if yes it is treated as the cause otherwise return unknown.

Detailed information can be found in the issue description.

- How to verify it
reboot the DUT in the following sequence

  1. reboot
  2. warm-reboot
  3. fast-reboot
  4. power cycle reboot
  5. warm-reboot
  6. fast-reboot
  7. issue a warm-reboot and then cut off the power supply after kernel reloaded and ahead of process-reboot-cause handling the reboot cause, the reboot cause should be power off.

reboot-cause-test.txt

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Stephen Sun added 2 commits December 12, 2019 09:55
…ned when warm reboot follows a hardware caused reboot

1. check whether /proc/cmdline indicates warm/fast reboot.
   if yes the software reboot cause file will be treated as the reboot cause.
   finish
2. check whether platform api returns a reboot cause.
   if yes it is treated as the reboot cause.
   finish.
3. check whether /hosts/reboot-cause contains a cause.
   if yes it is treated as the cause otherwise return unknown.
Copy link
Collaborator

@stepanblyschak stepanblyschak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented

@@ -22,6 +23,9 @@ REBOOT_CAUSE_DIR = "/host/reboot-cause/"
REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "reboot-cause.txt"
PREVIOUS_REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "previous-reboot-cause.txt"
FIRST_BOOT_PLATFORM_FILE = "/tmp/notify_firstboot_to_platform"
REBOOT_CAUSE_KEXEC = "/proc/cmdline"
# The following SONIC_BOOT_TYPEs come from the warm/fast reboot script which is in sonic-utilities
REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to new design for fast reboot - "SONIC_BOOT_TYPE=fast", but also need to check for "fast-reboot" in case reboot was done from 201803

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regular expression will match if SONIC_BOOT_TYPE is the last argument, lets have '.*' instead of '$'

Copy link
Collaborator

@nazariig nazariig Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs IMHO:

REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)$" ->
REBOOT_CAUSE_KEXEC_PATTERN = "SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)"

re.match(REBOOT_CAUSE_KEXEC_PATTERN, cause_file_kexec) ->
re.search(REBOOT_CAUSE_KEXEC_PATTERN, cause_file_kexec)

@@ -47,6 +51,16 @@ def log_error(msg):


# ============================= Functions =============================
def is_warmfast_reboot_from_proc_cmdline():
if os.path.isfile(REBOOT_CAUSE_KEXEC):
cause_file = open(REBOOT_CAUSE_KEXEC, "r")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files was not closed, try to use with open(REBOOT_CASE_KEXEC, "r"):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. use "with" for all open file cases.

# The reboot was not caused by hardware. If there is a REBOOT_CAUSE_FILE, it will
# contain any software-related reboot info. We will use it as the previous cause.
if os.path.isfile(REBOOT_CAUSE_FILE):
cause_file = open(REBOOT_CAUSE_FILE, "r")
Copy link
Collaborator

@nazariig nazariig Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs please use with statement

1. use "with" statement
2. update fast/warm reboot BOOT_ARG
@@ -22,6 +23,9 @@ REBOOT_CAUSE_DIR = "/host/reboot-cause/"
REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "reboot-cause.txt"
PREVIOUS_REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "previous-reboot-cause.txt"
FIRST_BOOT_PLATFORM_FILE = "/tmp/notify_firstboot_to_platform"
REBOOT_CAUSE_KEXEC = "/proc/cmdline"
# The following SONIC_BOOT_TYPEs come from the warm/fast reboot script which is in sonic-utilities
REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast|fast).*"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will not work for 201803->master upgrade. Please take a look at https://github.com/Azure/sonic-utilities/blob/201803/scripts/fast-reboot#L30

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for pointing out. will fix it.
Btw, we don't need to consider versions earlier than 201803, right?
@jleveque as Stepan isn't in working hours, can you help ensure that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenxs: You are correct; we don't need to consider versions earlier than 201803.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

prev_cause_file.write(previous_reboot_cause)
prev_cause_file.close()
with open(PREVIOUS_REBOOT_CAUSE_FILE, "w") as prev_cause_file:
prev_cause_file.write(previous_reboot_cause)
Copy link
Contributor

@yxieca yxieca Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an issue for this PR or change request for this PR. Just want to start the discussion here.

This change is the result of a team brainstorm of various combinations of hard/soft reboot reason. We fell that we've covered all the cases for now. But in case we didn't. We could consider change the previous reboot cause file to 3 lines:

Reboot cause: (calculation result)
Detected hardware cause: (hardware cause)
Detected software cause: (software cause)

Not sure if any other part of our code has dependency on this file being single line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I agree your point.
in addition, if reboot causes from different sources are conflict, it's better to have it in a detailed way.
meanwhile, the drawback is it can confuse user sometimes. for example, if the following info displayed:

calculated reboot cause: warm reboot
hardware cause: power off
software cause: warm reboot

probably it's better to only display in a 3-line way in case of we known for sure there is conflicts among causes from different sources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got a good point. However, if we know there is a conflict, then we would have figured it out and fixed it in code. We cannot predict what we don't know yet. :-)

Copy link
Contributor

@yxieca yxieca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this change!

else:
previous_reboot_cause = hardware_reboot_cause
# 2. Check if the previous reboot was caused by hardware
# If yes, the hardware reboot cause will be treated as teh reboot cause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teh [](start = 70, length = 3)

-> the

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if os.path.isfile(FIRST_BOOT_PLATFORM_FILE):
if (previous_reboot_cause == UNKNOWN_REBOOT_CAUSE):
previous_reboot_cause = UNKNOWN_REBOOT_CAUSE
os.remove(FIRST_BOOT_PLATFORM_FILE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 121 to line 132 are same as line 140 to line 151.

suggest to consolidate them as a single function.

previous_reboot_cause = find_software_reboot_cause()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

else:
# /proc/cmdline says it's a warm/fast reboot but /host/reboot-cause.txt doesn't exist.
# report an error.
log_error("/proc/cmdline indicates a fast/warm reboot but {} doesn't exist".format(REBOOT_CAUSE_DIR))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than report error,

I think we should still assign the

proc_cmdline_reboot_cause = warm or fast

we should treat proc cmdline as truth.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@yxieca
Copy link
Contributor

yxieca commented Dec 14, 2019

retest vs please

@yxieca yxieca merged commit 80bb7fd into sonic-net:master Dec 14, 2019
yxieca pushed a commit that referenced this pull request Dec 14, 2019
…ned when warm reboot follows a hardware caused reboot (#3880)

* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
   if yes the software reboot cause file will be treated as the reboot cause.
   finish
2. check whether platform api returns a reboot cause.
   if yes it is treated as the reboot cause.
   finish.
3. check whether /hosts/reboot-cause contains a cause.
   if yes it is treated as the cause otherwise return unknown.

* [process-reboot-cause]Fix review comments

* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG

* [process-reboot-cause]address comments

* refactor the code flow

* Remove escape

* Remove extra ':'
@stephenxs stephenxs deleted the reboot-cause-warm-follows-hw branch December 16, 2019 22:29
abdosi pushed a commit that referenced this pull request Jan 3, 2020
…ned when warm reboot follows a hardware caused reboot (#3880)

* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
   if yes the software reboot cause file will be treated as the reboot cause.
   finish
2. check whether platform api returns a reboot cause.
   if yes it is treated as the reboot cause.
   finish.
3. check whether /hosts/reboot-cause contains a cause.
   if yes it is treated as the cause otherwise return unknown.

* [process-reboot-cause]Fix review comments

* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG

* [process-reboot-cause]address comments

* refactor the code flow

* Remove escape

* Remove extra ':'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants