-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot #3880
[process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot #3880
Conversation
…ned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented
@@ -22,6 +23,9 @@ REBOOT_CAUSE_DIR = "/host/reboot-cause/" | |||
REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "reboot-cause.txt" | |||
PREVIOUS_REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "previous-reboot-cause.txt" | |||
FIRST_BOOT_PLATFORM_FILE = "/tmp/notify_firstboot_to_platform" | |||
REBOOT_CAUSE_KEXEC = "/proc/cmdline" | |||
# The following SONIC_BOOT_TYPEs come from the warm/fast reboot script which is in sonic-utilities | |||
REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to new design for fast reboot - "SONIC_BOOT_TYPE=fast", but also need to check for "fast-reboot" in case reboot was done from 201803
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regular expression will match if SONIC_BOOT_TYPE is the last argument, lets have '.*' instead of '$'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenxs IMHO:
REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)$" ->
REBOOT_CAUSE_KEXEC_PATTERN = "SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast)"
re.match(REBOOT_CAUSE_KEXEC_PATTERN, cause_file_kexec) ->
re.search(REBOOT_CAUSE_KEXEC_PATTERN, cause_file_kexec)
@@ -47,6 +51,16 @@ def log_error(msg): | |||
|
|||
|
|||
# ============================= Functions ============================= | |||
def is_warmfast_reboot_from_proc_cmdline(): | |||
if os.path.isfile(REBOOT_CAUSE_KEXEC): | |||
cause_file = open(REBOOT_CAUSE_KEXEC, "r") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
files was not closed, try to use with open(REBOOT_CASE_KEXEC, "r"):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. use "with" for all open file cases.
# The reboot was not caused by hardware. If there is a REBOOT_CAUSE_FILE, it will | ||
# contain any software-related reboot info. We will use it as the previous cause. | ||
if os.path.isfile(REBOOT_CAUSE_FILE): | ||
cause_file = open(REBOOT_CAUSE_FILE, "r") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenxs please use with
statement
1. use "with" statement 2. update fast/warm reboot BOOT_ARG
@@ -22,6 +23,9 @@ REBOOT_CAUSE_DIR = "/host/reboot-cause/" | |||
REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "reboot-cause.txt" | |||
PREVIOUS_REBOOT_CAUSE_FILE = REBOOT_CAUSE_DIR + "previous-reboot-cause.txt" | |||
FIRST_BOOT_PLATFORM_FILE = "/tmp/notify_firstboot_to_platform" | |||
REBOOT_CAUSE_KEXEC = "/proc/cmdline" | |||
# The following SONIC_BOOT_TYPEs come from the warm/fast reboot script which is in sonic-utilities | |||
REBOOT_CAUSE_KEXEC_PATTERN = ".*SONIC_BOOT_TYPE=(fast-reboot|warm|fastfast|fast).*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will not work for 201803->master upgrade. Please take a look at https://github.com/Azure/sonic-utilities/blob/201803/scripts/fast-reboot#L30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for pointing out. will fix it.
Btw, we don't need to consider versions earlier than 201803, right?
@jleveque as Stepan isn't in working hours, can you help ensure that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenxs: You are correct; we don't need to consider versions earlier than 201803.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
prev_cause_file.write(previous_reboot_cause) | ||
prev_cause_file.close() | ||
with open(PREVIOUS_REBOOT_CAUSE_FILE, "w") as prev_cause_file: | ||
prev_cause_file.write(previous_reboot_cause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an issue for this PR or change request for this PR. Just want to start the discussion here.
This change is the result of a team brainstorm of various combinations of hard/soft reboot reason. We fell that we've covered all the cases for now. But in case we didn't. We could consider change the previous reboot cause file to 3 lines:
Reboot cause: (calculation result)
Detected hardware cause: (hardware cause)
Detected software cause: (software cause)
Not sure if any other part of our code has dependency on this file being single line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally I agree your point.
in addition, if reboot causes from different sources are conflict, it's better to have it in a detailed way.
meanwhile, the drawback is it can confuse user sometimes. for example, if the following info displayed:
calculated reboot cause: warm reboot
hardware cause: power off
software cause: warm reboot
probably it's better to only display in a 3-line way in case of we known for sure there is conflicts among causes from different sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got a good point. However, if we know there is a conflict, then we would have figured it out and fixed it in code. We cannot predict what we don't know yet. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this change!
else: | ||
previous_reboot_cause = hardware_reboot_cause | ||
# 2. Check if the previous reboot was caused by hardware | ||
# If yes, the hardware reboot cause will be treated as teh reboot cause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
teh [](start = 70, length = 3)
-> the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if os.path.isfile(FIRST_BOOT_PLATFORM_FILE): | ||
if (previous_reboot_cause == UNKNOWN_REBOOT_CAUSE): | ||
previous_reboot_cause = UNKNOWN_REBOOT_CAUSE | ||
os.remove(FIRST_BOOT_PLATFORM_FILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 121 to line 132 are same as line 140 to line 151.
suggest to consolidate them as a single function.
previous_reboot_cause = find_software_reboot_cause()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
else: | ||
# /proc/cmdline says it's a warm/fast reboot but /host/reboot-cause.txt doesn't exist. | ||
# report an error. | ||
log_error("/proc/cmdline indicates a fast/warm reboot but {} doesn't exist".format(REBOOT_CAUSE_DIR)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other than report error,
I think we should still assign the
proc_cmdline_reboot_cause = warm or fast
we should treat proc cmdline as truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
retest vs please |
…ned when warm reboot follows a hardware caused reboot (#3880) * [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown. * [process-reboot-cause]Fix review comments * [process-reboot-cause]address comments 1. use "with" statement 2. update fast/warm reboot BOOT_ARG * [process-reboot-cause]address comments * refactor the code flow * Remove escape * Remove extra ':'
…ned when warm reboot follows a hardware caused reboot (#3880) * [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot 1. check whether /proc/cmdline indicates warm/fast reboot. if yes the software reboot cause file will be treated as the reboot cause. finish 2. check whether platform api returns a reboot cause. if yes it is treated as the reboot cause. finish. 3. check whether /hosts/reboot-cause contains a cause. if yes it is treated as the cause otherwise return unknown. * [process-reboot-cause]Fix review comments * [process-reboot-cause]address comments 1. use "with" statement 2. update fast/warm reboot BOOT_ARG * [process-reboot-cause]address comments * refactor the code flow * Remove escape * Remove extra ':'
- What I did
Fix issue reboot cause test failed if warm-reboot follows any kind of hardware-caused reboot #3871
- How I did it
if yes the software reboot cause file will be treated as the reboot cause.
finish
if yes it is treated as the reboot cause.
finish.
if yes it is treated as the cause otherwise return unknown.
Detailed information can be found in the issue description.
- How to verify it
reboot the DUT in the following sequence
reboot-cause-test.txt
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)