metrics: History doesn't include reboots, but should #15983

Closed
garrett opened this issue Jun 23, 2021 · 19 comments · Fixed by #21444
Labels
enhancement good-first-issue Appropriate for new contributors page:metrics

Comments

@garrett
Member

garrett commented Jun 23, 2021

Page: Metrics

When looking at the system history, it's not clear when a reboot happened.

Story: I had another system crash and would like to see the context of when spikes happened. Did the computer spike in memory and CPU usage before the crash? Or was that the result of booting up? It's unclear in the UI at the moment. Having a boot marker like the one in the system logs would make it more obvious.

@garrett garrett changed the title metrics: metrics: History doesn't include reboots, but should Jun 23, 2021
@martinpitt martinpitt added the good-first-issue Appropriate for new contributors label Jul 29, 2021
@martinpitt
Member

Agreed -- it could even appear as an "event", like we have CPU/memory spikes. Chances are very high that a reboot triggers a CPU spike anyway.

@martinpitt
Member

We can certainly correlate this with the reboots reported by `last`, and once we have that, we can feed it in as an event.

There are other causes of large data gaps, like suspends, rescue mode, or the admin just stopping PCP. Whenever we encounter a nontrivial data gap, should we visually set it apart somehow? I.e. start a new block instead of putting long contiguous empty graphs in between? That might make the page a bit easier to comprehend.
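
The "start a new block" idea could be prototyped roughly as follows. This is a minimal sketch, assuming the samples are available as sorted millisecond timestamps and that a gap counts as nontrivial once it exceeds a threshold. The name `splitIntoBlocks` and the five-minute threshold are hypothetical and not taken from metrics.jsx.

```javascript
// Hypothetical helper: group sorted sample timestamps (in ms) into contiguous
// blocks, starting a new block whenever the gap to the previous sample
// exceeds maxGapMs.
function splitIntoBlocks(timestamps, maxGapMs) {
    const blocks = [];
    let current = [];
    for (const ts of timestamps) {
        const previous = current[current.length - 1];
        if (current.length > 0 && ts - previous > maxGapMs) {
            blocks.push(current);
            current = [];
        }
        current.push(ts);
    }
    if (current.length > 0)
        blocks.push(current);
    return blocks;
}

const MINUTE = 60 * 1000;
// Three samples one minute apart, a two-hour gap (e.g. a suspend), then two more.
const samples = [0, 1 * MINUTE, 2 * MINUTE, 122 * MINUTE, 123 * MINUTE];
const blocks = splitIntoBlocks(samples, 5 * MINUTE);
console.log(blocks.length); // 2
```

Each resulting block could then be rendered as its own section, regardless of whether the gap came from a reboot, a suspend, or PCP being stopped.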

@jelly
Member

jelly commented Nov 17, 2021

Related: on my laptop (which I suspend), the metrics page seems to show empty blocks when the laptop suspends.
image

@garrett
Member Author

garrett commented Nov 17, 2021

Yeah, I'm getting that too. 😞

@dev-DTECH

Hey @garrett, I would like to work on this issue.

@KKoukiou
Contributor

KKoukiou commented Mar 30, 2023

> Hey @garrett, I would like to work on this issue.

@dev-DTECH This is actually not a very easy good first issue. You will probably need to parse information about reboots from the journal and insert it at the right timestamps in the metrics graph events. The whole code for this is here https://github.com/cockpit-project/cockpit/blob/main/pkg/metrics/metrics.jsx but as said, it's not just a 10-line PR.

@dev-DTECH

> > Hey @garrett, I would like to work on this issue.
>
> @dev-DTECH This is actually not a very easy good first issue. You will probably need to parse information about reboots from the journal and insert it at the right timestamps in the metrics graph events. The whole code for this is here https://github.com/cockpit-project/cockpit/blob/main/pkg/metrics/metrics.jsx but as said, it's not just a 10-line PR.

Yeah, I understand that, but I am eager to learn and I am well acquainted with the journal. If no one else is working on it, I can try to resolve this issue.

@dev-DTECH

Hey @KKoukiou, I searched a bit and figured out that the command 'last -x' shows the times of crashes, reboots, and shutdowns.
So I am trying to use the output of this command to indicate the reboots in the metrics history.

Is this the right approach, or should I consider another way?

@KKoukiou
Contributor

KKoukiou commented Apr 3, 2023

@dev-DTECH it looks fine to start with that.

@dev-DTECH

Hey @KKoukiou, so I got the reboot times using cockpit.spawn("last -x | grep reboot".split(" "))

Every reboot has a start time and end time
image
So should I show the whole range of time as the reboot or just the start/end?
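
A note on the call above: according to the Cockpit API documentation, cockpit.spawn() takes an argument vector and does not go through a shell, so splitting a pipeline string on spaces passes the `|` along as a plain argument instead of creating a pipe. The sketch below shows the split and a possible alternative that filters in JavaScript. The sample text is made up in the general shape of `last -x` output, and `keepRebootLines` is a hypothetical helper.

```javascript
// Splitting a shell pipeline on spaces does not create a pipe; the "|" and
// everything after it become ordinary arguments.
// argv is ["last", "-x", "|", "grep", "reboot"]
const argv = "last -x | grep reboot".split(" ");
console.log(argv);

// Hypothetical alternative: run only ["last", "-x"] and filter the text in JS.
function keepRebootLines(output) {
    return output.split("\n").filter(line => line.startsWith("reboot"));
}

// Made-up sample in the general shape of `last -x` output.
const sample = [
    "reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25 - 09:12 (7+01:46)",
    "shutdown system down  6.1.14-200.fc37. Sat Mar  4 07:24 - 07:25  (00:00)",
    "admin    tty2         tty2             Fri Mar  3 18:02 - down   (13:22)",
].join("\n");
console.log(keepRebootLines(sample).length); // 1
```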

@martinpitt
Member

Note that the reboot range seems to include the whole time between booting and shutting down. E.g. I usually apply OS updates on Saturday mornings, then reboot, and they look like this:

reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25 - 09:12 (7+01:46)

I.e. it spans over a week -- I suppose the "7+" means "7 days, one hour, and 46 minutes". TBH I find that output rather hard to interpret. It gets easier to read with --fulltimes:

reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25:49 2023 - Sat Mar 11 09:12:46 2023 (7+01:46)

Plus, there's also shutdowns. But it seems to me that we can only show the time when the computer started, which I believe is the first timestamp. With that, we can also ignore the shutdowns.

Please don't run cockpit.script() with grep; run cockpit.spawn(["last", "--time-format=iso", "reboot"]) instead. That time format is easier to parse, and you can then use date-fns' parseISO() to convert it to a useful datetime object.
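
Parsing that output might look roughly like the following. This is only a sketch: the sample lines are an assumption about what `last --time-format=iso reboot` prints (modeled on the example above, not captured from a real system), `parseBootTimes` is a hypothetical helper, and the built-in Date constructor stands in for date-fns' parseISO() to keep the snippet free of dependencies. Following the reasoning above, it keeps only the first timestamp of each line as the boot time.

```javascript
// Matches an ISO 8601 timestamp with seconds and either "Z" or a numeric
// UTC offset written with a colon, e.g. 2023-03-04T07:25:49+01:00.
const ISO_TIMESTAMP = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})/;

// Hypothetical helper: return one Date per "reboot" line, taken from the
// first timestamp on that line (the moment the system started).
function parseBootTimes(output) {
    const boots = [];
    for (const line of output.split("\n")) {
        if (!line.startsWith("reboot"))
            continue;
        const match = line.match(ISO_TIMESTAMP);
        if (match)
            boots.push(new Date(match[0]));
    }
    return boots;
}

// Assumed output shape; not captured from a real machine.
const sample = [
    "reboot   system boot  6.1.14-200.fc37. 2023-03-11T09:13:20+01:00   still running",
    "reboot   system boot  6.1.14-200.fc37. 2023-03-04T07:25:49+01:00 - 2023-03-11T09:12:46+01:00 (7+01:46)",
    "",
    "wtmp begins 2023-01-02T10:00:00+01:00",
].join("\n");

const boots = parseBootTimes(sample);
console.log(boots.map(date => date.toISOString()));
```

In Cockpit the `output` string would come from the resolved cockpit.spawn() promise. The shutdown time and the trailing "wtmp begins" line are ignored on purpose.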

@dev-DTECH

Ok that's much better formatted

This is cockpit.spawn(["last", "--time-format=iso", "reboot"]) then the time parsed with parseISO()
image
Thanks for the help. This will make my task so much easier.

@ashutosh7i

Hello @martinpitt, do we still need this feature?
Can I work on this issue?

@martinpitt
Member

@ashutosh7i yes, this is still relevant, and fixing would be nice! Note that this is not the easiest task to start with (not hard, but perhaps start with something easier). Please consider #15983 (comment)

@ashutosh7i

I have made some progress on this.

  • @martinpitt, I followed what you described in this comment: #15983
    I am parsing data in a similar format. To show the reboot event, I am treating the second timestamp as the time of the reboot.
    For example, in this response:
reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25:49 2023 - Sat Mar 11 09:12:46 2023 (7+01:46)
reboot   system boot  kernel version     [timestamp 1]               - [timestamp 2]        (session duration)

I parse these using cockpit.spawn, then map the timestamps in the UI.
I use timestamp 2 as the time when the reboot happened and show the event "Reboot" there.
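
The two timestamps in a --fulltimes line could be pulled apart as sketched below. `splitBootRange` and the month table are hypothetical helpers, and the times are treated as UTC purely for illustration, since this format carries no time zone. Note that the earlier comment in this thread takes timestamp 1 to be the moment the system started, so the sketch returns both values and leaves the choice to the caller.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Matches e.g. "Sat Mar  4 07:25:49 2023" (weekday, month, day, time, year).
const FULLTIME = /[A-Z][a-z]{2} ([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\d{4})/g;

// Hypothetical helper: return the first and second timestamp of one line.
// Assumption for illustration only: the wall-clock times are read as UTC.
function splitBootRange(line) {
    const stamps = [];
    for (const m of line.matchAll(FULLTIME)) {
        const [, month, day, hh, mm, ss, year] = m;
        stamps.push(new Date(Date.UTC(+year, MONTHS.indexOf(month), +day, +hh, +mm, +ss)));
    }
    return { start: stamps[0], end: stamps[1] };
}

const line = "reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25:49 2023 - Sat Mar 11 09:12:46 2023 (7+01:46)";
const range = splitBootRange(line);
console.log(range.start.toISOString()); // 2023-03-04T07:25:49.000Z
console.log(range.end.toISOString());   // 2023-03-11T09:12:46.000Z
```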

Sample image-
image

Now I have some questions:

  1. Since a reboot is a critical event, should I show it in place of "spikes" or in place of "Load, Disk, Network, I/O"?
  2. What about the design? What exact phrase should I use; is "Reboot" fine? @garrett

@garrett
Member Author

garrett commented Dec 11, 2023

Looks good. Thanks!

We might even want to consider making it bold, as it's not just an important event, but it is also a "landmark" event (where it is a specific event that shows when one session stopped and another started).

> Since a reboot is a critical event, should I show it in place of "spikes" or in place of "Load, Disk, Network, I/O"?

Yes; thanks!

> What about the design? What exact phrase should I use; is "Reboot" fine?

Yes, that works.

@martinpitt
Member

Thanks @ashutosh7i ! Can you please send a pull request with your changes, so that we can review and test the implementation there? Cheers!

@ajshrmaofficial

Hey @garrett @martinpitt,
I hope you are doing well.
I just wanted to ask whether this issue is still available, as I do not see any PR attached to it.
By the way, I like this project very much and want to contribute if possible (I'm new to contributing).
Thanks

@martinpitt
Member

@ajshrmaofficial Yes, it is still outstanding and there's no PR. Thanks for your interest! Please work through https://github.com/cockpit-project/cockpit/blob/main/HACKING.md first to set up a dev environment and learn how to make and test a change. Have fun!

jelly added a commit to jelly/cockpit that referenced this issue Dec 17, 2024
Show a boot as a metric event in the historical metrics overview. A
boot is likely to cause high CPU/memory spikes, so it is interesting
for a system administrator to be aware of them. We obtain the boot
information from systemd as `last` is deprecated and not all distros use
lastlog2 while `journalctl` is freely available.

Closes: cockpit-project#15983
jelly added a commit to jelly/cockpit that referenced this issue Dec 18, 2024
Show a boot as a metric event in the historical metrics overview. A
boot is likely to cause high CPU/memory spikes, so it is interesting
for a system administrator to be aware of them. We obtain the boot
information from systemd as `last` is deprecated and not all distros use
lastlog2 while `journalctl` is freely available.

Closes: cockpit-project#15983
martinpitt pushed a commit to jelly/cockpit that referenced this issue Dec 19, 2024
Show a boot as a metric event in the historical metrics overview. A
boot is likely to cause high CPU/memory spikes, so it is interesting
for a system administrator to be aware of them. We obtain the boot
information from systemd as `last` is deprecated and not all distros use
lastlog2, while `journalctl` is available everywhere.

Fixes cockpit-project#15983
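
For illustration, boot times from the journal could be turned into metric events along these lines. This is a sketch under assumptions, not the code from the commit: it assumes a systemd recent enough that `journalctl --list-boots` accepts `--output=json`, and that each entry carries a `first_entry` field in microseconds since the epoch. The sample JSON, the boot IDs, and the `bootEvents` helper are made up.

```javascript
// Hypothetical helper: convert the assumed JSON output of
// `journalctl --list-boots --output=json` into boot events.
function bootEvents(json) {
    return JSON.parse(json).map(boot => ({
        type: "boot",
        // Assumed unit: microseconds since the epoch; Date expects milliseconds.
        time: new Date(Math.floor(boot.first_entry / 1000)),
    }));
}

// Made-up sample data in the assumed shape.
const sample = JSON.stringify([
    { index: -1, boot_id: "00000000000000000000000000000001",
      first_entry: 1677911149000000, last_entry: 1678522366000000 },
    { index: 0, boot_id: "00000000000000000000000000000002",
      first_entry: 1678522400000000, last_entry: 1678530000000000 },
]);

const events = bootEvents(sample);
console.log(events.map(event => event.time.toISOString()));
```

Only the start of each boot is used, which matches the earlier observation in this thread that the shutdown times can be ignored.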
7 participants