
display component uptime #1223

Closed · 9547 opened this issue Mar 17, 2021 · 10 comments

Labels
type/feature-request Categorizes issue as related to a new feature.

@9547 (Contributor) commented Mar 17, 2021:

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

I want to display the uptime of each component in `tiup {cluster|dm} display xxx`.

Describe alternatives you've considered:

Some components' Prometheus metrics API returns the process_start_time_seconds metric, which directly represents the process's start timestamp, so we can compute uptime as time.Now() - start_timestamp (see the sketch after this list). The components that expose it:

  • pd
  • tidb
  • tikv
  • ticdc
  • drainer
  • pump
  • alertmanager, grafana, prometheus
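
For illustration, here is a minimal Go sketch of that approach. The function name and status address are mine, not TiUP's actual API; it assumes the component serves Prometheus text-format metrics on its status port at /metrics:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// uptimeFromMetrics scrapes a component's own metrics endpoint (its status
// port, not a Prometheus server) and derives uptime from
// process_start_time_seconds.
func uptimeFromMetrics(statusAddr string) (time.Duration, error) {
	resp, err := http.Get("http://" + statusAddr + "/metrics")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// Skip "# HELP"/"# TYPE" comment lines; match the sample line only.
		if !strings.HasPrefix(line, "process_start_time_seconds") {
			continue
		}
		fields := strings.Fields(line)
		// The value may be in scientific notation, e.g. 1.61598e+09.
		startTS, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return 0, err
		}
		return time.Since(time.Unix(int64(startTS), 0)).Round(time.Second), nil
	}
	return 0, fmt.Errorf("no process_start_time_seconds at %s", statusAddr)
}

func main() {
	// Example: PD's client/status port; adjust for your deployment.
	if up, err := uptimeFromMetrics("127.0.0.1:2379"); err == nil {
		fmt.Println("uptime:", up)
	}
}
```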

These components do not expose process_start_time_seconds:

  • tiflash
  • tispark-{master,worker}

For the components that do not expose this metric, especially TiFlash, we can wait until the metric is added on the product side. During the transition, we can use ssh and then ps to see how long the process has been running (see the sketch below).
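
A sketch of that interim check, shown here as a local command for brevity; in TiUP the same command would run on the remote host through the SSH executor, and the pgrep pattern is illustrative:

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// psUptime reads the process's elapsed running time in seconds via
// `ps -o etimes=` (procps on Linux) and converts it to a Duration.
func psUptime(process string) (time.Duration, error) {
	cmd := fmt.Sprintf("ps -o etimes= -p $(pgrep -xo %s)", process)
	out, err := exec.Command("sh", "-c", cmd).Output()
	if err != nil {
		return 0, err
	}
	secs, err := strconv.Atoi(strings.TrimSpace(string(out)))
	if err != nil {
		return 0, err
	}
	return time.Duration(secs) * time.Second, nil
}

func main() {
	if up, err := psUptime("tiflash"); err == nil {
		fmt.Println("tiflash uptime:", up)
	}
}
```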

Teachability, Documentation, Adoption, Migration Strategy:

9547 added the type/feature-request label on Mar 17, 2021
@9547 (Contributor, Author) commented Mar 17, 2021:

/assign

@9547 (Contributor, Author) commented Mar 17, 2021:

@lucklove @AstroProfundis PTAL

@lucklove (Member) commented:

> Some components' Prometheus metrics API returns the process_start_time_seconds metric, which directly represents the process's start timestamp, so we can compute uptime as time.Now() - start_timestamp

That's a good idea, but what if the user didn't deploy Prometheus? Can we handle that case?

@9547 (Contributor, Author) commented Mar 18, 2021:

> That's a good idea, but what if the user didn't deploy Prometheus? Can we handle that case?

Sorry, I didn't make myself clear: we are not querying Prometheus, but each component's own metrics API. By the way, querying Prometheus for the latest uptime would not be correct anyway; it can serve stale data, i.e. the last sample scraped before the process died.

@lucklove (Member) commented:

I've checked the metrics the components return, but I didn't find any metric that records the start_timestamp...

@9547 (Contributor, Author) commented Mar 18, 2021:

> I've checked the metrics the components return, but I didn't find any metric that records the start_timestamp...

Sorry for the spelling error: it is process_start_time_seconds, not start_timestamp. BTW, we can use `systemctl status xxx | grep -Po ".*; \K(.*)(?= ago)"` to get the process's uptime for the components that don't have the metric.
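
For reference, that grep targets systemd's "Active:" line; a Go equivalent of the extraction could look like this (the sample line is illustrative, and systemd's exact wording can vary by version):

```go
package main

import (
	"fmt"
	"regexp"
)

// agoRE mirrors the grep -Po pattern above: capture the text between the
// last "; " and the trailing " ago" on systemctl's Active: line.
var agoRE = regexp.MustCompile(`.*; (.*) ago`)

func main() {
	line := "Active: active (running) since Thu 2021-03-18 10:04:32 UTC; 2 days ago"
	if m := agoRE.FindStringSubmatch(line); m != nil {
		fmt.Println(m[1]) // prints "2 days"
	}
}
```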

@lucklove (Member) commented:

> Sorry for the spelling error: it is process_start_time_seconds, not start_timestamp. BTW, we can use `systemctl status xxx | grep -Po ".*; \K(.*)(?= ago)"` to get the process's uptime for the components that don't have the metric.

Nice. I deployed a cluster and checked the metrics; it seems PD and TiDB return this metric but TiKV doesn't.

To keep things consistent, we could use the systemctl approach for all components, whether or not they expose process_start_time_seconds. This may make display slow, because we would need to iterate over every instance and open an SSH connection, which isn't friendly for big clusters. So maybe we should make --uptime an option and not show uptime by default.

@9547 (Contributor, Author) commented Mar 22, 2021:

> Nice. I deployed a cluster and checked the metrics; it seems PD and TiDB return this metric but TiKV doesn't.
>
> To keep things consistent, we could use the systemctl approach for all components, whether or not they expose process_start_time_seconds. This may make display slow, because we would need to iterate over every instance and open an SSH connection, which isn't friendly for big clusters. So maybe we should make --uptime an option and not show uptime by default.

I've checked the components under cluster version v4.0.4 and DM nightly; the components below expose the process_start_time_seconds metric:

  • pd
  • tidb
  • tikv
  • ticdc
  • drainer
  • pump
  • dm-{master, worker}
  • alertmanager, grafana, prometheus

These components do not expose process_start_time_seconds:

  • tiflash
  • tispark-{master,worker}

So most of the components already expose the metric, and I think the metrics API is the more convenient route for normal usage. We can implement it through the metrics API first, and fall back to ssh + systemctl when the metric is missing or the service is down.

However, once all the services are down, every instance degrades to the ssh + systemctl path, which will hurt query time, so a --uptime or --no-uptime flag may be needed (see the sketch below) 🤔
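
Putting the two sources together, the lookup order could look like the fragment below. It reuses uptimeFromMetrics from the earlier sketch, and systemctlUptime stands in for a hypothetical helper wrapping the ssh + systemctl path; none of these names are TiUP's real API:

```go
// uptimeOf tries the component's own metrics API first and only falls back
// to ssh + systemctl when the metric is missing or the endpoint is down.
// Gated behind an --uptime flag so the default display stays fast.
func uptimeOf(statusAddr, host, service string, showUptime bool) (time.Duration, error) {
	if !showUptime {
		return 0, nil // --uptime not set: skip collection entirely
	}
	if d, err := uptimeFromMetrics(statusAddr); err == nil {
		return d, nil // fast path: one HTTP request
	}
	return systemctlUptime(host, service) // slow path: one SSH round trip
}
```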

@lucklove (Member) commented:

The version I checked was v4.0.0.

So I think there must be compatibility issues... not every version implements this.

@9547 (Contributor, Author) commented Apr 8, 2021:

It's implemented in #1231

9547 closed this as completed on Apr 8, 2021