Need to create system status page of the che.openshift.io #1224

ibuziuk · 2019-01-29T10:29:54Z

Currently there is no status page for che.openshift.io that provides information about the state of the platform. Many online services publish information about the state of their platform:

It was decided to use an account on https://www.statuspage.io/ instead of creating a custom dsaas service.

sub-tasks:

Create che.openshift.io account on statuspage.io and contribute info about github state #1240
Investigate how prometheus metrics could be used on statuspage.io #1237
Need to contribute 2 system metrics to statuspage #1286

Related openshift.io user-story - openshiftio/openshift.io#4730

slemeur · 2019-01-30T18:32:15Z

Should we have the root epic openshiftio/openshift.io#4730 under this repository?

ibuziuk · 2019-01-30T18:34:50Z

@slemeur having user-story under openshift.io works just fine IMO (I personally do not think we should have it under rh-che since the status service would be a separate repo)

fche · 2019-02-20T15:54:52Z

Does this service allow you to feed metrics or programmatic configuration changes to it? e.g. can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does?

ibuziuk · 2019-02-20T16:22:04Z

> can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does?

@fche hmm... why would you like to start monitoring a non-existing route?
The main question we are currently having is if statuspage.io can support Prometheus format properly - #1237

fche · 2019-02-20T17:05:41Z

> why would you like to start monitoring a non-existing route?

Related to the other need to track openshift route-creation times. Notify service at oc api call start time, let it determine time taken for route to be actually accessible.
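
A rough sketch of the measurement described above, not taken from any existing service: start a timer when the `oc` API call is issued, then poll until the route answers. The names `time_until_accessible` and `http_probe`, the default timeout and interval, and the rule that any non-5xx response counts as "accessible" are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def time_until_accessible(
    probe: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the seconds elapsed until probe() first succeeds, or None on timeout."""
    start = clock()  # call this right when the route-creation request is issued
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that is True once the URL answers with a non-5xx status.

    Treating 5xx as "not yet accessible" is an assumption of this sketch.
    """
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status < 500
        except urllib.error.HTTPError as err:
            return err.code < 500
        except (urllib.error.URLError, OSError):
            return False  # DNS failure, refused connection, socket timeout
    return probe
```

The clock and sleep functions are injectable so the loop can be exercised without real waiting; in real use something like `time_until_accessible(http_probe(route_url))` would be called with a hypothetical `route_url` right after the route is requested.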

ibuziuk · 2019-02-20T20:02:58Z

@fche AFAIK, it is planned to be done on the che-server side and exposed via a prometheus metric - eclipse-che/che#12699

fche · 2019-02-20T20:08:40Z

cc: @gorkem
In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?

fche · 2019-02-20T20:13:00Z

> it is planned to be done on che-server side

OK, assuming it is in a position to reliably tell whether the routes are externally accessible.
BTW, submitted this RFE for openshift to consider supplying this info itself: openshift/origin#22107

ibuziuk · 2019-02-20T20:19:28Z

> In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?

@fche if we opt for a custom dsaas service, the major question is who will be the primary owner / maintainer?

fche · 2019-02-20T21:44:46Z

> who will be the primary owner / maintainer

aye, there is the rub

But independent of that question, one can work out in greater detail just what info you'd like to see there.

ibuziuk · 2019-02-20T22:21:55Z

@fche I believe most of the details are covered in the following user-story - openshiftio/openshift.io#4730

fche · 2019-02-21T20:32:35Z

What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards? So, assume there is a queryable metric database near the rhche server. Assume it's been gathering the status/health metrics being discussed over at openshiftio/openshift.io#4730. Does the "system status" have to be anything other than a preconfigured dashboard - with some combination of graphical or textual forms we can generate?

ibuziuk · 2019-02-22T17:40:58Z

> What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards?

I believe everything could be rendered entirely via grafana, but the goal of statuspage is to make it user-friendly, easy to update, easy to notify users, easy to create an incident, easy to schedule maintenance etc.
So, grafana and status page are two different beasts.

fche · 2019-02-22T17:50:48Z

Could we think about it as the public status-page being downstream of our internal status dashboards & machinery? i.e., not tightly coupled to che, but rather to a hypothetical dev-console health dashboard?

ibuziuk · 2019-02-22T17:52:58Z

IMO, che.openshift.io is a very special case not ~~tightly~~ related to the SaaS, which deserves its own status page

fche · 2019-02-22T18:00:42Z

Understood, just trying to minimize number of bits of machinery and maximize reusability. Maybe think of it more like - a running copy of che should have its own health display for benefit of each of its users. Can the public dashboard be another consumer of that same data & maybe even some of the same renderings?

ibuziuk · 2019-02-22T18:03:38Z

well, potentially it could, but ideally the status page should be deployed separately from the monitored service - if the service is down, the status page should still be up with the reported incident (if the status page is part of the service itself, it would be down together with the service during an incident / scheduled maintenance)

fche · 2019-02-22T18:09:13Z

Yup, kind of like a reliable mirror.

fche · 2019-02-26T18:50:57Z

As a prototype, before we do a full proper operator / openshift4 / prometheus flavoured thing, we could perhaps layer a small piece of new code on top of the existing osd-monitor-poc pcp-based infrastructure, to relay metric threshold crossing events to statuspage.io. We'd need to know a sample metric name and threshold predicate, and statuspage.io api credentials.
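
One way such a relay could be shaped, as a minimal sketch only: watch a stream of `(timestamp, value)` samples and emit an event when the threshold predicate flips, so the status service is notified on crossings rather than on every sample. Nothing here is osd-monitor-poc or pcp code. `CrossingEvent`, `threshold_crossings` and `component_status` are hypothetical names, and the status strings are assumed values that would need checking against the statuspage.io API documentation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CrossingEvent:
    timestamp: float
    value: float
    breached: bool  # True when entering the bad state, False when recovering


def threshold_crossings(
    samples: Iterable[Tuple[float, float]],
    predicate: Callable[[float], bool],
) -> Iterator[CrossingEvent]:
    """Yield an event only when predicate(value) changes between samples."""
    previous = False  # assumption: the metric is healthy before the first sample
    for timestamp, value in samples:
        current = predicate(value)
        if current != previous:
            yield CrossingEvent(timestamp, value, current)
        previous = current


def component_status(event: CrossingEvent) -> str:
    """Map a crossing event to a status string (assumed values, see lead-in)."""
    return "major_outage" if event.breached else "operational"
```

For example, with a hypothetical workspace-start success ratio sampled once a minute and the predicate `lambda v: v < 0.9`, only the sample that first drops below 0.9 and the sample that first recovers would produce events to relay.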

ibuziuk · 2019-02-26T18:52:46Z

@fche will you be able to give a hand with implementing the push part in the next sprint (first we need to figure out which metrics we are going to push - the hobby plan offers only 2 system metrics, so we need to be picky)?
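
For orientation, the push part might look roughly like the sketch below, which only builds the HTTP request for one metric data point and does not send it. The base URL, the `/pages/{page_id}/metrics/{metric_id}/data.json` path, the `OAuth` authorization header and the `data[timestamp]` / `data[value]` form fields are assumptions about the statuspage.io API that would have to be confirmed against its documentation; `build_metric_request` and its parameters are hypothetical names.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.statuspage.io/v1"  # assumed base URL, verify before use


def build_metric_request(
    page_id: str,
    metric_id: str,
    api_key: str,
    timestamp: float,
    value: float,
) -> urllib.request.Request:
    """Build (but do not send) a POST submitting one data point for a system metric."""
    url = f"{API_BASE}/pages/{page_id}/metrics/{metric_id}/data.json"
    body = urllib.parse.urlencode({
        "data[timestamp]": int(timestamp),  # whole seconds since the epoch
        "data[value]": value,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"OAuth {api_key}"},  # assumed auth scheme
    )
```

Sending would then be a matter of passing the request to `urllib.request.urlopen`. With only 2 system metrics available, one such call per metric per sampling interval would be enough.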

fche · 2019-02-26T18:57:35Z

Can indeed help with a quick prototype, presuming building on the present osd-monitor-poc machinery, not major new stuff. It's about as complicated as adding a new outbound zabbix relay.

ibuziuk · 2019-02-26T19:07:39Z

Sounds good, I will reach out once I have more details about the params for the statuspage API

ibuziuk · 2019-07-30T12:52:36Z

Closing this epic since https://che.statuspage.io/ is set up and we have a separate issue for contributing system metrics to statuspage (which is currently not a priority) - #1286

ibuziuk added the kind/task label Jan 29, 2019

ibuziuk self-assigned this Feb 5, 2019

ibuziuk changed the title ~~Need to create a dsaas service for providing status of the che.openshift.io~~ Need to create system status page of the che.openshift.io Feb 6, 2019

ibuziuk added Epic kind/epic and removed kind/task labels Feb 14, 2019

ibuziuk removed the Epic label Feb 20, 2019

ibuziuk mentioned this issue Mar 11, 2019

Successfully started workspaces ratio eclipse-che/che#12852

Merged

ibuziuk closed this as completed Jul 30, 2019
