This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Need to create system status page of the che.openshift.io #1224

Closed
2 of 3 tasks
ibuziuk opened this issue Jan 29, 2019 · 23 comments

@ibuziuk
Member

ibuziuk commented Jan 29, 2019

Currently there is no status page for che.openshift.io which would provide information about the state of the platform. There are many different online services that are providing information about the state of their platform:

It was decided to use an account on https://www.statuspage.io/ instead of creating a custom dsaas service.

sub-tasks:

Related openshift.io user-story - openshiftio/openshift.io#4730

@slemeur

slemeur commented Jan 30, 2019

Should we have the root epic openshiftio/openshift.io#4730 under this repository?

@ibuziuk
Member Author

ibuziuk commented Jan 30, 2019

@slemeur having user-story under openshift.io works just fine IMO (I personally do not think we should have it under rh-che since the status service would be a separate repo)

@ibuziuk ibuziuk self-assigned this Feb 5, 2019
@ibuziuk ibuziuk changed the title from "Need to create a dsaas service for providing status of the che.openshift.io" to "Need to create system status page of the che.openshift.io" Feb 6, 2019
@fche

fche commented Feb 20, 2019

Does this service allow you to feed metrics or programmatic configuration changes to it? e.g. can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does?

@ibuziuk
Member Author

ibuziuk commented Feb 20, 2019

> can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does?

@fche hmm.. why would you like to start monitoring non-existing route ?
The main question we currently have is whether statuspage.io can properly support the Prometheus format - #1237

@fche

fche commented Feb 20, 2019

> why would you like to start monitoring non-existing route ?

Related to the other need to track openshift route-creation times. Notify service at oc api call start time, let it determine time taken for route to be actually accessible.
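For illustration, the measurement described above could be sketched roughly as follows. This is only a sketch under assumptions: `time_until_ready` and `http_probe` are hypothetical names, not part of che, OpenShift, or statuspage.io, and "accessible" is taken here to mean "an HTTP request to the route URL succeeds".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that reports True once `url` answers without an error.

    Assumption: a successful urlopen() is a good enough signal that the
    route is externally accessible.
    """
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (urllib.error.URLError, OSError):
            return False
    return probe


def time_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the seconds elapsed until probe() first succeeds, or None on timeout."""
    start = clock()  # taken at "oc api call start time"
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

The caller would start the timer right when the route-creation call is issued, e.g. `time_until_ready(http_probe(route_url))`, and report the returned duration as a metric. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.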

@ibuziuk
Member Author

ibuziuk commented Feb 20, 2019

@fche AFAIK, it is planned to be done on che-server side and exposed via a prometheus metric - eclipse-che/che#12699

@fche

fche commented Feb 20, 2019

cc: @gorkem
In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?

@fche

fche commented Feb 20, 2019

> it is planned to be done on che-server side

OK, assuming it is in a position to reliably tell whether the routes are externally accessible.
BTW, submitted this RFE for openshift to consider supplying this info itself: openshift/origin#22107

@ibuziuk
Member Author

ibuziuk commented Feb 20, 2019

> In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?

@fche if we opt for a custom dsaas service the major question is, who will be the primary owner / maintainer ?

@ibuziuk ibuziuk removed the Epic label Feb 20, 2019
@fche

fche commented Feb 20, 2019

> who will be the primary owner / maintainer

aye, there is the rub

But independent of that question, one can work out in greater detail just what info you'd like to see there.

@ibuziuk
Member Author

ibuziuk commented Feb 20, 2019

@fche I believe most of the details are covered in the following user-story - openshiftio/openshift.io#4730

@fche

fche commented Feb 21, 2019

What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards? So, assume there is a queryable metric database near the rhche server. Assume it's been gathering the status/health metrics being discussed over at openshiftio/openshift.io#4730. Does the "system status" have to be anything other than a preconfigured dashboard - with some combination of graphical or textual forms we can generate?

@ibuziuk
Member Author

ibuziuk commented Feb 22, 2019

> What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards?

I believe everything could be rendered entirely via grafana, but the goal of a status page is to be user-friendly and to make it easy to post updates, notify users, create incidents, schedule maintenance, etc.
So, grafana and a status page are two different beasts.

@fche

fche commented Feb 22, 2019

Could we think about it as the public status-page being downstream of our internal status dashboards & machinery? i.e., not tightly coupled to che, but rather to a hypothetical dev-console health dashboard?

@ibuziuk
Member Author

ibuziuk commented Feb 22, 2019

IMO, che.openshift.io is a very special case, not tightly related to the SaaS, which deserves its own status page

@fche

fche commented Feb 22, 2019

Understood, just trying to minimize number of bits of machinery and maximize reusability. Maybe think of it more like - a running copy of che should have its own health display for benefit of each of its users. Can the public dashboard be another consumer of that same data & maybe even some of the same renderings?

@ibuziuk
Member Author

ibuziuk commented Feb 22, 2019

well, potentially it could, but ideally the status page should be deployed separately from the monitored service - if the service is down, the status page should still be up with the reported incident (if the status page is part of the service itself, it would go down together with the service during an incident / scheduled maintenance)

@fche

fche commented Feb 22, 2019

Yup, kind of like a reliable mirror.

@fche

fche commented Feb 26, 2019

As a prototype, before we do a full proper operator / openshift4 / prometheus flavoured thing, we could perhaps layer a small piece of new code on top of the existing osd-monitor-poc pcp-based infrastructure, to relay metric threshold crossing events to statuspage.io. We'd need to know a sample metric name and threshold predicate, and statuspage.io api credentials.
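A rough sketch of what such a relay could look like, to make the idea concrete. Everything here is an assumption for illustration: the function names are hypothetical, the sample data is made up, and the endpoint shape (`PATCH /v1/pages/{page_id}/components/{component_id}` with an `Authorization: OAuth <key>` header) reflects my understanding of the statuspage.io REST API and should be checked against its current documentation. The request is only built, never sent, and nothing here touches the actual osd-monitor-poc code.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

Sample = Tuple[float, float]  # (timestamp, metric value)


def crossing_events(
    samples: Iterable[Sample],
    predicate: Callable[[float], bool],
) -> Iterator[Tuple[float, bool]]:
    """Yield (timestamp, breached) only when the predicate result flips.

    Assumption: the metric starts out healthy (not breached).
    """
    breached = False
    for ts, value in samples:
        now = predicate(value)
        if now != breached:
            breached = now
            yield ts, now


def build_component_update(
    page_id: str, component_id: str, api_key: str, breached: bool
) -> urllib.request.Request:
    """Build (but do not send) a component status update request.

    The URL, header, and status values are assumptions about the
    statuspage.io API, not something confirmed in this thread.
    """
    status = "major_outage" if breached else "operational"
    body = json.dumps({"component": {"status": status}}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.statuspage.io/v1/pages/{page_id}/components/{component_id}",
        data=body,
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

The relay would then iterate over `crossing_events(samples, lambda v: v > threshold)` and pass each built request to `urllib.request.urlopen`. Emitting only on state changes keeps the number of API calls small, which matters if the plan limits what can be pushed.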

@ibuziuk
Member Author

ibuziuk commented Feb 26, 2019

@fche will you be able to give a hand with implementing the push part in the next sprint? (First we need to figure out which metrics we are going to push - the hobby plan offers only 2 system metrics, so we need to be picky.)

@fche

fche commented Feb 26, 2019

Can indeed help with a quick prototype, presuming building on the present osd-monitor-poc machinery, not major new stuff. It's about as complicated as adding a new outbound zabbix relay.

@ibuziuk
Member Author

ibuziuk commented Feb 26, 2019

Sounds good, I will reach out once I have more details about the params for the statuspage API

@ibuziuk
Member Author

ibuziuk commented Jul 30, 2019

Closing this epic since https://che.statuspage.io/ is set up and we have a separate issue for contributing system metrics to statuspage (which is currently not a priority) - #1286

@ibuziuk ibuziuk closed this as completed Jul 30, 2019