Support marking APIs as healthy/unhealthy #812

amdprophet · 2019-02-25T23:48:18Z

Description

Adds the ability to mark an API as healthy/unhealthy. Uchiwa will only make requests to APIs marked as healthy. APIs marked as unhealthy will be watched be a goroutine and marked as healthy once they respond with an HTTP 2xx error.

Related Issue

Fixes #811.

Motivation and Context

See #811.

How Has This Been Tested?

I've tested with the following configuration:

{
  "sensu": [
    {
      "name": "Site 1",
      "host": "127.0.0.1",
      "port": 4567
    },
    {
      "name": "Site 2",
      "host": "127.0.0.1",
      "port": 4567
    },
    {
      "name": "Site 2",
      "host": "10.0.1.11",
      "port": 4567
    }
  ],
  "uchiwa": {
    "host": "0.0.0.0",
    "port": 3000,
      "refresh": 5,
      "loglevel": "warn"
  }
}

I've tested Uchiwa with one or both APIs down, one API up and one API down, and then starting the APIs that were down to see APIs marked as healthy.

Screenshots (if appropriate):

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>

amdprophet · 2019-02-25T23:51:46Z

@cwjohnston had mentioned that we should mark the API as unhealthy if the stats from GET /info show an unhealthy state. I'm not sure how we can do this as the JSON deserialization and call to determineHealth() happens far after we call GET /info. @palourde do you have any ideas on how we can implement this without deserializing twice?

codecov-io · 2019-02-26T00:31:36Z

Codecov Report

Merging #812 into master will decrease coverage by 0.13%.
The diff coverage is 4.34%.

@@            Coverage Diff             @@
##           master     #812      +/-   ##
==========================================
- Coverage   20.93%   20.79%   -0.14%     
==========================================
  Files          28       28              
  Lines        2532     2553      +21     
==========================================
+ Hits          530      531       +1     
- Misses       1937     1957      +20     
  Partials       65       65

Impacted Files	Coverage Δ
uchiwa/daemon/subscriptions.go	`83.33% <0%> (ø)`	⬆️
uchiwa/daemon/helpers.go	`22.22% <0%> (ø)`	⬆️
uchiwa/daemon/daemon.go	`5.86% <0%> (-0.44%)`	⬇️
uchiwa/main.go	`44.64% <100%> (+1%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f83c003...79440f3. Read the comment docs.

palourde · 2019-02-26T15:33:34Z

@amdprophet So determineHealth() is called before checkAPIHealth() right? Therefore, before we return here, maybe we could just mark the API as unhealthly?

uchiwa/uchiwa/daemon/daemon.go

Lines 234 to 241 in 79440f3

    
           if d.snapshot.Info != nil { 
        
           	if !d.snapshot.Info.Redis.Connected { 
        
           		return structs.SensuHealth{Output: "Not connected to Redis", Status: 1} 
        
           	} 
        
           	if !d.snapshot.Info.Transport.Connected && d.snapshot.Info != nil { 
        
           		return structs.SensuHealth{Output: "Not connected to the transport", Status: 1} 
        
           	} 
        
           }

amdprophet · 2019-02-26T16:43:44Z

@palourde the problem is trying to determine which API is unhealthy but I just realized that /info would mean the whole datacenter is unhealthy. Perhaps we shouldn't mark APIs as unhealthy based on the contents of the /info response else we should mark all APIs within the DC as unhealthy.

cwjohnston · 2019-02-26T16:52:16Z

I just realized that /info would mean the whole datacenter is unhealthy

@amdprophet can you elaborate on this?

In my experience the /info API will return a HTTP 500 (internal server error) when that particular API host is disconnected from either Redis or RabbitMQ. Because /info only returns information about the API host being queried, I don’t think we can reasonably consider a whole datacenter down based on a a single API host’s state.

On the other hand, the /health endpoint could be used to determine the health of a whole datacenter, as the combination of queue depth and number of consumers reflects the overall state of the datacenter. That said, I think that reporting on /health results is outside the scope of what we’re trying to do here.

palourde · 2019-02-26T17:02:14Z

At this point, like Cameron mentioned, I feel like it's a different requirement and probably a different solution too.

The health of Sensu itself (number of consumers, quantity of messages, etc.) shouldn't impact how Uchiwa deal with its datacenter APIs. However, we could definitely improve how Uchiwa reports the health of a datacenter with these metrics.

amdprophet · 2019-02-26T17:12:29Z

@cwjohnston I was confusing the functionality of /info with /health and I agree with what has been said.

I think this PR is in a state where it can be reviewed and merged if everything looks good.

palourde

amdprophet added 3 commits February 12, 2019 15:36

only fetch from healthy apis

022ef68

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>

use goroutines to monitor unhealthy apis

14c3086

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>

some small bug fixes

79440f3

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>

amdprophet requested a review from palourde February 25, 2019 23:48

amdprophet added ready-for-review in progress and removed in progress labels Feb 25, 2019

amdprophet changed the title ~~[WIP] Support marking APIs as healthy/unhealthy~~ Support marking APIs as healthy/unhealthy Feb 26, 2019

palourde approved these changes Feb 26, 2019

View reviewed changes

amdprophet merged commit d76dc26 into master Feb 26, 2019

amdprophet deleted the health-daemon branch February 26, 2019 21:00

amdprophet removed in progress ready-for-review labels Feb 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support marking APIs as healthy/unhealthy #812

Support marking APIs as healthy/unhealthy #812

amdprophet commented Feb 25, 2019

amdprophet commented Feb 25, 2019

codecov-io commented Feb 26, 2019 •

edited

Loading

palourde commented Feb 26, 2019

amdprophet commented Feb 26, 2019

cwjohnston commented Feb 26, 2019

palourde commented Feb 26, 2019

amdprophet commented Feb 26, 2019

palourde left a comment

Support marking APIs as healthy/unhealthy #812

Support marking APIs as healthy/unhealthy #812

Conversation

amdprophet commented Feb 25, 2019

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

Checklist:

amdprophet commented Feb 25, 2019

codecov-io commented Feb 26, 2019 • edited Loading

Codecov Report

palourde commented Feb 26, 2019

amdprophet commented Feb 26, 2019

cwjohnston commented Feb 26, 2019

palourde commented Feb 26, 2019

amdprophet commented Feb 26, 2019

palourde left a comment

Choose a reason for hiding this comment

codecov-io commented Feb 26, 2019 •

edited

Loading