Skip to content
This repository has been archived by the owner on Jan 7, 2020. It is now read-only.

Add concurrency to the fetchData function #781

Merged
merged 6 commits into from
Jun 1, 2018
Merged

Conversation

amdprophet
Copy link
Member

Signed-off-by: Justin Kolberg amd.prophet@gmail.com

Description

Adds concurrency to fetchData() in daemon.go. Sensu API performance becomes the main limiting factor in Uchiwa performance with this change.

Related Issue

#750

Motivation and Context

Uchiwa becomes very slow with one or more large Sensu installations.

How Has This Been Tested?

Benchmarked against three staged Sensu datacenters:

Old:

{"timestamp":"2018-05-31T10:20:01.049587855-07:00","level":"warn","message":"Loading the configuration file config.json"}
{"timestamp":"2018-05-31T10:20:01.051740115-07:00","level":"info","message":"Updating the datacenter Site 1"}
{"timestamp":"2018-05-31T10:20:01.053135187-07:00","level":"warn","message":"Uchiwa is now listening on 0.0.0.0:3000"}
{"timestamp":"2018-05-31T10:20:16.863954073-07:00","level":"info","message":"Updating the datacenter Site 2"}
{"timestamp":"2018-05-31T10:20:36.863593601-07:00","level":"info","message":"Updating the datacenter Site 3"}
52.025680442s
{"timestamp":"2018-05-31T10:25:14.225580405-07:00","level":"info","message":"Updating the datacenter Site 1"}
{"timestamp":"2018-05-31T10:25:26.668433934-07:00","level":"info","message":"Updating the datacenter Site 2"}
{"timestamp":"2018-05-31T10:25:41.266359255-07:00","level":"info","message":"Updating the datacenter Site 3"}
38.911402966s

New:

{"timestamp":"2018-05-31T14:07:22.552640113-07:00","level":"warn","message":"Loading the configuration file config.json"}
{"timestamp":"2018-05-31T14:07:22.553484743-07:00","level":"warn","message":"Uchiwa is now listening on 0.0.0.0:3000"}
{"timestamp":"2018-05-31T14:07:22.553752527-07:00","level":"info","message":"Updating the datacenter Site 3"}
{"timestamp":"2018-05-31T14:07:22.554319529-07:00","level":"info","message":"Updating the datacenter Site 1"}
{"timestamp":"2018-05-31T14:07:22.554448697-07:00","level":"info","message":"Updating the datacenter Site 2"}
15.770854874s
{"timestamp":"2018-05-31T14:08:24.051077286-07:00","level":"info","message":"Updating the datacenter Site 3"}
{"timestamp":"2018-05-31T14:08:24.051162268-07:00","level":"info","message":"Updating the datacenter Site 1"}
{"timestamp":"2018-05-31T14:08:24.051213996-07:00","level":"info","message":"Updating the datacenter Site 2"}
14.344004765s

Screenshots (if appropriate): N/A

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
@amdprophet amdprophet force-pushed the fetchData-concurrency branch from 978f2c0 to 043a62b Compare June 1, 2018 00:08
@codecov-io
Copy link

codecov-io commented Jun 1, 2018

Codecov Report

Merging #781 into master will decrease coverage by 0.67%.
The diff coverage is 1.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #781      +/-   ##
==========================================
- Coverage   22.27%   21.59%   -0.68%     
==========================================
  Files          28       28              
  Lines        2375     2454      +79     
==========================================
+ Hits          529      530       +1     
- Misses       1781     1859      +78     
  Partials       65       65
Impacted Files Coverage Δ
uchiwa/daemon/datacenters.go 0% <0%> (ø) ⬆️
uchiwa/daemon/daemon.go 8.25% <1.26%> (-4.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1d0a7d...d406365. Read the comment docs.

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
Copy link

@echlebek echlebek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from an online-discussion I had with Justin:

  • Test coverage for the daemon package is quite low, so these types of changes will need cautious manual testing.

  • While there is room for improvement in the implementation, it would require substantial refactoring of not only this PR but likely also parts of Uchiwa.

  • Uchiwa is in maintenance mode, so it's not clear that there would be a good cost/benefit ratio to doing this work.

  • The fetcher methods need to be mutex locked for their duration (excluding the GetFoo methods at the top of each fetcher)

  • The PR should be tested manually with the race detector enabled, to try to tease out any concurrency bugs that might exist.

  • While channels could be employed here, the centralized state inherent in the design would limit their usefulness.

go d.fetchAggregates()

if f.enterprise {
go d.fetchEnterpriseMetrics()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably requires a wg.Add(1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit.

}

// fetch sensu data from the datacenter
wg.Add(8)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be wg.Add(7)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit.

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
Copy link

@echlebek echlebek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do defer mu.Unlock() immediately after locking mutexes, to make sure they are unlocked properly no matter what happens in the function afterwards.

Do this for wg.Done() as well. Even if the code is currently correct, defer tends to harden the code against future breakage, if a maintainer adds another return point to the function.


func (d *DatacenterSnapshotFetcher) fetchSilenced() {
silenced, err := d.datacenter.GetSilenced()
d.mutex.Lock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better pattern is

d.mutex.Lock()
defer d.mutex.Unlock()

This ensures that the mutex is always unlocked.

}

d.mutex.Unlock()
d.wg.Done()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer d.wg.Done() at the top of the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed - and also made use of the mutex in fetchEvents().

Signed-off-by: Justin Kolberg <amd.prophet@gmail.com>
@amdprophet amdprophet merged commit 815ad56 into master Jun 1, 2018
@amdprophet amdprophet deleted the fetchData-concurrency branch June 1, 2018 23:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants