
enable shrink the socket pool size #116

Merged

merged 2 commits into development from release_idle_for_upstream on Mar 1, 2018

Conversation

@gnawux commented Feb 27, 2018

This is an updated version of #87, changes:

  • rebased onto the new development branch
  • updated based on the code review
  • added test cases

Below is the original introduction:


We found that mgo grows the socket pool during burst traffic but never closes
the sockets again until the client or the server is restarted.

The MongoDB documentation defines two related connection-string options:

- [minPoolSize](https://docs.mongodb.com/manual/reference/connection-string/#urioption.minPoolSize)
- [maxIdleTimeMS](https://docs.mongodb.com/manual/reference/connection-string/#urioption.maxIdleTimeMS)

By implementing these two options, the pool can shrink back to minPoolSize once
the extra sockets created by burst traffic have been idle for longer than
maxIdleTimeMS.

The idea comes from https://github.com/JodeZer/mgo ; he investigated this issue
and provided the initial commits.

I found there were still some issues in the socket maintenance and opened a PR
against his repo, JodeZer#1.

This commit includes JodeZer's commits and my fix, and I simplified the data
structure. The changes can be summarized by this figure:

+------------------------+
|        Session         | <-------+ Add options here
+------------------------+

+------------------------+
|        Cluster         | <-------+ Add options here
+------------------------+

+------------------------+
|        Server          | <-------+*Add options here
|                        |          *add timestamp when recycle a socket  +---+
|          +-----------+ |    +---+ *periodically check the unused sockets    |
|          | shrinker  <------+          and reclaim the timeout sockets. +---+
|          +-----------+ |                                                    |
|                        |                                                    |
+------------------------+                                                    |
                                                                              |
+------------------------+                                                    |
|        Socket          | <-------+ Add a field for last used times+---------+
+------------------------+
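
For illustration, here is a minimal Go sketch of the mechanism in the figure.
The type, field, and function names (`unusedSockets`, `liveSockets`,
`lastTimeUsed`, `shrink`, `shrinker`) are illustrative and do not necessarily
match the actual mgo internals; the sketch only shows the idea: stamp a socket
when it is recycled, and periodically reclaim sockets idle longer than
maxIdleTimeMS while keeping at least minPoolSize connections.

```go
package pool

import (
	"sync"
	"time"
)

type mongoSocket struct {
	lastTimeUsed time.Time // stamped when the socket is recycled into the pool
}

func (s *mongoSocket) close() { /* release the underlying connection */ }

type server struct {
	sync.Mutex
	unusedSockets []*mongoSocket
	liveSockets   int // total sockets owned by this server (in use + unused)
	minPoolSize   int
	maxIdleTimeMS int
}

// shrink reclaims unused sockets that have been idle for longer than
// maxIdleTimeMS, but never lets the pool drop below minPoolSize.
func (s *server) shrink() {
	if s.maxIdleTimeMS == 0 {
		return // 0 means connections are never closed due to inactivity
	}
	maxIdle := time.Duration(s.maxIdleTimeMS) * time.Millisecond
	now := time.Now()

	s.Lock()
	var keep, reclaim []*mongoSocket
	for _, sock := range s.unusedSockets {
		if now.Sub(sock.lastTimeUsed) > maxIdle && s.liveSockets-len(reclaim) > s.minPoolSize {
			reclaim = append(reclaim, sock)
		} else {
			keep = append(keep, sock)
		}
	}
	s.unusedSockets = keep
	s.liveSockets -= len(reclaim)
	s.Unlock()

	// Close the reclaimed sockets outside the lock.
	for _, sock := range reclaim {
		sock.close()
	}
}

// shrinker is the background goroutine from the figure: it wakes up on every
// tick and reclaims the timed-out sockets.
func (s *server) shrinker(tick time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(tick)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			s.shrink()
		case <-stop:
			return
		}
	}
}
```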

@gnawux force-pushed the release_idle_for_upstream branch 5 times, most recently from 2930169 to dd9c654 on February 27, 2018 07:37
@gnawux (Author) commented Feb 27, 2018

@szank @domodwyer I updated #87 based on your review and added tests here.

szank previously approved these changes Feb 27, 2018

@szank left a comment

Hi,
Looks good. There are a few minor typos, but I will approve it anyway.
Of course you are more than welcome to fix them, but otherwise it is solid.
Approved.

session_test.go Outdated
}
wg.Wait()
stats := mgo.GetStats()
c.Logf("living socket: After queries: %d, before queris: %d", stats.SocketsAlive, oldSocket)
queris: typo.

session_test.go Outdated
c.Logf("living socket: After queries: %d, before queris: %d", stats.SocketsAlive, oldSocket)

// give some time for shrink the pool, the tick is set to 1 minute
c.Log("Sleeping... 1 minutes to for pool shrinking")
Sorry for nitpicking, but it should be "1 minute" here, and time.Sleep(60*time.Second) below.

server.go Outdated
now := time.Now()
end := 0
reclaimMap := map[*mongoSocket]struct{}{}
// Because the acquirision and recycle are done at the tail of array,
acquisition

session.go Outdated
// maxIdleTimeMS=<millisecond>
//
// The maximum number of milliseconds that a connection can remain idle in the pool
// before being removed and closed.
Could you please add a clarification:
If maxIdleTimeMS is 0, connections will never be closed due to inactivity.
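
For context, a hypothetical usage sketch of these connection-string options.
It assumes the option parsing added by this PR; the host and the values are
made up.

```go
package main

import "github.com/globalsign/mgo"

func main() {
	// Hypothetical usage: minPoolSize and maxIdleTimeMS passed as
	// connection-string options, as this PR implements. The host and the
	// values are illustrative.
	session, err := mgo.Dial("mongodb://localhost:27017/test?minPoolSize=10&maxIdleTimeMS=30000")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// With maxIdleTimeMS at 0 (or unset), connections are never closed due
	// to inactivity, which matches the behaviour before this change.
}
```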

@gnawux (Author) commented Feb 27, 2018

@szank Thanks for your comments, updated.

@gnawux (Author) commented Feb 27, 2018

Looks like the Travis failure is not related to this patch.

@KJTsanaktsidis commented Feb 27, 2018

👍 nice work! LGTM.


func (s *S) TestPoolShrink(c *C) {
	if *fast {
		c.Skip("-fast")

Thanks for this!

@domodwyer

Hi @gnawux

This looks great, I'll get the build to pass with a bit of retry fun and then we'll get this merged.

Thanks so much for taking the time to help!

Dom

@gnawux (Author) commented Mar 1, 2018

It's my pleasure. As I mentioned in the commit message, the credit for the original investigation goes to @JodeZer; I just did some tests and improvements.

@domodwyer merged commit 860240e into globalsign:development on Mar 1, 2018
domodwyer added a commit that referenced this pull request Apr 19, 2018
#116 adds a much needed ability to shrink the connection pool, but requires
tracking the last-used timestamp for each socket after every operation. Frequent
calls to time.Now() in the hot-path reduced read throughput by ~6% and increased
the latency (and variance) of socket operations as a whole.

This PR adds a periodically updated time value to amortise the cost of the last-
used bookkeeping, restoring the original throughput at the cost of approximate
last-used values (configured to be ~25ms of potential skew).

On some systems (currently including FreeBSD) querying the time counter also
requires a syscall/context switch.

Fixes #142.
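
A hedged sketch of that amortisation idea (the type and function names below
are illustrative, not the actual mgo API): a background goroutine refreshes a
cached timestamp every ~25ms, and the socket hot path reads the cached value
instead of calling time.Now() on every operation.

```go
package coarsetime

import (
	"sync/atomic"
	"time"
)

// clock caches a recent time.Time so that hot paths can read an approximate
// "now" without the cost of time.Now() on every operation.
type clock struct {
	now atomic.Value // holds a time.Time
}

// newClock starts a background goroutine that refreshes the cached value on
// every interval (e.g. ~25ms), bounding the skew of the last-used bookkeeping.
func newClock(interval time.Duration) *clock {
	c := &clock{}
	c.now.Store(time.Now())
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for range ticker.C {
			c.now.Store(time.Now())
		}
	}()
	return c
}

// Now returns the cached timestamp; it may lag real time by up to interval.
func (c *clock) Now() time.Time {
	return c.now.Load().(time.Time)
}
```
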
@domodwyer mentioned this pull request Apr 23, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
* enable shrink the socket pool size

* tests for shrink the socks pool

libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022