
enable shrink the socket pool size #116

Merged

merged 2 commits into development from release_idle_for_upstream on Mar 1, 2018

Conversation

@gnawux commented Feb 27, 2018

This is an updated version of #87, changes:

  • rebased onto the new development branch
  • updated based on the code review
  • added test cases

Below is the original introduction:


We found that mgo grows the socket pool during burst traffic but never closes
the sockets again until the client or the server is restarted.

The MongoDB documentation defines two related connection-string options:

- [minPoolSize](https://docs.mongodb.com/manual/reference/connection-string/#urioption.minPoolSize)
- [maxIdleTimeMS](https://docs.mongodb.com/manual/reference/connection-string/#urioption.maxIdleTimeMS)

By implementing these two options, the pool can shrink back to minPoolSize once
the extra sockets created by burst traffic have been idle for longer than
maxIdleTimeMS.

The idea comes from https://github.com/JodeZer/mgo ; he investigated this issue
and provided the initial commits.

I found there were still some issues in the socket maintenance and opened a PR
against his repo, JodeZer#1.

This commit includes JodeZer's commits and my fix, and I simplified the data
structure. The changes can be summarized by this figure:

+------------------------+
|        Session         | <-------+ Add options here
+------------------------+

+------------------------+
|        Cluster         | <-------+ Add options here
+------------------------+

+------------------------+
|        Server          | <-------+*Add options here
|                        |          *add timestamp when recycle a socket  +---+
|          +-----------+ |    +---+ *periodically check the unused sockets    |
|          | shrinker  <------+          and reclaim the timeout sockets. +---+
|          +-----------+ |                                                    |
|                        |                                                    |
+------------------------+                                                    |
                                                                              |
+------------------------+                                                    |
|        Socket          | <-------+ Add a field for last used times+---------+
+------------------------+
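
For illustration, here is a minimal Go sketch of the mechanism in the figure.
The type, field, and function names (`unusedSockets`, `liveSockets`,
`lastTimeUsed`, `shrink`, `shrinker`) are illustrative and do not necessarily
match the actual mgo internals; the sketch only shows the idea: stamp a socket
when it is recycled, and periodically reclaim sockets idle longer than
maxIdleTimeMS while keeping at least minPoolSize connections.

```go
package pool

import (
	"sync"
	"time"
)

type mongoSocket struct {
	lastTimeUsed time.Time // stamped when the socket is recycled into the pool
}

func (s *mongoSocket) close() { /* release the underlying connection */ }

type server struct {
	sync.Mutex
	unusedSockets []*mongoSocket
	liveSockets   int // total sockets owned by this server (in use + unused)
	minPoolSize   int
	maxIdleTimeMS int
}

// shrink reclaims unused sockets that have been idle for longer than
// maxIdleTimeMS, but never lets the pool drop below minPoolSize.
func (s *server) shrink() {
	if s.maxIdleTimeMS == 0 {
		return // 0 means connections are never closed due to inactivity
	}
	maxIdle := time.Duration(s.maxIdleTimeMS) * time.Millisecond
	now := time.Now()

	s.Lock()
	var keep, reclaim []*mongoSocket
	for _, sock := range s.unusedSockets {
		if now.Sub(sock.lastTimeUsed) > maxIdle && s.liveSockets-len(reclaim) > s.minPoolSize {
			reclaim = append(reclaim, sock)
		} else {
			keep = append(keep, sock)
		}
	}
	s.unusedSockets = keep
	s.liveSockets -= len(reclaim)
	s.Unlock()

	// Close the reclaimed sockets outside the lock.
	for _, sock := range reclaim {
		sock.close()
	}
}

// shrinker is the background goroutine from the figure: it wakes up on every
// tick and reclaims the timed-out sockets.
func (s *server) shrinker(tick time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(tick)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			s.shrink()
		case <-stop:
			return
		}
	}
}
```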

@gnawux force-pushed the release_idle_for_upstream branch 5 times, most recently from 2930169 to dd9c654 on February 27, 2018 07:37
@gnawux (Author) commented Feb 27, 2018

@szank @domodwyer I updated #87 based on your review and added tests here.

szank previously approved these changes Feb 27, 2018

@szank left a comment

Hi,
Looks good. There are a few minor typos, but I will approve it anyway.
Of course you are more than welcome to fix them, but otherwise it is solid.
Approved.

session_test.go Outdated
}
wg.Wait()
stats := mgo.GetStats()
c.Logf("living socket: After queries: %d, before queris: %d", stats.SocketsAlive, oldSocket)
queris: typo.

session_test.go Outdated
c.Logf("living socket: After queries: %d, before queris: %d", stats.SocketsAlive, oldSocket)

// give some time for shrink the pool, the tick is set to 1 minute
c.Log("Sleeping... 1 minutes to for pool shrinking")
Sorry for nitpicking, but it should be "1 minute" here, and time.Sleep(60*time.Second) below.

server.go Outdated
now := time.Now()
end := 0
reclaimMap := map[*mongoSocket]struct{}{}
// Because the acquirision and recycle are done at the tail of array,
acquisition

session.go Outdated
// maxIdleTimeMS=<millisecond>
//
// The maximum number of milliseconds that a connection can remain idle in the pool
// before being removed and closed.
Could you please add a clarification:
If maxIdleTimeMS is 0, connections will never be closed due to inactivity.
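
For context, a hypothetical usage sketch of these connection-string options.
It assumes the option parsing added by this PR; the host and the values are
made up.

```go
package main

import "github.com/globalsign/mgo"

func main() {
	// Hypothetical usage: minPoolSize and maxIdleTimeMS passed as
	// connection-string options, as this PR implements. The host and the
	// values are illustrative.
	session, err := mgo.Dial("mongodb://localhost:27017/test?minPoolSize=10&maxIdleTimeMS=30000")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// With maxIdleTimeMS at 0 (or unset), connections are never closed due
	// to inactivity, which matches the behaviour before this change.
}
```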

@gnawux (Author) commented Feb 27, 2018

@szank Thanks for your comments, updated.

@gnawux (Author) commented Feb 27, 2018

Looks like the Travis failure is not related to this patch.

@KJTsanaktsidis commented Feb 27, 2018

👍 nice work! LGTM.


func (s *S) TestPoolShrink(c *C) {
	if *fast {
		c.Skip("-fast")

Thanks for this!

@domodwyer

Hi @gnawux

This looks great, I'll get the build to pass with a bit of retry fun and then we'll get this merged.

Thanks so much for taking the time to help!

Dom

@gnawux (Author) commented Mar 1, 2018

It's my pleasure. As I mentioned in the commit message, the credit for the original investigation goes to @JodeZer; I just did some tests and improvements.

@domodwyer merged commit 860240e into globalsign:development on Mar 1, 2018
domodwyer added a commit that referenced this pull request Apr 19, 2018
#116 adds a much needed ability to shrink the connection pool, but requires
tracking the last-used timestamp for each socket after every operation. Frequent
calls to time.Now() in the hot-path reduced read throughput by ~6% and increased
the latency (and variance) of socket operations as a whole.

This PR adds a periodically updated time value to amortise the cost of the last-
used bookkeeping, restoring the original throughput at the cost of approximate
last-used values (configured to be ~25ms of potential skew).

On some systems (currently including FreeBSD) querying the time counter also
requires a syscall/context switch.

Fixes #142.
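
A hedged sketch of that amortisation idea (the type and function names below
are illustrative, not the actual mgo API): a background goroutine refreshes a
cached timestamp every ~25ms, and the socket hot path reads the cached value
instead of calling time.Now() on every operation.

```go
package coarsetime

import (
	"sync/atomic"
	"time"
)

// clock caches a recent time.Time so that hot paths can read an approximate
// "now" without the cost of time.Now() on every operation.
type clock struct {
	now atomic.Value // holds a time.Time
}

// newClock starts a background goroutine that refreshes the cached value on
// every interval (e.g. ~25ms), bounding the skew of the last-used bookkeeping.
func newClock(interval time.Duration) *clock {
	c := &clock{}
	c.now.Store(time.Now())
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for range ticker.C {
			c.now.Store(time.Now())
		}
	}()
	return c
}

// Now returns the cached timestamp; it may lag real time by up to interval.
func (c *clock) Now() time.Time {
	return c.now.Load().(time.Time)
}
```
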
@domodwyer mentioned this pull request Apr 23, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
* enable shrink the socket pool size

* tests for shrink the socks pool

libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022