
enable shrink the socket pool size #87

Closed

Conversation


@gnawux gnawux commented Jan 26, 2018

We found that mgo grows the socket pool during burst traffic but never
closes the sockets until the client or the server is restarted.

The MongoDB documentation defines two related connection-string options:

- [minPoolSize](https://docs.mongodb.com/manual/reference/connection-string/#urioption.minPoolSize)
- [maxIdleTimeMS](https://docs.mongodb.com/manual/reference/connection-string/#urioption.maxIdleTimeMS)

By implementing these two options, mgo can shrink the pool back to
minPoolSize once the sockets created by burst traffic have been idle for
longer than maxIdleTimeMS.
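For illustration, a minimal sketch of dialing with both options in the
connection string once this change is in; the import path and the option
values here are placeholder assumptions, not part of this PR:

package main

import (
	"log"

	"github.com/globalsign/mgo"
)

func main() {
	// Keep at least 3 sockets in the pool; sockets idle for more than
	// 30 seconds are reclaimed, shrinking back towards minPoolSize.
	url := "mongodb://localhost:27017/test?minPoolSize=3&maxIdleTimeMS=30000"
	session, err := mgo.Dial(url)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
}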

The idea comes from https://github.com/JodeZer/mgo , who investigated
this issue and provided the initial commits.

I found there were still some issues in socket maintenance, and opened a PR
against his repo, JodeZer#1 .

This commit includes JodeZer's commits and my fix, and I simplified the
data structure. What's in this commit can be described by this figure:

+------------------------+
|        Session         | <-------+ Add options here
+------------------------+

+------------------------+
|        Cluster         | <-------+ Add options here
+------------------------+

+------------------------+
|        Server          | <-------+*Add options here
|                        |          *add timestamp when recycle a socket  +---+
|          +-----------+ |    +---+ *periodically check the unused sockets    |
|          | shrinker  <------+          and reclaim the timeout sockets. +---+
|          +-----------+ |                                                    |
|                        |                                                    |
+------------------------+                                                    |
                                                                              |
+------------------------+                                                    |
|        Socket          | <-------+ Add a field for last used times+---------+
+------------------------+

Signed-off-by: Wang Xu <gnawux@gmail.com>
gnawux commented Jan 26, 2018

In the testing matrix, Go 1.9/MongoDB 3.4 failed, and I can't see how that
relates to this change. Could anyone help figure out what happened?


domodwyer commented Jan 28, 2018

Hi @gnawux

Great PR, and I love a good diagram, thanks! I'm going to take a close look
at this as it could impact performance, though I suppose only under
decreasing load, so it might not be an issue.

As for the tests: they're just flaky. I restarted the build and it passed.

Thank you for taking the time to open a PR! It's really appreciated.

Dom

@domodwyer

Is it possible to include a couple of unit and/or integration tests to cover the "shrinking" code path?

gnawux commented Jan 29, 2018

Sure, I'd like to, but it may take some time :)

By the way, do you have any suggested docs or examples for mgo test cases
that I could reference?

@domodwyer

Hi @gnawux

That would be great, thanks! The code looks fine; the tests are mostly for
future maintainability, which I suppose is not such a problem with this
change.

Either way, a basic example is TestPing() - your test case (sketched after
this list) will want to:

  • Dial to mongo with shrinking enabled
  • Copy() a few sessions to create new connections
  • Close() all copied sessions
  • Wait/force a shrink
  • Check Stats.SocketsAlive to ensure the pool has shrunk to minPoolSize
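A rough sketch of such a test in the style of mgo's existing gocheck-based
suite; the option values, timings, and harness details (suite type S, test
port 40001, stats enabled via mgo.SetStats(true) in suite setup) are
assumptions modelled on the current tests, not merged code:

func (s *S) TestPoolShrink(c *C) {
	// Dial with shrinking enabled: keep one socket, reclaim after ~1s idle.
	session, err := mgo.Dial("localhost:40001?minPoolSize=1&maxIdleTimeMS=1000")
	c.Assert(err, IsNil)
	defer session.Close()

	// Copy() a few sessions and ping so each acquires its own socket.
	copies := make([]*mgo.Session, 5)
	for i := range copies {
		copies[i] = session.Copy()
		c.Assert(copies[i].Ping(), IsNil)
	}

	// Close() all copied sessions; their sockets return to the pool unused.
	for _, cp := range copies {
		cp.Close()
	}

	// Wait for the idle timeout to pass and the shrinker to run.
	time.Sleep(3 * time.Second)

	// The pool should have shrunk back to minPoolSize.
	stats := mgo.GetStats()
	c.Assert(stats.SocketsAlive, Equals, 1)
}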

I can help if you need! Thanks again for taking the time to help 👍

Dom

@@ -419,7 +421,7 @@ func (cluster *mongoCluster) server(addr string, tcpaddr *net.TCPAddr) *mongoSer
 	if server != nil {
 		return server
 	}
-	return newServer(addr, tcpaddr, cluster.sync, cluster.dial)
+	return newServer(addr, tcpaddr, cluster.sync, cluster.dial, cluster.minPoolSize, cluster.maxIdleTimeMS)
@szank commented:

It's a bit late, and I might be wrong here, but aren't you reading the
cluster fields (minPoolSize, maxIdleTimeMS) outside of the read lock?

I assume they stay constant during the run of the application, but still I
would appreciate a comment here stating that this is safe.

gnawux (author) replied:

Will do, thanks @szank.
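For instance, a comment along these lines at the call site (a sketch; the
exact wording was still to be written at this point in the review):

// minPoolSize and maxIdleTimeMS are set once when the cluster is created
// and never mutated afterwards, so reading them without holding the read
// lock here is safe.
return newServer(addr, tcpaddr, cluster.sync, cluster.dial, cluster.minPoolSize, cluster.maxIdleTimeMS)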

@@ -366,6 +377,16 @@ func ParseURL(url string) (*DialInfo, error) {
 			doc = append(doc, bson.DocElem{Name: strings.TrimSpace(kvp[0]), Value: strings.TrimSpace(kvp[1])})
 		}
 		readPreferenceTagSets = append(readPreferenceTagSets, doc)
+	case "minPoolSize":
+		minPoolSize, err = strconv.Atoi(opt.value)
+		if err != nil {

Could you please return an error if the value is < 0?
This is unlikely to happen, but it is still invalid input, and it would be
better to notify the user instead of silently ignoring the value.

gnawux (author) replied:

Will update.
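A sketch of the suggested validation inside ParseURL's option switch; the
error wording here is an assumption, not quoted from the final commit:

case "minPoolSize":
	minPoolSize, err = strconv.Atoi(opt.value)
	if err != nil {
		return nil, errors.New("bad value for minPoolSize: " + opt.value)
	}
	if minPoolSize < 0 {
		return nil, errors.New("bad value (negative) for minPoolSize: " + opt.value)
	}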

@@ -552,6 +583,15 @@ func DialWithInfo(info *DialInfo) (*Session, error) {
 	if info.PoolLimit > 0 {
 		session.poolLimit = info.PoolLimit
 	}
+
+	if info.MinPoolSize > 0 {

I think these checks should be reversed to return an error if MinPoolSize
or MaxIdleTimeMS is < 0.
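That is, guards along these lines at the top of DialWithInfo (a sketch of
the reviewer's suggestion, not the merged code):

if info.MinPoolSize < 0 {
	return nil, errors.New("invalid DialInfo: MinPoolSize must not be negative")
}
if info.MaxIdleTimeMS < 0 {
	return nil, errors.New("invalid DialInfo: MaxIdleTimeMS must not be negative")
}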

	}
	now := time.Now()
	end := 0
	reclaimMap := map[*mongoSocket]bool{}

It is a minor nitpick, but map[*mongoSocket]struct{}{} is more idiomatic
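For reference, the empty-struct set idiom being suggested (a sketch; s
stands for a *mongoSocket as in the surrounding diff):

reclaimSet := map[*mongoSocket]struct{}{}
reclaimSet[s] = struct{}{}        // mark s for reclamation; struct{} uses no space
if _, ok := reclaimSet[s]; ok {
	// s has been marked
}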

	copy(next, server.unusedSockets[end:])
	server.unusedSockets = next
	remainSockets := []*mongoSocket{}
	for _, s := range server.liveSockets {

It is late, and I might be missing something here, but it looks like you
populate the reclaimMap with the unused sockets, then check whether
liveSockets contains elements from unusedSockets. Is it possible for
liveSockets and unusedSockets to contain pointers to the same structs?

gnawux (author) replied:

If I understand correctly, liveSockets includes both the in-use sockets and
the unusedSockets, so when we reclaim the timed-out elements of
unusedSockets they should be removed from liveSockets as well. Not removing
them would lead to a "leakage": when you try to allocate a new socket, the
pool appears full of live sockets yet has no unused ones, so no socket can
be allocated at all. I observed this case during my local testing.
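In other words, the removal step in the diff above keeps only the live
sockets that were not reclaimed (a sketch of that logic, reusing the
reclaimMap from the diff):

remainSockets := []*mongoSocket{}
for _, s := range server.liveSockets {
	if !reclaimMap[s] { // not reclaimed from unusedSockets; keep it alive
		remainSockets = append(remainSockets, s)
	}
}
server.liveSockets = remainSockets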

The reviewer replied:

Right, thanks for the clarification :).
