Intermittent error "This socket has been ended by the other party" #144
Same issue with another Bolt driver (this time the Java driver). |
Just saw the same symptoms with the JavaScript Bolt driver. Using version 1.1.0-M02, the query fails with this error:
|
Is anyone maintaining this driver? There's been no post on this issue for more than a month. |
@viz I had the same issue. I checked out the latest version of the driver from GitHub and so far so good. I'll update you if it starts again. |
No, never mind, it's still happening. |
Hi, sorry for the long wait. @vpsouza, could you share the Docker image that you are using and give us some instructions on how to reproduce this error? |
@zhenlineo I'm using HAProxy in the middle, configured much as described in the HAProxy docs (something really close to this). This is the stack trace, based on commit 56c7995:
|
Hi @dozoisch, does HAProxy kill your connection after the 30s timeout? (I'm not sure whether that HAProxy setting is an idle timeout or simply kills the TCP connection after 30s.) There is some more info about this in neo4j/neo4j-java-driver#213. |
@zhenlineo So "timeout connect" is the timeout for establishing the connection; the idle timeout is "timeout client" (see http://serverfault.com/a/778222/138439 for more info). I'm currently putting a retry mechanism in place, so that when this particular error happens I retry a number of times. I'll report on how that goes! |
So I'm doing something similar to this:

```javascript
// Assumes neo4j-driver 1.x; the URL and credentials are placeholders.
const neo4j = require('neo4j-driver').v1;
const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'));

const maxRetry = 4;

function doRun(cypher, params, numberOfTry) {
  const session = driver.session();
  return Promise.resolve().then(() => session.run(cypher, params))
    .then(
      results => {
        session.close();
        return results;
      },
      err => {
        session.close();
        const message = (err && err.message) || '';
        if (numberOfTry < maxRetry && message.indexOf('ended by the other party') > -1) {
          // Try again with a new session
          return doRun(cypher, params, numberOfTry + 1);
        }
        return Promise.reject(err);
      }
    );
}

exports.run = function(cypher, params) {
  return doRun(cypher, params, 0);
};
```

And it seems to be working so far. I've set the retry count higher than is probably needed, but I'm logging when it happens, so I'll be able to tweak it. It's not a perfect solution, but it does work around the issue. |
@dozoisch You might want to set the retry count higher than the pool size, which you can set in the config when you construct the driver. The driver first tries to reuse connections from the connection pool and only then falls back to creating new connections. So in the worst case, you need to retry poolSize + 1 times to get a fresh connection to the server. |
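A minimal sketch of the poolSize + 1 rule, assuming a pool size of 50 (the default quoted later in this thread). `withPoolRetries`, `POOL_SIZE` and `isSocketEndedError` are hypothetical names, and `work` stands in for whatever runs the query on a fresh session. Matching on the error message text is the same heuristic as the snippet above, not an official error code.

```javascript
// Hypothetical pool size: use whatever value you actually configured.
const POOL_SIZE = 50;
// Worst case: every pooled connection is dead, so one more attempt
// is needed to force a brand-new connection to the server.
const MAX_ATTEMPTS = POOL_SIZE + 1;

// Heuristic check on the error text reported in this issue.
function isSocketEndedError(err) {
  return Boolean(err && typeof err.message === 'string' &&
    err.message.includes('ended by the other party'));
}

// Runs `work` up to maxAttempts times; only the socket-ended error is retried.
async function withPoolRetries(work, maxAttempts = MAX_ATTEMPTS) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await work(attempt);
    } catch (err) {
      if (!isSocketEndedError(err)) throw err; // other errors fail fast
      lastError = err;
    }
  }
  throw lastError; // every attempt hit a dead connection
}
```

Note the trade-off raised further down in the thread: 51 back-to-back attempts with no delay can hammer the server during an outage, so a smaller cap plus a delay may be the safer default.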
Oh good idea! |
I get this error in the following scenario:
|
Hello @vpsouza, @viz, @louisiukas. Do you also use a proxy or load balancer between the JS driver and the Neo4j database? The current implementation of the connection pool has no "test on borrow" or "idle connection refresh" functionality, so if an idle pooled connection is killed or broken, it is handed to user code as is. |
Hey @lutovich. I also seem to be getting this problem in my Heroku app, but not on localhost. I'm not sure, but I believe they use a proxy/load balancer: https://devcenter.heroku.com/articles/how-heroku-works I also have IP-based geolocation in my app: on localhost it gets the proper location, but on Heroku it gets the server's location, if that gives you any idea of whether or not they use a proxy/load balancer. Update: I can confirm this is only happening on my Heroku app, and not on localhost. |
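To illustrate what "test on borrow" means, here is a minimal pool sketch. It is not the driver's actual pool: `TestOnBorrowPool`, the `isAlive()`/`close()` connection methods and `maxIdleMs` are hypothetical. The idea is that a connection which sat idle longer than a proxy's idle timeout (for example HAProxy's `timeout client`) is checked before it is handed out, and discarded if it turns out to be dead.

```javascript
// Connections are hypothetical objects exposing isAlive() and close().
class TestOnBorrowPool {
  constructor(createConnection, { maxIdleMs = 30000, now = Date.now } = {}) {
    this.createConnection = createConnection; // factory for fresh connections
    this.maxIdleMs = maxIdleMs;               // idle time after which we re-check
    this.now = now;                           // injectable clock (handy for tests)
    this.idle = [];                           // entries: { conn, releasedAt }
  }

  async acquire() {
    while (this.idle.length > 0) {
      const { conn, releasedAt } = this.idle.pop();
      const idleFor = this.now() - releasedAt;
      // Recently used connections are reused directly; long-idle ones
      // may have been cut by a proxy, so they are tested first.
      if (idleFor < this.maxIdleMs || (await conn.isAlive())) {
        return conn;
      }
      conn.close(); // broken: discard it and try the next idle one
    }
    return this.createConnection(); // nothing usable left: open a new one
  }

  release(conn) {
    this.idle.push({ conn, releasedAt: this.now() });
  }
}
```

Testing only long-idle connections keeps the common path cheap; a pool could instead test on every borrow at the cost of an extra round trip per query.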
@zhenlineo @lutovich Is this an issue that will/can be fixed, or should I just implement retries in my code? |
@booboothefool I think it is better to have retries in your code for now. We'll try to work out a way to fix/mitigate this problem. |
FYI: I don't use a load balancer, but my app and database run in separate linked Docker containers (sandboxed on different IP addresses on the same host). The connection breaks because I deliberately restart the neo4j container to test for such a scenario in production. |
We've added test-on-borrow functionality to the Java driver (neo4j/neo4j-java-driver#297) and would like to get some feedback on it. If it turns out to work well, a similar feature will be added to the JavaScript driver. |
I also get a similar issue:
My Neo4j machine is on AWS behind an internal load balancer currently using version Didn't have the chance to update to version |
@lutovich Is there any way to temporarily work around this issue? I can also add that even after removing the load balancer I'm still getting this issue. |
@royipressburger Do you have any errors in the database logs?
@lutovich Hi, I don't see any errors in the logs. |
@royipressburger The query should be retried in a fresh session with some delay. Logging might be incomplete because of a problem fixed here: neo4j/neo4j#8662. |
@lutovich Thanks for your answer, I'll try a new session and some delay. |
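A sketch of that advice, assuming a hypothetical `openSession` factory (for example `() => driver.session()`) that returns objects with `run()` and `close()`. Each attempt gets a new session, the session is closed on both success and error, and there is a fixed pause between attempts. The defaults of 3 retries and 1500 ms mirror values reported later in this thread; they are not a recommendation from the driver authors.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `openSession` is a hypothetical factory, e.g. () => driver.session().
async function runWithFreshSession(openSession, cypher, params,
                                   { retries = 3, delayMs = 1500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const session = openSession(); // new session for every attempt
    try {
      return await session.run(cypher, params);
    } catch (err) {
      const retryable = String(err && err.message).includes('ended by the other party');
      if (!retryable || attempt >= retries) throw err; // give up
    } finally {
      await session.close(); // runs on success and on error
    }
    await sleep(delayMs); // only reached after a retryable failure
  }
}
```

With `retries = 3` this makes at most four attempts in total: the first try plus three retries.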
Ever since I started doing this, I stopped seeing the error. Like it doesn't even need to retry... I literally do not see "socket has been ended by the other party" anymore. Don't really get it, but I'm happy, as my app always fetches successfully. It's been a few weeks too. 👍
|
All you did was move where the session is created? That looks odd.
@royipressburger Yeah... that's really all I did. It doesn't happen on reads for me anymore, but still happens on writes. I agree with you that it's definitely too common with writes. |
What do you suggest is the proper way to do this? From my code above, I realize this completely breaks my app when I have a good amount of users. I end up with out of memory errors on my GrapheneDB. Am I really supposed to be retrying poolSize + 1 times? Default is 50, so 51 times, really? Also, should I be closing the session if it errors?
I switched it to this (only 3 retries with a 1500 ms delay, plus session.close() on error), and it no longer seems to take down my servers with out-of-memory errors. But I really need to hear your suggestions, as this is starting to give me a lot of problems in production now that I have more users.
How would you write this? Thank you. |
Hi @booboothefool, sorry for the late reply. Code with |
Hey @lutovich, Yeah, it's been about a week and things seem to be working well with 3 retries, 1.5 second delay. poolSize + 1 = 51 retries seems like way too much... should probably just give up after 3 failures... lol. I'm using some unknown little library for the |
Hi @booboothefool, the heap size is not very big. Multiple back-to-back retries might cause the database to spin up many new threads to serve new connections from the driver. Threads are really heavy objects, so maybe this caused the OOMs. This is pure speculation... What kind of errors do you see when a retry happens? Are they only "This socket has been ended by the other party"? We plan to add an API with retries in the 1.2 release. It will probably be useful in this case. |
@lutovich Yeah, only "This socket has been ended by the other party". I see 1.2 has been released. Is it recommended to switch over? If so, can you show an example of how to use the retry API? |
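If back-to-back retries are what stresses the server, one common mitigation (a swapped-in technique, not something proposed in this thread) is exponential backoff with jitter, so that many clients failing at once do not retry in lockstep. `retryWithBackoff`, `backoffDelay` and their options are hypothetical names in this sketch.

```javascript
// Delay before retry number `attempt` (0-based): a random value below a cap
// that doubles each time ("full jitter"), bounded by maxMs.
function backoffDelay(attempt, { baseMs = 500, maxMs = 10000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap); // uniform in [0, cap)
}

// Retries `work` up to `retries` times, waiting a backoff delay in between.
async function retryWithBackoff(work, { retries = 3, isRetryable = () => true, ...delayOpts } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await work();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, delayOpts)));
    }
  }
}
```

With `baseMs = 500` the cap grows from 500 ms to 1 s, 2 s, 4 s and so on up to `maxMs`; each actual delay is drawn uniformly below that cap.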
@booboothefool I just added a readme item about new retry API here: #224 |
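A sketch of how the 1.2 retry API can be used, assuming the transaction functions `session.readTransaction(work)` and `session.writeTransaction(work)` described in the readme item linked above (#224). The driver re-runs `work` on errors it considers retryable, up to a configured retry time; whether a dropped socket counts as retryable depends on the driver version, so treat that as an assumption. `inSession`, `runRead` and `runWrite` are hypothetical helpers.

```javascript
// Opens a session, runs `useSession`, and always closes the session.
function inSession(driver, useSession) {
  const session = driver.session();
  return Promise.resolve()
    .then(() => useSession(session))
    .then(
      (result) => { session.close(); return result; }, // close on success
      (err) => { session.close(); throw err; }         // and on failure
    );
}

// The work function may run more than once if the driver retries,
// so it should not have side effects outside the transaction.
function runWrite(driver, cypher, params) {
  return inSession(driver, (session) =>
    session.writeTransaction((tx) => tx.run(cypher, params)));
}

function runRead(driver, cypher, params) {
  return inSession(driver, (session) =>
    session.readTransaction((tx) => tx.run(cypher, params)));
}
```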
Besides using This issue should no longer occur. Thanks for raising this issue and giving us your feedback. |
Hi, I'm connecting to Neo4j running in a Docker container, and I get an error when I try to make another connection/query. The error is "This socket has been ended by the other party". This is my Dockerfile with the exposed ports:
EXPOSE 7474 7473 7687
This is my driver utility:
The way i'm making my queries to neo4j:
And this is my stack trace:
I'm desperate, please help me!