Connection pooling? #8
Is there a good way to share redis connections across different instances of pyreBloom?

I'm currently using a rotating pool of filters to implement a sort of TTL: newly seen URLs get added to the current filter, but all filters are checked for membership. At some set interval (hour/day/etc.) the oldest filter gets cleared out and reused as the current one.

This works out pretty well, except it uses up lots of connections. Is there a good way to reuse connections between filters, or to specify the key name to check? Should I take an entirely different approach to expiring old URLs?
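For concreteness, here is a minimal sketch of the rotation scheme described above. The key names and the rotation interval are made up, and it assumes pyreBloom's constructor takes a key name, capacity, and error rate and exposes add(), delete(), and the in operator, as its README shows:

```python
import time
import pyreBloom

class RotatingBloomPool(object):
    """Approximate a TTL with a fixed ring of bloom filters: writes go
    to the current filter, membership checks consult all of them, and
    at each interval the oldest filter is wiped and becomes current."""

    def __init__(self, name, slots=7, capacity=int(1e7), error=0.01):
        # one redis connection per filter -- the crux of this issue
        self.filters = [pyreBloom.pyreBloom('%s:%d' % (name, i), capacity, error)
                        for i in range(slots)]
        self.interval = 3600  # rotate hourly (hour/day/etc.)
        self.current = 0
        self.rotated_at = time.time()

    def _maybe_rotate(self):
        if time.time() - self.rotated_at >= self.interval:
            # the slot after the current one is the oldest: clear and reuse it
            self.current = (self.current + 1) % len(self.filters)
            self.filters[self.current].delete()
            self.rotated_at = time.time()

    def add(self, url):
        self._maybe_rotate()
        self.filters[self.current].add(url)

    def __contains__(self, url):
        return any(url in f for f in self.filters)
```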
Comments

There's not currently a way to pool connections between filters :-/ That said, what you're doing to implement expiring bloom filters is exactly how we do it, and it's commonly how it's done elsewhere. How many filters do you have at any one time?
Ah, it's good to hear that you do it the same way. I only have 7 filters at once, but this is multiplied by each celery worker making its own connections to the filters, so it ends up being a few thousand connections in practice. I'll probably try sharing across different processes.
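One way to get the connection count down to workers × filters instead of tasks × filters is a module-level cache, so that each celery worker process builds its filters once and reuses them across tasks. A sketch, with hypothetical key names:

```python
# filters.py -- a hypothetical module imported by the celery tasks
import pyreBloom

_filters = None

def get_filters():
    """Build the 7 filters once per worker process and reuse them,
    rather than constructing new pyreBloom instances per task."""
    global _filters
    if _filters is None:
        _filters = [pyreBloom.pyreBloom('seen:%d' % i, int(1e7), 0.01)
                    for i in range(7)]
    return _filters
```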
Is the number of connections problematic at the redis server level? Assuming it's not actual networking overhead causing the heartache, at the end of the day all your celery workers are interacting with this single shared resource, so it seems likely that redis' performance may eventually become an issue. For some context, there are a few projects for which we use pyreBloom…
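Whether the count is actually hurting the server is easy to check: redis reports it via INFO, e.g. with redis-py against a default local instance:

```python
import redis

r = redis.Redis(host='localhost', port=6379)
# the parsed INFO output includes the server-wide client count
print(r.info()['connected_clients'])
```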
Those are good numbers for reference, thanks! Are you partitioning across several redis-server instances on the same ec2 instance for performance reasons, i.e. did you find that to be better than running an individual redis-server instance on each box? I'm currently at around 45k reads per second without pipelining (on a hosted solution, actually, on what appear to be m2.2xlarges) and was somewhat concerned about the number of open connections, but judging by your experiences it shouldn't be a big issue (except for hosted plans with connection limits). Thanks for all the advice!
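On the pipelining point: pyreBloom's multi-value operations batch the underlying commands, so passing lists rather than single values saves round trips. A sketch, assuming the extend/contains list behavior shown in the project's README and a hypothetical key name:

```python
import pyreBloom

p = pyreBloom.pyreBloom('urls:test', 10000, 0.01)
p.extend(['http://a.example/', 'http://b.example/'])  # one batched write
# contains() on a list reportedly returns the values that were found
found = p.contains(['http://a.example/', 'http://c.example/'])
```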
We treat the servers as just… It may also help to give some context about the bloom capacities: IIRC, we generally have a capacity of about 1e9 for each month partition, and it uses, I think, 7 or so hashes for each filter.
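Those figures line up with the standard bloom filter sizing formulas, m = -n * ln(p) / (ln 2)^2 bits and k = (m/n) * ln 2 hashes. The false-positive rate below is an assumed 1%, since the thread doesn't state it:

```python
import math

n = 1e9    # capacity per month partition (from the comment above)
p = 0.01   # assumed false-positive rate
m = -n * math.log(p) / math.log(2) ** 2  # total bits
k = (m / n) * math.log(2)                # optimal number of hashes

print('%.2f GB per filter, k = %.1f' % (m / 8 / 1e9, k))  # ~1.20 GB, k ~ 6.6
```

which is consistent with the "7 or so hashes" mentioned above.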
For what it's worth, apparently there is performance degradation with high connection counts: [chart: requests/second vs. number of open connections, from http://redis.io/topics/benchmarks]