Tests are a bit flaky with the new redis backend #95
Comments
I usually test in isolated Docker containers and I didn't notice anything unusual, except that it is taking longer than the memory backend. Additionally, the integration test is taking noticeably longer, as it needs to train about 7K instances, classify 1K instances, untrain 2K instances, and classify 1K instances again, for both the Memory and Redis backends. If Redis is not running or the tests can't connect to it, those tests are skipped. I think the lag is coming from the periodic save operations, when Redis snapshots its data into the dump.rdb file. However, since we don't need persistence on disk, we can disable this behavior. I will give it a try and see if the speed improves.
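For reference, a minimal sketch of turning off the periodic snapshots from Ruby at runtime. It assumes the `redis` gem's `Redis#config` method, which maps to the Redis `CONFIG SET` command; the helper name `disable_disk_persistence` is hypothetical and not taken from this project.

```ruby
# Hypothetical helper: turn off periodic RDB snapshots for a test run.
# `client` is assumed to be a client from the `redis` gem (or any object
# that responds to #config in the same way).
def disable_disk_persistence(client)
  # Equivalent to `CONFIG SET save ""`: an empty list of save rules
  # disables the periodic background saves to dump.rdb.
  client.config(:set, "save", "")
end

# Usage (needs a running Redis server and the redis gem):
#   require "redis"
#   disable_disk_persistence(Redis.new)
```

The same effect can be had by putting `save ""` in the server's redis.conf, which may be preferable when the test server is started from a container image.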
Let's try disabling that. If the slower tests are mostly due to the larger test case, I'm fine with that. We aren't running tons of builds back to back, so I think better test quality beats speed here.
Data size does affect the testing time. I have refactored the code to make the sizes configurable as constants rather than hard-coding numbers in the test itself. With TRAINING_SIZE = 4000 and TESTING_SIZE = 1000, it takes about 17 sec.
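A rough sketch of what size constants like these could look like in a test helper. The constant values come from the comment above; the `split_dataset` helper and its keyword arguments are hypothetical and only illustrate the idea.

```ruby
# Sizes taken from the comment above; change them to trade speed for coverage.
TRAINING_SIZE = 4000
TESTING_SIZE  = 1000

# Hypothetical helper: split a data set into a training slice and a
# testing slice of the configured sizes.
def split_dataset(records, training_size: TRAINING_SIZE, testing_size: TESTING_SIZE)
  needed = training_size + testing_size
  if records.size < needed
    raise ArgumentError, "need #{needed} records, got #{records.size}"
  end
  # The first `training_size` records train the classifier; the next
  # `testing_size` records are held back for classification.
  [records.first(training_size), records[training_size, testing_size]]
end
```

Keeping the sizes in one place means a quick local run can use smaller numbers without touching the test body.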
I would much prefer to move the backend initialization and configuration so that it runs once per test suite rather than for each test case. However, Minitest does not provide this functionality out of the box. It is possible with a custom runner, though.
If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.
I am fairly new to actually writing tests, so I will have to understand the dynamics of what is involved and how that custom runner would work. Test unit had various lifecycle callback methods, including before and after each test case and suite, but for some reason Minitest decided not to make them available. I initially tried writing an initialize method in my test class, but it turned out that it would be called for each test case, so it was not going to help. However, we can perhaps cache the instance variable that holds the reference to the Redis backend. On the other hand, I feel like it would be a micro-optimization, because currently the slowest test is the integration one with lots of data. That said, #97 can still be merged and we can revisit the custom runner option later.
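One way the caching idea could look, as a sketch only: memoize the backend on the test class so that it is built once and shared by every test case. The `SharedBackend` module and its `shared_backend` method are hypothetical names, not part of Minitest or of this project.

```ruby
# Hypothetical mixin: build an expensive backend once per test class and
# reuse it across test cases, instead of rebuilding it for each one.
module SharedBackend
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Register a block that knows how to build the backend.
    def shared_backend(&builder)
      @shared_backend_builder = builder
    end

    # Build the backend on first use, then return the cached object.
    def backend
      @backend ||= @shared_backend_builder.call
    end
  end

  # Instance-level accessor, for use inside setup or test methods.
  def backend
    self.class.backend
  end
end

# Usage inside a Minitest test class (the constructor is a placeholder):
#   class RedisBackendTest < Minitest::Test
#     include SharedBackend
#     shared_backend { build_redis_backend }
#   end
```

With a shared backend, each test still needs to reset the stored data in its own setup, otherwise state leaks from one test case into the next.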
I'll leave this open for now so we can continue to discuss and explore.
It looks like writing the custom runner will not benefit us in this case. I did some experiments and here are my findings:
Excellent profiling. I'd say based on your results, it's safe to close this.
The integration test is where all the time is consumed. Here are some stats based on different values of training and testing data sizes (time values are the sum of the Memory and Redis backends for the same task):
Cool. Let's keep an eye on it. I think after #97 it's pretty good.
I have added some more data to the previous comment to indicate how much cumulative time the integration test took in each of the two backends. It looks like Memory is about 40 times faster than Redis, but obviously the benefit of Redis is in persistence and collaboration.
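A comparison like this can be reproduced with Ruby's standard `Benchmark` module. The sketch below is not the project's benchmark code; `time_backends` is a hypothetical helper that runs the same workload against each backend and collects the wall-clock time in seconds.

```ruby
require "benchmark"

# Hypothetical helper: run the same workload (the given block) against each
# backend and return a hash of label => elapsed seconds.
def time_backends(backends)
  backends.each_with_object({}) do |(label, backend), timings|
    timings[label] = Benchmark.realtime { yield backend }
  end
end

# Usage, with placeholder backend objects and a placeholder workload:
#   timings = time_backends(memory: memory_backend, redis: redis_backend) do |backend|
#     run_integration_workload(backend)
#   end
#   puts format("Redis/Memory ratio: %.1f", timings[:redis] / timings[:memory])
```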
Not terribly surprising. We may want to document that.
With #98 it is not only documented, but reproducible as well. If by documenting you mean we should add it to the README file, then yes, we can add a summarized note there.
* Disabled Redis disc persistence and refactored integration test, fixes #95 * Added Bayes backend benchmarks
Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.
I have added a brief note about the performance of the Redis backend in PR #103. However, I agree that the README needs some restructuring, with more headings and sub-headings, to reorganize the information that has accumulated over the years.
Yeah, the readme is a bit of a mess, but it shouldn't be too hard to reorganize. I'll probably throw up a quick GitHub page or something similar to help with discovery as well.
* Disabled Redis disc persistence and refactored integration test, fixes #95 * rb-gsl gem is now maintained as gsl * Documented Redis backend performance
This is a rather good idea actually. We can just use the |
That sounds good.
* Disabled Redis disc persistence and refactored integration test, fixes #95 * Added Dockerfile with documentation
It looks like the tests are a bit flaky now and seem to hang from time to time.