Sparse live test matrix on client and service version #10085
Comments
Varying what we run based on the day of the week makes builds non-repeatable and will cause problems when diagnosing failures, as well as when tracking test history (assuming the test names are the same). Instead of varying which settings we use based on the day, we should create a test matrix that remains constant, and we can execute subsets of that matrix in different jobs or pipelines that run on whatever schedule makes sense to balance resource consumption. It is not feasible to test every possible combination and permutation, so we have to do our best to come up with a sparse test matrix that maximizes the value we gain from the coverage against the cost (both time and money) of running it. I do not believe we should introduce what appears to be randomization into our test runs to address this problem; we should simply run the set of tests we feel covers the most important permutations.
I agree with Wes's points on this. We should not have a single pipeline that has different behaviors on different days. It will be impossible to debug failures in the pipeline for anyone without the tribal knowledge of what happens on each day of the week.
It is a tradeoff between repeatability and discovery/maintenance. With static settings on the pipelines, we have to manually add configs every time there is an update. Moreover, Java is at the issue-discovery phase. Randomization will disclose failures that only happen with a specific combo on certain platforms. I don't think the day of the week will affect the test results at the current phase. If it does in the future, we can easily switch back to static settings with the current implementation.
I don't think discovery/maintenance will be helped by making the pipelines run differently on different days. It will in fact make it more difficult to discover what really caused something to fail and thus make tests exhibit reliability issues. Imagine working on a PR for multiple days: on one day the CI checks fail, but the next day when you get back to it they all pass. The likelihood of someone actually investigating the failure from yesterday is almost zero. I'd like to get to the true issue we are trying to solve as opposed to jumping to this particular solution. So what is the real underlying issue we are trying to solve? Is it that we want to run all the test combinations but they take too long? If that is the case, have we tried any parallelization efforts in running the tests? I bet there is work we can do to improve the performance of these tests that would make this a non-issue, and we wouldn't need to worry about overly complicating things by running different variants on different days.
The rolling strategy only applies to live tests. CI will only run on the same set of parameters (currently mock client and latest service version). The target is to avoid manually changing the configuration file for newly introduced parameters.
Even for nightly live test runs we don't want it failing one day and not the next; the same thing will start to happen where folks ignore the failure because it started passing the next day. You still haven't answered the question about the real problem we are trying to solve. If it is about runtime then we should explore parallelization options.
I'm not sure I understand that statement. Wouldn't it simply be a matter of passing the parameter in the build yml file vs. doing it in Java code? I'm not sure how it avoids changing one over the other; some configuration will have to be changed in either case.
Our main goal here is to make a sparse test matrix to reduce the daily test run times and to save resources. Instead of pivoting on calendar day, we could also refactor the test runs into multiple legs (which are self-consistent) and then decide on the frequency of those test runs. @sima-zhu do you think you could take a 2nd pass at the table above and propose a different factoring so that each test leg is self-consistent and doesn't use the day of the week as a deciding factor of what gets tested?
For example, if we have new parameters, like rpc or a new service version.
There is a common test yml template specifically for sharing such test configuration. Adding a new test matrix into that template will include it in all the libraries that share the common test templates, which should be all our newer ones. So what is the issue with adding a new test matrix entry to that matrix to cover the new parameters you want to change?
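For context, a hedged sketch of what such a shared matrix entry might look like (the key and value names here are illustrative, not the actual azure-sdk-for-java template schema):

```yaml
# Illustrative sketch of a pipeline test matrix; adding a new entry here
# would flow to every library that consumes the shared template.
matrix:
  netty_latest:
    HttpClientType: netty
    ServiceVersion: latest
  okhttp_latest:
    HttpClientType: okhttp
    ServiceVersion: latest
  netty_v1:
    HttpClientType: netty
    ServiceVersion: V1
```

Each named entry becomes one job, so a new parameter is a one-line addition to the template rather than a code change in each library.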
Another thing is we don't want win + java8 to always run on netty, v1.
The nature of the sparse matrix is to identify the key combinations we care about and add them to the test matrix. Unfortunately it isn't realistic to run all combinations, all the time. We should identify the matrix parameters we want to run all the time, and if we want to run additional matrix entries only sometimes, we should create additional pipelines that run them on some schedule handled by our devops, not vary them in code, as our pipelines will not be deterministic in that case.
Could you share your thoughts?
@sima-zhu has there been any investigation into running all the combinations in parallel in maven? Also, for reference, do we have any timing data on how long a test run currently takes when running okhttp and netty all the time in one test run? I believe that is the current situation.
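As a reference point for the parallelization question, the Maven Surefire plugin supports running test classes in parallel within one JVM; a minimal sketch (the thread count is illustrative, and whether live tests tolerate this depends on shared-resource throttling):

```xml
<!-- Sketch: run test classes in parallel inside one Surefire fork. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```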
I have KeyVault for your reference. |
KeyVault is running all of them for now (4 in total: combinations of 2 clients and 2 versions). KeyVault does not have the most time-consuming tests; Storage takes even more, so we cannot extend this to Storage until we figure out the best approach.
cc @g2vinay
From what I learned in the discussion, KeyVault has a risk of throttling if running too many tests in parallel.
@weshaggard a simple for loop executing an API call 25-50 times is enough to throttle the KV.
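Given that throttling behavior, any attempt at parallel or repeated live-test calls would need retry-with-backoff around the service calls. A minimal sketch, not the actual Key Vault SDK API: the call is modeled as a supplier of an HTTP status code, and 429 is treated as "throttled":

```java
import java.util.function.IntSupplier;

// Sketch: retry a call with exponential backoff when the service throttles
// (HTTP 429), so bursts of live-test calls back off instead of piling on.
public class Backoff {
    static int callWithBackoff(IntSupplier call, int maxRetries, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            int status = call.getAsInt();
            if (status != 429) {
                return status; // success (or a non-throttling error)
            }
            try {
                Thread.sleep(delay); // wait before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 429;
            }
            delay *= 2; // double the wait each time we are throttled
        }
        return 429; // still throttled after maxRetries attempts
    }

    public static void main(String[] args) {
        // Simulated service that throttles the first two calls, then succeeds.
        int[] calls = {0};
        int status = callWithBackoff(() -> calls[0]++ < 2 ? 429 : 200, 5, 1);
        System.out.println(status);
    }
}
```

The real Azure SDK pipelines have their own retry policies; this only illustrates why a naive 25-50 iteration loop trips the limit while a backing-off loop would not.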
The motivation to propose the rolling strategy was to test all combinations of HttpClient and ServiceVersions every day without spending too much time or consuming too many resources. We should have the necessary alerts to notify the team of any failures.
@g2vinay have you chatted with the keyvault owners in the other languages? The .NET tests, for example, usually run in about 30 mins, and I'm pretty sure they are running them in parallel. I also see the python keyvault tests are running a large matrix in parallel. I do think it is worth further investigation into improving parallelization; it will likely go a long way to improving the runtime of the tests. I believe we can still do a better job in general about enabling our tests to run in parallel. However, even with full parallelization we can only get so far, so I think we can define the full matrix in the yml file but only run a subset all the time, and run the full set on a weekly or monthly schedule in another pipeline. That should enable us to add more parameters now while we still work on improving the tests themselves.
Synced with @srnagar. Here is how we want to move forward.
For the non-critical path, we will defer the choice to each client to set up their own matrix. E.g. KeyVault can have an additional pipeline running on netty + old version, or more pipelines if needed. @weshaggard let me know if you have any suggestions.
Yes, that seems in line with what I was thinking. You could even add more matrix entries and only execute them in another pipeline under some condition. See Azure/azure-sdk-for-python#9031, which is only running a subset of the matrix entries for PRs but more for nightly test runs. You could use a similar strategy to run more matrix entries on some other weekly/monthly schedule if needed.
@sima-zhu Does this mean that there will be no testing of earlier service versions and that we only ever test the latest? Will there be some pipeline specified that runs against all service versions?
@JonathanGiles I need to work with @srnagar and @g2vinay on whether we want to enable the pipeline.
Fixed with #10312
Problems:
Java live tests currently only run on the netty httpclient and the latest service version. However, there are other popular httpclients and more service versions that need to be covered by our live tests to make sure customers are using a well-tested library.
Proposals:
We currently have 2 httpClients (netty, okhttp) and several service versions.
Our tests can be configured to specify which httpclient and service version a test runs on.
One of the examples basically looks like the one below:
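As an illustrative stand-in for the example, here is a hedged sketch of the sparse-matrix selection being proposed: keep the latest service version on every client, plus every service version on a default client, instead of the full cross product. All client and version names below are placeholders, not the actual test framework API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build a sparse subset of the client x service-version matrix.
// "netty"/"okhttp" and "V7_0"/"V7_1" are placeholder parameter values.
public class SparseMatrix {
    static final String[] CLIENTS = {"netty", "okhttp"};
    static final String[] VERSIONS = {"V7_0", "V7_1"}; // oldest .. latest

    static List<String> sparseEntries() {
        List<String> entries = new ArrayList<>();
        String latest = VERSIONS[VERSIONS.length - 1];
        // Every client against the latest service version.
        for (String client : CLIENTS) {
            entries.add(client + ":" + latest);
        }
        // Every service version against the default client, skipping duplicates.
        for (String version : VERSIONS) {
            String entry = CLIENTS[0] + ":" + version;
            if (!entries.contains(entry)) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(sparseEntries());
    }
}
```

With 2 clients and 2 versions this yields 3 entries instead of 4; the savings grow as more clients and versions are added, which is the point of the sparse matrix.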
Pros: