Sparse live test matrix on client and service version #10085
Comments
Varying what we run based on the day of the week makes builds non-repeatable and will cause problems when diagnosing failures, as well as when tracking test history (assuming the test names are the same). Instead of varying which settings we use based on the day, we should create a test matrix that remains constant, and we can execute subsets of that matrix in different jobs or pipelines that run on whatever schedule makes sense to balance resource consumption. It is not feasible to test every possible combination and permutation, so we have to do our best to come up with a sparse test matrix that maximizes the value we gain from the coverage against the cost (both time and money) of running it. I do not believe we should introduce what appears to be randomization into our test runs to address this problem; we should simply run the set of tests we feel covers the most important permutations.
I agree with Wes's points on this. We should not have a single pipeline that has different behaviors on different days. It will be impossible to debug failures in the pipeline for anyone without the tribal knowledge of what happens on each day of the week.
It is a tradeoff between repeatability and discovery/maintenance. With static settings on the pipelines, we have to manually add configs every time there is an update. Moreover, Java is at the issue-discovery phase. Randomization will disclose failures that only happen with a specific combo on certain platforms. I don't think the day of the week will affect the test results at the current phase. If it does in the future, we can easily switch back to static settings with the current implementation.
I don't think discovery/maintenance will be helped by making the pipelines run differently on different days. It will in fact make it more difficult to discover what really caused something to fail and thus make tests exhibit reliability issues. Imagine working on a PR for multiple days: on one day the CI checks fail, but the next day when you get back to it they all pass. The likelihood of someone actually investigating the failure from yesterday is almost zero. I'd like to get to the true issue we are trying to solve as opposed to jumping to this particular solution. So what is the real underlying issue we are trying to solve? Is it that we want to run all the test combinations but they take too long? If that is the case, have we tried any parallelization efforts in running the tests? I bet there is work we can do to improve the performance of these tests that would make this a non-issue, and we wouldn't need to worry about overly complicating things by running different variants on different days.
The rolling strategy only applies to live tests. CI will only run on the same set of parameters (currently mock client and latest service version). The target is to avoid manually changing the configuration file for newly introduced parameters.
Even for nightly live test runs we don't want it failing one day and not the next; the same thing will start to happen where folks ignore the failure because it started passing the next day. You still haven't answered the question about the real problem we are trying to solve. If it is about runtime then we should explore parallelization options.
I'm not sure I understand that statement. Wouldn't it simply be a matter of passing the parameter in the build yml file vs. doing it in Java code? I'm not sure how it avoids changing one over the other; some configuration will have to be changed in either case.
Our main goal here is to make a sparse test matrix to reduce the daily test run times and to save resources. Instead of pivoting on calendar day, we could also refactor the test runs into multiple legs (which are self-consistent) and then decide on the frequency of those test runs. @sima-zhu do you think you could take a 2nd pass at the table above and propose a different factoring so that each test leg is self-consistent and doesn't use the day of the week as a deciding factor of what gets tested?
For example, if we have new parameters, like rpc or a new service version.
There is a common test yml template specifically for sharing such test configuration. Adding a new test matrix into that template will include it in all the libraries that share the common test templates, which should be all our newer ones. So what is the issue with adding a new test matrix entry to that matrix to cover the new parameters you want to change?
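For context, a hedged sketch of what such a shared matrix entry might look like (the key and value names here are illustrative, not the actual azure-sdk-for-java template schema):

```yaml
# Illustrative sketch of a pipeline test matrix; adding a new entry here
# would flow to every library that consumes the shared template.
matrix:
  netty_latest:
    HttpClientType: netty
    ServiceVersion: latest
  okhttp_latest:
    HttpClientType: okhttp
    ServiceVersion: latest
  netty_v1:
    HttpClientType: netty
    ServiceVersion: V1
```

Each named entry becomes one job, so a new parameter is a one-line addition to the template rather than a code change in each library.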
Another thing is we don't want win + java8 to always run on netty, v1.
The nature of the sparse matrix is to identify the key combinations we care about and add them to the test matrix. Unfortunately it isn't realistic to run all combinations, all the time. We should identify the matrix parameters we want to run all the time, and if we want to run additional matrix entries only sometimes, we should create additional pipelines that run them on some schedule handled by our devops, not vary them in code, as our pipelines will not be deterministic in that case.
Could you share your thoughts?
@sima-zhu has there been any investigation into running all the combinations in parallel in maven? Also, for reference, do we have any timing data on how long a test run currently takes when running okhttp and netty all the time in one test run? I believe that is the current situation.
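As a reference point for the parallelization question, the Maven Surefire plugin supports running test classes in parallel within one JVM; a minimal sketch (the thread count is illustrative, and whether live tests tolerate this depends on shared-resource throttling):

```xml
<!-- Sketch: run test classes in parallel inside one Surefire fork. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```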
I have KeyVault for your reference. |
KeyVault is running all of them for now (4 in total: combinations of 2 clients and 2 versions). KeyVault does not have the most time-consuming tests; Storage takes even more, so we cannot extend this to Storage until we figure out the best approach.
cc @g2vinay
From what I learned in the discussion, KeyVault has a risk of throttling if running too many tests in parallel.
@weshaggard a simple for loop executing an API call 25-50 times is enough to throttle the KV.
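Given that throttling behavior, any attempt at parallel or repeated live-test calls would need retry-with-backoff around the service calls. A minimal sketch, not the actual Key Vault SDK API: the call is modeled as a supplier of an HTTP status code, and 429 is treated as "throttled":

```java
import java.util.function.IntSupplier;

// Sketch: retry a call with exponential backoff when the service throttles
// (HTTP 429), so bursts of live-test calls back off instead of piling on.
public class Backoff {
    static int callWithBackoff(IntSupplier call, int maxRetries, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            int status = call.getAsInt();
            if (status != 429) {
                return status; // success (or a non-throttling error)
            }
            try {
                Thread.sleep(delay); // wait before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 429;
            }
            delay *= 2; // double the wait each time we are throttled
        }
        return 429; // still throttled after maxRetries attempts
    }

    public static void main(String[] args) {
        // Simulated service that throttles the first two calls, then succeeds.
        int[] calls = {0};
        int status = callWithBackoff(() -> calls[0]++ < 2 ? 429 : 200, 5, 1);
        System.out.println(status);
    }
}
```

The real Azure SDK pipelines have their own retry policies; this only illustrates why a naive 25-50 iteration loop trips the limit while a backing-off loop would not.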
The motivation to propose the rolling strategy was to test all combinations of HttpClient and ServiceVersions every day without spending too much time or consuming too many resources. We should have the necessary alerts to notify the team of any failures.
@g2vinay have you chatted with the keyvault owners in the other languages? The .NET tests, for example, usually run in about 30 mins, and I'm pretty sure they are running them in parallel. I also see the python keyvault tests are running a large matrix in parallel. I do think it is worth further investigation into improving parallelization; it will likely go a long way to improving the runtime of the tests. I believe we can still do a better job in general about enabling our tests to run in parallel. However, even with full parallelization we can only get so far, so I think we can define the full matrix in the yml file but only run a subset all the time, and run the full set on a weekly or monthly schedule in another pipeline. That should enable us to add more parameters now while we still work on improving the tests themselves.
Synced with @srnagar. Here is how we want to move forward.
For the non-critical path, we will defer the choice to each client to set up their own matrix. E.g. KeyVault can have an additional pipeline running on netty + old version, or more pipelines if needed. @weshaggard let me know if you have any suggestions.
Yes, that seems in line with what I was thinking. You could even add more matrix entries and only execute them in another pipeline under some condition. See Azure/azure-sdk-for-python#9031, which is only running a subset of the matrix entries for PRs but more for nightly test runs. You could use a similar strategy to run more matrix entries on some other weekly/monthly schedule if needed.
@sima-zhu Does this mean that there will be no testing of earlier service versions and that we only ever test the latest? Will there be some pipeline specified that runs against all service versions?
@JonathanGiles I need to work with @srnagar and @g2vinay on whether we want to enable the pipeline.
Fixed with #10312
Problems:
Java live tests currently only run on the netty httpclient and the latest service version. However, there are other popular httpclients and more service versions that need to be covered by our live tests to make sure customers are using a well-tested library.
Proposals:
We currently have 2 httpClients (netty, okhttp) and several service versions.
Our tests can be configured to specify which httpclient and service version a test runs on.
One of the examples basically looks like the one below:
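As an illustrative stand-in for the example, here is a hedged sketch of the sparse-matrix selection being proposed: keep the latest service version on every client, plus every service version on a default client, instead of the full cross product. All client and version names below are placeholders, not the actual test framework API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build a sparse subset of the client x service-version matrix.
// "netty"/"okhttp" and "V7_0"/"V7_1" are placeholder parameter values.
public class SparseMatrix {
    static final String[] CLIENTS = {"netty", "okhttp"};
    static final String[] VERSIONS = {"V7_0", "V7_1"}; // oldest .. latest

    static List<String> sparseEntries() {
        List<String> entries = new ArrayList<>();
        String latest = VERSIONS[VERSIONS.length - 1];
        // Every client against the latest service version.
        for (String client : CLIENTS) {
            entries.add(client + ":" + latest);
        }
        // Every service version against the default client, skipping duplicates.
        for (String version : VERSIONS) {
            String entry = CLIENTS[0] + ":" + version;
            if (!entries.contains(entry)) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(sparseEntries());
    }
}
```

With 2 clients and 2 versions this yields 3 entries instead of 4; the savings grow as more clients and versions are added, which is the point of the sparse matrix.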
Pros: