This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

SEP 10 - PR Merge Requirements #13

Merged
merged 6 commits into from
Sep 4, 2019

Conversation

Ch3LL
Contributor

@Ch3LL Ch3LL commented May 21, 2019

SEP to define requirements to merge a PR

@sjorge

sjorge commented May 21, 2019

I have some questions about the test requirements... what if the bugfix is just a pure coding error?

e.g.

if var in ['a', 'b ']:
  # do stuff

When fixing this to the proper 'b' (no space), would that fall under the same extra-reviewer requirement for exceptions? Or should such fixes end up in their own category with maybe slightly less strict test requirements?

What about fixing a small bug in code that currently does not have tests? Would a test only covering that specific edge case be enough or should all the mainline and other edge cases also be covered with tests before the PR with the fix can go in?

@Ch3LL
Contributor Author

Ch3LL commented May 21, 2019

In that particular example a test might actually come in handy. If that if statement was broken and caused the file state to not work properly, for example, adding test coverage ensures whatever logic is in that if statement always occurs when we expect it to.
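
As a rough sketch of what that regression test could look like (the helper name below is made up for illustration, not taken from any actual Salt code):

# hypothetical regression test for the 'b ' typo above
def handle_var(var):
    # fixed check: 'b' no longer has a trailing space
    return var in ['a', 'b']

def test_handle_var_accepts_b_without_trailing_space():
    # this assertion would have failed while the list still contained 'b '
    assert handle_var('b')

def test_handle_var_rejects_unknown_values():
    assert not handle_var('c')

Even a test this small pins down the intended behavior so the typo cannot quietly come back.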

But now that you bring it up, I think it might be good to define which code changes, besides documentation, do not require a test beforehand, so contributors are aware when they need to write a test and when they do not, but still require the 3 reviews. Off the top of my head I can think of:

  • cosmetic changes (for example changing a log from log.error to log.debug)
  • pylint fixes

There might be some more that I'm not thinking of. Anyone from the @saltstack/team-core have any other ideas?

@Ch3LL
Contributor Author

Ch3LL commented May 21, 2019

Sorry I did not see these questions before I posted:

What about fixing a small bug in code that currently does not have tests? Would a test only covering that specific edge case be enough or should all the mainline and other edge cases also be covered with tests before the PR with the fix can go in?

The requirement would be only specific to the changes you are making. Although more tests are always great :)

@sjorge

sjorge commented May 21, 2019

Thanks for clarifying, I think that will be very important so that (new) contributors have a clear idea of what needs to be done.

@Ch3LL
Contributor Author

Ch3LL commented May 21, 2019

No problem. Thanks for reviewing. Once I get more input I'll update the SEP to clarify more of the exceptions.

@max-arnold

Currently there are several cloud- and device-specific modules that have no tests (probably because it wasn't realistic to test them against a real system). What tests am I supposed to add if I'm going to fix a bug in one of these modules?

@Ch3LL
Contributor Author

Ch3LL commented May 21, 2019

In these cases you can write unit tests, although it can be challenging not to over-mock something. Our team can also try to get access to the cloud in question or see if there is an option to mock a device.
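
As a rough illustration of the "don't over-mock" point (everything below is hypothetical, not a real Salt module), the idea is to patch only the transport layer and keep the payload-building logic real:

from unittest.mock import MagicMock, patch

# stand-in for a device-backed module: only _get_connection touches real hardware
def _get_connection():
    raise RuntimeError("would talk to a real device")

def set_hostname(name):
    conn = _get_connection()
    conn.send({"cmd": "set_hostname", "name": name})
    return True

def test_set_hostname_sends_expected_payload():
    fake_conn = MagicMock()
    # patch only the transport; the logic under test stays real
    with patch(__name__ + "._get_connection", return_value=fake_conn):
        assert set_hostname("edge-01")
    fake_conn.send.assert_called_once_with({"cmd": "set_hostname", "name": "edge-01"})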

@waynew
Contributor

waynew commented May 21, 2019

This will be SEP 10

@waynew waynew changed the title PR Merge Requirements SEP 10 - PR Merge Requirements May 21, 2019

All PR requirements
- Approval Reviews: 1 approval review from core team member OR
1 approval review from captain of working group

To avoid a bottleneck, does it make sense that two +1s from team members would also be OK?

Contributor Author

Can you clarify this statement? Do you mean you want to require three approvals? Currently, as it is written, this will require 1 approval for PRs unless we add an exception for tests not passing or tests not written, which will require 3.


Ah sorry. No. What I mean is that the +1 from the captain of the working group can be replaced with two +1s from other team members, provided there is no -1 from the captain or any other group member.

Contributor Author

Thanks for clarifying. We talked about this extensively, and the reasoning for requiring the core team member or the captain was that, since the captain of the working group will be shepherding a particular feature set during a release, we want to ensure they also have eyes on the changes coming in and that the team is progressing as expected. But I would love to see what @saltstack/team-core thinks about this idea.

Contributor

It sounds like what you're after is just making sure that it doesn't take forever to get stuff merged in. That's a fair desire, and I think at least in part what's driving the idea behind working groups.

I think largely it's a question of how diligent approvers are, and maybe one of trust? On the one extreme, we just trust everyone to write correct code and push it willy-nilly. On the other extreme we have one single individual who's merging code (see: Linux Kernel), and it doesn't make it in without intense scrutiny.

Personally, when I'm doing a PR the longest part of that task is actually just understanding what the change is supposed to be, and then in some cases running the code myself. Also checking for things like, "Does this even make sense??"

Of course, it's hard to tell if other people mean the same thing by that check mark. Then there's the whole idea of just getting more eyes on the code. I've found typos that other people have missed, and vice versa. Heck, sometimes I've come back and reviewed my own code and discovered problems that I missed!

I don't think that I have a strong feeling about what is/should be the right answer here. I have noticed (like yourself) that sometimes it takes too long to get something merged in. I think the biggest problem there is actually the one that we're addressing head-on: getting our testing green and reliable and requiring it to stay green before merge, even if the failure appears unrelated to the code at hand.

By doing that it's taking us longer initially, but as we make progress and fix those issues we will have more confidence in the test suite, and it will run faster. That means there's less time for other changes to get merged in causing conflicts and holding up the merge. So maybe that will fix the problem.

Or maybe we'll find out that we're still having issues merging fast enough.

I think for now my recommendation is that we err on the side of caution - keep the merge policy as is, but for our first working group postmortem we should also figure out if we've got good friction there, or if we've just got stumbling blocks. I think a certain amount is healthy for finding bugs before they're released in the wild. But if we're just getting in the way of progress, without a payoff, I think we should address that.


It sounds like what you're after is just making sure that it doesn't take forever to get stuff merged in.

You are correct. There are quite old PRs in the backlog. But my main concerns also point in another direction:

  • People take vacations, get sick, life gets complicated, etc. If the captain is in this situation, only a core member can review the PRs for this team. And I am not sure that this will scale.

  • The captain is part of a team. If only one member of the team is authoritative in reviews, there is no motivation for the rest of the team to review anything.

  • The captain will sometimes not be an expert in every PR, so she/he should be able to delegate some PRs to other members.

I think largely it's a question of how diligent approvers are, and maybe one of trust? On the one extreme, we just trust everyone to write correct code and push it willy-nilly.

I would assume that we are professionals, knowledgeable and with good intentions. The review process is mostly to be sure that the quality standard is there, and that no mistakes slip through.

On the other extreme we have one single individual who's merging code (see: Linux Kernel), and it doesn't make it in without intense scrutiny.

AFAIK the Linux kernel is managed by a big team; Linus trusts the team when merging into his authoritative repo. This is basically why Git is a distributed source code manager: because he cannot scale.

Personally, when I'm doing a PR the longest part of that task is actually just understanding what the change is supposed to be, and then in some cases running the code myself. Also checking for things like, "Does this even make sense??"

You mean during the review? If so, this is indeed a good approach and very valuable.

Of course, it's hard to tell if other people mean the same thing by that check mark. Then there's the whole idea of just getting more eyes on the code. I've found typos that other people have missed, and vice versa. Heck, sometimes I've come back and reviewed my own code and discovered problems that I missed!

Indeed.

I don't think that I have a strong feeling about what is/should be the right answer here. I have noticed (like yourself) that sometimes it takes too long to get something merged in. I think the biggest problem there is actually the one that we're addressing head-on: getting our testing green and reliable and requiring it to stay green before merge, even if the failure appears unrelated to the code at hand.

I very much agree here.

By doing that it's taking us longer initially, but as we make progress and fix those issues we will have more confidence in the test suite, and it will run faster. That means there's less time for other changes to get merged in causing conflicts and holding up the merge. So maybe that will fix the problem.

Ah, I see. Do you think that the CI is the main reason for the slower merge rate, rather than the lack of reviewers?

Or maybe we'll find out that we're still having issues merging fast enough.

I think for now my recommendation is that we err on the side of caution - keep the merge policy as is, but for our first working group postmortem we should also figure out if we've got good friction there, or if we've just got stumbling blocks. I think a certain amount is healthy for finding bugs before they're released in the wild. But if we're just getting in the way of progress, without a payoff, I think we should address that.

I agree with reviewing any decision made, and learning from the experience.

And I understand the difficulty of trusting external reviewers. IIUC the captain can also be a community member; if so, the proposal in this document is in fact a way to open up the project more than it is today, and this is very valuable.

Contributor

  • People take vacations, get sick, life gets complicated, etc. If the captain is in this situation, only a core member can review the PRs for this team.

There is a provision in the working group SEP (see #12) where the captains can delegate the responsibility.

And I am not sure that this will scale.

I'm sure you're right 😃

I do expect us to learn through this process, and improve by applying the lessons that we learn. If these working groups end up taking off with a lot more interest and velocity, that would be great! Probably a whole new class of problems that we'll have to face.

  • The captain is part of a team. If only one member of the team is authoritative in reviews, there is no motivation for the rest of the team to review anything.

I hope that's not seen as the case - I would review code whether or not my review was the gate for merging for the same reason that I mentioned earlier. Sometimes I manage to catch problems that other reviewers miss. The main reason that I review code is to catch defects earlier and more cheaply. I think that's a worthwhile attitude to cultivate on our teams.

  • The captain will sometimes not be an expert in every PR, so she/he should be able to delegate some PRs to other members.

AFAICT anyone can still review PRs, request changes, or mark their approval. In these situations the captain should request that one of the other team members (also) reviews the code.

I would assume that we are professionals, knowledgeable and with good intentions

I trust that you and I are 😄 I'm also paranoid about both careless and bad actors. My threat model is that someone, at some point, either because of a personal or political agenda, will make an attempt to compromise the security of the Salt project because it is a high-value target. Heck, I'm not above suspecting that I could become compromised in some way. I'd like to believe that I never could be... but better to design the process with the highest chance of catching those attempts, intentional or otherwise.

Ah, I see. Do you think that the CI is the main reason for the slower merge rate, rather than the lack of reviewers?

I do. I can't speak for everyone else on the team, but I've had review processes that look like this:

  • Review the code. Looks great.
  • Tests are still running. Leave browser open and go do other work.
  • Come back some time later and discover that the tests were green but the branch has been merged to. Drat!
  • Push a merge to the PR branch
  • Tests are still running. Leave browser open and go do other work.
  • Come back some time later to discover new test failures. Either fix it or relaunch build.
  • Come back some time later and discover that the tests were green but the branch has been merged to. Drat!
  • Repeat 2-3 times
  • Finally test is green and branch is up to date
  • merge

Having stable branches/tests should drastically improve this process, especially if we can get a reliable functional/integration layer set up that we can run instead on most PRs. Right now our "integration" tests pretty much exercise the full Salt stack, including the network, which is probably not necessary. If our tests just ran in seconds or minutes, we would be able to merge much faster!

And I understand the difficulty of trusting external reviewers. IIUC the captain can also be a community member; if so, the proposal in this document is in fact a way to open up the project more than it is today, and this is very valuable.

Yes, they can. We're working on opening things a lot more, which I'm super excited for!

All PR requirements
- Approval Reviews: 1 approval review from core team member OR
1 approval review from captain of working group
- All tests pass.

Some Windows tests and some pylint checks are a bit fragile. Also, the CI sometimes fails for other random reasons.

I support this requirement: we cannot merge unless all the tests pass. But I wonder what we can do about the cases where the CI is not ready. Is there any mechanism where the contributor can help here?

Contributor Author

If there is a case where we are okay with the tests not passing due to CI issues, this would fall under the exception that requires 3 reviews.

Is there any mechanism where the contributor can help here?

If there is a test failing that is not related to the PR, you can definitely dive in, figure out why, and push a separate PR to try to fix it, but ultimately that responsibility lies with our team. If Windows or pylint tests are failing constantly we need to dive in and figure out why. I think it makes sense to add some more documentation around how we have these tests set up so it's easier for contributors to run them with the same infrastructure as us. I'll add the step to document this to this SEP.


I think it makes sense to add some more documentation around how we have these tests set up so it's easier for contributors to run them with the same infrastructure as us. I'll add the step to document this to this SEP.

Thanks, I love that.


Feature PR requirements:
- Test Coverage: tests written to cover new feature.
- Release Notes: Add note in release notes of new feature for relative release.

Ouch, that is new to me. Where is the release notes file?

Contributor Author

doc/topics/releases


Thanks. Maybe it is my fault, but I do not see this process documented.

If this is the case, maybe we need to extend the SaltStack page that explains how to contribute, and add those pointers and the location of the release notes files.

Contributor Author

Thanks for pointing that out. I'll include that in the documentation when we document PR requirements in the contributor docs after this is merged.

Feature PR requirements:
- Test Coverage: tests written to cover new feature.
- Release Notes: Add note in release notes of new feature for relative release.
- Add `.. versionadded:: <release>` to module's documentation

How do we know the version when we are on the develop branch?

Contributor Author

It is the next release that is due out; for example, right now it would be Neon, as you can see here: https://docs.saltstack.com/en/latest/topics/releases/version_numbers.html

The person reviewing the PR should correct the version if it is not right.
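
For reference, this is the directive in context, shown in a hypothetical execution module function (the function and release name are illustrative only):

def frobnicate(name):
    '''
    Frobnicate the named thing.

    .. versionadded:: Neon
    '''
    return name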


So it is OK to add `.. versionadded:: TBD`. Thanks.

Contributor Author

Yeah, that's a great idea; then the reviewer can clarify the version. I'll include that in the docs as well :)


I like this TBD; it also makes it easier for a future release to grab all of those, as the process will be the same for every release instead of figuring out which codename you need.

Contributor Author

To clarify the idea: the reviewer would inform the contributor what to change TBD to if they were not sure, before we merge. We would not merge it with TBD, because we want to ensure the intended version is there before merging.

Contributor

Is there a reason that someone would be merging a feature that wouldn't go into the next named release?

If we do require that, perhaps we could just have a simple text file in the project root like "NEXT_CODENAME" or something, so it's easy to find (and thus, use).

Contributor Author

I think it would be easiest to update it here: https://docs.saltstack.com/en/latest/topics/releases/index.html since we already update it for releases, and it would be available in both the codebase and the documentation online.

- Cannot merge your own PR until 1 reviewer approves from defined list above that is not the author.

Bug Fix PR requirements:
- Test Coverage: regression test written to cover bug fix.

Does it make sense to always create a GitHub issue and reference the issue number in the comment?

This will eventually make it easy to track the status of the bugs.

Contributor Author

I like that idea :)

- Point to the issue the PR is resolving. If there is not an issue one will need to be created.

Feature PR requirements:
- Test Coverage: tests written to cover new feature.
Contributor

@cachedout cachedout May 31, 2019

(A wild @cachedout appears)

IMHO this is going to be very difficult to achieve for a large number of cases.

The first, and most substantial, barrier to this is that new features are rarely built in isolation and are instead usually enhancements to features which already exist.

However, some of the existing features do not have test coverage. As such, a rule like this becomes extremely burdensome. Effectively, this means that if I want to add a function to a module which does not have test coverage, I'm suddenly tasked with creating a testing framework around that module, which might be a much larger task than I'd prefer to take on -- especially in the case below, where this policy seems to apply to bug fixes as well.

If you just look at unit test coverage for modules right now, this condition wherein a module exists but no unit test file is present would apply in just under half the cases. (Somewhere near 43% with rough math.)

This represents a substantial barrier to entry for new feature development. It does not seem to me to be a responsible policy to implement until such time as there is at least some basic testing in place for all modules. Is there a plan to achieve this at present?

I see below that there is an exemption for this policy which means that the PR is left open for three months, but it's not clear to me what good that does. Could somebody please clarify?


(A wild @cachedout appears)

IMHO this is going to be very difficult to achieve for a large number of cases.

The first, and most substantial barrier to this is that new features are rarely built in isolation and instead, usually enhancements to features which already exist.

I do not see the problem here. I have extended some tests or created new ones when I changed the behavior of a current feature, or created new ones when the feature is new.

However, some of the existing features do not have test coverage. As such, a rule like this becomes extremely burdensome. Effectively, this means that if I want to add a function to a module which does not have test coverage, I'm suddenly tasked with creating a testing framework around that module, which might be a much larger task than I'd prefer to take on -- especially in the case below, where this policy seems to apply to bug fixes as well.

I think that this makes the rule even better. IMHO there is no excuse for not improving the code with any commit. The poor state of the current coverage is not an excuse to make it even worse.

Another argument is the quality of the current tests. Some of them are not in good shape either, but requiring improvements to the current tests related to the PR under review is a debatable option.

On the other hand, I interpret the "new" part of the sentence as meaning that you only need to cover the new feature behavior, not the old one. So this means that, strictly speaking, only a partial test could be OK.

This represents a substantial barrier to entry for new feature development. It does not seem to me to be a responsible policy to implement until such time as there is at least some basic testing in place for all modules. Is there a plan to achieve this at present?

Maybe it is only me, but not forcing test coverage in something like Salt is a very worrisome point of view. Executing code on production machines that is not covered by any unit test / integration test is an argument not to trust this code.

Contributor

Executing code on production machines that is not covered by any unit test / integration test is an argument not to trust this code.

I agree. However, being unable to get reliable signaling from the tests themselves is a much greater liability. When tests are required, it puts contributors in the position of writing tests simply for the sake of writing tests. Too often, these tests are poorly constructed, unhelpful, and unmaintained.

In my view, it is much better to encourage tests, of course, but to give contributors the freedom to make tests when it makes sense to do so. A blanket policy removes the ability for contributors to make sensible decisions in this regard and there are many examples of tech debt accumulated in the project in this manner.


Executing code on production machines that is not covered by any unit test / integration test is an argument not to trust this code.

I agree. However, being unable to get reliable signaling from the tests themselves is a much greater liability. When tests are required, it puts contributors in the position of writing tests simply for the sake of writing tests. Too often, these tests are poorly constructed, unhelpful, and unmaintained.

But there are reviewers that can evaluate what the test is doing.

If some developer, for the sake of speed, decides to write a test that is equivalent to:

result = module.function(params)
assert result == 'expectation' or True

a reviewer must be aware that this is useless and help design a better test, and not merge the code until this issue is fixed.
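
For instance, the reviewer would push the author toward an assertion that can actually fail (a sketch mirroring the pseudocode above):

result = module.function(params)
assert result == 'expectation'  # no 'or True' escape hatch, so a regression makes this fail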

On the contrary, if the developer understands that the test is optional, so be it; no test will be written. IMHO this will not improve the situation.

In my view, it is much better to encourage tests, of course, but to give contributors the freedom to make tests when it makes sense to do so. A blanket policy removes the ability for contributors to make sensible decisions in this regard and there are many examples of tech debt accumulated in the project in this manner.

I do not see how making the tests optional will decrease the technical debt (sorry if I am misunderstanding your point).

Contributor

Hey wild @cachedout and @aplanas! Thanks for the comments!

On the other hand, I interpret the "new" part of the sentence as meaning that you only need to cover the new feature behavior, not the old one. So this means that, strictly speaking, only a partial test could be OK.

That's the prevailing attitude. Having zero tests is not good. Having basic tests of some sort is kind of the MVP.


I agree. However, being unable to get reliable signaling from the tests themselves is a much greater liability. When tests are required, it puts contributors in the position of writing tests simply for the sake of writing tests. Too often, these tests are poorly constructed, unhelpful, and unmaintained.

But there are reviewers that can evaluate what the test is doing.

Precisely.

The purpose of the tests is to verify that 1) code is behaving as intended and 2) future code changes do not have unintended side effects.

The purpose of code reviews is so that humans can verify that the code is doing what it's supposed to do. After all, it's pretty trivial to write code that always passes tests, if you don't care what the tests or code is doing.

Too often, these tests are poorly constructed, unhelpful, and unmaintained.

I have encountered several of these tests 🙂

unmaintained

This is the part that we're trying to fix. Maybe it's worth updating this SEP to specify some of the human-oriented parts of this.

We're not requiring tests to satisfy some arbitrary notion that we should have test coverage, nor are we mandating that X style of tests are written, nor even X level of code coverage.

But we do expect that:

  1. If you're fixing a bug, you can write a test that exhibits the buggy behavior. And then write the fix for it.
  2. If you're adding a feature, you should be able to codify how that feature is supposed to work by spelling it out with an automated test 😄 It doesn't necessarily have to run the master-minion gauntlet, or really even hit the underlying system - though preferably the tests+code can show what happens when the system isn't in the "happy path" state.

What we've seen is that many, if not most, of our bugs can be traced back to ignoring test failures and merging code, even if it's (apparently) unrelated to the test failures. We're not quite at a point where we have full confidence in our tests, but that's what we're after. If there's a better way to communicate this in the SEP, I think we should consider updating the text.

@cachedout you're absolutely right that we shouldn't just require tests for testing's sake.

Do you have any suggestions for how we could better express what our goals are, and why we have that requirement?

Contributor

This is why I am choosing to call it out as a specific form of technical debt in which the debt becomes due at the point at which a test may be incorrectly reporting a working module when it turns out that the module is not functional from the perspective of the user.

I'm not sure that this is different at all from not having tests. The benefit of having tests, however, is that when frobnosticate upgrades from version 3 to 4, now we have tests for version 3, so when we make updates for version 4 and we inadvertently break code for version 3 we have a quick feedback showing that we did, in fact, break existing behavior.

It is honestly troubling that you see no distinction between the two. I'm not being hyperbolic here.

Scenario 1: Test exists which isn't actually testing the module, but rather the other logic in the function, with external dependencies mocked. The test passes because we've controlled for the operation of the module. Does it work? We don't know for sure, but the passing test lets us think that it is working.

Scenario 2: No test. Does it work? We have no idea.

When I say that I'm unclear on the distinction, what I mean is that I don't see why technical debt enters into the discussion.

The reason I say that is because the discussion is around this question:

What happens when system outside of Salt changes in a breaking way?

  • No tests
    • Salt breaks
  • Only tests that aren't an accurate reflection of the underlying system
    • Salt breaks

So why does technical debt about those tests enter the conversation? That's what I don't understand. Tests that don't use real 3rd party systems do not tell us anything about 3rd party systems. Is anyone arguing that they do?

These are not remotely the same. Sure, a unit test where you've mocked out the thing you're testing helps keep you from breaking logic with later changes to Salt, and that's not nothing. But it's also not a substitute for properly testing the operation of the module, and it doesn't help you to detect upstream changes.

Right. I think that's what everyone is saying.

So when the upstream module changes, you'd better hope it does so in a way that causes the mocks you wrote to no longer allow the test to pass, or else you're oblivious to a change that is meanwhile causing the module to break for the people actually using that module.

I don't think anyone is disagreeing with that statement either?

So yes, there's a cost to maintaining the tests, but it's not like the alternative is free. There's a cost there, too.

I don't think that anyone is arguing that having no tests isn't costly, so let's dispense with this straw-man argument, please.

What are we arguing about anyway? I just looked back through this discussion... ah. Okay, now I see. The question is about this statement:

Test Coverage: tests written to cover new feature.

And the question is, "What good will this do?"

The answer then, should be this:

  • Provide an increased confidence in the code in question
    • With accompanying tests, at least there is enough understanding of what the code should be doing that automated tests can be written
    • Spending some extra time writing tests may indicate more thought/care went into the design.
  • Unit tests provide increased confidence that code does not change in an unexpectedly backwards-incompatible fashion
  • Functional tests provide increased confidence that salt interactions with the underlying systems behave as expected
  • End-to-end/Integration tests provide increased confidence that Salt as a whole operates correctly.

Unit tests are just one part of a test infrastructure, arguably a minor part

Agreed.

, and the fact that they're easier to write will make them the go-to if we have a dogmatic policy to require tests instead of making the focus what it should be: to come up with the best way to accurately test how the software actually works.

🤷‍♂ That may be correct. But is it better to have no tests, or some tests? Because what we're talking about is the difference between merging code that has no tests, and merging code that has at least some tests. Or changing this SEP to be more explicit about test requirements.

My personal opinion is that currently the way it's stated is fine. My current experience says that some tests are better than no tests. And not spelling out exactly what kinds of tests we're requiring gives the community more flexibility - just want some unit tests? Cool, that's a great start. Want to create an entire set of tests, with unit, functional, and integration tests, as well as extensive documentation and examples? Awesome!

But that's just my strong opinion, weakly held.

I'd recommend watching this excellent talk from this year's PyCon: https://www.youtube.com/watch?v=Ldlz4V-UCFw

Nice! That definitely illustrates a lot of the challenges to proper testing, and I've definitely seen those problems myself. Hopefully created them less often 😝 I know that I definitely have a preference for the stub-style doubles and DI/functional style code/tests myself.

I know that we will probably get some PRs that have tests that mock too much, and some that will commit other anti-patterns. My hope is that we'll catch these during the PRs and can give people guidance on how to write better tests. I think this is a worthwhile first step that will produce a lot of good information that we can use as we work to improve our test infrastructure. I think it's a Goldilocks requirement - it's not too small and it's not too big, it's just right.

Contributor

The reason I say that is because the discussion is around this question:

What happens when system outside of Salt changes in a breaking way?

  • No tests
    • Salt breaks
  • Only tests that aren't an accurate reflection of the underlying system
    • Salt breaks

So why does technical debt about those tests enter the conversation? That's what I don't understand. Tests that don't use real 3rd party systems do not tell us anything about 3rd party systems. Is anyone arguing that they do?

No. For the umpteenth time, the concern (and an entirely valid one) is that this will lead to unit tests being over-represented due to their expediency. The technical debt comes from waiting to write better tests, or just not writing them at all.

Exhibit A:

% find tests/unit -type f -name test_\*.py | wc -l
726
% find tests/integration -type f -name test_\*.py | wc -l
224

This ratio is not going to get better if we mandate that tests are written without making it easier to write more accurate tests.

My current experience says that some tests are better than no tests.

This is reductive to the point of absurdity.

I know that we will probably get some PRs that have tests that mock too much, and some that will commit other anti-patterns. My hope is that we'll catch these during the PRs and can give people guidance on how to write better tests. I think this is a worthwhile first step that will produce a lot of good information that we can use as we work to improve our test infrastructure.

"Hope" is a particularly bad strategy on which to base the future of the test infra. You can "hope" all you like that we're going to use unit tests as a starting point and build around them, but as someone who is far more familiar with Salt and how we've approached testing over the years, I can say that this is not very likely. The tests that are written all too frequently just become "the tests" for that module, unless repeated bug reports cause us to write better ones.

A hard requirement like this is for after we've approached creating better ways to test complex things. Not before.

Contributor Author

This represents a substantial barrier to entry for new feature development. It does not seem to me to be a responsible policy to implement until such time as there is at least some basic testing in place for all modules. Is there a plan to achieve this at present?

I see below that there is an exemption for this policy which means that the PR is left open for three months, but it's not clear to me what good that does. Could somebody please clarify?

Just want to go back to your original questions surrounding what happens when a PR is left open @cachedout

I do think that we should more clearly define how we would approach the situation where a user is not able to write a test and we do not want to merge without test coverage. One idea was to run community salt-sprints where we add tests for these specific PRs, but I think it's fair to say that we should plan these out so they occur often, and are not just wishful thinking. These test-focused salt-sprints could also slowly try to tackle this concern:

"If you just look at unit test coverage for modules right now, this condition wherein a module exists but no unit test file is present would apply in just under half the cases. (Somewhere near 43% with rough math.)"

What do you think of this approach? I'm sure you will point out something I'm missing ;)


This could work; my concern is: how do you make sure that a subject matter expert is in charge of writing the tests for the module?

I could definitely see people writing tests for things that they don't understand, and then all that gets done is writing tests for coverage purposes instead of actually testing the important bits.

Contributor Author

Great question. Maybe we ensure certain test coverage is assigned out to the appropriate working groups during the sprint? Although this only covers certain areas for now, as we have only started up a couple of working groups this time around.

I'm not sure how to ensure someone is a subject matter expert unless I'm familiar with them in the community, just the same as I do not know if they are an expert when they push a pull request. The only other thought I have is to make it clear in the announcement beforehand which tests and areas we are covering, so that people with expertise in an area can join to help write tests. Also, to be clear, the intention was to include team-core in this sprint to help tackle these test issues as well, so they should be able to help in some of those areas where others do not have expertise.

I think another thing that will help is for our team to triage the PRs/tests beforehand and we can add a bit of direction on each PR on how someone could add adequate test coverage.

@Akm0d Akm0d self-requested a review June 7, 2019 17:44
@thatch45
Contributor

Ok, I think that we are close to completion here but we have a single sticking point, and that is the area of what test authoring looks like.

We have a number of situations to cover, multiple testing approaches and models to consider and we can't cover all of it in a single line:

- Test Coverage: tests written to cover new features.

What we need to do is define a few things:

  1. Where unit tests work well and are considered a good solution.
  2. Where integration tests should be required
  3. What situations don't require tests because the test suite is not expansive enough
  4. If tests cannot be written for a feature for reasonable reasons how is the lack of coverage documented so that a manual test can be run to cover the situation
  5. Document and roadmap where the test suite needs to be expanded to fill in these gaps

We need to make sure that when someone submits a PR the rules around tests are clear. The goal is to have PRs create well-made tests that cover new features. But in looking at this conversation I fear it has devolved into a number of points of squabbling.

So let us step back and ask: what are the macro questions that need to be answered to achieve the goal that we all share? These 5 points that require outlining are the ones I see; what are we missing?

@max-arnold

max-arnold commented Jun 27, 2019

I'd like to raise a design review question.

Submitted PRs sometimes solve a very specific use case (or are just quick hacks) at the cost of increased complexity, a combinatorial explosion of possible code paths, and feature fragmentation (I'm mostly talking about configuration options and kwargs that alter function behavior). This adds to the overall impression of Salt as complex and buggy.

Other PRs solve an important use-case, but in a very limited way - for one specific module, mode or CLI utility (salt, salt-call, salt-run or salt-ssh). This also results in feature fragmentation, non-DRY code (when other people start to copy the feature) and other downsides.

Just a few recent examples:

  1. Second (tangential to already existing configuration option) way to disable grains was merged, despite some concerns from reviewers: Add parameter named grains_blacklist salt#49481
  2. A case when a more generic feature supersedes the narrow one (both were merged into the same major release branch): Add parameter named grains_blacklist salt#49481. Luckily, this was successfully resolved. I think this is a release management and product management issue.
  3. An important problem that is solved in a hacky way by introducing two more configuration flags: Documentation on skip_grains salt#53603 (comment)

Requests to saltstack/team-core for a review are usually left unanswered...

I like how such problems are handled in Django:

  1. They are pretty much against the introduction of new flags, unless it is strictly necessary or part of a deprecation process
  2. Instead of one-off features, they always push for extensibility and more systemic (architectural) decisions

So, the question is: how could the PR review process be improved to prevent these long-term consequences?

@mattp-

mattp- commented Jun 27, 2019

I tend to agree pretty unilaterally with @max-arnold. Maybe a core working group could aid in such a thing?

@waynew
Contributor

waynew commented Jun 28, 2019

@mattp- I think that would be a great thing for the core working group to tackle. I'm not sure what the correct answer is here. One thing I have noticed over the last several months when working with @thatch45 is that he has a pretty phenomenal mind for the big-picture stuff. We'll get to discussing things and he's able to take a quick step back and say, "This is the way to do this." And so far it's always been, "Ahhh, yes, that is it."

I don't know how we can bottle up Tom and distribute him around, though ;)

I do agree with @max-arnold though that there's definitely some more holistic things that we need to address with PRs from an architecture/project level. I also feel like that's a bit more of a fuzzy gray area.

Like when it comes to tests there are some pretty straightforward metrics that we can use to gauge their usefulness (of course there are also some fuzzy parts to that, too).

I think we should consider some sort of architectural guidelines for the project in a separate SEP/working group, and keep this one focused at a lower level.

Tom had a couple of questions that I think we should address before this enters the final comment period:

  1. Where unit tests work well and are considered a good solution.
  2. Where integration tests should be required
  3. What situations don't require tests because the test suite is not expansive enough
  4. If tests cannot be written for a feature for reasonable reasons how is the lack of coverage documented so that a manual test can be run to cover the situation
  5. Document and roadmap where the test suite needs to be expanded to fill in these gaps

Re 1:

I think unit tests are most useful when testing a logical "chunk" of code. We have a lot of code within Salt that could be a standalone, agnostic function that doesn't actually (need to) interface with the outside world. I forget if it was a link posted here or somewhere else, but the general principle was that the function doing the IO should be calling the functions doing the logic. I like that a lot. It allows you to do nice black-box testing of the function(s), so mocks (or at the very least, monkeypatching) are not necessary.
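
A tiny sketch of that principle (the names are made up): the pure function carries the logic and is trivially unit-testable, while the thin IO wrapper is the only part that needs a real file or a mock.

# pure logic: black-box testable with plain asserts, no mocks needed
def parse_limits(text):
    limits = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and not key.startswith("#"):
            limits[key.strip()] = int(value)
    return limits

# thin IO shell: the only piece that touches the filesystem
def load_limits(path):
    with open(path) as handle:
        return parse_limits(handle.read())

def test_parse_limits():
    assert parse_limits("nofile=1024\n# a comment") == {"nofile": 1024}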

I think unless your functionality is strictly doing IO, unit tests are highly appropriate.

There may also be a place for mocking out IO/3rd party pieces as a way of testing that our code works under a certain set of assumptions.

I like the idea of gating PRs on running these tests.

Re 2:

I've seen conflicting definitions of what makes a functional/integration test, so I'm going to define them this way:

functional test
This is a test that does real interaction with the system. It may fake out parts of an operation for long running/complex interactions/isolation. It may also execute in a way that Salt would never execute it - that is, without a master or a minion created, pillars loaded, etc.

integration test
This is a test that works the way that Salt expects to work. It may bypass a cli, but it will have an actual master/minion/etc created.


I'm not attached to those terms, I just want to make sure that when you read my thoughts you understand what I mean when I use them.


I believe most modules should have functional tests attached. I think it's a very good idea, and if there are any libraries that we're using the functional tests should exercise those modules - the better to catch changes in APIs, etc.

I think we should also gate PRs on some set of functional tests. Maybe the full battery, I'm not sure.


I think it's important to have integration tests for all of the states and modules in Salt. For one, they provide examples of how we should be using the state/module. For bug fixes I think it's 100% necessary to create an integration test that duplicates the behavior we've seen in the wild to prevent regressions. I've seen a number of bugs that were introduced that could have been caught this way.

I think PRs should be gated on the integration tests that were included in the PR, and possibly the others that involve the module.

Releases, of course, should absolutely be gated on the entire battery of tests running - unit, functional, integration, and manual tests.

Re 3:

Obviously doc changes shouldn't require tests.

Typos may not require them.

I think if a code block is overly complicated and the change is minor enough, it's tolerable to not require tests. If we can test it with minimal (<8h? <4h?) effort, I think it's appropriate to write some sort of test, and I have a strong preference for that, if only to prevent regressions.

Re 4:

I think we could have a manual_tests directory or .txt file (there should be a limited number of them, right? RIGHT???) where those are documented.

I'm also a fan of labeling code with some ASCII art:

        ,     \    /      ,        
       / \    )\__/(     / \       
      /   \  (_\  /_)   /   \      
 ____/_____\__\@  @/___/_____\____ 
|             |\../|              |
|              \VV/               |
|        - Here be Dragons -      |
|        This code has no         |
|        automated tests,         |
|        change at your peril!    |
|_________________________________|
 |    /\ /      \\       \ /\    | 
 |  /   V        ))       V   \  | 
 |/     `       //        '     \| 
 `              V                '

But maybe that's just me 😂

Re 5:

I think this falls under the idea of having an architecture/core/steering team.


Anyway, those are my thoughts.

Tests are good, we should come up with an enhancement to the PR review process taking a higher level view in something other than this SEP, and get this into Final Review.

Do we have any proposals for adjusting the wording around the test requirements? Or should we leave it as-is and enter the final comment period (i.e. 1-week)?

@Ch3LL
Contributor Author

Ch3LL commented Jul 1, 2019

I can definitely update the SEP with more details to answer the questions @thatch45 brought up. And I think you have outlined things nicely @waynew

My only concern (before I update the SEP) is this question:

If tests cannot be written for a feature for reasonable reasons how is the lack of coverage documented so that a manual test can be run to cover the situation

I don't think we should take the approach of adding a manual test when automated tests cannot be added; rather, we should take the time to answer how we can update the test suite to ensure it can be tested. I think manual tests should be a rare exception, and really never happen, IMO. We are trying to automate our current manual test run, and adding more manual tests would deter this work; we would need to automate them later on anyway. So I think we should pause, slow down, and figure out how to add it now.

In cases where it honestly cannot be tested with integration tests, unit tests can still be added, as this does provide some coverage. Although, as has been pointed out, unit tests for many areas of Salt should not be considered full test coverage, especially for the execution/state modules. So I think it would be useful to document the current test coverage for states/execution modules in Salt in our documentation. This would allow users and the community to be aware of which modules have full test coverage. We can outline all the states and execution modules, define which ones have integration or unit tests, and define what that means for a user using the module. Or we can document the current test coverage in each module's documentation, which might be easier for someone to update as tests get added.

@waynew
Contributor

waynew commented Jul 3, 2019

In cases where it honestly cannot be tested with integration tests, unit tests can still be added, as this does provide some coverage. Although, as has been pointed out, unit tests for many areas of Salt should not be considered full test coverage, especially for the execution/state modules. So I think it would be useful to document the current test coverage for states/execution modules in Salt in our documentation. This would allow users and the community to be aware of which modules have full test coverage.

I believe I agree with that. Codecov should be able to help a lot with that? I think it's possible to inspect subsections of the code using codecov - perhaps it's possible to use a badge or something in the documentation for each (part?) of the modules.

@cachedout
Contributor

cachedout commented Jul 4, 2019

Codecov should be able to help a lot with that?

There are caveats. Tools like Codecov don't have a good understanding of concepts like the Salt loader. A code coverage tool will know if code was exercised via a direct import and subsequent function call, but not via something like __salt__. I would be quite wary of using Codecov as an indicator of test coverage unless there was a plan to address this.

@waynew
Contributor

waynew commented Jul 5, 2019

TBH, I think that's actually fine if codecov only handles direct imports. As you've mentioned at various times, we should be testing against real, underlying systems (just as a for-instance, file.chattr has some specific requirements in the underlying applications that drive it). It's possible to do a direct import of salt.modules.file and then run tests against salt.modules.file.chattr without exercising the entire master-minion communication cycle.

I suspect we could have a full coverage run of the entire states/modules this way, as well as unit tests in ~1hr. Obviously that's a gut feeling driven by several experiences. But if we had a fast set of tests that would run in this way for each PR (and perhaps an extra smoke test that took 15m or so), and then a longer much more thorough test suite, I think that would be pretty sweet.

All of that as background to my statement that I think it's OK to rely on tools like codecov 😄 I don't think it's the only way, or maybe even the best way. But it seems like a reasonable tradeoff to me.

@gtmanfred

gtmanfred commented Jul 8, 2019

I was thinking about this this morning when looking at some PRs. I was under the impression this SEP also included turning on the full test suite for all PRs.

When Brett and I started working on the test suite in November of 2017, we had a goal of reducing the cost of running the test suite for all the PRs, while still getting results that we could merge, and fixing any errors for extra stuff by spending a lot of time monitoring the test suite and fixing them as they popped up, instead of putting that load on the people trying to give us free bug fixes and new features for updated libraries.

At the time we were on Linode, and we were spending about $4k a month on the test suite, between PRs and just daily test suite runs. So the plan was to move to OpenNebula in our own datacenter and run the tests from there. We ran into more problems with time there, because the internal salt-cloud on OpenNebula wasn't on the best hardware, until Enterprise's test suite was put on a new hypervisor and we got some performance back, but it wasn't enough, so we decided to move to AWS instances, because they were running faster at the time (2018.3 wasn't released yet, and 2017.7 and 2016.3 were running for 5 hours on Linode, and only took about an hour to complete on AWS). We were able to reduce the cost from $4k a month on Linode to about $800 a month on AWS (at the time).

It is worth mentioning that right now Salt is spinning up between 14 and 28 VMs per PR (because it has 2 Jenkins instances; the 14 are for develop, which has had the py2 tests disabled, so once the Jenkins migration is done, this will be half as many).

Right now the VMs are up between 6 and 11 minutes, so for the test suite to run on a PR, if I am generous, it is going to cost 24¢ ($0.17/hour * 6 min / 60 min/hour * 14 VMs) per PR to run the tests (and as much as 50¢, but 24¢ is a good estimate, especially since it should be less once the extra Jenkins migration is over).

($0.17 being the hourly cost of a c5.xlarge, which is what I remember us using when I still worked at Salt: https://aws.amazon.com/ec2/pricing/on-demand/)

When you enable running the full test suite again, the test suite averages taking 7 hours (that is for python3, so you can't get away from this by disabling python2, which actually runs about 2 hours faster).

[Screenshot, 2019-07-08: test suite run times]

So let's do that math: $0.17/hour * 7 hours * 7 VMs. Using 7 VMs is being generous again, assuming we disable the py2 tests. That is roughly an $8 increase per PR to run the whole test suite, and that is PER time the test suite is run on a PR, and it almost never happens that it is only run once.

So taking the worst estimate for the test suite right now with only related tests running, and a generous estimate for what it would turn into, that is roughly a 32x increase in the price of running the test suite. And that is also assuming the test suite isn't rerun even more in the future to get all the flaky tests to pass on one PR, which would result in more test suite runs per PR.
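For reference, the back-of-the-envelope math above as a quick script (all figures are the rough estimates quoted in this comment, not actual billing data):

# Rough per-PR cost estimates using the figures quoted above
hourly_rate = 0.17                          # approx. c5.xlarge on-demand price, USD/hour

# Today: ~14 VMs up for ~6 minutes each while only related tests run
current_cost = hourly_rate * (6 / 60) * 14  # ~= $0.24 per PR

# Full suite: ~7 VMs (py2 disabled) running for ~7 hours each
full_suite_cost = hourly_rate * 7 * 7       # ~= $8.33 per PR

# Rounding to 24 cents and $8 gives the ~32x increase mentioned above;
# the exact figures here work out to roughly 35x.
print(f"current: ${current_cost:.2f}, full suite: ${full_suite_cost:.2f}, "
      f"increase: {full_suite_cost / current_cost:.0f}x")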

Is salt willing to fund this or have yall fallen into new money that wasn't available 9 months ago and are willing to part with that?

Just food for thought, because that is a lot of money.

@gtmanfred
Copy link

Note: the previous comment only pertains to running the full test suite on all PRs. I cannot find where that was being discussed previously.

@Ch3LL
Copy link
Contributor Author

Ch3LL commented Jul 18, 2019

This SEP does not include running the full test suite, but I'll try to answer some of your concerns as best as I can and route some of the other questions to other people on the team.

"When you enable running the full test suite again, the test suite averages taking 7 hours "

There is a known issue on neon and develop where the test suite takes over 6 hours to run; we are currently digging into this issue. For Linux tests we should only be seeing 2-3 hour runs, so hopefully we can figure this out and decrease test run times.

Also, pytest was added to neon last week for the py2/py3 tests, and we will be adding it to develop as well. This migration to pytest makes things a bit easier to parallelize. I cannot remember the numbers exactly but I think when we added parallelization to a PR test run that pushed the test run down to an hour. Is that right @s0undt3ch ? We have not fully migrated to parallelizing the jobs yet because we ran into some issues with AWS limits that we are trying to iron out, but we are continuing those parallelization efforts, which will hopefully help with future test run times on PRs.

We are also reviewing the current tests in the test suite to see if we can speed up the ones that take up a lot of the run time. We have also spent a lot of time fixing flaky tests so they are not flaky anymore, although that list has not been eliminated entirely.

"Is salt willing to fund this or have yall fallen into new money that wasn't available 9 months ago and are willing to part with that?"

We understand that there will be a temporary increase in cost while we try to make some other advancements in the test suite. I do not know what the exact costs are as I have not been directly included in that part of the project and @s0undt3ch and @bryceml might be able to elaborate there. All of our efforts are currently directed at improving the test suite and ensuring all tests are always passing and stable. Requiring all the tests to run on each PR before we merge has, so far, helped ensure these efforts don't keep sliding backwards.

@s0undt3ch
Copy link

s0undt3ch commented Jul 18, 2019

I cannot remember the numbers exactly but I think when we added parallelization to a PR test run that pushed the test run down to an hour. Is that right @s0undt3ch ?

Yes, that's correct, and that's one hour for Windows; the Linux runs finished earlier.
As @Ch3LL mentioned, we hit some limits and backed out of parallelization. The focus now is on improving slow tests to the point where we might not even need to parallelize, but the support exists under pytest.

I do not know what the exact costs are as I have not been directly included in that part of the project and @s0undt3ch and @bryceml might be able to elaborate there.

I don't know exactly our costs, @bryceml might be able to shed some light there.

@bryceml
Copy link

bryceml commented Jul 18, 2019

These are our costs in the saltstack-ci account, which covers just drone.saltstack.com, jenkinsci.saltstack.com, and all their slaves. We started using the new Jenkins in the saltstack-ci account as our primary Jenkins server on July 8th.
https://github.com/bryceml/salt/raw/temp/Screenshot_2019-07-18_11-48-19.png

test-kitchen is the kitchen nodes, jenkinsci is the Jenkins slaves, and everything else is part of Drone or the Jenkins master itself.

The actual cost to us is 15% less than the values on the chart.

As we introduce new OSes, and globally after Neon is released, we are planning on switching to m5.large instead of c5.xlarge on AMIs that support it. That should reduce costs, since it seems a lot of tests primarily use only one core of the VM.

@Ch3LL
Copy link
Contributor Author

Ch3LL commented Aug 2, 2019

I'm going to try to summarize all of the discussions and comments that were made related to the SEP changes here, and put it in final comment period before votes.

  1. We will require tests on PRs, but we need to make it clear via documentation and reviews where unit and integration tests are most valuable. We also need to make it clear that unit tests alone or integration tests alone should not be viewed as full test coverage. Changes that do not require tests: documentation, cosmetic changes (ex. log.debug -> log.trace), fixing tests, pylint, changes outside of the salt directory.
  2. We will update the test documentation to make it easier for contributors to run the tests as they are run on Pull Requests. This will make it easier for them to troubleshoot any failures.
  3. In the case where a developer cannot write a test, the Needs Testcase label will need to be added to the PR. We will document how other developers can filter on this label (see the example filter after this list) to help write tests for PRs that have not been merged because of this new requirement. We will also look into planning community events around test coverage and writing tests for PRs with the Needs Testcase label.
  4. Document where we are lacking in test coverage and create a plan to add this test coverage into the test suite.
  5. All tests need to pass before merging the PR.
  6. If an exception needs to be made where a test cannot be written, or we are okay with some of the tests failing, 3 votes are required before merging.
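For item 3, a GitHub search filter for open PRs carrying that label would look something like this (just an illustration of the kind of query we will document):

is:pr is:open label:"Needs Testcase"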

@Ch3LL Ch3LL added the Final Comment Period Speak now or forever hold your peace. label Aug 2, 2019
@Ch3LL Ch3LL merged commit 8230e15 into saltstack:master Sep 4, 2019
@waynew waynew added Accepted and removed Final Comment Period Speak now or forever hold your peace. labels Jan 24, 2020