Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove most of the time.Sleep from the e2e tests #2374

Merged
merged 1 commit into from
Apr 20, 2018

Conversation

aledbf
Copy link
Member

@aledbf aledbf commented Apr 18, 2018

What this PR does / why we need it:

A first step to remove hardcoded time.Sleep from e2e tests

Changes:

  • Before each test, a new namespace is created and a new ingress controller is deployed
  • In case of validation errors, we get the Full Stack Trace
  • The SlowSpecThreshold was adjusted to 20 seconds (default is 5)
  • Now it is possible to just run make e2e-test against minikube (previous upload of the docker image)

After this PR we can start running the e2e tests in parallel

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 18, 2018
@aledbf aledbf changed the title Remove most of the time.Sleep from the e2e tests WIP: Remove most of the time.Sleep from the e2e tests Apr 18, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 18, 2018
@aledbf aledbf force-pushed the improve-tests branch 3 times, most recently from 232ef22 to d962ecd Compare April 18, 2018 23:33
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 18, 2018
@aledbf aledbf force-pushed the improve-tests branch 4 times, most recently from 4961256 to 5584235 Compare April 19, 2018 12:27
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 19, 2018
@aledbf aledbf force-pushed the improve-tests branch 3 times, most recently from 5ae19cb to 779ad7f Compare April 19, 2018 13:53
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 19, 2018
@aledbf aledbf force-pushed the improve-tests branch 3 times, most recently from 74c26c8 to 7288e2e Compare April 19, 2018 14:32
@zrdaley
Copy link
Contributor

zrdaley commented Apr 19, 2018

I did some digging into why the Dynamic Configuration when session affinity annotation is present [It] should use sticky sessions when ingress rules are configured test was failing. It actually has nothing to do with the deployment not updating to the correct number of replicas in time.

The problem is from the ngx.timer.every call that syncs the backends in the lua balancer module. This call runs asynchronously every second.

From ngx.timer.every docs:

Because timer callbacks run in the background and their running time will not add to any client request’s response time, they can easily accumulate in the server and exhaust system resources due to either Lua programming mistakes or just too much client traffic.

Although sleeps are undesirable in tests, I feel that the simplest solution is to use one in this case. I have tested a few times with a 1 second sleep and this has been enough to prevent the error from occuring in Dynamic Configuration when session affinity annotation is present [It] should use sticky sessions when ingress rules are configured. I can't see this asynchronous call actually causing problems in reality.

@aledbf
Copy link
Member Author

aledbf commented Apr 19, 2018

@zrdaley thank you for digging into that issue. I am still cleaning the tests but I will add a timeout in the dynamic scenario you mention

@zrdaley
Copy link
Contributor

zrdaley commented Apr 19, 2018

No problem! One will also be needed in the dynamic configuration test Updating affinity annotation on ingress after the deployment is updated to increase the number of replicas for the same reason.

@aledbf aledbf force-pushed the improve-tests branch 2 times, most recently from fc2f0da to f79ed0a Compare April 19, 2018 15:41
@aledbf aledbf force-pushed the improve-tests branch 4 times, most recently from 083c690 to fb4fe94 Compare April 19, 2018 17:00
@codecov-io
Copy link

codecov-io commented Apr 19, 2018

Codecov Report

Merging #2374 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2374   +/-   ##
=======================================
  Coverage   39.12%   39.12%           
=======================================
  Files          73       73           
  Lines        5204     5204           
=======================================
  Hits         2036     2036           
  Misses       2878     2878           
  Partials      290      290

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c3ff76a...62a80a3. Read the comment docs.

@aledbf aledbf changed the title WIP: Remove most of the time.Sleep from the e2e tests Remove most of the time.Sleep from the e2e tests Apr 19, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2018
@aledbf
Copy link
Member Author

aledbf commented Apr 19, 2018

/assign @ElvinEfendi

@@ -180,52 +189,46 @@ func (f *Framework) GetNginxURL(scheme RequestScheme) (string, error) {
// WaitForNginxServer waits until the nginx configuration contains a particular server section
func (f *Framework) WaitForNginxServer(name string, matcher func(cfg string) bool) error {
// initial wait to allow the update of the ingress controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is not needed anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -180,52 +189,46 @@ func (f *Framework) GetNginxURL(scheme RequestScheme) (string, error) {
// WaitForNginxServer waits until the nginx configuration contains a particular server section
func (f *Framework) WaitForNginxServer(name string, matcher func(cfg string) bool) error {
// initial wait to allow the update of the ingress controller
time.Sleep(5 * time.Second)
return wait.PollImmediate(Poll, time.Minute*2, f.matchNginxConditions(name, matcher))
}

// WaitForNginxConfiguration waits until the nginx configuration contains a particular configuration
func (f *Framework) WaitForNginxConfiguration(matcher func(cfg string) bool) error {
// initial wait to allow the update of the ingress controller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pod.Status.ContainerStatuses[0].State.Running != nil {
return f.Logs(&pod)
if strings.HasPrefix(pod.GetName(), "nginx-ingress-controller") {
if isRunning, err := podRunningReady(&pod); err == nil && isRunning {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -46,6 +46,10 @@ func RegisterCommonFlags() {
// Randomize specs as well as suites
config.GinkgoConfig.RandomizeAllSpecs = true

// Default SlowSpecThreshold is 5 seconds.
// To low for the king of operations we need to tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/To/Too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

test/e2e/up.sh Outdated
@@ -27,7 +27,7 @@ curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.25.2/minik

echo "starting minikube..."
# Using sync-frequency=5s helps to speed up the tests (during the cleanup of resources inside a namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably reword this to "Using lower value for sync-frequency helps..." so that we don't have to keep this up-to-date with the value we set.(currently they are not in sync 1s vs 5s)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ElvinEfendi
Copy link
Member

I love that we now have an isolated ingress-nginx deployment per test suite 😍

kubectl logs -n ingress-nginx -l app=ingress-nginx
exit 1
fi
sleep 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you decide to not keep the existing waitForPod approach? That seemed more resilient to me than hard coded 5s sleep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only because of the sleep of 30 seconds. Also now we use the same wait logic than the rest of the tests

Copy link
Member

@ElvinEfendi ElvinEfendi Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that 30s is a timeout threshold, it does not necessarily wait 30s. It checks every next second and breaks out once the pods are ready - so if the pods are ready before 5s it would return faster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also now we use the same wait logic than the rest of the tests

In the rest of the tests for ingress-nginx deployment we use WaitForPodsReady(i.e
https://github.com/kubernetes/ingress-nginx/pull/2374/files#diff-e427cee5cedcaaa18c6ac5414ea08183R379) which is more aligned with the previous implementation in this script IMHO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@ElvinEfendi ElvinEfendi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just few small comments/questions - LGTM in general!

@ElvinEfendi
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 20, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 10fc254 into kubernetes:master Apr 20, 2018
@andrewloux
Copy link
Contributor

Great work on this PR @aledbf - didn't think I'd see this in master so soon!

@aledbf aledbf deleted the improve-tests branch April 20, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants