
Timing failures in system tests #5683

Closed
jywarren opened this issue May 8, 2019 · 18 comments
Labels: bug, high-priority, testing

Comments


jywarren commented May 8, 2019

We're seeing intermittent test failures due to Capybara not waiting long enough. Right now:

ERROR["test_posting_from_the_editor", #<Minitest::Reporters::Suite:0x0000564375476be8 @name="PostTest">, 42.33207633999996]
 test_posting_from_the_editor#PostTest (42.33s)
Capybara::ElementNotFound:         Capybara::ElementNotFound: Unable to find field "Title" that is not disabled
            test/system/post_test.rb:20:in `block in <class:PostTest>'

We can set a wait parameter on individual finders, but it may be better to set it system-wide; a quick sketch of both options follows below.
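As a rough illustration (the values and file path here are assumptions, not the exact change made in this repo), the two approaches look roughly like this:

    # Per-call: only this lookup waits longer than the default.
    fill_in "Title", with: "My first post", wait: 10

    # Suite-wide: e.g. in a central place such as
    # test/application_system_test_case.rb (path assumed for illustration).
    Capybara.default_max_wait_time = 10  # seconds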

PRs that are stuck:

Here are some of the builds:

jywarren added the bug and testing labels on May 8, 2019

jywarren commented May 9, 2019

I've set Capybara.default_max_wait_time = 8 in #5526, but we're still seeing intermittent failures. We could set it to 12 or 15 or 20, but that would start to make for really looooooong Travis runs. This one was already 18 min 3 sec: https://travis-ci.org/publiclab/plots2/builds/530045458


jywarren commented May 9, 2019

https://stackoverflow.com/questions/36732120/capybara-rspec-test-takes-a-long-time-to-execute-why says that third-party JavaScript loading can take a lot of time, though I don't know why.

Apparently we can turn on debug mode in capybara-webkit (maybe in Selenium too? not sure) and inspect the output to see what's taking so long; a rough sketch is below.
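For capybara-webkit specifically (assuming that's the driver in play; Selenium would need its own logging setup), registering a debug driver is roughly:

    # Sketch only: registers a capybara-webkit driver that prints every
    # command it sends to the browser, which can show where time is spent.
    Capybara.register_driver :webkit_debug do |app|
      Capybara::Webkit::Driver.new(app, debug: true)
    end
    Capybara.javascript_driver = :webkit_debug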

This wasn't too helpful: https://stackoverflow.com/questions/37005387/poltergeist-capybara-view-testing-taking-long-time-when-running-all-test-in-spec

These folks found they needed to add wait time as well: https://stackoverflow.com/questions/28464090/poltergeist-capybara-test-unable-to-find-css-intermittently?rq=1

jywarren commented:

Noting, re: Sebastian's idea about the build taking too long: when I see 1137s for the docker build, I see:

dpkg: error processing package google-chrome-stable (--install):
 dependency problems - leaving unconfigured
Processing triggers for mime-support (3.60) ...
Errors were encountered while processing:
 google-chrome-stable

jywarren commented:

So: after connecting with @icarito on this, our best theory right now is that the slowdown in the overall test run actually comes from installing google-chrome-stable, not from running the system tests themselves.

However, we are also seeing system test failures which seem timing related.

HYPOTHESIS: Potentially, whatever is causing google-chrome-stable to take so long to install is slowing down processing in the Travis test container overall, and causing the many system tests to blow past their timeout limits. Are we, perhaps, trying to run too much in the Travis container? Might we be up against some kind of memory limit?

jywarren commented:

@alaxalves with your experience in Docker, would you be able to offer any insights here? Noting that @icarito has been working on the overall test run speed in #5730

Thanks, all, this is becoming a bit urgent as many PRs are failing due to these issues! Many thanks for your work on this. Also mentioning @publiclab/plots2-reviewers @publiclab/soc to just make people aware that we're encountering these issues.

alaxalves commented:

> @alaxalves with your experience in Docker, would you be able to offer any insights here? Noting that @icarito has been working on the overall test run speed in #5730
>
> Thanks, all, this is becoming a bit urgent as many PRs are failing due to these issues! Many thanks for your work on this. Also mentioning @publiclab/plots2-reviewers @publiclab/soc to just make people aware that we're encountering these issues.

@jywarren What I have suggested in publiclab/mapknitter#605 improves our build time, since everything is run in parallel. Regarding the Capybara.default_max_wait_time variable: during a Rails 5 upgrade in Noosfero we switched our driver from Firefox to Chrome and set this variable to 60. Check it out: https://gitlab.com/noosfero/noosfero/blob/master/features/support/selenium.rb#L32
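Something in the spirit of that setup (a sketch under the assumption of headless Chrome via Selenium, not copied from either codebase) would be:

    require 'selenium-webdriver'

    # Register a headless Chrome driver for system/feature tests.
    Capybara.register_driver :headless_chrome do |app|
      options = Selenium::WebDriver::Chrome::Options.new
      options.add_argument('--headless')
      options.add_argument('--no-sandbox')    # often needed in CI containers
      options.add_argument('--disable-gpu')
      Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
    end

    Capybara.javascript_driver = :headless_chrome
    Capybara.default_max_wait_time = 60  # the generous wait Noosfero settled on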

jywarren commented:

Whoa, 60 seconds? Well, we can try this, but previously I thought this was what had been causing the really long build times. Now that we think it could actually be an issue with installing google-chrome-stable, maybe this is reasonable?

jywarren commented:

OK, trying Capybara.default_max_wait_time = 60 in #5681 -- hold on to your seats!

jywarren commented:

Hmm. It seems to have had a positive effect, since I'm seeing more tests pass - like #5748 - but some still fail, like #5654

alaxalves commented:

> Whoa, 60 seconds? Well, we can try this, but previously I thought this was what had been causing the really long build times. Now that we think it could actually be an issue with installing google-chrome-stable, maybe this is reasonable?

Installing google-chrome-stable is definitely a big contributor to the long build durations. Those random failing tests might also be related to this:
image: https://user-images.githubusercontent.com/19597045/58059863-28dc5900-7b45-11e9-8803-c7f40c4e3be5.png

@jywarren I could work on this if you want, but I'd freeze my work on Mapknitter for a while. What do you say?


jywarren commented May 21, 2019 via email

Well, it's your call -- we'd love your help on this, but if you feel it'd compromise your other work, I totally understand! But I'm fine with it if you are ok spending a little time on this? Thank you, in any case!

alaxalves commented:

@jywarren This week I'll take a look at this; I'd like to get this train rolling and start some DevOps improvements in here too. No problemo. 😄 💪

jywarren commented:

I believe this is solved. I'm no longer seeing it happening! We can re-open if it happens again. Thanks!!!


jywarren commented Jun 3, 2019

I'm going to re-open because I am seeing intermittent system test failures, like this:

The error is: https://travis-ci.org/publiclab/plots2/jobs/540748689#L3885

 FAIL["test_viewing_the_dashboard", #<Minitest::Reporters::Suite:0x00007fa5e78de4f8 @name="DashboardTest">, 150.24667566600004]
 test_viewing_the_dashboard#DashboardTest (150.25s)
        expected to find visible css ".row.header > h1" with text "Dashboard" but there were no matches. Also found "Community research", which matched the selector but not all filters. 
        test/system/dashboard_test.rb:19:in `block in <class:DashboardTest>'

Now, this means we're not logged in, but it's not reliably reproducible. So, I think it's likely that it's a timing/timeout issue!

For example, in the very next build it passed.
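For reference, here is the waiting behavior at play, as a hypothetical sketch (routes, field names and credentials are made up; this is not the actual test/system/dashboard_test.rb). assert_selector keeps retrying until Capybara.default_max_wait_time elapses, so the failure above means the "Dashboard" heading never appeared within that window, most likely because the login step hadn't finished:

    # Hypothetical illustration only.
    require "application_system_test_case"

    class DashboardTest < ApplicationSystemTestCase
      test "viewing the dashboard" do
        visit "/login"
        fill_in "username", with: "warren"
        fill_in "password", with: "secretive"
        click_on "Log in"

        visit "/dashboard"
        # Retries until found or until default_max_wait_time elapses:
        assert_selector ".row.header > h1", text: "Dashboard"
      end
    end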


jywarren commented Jun 3, 2019

These two PRs are reliably (4+ restarts) not passing their system tests:

#5824 + #5825

alaxalves commented:

@jywarren Apparently #5825 is failing because there's an end missing somewhere.


jywarren commented Jun 4, 2019

Oof! Thanks @alaxalves! 🎊 I had assumed I hadn't touched any system test code and was so used to system tests not working... but my changes are to one of the only features we actually have system tests for, so this is a perfect use case for system tests! 🎉


jywarren commented Jun 4, 2019

OMG, and the other is also a true system test failure! Wow, closing again. It's so pleasant to have one's tests fail due to real bugs! 😆
