mozdownload sometimes fails on Taskcluster and Travis #13274

Closed
foolip opened this issue Sep 30, 2018 · 23 comments
Comments

@foolip
Member

foolip commented Sep 30, 2018

At least twice some of the Firefox tasks for pushes to master have failed in mozdownload:

https://tools.taskcluster.net/groups/AGupAeh6TrSdyNlDxcqZXw (for commit 2df7f9f, now only discoverable via API)
https://tools.taskcluster.net/groups/eRtqwYxjTfeob0g4hHcOlw (for commit 91491de)

The most recent failure was:

Traceback (most recent call last):
  File "./wpt", line 5, in <module>
    wpt.main()
  File "/home/test/web-platform-tests/tools/wpt/wpt.py", line 129, in main
    rv = script(*args, **kwargs)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 510, in run
    **kwargs)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 488, in setup_wptrunner
    kwargs["binary"] = setup_cls.install(venv, channel=channel)
  File "/home/test/web-platform-tests/tools/wpt/run.py", line 165, in install
    return self.browser.install(venv.path, channel)
  File "/home/test/web-platform-tests/tools/wpt/browser.py", line 134, in install
    destination=dest).download()
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/factory.py", line 121, in __init__
    scraper_types[scraper_type].__init__(self, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 346, in __init__
    Scraper.__init__(self, *args, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 135, in __init__
    self._retry_check_404(self.get_build_info)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 150, in _retry_check_404
    self._retry(func, **retry_kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 141, in _retry
    return redo.retry(func, **retry_kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/redo/__init__.py", line 162, in retry
    return action(*args, **kwargs)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 403, in get_build_info
    self.date, self.build_index)
  File "/home/test/web-platform-tests/_venv/lib/python2.7/site-packages/mozdownload/scraper.py", line 484, in get_build_info_for_date
    raise errors.NotFoundError(message, url)
mozdownload.errors.NotFoundError: Folder for builds on 2018-09-28-22-04-33 has not been found: https://archive.mozilla.org/pub/firefox/nightly/2018/09/

The previous failure was very similar except for the date: "mozdownload.errors.NotFoundError: Folder for builds on 2018-09-24-10-03-54 has not been found: https://archive.mozilla.org/pub/firefox/nightly/2018/09/"

That it happened twice, 4 days apart, suggests that it wasn't just a transient problem with archive.mozilla.org.

Here's where the error is thrown:
https://github.com/mozilla/mozdownload/blob/866cfebe9b8137bfe7ba8411efbe9d0e9d24093a/mozdownload/scraper.py

@jgraham, can you take a look?

@foolip changed the title from "mozdownload step sometimes fails on Taskcluster" to "mozdownload sometimes fails on Taskcluster" on Sep 30, 2018
@foolip
Member Author

foolip commented Sep 30, 2018

The reason I noticed is that in https://wpt.fyi/test-runs?label=taskcluster, the row for 2df7f9f has only stable results, but I expected stable + experimental. @Hexcles FYI.

@mdittmer
Contributor

mdittmer commented Oct 4, 2018

@Hexcles could you please add a priority label to this? I think it's your call because you are managing priority/urgency of shipping TaskCluster runs on wpt.fyi.

@gsnedders
Member

@foolip
Member Author

foolip commented Oct 8, 2018

@jgraham, can you take a look? Needs more retry?

@jgraham
Contributor

jgraham commented Oct 8, 2018

I fairly strongly suspect that this is happening when a new nightly is being released (maybe some platforms are available and some are not?). But we are already handling that badly; it's possible to end up with some tests run in the previous nightly and some in the new one. Really we need a single decision task that picks a binary URL and makes it available to the subsequent tasks to ensure that they all run against the exact same version. Note that Chrome could have the same issue, but it's less likely since releases are less frequent. But it's harder to solve in that case; we probably actually need to download the .deb and make it available as an artifact since there isn't a long-lived URL AFAIK.
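
The "single decision task" idea can be sketched roughly as follows. This is only an illustration, not how wpt or Taskcluster implement it: `write_decision`, `read_decision`, the `binary_url` key, and the JSON artifact file are all hypothetical names, and `resolve_url` stands in for whatever lookup picks the binary.

```python
import json
from pathlib import Path
from typing import Callable


def write_decision(resolve_url: Callable[[], str], artifact_path: Path) -> str:
    """Resolve the browser binary URL once and record it for later tasks."""
    # Hypothetical resolver; it is called exactly once, by the decision step.
    url = resolve_url()
    artifact_path.write_text(json.dumps({"binary_url": url}))
    return url


def read_decision(artifact_path: Path) -> str:
    """Return the URL pinned by the decision step, without resolving it again."""
    return json.loads(artifact_path.read_text())["binary_url"]
```

Every test task would call `read_decision` instead of resolving "latest" itself, so a nightly published midway through a run cannot split the tasks across two versions.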

@foolip
Member Author

foolip commented Oct 8, 2018

we probably actually need to download the .deb and make it available as an artifact since there isn't a long-lived URL AFAIK

Enter, stage left, @jugglinmike to say something about how this is done in https://github.com/web-platform-tests/results-collection

@foolip
Member Author

foolip commented Nov 27, 2018

I happened to look at recent commits, and https://github.com/web-platform-tests/wpt/commits/75b92bf3d1791dc0e47cd8a716a135e98d2d2937 has a similar failure (https://tools.taskcluster.net/groups/ZklAzb_fTueVrBAWR0kGBA): "requests.exceptions.ConnectionError: HTTPSConnectionPool(host='hg.mozilla.org', port=443): Max retries exceeded with url: /mozilla-central/archive/tip.zip/testing/profiles/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc6f85775d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))"

@jgraham is there anything we can do to mitigate this?
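
One generic mitigation for a transient "Connection timed out" is to retry with exponential backoff. The sketch below is an assumption-laden illustration, not the code wpt uses: `retry_with_backoff` and its parameters are hypothetical, and the thread does not say whether this download path already retries. Note that `requests.exceptions.ConnectionError` is not a subclass of the built-in `ConnectionError`, so a caller using requests would need to pass the exception types explicitly via `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on the given errors with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            # Re-raise the original error once the attempts are used up.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the delays can be observed or skipped in tests.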

@foolip changed the title from "mozdownload sometimes fails on Taskcluster" to "mozdownload sometimes fails on Taskcluster and Travis" on Dec 11, 2018
@foolip
Member Author

foolip commented Dec 11, 2018

#14450 (comment) shows this happening when running unit tests in Travis too.

@jgraham, is there nothing we can do to make this more reliable?

@foolip
Member Author

foolip commented Jan 23, 2019

This happened in #15012 now:
https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=4488

The failure was in test_install_firefox because of a network error in mozdownload:
https://gist.github.com/foolip/62ec6e42108ff93f22408340a84ac083#file-74-txt-L106

@jgraham should we mark the test as xfail?

@gsnedders
Member

@foolip that test shouldn't be any more flaky than any of the test run jobs where we start off by downloading the nightly browser

@foolip
Member Author

foolip commented Jan 23, 2019

Perhaps it is one of the few pytest tests that do this, because I've seen it fail a few times and I don't think I've seen it for other tests. But it happens for ./wpt run too, of course; that was the beginning of this issue.

@jgraham is there an infra bug filed in some Mozilla repo for improving the reliability of this setup? Or could it be a mozdownload bug?

@foolip
Member Author

foolip commented Jan 30, 2019

This happened for @annevk on #15122.

@jgraham
Contributor

jgraham commented Jan 30, 2019

It's a mozdownload bug. The way it works is that it looks for a directory containing builds, stores that, and then later uses the stored directory to look for the actual build. But the operation of creating a directory full of builds isn't (even nearly) atomic; the directory is created when the first artifacts are available, not when the last build is complete. So the "solution" here is either a) stop using mozdownload and roll our own thing, b) rearchitect mozdownload to do the build and directory lookup in a single operation, or c) catch the failure and try again with the previous build. Option c) is probably the most practical, but the tool really doesn't seem to be designed in a way that makes a fix here easy.
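
Option c) could look something like the sketch below. It deliberately does not use mozdownload's API: `BuildNotFound` is a hypothetical stand-in for its not-found error, and `fetch` is whatever callable downloads one specific build. The caller is assumed to supply candidate build IDs ordered newest first.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class BuildNotFound(Exception):
    """Hypothetical stand-in for a 'build not found' error."""


def first_available(candidates: Iterable[str], fetch: Callable[[str], T]) -> T:
    """Try each candidate build in order and return the first one that fetches."""
    last_error: Optional[BuildNotFound] = None
    for build_id in candidates:
        try:
            return fetch(build_id)
        except BuildNotFound as error:
            # The directory may exist before this platform's build is uploaded.
            last_error = error
    raise BuildNotFound("no candidate build was available") from last_error
```

Falling back to the previous build trades freshness for reliability, and on its own it does not guarantee that parallel tasks end up on the same build.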

@foolip
Member Author

foolip commented Jan 30, 2019

Is there a stable URL that can be used to download the latest build for a given platform, or does mozdownload exist precisely because downloading Firefox isn't that easy? If it were easy then just skipping mozdownload would be a decent option.

@jgraham do you know if there's a bug filed for mozdownload about this?

@gsnedders
Member

mozilla/mozdownload#524 is the mozdownload bug, as appears above.

@jgraham
Contributor

jgraham commented Jan 30, 2019

There is a stable URL for "latest build" but that's what we're using and what's causing the problem. There isn't a stable URL per build type/platform.

@foolip
Member Author

foolip commented Feb 11, 2019

This happened on #15280.

@jgraham have you seen many Gecko exports blocked because of this? It seems from the stack that there's already retry involved, so I guess a fix for mozilla/mozdownload#524 is the only hope?

@whimboo
Contributor

whimboo commented Feb 11, 2019

The latest stable nightly build for linux64 can be found here:
https://download.mozilla.org/?product=firefox-nightly-latest-ssl&os=linux64&lang=en-US

Which other kinds of builds are required?
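
For illustration, the URL above can be built from its query parameters as in this small sketch. The helper name `nightly_url` is hypothetical, and only the `linux64` / `en-US` combination is confirmed in this thread; any other `os` or `lang` value is an assumption that would need checking.

```python
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = "https://download.mozilla.org/"


def nightly_url(os_name: str = "linux64", lang: str = "en-US") -> str:
    """Build the 'latest nightly' download URL for a platform and locale."""
    query = urlencode(
        {
            "product": "firefox-nightly-latest-ssl",
            "os": os_name,  # only "linux64" appears in this thread
            "lang": lang,
        }
    )
    return f"{DOWNLOAD_ENDPOINT}?{query}"
```

Calling `nightly_url()` with the defaults reproduces the URL quoted above.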

@foolip
Member Author

foolip commented Feb 11, 2019

Those are the ones that have been failing, but we also download stable and beta for other runs. Not sure if those use mozdownload, @jgraham would know though.

@whimboo
Contributor

whimboo commented Feb 11, 2019

@jgraham
Contributor

jgraham commented Feb 11, 2019

We can definitely experiment with not using mozdownload.

@jgraham
Contributor

jgraham commented Feb 12, 2019

Fixed via #15329

@jgraham jgraham closed this as completed Feb 12, 2019
@foolip
Member Author

foolip commented Feb 12, 2019

Thanks James!


5 participants