Validate pull requests in TaskCluster #12657
70c22b2
to
c3f109b
Compare
I had to iterate on this a bit after changing the upstream repository, but this patch is functioning as expected. I've rebased and pushed up a couple of commits to demonstrate how the CI handles patches with no affected tests and how it handles patches with one or more affected tests (and unstable ones, at that). Those final two commits should not be merged to |
Very very excited to see this land. But I think the implementation got a little hard to maintain by trying to force everything to go via a shell script.
.taskcluster.yml
Outdated
owner: ${event.pusher.email}
source: ${event.repository.url}
payload:
  image: jugglinmike/web-platform-tests:0.18
We really need to set up a shared dockerhub account :)
.taskcluster.yml
Outdated
- name: wpt-${browser.name}-${browser.channel}-stability
  description: >-
    Verify that all tests affected by a pull request are stable
    when executed in ${browser.name}. As of 2018-08-23, this task
Remove the extra text from the description; I don't think it's that helpful (we're more likely to forget to remove it later).
I'd like there to be some contributor-facing indication of this behavior.
I appreciate the concern about bitrot, though, so I've moved it to a statement
logged to standard out immediately following the command invocation. Does that
seem safe enough to you?
.taskcluster.yml
Outdated
# Bash removes null bytes from string values when set as
# environment variables. This invalidates the output of `wpt
# affected-tests` because it uses the null byte as a separator
# between test names. The list of effected tests is
That should be `affected`. But this complexity seems like a clue that a better solution is required.
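For reference, the null bytes survive when the list is read from a pipe rather than stored in an environment variable. A minimal sketch in Python, assuming the output is UTF-8 and null-separated with a possible trailing separator. The helper names and the stand-in child process are illustrative only and are not part of this patch:

```python
import subprocess
import sys


def split_null_separated(output):
    """Split null-separated bytes into a list of test names."""
    names = output.decode("utf-8").split("\0")
    # A trailing separator (or empty output) leaves one empty final entry.
    if names and not names[-1]:
        names.pop()
    return names


def read_names(command):
    """Run `command` and parse its null-separated standard output."""
    return split_null_separated(subprocess.check_output(command))


if __name__ == "__main__":
    # Stand-in for the real command (an assumption for this sketch): a
    # child process that writes two null-terminated names to standard out.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write('a/one.html\\0b/two.html\\0')"]
    print(read_names(child))
```

Because the bytes never pass through a shell variable, the separator is preserved and names containing spaces or newlines would not be split incorrectly.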
tools/ci/ci_taskcluster.sh
Outdated
# to be the name of the browser under test. This restricts the syntax available
# to consumers: value-accepting options must be specified using the equals sign
# (`=`).
for argument in $@; do
I feel like this exceeded my complexity threshold for a bash script. If we want a single entry point for all the things, let's make it a python script instead.
From #10503:
At the same time, I'm unconvinced we should block doing this on having that working, because it's not a regression currently.
82e452d
to
ad8f76b
Compare
569c1b6
to
0d6c2a2
Compare
@jgraham I've re-implemented This introduces more indirection, and I'm on the fence about whether it's |
tools/ci/taskcluster-run.py
Outdated
import subprocess


browser_specific_args = {
    "firefox": ["--install-browser", "--reftest-internal"]
`--reftest-internal` is the default, I think.
I think you're right
tools/wptrunner/wptrunner/wptcommandline.py: if kwargs["reftest_internal"] is None:
tools/wptrunner/wptrunner/wptcommandline.py- # Default to the internal reftest implementation on Linux and OSX
tools/wptrunner/wptrunner/wptcommandline.py: kwargs["reftest_internal"] = sys.platform.startswith("linux") or sys.platform.startswith("darwin")
I wanted to verify with the person who added this flag, and it turns out that was you:
Explicitly set Firefox to use the fast reftest runner.
This should happen by default on Linux, but it doesn't hurt to be explicit.
Have you had a change of heart? Or should we keep the flag for the sake of explicitness?
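For anyone following along, the default quoted above can be restated as a small function. This is a sketch based only on the three lines from `wptcommandline.py` shown in this comment; the function name and its parameters are invented for illustration and do not exist in wptrunner:

```python
import sys


def resolve_reftest_internal(explicit, platform=None):
    """Return the effective reftest-internal setting.

    `explicit` is True or False when the flag was given on the command
    line, or None when it was left unset. `platform` defaults to
    sys.platform and is a parameter only to make the sketch testable.
    """
    if platform is None:
        platform = sys.platform
    if explicit is None:
        # Default to the internal reftest implementation on Linux and OSX
        return platform.startswith("linux") or platform.startswith("darwin")
    return explicit


if __name__ == "__main__":
    print(resolve_reftest_internal(None, "linux"))
    print(resolve_reftest_internal(None, "win32"))
```

On a Linux worker, leaving the flag out and passing it explicitly resolve to the same value, which is the point under discussion.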
Let's remove the flag and aim for a situation where we can use the same flags for all browsers.
You got it.
@@ -0,0 +1,101 @@
#!/usr/bin/env python
So, I think I'm OK with landing this as-is, but I wonder what the effect would be of moving the logic in this file into `wpt run` directly? If one added `--commit-range` as an argument to that function then the only things that wouldn't directly fit would be getting the arguments right per-browser and gzipping artifacts. Those could perhaps be moved out into the task definitions.
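To make the suggestion concrete, here is a rough sketch of how a `--commit-range` option could be declared and consumed. Everything in it is hypothetical: the parser, the `--test` option, the `select_tests` helper, and the injected `affected` callable are placeholders for this sketch and do not reflect the actual `wpt run` implementation.

```python
import argparse


def build_parser():
    """Build a stand-in parser; not the real `wpt run` parser."""
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("product", help="browser under test")
    parser.add_argument("--commit-range", default=None,
                        help="only run tests affected by these commits")
    parser.add_argument("--test", action="append", default=None,
                        help="explicit test path (may be repeated)")
    return parser


def select_tests(args, affected):
    """Return the list of tests to run.

    `affected` is a callable mapping a commit range to a list of test
    paths; it stands in for whatever computes the affected tests.
    """
    if args.commit_range is not None:
        return affected(args.commit_range)
    return list(args.test or [])


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["firefox", "--commit-range", "HEAD~1..HEAD"])
    # A fake lookup that pretends one test was affected.
    print(select_tests(args, lambda commit_range: ["example/test.html"]))
```

With something along these lines, only the per-browser arguments and artifact compression would remain outside the command, as noted above.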
tools/ci/taskcluster-run.py
Outdated
}


def tests_affected(commit_range):
    output = subprocess.check_output([
You could, of course, just import and use the function directly rather than going via a process.
@jgraham wouldn't that require hacking sys.path? If so, I'm in favour of forking.
Yes. I think it makes more long term sense to move all of this into `wpt run` and not have another wrapper script at all. So I don't object to deferring that change here.
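To illustrate the tradeoff discussed in this thread: importing a function from a directory that is not an installed package means extending `sys.path` at runtime, whereas forking only requires knowing the command line. The sketch below uses a throwaway module written to a temporary directory; `affected_stub` and its `tests_affected` function are placeholders invented for this example, not the real WPT modules.

```python
import importlib
import os
import subprocess
import sys
import tempfile

# Source of a placeholder module (an assumption for this sketch).
MODULE_SOURCE = (
    "def tests_affected(commit_range):\n"
    "    return ['example/test.html']\n"
)


def via_import(directory, commit_range):
    """Import the module by temporarily extending sys.path."""
    sys.path.insert(0, directory)
    try:
        module = importlib.import_module("affected_stub")
    finally:
        sys.path.remove(directory)
    return module.tests_affected(commit_range)


def via_subprocess(directory, commit_range):
    """Call the same function in a child process.

    The parent's sys.path is left untouched; the child finds the module
    because it runs with `directory` as its working directory.
    """
    code = ("import affected_stub, sys; "
            "print('\\n'.join(affected_stub.tests_affected(sys.argv[1])))")
    output = subprocess.check_output(
        [sys.executable, "-c", code, commit_range], cwd=directory)
    return output.decode("utf-8").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "affected_stub.py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        print(via_import(directory, "HEAD~1..HEAD"))
        print(via_subprocess(directory, "HEAD~1..HEAD"))
```

Both routes return the same list here; the difference is whether the calling process has to know where the module lives on disk.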
The implementation LGTM. I didn't look into the Travis & TC failures though. Could you take a look?
@jgraham Instead of installing Firefox via |
This aspect of the wrapper is motivated by my naive attempt to print the affected tests to standard out. Without that, it would be a simple matter of piping
That serves the same purpose as the logging statement in this patch:
Considering that we may not need browser-specific arguments, either, we may be able to compose these tasks in the Do either of you see value in maintaining separation between the various sub-commands? I don't mean to push the UNIX philosophy just for the sake of it, and I know that technically speaking, these are all implemented in the same application. That said, as someone who's been collecting results for a while, I appreciate the ability to compose tasks without patching WPT. I also find it easier to find features that are exposed in their own terms rather than as arguments to the increasingly-large |
I was thinking about the opposite; making
So, I think there are just a series of tradeoffs here and no clearly obvious right answer. Having a series of independent subcommands is great because, as you say, we can script different things together using common utilities, and we get a cleaner separation between concerns for testing. But on the other hand it has real and serious disadvantages:
So I don't have a single answer at the moment, but my inclination, such as it is, is to keep building composable utilities for flexibility and experimentation, while making the most common patterns baked-in features (particularly of
So, the current |
Sounds good to me. Thanks for taking the time to write all of that up!
I've pushed up a commit to do this. It necessitated a change to

That said, the strategy may be incomplete. If we test from the tip of the pull

Contributors could get a more accurate picture by rebasing their patch, but I

That idea highlights an underlying problem: we will only run these tasks in

These are all questions that I'm sure TravisCI et al. have worked out already,
This improves the authenticity of the reported results because it simulates how the patch will behave after it is merged. This also mimics the behavior of the TravisCI continuous integration platform.
The `start.sh` script now supports all git references, so this computation is no longer necessary.
This reverts commit 887512c.
Regarding revision selection: a quick review of some TravisCI logs showed that they use a GitHub-provided reference I was not aware of:

I'm removing the "do not merge yet" label; this should be ready for another review. @jgraham and/or @Hexcles: would you mind?
@jgraham @Hexcles In I walked through an example in that issue report, but to summarize:
Is this difference intentional? The former allowed an unstable test to pass through undetected, but that doesn't necessarily mean the latter is technically superior.
The difference is semi-intentional in that the

Anyway, my general feeling is that the difference is OK and we should try out this patch as is without trying to change the semantics.
`--reftest-internal` is enabled by default in GNU/Linux environments.
I've included another commit to avoid a runtime exception for pull requests that have zero affected tests. We can validate that prior to merging when we remove the intentional instability.
849fece
to
dba6e62
Compare
@jgraham I've resolved the conflicts introduced by gh-12679 and triggered both types of jobs:
Could you take another look?
gh-12878 was recently merged to |
LGTM
owner: ${event.pull_request.user.login}@users.noreply.github.com
source: ${event.repository.url}
payload:
  image: jugglinmike/web-platform-tests:0.21
A somewhat unrelated question: how can we create a public/shareable DockerHub account?
I don't know, but I think we all want that
This reverts commit ee8d57d.
Extend the configuration for the TaskCluster service to generate tasks in
response to GitHub pull requests. Each pull request should create one cluster
containing four tasks, all of which operate on the affected tests as identified
by `wpt tests-affected`:

The former two tasks are intended for future use in comparing test results with
data available on https://wpt.fyi. This will be informative only. The latter
two tasks are intended to verify that the affected tests are stable. This will
influence whether or not the pull request may be merged.
A demonstration of expected functionality is available on Bocoup's fork of WPT:

- `master` branch (this functionality was implemented previously, and this pull request is not intended to influence the behavior we see today)
- `master` branch of WPT fork

It seems wise to vet this in WPT for a bit before allowing the results to
control whether pull requests may be merged. The configuration proposed here
will permit unstable pull requests, but it stores the status of the check in a
dedicated artifact. After some time, we can use this to compare the behavior
between the current TravisCI-powered checks and this new one. If they align, we
can revert the second commit.
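The "record but do not enforce" behavior described above could look roughly like the following. This is a sketch under assumptions, not the code in this patch: the function name, the JSON layout, and the artifact file name are all made up for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_and_record(command, artifact_path):
    """Run `command` and store its exit status in a JSON artifact.

    Always returns 0 so that an unstable result does not fail the task;
    the real status is only available from the artifact.
    """
    status = subprocess.call(command)
    with open(artifact_path, "w") as handle:
        json.dump({"exit_status": status}, handle)
    return 0


if __name__ == "__main__":
    # Stand-in for an unstable run: a child process that exits with 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "status.json")
        print(run_and_record(failing, path))
        with open(path) as handle:
            print(handle.read())
```

Enforcing the check later would then amount to returning the recorded status instead of zero.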
Due to some changes to indentation, using `-w` or the `?w=1` query string
parameter may make this change set easier to review.
[fixes #10503]