Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add envoy as upstream project #653

Merged
merged 2 commits into from
May 15, 2019
Merged

Add envoy as upstream project #653

merged 2 commits into from
May 15, 2019

Conversation

keith
Copy link
Member

@keith keith commented May 14, 2019

This depends on envoyproxy/envoy#6938

Closes #218

@keith
Copy link
Member Author

keith commented May 14, 2019

Please let me know if anything else needs to be done (besides the linked PR being merged)

@philwo
Copy link
Member

philwo commented May 14, 2019

Thank you @keith! This should be enough for downstream regression tests to ensure that Bazel doesn't break Envoy. We can take it from here, as soon as the linked PR is merged. :)

By the way, I noticed that the checks of your linked Envoy PR that run on Azure take quite a while (42 minutes for Linux and already >1.5hours for macOS). In addition to downstream testing, we also offer postsubmit and presubmit (pull request) testing on Bazel CI for Linux, macOS and Windows on high-performance machines, including remote caching / execution, ... - in case that's interesting to Envoy, let me know!

@keith
Copy link
Member Author

keith commented May 14, 2019

Thanks, I'll forward that along to the right people!

@htuch
Copy link

htuch commented May 14, 2019

CC @lizan

@keith
Copy link
Member Author

keith commented May 14, 2019

Ok @philwo, the dependent PR is merged now!

buildkite/bazelci.py Outdated Show resolved Hide resolved
Copy link

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fweikert fweikert merged commit 3b0ba60 into bazelbuild:master May 15, 2019
@keith keith deleted the ks/envoy-config branch May 15, 2019 16:53
@keith
Copy link
Member Author

keith commented May 15, 2019

Thanks @fweikert, do we need to setup the webhook now?

@philwo
Copy link
Member

philwo commented May 15, 2019

Feel free to setup the webhook, yes.

I did a test run today and noticed that we have to install Ninja in the CI images - is that correct?

See: https://buildkite.com/bazel/envoy/builds/1#d88ec2d2-d523-4f91-8ce9-895b2b901bad

@keith
Copy link
Member Author

keith commented May 15, 2019

Ah yes, sorry @philwo that is correct. We're using rules_foreign_cc to build some external deps

@keith
Copy link
Member Author

keith commented May 15, 2019

We haven't received any instructions for the webhook setup, is there a specific URL we need to point to?

@philwo
Copy link
Member

philwo commented May 15, 2019

@keith Sorry! I just sent you a Buildkite invite - when you accept it, you should be able to see the instructions here: https://buildkite.com/bazel/envoy/settings/setup/github.

I'm happy to send invites to other Envoy admins, too - just let me know their e-mail addresses. (Only admins should need a Buildkite account, because read-only access to the pipeline is public and doesn't require logging into an account.)

After you add the webhook, could you please ensure that at least one person with write access to the Envoy repository connects their GitHub account with Buildkite (here: https://buildkite.com/user/connected-apps)? This gives Buildkite the necessary permissions to post commit statuses back to GitHub. :)

@keith
Copy link
Member Author

keith commented May 15, 2019

Before setting that up I've submitted #657 to try and get a green build

@philwo
Copy link
Member

philwo commented May 15, 2019

Build is passing now, tests are running: https://buildkite.com/bazel/envoy/builds/1

@philwo
Copy link
Member

philwo commented May 15, 2019

Just one failing test: https://storage.googleapis.com/bazel-untrusted-buildkite-artifacts/70630d49-d650-47bd-858b-69e3e45987e0/test/common/network/dns_impl_test/attempt_1.log

The /etc/hosts supplied by Google Compute Engine doesn't map "::1" to "localhost", which causes the test to fail - but luckily I can easily fix that by modifying the /etc/hosts file in the host.

I have to build and deploy new host VMs tomorrow anyway and will add this fix. Then your builds and tests should be green. :)

@keith
Copy link
Member Author

keith commented May 15, 2019

Awesome thanks! Feel free to ping me when it's ready and I'll coordinate on the webhook side

@philwo
Copy link
Member

philwo commented May 16, 2019

@keith The pipeline is green now: https://buildkite.com/bazel/envoy/builds/1 :)

@laurentlb
Copy link
Contributor

Can we avoid adding projects to the CI when they don't pass with incompatible flags?

https://buildkite.com/bazel/bazelisk-plus-incompatible-flags/builds/110#72eb6717-c478-44de-833d-b7da832ee58e

@philwo
Copy link
Member

philwo commented May 17, 2019

I don't know. I would like to add them to prevent breaking them with a release and if the pipeline is green with Bazel latest and HEAD, that's already something, right? It gives us a good signal to prevent regressions.

I think the migration stuff is also important, but a separate issue. Forcing projects to do all migrations before adding them might just result in them never joining CI, especially bigger projects with lots of dependencies.

@laurentlb
Copy link
Contributor

Our workflow is: don't flip a flag if it's going to break anything. Each time we add a new project that's not compatible, you basically delay changes to Bazel. So that's very disruptive to the team.

@philwo
Copy link
Member

philwo commented May 17, 2019

The breakage doesn't go away, just because we don't test it on CI 😅 I would rather add important projects to our CI and know when we're breaking the ecosystem, rather than pretending the problem doesn't exist.

If our users are slow to migrate, then let's raise awareness for this problem and make it easier to migrate? I would guess people are just not aware that they have to proactively migrate something. And I mean, why should they, if their build is currently not broken. People usually aren't actively looking for more work to do, maybe our assumption that this is a good approach to handle migrations is just not good.

But I know that simply not testing is not a solution for this and I'll continue to add projects that want to get onboarded and where their pipeline is green.

@htuch
Copy link

htuch commented May 17, 2019

@laurentlb please chat with @dslomov about the motivation for this addition and some our recent issues.

@laurentlb
Copy link
Contributor

Well, end users can migrate when we do the release and they want to use it.

Projects on the CI have to migrate early. This is so that we can detect migration problems early. So, requirements for being added to the CI should be much higher than just being a Bazel user.

We could also redesign the complete incompatible change workflow, but I'd rather not to.

@laurentlb
Copy link
Contributor

rather than pretending the problem doesn't exist.

Just to clarify what I suggest:

  1. Require the owners to have their project working with Bazel, including the incompatible flags we're about to roll out.
  2. Ask them to monitor https://buildkite.com/bazel/bazelisk-plus-incompatible-flags/ (at least once a month) and proactively fix/report issues
  3. If they agree, then great, let's add the project to the CI!

@philwo
Copy link
Member

philwo commented May 17, 2019

@laurentlb I have a mandate and support from Dmitry to test Bazel's ecosystem on this CI and that's what my team will do. We can discuss the details of this, but this is in my area of responsibility and thus I can make the decision, just like the Starlark team can do for Starlark language features. :)

I'm happy to discuss this together when @dslomov is back, but not on this already merged PR on a Friday evening.

@fweikert
Copy link
Member

I discussed this issue with laurentlb@ offline. The problem is that downstream breakages delay flag flips, and our current window for incompatible changes is very short due to the 1.0 timeline.

On the other hand, as philwo@ noted breakages don't go away automatically. Luckily philwo@ and I already have a TODO to improve the CI onboarding process, and this issue looks like a good candidate.

Once a project has its own pipeline and wants to be included in the downstream pipeline, we could declare it a "downstream candidate" that gets tested against future flag flips, but doesn't block any releases. If the candidate passes, we promote the project and include it in the downstream pipeline.

Moreover, as philwo@ noted in another issue, we should make it very explicit which levels of CI support exist (e.g. postsubmit -> presubmit w. GitHub integration -> downstream project), and which requirements for the project owners go along with each level.
Such a solution would benefit everyone since project owners get better test coverage, while the Bazel release process isn't delayed.

@keith
Copy link
Member Author

keith commented May 17, 2019

I'm happy to do the legwork to fix some of these, I started look at the log to do so and, unsurprisingly, immediately hit problems with dependencies. Many of which don't have new versions fixing these issues themselves. Is there a recommended approach for this case?

@laurentlb
Copy link
Contributor

@keith Thanks! I'll be happy to help you.
Can we discuss this somewhere else, like Envoy bug tracker?

If you depend on projects that are not on our CI, this is something we may want to fix.
If the projects are on the CI, there are probably tracking bugs.

@keith
Copy link
Member Author

keith commented May 17, 2019

I've created an upstream issue for tracking fixes envoyproxy/envoy#6995

@philwo
Copy link
Member

philwo commented May 21, 2019

@keith Would you like to continue with the webhook setup? The build is green now. :)

Fixing the issues with incompatible changes (thanks for that!) is unrelated to this, by the way, so you wouldn't have to wait for that.

@keith
Copy link
Member Author

keith commented May 22, 2019

@laurentlb @philwo would it make sense to get grpc running on this CI too? It looks like they have a complex config and it would probably be nice to test. They also have a lot of incompatible change violations at the moment

@htuch
Copy link

htuch commented May 22, 2019

CC @markdroth, who is one of the gRPC C library owners; do you reckon we can get gRPC on Bazel CI?

@philwo
Copy link
Member

philwo commented May 22, 2019

@keith Yeah, sounds like a good project to test!

@keith
Copy link
Member Author

keith commented May 29, 2019

I've submitted an issue upstream to see if we get buy in from the grpc folks grpc/grpc#19171

@markdroth
Copy link

Sorry for not replying sooner -- my github notifications go to the wrong email address for all repos except the grpc repo.

Anyway, I've commented on the issue you filed in the grpc repo.

joeleba pushed a commit to joeleba/continuous-integration that referenced this pull request Jun 17, 2019
* Add envoy as upstream project

* Use raw URL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get Envoy proxy on to Buildkite
7 participants