Antrea works on Windows Node #331

Closed
20 of 21 tasks
wenyingd opened this issue Jan 19, 2020 · 15 comments
Labels
api-review: Categorizes an issue or PR as actively needing an API review.
area/component/agent: Issues or PRs related to the agent component
area/component/cni: Issues or PRs related to the cni component
area/OS/windows: Issues or PRs related to the Windows operating system.
kind/design: Categorizes issue or PR as related to design.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
proposal: A concrete proposal for adding a feature

Comments

@wenyingd
Contributor

wenyingd commented Jan 19, 2020

Describe what you are trying to solve
Antrea should be able to run on Windows worker Nodes in a cluster.

Describe the solution you have in mind
A Windows HNS Network is used to support container networking, with "Transparent" as the chosen driver type.
For each Pod, an HNS Endpoint is created and attached to both the infra container and the workload container. The IP address of the HNS Endpoint is allocated by IPAM, which uses host-local by default, the same as Antrea on Linux.
The OVS Extension is enabled on the target HNS Network, and OpenFlow is used to implement container network connectivity and Network Policy rules. Each HNS Endpoint has a single port on the OVS bridge, with the same name as the HNS Endpoint.
Routes for cross-Node connectivity also need to be set up on the Windows host; the routing configuration is similar to Antrea on Linux.
The physical network interface that the Windows Node uses to join the K8S cluster is added to the OVS bridge as the uplink interface, and OpenFlow entries are added to support Windows host networking.
Named pipes are used for communication between local processes: 1) antrea-agent and OVSDB, 2) antrea-agent and the OF switch, 3) the CNI plugin and the CNI server.
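
As a rough illustration of the named-pipe point above, the sketch below lists the three local channels and builds a Windows pipe path for each. The `\\.\pipe\` prefix is the standard Windows form for a local named pipe; the pipe names (`example-ovsdb`, `example-ofswitch`, `example-cni`) and the helper names are hypothetical and are not taken from the Antrea code base.

```go
package main

import "fmt"

// pipePrefix is the standard prefix for a local named pipe on Windows.
const pipePrefix = `\\.\pipe\`

// localChannel describes one pair of local processes and the pipe they share.
type localChannel struct {
	client, server string
	pipeName       string // hypothetical name, for illustration only
}

// pipePath returns the full Windows path for a named pipe.
func pipePath(name string) string {
	return pipePrefix + name
}

// channels lists the three local channels named in the proposal.
// The pipe names are assumptions, not the names Antrea actually uses.
func channels() []localChannel {
	return []localChannel{
		{"antrea-agent", "OVSDB", "example-ovsdb"},
		{"antrea-agent", "OF switch", "example-ofswitch"},
		{"CNI plugin", "CNI server", "example-cni"},
	}
}

func main() {
	for _, c := range channels() {
		fmt.Printf("%s <-> %s: %s\n", c.client, c.server, pipePath(c.pipeName))
	}
}
```

The sketch only formats paths, so it runs on any OS; actually opening such a pipe would need a Windows-specific library, which is left out here.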

Describe how your solution impacts user flows
The workflow to enable Antrea on Windows should be:

  1. Set up a K8S cluster; the master Node should be a Linux Node. The Antrea controller should be scheduled on the master Node.
  2. Prepare Windows Server hosts. Install Docker engine on each host and enable the Hyper-V feature. Note: in the first phase of Windows support, Hyper-V is required on the Windows host.
  3. Install the K8S binaries on the Windows host
  4. Run kubelet on the Windows host to join the cluster
  5. Run the Antrea agent on the Windows host

Describe the main design/architecture of your solution
Please refer to the design doc:
https://docs.google.com/document/d/1lSis0XnKz8UcJSkxTgRtDhP2DAwQtcZDjT6ZRySUb48/edit?usp=sharing

Subtasks

TODO

  • Support livenessProbe and readinessProbe for Windows components
@wenyingd wenyingd added the proposal A concrete proposal for adding a feature label Jan 19, 2020
@wenyingd wenyingd self-assigned this Jan 19, 2020
@antoninbas
Contributor

@wenyingd I haven't taken a look at #372 and #373 yet, but what's the testing strategy for this? Do we have a K8s testbed with Windows Nodes and can we enable at least a subset of the tests we run on Linux as part of CI? @edwardbadboy @lzhecheng

@wenyingd
Contributor Author

We need a K8S testbed with Windows Nodes as worker Nodes and a Linux master Node. Since the K8S components (kubelet, kube-proxy), the Antrea agent, and OVS run as processes on the Windows Node, we will provide a script in later PRs to help deploy and run them. We also need a Windows Node to run both the Windows-specific and the common test cases.

For unit and integration tests, a new Windows-specific test case should be placed in a file with the "_windows_test" suffix. An existing case that is closely tied to Linux should be moved to a file with the "_linux_test" suffix; otherwise, the test should pass on a Windows Node.
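
The suffix convention above lines up with how the Go toolchain selects files: a file named `*_windows_test.go` is only compiled when the target OS is Windows, and likewise for `*_linux_test.go`. The sketch below is a simplified classifier for that naming rule; it handles only these two OS names, and `targetOS` and the sample file names are hypothetical helpers for illustration, not part of Antrea.

```go
package main

import (
	"fmt"
	"strings"
)

// targetOS reports which OS a Go file is restricted to by its name,
// or "" if the name carries no OS suffix (the file builds everywhere).
// This is a simplified view of the Go file name convention and
// only recognizes "windows" and "linux".
func targetOS(fileName string) string {
	base := strings.TrimSuffix(fileName, ".go")
	base = strings.TrimSuffix(base, "_test")
	for _, goos := range []string{"windows", "linux"} {
		if strings.HasSuffix(base, "_"+goos) {
			return goos
		}
	}
	return ""
}

func main() {
	// Hypothetical file names used only to exercise the classifier.
	files := []string{"agent_windows_test.go", "route_linux_test.go", "agent_test.go"}
	for _, f := range files {
		fmt.Printf("%-24s -> %q\n", f, targetOS(f))
	}
}
```

With this layout, running the tests on a Windows Node picks up the common and Windows-specific cases, and running them on a Linux Node picks up the common and Linux-specific ones, with no extra build tags.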

For CI test cases, I think we could refer to these e2e tests: https://github.com/kubernetes-sigs/windows-testing

@lzhecheng
Contributor

Currently, we don't have any Windows testbed :(

@McCodeman McCodeman added area/component/agent Issues or PRs related to the agent component area/component/cni Issues or PRs related to the cni component area/OS/windows Issues or PRs related to the Windows operating system. kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Feb 12, 2020
@ruicao93 ruicao93 self-assigned this Feb 24, 2020
@McCodeman McCodeman added api-review Categorizes an issue or PR as actively needing an API review. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Feb 26, 2020
@michmike

this is awesome. i love this work and can't wait to talk about it in Kubernetes SIG-Windows.
a couple of questions.

  1. What versions of windows will you support? Windows Server 2019 (1809) and beyond for example
  2. Does NSX-T NCP also use the transparent driver?
  3. Will heterogeneous clusters be supported? meaning a mix of windows and linux worker nodes in the same cluster. i assumed they are, just wanted to make sure it is called out
  4. why is Hyper-V required on the Windows host? i know we are working with Microsoft to avoid this. is that still a requirement?
  5. What are the scalability and performance goals?

@jianjuns
Contributor

@michmike thanks for the support.
@wenyingd and @ruicao93 can provide more information, but let me try answering some of your questions.

  1. @wenyingd and @ruicao93 can provide the exact patch version.
  2. Yes. We can use transparent driver only.
  3. Yes.
  4. We do not have all the changes in upstream OVS needed to remove the Hyper-V vSwitch dependency. And I feel it will be long-term work to maintain OVS support for Windows without Hyper-V, because even with our private OVS build we hit compatibility issues with Windows patches from time to time.
  5. We have not done complete scale and performance tests yet.
    For cluster scale, Antrea itself should behave the same as with Linux Nodes (so far we have tested 1K Nodes, 100K Pods, and 50K NetworkPolicies, but that does not mean Antrea cannot scale higher).
    For single-Node scale/performance, as the current implementation uses upstream kube-proxy in userspace mode, we do see some K8s Service traffic performance issues. We expect this to be fixed once we implement kube-proxy functionality with OVS (which will be supported in 1-2 releases).

@ruicao93
Contributor

ruicao93 commented May 22, 2020

Thanks @jianjuns for the answers.

@michmike, for the questions:

  1. What versions of windows will you support? Windows Server 2019 (1809) and beyond for example

We plan to support Windows Server 2019 (1809) in the first release because it's the latest LTS version. Newer versions of Windows (1903+) are currently not in test scope, but I think we can verify Antrea on these versions after we complete the CI integration for Windows.

Thanks,
Rui

@jayunit100
Contributor

jayunit100 commented Dec 4, 2020

@jianjuns "long term work" ... as in months or years ? :) i think getting off Hyper-v is emerging as a high priority for us .

Is HyperV supported on AWS VMs ? If not, we won't be able to run antrea on windows in AWS.

@ruicao93
Contributor

ruicao93 commented Dec 4, 2020

@jianjuns "long term work" ... as in months or years ? :) i think getting off Hyper-v is emerging as a high priority for us .

Hi @jayunit100, I think the "long term work" means we need to wait for OVS to remove the dependency on Hyper-V. Actually, we already have private patches that support this, but they are only used in VMware products. I'm not sure when the patches could be ported to upstream OVS. Maybe Jianjun knows more about it.

Is HyperV supported on AWS VMs ? If not, we won't be able to run antrea on windows in AWS.

I searched for this but could not find a definitive answer in the official AWS docs. It seems only bare-metal instances on AWS have full hardware virtualization capabilities.
https://forums.aws.amazon.com/thread.jspa?messageID=926556
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/system-requirements-for-hyper-v-on-windows
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/windows-ami-version-history.html#Virtualization-types
https://aws.amazon.com/marketplace/pp/Amazon-Web-Services-Microsoft-Windows-Server-2019-/B07RJTTGML

@michmike

michmike commented Dec 5, 2020

i think we can use our good friends at cloudbase and Alin to help us push the fixes upstream in OVS. do we have anyone from antrea team involved in OVS community that can help here as well?

@jianjuns
Contributor

jianjuns commented Dec 5, 2020

We did talk to Alin, Ben, and Anand on these changes. @wenyingd can update the status. Probably let us discuss offline.

@jayunit100
Contributor

let's sync about this at the antrea community meeting this week, and then I'll update this issue with details afterwards. I'm also going to talk to @yasensim on Wednesday.

@vicky-liu

Alin didn't want to merge the private patches for disabling Hyper-V into upstream OVS because he is working on a new implementation based on the v2 HCN API. Let me check the progress and update here.

@jayunit100
Contributor

thanks, this is actually a blocker for us now, so if there is anything we can do to help, please let us know!

@jayunit100
Contributor

FWIW, we do have Hyper-V-less branches we can use now, but we just don't have them in a production-ready state. @ruicao93 has details.

@github-actions
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2021

9 participants