Create KEP for Windows Node Support #676

Merged 2 commits on Jan 11, 2019

112 changes: 112 additions & 0 deletions keps/sig-windows/0000-20190103-windows-node-support.md
@@ -0,0 +1,112 @@
---
kep-number: 0
title: Windows node support
authors:
- "@benmoss"
- "@astrieanna"
owning-sig: sig-windows
participating-sigs:
- sig-architecture
reviewers:
- TBD
approvers:
- TBD
editor: TBD
creation-date: 2018-11-29
last-updated: 2019-01-03
status: provisional
---

# Windows node support


## Table of Contents

* [Windows node support](#windows-node-support)
* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Proposal](#proposal)
* [What works today](#what-works-today)
* [What will work eventually](#what-will-work-eventually)
* [What will never work (without underlying OS changes)](#what-will-never-work-without-underlying-os-changes)
* [Relevant resources/conversations](#relevant-resourcesconversations)
* [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
* [Other references](#other-references)


## Summary

There is strong community interest in supporting workloads that run on Microsoft Windows. This is non-trivial because the implementation of Windows differs significantly from that of the Linux-based OSes that Kubernetes has supported so far.


## Motivation

Windows-native workloads still account for a significant portion of the enterprise software space. While containerization technologies emerged first in the UNIX ecosystem, Microsoft has made investments in recent years to enable support for containers in its Windows OS. As users of Windows increasingly turn to containers as the preferred abstraction for running software, the Kubernetes ecosystem stands to benefit by becoming a cross-platform cluster manager.

### Goals

- Enable users to run nodes on Windows servers
Member

I would have written:

  • Enable users to run Windows server containers on Windows servers using Kubernetes

- Document the differences and limitations compared to Linux
- Add test results to testgrid to prevent regressions in functionality

### Non-Goals

- Adding Windows support to all projects in the Kubernetes ecosystem (Cluster Lifecycle, etc.)

## Proposal

As of 2018-11-29, much of the work to enable Windows nodes has already been completed. Both `kubelet` and `kube-proxy` have been adapted to work on Windows Server, so the first goal of this KEP is largely complete.

### What works today
- Windows-based containers can be created by kubelet, [provided the host OS version matches the container base image](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility)
- ConfigMap, Secrets: as environment variables or volumes
Member

How about volumes, such as emptyDir, shared between containers within a Pod?

Member

What about storage medium Memory or HugePages?

- Resource limits
- Pod & container metrics
- Pod networking with [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), [OVN-Kubernetes](https://github.com/openvswitch/ovn-kubernetes), [two CNI meta-plugins](https://github.com/containernetworking/plugins), [Flannel](https://github.com/coreos/flannel) and [Calico](https://github.com/projectcalico/calico)
- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers
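
As a rough illustration of three of the items above (a ConfigMap value as an environment variable, a Secret as a volume, and resource limits), a Pod manifest might look like the following. This is a sketch rather than a tested configuration: the object names (`win-app`, `app-config`, `app-credentials`), the image tag, the mount path, and the limit values are all assumptions chosen for illustration, and node selection is left out.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: win-app                    # hypothetical name
spec:
  containers:
  - name: app
    # Assumed image and tag; the tag has to match the node's Windows Server version.
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    # Keeps the container running so the mounts and variables can be inspected.
    command: ["ping", "-t", "localhost"]
    env:
    - name: APP_MODE
      valueFrom:
        configMapKeyRef:
          name: app-config         # hypothetical ConfigMap
          key: mode
    volumeMounts:
    - name: credentials
      mountPath: 'C:\secrets'      # assumed Windows-style path; a whole directory, not a single file
    resources:
      limits:
        cpu: "1"                   # illustrative values
        memory: 800Mi
  volumes:
  - name: credentials
    secret:
      secretName: app-credentials  # hypothetical Secret
```

The Secret is mounted as a directory because, as noted in the review discussion, shipped Windows releases can map only folders, not single files, into a container.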

Member

Do pod hostname and subdomain fields work? How about hostAliases? dnsConfig?

Member

Basically I expect someone to read through PodSpec field by field to make sure we haven't forgotten something.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2743

<sup id="a1">1</sup> This list should be available at https://k8s-testgrid.appspot.com/sig-windows but this test setup is not currently working. https://k8s-testgrid.appspot.com/google-windows#windows-prototype is also running against a Windows cluster.
Member

Is addressing those issues part of #685?

Contributor

I'm clarifying those in #685


### What will work eventually
- `kubectl port-forward` hasn't been implemented due to lack of an `nsenter` equivalent to run a process inside a network namespace.
- CRIs other than Dockershim: CRI-containerd support is forthcoming


### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
Member

I assume Linux capabilities don't work?

Member

And Linux-specific security features, such as seccomp, SELinux, and AppArmor

Member

We should enumerate all of the fields of PodSecurityContext that don't make sense for Windows

- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
Member

I assume that QoS (burstable, best effort) doesn't work

Member

Are there equivalents of any of the shared namespaces (e.g., shareProcessNamespace)? Can containers within a pod see each other in any way?

Member

Does terminationGracePeriodSeconds work?

- CSI plugins, which require privileged containers
Member

FlexVolume?

- [Some parts of the V1 API](https://github.com/kubernetes/kubernetes/issues/70604)
Member

Please inline the contents of that issue into this document

Member

Also, it seems we lost quite a bit of detail compared to previous discussions. Do those issues still hold true?
https://docs.google.com/document/d/1YkLZIYYLMQhxdI2esN5PuTkhQHhO0joNvnbHpW68yg8/edit#heading=h.4khm1q370oiq

For instance, some pod features didn't work due to: Single file volume mappings. No shipped releases of Windows can map a single file, only an entire folder, into a pod/container.

Member

Other persistent issues: uid/gid vs usernames, per-user Linux filesystem permissions, read-only root filesystems

Other resolvable issues: images using Linux-specific tools, hardcoded images with no windows equivalent

Member

Are system OOMs reported?

- Overlay networking support in Windows Server 1803 is not fully functional using the `win-overlay` CNI plugin. Specifically, service IPs do not work on Windows nodes. This limitation is currently specific to `win-overlay`; other CNI plugins (OVS, AzureCNI) work.
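
For the reservations item in the list above, the kubelet flag `--enforce-node-allocatable=pods` has a config-file counterpart, the `enforceNodeAllocatable` field of `KubeletConfiguration`. A minimal sketch follows. The reserved amounts are placeholder assumptions, and whether this actually blocks overprovisioning on Windows nodes is still unverified, since the tests are pending.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Config-file counterpart of --enforce-node-allocatable=pods.
enforceNodeAllocatable:
- pods
# Placeholder reservations (assumed values); these are subtracted from node
# capacity to compute the allocatable amount that pods are scheduled against.
systemReserved:
  cpu: 500m
  memory: 1Gi
kubeReserved:
  cpu: 500m
  memory: 1Gi
```

With these settings the scheduler only sees the allocatable portion of the node. The reservation itself is bookkeeping done by the kubelet, not something the OS enforces.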

### Relevant resources/conversations

- [sig-architecture thread](https://groups.google.com/forum/#!topic/kubernetes-sig-architecture/G2zKJ7QK22E)
- [cncf-k8s-conformance thread](https://lists.cncf.io/g/cncf-k8s-conformance/topic/windows_conformance_tests/27913232)
- [kubernetes/enhancements proposal](https://github.com/kubernetes/features/issues/116)


### Risks and Mitigations

**Second class support**: Kubernetes contributors are likely to be thinking of Linux-based solutions to problems, as Linux remains the primary OS supported. Keeping Windows support working will be an ongoing burden potentially limiting the pace of development.

**User experience**: Users today will need to use some combination of taints and node selectors in order to keep Linux and Windows workloads separated. In the best case this imposes a burden only on Windows users, but this is still less than ideal.
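
The node-selector half of that separation can be sketched as below, assuming the nodes carry the `beta.kubernetes.io/os` label that the kubelet set at the time of writing. The Pod names and images are illustrative assumptions only.

```yaml
# Windows workload: only schedulable onto nodes labelled as Windows.
apiVersion: v1
kind: Pod
metadata:
  name: iis-example              # hypothetical name
spec:
  nodeSelector:
    beta.kubernetes.io/os: windows
  containers:
  - name: iis
    image: mcr.microsoft.com/windows/servercore/iis   # assumed image
---
# Linux workload: needs its own selector, otherwise it may land on a Windows node.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-example            # hypothetical name
spec:
  nodeSelector:
    beta.kubernetes.io/os: linux
  containers:
  - name: nginx
    image: nginx
```

Using selectors alone means existing Linux manifests have to change as well, which is why the burden does not fall only on Windows users in this variant.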

## Graduation Criteria

Member

@craiglpeters agreed to draft these

To use as a starting point, here are some issues discussed in email and in prior SIG Arch meetings:

  1. There need to be adequate, continuously run, non-flaky tests with publicly accessible results, enabled as part of the release-blocking suite. Without this it's hard to have reasonable discussions about what does and doesn't work, and the release team can't make a judgement about release readiness or risk. Really, this is needed for any feature at any stage of maturity in order for us to make it available to users in a Kubernetes release.

  2. There needs to be adequate end user and admin documentation that describes what the feature does and how to use it. I know there is a start on user documentation (website#10875, "WIP: Windows doc set for v1.13 stable"), which at least covered "how to use it", and I'll take another look at it. One purpose of this KEP was to fill the role that a priori design proposals traditionally fill in providing a deeper level of detail about how a feature works and why.

  3. Reliability needs to be sufficiently high. Users run GA features in production. Usually we have some mileage on features in beta before they go GA, and at least a quarter or two of e2e test results.

  4. Compatibility can't be broken in GA features, either for existing users/clusters/features or for the new feature going forward, and the feature needs to adhere to the deprecation policy (https://kubernetes.io/docs/reference/using-api/deprecation-policy/).

Note that a draft document stated "you may want to wait for Windows Server 2019 availability from Microsoft and support in Kubernetes for production workloads", which needs to be clarified.

There were also questions about the user experience, particularly for mixed-OS clusters. Alternatives for ensuring Windows containers land on Windows nodes and Linux containers land on Linux nodes include:

  • Manual node labels and selectors for both Linux and Windows workloads
  • Manual taints and tolerations just for Windows workloads
  • Automatically applied nodeSelectors for both Linux and Windows workloads
    • derived from image manifest
    • derived from something else in PodSpec
  • Automatically applied tolerations for at least Windows workloads
    • derived from image manifest
    • derived from something else in PodSpec
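
The "manual taints and tolerations just for Windows workloads" alternative could look roughly like this. The taint key `os`, the node name, and the Pod name are hypothetical; the same taint could also be applied with `kubectl taint nodes win-node-1 os=windows:NoSchedule`.

```yaml
# Node side: a taint that repels every Pod lacking a matching toleration,
# so unmodified Linux workloads stay off this node.
apiVersion: v1
kind: Node
metadata:
  name: win-node-1               # hypothetical node
spec:
  taints:
  - key: os                      # hypothetical taint key
    value: windows
    effect: NoSchedule
---
# Pod side: only Windows workloads carry the toleration.
apiVersion: v1
kind: Pod
metadata:
  name: win-workload             # hypothetical name
spec:
  tolerations:
  - key: os
    operator: Equal
    value: windows
    effect: NoSchedule
  # A toleration permits scheduling onto the tainted node but does not require
  # it, so a selector is still needed to keep this Pod off Linux nodes.
  nodeSelector:
    beta.kubernetes.io/os: windows
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019   # assumed image
```

In this variant existing Linux manifests stay untouched, at the cost of every Windows manifest carrying both a toleration and a selector.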

Some issues with the above:

  • We don't want to break compatibility for existing Linux workloads
  • We don't want the UX for Windows apps to be worse than for Linux forever
  • Setting first-class os and arch properties by default in the apiserver would break existing use cases, such as ARM
    • os and arch node labels appear to still be beta
  • Not clear that most container images contain the necessary OS info
  • Not clear that extracting the OS info from the container image manifest during admission control is feasible for private image repos

Some of this was discussed in a document:
https://docs.google.com/document/d/1XLs8Mbz1-xOIiDW9XSSuhx9fshpxJM1NDD1a0oVbzfc/edit


## Implementation History



## Other references

[Past release proposal for v1.12/13](https://docs.google.com/document/d/1YkLZIYYLMQhxdI2esN5PuTkhQHhO0joNvnbHpW68yg8/edit#)
6 changes: 6 additions & 0 deletions keps/sig-windows/OWNERS
@@ -0,0 +1,6 @@
reviewers:
- sig-windows-leads
approvers:
- sig-windows-leads
labels:
- sig/windows