Platform support policy proposal #21

Merged
merged 15 commits into from
Oct 2, 2014
107 changes: 107 additions & 0 deletions platform-support-policy.md
@@ -0,0 +1,107 @@
# Chef Platform Support Policy

The purpose of this RFC is to clarify:

* What specific operating system platforms and platform versions are supported by the software produced by Chef Software, Inc.
* What is the meaning of "supported platform"

This RFC does *not* address "What is the product lifecycle of Chef Software, Inc.'s software". That is covered in a separate RFC.

## Chef Client

A Chef Client supported platform means:

* Omnitruck won't fail when confronted with the platform and version
* The most important core resources (package, service, template) work out of the box
Contributor

Is the parenthetical list supposed to be everything we consider "core resources" or simply examples? If it is an example, is it worth it to enumerate the full list? I fear the endless bikeshed that could come from such a list. However, it is hard to imagine calling a platform "supported" without user, group, file, and directory also working.

Contributor Author

No, that is the intended actual list. It's what @adamhjk refers to as the "holy trinity of resources" but there were some complaints about that language, so I reworded it.

Contributor

could you add i.e. to that list? so those who are looking closely can interpret that as a complete list?

* The most important core resources (i.e. package, service, template) work out of the box

Not to be pedantic, but you probably want to use "e.g." rather than "i.e."

I.e. is correct in this case, as package, service and template are actually the only 'most important core resources' in existence, and not merely examples of them.

Ah, sorry, I misunderstood. I thought there were other examples still to be named.

Contributor

I'd still like to see i.e. added here as a minimum explanation that these three resources are, in fact, requirements.

* Ohai attributes for ```platform```, ```platform_family```, ```platform_version``` and ```kernel.machine``` are correct

Chef Client support policies also apply to Ohai, since that is a dependency.
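
One way to read the Ohai bullet above is as a check that a node's reported attributes match what the platform is known to be. A minimal sketch in Python, assuming the attributes have been exported as a nested dictionary (for example from Ohai's JSON output). The function name `check_ohai_attributes` and the sample values are hypothetical and are not part of Chef or Ohai:

```python
def check_ohai_attributes(attrs, expected):
    """Return the attribute paths whose value differs from `expected`.

    `attrs` is a nested dict of node attributes; `expected` maps dotted
    paths such as "kernel.machine" to the value the platform should report.
    """
    mismatches = []
    for path, want in expected.items():
        value = attrs
        for key in path.split("."):
            # Walk one level down; a missing key counts as a mismatch.
            value = value.get(key) if isinstance(value, dict) else None
        if value != want:
            mismatches.append(path)
    return sorted(mismatches)


# Hypothetical sample data for a CentOS 6 x86_64 node.
sample = {
    "platform": "centos",
    "platform_family": "rhel",
    "platform_version": "6.5",
    "kernel": {"machine": "x86_64"},
}
expected = {
    "platform": "centos",
    "platform_family": "rhel",
    "platform_version": "6.5",
    "kernel.machine": "x86_64",
}

print(check_ohai_attributes(sample, expected))
```

An empty result means all four attributes named in the RFC matched; any returned path points at an attribute that was missing or wrong.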

### Tier 1 Support

Tier 1 supported platforms are those for which Chef builds native binary "Omnitruck" (full-stack installer) packages. For each platform, Chef performs some post-build verification on them or their equivalents. For example, we may elect to do post-build verification for Oracle Enterprise Linux using the same test results as Red Hat Enterprise Linux, since they are so similar.

Platform | Versions | Architectures | Package Format | Built on

Minor nitpick: This table is currently not 100% correct but it is where we ideally want to get to.

Currently for centos, mac and ubuntu we perform builds on more than one version.

Contributor Author

I think it's important to capture what the desired state is, since it is an RFC, and then once adopted we put things in compliance. There are target platforms (e.g. Ubuntu 11.04) that are EOL that we should no longer build for, for example.

--- | --- | --- | --- | ---
AIX | 6.1, 7.1 | ppc64 | bff | AIX 6.1
CentOS | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
Contributor

What about RHEL? Should we move to a place where we just refer to RHEL variants as EL and list what EL includes in one place?

Contributor Author

@schisamo I deliberately call EL variants out separately because they have subtle differences (example: OEL with the UEK kernel by default).

Contributor

So then we need a row like:

RHEL | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5

Contributor Author

Yep, looks like I left it off by accident

FreeBSD | 9, 10 | i386, amd64 | pkg_add pkg | FreeBSD 9
Contributor

Any chance FreeBSD support could be expanded to FreeBSD 8.X? I know Dyn had this compiling without issues and the 8.X series is still considered a production branch: http://www.freebsd.org/releases/

Contributor Author

@tas50 I just can't see adding Tier 1 support and all the infrastructure involved to do that for something that's going to be end-of-life in 11 months.

@juliandunn, surely in 11 months we will be discussing Chef 13 and dropping FreeBSD 8 support. I cannot measure the required infrastructure work, but considering that the EOL of other platform versions below is closer, I do not see this as crazy. All this taking into account that it already seems to work on FreeBSD 8. Of course, if the work involved is very large, it's not worth it.

Contributor

It's a zero-sum game, and based on the size of the TODO list, the more modern distros that we should support, and the lack of agility in our CI infrastructure, I'm skeptical. If we did a stack ranking exercise of distro support, things like better OEL testing (server), SmartOS, Debian (server), ArchLinux, Gentoo, etc. are going to be ahead of it. I'd argue that SLES 10 is probably more important as well. And if we start from the top of the list and work down, we're not going to hit FreeBSD 8 before its end of life.

Contributor Author

@lamont-granquist That's exactly my feeling. Our resources are limited and at some point we have to make some calls even if they are tough ones.

Mac OS X | 10.6, 10.7, 10.8, 10.9 | x86_64 | dmg | Mac OS 10.7
Oracle Enterprise Linux | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
Red Hat Enterprise Linux | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
Contributor

We don't build specific binaries or have specific testers for OEL or RHEL and the client package has certainly not hit that as an issue. We also support Scientific Linux and Linux Mint with the same amount of effort that we've put in to OEL and RHEL. Amazon Linux as well.

Contributor

We are moving to a place where all EL builds are done on RHEL vs CentOS (new CI uses RHEL builders). I believe @juliandunn is just documenting that aspiration.

Contributor

Right, but:

"there is equipment in the CI pipeline to perform client verification tests (CVT) on machines of that platform"

And then it lists "platforms" of "OEL/RHEL/CentOS", which implies to me that we have testers for all three distros, times 3 versions, times 2 architectures, and for each category (cvt tester, omnibus builder, omnibus tester). That'd be 54 virts just to build the client for RHEL-ish platforms.

So, that's why I think we need to separate "Tier 1 Support/Response" vs "CVT Testers" vs "Omnibus Builders" vs "Omnibus Testers", where all 4 of those support matrices may be different. Otherwise we all look at these matrices and interpret differently what they mean for what our CI infrastructure needs to support. I know that the plan of record right now is to build on EL5 and test on EL6/EL7 so we'll have 6 CVT testers, 2 omnibus builders and 6 omnibus testers, but that certainly isn't clear just from the content in this document.
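
The machine counts in this comment can be worked through explicitly. A small Python sketch of the arithmetic follows. The variable names are illustrative, and reading the "6 testers" figure as 3 versions times 2 architectures is an assumption about how the number was reached, not something the comment states:

```python
# Counts taken from the comment above; the variable names are illustrative.
distros = ["OEL", "RHEL", "CentOS"]
versions = [5, 6, 7]
architectures = ["i386", "x86_64"]
roles = ["cvt tester", "omnibus builder", "omnibus tester"]

# Literal reading of the matrix: every combination gets its own machine.
full_matrix = len(distros) * len(versions) * len(architectures) * len(roles)

# Plan of record (assumed breakdown): one EL distro, built only on EL5,
# with one tester per version and architecture.
build_versions = [5]
plan = {
    "cvt tester": len(versions) * len(architectures),
    "omnibus builder": len(build_versions) * len(architectures),
    "omnibus tester": len(versions) * len(architectures),
}
plan_total = sum(plan.values())

print(full_matrix, plan, plan_total)
```

Under these assumptions the literal reading needs 54 machines, while the plan of record needs 14.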

Contributor Author

@lamont-granquist I think we're getting into implementation details now (CVT vs omnibus tester) -- however the client team does post-build verification is up to you. All I mean to say is that tier 1 platforms should have some sort of post-build verification, because we've gotten burned in the past (example: critical resources on Solaris not working)

It also need not mean that we have all these things running permanently, or that we do post-build verification on every single build. My intent is just to say that "before we ship a GA for xyz, we should have run some kind of smoke test on the exact platform variant & version we claim to support".

Contributor

"tier 1 platforms should have some sort of post-build verification, because we've gotten burned in the past"

Right, I disagree.

  • OEL/RHEL/CentOS should be "Tier1" in terms of what we tell customers.
  • I do not want and do not think we need testers for more than one of those (for the client).

We have largely NOT gotten burned in the past on OEL or RHEL when compared to CentOS, so comparing it to Solaris is largely apples and oranges. Clearly, based on the Solaris-specific code in the client and on past history, we need testers there in order to validate. I haven't seen the same errors on the RHEL variants. Even the one error I can think of in the past year (the postrm script issues) turned up in Amazon Linux, not in OEL/CentOS/RHEL.

So, I think you're making up a rule here which will have very little value and will have a large amount of cost (at least 6 testers per distro, if not 12 testers per distro). The expansion of the test matrix produces instability, even just instability in Jenkins (I just had a chef-client-build run fail because Jenkins dropped in the middle of it a few minutes ago). The more boxes there are in the matrix, the more failures we are going to have. I seriously doubt our ability to engineer an infrastructure where we get intermittent test failures down to zero; I think that's about as likely as ever getting a drug-free society. Likewise, we do not need to drive distro-specific bug fixes to zero. Just because we have a metric doesn't mean there isn't an acceptable risk of failures escaping CI testing. At some point the cost of trying to mitigate the last few failures exceeds the cost of just living with fixing those bugs after they are released.

Contributor Author

Maybe ask @sersut to weigh in on this argument because he previously stated the opposite desire. A middle ground would simply be for me to write a column that says "tested on" and indicate where the tier 1 build is to be tested.

Solaris | 9, 10, 11 | sparc, x86 | shar | Solaris 9
Windows | 7, 8, 8.1, 2003R2, 2008, 2008R2, 2012, 2012R2 | x86, x86_64 | msi | Windows 2008R2
Ubuntu Linux | 10.04, 12.04, 14.04 | x86, x86_64 | deb | Ubuntu 10.04
Contributor

Debian as well. That's actually more important to call out than all the different RHEL flavors because the init system is (annoyingly enough) different.

I would also strongly vote for Debian. Thanks for supporting it!

Contributor Author

Would you list it as Tier 1 though, @lamont-granquist? As in, have post-build verification testers for it?


### Tier 2 Support

Tier 2 supported platforms are those on which Omnitruck will serve packages, but those packages may not have been built on that OS variant. Additionally, we may or may not do post-build verification on these platforms.

* SUSE Linux Enterprise Server 10, 11
Contributor

We have testers for SLES11. SLES10 is actually broken, and we'd need to compile packages specifically for SLES10 since the EL5 binaries fail on SLES10 (i.e. it would need to be tier 1).

Based on the failure with SLES10, I think we need to maintain SLES11 testers.

Contributor Author

I guess I question the need to maintain the infrastructure for that and to treat SLES as Tier 1. Do we have enough SLES users to justify it?

We definitely need to support SLES at Tier 1. It's huge in Europe and in North American scientific computing.
-s

On Fri, Jul 25, 2014 at 1:06 AM, Julian C. Dunn notifications@github.com
wrote:

In platform-support-policy.md:

+--- | --- | --- | --- | ---
+AIX | 6.1, 7.1 | ppc64 | bff | AIX 6.1
+CentOS | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
+FreeBSD | 9, 10 | i386, amd64 | pkg_add pkg | FreeBSD 9
+Mac OS X | 10.6, 10.7, 10.8, 10.9 | x86_64 | dmg | Mac OS 10.7
+Oracle Enterprise Linux | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
+Red Hat Enterprise Linux | 5, 6, 7 | i386, x86_64 | rpm | RHEL 5
+Solaris | 9, 10, 11 | sparc, x86 (10 and 11 only) | shar | Solaris 9
+Windows | 2003R2, 2008, 2008R2, 2012, 2012R2 | x86, x86_64 | msi | Windows 2008R2
+Ubuntu Linux | 10.04, 12.04, 14.04 | x86, x86_64 | deb | Ubuntu 10.04
+
+### Tier 2 Support
+
+Tier 2 supported platforms are those on which Omnitruck will serve packages, but those packages may not have been built on that OS variant. Additionally, we do no CVT on these platforms.
+
+* SUSE Linux Enterprise Server 10, 11

I guess I question the need to maintain the infrastructure for that and to
treat SLES as Tier 1. Do we have enough SLES users to justify it?


Reply to this email directly or view it on GitHub
https://github.com/opscode/chef-rfc/pull/21/files#r15386451.

* Scientific Linux 5.x, 6.x and 7.x (i386 and x86-64)
* Debian Linux 6.x and 7.x
Contributor

Ah, yeah we currently have omnibus builders and testers for debian and need to maintain that.

* Gentoo Linux (rolling release)
* Arch Linux (rolling release)
Contributor

There's an ArchLinux wiki page on doing omnibus-chef compiles which is actually Arch-User-Maintained.

This is actually a somewhat bad model because random library header files being present can cause configure scripts in omnibus to link against software on the box and make the health checks fail. Of course at that point I'm not even certain health checks make any sense? Same thing with gentoo... rolling source-build releases meeting up with omnibus is kind of odd... They've got bleeding edge ruby installs and everything so why not just gem install chef?

Contributor Author

I can drop Gentoo and Arch to unofficially unsupported if you think that's the right answer?

Contributor

I don't. Ideally I'd say we support Gentoo and Arch. And this'll probably become a bigger issue as ChefDK gets rolled out. The fact that we've got folks building omnibus-chef installs with random dev headers and showing up in our issues when the health_check fails suggests to me that we look at disabling the health_checks for those platforms because strict-omnibus makes less sense there, or that we go ahead and start making and releasing official omnibus packages for them.

Contributor Author

At the risk of "tier proliferation" I was thinking maybe I add a third that says there are no packages built for Tier 3 platforms, but people are welcome to build their own and/or jam an existing package for some neighbor platform onto that box, and submit patches for things.

I agree that you shouldn't fail health checks for any of these platforms.

Contributor

Right, the problem is that we will kinda start to waste our time now with "Tier 3" support for Arch: people find that random header files are picked up, the health_check fails, and we wind up accepting patches to add --without-whatever to the configure lines. At some point it might be easier to just officially produce a build for it. Although, while there was a flurry of activity recently, it seems to have died down a bit.

* Fedora (current non-EOL revisions)
* OpenSUSE 12.3 (until EOL on 15 September 2014), 13.1
* OmniOS stable and LTS releases

Can we get "SmartOS (rolling release)" into Tier 2? I'm more interested in ensuring that core resources work correctly on SmartOS than having the proper package delivered by Omnitruck. Joyent currently provides Rubygems builds for Chef Client, so maybe something could be worked out?

I (and others besides @aglarond) would also love to see at least best-effort Tier 2 support for the SmartOS Illumos distro. It's one of the few cloud-specific OSes in the open source world, and lots of SmartOS users (including the parent company, Joyent) also use Chef heavily.

I'd be more than happy to contribute some of my at-work time to getting this into Omnitruck for automatic knife bootstrapping, if someone from Chef can just point me in the right direction WRT contribution. Omnibus/Omnitruck is a bit daunting for a newcomer. CC/ @someara @sax as I've discussed this with both of them before more than once.

Also note that a lot of the code we'd need for this would likely be cross-platform across all Illumos distros (OmniOS, SmartOS, OpenIndiana), and we might even be able to re-use some of the Omnibus build code for Solaris. So with the right guidance, we might be able to wrap this pretty quickly and get benefits across many platforms.

Lastly, note the work that's happening here: https://github.com/sax/vagrant-smartos-zones

### Not Supported

"Not supported" means there may be code in-tree, but we don't build for and test on those platforms. At our discrection, we may take patches that don't break any tier 1 or tier 2 platforms, but we have no way of testing these.

"discretion"


* Solaris 8
* AIX 5.1L
* FreeBSD 8
* OpenBSD
* NetBSD
* Windows 2003, Windows 2000

For completeness, should XP and Vista be added to this list?

Contributor

Vista is ~2008r2, which we support, isn't it?

Contributor

Vista is 2008 R1. R2 is Windows 7

Contributor

Yeah, but 2003r2, 2008, 2008r2 are all listed as tier1 support. This is another goofy edge case where I would bet money we can state we support it since we have to support the associated server SKUs but we don't because we don't explicitly test the client SKUs with ChefDK.

* RHEL/CentOS/Oracle/Scientific 4.x or older
* RHEL or SLES on POWER (ppc64) or System/z
* HP-UX
* Mac OS X 10.5, older, or anything ppc-based
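
The Chef Client tiers above amount to a lookup from platform and version to a support level. A rough Python sketch of that lookup, using the versions listed in this revision of the RFC. The platform keys (`"centos"`, `"mac_os_x"`, and so on), the function name `support_tier`, and the prefix-matching rule are assumptions made for illustration. Rolling-release Tier 2 platforms (Gentoo, Arch, Fedora, OmniOS) are left out because they have no version list, so anything absent from the two tables is reported as "not listed" rather than "not supported":

```python
# Versions copied from the Tier 1 table; the keys are hypothetical identifiers.
TIER_1 = {
    "aix": ["6.1", "7.1"],
    "centos": ["5", "6", "7"],
    "freebsd": ["9", "10"],
    "mac_os_x": ["10.6", "10.7", "10.8", "10.9"],
    "oracle": ["5", "6", "7"],
    "redhat": ["5", "6", "7"],
    "solaris": ["9", "10", "11"],
    "windows": ["7", "8", "8.1", "2003R2", "2008", "2008R2", "2012", "2012R2"],
    "ubuntu": ["10.04", "12.04", "14.04"],
}

# Versioned entries from the Tier 2 list.
TIER_2 = {
    "suse": ["10", "11"],
    "scientific": ["5", "6", "7"],
    "debian": ["6", "7"],
    "opensuse": ["12.3", "13.1"],
}


def _matches(version, supported):
    # "6.5" matches "6"; "10.9.4" matches "10.9"; "60" does not match "6".
    return any(version == s or version.startswith(s + ".") for s in supported)


def support_tier(platform, version):
    """Classify a platform/version pair against the two tables above."""
    if _matches(version, TIER_1.get(platform, [])):
        return "tier 1"
    if _matches(version, TIER_2.get(platform, [])):
        return "tier 2"
    return "not listed"


print(support_tier("centos", "6.5"))
print(support_tier("debian", "7.6"))
print(support_tier("freebsd", "8.4"))
```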

## Chef Server

Includes any of the add-ons (webui2/manage, push, etc.)

### Supported

* Ubuntu 10.04LTS, 12.04LTS, 14.04LTS
* RHEL 5.x, 6.x, 7.x
* CentOS 5.x, 6.x, 7.x
* Oracle Enterprise Linux 5.x, 6.x, 7.x

### Unsupported

* Any other Linux or UNIX distributions
Contributor

As a very non-exhaustive list, I don't know if this is needed.

* Windows

## ChefDK

### Supported

* Windows 7, 8, 8.1
* Fedora (current non-EOL releases)
* RHEL 6.x
* Mac OS X 10.8, 10.9
* Ubuntu 12.04, 14.04

ChefDK bundles Chef Client. Therefore, Chef Client is supported, by extension, on the foregoing client platforms, if not already mentioned explicitly in the Chef Client support matrix.

Since we're already being explicit, it'd be nice to mention which tier they count as (I'm assuming Tier 1?). Or would ChefDK count as its own tier in the Chef Client matrix?

Contributor Author

I actually put the desktop Windows platforms in the matrix, so it covers us off - this was just extra insurance in case I missed anything. (We would generally consider them Tier 1.)

Okay, cool.


### Unsupported

* Windows Vista, XP, 2000
* Mac OS X < 10.8, anything ppc

## Appendix: Guiding Principles for Operating System Version Support

Once Chef Software, Inc. decides to support an operating system, we will also develop rules to determine under what upstream vendor lifecycle we will continue to support products, and they will be documented in this section. Vendors have various terminology to describe support lifecycles ('standard support', 'extended support', etc.) and it is useful to clarify what those mean in the context of Chef's products.

Platform | Support Until | References
--- | --- | ---
Mac OS X | Current version, plus two previous versions | Apple does not clearly announce EOLs, so we have made this choice
RHEL and EL-variants | End of RedHat Production 3 Phase | https://access.redhat.com/support/policy/updates/errata/
Solaris | End of Premier Support | http://www.oracle.com/us/support/library/lifetime-support-hardware-301321.pdf
Ubuntu | End of LTS lifecycle for LTS releases, end of standard release lifecycle for non-LTS releases | https://wiki.ubuntu.com/LTS
Windows | End of Extended Support | https://support.microsoft.com/lifecycle/?c2=1163
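
The Mac OS X rule in the table ("current version, plus two previous versions") can be expressed as a sliding window over the release list. A minimal Python sketch, assuming releases are given oldest to newest and that 10.9 is the current release. The function name `current_plus_previous` is hypothetical:

```python
def current_plus_previous(releases, previous=2):
    """Return the newest release plus up to `previous` releases before it.

    `releases` must be ordered oldest to newest.
    """
    return releases[-(previous + 1):]


# Release list taken from the Tier 1 table, with 10.9 assumed to be current.
mac_releases = ["10.6", "10.7", "10.8", "10.9"]
print(current_plus_previous(mac_releases))
```

When fewer than three releases exist, the slice simply returns all of them.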