Analyze how https is working on k8s and OpenShift #13869

Closed
skabashnyuk opened this issue Jul 16, 2019 · 24 comments

@skabashnyuk
Contributor

Is your task related to a problem? Please describe.

The goal of this task is to test how https is working on k8s and different flavors of OpenShift.

@skabashnyuk skabashnyuk added kind/task Internal things, technical debt, and to-do tasks to be performed. severity/P1 Has a major impact to usage or development of the system. team/platform labels Jul 16, 2019
@sleshchenko sleshchenko self-assigned this Jul 16, 2019
@sleshchenko sleshchenko added the status/in-progress This issue has been taken by an engineer and is under active development. label Jul 16, 2019
@benoitf
Contributor

benoitf commented Jul 16, 2019

FYI on GCP and EC2

Google Cloud Platform k8s version: "v1.12.8-gke.10"

With multi-user and TLS (Let's Encrypt certificates):

The workspace starts fine, but when I try to open it (che-theia) the dashboard shows this error:
Error: Workspace doesn't have a server which matches with URL: https:///?uid=939264

And without multi-user (that is, single-user):

The server and the workspace both start fine, but I am unable to open the workspace:
all URLs on the che-theia container return the main index.html of the GWT IDE.
Request URL: https:///servervtg2c4bu-theia-idew6l/runtime.e89741c86dc89dd2ddfd.js
Any URL that starts with https:///servervtg2c4bu-theia-idew6l/ returns:

<!--

    Copyright (c) 2012-2018 Red Hat, Inc.
    This program and the accompanying materials are made
    available under the terms of the Eclipse Public License 2.0
    which is available at https://www.eclipse.org/legal/epl-2.0/

    SPDX-License-Identifier: EPL-2.0

    Contributors:
      Red Hat, Inc. - initial API and implementation

-->
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <meta name="mobile-web-app-capable" content="yes">
    <title>Eclipse Che</title>
    <link rel="shortcut icon" href="/_app/favicon.ico"/>
    <link href="https://fonts.googleapis.com/css?family=Droid+Sans+Mono" rel="stylesheet" type="text/css"/>
    <script type="text/javascript" language="javascript">

        /**This parameter is needed to define sdk mode.*/

...

@l0rd l0rd added this to the 7.0.0 milestone Jul 16, 2019
@sleshchenko
Member

OCP 3.9 running on Docker
Che Server with TLS enabled works fine. I deployed multi-user Che with ocp.sh:

./ocp.sh --run-ocp --deploy-che --multiuser --secure --no-pull --setup-ocp-oauth

Minishift v1.34.1
Deploying Che Server with TLS enabled failed because of an issue with the DNS alternative names of the minishift certificate.
Che Server failed with the following error:
[screenshot: Screenshot_20190717_092608]
The certificate has the following DNS alternative names:
[screenshot: Screenshot_20190717_092622]
For comparison, the DNS alternative names of a certificate on OCP:
[screenshot: Screenshot_20190717_101055]

Maybe the certificate can be customized for minishift, but I have not investigated this yet.
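
For reference, the DNS alternative names shown in the screenshots can also be listed from a shell. This is only a minimal sketch, assuming `openssl` is installed; `list_san_dns` is a hypothetical helper name and not part of minishift, OCP, or Che.

```shell
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin
# and print each DNS alternative name on its own line.
list_san_dns() {
  grep -o 'DNS:[^,[:space:]]*' | sed 's/^DNS://'
}
```

For a certificate file it could be used as `openssl x509 -in cert.pem -noout -text | list_san_dns`. For a running endpoint, `openssl s_client -connect HOST:443 -servername HOST </dev/null 2>/dev/null | openssl x509 -noout -text | list_san_dns` should work, where HOST is a placeholder for the cluster host name.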

@monaka
Member

monaka commented Jul 17, 2019

AFAIK, there are several ways to support HTTPS on K8s/Ingress. It's hard to support all of them, IMO.
A realistic approach may be to expect that users have already deployed their TLS key by hand as a Secret object.

@nickboldt
Contributor

nickboldt commented Jul 17, 2019

@sleshchenko If you're going to test OCP 3, please use 3.11.

OCP 3.9 is not supported for use with CRW 1.x, and there is no planned support for CRW 2, so it follows logically that 3.9 is also not supported for Che 7. :)

That said, testing on OCP 4.1 is recommended too, as we DEFINITELY support that one in CRW 1.2 and 2.0.

@sleshchenko
Member

@benoitf Thanks for sharing the issues you faced and for helping me get my own GCP installation and deploy Eclipse Che there. I faced the same issues and figured out that they are caused by the single-host server strategy (the default when --tls is specified during deployment), not by TLS being enabled.
For more about the single-host issue see #12971 (comment). I explained only the single-user issue there; for multi-user the error message is different, but it is caused by the same issue with rewrite-target.

Now I'm trying to get TLS working for multi-user Che deployed on GCP when multi-host is used.

@sleshchenko
Member

I succeeded in deploying Che (both single-user and multi-user) on a Kubernetes cluster on GCP (Google Cloud Platform). Here is a summary of the issues I faced:

  1. Without modifying the Helm chart, the user hits issues related to the single-host strategy, which the Helm chart configures by default when TLS is enabled.

  2. If the Helm chart is updated to use the multi-host server strategy, the user hits issues related to the TLS configuration we provide out of the box: the letsencrypt certificate manager, the created certificate, and acme: true for ingresses, all at the same time. I believe this worked for single-host, but it does not for multi-host.
    These items are addressed in the following issue: Fix an ability to use TLS with K8s infra and helm chart #13946

  3. The multi-host server strategy requires a wildcard certificate, and I'm not sure we need to set that up in the Helm chart, since it is specific to the platform where the user runs Che and to the DNS provider they use.

The steps I took to get it working on GCP:

  1. Create and properly configure DNS zone on GCP.
  2. Create a special service account that would allow the certificate manager to change my DNS configuration in my GCP project.
  3. Create and configure letsencrypt clusterissuer that uses privileges prepared on step 2.
  4. Create a k8s wildcard certificate that is configured with the right wildcard host and a DNS-type challenge.

We should investigate this topic more and maybe prepare proper instructions on how to get wildcard certificates on some of the platforms, but not include automation of it in helm chart.

This is the whole picture as I see it today. I'm not a big expert in DNS and TLS, so sorry if there are some inaccuracies.
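
To illustrate why the multi-host strategy needs a wildcard certificate: each server gets its own host under a common domain, and a wildcard entry covers exactly one leftmost label. Below is a minimal sketch of that matching rule in shell. The function name and the example.com hosts are illustrative assumptions, not something Che or cert-manager provides.

```shell
# Hypothetical helper: succeed when HOST is covered by PATTERN.
# PATTERN is either an exact host or a wildcard such as "*.che.example.com".
# A wildcard covers exactly one leftmost label, so "a.b.che.example.com"
# and the bare "che.example.com" are NOT covered by "*.che.example.com".
host_matches_wildcard() {
  host=$1
  pattern=$2
  case "$pattern" in
    '*.'*) ;;                                  # wildcard pattern, handled below
    *) [ "$host" = "$pattern" ]; return ;;     # plain pattern: exact match only
  esac
  suffix=${pattern#\*.}   # pattern without the leading "*."
  rest=${host#*.}         # host without its leftmost label
  [ "$host" != "$rest" ] && [ "$rest" = "$suffix" ]
}
```

For example, `host_matches_wildcard keycloak.che.example.com '*.che.example.com'` succeeds, while the same check for `che.example.com` fails, which is why the base host usually needs its own entry in the certificate.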

@sleshchenko
Member

I tried to use the Helm chart instead of chectl to install secure multi-user Che and discovered that it does not work. This is caused by a different default for the namespace where workspaces are created: with chectl it is the same namespace as Che Server; with the Helm chart it is a new unique namespace per workspace.
Such an installation does not work correctly because nothing propagates the certificate to the workspace namespaces.
I see the following two possible fixes:

  1. Che Server knows the certificate, so it can propagate it to the namespaces it creates for workspaces.
  2. Set up Kubernetes Replicator [1].

But since the second way still requires creating an empty secret, I think the first way is preferable.

I continue working on fixes for HTTPS with different combinations of configuration.

[1] https://itnext.io/using-wildcard-certificates-with-cert-manager-in-kubernetes-and-replicating-across-all-namespaces-5ed1ea30bb93
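
Until one of these fixes is in place, the secret can in principle be copied by hand into a workspace namespace. The sketch below is an assumption-laden illustration, not how Che Server or Kubernetes Replicator does it: `copy_tls_secret` is a hypothetical helper, and it assumes `kubectl` is configured and that the secret is of type TLS.

```shell
# Hypothetical helper: re-create a TLS secret in another namespace.
# Usage: copy_tls_secret SOURCE_NS TARGET_NS [SECRET_NAME]
copy_tls_secret() {
  src_ns=$1
  dst_ns=$2
  name=${3:-che-tls}
  tmp=$(mktemp -d) || return 1

  # Kubernetes stores the PEM files base64-encoded under the
  # "tls.crt" and "tls.key" data keys of a TLS secret.
  kubectl get secret "$name" -n "$src_ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 --decode > "$tmp/tls.crt" &&
  kubectl get secret "$name" -n "$src_ns" -o jsonpath='{.data.tls\.key}' \
    | base64 --decode > "$tmp/tls.key" &&
  kubectl create secret tls "$name" \
    --cert="$tmp/tls.crt" --key="$tmp/tls.key" -n "$dst_ns"
  status=$?

  rm -rf "$tmp"   # do not leave the private key on disk
  return "$status"
}
```

A call such as `copy_tls_secret che some-workspace-namespace` would have to be repeated for every new workspace namespace, which is the manual work the two fixes above are meant to remove.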

@l0rd l0rd added severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code. and removed severity/P1 Has a major impact to usage or development of the system. labels Jul 23, 2019
@skabashnyuk
Contributor Author

The remaining scope of this issue:

  • minishift (local linux)
  • minikube (local linux)
  • AWS
  • Azure

We are looking for a free tier on AWS or Azure. We are not sure we can get one without a credit card and are trying to work around that.

@skabashnyuk
Contributor Author

@l0rd @slemeur @benoitf do you think we should test any other environment combinations?

@l0rd l0rd removed this from the 7.0.0 milestone Jul 24, 2019
@l0rd l0rd removed the severity/blocker Causes system to crash and be non-recoverable or prevents Che developers from working on Che code. label Jul 24, 2019
@skabashnyuk skabashnyuk changed the title Test how https is working on k8s and OpenShift Analyze how https is working on k8s and OpenShift Jul 24, 2019
@benoitf
Contributor

benoitf commented Jul 24, 2019

For Azure free credits are easy to get
I used DigitalOcean as well (easy to get free credits as well)

@sleshchenko
Member

Once these PRs are merged (#13946, che-incubator/chectl#237), there should not be any issues with HTTPS on Kubernetes if a correct TLS certificate is provided for Che.
I've already tested it on GCP (Google Cloud Platform) and now I'm going to test it on Azure.

@sleshchenko
Member

I managed to run Eclipse Che on a Kubernetes instance on Microsoft Azure.

The only issue I faced: I did not manage to set up automatic generation of wildcard certificates.
It looks like the official cert-manager document [1] is outdated because it uses a --password option that is no longer supported [2].
[1] https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/dns01/azuredns.html
[2] https://docs.microsoft.com/en-us/cli/azure/create-an-azure-service-principal-azure-cli?view=azure-cli-latest#password-based-authentication

After I tried to update the instructions, cert-manager failed to generate certificates because of the following error:
[screenshot: Screenshot_20190726_153958]
It looks similar to the following registered issue:
cert-manager/cert-manager#1650

So, in the end I created the certificate manually with certbot:

certbot certonly --preferred-challenges=dns --manual --email=sleshche@redhat.com --server https://acme-v02.api.letsencrypt.org/directory -d '*.che.azure.codenvy-dev.com' --agree-tos

Then I created the TLS secret for Che:

kubectl create secret tls che-tls --key=privkey.pem --cert=fullchain.pem -n che

and everything works just fine.

@benoitf
Contributor

benoitf commented Jul 30, 2019

I was able to use cert-manager on Azure and updated the full instructions in the Google doc.

So the doc is now fine for Azure and GCP for getting a working multi-user/multi-host/TLS setup with free Let's Encrypt certificates! :-)

@benoitf
Contributor

benoitf commented Jul 30, 2019

I tried on Amazon EC2 as well and validated cert-manager/ingress-nginx/route53/tls/multi-user/multi-host and I've updated the documentation

@sleshchenko
Member

sleshchenko commented Jul 31, 2019

I tried on Amazon EC2 but had trouble setting up the EC2 cluster correctly; it did not work for me out of the box. The Nginx service was not provisioned with an external IP address.

$ kubectl describe service -n ingress-nginx ingress-nginx
Warning  CreatingLoadBalancerFailed  1m                service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service ingress-nginx/ingress-nginx: AccessDenied: User: arn:aws:sts::269287474311:assumed-role/masters.ide.aws-serg.codenvy-dev.com/i-0c029cb1b9c854bff is not authorized to perform: iam:CreateServiceLinkedRole on resource: arn:aws:iam::269287474311:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing

Here is an article that helped me solve the issue: https://medium.com/faun/aws-eks-the-role-is-not-authorized-to-perform-ec2-describeaccountattributes-error-1c6474781b84
The following command should be executed:
aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
This info has been added to our doc about setting up Che on a remote Kubernetes cluster on Amazon.

So, finally I deployed my Che with TLS enabled on Amazon and it worked just fine.
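
The permission fix above can be made safe to re-run by checking for the role first. This is only a sketch, assuming the AWS CLI is installed and configured; `ensure_elb_service_linked_role` is a hypothetical helper name, and the role name is taken from the error message above.

```shell
# Hypothetical helper: create the Elastic Load Balancing service-linked role
# only when it does not exist yet, so the command can be repeated safely.
ensure_elb_service_linked_role() {
  role=AWSServiceRoleForElasticLoadBalancing
  if aws iam get-role --role-name "$role" >/dev/null 2>&1; then
    echo "$role already exists"
  else
    aws iam create-service-linked-role \
      --aws-service-name elasticloadbalancing.amazonaws.com
  fi
}
```

Running `ensure_elb_service_linked_role` before creating the ingress-nginx service should avoid the CreatingLoadBalancerFailed warning on a fresh account, provided the calling user is allowed to create the role.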

@sleshchenko
Member

sleshchenko commented Aug 2, 2019

I've tested Che with TLS enabled on minishift and minikube. Here are some instructions and comments on what should be done to test it:
Prerequisites

  1. To test Che with TLS enabled, you need to own wildcard certificates; certbot can be used to generate them:
certbot certonly --preferred-challenges=dns --manual --email=sleshche@redhat.com --server https://acme-v02.api.letsencrypt.org/directory -d '*.local.che.com' --agree-tos

But you need your own DNS server to do the DNS challenge.
  2. Your Minishift/Minikube probably runs locally and is not publicly available. Even in that case, you need to configure the DNS server to point the desired host to the virtual machine IP, like 192.168.99.100. So nobody except you will be able to access your K8s/OS cluster using the public host name.
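
For purely local name resolution, one alternative to a public DNS record is a local dnsmasq override. This is a different technique from the one described above, it does not help with the DNS challenge itself, and it is only a sketch: it assumes dnsmasq is the resolver on the machine, and `dnsmasq_wildcard_entry` is a hypothetical helper.

```shell
# Hypothetical helper: print a dnsmasq rule that resolves DOMAIN and all of
# its subdomains to IP (dnsmasq's "address=/domain/ip" form).
# Usage: dnsmasq_wildcard_entry DOMAIN IP
dnsmasq_wildcard_entry() {
  printf 'address=/%s/%s\n' "$1" "$2"
}
```

For minikube the line could be produced with `dnsmasq_wildcard_entry local.che.com "$(minikube ip)"` and placed in the dnsmasq configuration, after which dnsmasq needs a restart.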

Minikube:
There were no big issues with minikube. Only one k8s-infra-specific prerequisite is needed: a che-tls secret with the generated certificates must be created in the che namespace:

kubectl create namespace che
kubectl create secret tls che-tls --key=privkey.pem --cert=fullchain.pem  -n che

Note that two chectl issues were discovered, so you should use up-to-date chectl binaries: che-incubator/chectl#239, che-incubator/chectl#241

Minishift:
It's a bit harder to get Che with HTTPS on minishift.
In the case of minikube, we tell the k8s cluster which host we expect via ingresses.
In the case of minishift, we need to configure minishift and let it know which public hostname and routing suffix should be used. I used the latest (v1.34.1+c2ff9cb) minishift binary, and it behaved buggily when I specified public-hostname and routing suffix. For more, see minishift/minishift#3309 (comment)
Probably specifying public-hostname is not needed, and it should be possible to do it like this:

$ minishift start   # add the needed memory params and the desired vm-driver
$ minishift openshift config set --patch '{"routingConfig": {"subdomain": "myminishift.com"}}'

In the case of K8s, we configure each Ingress with the secret where the TLS cert is stored.
In the case of OS infra, we expect the OS cluster to manage the right certificates, so we need to configure the router with our certificates:

$ cat fullchain.pem privkey.pem > minishift-cert.pem

$ oc login -u system:admin
$ oc project default
$ oc delete secret router-certs
$ oc create secret tls router-certs --key=privkey.pem --cert=minishift-cert.pem
$ oc rollout latest router

Now it should be possible to deploy Che on minishift with TLS enabled. I did it with chectl and used the operator as the installer.

I'm not sure which way users would prefer to provide TLS certs for Che: configuring their router, or providing a certificate for Che Server in the same way as it can be done for K8s infra. So maybe we should create a separate issue to implement such an ability (make Che Server reuse the che-tls secret instead of relying on router configuration).

@sleshchenko
Member

I tried to test Che with TLS enabled on CRC but did not manage to run it successfully.
One of the main reasons for the failure: there is no way to configure the public hostname and routing suffix as minikube and minishift allow.
I tried to modify the source code of crc [1] and put my domain there, but the attempt was not successful.
The crc.testing and apps-crc.testing domains are also hardcoded in the bundle. I tried to modify some bundle configuration after crc unpacks it, but it failed at the next phase: oc login is redirected to openshift-oauth.crc.testing for some reason, even after replacing crc.testing in all places.

I assume that Che should work just fine if OpenShift 4.x is configured properly, and I think it does not make much sense to fight with CRC to do things that are not implemented yet (configuring public-hostname and routing-suffix).

[1] https://github.com/code-ready/crc/blob/dede24a5ac5e1dab762c5fe7796eac6e44b46374/pkg/crc/preflight/preflight_checks_linux.go#L38

@benoitf
Contributor

benoitf commented Aug 6, 2019

@sleshchenko is there a pending issue to allow custom domain in CRC ? (if none we could create one or discuss how to manage TLS there ? )

@sleshchenko
Member

> @sleshchenko is there a pending issue to allow custom domain in CRC ? (if none we could create one or discuss how to manage TLS there ? )

I got the following answer from CRC guys via CoreOS Slack

> it would be hard, so no, not on the roadmap now
> We don't have and for us it is not something we can do since this also contain the certs for the hostname.
> you can create an issue

So, we can create an issue to discuss TLS configuration.

@sleshchenko
Member

I'm closing this issue since we tested platforms that were defined as a scope:

  1. minishift (local linux)
  2. minikube (local linux)
  3. GCP
  4. AWS
  5. Azure

If anybody has any issues with TLS on the listed platforms or any others - feel free to register a new issue.

@slemeur
Contributor

slemeur commented Aug 6, 2019

@sleshchenko Could you list the issues identified and fixed along this issue?

@sleshchenko
Member

Summary of the work done in this issue:

GCP was the first platform we tried to test.
We discovered that the issues users faced were caused not by TLS being enabled but by single-host, which was enabled by default along with TLS. There is an issue for single-host [1].
Then we figured out that the TLS functionality in the Helm chart was designed to use the letsencrypt certificate manager, which should generate a new certificate for each host (Che, Keycloak, Workspace Servers). It did not work because there was a conflict in the secret name, but even after fixing that we would probably hit the Let's Encrypt certificate rate limits [2].
The solution here is to use wildcard certificates. To achieve this, a user has to set up a DNS challenge and a certificate for their DNS provider [3]. The following doc contains up-to-date information about GCP [4].

Azure was the second platform we tested.
We discovered that the documentation for setting up the DNS challenge for Azure [5] was outdated since the password parameter is no longer supported. The following PR [6] makes it work again.
After setting up DNS challenge, Che with TLS enabled was deployed via chectl without any issues.
Detailed instructions can be found in the following docs [7].

AWS
With AWS we faced an Amazon issue: new users do not have the needed permissions in place. We discovered how to fix it manually [8]. Detailed instructions (including the fix for the permissions issue) can be found in the following docs [9].

Minikube&Minishift
The detailed info was already published here [10]

CRC
We did not manage to test Che with TLS enabled on local CRC. I believe that it's possible, but additional changes to the CRC source code are needed. More info is here [11]

[1] #12971 (comment)
[2] https://letsencrypt.org/docs/rate-limits/
[3] https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/dns01/google.html
[4] https://docs.google.com/document/d/1T5N7oB3XDgABAA9mebJWeTeDflKxq5NXDM1QI9mmQfE/edit?ts=5d35c03f
[5] https://docs.cert-manager.io/en/latest/tasks/issuers/setup-acme/dns01/azuredns.html
[6] cert-manager/cert-manager#1940
[7] https://docs.google.com/document/d/1WSB5VTS0sBask5lE0pyhH5Gp-8qC4xXr8NgckF0b0Z8/edit
[8] https://medium.com/faun/aws-eks-the-role-is-not-authorized-to-perform-ec2-describeaccountattributes-error-1c6474781b84
[9] https://docs.google.com/document/d/1BAnjIZgBLjkUu_7RG9EgSDMQ1MbzWRbC0yVqfU18IcM/edit
[10] #13869 (comment)
[11] #13869 (comment)

cc @slemeur

@sudheerherle

Could somebody please provide access to the 9th document from @sleshchenko's comment above?

@sleshchenko
Member

@sudheerherle it's already published to Che 7 docs, you should be able to find it here https://www.eclipse.org/che/docs/che-7/deploying-che-on-kubernetes-on-aws/
