
Che fails on version 1.4.1 #1269

Closed
gorkem opened this issue Aug 16, 2017 · 17 comments

@gorkem

gorkem commented Aug 16, 2017

Che has a service named che-host that has a route configured. When the service tries to reach itself through its internal name, it fails; the public route works fine. When I ssh into the pod and curl http://che-host, it fails after a long wait, whereas http://localhost and the internal IP address work fine.
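A minimal sketch of the reproduction described above, assuming a POSIX shell and curl are available inside the Che pod. The helper name `probe` and the 5-second timeout are illustrative choices, not part of Che or OpenShift.

```shell
# probe URL: report whether URL answers within a timeout.
# Any HTTP response (even an error status) counts as reachable, because curl
# without -f only exits non-zero on transport failures such as a refused
# connection or the --max-time limit being hit.
# The 5-second limit is an arbitrary choice for this sketch.
probe() {
  url=$1
  if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "OK $url"
  else
    echo "FAIL $url"
    return 1
  fi
}
```

From a shell in the pod (for example via `oc rsh <che-pod>`, where `<che-pod>` is a placeholder), running `probe http://localhost` and `probe http://che-host` would be expected to print OK for the first and FAIL for the second on an affected cluster.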

Also, when I enable the logs, I see lines like the following for every pod:
1610 docker_sandbox.go:263] Couldn't find network status for eclipse-che/che-3-deploy through plugin: invalid network status for

We also did a minishift ssh and compared /etc/resolv.conf: 1.4.1 has only a single nameserver entry, whereas 1.3.1 also includes an entry for search.

@amisevsk Anything else I am missing?

@gbraad
Member

gbraad commented Aug 16, 2017 via email

@gorkem
Author

gorkem commented Aug 16, 2017

I seem to have b2d

@gbraad
Member

gbraad commented Aug 16, 2017

I did a quick run of the versions you mentioned, using B2D, but I never get an entry for 'search' in /etc/resolv.conf:

  • Minishift v1.4.1
    • OpenShift v3.6 - nameserver 192.168.122.1
    • OpenShift v1.5.1 - nameserver 192.168.122.1
  • Minishift v1.3.1
    • OpenShift v1.5.1 - nameserver 192.168.122.1

With CentOS (v1.3.1 - OS v1.5.1)

# Generated by NetworkManager
nameserver 192.168.122.1
nameserver 192.168.42.1

With CentOS (v1.4.1 - OS v3.6.0)

# Generated by NetworkManager
nameserver 192.168.122.1
nameserver 192.168.42.1

Which is as expected...

@gorkem
Author

gorkem commented Aug 16, 2017

Hmmm. I no longer have those entries on my 1.3.1 either.

@amisevsk

I've managed to narrow this down somewhat.

Running minishift 1.4.1 and OpenShift version:

OpenShift Master: v3.6.0+c4dd4cf
Kubernetes Master: v1.6.1+5115d708d7

I am able to reproduce the issue. With minishift 1.4.1 and OpenShift v1.5.1, it does not occur.

However, the issue is actually that pods cannot reach their own service, either by name or via the service's clusterIP. From a second pod I can curl che-host without issue, but from within the che-host pod I cannot. There are no real networking issues except that pods cannot access their own services -- localhost and external addresses work fine.

After a bit of digging, I came across this section of the Kubernetes documentation that seems related.

@gbraad Is there a setting that has changed between 1.5.1 and 3.6? I don't see this issue on OpenShift Online, running

OpenShift Master: v3.6.173.0.5 (online version 3.5.0.20)
Kubernetes Master: v1.6.1+5115d708d7

@gbraad
Member

gbraad commented Aug 16, 2017

Is there a setting that has changed between 1.5.1 and 3.6?

I do not have enough visibility into this, but hopefully @csrwng or @bparees know more about it. This could very well be related to how oc cluster up sets up the configuration. You could try this on a new VM by just running oc cluster up. This would rule out any of the configuration that we apply.

@csrwng

csrwng commented Aug 16, 2017

We've seen this issue in cluster up before -- see openshift/origin#12111
There were two different problems we found with the code:

  1. /sys/devices/virtual/net was not getting mounted into the origin container as read/write and the kubelet was not able to set the hairpin mode on the docker bridge
  2. /var/lib/docker was hardcoded as the host's docker directory and in some machines that wasn't the real docker directory

If either of these is still the cause, one thing you can try to solve the issue is running:

ifconfig docker0 promisc

on the Minishift VM.
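The first problem concerns hairpin mode: for a pod to reach itself through its own service IP, the bridge has to send the packet back out of the port it arrived on. The kubelet arranges this either by setting the hairpin flag on each container veth port (`--hairpin-mode=hairpin-veth`) or by making the container bridge promiscuous (`--hairpin-mode=promiscuous-bridge`), and the `ifconfig docker0 promisc` workaround corresponds to the latter. A minimal sketch for checking the per-port flag from `minishift ssh`, assuming docker0 is the pod bridge as in this setup; the function name `hairpin_report` is hypothetical.

```shell
# hairpin_report [BRIDGE_DIR]: print "<port> <hairpin_mode>" for every port
# attached to a Linux bridge, using the bridge's sysfs entry.
# BRIDGE_DIR defaults to docker0; it is a parameter only so the function can
# be pointed at another bridge (or a test directory).
hairpin_report() {
  bridge_dir=${1:-/sys/devices/virtual/net/docker0}
  for port in "$bridge_dir"/brif/*; do
    # Skip the unexpanded glob when the bridge has no ports (or no brif dir).
    [ -e "$port/hairpin_mode" ] || continue
    printf '%s %s\n' "$(basename "$port")" "$(cat "$port/hairpin_mode")"
  done
}
```

A `0` next to the Che pod's veth port would be consistent with problem 1; `sudo ifconfig docker0 promisc` (or the iproute2 equivalent `sudo ip link set docker0 promisc on`) is the workaround suggested above.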

@amisevsk

@csrwng It looks like your command solved the issue for me.

Regarding the problems you listed:

  1. /sys/devices/virtual/net seems to be mounted RW in the origin container:
            {
                "Source": "/sys/devices/virtual/net",
                "Destination": "/sys/devices/virtual/net",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
  2. I'm not sure how to check, but /var/lib/docker is a symlink to /mnt/sda1/var/lib/docker, which seems to be correct.
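For the second problem, one way to check is to compare the root directory Docker reports with where `/var/lib/docker` actually resolves. A minimal sketch; `same_dir` is a hypothetical helper, and `docker info --format '{{.DockerRootDir}}'` is assumed to be supported by the Docker version in the VM.

```shell
# same_dir A B: succeed if A and B resolve (through symlinks) to the same path.
# readlink -f canonicalizes each path, so a symlink such as
# /var/lib/docker -> /mnt/sda1/var/lib/docker compares equal to its target.
same_dir() {
  a=$(readlink -f "$1") && b=$(readlink -f "$2") && [ "$a" = "$b" ]
}
```

Inside `minishift ssh`: `same_dir "$(docker info --format '{{.DockerRootDir}}')" /var/lib/docker && echo match || echo mismatch`. A mismatch would point at the hardcoded-directory problem.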

We've figured out a workaround for our issue (using localhost instead of the service), but I would still like to know what's going wrong.

@gbraad
Member

gbraad commented Aug 17, 2017

I would still like to know what's going wrong.

So do I, as we might have to document this as a known issue, especially since PR openshift/origin#12744 seems to have been available since February 1st. So shouldn't this have been observed since v1.5.1?

@csrwng

csrwng commented Aug 17, 2017

@gbraad if you tell minishift v1.4.1 to run origin version v3.6, does it obtain the v3.6 oc client to run 'oc cluster up'? or does it use the one bundled in the image? If the latter, what version of the 'oc' binary is included in the minishift v1.4.1 image?

@praveenkumar
Contributor

praveenkumar commented Aug 17, 2017

run origin version v3.6, does it obtain the v3.6 oc client to run 'oc cluster up'?

@csrwng Yes, it obtains the 3.6 oc client to deploy a 3.6 cluster.

@LalatenduMohanty
Member

LalatenduMohanty commented Aug 21, 2017

@csrwng I am confused. I guess the Origin 3.6 oc binary has the fix, i.e. openshift/origin#12744, right?

@csrwng

csrwng commented Aug 21, 2017

Yes, it should have the fix.

@LalatenduMohanty LalatenduMohanty added this to the v1.6.0 milestone Aug 30, 2017
@praveenkumar
Contributor

@gorkem can you try it with minishift 1.5.0 without any workaround and let us know if it still fails? That release uses 3.6.0 as the default OpenShift version.

@coolbrg coolbrg self-assigned this Sep 13, 2017
@coolbrg
Contributor

coolbrg commented Sep 13, 2017

Any update, @gorkem? Have you had a chance to try minishift 1.5.0?

@gorkem
Author

gorkem commented Sep 13, 2017

I was able to use it with minishift 1.5.1 without any hacks.

@coolbrg
Contributor

coolbrg commented Sep 13, 2017

Thanks for confirming, @gorkem. We are closing this issue now. If you run into anything else, please feel free to open a new issue.
