
openvswitch and routes not working after restart on Rocky-Linux 8.5 #599

Open
Tokix opened this issue Feb 17, 2022 · 6 comments

Tokix commented Feb 17, 2022

Description
With a bit of modification to the kubeinit files I am able to get OKD deployed on Rocky Linux 8.5. You can see the modifications here: https://github.com/Tokix/kubeinit. I could open a pull request, but one thing is not working as expected: restarting the server. After the restart the routes have vanished and I am no longer able to reach the frontend.

To Reproduce
Steps to reproduce the behavior:

  1. Install a Rocky Linux 8.5 machine and set up the SSH connection as nyctea, as described in the manual.
  2. In my case I additionally had to install Python on the hypervisor_host machine before the playbook would run successfully:

yum install python3

  3. Clone the changes for Rocky 8.5:

git clone https://github.com/Tokix/kubeinit.git

  4. Run the playbook:
ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

  5. Enable the frontend:
ssh root@nyctea
chmod +x  create-external-ingress.sh
./create-external-ingress.sh
  6. Set up the DNS entries for your system.
  7. Check that the URL works (it does at this point; see the curl check after this list):

https://console-openshift-console.apps.okdcluster.kubeinit.local/

  8. Reboot the server:

init 6

  9. The URL is no longer working:

https://console-openshift-console.apps.okdcluster.kubeinit.local/
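
A minimal way to verify the console endpoint before and after the reboot (the -k flag skips certificate verification, which may be needed if the ingress certificate is self-signed):

curl -kI https://console-openshift-console.apps.okdcluster.kubeinit.local/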

Expected behavior
The external URL of the cluster should remain reachable after a restart, and the routes should still be set.

Screenshots
Working route-configuration before the restart:

image

Route configuration after restart:

image

Infrastructure

  • Hypervisors OS: Rocky-Linux
  • Version 8.5

Deployment command

ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

Inventory file diff

I made no changes to the inventory file.

Additional context

As SELinux is active on Rocky Linux 8.5, my first thought was that some changes could not be persisted, so I disabled SELinux for testing. However, it is still not working after a restart.
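
For reference, a minimal sketch of how SELinux can be switched off temporarily for such a test (standard RHEL 8 / Rocky tooling):

# check the current mode
getenforce
# switch to permissive mode until the next reboot
setenforce 0
# to persist across reboots, set SELINUX=permissive (or disabled) in /etc/selinux/config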

I checked this old issue https://forums.opensuse.org/showthread.php/530879-openvswitch-loses-configuration-on-reboot but the boot order of openvswitch and network.service seems to be fine.
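
One way to inspect that ordering on the hypervisor (a sketch, assuming the stock systemd unit names openvswitch.service and network.service):

# show what openvswitch.service is ordered after and before
systemctl show -p After,Before openvswitch.service
# list the units that wait for openvswitch.service
systemctl list-dependencies --reverse openvswitch.service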

Furthermore, I re-ran the "Attach our cluster network to the logical router" steps from kubeinit/roles/kubeinit_libvirt/tasks/create_network.yml. This got me back to the correct routing table, but I am still not able to reach the guest systems via 10.0.0.1-x.
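
For completeness, the kind of checks that show this state (a sketch; the logical router name is a placeholder and has to be taken from the ovn-nbctl lr-list output of the kubeinit setup):

# host routing table and Open vSwitch state after the reboot
ip route show
ovs-vsctl show
# OVN logical routers and their static routes
ovn-nbctl lr-list
ovn-nbctl lr-route-list <logical-router>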

Is there any script or service that needs to be, or can be, re-run to enable the networking after a reboot?
In any case I'm thankful for any hints; let me know if you need more information.

Thank you in any case for the great project :)

@ccamacho added the keep label Mar 5, 2022
@jeffabailey
Contributor

I'm also running into a problem with Rocky Linux.

Any help is welcome; this is a cool project, and I hope we can get it working on Rocky.

TASK [kubeinit.kubeinit.kubeinit_prepare : Create ssh config file from template] *******************************************************************************
task path: /home/jeff/.ansible/collections/ansible_collections/kubeinit/kubeinit/roles/kubeinit_prepare/tasks/create_host_ssh_config.yml:52
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: jeff
<127.0.0.1> EXEC /bin/sh -c 'echo ~jeff && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/jeff/.ansible/tmp `"&& mkdir "` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" && echo ansible-tmp-1650765476.136161-198434-213715344205742="` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" ) && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1117, in do_template
    res = j2_concat(rf)
  File "<template>", line 47, in root
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/jinja2/runtime.py", line 903, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1154, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'"
}

PLAY RECAP *****************************************************************************************************************************************************
localhost                  : ok=48   changed=7    unreachable=0    failed=1    skipped=25   rescued=0    ignored=0   
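
In case it helps to narrow this down, an ad-hoc check like the following shows which inventory hosts are missing ansible_host (a sketch; adjust the inventory path to your checkout):

ansible -i ./kubeinit/inventory all -m debug -a "var=ansible_host"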

@jeffabailey
Contributor

jeffabailey commented May 6, 2022

My issue isn't specific to Rocky, so I'll add a new issue.

I ran into the same error using Debian.

Edit (Issue added): #647

@ccamacho
Collaborator

Maybe there are some iptables rules that are not persisted after rebooting, and I don't have a way to test this on Rocky.
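
A sketch of how this could be checked on a Rocky hypervisor (assuming the stock nftables/iptables tooling of RHEL 8; the paths are the usual ones):

# dump the live ruleset after deployment and again after a reboot, then compare
nft list ruleset > /tmp/ruleset-before.txt
# ... reboot ...
nft list ruleset > /tmp/ruleset-after.txt
diff /tmp/ruleset-before.txt /tmp/ruleset-after.txt

# plain iptables rules only survive a reboot if they are saved explicitly
# (requires the iptables-services package)
iptables-save > /etc/sysconfig/iptables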

@logeshwaris

Hi @ccamacho,

Thanks for the awesome project. 👍

I am also running into the same issue. After a reboot, I am not able to reach 10.0.0.x.
Is there a way to re-enable the networking after a reboot?
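
Not a confirmed fix, but a workaround sketch that may be worth trying on the hypervisor (service names assume the RHEL 8 OVS/OVN and libvirt packages that kubeinit installs):

# restart the virtualization and virtual-network services
systemctl restart openvswitch ovn-northd ovn-controller libvirtd
# then re-apply the route steps from
# kubeinit/roles/kubeinit_libvirt/tasks/create_network.yml, as noted above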

@tschuyebuhl

I've got two servers, one with Alma 8.x (which also seems to lose connectivity after a reboot) and one with CentOS Stream. I could help by providing some debug data; I can sacrifice my currently running clusters if need be.

@tschuyebuhl

Okay, so the one with CentOS 8 and vanilla k8s didn't persist after a restart. The VMs launched fine, but there was no networking. Also, the service pod only had one IP address, from the 10.89.x.x subnet.
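
The 10.89.x.x address looks like a podman-created container network rather than the cluster network, so comparing what the pod got against the expected network may help (the network name is a placeholder; list them first):

podman network ls
podman network inspect <network-name>
podman pod ps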
