[DEV] Genesis not wait long enough time to have NICs get IP address when STP is not closed #1638

Closed
zet809 opened this issue Aug 5, 2016 · 5 comments

zet809 commented Aug 5, 2016

With the current genesis base for ppc64, the NICs are in the DOWN state when the genesis kernel is running, so we bring them up in the xCAT genesis doxcat script.
The issue is that if STP is not disabled on the switch, or port fast is not set, it takes about 50 seconds before the first packet can be sent out.
Currently, doxcat waits only 20 seconds for the NICs to get an IP address through DHCP, which is not long enough.
We need to change the wait to 60 seconds or longer.
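
As a rough illustration of the proposed change, here is a minimal sketch of a polling wait with a configurable timeout. It is not the doxcat code (doxcat is a shell script); the function name `wait_until` and its parameters are hypothetical, and `check` stands in for whatever test decides that a NIC has obtained an address.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting. Returns True on success, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

With a link that only starts forwarding after about 40 seconds, a 20-second timeout gives up too early while a 60-second timeout succeeds, which is the behaviour described in this issue.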

The log below shows that it takes about 40 seconds from starting dhclient to request an IP address until the DHCP request is sent out successfully.
2016-08-05T06:24:43.784494+00:00 (none) xcat.genesis.doxcat: Setting IP via DHCP... ==> dhclient starts requesting dynamic IP addresses for all NICs
2016-08-05T06:24:45.793430+00:00 (none) xcat.genesis.doxcat: Acquiring network addresses..
2016-08-05T06:24:45.812918+00:00 (none) dhclient[1290]: Can't bind to dhcp address: Cannot assign requested address
2016-08-05T06:24:45.812924+00:00 (none) dhclient[1290]: Please make sure there is no other dhcp server
...
2016-08-05T06:24:59.308137+00:00 (none) dhclient[1289]: DHCPDISCOVER on enP5p7s0f0 to 255.255.255.255 port 67 interval 10 (xid=0x52655a84)
2016-08-05T06:25:05.819030+00:00 (none) xcat.genesis.doxcat: still can not get bootnic, go into /bin/bash
2016-08-05T06:25:09.676646+00:00 (none) dhclient[1289]: DHCPDISCOVER on enP5p7s0f0 to 255.255.255.255 port 67 interval 12 (xid=0x52655a84)
2016-08-05T06:25:21.089198+00:00 (none) dhclient[1289]: DHCPDISCOVER on enP5p7s0f0 to 255.255.255.255 port 67 interval 13 (xid=0x52655a84)
2016-08-05T06:25:22.089597+00:00 (none) dhclient[1289]: DHCPREQUEST on enP5p7s0f0 to 255.255.255.255 port 67 (xid=0x52655a84) ==> The DHCP request is sent out successfully at this time.
2016-08-05T06:25:22.089616+00:00 (none) dhclient[1289]: DHCPOFFER from 192.168.3.29
2016-08-05T06:25:22.112080+00:00 (none) dhclient[1289]: DHCPACK from 192.168.3.29 (xid=0x52655a84)
2016-08-05T06:25:22.119990+00:00 (none) rsyslogd: [origin software="rsyslogd" swVersion="8.10.0" x-pid="1205" x-info="http://www.rsyslog.com"] exiting on signal 15.
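
The timing in the log can be checked with a little arithmetic. The sketch below only uses timestamps copied from the log lines above; the variable names are mine and purely illustrative.

```python
from datetime import datetime

# Timestamps copied verbatim from the log above.
dhclient_started = datetime.fromisoformat("2016-08-05T06:24:43.784494+00:00")
acquiring = datetime.fromisoformat("2016-08-05T06:24:45.793430+00:00")
gave_up = datetime.fromisoformat("2016-08-05T06:25:05.819030+00:00")
request_sent = datetime.fromisoformat("2016-08-05T06:25:22.089597+00:00")

# How long doxcat waited between "Acquiring network addresses.."
# and "still can not get bootnic".
wait_before_giving_up = (gave_up - acquiring).total_seconds()

# How long it took from starting dhclient to the DHCPREQUEST.
time_to_request = (request_sent - dhclient_started).total_seconds()

# How much longer doxcat would have had to wait.
shortfall = (request_sent - gave_up).total_seconds()

print(f"waited {wait_before_giving_up:.1f}s, "
      f"needed {time_to_request:.1f}s, "
      f"short by {shortfall:.1f}s")
```

This gives a wait of about 20.0 seconds, about 38.3 seconds until the DHCPREQUEST, and a shortfall of about 16.3 seconds, consistent with the 20-second wait and the "about 40 seconds" figure quoted above.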

zet809 added this to the 2.12.2 milestone Aug 5, 2016
zet809 changed the title from "[DEV] Genesis not wait long enough time to have NICs get IP address when STP not closed" to "[DEV] Genesis not wait long enough time to have NICs get IP address when STP is not closed" Aug 5, 2016

whowutwut commented Aug 5, 2016

@zet809 Should the solution be to wait longer? I think an alternative is to document that port fast must be set on the switches to work with OpenPower servers. @cxhong has created a script to configure the switches, and in there we could either set it automatically or at least check for it.

Or xcatprobe could detect this configuration error and print a WARN message.


zet809 commented Aug 8, 2016

@whowutwut Yes, the current solution is to wait longer.
It's true that we can either document how to configure port fast or have a script configure the switches automatically, but we can only support a few of the most commonly used kinds of switches.
My idea is that if the OS installation works without any changes to the switches, waiting longer is acceptable. What do you think?


zet809 commented Aug 11, 2016

Verification shows that rh7.2 can be installed successfully without disabling STP, so I am closing this defect for now. We can reopen it if we encounter the problem again.

zet809 closed this as completed Aug 11, 2016

whowutwut commented Aug 16, 2016

Hi @zet809, I've noticed the following a few times last week and today: when the node is idle for some time, the compute-side MAC no longer shows up in the switch's MAC address table.

frame45_sw1>show mac-address-table interface port 1
     MAC address       VLAN     Port    Trnk  State  Permanent  Openflow
  -----------------  --------  -------  ----  -----  ---------  --------
  70:e2:84:14:0a:5e       1    1              FWD                  N  

Could this be caused by not enabling port fast?

@daniceexi

@whowutwut The switch MAC table has an ageing time for each MAC address. When the ageing time expires without any traffic from that address, the entry is removed. The ageing time is generally 300 seconds.
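
To make the ageing behaviour concrete, here is a toy model of a MAC table with an ageing timer. It is only a sketch of the general idea, not how any particular switch implements it; the class `MacTable`, its methods, and the example MAC address are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class MacTable:
    """Toy MAC address table: entries expire after `ageing_time` seconds idle."""

    def __init__(self, ageing_time: float = 300.0) -> None:
        self.ageing_time = ageing_time
        # mac -> (port, time the address was last seen as a frame source)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def learn(self, mac: str, port: int, now: float) -> None:
        """Record (or refresh) an entry when a frame from `mac` arrives."""
        self._entries[mac] = (port, now)

    def lookup(self, mac: str, now: float) -> Optional[int]:
        """Return the port for `mac`, or None if unknown or aged out."""
        entry = self._entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.ageing_time:
            # Idle for too long: drop the entry, as the switch would.
            del self._entries[mac]
            return None
        return port
```

An idle node sends no frames, so its entry is never refreshed and disappears once the ageing time has passed. It is learned again as soon as the node transmits, which would explain a MAC vanishing from the table independently of the port fast setting.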
