Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dhcpmon] Delay Counter Snapshot During Indeterminate State #5509

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tahmed-dev
Copy link
Contributor

This PR fixes a rare, however possible, condition when the DHCP RX
packet do happen before dhcpmon timer kicks in and DHCP TX packets
happens after. If DHCP remains silent for the next bucket duratrion,
those TX packet will be snapshot without switching DHCP state to
healthy.

signed-off-by: Tamer Ahmed tamer.ahmed@microsoft.com

- Why I did it
Fix issue were DHCP is reported unhealthy due to perfect packet split of rx happening before check timer kicks in and tx happens after timer is serviced.

- How I did it
Do not update snapshot counters if DHCP state indeterminate

- How to verify it
When delaying TX packet to take place after timer is serviced. The issue happens

Sep 29 21:57:13.433168 str-s6000-acs-14 ALERT dhcp_relay#dhcpmon[43]: dhcpmon detected disparity in DHCP Relay behavior. Duration: 306 (sec) for vlan: 'Agg-Vlan1000'
Sep 29 21:57:13.433168 str-s6000-acs-14 NOTICE dhcp_relay#dhcpmon[43]: [    Agg-Vlan1000-Snapshot rx/tx] Discover:         6/        0, Offer:         6/        6, Request:         6/      288, ACK:         6/        6
Sep 29 21:57:13.433168 str-s6000-acs-14 NOTICE dhcp_relay#dhcpmon[43]: [    Agg-Vlan1000- Current rx/tx] Discover:         7/        0, Offer:         6/        6, Request:         6/      288, ACK:         6/        6

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

This PR fixes a rare, however possible, condition when the DHCP RX
packet do happen before dhcpmon timer kicks in and DHCP TX packets
happens after. If DHCP remains silent for the next bucket duratrion,
those TX packet will be snapshot without switching DHCP state to
healthy.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Copy link
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Please wait for other reviewers

* The idea here is to delay snapshoting those packet and compare them
* with next opportunity when dhcp packet are received.
*/
if (state_data[0].dhcp_status != DHCP_MON_STATUS_INDETERMINATE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to check state_data[0].dhcp_status != DHCP_MON_STATUS_INDETERMINATE here? Should you also check that state_data[1].dhcp_status != DHCP_MON_STATUS_INDETERMINATE?

@lguohan
Copy link
Collaborator

lguohan commented Oct 21, 2020

@tahmed-dev, wonder what is the status for this pr?

@tahmed-dev
Copy link
Contributor Author

@tahmed-dev, wonder what is the status for this pr?

@lguohan I'd like to hold off on this PR until I get more data from the current dhcpmon. The current dhcpmon will show more logs that will prove the theory above. Also, the current dhcpmon has bucket size of 18 sec which might indeed solve the current issue as 1 min is not an integral multiple of buckets anymore. This case is very slim and would happen if there is only one dhcp client that has its timer aligned with the monitor timer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants