
Orchagent validates mirror session queue parameter against maximum value from SAI #1957

Merged
merged 3 commits into from
Oct 18, 2021
Changes from 2 commits
27 changes: 27 additions & 0 deletions orchagent/mirrororch.cpp
@@ -39,6 +39,11 @@
#define MIRROR_SESSION_DSCP_MIN 0
#define MIRROR_SESSION_DSCP_MAX 63

// 15 is a typical value, but if vendor's SAI does not supply the maximum value,
// allow all 8-bit numbers, effectively cancelling validation by orchagent.
#define MIRROR_SESSION_DEFAULT_NUM_TC 255
Collaborator

If 15 is the typical value, I would suggest using 15 as the default limit. With 255, if a user supplies a value like 100, it can still fail in SAI.

Contributor Author

The failure of some automated tests has shown me that the question is more complex than that. First, let me describe our internal discussion. My first inclination was to use 15 as a typical default, but our architect pointed out that if a vendor supports a higher number and does not implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, a user of that vendor's switch would be prevented from configuring the higher number. If we set the default high instead, a careful user of that switch can go up to its real maximum, and only a careless user would enter a value that is too high and crash the switch. So I changed it to 255, which allows all values up to 254. Of course, you may take the position that a vendor who wants to allow higher-numbered queues must implement SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES.
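
To make the trade-off concrete, here is a minimal Python model of the comparison. It is only an illustration, not the orchagent code: `queue_is_valid` and the vendor with 32 traffic classes are hypothetical, and the only values taken from this PR are 15 and 255.

```python
# Illustrative model only; the real check is the C++ comparison in MirrorOrch::createEntry.
TYPICAL_NUM_TC = 15      # the "typical" limit discussed above
PERMISSIVE_NUM_TC = 255  # MIRROR_SESSION_DEFAULT_NUM_TC in this PR


def queue_is_valid(queue: int, max_num_tc: int) -> bool:
    """Model the orchagent comparison: reject when queue >= max_num_tc."""
    return 0 <= queue < max_num_tc


# Hypothetical vendor with 32 traffic classes that does not report the attribute.
queue = 20  # valid on that hypothetical hardware

print(queue_is_valid(queue, TYPICAL_NUM_TC))     # False: a valid queue is blocked
print(queue_is_valid(queue, PERMISSIVE_NUM_TC))  # True: the decision is left to SAI
print(queue_is_valid(100, PERMISSIVE_NUM_TC))    # True: passes orchagent, may still fail in SAI
print(queue_is_valid(255, PERMISSIVE_NUM_TC))    # False: 254 is the highest accepted value
```

With the low default the first case is the problem the architect raised; with the high default the third case is the problem the reviewer raised.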

Now I would like to consult you about another aspect. Changing to 255 exposed a bug in the test. I missed it because I ran the wrong test version internally, but the failed tests on GitHub show it. The vs test uses setReadOnlyAttr to simulate a value of 15 for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES, and expects a queue of 14 to succeed and 15 to fail. This worked as long as the default in orchagent was 15. After I changed it to 255, creating a session with queue=15 succeeded, and the test failed. I believe the reason is that, for efficiency, orchagent queries SAI for SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES only once, caches the value, and compares against the cached value whenever a session is created. You can see that code in this PR. That probably means that when the test sets the limit with setReadOnlyAttr, it has no effect, because orchagent has already completed its initialization, found no implementation of SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES in the vs environment, and set the limit to the default of 255. This theory is supported by additional tests I ran today, where 254 succeeds and 255 fails. It follows that the test cannot simulate its own limit and must assume the same default as orchagent.

To sum up:

  1. In the light of the discussion above, do you still think I should change the default back to 15?
  2. Do you agree that I should get rid of setReadOnlyAttr in the test, and align with whatever default orchagent sets for testing the limit?
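
The caching theory can be sketched as a small Python model. This is an illustration under stated assumptions, not the real code: `FakeSai` and `MirrorOrchModel` are hypothetical names, `None` stands for "attribute not implemented", and assigning `sai.max_num_tc` plays the role of the test's setReadOnlyAttr call.

```python
DEFAULT_NUM_TC = 255  # mirrors MIRROR_SESSION_DEFAULT_NUM_TC


class FakeSai:
    """Hypothetical stand-in for the SAI layer in the vs environment."""

    def __init__(self):
        self.max_num_tc = None  # None models "attribute not implemented"

    def get_max_num_tc(self):
        return self.max_num_tc


class MirrorOrchModel:
    """Hypothetical model of the query-once behaviour described above."""

    def __init__(self, sai):
        # Queried exactly once, at construction time, then cached.
        value = sai.get_max_num_tc()
        self.max_num_tc = DEFAULT_NUM_TC if value is None else value

    def create_session(self, queue):
        # Later checks use only the cached value.
        return queue < self.max_num_tc


sai = FakeSai()
orch = MirrorOrchModel(sai)  # orchagent finishes initialization first
sai.max_num_tc = 15          # the test then simulates a limit of 15

print(orch.create_session(15))   # True: the simulated limit is never seen
print(orch.create_session(254))  # True
print(orch.create_session(255))  # False: the cached default of 255 applies
```

In this model the only ways to make the simulated limit visible would be to set it before the object is constructed or to re-query on every session, which is why aligning the test with the default is the simpler option.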

Collaborator

Thanks for explaining. Approving from my side. Let's wait for @bingwang-ms.

Contributor Author

So I gather your answer to #1 is No. What about #2? Unless I change something in the test, it will continue to fail.

Contributor

The clarification in #1 makes sense to me. Regarding #2, I think updating the test code is better.


extern sai_switch_api_t *sai_switch_api;
extern sai_mirror_api_t *sai_mirror_api;
extern sai_port_api_t *sai_port_api;

@@ -80,9 +85,26 @@ MirrorOrch::MirrorOrch(TableConnector stateDbConnector, TableConnector confDbCon
        m_policerOrch(policerOrch),
        m_mirrorTable(stateDbConnector.first, stateDbConnector.second)
{
    sai_status_t status;
    sai_attribute_t attr;

    m_portsOrch->attach(this);
    m_neighOrch->attach(this);
    m_fdbOrch->attach(this);

    // Retrieve the number of valid values for queue, starting at 0
    attr.id = SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES;
    status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr);
    if (status != SAI_STATUS_SUCCESS)
    {
        // Adjacent string literals are concatenated, so no backslash
        // continuation (and no stray indentation) ends up in the message.
        SWSS_LOG_WARN("Failed to get switch attribute number of traffic classes. "
                      "Use default value. rv:%d", status);
        m_maxNumTC = MIRROR_SESSION_DEFAULT_NUM_TC;
    }
    else
    {
        m_maxNumTC = attr.value.u8;
    }
}

bool MirrorOrch::bake()
@@ -373,6 +395,11 @@ task_process_status MirrorOrch::createEntry(const string& key, const vector<Fiel
            else if (fvField(i) == MIRROR_SESSION_QUEUE)
            {
                entry.queue = to_uint<uint8_t>(fvValue(i));
                if (entry.queue >= m_maxNumTC)
                {
                    SWSS_LOG_ERROR("Failed to get valid queue %s", fvValue(i).c_str());
                    return task_process_status::task_invalid_entry;
                }
            }
            else if (fvField(i) == MIRROR_SESSION_POLICER)
            {
2 changes: 2 additions & 0 deletions orchagent/mirrororch.h
@@ -95,6 +95,8 @@ class MirrorOrch : public Orch, public Observer, public Subject
    NeighOrch *m_neighOrch;
    FdbOrch *m_fdbOrch;
    PolicerOrch *m_policerOrch;
    // Number of traffic classes; valid queue values are 0 .. m_maxNumTC - 1
    uint8_t m_maxNumTC;

    Table m_mirrorTable;

49 changes: 48 additions & 1 deletion tests/test_mirror_port_span.py
@@ -7,6 +7,54 @@
@pytest.mark.usefixtures('dvs_mirror_manager')
@pytest.mark.usefixtures('dvs_policer_manager')
class TestMirror(object):

    def check_syslog(self, dvs, marker, log, expected_cnt):
        (ec, out) = dvs.runcmd(['sh', '-c', "awk \'/%s/,ENDFILE {print;}\' /var/log/syslog | grep \'%s\' | wc -l" % (marker, log)])
        assert out.strip() == str(expected_cnt)


    def test_PortMirrorQueue(self, dvs, testlog):
        """
        This test covers valid and invalid values of the queue parameter. All sessions have source & dest port.
        Operation flow:
        1. Create mirror session with queue 0, verify session becomes active and error not written to log.
        2. Create mirror session with queue max valid value, verify session becomes active and error not written to log.
        3. Create mirror session with queue max valid value + 1, verify session doesn't get created and error written to log.
        Due to lag in table operations, verify_no_mirror is necessary at the end of each step, to ensure cleanup before next step.
        """

        session = "TEST_SESSION"
        dst_port = "Ethernet16"
        src_ports = "Ethernet12"

        # Simulate SAI max number of traffic classes
        dvs.setReadOnlyAttr('SAI_OBJECT_TYPE_SWITCH', 'SAI_SWITCH_ATTR_QOS_MAX_NUMBER_OF_TRAFFIC_CLASSES', '15')

        # Sub Test 1
        marker = dvs.add_log_marker()
        self.dvs_mirror.create_span_session(session, dst_port, src_ports, direction="BOTH", queue="0")
        self.dvs_mirror.verify_session_status(session)
        self.dvs_mirror.remove_mirror_session(session)
        self.dvs_mirror.verify_no_mirror()
        self.check_syslog(dvs, marker, "Failed to get valid queue 0", 0)

        # Sub Test 2
        marker = dvs.add_log_marker()
        self.dvs_mirror.create_span_session(session, dst_port, src_ports, direction="RX", queue="14")
        self.dvs_mirror.verify_session_status(session)
        self.dvs_mirror.remove_mirror_session(session)
        self.dvs_mirror.verify_no_mirror()
        self.check_syslog(dvs, marker, "Failed to get valid queue 14", 0)

        # Sub Test 3
        marker = dvs.add_log_marker()
        self.dvs_mirror.create_span_session(session, dst_port, src_ports, direction="TX", queue="15")
        self.dvs_mirror.verify_session_status(session, expected=0)
        self.dvs_mirror.remove_mirror_session(session)
        self.dvs_mirror.verify_no_mirror()
        self.check_syslog(dvs, marker, "Failed to get valid queue 15", 1)


    def test_PortMirrorAddRemove(self, dvs, testlog):
        """
        This test covers the basic SPAN mirror session creation and removal operations
@@ -471,7 +519,6 @@ def test_PortLAGMirrorUpdateLAG(self, dvs, testlog):
self.dvs_vlan.asic_db.wait_for_n_keys("ASIC_STATE:SAI_OBJECT_TYPE_LAG", 0)



# Add Dummy always-pass test at end as workaround
# for issue when Flaky fail on final test it invokes module tear-down before retrying
def test_nonflaky_dummy():
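
For readers unfamiliar with the shell pipeline in check_syslog, the following is a rough pure-Python model of what it counts: lines from the first marker line to the end of the file that contain the expected message. It is a sketch under assumptions, not part of the PR: `count_after_marker` is a hypothetical helper, the sample log lines are invented, and it uses plain substring matching where awk and grep use regular expressions.

```python
def count_after_marker(lines, marker, log):
    """Count lines containing `log`, starting at the first line containing `marker`."""
    seen_marker = False
    count = 0
    for line in lines:
        if not seen_marker and marker in line:
            seen_marker = True  # the marker line itself is part of the range
        if seen_marker and log in line:
            count += 1
    return count


# Invented sample log; only the error text comes from the PR.
syslog = [
    "ERR orchagent: Failed to get valid queue 15",  # from an earlier run, before the marker
    "NOTICE === start marker (hypothetical) ===",
    "NOTICE orchagent: session request received",
    "ERR orchagent: Failed to get valid queue 15",
]

print(count_after_marker(syslog, "start marker", "Failed to get valid queue 15"))  # 1
print(count_after_marker(syslog, "start marker", "Failed to get valid queue 14"))  # 0
```

The marker is what keeps errors from earlier sub tests from being counted again, which is why each sub test adds a fresh one.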