Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many saithrift servers in SAI repo #1879

Open
kcudnik opened this issue Aug 30, 2023 · 18 comments
Open

Too many saithrift servers in SAI repo #1879

kcudnik opened this issue Aug 30, 2023 · 18 comments
Assignees
Labels

Comments

@kcudnik
Copy link
Collaborator

kcudnik commented Aug 30, 2023

While playing with DASH an SAI repos i noticed that there are too many saithrift servers, and each one is slightly different then each other:

  1. https://github.com/opencomputeproject/SAI/blob/master/meta/sai_rpc_frontend.cpp (frontend), and meta/sai_rpc_server.cpp (second frontend?) (no main file here) is generated by gensairpc.pl, but this file is not used, also sai.thrift file is generated (with -e option can include experimental headers)
  2. https://github.com/opencomputeproject/SAI/blob/master/test/saithrift/src/switch_sai_rpc_server.cpp (frontend) and main file (https://github.com/opencomputeproject/SAI/blob/master/test/saithrift/src/saiserver.cpp) (which has its own switch_sai.thrift, which seems to be manually maintained)
  3. https://github.com/opencomputeproject/SAI/blob/master/test/saithriftv2/src/saiserver.cpp (main) and it uses frontend file from point 1 (https://github.com/opencomputeproject/SAI/blob/master/meta/sai_rpc_frontend.cpp), it uses sai.thrift generated from SAI/meta
  4. https://github.com/opencomputeproject/SAI/blob/master/bm/sai_adapter/test/sai_thrift_src/switch_sai_rpc_server.cpp (frontend) and main (https://github.com/opencomputeproject/SAI/blob/master/bm/sai_adapter/test/sai_thrift_src/saiserver.cpp), this one have its own switch_sai.thrift which also seems to be manually maintained, latest changed in this directory was in Jun 2020, so it was 3 years ago, im guessing this is abandoned then

then DASH utilizes saithriftv2 from SAI repo option 3, should we make and maintain only one saithrift server and deprecate and remove others ? we should deprecate 1,2 and 4 i think, and also why gensairpc.pl is generating sai_rpc_server.cpp but this file is not used ? - it is used, i overlooked that, it's included as cpp file in sai_rpc_frontend.cpp (agrh!)

any ideas?

also i would like to get rid of global and external linking to gSwitchId and gPortMap, those are global variables in frontends needed by saithrit when is used as a library for example in syncd, normally those variables are declared in thrift server main, but in case of syncd they are implemented in syncd itself, can saithrift figure out gSwitchId when user is calling create_switch, and PortMap from get PORT_LIST attribute on switch, and then attributes on specific ports if needed?

btw. that SAI/meta/sai_rpc_frontent.cpp have bugs, for example when it passes attributes and it encounters unknow attribute type, it just ignoring it, without any warning or error, and user has no idea whether this attribute was passed correctly or not, this is in attribute conversion thrift/sai here: https://github.com/opencomputeproject/SAI/blob/master/meta/sai_rpc_frontend.cpp#L512 and https://github.com/opencomputeproject/SAI/blob/master/meta/sai_rpc_frontend.cpp#L822 (at least those 2 obvious ones)

also there is one of the problems in DASH repo libsai.so which returns SUCCESS on failed get port/switch attribute, and saithrift or ptf tests are not able to handle that and fails, which is wrong behavior, it should be able to handle error message SAI_STATUS_NOT_SUPPORTED, this depends to it's own issue, but at this point i don't know whether this should be in DASH saithrift, SAI test/saithriftv2 or SAI/test/ptf tests, or DASH/dash-pipeline tests

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 2, 2023

@chrispsommers fyi, can you have some comments maybe on this ?

@rlhui @lguohan
hey, can we make this maybe a topic in SAI community? i would also like to deprecate and remove and clean SAI/bm/ seems for me that this is early p4 mode, which was checked in back in 2018 which is 5 years old now, there is behavioral-mode submodule https://github.com/p4lang/behavioral-model.git which is bmv2 implementation, loaded into SAI/bm/behavioral-model and this is recently updated, i could not find any recent SAI and DASH references to bm/behavioral-model, DASH is directly pooling https://github.com/p4lang/behavioral-model.git repo inside docker files: dash-pipeline/dockerfiles/Dockerfile.bmv2-bldr:FROM p4lang/behavioral-model@sha256:ce45720e28a9

@chrispsommers
Copy link
Contributor

i would also like to deprecate and remove and clean SAI/bm/ seems for me that this is early p4 mode,
@rlhui Agreed, this is an ancient unmaintained project, just clutters the repo. It was a great idea in its day, but has fallen by the wayside. Now it's just a distraction and might mislead a newcomer. Any projects which somehow uses it, e.g. from a submodule, will still be able to refer to it through git history, so deleting it should not cause any side effects.

@chrispsommers
Copy link
Contributor

@kcudnik re:

then DASH utilizes saithriftv2 from SAI repo option 3, should we make and maintain only one saithrift server and deprecate and remove others ? we should deprecate 1,2 and 4 i think
Agreed. Similar to previous comment, any older project supporting saithriftv1 can continue to import the older refs from Git History, even if we delete it for housecleaning. Others may disagree on removing it so let's get a broad consensus. We could post a large warning in the README for projects being deprecated to allow for some grace period e.g. 3-6 months before acting, then set a tickler to delete it.

@chrispsommers
Copy link
Contributor

@kcudnik re:

also there is one of the problems in DASH repo libsai.so which returns SUCCESS on failed get port/switch attribute, and saithrift or ptf tests are not able to handle that and fails, which is wrong behavior, it should be able to handle error message SAI_STATUS_NOT_SUPPORTED, this depends to it's own issue, but at this point i don't know whether this should be in DASH saithrift, SAI test/saithriftv2 or SAI/test/ptf tests, or DASH/dash-pipeline tests

I can't recall if this is a workaround to avoid breaking DASH-specific PTF or SAI Challenger test cases, it might be. Seems like it should be filed in DASH as an issue. We might want to get @reshmaintel or @andriy-kokhan to comment.

@andriy-kokhan
Copy link
Contributor

@kcudnik re:

then DASH utilizes saithriftv2 from SAI repo option 3, should we make and maintain only one saithrift server and deprecate and remove others ? we should deprecate 1,2 and 4 i think
Agreed. Similar to previous comment, any older project supporting saithriftv1 can continue to import the older refs from Git History, even if we delete it for housecleaning. Others may disagree on removing it so let's get a broad consensus. We could post a large warning in the README for projects being deprecated to allow for some grace period e.g. 3-6 months before acting, then set a tickler to delete it.

@chrispsommers , I think saithrift is still referred in some places in sonic-buildimage. The community may want to clean it up first even if it's not actually involved in the build process.

@andriy-kokhan
Copy link
Contributor

@kcudnik re:

also there is one of the problems in DASH repo libsai.so which returns SUCCESS on failed get port/switch attribute, and saithrift or ptf tests are not able to handle that and fails, which is wrong behavior, it should be able to handle error message SAI_STATUS_NOT_SUPPORTED, this depends to it's own issue, but at this point i don't know whether this should be in DASH saithrift, SAI test/saithriftv2 or SAI/test/ptf tests, or DASH/dash-pipeline tests

I can't recall if this is a workaround to avoid breaking DASH-specific PTF or SAI Challenger test cases, it might be. Seems like it should be filed in DASH as an issue. We might want to get @reshmaintel or @andriy-kokhan to comment.

I do not recall either SAI-C TCs that retrieve switch/port attributes or SAI-C workarounds for that. Usually, get operation is not that actively exercised by SAI test cases. So, I'm not surprised that this was missed.
@kcudnik , could you please specify the test cases that fail because of this issue?

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

i noticed that failure when returning actual sai not implemented in DASH sai implementation, let me bring that locally and i will post here which tests are failing, i don't remember whether that was particular testcase or sithrift server itself

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

@andriy-kokhan ok, when i run ptf tests, and return actually SAI_STATUS_NOT_SUPPORTED(-2) on not supported attributes i get this error

 $ make run-saithrift-ptftests
./disable_veth_ipv6.sh
Disable IPv6 to suppress Linux control plane noise
...
docker run -u root --network=host --rm -w /tests/ -it  \
-w /tests/ \
local/dash-saithrift-client:latest \
./functional/ptf/run-tests.sh
+ ptf --test-dir ./functional/ptf --pypath /SAI/ptf --interface 0@veth1 --interface 1@veth3 --test-case-timeout=1800 '--test-params=connection='\''tcp'\'';target='\''bmv2'\'';traffic_check='\''bidir'\'';'
/usr/local/lib/python3.8/dist-packages/ptf-0.9.3-py3.8.egg/EGG-INFO/scripts/ptf:19: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Using packet manipulation module: ptf.packet_scapy
saidashacl.SaiThriftDashAclTest ... Reboot mode is: cold
Config_db.json uses default path /SAI/ptf/config/../resources/config_db.json
port_config.ini uses default path /SAI/ptf/config/../resources/port_config.ini
Ignore all the expect error code and exception captures.
Get number_of_system_ports 0
Get number_of_active_ports 2
Restore all the expect error code and exception captures.
SkipTest on expected error. sai_thrift_get_switch_attribute with errorcode: -2 error: sai_thrift_exception(status=-2)
skipped 'SkipTest on expected error. sai_thrift_get_switch_attribute with errorcode: -2 error: sai_thrift_exception(status=-2)'

----------------------------------------------------------------------
Ran 1 test in 0.015s

OK (skipped=1)
saidasheni.CreateDeleteEniTest ... Reboot mode is: cold
Config_db.json uses default path /SAI/ptf/config/../resources/config_db.json
port_config.ini uses default path /SAI/ptf/config/../resources/port_config.ini

and run tests hangs in this position, seems indefinitely

this is the error:

SkipTest on expected error. sai_thrift_get_switch_attribute with errorcode: -2 error: sai_thrift_exception(status=-2)
skipped 'SkipTest on expected error. sai_thrift_get_switch_attribute with errorcode: -2 error: sai_thrift_exception(status=-2)'

when i return dummy success, the flow is like this:

...
Using packet manipulation module: ptf.packet_scapy
saidashacl.SaiThriftDashAclTest ... Reboot mode is: cold
Config_db.json uses default path /SAI/ptf/config/../resources/config_db.json
port_config.ini uses default path /SAI/ptf/config/../resources/port_config.ini
Ignore all the expect error code and exception captures.
Get number_of_system_ports 0
Get number_of_active_ports 2
Restore all the expect error code and exception captures.
***** Number of available resources *****
SAI_SWITCH_ATTR_ECMP_MEMBERS :  0
ecmp_members :  0..._DOUBLE_NAT_ENTRY :  0
available_double_nat_entry :  0
Get default vlan...
For Common platform, Only check Port status.
Turn up port 281474976710657
Turn up port 281474976710658
For Common platform, expecting bridge ports not been created by default.
Finish SaiHelperBase setup

Sending packet...
 <Ether  dst=00:00:02:03:04:05 src=00:06:07:08:09:...
...
 ----------------------------------------------------------------------^M
 Ran 1 test in 1.103s
 
 OK

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

i don't know which function sai_thrift_get_switch_attribute is throwing exception, the one in saithrift cpp or adapter.py, but the test should be able to handle that

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

this is saithrift current implementation:

14715   void sai_thrift_get_port_attribute(sai_thrift_attribute_list_t& attr_list_out, const sai_thrift_object_id_t port_oid, const sai_thrift_attribute_list_t& attr_list) {
...
14749     status = port_api->get_port_attribute((sai_object_id_t)port_oid, attr_count, sai_attr_list);
14750     if (status != SAI_STATUS_SUCCESS) {
14751       sai_thrift_exception e;
14752       e.status = status;
14753       throw e;
14754     }

it throws exception on any non success api

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

in sai_adapter.py: (generated adapter)

   24 # Expected error code
   25 # Used with SKIP_TEST_ON_EXPECTED_ERROR
   26 # For some expected errors in test
   27 # Like SAI_STATUS_NOT_SUPPORTED = -2
   28 EXPECTED_ERROR_CODE = [-2]
...
48408 def sai_thrift_get_switch_attribute(client,
48409                                     number_of_active_ports=None,
...
49652         attr_list = client.sai_thrift_get_switch_attribute(attr_list)
49653     except sai_thrift_exception as e:
49654         status = e.status
49655         if SKIP_TEST_ON_EXPECTED_ERROR and status in EXPECTED_ERROR_CODE:
49656             reason = "SkipTest on expected error. sai_thrift_get_switch_attribute with errorcode: {} error: {}".format(
49657                 status, e)
49658             print(reason)
49659             testutils.skipped_test_count=1
49660             raise SkipTest(reason)
49661         if CATCH_EXCEPTIONS:
49662             return None
49663         else:
49664             raise e
49665

even if error is in expected error code, you will still get raise exception SkipTest, the problem here is that python SAI apis don't return error code, they return attribute, and exception in case of error, so the actual tests would need to be modified to catch exception if not supported attribute is queried

in case of ptf tests, this is attribute 11 which is SAI_PORT_ATTR_SUPPORTED_FEC_MODE_EXTENDED, and after that maybe some other attributes ?

w:- getSwitchAttribute: [0] attr 183 is NOT SUPPORTED
n:- getSwitchAttribute: [0] SAI_SWITCH_ATTR_NUMBER_OF_ACTIVE_PORTS = 2
w:- getSwitchAttribute: [0] attr 11 is NOT SUPPORTED <<----- this one

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

problem lies in this function DASH/dash-pipeline/SAI/SAI/ptf/sai_base_test.py

 829     def saveNumberOfAvaiableResources(self, debug=False):
 830         """
 831         Save number of available resources
 832         This allows to verify if all the test objects were removed
 833
 834         Args:
 835             debug (bool): enables debug option
 836         Return:
 837             dict: switch_resources dictionary with available resources
 838         """
 839
 840
 841         switch_resources = sai_thrift_get_switch_attribute(
 842             self.client,
 843             available_ipv4_route_entry=True,
 844             available_ipv6_route_entry=True,
 845             available_ipv4_nexthop_entry=True,
 846             available_ipv6_nexthop_entry=True,
 847             available_ipv4_neighbor_entry=True,
 848             available_ipv6_neighbor_entry=True,
 849             available_next_hop_group_entry=True,
 850             available_next_hop_group_member_entry=True,
 851             available_fdb_entry=True,
 852             available_ipmc_entry=True,
 853             available_snat_entry=True,
 854             available_dnat_entry=True,
 855             available_double_nat_entry=True,
 856             number_of_ecmp_groups=True,
 857             ecmp_members=True)
 858
 859
 860         if debug:
 861             self.printNumberOfAvaiableResources(switch_resources)
 862
 863         return switch_resources

it throws exception, since so far this was returning success, but when we actually return not supported, this throws exception and actual test is not executed at all

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

all this is a little bit convoluted, with that previous catch skiptest #1879 (comment), i think this should all be revisited, since single not supported on availability resource should not be deal breaker on the test, but i have too little knowledge to figure out how to properly fix this

from my understanding that exception catch should be inside each test separately, instated of sai_adapter.py

fyl @lguohan

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 7, 2023

@lguohan @andriy-kokhan @chrispsommers i added info warning message that we will remove bm dir at the end of 2023 #1889

@reshmaintel
Copy link
Contributor

reshmaintel commented Sep 7, 2023

Adding @ravi861 to help with DASH SAI PTF exception. Thanks.

@richardyu-ms
Copy link
Collaborator

richardyu-ms commented Oct 16, 2023

also i would like to get rid of global and external linking to gSwitchId and gPortMap, those are global variables in frontends needed by saithrit when is used as a library for example in syncd, normally those variables are declared in thrift server main, but in case of syncd they are implemented in syncd itself, can saithrift figure out gSwitchId when user is calling create_switch, and PortMap from get PORT_LIST attribute on switch, and then attributes on specific ports if needed?

@kcudnik , I recalled, we keep using the global variable, gSwitchId, cause saiserver might need to be used within and integrated with syncd container(saiserver v1 also supports that), like read and write some output with syncd manner, but also support rpc invocations.

@richardyu-ms
Copy link
Collaborator

I think saithrift is still referred in some places in sonic-buildimage. The community may want to clean it up first even if it's not actually involved in the build process.

@kcudnik like syncd output the sairedis.log, sai test cases also generated the logs with method name, parameters, and return values, i cannot recall if the exception will be locked, we can check on that.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Oct 16, 2023

ok, sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants