Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncd crash on master RPC image #6622

Closed
Junchao-Mellanox opened this issue Feb 1, 2021 · 15 comments · Fixed by #6689
Closed

syncd crash on master RPC image #6622

Junchao-Mellanox opened this issue Feb 1, 2021 · 15 comments · Fixed by #6689
Assignees

Comments

@Junchao-Mellanox
Copy link
Collaborator

Junchao-Mellanox commented Feb 1, 2021

Description

syncd crash on master RPC image, error log:

Jan 28 03:50:38.688445 r-anaconda-10 INFO syncd#/supervisord: syncd /usr/bin/syncd: symbol lookup error: /usr/bin/syncd: undefined symbol: _ZN6apache6thrift6server13TSimpleServerC1ERKN5boost10shared_ptrINS0_10TProcessorEEERKNS4_INS0_9transport16TServerTransportEEERKNS4_INS9_17TTransportFactoryEEERKNS4_INS0_8protocol16TProtocolFactoryEEE

The issue is found beween hash 25e4d77 and hash 46b3bd5. I went over all changes between two hash, here is a suspicious commit: 8ce1e3e

In this commit, it removes libboost-all-dev and upgrade it to libboost1.71-dev.

Steps to reproduce the issue:
Load master RPC image

Describe the results you received:

syncd crash and not all dockers up

Describe the results you expected:

syncd should be able to up correctly

Additional information you deem important (e.g. issue happens only occasionally):

Found at hash 46b3bd5

@lguohan
Copy link
Collaborator

lguohan commented Feb 1, 2021

can you provide the suspicious commit id?

@Junchao-Mellanox
Copy link
Collaborator Author

8ce1e3e

@lguohan
Copy link
Collaborator

lguohan commented Feb 1, 2021

@tahmed-dev , can you take a look?

@tahmed-dev
Copy link
Contributor

@tahmed-dev , can you take a look?

Looking.

@tahmed-dev
Copy link
Contributor

The unmangled symbols is:
apache::thrift::server::TSimpleServer::TSimpleServer(boost::shared_ptr<apache::thrift::TProcessor> const&, boost::shared_ptr<apache::thrift::transport::TServerTransport> const&, boost::shared_ptr<apache::thrift::transport::TTransportFactory> const&, boost::shared_ptr<apache::thrift::protocol::TProtocolFactory> const&)

Looks like recent TSimpleServer is using std::shared_ptr instead of boost::shared_ptr

@tahmed-dev
Copy link
Contributor

@lguohan to solve this issue we can

  1. install thrift server v0.13.0 that moved away from boost smat ptr, or
  2. change our build to use std smart ptr here

I also made change (PR:784) to azure pipeline to pull matching boost lib to what we have in buildimage.

@lguohan
Copy link
Collaborator

lguohan commented Feb 1, 2021

for the second proposal, it is using CXXFLAGS="-DFORCE_BOOST_SMART_PTR" , I do not quite understand the proposal here.

@tahmed-dev
Copy link
Contributor

The second proposal to remove the macro definition such that it will use std::shared_ptr. It's been adopted in c++11.

On another note and for first proposal above, there is no 0.13.0 in Debian packaging and it was not back-ported to buster.

@lguohan
Copy link
Collaborator

lguohan commented Feb 1, 2021

do you know why it is added, and who?

@tahmed-dev
Copy link
Contributor

@stepanblyschak the boost flag was introduced via PR:2640 Wonder why it was forced to use boost variant?

@lguohan
Copy link
Collaborator

lguohan commented Feb 2, 2021

@liat-grozovik , can you take a look at the question?

@stepanblyschak
Copy link
Collaborator

stepanblyschak commented Feb 3, 2021

@tahmed-dev @lguohan
Initially thrift 0.9.3 used boost pointers by default. With upgrade to 0.11.0 -DFORCE_BOOST_SMART_PTR forces thrift to be backward compatible. Please see the boost usage in saithrift code: https://github.com/opencomputeproject/SAI/blob/master/test/saithrift/src/switch_sai_rpc_server.cpp#L77 which does not compile with 0.11.0 without -DFORCE_BOOST_SMART_PTR
If we want to remove -DFORCE_BOOST_SMART_PTR we need to change this line to std::shared_ptr https://github.com/opencomputeproject/SAI/blob/master/test/saithrift/src/switch_sai_rpc_server.cpp#L77 and that will break SAI thrift compatibility with 0.9.3.
Wonder if thrfit provides an alias smth like apache::thrift::shared_ptr that can preserve backwards compatibility and let us use c++ 11 variants.

@tahmed-dev
Copy link
Contributor

thanks @stepanblyschak! Indeed thrift 0.11.0 is using stdcxx name that could be either boost or std and is not hard coded. In v0.13.0, boost use for smart pointers was removed since it was adopted in C++ standard.

Let me dig further why the mangled name differed when upgraded to boost v1.71. I am under the impression that this compile time dependency would not affect mangled names. I might be wrong. I'll wait to see the recent build of syncd rpc.

@tahmed-dev
Copy link
Contributor

@Junchao-Mellanox, @daall was the issue seen with RPC syncd built after PR:6649 got merged and syncd compiled with that PR?

@Junchao-Mellanox
Copy link
Collaborator Author

I verified with hash aae9664, the issue is still there.

Feb  5 01:09:02.854833 r-tigris-13 INFO syncd#/supervisord: syncd /usr/bin/syncd: symbol lookup error: /usr/bin/syncd: undefined symbol: _ZN6apache6thrift6server13TSimpleServerC1ERKN5boost10shared_ptrINS0_10TProcessorEEERKNS4_INS0_9transport16TServerTransportEEERKNS4_INS9_17TTransportFactoryEEERKNS4_INS0_8protocol16TProtocolFactoryEEE
Feb  5 01:09:02.857073 r-tigris-13 INFO syncd#supervisord 2021-02-05 01:09:02,856 INFO exited: syncd (exit status 127; not expected)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants