[orchdaemon]: Fixed sairedis record file rotation #2299

Merged: 7 commits, Oct 3, 2022

bacrossland
Contributor

What I did
Fixes Azure/sonic-buildimage#8162.

Moved the sairedis record file rotation logic out of flush() to fix the issue.

Why I did it
The sairedis record file handle was not being released on rotation. The release logic lived inside flush(), which is called only when a select timeout is triggered, so without a timeout the handle was never released. I moved the logic into its own function, which is called from the start() loop.
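
A minimal sketch of the control flow described above, written in Python for brevity rather than taken from the orchagent source. Every name in it (`Recorder`, `run_loop`, `rotate_if_requested`, the `"data"` and `"timeout"` strings) is hypothetical, and it assumes the new function runs on each pass of the loop:

```python
from dataclasses import dataclass


@dataclass
class Recorder:
    """Stand-in for the record file; counts how often it was reopened."""
    rotate_requested: bool = False
    reopen_count: int = 0

    def rotate_if_requested(self) -> None:
        # Stands in for "close the old handle and reopen the file".
        if self.rotate_requested:
            self.rotate_requested = False
            self.reopen_count += 1


def run_loop(recorder: Recorder, select_results, rotate_in_loop: bool) -> None:
    """Simulate the daemon loop over a sequence of select() outcomes.

    rotate_in_loop=False models the old behavior (rotation only inside the
    timeout/flush branch); rotate_in_loop=True models the fixed behavior.
    """
    for result in select_results:
        if rotate_in_loop:
            # Fixed behavior: checked on every iteration.
            recorder.rotate_if_requested()
        if result == "timeout":
            # This branch models flush(); the old rotation check lived here.
            if not rotate_in_loop:
                recorder.rotate_if_requested()
            continue
        # Otherwise a selectable was ready and would be processed here.
```

With a pending rotation and a stream of `"data"` results that never times out, the old variant leaves `reopen_count` at 0, while the fixed variant reopens the file once on the first iteration.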

How I verified it
Ran a script to fill the log and verified that rotation was happening correctly.
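
The script itself is not included in this PR. As an illustration only, one hypothetical way to detect the symptom (a handle still pointing at the rotated-away file) on a POSIX system is to compare the inode behind the open handle with the inode currently at the path. The helper names and the `record.rec` file name below are assumptions, not part of the change:

```python
import os
import tempfile


def handle_is_stale(fileobj, path: str) -> bool:
    """Return True if the open handle no longer refers to the file at path."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True
    held = os.fstat(fileobj.fileno())
    return (held.st_dev, held.st_ino) != (on_disk.st_dev, on_disk.st_ino)


def simulate_rotation():
    """Rotate a file by rename and report staleness before and after reopen."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "record.rec")  # illustrative file name
        handle = open(path, "a")
        before = handle_is_stale(handle, path)

        # What a rotation tool typically does: move the file aside and
        # create a fresh one at the original path.
        os.rename(path, path + ".1")
        open(path, "a").close()
        after_rotate = handle_is_stale(handle, path)

        # What the writer must do to release the old handle.
        handle.close()
        handle = open(path, "a")
        after_reopen = handle_is_stale(handle, path)
        handle.close()
        return before, after_rotate, after_reopen
```

A writer that never reopens the file keeps appending to the renamed file, which is the behavior the check flags between the rename and the reopen.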

Signed-off-by: Bryan Crossland bryan.crossland@target.com
@bacrossland bacrossland requested a review from prsunny as a code owner May 27, 2022 19:51
@ghost

ghost commented May 27, 2022

CLA assistant check
All CLA requirements met.

@lguohan
Contributor

lguohan commented May 28, 2022

@kcudnik , can you review this?

@lguohan lguohan requested a review from kcudnik May 28, 2022 19:25
kcudnik
kcudnik previously approved these changes May 31, 2022
@bacrossland
Contributor Author

Pipeline fixes are still broken:

@kcudnik @prsunny The errors that are happening on this PR are not related to the code change it makes. The failures are related to problems already in the master branch or to problems in the build pipeline.

Path does not exist: /home/vsts/work/1/a/gcov_output

How do you want me to proceed so this PR can be merged?

@kcudnik
Contributor

kcudnik commented Jun 6, 2022

Please take a look at this PR too: sonic-net/sonic-sairedis#1058

@Blueve

Blueve commented Jul 12, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Blueve

Blueve commented Jul 13, 2022

@bacrossland Build has passed. Could you add UT for your change?
https://dev.azure.com/mssonic/build/_build/results?buildId=121405&view=codecoverage-tab

@bacrossland
Contributor Author

Thank you @Blueve. I will get that unit test added.

@bacrossland
Contributor Author

/easycla

@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 8, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@Blueve

Blueve commented Aug 17, 2022

/azp run coverage.Azure.sonic-swss.amd64

@azure-pipelines

No pipelines are associated with this pull request.

@Blueve

Blueve commented Aug 17, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Blueve

Blueve commented Aug 17, 2022

@kcudnik The UT coverage for this repo might not be accurate right now, and most PRs do not have UT. Do you think we can merge this PR without UT?

@kcudnik
Contributor

kcudnik commented Aug 22, 2022

No, please add corresponding tests to fix code coverage. We don't want to skip this; the threshold must be met.

@bacrossland
Contributor Author

@kcudnik I'm setting aside time next week to fix this PR and get the testing in.

@kcudnik
Contributor

kcudnik commented Sep 18, 2022

Sure, let's fix this this week.

kcudnik
kcudnik previously approved these changes Sep 20, 2022
@bacrossland
Contributor Author

Merged in master.

@bacrossland
Contributor Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 2299 in repo sonic-net/sonic-swss

@bacrossland
Contributor Author

@kcudnik @Blueve I've added the UT for code coverage. The PR is now failing on files missing from swss-common. I don't have permission to rerun the tests. Can you rerun the pipeline to help clear up these issues? Thanks.

@Blueve

Blueve commented Oct 1, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bacrossland
Contributor Author

bacrossland commented Oct 1, 2022

The LGTM analysis is failing on this. It's coming from swss-common:

[2022-10-01 09:09:02] [build-stderr] In file included from defaultvalueprovider.cpp:10:
[2022-10-01 09:09:02] [build-stderr] defaultvalueprovider.h:8:10: fatal error: libyang/libyang.h: No such file or directory
[2022-10-01 09:09:02] [build-stderr]     8 | #include <libyang/libyang.h>
[2022-10-01 09:09:02] [build-stderr]       |          ^~~~~~~~~~~~~~~~~~~
[2022-10-01 09:09:02] [build-stderr] compilation terminated.
[2022-10-01 09:09:02] [build-stderr] make[3]: *** [Makefile:787: libswsscommon_la-defaultvalueprovider.lo] Error 1
[2022-10-01 09:09:02] [build-stderr] make[3]: *** Waiting for unfinished jobs....
[2022-10-01 09:09:06] [build-stdout] make[3]: Leaving directory '/opt/src/sonic-swss-common/common'
[2022-10-01 09:09:06] [build-stderr] make[2]: *** [Makefile:440: all-recursive] Error 1
[2022-10-01 09:09:06] [build-stdout] make[2]: Leaving directory '/opt/src/sonic-swss-common'
[2022-10-01 09:09:06] [build-stderr] make[1]: *** [Makefile:372: all] Error 2
[2022-10-01 09:09:06] [build-stdout] make[1]: Leaving directory '/opt/src/sonic-swss-common'
[2022-10-01 09:09:06] [build-stderr] dh_auto_build: error: make -j4 returned exit code 2
[2022-10-01 09:09:06] [build-stderr] make: *** [debian/rules:33: build] Error 25
[2022-10-01 09:09:06] [build-stderr] dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
[2022-10-01 09:09:06] [ERROR] Spawned process exited abnormally (code 2; tried to run: [/opt/work/lgtm-workspace/lgtm/extract.sh])
A fatal error occurred: Exit status 2 from command: [/opt/work/lgtm-workspace/lgtm/extract.sh]

@bacrossland
Contributor Author

The vstest run is failing on this, with a timeout in test_p2mp_tunnel:

test_p2mp_tunnel failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_p2mp_tunnel failed; it passed 0 out of the required 1 times.
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /usr/local/lib/python3.8/dist-packages/six.py:718>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_vlan_extension failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	Operation timed out after 30 seconds with result None
	[<TracebackEntry /usr/local/lib/python3.8/dist-packages/six.py:718>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1803>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:1770>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:405>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:462>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:500>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:105>, <TracebackEntry /agent/_work/1/s/tests/conftest.py:126>, <TracebackEntry /agent/_work/1/s/tests/dvslib/dvs_common.py:60>]
test_vlan_extension failed; it passed 0 out of the required 1 times.
	<class 'AssertionError'>
	Wrong number of created entries.
	[<TracebackEntry /agent/_work/1/s/tests/test_evpn_tunnel_p2mp.py:65>, <TracebackEntry /agent/_work/1/s/tests/evpn_tunnel.py:524>, <TracebackEntry /agent/_work/1/s/tests/evpn_tunnel.py:50>]

Also here: an assertion failed in the IPv6 link-local neighbor add/remove test:

test_NeighborAddRemoveIpv6LinkLocal failed (1 runs remaining out of 2).
	<class 'AssertionError'>
	assert 2 == 4
  -2
  +4
	[<TracebackEntry /agent/_work/1/s/tests/test_ipv6_link_local.py:63>]

@bacrossland
Contributor Author

Pulled in the latest commits, which have fixes for the missing libyang.

@bacrossland
Contributor Author

@kcudnik @Blueve @prsunny Everything is passing. Ready for a review and a merge.

@Blueve Blueve merged commit 24d29f1 into sonic-net:master Oct 3, 2022
@yxieca
Contributor

yxieca commented Oct 3, 2022

@bacrossland can you raise a separate PR for 202205 branch? This change cannot be cherry-picked to 202205 branch cleanly.

@qiluo-msft
Contributor

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

@bacrossland
Contributor Author

Sure. I'll put those in later tonight or tomorrow morning.

@bacrossland
Contributor Author

PR 202205 Release: #2480

@bacrossland
Contributor Author

@qiluo-msft 202012 is going to take more work, as that release is missing things I relied on for the unit tests I put in master. I have to rework my unit tests to make them work for that older release. I could also put in a PR without the unit tests, but that would be less desirable.

@bacrossland
Contributor Author

PR 202012 Release: #2481

Pterosaur pushed a commit to Pterosaur/sonic-swss that referenced this pull request Nov 5, 2022
* [orchdaemon]: Fixed sairedis record file rotation

* What I did
Fix sonic-net/sonic-buildimage#8162
Moved sairedis record file rotation logic out of flush() to fix issue.

Why I did it
Sairedis record file was not releasing the file handle on rotation. This is because the file handle release was inside the flush() which was only being called if a select timeout was triggered. Moved the logic to its own function which is called in the start() loop.

How I verified it
Ran a script to fill log and verified that rotation was happening correctly.

Signed-off-by: Bryan Crossland bryan.crossland@target.com
@liuh-80
Contributor

liuh-80 commented Feb 2, 2023

202012 branch PR merged: #2481
