Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[build] Enable streaming telemetry Docker container by default #2354

Merged
merged 4 commits into from
Mar 1, 2019
Merged

[build] Enable streaming telemetry Docker container by default #2354

merged 4 commits into from
Mar 1, 2019

Conversation

jleveque
Copy link
Contributor

@jleveque jleveque commented Dec 6, 2018

  • Enable streaming telemetry Docker container by default by setting ENABLE_SYSTEM_TELEMETRY = y in default config.

  • Make src/telemetry/debian/rules executable.

  • Add generated files in src/telemetry to .gitignore file

@jleveque jleveque self-assigned this Dec 6, 2018
@jleveque jleveque requested review from lguohan and hui-ma December 6, 2018 00:49
@nikos-github
Copy link
Collaborator

Please help answering a couple of things since you are enabling this by default for everyone.

a) What testing has been done? Please include some basic testing etc.
b) What is the default performance impact, if any, on memory and CPU? If impact is measurable, cli would have to be provided to turn on telemetry collection.

@lguohan
Copy link
Collaborator

lguohan commented Dec 6, 2018

agree with nikos.

rules/config Outdated Show resolved Hide resolved
Copy link
Collaborator

@nikos-github nikos-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my questions. Thanks.

@lguohan
Copy link
Collaborator

lguohan commented Dec 6, 2018

@hui-ma , can you provide answer to nikos question?

@hui-ma
Copy link
Contributor

hui-ma commented Dec 6, 2018

Sure. I agree with Nikos and Guohan too. @jipanyang has already provided all such info before.

  1. There is auto test suite include in the package:
    https://github.com/Azure/sonic-telemetry/blob/master/doc/grpc_telemetry.md#autotest
    We should add it to jenkin auto test for sonic-telemetry repo
    "go test -v ./gnmi_server/"

  2. The perf test has been done by Jipan and documented here. The cpu load is depending on the polling interval and amount of data is polling. With reasonable value, the cpu load is low: "Without the counter polling activity, the average cpu utilization rate in 60 seconds is 12%. The number changed to 13% with Ethernet ports counters polling every second"
    https://github.com/Azure/sonic-telemetry/blob/master/doc/grpc_telemetry.md#performance-and-scale-test

@nikos-github
Copy link
Collaborator

Is there a way to completely disable the polling through configuration in config_db.json or perhaps configuration under /etc/sonic/telemetry directory? I'm ok with packaging this by default in the image but I need the ability to disable the functionality completely.

@jleveque
Copy link
Contributor Author

jleveque commented Dec 7, 2018

@nikos-github: If you want to completely disable the functionality, you can simply add ENABLE_SYSTEM_TELEMETRY=n to your build command line. Will this not suffice for you?

@nikos-github
Copy link
Collaborator

@jleveque Your suggestion won't work for customers who obtain the image from github and won't be building it themselves.

@zhenggen-xu
Copy link
Collaborator

I guess you can just stop telemetry docker. Linux CLI. :-)

@nikos-github
Copy link
Collaborator

nikos-github commented Dec 7, 2018

@zhenggen-xu That's not how we would like to manage our customers' sonic deployments. We want to express customer intent through configuration and not by whacking dockers.
Some customers may indeed deploy a script that will stop the docker. Those will be very few, if any at all. Most will need and ask for more granular control. Especially those that rely on telemetry solutions outside of the telemetry docker.

@jipanyang
Copy link
Collaborator

@nikos-github Being able to enable/disable a docker service while online looks like a requirement from your customer, will you be able to create PR for this? I think other candidates are snmp, dhcp_relay, and radv.

@nikos-github
Copy link
Collaborator

@jipanyang Do you have a request to present from someone that is requesting the telemetry functionality that isn't building their own image? If so, lets discuss more in private. To bring this docker in the image by default, it has to come along with a config knob in config_db.json which has to work across reboots and config reloads.

Copy link
Collaborator

@jipanyang jipanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me.

One existing behavior I'm not sure:
Should telemetry docker has dependency on database service instead of swss service?
https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/telemetry.service.j2
Currently telemetry docker doesn't have real dependency on swss, though it is good to start after swss.

@jleveque
Copy link
Contributor Author

jleveque commented Mar 1, 2019

@nikos-github: We have opened an issue to create a knob in ConfigDB with which we can enable/disable the telemetry docker: #2622

@jleveque jleveque merged commit 0fd4f18 into sonic-net:master Mar 1, 2019
@jleveque jleveque deleted the enable_telemetry branch March 1, 2019 00:27
@nikos-github
Copy link
Collaborator

@jleveque When will the knob be ready? By default, the telemetry docker should be disabled. Why did we merge this when there is a valid objection to which Guohan also agreed to?

yxieca added a commit to yxieca/sonic-buildimage that referenced this pull request Jun 28, 2022
swss:
* 665bbbb 2022-06-27 | Add support for IP interface loopback action (sonic-net#2307) (HEAD -> 202205, github/202205) [Lior Avramov]
* c7f5743 2022-06-28 | [asan] suppress the static variable leaks (sonic-net#2354) [Yakiv Huryk]
* 37e2a31 2022-06-28 | [tests] [asan] add graceful stop flag (sonic-net#2347) [Yakiv Huryk]
* 5ab84cf 2022-06-28 | [202205][cherry-pick] Fix mux_acl_rule adding issue (sonic-net#2358) [bingwang-ms]

linkmgrd:
* a836ef7 2022-06-28 | Use Vlan MAC as src MAC for link prober by default  (sonic-net#93) (HEAD -> 202205) [Jing Zhang]
* a828e86 2022-06-28 | Fix inconsistent mux state (sonic-net#92) (github/202205) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
yxieca added a commit that referenced this pull request Jun 29, 2022
swss:
* 665bbbb 2022-06-27 | Add support for IP interface loopback action (#2307) (HEAD -> 202205, github/202205) [Lior Avramov]
* c7f5743 2022-06-28 | [asan] suppress the static variable leaks (#2354) [Yakiv Huryk]
* 37e2a31 2022-06-28 | [tests] [asan] add graceful stop flag (#2347) [Yakiv Huryk]
* 5ab84cf 2022-06-28 | [202205][cherry-pick] Fix mux_acl_rule adding issue (#2358) [bingwang-ms]

linkmgrd:
* a836ef7 2022-06-28 | Use Vlan MAC as src MAC for link prober by default  (#93) (HEAD -> 202205) [Jing Zhang]
* a828e86 2022-06-28 | Fix inconsistent mux state (#92) (github/202205) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Jul 7, 2022
Update sonic-swss submodule pointer to include the following:
* Port configuration incremental update support ([sonic-net#2305](sonic-net/sonic-swss#2305))
* [VS Test] Skip failing subport tests ([sonic-net#2370](sonic-net/sonic-swss#2370))
* [teammgr]: Waiting MACsec ready before doLagMemberTask ([sonic-net#2286](sonic-net/sonic-swss#2286))
* [vnetorch] [vxlanorch] fix a set of memory usage issues ([sonic-net#2352](sonic-net/sonic-swss#2352))
* [tests] [asan] add graceful stop flag ([sonic-net#2347](sonic-net/sonic-swss#2347))
* [asan] suppress the static variable leaks ([sonic-net#2354](sonic-net/sonic-swss#2354))
* Add support for IP interface loopback action ([sonic-net#2307](sonic-net/sonic-swss#2307))
* [orchagent]: srv6orch support for uSID ([sonic-net#2335](sonic-net/sonic-swss#2335))

Signed-off-by: dprital <drorp@nvidia.com>
liat-grozovik pushed a commit that referenced this pull request Jul 14, 2022
Update sonic-swss submodule pointer to include the following:
* Port configuration incremental update support ([#2305](sonic-net/sonic-swss#2305))
* [VS Test] Skip failing subport tests ([#2370](sonic-net/sonic-swss#2370))
* [teammgr]: Waiting MACsec ready before doLagMemberTask ([#2286](sonic-net/sonic-swss#2286))
* [vnetorch] [vxlanorch] fix a set of memory usage issues ([#2352](sonic-net/sonic-swss#2352))
* [tests] [asan] add graceful stop flag ([#2347](sonic-net/sonic-swss#2347))
* [asan] suppress the static variable leaks ([#2354](sonic-net/sonic-swss#2354))
* Add support for IP interface loopback action ([#2307](sonic-net/sonic-swss#2307))
* [orchagent]: srv6orch support for uSID ([#2335](sonic-net/sonic-swss#2335))

Signed-off-by: dprital <drorp@nvidia.com>
skbarista pushed a commit to skbarista/sonic-buildimage that referenced this pull request Aug 17, 2022
swss:
* 665bbbb 2022-06-27 | Add support for IP interface loopback action (sonic-net#2307) (HEAD -> 202205, github/202205) [Lior Avramov]
* c7f5743 2022-06-28 | [asan] suppress the static variable leaks (sonic-net#2354) [Yakiv Huryk]
* 37e2a31 2022-06-28 | [tests] [asan] add graceful stop flag (sonic-net#2347) [Yakiv Huryk]
* 5ab84cf 2022-06-28 | [202205][cherry-pick] Fix mux_acl_rule adding issue (sonic-net#2358) [bingwang-ms]

linkmgrd:
* a836ef7 2022-06-28 | Use Vlan MAC as src MAC for link prober by default  (sonic-net#93) (HEAD -> 202205) [Jing Zhang]
* a828e86 2022-06-28 | Fix inconsistent mux state (sonic-net#92) (github/202205) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
vivekrnv pushed a commit to vivekrnv/sonic-buildimage that referenced this pull request Aug 26, 2022
Currently, ASAN sometimes reports the BufferOrch::m_buffer_type_maps and QosOrch::m_qos_maps as leaked. However, their lifetime is the lifetime of a process so they are not really 'leaked'.
This also adds a simple way to add more suppressions later if required.

Example of ASAN report:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da9f789 in __static_initialization_and_destruction_0 /__w/2/s/orchagent/bufferorch.cpp:39
    #2 0x55ca1daa02af in _GLOBAL__sub_I_bufferorch.cpp /__w/2/s/orchagent/bufferorch.cpp:1321
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f96aa952d30 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xead30)
    #1 0x55ca1da6d2da in __static_initialization_and_destruction_0 /__w/2/s/orchagent/qosorch.cpp:80
    #2 0x55ca1da6ecf2 in _GLOBAL__sub_I_qosorch.cpp /__w/2/s/orchagent/qosorch.cpp:2000
    #3 0x55ca1e2a9cd4  (/usr/bin/orchagent+0xe89cd4)

- What I did
Added an lsan suppression config with static variable leak suppression

- Why I did it
To suppress ASAN false positives

- How I verified it
Run a test that produces the static variable leaks report and checked that report has these leaks suppressed.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants