Configurable TS->Master Heartbeat timeout #2418

amitanandaiyer · 2019-09-24T22:44:33Z

TS->master heartbeat is hardcoded with a 10 sec timeout.

For use cases with thousands of tablets, the master can sometimes be slow to respond when there is contention on the master side.

It would be useful to make this timeout configurable (through a gflag).

amitanandaiyer · 2019-09-24T22:59:05Z

The TS is only supposed to send a full report the very first time it connects to the master.

amitanandaiyer · 2019-09-24T22:59:19Z

However, in this case the nodes have about 5k tablets each, so processing each of those requests at the master takes more than 10 seconds.

amitanandaiyer · 2019-09-24T22:59:33Z

This causes the RPC to time out at the TServer end, so the TServer keeps retrying the RPC, and the master has to reprocess all the tablets each time.

amitanandaiyer · 2019-09-24T22:59:48Z

So the TS is continuously sending full tablet reports to the master, and the master is getting overwhelmed because processing a full tablet report is a lot of work.
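The retry loop described in the comments above can be sketched as a small simulation. This is only an illustrative model, not YugabyteDB code: `simulate_heartbeats`, `HeartbeatOutcome`, and the 12 second processing time are hypothetical names and numbers chosen to match the "more than 10 seconds" case.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatOutcome:
    attempts: int                # heartbeat RPCs the TS sent
    full_reports_processed: int  # full tablet reports the master worked through
    registered: bool             # True once one heartbeat finished within the timeout


def simulate_heartbeats(timeout_ms: int, master_processing_ms: int,
                        max_attempts: int = 5) -> HeartbeatOutcome:
    """Model a TS that resends a full tablet report until a heartbeat succeeds.

    Assumption (simplified model): the master spends master_processing_ms on
    every full report, whether or not the TS is still waiting for the reply.
    """
    full_reports = 0
    for attempt in range(1, max_attempts + 1):
        # No heartbeat has succeeded yet, so the TS sends a full report again.
        full_reports += 1
        if master_processing_ms <= timeout_ms:
            # The reply arrives in time; later heartbeats would no longer be full.
            return HeartbeatOutcome(attempt, full_reports, True)
        # Otherwise the RPC times out at the TS, the master's work is wasted,
        # and the loop retries with another full report.
    return HeartbeatOutcome(max_attempts, full_reports, False)


# Hard-coded 10 s timeout, master needs 12 s (illustrative) per full report.
stuck = simulate_heartbeats(timeout_ms=10_000, master_processing_ms=12_000)
# Same master, but with a longer timeout of 30 s (illustrative).
ok = simulate_heartbeats(timeout_ms=30_000, master_processing_ms=12_000)
print(stuck)
print(ok)
```

In this model the 10 second timeout never lets a heartbeat complete, so every attempt costs the master another full report, while any timeout above the processing time lets the first attempt succeed.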

instead of hardcoding 10sec timeout #2418

Summary: Currently, the TS uses a hard-coded 10 sec timeout for the heartbeat RPC. If the TServer has a lot of tablets, the initial RPC reporting the full tablet report can take a long time. Use FLAGS_heartbeat_rpc_timeout_ms to make this configurable for such large clusters.

Test Plan: eyeball

Reviewers: kannan, mihnea, hector

Reviewed By: hector

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7279
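As a rough illustration of replacing a hard-coded timeout with a flag, here is a minimal Python sketch. It is not the actual change, which lives in the TServer's C++ code and uses gflags. The flag name is taken from the commit summary; the helper `parse_heartbeat_timeout_ms` is hypothetical, and the 10000 ms default is an assumption that only mirrors the previously hard-coded value (the real default may differ).

```python
import argparse


def parse_heartbeat_timeout_ms(argv):
    """Return the heartbeat RPC timeout in ms from a gflags-style argument list."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument(
        "--heartbeat_rpc_timeout_ms",
        type=int,
        default=10_000,  # assumed default, mirrors the old hard-coded 10 sec
        help="Timeout for the TS->master heartbeat RPC, in milliseconds.",
    )
    # Ignore unrelated flags so the sketch tolerates a longer command line.
    args, _unknown = parser.parse_known_args(argv)
    if args.heartbeat_rpc_timeout_ms <= 0:
        raise ValueError("heartbeat_rpc_timeout_ms must be positive")
    return args.heartbeat_rpc_timeout_ms


# No flag given: fall back to the assumed default.
print(parse_heartbeat_timeout_ms([]))
# A large cluster might raise the timeout; 30000 is an illustrative value.
print(parse_heartbeat_timeout_ms(["--heartbeat_rpc_timeout_ms=30000"]))
```

Keeping the old value as the default means existing deployments behave the same unless an operator explicitly raises the timeout.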

kmuthukk added the area/docdb YugabyteDB core features label Sep 25, 2019

amitanandaiyer added the good first issue This is a good issue to start contributing! label Sep 25, 2019

kmuthukk assigned amitanandaiyer Sep 25, 2019

amitanandaiyer closed this as completed Oct 1, 2019

ryan-ally mentioned this issue Nov 30, 2023

[Snyk] Fix for 1 vulnerabilities ryan-ally/yugabyte-db#214

Open
