[MultiDB] update warmboot design #618
Conversation
```
NUMA node0 CPU(s): 0-3
```
- [x] Third-party script like redis-dump/load is very slow, usually takes **~23s** when data size is **~40K**
Could you make sure you also benchmark the unix socket for each method, and list the best between TCP and unix socket.
TCP and unix socket have almost the same performance for this case.
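For reference, a minimal benchmark sketch of that comparison (the key layout and socket path are illustrative, and `hset(mapping=...)` needs redis-py >= 3.5; this is not the exact harness used in the experiments):

```python
import time
import redis

def time_bulk_write(client, n=40000):
    # Write n small hashes through one pipeline and return elapsed seconds.
    start = time.monotonic()
    with client.pipeline(transaction=False) as pipe:
        for i in range(n):
            pipe.hset("ROUTE_TABLE:10.0.%d.%d/32" % (i >> 8 & 255, i & 255),
                      mapping={"nexthop": "10.0.0.1", "ifname": "Ethernet0"})
        pipe.execute()
    return time.monotonic() - start

tcp = redis.Redis(host="127.0.0.1", port=6379, db=0)
uds = redis.Redis(unix_socket_path="/var/run/redis/redis.sock", db=0)
print("tcp :", time_bulk_write(tcp))
print("unix:", time_bulk_write(uds))
```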
```python
dbid = swsssdk.SonicDBConfig.get_dbid(dbname)
dbhost = swsssdk.SonicDBConfig.get_hostname(dbname)

r = redis.Redis(host=dbhost, unix_socket_path=dbsocket, db=dbid)
```
Why mix TCP socket and unix socket?
It is not mixed: if unix_socket_path is set, redis-py always uses the unix socket, and the host and port parameters are ignored. I left host there because I sometimes change unix_socket_path to port to switch modes.
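A minimal sketch of that redis-py behavior (the socket path is illustrative): when unix_socket_path is given, the client uses a unix-domain-socket connection and host/port are silently ignored.

```python
import redis

# unix_socket_path wins: this talks over the unix socket; host/port unused.
r_uds = redis.Redis(host="127.0.0.1", port=6379,
                    unix_socket_path="/var/run/redis/redis.sock", db=0)

# Drop unix_socket_path and the otherwise identical call falls back to TCP.
r_tcp = redis.Redis(host="127.0.0.1", port=6379, db=0)
```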
```python
import redis

dblists = swsssdk.SonicDBConfig.get_dblist()
for dbname in dblists:
```
Why not iterate over all DB instances, and move all the code for one instance into a Lua script?
Even better, pre-load the script into redis to minimize the critical path.
Iterating over all DB instances is also fine, but the performance in this case is almost the same as iterating over all databases. Also, pre-loading performs the same as the experiment shows, since the Lua script is very short and simple and is only used once if we iterate over all DB instances.
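For comparison, a sketch of the two variants being discussed, plain EVAL versus SCRIPT LOAD + EVALSHA (the script body is a trivial stand-in, not the actual migration logic, and the socket path is illustrative):

```python
import redis

r = redis.Redis(unix_socket_path="/var/run/redis/redis.sock")

script = "return redis.call('DBSIZE')"  # stand-in for the real script

# One-shot EVAL: the script text travels with every invocation.
size = r.eval(script, 0)

# Pre-loaded: SCRIPT LOAD once, then EVALSHA keeps the critical path short.
sha = r.script_load(script)
size = r.evalsha(sha, 0)
```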
Discussed with Guohan's team offline: we won't support all warmreboot cases when the database_config.json file changes; we only want to accept the database-instance splitting situation. For example, before warmreboot there are two instances and afterwards there are four; we can use the rdb files to restore all data into the four instances and then add logic to flush the unnecessary databases on each instance. This logic will live in a new script executed after the warmreboot database restoration.
Today we already assign each database name a unique number (APPL_DB 0, ASIC_DB 1, ...) and assign them to different redis instances by design. This makes it possible to migrate all data from all redis instances into one redis instance without any conflicts. Then we can handle this single redis instance the same way we do today, since today we only use a single redis instance. So the proposed new idea is the following steps:
I think this grand scheme is ok when multiple DB instances are used for load balancing, meaning that they have different DB IDs.
Not sure if this assumption still holds for the multi-ASIC scenario; potentially, there we could have multiple DB instances containing the same DB IDs/table names.
Adding a bit more thought to Ying's comment above: with the multi-ASIC architecture we would now have to run "this warmboot logic" per namespace, where the database docker in each namespace could have one or more redis instances.
As Judy said, multi-ASIC has multiple namespaces/dockers; each namespace should run the same logic.
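A rough sketch of the merge step the proposal describes, which would run once per namespace in the multi-ASIC case. It assumes redis-py, the KEYS form of MIGRATE (Redis >= 3.0.6), and that SonicDBConfig exposes get_port alongside the getters used in the snippets above; host/port values are illustrative and key batching is omitted for brevity:

```python
import redis
import swsssdk

TARGET_HOST, TARGET_PORT = "127.0.0.1", 6379  # the single merged instance

for dbname in swsssdk.SonicDBConfig.get_dblist():
    dbid = swsssdk.SonicDBConfig.get_dbid(dbname)
    host = swsssdk.SonicDBConfig.get_hostname(dbname)
    port = swsssdk.SonicDBConfig.get_port(dbname)
    if (host, port) == (TARGET_HOST, TARGET_PORT):
        continue  # this database already lives on the target instance
    src = redis.Redis(host=host, port=port, db=dbid)
    keys = src.keys("*")
    if keys:
        # Batch-move every key into the same numeric db on the target;
        # unique DB ids guarantee no key collisions after the merge.
        src.execute_command("MIGRATE", TARGET_HOST, TARGET_PORT, "",
                            dbid, 5000, "KEYS", *keys)
```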
```
db4:keys=99,expires=0,avg_ttl=0
db5:keys=3145,expires=0,avg_ttl=0
db6:keys=365,expires=0,avg_ttl=0
```
In this output, I see that the DB ids (db0, db1, etc.) are used to identify the databases after migration into a single instance?
If that is the case, even with the current single-database-docker scheme, we need to ensure that the database names are unique across different redis instances, e.g. COUNTERS_DB cannot be present in both redis instance0 and redis instance1.
In the single-DB case today, database names are unique across different redis instances; this is why this approach is proposed.
Sure, thanks, it should be ok then! I am wondering if we need to add some validation to enforce this in SonicDBConfig, or wherever we parse the database_config file?
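One possible shape for that validation: a parse-time check that no two database names share an id. The DATABASES/id layout of database_config.json is an assumption here, so treat this as a sketch rather than the actual SonicDBConfig code:

```python
import json

def validate_db_config(config_path):
    # Reject a database_config.json that assigns one DB id to two names,
    # since the merge-into-one-instance scheme relies on unique ids.
    with open(config_path) as f:
        cfg = json.load(f)
    seen = {}
    for name, info in cfg["DATABASES"].items():
        dbid = info["id"]
        if dbid in seen:
            raise ValueError("DB id %d used by both %s and %s"
                             % (dbid, seen[dbid], name))
        seen[dbid] = name
```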
Is it OK to go ahead with this proposal? Let me know if I can change the code and make progress @qiluo-msft @lguohan @yxieca @judyjoseph
@judyjoseph Could you help review again?
LGTM as well. The only feedback I have relating to the multi-ASIC scenario is that we would need more CPU cycles for this DB save and restore/flushdb activity in the different namespaces. We need to do this migration in parallel (e.g. spawning multiple threads) across the host and namespace redis servers.
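A sketch of that parallelization, assuming a hypothetical per-namespace migrate_namespace helper and illustrative namespace names:

```python
from concurrent.futures import ThreadPoolExecutor

def migrate_namespace(ns):
    # Placeholder for the per-namespace save/restore/flushdb logic.
    print("migrating namespace:", ns or "host")

# Host plus per-ASIC namespaces (names are illustrative).
namespaces = ["", "asic0", "asic1"]
with ThreadPoolExecutor(max_workers=len(namespaces)) as pool:
    list(pool.map(migrate_namespace, namespaces))
```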
address review
For multi-ASIC, we need enhancements to apply this logic per namespace.
Also, please correct a few more misspelled words in the doc, like "intance" and "perfoemacne", before merging.
* [MultiDB] update warmboot design
* fix format
* fix format 1
* add cpu info
* fix format 2
* add tcp/unixsocket comp
* fix format
* Update multi_database_instances.md

address review
Today we already assign each database name a unique number (APPL_DB 0, ASIC_DB 1, ...) and assign them to different redis instances by design
@dzhangalibaba This is tricky/hidden information in the long run. Please help add a unit test, vs test, or PR checker to gatekeep this assumption in every version.
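A minimal gatekeeper along those lines, using only the SonicDBConfig getters already shown in this thread; whether it lives as a unit test, vs test, or PR checker is open:

```python
import swsssdk

def test_db_ids_are_unique():
    # The merge scheme assumes every database name maps to a distinct
    # numeric id, so a duplicate id would silently corrupt the merge.
    names = swsssdk.SonicDBConfig.get_dblist()
    ids = [swsssdk.SonicDBConfig.get_dbid(n) for n in names]
    assert len(ids) == len(set(ids)), "duplicate DB ids in database_config"
```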
I noticed that you said "We tried to create two database instances and separate the huge write into two database instances. The test result shows the performance (time) improved 20-30%" in the motivation section, and I wonder if you could share more specific test cases; any other way to simulate the tests would be very helpful! Also, regarding "Third-party script like redis-dump/load is very slow, usually takes ~23s when data size is ~40K, TCP and UNIX SOCKET connections almost take the same time." from the wiki: what does it mean that the data size is ~40K (40KB of json file size, or 40,000 keys)? I tried to reproduce the test but got very different results, so I guess it might be a problem with the test data I used.
Hi Jason,

> I wonder if you can share more specific test cases or any other ways to simulate the tests

We focused on route download performance. For testing, what we did before was to generate a large json file containing 40K route entries, then use the swssconfig tool to load this json file into APP_DB; orchagent then does the job of programming ASIC_DB and the ASIC. After that, we measured how long it took for all 40K entries to be programmed in the ASIC. We found the performance improved ~30% using multiDB. I gave some demos earlier to Qi Luo/Shi (MSFT team) and Zhenggen Xu (LinkedIn team); you can check with them about what I did before as well, if that's convenient for you.

> what does it mean that the data size is 40K (40KB in json file size or 40,000 keys)?

It means the number of entries/keys in the json file.

Best,
DONG
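For anyone reproducing the test, a sketch of generating such a 40K-entry input file; the swssconfig entry layout used here (a table:key object plus an OP field) is an assumption, not taken from this thread:

```python
import json

entries = []
for i in range(40000):
    # 40K distinct /32 prefixes; the address pattern is arbitrary.
    prefix = "10.%d.%d.%d/32" % (i >> 16 & 255, i >> 8 & 255, i & 255)
    entries.append({
        "ROUTE_TABLE:%s" % prefix: {"nexthop": "10.0.0.1",
                                    "ifname": "Ethernet0"},
        "OP": "SET",
    })

with open("routes_40k.json", "w") as f:
    json.dump(entries, f, indent=2)
```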
Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com