
[MultiDB] update warmboot design #618

Merged
merged 8 commits into from
Jul 13, 2020

Conversation

dzhangalibaba
Contributor

  • add warmboot design for database backup

Signed-off-by: Dong Zhang d.zhang@alibaba-inc.com

NUMA node0 CPU(s): 0-3
```

- [x] Third-party scripts like redis-dump/load are very slow, usually taking **~23s** when the data size is **~40K**
Contributor

Could you make sure you also benchmark the unix socket for each method, and list the better of TCP and unix socket?

Contributor Author

TCP and unix socket have almost the same performance for this case

dbid = swsssdk.SonicDBConfig.get_dbid(dbname)
dbhost = swsssdk.SonicDBConfig.get_hostname(dbname)

# when unix_socket_path is set, redis-py connects over the unix socket
# and ignores host/port
r = redis.Redis(host=dbhost, unix_socket_path=dbsocket, db=dbid)
Contributor

Why mix TCP socket and unix socket?

Contributor Author

It is not a mix: if unix_socket_path is set, redis-py always uses the unix socket and the host and port parameters are ignored. I left host there since I sometimes change unix_socket_path to port to switch modes.

import redis
import swsssdk

dblists = swsssdk.SonicDBConfig.get_dblist()
for dbname in dblists:
Contributor

Why not iterate over all DB instances and move all the code for one instance into a Lua script?
Even better, pre-load the script into redis to minimize the critical path.

Contributor Author

Iterating over all DB instances is also fine, but the performance in this case is almost the same as iterating over all databases. Likewise, the performance with pre-loading matches what the experiment showed, since the Lua script is very short and simple, and it is only used once if we iterate over all DB instances.
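On the pre-loading point, for reference: redis caches a loaded script under the SHA1 of its source text, so pre-loading amounts to one SCRIPT LOAD at init time followed by EVALSHA calls that never resend the script body. A minimal sketch of the hash side (the Lua body is illustrative, not the PR's actual script):

```python
import hashlib

# SCRIPT LOAD returns the SHA1 hex digest of the script text; EVALSHA
# later runs the cached script by that digest, so the script body
# itself never travels on the critical path
LUA = "return redis.call('DBSIZE')"
sha = hashlib.sha1(LUA.encode()).hexdigest()
print(sha)  # 40-char hex digest, as SCRIPT LOAD would return

# with redis-py this pairs as: r.script_load(LUA) once at startup,
# then r.evalsha(sha, 0) on each use
```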

@qiluo-msft qiluo-msft requested review from lguohan and yxieca June 1, 2020 17:06

Discussed with Guohan's team offline: we won't support all warm-reboot cases when the database_config.json file changes. We only want to accept the database-instance splitting situation. For example, if there are two instances before warm reboot and four instances after, we can use the rdb file to restore all data into the four instances and then add logic to flush the unnecessary databases on each instance. This logic will go into a new script executed after the warm-reboot database restoration.
Today we already assign each database name a unique number (APPL_DB 0, ASIC_DB 1, ...) and distribute them across different redis instances by design. This makes it possible to migrate all data from all redis instances into one redis instance without any conflicts. We can then handle this single redis instance the same way we do today, since today we use only a single redis instance. The proposed new idea follows the steps below:
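The conflict-free merge can be sketched in plain Python (dicts stand in for redis logical DBs; the table names and values are hypothetical):

```python
# each redis instance holds a set of logical DBs keyed by dbid; because
# every database name (APPL_DB, ASIC_DB, ...) maps to a globally unique
# dbid, no two instances ever carry the same dbid
instance0 = {0: {"PORT_TABLE:Ethernet0": "up"}}    # APPL_DB -> dbid 0
instance1 = {1: {"ASIC_STATE:SWITCH": "oid:0x1"}}  # ASIC_DB -> dbid 1

merged = {}
for inst in (instance0, instance1):
    for dbid, data in inst.items():
        # unique dbids guarantee the merge never overwrites anything
        assert dbid not in merged
        merged[dbid] = dict(data)

print(sorted(merged))  # -> [0, 1]
```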
Contributor

I think this grand scheme is OK when multiple DB instances are used for load balancing, meaning that they have different DB IDs.

Not sure if this assumption still holds for the multi-ASIC scenario. Potentially, there we could have multiple DB instances containing the same DB IDs/table names.

Contributor

Adding a bit more thought to Ying's comment above - with the multi-ASIC architecture we would now have to run this warmboot logic per namespace, where the database docker in each namespace could have one or more redis instances.

Contributor Author

As Judy said, multi-ASIC has multiple namespaces/dockers; each namespace should follow the same logic.

@rlhui rlhui requested a review from judyjoseph June 19, 2020 05:06
db4:keys=99,expires=0,avg_ttl=0
db5:keys=3145,expires=0,avg_ttl=0
db6:keys=365,expires=0,avg_ttl=0
```
Contributor

In this output, I see that the DB IDs (e.g. db0, db1) are used to identify the databases after migration into a single instance, right?
If that is the case, even with the current single-database-docker scheme we need to ensure that the database names are unique across different redis instances, e.g. COUNTERS_DB cannot exist in both redis instance0 and redis instance1.

Contributor Author

In the single-DB case today, database names are unique across different redis instances; that is why this approach is proposed.

Contributor

Sure, thanks - it should be OK then! I am wondering if we need to add some validation to enforce this in SonicDBConfig (or similar) when we parse the database_config file?
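A minimal sketch of such a validation, assuming a parsed config shaped like the DATABASES section of database_config.json (the helper name and exact dict layout are illustrative, not the actual swsssdk internals):

```python
def validate_db_config(db_cfg):
    """Reject configs where the same dbid is assigned twice.

    db_cfg: {"DATABASES": {name: {"id": int, "instance": str}, ...}}
    """
    seen = {}
    for name, entry in db_cfg["DATABASES"].items():
        dbid = entry["id"]
        if dbid in seen:
            # a duplicate dbid would make the merge-into-one-instance
            # warmboot restore ambiguous, so fail the parse early
            raise ValueError(f"dbid {dbid} reused by {name} and {seen[dbid]}")
        seen[dbid] = name

cfg = {"DATABASES": {
    "APPL_DB": {"id": 0, "instance": "redis"},
    "ASIC_DB": {"id": 1, "instance": "redis"},
}}
validate_db_config(cfg)  # passes: every dbid is unique
```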

@dzhangalibaba
Contributor Author

Is it OK to go ahead with this proposal? Let me know if I can change the code and make progress @qiluo-msft @lguohan @yxieca @judyjoseph

@qiluo-msft
Contributor

@judyjoseph Could you help review again?

@judyjoseph
Contributor

LGTM as well. The only feedback I have relating to the multi-ASIC scenario is that we would need more CPU cycles for this DB save and restore/flushdb activity across the different namespaces. We would need to do this migration in parallel (e.g. by spawning multiple threads) across the host + namespace redis servers.
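A parallel fan-out along those lines could be sketched as below (the namespace list and per-namespace worker are hypothetical placeholders for the real save/flush/restore logic, which is redis-I/O-bound and therefore overlaps well under threads):

```python
from concurrent.futures import ThreadPoolExecutor

def migrate_namespace(ns):
    # placeholder for the per-namespace warmboot work: save the rdb,
    # restore into one instance, flush the unneeded logical DBs
    return f"{ns}: done"

namespaces = ["host", "asic0", "asic1", "asic2"]  # hypothetical set
with ThreadPoolExecutor(max_workers=len(namespaces)) as pool:
    # map preserves input order, so results line up with namespaces
    results = list(pool.map(migrate_namespace, namespaces))
print(results)
```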

Contributor

@judyjoseph judyjoseph left a comment

For multi-ASIC, we need enhancements to apply this logic per namespace.
Also, please correct a few more misspelled words in the doc, like "intance" and "perfoemacne", before merging in.

@qiluo-msft qiluo-msft merged commit 6a8dcf4 into sonic-net:master Jul 13, 2020
kartik-arista pushed a commit to kartik-arista/SONiC that referenced this pull request Jul 16, 2020
* [MultiDB] update warmboot design
* fix format
* fix format 1
* add cpu info
* fix format 2
* add tcp/unixsocket comp
* fix format
* Update multi_database_instances.md
address review

Contributor

> Today we already assigned each database name with a unique number (APPL_DB 0, ASIC_DB 1, ...) and assign them into different redis instances in design

@dzhangalibaba This is tricky/hidden information for the long run. Please help add a unit test, vs test, or PR checker to gatekeep this assumption in every version.
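A gatekeeping check along the requested lines might look like this sketch (the config fixture is hypothetical; a real test would load the shipped database_config.json instead):

```python
# hypothetical fixture standing in for the parsed database_config.json
DB_CONFIG = {
    "APPL_DB": {"id": 0, "instance": "redis0"},
    "ASIC_DB": {"id": 1, "instance": "redis1"},
    "COUNTERS_DB": {"id": 2, "instance": "redis1"},
}

def test_dbids_unique_across_instances():
    # the merge-into-one-instance warmboot design only works if no
    # dbid repeats anywhere in the config
    ids = [entry["id"] for entry in DB_CONFIG.values()]
    assert len(ids) == len(set(ids)), "duplicate dbid in database_config"

test_dbids_unique_across_instances()  # passes silently on this fixture
```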

@xjasonlyu

Hi @dzhangalibaba

I noticed that you said “We tried to create two database instances and separate the huge write into two database instances. The test result shows the performance (time) improved 20-30%” in the motivation section. I wonder if you could share more specific test cases; any other ways to simulate the tests would also be very helpful!

Also, regarding the “Third part script like redis-dump/load is very slow, usually takes ~23s when data size is ~40K, TCP and UNIX SOCKET connections almost take the same time.” that you mentioned in the wiki: what does a data size of ~40K mean (a 40 KB json file, or 40,000 keys)? I tried to reproduce the test but got very different results, so I guess it might be a problem with the test data I used.

@dzhangalibaba
Contributor Author

dzhangalibaba commented Nov 19, 2021 via email
