Restore OVN DB from apiserver resources #1805

Closed
oilbeater opened this issue Aug 15, 2022 · 6 comments · Fixed by #1989
Labels
feature New network feature investigation

Comments

@oilbeater
Collaborator

Feature request

When a majority of the OVN DB replicas crash, we should have a way to rebuild the cluster from apiserver resources without data loss.

If this can be achieved, recovering from an incident becomes a matter of removing the DB data and restarting ovn-central, and we no longer need HA for the OVN DB.

@oilbeater oilbeater added feature New network feature investigation labels Aug 15, 2022
@oilbeater oilbeater changed the title Restore OVN db from apiserver resources Restore OVN DB from apiserver resources Aug 15, 2022
@oilbeater oilbeater assigned lut777 and hongzhen-ma and unassigned lut777 Aug 16, 2022
@hongzhen-ma
Collaborator

During testing, I found that after the OVN NB DB data had been synced, nbctl could query the data but sbctl returned nothing.


This needs to be fixed.

@hongzhen-ma
Collaborator

If only the ovnnb_db.db file is deleted and ovnsb_db.db is kept, then after the ovn-central pod is rebuilt and the NB DB has been restored, the SB DB data also returns to normal.

@hongzhen-ma
Collaborator

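According to the ovn-controller documentation, if the NB and SB DBs have both been deleted, the following command can be run in each ovs-ovn pod
ovs-appctl -t /var/run/ovn/ovn-controller.3721.ctl sb-cluster-state-reset
to reset the SB cluster index that ovn-controller stores locally.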
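The socket path in the command above embeds the ovn-controller PID (3721 in this example), so it differs from pod to pod and changes whenever ovn-controller restarts. Below is a minimal sketch for running the reset in an ovs-ovn container without hard-coding the PID. It assumes the socket lives in /var/run/ovn as above and that ovs-appctl is on PATH; the function names and the OVN_RUNDIR/APPCTL overrides are hypothetical, added only for illustration and testing.

```bash
#!/usr/bin/env bash
# Sketch: find ovn-controller's PID-specific control socket and ask it to
# reset its stored SB cluster state. Intended to run inside an ovs-ovn pod.
# OVN_RUNDIR and APPCTL are hypothetical overrides; the defaults match the
# command quoted in the issue.

# Print the single ovn-controller.<pid>.ctl socket in the run directory.
# Fails if zero or several sockets match, instead of guessing.
find_controller_ctl() {
  local rundir="${1:-${OVN_RUNDIR:-/var/run/ovn}}"
  local ctl found="" count=0
  for ctl in "$rundir"/ovn-controller.*.ctl; do
    [ -e "$ctl" ] || continue   # glob matched nothing: skip the literal pattern
    found="$ctl"
    count=$((count + 1))
  done
  if [ "$count" -ne 1 ]; then
    echo "expected one ovn-controller control socket in $rundir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Run sb-cluster-state-reset against the discovered socket.
reset_sb_state() {
  local ctl
  ctl="$(find_controller_ctl "$@")" || return 1
  "${APPCTL:-ovs-appctl}" -t "$ctl" sb-cluster-state-reset
}
```

Inside a pod this would be used as reset_sb_state with no arguments, or reset_sb_state <rundir> for a non-default run directory.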


@hongzhen-ma
Collaborator

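Add a check to the ovs health check that verifies ovn-controller can write to the OVN SB DB; when writes fail, run the sb-cluster-state-reset command to restore write access.
That way we only need to focus on restoring the NB DB contents, and ovn-controller will write the SB DB contents back.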
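The health-check idea above could look roughly like the sketch below. The issue does not say how the write check is implemented, so the check used here (this node's Chassis row exists in the SB DB, looked up by hostname with ovn-sbctl) is an assumption, as are the function names and the SB_CHECK_CMD/SB_RESET_CMD hooks. This is not Kube-OVN's actual health-check code.

```bash
#!/usr/bin/env bash
# Sketch of the proposed ovs-ovn health-check step (not Kube-OVN's code).
# SB_CHECK_CMD / SB_RESET_CMD are hypothetical hooks so the decision logic
# can be exercised without a running OVN deployment.

# Assumed proxy for "ovn-controller can write to the SB DB": this node's
# Chassis row exists. ovn-sbctl connection options (--db=...) are omitted
# and would have to match the deployment.
default_sb_check() {
  local name
  name="$(ovn-sbctl --bare --columns=name find Chassis hostname="\"$(hostname)\"")" || return 1
  [ -n "$name" ]
}

# Reset ovn-controller's stored SB cluster index via its PID-specific socket.
default_sb_reset() {
  local ctl
  for ctl in /var/run/ovn/ovn-controller.*.ctl; do
    [ -e "$ctl" ] || return 1   # glob matched nothing: no ovn-controller socket
    ovs-appctl -t "$ctl" sb-cluster-state-reset
    return                      # use the first socket found
  done
}

# Exit status 0 means healthy. On a failed check, trigger one reset and
# report unhealthy for this round; the next probe verifies whether writes
# have resumed.
sb_healthcheck() {
  local check="${SB_CHECK_CMD:-default_sb_check}"
  local reset="${SB_RESET_CMD:-default_sb_reset}"
  if "$check"; then
    return 0
  fi
  echo "OVN SB DB write check failed; running sb-cluster-state-reset" >&2
  "$reset" || echo "sb-cluster-state-reset failed" >&2
  return 1
}
```

Reporting unhealthy in the round that triggers the reset, rather than re-checking immediately, gives ovn-controller time to reconnect. Since a reset makes ovn-controller re-register its ports (as observed later in this thread), a real check would probably also want to avoid resetting on a single transient failure.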

@hongzhen-ma
Collaborator


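The ovs-ovn pod logs from testing show that after sb-cluster-state-reset runs, ovn-controller re-registers all port information once more, which may disrupt the continuity of application access.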

@hongzhen-ma
Collaborator

https://confluence.alauda.cn/pages/viewpage.action?pageId=130548145

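After discussion, we agreed that OVN SB DB recovery will be handled by the ovs-ovn health check, which runs the sb-cluster-state-reset command automatically.
NB DB recovery: after the ovn-central pods are restarted, delete the kube-ovn-controller pod so that it is recreated, and rely on kube-ovn-controller's initialization to restore the NB DB.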
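The agreed NB DB recovery, combined with the earlier observation that deleting only ovnnb_db.db lets the SB DB come back on its own, can be sketched as a small kubectl runbook. The namespace, pod labels, Deployment names, and in-container DB path are assumptions not stated in this issue; KUBE_OVN_NS, KUBECTL and NB_DB_FILE are hypothetical overrides.

```bash
#!/usr/bin/env bash
# Sketch of the NB DB recovery flow agreed above; not an official runbook.
# Assumptions (not confirmed in this issue): Kube-OVN runs in kube-system,
# pods carry the labels app=ovn-central / app=kube-ovn-controller, both are
# Deployments, and the NB DB file is /etc/ovn/ovnnb_db.db inside ovn-central.
KUBE_OVN_NS="${KUBE_OVN_NS:-kube-system}"
KUBECTL="${KUBECTL:-kubectl}"
NB_DB_FILE="${NB_DB_FILE:-/etc/ovn/ovnnb_db.db}"

recover_nb_db() {
  local pod
  # 1. Delete only the NB DB file on every ovn-central replica; ovnsb_db.db
  #    is kept so the SB DB can recover once the NB DB is restored.
  for pod in $("$KUBECTL" -n "$KUBE_OVN_NS" get pods -l app=ovn-central -o name); do
    "$KUBECTL" -n "$KUBE_OVN_NS" exec "$pod" -- rm -f "$NB_DB_FILE" || return 1
  done
  # 2. Recreate ovn-central so it starts with an empty NB DB, then wait
  #    (coarsely) for the Deployment to report available again.
  "$KUBECTL" -n "$KUBE_OVN_NS" delete pods -l app=ovn-central || return 1
  "$KUBECTL" -n "$KUBE_OVN_NS" rollout status deployment/ovn-central || return 1
  # 3. Recreate kube-ovn-controller; its startup initialization rewrites the
  #    NB DB from apiserver resources.
  "$KUBECTL" -n "$KUBE_OVN_NS" delete pods -l app=kube-ovn-controller || return 1
  "$KUBECTL" -n "$KUBE_OVN_NS" rollout status deployment/kube-ovn-controller
}
```

Each step stops the runbook on failure, so a half-finished recovery is not silently followed by a kube-ovn-controller restart.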

Projects
No open projects
Status: Done
3 participants