fix(portal-server/mis-server): skip SSH checks for deactivated clusters at startup #1347
Merged
🦋 Changeset detected. Latest commit: 9e60741. The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages.
…erIds' into fix-skip-deactivated-cluster-ssh
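This PR needs the testers to test it in a real deployment, covering scenarios such as multiple clusters going down, clusters being deactivated, and SCOW restarting.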
ddadaal approved these changes on Jul 10, 2024.
pkuhpc-review-bot added the Code-Approved and ReadyForMerge labels and removed Code-ReviewRequested on Jul 10, 2024.
pkuhpc-review-bot added the E2E-ReviewRequested label and removed ReadyForMerge on Jul 10, 2024.
After the adapter was shut down, fetch job reported an error and the management platform could not start.
lyl-available approved these changes on Jul 15, 2024.
pkuhpc-review-bot added the E2E-Approved and ReadyForMerge labels and removed E2E-ReviewRequested on Jul 15, 2024.
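@ddadaal During testing, the testers found that the automatic background execution of fetchJobs made the mis-server service go down and keep restarting, so the following changes were added.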
pkuhpc-review-bot added the Code-ReviewRequested label and removed Code-Approved and ReadyForMerge on Jul 16, 2024.
ddadaal approved these changes on Jul 16, 2024.
pkuhpc-review-bot added the Code-Approved and ReadyForMerge labels and removed Code-ReviewRequested on Jul 16, 2024.
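Background
In practice, the machine hosting a cluster may be shut down, or its network may become unavailable. After such a problem occurs, an administrator may deactivate the cluster on the management system page.
Therefore, at startup, only activated clusters should be checked against the startup requirements.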
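The startup rule above can be illustrated with a minimal TypeScript sketch. The ClusterInfo shape, its activated flag and the checkSsh callback are hypothetical stand-ins, not SCOW's actual types or functions. The sketch shows only the policy: deactivated clusters are filtered out before any SSH check runs, while activating a cluster still requires its check to pass.

```typescript
// Hypothetical cluster record; SCOW's real cluster config and status types differ.
interface ClusterInfo {
  id: string;
  loginNodes: string[];
  activated: boolean;
}

// Hypothetical check: resolves true if passwordless SSH to the login node works.
type SshCheck = (clusterId: string, node: string) => Promise<boolean>;

// Startup: deactivated clusters are skipped entirely, so a powered-off or
// unreachable cluster that an administrator has deactivated cannot block startup.
async function checkActivatedClustersAtStartup(
  clusters: ClusterInfo[],
  checkSsh: SshCheck,
): Promise<void> {
  const activated = clusters.filter((c) => c.activated);
  for (const cluster of activated) {
    for (const node of cluster.loginNodes) {
      if (!(await checkSsh(cluster.id, node))) {
        throw new Error(`Passwordless SSH check failed: ${cluster.id}/${node}`);
      }
    }
  }
}

// Activation: the cluster must pass the same check first; on failure an error
// is thrown and the cluster stays deactivated (the input object is not modified).
async function activateCluster(
  cluster: ClusterInfo,
  checkSsh: SshCheck,
): Promise<ClusterInfo> {
  for (const node of cluster.loginNodes) {
    if (!(await checkSsh(cluster.id, node))) {
      throw new Error(`Cannot activate ${cluster.id}: SSH check failed on ${node}`);
    }
  }
  return { ...cluster, activated: true };
}
```

The asymmetry is intentional in this sketch. Startup tolerates a broken cluster only after it has been deactivated, and activation refuses any cluster that fails the check.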
Changes
This PR makes the following changes:
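1. At system startup, the passwordless SSH login check runs only against the login nodes of activated clusters.
2. A clusters parameter is added to the public-key insertion interface insertKeyToNewUser. For now, because a failed key insertion does not affect user creation, the insertion is still performed on all clusters.
3. The cluster activation interface activateCluster now checks passwordless SSH login on the login nodes of the cluster being activated. If the check fails, an error is returned and the cluster cannot be activated.
4. The part of the price plugin that uses callOnAll for multi-cluster operations, which threw an error whenever an adapter request failed, is changed. An earlier issue had already changed the job price table settings page so that it no longer reports an error and shows only the price information of currently available clusters. The price plugin's multi-cluster handling therefore now also stops throwing an error and reports the failure through logger instead.
These changes fix the following two problems: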
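1. While the system is running with multiple clusters, if a request to one adapter fails, the background fetchJobs execution makes mis-server unreachable.
2. At system startup, if a request to any single adapter fails, mis-server, or portal-server which depends on mis-server, cannot start.
Once the adapter connects normally again, the price settings of the clusters that could not be fetched earlier are fetched again when the job price table is opened or when background fetchJobs runs.
After the changes
Local docker-cluster testing: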
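1. With clusters activated normally, the management system and portal system pages work correctly.
2. With login, c1 and slurm all stopped, the management system and portal system pages show errors or no cluster data.
3. Deactivating the cluster directly: the management system does not show data for the deactivated cluster. The portal system does not show it either, and a cluster that is being accessed at that moment reports an error.
4. With login, c1 and slurm all stopped and the cluster deactivated, SCOW is restarted. The cluster management page does not show the activate button because the cluster is in an abnormal state.
5. Start login, c1 and slurm again, then activate the cluster.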
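The price-plugin change described under Changes stops multi-cluster calls made through callOnAll from throwing when an adapter fails and logs the failure instead. A minimal TypeScript sketch of that pattern follows. callOnAllTolerant, the Logger interface and the ClusterResult shape are hypothetical and do not reproduce SCOW's actual callOnAll signature. The sketch calls every cluster concurrently, logs each failed adapter request and returns results only for the clusters that responded.

```typescript
// Hypothetical minimal logger interface; SCOW's real logger is not reproduced here.
interface Logger {
  warn(message: string): void;
}

// Result from one cluster whose adapter answered.
interface ClusterResult<T> {
  cluster: string;
  result: T;
}

// Hypothetical tolerant variant of a "call on all clusters" helper:
// a failed adapter request is logged and that cluster is left out,
// instead of rejecting the whole multi-cluster call.
async function callOnAllTolerant<T>(
  clusters: string[],
  call: (cluster: string) => Promise<T>,
  logger: Logger,
): Promise<ClusterResult<T>[]> {
  const settled = await Promise.all(
    clusters.map((cluster) =>
      // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
      Promise.resolve()
        .then(() => call(cluster))
        .then(
          (result): ClusterResult<T> | undefined => ({ cluster, result }),
          (err: unknown): ClusterResult<T> | undefined => {
            logger.warn(`Adapter request to cluster ${cluster} failed: ${String(err)}`);
            return undefined;
          },
        ),
    ),
  );
  // Keep only the clusters whose adapter responded successfully.
  return settled.filter((r): r is ClusterResult<T> => r !== undefined);
}
```

When an adapter becomes reachable again, the next call includes that cluster's data again, which fits the description's note that missing price settings are fetched again later. The trade-off is that callers must accept partial results.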