-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HLD] Event Driven TechSupport Invocation & CoreDump Mgmt #818
[HLD] Event Driven TechSupport Invocation & CoreDump Mgmt #818
Conversation
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
state = enabled|disabled; # Enable this to make the Techsupport Invocation event driven based on core-dump generation | ||
rate_limit_interval = 300; # Minimum Time in seconds, between two successive techsupport invocations. | ||
Manual Invocations will be considered as well in the cooloff calculation | ||
max_techsupport_size = 10; # A perentage value should be specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv typo? Did you mean percentage
?
This signifies maximum Size to which /var/dump/ directory can be grown until. | ||
The actual value in bytes is calculate based on the available space in the filesystem hosting /var/dump | ||
When the limit is crossed, the older techsupport dumps are incrementally deleted | ||
max_core_size = 5; # A perentage value should be specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv typo? Did you mean percentage
?
|
||
Where `<comm>` value in the command name associated with a process. comm value of a running process can be read from `/proc/[pid]/comm` file | ||
|
||
## 4. Schema Additions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv suggest to follow SONiC DB schema description conventions (ABNF):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
description "First Revision"; | ||
} | ||
|
||
typedef enable-knob { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv we already have this type in SONiC:
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-types.yang#L54
typedef admin_status {
type enumeration {
enum up;
enum down;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel "enabled|disabled" makes better sense here over "up|down".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv sorry, a typo:
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-types.yang#L119
typedef admin_mode {
type enumeration {
enum enabled;
enum disabled;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, I'll change this
leaf feature_name { | ||
description "The name of this feature"; | ||
/* TODO: Leafref once the FEATURE YANG is added*/ | ||
type string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivekreddynv please add a key length constraint
@nazariig do you have any further comments? if not can you please approve? |
#### What I did sonic-utilities changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 ) Summary of the changes: - Added the AUTO GEN CLI for the CFG DB tables required for this feature - Added the coredump_gen_handler.py & techsupport_cleanup.py scripts. - Added the UT's required for these scripts. - Enhanced coredump-compress & generate-dump scripts
#### Why I did it Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 ) Requires: sonic-net/sonic-utilities#1796. Merging in any order would be fine. Summary of the changes: - Added the YANG Models for the new tables introduces as a part of this feature. - Enhanced init_cfg.json with the default config required - Added a compile Time flag which enables/disables the config required for this feature inside the init_cfg.json - Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes an proc exit notification for the critical processes running inside the docker.
#### What I did sonic-utilities changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 ) Summary of the changes: - Added the AUTO GEN CLI for the CFG DB tables required for this feature - Added the coredump_gen_handler.py & techsupport_cleanup.py scripts. - Added the UT's required for these scripts. - Enhanced coredump-compress & generate-dump scripts
Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com
The high-level design document for the "Event Driven TechSupport Invocation & CoreDump Mgmt" enhancement