Update root-cause-analysis-model-card.md (#1684)
Updates the fields and their presentation for the Model Card++ 3.0 release.

## Description
Closes 

## By Submitting this PR I confirm:
- I am familiar with the [Contributing
Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these
changes.
- When the PR is ready for review, the documentation is up to date with
these changes.
AnuradhaKaruppiah authored May 16, 2024
2 parents 58a572a + 26a82e3 commit ee9d932
Showing 5 changed files with 18 additions and 95 deletions.
6 changes: 2 additions & 4 deletions models/model-cards/abp-model-card.md
@@ -160,11 +160,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

### Is there a digital signature?
* No
9 changes: 2 additions & 7 deletions models/model-cards/dfp-model-card.md
@@ -108,10 +108,10 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
## Model Card ++ Bias Subcard

### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* None of the Above.
* None of the Above.

### Describe measures taken to mitigate against unwanted bias.
* None of the Above.
* None of the Above.

## Model Card ++ Explainability Subcard

@@ -167,13 +167,8 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Name explicit model and/or dataset restrictions.
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development.

### Are there access restrictions to systems, model, and data?
* No


## Model Card ++ Privacy Subcard


### Generatable or reverse engineerable personally-identifiable information (PII)?
* None

6 changes: 2 additions & 4 deletions models/model-cards/gnn-fsi-model-card.md
@@ -159,11 +159,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* Not Applicable

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

### Is there a digital signature?
* No
6 changes: 2 additions & 4 deletions models/model-cards/phishing-model-card.md
@@ -168,11 +168,9 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* No
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

### Is there a digital signature?

86 changes: 10 additions & 76 deletions models/model-cards/root-cause-analysis-model-card.md
@@ -21,63 +21,49 @@ limitations under the License.
# Model Overview

## Description:

* Root cause analysis is a binary classifier differentiating between ordinary logs and errors/problems/root causes in the log files. <br>

## References(s):

* Devlin J. et al. (2018), BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 <br>

## Model Architecture:

**Architecture Type:**

* Transformers <br>

**Network Architecture:**

* BERT <br>

## Input: (Enter "None" As Needed)

**Input Format:**

* CSV <br>

**Input Parameters:**

* kern.log file contents <br>

**Other Properties Related to Output:**

* N/A <br>

## Output: (Enter "None" As Needed)

**Output Format:**

* Binary Results, Root Cause or Ordinary <br>

**Output Parameters:**

* N/A <br>

**Other Properties Related to Output:**

* N/A <br>

## Software Integration:

**Runtime(s):**

* Morpheus <br>

**Supported Hardware Platform(s):** <br>

* Ampere/Turing <br>

**Supported Operating System(s):** <br>

* Linux <br>

## Model Version(s):
@@ -88,67 +74,31 @@ limitations under the License.
## Training Dataset:

**Link:**

* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv <br>

**Properties (Quantity, Dataset Descriptions, Sensor(s)):**

* kern.log files from DGX machines <br>

## Evaluation Dataset:

**Link:**

* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/validation-data/root-cause-validation-data-input.jsonlines <br>

**Properties (Quantity, Dataset Descriptions, Sensor(s)):**

* kern.log files from DGX machines <br>

## Inference:

**Engine:**

* Triton <br>

**Test Hardware:** <br>

* Other <br>

# Subcards

## Model Card ++ Bias Subcard

### What is the gender balance of the model validation data?
* Not Applicable

### What is the racial/ethnicity balance of the model validation data?
* Not Applicable

### What is the age balance of the model validation data?
* Not Applicable

### What is the language balance of the model validation data?
* Not Applicable

### What is the geographic origin language balance of the model validation data?
* Not Applicable

### What is the educational background balance of the model validation data?
* Not Applicable

### What is the accent balance of the model validation data?
* Not Applicable

### What is the face/key point balance of the model validation data?
* Not Applicable

### What is the skin/tone balance of the model validation data?
* Not Applicable

### What is the religion balance of the model validation data?
* Not Applicable

### Individuals from the following adversely impacted (protected classes) groups participate in model design and testing.
* Not Applicable

@@ -160,26 +110,24 @@ limitations under the License.
### Name example applications and use cases for this model.
* The model is primarily designed for testing purposes and serves as a small pre-trained model specifically used to evaluate and validate the Root Cause Analysis pipeline. This model is an example of customized transformer-based root cause analysis. It can be used for pipeline testing purposes. It needs to be re-trained for specific root cause analysis or predictive maintenance needs with the fine-tuning scripts in the repo. The hyperparameters can be optimised to adjust to get the best results with another dataset. The aim is to get the model to predict some false positives that could be previously unknown error types. Users can use this root cause analysis approach with other log types too. If they have known failures in their logs, they can use them to train along with ordinary logs and can detect other root causes they weren't aware of before.

### Fill in the blank for the model technique.

### Intended Users.
* This model is designed for developers seeking to test the root cause analysis pipeline with a small pre-trained model trained on a very small `kern.log` file from a DGX.

### Name who is intended to benefit from this model.

* The intended beneficiaries of this model are developers who aim to test the functionality of the DFP pipeline using synthetic datasets

### Describe the model output.
* This model output can be used as a binary result, Root cause or Ordinary

### List the steps explaining how this model works.
### Describe how this model works.
* A BERT model gets fine-tuned with the kern.log dataset and in the inference it predicts one of the binary classes. Root cause or Ordinary.

### Name the adversely impacted groups (protected classes) this has been tested to deliver comparable outcomes regardless of:
* Not Applicable

### List the technical limitations of the model.
* For different log types and content, different models need to be trained.

### Has this been verified to have met prescribed NVIDIA quality standards?
* Yes

### What performance metrics were used to affirm the model's performance?
* F1

@@ -195,10 +143,7 @@ limitations under the License.
### Link the location of the training dataset's repository.
* https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/models/datasets/training-data/root-cause-training-data.csv

### Is the model used in an application with physical safety impact?
* No

### Describe physical safety impact (if present).
### Describe the life critical impact (if present).
* None

### Was model and dataset assessed for vulnerability for potential form of attack?
@@ -210,20 +155,12 @@ limitations under the License.
### Name use case restrictions for the model.
* Different models need to be trained depending on the log types.

### Has this been verified to have met prescribed quality standards?
* No

### Name target quality Key Performance Indicators (KPIs) for which this has been tested.
* N/A

### Is the model and dataset compliant with National Classification Management Society (NCMS)?
* No

### Are there explicit model and dataset restrictions?
* It is for pipeline testing purposes.
### Describe access restrictions

### Are there access restrictions to systems, model, and data?
* No
* The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

### Is there a digital signature?
* No
@@ -232,7 +169,7 @@ limitations under the License.


### Generatable or reverse engineerable personally-identifiable information (PII)?
* Neither
* None

### Was consent obtained for any PII used?
* N/A
@@ -249,12 +186,9 @@ limitations under the License.
### If PII collected for the development of this AI model, was it minimized to only what was required?
* N/A

### Is data in dataset traceable?
### Is there data provenance?
* Original raw logs are not saved. The small sample in the repo is saved for testing the pipeline.

### Are we able to identify and trace source of dataset?
* N/A

### Does data labeling (annotation, metadata) comply with privacy laws?
* N/A

