From 19c054e0e3db64bd69dc0ae884c79e5c39244e3a Mon Sep 17 00:00:00 2001 From: Xiaoguang Chen Date: Thu, 26 Jan 2023 18:41:23 +0000 Subject: [PATCH] feat: SageMaker Clarify Model Monitors for Endpoint (JSON Lines Format) The commit includes two new example notebooks demonstrating how to schedule a SageMaker Clarify Model Monitor for a SageMaker real-time inference endpoint of which the requests and responses are in SageMaker Dense Format JSON Lines. --- sagemaker-clarify/index.rst | 2 + ...r-Monitoring-Bias-Drift-for-Endpoint.ipynb | 1949 +++++++++++++++++ ...ature-Attribution-Drift-for-Endpoint.ipynb | 1561 +++++++++++++ .../model/ll-adult-prediction-model.tar.gz | Bin 0 -> 950 bytes .../test_data/test-dataset.jsonl | 334 +++ .../test_data/validation-dataset.jsonl | 666 ++++++ sagemaker_model_monitor/index.rst | 4 +- 7 files changed, 4515 insertions(+), 1 deletion(-) create mode 100644 sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint.ipynb create mode 100644 sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint.ipynb create mode 100644 sagemaker_model_monitor/fairness_and_explainability_jsonlines/model/ll-adult-prediction-model.tar.gz create mode 100644 sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/test-dataset.jsonl create mode 100644 sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/validation-dataset.jsonl diff --git a/sagemaker-clarify/index.rst b/sagemaker-clarify/index.rst index 2fbd679632..cc28cdeb43 100644 --- a/sagemaker-clarify/index.rst +++ b/sagemaker-clarify/index.rst @@ -27,6 +27,8 @@ SageMaker Clarify Model Monitoring :maxdepth: 1 ../sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability + ../sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint + 
../sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint SageMaker Clarify Online Explainability --------------------------------------- diff --git a/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint.ipynb b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint.ipynb new file mode 100644 index 0000000000..0905fefbfe --- /dev/null +++ b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint.ipynb @@ -0,0 +1,1949 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5a524a4c-5a39-4b6b-abb1-1c8e1b2de84c", + "metadata": {}, + "source": [ + "# Amazon SageMaker Clarify Model Bias Monitor - JSON Lines Format" + ] + }, + { + "cell_type": "markdown", + "id": "4eaae7a8-2ab1-4f7c-8cb2-6b23606c58c1", + "metadata": {}, + "source": [ + "## Runtime\n", + "\n", + "This notebook takes approximately 60 minutes to run." + ] + }, + { + "cell_type": "markdown", + "id": "a0a2c6a4-a249-40bf-adbc-8bd00fb06cfe", + "metadata": { + "tags": [] + }, + "source": [ + "## Introduction" + ] + }, + { + "cell_type": "markdown", + "id": "1879bacd-fedd-434a-8094-40cd48f5f140", + "metadata": {}, + "source": [ + "[Amazon SageMaker Model Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) continuously monitors the quality of Amazon SageMaker machine learning models in production. It enables developers to set alerts for when there are deviations in the model quality. Early and pro-active detection of these deviations enables corrective actions, such as retraining models, auditing upstream systems, or fixing data quality issues without having to monitor models manually or build additional tooling. 
\n", + "\n", + "[Amazon SageMaker Clarify Model Bias Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) is a model monitor that helps data scientists and ML engineers monitor predictions for bias on a regular basis. Bias can be introduced or exacerbated in deployed ML models when the training data differs from the data that the model sees during deployment (that is, the live data). These kinds of changes in the live data distribution might be temporary (for example, due to some short-lived, real-world events) or permanent. In either case, it might be important to detect these changes. For example, the outputs of a model for predicting home prices can become biased if the mortgage rates used to train the model differ from current, real-world mortgage rates. With bias drift detection capabilities in model monitor, when SageMaker detects bias beyond a certain threshold, it automatically generates metrics that you can view in SageMaker Studio and through Amazon CloudWatch alerts. \n", + "\n", + "This notebook demonstrates the process for setting up a model monitor for continuous monitoring of bias drift of the data and model of a [SageMaker real-time inference endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html). The model input and output are in [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify model monitor also supports analyzing CSV data, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability.ipynb).\n", + "\n", + "In general, you can use the model bias monitor for real-time inference endpoint in this way,\n", + "\n", + "1. Enable the endpoint for data capture. 
Then, when the customer invokes the endpoint, the endpoint saves the invocations to a data capture S3 location. \n", + "1. Schedule a model bias monitor to monitor the endpoint (to be more specific, the data capture S3 location) and a ground truth S3 location.\n", + "1. You need to regularly fetch the captured data, label it, and then upload the ground truth labels to the ground truth S3 URI.\n", + "\n", + "The monitor executes processing jobs regularly to merge the captured data and ground truth data, do bias analysis for the merged data, and then generate analysis reports and publish metrics to CloudWatch." + ] + }, + { + "cell_type": "markdown", + "id": "a4eed2c2-4e67-49cd-8b16-01d10c0acdb0", + "metadata": {}, + "source": [ + "## General Setup" + ] + }, + { + "cell_type": "markdown", + "id": "56e754c8-d82a-49a3-9967-d7a487a42549", + "metadata": {}, + "source": [ + "The notebook uses the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk). The following cell upgrades the SDK and its dependencies. Then you may need to restart the kernel and rerun the notebook to pick up the up-to-date APIs, if the notebook is executed in the SageMaker Studio." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e815029f-6166-40f6-a5dd-da2358f8b7fa", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -U sagemaker\n", + "!pip install -U boto3\n", + "!pip install -U botocore" + ] + }, + { + "cell_type": "markdown", + "id": "43f20cf6-1672-45ab-966b-5db2d51aad53", + "metadata": {}, + "source": [ + "### Imports\n", + "\n", + "The following cell imports the APIs to be used by the notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "21f01570-2eee-46ef-b044-8b65569c26b7", + "metadata": {}, + "outputs": [], + "source": [ + "import sagemaker\n", + "import pandas as pd\n", + "import datetime\n", + "import json\n", + "import random\n", + "import threading\n", + "import time\n", + "import pprint" + ] + }, + { + "cell_type": "markdown", + "id": "5baa9278-a1c9-427c-a9d9-5ddab19bcd49", + "metadata": {}, + "source": [ + "### Handful of configuration\n", + "\n", + "To begin, ensure that these prerequisites have been completed.\n", + "\n", + "* Specify an AWS Region to host the model.\n", + "* Specify an IAM role to execute jobs.\n", + "* Define the S3 URIs that stores the model file, input data and output data. For demonstration purposes, this notebook uses the same bucket for them. In reality, they could be separated with different security policies." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "74b11f7c-e9cd-4321-8de5-27ca6dd85d01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "AWS region: us-west-2\n", + "RoleArn: arn:aws:iam::000000000000:role/service-role/AmazonSageMaker-ExecutionRole-20200714T163791\n", + "Demo Bucket: sagemaker-us-west-2-000000000000\n", + "Demo Prefix: sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6\n", + "Demo S3 key: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6\n", + "The endpoint will save the captured data to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/data-capture\n", + "You should upload the ground truth data to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/ground-truth\n", + "The baselining job will save the analysis results to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/baselining-output\n", + "The monitor will save the analysis results to: 
s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output\n" + ] + } + ], + "source": [ + "sagemaker_session = sagemaker.Session()\n", + "s3_client = sagemaker_session.boto_session.client(\"s3\")\n", + "\n", + "region = sagemaker_session.boto_region_name\n", + "print(f\"AWS region: {region}\")\n", + "\n", + "role = sagemaker.get_execution_role()\n", + "print(f\"RoleArn: {role}\")\n", + "\n", + "# A different bucket can be used, but make sure the role for this notebook has\n", + "# the s3:PutObject permissions. This is the bucket into which the data is captured\n", + "bucket = sagemaker_session.default_bucket()\n", + "print(f\"Demo Bucket: {bucket}\")\n", + "prefix = sagemaker.utils.unique_name_from_base(\"sagemaker/DEMO-ClarifyModelMonitor\")\n", + "print(f\"Demo Prefix: {prefix}\")\n", + "s3_key = f\"s3://{bucket}/{prefix}\"\n", + "print(f\"Demo S3 key: {s3_key}\")\n", + "\n", + "data_capture_s3_uri = f\"{s3_key}/data-capture\"\n", + "ground_truth_s3_uri = f\"{s3_key}/ground-truth\"\n", + "baselining_output_s3_uri = f\"{s3_key}/baselining-output\"\n", + "monitor_output_s3_uri = f\"{s3_key}/monitor-output\"\n", + "\n", + "print(f\"The endpoint will save the captured data to: {data_capture_s3_uri}\")\n", + "print(f\"You should upload the ground truth data to: {ground_truth_s3_uri}\")\n", + "print(f\"The baselining job will save the analysis results to: {baselining_output_s3_uri}\")\n", + "print(f\"The monitor will save the analysis results to: {monitor_output_s3_uri}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d7da5265-858f-4478-978b-ad592464b61d", + "metadata": {}, + "source": [ + "### Model file and data files\n", + "\n", + "This example includes a prebuilt [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model trained by [a SageMaker Clarify offline processing example 
notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-clarify/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb). The model supports [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats) (MIME type `\"application/jsonlines\"`).\n", + "\n", + "* The model input can be one or more lines; each line is a JSON object that has a \"features\" key pointing to a list of feature values concerning demographic characteristics of individuals. For example,\n", + "\n", + "```\n", + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n", + "{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\n", + "```\n", + "\n", + "* The model output has the predictions of whether a person has a yearly income that is more than $50,000. Each prediction is a JSON object that has a \"predicted_label\" key pointing to the predicted label, and a \"score\" key pointing to the confidence score. For example,\n", + "\n", + "```\n", + "{\"predicted_label\":1,\"score\":0.989977359771728}\n", + "{\"predicted_label\":1,\"score\":0.504138827323913}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "f75d26c9-0f0b-422d-97cb-b74efd5eacd6", + "metadata": {}, + "outputs": [], + "source": [ + "model_file = \"model/ll-adult-prediction-model.tar.gz\"" + ] + }, + { + "cell_type": "markdown", + "id": "dc4d1d6a-c75c-4563-9699-33de88469093", + "metadata": {}, + "source": [ + "This example includes two dataset files, both in the JSON Lines format." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f1eaa4fe-622f-4745-a3cc-52d40db8ce9f", + "metadata": {}, + "outputs": [], + "source": [ + "train_dataset_path = \"test_data/validation-dataset.jsonl\"\n", + "test_dataset_path = \"test_data/test-dataset.jsonl\"\n", + "dataset_type = \"application/jsonlines\"" + ] + }, + { + "cell_type": "markdown", + "id": "5ca1001e-0b91-4133-8bce-6710aaa33270", + "metadata": {}, + "source": [ + "The train dataset has the features and the ground truth label (pointed to by the key \"label\")," + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06c22c10-7ba8-417a-a0dc-1e152a0a3287", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[41,2,220531,14,15,2,9,0,4,1,0,0,60,38],\"label\":1}\n", + "{\"features\":[33,2,35378,9,13,2,11,5,4,0,0,0,45,38],\"label\":1}\n", + "{\"features\":[36,2,223433,12,14,2,11,0,4,1,7688,0,50,38],\"label\":1}\n", + "{\"features\":[40,2,220589,7,12,4,0,1,4,0,0,0,40,38],\"label\":0}\n", + "{\"features\":[30,2,231413,15,10,2,2,0,4,1,0,0,40,38],\"label\":1}\n" + ] + } + ], + "source": [ + "!head -n 5 $train_dataset_path" + ] + }, + { + "cell_type": "markdown", + "id": "ddebb1fd-d480-4700-8dd8-3143205331a6", + "metadata": {}, + "source": [ + "The test dataset only has features." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "9f78d463-f1ff-4483-8cf3-562bccb98a2b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n", + "{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\n", + "{\"features\":[34,2,162604,11,9,4,2,2,2,1,0,0,40,37]}\n", + "{\"features\":[20,2,258509,11,9,4,6,3,2,1,0,0,40,37]}\n", + "{\"features\":[27,2,446947,9,13,4,0,4,2,0,0,0,55,37]}\n" + ] + } + ], + "source": [ + "!head -n 5 $test_dataset_path" + ] + }, + { + "cell_type": "markdown", + "id": "a7b89b8d-5036-4bd9-8aa5-f5d638617aba", + "metadata": {}, + "source": [ + "Here are the headers of the train dataset. \"Target\" is the header of the ground truth label, and the others are the feature headers. They will be used to beautify the analysis report." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "2a843093-0548-48dd-9f82-e80af07c357e", + "metadata": {}, + "outputs": [], + "source": [ + "all_headers = [\n", + " \"Age\",\n", + " \"Workclass\",\n", + " \"fnlwgt\",\n", + " \"Education\",\n", + " \"Education-Num\",\n", + " \"Marital Status\",\n", + " \"Occupation\",\n", + " \"Relationship\",\n", + " \"Ethnic group\",\n", + " \"Sex\",\n", + " \"Capital Gain\",\n", + " \"Capital Loss\",\n", + " \"Hours per week\",\n", + " \"Country\",\n", + " \"Target\",\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "2441fc17-0299-4b11-afe7-efdb167263ad", + "metadata": {}, + "source": [ + "To verify that the execution role for this notebook has the necessary permissions to proceed, put a simple test object into the S3 bucket specified above. If this command fails, update the role to have `s3:PutObject` permission on the bucket and try again." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "dfe69a8c-9bf6-47c4-bb59-a775fd3b6934", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Success! We are all set to proceed with uploading to S3.\n" + ] + } + ], + "source": [ + "sagemaker.s3.S3Uploader.upload_string_as_file_body(\n", + " body=\"hello\",\n", + " desired_s3_uri=f\"{s3_key}/upload-test-file.txt\",\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(\"Success! We are all set to proceed with uploading to S3.\")" + ] + }, + { + "cell_type": "markdown", + "id": "7a099ef6-8d09-478d-854c-989758bad1c5", + "metadata": {}, + "source": [ + "Then upload the files to S3 so that they can be used by SageMaker." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "0f0fe183-4c83-4d22-bce5-65eba6a351e2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model file has been uploaded to s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/ll-adult-prediction-model.tar.gz\n", + "Train data is uploaded to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/validation-dataset.jsonl\n", + "Test data is uploaded to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/test-dataset.jsonl\n" + ] + } + ], + "source": [ + "model_url = sagemaker.s3.S3Uploader.upload(\n", + " local_path=model_file,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Model file has been uploaded to {model_url}\")\n", + "\n", + "train_data_s3_uri = sagemaker.s3.S3Uploader.upload(\n", + " local_path=train_dataset_path,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Train data is uploaded to: {train_data_s3_uri}\")\n", + "test_data_s3_uri = sagemaker.s3.S3Uploader.upload(\n", + " 
local_path=test_dataset_path,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Test data is uploaded to: {test_data_s3_uri}\")" + ] + }, + { + "cell_type": "markdown", + "id": "2d11cc57-8ab4-422e-9492-4126f34ef4c5", + "metadata": {}, + "source": [ + "## Real-time Inference Endpoint\n", + "\n", + "This section creates a SageMaker real-time inference endpoint to showcase the data capture capability in action. The model monitor will be scheduled for the endpoint and process the captured data.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3d295bc3-3a82-4f22-9768-29572c0ae4f3", + "metadata": { + "tags": [] + }, + "source": [ + "### Deploy the model to an endpoint\n", + "\n", + "Start by deploying the pre-trained model. Here, create a SageMaker `Model` object with the inference image and model file. Then deploy the model with the data capture configuration and wait until the endpoint is ready to serve traffic.\n", + "\n", + "`DataCaptureConfig` enables capturing the request payload and the response payload of the endpoint. The data format is passed to the `json_content_types` parameter, so the request payload and response payload are captured as plain text (without Base64 encoding)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "d0c565e0-051a-4f6c-bcb6-3dca8f4ec592", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SageMaker model name: DEMO-ll-adult-pred-model-monitor-1674105930-3d28\n", + "SageMaker endpoint name: DEMO-ll-adult-pred-model-monitor-1674105930-3d28\n", + "SageMaker Linear Learner image: 174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1\n" + ] + } + ], + "source": [ + "model_name = sagemaker.utils.unique_name_from_base(\"DEMO-ll-adult-pred-model-monitor\")\n", + "endpoint_name = model_name\n", + "print(f\"SageMaker model name: {model_name}\")\n", + "print(f\"SageMaker endpoint name: {endpoint_name}\")\n", + "\n", + "image_uri = sagemaker.image_uris.retrieve(\"linear-learner\", region, \"1\")\n", + "print(f\"SageMaker Linear Learner image: {image_uri}\")\n", + "\n", + "model = sagemaker.model.Model(\n", + " role=role,\n", + " name=model_name,\n", + " image_uri=image_uri,\n", + " model_data=model_url,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "\n", + "data_capture_config = sagemaker.model_monitor.DataCaptureConfig(\n", + " enable_capture=True,\n", + " sampling_percentage=100, # Capture 100% of the traffic\n", + " destination_s3_uri=data_capture_s3_uri,\n", + " json_content_types=[dataset_type],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c86306f2-8f15-4d39-9cbb-2f6c0e7ee978", + "metadata": {}, + "source": [ + "**NOTE**: The following cell takes about 10 minutes to deploy the model." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "77330b34-0640-4b00-b3bb-4a8ea6e9a223", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deploying model DEMO-ll-adult-pred-model-monitor-1674105930-3d28 to endpoint DEMO-ll-adult-pred-model-monitor-1674105930-3d28\n", + "------!" 
+ ] + } + ], + "source": [ + "print(f\"Deploying model {model_name} to endpoint {endpoint_name}\")\n", + "model.deploy(\n", + " initial_instance_count=1,\n", + " instance_type=\"ml.m5.xlarge\",\n", + " endpoint_name=endpoint_name,\n", + " data_capture_config=data_capture_config,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "14bf8504-bca2-4948-867a-cab4ca349bd9", + "metadata": {}, + "source": [ + "### Invoke the endpoint\n", + "\n", + "Now send data to this endpoint to get inferences in real time. The model supports mini-batch predictions, so you can put one or more records to a single request." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "44a908e5-c16f-41dc-b718-323ab5ed4268", + "metadata": {}, + "outputs": [], + "source": [ + "with open(test_dataset_path, \"r\") as f:\n", + " test_data = f.read().splitlines()" + ] + }, + { + "cell_type": "markdown", + "id": "2ccc2ed6-355a-4cdb-a44e-1463c0d9ef9f", + "metadata": {}, + "source": [ + "#### Example: Single record" + ] + }, + { + "cell_type": "markdown", + "id": "ea0e8368-37b1-41d2-b0da-0f22fee2b87e", + "metadata": {}, + "source": [ + "Request payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "52fbb63a-e1d8-414e-968a-20822305f23c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n" + ] + } + ], + "source": [ + "request_payload = test_data[0]\n", + "print(request_payload)" + ] + }, + { + "cell_type": "markdown", + "id": "f880886a-38cc-44c1-acc4-f3876956e2a8", + "metadata": {}, + "source": [ + "Response payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "87531e43-c9d1-4d9b-8019-19bec1a832eb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"predicted_label\":1,\"score\":0.989977359771728}\\n'" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + 
"source": [ + "response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=request_payload,\n", + ")\n", + "response_payload = response[\"Body\"].read().decode(\"utf-8\")\n", + "response_payload" + ] + }, + { + "cell_type": "markdown", + "id": "22fe887e-ec0d-4b2a-9c32-28d93c2e25be", + "metadata": {}, + "source": [ + "#### Example: Two records" + ] + }, + { + "cell_type": "markdown", + "id": "6094ad1c-55dd-40d1-b31f-8d47f21814c3", + "metadata": {}, + "source": [ + "Request payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "2cd41694-9e20-461f-ae85-5f792a521753", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "request_payload = \"\\n\".join(test_data[:2])\n", + "request_payload" + ] + }, + { + "cell_type": "markdown", + "id": "3ab91982-67b4-4293-86cb-bb61be2f67aa", + "metadata": {}, + "source": [ + "Response payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "fece49e7-38b9-4b33-91ca-f23fcd06dcbb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"predicted_label\":1,\"score\":0.989977359771728}\\n{\"predicted_label\":1,\"score\":0.504138827323913}\\n'" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=request_payload,\n", + ")\n", + "response_payload = response[\"Body\"].read().decode(\"utf-8\")\n", + "response_payload" + ] + }, + { + "cell_type": "markdown", + "id": 
"243eac0c-a697-42b6-a56f-c0279cc7cd57", + "metadata": {}, + "source": [ + "### View captured data\n", + "\n", + "Because data capture is enabled in the previous steps, the request and response payload, along with some additional metadata, are saved in the Amazon S3 location specified in the `DataCaptureConfig`.\n", + "\n", + "Now list the captured data files stored in Amazon S3. There should be different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:\n", + "\n", + "`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "18c649dd-40ef-4260-b499-0f3c371f970f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Waiting for captured data to show up.........................................................\n", + "Found capture data files:\n", + "s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/data-capture/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/05/28-32-105-b59d42b8-3dac-421a-a586-0c4effa143d2.jsonl\n" + ] + } + ], + "source": [ + "print(\"Waiting for captured data to show up\", end=\"\")\n", + "for _ in range(120):\n", + " captured_data_files = sorted(\n", + " sagemaker.s3.S3Downloader.list(\n", + " s3_uri=f\"{data_capture_s3_uri}/{endpoint_name}\",\n", + " sagemaker_session=sagemaker_session,\n", + " )\n", + " )\n", + " if captured_data_files:\n", + " break\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(1)\n", + "print()\n", + "print(\"Found capture data files:\")\n", + "print(\"\\n \".join(captured_data_files[-5:]))" + ] + }, + { + "cell_type": "markdown", + "id": "0b4b01fd-4df2-42ff-935e-8843f1bc568f", + "metadata": {}, + "source": [ + "Next, view the content of a single capture file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e4ad7021-4bcc-4fe1-880e-11a872941ff1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"INPUT\",\"data\":\"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\",\"encoding\":\"JSON\"},\"endpointOutput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"OUTPUT\",\"data\":\"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n\",\"encoding\":\"JSON\"}},\"eventMetadata\":{\"eventId\":\"50ef5ec7-4d6b-4606-b13d-803c7416f853\",\"inferenceTime\":\"2023-01-19T05:28:32Z\"},\"eventVersion\":\"0\"}\n", + "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"INPUT\",\"data\":\"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\\\"features\\\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\",\"encoding\":\"JSON\"},\"endpointOutput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"OUTPUT\",\"data\":\"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n{\\\"predicted_label\\\":1,\\\"score\\\":0.504138827323913}\\n\",\"encoding\":\"JSON\"}},\"eventMetadata\":{\"eventId\":\"74e40c99-2d94-4a6d-a21e-f0222769012e\",\"inferenceTime\":\"2023-01-19T05:28:32Z\"},\"eventVersion\":\"0\"}\n", + "\n" + ] + } + ], + "source": [ + "captured_data = sagemaker.s3.S3Downloader.read_file(\n", + " s3_uri=captured_data_files[-1],\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(captured_data)" + ] + }, + { + "cell_type": "markdown", + "id": "6e09cffd-111a-43a1-8429-2fa3fbce9d2e", + "metadata": {}, + "source": [ + "Finally, the contents of a single line is present below in formatted JSON to observe a little better.\n", + "\n", + "* `\"captureData\"` has two fields, `\"endpointInput\"` is the captured invocation request, and `\"endpointOutput\"` is the 
response.\n", + "* `\"eventMetadata\"` has the event ID and the inference time, plus the inference ID when the request supplies one." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "14611944-0ae1-4f9f-ab6e-4b5c74ee7f3f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"captureData\": {\n", + " \"endpointInput\": {\n", + " \"observedContentType\": \"application/jsonlines\",\n", + " \"mode\": \"INPUT\",\n", + " \"data\": \"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\\\"features\\\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\",\n", + " \"encoding\": \"JSON\"\n", + " },\n", + " \"endpointOutput\": {\n", + " \"observedContentType\": \"application/jsonlines\",\n", + " \"mode\": \"OUTPUT\",\n", + " \"data\": \"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n{\\\"predicted_label\\\":1,\\\"score\\\":0.504138827323913}\\n\",\n", + " \"encoding\": \"JSON\"\n", + " }\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": \"74e40c99-2d94-4a6d-a21e-f0222769012e\",\n", + " \"inferenceTime\": \"2023-01-19T05:28:32Z\"\n", + " },\n", + " \"eventVersion\": \"0\"\n", + "}\n" + ] + } + ], + "source": [ + "print(json.dumps(json.loads(captured_data.splitlines()[-1]), indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "4b473f92-7142-4f79-8a27-86672682a5b2", + "metadata": {}, + "source": [ + "### Start generating some artificial traffic\n", + "The cell below starts a thread to send some traffic to the endpoint. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.\n", + "\n", + "Notice the `InferenceId` attribute passed to the invocation. In this example, it is used to join the captured data with the ground truth data. If it is not available, the `eventId` is used for the join operation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "0af95cc5-9e1d-46fd-b373-16015c87be58", + "metadata": {}, + "outputs": [], + "source": [ + "class WorkerThread(threading.Thread):\n", + " def __init__(self, do_run, *args, **kwargs):\n", + " super(WorkerThread, self).__init__(*args, **kwargs)\n", + " self.__do_run = do_run\n", + " self.__terminate_event = threading.Event()\n", + "\n", + " def terminate(self):\n", + " self.__terminate_event.set()\n", + "\n", + " def run(self):\n", + " while not self.__terminate_event.is_set():\n", + " self.__do_run(self.__terminate_event)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "00e832f7-8cc7-4044-b2aa-f22c93d2078d", + "metadata": {}, + "outputs": [], + "source": [ + "def invoke_endpoint(terminate_event):\n", + " for index, record in enumerate(test_data):\n", + " response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=record,\n", + " InferenceId=str(index), # unique ID per row\n", + " )\n", + " response[\"Body\"].read()\n", + " time.sleep(1)\n", + " if terminate_event.is_set():\n", + " break\n", + "\n", + "\n", + "# Keep invoking the endpoint with test data\n", + "invoke_endpoint_thread = WorkerThread(do_run=invoke_endpoint)\n", + "invoke_endpoint_thread.start()" + ] + }, + { + "cell_type": "markdown", + "id": "c61c772d-0628-4b9f-843d-1cd631cbf99f", + "metadata": { + "tags": [] + }, + "source": [ + "## Ground Truth Data\n", + "\n", + "Besides captured data, bias drift monitoring execution also requires ground truth data. In real use cases, you should regularly label the captured data, then upload the ground truth data (labels) to designated S3 location. 
For demonstration purposes, this example notebook generates fake ground truth data following [this schema](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html), and then uploads it to `ground_truth_s3_uri`, which is another key input to the monitor. The bias drift monitoring execution will first merge the captured data and the ground truth data, and then run bias analysis on the merged data.\n", + "\n", + "Notice that **the value of the ground truth \"data\" field must be in the same format as the ground truth labels stored in the input dataset**." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "d43e06d4-32d8-451c-81f2-be1f131a5ec0", + "metadata": {}, + "outputs": [], + "source": [ + "def ground_truth_with_id(inference_id):\n", + " random.seed(inference_id) # to get consistent results\n", + " label = 1 if random.random() < 0.7 else 0 # randomly generate positive labels 70% of the time\n", + " # format required by the merge job and bias monitoring job\n", + " return {\n", + " \"groundTruthData\": {\n", + " \"data\": json.dumps(\n", + " {\"label\": label} # Also use the \"label\" key, the same as in the input dataset.\n", + " ),\n", + " \"encoding\": \"JSON\",\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": str(inference_id),\n", + " },\n", + " \"eventVersion\": \"0\",\n", + " }\n", + "\n", + "\n", + "def upload_ground_truth(upload_time):\n", + " records = [ground_truth_with_id(i) for i in range(len(test_data))]\n", + " fake_records = [json.dumps(r) for r in records]\n", + " data_to_upload = \"\\n\".join(fake_records)\n", + " target_s3_uri = f\"{ground_truth_s3_uri}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl\"\n", + " print(f\"Uploading {len(fake_records)} records to\", target_s3_uri)\n", + " sagemaker.s3.S3Uploader.upload_string_as_file_body(\n", + " body=data_to_upload,\n", + " desired_s3_uri=target_s3_uri,\n", + " sagemaker_session=sagemaker_session,\n", + " )" + ] + }, + { + "cell_type": "code", +
"execution_count": 24, + "id": "49137517-172a-45ea-b139-ae78555b47e6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Uploading 334 records to s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/ground-truth/2023/01/19/04/2933.jsonl\n" + ] + } + ], + "source": [ + "# Generate data for the last hour, in case the first monitoring execution is in this hour\n", + "upload_ground_truth(datetime.datetime.utcnow() - datetime.timedelta(hours=1))" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "573901f2-fbba-4bf0-b73c-807c44fe709b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Uploading 334 records to s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/ground-truth/2023/01/19/05/2933.jsonl\n" + ] + } + ], + "source": [ + "# Generate data once an hour\n", + "def generate_fake_ground_truth(terminate_event):\n", + " upload_ground_truth(datetime.datetime.utcnow())\n", + " for _ in range(0, 60):\n", + " time.sleep(60)\n", + " if terminate_event.is_set():\n", + " break\n", + "\n", + "\n", + "ground_truth_thread = WorkerThread(do_run=generate_fake_ground_truth)\n", + "ground_truth_thread.start()" + ] + }, + { + "cell_type": "markdown", + "id": "f8d87f96-1ab6-4ad9-bd0d-f21b18ebcded", + "metadata": {}, + "source": [ + "## Model Bias Monitor\n", + "\n", + "Similar to the other monitoring types, the standard procedure of creating a [bias drift monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) is first run a baselining job, and then schedule the monitor.\n", + "\n", + "A bias drift monitoring execution starts a merge job that joins the captured data and ground truth data together using the inference ID. 
Then a SageMaker Clarify bias analysis job is started to compute all the [pre-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bias.html) and [post-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html) on the merged data. The maximum execution time is divided equally between the two jobs. Because the notebook schedules an hourly model bias monitor, the `max_runtime_in_seconds` parameter should not exceed 1800 seconds." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "273af941-56ff-4a08-a1e1-023e2d4ec090", + "metadata": {}, + "outputs": [], + "source": [ + "model_bias_monitor = sagemaker.model_monitor.ModelBiasMonitor(\n", + " role=role,\n", + " sagemaker_session=sagemaker_session,\n", + " max_runtime_in_seconds=1800,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c47a6f66-bdd8-4815-b3ed-286035f6e4ce", + "metadata": {}, + "source": [ + "### Baselining job\n", + "\n", + "A baselining job runs predictions on the training dataset and suggests constraints. The `suggest_baseline()` method of `ModelBiasMonitor` starts a SageMaker Clarify processing job to generate the constraints.\n", + "\n", + "This step is not mandatory, but providing a constraints file to the monitor enables generation of the violations file." + ] + }, + { + "cell_type": "markdown", + "id": "b7bd931a-bacc-480b-8d2d-c363abe9943f", + "metadata": {}, + "source": [ + "#### Configurations\n", + "\n", + "Information about the input data needs to be provided to the processor." + ] + }, + { + "cell_type": "markdown", + "id": "6398d447-0ccf-4c79-a29d-8d6a54e1c034", + "metadata": {}, + "source": [ + "`DataConfig` stores information about the dataset to be analyzed, for example, the dataset file, its format (such as JSON Lines), and where to store the analysis results. 
Some points to note about this configuration for the JSON Lines dataset:\n", + "\n", + "* The parameter value `\"features\"` or `\"label\"` is **NOT** a header string. Instead, it is a `JMESPath` (https://jmespath.org) expression to locate the features list or the ground truth label in the dataset. In this example notebook, they happen to be the same as the keys in the dataset. However, if the dataset has records like the one below, then `features` should be \"data.features.values\" and `label` should be \"data.label\".\n", + "\n", + "```\n", + "{\"data\": {\"features\": {\"values\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, \"label\": 0}}\n", + "```\n", + "\n", + "* The SageMaker Clarify processing job loads the JSON Lines dataset into a tabular representation for further analysis, and the parameter `headers` is the list of column names. **The label header must be the last one in the headers list**, and the order of the feature headers must match the order of features in a record." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "fd146e26-a54c-4a31-acc9-5a406ddf8680", + "metadata": {}, + "outputs": [], + "source": [ + "features_jmespath = \"features\"\n", + "ground_truth_label_jmespath = \"label\"\n", + "data_config = sagemaker.clarify.DataConfig(\n", + " s3_data_input_path=train_data_s3_uri,\n", + " s3_output_path=baselining_output_s3_uri,\n", + " features=features_jmespath,\n", + " label=ground_truth_label_jmespath,\n", + " headers=all_headers,\n", + " dataset_type=dataset_type,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "93c9c98b-67a5-45e0-8aa5-a488e25a6de8", + "metadata": {}, + "source": [ + "`ModelConfig` is the configuration related to the model to be used for inferencing. To compute post-training bias metrics, the computation needs to get inferences from the SageMaker model. 
To accomplish this, the processing job will use the model to create an ephemeral endpoint (also known as a \"shadow endpoint\"). The processing job will delete the shadow endpoint after the computations are completed. One point to note about this configuration for the JSON Lines model input and output:\n", + "\n", + "* `content_template` is used by the SageMaker Clarify processing job to convert the tabular data to the request payload acceptable to the shadow endpoint. To be more specific, the placeholder `$features` will be replaced by **the features list** from records. The request payload of a record from the testing dataset happens to be similar to the record itself, like `{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}`, because both the dataset and the model input conform to the same format." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "3a49acc6-c6a9-46fa-aed7-e93e67fae373", + "metadata": {}, + "outputs": [], + "source": [ + "content_template = '{\"features\":$features}'\n", + "model_config = sagemaker.clarify.ModelConfig(\n", + " model_name=model_name, # The name of the SageMaker model\n", + " instance_type=\"ml.m5.xlarge\", # The instance type of the shadow endpoint\n", + " instance_count=1, # The instance count of the shadow endpoint\n", + " content_type=dataset_type, # The data format of the model input\n", + " accept_type=dataset_type, # The data format of the model output\n", + " content_template=content_template,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "ca3c02c3-0238-48c9-8f21-73ddb317c506", + "metadata": {}, + "source": [ + "`ModelPredictedLabelConfig` specifies how to extract the predicted label from the model output. The example model returns the predicted label as well as the confidence score, so there are two ways to define this configuration:\n", + "\n", + "* Set the `label` parameter to \"predicted_label\", which is the `JMESPath` expression to locate the predicted label in the model output. 
This is the approach used in this example.\n", + "* Alternatively, you can set the `probability` parameter to \"score\", which is the `JMESPath` expression to locate the confidence score in the model output. Then set the `probability_threshold` parameter to a floating-point number between 0 and 1. The post-training analysis uses it to convert a score to a binary predicted label (`0` or `1`). The default value is 0.5, which means a probability value > 0.5 indicates predicted label `1`." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "c6dc6502-8a28-4cda-a135-2c687e9097b6", + "metadata": {}, + "outputs": [], + "source": [ + "predicted_label_jmespath = \"predicted_label\"\n", + "model_predicted_label_config = sagemaker.clarify.ModelPredictedLabelConfig(\n", + " label=predicted_label_jmespath,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "506b583a-f643-45dc-bdd3-ae29120734fa", + "metadata": {}, + "source": [ + "`BiasConfig` is the configuration of the sensitive groups in the dataset. Typically, bias is measured by computing a metric and comparing it across groups. \n", + "\n", + " * The group of interest is specified using the facet parameters. With the following configuration, the baselining job will check for bias in the model's predictions with respect to gender and income. Specifically, it checks whether the model is more likely to predict that males have an annual income of over $50,000 than females. 
Although not demonstrated in this example, a bias monitor can measure bias against multiple sensitive attributes, if you provide a list of facets.\n", + " * The `group_name` parameter is used to form subgroups for the measurement of [Conditional Demographic Disparity in Labels](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-data-bias-metric-cddl.html) (CDDL) and [Conditional Demographic Disparity in Predicted Labels](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-post-training-bias-metric-cddpl.html) (CDDPL) with regard to [Simpson’s paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox)." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "0ead08ae-1867-41b9-8c0e-6202760c4175", + "metadata": {}, + "outputs": [], + "source": [ + "bias_config = sagemaker.clarify.BiasConfig(\n", + " label_values_or_threshold=[1], # the positive outcome is earning >$50,000\n", + " facet_name=\"Sex\", # the sensitive attribute is the gender\n", + " facet_values_or_threshold=[0], # the disadvantaged group is female\n", + " group_name=\"Age\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "3c9417f1-b2b2-4c23-81ba-256ff4616c5c", + "metadata": {}, + "source": [ + "#### Kick off baselining job\n", + "\n", + "Call the `suggest_baseline()` method to start the baselining job. The job computes all the [pre-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bias.html) and [post-training bias metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "9c27e74b-31f6-435a-a0d4-bef52a4cdcdb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Job Name: baseline-suggestion-job-2023-01-19-05-29-34-322\n", + "Inputs: [{'InputName': 'dataset', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/validation-dataset.jsonl', 'LocalPath': '/opt/ml/processing/input/data', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'analysis_config', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/baselining-output/analysis_config.json', 'LocalPath': '/opt/ml/processing/input/config', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]\n", + "Outputs: [{'OutputName': 'analysis_result', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/baselining-output', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model_bias_monitor.suggest_baseline(\n", + " bias_config=bias_config,\n", + " data_config=data_config,\n", + " model_config=model_config,\n", + " model_predicted_label_config=model_predicted_label_config,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9cf396d3-c7ab-4041-8820-64c5ebd15d46", + "metadata": {}, + "source": [ + "**NOTE**: The following cell waits until the baselining job is completed (in about 10 minutes). It then inspects the suggested constraints. 
This step can be skipped, because the monitor to be scheduled will automatically pick up baselining job name and wait for it before monitoring execution." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "ad0ece68-f130-4b66-b8ab-36d2916502c8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + ".................................................................................................!\n", + "Suggested constraints: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/baselining-output/analysis.json\n", + "{\n", + " \"version\": \"1.0\",\n", + " \"post_training_bias_metrics\": {\n", + " \"label\": \"Target\",\n", + " \"facets\": {\n", + " \"Sex\": [\n", + " {\n", + " \"value_or_threshold\": \"0\",\n", + " \"metrics\": [\n", + " {\n", + " \"name\": \"AD\",\n", + " \"description\": \"Accuracy Difference (AD)\",\n", + " \"value\": -0.15156641604010024\n", + " },\n", + " {\n", + " \"name\": \"CDDPL\",\n", + " \"description\": \"Conditional Demographic Disparity in Predicted Labels (CDDPL)\",\n", + " \"value\": 0.28176563733194276\n", + " },\n", + " {\n", + " \"name\": \"DAR\",\n", + " \"description\": \"Difference in Acceptance Rates (DAR)\",\n", + " \"value\": -0.09508196721311479\n", + " },\n", + " {\n", + " \"name\": \"DCA\",\n", + " \"description\": \"Difference in Conditional Acceptance (DCA)\",\n", + " \"value\": -0.5278688524590163\n", + " },\n", + " {\n", + " \"name\": \"DCR\",\n", + " \"description\": \"Difference in Conditional Rejection (DCR)\",\n", + " \"value\": 0.027874251497005953\n", + " },\n", + " {\n", + " \"name\": \"DI\",\n", + " \"description\": \"Disparate Impact (DI)\",\n", + " \"value\": 0.17798594847775176\n", + " },\n", + " {\n", + " \"name\": \"DPPL\",\n", + " \"description\": \"Difference in Positive Proportions in Predicted Labels (DPPL)\",\n", + " \"value\": 0.2199248120300752\n", + " },\n", + " {\n", + " \"name\": \"DRR\",\n", + " 
\"description\": \"Difference in Rejection Rates (DRR)\",\n", + " \"value\": 0.12565868263473046\n", + " },\n", + " {\n", + " \"name\": \"FT\",\n", + " \"description\": \"Flip Test (FT)\",\n", + " \"value\": -0.03333333333333333\n", + " },\n", + " {\n", + " \"name\": \"GE\",\n", + " \"description\": \"Generalized Entropy (GE)\",\n", + " \"value\": 0.0841186702174704\n", + " },\n", + " {\n", + " \"name\": \"RD\",\n", + " \"description\": \"Recall Difference (RD)\",\n", + " \"value\": 0.1308103661044837\n", + " },\n", + " {\n", + " \"name\": \"SD\",\n", + " \"description\": \"Specificity Difference (SD)\",\n", + " \"value\": 0.10465328014037645\n", + " },\n", + " {\n", + " \"name\": \"TE\",\n", + " \"description\": \"Treatment Equality (TE)\",\n", + " \"value\": 2.916666666666667\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + " },\n", + " \"label_value_or_threshold\": \"1\"\n", + " },\n", + " \"pre_training_bias_metrics\": {\n", + " \"label\": \"Target\",\n", + " \"facets\": {\n", + " \"Sex\": [\n", + " {\n", + " \"value_or_threshold\": \"0\",\n", + " \"metrics\": [\n", + " {\n", + " \"name\": \"CDDL\",\n", + " \"description\": \"Conditional Demographic Disparity in Labels (CDDL)\",\n", + " \"value\": 0.27459074287718793\n", + " },\n", + " {\n", + " \"name\": \"CI\",\n", + " \"description\": \"Class Imbalance (CI)\",\n", + " \"value\": 0.36936936936936937\n", + " },\n", + " {\n", + " \"name\": \"DPL\",\n", + " \"description\": \"Difference in Positive Proportions in Labels (DPL)\",\n", + " \"value\": 0.2326441102756892\n", + " },\n", + " {\n", + " \"name\": \"JS\",\n", + " \"description\": \"Jensen-Shannon Divergence (JS)\",\n", + " \"value\": 0.04508199943437752\n", + " },\n", + " {\n", + " \"name\": \"KL\",\n", + " \"description\": \"Kullback-Liebler Divergence (KL)\",\n", + " \"value\": 0.22434464102537785\n", + " },\n", + " {\n", + " \"name\": \"KS\",\n", + " \"description\": \"Kolmogorov-Smirnov Distance (KS)\",\n", + " \"value\": 0.2326441102756892\n", + " 
},\n", + " {\n", + " \"name\": \"LP\",\n", + " \"description\": \"L-p Norm (LP)\",\n", + " \"value\": 0.32900845595810163\n", + " },\n", + " {\n", + " \"name\": \"TVD\",\n", + " \"description\": \"Total Variation Distance (TVD)\",\n", + " \"value\": 0.2326441102756892\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + " },\n", + " \"label_value_or_threshold\": \"1\"\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "model_bias_monitor.latest_baselining_job.wait(logs=False)\n", + "print()\n", + "model_bias_constraints = model_bias_monitor.suggested_constraints()\n", + "print(f\"Suggested constraints: {model_bias_constraints.file_s3_uri}\")\n", + "print(\n", + " sagemaker.s3.S3Downloader.read_file(\n", + " s3_uri=model_bias_constraints.file_s3_uri,\n", + " sagemaker_session=sagemaker_session,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5545f7e0-8256-4b33-8385-741c23b9acc6", + "metadata": {}, + "source": [ + "### Monitoring Schedule\n", + "\n", + "With the above constraints collected, call the `create_monitoring_schedule()` method to schedule an hourly model bias monitor." + ] + }, + { + "cell_type": "markdown", + "id": "b99f1d50-d9ce-42c6-84da-a710bfb7b47a", + "metadata": {}, + "source": [ + "If a baselining job has been submitted, then the monitor object will automatically pick up the analysis configuration from the baselining job. 
But if the baselining step is skipped, or if the captured dataset differs in nature from the training dataset, then the analysis configuration has to be provided.\n", + "\n", + "`BiasAnalysisConfig` is a subset of the configuration of the baselining job. Many options are not needed because:\n", + "\n", + "* The model bias monitor will merge the captured data and the ground truth data, and then use the merged data as the dataset.\n", + "* Captured data already includes predictions, so there is no need to create a shadow endpoint.\n", + "* Attributes like the predicted label are provided as part of `EndpointInput`.\n", + "\n", + "Highlights:\n", + "\n", + "* From `endpoint_name`, the monitor can figure out the location of data captured by the endpoint.\n", + "* `ground_truth_s3_uri` is the location of the ground truth data.\n", + "* `features_attribute` is the `JMESPath` expression to locate the features in the model input, similar to the `features` parameter of `DataConfig`.\n", + "* `inference_attribute` is the `JMESPath` expression to locate the predicted label in the model output, similar to the `label` parameter of `ModelPredictedLabelConfig`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "8d160d3e-0482-4c4b-a171-e62eddb38b87", + "metadata": {}, + "outputs": [], + "source": [ + "schedule_expression = sagemaker.model_monitor.CronExpressionGenerator.hourly()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "1c7a1355-2997-46f2-ae02-cb00063e3661", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model bias monitoring schedule: monitoring-schedule-2023-01-19-05-37-42-411\n" + ] + } + ], + "source": [ + "model_bias_analysis_config = None\n", + "if not model_bias_monitor.latest_baselining_job:\n", + " model_bias_analysis_config = sagemaker.model_monitor.BiasAnalysisConfig(\n", + " bias_config,\n", + " headers=all_headers,\n", + " label=ground_truth_label_jmespath,\n", + " )\n", + "model_bias_monitor.create_monitoring_schedule(\n", + " analysis_config=model_bias_analysis_config,\n", + " endpoint_input=sagemaker.model_monitor.EndpointInput(\n", + " endpoint_name=endpoint_name,\n", + " destination=\"/opt/ml/processing/input/endpoint\",\n", + " features_attribute=features_jmespath, # mandatory if no baselining job\n", + " inference_attribute=predicted_label_jmespath, # mandatory if no baselining job\n", + " # look back 6 hours for captured data\n", + " start_time_offset=\"-PT6H\",\n", + " end_time_offset=\"-PT0H\",\n", + " ),\n", + " ground_truth_input=ground_truth_s3_uri,\n", + " output_s3_uri=monitor_output_s3_uri,\n", + " schedule_cron_expression=schedule_expression,\n", + ")\n", + "print(f\"Model bias monitoring schedule: {model_bias_monitor.monitoring_schedule_name}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bf22401a-4662-4063-b47f-5be6becf3c3b", + "metadata": {}, + "source": [ + "#### Wait for the first execution\n", + "\n", + "The schedule starts jobs at the previously specified intervals. 
The code below waits until the time crosses the hour boundary (in UTC) to see executions kick off.\n", + "\n", + "Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule executions. The execution might start anywhere from zero to ~20 minutes after the hour boundary. This is expected and done for load balancing in the backend." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ae00eb31-bbc7-4cf9-9fae-b323b4d380b2", + "metadata": {}, + "outputs": [], + "source": [ + "def wait_for_execution_to_start(model_monitor):\n", + " print(\n", + " \"An hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer).\"\n", + " )\n", + "\n", + " print(\"Waiting for the first execution to happen\", end=\"\")\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " while \"LastMonitoringExecutionSummary\" not in schedule_desc:\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " print()\n", + " print(\"Done! Execution has been created\")\n", + "\n", + " print(\"Now waiting for execution to start\", end=\"\")\n", + " while schedule_desc[\"LastMonitoringExecutionSummary\"][\"MonitoringExecutionStatus\"] == \"Pending\":\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(10)\n", + "\n", + " print()\n", + " print(\"Done! Execution has started\")" + ] + }, + { + "cell_type": "markdown", + "id": "16fabf1c-8458-4186-9fb2-7bfa2462b705", + "metadata": {}, + "source": [ + "**NOTE**: The following cell waits until the first monitoring execution is started. As explained above, the wait could take more than 60 minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "b512df1e-57cf-4ba3-9262-0c325c4a600e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "An hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer).\n", + "Waiting for the first execution to happen................................\n", + "Done! Execution has been created\n", + "Now waiting for execution to start..........\n", + "Done! Execution has started\n" + ] + } + ], + "source": [ + "wait_for_execution_to_start(model_bias_monitor)" + ] + }, + { + "cell_type": "markdown", + "id": "210955ae-1709-423f-98c0-ca93476eebde", + "metadata": {}, + "source": [ + "In the real world, a monitoring schedule is supposed to be active all the time. In this example, however, it can be stopped to avoid incurring extra charges. A stopped schedule will not trigger further executions, but the ongoing execution will continue. If needed, the schedule can be restarted by calling `start_monitoring_schedule()`." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "a6980d31-c96d-4850-a7fb-c8583eeac54e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Stopping Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-37-42-411\n" + ] + } + ], + "source": [ + "model_bias_monitor.stop_monitoring_schedule()" + ] + }, + { + "cell_type": "markdown", + "id": "117a4a1d-4410-4f60-b859-762f18f7370b", + "metadata": {}, + "source": [ + "#### Wait for the execution to finish\n", + "\n", + "In the previous cell, the first execution has started. This section waits for the execution to finish so that its analysis results are available. 
Here are the possible terminal states and what each of them means:\n", + "\n", + "* `Completed` - This means the monitoring execution completed, and no issues were found in the violations report.\n", + "* `CompletedWithViolations` - This means the execution completed, but constraint violations were detected.\n", + "* `Failed` - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Further examination of `FailureReason` and `ExitMessage` is necessary to identify what exactly happened.\n", + "* `Stopped` - The job exceeded the maximum runtime or was manually stopped." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "2b07426d-f805-4527-9863-1d3d664734fa", + "metadata": {}, + "outputs": [], + "source": [ + "# Waits for the schedule's last execution to reach a terminal status.\n", + "def wait_for_execution_to_finish(model_monitor):\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + " if execution_summary is not None:\n", + " print(\"Waiting for execution to finish\", end=\"\")\n", + " while execution_summary[\"MonitoringExecutionStatus\"] not in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + " \"Failed\",\n", + " \"Stopped\",\n", + " ]:\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc[\"LastMonitoringExecutionSummary\"]\n", + " print()\n", + " print(f\"Done! Execution Status: {execution_summary['MonitoringExecutionStatus']}\")\n", + " else:\n", + " print(\"Last execution not found\")" + ] + }, + { + "cell_type": "markdown", + "id": "01434010-3c04-4ef5-acd2-21a3a0035fc8", + "metadata": {}, + "source": [ + "**NOTE**: The following cell takes about 10 minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "25e36f00-f488-4a16-867f-92c53d819782", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Waiting for execution to finish..........\n", + "Done! Execution Status: CompletedWithViolations\n" + ] + } + ], + "source": [ + "wait_for_execution_to_finish(model_bias_monitor)" + ] + }, + { + "cell_type": "markdown", + "id": "442c7bbd-0af7-44a1-bec9-a94f180f6892", + "metadata": {}, + "source": [ + "#### Merged data\n", + "\n", + "Merged data is the intermediate results of bias drift monitoring execution. It is saved to JSON Lines files under the \"merge\" folder of `monitor_output_s3_uri`. Each line is a valid JSON object which combines the captured data and the ground truth data." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "b6df9816-63ad-4e44-b26d-b79fba785307", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found merged files:\n", + "s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/merge/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/05/part-00001-d8c20802-c1f9-45c3-a3fa-007f2b73f21b.c000.jsonl\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/merge/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/05/part-00002-45e374bf-f12b-4abb-988d-5b5179da218e.c000.jsonl\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/merge/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/06/part-00000-e4cbb1f0-cc6d-4f92-9397-af7f00604374.c000.jsonl\n", + " 
s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/merge/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/06/part-00001-ff724347-c6e3-4ac4-9872-c008d3dad640.c000.jsonl\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/merge/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/AllTraffic/2023/01/19/06/part-00002-f68c1866-e8b6-4c16-aba3-96669f7af976.c000.jsonl\n" + ] + } + ], + "source": [ + "merged_data_s3_uri = f\"{monitor_output_s3_uri}/merge\"\n", + "merged_data_files = sagemaker.s3.S3Downloader.list(\n", + " s3_uri=merged_data_s3_uri,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(\"Found merged files:\")\n", + "print(\"\\n \".join(merged_data_files[-5:]))" + ] + }, + { + "cell_type": "markdown", + "id": "9f71db78-5d65-4768-b5ff-461057c5f922", + "metadata": {}, + "source": [ + "The following cell prints a single line of a merged data file.\n", + "\n", + "* `eventId` is the inference ID from the captured data and the ground truth data\n", + "* `groundTruthData` is from the ground truth data\n", + "* `captureData` is from the captured data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "6581b300-4ee0-4884-aef7-bf94577c07aa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"eventVersion\": \"0\",\n", + " \"groundTruthData\": {\n", + " \"data\": \"{\\\"label\\\": 0}\",\n", + " \"encoding\": \"JSON\"\n", + " },\n", + " \"captureData\": {\n", + " \"endpointInput\": {\n", + " \"data\": \"{\\\"features\\\":[42,2,222756,9,13,0,9,4,4,1,7430,0,40,37]}\",\n", + " \"encoding\": \"JSON\",\n", + " \"mode\": \"INPUT\",\n", + " \"observedContentType\": \"application/jsonlines\"\n", + " },\n", + " \"endpointOutput\": {\n", + " \"data\": \"{\\\"predicted_label\\\":1,\\\"score\\\":0.90517783164978}\\n\",\n", + " \"encoding\": \"JSON\",\n", + " \"mode\": \"OUTPUT\",\n", + " \"observedContentType\": \"application/jsonlines\"\n", + " }\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": \"d7f04b0c-e3f3-4a55-b0be-47c4942e9ee8\",\n", + " \"inferenceId\": \"95\",\n", + " \"inferenceTime\": \"2023-01-19T06:04:56Z\"\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "merged_data_file = sagemaker.s3.S3Downloader.read_file(\n", + " s3_uri=merged_data_files[-1],\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "merged_record = merged_data_file.splitlines()[-1]\n", + "print(json.dumps(json.loads(merged_record), indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "27ecf876-5999-4c2a-adcd-0a8537f082e6", + "metadata": {}, + "source": [ + "#### Inspect execution results\n", + "\n", + "List the generated reports,\n", + "\n", + "* analysis.json includes all the bias metrics.\n", + "* report.* files are static report files to visualize the bias metrics" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "3c767cbd-78c5-433d-a850-e230cb5a55dd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Report URI: 
s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06\n", + "Found Report Files:\n", + "s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06/analysis.json\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06/constraint_violations.json\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06/report.html\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06/report.ipynb\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674105929-04d6/monitor-output/DEMO-ll-adult-pred-model-monitor-1674105930-3d28/monitoring-schedule-2023-01-19-05-37-42-411/2023/01/19/06/report.pdf\n" + ] + } + ], + "source": [ + "schedule_desc = model_bias_monitor.describe_schedule()\n", + "execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + "if execution_summary and execution_summary[\"MonitoringExecutionStatus\"] in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + "]:\n", + " last_model_bias_monitor_execution = model_bias_monitor.list_executions()[-1]\n", + " last_model_bias_monitor_execution_report_uri = (\n", + " last_model_bias_monitor_execution.output.destination\n", + " )\n", + " print(f\"Report URI: 
{last_model_bias_monitor_execution_report_uri}\")\n", + " last_model_bias_monitor_execution_report_files = sorted(\n", + " sagemaker.s3.S3Downloader.list(\n", + " s3_uri=last_model_bias_monitor_execution_report_uri,\n", + " sagemaker_session=sagemaker_session,\n", + " )\n", + " )\n", + " print(\"Found Report Files:\")\n", + " print(\"\\n \".join(last_model_bias_monitor_execution_report_files))\n", + "else:\n", + " last_model_bias_monitor_execution = None\n", + " print(\n", + " \"====STOP==== \\n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "602a2ef3-4d6c-4d93-974e-77a679fc4757", + "metadata": {}, + "source": [ + "If there are any violations compared to the baseline, they are listed here. See [Bias Drift Violations](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift-violations.html) for the schema of the file, and how violations are detected." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "a7174d2e-9ee4-437f-be9a-c9d984318b76", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{ 'version': '1.0',\n", + " 'violations': [ { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value 0.37745519271384836 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement 0.28176563733194276',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'CDDPL'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value -0.3522727272727273 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement -0.09508196721311479',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'DAR'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value -36.42045454545455 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement -0.5278688524590163',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'DCA'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value -0.07676294702237407 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement 0.027874251497005953',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'DCR'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': \"Metric value -0.14 doesn't meet the \"\n", + " 'baseline constraint requirement '\n", + " '-0.03333333333333333',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'FT'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value 0.9537540310981236 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement 0.0841186702174704',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'GE'},\n", + " { 
'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value 0.170704663945835 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement 0.1308103661044837',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'RD'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': 'Metric value 0.2792792792792793 '\n", + " \"doesn't meet the baseline constraint \"\n", + " 'requirement 0.10465328014037645',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'SD'},\n", + " { 'constraint_check_type': 'bias_drift_check',\n", + " 'description': \"Metric value Infinity doesn't meet \"\n", + " 'the baseline constraint requirement '\n", + " '2.916666666666667',\n", + " 'facet': 'Sex',\n", + " 'facet_value': '0',\n", + " 'metric_name': 'TE'}]}\n" + ] + } + ], + "source": [ + "violations = model_bias_monitor.latest_monitoring_constraint_violations()\n", + "if violations is not None:\n", + " pprint.PrettyPrinter(indent=4).pprint(violations.body_dict)" + ] + }, + { + "cell_type": "markdown", + "id": "1b2e3d97-27cc-4325-814d-04219d25ab76", + "metadata": {}, + "source": [ + "By default, the analysis results are also published to CloudWatch. See [CloudWatch Metrics for Bias Drift Analysis](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift-cw.html)." + ] + }, + { + "cell_type": "markdown", + "id": "f6388287-b810-4522-bcc1-928228982388", + "metadata": {}, + "source": [ + "## Cleanup\n", + "\n", + "The endpoint can keep running and capturing data, but if there is no plan to collect more data or use this endpoint further, it should be deleted to avoid incurring additional charges. Note that deleting the endpoint does not delete the data that was captured during the model invocations."
+ ] + }, + { + "cell_type": "markdown", + "id": "554e8db8-4918-420c-9b4d-5c7263a402e7", + "metadata": {}, + "source": [ + "First, stop the worker threads." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "f813097c-00cc-4ee4-91cc-d03b72915c67", + "metadata": {}, + "outputs": [], + "source": [ + "invoke_endpoint_thread.terminate()\n", + "ground_truth_thread.terminate()" + ] + }, + { + "cell_type": "markdown", + "id": "80f971c4-c1ae-4766-ab44-a30d361df523", + "metadata": {}, + "source": [ + "Then stop and delete the monitoring schedule for the endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "e4b99289-3924-4d40-9860-75ccea76646b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Stopping Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-37-42-411\n", + "Waiting for execution to finish\n", + "Done! Execution Status: CompletedWithViolations\n", + "\n", + "Deleting Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-37-42-411\n" + ] + } + ], + "source": [ + "model_bias_monitor.stop_monitoring_schedule()\n", + "wait_for_execution_to_finish(model_bias_monitor)\n", + "model_bias_monitor.delete_monitoring_schedule()" + ] + }, + { + "cell_type": "markdown", + "id": "f2442401-06c9-481a-a04c-e339d618af54", + "metadata": {}, + "source": [ + "Finally, delete the endpoint and the model." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "d6dd0678-66d3-493d-bee4-7e2a9dab901e", + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_session.delete_endpoint(endpoint_name=endpoint_name)\n", + "sagemaker_session.delete_model(model_name=model_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5779c2e2-1115-4d33-9820-e7bd452f3604", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3 (Data Science)", + "language": "python", + 
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.10" + }, + "toc-autonumbering": false, + "toc-showmarkdowntxt": false + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint.ipynb b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint.ipynb new file mode 100644 index 0000000000..393faf64f9 --- /dev/null +++ b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint.ipynb @@ -0,0 +1,1561 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5a524a4c-5a39-4b6b-abb1-1c8e1b2de84c", + "metadata": {}, + "source": [ + "# Amazon SageMaker Clarify Model Explainability Monitor - JSON Lines Format" + ] + }, + { + "cell_type": "markdown", + "id": "4eaae7a8-2ab1-4f7c-8cb2-6b23606c58c1", + "metadata": {}, + "source": [ + "## Runtime\n", + "\n", + "This notebook takes approximately 60 minutes to run." + ] + }, + { + "cell_type": "markdown", + "id": "a0a2c6a4-a249-40bf-adbc-8bd00fb06cfe", + "metadata": { + "tags": [] + }, + "source": [ + "## Introduction" + ] + }, + { + "cell_type": "markdown", + "id": "1879bacd-fedd-434a-8094-40cd48f5f140", + "metadata": {}, + "source": [ + "[Amazon SageMaker Model Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) continuously monitors the quality of Amazon SageMaker machine learning models in production. It enables developers to set alerts for when there are deviations in the model quality. 
Early and proactive detection of these deviations enables corrective actions, such as retraining models, auditing upstream systems, or fixing data quality issues without having to monitor models manually or build additional tooling. \n", + "\n", + "[Amazon SageMaker Clarify Model Explainability Monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html) is a model monitor that helps data scientists and ML engineers monitor predictions for feature attribution drift on a regular basis. A drift in the distribution of live data for models in production can result in a corresponding drift in the feature attribution values. As the model is monitored, customers can view exportable reports and graphs detailing feature attributions in SageMaker Studio and configure alerts in Amazon CloudWatch to receive notifications if the attribution values drift beyond a certain threshold. \n", + "\n", + "This notebook demonstrates the process for setting up a model monitor for continuous monitoring of feature attribution drift of a [SageMaker real-time inference endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html). The model input and output are in [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats). SageMaker Clarify model monitor also supports analyzing CSV data, which is illustrated in [another notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability.ipynb).\n", + "\n", + "In general, you can use the model explainability monitor for a real-time inference endpoint as follows:\n", + "\n", + "1. Enable the endpoint for data capture. Then, when the customer invokes the endpoint, the endpoint saves the invocations to a data capture S3 location. \n", + "1. 
Schedule a model explainability monitor to monitor the endpoint (to be more specific, the data capture S3 location) and a ground truth S3 location.\n", + "\n", + "The monitor regularly runs processing jobs to perform feature attribution analysis, then generates analysis reports and publishes metrics to CloudWatch." + ] + }, + { + "cell_type": "markdown", + "id": "a4eed2c2-4e67-49cd-8b16-01d10c0acdb0", + "metadata": {}, + "source": [ + "## General Setup" + ] + }, + { + "cell_type": "markdown", + "id": "56e754c8-d82a-49a3-9967-d7a487a42549", + "metadata": {}, + "source": [ + "The notebook uses the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk). The following cell upgrades the SDK and its dependencies. If the notebook is executed in SageMaker Studio, you may then need to restart the kernel and rerun the notebook to pick up the updated APIs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e815029f-6166-40f6-a5dd-da2358f8b7fa", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -U sagemaker\n", + "!pip install -U boto3\n", + "!pip install -U botocore" + ] + }, + { + "cell_type": "markdown", + "id": "43f20cf6-1672-45ab-966b-5db2d51aad53", + "metadata": {}, + "source": [ + "### Imports\n", + "\n", + "The following cell imports the APIs to be used by the notebook."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "21f01570-2eee-46ef-b044-8b65569c26b7", + "metadata": {}, + "outputs": [], + "source": [ + "import sagemaker\n", + "import pandas as pd\n", + "import copy\n", + "import datetime\n", + "import json\n", + "import random\n", + "import threading\n", + "import time\n", + "import pprint" + ] + }, + { + "cell_type": "markdown", + "id": "5baa9278-a1c9-427c-a9d9-5ddab19bcd49", + "metadata": {}, + "source": [ + "### Handful of configuration\n", + "\n", + "To begin, ensure that these prerequisites have been completed.\n", + "\n", + "* Specify an AWS Region to host the model.\n", + "* Specify an IAM role to execute jobs.\n", + "* Define the S3 URIs that store the model file, input data, and output data. For demonstration purposes, this notebook uses the same bucket for all of them. In practice, they could be kept in separate buckets with different security policies." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "74b11f7c-e9cd-4321-8de5-27ca6dd85d01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "AWS region: us-west-2\n", + "RoleArn: arn:aws:iam::000000000000:role/service-role/AmazonSageMaker-ExecutionRole-20200714T163791\n", + "Demo Bucket: sagemaker-us-west-2-000000000000\n", + "Demo Prefix: sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04\n", + "Demo S3 key: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04\n", + "The endpoint will save the captured data to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/data-capture\n", + "The baselining job will save the analysis results to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/baselining-output\n", + "The monitor will save the analysis results to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output\n" + ] + } + ], + "source": [ + 
"sagemaker_session = sagemaker.Session()\n", + "s3_client = sagemaker_session.boto_session.client(\"s3\")\n", + "\n", + "region = sagemaker_session.boto_region_name\n", + "print(f\"AWS region: {region}\")\n", + "\n", + "role = sagemaker.get_execution_role()\n", + "print(f\"RoleArn: {role}\")\n", + "\n", + "# A different bucket can be used, but make sure the role for this notebook has\n", + "# the s3:PutObject permissions. This is the bucket into which the data is captured\n", + "bucket = sagemaker_session.default_bucket()\n", + "print(f\"Demo Bucket: {bucket}\")\n", + "prefix = sagemaker.utils.unique_name_from_base(\"sagemaker/DEMO-ClarifyModelMonitor\")\n", + "print(f\"Demo Prefix: {prefix}\")\n", + "s3_key = f\"s3://{bucket}/{prefix}\"\n", + "print(f\"Demo S3 key: {s3_key}\")\n", + "\n", + "data_capture_s3_uri = f\"{s3_key}/data-capture\"\n", + "baselining_output_s3_uri = f\"{s3_key}/baselining-output\"\n", + "monitor_output_s3_uri = f\"{s3_key}/monitor-output\"\n", + "\n", + "print(f\"The endpoint will save the captured data to: {data_capture_s3_uri}\")\n", + "print(f\"The baselining job will save the analysis results to: {baselining_output_s3_uri}\")\n", + "print(f\"The monitor will save the analysis results to: {monitor_output_s3_uri}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d7da5265-858f-4478-978b-ad592464b61d", + "metadata": {}, + "source": [ + "### Model file and data files\n", + "\n", + "This example includes a prebuilt [SageMaker Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) model trained by [a SageMaker Clarify offline processing example notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-clarify/fairness_and_explainability/fairness_and_explainability_jsonlines_format.ipynb). 
The model supports [SageMaker JSON Lines dense format](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html#common-in-formats) (MIME type `\"application/jsonlines\"`).\n", + "\n", + "* The model input can be one or more lines. Each line is a JSON object that has a \"features\" key pointing to a list of feature values concerning demographic characteristics of individuals. For example,\n", + "\n", + "```\n", + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n", + "{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\n", + "```\n", + "\n", + "* The model output has the predictions of whether a person has a yearly income that is more than $50,000. Each prediction is a JSON object that has a \"predicted_label\" key pointing to the predicted label, and a \"score\" key pointing to the confidence score. For example,\n", + "\n", + "```\n", + "{\"predicted_label\":1,\"score\":0.989977359771728}\n", + "{\"predicted_label\":1,\"score\":0.504138827323913}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "f75d26c9-0f0b-422d-97cb-b74efd5eacd6", + "metadata": {}, + "outputs": [], + "source": [ + "model_file = \"model/ll-adult-prediction-model.tar.gz\"" + ] + }, + { + "cell_type": "markdown", + "id": "dc4d1d6a-c75c-4563-9699-33de88469093", + "metadata": {}, + "source": [ + "This example includes two dataset files, both in the JSON Lines format."
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f1eaa4fe-622f-4745-a3cc-52d40db8ce9f", + "metadata": {}, + "outputs": [], + "source": [ + "train_dataset_path = \"test_data/validation-dataset.jsonl\"\n", + "test_dataset_path = \"test_data/test-dataset.jsonl\"\n", + "dataset_type = \"application/jsonlines\"" + ] + }, + { + "cell_type": "markdown", + "id": "5ca1001e-0b91-4133-8bce-6710aaa33270", + "metadata": {}, + "source": [ + "The train dataset has the features and the ground truth label (pointed to by the key \"label\")," + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06c22c10-7ba8-417a-a0dc-1e152a0a3287", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[41,2,220531,14,15,2,9,0,4,1,0,0,60,38],\"label\":1}\n", + "{\"features\":[33,2,35378,9,13,2,11,5,4,0,0,0,45,38],\"label\":1}\n", + "{\"features\":[36,2,223433,12,14,2,11,0,4,1,7688,0,50,38],\"label\":1}\n", + "{\"features\":[40,2,220589,7,12,4,0,1,4,0,0,0,40,38],\"label\":0}\n", + "{\"features\":[30,2,231413,15,10,2,2,0,4,1,0,0,40,38],\"label\":1}\n" + ] + } + ], + "source": [ + "!head -n 5 $train_dataset_path" + ] + }, + { + "cell_type": "markdown", + "id": "ddebb1fd-d480-4700-8dd8-3143205331a6", + "metadata": {}, + "source": [ + "The test dataset only has features." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "9f78d463-f1ff-4483-8cf3-562bccb98a2b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n", + "{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\n", + "{\"features\":[34,2,162604,11,9,4,2,2,2,1,0,0,40,37]}\n", + "{\"features\":[20,2,258509,11,9,4,6,3,2,1,0,0,40,37]}\n", + "{\"features\":[27,2,446947,9,13,4,0,4,2,0,0,0,55,37]}\n" + ] + } + ], + "source": [ + "!head -n 5 $test_dataset_path" + ] + }, + { + "cell_type": "markdown", + "id": "a7b89b8d-5036-4bd9-8aa5-f5d638617aba", + "metadata": {}, + "source": [ + "Here are the headers of the train dataset. \"Target\" is the header of the ground truth label, and the others are the feature headers. They will be used to beautify the analysis report." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "2a843093-0548-48dd-9f82-e80af07c357e", + "metadata": {}, + "outputs": [], + "source": [ + "all_headers = [\n", + " \"Age\",\n", + " \"Workclass\",\n", + " \"fnlwgt\",\n", + " \"Education\",\n", + " \"Education-Num\",\n", + " \"Marital Status\",\n", + " \"Occupation\",\n", + " \"Relationship\",\n", + " \"Ethnic group\",\n", + " \"Sex\",\n", + " \"Capital Gain\",\n", + " \"Capital Loss\",\n", + " \"Hours per week\",\n", + " \"Country\",\n", + " \"Target\",\n", + "]\n", + "label_header = all_headers[-1]" + ] + }, + { + "cell_type": "markdown", + "id": "2441fc17-0299-4b11-afe7-efdb167263ad", + "metadata": {}, + "source": [ + "To verify that the execution role for this notebook has the necessary permissions to proceed, put a simple test object into the S3 bucket specified above. If this command fails, update the role to have `s3:PutObject` permission on the bucket and try again." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "dfe69a8c-9bf6-47c4-bb59-a775fd3b6934", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Success! We are all set to proceed with uploading to S3.\n" + ] + } + ], + "source": [ + "sagemaker.s3.S3Uploader.upload_string_as_file_body(\n", + " body=\"hello\",\n", + " desired_s3_uri=f\"{s3_key}/upload-test-file.txt\",\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(\"Success! We are all set to proceed with uploading to S3.\")" + ] + }, + { + "cell_type": "markdown", + "id": "7a099ef6-8d09-478d-854c-989758bad1c5", + "metadata": {}, + "source": [ + "Then upload the files to S3 so that they can be used by SageMaker." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "0f0fe183-4c83-4d22-bce5-65eba6a351e2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model file has been uploaded to s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/ll-adult-prediction-model.tar.gz\n", + "Train data is uploaded to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/validation-dataset.jsonl\n", + "Test data is uploaded to: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/test-dataset.jsonl\n" + ] + } + ], + "source": [ + "model_url = sagemaker.s3.S3Uploader.upload(\n", + " local_path=model_file,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Model file has been uploaded to {model_url}\")\n", + "\n", + "train_data_s3_uri = sagemaker.s3.S3Uploader.upload(\n", + " local_path=train_dataset_path,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Train data is uploaded to: {train_data_s3_uri}\")\n", + "test_data_s3_uri = sagemaker.s3.S3Uploader.upload(\n", + " 
local_path=test_dataset_path,\n", + " desired_s3_uri=s3_key,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(f\"Test data is uploaded to: {test_data_s3_uri}\")" + ] + }, + { + "cell_type": "markdown", + "id": "2d11cc57-8ab4-422e-9492-4126f34ef4c5", + "metadata": {}, + "source": [ + "## Real-time Inference Endpoint\n", + "\n", + "This section creates a SageMaker real-time inference endpoint to showcase the data capture capability in action. The model monitor will be scheduled for the endpoint and process the captured data.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3d295bc3-3a82-4f22-9768-29572c0ae4f3", + "metadata": { + "tags": [] + }, + "source": [ + "### Deploy the model to an endpoint\n", + "\n", + "Start by deploying the pre-trained model. Here, create a SageMaker `Model` object with the inference image and model file. Then deploy the model with the data capture configuration and wait until the endpoint is ready to serve traffic.\n", + "\n", + "`DataCaptureConfig` enables capturing the request payload and the response payload of the endpoint. The data format is passed to the `json_content_types` parameter, so the request payload and response payload are captured as plain text (without being Base64-encoded)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "d0c565e0-051a-4f6c-bcb6-3dca8f4ec592", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SageMaker model name: DEMO-ll-adult-pred-model-monitor-1674106124-9611\n", + "SageMaker endpoint name: DEMO-ll-adult-pred-model-monitor-1674106124-9611\n", + "SageMaker Linear Learner image: 174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1\n" + ] + } + ], + "source": [ + "model_name = sagemaker.utils.unique_name_from_base(\"DEMO-ll-adult-pred-model-monitor\")\n", + "endpoint_name = model_name\n", + "print(f\"SageMaker model name: {model_name}\")\n", + "print(f\"SageMaker endpoint name: {endpoint_name}\")\n", + "\n", + "image_uri = sagemaker.image_uris.retrieve(\"linear-learner\", region, \"1\")\n", + "print(f\"SageMaker Linear Learner image: {image_uri}\")\n", + "\n", + "model = sagemaker.model.Model(\n", + " role=role,\n", + " name=model_name,\n", + " image_uri=image_uri,\n", + " model_data=model_url,\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "\n", + "data_capture_config = sagemaker.model_monitor.DataCaptureConfig(\n", + " enable_capture=True,\n", + " sampling_percentage=100, # Capture 100% of the traffic\n", + " destination_s3_uri=data_capture_s3_uri,\n", + " json_content_types=[dataset_type],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c86306f2-8f15-4d39-9cbb-2f6c0e7ee978", + "metadata": {}, + "source": [ + "**NOTE**: The following cell takes about 10 minutes to deploy the model." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "77330b34-0640-4b00-b3bb-4a8ea6e9a223", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deploying model DEMO-ll-adult-pred-model-monitor-1674106124-9611 to endpoint DEMO-ll-adult-pred-model-monitor-1674106124-9611\n", + "------!" 
+ ] + } + ], + "source": [ + "print(f\"Deploying model {model_name} to endpoint {endpoint_name}\")\n", + "model.deploy(\n", + " initial_instance_count=1,\n", + " instance_type=\"ml.m5.xlarge\",\n", + " endpoint_name=endpoint_name,\n", + " data_capture_config=data_capture_config,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "14bf8504-bca2-4948-867a-cab4ca349bd9", + "metadata": {}, + "source": [ + "### Invoke the endpoint\n", + "\n", + "Now send data to this endpoint to get inferences in real time. The model supports mini-batch predictions, so you can put one or more records in a single request." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "44a908e5-c16f-41dc-b718-323ab5ed4268", + "metadata": {}, + "outputs": [], + "source": [ + "with open(test_dataset_path, \"r\") as f:\n", + " test_data = f.read().splitlines()" + ] + }, + { + "cell_type": "markdown", + "id": "2ccc2ed6-355a-4cdb-a44e-1463c0d9ef9f", + "metadata": {}, + "source": [ + "#### Example: Single record" + ] + }, + { + "cell_type": "markdown", + "id": "ea0e8368-37b1-41d2-b0da-0f22fee2b87e", + "metadata": {}, + "source": [ + "Request payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "52fbb63a-e1d8-414e-968a-20822305f23c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\n" + ] + } + ], + "source": [ + "request_payload = test_data[0]\n", + "print(request_payload)" + ] + }, + { + "cell_type": "markdown", + "id": "f880886a-38cc-44c1-acc4-f3876956e2a8", + "metadata": {}, + "source": [ + "Response payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "87531e43-c9d1-4d9b-8019-19bec1a832eb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"predicted_label\":1,\"score\":0.989977359771728}\\n'" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + 
"source": [ + "response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=request_payload,\n", + ")\n", + "response_payload = response[\"Body\"].read().decode(\"utf-8\")\n", + "response_payload" + ] + }, + { + "cell_type": "markdown", + "id": "22fe887e-ec0d-4b2a-9c32-28d93c2e25be", + "metadata": {}, + "source": [ + "#### Example: Two records" + ] + }, + { + "cell_type": "markdown", + "id": "6094ad1c-55dd-40d1-b31f-8d47f21814c3", + "metadata": {}, + "source": [ + "Request payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "2cd41694-9e20-461f-ae85-5f792a521753", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\"features\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "request_payload = \"\\n\".join(test_data[:2])\n", + "request_payload" + ] + }, + { + "cell_type": "markdown", + "id": "3ab91982-67b4-4293-86cb-bb61be2f67aa", + "metadata": {}, + "source": [ + "Response payload:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "fece49e7-38b9-4b33-91ca-f23fcd06dcbb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'{\"predicted_label\":1,\"score\":0.989977359771728}\\n{\"predicted_label\":1,\"score\":0.504138827323913}\\n'" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=request_payload,\n", + ")\n", + "response_payload = response[\"Body\"].read().decode(\"utf-8\")\n", + "response_payload" + ] + }, + { + "cell_type": "markdown", + "id": 
"243eac0c-a697-42b6-a56f-c0279cc7cd57", + "metadata": {}, + "source": [ + "### View captured data\n", + "\n", + "Because data capture is enabled in the previous steps, the request and response payload, along with some additional metadata, are saved in the Amazon S3 location specified in the `DataCaptureConfig`.\n", + "\n", + "Now list the captured data files stored in Amazon S3. There should be different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:\n", + "\n", + "`s3://{data_capture_s3_uri}/{endpoint_name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "18c649dd-40ef-4260-b499-0f3c371f970f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Waiting for captured data to show up............................................................\n", + "Found capture data files:\n", + "s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/data-capture/DEMO-ll-adult-pred-model-monitor-1674106124-9611/AllTraffic/2023/01/19/05/31-46-585-9ee7358c-33e7-467d-9133-624e448e6552.jsonl\n" + ] + } + ], + "source": [ + "print(\"Waiting for captured data to show up\", end=\"\")\n", + "for _ in range(120):\n", + " captured_data_files = sorted(\n", + " sagemaker.s3.S3Downloader.list(\n", + " s3_uri=f\"{data_capture_s3_uri}/{endpoint_name}\",\n", + " sagemaker_session=sagemaker_session,\n", + " )\n", + " )\n", + " if captured_data_files:\n", + " break\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(1)\n", + "print()\n", + "print(\"Found capture data files:\")\n", + "print(\"\\n \".join(captured_data_files[-5:]))" + ] + }, + { + "cell_type": "markdown", + "id": "0b4b01fd-4df2-42ff-935e-8843f1bc568f", + "metadata": {}, + "source": [ + "Next, view the content of a single capture file." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e4ad7021-4bcc-4fe1-880e-11a872941ff1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"INPUT\",\"data\":\"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\",\"encoding\":\"JSON\"},\"endpointOutput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"OUTPUT\",\"data\":\"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n\",\"encoding\":\"JSON\"}},\"eventMetadata\":{\"eventId\":\"40394cfe-37c3-4abb-b22c-dd7accbe9608\",\"inferenceTime\":\"2023-01-19T05:31:46Z\"},\"eventVersion\":\"0\"}\n", + "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"INPUT\",\"data\":\"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\\\"features\\\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\",\"encoding\":\"JSON\"},\"endpointOutput\":{\"observedContentType\":\"application/jsonlines\",\"mode\":\"OUTPUT\",\"data\":\"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n{\\\"predicted_label\\\":1,\\\"score\\\":0.504138827323913}\\n\",\"encoding\":\"JSON\"}},\"eventMetadata\":{\"eventId\":\"32aaea1b-1a60-4870-b81d-40271f950c4a\",\"inferenceTime\":\"2023-01-19T05:31:46Z\"},\"eventVersion\":\"0\"}\n", + "\n" + ] + } + ], + "source": [ + "captured_data = sagemaker.s3.S3Downloader.read_file(\n", + " s3_uri=captured_data_files[-1],\n", + " sagemaker_session=sagemaker_session,\n", + ")\n", + "print(captured_data)" + ] + }, + { + "cell_type": "markdown", + "id": "6e09cffd-111a-43a1-8429-2fa3fbce9d2e", + "metadata": {}, + "source": [ + "Finally, the contents of a single line is present below in formatted JSON to observe a little better.\n", + "\n", + "* `\"captureData\"` has two fields, `\"endpointInput\"` is the captured invocation request, and `\"endpointOutput\"` is the 
response.\n", + "\n", + "* `\"eventMetadata\"` has the inference ID and event ID." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "14611944-0ae1-4f9f-ab6e-4b5c74ee7f3f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"captureData\": {\n", + " \"endpointInput\": {\n", + " \"observedContentType\": \"application/jsonlines\",\n", + " \"mode\": \"INPUT\",\n", + " \"data\": \"{\\\"features\\\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}\\n{\\\"features\\\":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]}\",\n", + " \"encoding\": \"JSON\"\n", + " },\n", + " \"endpointOutput\": {\n", + " \"observedContentType\": \"application/jsonlines\",\n", + " \"mode\": \"OUTPUT\",\n", + " \"data\": \"{\\\"predicted_label\\\":1,\\\"score\\\":0.989977359771728}\\n{\\\"predicted_label\\\":1,\\\"score\\\":0.504138827323913}\\n\",\n", + " \"encoding\": \"JSON\"\n", + " }\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": \"32aaea1b-1a60-4870-b81d-40271f950c4a\",\n", + " \"inferenceTime\": \"2023-01-19T05:31:46Z\"\n", + " },\n", + " \"eventVersion\": \"0\"\n", + "}\n" + ] + } + ], + "source": [ + "print(json.dumps(json.loads(captured_data.splitlines()[-1]), indent=4))" + ] + }, + { + "cell_type": "markdown", + "id": "4b473f92-7142-4f79-8a27-86672682a5b2", + "metadata": {}, + "source": [ + "### Start generating some artificial traffic\n", + "The cell below starts a thread to send some traffic to the endpoint. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.\n", + "\n", + "Notice the `InferenceId` attribute used to invoke, in this example, it will be used to join the captured data with the ground truth data. If it is not available, then the `eventId` will be used for the join operation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "0af95cc5-9e1d-46fd-b373-16015c87be58", + "metadata": {}, + "outputs": [], + "source": [ + "class WorkerThread(threading.Thread):\n", + " def __init__(self, do_run, *args, **kwargs):\n", + " super(WorkerThread, self).__init__(*args, **kwargs)\n", + " self.__do_run = do_run\n", + " self.__terminate_event = threading.Event()\n", + "\n", + " def terminate(self):\n", + " self.__terminate_event.set()\n", + "\n", + " def run(self):\n", + " while not self.__terminate_event.is_set():\n", + " self.__do_run(self.__terminate_event)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "00e832f7-8cc7-4044-b2aa-f22c93d2078d", + "metadata": {}, + "outputs": [], + "source": [ + "def invoke_endpoint(terminate_event):\n", + " for index, record in enumerate(test_data):\n", + " response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(\n", + " EndpointName=endpoint_name,\n", + " ContentType=dataset_type,\n", + " Accept=dataset_type,\n", + " Body=record,\n", + " InferenceId=str(index), # unique ID per row\n", + " )\n", + " response[\"Body\"].read()\n", + " time.sleep(1)\n", + " if terminate_event.is_set():\n", + " break\n", + "\n", + "\n", + "# Keep invoking the endpoint with test data\n", + "invoke_endpoint_thread = WorkerThread(do_run=invoke_endpoint)\n", + "invoke_endpoint_thread.start()" + ] + }, + { + "cell_type": "markdown", + "id": "f8d87f96-1ab6-4ad9-bd0d-f21b18ebcded", + "metadata": {}, + "source": [ + "## Model Explainability Monitor\n", + "\n", + "Similar to the other monitoring types, the standard procedure for creating a [feature attribution drift monitor](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html) is to first run a baselining job, and then schedule the monitor."
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "273af941-56ff-4a08-a1e1-023e2d4ec090", + "metadata": {}, + "outputs": [], + "source": [ + "model_explainability_monitor = sagemaker.model_monitor.ModelExplainabilityMonitor(\n", + " role=role,\n", + " sagemaker_session=sagemaker_session,\n", + " max_runtime_in_seconds=3600,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c47a6f66-bdd8-4815-b3ed-286035f6e4ce", + "metadata": {}, + "source": [ + "### Baselining job\n", + "\n", + "A baselining job runs predictions on the training dataset and suggests constraints. The `suggest_baseline()` method of `ModelExplainabilityMonitor` starts a SageMaker Clarify processing job to generate the constraints.\n", + "\n", + "This step is not mandatory, but providing a constraints file to the monitor enables violations file generation." + ] + }, + { + "cell_type": "markdown", + "id": "b7bd931a-bacc-480b-8d2d-c363abe9943f", + "metadata": {}, + "source": [ + "#### Configurations\n", + "\n", + "Information about the input data needs to be provided to the processor." + ] + }, + { + "cell_type": "markdown", + "id": "6398d447-0ccf-4c79-a29d-8d6a54e1c034", + "metadata": {}, + "source": [ + "`DataConfig` stores information about the dataset to be analyzed, for example, the dataset file, its format (like JSON Lines), and where to store the analysis results. Some special things to note about this configuration for the JSON Lines dataset:\n", + "\n", + "* The parameter value `\"features\"` or `\"label\"` is **NOT** a header string. Instead, it is a `JMESPath` (https://jmespath.org) expression to locate the features list or the ground truth label in the dataset (the ground truth label is not needed for the explainability analysis; the parameter is specified so that the job knows it should be excluded from the dataset). In this example notebook they happen to be the same as the keys in the dataset. 
But for example, if the dataset has records like below, then `features` should be \"data.features.values\", and `label` should be \"data.label\". \n", + "\n", + "```\n", + "{\"data\": {\"features\": {\"values\": [25, 2, 226802, 1, 7, 4, 6, 3, 2, 1, 0, 0, 40, 37]}, \"label\": 0}}\n", + "```\n", + "\n", + "* SageMaker Clarify processing job will load the JSON Lines dataset into tabular representation for further analysis, and the parameter `headers` is the list of column names. **The label header shall be the last one in the headers list**, and the order of feature headers shall be the same as the order of features in a record." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "fd146e26-a54c-4a31-acc9-5a406ddf8680", + "metadata": {}, + "outputs": [], + "source": [ + "features_jmespath = \"features\"\n", + "ground_truth_label_jmespath = \"label\"\n", + "data_config = sagemaker.clarify.DataConfig(\n", + " s3_data_input_path=train_data_s3_uri,\n", + " s3_output_path=baselining_output_s3_uri,\n", + " features=features_jmespath,\n", + " label=ground_truth_label_jmespath,\n", + " headers=all_headers,\n", + " dataset_type=dataset_type,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "93c9c98b-67a5-45e0-8aa5-a488e25a6de8", + "metadata": {}, + "source": [ + "`ModelConfig` is configuration related to model to be used for inferencing. In order to compute SHAP values, the SageMaker Clarify explainer generates synthetic dataset and then get its predictions for the SageMaker model. To accomplish this, the processing job will use the model to create an ephemeral endpoint (also known as \"shadow endpoint\"). The processing job will delete the shadow endpoint after the computations are completed. One special thing to note about this configuration for the JSON Lines model input and output,\n", + "\n", + "* `content_template` is used by SageMaker Clarify processing job to convert the tabular data to the request payload acceptable to the shadow endpoint. 
To be more specific, the placeholder `$features` will be replaced by **the features list** from records. The request payload of a record from the testing dataset happens to be similar to the record itself, like `{\"features\":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]}`, because both the dataset and the model input conform to the same format." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "3a49acc6-c6a9-46fa-aed7-e93e67fae373", + "metadata": {}, + "outputs": [], + "source": [ + "content_template = '{\"features\":$features}'\n", + "model_config = sagemaker.clarify.ModelConfig(\n", + " model_name=model_name, # The name of the SageMaker model\n", + " instance_type=\"ml.m5.xlarge\", # The instance type of the shadow endpoint\n", + " instance_count=1, # The instance count of the shadow endpoint\n", + " content_type=dataset_type, # The data format of the model input\n", + " accept_type=dataset_type, # The data format of the model output\n", + " content_template=content_template,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "506b583a-f643-45dc-bdd3-ae29120734fa", + "metadata": {}, + "source": [ + "Currently, the SageMaker Clarify explainer offers a scalable and efficient implementation of SHAP, so the explainability config is `SHAPConfig`, including\n", + "\n", + "* `baseline`: A list of records (at least one) to be used as the baseline dataset in the Kernel SHAP algorithm, each record is JSON object that includes a list of features. It can also be a S3 object URI, the S3 file should be in the same format as dataset.\n", + "* `num_samples`: Number of samples to be used in the Kernel SHAP algorithm. This number determines the size of the generated synthetic dataset to compute the SHAP values.\n", + "* `agg_method`: Aggregation method for global SHAP values. 
Valid values are\n", + " * \"mean_abs\" (mean of absolute SHAP values for all instances),\n", + " * \"median\" (median of SHAP values for all instances) and\n", + " * \"mean_sq\" (mean of squared SHAP values for all instances).\n", + "* `use_logit`: Indicator of whether the logit function is to be applied to the model predictions. Default is False. If \"use_logit\" is true then the SHAP values will have log-odds units.\n", + "* `save_local_shap_values`: Indicator of whether to save the local SHAP values in the output location. Default is True." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "0ead08ae-1867-41b9-8c0e-6202760c4175", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SHAP baseline: [{'features': [39, 2, 184870, 10, 10, 3, 6, 1, 4, 1, 1597, 61, 41, 37]}]\n" + ] + } + ], + "source": [ + "# Here use the mean value of train dataset as SHAP baseline\n", + "dataset = []\n", + "with open(train_dataset_path) as f:\n", + " dataset = [json.loads(row)[\"features\"] for row in f]\n", + "mean_values = pd.DataFrame(dataset).mean().round().astype(int).to_list()\n", + "mean_record = {\"features\": mean_values}\n", + "shap_baseline = [mean_record]\n", + "print(f\"SHAP baseline: {shap_baseline}\")\n", + "\n", + "shap_config = sagemaker.clarify.SHAPConfig(\n", + " baseline=shap_baseline,\n", + " num_samples=100,\n", + " agg_method=\"mean_abs\",\n", + " save_local_shap_values=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "3c9417f1-b2b2-4c23-81ba-256ff4616c5c", + "metadata": {}, + "source": [ + "#### Kick off baselining job\n", + "\n", + "Call the `suggest_baseline()` method to start the baselining job. The model output has a key \"score\" pointing to a confidence score value between `0` and `1`. So, the `model_scores` parameter is set to the `JMESPath` expression \"score\" which can locate the score in the model output." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "9c27e74b-31f6-435a-a0d4-bef52a4cdcdb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Job Name: baseline-suggestion-job-2023-01-19-05-32-51-896\n", + "Inputs: [{'InputName': 'dataset', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/validation-dataset.jsonl', 'LocalPath': '/opt/ml/processing/input/data', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'analysis_config', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/baselining-output/analysis_config.json', 'LocalPath': '/opt/ml/processing/input/config', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]\n", + "Outputs: [{'OutputName': 'analysis_result', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/baselining-output', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "confidence_score_jmespath = \"score\"\n", + "model_explainability_monitor.suggest_baseline(\n", + " explainability_config=shap_config,\n", + " data_config=data_config,\n", + " model_config=model_config,\n", + " model_scores=confidence_score_jmespath, # The JMESPath to locate the confidence score in model output\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9cf396d3-c7ab-4041-8820-64c5ebd15d46", + "metadata": {}, + "source": [ + "**NOTE**: The following cell waits until the baselining job is completed (in about 10 
minutes). It then inspects the suggested constraints. This step can be skipped, because the monitor to be scheduled will automatically pick up baselining job name and wait for it before monitoring execution." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "ad0ece68-f130-4b66-b8ab-36d2916502c8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "..................................................................................................!\n", + "Suggested constraints: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/baselining-output/analysis.json\n", + "{\n", + " \"version\": \"1.0\",\n", + " \"explanations\": {\n", + " \"kernel_shap\": {\n", + " \"label0\": {\n", + " \"global_shap_values\": {\n", + " \"Age\": 0.05962069398767555,\n", + " \"Workclass\": 0.009340874120660063,\n", + " \"fnlwgt\": 0.0010900750377509304,\n", + " \"Education\": 0.014739126275038199,\n", + " \"Education-Num\": 0.09891391226656666,\n", + " \"Marital Status\": 0.05452765230404344,\n", + " \"Occupation\": 0.0025392834714334667,\n", + " \"Relationship\": 0.018169508641909988,\n", + " \"Ethnic group\": 0.005295263900463686,\n", + " \"Sex\": 0.032080828962127876,\n", + " \"Capital Gain\": 0.09913318680892579,\n", + " \"Capital Loss\": 0.013518474176382519,\n", + " \"Hours per week\": 0.03641124946588507,\n", + " \"Country\": 0.004894213349476741\n", + " },\n", + " \"expected_value\": 0.250623226165771\n", + " }\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "model_explainability_monitor.latest_baselining_job.wait(logs=False)\n", + "print()\n", + "model_explainability_constraints = model_explainability_monitor.suggested_constraints()\n", + "print(f\"Suggested constraints: {model_explainability_constraints.file_s3_uri}\")\n", + "print(\n", + " sagemaker.s3.S3Downloader.read_file(\n", + " s3_uri=model_explainability_constraints.file_s3_uri,\n", + " 
sagemaker_session=sagemaker_session,\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5545f7e0-8256-4b33-8385-741c23b9acc6", + "metadata": {}, + "source": [ + "### Monitoring Schedule\n", + "\n", + "With the above constraints collected, call the `create_monitoring_schedule()` method to schedule an hourly model explainability monitor." + ] + }, + { + "cell_type": "markdown", + "id": "b99f1d50-d9ce-42c6-84da-a710bfb7b47a", + "metadata": {}, + "source": [ + "If a baselining job has been submitted, then the monitor object will automatically pick up the analysis configuration from the baselining job. But if the baselining step is skipped, or if the capture dataset has a different nature than the training dataset, then the analysis configuration has to be provided.\n", + "\n", + "`ModelConfig` is required by `ExplainabilityAnalysisConfig` for the same reason as it is required by the baselining job. Note that only features are required for computing feature attribution, so the ground truth label should be excluded.\n", + "\n", + "Highlights:\n", + "\n", + "* From `endpoint_name` the monitor can figure out the location of data captured by the endpoint.\n", + "* `features_attribute` is the `JMESPath` expression to locate the features in model input, similar to the `features` parameter of `DataConfig`.\n", + "* `inference_attribute` stores the `JMESPath` expression to locate the confidence score in model output, similar to the `model_scores` parameter of the `suggest_baseline()` method."
+ ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "8d160d3e-0482-4c4b-a171-e62eddb38b87", + "metadata": {}, + "outputs": [], + "source": [ + "schedule_expression = sagemaker.model_monitor.CronExpressionGenerator.hourly()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "1c7a1355-2997-46f2-ae02-cb00063e3661", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model explainability monitoring schedule: monitoring-schedule-2023-01-19-05-41-04-758\n" + ] + } + ], + "source": [ + "# Remove label because only features are required for the analysis\n", + "headers_without_label_header = copy.deepcopy(all_headers)\n", + "headers_without_label_header.remove(label_header)\n", + "model_explainability_analysis_config = sagemaker.model_monitor.ExplainabilityAnalysisConfig(\n", + " explainability_config=shap_config,\n", + " model_config=model_config,\n", + " headers=headers_without_label_header,\n", + ")\n", + "model_explainability_monitor.create_monitoring_schedule(\n", + " analysis_config=model_explainability_analysis_config,\n", + " endpoint_input=sagemaker.model_monitor.EndpointInput(\n", + " endpoint_name=endpoint_name,\n", + " destination=\"/opt/ml/processing/input/endpoint\",\n", + " features_attribute=features_jmespath,\n", + " inference_attribute=confidence_score_jmespath,\n", + " ),\n", + " output_s3_uri=monitor_output_s3_uri,\n", + " schedule_cron_expression=schedule_expression,\n", + ")\n", + "print(\n", + " f\"Model explainability monitoring schedule: {model_explainability_monitor.monitoring_schedule_name}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "bf22401a-4662-4063-b47f-5be6becf3c3b", + "metadata": {}, + "source": [ + "#### Wait for the first execution\n", + "\n", + "The schedule starts jobs at the previously specified intervals. 
The code below waits until the time crosses the hour boundary (in UTC) to see executions kick off.\n", + "\n", + "Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule executions. The execution might start anywhere from zero to ~20 minutes after the hour boundary. This is expected and done for load balancing in the backend." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "ae00eb31-bbc7-4cf9-9fae-b323b4d380b2", + "metadata": {}, + "outputs": [], + "source": [ + "def wait_for_execution_to_start(model_monitor):\n", + " print(\n", + " \"An hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer).\"\n", + " )\n", + "\n", + " print(\"Waiting for the first execution to happen\", end=\"\")\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " while \"LastMonitoringExecutionSummary\" not in schedule_desc:\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " print()\n", + " print(\"Done! Execution has been created\")\n", + "\n", + " print(\"Now waiting for execution to start\", end=\"\")\n", + " while schedule_desc[\"LastMonitoringExecutionSummary\"][\"MonitoringExecutionStatus\"] == \"Pending\":\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(10)\n", + "\n", + " print()\n", + " print(\"Done! Execution has started\")" + ] + }, + { + "cell_type": "markdown", + "id": "16fabf1c-8458-4186-9fb2-7bfa2462b705", + "metadata": {}, + "source": [ + "**NOTE**: The following cell waits until the first monitoring execution is started. As explained above, the wait could take more than 60 minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "b512df1e-57cf-4ba3-9262-0c325c4a600e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "An hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer).\n", + "Waiting for the first execution to happen........................\n", + "Done! Execution has been created\n", + "Now waiting for execution to start.\n", + "Done! Execution has started\n" + ] + } + ], + "source": [ + "wait_for_execution_to_start(model_explainability_monitor)" + ] + }, + { + "cell_type": "markdown", + "id": "210955ae-1709-423f-98c0-ca93476eebde", + "metadata": {}, + "source": [ + "In the real world, a monitoring schedule is supposed to be active all the time. But in this example, it can be stopped to avoid incurring extra charges. A stopped schedule will not trigger further executions, but the ongoing execution will continue. If needed, the schedule can be restarted with `start_monitoring_schedule()`." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "a6980d31-c96d-4850-a7fb-c8583eeac54e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Stopping Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-41-04-758\n" + ] + } + ], + "source": [ + "model_explainability_monitor.stop_monitoring_schedule()" + ] + }, + { + "cell_type": "markdown", + "id": "117a4a1d-4410-4f60-b859-762f18f7370b", + "metadata": {}, + "source": [ + "#### Wait for the execution to finish\n", + "\n", + "In the previous cell, the first execution has started. This section waits for the execution to finish so that its analysis results are available. 
Here are the possible terminal states and what each of them means:\n", + "\n", + "* `Completed` - The monitoring execution completed, and no issues were found in the violations report.\n", + "* `CompletedWithViolations` - The execution completed, but constraint violations were detected.\n", + "* `Failed` - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Further examination of `FailureReason` and `ExitMessage` is necessary to identify what exactly happened.\n", + "* `Stopped` - The job exceeded the maximum runtime or was manually stopped." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "2b07426d-f805-4527-9863-1d3d664734fa", + "metadata": {}, + "outputs": [], + "source": [ + "# Wait until the schedule's last execution reaches a terminal status.\n", + "def wait_for_execution_to_finish(model_monitor):\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + " if execution_summary is not None:\n", + " print(\"Waiting for execution to finish\", end=\"\")\n", + " while execution_summary[\"MonitoringExecutionStatus\"] not in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + " \"Failed\",\n", + " \"Stopped\",\n", + " ]:\n", + " print(\".\", end=\"\", flush=True)\n", + " time.sleep(60)\n", + " schedule_desc = model_monitor.describe_schedule()\n", + " execution_summary = schedule_desc[\"LastMonitoringExecutionSummary\"]\n", + " print()\n", + " print(f\"Done! Execution Status: {execution_summary['MonitoringExecutionStatus']}\")\n", + " else:\n", + " print(\"Last execution not found\")" + ] + }, + { + "cell_type": "markdown", + "id": "01434010-3c04-4ef5-acd2-21a3a0035fc8", + "metadata": {}, + "source": [ + "**NOTE**: The following cell takes about 10 minutes."
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "25e36f00-f488-4a16-867f-92c53d819782", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Waiting for execution to finish........\n", + "Done! Execution Status: Completed\n" + ] + } + ], + "source": [ + "wait_for_execution_to_finish(model_explainability_monitor)" + ] + }, + { + "cell_type": "markdown", + "id": "27ecf876-5999-4c2a-adcd-0a8537f082e6", + "metadata": {}, + "source": [ + "#### Inspect execution results\n", + "\n", + "List the generated reports,\n", + "\n", + "* analysis.json includes the global SHAP values.\n", + "* report.* files are static report files to visualize the SHAP values." + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "3c767cbd-78c5-433d-a850-e230cb5a55dd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Report URI: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output/DEMO-ll-adult-pred-model-monitor-1674106124-9611/monitoring-schedule-2023-01-19-05-41-04-758/2023/01/19/06\n", + "Found Report Files:\n", + "s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output/DEMO-ll-adult-pred-model-monitor-1674106124-9611/monitoring-schedule-2023-01-19-05-41-04-758/2023/01/19/06/analysis.json\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output/DEMO-ll-adult-pred-model-monitor-1674106124-9611/monitoring-schedule-2023-01-19-05-41-04-758/2023/01/19/06/report.html\n", + " s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output/DEMO-ll-adult-pred-model-monitor-1674106124-9611/monitoring-schedule-2023-01-19-05-41-04-758/2023/01/19/06/report.ipynb\n", + " 
s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-ClarifyModelMonitor-1674106123-4d04/monitor-output/DEMO-ll-adult-pred-model-monitor-1674106124-9611/monitoring-schedule-2023-01-19-05-41-04-758/2023/01/19/06/report.pdf\n" + ] + } + ], + "source": [ + "schedule_desc = model_explainability_monitor.describe_schedule()\n", + "execution_summary = schedule_desc.get(\"LastMonitoringExecutionSummary\")\n", + "if execution_summary and execution_summary[\"MonitoringExecutionStatus\"] in [\n", + " \"Completed\",\n", + " \"CompletedWithViolations\",\n", + "]:\n", + " last_model_explainability_monitor_execution = model_explainability_monitor.list_executions()[-1]\n", + " last_model_explainability_monitor_execution_report_uri = (\n", + " last_model_explainability_monitor_execution.output.destination\n", + " )\n", + " print(f\"Report URI: {last_model_explainability_monitor_execution_report_uri}\")\n", + " last_model_explainability_monitor_execution_report_files = sorted(\n", + " sagemaker.s3.S3Downloader.list(\n", + " s3_uri=last_model_explainability_monitor_execution_report_uri,\n", + " sagemaker_session=sagemaker_session,\n", + " )\n", + " )\n", + " print(\"Found Report Files:\")\n", + " print(\"\\n \".join(last_model_explainability_monitor_execution_report_files))\n", + "else:\n", + " last_model_explainability_monitor_execution = None\n", + " print(\n", + " \"====STOP==== \\n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.\"\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "602a2ef3-4d6c-4d93-974e-77a679fc4757", + "metadata": {}, + "source": [ + "If there are any violations compared to the baseline, they are listed here. See [Feature Attribution Drift Violations](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-model-attribution-drift-violations.html) for the schema of the file, and how violations are detected." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "a7174d2e-9ee4-437f-be9a-c9d984318b76", + "metadata": {}, + "outputs": [], + "source": [ + "violations = model_explainability_monitor.latest_monitoring_constraint_violations()\n", + "if violations is not None:\n", + " pprint.PrettyPrinter(indent=4).pprint(violations.body_dict)" + ] + }, + { + "cell_type": "markdown", + "id": "1b2e3d97-27cc-4325-814d-04219d25ab76", + "metadata": {}, + "source": [ + "By default, the analysis results are also published to CloudWatch, see [CloudWatch Metrics for Feature Attribution Drift Analysis](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-feature-attribute-drift-cw.html)." + ] + }, + { + "cell_type": "markdown", + "id": "f6388287-b810-4522-bcc1-928228982388", + "metadata": {}, + "source": [ + "## Cleanup\n", + "\n", + "The endpoint can keep running and capturing data, but if there is no plan to collect more data or use this endpoint further, it should be deleted to avoid incurring additional charges. Note that deleting endpoint does not delete the data that was captured during the model invocations." 
+ ] + }, + { + "cell_type": "markdown", + "id": "554e8db8-4918-420c-9b4d-5c7263a402e7", + "metadata": {}, + "source": [ + "First, stop the worker thread:" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "f813097c-00cc-4ee4-91cc-d03b72915c67", + "metadata": {}, + "outputs": [], + "source": [ + "invoke_endpoint_thread.terminate()" + ] + }, + { + "cell_type": "markdown", + "id": "80f971c4-c1ae-4766-ab44-a30d361df523", + "metadata": {}, + "source": [ + "Then stop all monitors scheduled for the endpoint:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "e4b99289-3924-4d40-9860-75ccea76646b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Stopping Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-41-04-758\n", + "Waiting for execution to finish\n", + "Done! Execution Status: Completed\n", + "\n", + "Deleting Monitoring Schedule with name: monitoring-schedule-2023-01-19-05-41-04-758\n" + ] + } + ], + "source": [ + "model_explainability_monitor.stop_monitoring_schedule()\n", + "wait_for_execution_to_finish(model_explainability_monitor)\n", + "model_explainability_monitor.delete_monitoring_schedule()" + ] + }, + { + "cell_type": "markdown", + "id": "f2442401-06c9-481a-a04c-e339d618af54", + "metadata": {}, + "source": [ + "Finally, delete the endpoint and the model:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "d6dd0678-66d3-493d-bee4-7e2a9dab901e", + "metadata": {}, + "outputs": [], + "source": [ + "sagemaker_session.delete_endpoint(endpoint_name=endpoint_name)\n", + "sagemaker_session.delete_model(model_name=model_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8a63b56-b7fe-48f0-8ca8-ef2a9ed9721e", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3 (Data Science)", + "language": "python", + "name": 
"python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.10" + }, + "toc-autonumbering": false, + "toc-showmarkdowntxt": false + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/sagemaker_model_monitor/fairness_and_explainability_jsonlines/model/ll-adult-prediction-model.tar.gz b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/model/ll-adult-prediction-model.tar.gz new file mode 100644 index 0000000000000000000000000000000000000000..a066dbdfa33e9aef279888a83e3e9c7efc9331a6 GIT binary patch literal 950 zcmV;n14;ZJiwFP!00000|Ls*vh!jT{?%mlKSdvI&RgfGy?L}c`sGirli>r56$dW)v zScameyLP6vx~r|Kc6V4t@DRw!gWf!N5=F=*!CO$`NrDNvc}Q}Omt1m;5R$)odZuTZ zS#&jq9O|3tuI~T+=)WrDK4qA&F0Z)O8v=3>YBd@*8lt#1P;=RK8}*jkY@s@8*~qOS zRI_aKrUIHoA{7x9dTn3Dx<$UFv`6D5o}WqU-7NJur=CyGpvgD6f^YNMT! 
z1{v$+jBR-cZ_}XLQxi<0l+YG;9gJm<#1vyKd%|s$ZTF-z*tc_^t67O~RbQEz$tEh> z`rj_wn{M{_s@Zh{B4_265rBj-6F5wm8m8%@D#{g7{!|gt8f4u4m!cy$@jW6HCcZx- z4;`g&InKZ$}ZD%EiAo#9@f zD>JkUFA1Yy&Vf%NP~d{FN=yU^WvZbi(-kE07~F!>>)(A{i{>m#Uq^s8+t#k5deDR( zXF32>ZL}bRQ_Yo@yIiZ+Ys*jq=$DjO_2a4LY3H$^e&8uC#@tR25z)sUBhc5afC?=Z zQY8XU;-TBLfR_>yseox6{V9S_FX>Q$;mCx|Y)lNHtwEH(r;!Aaq>+SNE(fZg^JqND zqIfduOoVt|B8b<=ipTgO(4qa#7Vgm0r&~ojC!Xok9TLWjX1ig!AB=_4{4$Qr?Z2Ru zgF=VyJt%KKg=c+DoC*4ns}5^*Xm0vUna|empWpa%^!{E(n=7muH$VAs?djpt!+HMM zH&>4yT?e`yzVj9H(EnCe`k#`DIz58yg&&@BD_C6a&xP8@{c~Tbji{q8+|5YjsTT!(* zB)O_s*lT2pHi~V`Z7dy_)zppkw+?9j0{s4l_1N2%fq{X6fq{X6fq{X6fq{X6fq{X6 Yfq{X6fq{X6!T%Tk05Bsyv;Zgo0Nw}TWdHyG literal 0 HcmV?d00001 diff --git a/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/test-dataset.jsonl b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/test-dataset.jsonl new file mode 100644 index 0000000000..11fc0af4fa --- /dev/null +++ b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/test-dataset.jsonl @@ -0,0 +1,334 @@ +{"features":[28,2,133937,9,13,2,0,0,4,1,15024,0,55,37]} +{"features":[43,2,72338,12,14,2,12,0,1,1,0,0,40,37]} +{"features":[34,2,162604,11,9,4,2,2,2,1,0,0,40,37]} +{"features":[20,2,258509,11,9,4,6,3,2,1,0,0,40,37]} +{"features":[27,2,446947,9,13,4,0,4,2,0,0,0,55,37]} +{"features":[20,2,95552,11,9,4,11,3,4,1,0,0,40,37]} +{"features":[46,2,145636,11,9,2,3,0,4,1,3103,0,50,37]} +{"features":[18,2,150675,0,6,4,11,3,4,1,0,0,40,37]} +{"features":[22,2,197050,11,9,4,7,3,4,0,0,0,20,37]} +{"features":[20,2,246635,15,10,4,11,3,4,0,2597,0,20,37]} +{"features":[65,0,200764,11,9,6,0,1,4,0,0,0,40,37]} +{"features":[38,2,175665,15,10,2,9,5,4,0,0,0,40,37]} +{"features":[34,3,337995,9,13,0,3,4,2,1,15020,0,50,37]} +{"features":[42,2,86912,9,13,0,7,1,4,1,0,0,40,37]} +{"features":[40,2,100451,15,10,4,2,1,4,1,0,0,40,37]} +{"features":[45,2,192360,12,14,2,3,0,4,1,0,1902,50,37]} +{"features":[55,2,150507,15,10,2,0,0,4,1,0,0,40,37]} 
+{"features":[36,2,48976,9,13,2,11,5,4,0,0,0,40,37]} +{"features":[34,2,111567,15,10,4,3,1,4,1,0,0,40,37]} +{"features":[26,2,167350,15,10,2,6,0,4,1,3137,0,50,37]} +{"features":[29,2,485944,9,13,4,11,3,2,1,0,0,40,37]} +{"features":[44,1,112763,12,14,0,9,4,4,0,0,0,38,37]} +{"features":[37,5,195843,11,9,2,2,0,4,1,5013,0,40,37]} +{"features":[22,5,181096,9,13,4,9,3,2,1,0,0,20,37]} +{"features":[53,2,119170,11,9,2,13,0,2,1,0,1740,40,37]} +{"features":[61,1,205711,11,9,2,9,0,4,1,0,0,30,37]} +{"features":[46,0,260549,15,10,2,0,0,4,1,0,0,80,37]} +{"features":[18,2,129053,1,7,4,7,3,4,1,0,0,28,37]} +{"features":[22,2,209034,15,10,4,7,1,4,0,0,0,35,37]} +{"features":[29,2,266583,11,9,2,11,0,2,1,2829,0,38,37]} +{"features":[30,2,96480,8,11,4,0,3,4,0,0,0,32,37]} +{"features":[66,4,331960,11,9,2,2,0,4,1,0,0,20,37]} +{"features":[44,2,83891,9,13,0,0,3,1,1,5455,0,40,37]} +{"features":[61,5,103575,15,10,0,2,1,4,1,0,0,40,10]} +{"features":[38,2,589809,9,13,2,0,0,4,1,0,0,45,37]} +{"features":[33,2,214288,11,9,2,6,0,4,1,0,1848,48,37]} +{"features":[31,2,280927,9,13,4,3,1,4,0,0,0,40,37]} +{"features":[49,2,380922,12,14,2,3,0,4,1,15024,0,80,37]} +{"features":[34,2,361497,1,7,2,13,0,4,1,0,0,40,37]} +{"features":[37,2,306868,11,9,0,2,4,4,1,0,0,38,37]} +{"features":[17,2,364952,0,6,3,7,2,4,1,0,0,40,37]} +{"features":[60,2,338833,11,9,4,0,1,2,0,0,0,38,37]} +{"features":[30,4,70985,11,9,2,4,0,4,1,0,0,75,37]} +{"features":[22,2,240229,11,9,4,0,3,4,0,0,0,40,37]} +{"features":[51,2,173987,11,9,2,2,0,4,1,0,0,40,37]} +{"features":[29,2,157103,8,11,4,12,3,2,1,0,1974,40,37]} +{"features":[42,2,205195,11,9,2,2,0,4,1,0,0,40,37]} +{"features":[25,5,120268,15,10,2,2,3,4,1,0,0,50,37]} +{"features":[64,2,104973,11,9,2,0,0,4,1,0,0,45,37]} +{"features":[38,4,248694,15,10,2,2,0,4,1,0,0,36,37]} +{"features":[54,1,108739,1,7,6,10,4,2,0,0,0,40,37]} +{"features":[57,2,151874,11,9,2,7,5,2,0,0,0,50,37]} +{"features":[27,2,150767,15,10,4,6,3,4,1,0,0,48,37]} +{"features":[53,2,239155,15,10,2,3,0,4,1,0,0,50,37]} 
+{"features":[35,2,166497,14,15,2,9,0,4,1,0,1902,60,37]} +{"features":[22,2,50610,15,10,4,7,1,4,0,0,0,40,37]} +{"features":[52,2,335997,9,13,2,12,0,4,1,7688,0,38,37]} +{"features":[27,4,209301,11,9,2,2,0,4,1,0,0,60,37]} +{"features":[26,2,247196,15,10,4,5,3,4,1,0,0,35,37]} +{"features":[23,2,213902,15,10,4,7,4,4,0,0,0,20,37]} +{"features":[25,1,281412,11,9,4,7,3,4,0,0,0,35,37]} +{"features":[17,2,154337,1,7,4,7,3,4,0,0,0,13,37]} +{"features":[22,2,95647,1,7,4,13,3,1,1,0,0,40,28]} +{"features":[32,2,177695,9,13,2,2,0,1,1,0,0,45,17]} +{"features":[54,2,64421,15,10,6,12,4,4,0,0,0,40,37]} +{"features":[45,2,176341,11,9,0,7,4,4,0,0,0,32,37]} +{"features":[20,2,203914,2,8,4,7,3,4,0,0,0,25,37]} +{"features":[22,2,23940,11,9,4,3,1,1,1,0,0,40,37]} +{"features":[32,2,169768,9,13,5,12,1,2,1,0,0,40,37]} +{"features":[36,2,109133,9,13,2,11,0,4,1,0,0,50,37]} +{"features":[33,2,41610,11,9,5,2,1,4,1,0,0,40,37]} +{"features":[37,2,33440,11,9,5,7,4,4,0,0,0,40,37]} +{"features":[46,2,151325,0,6,2,2,0,4,1,0,0,40,37]} +{"features":[54,1,182429,11,9,6,13,4,4,0,0,0,38,37]} +{"features":[34,2,195748,7,12,4,0,3,2,0,0,0,38,37]} +{"features":[22,2,248446,4,3,4,8,1,4,1,0,0,50,12]} +{"features":[42,2,188789,5,4,6,5,1,4,0,0,0,35,37]} +{"features":[34,2,185480,7,12,4,0,3,4,0,0,0,40,37]} +{"features":[39,2,30875,9,13,0,11,4,4,0,0,0,40,37]} +{"features":[21,2,116489,15,10,4,9,3,4,0,0,0,40,37]} +{"features":[18,2,99591,1,7,4,7,3,4,0,0,0,16,37]} +{"features":[43,2,282678,11,9,0,3,1,4,0,0,0,60,37]} +{"features":[56,1,238405,11,9,6,0,1,4,0,0,0,40,37]} +{"features":[32,1,247156,11,9,2,7,0,2,1,3103,0,38,37]} +{"features":[19,2,73461,11,9,4,12,1,2,1,0,0,40,37]} +{"features":[35,2,98776,11,9,4,3,1,4,1,0,0,60,37]} +{"features":[30,2,232766,11,9,0,7,4,4,0,0,0,40,37]} +{"features":[32,2,220333,11,9,2,2,0,4,1,7298,0,46,37]} +{"features":[27,2,321456,15,10,2,10,0,4,1,0,0,40,37]} +{"features":[41,2,173307,11,9,2,13,0,4,1,0,0,43,37]} +{"features":[22,2,351952,15,10,4,0,3,4,0,0,0,38,37]} 
+{"features":[33,2,108438,15,10,2,3,0,4,1,0,0,60,37]} +{"features":[30,2,171483,11,9,4,2,3,4,1,0,0,38,37]} +{"features":[32,2,453983,11,9,2,5,0,4,1,0,0,44,37]} +{"features":[37,2,48779,11,9,4,3,1,4,1,0,0,50,37]} +{"features":[42,2,222756,9,13,0,9,4,4,1,7430,0,40,37]} +{"features":[49,2,118520,11,9,0,0,1,4,0,0,0,45,37]} +{"features":[34,2,199539,8,11,2,2,0,4,1,0,0,48,37]} +{"features":[42,2,201343,11,9,2,2,0,4,1,2885,0,40,37]} +{"features":[49,2,99340,4,3,5,6,4,4,0,0,0,40,5]} +{"features":[48,2,163706,9,13,2,3,0,4,1,15024,0,70,37]} +{"features":[59,2,176118,12,14,2,9,0,4,1,0,0,7,37]} +{"features":[67,3,147377,11,9,2,3,0,4,1,0,0,45,37]} +{"features":[36,2,225330,11,9,0,7,4,4,0,0,0,40,37]} +{"features":[32,2,147921,14,15,4,7,1,4,0,0,0,35,37]} +{"features":[36,2,110013,12,14,4,11,1,4,0,0,0,40,37]} +{"features":[76,4,130585,15,10,2,7,5,4,0,0,0,12,37]} +{"features":[41,4,134724,8,11,2,7,5,4,0,3103,0,40,37]} +{"features":[44,2,160369,15,10,2,8,0,4,1,0,0,2,37]} +{"features":[24,2,172169,15,10,4,5,4,4,1,0,0,30,37]} +{"features":[35,2,106471,9,13,4,2,1,4,1,0,0,35,37]} +{"features":[25,1,336320,9,13,0,10,1,4,0,0,0,40,37]} +{"features":[62,2,186446,15,10,0,12,4,4,0,0,0,43,37]} +{"features":[39,2,183279,9,13,2,11,0,4,1,7298,0,40,37]} +{"features":[65,4,135517,5,4,2,2,0,4,1,0,0,40,37]} +{"features":[48,0,72808,1,7,0,0,1,4,0,0,0,42,37]} +{"features":[56,2,197577,11,9,0,7,1,4,0,0,0,40,37]} +{"features":[51,3,110327,1,7,2,2,0,4,1,0,0,60,37]} +{"features":[23,2,237811,15,10,4,0,4,2,0,0,0,40,36]} +{"features":[18,2,632271,15,10,3,0,2,4,0,0,0,40,27]} +{"features":[18,2,220754,1,7,4,5,3,4,1,0,0,24,37]} +{"features":[61,2,29797,11,9,0,11,2,4,0,0,0,40,37]} +{"features":[32,2,183470,8,11,2,2,0,0,1,0,0,42,37]} +{"features":[36,2,127388,7,12,2,11,5,4,0,0,0,40,37]} +{"features":[19,2,78401,11,9,4,7,3,4,1,0,0,40,37]} +{"features":[37,2,385330,5,4,5,7,4,2,1,0,0,40,37]} +{"features":[53,2,161691,12,14,0,3,1,4,0,4865,0,40,37]} +{"features":[31,2,301251,9,13,2,2,0,4,1,0,0,50,37]} 
+{"features":[30,2,198660,11,9,2,5,0,4,1,0,0,40,37]} +{"features":[44,2,105896,9,13,0,9,1,4,0,0,0,36,37]} +{"features":[23,2,132220,11,9,2,5,0,4,1,0,0,40,37]} +{"features":[45,1,317846,7,12,0,3,4,4,1,0,0,47,37]} +{"features":[32,2,33117,8,11,2,7,0,4,1,0,0,40,37]} +{"features":[41,2,192602,15,10,2,2,0,4,1,0,0,40,37]} +{"features":[30,2,408328,13,1,3,5,4,4,1,0,0,40,24]} +{"features":[34,2,233729,7,12,2,9,0,2,1,0,0,50,37]} +{"features":[21,2,174063,8,11,4,7,3,4,0,0,0,20,37]} +{"features":[30,2,175323,8,11,2,3,5,4,0,0,0,52,37]} +{"features":[20,2,460356,2,8,4,7,1,4,1,0,0,30,24]} +{"features":[33,2,119422,11,9,2,3,0,4,1,0,0,40,37]} +{"features":[26,2,269168,15,10,2,3,0,1,1,0,0,40,37]} +{"features":[21,5,173534,15,10,4,9,3,4,0,0,0,40,6]} +{"features":[48,2,235891,11,9,4,7,1,4,1,0,0,40,31]} +{"features":[70,3,217801,9,13,2,11,0,4,1,0,0,15,37]} +{"features":[52,1,251841,12,14,4,9,1,4,0,0,0,50,37]} +{"features":[24,2,196943,8,11,2,9,0,4,1,0,0,40,37]} +{"features":[41,2,204415,1,7,0,5,1,4,1,0,0,48,37]} +{"features":[23,2,130959,9,13,2,9,0,4,1,2407,0,6,1]} +{"features":[46,2,316271,4,3,2,2,0,4,1,0,0,55,37]} +{"features":[59,2,124137,11,9,0,11,1,4,1,2202,0,40,37]} +{"features":[36,4,140676,9,13,4,11,1,4,1,0,0,50,37]} +{"features":[52,2,91506,11,9,2,5,0,4,1,0,0,45,37]} +{"features":[40,2,300195,15,10,0,12,4,2,0,0,0,40,37]} +{"features":[51,3,119570,9,13,2,2,0,4,1,0,0,50,37]} +{"features":[43,2,303155,9,13,2,3,0,4,1,0,0,50,37]} +{"features":[30,2,210541,11,9,0,2,1,4,0,0,0,40,37]} +{"features":[48,2,153312,15,10,2,11,0,2,1,0,0,60,37]} +{"features":[50,5,137815,9,13,2,2,0,4,1,0,0,40,37]} +{"features":[38,4,179824,11,9,4,4,1,4,1,0,0,50,37]} +{"features":[41,2,106159,11,9,4,6,3,4,1,14344,0,48,37]} +{"features":[69,2,104827,11,9,6,12,4,4,0,0,0,8,37]} +{"features":[21,2,278254,15,10,4,5,3,2,1,0,0,40,37]} +{"features":[33,3,287372,15,10,2,3,0,4,1,0,0,50,37]} +{"features":[51,5,152810,8,11,2,12,0,4,1,0,0,40,37]} +{"features":[46,2,106662,9,13,5,11,1,4,1,99999,0,55,37]} 
+{"features":[35,2,108140,11,9,0,2,1,4,1,0,0,40,37]} +{"features":[29,2,231507,11,9,4,2,1,4,1,0,0,35,37]} +{"features":[34,4,114074,8,11,6,3,4,4,0,0,0,40,37]} +{"features":[52,2,163776,11,9,2,11,0,4,1,0,1902,60,37]} +{"features":[45,2,123219,4,3,4,6,1,4,1,0,0,40,37]} +{"features":[25,2,391591,11,9,4,2,1,4,1,0,0,50,37]} +{"features":[61,1,202384,9,13,2,9,5,4,0,0,0,30,37]} +{"features":[58,2,282023,9,13,2,3,0,4,1,0,0,50,37]} +{"features":[51,5,22211,11,9,0,3,1,4,1,0,0,37,37]} +{"features":[27,2,192936,9,13,4,9,1,4,0,0,0,45,37]} +{"features":[51,1,106365,7,12,0,0,4,4,0,0,0,40,37]} +{"features":[51,2,166461,1,7,0,6,4,2,0,5455,0,40,37]} +{"features":[52,2,251585,0,6,2,13,0,4,1,0,0,55,37]} +{"features":[61,1,149981,11,9,6,0,1,4,0,0,0,40,37]} +{"features":[23,2,161092,9,13,4,0,3,4,1,0,0,40,37]} +{"features":[40,2,21755,15,10,4,2,2,0,1,0,0,30,37]} +{"features":[20,2,174436,11,9,4,2,3,4,1,0,0,60,37]} +{"features":[26,4,33016,8,11,0,7,4,4,0,0,0,55,37]} +{"features":[55,1,134042,12,14,2,3,5,4,0,0,0,40,37]} +{"features":[32,2,259425,15,10,0,2,1,4,1,0,0,40,37]} +{"features":[26,2,359854,9,13,4,8,2,4,0,0,0,35,24]} +{"features":[44,2,217039,14,15,2,9,0,4,1,99999,0,60,37]} +{"features":[61,2,194804,13,1,5,13,1,2,1,14344,0,40,37]} +{"features":[34,4,198068,11,9,2,2,0,4,1,0,0,40,37]} +{"features":[42,4,52131,15,10,4,3,1,4,1,0,0,40,37]} +{"features":[23,2,239539,11,9,4,6,3,1,1,0,0,40,28]} +{"features":[25,2,54298,11,9,2,11,0,4,1,0,0,30,37]} +{"features":[17,2,35603,2,8,4,11,3,4,0,0,0,20,37]} +{"features":[31,2,241880,8,11,4,0,1,2,1,0,0,45,37]} +{"features":[35,2,46947,15,10,0,0,1,4,0,0,0,45,37]} +{"features":[28,2,203171,15,10,0,2,1,4,1,0,0,40,37]} +{"features":[37,2,199739,15,10,0,2,3,4,1,0,0,40,37]} +{"features":[23,2,215395,15,10,4,2,1,4,1,0,0,40,37]} +{"features":[53,2,117932,11,9,0,6,1,4,0,0,0,40,37]} +{"features":[30,5,107142,9,13,2,9,0,4,1,0,0,37,37]} +{"features":[33,2,173730,8,11,2,6,0,4,1,0,0,40,37]} +{"features":[53,3,200400,10,16,0,3,1,4,1,0,0,60,37]} 
+{"features":[50,2,158948,11,9,2,9,0,4,1,0,0,84,37]} +{"features":[39,2,206888,15,10,0,0,1,4,0,0,0,40,37]} +{"features":[26,2,124483,9,13,4,9,1,1,1,0,0,25,17]} +{"features":[34,5,62327,9,13,2,9,0,4,1,0,0,40,37]} +{"features":[26,2,366889,11,9,4,13,1,4,1,0,0,40,37]} +{"features":[21,2,30796,15,10,4,7,3,4,0,0,0,25,37]} +{"features":[46,2,130667,11,9,2,13,0,2,1,0,0,40,37]} +{"features":[67,0,231604,11,9,4,0,1,4,1,0,0,40,37]} +{"features":[25,2,332409,8,11,2,2,0,4,1,0,0,40,37]} +{"features":[34,2,51854,11,9,4,6,1,4,1,0,0,40,37]} +{"features":[50,2,62593,8,11,2,4,0,1,1,0,0,40,37]} +{"features":[47,2,78954,1,7,0,11,4,4,0,0,0,28,37]} +{"features":[39,2,205997,15,10,2,11,5,4,0,0,0,21,37]} +{"features":[51,2,231230,11,9,2,6,0,4,1,0,0,45,37]} +{"features":[62,2,291904,11,9,0,8,1,2,0,0,0,20,37]} +{"features":[58,2,49893,12,14,2,3,0,4,1,0,0,50,37]} +{"features":[36,2,141584,15,10,2,9,0,4,1,0,0,50,37]} +{"features":[28,2,259609,11,9,4,2,3,4,1,0,0,50,37]} +{"features":[22,2,125010,9,13,4,0,1,4,0,0,0,20,37]} +{"features":[59,5,136819,12,14,2,9,0,4,1,0,0,8,37]} +{"features":[69,4,199829,9,13,2,3,0,4,1,0,1258,40,37]} +{"features":[33,4,100580,15,10,2,7,5,4,0,0,0,10,37]} +{"features":[56,2,257555,12,14,2,9,0,4,1,0,0,40,37]} +{"features":[47,2,100113,5,4,2,13,0,4,1,0,2051,40,37]} +{"features":[38,0,236648,11,9,2,2,0,4,1,0,0,40,37]} +{"features":[41,2,99679,0,6,2,2,0,4,1,0,0,40,37]} +{"features":[32,2,339482,12,14,4,3,1,4,1,0,0,48,37]} +{"features":[28,2,120475,11,9,4,2,1,4,1,0,0,35,37]} +{"features":[22,2,137876,15,10,4,10,1,4,1,0,0,20,37]} +{"features":[36,4,110861,11,9,0,2,3,4,1,0,0,20,37]} +{"features":[55,4,225623,15,10,2,4,0,4,1,0,0,40,37]} +{"features":[47,2,323212,11,9,6,7,1,4,0,0,0,40,37]} +{"features":[59,2,157831,11,9,0,0,1,4,0,0,0,16,37]} +{"features":[25,2,25497,15,10,4,13,1,4,1,4101,0,40,37]} +{"features":[42,4,114580,12,14,0,3,4,4,0,0,0,70,37]} +{"features":[22,2,273675,11,9,3,7,2,2,0,0,0,35,31]} +{"features":[31,0,40909,15,10,2,12,0,2,1,0,0,40,37]} 
+{"features":[42,3,557349,9,13,2,3,0,4,1,0,0,70,37]} +{"features":[18,2,219256,15,10,4,11,3,4,0,0,0,25,37]} +{"features":[39,2,126569,11,9,4,2,1,4,1,0,0,40,29]} +{"features":[37,2,108282,9,13,2,3,0,4,1,0,0,45,37]} +{"features":[31,2,147270,15,10,4,0,3,4,0,0,0,35,37]} +{"features":[44,2,90582,9,13,2,2,0,4,1,0,0,50,37]} +{"features":[51,2,379797,0,6,2,6,0,2,1,0,0,40,37]} +{"features":[37,1,136749,11,9,4,0,3,4,0,0,0,35,37]} +{"features":[25,0,198813,9,13,4,0,4,2,0,0,1590,40,37]} +{"features":[30,2,159123,11,9,2,2,0,4,1,0,0,45,37]} +{"features":[36,3,196554,11,9,2,2,0,4,1,0,0,46,37]} +{"features":[31,2,238002,9,13,2,13,0,4,1,0,0,55,24]} +{"features":[43,2,125577,11,9,5,0,4,2,0,0,0,40,37]} +{"features":[22,2,97212,11,9,4,7,1,4,0,0,0,15,37]} +{"features":[19,2,222866,0,6,4,4,2,4,1,0,0,40,37]} +{"features":[18,2,175752,11,9,4,5,3,4,1,0,0,30,37]} +{"features":[28,2,77009,15,10,4,11,2,4,0,0,0,40,37]} +{"features":[54,2,162745,11,9,2,2,0,4,1,0,0,55,37]} +{"features":[30,2,94235,9,13,2,9,0,4,1,0,1977,50,37]} +{"features":[19,2,158343,15,10,4,7,3,4,0,0,0,12,37]} +{"features":[49,2,201127,1,7,2,13,0,4,1,0,1902,70,37]} +{"features":[39,2,118429,15,10,0,11,1,4,1,0,0,40,37]} +{"features":[36,2,334365,1,7,2,13,0,4,1,0,0,60,37]} +{"features":[42,2,89226,8,11,2,13,0,4,1,0,0,45,37]} +{"features":[33,2,56121,11,9,4,13,1,4,1,0,0,60,37]} +{"features":[61,5,140851,9,13,2,9,0,4,1,0,0,40,37]} +{"features":[36,2,86643,2,8,2,6,0,4,1,0,0,48,37]} +{"features":[20,2,175808,11,9,4,2,3,4,1,0,0,40,37]} +{"features":[19,2,58471,11,9,4,2,3,4,0,0,0,40,37]} +{"features":[55,2,118057,11,9,6,2,4,4,1,0,0,51,37]} +{"features":[30,2,192002,15,10,2,2,0,4,1,0,0,40,37]} +{"features":[61,2,43904,11,9,0,7,1,2,1,0,0,40,37]} +{"features":[39,3,31709,15,10,2,0,5,4,0,0,0,20,37]} +{"features":[39,2,286026,9,13,2,2,0,4,1,0,0,52,37]} +{"features":[55,4,110844,11,9,2,3,5,4,0,0,0,40,37]} +{"features":[32,2,200401,11,9,4,3,1,4,1,0,0,40,3]} +{"features":[44,5,101603,9,13,2,3,0,4,1,0,0,40,37]} 
+{"features":[58,2,49159,11,9,2,0,5,4,0,0,0,40,37]} +{"features":[52,5,168035,15,10,2,12,0,4,1,0,0,45,37]} +{"features":[18,2,260977,2,8,4,11,3,4,0,0,0,20,37]} +{"features":[47,2,33794,11,9,2,2,0,4,1,0,0,56,37]} +{"features":[26,2,242464,8,11,4,3,1,4,1,0,0,50,37]} +{"features":[35,2,97554,7,12,2,3,0,4,1,0,0,50,37]} +{"features":[39,4,245361,15,10,4,9,3,4,0,0,0,10,37]} +{"features":[26,2,178478,15,10,4,11,3,4,0,0,0,40,37]} +{"features":[31,2,104509,15,10,5,7,4,4,0,0,0,35,37]} +{"features":[31,2,159187,15,10,2,2,0,4,1,0,0,25,37]} +{"features":[67,4,167015,9,13,6,11,1,4,1,0,0,30,37]} +{"features":[40,2,199668,11,9,0,11,3,4,0,0,0,25,37]} +{"features":[35,2,37778,11,9,2,2,0,4,1,0,0,50,37]} +{"features":[54,4,139023,15,10,2,11,0,4,1,0,0,40,37]} +{"features":[45,3,188694,14,15,2,9,0,4,1,0,0,50,37]} +{"features":[50,2,178251,12,14,2,0,5,4,0,0,0,40,37]} +{"features":[51,2,81534,1,7,4,7,2,1,1,0,0,35,37]} +{"features":[37,2,353550,12,14,2,3,0,4,1,15024,0,60,37]} +{"features":[54,1,231482,11,9,2,2,0,4,1,0,0,40,30]} +{"features":[22,2,228394,11,9,4,7,1,4,0,0,0,50,37]} +{"features":[38,1,94529,11,9,2,5,5,4,0,3103,0,50,37]} +{"features":[35,2,135289,8,11,0,2,1,4,1,0,0,50,37]} +{"features":[37,0,32950,7,12,0,3,4,2,0,0,0,40,37]} +{"features":[45,2,165346,15,10,0,3,4,4,0,0,0,64,37]} +{"features":[57,1,62701,15,10,6,3,1,4,1,6849,0,40,37]} +{"features":[30,2,49358,2,8,4,11,3,2,0,0,0,40,37]} +{"features":[52,2,227832,9,13,2,9,0,4,1,0,0,50,37]} +{"features":[67,2,188903,9,13,2,9,0,4,1,0,0,40,37]} +{"features":[28,4,183151,11,9,2,2,0,4,1,0,0,40,37]} +{"features":[42,5,116493,9,13,2,10,0,4,1,0,0,52,37]} +{"features":[48,1,93449,14,15,2,9,0,1,1,99999,0,40,28]} +{"features":[18,2,211683,2,8,4,5,3,4,1,0,0,20,37]} +{"features":[47,2,155107,11,9,2,12,0,4,1,0,0,40,37]} +{"features":[55,3,150917,15,10,2,3,0,4,1,0,1977,45,37]} +{"features":[51,2,135388,2,8,6,6,1,4,1,0,1564,40,37]} +{"features":[38,2,183683,0,6,3,7,1,4,1,0,0,45,37]} +{"features":[47,4,185859,11,9,2,4,0,4,1,3103,0,60,37]} 
+{"features":[44,4,22933,11,9,2,3,0,4,1,0,0,40,37]} +{"features":[40,2,356934,14,15,2,3,0,4,1,0,0,50,37]} +{"features":[52,2,94448,8,11,2,9,0,4,1,0,0,40,37]} +{"features":[59,2,107318,5,4,2,2,0,4,1,5178,0,50,37]} +{"features":[31,2,83413,11,9,4,11,3,4,1,0,0,40,37]} +{"features":[34,2,162312,9,13,2,0,0,1,1,0,0,40,28]} +{"features":[44,2,118212,0,6,2,6,0,4,1,0,0,40,37]} +{"features":[35,1,132879,11,9,2,13,0,4,1,0,0,40,37]} +{"features":[25,4,121285,9,13,4,11,1,4,0,0,0,40,37]} +{"features":[22,2,341760,9,13,4,3,3,4,0,0,0,40,37]} +{"features":[35,2,216473,11,9,0,2,4,4,1,0,0,40,37]} +{"features":[25,2,179255,15,10,4,0,3,4,0,0,0,25,37]} +{"features":[36,2,298635,9,13,2,7,0,3,1,0,0,40,18]} +{"features":[20,2,204596,15,10,4,11,3,4,0,0,0,32,37]} +{"features":[27,2,285897,11,9,2,13,0,4,1,0,1887,40,37]} +{"features":[19,2,386492,15,10,4,5,3,4,1,0,0,16,37]} +{"features":[29,2,178610,15,10,0,7,4,4,0,0,0,21,37]} +{"features":[49,2,96854,11,9,0,7,4,4,1,0,0,40,37]} +{"features":[45,2,293628,15,10,2,9,0,4,1,0,0,50,28]} +{"features":[67,2,192995,11,9,6,0,4,4,0,6723,0,40,37]} +{"features":[30,2,235847,9,13,4,7,3,4,0,0,0,24,37]} diff --git a/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/validation-dataset.jsonl b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/validation-dataset.jsonl new file mode 100644 index 0000000000..0149a084a9 --- /dev/null +++ b/sagemaker_model_monitor/fairness_and_explainability_jsonlines/test_data/validation-dataset.jsonl @@ -0,0 +1,666 @@ +{"features":[41,2,220531,14,15,2,9,0,4,1,0,0,60,38],"label":1} +{"features":[33,2,35378,9,13,2,11,5,4,0,0,0,45,38],"label":1} +{"features":[36,2,223433,12,14,2,11,0,4,1,7688,0,50,38],"label":1} +{"features":[40,2,220589,7,12,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[30,2,231413,15,10,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[33,4,218164,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[42,2,213464,15,10,2,2,0,4,1,0,0,40,38],"label":0} 
+{"features":[20,2,247794,11,9,4,11,1,4,0,0,0,84,38],"label":0} +{"features":[43,2,174575,15,10,0,0,1,4,1,0,0,45,38],"label":0} +{"features":[42,4,54202,14,15,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[27,2,126060,11,9,4,3,1,4,0,0,0,40,38],"label":0} +{"features":[25,2,182866,11,9,4,5,3,4,1,0,0,40,38],"label":0} +{"features":[43,2,302041,11,9,4,0,1,2,0,0,0,40,38],"label":0} +{"features":[30,2,91145,11,9,4,5,4,4,1,0,0,55,38],"label":0} +{"features":[41,2,648223,3,2,3,4,4,4,1,0,0,40,25],"label":0} +{"features":[60,2,101096,10,16,4,9,1,4,0,0,0,65,38],"label":1} +{"features":[45,3,197332,15,10,2,2,0,4,1,0,0,55,38],"label":1} +{"features":[42,2,174112,12,14,4,9,1,4,0,0,0,40,38],"label":0} +{"features":[36,2,183902,9,13,2,9,5,4,0,0,0,4,38],"label":1} +{"features":[76,2,199949,9,13,2,0,0,4,1,20051,0,50,38],"label":1} +{"features":[45,0,71823,15,10,2,0,0,2,1,0,0,20,38],"label":0} +{"features":[37,2,147258,6,5,2,6,0,4,1,0,0,50,38],"label":1} +{"features":[41,2,119079,11,9,2,11,0,4,1,0,0,49,38],"label":1} +{"features":[38,2,193961,15,10,2,2,0,1,1,0,0,40,29],"label":1} +{"features":[76,2,125784,9,13,2,3,0,4,1,0,0,40,38],"label":0} +{"features":[45,2,155659,9,13,2,9,0,4,1,0,0,60,38],"label":1} +{"features":[30,2,345122,14,15,2,9,0,4,1,0,0,50,38],"label":0} +{"features":[30,2,171598,9,13,3,11,1,4,0,0,0,50,38],"label":0} +{"features":[58,3,78104,15,10,2,3,0,4,1,7298,0,60,38],"label":1} +{"features":[37,2,224541,15,10,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[17,2,369909,0,6,4,7,3,4,1,0,0,20,38],"label":0} +{"features":[45,2,204205,5,4,0,6,1,4,1,0,0,48,38],"label":0} +{"features":[64,2,180401,0,6,2,13,0,4,1,0,0,40,38],"label":1} +{"features":[49,2,129513,11,9,2,13,0,4,1,0,0,50,38],"label":1} +{"features":[23,2,125491,15,10,4,7,1,1,0,0,0,35,39],"label":0} +{"features":[20,0,410446,11,9,4,0,2,4,1,0,0,20,38],"label":0} +{"features":[51,2,259323,9,13,2,3,0,4,1,0,0,50,38],"label":1} +{"features":[44,2,206686,15,10,0,0,4,4,0,0,0,40,38],"label":0} 
+{"features":[22,2,106700,7,12,4,0,3,4,0,0,0,27,38],"label":0} +{"features":[47,2,185041,15,10,2,2,0,4,1,7298,0,40,38],"label":1} +{"features":[30,2,327202,2,8,4,2,1,2,1,0,0,40,38],"label":0} +{"features":[35,2,136343,11,9,4,11,1,4,1,0,0,40,38],"label":0} +{"features":[47,1,287320,12,14,4,9,1,4,1,0,0,40,38],"label":0} +{"features":[27,5,553473,9,13,2,10,5,2,0,0,0,48,38],"label":0} +{"features":[43,2,462180,14,15,2,9,0,4,1,99999,0,60,38],"label":1} +{"features":[49,1,34021,9,13,4,9,3,4,0,0,0,50,38],"label":0} +{"features":[43,2,350379,4,3,0,8,4,4,0,0,0,40,25],"label":0} +{"features":[44,2,174283,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[39,2,164733,15,10,0,0,1,4,0,0,0,45,38],"label":0} +{"features":[37,2,124293,15,10,2,0,0,4,1,0,0,50,38],"label":0} +{"features":[36,1,110791,7,12,5,0,4,4,0,0,0,40,38],"label":0} +{"features":[26,2,195994,15,10,4,11,1,4,0,0,0,15,38],"label":0} +{"features":[52,4,72257,15,10,2,11,0,4,1,0,0,50,38],"label":0} +{"features":[20,2,231981,15,10,4,13,1,4,1,0,0,32,38],"label":0} +{"features":[43,2,346321,12,14,2,9,0,4,1,0,0,45,38],"label":1} +{"features":[28,2,412149,0,6,4,4,2,4,1,0,0,35,25],"label":0} +{"features":[61,2,128848,11,9,2,6,0,4,1,3471,0,40,38],"label":0} +{"features":[46,3,168796,9,13,2,11,0,4,1,0,0,55,38],"label":0} +{"features":[36,2,185099,14,15,2,9,0,4,1,0,0,55,38],"label":1} +{"features":[40,3,50644,7,12,0,11,4,4,0,1506,0,40,38],"label":0} +{"features":[32,2,340917,11,9,4,5,1,4,1,0,0,40,38],"label":0} +{"features":[46,2,175625,14,15,0,9,4,4,0,0,0,40,38],"label":0} +{"features":[43,2,216697,15,10,2,10,0,3,1,0,0,32,38],"label":0} +{"features":[36,2,389725,15,10,0,0,1,4,1,0,0,45,38],"label":0} +{"features":[28,4,192838,8,11,2,2,0,4,1,0,0,45,38],"label":0} +{"features":[55,0,35723,12,14,2,3,0,4,1,0,0,60,38],"label":1} +{"features":[39,2,270059,15,10,0,0,4,4,0,0,0,35,38],"label":0} +{"features":[44,2,116825,14,15,2,9,0,4,1,15024,0,80,38],"label":1} +{"features":[23,1,324637,15,10,4,0,1,4,1,0,0,30,38],"label":0} 
+{"features":[28,2,160731,11,9,2,2,0,4,1,0,0,40,30],"label":1} +{"features":[53,1,216931,15,10,2,10,0,4,1,4386,0,40,38],"label":1} +{"features":[59,2,243226,0,6,0,6,1,4,0,0,0,40,38],"label":0} +{"features":[19,2,63918,15,10,4,0,1,4,1,0,0,40,38],"label":0} +{"features":[38,2,52963,9,13,4,0,1,4,0,0,0,50,38],"label":0} +{"features":[17,2,268276,2,8,4,7,3,4,1,0,0,12,38],"label":0} +{"features":[39,2,114079,7,12,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[61,2,130684,15,10,2,9,0,4,1,0,0,42,38],"label":0} +{"features":[37,2,245053,15,10,0,5,3,4,1,0,1504,40,38],"label":0} +{"features":[40,2,53835,9,13,2,11,0,4,1,0,0,50,38],"label":1} +{"features":[41,2,225892,15,10,2,2,0,4,1,0,0,48,38],"label":1} +{"features":[31,2,131425,9,13,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[40,2,71305,11,9,2,7,0,2,1,0,0,40,38],"label":0} +{"features":[46,0,167381,11,9,2,0,5,4,0,0,0,40,38],"label":1} +{"features":[45,2,187730,9,13,4,9,3,4,1,0,0,40,38],"label":0} +{"features":[48,2,95661,15,10,4,0,1,4,0,0,0,43,38],"label":0} +{"features":[39,2,150217,15,10,0,11,1,4,0,0,0,38,38],"label":0} +{"features":[28,5,37250,9,13,4,9,3,4,1,0,0,16,38],"label":0} +{"features":[18,2,27920,1,7,4,3,3,4,0,0,0,25,38],"label":0} +{"features":[22,2,129172,15,10,4,7,3,4,1,0,0,16,38],"label":0} +{"features":[28,2,138054,7,12,4,7,1,3,1,0,0,40,38],"label":0} +{"features":[50,2,33304,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[52,2,110977,10,16,4,3,1,4,1,0,0,40,38],"label":1} +{"features":[50,2,172175,14,15,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[37,3,107164,0,6,4,13,1,4,1,0,2559,50,38],"label":1} +{"features":[38,2,160808,11,9,2,2,0,2,1,4386,0,48,38],"label":0} +{"features":[57,3,51016,11,9,2,3,0,4,1,0,0,60,38],"label":1} +{"features":[34,2,253438,15,10,2,3,0,4,1,0,0,60,38],"label":1} +{"features":[38,2,185330,15,10,4,2,3,4,0,0,0,25,38],"label":0} +{"features":[33,4,24504,11,9,5,2,2,4,1,0,0,50,38],"label":0} +{"features":[37,2,278632,6,5,2,13,0,4,1,0,0,40,38],"label":0} 
+{"features":[66,5,102640,11,9,6,9,4,2,0,0,0,35,38],"label":0} +{"features":[35,2,168675,11,9,5,13,3,4,1,0,0,50,38],"label":0} +{"features":[37,3,86459,7,12,5,3,4,4,1,0,0,50,38],"label":0} +{"features":[51,2,138847,9,13,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[36,2,163290,15,10,0,11,4,4,0,0,0,40,38],"label":0} +{"features":[33,2,134886,15,10,4,0,3,4,0,99999,0,30,38],"label":1} +{"features":[50,2,271262,11,9,2,13,0,4,1,0,0,40,38],"label":1} +{"features":[37,2,186191,11,9,2,6,0,4,1,0,0,46,38],"label":0} +{"features":[59,2,261816,15,10,0,3,1,4,0,0,0,52,27],"label":0} +{"features":[63,2,174018,15,10,2,11,0,2,1,0,0,40,38],"label":1} +{"features":[33,2,124827,11,9,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[39,2,318416,0,6,5,7,3,2,0,0,0,12,38],"label":0} +{"features":[36,2,214816,11,9,4,2,1,4,0,0,0,40,38],"label":0} +{"features":[50,2,34832,9,13,2,12,0,4,1,15024,0,40,38],"label":1} +{"features":[29,2,413297,7,12,4,11,1,4,1,0,0,45,25],"label":0} +{"features":[44,2,68748,15,10,2,11,0,4,1,0,0,48,38],"label":0} +{"features":[47,5,156417,15,10,0,9,4,4,1,0,0,20,38],"label":0} +{"features":[26,2,302603,11,9,4,13,3,4,1,0,0,45,38],"label":0} +{"features":[58,4,106942,15,10,0,2,4,4,1,0,0,40,38],"label":0} +{"features":[28,2,203776,0,6,2,2,0,4,1,0,0,50,38],"label":0} +{"features":[17,1,173497,1,7,4,9,3,2,1,0,0,15,38],"label":0} +{"features":[66,0,47358,0,6,2,2,0,4,1,3471,0,40,38],"label":0} +{"features":[50,2,174102,11,9,0,2,3,4,1,0,0,40,32],"label":0} +{"features":[33,2,119176,15,10,6,0,4,4,0,0,0,40,38],"label":0} +{"features":[36,4,219611,9,13,4,11,1,2,0,2174,0,50,38],"label":0} +{"features":[48,2,102102,8,11,2,12,0,4,1,0,0,50,38],"label":1} +{"features":[20,2,157541,15,10,4,2,3,4,1,0,0,40,38],"label":0} +{"features":[68,2,218637,15,10,2,11,0,4,1,0,2377,55,38],"label":1} +{"features":[27,2,198258,9,13,4,11,3,4,1,0,0,35,38],"label":0} +{"features":[29,2,110134,15,10,0,6,1,4,1,0,0,40,38],"label":0} +{"features":[65,5,29276,5,4,6,7,2,4,0,0,0,24,38],"label":0} 
+{"features":[38,2,33001,9,13,2,3,0,4,1,0,0,55,38],"label":1} +{"features":[43,4,277647,11,9,2,3,0,4,1,0,0,35,38],"label":0} +{"features":[39,2,214816,9,13,2,3,0,4,1,0,0,60,38],"label":0} +{"features":[52,4,237868,15,10,4,0,4,4,1,0,0,5,38],"label":0} +{"features":[52,0,30731,9,13,2,3,0,4,1,0,0,45,38],"label":1} +{"features":[29,2,228346,8,11,4,2,1,4,1,0,0,50,38],"label":0} +{"features":[52,1,199995,12,14,2,3,0,4,1,7298,0,60,38],"label":1} +{"features":[46,0,31141,15,10,0,13,1,4,1,0,0,40,38],"label":0} +{"features":[42,2,231813,1,7,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[39,2,272950,9,13,2,2,0,4,1,0,0,45,38],"label":1} +{"features":[36,2,182074,15,10,0,0,1,4,1,0,0,45,38],"label":0} +{"features":[54,2,118793,11,9,2,0,0,4,1,0,0,45,38],"label":0} +{"features":[28,2,207513,11,9,4,11,3,4,1,0,0,48,38],"label":0} +{"features":[54,2,97778,5,4,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[33,2,217460,11,9,2,11,0,4,1,0,0,60,38],"label":1} +{"features":[90,2,221832,9,13,2,3,0,4,1,0,0,45,38],"label":0} +{"features":[57,5,109015,2,8,0,7,4,4,0,0,0,40,38],"label":0} +{"features":[29,2,40083,10,16,4,9,1,4,1,0,0,40,1],"label":0} +{"features":[25,2,188767,11,9,4,2,3,4,1,0,0,40,38],"label":0} +{"features":[30,2,154568,9,13,2,2,0,1,1,0,0,36,39],"label":1} +{"features":[38,2,161016,15,10,0,9,1,4,0,0,0,32,38],"label":0} +{"features":[22,2,117789,15,10,4,9,3,4,0,0,0,10,38],"label":0} +{"features":[26,5,294400,11,9,2,10,0,4,1,0,0,38,38],"label":0} +{"features":[41,2,168293,12,14,0,3,4,4,0,0,0,45,38],"label":0} +{"features":[29,4,164607,8,11,2,4,0,4,1,0,0,50,38],"label":0} +{"features":[51,5,226885,11,9,4,13,1,4,1,0,0,40,38],"label":0} +{"features":[76,4,117169,5,4,4,4,1,4,1,0,0,30,38],"label":0} +{"features":[22,2,184756,15,10,4,11,3,4,0,0,0,30,38],"label":0} +{"features":[49,2,248895,11,9,2,6,0,4,1,0,0,45,38],"label":0} +{"features":[36,4,257250,8,11,2,4,0,4,1,0,0,99,38],"label":0} +{"features":[61,4,133969,11,9,2,11,0,1,1,0,0,63,34],"label":0} 
+{"features":[31,2,236599,9,13,2,3,0,4,1,0,0,45,38],"label":1} +{"features":[22,2,150175,15,10,4,0,3,4,0,0,0,20,38],"label":0} +{"features":[25,2,191921,15,10,4,13,3,4,1,0,0,40,38],"label":0} +{"features":[56,2,170324,4,3,2,2,0,2,1,0,0,40,37],"label":0} +{"features":[35,2,107125,9,13,2,9,0,4,1,0,0,16,38],"label":1} +{"features":[62,2,103344,9,13,6,3,1,4,1,10520,0,50,38],"label":1} +{"features":[24,1,317443,9,13,2,9,5,2,0,0,0,40,38],"label":0} +{"features":[22,2,341227,15,10,4,0,1,4,1,0,0,20,38],"label":0} +{"features":[25,2,290528,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[27,2,198286,15,10,4,7,1,4,0,0,0,34,38],"label":0} +{"features":[64,2,256466,11,9,2,12,0,1,1,0,0,60,29],"label":1} +{"features":[32,1,223267,11,9,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[32,2,388672,15,10,0,5,1,4,1,0,0,16,38],"label":0} +{"features":[24,2,509629,11,9,4,7,3,4,0,0,0,25,38],"label":0} +{"features":[21,2,191460,1,7,4,7,4,2,0,0,0,40,38],"label":0} +{"features":[54,2,90363,7,12,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[49,2,192323,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[36,2,218490,8,11,2,11,0,4,1,0,0,60,38],"label":0} +{"features":[24,2,159580,9,13,4,7,3,2,0,0,0,75,38],"label":0} +{"features":[56,2,220187,15,10,2,11,0,4,1,0,0,45,38],"label":1} +{"features":[52,2,218550,15,10,3,0,1,4,0,14084,0,16,38],"label":1} +{"features":[68,2,195868,9,13,2,11,0,4,1,20051,0,40,38],"label":1} +{"features":[44,2,151780,15,10,6,3,1,2,0,0,0,40,38],"label":0} +{"features":[58,2,190747,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[29,4,142519,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[73,1,205580,4,3,2,9,0,4,1,0,0,6,38],"label":0} +{"features":[58,3,78634,1,7,2,13,0,4,1,0,0,60,38],"label":0} +{"features":[21,2,314182,11,9,4,7,1,4,0,0,0,40,38],"label":0} +{"features":[44,2,297991,7,12,4,3,1,1,0,0,0,50,38],"label":0} +{"features":[36,2,186110,15,10,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[46,4,31267,11,9,2,13,0,4,1,0,0,50,38],"label":0} 
+{"features":[34,2,57426,9,13,4,11,1,4,1,0,0,45,38],"label":0} +{"features":[21,2,107882,7,12,4,7,3,4,0,0,0,9,38],"label":0} +{"features":[58,5,194068,12,14,2,9,0,4,1,0,1977,50,38],"label":1} +{"features":[22,2,332194,15,10,4,7,3,2,1,0,0,40,38],"label":0} +{"features":[65,3,115922,9,13,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[27,2,302406,15,10,2,11,0,4,1,0,0,40,38],"label":1} +{"features":[37,2,270059,15,10,0,0,4,4,0,25236,0,25,38],"label":1} +{"features":[40,2,375603,11,9,0,0,4,2,1,0,0,40,38],"label":0} +{"features":[24,2,456460,7,12,2,0,5,4,0,0,0,40,38],"label":0} +{"features":[35,2,202397,9,13,2,2,0,1,1,0,0,40,29],"label":1} +{"features":[35,4,120066,15,10,2,2,0,0,1,0,0,60,38],"label":0} +{"features":[33,2,197424,11,9,2,3,0,4,1,5013,0,40,38],"label":0} +{"features":[36,4,67728,9,13,2,11,0,4,1,0,0,50,38],"label":1} +{"features":[23,2,99543,2,8,4,13,1,4,1,0,0,46,38],"label":0} +{"features":[49,3,229737,14,15,2,9,0,4,1,99999,0,37,38],"label":1} +{"features":[62,2,194167,11,9,0,6,1,4,0,2174,0,40,38],"label":0} +{"features":[34,2,188096,11,9,4,0,1,4,0,0,0,36,38],"label":0} +{"features":[40,2,338740,11,9,2,3,0,4,1,0,0,40,38],"label":0} +{"features":[24,2,275691,1,7,4,13,3,4,1,0,0,39,38],"label":0} +{"features":[17,2,220384,1,7,4,0,3,4,1,0,0,15,38],"label":0} +{"features":[51,2,302146,1,7,4,7,1,2,0,0,0,40,38],"label":0} +{"features":[31,0,166626,11,9,2,0,0,4,1,0,0,40,38],"label":1} +{"features":[52,2,145271,9,13,2,2,0,1,1,0,0,40,38],"label":0} +{"features":[30,2,95299,11,9,2,6,0,1,1,0,0,40,39],"label":1} +{"features":[28,2,31801,11,9,4,5,2,4,1,0,0,60,38],"label":0} +{"features":[24,2,228613,1,7,4,6,4,4,0,0,0,40,38],"label":0} +{"features":[40,2,234633,15,10,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[26,2,146343,15,10,2,11,5,2,0,0,0,40,38],"label":0} +{"features":[42,2,331651,12,14,4,9,1,4,0,8614,0,50,38],"label":1} +{"features":[26,2,167106,11,9,4,2,2,1,1,0,0,40,16],"label":0} +{"features":[27,0,196386,7,12,2,0,0,4,1,4064,0,40,7],"label":0} 
+{"features":[28,1,146949,11,9,2,5,0,4,1,0,0,40,38],"label":0} +{"features":[36,2,47310,11,9,4,7,1,2,0,0,0,40,38],"label":0} +{"features":[45,1,192793,15,10,2,10,0,4,1,0,0,40,38],"label":1} +{"features":[29,2,535978,15,10,2,2,0,4,1,0,0,45,38],"label":0} +{"features":[22,2,324922,11,9,4,6,1,4,1,0,0,50,38],"label":0} +{"features":[47,2,155489,11,9,2,13,0,4,1,7688,0,55,38],"label":1} +{"features":[39,5,85566,9,13,2,9,0,4,1,0,0,40,38],"label":0} +{"features":[24,2,385540,11,9,2,11,0,4,1,0,0,40,25],"label":0} +{"features":[39,2,167140,12,14,2,3,0,4,1,0,0,40,38],"label":0} +{"features":[39,2,347960,14,15,4,9,1,4,0,14084,0,35,38],"label":1} +{"features":[51,2,180807,15,10,0,3,4,4,0,0,0,40,38],"label":0} +{"features":[24,2,310380,15,10,3,0,3,2,0,0,0,45,38],"label":0} +{"features":[55,2,271710,15,10,4,0,1,4,1,0,0,45,38],"label":0} +{"features":[32,0,191385,7,12,0,10,1,4,1,2174,0,40,38],"label":0} +{"features":[22,2,320451,15,10,4,10,3,1,1,0,0,24,18],"label":0} +{"features":[59,2,277034,11,9,0,12,4,4,1,0,0,60,38],"label":1} +{"features":[24,2,403865,15,10,2,2,0,4,1,0,0,56,38],"label":0} +{"features":[41,5,47170,9,13,2,9,5,0,0,0,0,48,38],"label":1} +{"features":[40,2,273308,11,9,0,6,4,4,0,0,0,48,25],"label":0} +{"features":[57,4,152030,15,10,2,11,5,4,0,0,0,25,38],"label":1} +{"features":[36,2,194905,9,13,6,9,4,4,0,0,0,44,38],"label":0} +{"features":[31,4,229946,11,9,2,9,0,4,1,0,0,40,3],"label":0} +{"features":[28,2,119793,8,11,0,3,1,4,1,10520,0,50,38],"label":1} +{"features":[38,2,143538,11,9,4,6,1,4,0,0,0,40,38],"label":0} +{"features":[28,2,108574,15,10,2,0,5,4,0,0,0,15,38],"label":0} +{"features":[32,2,194141,11,9,0,6,3,4,1,0,0,50,38],"label":0} +{"features":[49,4,107597,11,9,0,3,4,4,0,14084,0,30,38],"label":1} +{"features":[37,2,186035,7,12,2,2,0,4,1,0,0,55,38],"label":0} +{"features":[50,2,263200,4,3,3,7,4,4,0,0,0,34,25],"label":0} +{"features":[37,2,70562,3,2,4,7,4,4,0,0,0,48,7],"label":0} +{"features":[38,2,195686,15,10,2,2,0,4,1,0,0,40,38],"label":1} 
+{"features":[44,1,197919,15,10,0,7,4,4,0,0,0,40,38],"label":0} +{"features":[30,4,261943,1,7,3,2,1,4,1,0,0,30,15],"label":0} +{"features":[20,3,95997,11,9,4,4,3,4,1,0,0,70,38],"label":0} +{"features":[32,2,151773,15,10,2,2,0,4,1,0,0,45,38],"label":0} +{"features":[56,2,177271,8,11,2,12,0,4,1,0,0,40,38],"label":1} +{"features":[24,2,537222,11,9,2,3,0,4,1,0,0,50,38],"label":0} +{"features":[59,2,196482,11,9,6,0,4,4,0,0,0,40,38],"label":0} +{"features":[24,2,43323,11,9,4,7,1,4,0,0,1762,40,38],"label":0} +{"features":[40,2,259307,12,14,2,3,0,4,1,0,0,50,38],"label":1} +{"features":[35,2,167990,6,5,2,6,0,4,1,0,0,40,1],"label":0} +{"features":[32,2,158416,11,9,0,11,1,4,1,0,0,50,38],"label":0} +{"features":[27,2,199903,9,13,4,9,1,4,0,0,0,40,38],"label":0} +{"features":[44,2,210534,4,3,2,5,0,4,1,0,0,40,25],"label":0} +{"features":[50,2,128798,9,13,2,12,0,4,1,0,0,40,38],"label":1} +{"features":[17,2,176467,6,5,4,13,1,4,1,0,0,20,38],"label":0} +{"features":[29,2,153805,11,9,4,6,2,3,1,0,0,40,6],"label":0} +{"features":[23,2,238917,5,4,4,2,2,4,1,0,0,36,38],"label":0} +{"features":[69,5,34339,11,9,2,10,0,4,1,0,0,40,38],"label":0} +{"features":[34,2,205733,11,9,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[29,2,193152,11,9,4,5,1,4,1,0,1408,40,38],"label":0} +{"features":[35,2,191628,15,10,2,9,0,4,1,0,0,40,38],"label":0} +{"features":[17,2,51939,1,7,4,11,3,4,0,0,0,15,38],"label":0} +{"features":[34,3,80249,15,10,2,4,0,4,1,0,0,72,38],"label":0} +{"features":[50,2,162632,11,9,2,3,0,4,1,0,0,45,38],"label":0} +{"features":[21,2,292264,11,9,4,2,1,4,1,0,0,35,38],"label":0} +{"features":[40,2,224799,9,13,2,9,0,4,1,0,0,45,38],"label":0} +{"features":[37,2,194004,1,7,2,2,0,4,1,0,0,25,38],"label":0} +{"features":[32,2,188245,1,7,4,8,4,2,0,0,0,40,38],"label":0} +{"features":[49,3,201498,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[33,5,313729,12,14,4,9,1,4,1,0,0,60,38],"label":0} +{"features":[19,2,172893,15,10,4,3,3,4,0,0,0,30,38],"label":0} 
+{"features":[41,2,252058,9,13,4,0,1,4,1,0,0,40,38],"label":0} +{"features":[39,2,188540,11,9,0,3,1,4,1,0,0,45,38],"label":0} +{"features":[47,2,168232,9,13,2,0,0,4,1,7298,0,40,38],"label":1} +{"features":[58,2,199278,9,13,0,3,1,4,1,0,0,38,38],"label":0} +{"features":[41,2,104334,15,10,2,11,0,4,1,0,0,50,38],"label":1} +{"features":[24,2,281221,9,13,4,0,2,1,0,0,0,40,35],"label":0} +{"features":[23,2,197613,15,10,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[33,2,229716,11,9,0,0,1,4,1,0,0,38,38],"label":0} +{"features":[30,2,255279,11,9,0,0,4,4,0,0,0,20,38],"label":0} +{"features":[25,2,282063,5,4,2,5,0,4,1,0,0,40,25],"label":0} +{"features":[40,2,105936,9,13,0,9,1,4,0,0,0,40,38],"label":0} +{"features":[39,2,32146,15,10,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[29,2,118230,11,9,4,11,1,4,0,0,0,35,38],"label":0} +{"features":[43,5,115005,11,9,0,12,1,4,0,0,0,40,38],"label":0} +{"features":[26,2,190469,9,13,4,12,1,4,1,0,0,40,38],"label":0} +{"features":[35,2,347491,8,11,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[23,2,45834,9,13,4,3,1,4,0,0,0,50,38],"label":0} +{"features":[20,2,237305,15,10,4,6,2,2,0,0,0,35,38],"label":0} +{"features":[48,2,160647,15,10,4,3,1,4,0,0,0,40,20],"label":1} +{"features":[31,2,241885,11,9,4,4,4,4,1,0,0,45,38],"label":0} +{"features":[47,2,108510,0,6,2,11,0,4,1,0,0,65,38],"label":0} +{"features":[55,0,189985,15,10,0,0,4,2,0,0,0,40,38],"label":0} +{"features":[23,2,201145,11,9,4,2,1,4,1,0,0,65,38],"label":0} +{"features":[45,2,167187,9,13,4,9,1,4,0,0,0,40,38],"label":1} +{"features":[63,3,272425,8,11,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[41,2,49797,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[30,2,381153,11,9,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[33,2,170148,11,9,0,0,4,4,0,0,0,45,38],"label":0} +{"features":[27,2,113054,11,9,5,6,1,4,1,0,0,43,38],"label":0} +{"features":[62,2,319582,11,9,6,11,1,4,0,0,0,32,38],"label":0} +{"features":[24,2,289448,8,11,4,0,3,1,0,0,0,40,29],"label":0} 
+{"features":[44,2,277488,15,10,2,6,0,4,1,3103,0,40,38],"label":1} +{"features":[25,2,371987,11,9,0,0,1,4,0,0,0,40,38],"label":0} +{"features":[39,2,509060,15,10,0,7,1,4,1,0,0,40,38],"label":0} +{"features":[17,2,211870,6,5,4,7,1,4,1,0,0,6,38],"label":0} +{"features":[29,2,131088,11,9,4,5,3,4,1,0,0,25,38],"label":0} +{"features":[42,5,222884,9,13,0,0,1,4,1,0,0,40,38],"label":0} +{"features":[25,2,124590,11,9,4,3,2,4,1,0,0,40,38],"label":0} +{"features":[60,2,88055,0,6,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[23,2,184255,11,9,2,11,5,4,0,0,0,40,38],"label":0} +{"features":[28,2,66434,0,6,4,7,4,4,0,0,0,15,38],"label":0} +{"features":[31,2,118551,6,5,0,0,1,4,0,0,0,40,38],"label":0} +{"features":[41,4,26598,11,9,0,2,1,4,1,0,0,40,38],"label":0} +{"features":[28,2,157391,9,13,4,11,3,4,0,0,0,40,38],"label":0} +{"features":[45,4,275445,9,13,0,3,4,4,1,0,0,50,38],"label":0} +{"features":[19,2,100999,9,13,4,9,3,4,0,0,0,30,38],"label":0} +{"features":[19,4,206599,15,10,4,7,3,4,0,0,0,22,38],"label":0} +{"features":[25,1,197728,9,13,4,3,1,4,0,0,0,20,38],"label":0} +{"features":[48,2,123075,10,16,2,9,0,4,1,0,0,45,38],"label":1} +{"features":[37,1,117760,8,11,4,10,1,4,1,4650,0,40,38],"label":0} +{"features":[44,2,230684,9,13,2,3,0,4,1,7688,0,50,38],"label":1} +{"features":[24,2,22201,11,9,2,10,0,1,1,0,0,40,36],"label":0} +{"features":[62,4,159939,11,9,2,4,0,4,1,0,0,35,38],"label":0} +{"features":[57,1,118481,9,13,2,9,0,4,1,0,1902,40,38],"label":1} +{"features":[51,2,239155,8,11,0,7,1,4,1,0,0,40,38],"label":0} +{"features":[37,2,67125,11,9,0,11,1,4,1,0,0,60,38],"label":0} +{"features":[19,2,255161,11,9,4,11,3,4,1,0,0,25,38],"label":0} +{"features":[30,2,243841,11,9,0,7,2,1,0,0,0,40,34],"label":0} +{"features":[27,2,91501,11,9,2,12,5,4,0,0,0,40,38],"label":0} +{"features":[60,2,232242,11,9,2,11,0,4,1,0,0,40,38],"label":0} +{"features":[26,2,104746,11,9,2,2,0,4,1,5013,0,60,38],"label":0} +{"features":[19,2,72355,15,10,4,7,1,4,1,0,0,20,38],"label":0} 
+{"features":[22,2,203182,9,13,4,3,4,4,0,0,0,30,38],"label":0} +{"features":[50,5,173020,15,10,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[17,2,276718,11,9,4,0,3,4,1,0,0,20,38],"label":0} +{"features":[61,1,95450,9,13,2,3,0,4,1,5178,0,50,38],"label":1} +{"features":[28,2,312588,0,6,0,7,1,4,0,0,0,40,38],"label":0} +{"features":[22,2,284317,7,12,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[35,2,185325,9,13,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[40,2,149466,11,9,0,5,1,2,1,0,0,35,38],"label":0} +{"features":[32,2,114746,11,9,5,5,4,1,0,0,0,60,34],"label":0} +{"features":[23,4,208503,15,10,0,0,3,4,1,0,0,40,38],"label":0} +{"features":[33,2,290763,15,10,4,11,1,4,0,0,0,40,38],"label":0} +{"features":[34,2,37646,7,12,2,2,0,4,1,0,0,65,38],"label":0} +{"features":[47,2,334039,9,13,2,3,0,4,1,7298,0,44,38],"label":1} +{"features":[51,2,219599,11,9,2,6,5,4,0,0,0,40,38],"label":0} +{"features":[36,2,206521,11,9,4,6,1,4,1,0,0,40,38],"label":0} +{"features":[46,2,45288,9,13,4,7,1,4,1,0,0,40,38],"label":0} +{"features":[17,2,60562,6,5,4,7,3,4,0,0,0,20,38],"label":0} +{"features":[47,3,79627,14,15,0,9,1,4,1,27828,0,50,38],"label":1} +{"features":[31,2,213002,2,8,4,11,1,4,1,4650,0,50,38],"label":0} +{"features":[23,1,210029,15,10,4,0,3,4,0,0,0,20,38],"label":0} +{"features":[53,2,79324,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[50,2,137815,11,9,2,13,0,4,1,0,0,60,38],"label":1} +{"features":[23,1,157331,9,13,4,9,1,4,0,0,0,40,38],"label":0} +{"features":[45,2,43479,15,10,2,13,0,4,1,0,0,48,38],"label":0} +{"features":[38,2,183279,15,10,2,3,0,4,1,0,0,44,38],"label":1} +{"features":[41,4,150533,14,15,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[32,2,27856,15,10,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[44,2,123983,9,13,0,7,1,1,1,0,0,40,2],"label":0} +{"features":[38,2,198216,15,10,0,3,4,4,0,0,0,40,38],"label":0} +{"features":[42,2,33002,11,9,2,3,0,4,1,0,0,48,38],"label":0} +{"features":[43,2,115562,9,13,2,9,0,4,1,0,0,42,38],"label":1} 
+{"features":[34,2,300687,11,9,2,2,0,2,1,0,0,40,38],"label":0} +{"features":[48,2,287480,12,14,2,12,0,4,1,0,0,40,38],"label":1} +{"features":[61,2,146788,5,4,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[29,2,452205,11,9,0,7,4,4,0,0,0,36,38],"label":0} +{"features":[23,2,182812,15,10,4,7,3,4,0,0,0,40,5],"label":0} +{"features":[48,2,192791,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[68,3,182131,15,10,2,3,0,4,1,10605,0,20,38],"label":1} +{"features":[23,2,200973,11,9,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[45,3,271901,11,9,2,11,0,4,1,0,0,32,38],"label":1} +{"features":[22,2,110946,15,10,4,7,1,4,0,0,0,40,38],"label":0} +{"features":[49,2,206947,11,9,0,0,1,4,0,0,0,40,38],"label":0} +{"features":[25,2,154863,11,9,4,0,4,2,1,0,0,35,38],"label":0} +{"features":[56,2,102106,11,9,2,5,0,4,1,0,0,40,38],"label":0} +{"features":[53,2,120839,2,8,0,4,3,4,1,0,0,40,38],"label":0} +{"features":[29,5,106972,12,14,4,9,1,4,0,0,0,35,38],"label":0} +{"features":[60,2,227468,15,10,6,10,1,2,0,0,0,40,38],"label":0} +{"features":[25,2,179462,5,4,4,5,4,4,1,0,0,40,38],"label":0} +{"features":[46,2,201595,11,9,2,13,0,4,1,0,0,70,38],"label":0} +{"features":[17,2,137042,0,6,4,9,3,4,1,0,0,20,38],"label":0} +{"features":[50,4,213654,11,9,2,11,0,2,1,0,0,40,38],"label":0} +{"features":[54,5,119565,9,13,2,3,0,4,1,0,0,40,32],"label":1} +{"features":[28,2,60288,11,9,4,0,3,4,0,0,0,40,38],"label":0} +{"features":[34,2,229732,8,11,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[22,2,133833,15,10,4,7,3,4,0,0,0,25,38],"label":0} +{"features":[29,2,290740,7,12,4,8,1,4,0,0,0,50,38],"label":0} +{"features":[49,2,123584,1,7,2,13,0,4,1,0,0,75,38],"label":0} +{"features":[40,2,206066,11,9,2,2,0,4,1,0,0,50,38],"label":0} +{"features":[38,2,183279,15,10,2,2,0,4,1,0,0,43,38],"label":0} +{"features":[34,2,287737,15,10,2,3,5,4,0,0,1485,40,38],"label":1} +{"features":[52,2,90189,5,4,0,8,3,2,0,0,0,16,38],"label":0} +{"features":[51,2,128143,15,10,2,2,0,4,1,0,0,40,38],"label":1} 
+{"features":[20,2,184779,15,10,4,12,3,4,0,0,0,20,38],"label":0} +{"features":[28,2,54243,11,9,0,13,1,4,1,0,0,60,38],"label":0} +{"features":[21,2,213015,11,9,4,5,2,2,1,2176,0,40,38],"label":0} +{"features":[43,2,240504,11,9,2,5,0,4,1,0,0,40,38],"label":0} +{"features":[43,2,236985,11,9,2,2,0,2,1,0,0,40,38],"label":0} +{"features":[43,2,154538,7,12,0,2,1,4,1,0,0,40,38],"label":0} +{"features":[33,2,159247,9,13,2,9,0,4,1,0,0,40,38],"label":1} +{"features":[35,2,171327,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[36,2,342642,12,14,4,3,1,4,1,0,0,15,38],"label":0} +{"features":[50,2,34233,11,9,2,4,0,4,1,0,0,50,38],"label":0} +{"features":[26,2,196805,15,10,2,13,0,2,1,0,0,65,38],"label":0} +{"features":[27,2,262478,11,9,4,4,3,2,1,0,0,30,38],"label":0} +{"features":[34,2,184147,11,9,5,11,4,2,0,0,0,20,38],"label":0} +{"features":[36,2,29984,2,8,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[44,2,210525,9,13,2,9,0,4,1,0,0,40,38],"label":1} +{"features":[51,2,237729,15,10,0,0,4,4,0,0,0,40,38],"label":0} +{"features":[32,4,173854,9,13,0,9,2,4,1,0,0,35,38],"label":1} +{"features":[23,4,184370,11,9,0,7,1,4,0,0,0,40,38],"label":0} +{"features":[49,2,281647,12,14,2,3,0,4,1,0,0,45,38],"label":1} +{"features":[61,2,54373,15,10,2,11,0,4,1,0,0,40,38],"label":0} +{"features":[41,2,154194,11,9,4,11,3,4,0,0,0,40,38],"label":0} +{"features":[30,2,48829,11,9,4,11,1,4,0,0,1602,30,38],"label":0} +{"features":[52,1,255927,15,10,6,0,1,4,0,0,0,24,38],"label":0} +{"features":[41,2,120277,9,13,2,9,0,4,1,0,0,40,38],"label":1} +{"features":[39,2,129495,15,10,5,0,4,2,0,0,0,40,38],"label":0} +{"features":[30,2,310889,15,10,4,5,1,4,1,0,0,55,38],"label":0} +{"features":[72,2,284080,3,2,0,7,1,2,1,0,0,40,38],"label":0} +{"features":[27,2,132191,11,9,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[45,2,49298,9,13,4,12,3,4,1,0,0,40,38],"label":0} +{"features":[42,2,106900,8,11,4,12,1,4,1,0,0,40,38],"label":0} +{"features":[23,2,140462,11,9,4,6,3,4,1,0,0,40,38],"label":0} 
+{"features":[37,2,272950,11,9,0,2,1,4,1,0,0,40,38],"label":0} +{"features":[43,5,345969,14,15,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[46,2,318259,8,11,0,12,2,4,0,0,0,36,38],"label":0} +{"features":[32,2,296282,9,13,2,11,0,4,1,0,0,40,38],"label":0} +{"features":[20,2,238685,15,10,4,7,1,4,0,0,0,32,38],"label":0} +{"features":[21,2,197583,15,10,4,0,3,4,0,0,0,20,38],"label":0} +{"features":[34,2,342709,12,14,2,3,0,4,1,0,0,40,38],"label":0} +{"features":[27,1,209109,12,14,4,9,3,4,1,0,0,35,38],"label":0} +{"features":[38,2,331395,5,4,2,4,0,4,1,3942,0,84,31],"label":0} +{"features":[41,1,107327,8,11,0,9,4,4,0,0,0,40,38],"label":0} +{"features":[47,4,237731,11,9,2,4,0,4,1,2829,0,65,38],"label":0} +{"features":[43,2,260761,11,9,2,6,0,4,1,0,0,40,25],"label":0} +{"features":[42,2,154374,9,13,2,3,0,4,1,0,2415,60,38],"label":1} +{"features":[27,2,243569,1,7,2,5,0,4,1,3942,0,40,38],"label":0} +{"features":[54,1,31533,12,14,2,0,0,4,1,7298,0,40,38],"label":1} +{"features":[37,2,36425,11,9,4,7,1,4,0,0,0,40,38],"label":0} +{"features":[46,5,192779,9,13,2,3,0,4,1,7688,0,40,38],"label":1} +{"features":[52,5,314627,12,14,0,9,1,1,0,0,0,40,38],"label":0} +{"features":[74,4,146929,11,9,2,11,0,4,1,0,0,55,38],"label":0} +{"features":[55,2,49996,1,7,4,6,1,2,0,0,0,40,38],"label":0} +{"features":[35,1,190964,9,13,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[66,2,185336,11,9,6,11,2,4,0,0,0,35,38],"label":0} +{"features":[51,1,175750,11,9,0,13,4,2,1,0,0,40,38],"label":0} +{"features":[56,2,219762,11,9,2,11,5,4,0,0,0,35,38],"label":0} +{"features":[33,2,155343,11,9,2,11,0,4,1,3103,0,40,38],"label":1} +{"features":[36,1,28996,11,9,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[46,2,98012,8,11,0,0,1,4,0,0,0,40,38],"label":0} +{"features":[50,4,105010,11,9,2,4,0,4,1,0,2051,20,38],"label":0} +{"features":[52,2,29658,11,9,2,0,0,4,1,0,0,40,38],"label":0} +{"features":[56,2,275236,9,13,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[29,2,161155,7,12,2,9,0,4,1,0,0,50,38],"label":0} 
+{"features":[20,2,235442,15,10,4,7,1,4,1,0,0,35,38],"label":0} +{"features":[30,2,206051,11,9,2,13,0,4,1,0,0,40,38],"label":0} +{"features":[55,2,37438,8,11,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[60,2,162947,4,3,0,6,1,4,0,0,0,40,32],"label":0} +{"features":[39,2,147548,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[50,2,159650,15,10,2,12,0,4,1,0,0,60,38],"label":1} +{"features":[35,2,86648,14,15,2,9,0,4,1,7688,0,50,38],"label":1} +{"features":[24,5,61737,9,13,4,9,1,4,1,0,0,40,38],"label":0} +{"features":[33,1,70164,9,13,4,9,1,0,1,0,0,60,38],"label":0} +{"features":[39,2,129597,9,13,2,11,0,4,1,3464,0,40,38],"label":0} +{"features":[27,0,47907,9,13,4,0,1,4,0,0,0,40,38],"label":0} +{"features":[39,2,150061,12,14,0,3,4,2,0,15020,0,60,38],"label":1} +{"features":[51,2,55507,11,9,2,2,0,2,1,0,0,40,38],"label":0} +{"features":[53,0,271544,11,9,2,0,0,2,1,0,1977,40,38],"label":1} +{"features":[22,2,188950,15,10,4,12,3,4,1,0,0,40,38],"label":0} +{"features":[44,2,252202,11,9,0,0,1,4,0,0,0,40,38],"label":0} +{"features":[42,2,173590,15,10,2,0,0,4,1,0,1628,40,38],"label":0} +{"features":[33,2,105370,11,9,0,10,1,4,1,0,0,70,38],"label":0} +{"features":[46,2,162030,11,9,6,0,4,4,0,0,0,43,38],"label":0} +{"features":[19,2,86150,1,7,4,11,3,1,0,0,0,19,29],"label":0} +{"features":[18,2,25837,1,7,4,9,3,4,1,0,0,15,38],"label":0} +{"features":[62,4,173631,15,10,2,3,0,4,1,0,0,70,38],"label":0} +{"features":[81,2,100675,3,2,2,9,0,4,1,0,0,15,30],"label":0} +{"features":[24,5,184216,15,10,4,0,3,4,0,0,0,40,38],"label":0} +{"features":[20,2,38001,15,10,4,7,3,4,0,0,0,20,38],"label":0} +{"features":[18,2,123714,1,7,4,5,1,2,1,0,0,40,38],"label":0} +{"features":[21,2,256356,1,7,4,8,2,4,0,0,0,40,25],"label":0} +{"features":[30,2,75573,9,13,4,3,1,4,0,0,0,45,10],"label":0} +{"features":[53,2,31588,9,13,2,9,0,4,1,0,0,52,38],"label":1} +{"features":[45,2,265097,11,9,2,7,0,4,1,0,1902,40,38],"label":1} +{"features":[61,5,159908,1,7,6,7,4,4,0,0,0,32,38],"label":1} 
+{"features":[24,3,142404,9,13,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[29,2,55390,7,12,4,12,1,4,1,0,0,45,38],"label":0} +{"features":[20,2,49179,15,10,4,9,1,4,1,0,0,35,38],"label":0} +{"features":[31,2,209448,0,6,2,4,0,4,1,2105,0,40,25],"label":0} +{"features":[54,2,138944,11,9,2,11,0,4,1,0,0,44,38],"label":0} +{"features":[24,2,181820,15,10,4,0,3,4,1,0,0,40,38],"label":0} +{"features":[46,2,101430,1,7,0,5,4,2,0,0,0,40,38],"label":0} +{"features":[27,2,238859,8,11,4,2,1,4,1,0,0,40,38],"label":0} +{"features":[19,2,318822,15,10,4,0,2,4,0,0,0,40,38],"label":0} +{"features":[30,2,174789,7,12,2,3,0,4,1,0,1848,50,38],"label":1} +{"features":[17,2,146268,0,6,4,7,3,4,0,0,0,10,38],"label":0} +{"features":[58,2,142158,9,13,0,3,4,4,0,0,0,35,38],"label":0} +{"features":[42,2,510072,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[32,2,257043,11,9,4,0,1,4,0,0,0,42,38],"label":0} +{"features":[58,2,127264,0,6,2,2,0,4,1,0,0,50,38],"label":0} +{"features":[27,2,93021,11,9,4,0,4,3,0,0,0,40,38],"label":0} +{"features":[56,2,282023,14,15,2,9,0,4,1,0,0,45,38],"label":1} +{"features":[35,2,162601,11,9,0,0,4,4,0,0,0,40,38],"label":0} +{"features":[41,4,147110,11,9,2,6,0,4,1,0,0,25,38],"label":0} +{"features":[45,2,72844,11,9,0,3,1,4,0,0,0,46,38],"label":0} +{"features":[36,3,306156,15,10,2,11,0,4,1,15024,0,60,38],"label":1} +{"features":[32,1,286101,11,9,4,13,4,2,0,0,0,37,38],"label":0} +{"features":[35,3,202027,15,10,0,3,1,4,1,0,0,60,38],"label":0} +{"features":[24,2,174461,9,13,4,11,1,4,0,0,0,50,38],"label":0} +{"features":[39,1,189911,1,7,0,0,4,4,0,0,0,40,38],"label":0} +{"features":[57,4,95280,15,10,2,11,0,4,1,99999,0,45,38],"label":1} +{"features":[24,1,249101,11,9,0,10,4,2,0,0,0,40,38],"label":0} +{"features":[36,2,749636,15,10,0,0,4,4,0,0,0,40,38],"label":0} +{"features":[35,2,187119,15,10,0,3,1,4,0,0,0,70,38],"label":0} +{"features":[19,2,184207,15,10,4,11,1,4,1,0,0,40,38],"label":0} +{"features":[42,2,176286,7,12,2,3,0,4,1,0,0,40,38],"label":1} 
+{"features":[51,4,35295,11,9,4,4,4,4,1,0,0,45,38],"label":0} +{"features":[44,2,165599,11,9,2,6,0,4,1,0,0,48,38],"label":0} +{"features":[29,2,162312,8,11,4,6,1,3,1,0,0,40,38],"label":0} +{"features":[36,5,137421,8,11,2,12,0,1,1,0,0,37,16],"label":0} +{"features":[41,5,100800,12,14,0,9,1,4,1,0,0,35,38],"label":0} +{"features":[66,2,142723,4,3,3,5,4,4,0,0,0,40,32],"label":0} +{"features":[28,2,199903,9,13,4,0,1,4,0,0,0,20,38],"label":0} +{"features":[38,2,210438,5,4,0,11,4,4,0,0,0,40,38],"label":0} +{"features":[39,2,216149,14,15,0,9,1,4,1,0,0,70,38],"label":1} +{"features":[34,2,355571,11,9,0,6,4,2,0,0,0,40,38],"label":0} +{"features":[52,4,42984,14,15,2,9,0,4,1,0,0,70,38],"label":1} +{"features":[52,2,226084,11,9,6,8,2,4,0,0,0,40,38],"label":0} +{"features":[29,4,229842,11,9,4,13,4,2,1,0,0,45,38],"label":0} +{"features":[40,4,29036,15,10,4,6,1,4,1,0,0,35,38],"label":0} +{"features":[36,2,102864,11,9,4,6,3,4,0,0,0,40,38],"label":0} +{"features":[27,4,334132,7,12,4,9,1,4,0,0,0,78,38],"label":0} +{"features":[65,2,172906,11,9,6,0,4,4,0,0,0,40,38],"label":0} +{"features":[41,2,163287,11,9,2,9,0,4,1,7688,0,43,38],"label":1} +{"features":[41,4,83411,11,9,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[45,3,160440,11,9,0,3,1,4,1,0,0,42,38],"label":0} +{"features":[65,2,143554,15,10,5,0,1,4,0,0,0,38,38],"label":0} +{"features":[49,2,242987,9,13,2,9,0,4,1,0,0,40,3],"label":0} +{"features":[25,2,166971,11,9,2,11,0,4,1,0,0,52,38],"label":0} +{"features":[28,4,204984,9,13,4,12,1,4,1,0,0,45,38],"label":0} +{"features":[24,2,267706,15,10,4,2,3,4,0,0,0,45,38],"label":0} +{"features":[20,0,191878,15,10,4,0,3,2,0,0,0,20,38],"label":0} +{"features":[33,5,175023,11,9,2,10,0,4,1,0,0,37,38],"label":0} +{"features":[23,2,179423,9,13,4,0,1,4,0,0,0,5,38],"label":0} +{"features":[78,3,188044,9,13,2,3,0,4,1,0,2392,40,38],"label":1} +{"features":[30,2,427474,6,5,2,7,0,4,1,0,0,40,25],"label":0} +{"features":[55,4,189933,5,4,2,4,0,4,1,0,0,50,38],"label":0} 
+{"features":[20,2,219211,15,10,4,7,3,4,1,0,0,20,38],"label":0} +{"features":[30,2,87561,7,12,4,12,1,4,0,0,0,40,38],"label":0} +{"features":[38,2,203836,11,9,2,11,0,4,1,3464,0,40,3],"label":0} +{"features":[34,2,157289,15,10,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[30,2,175856,12,14,2,9,0,4,1,0,0,38,38],"label":0} +{"features":[40,2,240124,11,9,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[39,2,201410,9,13,2,13,0,4,1,0,1977,45,29],"label":1} +{"features":[42,2,190179,9,13,2,9,0,4,1,99999,0,40,38],"label":1} +{"features":[47,2,357848,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[33,2,120201,11,9,0,0,3,3,0,0,0,65,38],"label":0} +{"features":[29,2,170301,11,9,2,0,5,4,0,2829,0,40,38],"label":0} +{"features":[35,2,183898,8,11,2,3,0,4,1,7298,0,50,38],"label":1} +{"features":[45,2,123681,11,9,2,11,0,4,1,0,0,40,38],"label":1} +{"features":[33,2,169496,9,13,2,3,0,4,1,0,0,50,38],"label":1} +{"features":[34,2,152246,11,9,2,13,0,0,1,0,0,52,38],"label":0} +{"features":[47,3,101926,9,13,0,3,1,4,1,0,0,70,38],"label":1} +{"features":[30,2,142977,15,10,0,2,1,4,1,0,0,65,38],"label":0} +{"features":[34,2,260560,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[39,2,315291,11,9,4,0,4,2,0,0,0,40,38],"label":0} +{"features":[24,2,306779,8,11,4,3,3,4,1,0,0,35,38],"label":0} +{"features":[47,2,339863,11,9,2,11,0,4,1,0,0,45,38],"label":1} +{"features":[77,4,71676,15,10,6,0,1,4,0,0,1944,1,38],"label":0} +{"features":[53,2,250034,9,13,2,3,0,2,1,0,0,50,38],"label":1} +{"features":[33,2,91666,2,8,0,3,1,4,1,0,0,40,38],"label":0} +{"features":[36,2,113397,11,9,2,5,0,4,1,0,0,40,38],"label":0} +{"features":[51,2,56915,11,9,2,2,0,0,1,0,0,40,38],"label":0} +{"features":[17,2,99462,1,7,4,7,3,0,0,0,0,20,38],"label":0} +{"features":[44,5,167265,12,14,2,9,0,4,1,0,0,60,38],"label":1} +{"features":[43,2,124919,11,9,2,7,0,1,1,0,0,60,23],"label":0} +{"features":[35,2,247750,11,9,6,7,4,2,1,0,0,40,38],"label":0} +{"features":[46,1,36228,11,9,2,2,0,4,1,0,1902,40,38],"label":0} 
+{"features":[39,0,314822,15,10,2,0,0,2,1,0,0,40,38],"label":0} +{"features":[38,2,168407,15,10,0,0,4,4,0,5721,0,44,38],"label":0} +{"features":[50,2,105010,9,13,2,4,0,4,1,0,0,45,38],"label":1} +{"features":[47,2,72880,12,14,4,9,1,4,0,0,0,40,38],"label":0} +{"features":[47,4,318593,11,9,2,3,0,4,1,0,0,25,38],"label":0} +{"features":[26,2,201481,9,13,4,3,1,4,0,0,0,40,38],"label":0} +{"features":[36,2,139743,15,10,6,9,3,4,0,0,0,40,38],"label":0} +{"features":[46,2,216934,9,13,0,0,1,4,1,0,0,40,31],"label":0} +{"features":[17,1,191910,1,7,4,11,3,4,1,0,0,20,38],"label":0} +{"features":[19,2,229431,15,10,4,9,3,4,1,0,0,11,38],"label":0} +{"features":[36,2,43712,0,6,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[41,2,320984,14,15,2,9,0,4,1,99999,0,65,38],"label":1} +{"features":[51,2,126010,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[41,0,564135,12,14,2,3,0,4,1,0,0,40,38],"label":1} +{"features":[37,2,305259,7,12,0,3,1,4,0,0,0,48,38],"label":0} +{"features":[41,2,320744,11,9,4,2,1,4,1,3325,0,50,38],"label":0} +{"features":[45,2,166929,1,7,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[57,3,123053,14,15,2,9,0,1,1,15024,0,50,18],"label":1} +{"features":[32,2,154120,11,9,2,13,0,4,1,7298,0,40,38],"label":1} +{"features":[48,2,109832,12,14,2,9,0,4,1,0,1902,40,38],"label":1} +{"features":[45,3,84324,7,12,2,9,0,4,1,0,0,50,38],"label":1} +{"features":[24,2,233280,7,12,4,11,3,4,0,0,0,37,38],"label":0} +{"features":[43,1,174491,11,9,0,12,1,2,0,0,0,40,38],"label":0} +{"features":[26,2,39014,2,8,2,8,5,3,0,0,0,40,5],"label":0} +{"features":[48,2,273828,4,3,4,5,1,4,1,0,0,40,25],"label":0} +{"features":[53,2,53197,12,14,2,9,0,4,1,3103,0,40,38],"label":1} +{"features":[34,2,286020,11,9,2,6,0,4,1,0,0,45,38],"label":0} +{"features":[48,2,235646,15,10,2,11,0,4,1,3103,0,40,38],"label":1} +{"features":[61,2,160942,12,14,2,11,0,4,1,3103,0,50,38],"label":0} +{"features":[42,4,177937,9,13,3,3,1,4,1,0,0,45,30],"label":0} +{"features":[37,2,98941,12,14,4,3,1,4,1,0,0,40,38],"label":1} 
+{"features":[32,2,169589,8,11,2,5,0,4,1,0,0,40,38],"label":1} +{"features":[35,2,219902,11,9,5,13,4,2,0,0,0,48,38],"label":0} +{"features":[38,2,107125,15,10,4,11,1,4,1,0,0,60,38],"label":0} +{"features":[59,2,453067,15,10,2,9,0,4,1,0,0,36,38],"label":1} +{"features":[43,2,222971,4,3,4,6,4,4,0,0,0,40,25],"label":0} +{"features":[34,2,294064,12,14,2,3,0,4,1,0,0,50,9],"label":0} +{"features":[21,2,56582,1,7,4,7,3,4,1,0,0,50,38],"label":0} +{"features":[61,2,166124,11,9,2,2,0,4,1,0,0,40,38],"label":1} +{"features":[32,2,107218,9,13,4,0,1,1,1,0,0,40,38],"label":0} +{"features":[72,2,56559,11,9,2,11,0,4,1,0,0,12,38],"label":0} +{"features":[45,2,198759,10,16,2,3,0,4,1,0,0,60,38],"label":0} +{"features":[38,2,119741,12,14,2,2,0,2,1,0,0,40,38],"label":1} +{"features":[26,2,117217,9,13,0,7,1,4,0,0,0,45,38],"label":0} +{"features":[48,2,115585,9,13,2,11,0,4,1,0,0,40,38],"label":0} +{"features":[22,5,311512,15,10,2,7,0,2,1,0,0,15,38],"label":0} +{"features":[34,2,164190,15,10,2,9,0,4,1,0,1902,38,38],"label":1} +{"features":[37,2,387430,15,10,2,0,0,4,1,0,0,37,38],"label":0} +{"features":[62,2,214288,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[28,2,190911,11,9,2,2,0,4,1,0,0,40,38],"label":0} +{"features":[35,2,267798,11,9,0,2,4,4,1,0,0,40,38],"label":0} +{"features":[28,2,204516,0,6,4,13,1,4,1,0,0,45,38],"label":0} +{"features":[19,2,125591,1,7,4,7,1,4,0,0,0,40,38],"label":0} +{"features":[31,2,113364,7,12,2,6,0,4,1,0,0,55,38],"label":0} +{"features":[64,2,133166,11,9,2,3,0,4,1,0,0,5,38],"label":0} +{"features":[21,2,178255,15,10,4,0,1,4,0,0,0,30,3],"label":0} +{"features":[21,2,116788,11,9,4,2,3,4,1,0,0,40,38],"label":0} +{"features":[20,2,141481,1,7,2,11,2,4,0,0,0,50,38],"label":0} +{"features":[33,2,138142,15,10,5,7,4,2,0,0,0,25,38],"label":0} +{"features":[25,2,254613,11,9,4,2,3,4,1,0,0,40,4],"label":0} +{"features":[54,4,200960,9,13,2,11,0,4,1,0,0,50,38],"label":1} +{"features":[24,2,200593,11,9,2,5,0,4,1,0,0,50,38],"label":0} 
+{"features":[62,2,200332,11,9,2,6,0,4,1,0,0,40,38],"label":0} +{"features":[20,4,197207,11,9,0,11,1,4,0,0,0,30,38],"label":0} +{"features":[53,2,133436,5,4,0,6,1,4,0,0,0,40,38],"label":0} +{"features":[17,4,228786,0,6,4,7,3,4,0,0,0,24,38],"label":0} +{"features":[27,2,404421,15,10,4,5,1,2,1,0,0,40,38],"label":0} +{"features":[55,2,61708,11,9,2,0,0,4,1,6418,0,50,38],"label":1} +{"features":[21,2,147655,11,9,4,0,3,4,0,0,0,40,38],"label":0} +{"features":[35,1,103966,12,14,0,0,4,4,0,0,0,41,38],"label":0}
diff --git a/sagemaker_model_monitor/index.rst b/sagemaker_model_monitor/index.rst
index 740ae6db8b..571e924259 100644
--- a/sagemaker_model_monitor/index.rst
+++ b/sagemaker_model_monitor/index.rst
@@ -33,10 +33,12 @@ Visualization
 
    visualization/SageMaker-Model-Monitor-Visualize
 
-Detect post-training data and model bias
+Model Bias and Model Explainability
 ========================================
 
 .. toctree::
    :maxdepth: 1
 
    /sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability
+   /sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Bias-Drift-for-Endpoint
+   /sagemaker_model_monitor/fairness_and_explainability_jsonlines/SageMaker-Monitoring-Feature-Attribution-Drift-for-Endpoint