Add examples for MLflow #1772

Merged: 15 commits, May 31, 2023
12 changes: 6 additions & 6 deletions docs/examples/tensorboard_streaming.rst
@@ -9,7 +9,7 @@ Introduction
In this exercise, you will learn how to stream TensorBoard events from the clients
to the server in order to visualize live training metrics from a central place on the server.

-This exercise will be working with the ``tensorboard-streaming`` example in the examples folder,
+This exercise will be working with the ``tensorboard-streaming`` example in the advanced examples folder under experiment-tracking,
which builds upon :doc:`hello_pt` by adding TensorBoard streaming.

The setup of this exercise consists of one **server** and two **clients**.
@@ -42,7 +42,7 @@ Adding TensorBoard Streaming to Configurations

Inside the config folder there are two files, ``config_fed_client.json`` and ``config_fed_server.json``.

-.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard-streaming/jobs/tensorboard-streaming/app/config/config_fed_client.json
+.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming/app/config/config_fed_client.json
:language: json
:linenos:
:caption: config_fed_client.json
@@ -60,7 +60,7 @@ which converts local events to federated events.
This changes the event ``analytix_log_stats`` into a fed event ``fed.analytix_log_stats``,
which will then be streamed from the clients to the server.

-.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard-streaming/jobs/tensorboard-streaming/app/config/config_fed_server.json
+.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming/app/config/config_fed_server.json
:language: json
:linenos:
:caption: config_fed_server.json
@@ -83,7 +83,7 @@ In this exercise, all of the TensorBoard code additions will be made in ``pt_lea

First we must initialize our TensorBoard writer to the ``AnalyticsSender`` we defined in the client config:

-.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard-streaming/jobs/tensorboard-streaming/app/custom/pt_learner.py
+.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming/app/custom/pt_learner.py
:language: python
:lines: 103-106
:lineno-start: 103
@@ -98,7 +98,7 @@ but we can also define it in the client config to be passed into the constructor
Now that our TensorBoard writer is set to ``AnalyticsSender``,
we can write and stream training metrics to the server in ``local_train()``:

-.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard-streaming/jobs/tensorboard-streaming/app/custom/pt_learner.py
+.. literalinclude:: ../../examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming/app/custom/pt_learner.py
:language: python
:lines: 144-174
:lineno-start: 144
@@ -160,4 +160,4 @@ Congratulations!
Now you will be able to see the live training metrics of each client from a central place on the server.

The full source code for this exercise can be found in
-`examples/advanced/experiment-tracking/tensorboard-streaming <https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/experiment-tracking/tensorboard-streaming>`_.
+`examples/advanced/experiment-tracking/tensorboard <https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/experiment-tracking/tensorboard>`_.
84 changes: 84 additions & 0 deletions examples/advanced/experiment-tracking/mlflow/README.md
@@ -0,0 +1,84 @@
# Hello PyTorch with MLflow

Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier
using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) and [PyTorch](https://pytorch.org/)
as the deep learning training framework.

This example also highlights the MLflow streaming capability from the clients to the server.

> **_NOTE:_** This example uses the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and will load its data within the trainer code.

### 1. Install requirements and configure PYTHONPATH

Install additional requirements:

```bash
python -m pip install -r requirements.txt
```

Set `PYTHONPATH` to include custom files of this example:
```bash
export PYTHONPATH=${PWD}/../pt
```

### 2. Run the experiment

Use the NVFlare simulator to run the example:

```bash
nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-pt-mlflow
```

### 3. Access the logs and results

You can find the running logs and results inside the simulator's workspace in a directory named "simulate_job".

```bash
$ ls /tmp/nvflare/simulate_job/
app_server app_site-1 app_site-2 log.txt tb_events

```

By default, MLflow will create an experiment log directory under a directory named "mlruns" in the simulator's workspace.

### 4. MLflow Streaming

For the job `hello-pt-mlflow`, on the client side, the client code in `PTLearner` uses MLflow-style logging syntax (making it easy to reuse code that already tracks experiments with MLflow):

```python
self.writer.log_metrics({"train_loss": cost.item(), "running_loss": running_loss}, current_step)

self.writer.log_metric("validation_accuracy", metric, epoch)

self.writer.log_text(f"last running_loss reset at '{len(self.train_loader) * epoch + i}' step", "running_loss_reset.txt")
```
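Conceptually, the writer keeps MLflow's logging signatures while queuing NVFlare events instead of talking to a tracking server. A minimal sketch of that idea (illustrative only; this is not NVFlare's actual `MLflowWriter` implementation, and the class name and event list are made up here):

```python
class EventQueuingMLflowWriter:
    """Toy writer exposing MLflow-style methods but queuing events
    instead of calling a tracking server. Illustrative sketch only."""

    def __init__(self, event_type="analytix_log_stats"):
        self.event_type = event_type
        self.events = []  # stand-in for NVFlare's event delivery

    def log_metric(self, key, value, step):
        self.events.append((self.event_type, {key: value}, step))

    def log_metrics(self, metrics, step):
        self.events.append((self.event_type, dict(metrics), step))

    def log_text(self, text, artifact_file):
        self.events.append((self.event_type, {artifact_file: text}, None))


writer = EventQueuingMLflowWriter()
writer.log_metrics({"train_loss": 0.42, "running_loss": 1.3}, 10)
writer.log_metric("validation_accuracy", 0.91, 1)
print(len(writer.events))  # -> 2
```

Because the method signatures match MLflow's, training code written against MLflow tracking needs no changes to stream through NVFlare.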

The `MLflowWriter` mimics the MLflow API, but instead of logging locally it sends the information to the server through NVFlare events
of type `analytix_log_stats`, so the server can write the data to the MLflow tracking server.

The `ConvertToFedEvent` widget turns the event `analytix_log_stats` into a fed event `fed.analytix_log_stats`,
which will be delivered to the server side.
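The conversion itself is just an event-type rewrite; a toy sketch of the idea (illustrative, not the actual `ConvertToFedEvent` widget):

```python
def to_fed_event(event_type, events_to_convert, fed_event_prefix="fed."):
    """Rewrite a local event type into its federated counterpart."""
    if event_type in events_to_convert:
        return fed_event_prefix + event_type
    return event_type  # other events pass through unchanged


print(to_fed_event("analytix_log_stats", ["analytix_log_stats"]))
# -> fed.analytix_log_stats
```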

On the server side, the `MLflowReceiver` is configured to process `fed.analytix_log_stats` events,
which writes received data from these events to the MLflow tracking server.
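For reference, a server-side component entry registering the `MLflowReceiver` looks like the following (mirroring the registration shown in the accompanying notebook; the experiment name is just an example):

```json
{
  "id": "mlflow_receiver",
  "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLflowReceiver",
  "args": {
    "kwargs": {"experiment_name": "hello-pt-experiment"},
    "artifact_location": "artifacts"
  }
}
```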

This way, the server is the only party that needs to handle authentication for the MLflow tracking server, and the server
can buffer the events from many clients to better manage the request load on the tracking server.

Note that the server also has `TBAnalyticsReceiver` configured, which also listens for `fed.analytix_log_stats` events by default,
so the data is also written to TensorBoard event files on the server.

### 5. TensorBoard Streaming with MLflow

For the job `hello-pt-tb-mlflow`, on the client side, the client code in `PTLearner` uses TensorBoard syntax:

```python
self.writer.add_scalar("train_loss", cost.item(), current_step)

self.writer.add_scalar("validation_accuracy", metric, epoch)
```

The `TBWriter` mimics the TensorBoard `SummaryWriter` API and streams the metrics to the server instead of writing local event files.

Note that in this job, the server still has `MLflowReceiver` and `TBAnalyticsReceiver` configured the same as in the job with `MLflowWriter`
on the client side, and the events are converted by the `MLflowReceiver` to write to the MLflow tracking server.
180 changes: 180 additions & 0 deletions examples/advanced/experiment-tracking/mlflow/experiment_tracking.ipynb
@@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e129ede5",
"metadata": {},
"source": [
" # Hello PyTorch with MLflow"
]
},
{
"cell_type": "markdown",
"id": "9bf7e391",
"metadata": {},
"source": [
"In this example, we demonstrate that the example code used in hello-pt-tb with PyTorch TensorBoard tracking can be switched to an MLflow tracking server without changing the code.\n"
]
},
{
"cell_type": "markdown",
"id": "18ec76f4",
"metadata": {},
"source": [
"\n",
"Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) and [PyTorch](https://pytorch.org/) as the deep learning training framework. This example also highlights the streaming capability from the clients to the server with TensorBoard SummaryWriter sender syntax, but with an MLflow receiver.\n",
"\n",
"> **_NOTE:_** This example uses the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and will load its data within the trainer code.\n"
]
},
{
"cell_type": "markdown",
"id": "bbca0050",
"metadata": {},
"source": [
"### 1. Install NVIDIA FLARE\n",
"\n",
"Follow the [Installation](https://nvflare.readthedocs.io/en/main/getting_started.html#installation) instructions.\n",
"Install additional requirements:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2b5579b",
"metadata": {},
"outputs": [],
"source": [
"%pip install torch torchvision tensorboard mlflow"
]
},
{
"cell_type": "markdown",
"id": "b8226dd7",
"metadata": {},
"source": [
"### 2. Change Configuration\n",
"\n",
"In `fed_server_config.json`, add the following entry to the `components` list:\n",
"\n",
"```\n",
"{\n",
" \"id\": \"mlflow_receiver\",\n",
" \"path\": \"nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLflowReceiver\",\n",
" \"args\": {\n",
" \"kwargs\": {\"experiment_name\": \"hello-pt-experiment\"},\n",
" \"artifact_location\": \"artifacts\"\n",
" }\n",
"}\n",
"```\n",
"This registers the `MLflowReceiver` in addition to the TensorBoard receiver.\n",
"\n",
"Note that the job hello-pt-mlflow uses MLflow syntax with the MLflowWriter on the client side, while\n",
"hello-pt-tb-mlflow has the learner using TensorBoard syntax. Both work with MLflowReceiver.\n"
]
},
{
"cell_type": "markdown",
"id": "6fe3165d",
"metadata": {},
"source": [
"\n",
"### 3. Run the experiment\n",
"\n",
"Use the NVFlare simulator to run the examples with the additional common Python files included in the Python path:\n",
"\n",
"```\n",
"export PYTHONPATH=${PWD}/../pt\n",
"```\n",
"\n",
"```\n",
"nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-pt-tb-mlflow\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8f08cef",
"metadata": {},
"outputs": [],
"source": [
"!nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-pt-tb-mlflow"
]
},
{
"cell_type": "markdown",
"id": "7b1fe44d",
"metadata": {},
"source": [
"### 4. TensorBoard Tracking\n",
"\n",
"On the client side, we are still using the TensorBoard SummaryWriter as the `AnalyticsSender`. \n",
"\n",
"Instead of writing to TB files, it actually generates NVFLARE events of type `analytix_log_stats`.\n",
"The `ConvertToFedEvent` widget will turn the event `analytix_log_stats` into a fed event `fed.analytix_log_stats`,\n",
"which will be delivered to the server side.\n",
"\n",
"On the server side, the `TBAnalyticsReceiver` is configured to process `fed.analytix_log_stats` events,\n",
"which writes received TB data into appropriate TB files on the server.\n",
"\n",
"To view training metrics that are being streamed to the server, run:\n",
"\n",
"```\n",
"tensorboard --logdir=/tmp/nvflare/simulate_job/tb_events\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "534d7879",
"metadata": {},
"source": [
"### 5. MLflow Tracking\n",
"\n",
"On the server side, we also configured `MLflowReceiver` to process `fed.analytix_log_stats` events,\n",
"which writes received events to the MLflow tracking server.\n",
"\n",
"To view training metrics that are being streamed to the server, run:\n",
"\n",
"```\n",
"mlflow ui --backend-store-uri=/tmp/nvflare/mlruns\n",
"```\n",
"\n",
"Then open http://localhost:5000/ in your browser."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da1e7952-c3e6-4e90-a42e-648a823ede78",
"metadata": {},
"outputs": [],
"source": [
"!mlflow ui --backend-store-uri=/tmp/nvflare/mlruns"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "nvflare_example",
"language": "python",
"name": "nvflare_example"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -0,0 +1,45 @@
{
"format_version": 2,

"executors": [
{
"tasks": [
"train",
"submit_model",
"validate"
],
"executor": {
"id": "Executor",
"path": "nvflare.app_common.executors.learner_executor.LearnerExecutor",
"args": {
"learner_id": "pt_learner"
}
}
}
],
"task_result_filters": [
],
"task_data_filters": [
],
"components": [
{
"id": "pt_learner",
"path": "pt_learner.PTLearner",
"args": {
"lr": 0.01,
"epochs": 5,
"analytic_sender_id": "log_writer"
}
},
{
"id": "log_writer",
"path": "nvflare.app_opt.tracking.mlflow.mlflow_writer.MLflowWriter",
"args": {"event_type": "analytix_log_stats"}
},
{
"id": "event_to_fed",
"name": "ConvertToFedEvent",
"args": {"events_to_convert": ["analytix_log_stats"], "fed_event_prefix": "fed."}
}
]
}
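The `analytic_sender_id` wiring above means the learner resolves its writer component by id at runtime rather than constructing it directly; a toy sketch of that indirection (illustrative only, not NVFlare's engine API; the registry dict and class name here are made up):

```python
# Toy component registry illustrating lookup by id, as wired in the
# config above ("pt_learner" points at "log_writer"). Sketch only.
components = {
    "log_writer": {"type": "MLflowWriter", "event_type": "analytix_log_stats"},
}


class PTLearnerSketch:
    def __init__(self, analytic_sender_id="log_writer"):
        self.analytic_sender_id = analytic_sender_id
        self.writer = None

    def initialize(self, registry):
        # resolve the configured id to the concrete writer component
        self.writer = registry[self.analytic_sender_id]


learner = PTLearnerSketch()
learner.initialize(components)
```

Keeping the writer behind an id makes it swappable in config: pointing `analytic_sender_id` at a TensorBoard-style writer instead requires no learner code changes.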