add mlflow_sender and examples
formatting and unit tests fix
2, update tb_receiver.py to ignore events from mlflow
3. add notebook for interactive examples
git basic code working with documentation
update README.md
chesterxgchen committed Jan 9, 2023
1 parent 2628d7c commit 02ba157
Showing 55 changed files with 2,484 additions and 803 deletions.
14 changes: 12 additions & 2 deletions examples/README.md
@@ -19,11 +19,21 @@ To get started with these examples, please follow the [Quickstart](https://nvfla
### 1.2 Deep Learning
* [Hello PyTorch](./hello-pt/README.md)
* Example using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework.
* [Hello PyTorch with TensorBoard](./hello-pt-tb/README.md)
* Example building upon [Hello PyTorch](./hello-pt/README.md) showcasing the [TensorBoard](https://tensorflow.org/tensorboard) streaming capability from the clients to the server.
* [Hello TensorFlow](./hello-tf2/README.md)
* Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework.

## 2. Federated ML Experiment Tracking
* [Machine Learning Experiment Tracking via different tools](experiment-tracking/README.md)

All of these examples build upon [Hello PyTorch](./hello-pt/README.md) and showcase the streaming of experiment tracking data from the clients to the server.

* [Hello PyTorch with TensorBoard](experiment-tracking/hello-pt-tb)
* showcasing the [TensorBoard](https://tensorflow.org/tensorboard) support
* [Hello PyTorch with MLFlow](experiment-tracking/hello-pt-mlflow)
* showcasing the [MLFlow](https://mlflow.org/) support
* [Hello PyTorch with Weights & Biases](experiment-tracking/hello-pt-wandb)
* showcasing the [WandB](https://wandb.ai) support

## 2. FL algorithms
* [Federated Learning with CIFAR-10](./cifar10/README.md)
* [Simulated Federated Learning with CIFAR-10](./cifar10/cifar10-sim/README.md)
117 changes: 117 additions & 0 deletions examples/experiment-tracking/README.md
@@ -0,0 +1,117 @@
# Experiment Tracking

## Tools, Senders and Receivers

Through several examples, we show how to track and visualize experiments in real time and compare results
by leveraging several experiment tracking tools:
* TensorBoard
* MLFlow
* Weights and Biases
* **Note**: users need to sign up at Weights and Biases to access the service; NVFLARE does not provide access.

During the Federated Learning phase, users can choose the API syntax they are used to,
such as the API of one of the tools above. NVFLARE provides components that mimic these tools' APIs.
These components are called experiment tracking LogWriters. All clients' experiment logs are streamed to the FL server,
where the actual experiment logs are recorded. The components that receive these logs are called Receivers.
A receiver leverages the corresponding experiment tracking tool to record the logs during the experiment run.

In a normal setting, we would have pairs of senders and receivers, such as
* TBWriter <-> TBReceiver
* MLFlowWriter <-> MLFlowReceiver
* WandBWriter <-> WandBReceiver

We could also mix and match any of these pairs. This allows one to write the ML code using one API
while still being able to switch to different experiment tracking tool(s). As a matter of fact, one can use many receivers for the
same log data sent from one sender.

![Experiment Tracking writers and Receivers](experiment_tracking.jpg)
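
The decoupling of writers and receivers described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not NVFLARE code: `LogRecord`, `ToyWriter`, and `ToyReceiver` are hypothetical names invented here, and the real components exchange events over the FL network rather than calling each other directly.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class LogRecord:
    """One tracked value, independent of any tracking tool (hypothetical type)."""
    site: str
    key: str
    value: Any
    step: int


class ToyReceiver:
    """Hypothetical stand-in for a server-side receiver; not the NVFLARE class."""

    def __init__(self, tool: str) -> None:
        self.tool = tool
        self.records: List[LogRecord] = []

    def receive(self, record: LogRecord) -> None:
        # A real receiver would call the tracking tool here; we only store the record.
        self.records.append(record)


class ToyWriter:
    """Hypothetical writer exposing a single MLFlow-style logging call."""

    def __init__(self, site: str, receivers: List[ToyReceiver]) -> None:
        self.site = site
        self.receivers = receivers

    def log_metric(self, key: str, value: float, step: int) -> None:
        # The same tool-neutral record is fanned out to every configured receiver.
        record = LogRecord(self.site, key, value, step)
        for receiver in self.receivers:
            receiver.receive(record)


tb = ToyReceiver("tensorboard")
mlflow = ToyReceiver("mlflow")
writer = ToyWriter("site-1", [tb, mlflow])
writer.log_metric("train_loss", 0.42, step=1)
```

Because the writer only produces tool-neutral records, the training code does not change when a receiver is added or swapped.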

## Experiment logs streaming

On the client side, when a writer writes metrics, it does not write to files; instead, it generates NVFLARE events
of type `analytix_log_stats`. The `ConvertToFedEvent` widget turns the local event `analytix_log_stats` into a
fed event `fed.analytix_log_stats`, which is delivered to the server side.

On the server side, the `Receiver` is configured to process `fed.analytix_log_stats` events
and writes the received log data to the appropriate endpoints.
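
The event flow above can be sketched with a toy in-process event bus. This is a minimal sketch under stated assumptions: only the event names `analytix_log_stats` and `fed.analytix_log_stats` come from the text; `ToyEventBus`, `convert_to_fed_event`, and `receiver` are hypothetical helpers, and the real delivery happens over the network between client and server.

```python
from typing import Any, Callable, Dict, List, Tuple

LOCAL_EVENT = "analytix_log_stats"  # local event type named in the text
FED_PREFIX = "fed."                 # prefix that marks a fed event

Handler = Callable[[str, Dict[str, Any]], None]


class ToyEventBus:
    """Hypothetical in-process event bus; not an NVFLARE class."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers.setdefault(event_type, []).append(handler)

    def fire(self, event_type: str, data: Dict[str, Any]) -> None:
        for handler in self.handlers.get(event_type, []):
            handler(event_type, data)


client_bus = ToyEventBus()
server_bus = ToyEventBus()
received: List[Tuple[str, Dict[str, Any]]] = []


def convert_to_fed_event(event_type: str, data: Dict[str, Any]) -> None:
    # Plays the role of the ConvertToFedEvent widget: re-fire the local
    # event on the server side under the "fed." prefixed name.
    server_bus.fire(FED_PREFIX + event_type, data)


def receiver(event_type: str, data: Dict[str, Any]) -> None:
    # Plays the role of a server-side receiver: record what arrived.
    received.append((event_type, data))


client_bus.subscribe(LOCAL_EVENT, convert_to_fed_event)
server_bus.subscribe(FED_PREFIX + LOCAL_EVENT, receiver)

# A writer on the client fires a local event instead of writing to a file.
client_bus.fire(LOCAL_EVENT, {"site": "site-1", "key": "train_loss", "value": 0.42, "step": 1})
```

After the last line runs, `received` holds a single entry whose event type is `fed.analytix_log_stats`.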

## Support custom experiment tracking tools

There are many different experiment tracking tools, and you might want to write a different writer and/or receiver for your needs.

There are three things to consider when developing a custom experiment tracking tool.

**Data Type**

Currently, we support the metrics, params, and text data types. If you require other data types, make sure you add them to
`AnalyticsDataType`.

**Writer**

Implement the `LogWriter` interface to specify the API syntax.

**Receiver**

Implement the `AnalyticsReceiver` interface, and determine how to represent the logs of different sites. In all three implementations
(TensorBoard, MLFlow, WandB), each site's log is represented as one run. Depending on the individual tool, the implementation
can be different. For example, for both TensorBoard and MLFlow, we simply create a different run for each client and map it to the
site name. In the WandB implementation, we have to leverage multiprocessing and let each run execute in a different process.
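
The "one run per site" idea can be sketched as follows. This is an assumption-laden toy, not the `AnalyticsReceiver` interface: `ToyDataType` and `ToyRunPerSiteReceiver` are hypothetical names, a "run" is just an in-memory list, and the three enum members mirror only the data types listed above.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class ToyDataType(Enum):
    """Hypothetical enum mirroring the three supported data types."""
    METRIC = "metric"
    PARAM = "param"
    TEXT = "text"


class ToyRunPerSiteReceiver:
    """Hypothetical receiver that keeps one run (here, a list) per site name."""

    def __init__(self) -> None:
        self.runs: Dict[str, List[Tuple[ToyDataType, str, Any]]] = {}

    def save(self, site: str, data_type: ToyDataType, key: str, value: Any) -> None:
        # Create the run for this site on first use, then append to it.
        run = self.runs.setdefault(site, [])
        run.append((data_type, key, value))


receiver = ToyRunPerSiteReceiver()
receiver.save("site-1", ToyDataType.METRIC, "train_loss", 0.42)
receiver.save("site-2", ToyDataType.METRIC, "train_loss", 0.57)
receiver.save("site-1", ToyDataType.PARAM, "learning_rate", 0.01)
```

A receiver for a real tool would replace the list with that tool's run object; where the tool allows only one active run per process, each run would need its own process, as noted above for WandB.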

## Examples Overview

We illustrate how to leverage different writers and receivers via different examples.
All examples will leverage the example hello-pt.

[hello-pt-tb](hello-pt-tb)
* The first example shows how to use the TensorBoard tracking tool (both sender and receiver)
* The second example shows how to use the TensorBoard sender only, while the receiver is MLFlow
![tb](tb.png)

[hello-pt-mlflow](hello-pt-mlflow)
* The first example shows how to use the MLFlow tracking tool (both sender and receiver)
* The second example shows how to use the MLFlow sender only, while the receiver is TensorBoard

![mlflow_2](mlflow_2.png)
![mlflow_1](mlflow_1.png)

[hello-pt-wandb](hello-pt-wandb)
* The example shows how to use the Weights & Biases tracking tool (both sender and receiver)


![wandb_1](wandb_1.png)

## Setup for all examples

Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) and [PyTorch](https://pytorch.org/) as the deep learning training framework. This example also highlights the TensorBoard streaming capability from the clients to the server.

> **_NOTE:_** This example uses the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and will load its data within the trainer code.

### 1. Prepare venv

```
python -m venv ~/nvflare-hello
source ~/nvflare-hello/bin/activate
```


### 2. Install NVIDIA FLARE

Follow the [Installation](https://nvflare.readthedocs.io/en/main/quickstart.html) instructions.

### 3. Install Jupyter Notebook

```
pip install notebook
```

### 4. Examples in Notebook

For a detailed explanation of the examples, please see the notebook:

```
jupyter notebook experiment_tracking.ipynb
```


