# Experiment Tracking

## Tools, Senders, and Receivers

Through several examples, we show how to track and visualize experiments in real time and compare results
by leveraging several experiment tracking tools:
* TensorBoard
* MLflow
* Weights and Biases
  * **Note**: users need to sign up at Weights and Biases to access the service; NVFLARE does not provide access.

During the federated learning phase, users can choose an API syntax they are familiar with, such as the APIs of
one of the tools above. NVFLARE has developed components that mimic these tools' APIs; these components are called
experiment tracking LogWriters. All clients' experiment logs are streamed to the FL server, where the actual
experiment logs are recorded. The components that receive these logs are called Receivers. Each receiver uses its
experiment tracking tool to record the logs during the experiment run.

In a typical setting, we have pairs of senders and receivers, such as:
* TBWriter <-> TBReceiver
* MLFlowWriter <-> MLFlowReceiver
* WandBWriter <-> WandBReceiver

We can also mix and match any pairs. This allows one to write the ML code using one API
but switch to different experiment tracking tools. In fact, one can use many receivers for the
same log data sent from one sender.
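
As a minimal sketch of the mix-and-match idea, the Python below has one writer publish each record to two receivers. `FanOutChannel`, `TBStyleWriter`, and `ListReceiver` are hypothetical names used only for illustration, not NVFLARE classes; in NVFLARE the delivery goes through events streamed to the server, which this in-memory channel does not model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical sketch: none of these classes are NVFLARE APIs.
LogRecord = Dict[str, Any]


@dataclass
class FanOutChannel:
    """Delivers every record from one sender to all registered receivers."""

    receivers: List[Callable[[LogRecord], None]] = field(default_factory=list)

    def register(self, receiver: Callable[[LogRecord], None]) -> None:
        self.receivers.append(receiver)

    def publish(self, record: LogRecord) -> None:
        for receiver in self.receivers:
            receiver(record)


class TBStyleWriter:
    """Exposes a TensorBoard-like add_scalar() but publishes records instead of writing files."""

    def __init__(self, channel: FanOutChannel, site: str) -> None:
        self.channel = channel
        self.site = site

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None:
        self.channel.publish(
            {"site": self.site, "key": tag, "value": scalar_value, "step": global_step}
        )


class ListReceiver:
    """Toy receiver that stores every record it gets; `backend` is only a label."""

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self.records: List[LogRecord] = []

    def __call__(self, record: LogRecord) -> None:
        self.records.append(record)


# One sender, two receivers standing in for two different tracking tools.
channel = FanOutChannel()
tb_receiver = ListReceiver("tensorboard")
mlflow_receiver = ListReceiver("mlflow")
channel.register(tb_receiver)
channel.register(mlflow_receiver)

writer = TBStyleWriter(channel, site="site-1")
writer.add_scalar("train_loss", 0.42, global_step=1)
```

The training code only ever calls `add_scalar`; which tools end up recording the value is decided by which receivers are registered.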

![Experiment Tracking writers and Receivers](experiment_tracking.jpg)

## Experiment logs streaming

On the client side, when a writer writes metrics, instead of writing them to files, it generates NVFLARE events
of type `analytix_log_stats`. The `ConvertToFedEvent` widget turns the local `analytix_log_stats` event into a
fed event, `fed.analytix_log_stats`, which is delivered to the server side.
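
The client-side flow can be sketched as follows. This is a simplified, self-contained model, not NVFLARE code: `Event`, `ToyConvertToFedEvent`, and `log_metric` are hypothetical, and only the event names `analytix_log_stats` and `fed.analytix_log_stats` come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

LOCAL_EVENT = "analytix_log_stats"  # local event type named in this README
FED_PREFIX = "fed."                 # gives "fed.analytix_log_stats", as named in this README


@dataclass
class Event:
    event_type: str
    payload: Dict[str, Any]
    origin_site: str


class ToyConvertToFedEvent:
    """Hypothetical sketch: re-publishes selected local events as fed events for the server."""

    def __init__(self, events_to_convert: List[str], outbox: List[Event]) -> None:
        self.events_to_convert = set(events_to_convert)
        self.outbox = outbox  # stands in for the client-to-server channel

    def handle_event(self, event: Event) -> None:
        # Only the configured local events are converted; everything else stays local.
        if event.event_type in self.events_to_convert:
            self.outbox.append(Event(FED_PREFIX + event.event_type, event.payload, event.origin_site))


def log_metric(site: str, key: str, value: float, step: int, handler: ToyConvertToFedEvent) -> None:
    """What a writer does instead of writing a file: fire a local analytix_log_stats event."""
    handler.handle_event(Event(LOCAL_EVENT, {"key": key, "value": value, "step": step}, site))


outbox: List[Event] = []
converter = ToyConvertToFedEvent([LOCAL_EVENT], outbox)
log_metric("site-1", "accuracy", 0.81, step=3, handler=converter)
```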

On the server side, the `Receiver` is configured to process `fed.analytix_log_stats` events
and writes the received log data to the appropriate endpoints.
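
A similarly simplified sketch of the server side, assuming a hypothetical `ToyReceiver` that stores entries in an in-memory dictionary keyed by site name instead of writing to a real tracking back end. It reacts only to `fed.analytix_log_stats` and ignores every other event type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Event:
    event_type: str
    payload: Dict[str, Any]
    origin_site: str


class ToyReceiver:
    """Hypothetical server-side receiver that only handles fed.analytix_log_stats events."""

    def __init__(self, events: Tuple[str, ...] = ("fed.analytix_log_stats",)) -> None:
        self.events = set(events)
        # site name -> list of (key, value, step); stands in for a real tracking back end
        self.endpoint: Dict[str, List[Tuple[str, Any, int]]] = defaultdict(list)

    def handle_event(self, event: Event) -> None:
        if event.event_type not in self.events:
            return  # ignore anything that is not a fed analytics event
        p = event.payload
        self.endpoint[event.origin_site].append((p["key"], p["value"], p["step"]))


receiver = ToyReceiver()
receiver.handle_event(Event("fed.analytix_log_stats", {"key": "loss", "value": 0.3, "step": 1}, "site-1"))
receiver.handle_event(Event("fed.analytix_log_stats", {"key": "loss", "value": 0.5, "step": 1}, "site-2"))
receiver.handle_event(Event("analytix_log_stats", {"key": "loss", "value": 9.9, "step": 1}, "server"))
```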

## Supporting custom experiment tracking tools

There are many different experiment tracking tools, and you might want to write your own writer and/or receiver for your needs.

There are three things to consider when developing support for a custom experiment tracking tool.

**Data Type**

Currently, we support the metrics, params, and text data types. If you require another data type, make sure you add it to
`AnalyticsDataType`.
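
The sketch below uses a hypothetical `ToyDataType` enum in place of NVFLARE's `AnalyticsDataType` (whose actual members may differ) to show why a new type must be declared: the receiver dispatches on the type tag, so a value tagged with an undeclared type has nowhere to go. The `HISTOGRAM` member and the `HANDLERS` table are illustrative assumptions.

```python
from enum import Enum
from typing import Any, Callable, Dict, List


class ToyDataType(Enum):
    """Hypothetical stand-in for AnalyticsDataType; real member names may differ."""

    METRICS = "metrics"
    PARAMS = "params"
    TEXT = "text"
    HISTOGRAM = "histogram"  # hypothetical custom type added for a new tool


logged: List[str] = []

# Receiver-side handling keyed by data type; a new type needs an entry here too.
HANDLERS: Dict[ToyDataType, Callable[[str, Any], None]] = {
    ToyDataType.METRICS: lambda key, value: logged.append(f"metric {key}={value}"),
    ToyDataType.PARAMS: lambda key, value: logged.append(f"param {key}={value}"),
    ToyDataType.TEXT: lambda key, value: logged.append(f"text {key}: {value}"),
    ToyDataType.HISTOGRAM: lambda key, value: logged.append(f"histogram {key} with {len(value)} values"),
}


def dispatch(data_type: ToyDataType, key: str, value: Any) -> None:
    """Route one log entry to the handler for its data type, rejecting unknown types."""
    handler = HANDLERS.get(data_type)
    if handler is None:
        raise ValueError(f"unsupported data type: {data_type!r}")
    handler(key, value)


dispatch(ToyDataType.METRICS, "loss", 0.1)
dispatch(ToyDataType.HISTOGRAM, "weights", [0.1, 0.2, 0.3])
```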

**Writer**

Implement the `LogWriter` interface to specify the API syntax.
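
A minimal sketch of the writer idea, assuming a hypothetical `ToyLogWriter` base class (NVFLARE's actual `LogWriter` interface and method names may differ). The subclass only fixes the API syntax, here modeled on the call shapes of MLflow's `log_param(key, value)` and `log_metric(key, value, step)`, and forwards everything to a list that stands in for the event stream.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ToyLogWriter(ABC):
    """Hypothetical writer interface: subclasses expose a tool's API and call _emit()."""

    def __init__(self, sink: List[Dict[str, Any]]) -> None:
        self.sink = sink  # stands in for firing analytix_log_stats events

    def _emit(self, data_type: str, key: str, value: Any, step: Optional[int] = None) -> None:
        self.sink.append({"data_type": data_type, "key": key, "value": value, "step": step})

    @abstractmethod
    def get_writer_name(self) -> str:
        """Each concrete writer identifies which API syntax it mimics."""


class MLflowStyleWriter(ToyLogWriter):
    """Mimics the call shapes of MLflow's log_param() and log_metric()."""

    def get_writer_name(self) -> str:
        return "mlflow-style"

    def log_param(self, key: str, value: Any) -> None:
        self._emit("params", key, value)

    def log_metric(self, key: str, value: float, step: Optional[int] = None) -> None:
        self._emit("metrics", key, value, step)


sink: List[Dict[str, Any]] = []
writer = MLflowStyleWriter(sink)
writer.log_param("learning_rate", 0.01)
writer.log_metric("loss", 0.25, step=2)
```

Keeping the transport in the base class means a new API syntax only needs thin methods that translate calls into records.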

**Receiver**

Implement the `AnalyticsReceiver` interface and determine how to represent different sites' logs. In all three implementations
(TensorBoard, MLflow, WandB), each site's log is represented as one run. Depending on the individual tool, the implementation
can differ. For example, for both TensorBoard and MLflow, we simply create a separate run for each client and map it to the
site name. In the WandB implementation, we have to leverage multiprocessing and run each site's run in a separate process.
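
A minimal sketch of the one-run-per-site mapping, assuming hypothetical `ToyRun` and `PerSiteReceiver` classes in place of a real tracking client; the experiment name `hello-pt` is only an illustrative label.

```python
from typing import Any, Dict, List, Tuple


class ToyRun:
    """Hypothetical stand-in for one run in a tracking tool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metrics: List[Tuple[str, Any, int]] = []


class PerSiteReceiver:
    """Sketch of the 'one run per site' mapping: site name -> run."""

    def __init__(self, experiment: str) -> None:
        self.experiment = experiment
        self.runs: Dict[str, ToyRun] = {}

    def _run_for(self, site: str) -> ToyRun:
        # Lazily create a run the first time a site reports anything.
        if site not in self.runs:
            self.runs[site] = ToyRun(f"{self.experiment}/{site}")
        return self.runs[site]

    def save(self, site: str, key: str, value: Any, step: int) -> None:
        self._run_for(site).metrics.append((key, value, step))


receiver = PerSiteReceiver("hello-pt")
for step in range(2):
    receiver.save("site-1", "loss", 1.0 / (step + 1), step)
    receiver.save("site-2", "loss", 2.0 / (step + 1), step)
```

Creating runs lazily means the receiver does not need the list of sites up front. The WandB receiver described above additionally runs each site's run in its own process, which this sketch does not model.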

## Examples Overview

We illustrate how to leverage different writers and receivers through several examples.
All examples are based on the hello-pt example.

[hello-pt-tb](hello-pt-tb)
* The first example shows how to use the TensorBoard tracking tool (both sender and receiver).
* The second example shows how to use the TensorBoard sender only, while the receiver side uses MLflow.

![tb](tb.png)

[hello-pt-mlflow](hello-pt-mlflow)
* The first example shows how to use the MLflow tracking tool (both sender and receiver).
* The second example shows how to use the MLflow sender only, while the receiver side uses TensorBoard.

![mlflow_2](mlflow_2.png)
![mlflow_1](mlflow_1.png)

[hello-pt-wandb](hello-pt-wandb)
* This example shows how to use the Weights & Biases tracking tool (both sender and receiver).

![wandb_1](wandb_1.png)

## Setup for all examples

These examples use [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.html) to train an image classifier using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629)) and [PyTorch](https://pytorch.org/) as the deep learning training framework. They also highlight the capability of streaming experiment logs, such as TensorBoard metrics, from the clients to the server.

> **_NOTE:_** These examples use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset and load its data within the trainer code.

### 1. Prepare venv

```
python -m venv ~/nvflare-hello
source ~/nvflare-hello/bin/activate
```

### 2. Install NVIDIA FLARE

Follow the [Installation](https://nvflare.readthedocs.io/en/main/quickstart.html) instructions.

### 3. Install Jupyter Notebook

```
pip install notebook
```

### 4. Examples in Notebook

For detailed explanations of the examples, please check the notebook:

```
jupyter notebook experiment_tracking.ipynb
```