DATASET SPLITTING #1323
Hi everybody,

What I'd like to do is run the hello-pt-tb example or the cifar10 example with, say, 2 clients, and start a training in which client 1 has access to one half of the dataset and client 2 has access to the other half.

My idea was to download the dataset and split it manually, so I ran the command python3 ../pt/utils/cifar10_download_data.py. The dataset was downloaded into the /tmp folder, so I moved the content of the downloaded folder into another folder called dataset (maybe I could have split it in place without copying). Inside dataset I split the content into two subfolders: subfolder 1 has some of the five binary files named data_batch... and subfolder 2 has the others.

Now my question is: how and where do I tell each client which data to use, i.e. subfolder 1 for client 1 and subfolder 2 for client 2? I tried the simulator, POC, and real-world modes, but I cannot find where to specify this in any of them. Maybe manually splitting the dataset into two folders is the wrong approach and there is a simpler or better way, so please let me know.

I noticed that the hello tf 2 example does something similar in the SimpleTrainer class in Trainer.py, but that approach doesn't fit my next step: after training with the dataset split in 2, I'd like to use my own dataset, also split in 2. That is my final goal, so any advice on that point is welcome too.

In any case, thank you for your support.
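For reference, the manual split described above (distributing the data_batch files across numbered subfolders) could be scripted roughly as follows. This is only a minimal sketch: the function name split_batches and the round-robin assignment are illustrative assumptions, not part of NVFlare or of the download script.

```python
import shutil
from pathlib import Path


def split_batches(src_dir, dst_dir, n_clients=2):
    """Copy data_batch_* files from src_dir into dst_dir/1 ... dst_dir/n_clients.

    Hypothetical helper: files are assigned round-robin in sorted order,
    so with five batches and two clients, client 1 gets three files and
    client 2 gets two. Other files (e.g. a test batch) are left untouched.
    """
    src = Path(src_dir)
    batches = sorted(src.glob("data_batch_*"))
    if not batches:
        raise FileNotFoundError(f"no data_batch_* files found in {src}")

    assignment = {}
    for i, batch in enumerate(batches):
        client = i % n_clients + 1  # client numbers start at 1
        target = Path(dst_dir) / str(client)
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(batch, target / batch.name)
        assignment.setdefault(client, []).append(batch.name)
    return assignment
```

It would be called with the folder that actually contains the downloaded batch files as src_dir and the dataset folder as dst_dir; the returned dictionary lists which files went to which client.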
Replies: 1 comment
Hi @FabioNotaro2001, thanks for raising the question.

Assuming you are using NVFlare 2.2.X, for the hello-pt-tb example you can check this file: https://github.com/NVIDIA/NVFlare/blob/2.2/examples/hello-pt-tb/app/custom/pt_learner.py#L44

There we can see that the learner takes a data_path argument.

Let's assume we have 2 client sites, site-1 and site-2. For site-1, the config_fed_client.json would have a section that sets data_path to site-1's data folder, and for site-2 the same section would point data_path to site-2's folder.

But then the question is: how can we have a different config_fed_client.json for each client? You need to copy the app folder to app_server, app_site-1, and app_site-2, and then modify the config_fed_client.json inside each client app.

This would achieve what you want to do.
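The per-site setup described in this reply (copy app to app_server, app_site-1, app_site-2, then give each client its own data_path) could be automated along these lines. This is a hedged sketch, not an NVFlare API: the helper name make_site_apps is hypothetical, and it assumes that config_fed_client.json lives under a config subfolder of the app and contains a components list whose learner entry has the id pt_learner. Check both assumptions against your copy of the example before relying on it.

```python
import json
import shutil
from pathlib import Path


def make_site_apps(job_dir, data_paths, learner_id="pt_learner"):
    """Copy job_dir/app to app_server and app_<site>, setting data_path per site.

    Assumptions (verify against your example): the client config is at
    <app>/config/config_fed_client.json and holds a "components" list whose
    learner entry has "id" == learner_id and an "args" dictionary.
    """
    job = Path(job_dir)
    base_app = job / "app"
    created = ["app_server"]
    shutil.copytree(base_app, job / "app_server")

    for site, data_path in data_paths.items():
        site_app = job / f"app_{site}"
        shutil.copytree(base_app, site_app)
        cfg_file = site_app / "config" / "config_fed_client.json"
        cfg = json.loads(cfg_file.read_text())

        patched = False
        for comp in cfg.get("components", []):
            if comp.get("id") == learner_id:
                # Point this site's learner at its own share of the data.
                comp.setdefault("args", {})["data_path"] = data_path
                patched = True
        if not patched:
            raise KeyError(f"component {learner_id!r} not found in {cfg_file}")

        cfg_file.write_text(json.dumps(cfg, indent=2))
        created.append(site_app.name)
    return created
```

For example, passing {"site-1": ".../dataset/1", "site-2": ".../dataset/2"} as data_paths (with the paths replaced by your real folders) leaves the original app untouched and writes one patched copy per site.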