[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference #39992

FeixLiu · 2022-02-28T08:40:52Z

PR types

Others

PR changes

Others

Describe

Add an entry point for the fleet executor in AnalysisPredictor, and add helper methods that initialize the NCCL environment for distributed inference.

To use the fleet executor for inference, the following fields must be set in DistConfig:

bool use_dist_model_;                        // whether to use DistModel
std::vector<std::string> trainer_endpoints_; // endpoints of all trainers
std::string current_endpoint_;               // endpoint of the current trainer
int64_t nranks_;                             // total number of ranks (number of trainers)
int64_t local_rank_;                         // rank of the current trainer
std::string comm_init_config_;               // path to the converter config (used to init the comm)

Note that use_dist_model_ must be set to true by calling

EnableDistModel(true);

nranks_ and local_rank_ are set together by calling

SetRanks(int64_t nranks, int64_t rank);

trainer_endpoints_ and current_endpoint_ are likewise set together by calling

SetEndpoints(std::vector<std::string> trainer_endpoints, std::string current_endpoint);

The DistConfig is then attached to the AnalysisConfig by calling

SetDistConfig(dConfig);
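The call order above can be sketched with a small stand-in type. This is only an illustration: MockDistConfig, MakeConfig and LooksConsistent are hypothetical names that mirror the fields and setters listed in this description, they are not Paddle's classes, and the consistency rules (one endpoint per rank, current endpoint present in the list) are assumptions rather than checks taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in mirroring the fields and setters named in the PR text.
// It is NOT Paddle's DistConfig; it only makes the sketch compile on its own.
struct MockDistConfig {
  bool use_dist_model_ = false;
  std::vector<std::string> trainer_endpoints_;
  std::string current_endpoint_;
  int64_t nranks_ = 1;
  int64_t local_rank_ = 0;
  std::string comm_init_config_;

  void EnableDistModel(bool enable) { use_dist_model_ = enable; }
  void SetRanks(int64_t nranks, int64_t rank) {
    nranks_ = nranks;
    local_rank_ = rank;
  }
  void SetEndpoints(std::vector<std::string> trainer_endpoints,
                    std::string current_endpoint) {
    trainer_endpoints_ = std::move(trainer_endpoints);
    current_endpoint_ = std::move(current_endpoint);
  }
};

// Builds the config for one rank, following the call order described above.
MockDistConfig MakeConfig(int64_t rank,
                          const std::vector<std::string>& endpoints) {
  const int64_t nranks = static_cast<int64_t>(endpoints.size());
  MockDistConfig config;
  config.EnableDistModel(true);
  config.SetRanks(nranks, rank);
  // Fall back to an empty endpoint when the rank is out of range.
  std::string current;
  if (rank >= 0 && rank < nranks) {
    current = endpoints[static_cast<std::size_t>(rank)];
  }
  config.SetEndpoints(endpoints, current);
  return config;
}

// Assumed sanity checks (not taken from the PR): DistModel is enabled, the
// rank lies in [0, nranks), there is one endpoint per rank, and the current
// endpoint appears in the endpoint list.
bool LooksConsistent(const MockDistConfig& c) {
  if (!c.use_dist_model_) return false;
  if (c.nranks_ <= 0 || c.local_rank_ < 0 || c.local_rank_ >= c.nranks_) {
    return false;
  }
  if (static_cast<int64_t>(c.trainer_endpoints_.size()) != c.nranks_) {
    return false;
  }
  return std::find(c.trainer_endpoints_.begin(), c.trainer_endpoints_.end(),
                   c.current_endpoint_) != c.trainer_endpoints_.end();
}
```

With the real classes, the same sequence would end by passing the populated DistConfig to SetDistConfig on the AnalysisConfig.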

The converter config should contain sections like these:

[ring_id -> ranks]
0,0,1,2,3,4,5,6,7
1,0,1,2,3
2,4,5,6,7
21,0,1
22,1,2
23,2,3
24,3,4
25,4,5
26,5,6
27,6,7
[rank -> ring_ids]
0,0,1,21
1,0,1,21,22
2,0,1,22,23
3,0,1,23,24
4,0,2,24,25
5,0,2,25,26
6,0,2,26,27
7,0,2,27
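In the sample above, the two sections describe the same topology from both directions: each line starts with a key (a ring ID or a rank) followed by the IDs it maps to. A minimal sketch of how such a file could be read and cross-checked is shown below. It assumes the format is exactly what the sample shows (bracketed section headers, comma-separated integers); ParseCommConfig, InvertRings and CommTables are hypothetical names and are not the parser used by the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using IdMap = std::map<int64_t, std::vector<int64_t>>;

struct CommTables {
  IdMap ring_to_ranks;  // from the "[ring_id -> ranks]" section
  IdMap rank_to_rings;  // from the "[rank -> ring_ids]" section
};

// Parses the text of a converter config. Each data line is "key,v1,v2,...".
// Lines under an unrecognized section header are ignored. A malformed number
// makes std::stoll throw, which this sketch does not catch.
CommTables ParseCommConfig(const std::string& text) {
  CommTables tables;
  IdMap* current = nullptr;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) continue;
    if (line.front() == '[') {
      if (line == "[ring_id -> ranks]") {
        current = &tables.ring_to_ranks;
      } else if (line == "[rank -> ring_ids]") {
        current = &tables.rank_to_rings;
      } else {
        current = nullptr;
      }
      continue;
    }
    if (current == nullptr) continue;
    std::istringstream fields(line);
    std::string token;
    std::vector<int64_t> values;
    while (std::getline(fields, token, ',')) {
      values.push_back(std::stoll(token));
    }
    if (values.empty()) continue;
    const int64_t key = values.front();
    (*current)[key].assign(values.begin() + 1, values.end());
  }
  return tables;
}

// Derives rank -> ring_ids from ring_id -> ranks. Because std::map iterates
// in ascending key order, each rank's ring IDs come out in ascending order.
IdMap InvertRings(const IdMap& ring_to_ranks) {
  IdMap rank_to_rings;
  for (const auto& entry : ring_to_ranks) {
    for (int64_t rank : entry.second) {
      rank_to_rings[rank].push_back(entry.first);
    }
  }
  return rank_to_rings;
}
```

Comparing InvertRings(tables.ring_to_ranks) with tables.rank_to_rings is a cheap way to catch a config whose two sections disagree. The comparison only works directly when the ring IDs on each rank line are listed in ascending order, as they are in the sample.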

This reverts commit 1f80a52.

This reverts commit fd6d650.

This reverts commit 852c91f.

paddle-bot-old · 2022-02-28T08:40:56Z

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle/fluid/inference/api/analysis_predictor.h

paddle/fluid/inference/api/analysis_predictor.cc

paddle/fluid/inference/api/analysis_predictor.h

paddle/fluid/distributed/fleet_executor/carrier.cc

wangxicoding

LGTM

gongweibao

Don't use VLOG(3) for every log message; it produces redundant logs that you don't need. Also, variable names should clearly express their meaning.

paddle/fluid/distributed/fleet_executor/carrier.cc

python/paddle/fluid/executor.py

gongweibao · 2022-03-01T02:07:19Z

Please add a concise PR description, like the one in #37725.

gongweibao

LGTM

paddle/fluid/inference/api/analysis_predictor.cc

qingqing01

LGTM

shangzhizhou

LGTM

Superjomn

LGTM

FeixLiu added 6 commits February 25, 2022 17:23

Add entrance of the DistModel in the Analysis Predictor.

852c91f

fix compile error

fd6d650

refine code

1f80a52

Revert "refine code"

02d487d

This reverts commit 1f80a52.

Revert "fix compile error"

1b252ae

This reverts commit fd6d650.

Revert "Add entrance of the DistModel in the Analysis Predictor."

adf3923

This reverts commit 852c91f.

qingqing01 requested review from gongweibao, shangzhizhou, liyanyanliyanyan, wangxicoding, qingqing01 and Superjomn February 28, 2022 08:55

qingqing01 reviewed Feb 28, 2022

FeixLiu requested a review from qingqing01 February 28, 2022 09:36

wangxicoding reviewed Feb 28, 2022

paddle/fluid/distributed/fleet_executor/carrier.cc (outdated, resolved)

wangxicoding previously approved these changes Feb 28, 2022

gongweibao requested changes Mar 1, 2022

paddle/fluid/distributed/fleet_executor/carrier.cc (outdated, resolved)

paddle/fluid/distributed/fleet_executor/carrier.cc (outdated, resolved)

python/paddle/fluid/executor.py (resolved)

sneaxiy dismissed wangxicoding’s stale review via 77b733f March 1, 2022 01:55

FeixLiu requested a review from gongweibao March 1, 2022 02:04

wangxicoding requested a review from TeslaZhao March 1, 2022 02:46

wangxicoding previously approved these changes Mar 1, 2022

FeixLiu dismissed wangxicoding’s stale review via bc81b32 March 1, 2022 06:55

gongweibao previously approved these changes Mar 1, 2022

qingqing01 reviewed Mar 1, 2022

paddle/fluid/inference/api/analysis_predictor.cc (outdated, resolved)

FeixLiu dismissed gongweibao’s stale review via b87b3fb March 1, 2022 09:16

FeixLiu closed this Mar 1, 2022

FeixLiu reopened this Mar 1, 2022

New connection method for fleet executor

fd41be6

FeixLiu force-pushed the update_for_connect branch from b6f9ec3 to fd41be6 Compare March 1, 2022 09:27

qingqing01 approved these changes Mar 1, 2022

PaddlePaddle locked and limited conversation to collaborators Mar 1, 2022

PaddlePaddle unlocked this conversation Mar 1, 2022

FeixLiu closed this Mar 1, 2022

FeixLiu reopened this Mar 1, 2022

shangzhizhou approved these changes Mar 1, 2022

wangxicoding approved these changes Mar 2, 2022

FeixLiu changed the title from "[fleet_executor] Update for connect" to "[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference" Mar 2, 2022

Superjomn approved these changes Mar 2, 2022

leiqing1 approved these changes Mar 2, 2022

FeixLiu merged commit 244ae31 into PaddlePaddle:develop Mar 2, 2022

FeixLiu deleted the update_for_connect branch March 2, 2022 07:24
