
group null judge fix #7122

Merged 2 commits into PaddlePaddle:develop on Sep 27, 2023

Conversation

TimeYWL (Contributor) commented on Sep 25, 2023

PR types

Bug fixes

PR changes

Others

Description

According to the API in paddle/distributed/communication/group.py (line 67):

def is_member(self):
    if self.rank < 0:
        return False
    if self.nranks < 2:
        return False
    return True

and the group build code:

if size > 1 and global_rank in ranks:
    rank = 0 if backend == 'heter' else ranks.index(global_rank)
    pg = _new_process_group_impl(
        backend,
        _default_store,
        rank,
        size,
        group_name,
        pg_options=None,
        group_id=gid,
    )
else:
    rank = -1
    pg = None
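
Taken together, these two snippets show the problem: a rank that is not in the group (or a group of size 1) still gets back a Group object, just with rank = -1, so the object itself is never None. A minimal self-contained sketch of these semantics (the Group class below is a simplified stand-in, not Paddle's real implementation):

# Simplified stand-in for paddle.distributed.communication.group.Group,
# keeping only the two fields that is_member() inspects.
class Group:
    def __init__(self, rank, nranks):
        self.rank = rank      # -1 when this process is not in the group
        self.nranks = nranks  # 1 for a trivial single-rank group

    def is_member(self):
        if self.rank < 0:
            return False
        if self.nranks < 2:
            return False
        return True

# The case described below: no model parallelism, rank=-1, nranks=1.
mp_group = Group(rank=-1, nranks=1)
assert mp_group is not None      # a None check passes...
assert not mp_group.is_member()  # ...but this rank is not a member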

The check in PaddleNLP/paddlenlp/data/dist_dataloader.py (e.g. line 155):
if self.mp_group is not None and self.pp_rank == 0:
cannot correctly determine whether mp_group is a usable group.

If there is no model parallelism, mp_group is still a Group object:
rank: -1, nranks: 1, id: 12, ranks: 0; name: _default_pg12
so the None check passes, and broadcast_data_list() then raises an error:

Traceback (most recent call last):
  File "run_pretrain.py", line 567, in <module>
    main()
  File "run_pretrain.py", line 549, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 738, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/workspace/PaddleNLP/paddlenlp/data/dist_dataloader.py", line 181, in __next__
    data_list = broadcast_data_list(data_list, paddle.int64, self.mp_rank, self.mp_group, self.mp_src_rank)
  File "/workspace/PaddleNLP/paddlenlp/data/dist_dataloader.py", line 210, in broadcast_data_list
    paddle.distributed.broadcast(size_cuda, src_rank, group=comm_group).wait()
AttributeError: 'NoneType' object has no attribute 'wait'
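
A minimal sketch of the shape of the fix (the should_broadcast helper is hypothetical and only illustrates the guard; the actual change merged here may differ):

def should_broadcast(mp_group, pp_rank):
    # Guard the collective on actual membership, not just on the group
    # object being non-None: without model parallelism the group has
    # rank == -1 and nranks == 1, so is_member() returns False even
    # though the object itself exists.
    return mp_group is not None and mp_group.is_member() and pp_rank == 0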

paddle-bot bot commented on Sep 25, 2023

Thanks for your contribution!

codecov bot commented on Sep 25, 2023

Codecov Report

Merging #7122 (4d043d1) into develop (9c3f8a4) will decrease coverage by 0.01%.
Report is 4 commits behind head on develop.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           develop    #7122      +/-   ##
===========================================
- Coverage    59.64%   59.64%   -0.01%     
===========================================
  Files          563      563              
  Lines        82644    82645       +1     
===========================================
  Hits         49291    49291              
- Misses       33353    33354       +1     
Files                               Coverage Δ
paddlenlp/data/dist_dataloader.py   14.78% <0.00%> (ø)

... and 4 files with indirect coverage changes

DesmonDay (Contributor) left a comment

LGTM
DesmonDay merged commit 685d12b into PaddlePaddle:develop on Sep 27, 2023
4 checks passed