
move DataLoader code to paddle.io #48699

Merged: 8 commits merged on May 11, 2023

@heavengate (Contributor) commented Dec 4, 2022

PR types

Others

PR changes

Others

Description

Move the DataLoader code to paddle.io.

  • Move the paddle.io.DataLoader code to paddle.io.reader
  • Move the fluid/dataloader directory to paddle.io
  • Copy multiprocess_utils.py to paddle.io

NOTE: the code for DataLoader.from_generator and DataLoader.from_dataset remains under fluid.reader, because these APIs have been deprecated since Paddle 2.0.

@paddle-bot commented Dec 4, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zoooo0820 (Contributor) left a comment

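After an offline discussion, this PR separates the code for the old DataLoader from the code for the new DataLoader, and migrates the new DataLoader and its underlying dependencies to the 2.0 API. The old DataLoader and its dependencies cannot be removed yet, because some distributed and quantization code still uses them; those calls must be removed first.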

import numpy as np

from .dataset import IterableDataset
from .sampler import RandomSampler, Sampler, SequenceSampler

__all__ = ["BatchSampler", "DistributedBatchSampler"]
Contributor

Because of how the fluid public API list is defined, files migrated to 2.0 need their original __all__ list emptied.

Contributor Author

Done. Thanks!
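Emptying __all__ relies on standard Python import semantics: a star-import of the module copies only the names listed in __all__, so an empty list exports nothing, while explicit imports of the same names keep working. The sketch below shows this with an in-memory module; the module name migrated_sampler and the helper make_module are hypothetical stand-ins, not Paddle code.

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module and register it so `import` can find it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Hypothetical stand-in for a file migrated to paddle.io, with __all__ emptied.
make_module("migrated_sampler", """
__all__ = []

class BatchSampler:
    pass
""")

# A star-import copies only the names listed in __all__, so nothing arrives.
star_ns = {}
exec("from migrated_sampler import *", star_ns)
exported = sorted(k for k in star_ns if not k.startswith("__"))
print(exported)  # []

# An explicit import is unaffected by __all__.
from migrated_sampler import BatchSampler
print(BatchSampler.__name__)  # BatchSampler
```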

set(WITH_FLASHATTN ON)
endif()
endif()
# if(WITH_GPU
Contributor

Shouldn't this part be left unchanged?

Contributor Author

Done. Thanks!

from .sampler import RandomSampler
from .sampler import WeightedRandomSampler

__all__ = [
Contributor

This file also needs __all__ = []; the APIs are already exposed in paddle/io/__init__.py.

Contributor Author

Done, thanks!
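The pattern the reviewer describes, where submodules export nothing and the package __init__.py is the single place that defines the public API, can be sketched with in-memory modules. The package name demo_io, its sampler submodule, and the helper register are hypothetical; only the two sampler class names are taken from the diff above.

```python
import sys
import types


def register(name, source, is_package=False):
    """Create an in-memory module (optionally a package) and register it."""
    module = types.ModuleType(name)
    if is_package:
        module.__path__ = []  # an empty __path__ marks it as a package
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


# Hypothetical submodule: the implementation lives here, but it exports nothing.
register("demo_io.sampler", """
__all__ = []

class RandomSampler:
    pass

class WeightedRandomSampler:
    pass
""")

# Hypothetical package __init__: re-exports the classes and lists them in __all__.
demo_io = register("demo_io", """
from demo_io.sampler import RandomSampler, WeightedRandomSampler

__all__ = ["RandomSampler", "WeightedRandomSampler"]
""", is_package=True)

ns = {}
exec("from demo_io import *", ns)
public = sorted(k for k in ns if not k.startswith("__"))
print(public)  # ['RandomSampler', 'WeightedRandomSampler']

# The re-exported name is the very same class object defined in the submodule.
print(demo_io.RandomSampler is sys.modules["demo_io.sampler"].RandomSampler)  # True
```

Keeping the public list in one place means a migrated file can be moved or renamed without changing what users import from the package.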

# This value is used when getting data from another process
QUEUE_GET_TIMEOUT = 60

__all__ = ['DataLoader', 'default_collate_fn']
Contributor

__all__ needs to be emptied here as well.

Contributor Author

Done, thanks!
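The diff above defines QUEUE_GET_TIMEOUT = 60 for getting data from another process. A constant like this is typically passed to a queue's get(timeout=...) so that a stalled worker surfaces as an error instead of a silent hang. The sketch below is an assumption about that usage, not the reader's actual code: the helper get_batch is hypothetical, and it uses the thread-safe queue.Queue so it runs without spawning processes. multiprocessing.Queue.get accepts the same timeout argument and raises the same queue.Empty exception.

```python
import queue

# Mirrors the constant in the diff: seconds to wait for data from a worker.
QUEUE_GET_TIMEOUT = 60


def get_batch(data_queue, timeout=QUEUE_GET_TIMEOUT):
    """Fetch one item (hypothetical helper), turning a hang into an error."""
    try:
        return data_queue.get(timeout=timeout)
    except queue.Empty:
        raise RuntimeError(
            f"no data received from worker within {timeout} seconds"
        ) from None


q = queue.Queue()
q.put([1, 2, 3])
print(get_batch(q))  # [1, 2, 3]
```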

@risemeup1 (Contributor) previously approved these changes May 10, 2023
LGTM for setup.py

@zoooo0820 (Contributor) left a comment

LGTM

@risemeup1 (Contributor) left a comment

LGTM

@sunzhongkai588 (Contributor) left a comment

LGTM for docs

@heavengate merged commit 793f3b9 into PaddlePaddle:develop on May 11, 2023
7 participants