
Add min_pool_size, Add default value of should_shuffle #70

Merged
merged 1 commit into from
Sep 19, 2016

Conversation

reyoung
Collaborator

@reyoung reyoung commented Sep 13, 2016

  • min_pool_size is infinite by default.
    • Add a unittest for min_pool_size.
  • Fix a bug in can_over_batch_size.
    • Add a unittest for can_over_batch_size.
  • Add DEFINE_PROVIDER_EX.
  • Add a default value for should_shuffle.
    • When training, the default value of should_shuffle is True.
    • When testing, the default value of should_shuffle is False.
    • Users can set whether a provider should shuffle by passing should_shuffle to @provider.
    • should_shuffle can handle a range of values, not just booleans.
  • Add input order mapping by name.
    • Add a unittest.
  • Add a check of the input format.
    • Disabled by default for speed.
    • On a check error, the user can either stop training or continue
      without the offending sample.
  • Use a deque instead of a vector in the generator pool, making it
    faster to erase a generator.
  • Add Chinese/English documentation.
  • Set should_shuffle = false in unittests.
  • Add Python files to dependencies.
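The should_shuffle defaulting rule above (True while training, False while testing, with an explicit user value always winning) can be sketched in plain Python. The function name and the is_train flag below are illustrative stand-ins, not the actual PaddlePaddle API, and the string handling is an assumption about what "not just booleans" covers:

```python
def resolve_should_shuffle(user_value, is_train):
    """Resolve the effective should_shuffle flag.

    None means "not set by the user": default to True when training
    and False when testing. An explicit user value always wins, and
    string forms such as "true"/"false" are accepted as well.
    """
    if user_value is None:
        return is_train
    if isinstance(user_value, str):
        return user_value.strip().lower() in ("true", "1", "yes", "on")
    return bool(user_value)
```

For example, `resolve_should_shuffle(None, is_train=True)` is True, while an explicit `should_shuffle=False` disables shuffling even during training.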

DataProvider* DataProvider::create(const DataConfig& config,
                                   const ModelConfig& modelConfig,
                                   bool useGpu) {
  return registrar_.createByType(config.type(), config, modelConfig, useGpu);
}
Collaborator Author

Add ModelConfig in DataProvider::create to get input layer order
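With ModelConfig available, the provider's named outputs can be matched against the model's input layer order. A minimal sketch of that reordering step, with made-up field names (image, label) and no Paddle types:

```python
def reorder_by_name(sample, input_order):
    """Reorder a dict of named data fields into the list order that the
    model's input layers expect; fail loudly if a field is missing."""
    missing = [name for name in input_order if name not in sample]
    if missing:
        raise KeyError("provider did not yield fields: " + ", ".join(missing))
    return [sample[name] for name in input_order]

# The trainer would read this order from the ModelConfig; hard-coded here.
input_order = ["image", "label"]
batch_row = reorder_by_name({"label": 7, "image": [0.1, 0.2]}, input_order)
```

Mapping by name instead of by position means the provider can yield fields in any order without silently feeding data to the wrong layer.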

@reyoung reyoung force-pushed the fix_can_over_batch_size branch 4 times, most recently from d20681d to 78170c3 Compare September 13, 2016 15:15
return dp;\
});\
})

Collaborator

@emailweixu emailweixu Sep 13, 2016

Add more comments.

@emailweixu
Collaborator

Also please update the data provider documentation

@reyoung reyoung force-pushed the fix_can_over_batch_size branch 2 times, most recently from aab1c00 to 0210938 Compare September 14, 2016 11:48
@reyoung
Collaborator Author

reyoung commented Sep 14, 2016

@emailweixu Updated the code and added Chinese docs. The English documentation will be added ASAP.

@emailweixu
Collaborator

Need to fix test

@reyoung
Collaborator Author

reyoung commented Sep 18, 2016

@emailweixu The earlier unittest failure happened because we did not disable shuffling in the unittest, and this patch sets min_pool_size to unlimited. That makes the data shuffle correctly across the whole dataset, which changed the sample order the old unittest relied on.

Also add english documentation.
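The interaction described above can be sketched with a small shuffle pool: an effectively unlimited min_pool_size keeps the whole dataset in the pool before shuffling, so the output order no longer matches the input order. The use of collections.deque mirrors the PR's switch from vector to deque for cheap removal from the pool; everything else (names, structure) is illustrative, not Paddle's implementation:

```python
import random
from collections import deque

def pooled_samples(generator, min_pool_size, should_shuffle, rng=None):
    """Pool samples before yielding them. Once the pool exceeds
    min_pool_size, the oldest samples are streamed out; whatever is
    still pooled when the source ends is shuffled (if requested) and
    drained. With min_pool_size = infinity, the whole dataset sits in
    the pool, so shuffling reaches every sample."""
    rng = rng or random.Random(0)
    pool = deque()  # popleft() is O(1); erasing the front of a vector is O(n)
    for sample in generator:
        pool.append(sample)
        while len(pool) > min_pool_size:
            yield pool.popleft()
    remainder = list(pool)
    if should_shuffle:
        rng.shuffle(remainder)
    for sample in remainder:
        yield sample
```

With `min_pool_size=float("inf")` and shuffling enabled, the yielded order generally differs from the input order, which is exactly why unittests that assumed input order had to disable shuffling.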

* cache is a data cache strategy, see `cache`_.
* init_hook is a function invoked once, when the data provider is
  initialized; see `init_hook`_.
.. autofunction:: paddle.trainer.PyDataProvider2.provider
Collaborator Author

Here we use the docstring of paddle.trainer.PyDataProvider2.provider as the documentation.
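Since the decorator's docstring doubles as the documentation, a minimal usage sketch helps make the init_hook behavior concrete. The decorator below is a simplified, hypothetical stand-in for paddle.trainer.PyDataProvider2.provider (its real signature is not reproduced here); it only shows that an init_hook runs once, before any data is generated:

```python
def provider(init_hook=None, **kwargs):
    """Simplified stand-in for a @provider-style decorator: it calls
    init_hook once with a settings object before iteration begins."""
    def wrap(generator):
        def driver(settings, *args):
            if init_hook is not None:
                init_hook(settings, *args)  # runs once, before any yield
            yield from generator(settings, *args)
        return driver
    return wrap

class Settings:  # illustrative settings holder, not a Paddle class
    pass

def my_hook(settings, *args):
    settings.vocab = {"hello": 0, "world": 1}  # e.g. load a dictionary

@provider(init_hook=my_hook)
def process(settings, *args):
    for word in ["hello", "world"]:
        yield settings.vocab[word]
```

Calling `list(process(Settings()))` runs my_hook first, then yields the mapped values.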

@emailweixu emailweixu merged commit 90b9cba into PaddlePaddle:master Sep 19, 2016
@reyoung reyoung deleted the fix_can_over_batch_size branch September 22, 2016 04:48
thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021
* refactor lower function

* refine LoweredFunc code gen

* add const support
gglin001 added a commit to graphcore/Paddle-fork that referenced this pull request Dec 8, 2021
* add paddleIArray

* use final inherit, rm data_
wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021
* update paddlenlp usage

* update paddlelsim

* update readme

Co-authored-by: ceci3 <592712189@qq.com>
zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022
danleifeng added a commit to danleifeng/Paddle that referenced this pull request Jul 22, 2022
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 19, 2022
Made several changes:
- an -> a
- Realse->Release
- Traning ->Training
- Unify application with noun.
zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Feb 10, 2023
qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this pull request Mar 3, 2023
qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this pull request Mar 3, 2023
lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024
hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this pull request Feb 29, 2024
Fridge003 pushed a commit to Fridge003/Paddle that referenced this pull request Mar 15, 2024
add group_pattern_util.ShardableAxesProvider