Add Parquet Format to Store Normalized Data #131

zhangpengshan · 2015-06-01T16:02:24Z

In Shifu, normalized data takes up storage space. To save space and improve performance, try ParquetStorer and ParquetInputFormat in normalize.pig and for the training input.

zhangpengshan · 2015-07-08T04:54:13Z

Parquet format support has been added to the Shifu 0.2.7 branch.
When 'isParquet' is set to true in ModelConfig#normalize, running shifu norm and shifu train will save and read the data in Parquet format.
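
As a minimal sketch of the setting described above, the snippet below flips the flag in a ModelConfig-style JSON document. The `normalize` section and the `isParquet` key come from the comment; the helper name `set_parquet` and the sample content are hypothetical, and a real ModelConfig file contains many more fields than shown here.

```python
import json


def set_parquet(config_text: str, enabled: bool) -> str:
    """Return config_text with normalize.isParquet set to `enabled`.

    Hypothetical helper for illustration; not part of Shifu.
    """
    config = json.loads(config_text)
    # Assumption: "normalize" is a JSON object; create it if it is missing.
    config.setdefault("normalize", {})["isParquet"] = enabled
    return json.dumps(config, indent=2)


# Hypothetical, heavily trimmed config content.
sample = '{"normalize": {"isParquet": false}}'
updated = set_parquet(sample, True)
print(updated)
```

The same flag value then has to be in effect for both the norm and train steps.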

zhangpengshan · 2015-07-08T04:55:40Z

If norm is run with Parquet format ('isParquet'=true) but train is run without it ('isParquet'=false), an exception will be thrown in the training step. Please make sure the setting is consistent across both steps.
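
One way to catch such a mismatch before training starts is to compare the flag with what the normalized output actually looks like. The sketch below relies on the fact that an unencrypted Parquet file begins and ends with the 4-byte magic `PAR1`. The function names are hypothetical, and this is not how Shifu itself performs the check.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """True if the bytes start and end with the Parquet magic number."""
    # A valid file holds at least: magic (4) + footer length (4) + magic (4).
    return (
        len(data) >= 12
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


def check_consistent(is_parquet_flag: bool, data: bytes) -> None:
    """Raise ValueError if the flag disagrees with the data layout.

    Hypothetical guard for illustration only.
    """
    if looks_like_parquet(data) != is_parquet_flag:
        raise ValueError(
            "isParquet=%s does not match the normalized data format"
            % is_parquet_flag
        )


# Fake payloads standing in for normalized output files.
fake_parquet = PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC
fake_text = b"0.12|0.53|1\n0.40|0.08|0\n"

check_consistent(True, fake_parquet)   # passes silently
check_consistent(False, fake_text)     # passes silently
```

In practice only the first and last four bytes of one output part file would need to be read, rather than the whole file.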

zhangpengshan · 2015-07-08T04:56:12Z

I won't close this ticket yet, as I still need to do some refactoring work.

zhangpengshan · 2015-08-05T05:08:06Z

Although reading Parquet data is faster in the training process (not by much if only 5% of the variables are dropped), more time is spent on the norm step.
The space saving is about 5x.

Done in develop-0.2.7

Closing this issue, but the speed of the norm step still needs further improvement.

zhangpengshan added this to the Shifu 0.2.7 milestone Jun 1, 2015

zhangpengshan self-assigned this Jul 8, 2015

zhangpengshan pushed a commit that referenced this issue Jul 8, 2015

Add parquet format to support norm output: #131

fc8678c

zhangpengshan closed this as completed Aug 5, 2015

MiniZhuwei pushed a commit that referenced this issue Jul 17, 2018

Add parquet format to support norm output: #131

6b5e5c1
