
Add v2 dist benchmark vgg #7539

Merged

Conversation

typhoonzero
Contributor

No description provided.

@helinwang helinwang self-requested a review January 18, 2018 04:39

| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | 247.40 | - | - |
Contributor

@helinwang helinwang Jan 24, 2018


It seems Fluid's throughput is 247.40/64 = 3.866 batches per second, while v2's is 256.14/128 = 2.001 batches per second.
That difference looks huge; do you have an idea why? (Also, could you please check whether my math is correct?)
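The arithmetic in this comment can be checked directly (a quick sketch; the 247.40 figure is from the table above and 256.14 is the v2 number quoted in the comment):

```python
# Sanity-check the batches-per-second math: images/sec divided by batch size.
fluid_imgs_per_sec = 247.40   # Fluid at batch size 64 (from the table)
v2_imgs_per_sec = 256.14      # v2 figure quoted in the comment, batch size 128

fluid_batches_per_sec = fluid_imgs_per_sec / 64
v2_batches_per_sec = v2_imgs_per_sec / 128

print(round(fluid_batches_per_sec, 3))  # 3.866
print(round(v2_batches_per_sec, 3))     # 2.001
```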

Contributor Author


Sorry, wrong column. I'll update this PR with the full test results.

Contributor

@gongweibao gongweibao left a comment


As discussed offline, we should think about how to avoid duplicating the same content in PaddleCloud.

@helinwang
Contributor

Thanks! Looks like we have a nice improvement over V2 on batch size 256!

#RUN mkdir -p /workspace
#ADD reader.py /workspace/
#RUN python /workspace/reader.py
FROM python:2.7.14
Contributor


Since this is for testing, I think it would be better to use paddle:dev instead of this base image:

  • No additional dependencies need to be installed.
  • When debugging, you can enter the container and use the usual commands to inspect the system state.

RUN pip install /*.whl && rm -f /*.whl
ENV LD_LIBRARY_PATH=/usr/local/lib
ADD reader.py /workspace/
RUN python /workspace/reader.py
Contributor


This basically cannot be downloaded, so we need to add a hint telling users to use a proxy.

- name: TOPOLOGY
value: ""
- name: ENTRY
value: "cd /workspace && MKL_NUM_THREADS=1 python /workspace/vgg16_v2.py"
Contributor


Use `python -u` to force unbuffered log output.
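A sketch of the suggested fix (assuming the rest of this yaml entry stays unchanged); `-u` makes Python flush stdout immediately, so the logs stream in real time:

```yaml
- name: ENTRY
  value: "cd /workspace && MKL_NUM_THREADS=1 python -u /workspace/vgg16_v2.py"
```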

- name: TOPOLOGY
value: ""
- name: ENTRY
value: "python train.py"
Contributor


Use `python -u` here as well to force unbuffered log output.

| PaddlePaddle v2 | 15.97 | 17.04 | 17.60 | 17.83 |
| TensorFlow | - | - | - | - |

### different batch size
Contributor


different batch size
=>
Different Batch Size

| TensorFlow | - | - | - | - |


### Accelerate rate
Contributor


Accelerate Rate

| PaddlePaddle v2 (need more tests) | 326.85 | 534.58 | 853.30 | 1041.99 |
| TensorFlow | - | - | - | - |

### different pserver number
Contributor


Different PServer Count

| TensorFlow | - | - | - | - |


### Accelerate rate

Yancey1989
Yancey1989 previously approved these changes Feb 1, 2018
Contributor

@Yancey1989 Yancey1989 left a comment


LGTM. Please also refine the headings to title case, e.g. using http://www.titlecase.com.


- Trainer Count: 60
- Batch Size: 128
- Metrics: mini-batch / sec
Contributor


mini-batch / sec

Do you mean samples/sec?
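The two units differ only by a factor of the batch size, so the conversion is a single multiplication (a sketch; the batch size is from the settings above, while the throughput value here is purely hypothetical):

```python
batch_size = 128             # from the experiment settings above
minibatches_per_sec = 2.0    # hypothetical throughput, for illustration only

# samples/sec = mini-batches/sec x images per mini-batch
samples_per_sec = minibatches_per_sec * batch_size
print(samples_per_sec)  # 256.0
```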


## Enable verbose logs

Edit `pserver.yaml` and `trainer.yaml`, adding the environment variable `GLOG_v=3`, to see in detail what is happening.
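As a sketch, assuming both yaml files use the standard Kubernetes container `env` list (as the `ENTRY`/`TOPOLOGY` snippets above suggest), the variable would be added like this:

```yaml
env:
  - name: GLOG_v
    value: "3"
```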
Contributor


I'm not sure whether we also need to add `GLOG_logtostderr=1`; if you have tested it, please ignore this comment.

RUN pip install -U kubernetes opencv-python && apt-get update -y && apt-get install -y iputils-ping libgtk2.0-dev
# NOTE: By default CI built wheel packages turn WITH_DISTRIBUTE=OFF,
# so we must build one with distribute support to install in this image.
RUN pip install paddlepaddle
Contributor


Isn't this pip install redundant? Could the dataset download be moved after line 12?

Contributor Author


No. The lines below this one change frequently during debugging, and downloading the dataset is slow, so this line is placed here to keep the earlier Docker layers cached and make rebuilds faster.
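The caching argument can be sketched as an ordering rule (a hypothetical Dockerfile fragment, not the exact one in this PR): slow, stable steps go first so Docker's layer cache reuses them, and only the frequently-edited steps below get rebuilt.

```dockerfile
FROM python:2.7.14
# Slow, rarely-changing steps first: Docker reuses these cached layers.
RUN pip install paddlepaddle
ADD reader.py /workspace/
RUN python /workspace/reader.py   # downloads the dataset once, then cached
# Frequently-edited steps last: editing them rebuilds only from here down.
```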

@@ -1,3 +1,16 @@
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
Contributor


The copyright message is duplicated.

Contributor

@Yancey1989 Yancey1989 left a comment


LGTM!!
