Implement SE-ResNeXt #570

Open
wants to merge 2 commits into base: dev-static

3 changes: 2 additions & 1 deletion README.cn.md
@@ -98,7 +98,7 @@ PaddlePaddle provides a rich set of computation units to help you build models in a modular way

Compared with text, images convey information that is more vivid, easier to understand, and more expressive, making them an important medium for exchanging information. Image classification distinguishes images of different categories based on their semantic content. It is a fundamental problem in computer vision and the basis of higher-level vision tasks such as object detection, image segmentation, object tracking, and behavior analysis, with wide applications in many fields: face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic album organization on the internet, and image recognition in medicine.

In the image classification task, we show how to train the AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, and Inception-ResNet-v2 models. We also provide model conversion tools that convert model files trained with Caffe or TensorFlow into PaddlePaddle model files.
In the image classification task, we show how to train the AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-ResNet-v2, and SE-ResNeXt models. We also provide model conversion tools that convert model files trained with Caffe or TensorFlow into PaddlePaddle model files.

- 11.1 [Convert Caffe model files to PaddlePaddle model files](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle)
- 11.2 [Convert TensorFlow model files to PaddlePaddle model files](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle)
@@ -107,6 +107,7 @@ PaddlePaddle provides a rich set of computation units to help you build models in a modular way
- 11.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.8 [SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/image_classification)

## 12. Object Detection

3 changes: 2 additions & 1 deletion README.md
@@ -72,7 +72,7 @@ As an example for sequence-to-sequence learning, we take the machine translation

## 9. Image classification

For the example of image classification, we show you how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4 and Inception-Resnet-V2 models in PaddlePaddle. It also provides model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files.
For the example of image classification, we show you how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and SE-ResNeXt models in PaddlePaddle. It also provides model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files.

- 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle)
- 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle)
@@ -81,5 +81,6 @@ For the example of image classification, we show you how to train AlexNet, VGG,
- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.8 [SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/image_classification)

This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
15 changes: 12 additions & 3 deletions image_classification/README.md
@@ -1,7 +1,7 @@
Image Classification
=======================

This section describes how to use the AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, and Inception-ResNet-v2 models for image classification with PaddlePaddle. For a description of the image classification problem and an introduction to these models, please refer to the [PaddlePaddle book](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification).
This section describes how to use the AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-ResNet-v2, and SE-ResNeXt models for image classification with PaddlePaddle. For a description of the image classification problem and an introduction to these models, please refer to the [PaddlePaddle book](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification).

## Training a Model

@@ -22,6 +22,7 @@ import alexnet
import googlenet
import inception_v4
import inception_resnet_v2
import se_resnext


# PaddlePaddle init
@@ -100,7 +101,6 @@ The Inception-v4 model can be obtained with the code below; the model input used in this example
out = inception_v4.inception_v4(image, class_dim=CLASS_DIM)
```


6. Using the Inception-ResNet-v2 model

The provided Inception-ResNet-v2 model supports input sizes of `3 * 331 * 331` and `3 * 299 * 299`, and the dropout probability can be set as needed. It can be used with the following code:
@@ -112,6 +112,14 @@ out = inception_resnet_v2.inception_resnet_v2(

Note: because its input size differs from the other models, when using Inception-ResNet-v2 together with the provided `reader.py`, first change the size parameters of `paddle.image.simple_transform` in `reader.py` to the corresponding values.
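For illustration only, the adjustment would look roughly like this (a sketch; the argument order `(image, resize_size, crop_size, is_train)` and the values are assumptions about `reader.py`, which is not shown in this diff):

```python
# Assumed call in reader.py; enlarge the sizes for Inception-ResNet-v2,
# e.g. crop to 331 (or 299) instead of the default 224.
img = paddle.image.simple_transform(img, 360, 331, is_train)
```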

7. Using the SE-ResNeXt model

The SE-ResNeXt model can be obtained with the following code:

```python
out = se_resnext.se_resnext50(image, class_dim=CLASS_DIM)
```
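For context, a minimal, self-contained sketch of building the network; the names `DATA_DIM` and `CLASS_DIM` follow `train.py` and `infer.py` in this PR, and the provided SE-ResNeXt-50 is configured for `3 * 224 * 224` inputs:

```python
import paddle.v2 as paddle
import se_resnext

DATA_DIM = 3 * 224 * 224  # the SE-ResNeXt-50 defined here expects 224 x 224 images
CLASS_DIM = 102           # number of classes, as in train.py / infer.py

paddle.init(use_gpu=False, trainer_count=1)

# Input layer, then the SE-ResNeXt-50 classifier.
image = paddle.layer.data(
    name="image", type=paddle.data_type.dense_vector(DATA_DIM))
out = se_resnext.se_resnext50(image, class_dim=CLASS_DIM)
```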

### Defining the Loss Function

```python
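# (The rest of this block is not shown in the diff. Judging from train.py later
# in this PR, it presumably defines the classification cost, e.g.:)
cost = paddle.layer.classification_cost(input=out, label=lbl)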
```

@@ -199,7 +207,8 @@ def event_handler(event):

### Defining the Training Method

For AlexNet, VGG, ResNet, Inception-v4, and Inception-ResNet-v2, the training method can be defined with the following code:
For AlexNet, VGG, ResNet, Inception-v4, Inception-ResNet-v2, and SE-ResNeXt, the training method can be defined with the following code:

```python
# Create trainer
# ... (the remainder of this code block is not shown in the diff)
```
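The block above is truncated by the diff view; as a rough sketch of the usual paddle.v2 trainer setup (the optimizer settings here are illustrative assumptions, not necessarily the values used in this example):

```python
# Illustrative sketch only; momentum, learning rate and regularization are assumed.
parameters = paddle.parameters.create(cost)

optimizer = paddle.optimizer.Momentum(
    momentum=0.9,
    learning_rate=0.001,
    regularization=paddle.optimizer.L2Regularization(rate=0.0005 * 128))

trainer = paddle.trainer.SGD(
    cost=cost, parameters=parameters, update_equation=optimizer)
```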
5 changes: 4 additions & 1 deletion image_classification/infer.py
@@ -12,6 +12,7 @@
import googlenet
import inception_v4
import inception_resnet_v2
import se_resnext

DATA_DIM = 3 * 224 * 224 # Use 3 * 331 * 331 or 3 * 299 * 299 for Inception-ResNet-v2.
CLASS_DIM = 102
@@ -29,7 +30,7 @@ def main():
        help='The model for image classification',
        choices=[
            'alexnet', 'vgg13', 'vgg16', 'vgg19', 'resnet', 'googlenet',
            'inception-resnet-v2', 'inception_v4'
            'inception-resnet-v2', 'inception_v4', 'se-resnext'
        ])
    parser.add_argument(
        'params_path', help='The file which stores the parameters')
@@ -59,6 +60,8 @@ def main():
            image, class_dim=CLASS_DIM, dropout_rate=0.5, data_dim=DATA_DIM)
    elif args.model == 'inception_v4':
        out = inception_v4.inception_v4(image, class_dim=CLASS_DIM)
    elif args.model == 'se-resnext':
        out = se_resnext.se_resnext50(image, class_dim=CLASS_DIM)

    # load parameters
    with gzip.open(args.params_path, 'r') as f:
148 changes: 148 additions & 0 deletions image_classification/se_resnext.py
@@ -0,0 +1,148 @@
import paddle.v2 as paddle

__all__ = ['se_resnext50']


# Squeeze-and-Excitation block: a global average pool squeezes each channel to a
# single value, two fully connected layers (reduce by reduction_ratio, then
# restore with a sigmoid) produce per-channel weights, and the input feature map
# is rescaled channel-wise by those weights.
def squeeze_excitation(input,
                       num_channels,
                       pool_size,
                       reduction_ratio=16,
                       name='__SE'):
    squeeze = paddle.layer.img_pool(
        name='{0}_globalpool'.format(name),
        input=input,
        pool_size=pool_size,
        stride=1,
        num_channels=num_channels,
        pool_type=paddle.pooling.Avg())
    squeeze = paddle.layer.fc(
        name='{0}_fc0'.format(name),
        input=squeeze,
        size=num_channels / reduction_ratio,
        act=paddle.activation.Relu())
    excitation = paddle.layer.fc(
        name='{0}_fc1'.format(name),
        input=squeeze,
        size=num_channels,
        act=paddle.activation.Sigmoid())
    scale = paddle.layer.broadcast_scale(input=input, weight=excitation)
    return scale


# SE-ResNeXt-50: a 7x7/stride-2 stem convolution with batch norm and 3x3 max
# pooling, four stages of [3, 4, 6, 3] grouped bottleneck blocks (cardinality 32)
# each followed by an SE block, then global average pooling and a softmax
# classifier with class_dim outputs.
def se_resnext50(input, class_dim):
    conv0 = paddle.layer.img_conv(
        name='conv0',
        input=input,
        num_channels=3,
        num_filters=64,
        filter_size=7,
        padding=(7 - 1) / 2,
        stride=2,
        act=paddle.activation.Linear())
    conv0 = paddle.layer.batch_norm(
        name='conv0_norm', input=conv0, act=paddle.activation.Relu())
    pool0 = paddle.layer.img_pool(
        name='resnext_pool0',
        input=conv0,
        pool_size=3,
        stride=2,
        num_channels=64,
        pool_type=paddle.pooling.Max())

    # Bottleneck residual block: 1x1 reduce, 3x3 grouped convolution (cardinality
    # groups), 1x1 expand to num_filters * 2 channels, SE rescaling, and an
    # identity (or 1x1 projection) shortcut added before the final ReLU.
    def conv_block(input, group, depth, input_channels, num_filters, stride,
                   cardinality, out_size):
        conv0 = paddle.layer.img_conv(
            name='conv{0}_{1}_0'.format(group, depth),
            input=input,
            num_channels=input_channels,
            num_filters=num_filters,
            filter_size=1,
            act=paddle.activation.Linear())
        conv0 = paddle.layer.batch_norm(
            name='conv{0}_{1}_0_norm'.format(group, depth),
            input=conv0,
            act=paddle.activation.Relu())
        conv1 = paddle.layer.img_conv(
            name='conv{0}_{1}_1'.format(group, depth),
            input=conv0,
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            padding=1,
            stride=stride,
            groups=cardinality,
            act=paddle.activation.Linear())
        conv1 = paddle.layer.batch_norm(
            name='conv{0}_{1}_1_norm'.format(group, depth),
            input=conv1,
            act=paddle.activation.Relu())
        conv2 = paddle.layer.img_conv(
            name='conv{0}_{1}_2'.format(group, depth),
            input=conv1,
            num_channels=num_filters,
            num_filters=num_filters * 2,
            filter_size=1,
            act=paddle.activation.Linear())
        conv2 = paddle.layer.batch_norm(
            name='conv{0}_{1}_2_norm'.format(group, depth),
            input=conv2,
            act=paddle.activation.Linear())

        scale = squeeze_excitation(
            name='SE{0}_{1}'.format(group, depth),
            input=conv2,
            num_channels=num_filters * 2,
            pool_size=out_size)

        if input_channels == num_filters * 2:
            shortcut = input
        else:
            shortcut = paddle.layer.img_conv(
                name='shortcut_proj_{0}'.format(group),
                input=input,
                num_channels=input_channels,
                num_filters=num_filters * 2,
                filter_size=1,
                stride=stride,
                act=paddle.activation.Linear())
            shortcut = paddle.layer.batch_norm(
                name='shortcut_proj_{0}_norm'.format(group),
                input=shortcut,
                act=paddle.activation.Linear())

        return paddle.layer.addto(
            input=[scale, shortcut], act=paddle.activation.Relu())

    # Per-stage configuration; the out_size values assume a 3 x 224 x 224 input.
    depth = [3, 4, 6, 3]
    num_filters = [128, 256, 512, 1024]
    input_channels = [64, 256, 512, 1024]
    strides = [1, 2, 2, 2]
    out_size = [56, 28, 14, 7]
    conv = pool0
    for group in range(4):
        for i in range(depth[group]):
            conv = conv_block(
                input=conv,
                group=group + 1,
                depth=i,
                input_channels=input_channels[group]
                if i == 0 else num_filters[group] * 2,
                num_filters=num_filters[group],
                stride=strides[group] if i == 0 else 1,
                cardinality=32,
                out_size=out_size[group])

    pool1 = paddle.layer.img_pool(
        name='resnext_globalpool',
        input=conv,
        pool_size=7,
        stride=1,
        num_channels=2048,
        pool_type=paddle.pooling.Avg())

    out = paddle.layer.fc(
        name='resnext_fc',
        input=pool1,
        size=class_dim,
        act=paddle.activation.Softmax())
    return out
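To make the squeeze-excitation arithmetic above concrete, here is a tiny NumPy illustration of the same idea, independent of PaddlePaddle (the shapes and weights are made up for the example):

```python
import numpy as np

# One sample with 4 channels on a 2x2 feature map.
features = np.arange(16, dtype=np.float32).reshape(4, 2, 2)

# Squeeze: global average pool -> one value per channel.
squeezed = features.mean(axis=(1, 2))                # shape (4,)

# Excitation: two small FC layers (reduction ratio 2) with random weights.
w0 = np.random.randn(4, 2).astype(np.float32)
w1 = np.random.randn(2, 4).astype(np.float32)
hidden = np.maximum(squeezed @ w0, 0)                # ReLU
weights = 1.0 / (1.0 + np.exp(-(hidden @ w1)))       # sigmoid, shape (4,)

# Scale: rescale each channel of the input by its learned weight.
recalibrated = features * weights[:, None, None]
print(recalibrated.shape)  # (4, 2, 2)
```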
5 changes: 4 additions & 1 deletion image_classification/train.py
@@ -10,6 +10,7 @@
import googlenet
import inception_v4
import inception_resnet_v2
import se_resnext

DATA_DIM = 3 * 224 * 224 # Use 3 * 331 * 331 or 3 * 299 * 299 for Inception-ResNet-v2.
CLASS_DIM = 102
@@ -24,7 +25,7 @@ def main():
        help='The model for image classification',
        choices=[
            'alexnet', 'vgg13', 'vgg16', 'vgg19', 'resnet', 'googlenet',
            'inception-resnet-v2', 'inception_v4'
            'inception-resnet-v2', 'inception_v4', 'se-resnext'
        ])
    args = parser.parse_args()

@@ -64,6 +65,8 @@ def main():
            image, class_dim=CLASS_DIM, dropout_rate=0.5, data_dim=DATA_DIM)
    elif args.model == 'inception_v4':
        out = inception_v4.inception_v4(image, class_dim=CLASS_DIM)
    elif args.model == 'se-resnext':
        out = se_resnext.se_resnext50(image, class_dim=CLASS_DIM)

    cost = paddle.layer.classification_cost(input=out, label=lbl)
