
Add OCR CTC model #596

Merged: 13 commits into PaddlePaddle:develop on Mar 7, 2018

Conversation

@wanghaoshuang (Contributor) commented on Jan 24, 2018:

Fixes #591.
A test result on random dummy data is shown below:

-----------  Configuration Arguments -----------
batch_size: 16
device: -1
l2: 0.0005
learning_rate: 0.001
max_clip: 10.0
min_clip: -10.0
momentum: 0.9
pass_num: 16
------------------------------------------------
Pass[0], batch[0]; loss: 2614.78; edit distance: 185.0.
End pass[0]; train data edit_distance: 11.5625.
End pass[0]; test data edit_distance: 5.25.
Pass[1], batch[0]; loss: 1669.92; edit distance: 752.0.
End pass[1]; train data edit_distance: 47.0.
End pass[1]; test data edit_distance: 5.0625.

@qingqing01 (Collaborator) left a comment:

1. Add a README.md like https://github.com/PaddlePaddle/models/tree/develop/fluid/image_classification
2. Need to add the test part in the next PR.

add_arg('device', int, -1, "Device id. '-1' means running on CPU "
        "while '0' means GPU-0.")
# yapf: disable
def _to_lodtensor(data, place):
@qingqing01 (Collaborator):

python/paddle/v2/fluid/executor.py can already process sequence data, so this helper can be removed.
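
A minimal sketch of what the reviewer is pointing at, assuming the fluid.DataFeeder API of that era; images, label, exe, avg_cost, and train_reader stand in for names defined elsewhere in train.py:

import paddle.fluid as fluid

# DataFeeder builds the LoDTensors (including the sequence LoD for the
# labels) itself, so hand-rolled helpers like _to_lodtensor and
# _get_feeder_data become unnecessary.
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(feed_list=[images, label], place=place)

for data in train_reader():  # each item: (pixel array, label sequence)
    loss, = exe.run(fluid.default_main_program(),
                    feed=feeder.feed(data),
                    fetch_list=[avg_cost])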


    res.set_lod([lod])
    return res

def _get_feeder_data(data, place):
@qingqing01 (Collaborator):

It seems the _ prefix should be added only for non-exposed functions.

@wanghaoshuang (Contributor, Author) replied on Jan 24, 2018:

I regard _get_feeder_data as an internal function of the training module, so I added the _ prefix according to the Google Python style guide.

    label_tensor = _to_lodtensor(map(lambda x: x[1], data), place)
    return {"pixel": pixel_tensor, "label": label_tensor}

def _ocr_conv(input, num, with_bn, param_attrs):
@qingqing01 (Collaborator):

_ocr_conv -> conv_group ?

@wanghaoshuang (Contributor, Author) replied on Jan 24, 2018:

It is an internal function, so I added the _ prefix.

    return conv4


def _ocr_ctc_net(images, num_classes, param_attrs):
@qingqing01 (Collaborator):

_ocr_ctc_net -> ctc_net ?

@wanghaoshuang (Contributor, Author) replied:

It is an internal function, so I added the _ prefix.

    label=label,
    size=num_classes + 1,
    blank=num_classes,
    norm_by_times=True)
@qingqing01 (Collaborator):

norm_by_times=True -> norm_by_times=False?

@wanghaoshuang (Contributor, Author) replied:

norm_by_times controls whether gradients are divided by the sequence length.
With the mean op, gradients are already divided by batch_size.
If we want to avoid the effect of the mean op, it is more reasonable to remove the mean_grad op rather than set norm_by_times=False.
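
A toy numeric illustration of the two scalings being discussed (numbers invented):

import numpy as np

seq_lens = np.array([10.0, 20.0, 15.0, 5.0])  # lengths of four label sequences
batch_size = len(seq_lens)

# mean(cost) scales every sequence's gradient by 1 / batch_size ...
scale_from_mean = np.full(batch_size, 1.0 / batch_size)
# ... while norm_by_times=True scales each sequence's gradient by 1 / its length.
scale_from_norm_by_times = 1.0 / seq_lens

# Minimizing avg_cost with norm_by_times=True would apply both scalings,
# which is why minimizing cost directly avoids the mean op's effect.
print(scale_from_mean * scale_from_norm_by_times)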

@wanghaoshuang (Contributor, Author) added:

Actually, in the current code the target minimized by the optimizer is cost, not avg_cost:

# define cost and optimizer
cost = fluid.layers.warpctc(
    input=fc_out,
    label=label,
    size=num_classes + 1,
    blank=num_classes,
    norm_by_times=True)
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(
    learning_rate=args.learning_rate, momentum=args.momentum)
opts = optimizer.minimize(cost)

@qingqing01 (Collaborator) commented:

Pass[0], batch[0]; loss: 2614.78; edit distance: 185.0.
End pass[0]; train data edit_distance: 11.5625.
End pass[0]; test data edit_distance: 5.25.
Pass[1], batch[0]; loss: 1669.92; edit distance: 752.0.
End pass[1]; train data edit_distance: 47.0.
End pass[1]; test data edit_distance: 5.0625.

Please tidy up the log format in a follow-up. And how about replacing edit_distance with word error?
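
A minimal sketch of the metric the reviewer suggests, assuming "word error" means the edit distance normalized by the reference length (the token count below is invented):

def error_rate(edit_distance, ref_len):
    # Normalize a raw edit distance by the reference sequence length.
    return float(edit_distance) / max(ref_len, 1)

# e.g. the Pass[0] line above, if the batch's reference labels held 1000 tokens:
print("word error: %.3f" % error_rate(185.0, 1000))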

def main():
    args = parser.parse_args()
    print_arguments(args)
    train(l2=args.l2,
@qingqing01 (Collaborator):

It would be simpler if train() just took args as its parameter.
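
A sketch of the suggested simplification (signature hypothetical, body elided):

# before: every hyperparameter threaded through by hand
# train(l2=args.l2, learning_rate=args.learning_rate, ...)

def train(args):
    # after: the parsed argparse.Namespace carries all hyperparameters
    optimizer = fluid.optimizer.Momentum(
        learning_rate=args.learning_rate, momentum=args.momentum)
    # ... rest of the training loop unchanged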

@wanghaoshuang (Contributor, Author) replied:

Thx. Done.

    norm_by_times=True)
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(
    learning_rate=learning_rate / batch_size, momentum=momentum)
@qingqing01 (Collaborator):

learning_rate / batch_size -> learning_rate

Hyperparameters should not need to account for batch_size.
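
A sketch of the fix, mirroring the suggestion above:

optimizer = fluid.optimizer.Momentum(
    learning_rate=learning_rate,  # no division by batch_size
    momentum=momentum)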

@wanghaoshuang (Contributor, Author) replied:

Thx. Done.

num_classes = data_reader.num_classes()
# define network
param_attrs = fluid.ParamAttr(
    regularizer=fluid.regularizer.L2Decay(l2 * batch_size),
@qingqing01 (Collaborator):

l2 * batch_size -> l2
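
And the matching fix for the regularizer, as a sketch:

param_attrs = fluid.ParamAttr(
    regularizer=fluid.regularizer.L2Decay(l2))  # no multiplication by batch_size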

@wanghaoshuang (Contributor, Author) replied:

Thx. Done.

def _ocr_ctc_net(images, num_classes, param_attrs):
    conv_features = _ocr_conv(images, 8, True, param_attrs)
    sliced_feature = fluid.layers.im2sequence(
        input=conv_features, stride=[1, 1], filter_size=[1, 3])
@qingqing01 (Collaborator) commented on Jan 24, 2018:

The output layout of sliced_feature is NCHW.

filter_size=[1, 3] -> filter_size=[1, sliced_feature.shape[2]] would be more general: when the height of the input image changes, this line would not need to change.
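
A sketch of the more general form. conv_features is NCHW, so index 2 is the feature-map height; whether it goes in the first or second slot of filter_size depends on im2sequence's [height, width] convention, so treat the exact placement as an assumption:

conv_features = _ocr_conv(images, 8, True, param_attrs)
sliced_feature = fluid.layers.im2sequence(
    input=conv_features,
    stride=[1, 1],
    # cover the full feature-map height, one column per timestep
    filter_size=[conv_features.shape[2], 1])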

@wanghaoshuang (Contributor, Author) replied:

Thx. Fixed.

1. Rename 'ocr_ctc' directory to 'ocr'.
2. Init README.md
3. Fix learning rate and l2
4. Refine training log format
5. Reduce arguments of train function
6. Set filter_size of im2sequence dynamically
7. Add fc op before GRU op
@qingqing01 (Collaborator) left a comment:

1. fluid/ocr -> fluid/ocr_recognition
2. To verify the forward network, please add an inference.py (see the sketch below).
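
A minimal inference.py sketch under stated assumptions: the network builder lives in crnn_ctc_model.py (as suggested later in this review), ctc_infer is a hypothetical builder returning the pre-CTC logits, the input shape is invented, and parameters were saved with fluid.io.save_params:

import paddle.fluid as fluid
from crnn_ctc_model import ctc_infer  # hypothetical model builder

def infer(model_dir, num_classes):
    # input shape is an assumption; match whatever ctc_reader produces
    images = fluid.layers.data(name='pixel', shape=[1, 48, 512], dtype='float32')
    fc_out = ctc_infer(images, num_classes)
    decoded = fluid.layers.ctc_greedy_decoder(input=fc_out, blank=num_classes)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    fluid.io.load_params(exe, dirname=model_dir)
    return exe, decoded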

#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
@qingqing01 (Collaborator):

In other models, there is no copyright, so remove it?

    conv4 = _conv_block(conv3, 128, (num / 4), with_bn)
    return conv4

def _ocr_ctc_net(images, num_classes, param_attrs, rnn_hidden_size=200):
@qingqing01 (Collaborator) commented on Feb 2, 2018:

1. Regarding naming, it could stay consistent with the other model configs; I don't see the _ prefix used in the other configs.

2. When imported by other files, a config like ocr_conv can still be used, right?

3. Names like _ocr_conv and _ocr_ctc_net are both poor:

   • this conv group is not specific to OCR
   • there is no CTC inside ocr_ctc_net

    size=num_classes + 1,
    blank=num_classes,
    norm_by_times=True)
avg_cost = fluid.layers.mean(x=cost)
@qingqing01 (Collaborator):

The model definition has already been isolated above, yet def train() here still contains part of the network; the separation is not clean!

The network could be defined in a separate file, e.g. crnn_ctc_model.py.

That way a follow-up attention model can be added as another file and reuse train.py (see the sketch below).
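
A sketch of the split being proposed (all names hypothetical):

# --- crnn_ctc_model.py: owns the whole graph ---
def ctc_train_net(images, label, num_classes):
    fc_out = encoder(images)  # conv group + im2sequence + GRU + fc
    cost = fluid.layers.warpctc(
        input=fc_out, label=label,
        size=num_classes + 1, blank=num_classes, norm_by_times=True)
    return fluid.layers.mean(x=cost)

# --- train.py: only wires data, the optimizer, and the loop ---
# from crnn_ctc_model import ctc_train_net
# avg_cost = ctc_train_net(images, label, num_classes)
# A later attention model would add attention_model.py and reuse train.py.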

1. Move all network definitions to 'crnn_ctc_model.py'
2. Add initializer for some layers
3. Rename 'fluid/ocr' to 'fluid/ocr_recognition'
4. Remove copyright
5. Rename some functions
2. Add inference script
3. Add load model script
4. Add some functions to ctc_reader
@wanghaoshuang merged commit 75d242f into PaddlePaddle:develop on Mar 7, 2018.
Successfully merging this pull request may close these issues: Add model for OCR CTC (#591).

2 participants: wanghaoshuang, qingqing01