-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 2078 #2165
Fix 2078 #2165
Conversation
@guoshengCS Thanks for testing and document writing. |
this->storeLocalValues(); | ||
std::vector<std::string> buffers; | ||
paddle::str::split(name, '.', &buffers); | ||
auto it = this->values_.find(buffers[buffers.size() - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change buffers[buffers.size() - 1]
to buffers.back()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
private: | ||
void storeLocalValues() const { | ||
CHECK_GT(numOutputSegments_, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change CHECK_GT
to CHECK_GE
, numOutputSegments_
can be 0 in practice. For example, the label sequence is O O O O
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private: | ||
void storeLocalValues() const { | ||
CHECK_GT(numOutputSegments_, 0); | ||
CHECK_GT(numLabelSegments_, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change CHECK_GT
to CHECK_GE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
void storeLocalValues() const { | ||
CHECK_GT(numOutputSegments_, 0); | ||
CHECK_GT(numLabelSegments_, 0); | ||
double precision = (double)numCorrect_ / numOutputSegments_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to double precision = !numOutputSegments_ ? 0 : (double)numCorrect_ / numOutputSegments_;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
CHECK_GT(numOutputSegments_, 0); | ||
CHECK_GT(numLabelSegments_, 0); | ||
double precision = (double)numCorrect_ / numOutputSegments_; | ||
double recall = (double)numCorrect_ / numLabelSegments_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to double recall = !numLabelSegments_ ? 0 : (double)numCorrect_ / numLabelSegments_;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
To make it clear, let's illustrate by a NER example. | ||
Assuming that there are two named entity types including ORG and PER which are called 'chunk type' here, | ||
if 'IOB' scheme were used, the label set will be extended to a set including B-ORG, I-ORG, B-PER, I-PER and O, | ||
in which B-ORG for begining of ORG and I-ORG for end of ORG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change end of ORG
to inside of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
'plain' means the whole chunk must contain exactly the same chunk label. | ||
Realizing that the number of is chunk type is 2 and number of tag type is 2, it is easy to validate this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change the example to make chunk type and tag type of different number. In this case, both of them are 2 and may fail to help the users to clarify their misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
For each label in the label sequence, we have: | ||
To use chunk evaluator, the construction of label dict should obey the following rules: | ||
(1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should define "chunk type", "tag type" before the following table. And we'd better have a running example to show how to label the words using different schemes. In fact, the following table is the protocol for assigning tag types, not the definition of the schemes. Therefore, I think we also need another table for the definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
The total number of different labels is numTagType*numChunkTypes+1. | ||
We support 4 labelling scheme. | ||
The tag type for each of the scheme is shown as follows: | ||
(2) Map can be done correctly by the listed equations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change Map
to Mapping
Change can be done
to is done
. I think is done
is better. Because if can be done
was used, it may mislead the users to think this is only one of the feasible options. However, this is the only feasible one because we hard coded it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
IOB 0 1 - - | ||
IOE - 0 1 - | ||
IOBES 0 1 2 3 | ||
Continue the NER example, and the label dict should like this to satify above equations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change like
to look like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.Override getTypeImpl instead of getType.
2.I think holding precision
, recall
and F1-score
into an unified map could make the code cleaner and easier to maintain and the extra computation cost is trivial.
3.Revise the document following the review comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except getNames(). Please consult other members to make the final decision.
请参考 http://www.paddlepaddle.org/develop/doc_cn/howto/dev/write_docs_cn.html 看一下生成出的文档格式是否正确。 |
IOB Two labels for chunk type X, B-X for chunk begining and I-X for chunk inside. | ||
IOE Two labels for chunk type X, E-X for chunk ending and I-X for chunk inside. | ||
IOBES Four labels for chunk type X, B-X for chunk begining, I-X for chunk inside, E-X for chunk end and S-X for single word chunk. | ||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
366行可以去掉,只需要一个.. code-block::python
即可。下同。
请生成下文档看下显示是否正确。目前粗看,会有一些问题。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
For each label in the label sequence, we have: | ||
The construction of label dict should obey the following rules: | ||
(1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry. | ||
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why code-block is python? It seems a plain text?
..[SPACE]code-block:: [language]
[EMPTY_LINE]
[SPACE][SPACE][SPACE]Your texts.
.. code-block:: text
abc
def
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM.
fixes #2078