Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Optimized word count for non-Chinese articles #1865

Merged
merged 8 commits into from
May 6, 2022

Conversation

Yhcrown
Copy link

@Yhcrown Yhcrown commented Apr 20, 2022

What this PR does?

fix #1759

  • 将中文和其他字符分开统计,中文按照字数计数,其他的语言默认按照标点分割来计数。

  • BasePost类里添加了charCount字段,用于记录旧版的字符数。

  • 添加了对应的测试代码。

Yhcrown and others added 2 commits April 20, 2022 20:43
Now it can accurately identify the number of words in English articles and mixed language articles with character fieds.
@CLAassistant
Copy link

CLAassistant commented Apr 20, 2022

CLA assistant check
All committers have signed the CLA.

@Yhcrown
Copy link
Author

Yhcrown commented Apr 20, 2022

/assign @ruibaby

@ruibaby
Copy link
Member

ruibaby commented Apr 21, 2022

你好 @Yhcrown ,首先感谢你的贡献。但我认为目前增加这个特性不是太合适。原 issue 我也是加在了 2.0 的 milestone 中。

/cc @halo-dev/sig-halo

@Yhcrown
Copy link
Author

Yhcrown commented Apr 21, 2022

你好 @Yhcrown ,首先感谢你的贡献。但我认为目前增加这个特性不是太合适。原 issue 我也是加在了 2.0 的 milestone 中。

/cc @halo-dev/sig-halo

谢谢开发者的回复,其实个人感觉这个不能算一个特性。我目前一般在博客中记录一些研读论文的笔记,其中有较多引用部分。还有一些因为课程要求需要全英文的报告,会导致原先的wordcount算法记录的文字数量比实际的文字数量(word和typora统计的)多好几千(纯英文报告会上万)。个人目前还是期待能够修复一下的,如果2.0开发周期较长的话,希望开发者能再考虑一下。如果仍认为不妥,我稍后会关闭这个pr。

@ruibaby
Copy link
Member

ruibaby commented Apr 21, 2022

@Yhcrown 明白你的意思,但这已经涉及到添加字段了。目前我们的版本规定是,1.5.x 仅做 bug 修复,不做修改和新功能添加。如果这个修改是在 1.5 发布时合并的话,就不会有问题。如果我们合并了这个修改,可能就得到 1.6.x 的时候。所以我个人认为这个 PR 可以先挂着,也不用 close。

@ruibaby
Copy link
Member

ruibaby commented Apr 21, 2022

@Yhcrown 如果目前这个问题对你来说影响太大,可以暂时先自己维护一个 fork 版本,及时对上游的代码(release 后)进行同步即可。

@Yhcrown
Copy link
Author

Yhcrown commented Apr 21, 2022

@Yhcrown 明白你的意思,但这已经涉及到添加字段了。目前我们的版本规定是,1.5.x 仅做 bug 修复,不做修改和新功能添加。如果这个修改是在 1.5 发布时合并的话,就不会有问题。如果我们合并了这个修改,可能就得到 1.6.x 的时候。所以我个人认为这个 PR 可以先挂着,也不用 close。

好的 理解

1 similar comment
@Yhcrown
Copy link
Author

Yhcrown commented Apr 21, 2022

@Yhcrown 明白你的意思,但这已经涉及到添加字段了。目前我们的版本规定是,1.5.x 仅做 bug 修复,不做修改和新功能添加。如果这个修改是在 1.5 发布时合并的话,就不会有问题。如果我们合并了这个修改,可能就得到 1.6.x 的时候。所以我个人认为这个 PR 可以先挂着,也不用 close。

好的 理解

@ruibaby
Copy link
Member

ruibaby commented Apr 21, 2022

/hold

@f2c-ci-robot f2c-ci-robot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2022
@JohnNiang
Copy link
Member

/area core
/milestone 1.6.x

@f2c-ci-robot f2c-ci-robot bot added the area/core Issues or PRs related to the Halo Core label Apr 24, 2022
@f2c-ci-robot f2c-ci-robot bot added this to the 1.6.x milestone Apr 24, 2022
Copy link
Member

@JohnNiang JohnNiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Yhcrown ,感谢修复这个问题。但是不建议再添加 charCount 字段,这样也能够将此修复合并到 release-1.5 分支。如果后续确实想要将此 chartCount 存储下来,也建议存储到 PostMeta 里面即可。

@Yhcrown
Copy link
Author

Yhcrown commented Apr 29, 2022

Hi @Yhcrown ,感谢修复这个问题。但是不建议再添加 charCount 字段,这样也能够将此修复合并到 release-1.5 分支。如果后续确实想要将此 chartCount 存储下来,也建议存储到 PostMeta 里面即可。

开发者你好,新增charCount字段是因为原issue中提到了这个要求,个人其实也觉得没必要。现在已删除,但是暂时保留了之前的计算charCount的方法和单元测试。如果有需要我会一并删除。

Copy link
Member

@JohnNiang JohnNiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请尝试添加更多的测试用例,例如:

Hi,您好啊!

Wow,Nice PR。

今天天气非常不错!

Copy link
Member

@ruibaby ruibaby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@f2c-ci-robot f2c-ci-robot bot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2022
Copy link
Member

@JohnNiang JohnNiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/milestone 1.5.x
/cherrypick release-1.5
/approve

@f2c-ci-robot f2c-ci-robot bot modified the milestones: 1.6.x, 1.5.x May 6, 2022
@f2c-ci-robot f2c-ci-robot bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2022
Copy link
Member

@guqing guqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented May 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guqing, JohnNiang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@guqing
Copy link
Member

guqing commented May 6, 2022

/unhold

@f2c-ci-robot f2c-ci-robot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2022
@f2c-ci-robot f2c-ci-robot bot merged commit 508c41b into halo-dev:master May 6, 2022
@JohnNiang
Copy link
Member

/cherrypick release-1.5

@halo-dev-bot
Copy link
Collaborator

@JohnNiang: new pull request created: #2070

In response to this:

/cherrypick release-1.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/core Issues or PRs related to the Halo Core lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

文章字数统计考虑英文单词计数
6 participants