Problem with users adding their own new part-of-speech tags #271

liuyunfanng · 2016-07-15T08:07:20Z

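I've found that the demo shipped with 1.2.10 for adding user-defined part-of-speech tags does not work with NLPTokenizer.segment(text). I haven't tracked down the cause yet and would appreciate it if you could take a look. Thanks!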

hankcs · 2016-07-15T08:14:46Z

Please provide the code that triggers it.

liuyunfanng · 2016-07-15T09:20:42Z

In DemoCustomNature.java:
// We can add one dynamically
pcNature = Nature.create("np");
System.out.println(pcNature);
// It can be assigned to a word
LexiconUtility.setAttribute("苹果电脑", pcNature);
// Or
LexiconUtility.setAttribute("苹果电脑", "np 1000");
// They take effect in the segmentation result
List<Term> termList = HanLP.segment("苹果电脑可以运行开源阿尔法狗代码吗");
The code above works.
But with termList = NLPTokenizer.segment("苹果电脑可以运行开源阿尔法狗代码吗"); it throws an error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 148
at com.hankcs.hanlp.algoritm.Viterbi.compute(Viterbi.java:121)
at com.hankcs.hanlp.seg.WordBasedGenerativeModelSegment.speechTagging(WordBasedGenerativeModelSegment.java:531)
at com.hankcs.hanlp.seg.Viterbi.ViterbiSegment.segSentence(ViterbiSegment.java:118)
at com.hankcs.hanlp.seg.Segment.seg(Segment.java:454)
at com.hankcs.hanlp.tokenizer.NLPTokenizer.segment(NLPTokenizer.java:37)
at com.hankcs.demo.DemoCustomNature.main(DemoCustomNature.java:50)
How should I read this? Is the generated matrix the cause?

hankcs · 2016-07-15T13:08:39Z

[苹果电脑/np, 可以/v, 运行/vn, 开源/v, 阿尔法/nrf, 狗/n, 代码/n, 吗/y]
Fixed; use the latest code from the repository.
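On the question of whether a matrix is involved in the ArrayIndexOutOfBoundsException above: the following is a minimal sketch of one way such an error can arise, assuming the tagger's transition matrix is sized for the tags known at load time and a tag created later receives an index one past that size. The names TagSet, build_matrix and extend_matrix are hypothetical and are not HanLP's classes or its actual fix.

```python
# Illustrative only: a toy tag set and transition matrix, not HanLP's real data structures.

class TagSet:
    """Maps tag names to integer ids, like enum ordinals that can grow at runtime."""

    def __init__(self, names):
        self.ids = {name: i for i, name in enumerate(names)}

    def create(self, name):
        # A new tag receives the next free id, which is one past the old maximum.
        return self.ids.setdefault(name, len(self.ids))


def build_matrix(size):
    # Square matrix of transition costs, sized for the tags known at load time.
    return [[1.0] * size for _ in range(size)]


def extend_matrix(matrix, new_size, default=1.0):
    # Pad existing rows and append new rows so every tag id is a valid index.
    for row in matrix:
        row.extend([default] * (new_size - len(row)))
    while len(matrix) < new_size:
        matrix.append([default] * new_size)
    return matrix


tags = TagSet(["n", "v", "y"])
matrix = build_matrix(len(tags.ids))    # 3 x 3

np_id = tags.create("np")               # id 3, outside the 3 x 3 matrix

try:
    matrix[tags.ids["n"]][np_id]
    lookup_failed = False
except IndexError:
    lookup_failed = True                # analogue of ArrayIndexOutOfBoundsException

extend_matrix(matrix, len(tags.ids))    # now 4 x 4
cost_after_fix = matrix[tags.ids["n"]][np_id]
```

In this toy version the lookup fails until the matrix is grown to cover the new tag id, which is why a segmenter that skips part-of-speech tagging over the matrix could work while one that runs it could fail.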

cwj · 2019-04-04T01:43:03Z

from pyhanlp import *

def add_dictionary():
    CustomDictionary = JClass("com.hankcs.hanlp.dictionary.CustomDictionary")
    CustomDictionary.add("攻城狮")

def keyword_extract():
    NLPTokenizer = JClass("com.hankcs.hanlp.tokenizer.NLPTokenizer")  # NLP tagging
    print(NLPTokenizer.segment("攻城狮逆袭单身狗，迎娶白富美，走上人生巅峰"))

if __name__ == "__main__":
    add_dictionary()
    keyword_extract()

Result: [攻城/ns, 狮/Ng, 逆袭/v, 单身/n, 狗/n, ，/w, 迎娶/v, 白富美/nr, ，/w, 走上/v, 人生/n, 巅峰/nr]

It doesn't take effect in the Python version.

hankcs · 2019-04-04T01:49:25Z

Try running this: https://github.com/hankcs/pyhanlp/blob/7f9c58731aa786005776458a8232855e19ec7cb1/tests/demos/demo_custom_dictionary.py#L21

cwj · 2019-04-04T01:54:30Z

I've tried that one and it works; the entry just has no effect when NLPTokenizer.segment is used.

hankcs · 2019-04-04T02:35:33Z

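Thanks for the feedback. This is fixed; please see the commit referenced in this issue.
If the problem persists, feel free to reopen the issue.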

cwj · 2019-04-04T03:07:18Z

Sorry to bother you, but did this pass your tests? It still doesn't work on my side. Is there anything else I need to change?

hankcs · 2019-04-04T03:08:13Z

You need to wait for the next release, or build the jar yourself and replace the jar in pyhanlp.
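For the second option, a rough sketch of swapping in a self-built jar might look like the following. It assumes the installed pyhanlp package keeps its jar in a single directory under a name matching hanlp-*.jar; that layout, the helper name replace_jar, and both path arguments are assumptions for illustration, so check your own installation before relying on it.

```python
import shutil
from pathlib import Path


def replace_jar(static_dir, new_jar):
    """Back up each hanlp-*.jar in static_dir and overwrite it with new_jar.

    Assumption: the jar used by pyhanlp lives in static_dir as hanlp-*.jar.
    Returns the list of jar paths that were replaced.
    """
    static_dir, new_jar = Path(static_dir), Path(new_jar)
    replaced = []
    for old_jar in sorted(static_dir.glob("hanlp-*.jar")):
        # Keep a copy of the released jar so the change can be undone.
        shutil.copy2(old_jar, old_jar.with_name(old_jar.name + ".bak"))
        # Reuse the old file name so whatever locates the jar still finds it.
        shutil.copy2(new_jar, old_jar)
        replaced.append(old_jar)
    return replaced
```

Writing the new jar under the old file name is a deliberate choice here: it avoids guessing how the package picks a jar by name or version, and the .bak copy makes it easy to roll back once an official release is available.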

cwj · 2019-04-04T03:10:43Z

OK, thanks! I noticed, though, that this file has changed quite a lot since the last release. Roughly when do you expect to publish the next version?

hankcs added the question label Jul 15, 2016

hankcs closed this as completed Jul 15, 2016

hankcs added a commit that referenced this issue Apr 4, 2019

Fix the lexical analyzer's handling of dynamically inserted entries, fix #271 (comment)

d82cc08

hankcs mentioned this issue Apr 5, 2019

In version 1.7.2, CustomDictionary.insert has no effect on NLPTokenizer? #1143

Closed

1 task

huminghe pushed a commit to huminghe/HanLP that referenced this issue Apr 23, 2019

Fix the lexical analyzer's handling of dynamically inserted entries, fix hankcs#271 (comment)

68652c3

hankcs added a commit that referenced this issue Jan 10, 2020

Fix the lexical analyzer's handling of dynamically inserted entries, fix #271 (comment)

f3859ec
