Problem when users add their own new part-of-speech tags (natures) #271

Closed
liuyunfanng opened this issue Jul 15, 2016 · 10 comments

@liuyunfanng

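I found that the demo for adding user-defined natures (parts of speech) shipped with 1.2.10 does not work with NLPTokenizer.segment(text). I haven't found the cause yet. Could you take a look? Thanks!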

@hankcs
Owner

hankcs commented Jul 15, 2016

Please post the code that triggers it.

@liuyunfanng
Author

In DemoCustomNature.java:

// A new nature can be added at runtime
Nature pcNature = Nature.create("np");
System.out.println(pcNature);
// It can then be assigned to a word
LexiconUtility.setAttribute("苹果电脑", pcNature);
// Or, as a string of nature and frequency
LexiconUtility.setAttribute("苹果电脑", "np 1000");
// Either way, it takes effect in the segmentation result
List<Term> termList = HanLP.segment("苹果电脑可以运行开源阿尔法狗代码吗");

The code above works.
But if I use termList = NLPTokenizer.segment("苹果电脑可以运行开源阿尔法狗代码吗"); instead, it throws:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 148
    at com.hankcs.hanlp.algoritm.Viterbi.compute(Viterbi.java:121)
    at com.hankcs.hanlp.seg.WordBasedGenerativeModelSegment.speechTagging(WordBasedGenerativeModelSegment.java:531)
    at com.hankcs.hanlp.seg.Viterbi.ViterbiSegment.segSentence(ViterbiSegment.java:118)
    at com.hankcs.hanlp.seg.Segment.seg(Segment.java:454)
    at com.hankcs.hanlp.tokenizer.NLPTokenizer.segment(NLPTokenizer.java:37)
    at com.hankcs.demo.DemoCustomNature.main(DemoCustomNature.java:50)

I can't make sense of it. Is it caused by how the matrix is generated?
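The thread never states the root cause. One plausible reading of the trace, offered only as an assumption, is that the part-of-speech transition matrix used during Viterbi tagging is sized to the number of natures known when the model loads, so a nature created afterwards (index 148 here) falls outside it. The Python sketch below is a toy model of that failure mode; `TagRegistry`, `TransitionMatrix` and `ensure_size` are hypothetical names, not HanLP classes, and growing the matrix is just one possible remedy, not necessarily the committed fix.

```python
# Toy model (hypothetical, not HanLP code): a tag registry that can grow
# at runtime and a transition matrix sized once when the "model" loads.


class TagRegistry:
    def __init__(self, tags):
        self.ids = {tag: i for i, tag in enumerate(tags)}

    def create(self, tag):
        # A dynamically created tag gets the next free id.
        if tag not in self.ids:
            self.ids[tag] = len(self.ids)
        return self.ids[tag]


class TransitionMatrix:
    def __init__(self, size):
        self.size = size
        # Placeholder transition counts (all 1), indexed [previous][current].
        self.counts = [[1] * size for _ in range(size)]

    def ensure_size(self, size):
        # Grow the square matrix so every registered tag has a row and column.
        if size <= self.size:
            return
        for row in self.counts:
            row.extend([1] * (size - self.size))
        for _ in range(size - self.size):
            self.counts.append([1] * size)
        self.size = size

    def get(self, prev_id, cur_id):
        return self.counts[prev_id][cur_id]


registry = TagRegistry(["n", "v", "y"])
matrix = TransitionMatrix(len(registry.ids))  # sized at load time
np_id = registry.create("np")                 # tag added afterwards

try:
    matrix.get(registry.ids["n"], np_id)
except IndexError as err:
    # Analogous to the ArrayIndexOutOfBoundsException in the trace.
    print("lookup failed:", err)

matrix.ensure_size(len(registry.ids))
print(matrix.get(registry.ids["n"], np_id))  # prints 1
```

The point of the toy is only that any table indexed by tag id must be resized (or bounds-checked) whenever the set of tags can change after loading.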

@hankcs
Owner

hankcs commented Jul 15, 2016

[苹果电脑/np, 可以/v, 运行/vn, 开源/v, 阿尔法/nrf, 狗/n, 代码/n, 吗/y]
Fixed. Please use the latest code in the repository.

@hankcs hankcs closed this as completed Jul 15, 2016
@cwj

cwj commented Apr 4, 2019

from pyhanlp import *

def add_dictionary():
    CustomDictionary = JClass("com.hankcs.hanlp.dictionary.CustomDictionary")
    CustomDictionary.add("攻城狮")

def keyword_extract():
    NLPTokenizer = JClass("com.hankcs.hanlp.tokenizer.NLPTokenizer")  # NLP tagging
    print(NLPTokenizer.segment("攻城狮逆袭单身狗,迎娶白富美,走上人生巅峰"))

if __name__ == "__main__":
    add_dictionary()
    keyword_extract()

Result: [攻城/ns, 狮/Ng, 逆袭/v, 单身/n, 狗/n, ,/w, 迎娶/v, 白富美/nr, ,/w, 走上/v, 人生/n, 巅峰/nr]

The custom dictionary entry (攻城狮) has no effect in the Python version.
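For illustration only: one way a user dictionary can be made to override a statistical tokenizer is to post-process its output, greedily merging adjacent terms whose concatenation is a dictionary word. The sketch below assumes that approach; `merge_with_dictionary` is a hypothetical helper, not the fix committed to HanLP, and the `nz` tag given to merged words is an assumption.

```python
# Hypothetical post-processing step: merge adjacent (word, tag) terms whose
# concatenation is a user-dictionary word, preferring the longest match.


def merge_with_dictionary(terms, dictionary):
    """terms: list of (word, tag) pairs; dictionary: word -> tag."""
    merged = []
    i = 0
    while i < len(terms):
        # Try the longest span starting at i first, down to two terms.
        for j in range(len(terms), i + 1, -1):
            word = "".join(w for w, _ in terms[i:j])
            if word in dictionary:
                merged.append((word, dictionary[word]))
                i = j
                break
        else:
            # No dictionary word starts here; keep the model's term as-is.
            merged.append(terms[i])
            i += 1
    return merged


terms = [("攻城", "ns"), ("狮", "Ng"), ("逆袭", "v"), ("单身", "n"), ("狗", "n")]
user_dict = {"攻城狮": "nz"}  # "nz" is an assumed tag for user entries
print(merge_with_dictionary(terms, user_dict))
# [('攻城狮', 'nz'), ('逆袭', 'v'), ('单身', 'n'), ('狗', 'n')]
```

Because it only joins whole tokens, this cannot recover a dictionary word whose boundary falls inside a token the model produced; a real integration would more likely feed the dictionary into the segmenter itself.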

@hankcs
Owner

hankcs commented Apr 4, 2019

Try running this: https://github.com/hankcs/pyhanlp/blob/7f9c58731aa786005776458a8232855e19ec7cb1/tests/demos/demo_custom_dictionary.py#L21

@cwj

cwj commented Apr 4, 2019

I tried that demo and it works, but the custom entry has no effect with NLPTokenizer.segment.

@hankcs
Owner

hankcs commented Apr 4, 2019

Thanks for the report. It's fixed; please see the commit above.
If you still run into problems, feel free to reopen the issue.

@cwj

cwj commented Apr 4, 2019

Sorry to ask, but did it pass your tests? It still doesn't work for me. Do I need to change anything else?

@hankcs
Owner

hankcs commented Apr 4, 2019

You need to wait for the next release, or build the jar yourself and replace the jar bundled with pyhanlp.

@cwj

cwj commented Apr 4, 2019

OK, thanks! I noticed that this file has changed quite a lot since the last release, though. Roughly when will the next version come out?
