Question about custom dictionary paths #419

Closed
420672771 opened this issue Mar 6, 2017 · 7 comments

@420672771

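According to hankcs, the configuration format for custom dictionaries looks like this:
data/dictionary/custom/CustomDictionary.txt;CompanyName.txt;school.txt
In practice, though, the files are not loaded with this configuration: at runtime the program looks directly for root path + CompanyName.txt.
Changing it to:
data/dictionary/custom/CustomDictionary.txt;data/dictionary/custom/CompanyName.txt;data/dictionary/custom/school.txt
with the full path written out for every file, the files load. Is this a bug somewhere, or have I misunderstood something? Any pointers would be appreciated.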

@yesseecity

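  • data/dictionary/custom/CustomDictionary.txt;CompanyName.txt;school.txt

  • data/dictionary/custom/CustomDictionary.txt; CompanyName.txt; school.txt
    The second one has an extra space after each ;, which means the file is in the same folder as data/dictionary/custom/CustomDictionary.txt.

The first one has no space after the ;, so each file name is treated as a complete path of its own (resolved from the root), not relative to the previous entry.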

@hankcs hankcs added the question label Mar 9, 2017
@cicido

cicido commented Mar 13, 2017

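Here is the source code:
```java
// Excerpt from HanLP 1.x config loading. p: the loaded properties; root: the configured root path.
String[] pathArray = p.getProperty("CustomDictionaryPath", "data/dictionary/custom/CustomDictionary.txt").split(";");
String prePath = root;
for (int i = 0; i < pathArray.length; ++i)
{
    if (pathArray[i].startsWith(" "))
    {
        // Leading space: same directory as the last entry written without a leading space
        pathArray[i] = prePath + pathArray[i].trim();
    }
    else
    {
        // No leading space: the entry is resolved against root
        pathArray[i] = root + pathArray[i];
        // Remember this entry's directory for any following space-prefixed entries
        int lastSplash = pathArray[i].lastIndexOf('/');
        if (lastSplash != -1)
        {
            prePath = pathArray[i].substring(0, lastSplash + 1);
        }
    }
}
CustomDictionaryPath = pathArray;
```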
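To see what the two configurations from the original post resolve to, the loop above can be pulled out into a standalone method. This is a sketch: the class name `CustomDictPathDemo`, the method `resolve`, and the root `/opt/hanlp/` are hypothetical, and `root` is passed in rather than read from the properties file.

```java
import java.util.Arrays;

public class CustomDictPathDemo {
    // Standalone copy of the quoted resolution loop; root is a parameter here.
    static String[] resolve(String root, String config) {
        String[] pathArray = config.split(";");
        String prePath = root;
        for (int i = 0; i < pathArray.length; ++i) {
            if (pathArray[i].startsWith(" ")) {
                // Leading space: reuse the directory of the last non-space entry
                pathArray[i] = prePath + pathArray[i].trim();
            } else {
                // No leading space: resolve against root and remember the directory
                pathArray[i] = root + pathArray[i];
                int lastSlash = pathArray[i].lastIndexOf('/');
                if (lastSlash != -1) {
                    prePath = pathArray[i].substring(0, lastSlash + 1);
                }
            }
        }
        return pathArray;
    }

    public static void main(String[] args) {
        String root = "/opt/hanlp/"; // hypothetical root path
        String noSpaces = "data/dictionary/custom/CustomDictionary.txt;CompanyName.txt;school.txt";
        String withSpaces = "data/dictionary/custom/CustomDictionary.txt; CompanyName.txt; school.txt";
        System.out.println(Arrays.toString(resolve(root, noSpaces)));
        System.out.println(Arrays.toString(resolve(root, withSpaces)));
    }
}
```

The first line of output places CompanyName.txt and school.txt directly under /opt/hanlp/, which matches the behavior reported in the original post. The second places them under /opt/hanlp/data/dictionary/custom/.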

@cicido

cicido commented Mar 13, 2017

I don't quite understand why spaces get special handling here. Why make a single config option this complicated?

@hankcs
Owner

hankcs commented Mar 20, 2017

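@cicido It's not complicated; it's just several paths in one config entry. A leading space means the file is in the same directory as the previous one; without that, the paths would get extremely long. In fact, if you find the config file a hassle, you can do without it entirely and set your own configuration directly with HanLP.Config.key = value. That flexibility was built in from the very beginning; people may just not have realized it.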
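As a sketch of the in-code alternative hankcs mentions, the helper below builds the full path array directly, so no leading-space shorthand is involved. The names `DictPathsInCode` and `inDir` are hypothetical. The commented-out assignment assumes that HanLP 1.x exposes `CustomDictionaryPath` as a `String[]` field on `HanLP.Config`, as the `CustomDictionaryPath = pathArray;` line in the quoted source suggests. It is left as a comment so the example compiles without HanLP on the classpath.

```java
import java.util.Arrays;

public class DictPathsInCode {
    // Hypothetical helper: prefix each file name with one shared directory.
    static String[] inDir(String dir, String... files) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        String[] out = new String[files.length];
        for (int i = 0; i < files.length; i++) {
            out[i] = prefix + files[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] paths = inDir("data/dictionary/custom",
                "CustomDictionary.txt", "CompanyName.txt", "school.txt");
        // Assumption: with HanLP 1.x on the classpath, set this before the
        // custom dictionary is first loaded.
        // HanLP.Config.CustomDictionaryPath = paths;
        System.out.println(Arrays.toString(paths));
    }
}
```

The quoted loop adds `root` only while it reads the properties file. This thread does not say whether values assigned in code also get that prefix, so absolute paths are the safer choice when setting the field directly.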

@cicido

cicido commented Mar 20, 2017

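I originally assumed there wouldn't be many custom dictionaries, but on reflection there will probably be more and more. I've currently added the jieba and scws segmentation dictionaries; their part-of-speech tags are consistent, and I give each entry a default frequency of 2. I expect to add more custom dictionaries later, and this way of configuring them really does keep the entry shorter.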
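When the list of dictionaries keeps growing, the config value can be generated instead of written by hand. The hypothetical helper below works in the opposite direction from the quoted loop: given root-relative paths, it writes a full path whenever the directory changes and the ` name` shorthand otherwise. The class name `CustomDictPathEncoder` and the jieba/scws file names are placeholders, not part of HanLP.

```java
import java.util.Arrays;
import java.util.List;

public class CustomDictPathEncoder {
    // Hypothetical helper: join root-relative paths into one CustomDictionaryPath
    // value, writing " name" when a file shares the last full path's directory.
    static String encode(List<String> paths) {
        StringBuilder sb = new StringBuilder();
        String prevDir = null; // directory a leading-space entry would resolve against
        for (String path : paths) {
            int lastSlash = path.lastIndexOf('/');
            String dir = path.substring(0, lastSlash + 1); // "" when there is no slash
            String name = path.substring(lastSlash + 1);
            if (sb.length() > 0) {
                sb.append(';');
            }
            if (!dir.isEmpty() && dir.equals(prevDir)) {
                sb.append(' ').append(name); // shorthand: same directory as before
            } else {
                sb.append(path); // full path, resolved against root
                // A bare file name is not a reliable base, so reset in that case
                prevDir = dir.isEmpty() ? null : dir;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "data/dictionary/custom/CustomDictionary.txt",
                "data/dictionary/custom/CompanyName.txt",
                "data/dictionary/custom/school.txt",
                "data/dictionary/extra/jieba.txt", // placeholder file names
                "data/dictionary/extra/scws.txt");
        System.out.println(encode(paths));
    }
}
```

For the paths in `main`, this prints `data/dictionary/custom/CustomDictionary.txt; CompanyName.txt; school.txt;data/dictionary/extra/jieba.txt; scws.txt`. Entries without a directory are always written in full. In the quoted loop, such an entry updates `prePath` only when `root + entry` contains a slash, so relying on it as a base would be fragile.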

@420672771
Author

Got it.

@hankcs
Owner

hankcs commented Jan 1, 2020

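Thank you for your support of HanLP1.x. I have always felt sorry that I don't have time to reply to every issue, and I hope the problem you raised has been resolved. Alternatively, you may find the answer in 《自然语言处理入门》 (Introduction to Natural Language Processing).

Time flies, and I thank you for accompanying HanLP1.x along the way. On December 31, 2019 (Eastern Standard Time), I released the last HanLP1.x version of the past decade, code-named 最后的武士 (The Last Samurai). From now on the 1.x branch will receive stability maintenance, but it will no longer be the focus of future development.

On the occasion of the 2020 New Year, I am pleased to announce the release of HanLP2.0. The vision of HanLP2.0 is cutting-edge NLP technology for the next decade. To that end, HanLP2.0 implements state-of-the-art deep learning models in TensorFlow2.0, supports downstream NLP tasks through a carefully designed framework, and achieves state-of-the-art accuracy on large corpora. As the first alpha release, HanLP 2.0.0a0 supports word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, semantic dependency parsing, and text classification. These features are not limited to Chinese; they are designed for all human languages. HanLP2.0 ships many pretrained models that end users can deploy with just two lines of code, so putting deep learning into production is no longer difficult. For more details, you are welcome to watch the HanLP2.0 introduction video or join the forum discussion.

Looking ahead, HanLP2.0 will carry forward the efficient, pragmatic style inherited from the 1.x era while pushing into frontier research, aiming to be an amphibious warship serving both industry and academia. I look forward to your continued guidance. Thank you.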

@hankcs hankcs closed this as completed Jan 1, 2020
@hankcs hankcs added ignored and removed question labels Jan 1, 2020
4 participants