Import error caused by phraseTree #1886

Closed
oasis-0927 opened this issue Mar 22, 2024 · 2 comments

oasis-0927 commented Mar 22, 2024

Describe the bug
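Python 3.9+ no longer provides cgi.escape; html.escape is its replacement. Recent versions of nltk have already made this change, but this project depends on phraseTree, which has not, so calling pretty_print fails under Python 3.9+.

Would it be possible to replace phraseTree with nltk.tree throughout to resolve this?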
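
For illustration, the cgi.escape to html.escape change described above could be handled with an import fallback like the one below. This is only a sketch: `escape_label` is a hypothetical helper name, and this is not taken from the phraseTree, nltk, or HanLP sources. Note that html.escape escapes quotes by default, whereas cgi.escape did not, so the quote argument is passed explicitly.

```python
# Hypothetical compatibility shim; not the actual phraseTree or HanLP fix.
try:
    # html.escape is the standard-library replacement for cgi.escape.
    from html import escape
except ImportError:
    # Only reached on very old interpreters that lack html.escape.
    from cgi import escape


def escape_label(label):
    """Escape a tree node label for HTML/SVG output (hypothetical helper)."""
    # quote=True also escapes double and single quotes.
    return escape(str(label), quote=True)


if __name__ == "__main__":
    print(escape_label('<NP & "x">'))
```

Code written this way imports cleanly on interpreters where the cgi function is gone, because the first import succeeds and the fallback is never evaluated.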

Code to reproduce the issue

import hanlp
from hanlp_common.document import Document


def merge_pos_into_con(doc: Document):
	flat = isinstance(doc['pos'][0], str)
	if flat:
		doc = Document((k, [v]) for k, v in doc.items())
	for tree, tags in zip(doc['con'], doc['pos']):
		offset = 0
		for subtree in tree.subtrees(lambda t: t.height() == 2):
			tag = subtree.label()
			if tag == '_':
				subtree.set_label(tags[offset])
			offset += 1
	if flat:
		doc = doc.squeeze()
	return doc


con = hanlp.load('CTB9_CON_FULL_TAG_ELECTRA_SMALL')
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
nlp = hanlp.pipeline().append(pos, input_key='tok', output_key='pos') \
	.append(con, input_key='tok', output_key='con')
doc = nlp(tok=["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])['con']
doc.pretty_print()

Describe the current behavior
A clear and concise description of what happened.

Expected behavior
A clear and concise description of what you expected to happen.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python version: 3.10
  • HanLP version: 2.1.0b56

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
[Image attachment: 315947501-49fdf6aa-4e0c-4892-aff0-692cf2a61a4a]

    • I've completed this form and searched the web for solutions.
@oasis-0927 oasis-0927 added the bug label Mar 22, 2024
hankcs added a commit that referenced this issue Mar 23, 2024
hankcs added a commit that referenced this issue Mar 23, 2024
Owner

hankcs commented Mar 23, 2024

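Thanks for the feedback. This has been fixed; please check whether the commits above resolve the problem.
If any issues remain, feel free to reopen the issue.

As for switching to nltk.tree: phrasetree supports serialization and is more lightweight.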

@hankcs hankcs closed this as completed Mar 23, 2024
@oasis-0927
Author

Tested and confirmed fixed. Thanks.
