[ENH] New language detection #874
4b93fb2 to 3c3de95
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #874      +/-   ##
==========================================
- Coverage   77.82%   77.10%   -0.73%
==========================================
  Files          87       86       -1
  Lines       12338    12014     -324
  Branches     1624     1570      -54
==========================================
- Hits         9602     9263     -339
- Misses       2434     2459      +25
+ Partials      302      292      -10
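Codecov reports coverage as hits / (hits + misses + partials), so partially covered lines do not count as covered. The two coverage columns of the report can be checked with a short sketch; `coverage` is a hypothetical helper, and the figures are copied from the table.

```python
def coverage(hits: int, misses: int, partials: int) -> float:
    """Return coverage in percent as hits / (hits + misses + partials)."""
    lines = hits + misses + partials  # every tracked line falls into exactly one bucket
    return 100 * hits / lines


# (hits, misses, partials) as listed in the report
master = (9602, 2434, 302)
pr_874 = (9263, 2459, 292)

print(f"master: {coverage(*master):.2f}%")  # master: 77.82%
print(f"#874:   {coverage(*pr_874):.2f}%")  # #874:   77.10%
```

The bucket counts also add up to the "Lines" row (12338 and 12014).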
718562b to 610a756
0315e11 to 534b2cc
a2ce978 to e061cd7
I agree with
You are right, I cannot reproduce this anymore. Ignore the comment.
a370052 to 092558b
@@ -75,13 +76,28 @@ class StopwordsFilter(BaseTokenFilter, FileWordListMixin):
""" Remove tokens present in NLTK's language specific lists or a file. """
name = 'Stopwords'

# nltk uses different language nams for some languages
nams -> names
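The comment refers to NLTK storing its stopword lists under English language names (`english`, `slovene`, ...) rather than ISO codes. A minimal sketch of such a lookup, assuming the filter receives an ISO 639-1 code; `ISO2NLTK` and `nltk_stopword_language` are hypothetical names, and the mapping is a small sample, not the one in this PR.

```python
from typing import Optional

# Hypothetical sample mapping from ISO 639-1 codes to the names NLTK uses
# for its stopword lists; note that NLTK calls Slovenian "slovene".
ISO2NLTK = {
    "en": "english",
    "de": "german",
    "sl": "slovene",
    "es": "spanish",
}


def nltk_stopword_language(iso_code: Optional[str]) -> Optional[str]:
    """Return NLTK's stopword-list name for an ISO code, or None if unsupported."""
    if iso_code is None:
        return None
    return ISO2NLTK.get(iso_code.lower())
```

With NLTK installed, the returned name (e.g. `"slovene"` for `"sl"`) is what `nltk.corpus.stopwords.words(...)` expects; `None` signals that no NLTK list exists for the language.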
@@ -0,0 +1 @@
+language: en
A newline is missing at the end of the file. The same goes for all *.tab.metadata files.
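A sketch of writing a `.tab.metadata` file so that every `key: value` line, including the last one, ends with a newline. `write_metadata` is a hypothetical helper, and the `key: value` layout is assumed from the `language: en` entry in the file under review, not taken from Orange's own writer.

```python
import os
import tempfile


def write_metadata(path: str, attributes: dict) -> None:
    """Write one 'key: value' line per attribute, each terminated by a newline."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in attributes.items():
            # Terminating each line (instead of "\n".join(...)) guarantees
            # that the file also ends with a newline.
            f.write(f"{key}: {value}\n")


# Usage: write metadata for a hypothetical corpus file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.tab.metadata")
    write_metadata(path, {"language": "en"})
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(repr(content))  # 'language: en\n'
```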
orangecontrib/text/corpus.py
@@ -478,7 +485,7 @@ def copy(self):

     @staticmethod
     def from_documents(documents, name, attributes=None, class_vars=None, metas=None,
-                       title_indices=None):
+                       title_indices=None, language=None):
The language parameter is missing from the docstring.
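One way the docstring could document the new parameter, sketched as a standalone function in the Google-style `Args:` format. The wording is hypothetical and assumes `language` holds an ISO 639-1 code such as the `en` used in the metadata files; the existing parameters are left out of the docstring for brevity, and the body is not shown.

```python
def from_documents(documents, name, attributes=None, class_vars=None, metas=None,
                   title_indices=None, language=None):
    """
    Create a corpus from documents.

    (Existing parameters are documented as before; only the new one is shown.)

    Args:
        language (str or None): ISO 639-1 code of the documents' language,
            e.g. "en"; None when the language is unknown.
    """
    # Body omitted: this sketch only illustrates the docstring entry.
    raise NotImplementedError
```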
@@ -61,7 +61,7 @@ def __init__(self, *args, **kwargs):

         # Language
         row += 1
-        language_edit = ComboBox(self, 'language', tuple(sorted(lang2code.items())))
+        language_edit = ComboBox(self, 'language', tuple(sorted(LANG2ISO.items())))
Placing the Wikipedia widget on the canvas results in an error:
Traceback (most recent call last):
File "/Users/vesna/orange-canvas-core/orangecanvas/scheme/widgetmanager.py", line 236, in __add_widget_for_node
w = self.create_widget_for_node(node)
File "/Users/vesna/orange-widget-base/orangewidget/workflow/widgetsscheme.py", line 300, in create_widget_for_node
widget = self.create_widget_instance(node)
File "/Users/vesna/orange-widget-base/orangewidget/workflow/widgetsscheme.py", line 413, in create_widget_instance
widget.__init__()
File "/Users/vesna/orange3-text/orangecontrib/text/widgets/owwikipedia.py", line 64, in __init__
language_edit = ComboBox(self, 'language', tuple(sorted(LANG2ISO.items())))
TypeError: '<' not supported between instances of 'NoneType' and 'str'
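The error comes from comparing tuples: `sorted` compares the first elements of two `(name, code)` items, and when one name is `None` and the other a string, Python raises `TypeError`. This suggests that `LANG2ISO` now contains a `None` key (for instance, for an undefined language). Two possible fixes, sketched against a hypothetical stand-in for `LANG2ISO`: drop the `None` entry before sorting, or sort with a key that places it first.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for LANG2ISO: language name -> ISO 639-1 code,
# including a None entry like the one that triggers the error.
LANG2ISO: Dict[Optional[str], Optional[str]] = {
    "Slovenian": "sl",
    None: None,
    "English": "en",
}


def combo_items_skip_none(mapping) -> List[Tuple[str, str]]:
    """Option 1: leave out the entry with a None name, then sort by name."""
    return sorted((name, code) for name, code in mapping.items() if name is not None)


def combo_items_none_first(mapping) -> List[Tuple[Optional[str], Optional[str]]]:
    """Option 2: keep every entry; the key sorts None before all names."""
    return sorted(mapping.items(), key=lambda item: (item[0] is not None, item[0] or ""))


try:
    sorted(LANG2ISO.items())  # same call as in the widget
    reproduced = False
except TypeError:
    reproduced = True  # a None name cannot be ordered against a str name

print(combo_items_skip_none(LANG2ISO))
print(combo_items_none_first(LANG2ISO))
```

Which option fits depends on whether the widget should offer an entry for an undefined language; either result can still be wrapped in `tuple(...)` as in the widget code.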
092558b to c11060b
829d7eb to 951a90a
54b2ca6 to cb929af
Issue
Implements the new approach to language detection in the add-on.
Fixes #583
Description of changes
Includes
Comments to the reviewer