-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Concatenate: Fix wrong merging of categorical features #4425
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4425 +/- ##
==========================================
- Coverage 87.46% 83.15% -4.32%
==========================================
Files 405 268 -137
Lines 74135 53962 -20173
==========================================
- Hits 64841 44870 -19971
+ Misses 9294 9092 -202 |
acde018
to
1424b4f
Compare
Orange/data/variable.py
Outdated
compute_value=compute_value, name=name, | ||
number_of_decimals=kwargs.pop("number_of_decimals", | ||
self.number_of_decimals), | ||
**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enables creating combinations of self._number_of_decimals
, self.adjust_decimals
and self._format_str
that are forbidden by number_of_decimals setter.
Speaking of self.adjust_decimals
. Why do we need that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. This should go through setter. I changed it to:
def copy(self, compute_value=None, *, name=None, **kwargs):
number_of_decimals = kwargs.pop("number_of_decimals", None)
var = super().copy(compute_value=compute_value, name=name, **kwargs)
if number_of_decimals is not None:
var.number_of_decimals = number_of_decimals
else:
var._number_of_decimals = self._number_of_decimals
var.adjust_decimals = self.adjust_decimals
var.format_str = self._format_str
return var
adjust_decimals
is a flag that tells whether the number of decimals was fixed (e.g. through setter) or is it being adjusted to the largest number of decimals (while reading the file). See function val_from_str_add_cont
.
Orange/data/variable.py
Outdated
super().__init__(name, compute_value, sparse=sparse) | ||
self.values = values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather make self.values
a property and check self._value_index - self._value
consistency in its setter. It's too easy to obtain inconsistent state with this api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But this commit is from #4422, on which this PR was based. #4422 was already merged with this code.
I agree that values
should be a property and it should also be a tuple, not a list. But if I remember correctly, I already tried to change this once but ran into problems. At any rate, it's not in this PR, but if you'd like to try it, I support it.
Orange/data/variable.py
Outdated
|
||
def add_value(self, s): | ||
""" Add a value `s` to the list of values. | ||
""" | ||
if not isinstance(s, str): | ||
raise TypeError("values of DiscreteVariables must be strings") | ||
self._value_index[s] = len(self.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a public method, I'd add a check if s not in self._value_index
, before inserting a new value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. I agree, but this is from #4422, which is already merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now implemented this in #4450.
Plus fixing some lint issues for a good measure.
Orange/widgets/data/owconcatenate.py
Outdated
(table.domain for table in tables)) | ||
domains = [table.domain for table in tables] | ||
oper = set.union if self.merge_type == OWConcatenate.MergeUnion \ | ||
else set.intersection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why passing oper
parameter? It can be set inside self.merge_domains
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inertia. :)
1424b4f
to
5c56b65
Compare
5c56b65
to
c9acdf2
Compare
c9acdf2
to
b310fb7
Compare
@VesnaT, I added a test that tries different orders of adding signals: 98b0e15#diff-e8327cf214d283cae48e36d934820db9R351. Can you pull the PR and run the test? If it passes: can you modify it to replicate what you're doing in canvas, so that it will fail? |
Issue
Fixes #4406. Also fixes part of #4382.
The problem was that when merging domains, attributes from the first domain are used and other attributes values' that don't appear in the first table are turned into nan's.
This fix applies only when "primary table" is not present. When it is present, #4422 suffices.
Description of changes
When filtering for unique attributes, two (or more) attributes with different values are replaced by a new attribute.
Includes