-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Widget for Self-Organizing Maps #3928
Conversation
I was waiting for this widget for 2 years! 😍 |
Codecov Report
@@ Coverage Diff @@
## master #3928 +/- ##
=========================================
Coverage ? 84.24%
=========================================
Files ? 372
Lines ? 65301
Branches ? 0
=========================================
Hits ? 55012
Misses ? 10289
Partials ? 0 |
Codecov Report
@@ Coverage Diff @@
## master #3928 +/- ##
==========================================
+ Coverage 85.23% 85.26% +0.02%
==========================================
Files 382 385 +3
Lines 67670 68759 +1089
==========================================
+ Hits 57680 58624 +944
- Misses 9990 10135 +145 |
8329382
to
c2aad32
Compare
Ready for review. Reports still fail, but this is due to a problem in Help (in form of a local fix or general solution) appreciated. |
Correction: Reports don't fail, they produce a 10x10 png. Pylint fails because the widget has to many attributes (24/20). This is a common problem that we'll have to discuss. Also, I fear tests for this widget may sometimes fail because of threading problems. I suspect that Stop / Restart button also doesn't work properly. @ales-erjavec could you check what I did wrong this time? |
As always, I love to share my comments with you. 😆
|
Orange/widgets/unsupervised/owsom.py
Outdated
def update(_progress, som): | ||
from AnyQt.QtWidgets import qApp | ||
progressbar.advance() | ||
qApp.processEvents() # This is apparently needed to advance the bar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is apparently needed to advance the bar
And to invoke a MaximumRecursion error if NITERATIONS is ever increased to ~1000.
Instead avoid doing superfluous work in _assign_instances,_redraw.
Never call processEvents() from queued signal connections (and these are queued). It can/will recurse when the signals are emitted faster then they are processed on the receiver side. As a consequence the order of calls to self._assign_instances(som)
can be inverted (as they are called after the recursion point).
In fact it seems like som.winners
call in _assign_instances
uses the som.weights
while they are still being mutated by the continuing optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think I made the same mistakes in the network analysis widget. I removed processEvents
, and som.winners
now gets a oopy of the data.
I can't remove much from _assign_instances
and _redraw
, but it seems to work alright now.
|
@ajdapretnar, after playing with Adult data I agree about the Apply button. But I'm not sure what it should do. Should changing the grid size (with auto apply off), clear the visualization until apply is pressed? Or should the visualization stay as it is, although the values in controls wouldn't match the visualization? Both will be annoying to implement, so I also welcome any other suggestions. I also think it would be cleaner if auto apply only blocked grid shape and size change, while selection would always propagate. Also because these are two different types of "applies": one restarts the optimization and the other is about output. |
Yes, Adult is exactly the data set I had in mind. When things get big, Auto apply starts to make sense. And I agree, what should changing the parameters do? Looking at other visualizations widgets, they block the selection. But that is because other parameters only change the size and color of the points, not the actual visualization. In t-SNE there is a special section for visualization parameters and once the user presses Start, the changes are applied. Perhaps instead of the Auto apply, there should be a simple button called Update visualization? Or something along this line... Nothing is ideal, I agree. |
p.s. I like the beehive icon (big fan of bees 🐝 here). But I already asked Blaz to ask Lara to design them... 🙊 |
@ajdapretnar, like this? Not finished, cleaned up, and may crash. I just want your opinion. You'll notice that it runs optimization immediately after receiving new data, which should usually be OK. But you can stop it, change parameters, rerun. It can easily handle Adult on 15x15 (works on 30x30, too, but it's slow). Initializations are now in a combo instead of radios, like in some other widget(s). It looks nicer in this rearrangement, too. |
This is a great reincarnation of SOM widget. I have some comments, questions, and a request:
|
The intensity of interior color shows the proportion of majority class. It was the same as the border color if the cell was pure. I made the border thinner (it was already thinner on my screen, compared to your screenshot) and the interior color is now always a bit lighter than the border.
It worked until I optimized some code. :) Also the output data was wrong (normalized) due to this bug.
Ask @ajdapretnar how to (re)move the legend. :) I can add a checkbox... but no other widget can hide the legend, because we decided against it at some point and remove these checkboxes.
OK.
I thought SOM doesn't need this, but it makes sense even just for consistency. I added it.
It annoys me, too, but I'm not able to implement it (not for lack of trying). When the optimization is running, changing the control should terminate and restart it, but it doesn't, and sometimes it even crashes. In short: no, I can't do this.
Maybe write a separate issue, just so that we don't forget to discuss it? |
Fails on pylint; one problem is not mine, the other will stay and we'll discuss it later. So it's ready for rereview. |
Documentation provided in #3956. |
Two problems remain, which I think should be addressed within this PR. 1.) Use brown-selected. SOM silently ignores instances with missing values. It should not. I prefer we use imputation as in other widgets and let the user know it happened. 2.) Use heart_disease. It is unclear why some instances are lighter than others. I suggest tooltips. |
Information about skipped instances was actually shown in the tooltip at the "input status" icon. But you're right, this was obscure. Visualization widgets (from projections to mosaic and sieve) do not impute. I added a proper warning instead.
Done, except that numeric variables are binned and colors correspond to bins (as the legend shows). Instead of showing just the mean, the widget shows the whole distribution (by bins). This works better for pie charts, as well as for single-color circles, where the color corresponds to the majority bin (and not to the bin that contains the average value). |
|
||
|
||
def configuration(parent_package='', top_path=None): | ||
from numpy.distutils.misc_util import Configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this is imported inside the function instead of at the top? I see that setup.py
s in other dirs do the same, but don't know if there is a reason behind it or are we just copy pasting and propagating this.
I ask because moving it up and changing import numpy
to from numpy import get_include
(and using that below) might be nice enough and make lint pass as well and avoid the ugly red cross on travis :)
(it is complaining that numpy is not used, which is a bit strange anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suspicion is correct, I copied this from other setups. :) If we're sure this import doesn't need to be local, we should fix all setups. Which implies it doesn't belong to this PR. :)
Tooltips were not wrong but appeared at wrong positions: you could also get a tooltip to the left of the (ellipse) object and below it. I wasn't able to discover the reason but implemented the whole thing differently (and, perhaps, better).
Fixed. |
I fixed lint, but I won't attempt to improve the coverage - too much user interaction code. |
Implements #2800.
To do:
X[rowi]
in Cythondtype=object
)Includes