
[ENH] Widget for Self-Organizing Maps #3928

Merged
merged 31 commits into from
Aug 23, 2019


janezd
Contributor

@janezd janezd commented Jul 8, 2019

Implements #2800.

To do:

  • Fix selection in hexagonal grid
  • Support sparse data
  • Add Progress bar
  • Run optimization in separate thread
  • Interrupt optimization on arrival of new data or when shape/dimension changes
  • Add initialization of weights with PCA
  • In random initialization, let the user use a fixed or random seed
  • See how to optimize X[rowi] in Cython
  • Add nogil in Cython
  • Fix crash when color is a meta (with dtype=object)
  • Annotated output
  • Coloring by discretized numeric features
  • Legend
  • Move single selection by keypress
  • Icon
Includes
  • Code changes
  • Tests

@janezd janezd changed the title [ENH] Widget for Self-Organizing Maps [WIP] [ENH] Widget for Self-Organizing Maps Jul 8, 2019
@ajdapretnar
Contributor

I was waiting for this widget for 2 years! 😍

@codecov

codecov bot commented Jul 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@be34e1d). Click here to learn what that means.
The diff coverage is 1.61%.

@@            Coverage Diff            @@
##             master    #3928   +/-   ##
=========================================
  Coverage          ?   84.24%           
=========================================
  Files             ?      372           
  Lines             ?    65301           
  Branches          ?        0           
=========================================
  Hits              ?    55012           
  Misses            ?    10289           
  Partials          ?        0

@codecov

codecov bot commented Jul 9, 2019

Codecov Report

Merging #3928 into master will increase coverage by 0.02%.
The diff coverage is 86.86%.

@@            Coverage Diff             @@
##           master    #3928      +/-   ##
==========================================
+ Coverage   85.23%   85.26%   +0.02%     
==========================================
  Files         382      385       +3     
  Lines       67670    68759    +1089     
==========================================
+ Hits        57680    58624     +944     
- Misses       9990    10135     +145

@janezd janezd force-pushed the som branch 3 times, most recently from 8329382 to c2aad32 Compare July 13, 2019 11:46
@janezd
Contributor Author

janezd commented Jul 13, 2019

Ready for review.

Reports still fail, but this is due to a problem in report_plot, which uses the scene's size as the PNG size. This widget's scene coordinates are small, like 10x10, with a legend that does not resize.

Help (in the form of a local fix or a general solution) is appreciated.

@janezd janezd changed the title [WIP] [ENH] Widget for Self-Organizing Maps [ENH] Widget for Self-Organizing Maps Jul 13, 2019
@janezd
Contributor Author

janezd commented Jul 13, 2019

Correction: Reports don't fail, they produce a 10x10 png.

Pylint fails because the widget has too many attributes (24/20). This is a common problem that we'll have to discuss.

Also, I fear tests for this widget may sometimes fail because of threading problems. I suspect that onDeleteWidget finishes the optimization thread, but some signals are still being processed and may trigger redrawing. Is there a way to block/remove pending signals?

The Stop / Restart button also doesn't work properly. @ales-erjavec, could you check what I did wrong this time?

@ajdapretnar
Contributor

As always, I love to share my comments with you. 😆

  • Selection is strange. It is currently a square, which I could survive, but it goes behind the grid and I can't see what I am selecting. (select some data, then try to select a part of the current selection) If one could select just the hexagons, that would be even nicer! What I have in mind is something like Data Table, where selection would just be colored blue.

  • The legend is an issue. Use SOM on iris, then change the data to heart_disease. The legend is placed under the visualization for some reason.

  • Is the widget intentionally missing an Apply button? I think it would be nice to have it, no?

  • The widget should color by class by default.

  • Nothing major, but should we not think a bit about the first box? FreeViz has initialization, then the button; MDS has the button, then the steps, then initialization (+ jittering) buttons; tSNE has parameters, then the start button; SOM has the start button, then the initialization... Probably a more unified interface would be beneficial here.

  • Perhaps @larazupan could think of a better icon? This one looks more like it should be in the Geo add-on... 😬

def update(_progress, som):
    from AnyQt.QtWidgets import qApp
    progressbar.advance()
    qApp.processEvents()  # This is apparently needed to advance the bar
Contributor


This is apparently needed to advance the bar

And to invoke a MaximumRecursion error if NITERATIONS is ever increased to ~1000.
Instead, avoid doing superfluous work in _assign_instances and _redraw.

Never call processEvents() from queued signal connections (and these are queued). It can/will recurse when the signals are emitted faster than they are processed on the receiver side. As a consequence, the order of calls to self._assign_instances(som) can be inverted (as they are called after the recursion point).

In fact it seems like som.winners call in _assign_instances uses the som.weights while they are still being mutated by the continuing optimization.
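
The re-entrancy problem described above can be reproduced without Qt. What follows is a minimal sketch, assuming a toy event loop: `TinyLoop`, `post`, `process_events` and `run` are hypothetical names for illustration, not Qt or Orange API. When a queued handler pumps the loop itself, every pending update is handled one stack frame deeper, and the work placed after the recursion point runs in reverse order.

```python
from collections import deque


class TinyLoop:
    """Hypothetical stand-in for an event loop with queued connections."""

    def __init__(self):
        self.queue = deque()
        self.depth = 0
        self.max_depth = 0

    def post(self, handler, payload):
        # A queued connection only stores the call; it runs later.
        self.queue.append((handler, payload))

    def process_events(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        while self.queue:
            handler, payload = self.queue.popleft()
            handler(payload)
        self.depth -= 1


def run(reentrant, n_updates=5):
    """Post n_updates progress signals and return (handling order, max depth)."""
    loop = TinyLoop()
    handled = []

    def on_update(step):
        if reentrant:
            loop.process_events()  # the recursion point
        handled.append(step)  # work done after the recursion point

    for step in range(n_updates):
        loop.post(on_update, step)
    loop.process_events()
    return handled, loop.max_depth


if __name__ == "__main__":
    print(run(reentrant=False))
    print(run(reentrant=True))
```

In this toy model the nesting depth grows with the number of pending updates, which is consistent with the recursion error mentioned above once the number of iterations gets large.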

Contributor Author


Thanks. I think I made the same mistakes in the network analysis widget. I removed processEvents, and som.winners now gets a copy of the data.

I can't remove much from _assign_instances and _redraw, but it seems to work alright now.
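
For readers following along, the copy-before-use idea can be sketched in plain Python. This is only an illustration under assumptions: the `winners` function below is a hypothetical stand-in written with lists, not the widget's Cython implementation, and the data is made up.

```python
import copy


def winners(data, weights):
    """Return, for each row, the index of the closest weight vector
    (squared Euclidean distance). Hypothetical helper for illustration."""
    result = []
    for row in data:
        dists = [sum((x - w) ** 2 for x, w in zip(row, unit))
                 for unit in weights]
        result.append(dists.index(min(dists)))
    return result


weights = [[0.0, 0.0], [1.0, 1.0]]
data = [[0.1, 0.2], [0.9, 0.8]]

# Take a snapshot at the moment the progress signal is emitted ...
snapshot = copy.deepcopy(weights)

# ... so that later in-place updates by the optimizer do not leak in.
weights[0][:] = [5.0, 5.0]

stable = winners(data, snapshot)   # based on the state that was signalled
drifted = winners(data, weights)   # based on whatever the optimizer did since
```

The snapshot costs one copy per update, but the receiver then always sees a consistent state regardless of how far the optimization has advanced.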

@janezd
Contributor Author

janezd commented Jul 16, 2019

  • Square selection: I was lazy. But now I found a better way, it looks nice and its implementation is simpler. Try it, I think you'll like it.
  • Legend under the visualization: I already fixed it, but then unfixed it. Now it's refixed.
  • Apply button: it had it. I didn't like it. I somehow feel the selection has to be auto-applied. Only other settings that require "application" are grid shape and dimensions. I can add the button, if you think I should.
  • Default color: I don't know why class var was not chosen. I changed something and now it seems to work.
  • Order of buttons: I think this makes sense. Initialization decides how SOM is run. FreeViz and PCA are different because the user can manually manipulate the image (at least for FreeViz), so the optimization starts from some position. In the case of SOM, the user would always have to press two buttons if we separated them. I don't have a hard stance here and can change it, but we can have discussions about unifying all GUIs later.
  • Icon: I've drawn a different icon first, but I thought it's boring. I'm attaching it. What do you think?

SOM copy.svg.zip

@janezd
Contributor Author

janezd commented Jul 16, 2019

@ajdapretnar, after playing with Adult data I agree about the Apply button.

But I'm not sure what it should do. Should changing the grid size (with auto apply off) clear the visualization until Apply is pressed? Or should the visualization stay as it is, although the values in controls wouldn't match the visualization? Both will be annoying to implement, so I also welcome any other suggestions.

I also think it would be cleaner if auto apply only blocked grid shape and size change, while selection would always propagate. Also because these are two different types of "applies": one restarts the optimization and the other is about output.

@ajdapretnar
Contributor

Yes, Adult is exactly the data set I had in mind. When things get big, Auto apply starts to make sense. And I agree, what should changing the parameters do? Looking at other visualization widgets, they block the selection. But that is because other parameters only change the size and color of the points, not the actual visualization. In t-SNE there is a special section for visualization parameters and once the user presses Start, the changes are applied. Perhaps instead of Auto apply, there should be a simple button called Update visualization? Or something along these lines... Nothing is ideal, I agree.

@ajdapretnar
Contributor

p.s. I like the beehive icon (big fan of bees 🐝 here). But I already asked Blaz to ask Lara to design them... 🙊

@janezd
Contributor Author

janezd commented Jul 17, 2019

@ajdapretnar, like this? Not finished, cleaned up, and may crash. I just want your opinion.

You'll notice that it runs optimization immediately after receiving new data, which should usually be OK. But you can stop it, change parameters, rerun. It can easily handle Adult on 15x15 (works on 30x30, too, but it's slow).

Initializations are now in a combo instead of radios, like in some other widget(s). It looks nicer in this rearrangement, too.

@BlazZupan
Contributor

BlazZupan commented Jul 19, 2019

This is a great reincarnation of SOM widget. I have some comments, questions, and a request:

  • When I use the widget (say, on Iris data), some circles are hollow (there is a thick border with a color of higher intensity), while the majority of the other circles are filled. Why the distinction?

image

  • On Iris data set, the legend does not look right when displaying real-valued features

image

  • Selection works great but is a bit different from, say, Scatterplot widget. There, use of a shift modifier would introduce classes with selected items. This feature is actually great because it supports the definition of different groups of items and subsequent analysis of differences. Would it be possible to implement this functionality in SOM as well?

  • Can we have an option to remove the legend?

  • The widget's current name is "Self-organizing Maps". I have googled this name, and it looks like people capitalize "organizing" as well, to have "Self-Organizing Maps".

  • I will ask Lara to render the icon for the widget.

  • Could the widget implement an automatic start of computation? For smaller data sets, I, for instance, change the dimensions, but then have to press Start after every change, whereas the algorithm is fast enough to just run the computation automatically.

  • Throughout Orange, we need to decide what to do with categorical features. Some widgets automatically continuize such features. An example is PCA. If I take zoo data set (all features are categorical), the PCA would work, but SOM would complain that there are no real-valued features. I would vote for automatic and default continuization in cases where input data includes any categorical features.

@janezd
Contributor Author

janezd commented Jul 26, 2019

some circles are hollow

The intensity of interior color shows the proportion of majority class. It was the same as the border color if the cell was pure. I made the border thinner (it was already thinner on my screen, compared to your screenshot) and the interior color is now always a bit lighter than the border.
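
To make the coloring rule concrete, here is a rough sketch of one way such a scheme could be computed. It assumes RGB tuples and a linear blend toward white; `cell_colors`, `palette` and `max_strength` are hypothetical names, and the widget's actual formula may well differ.

```python
from collections import Counter


def cell_colors(labels, palette, max_strength=0.8):
    """Return (majority, proportion, border, interior) for one SOM cell.

    The border uses the majority class color at full strength. The interior
    is the same color blended toward white: the purer the cell, the stronger
    the color, but because max_strength < 1 it always stays lighter than
    the border. This blend is an assumption made for illustration.
    """
    counts = Counter(labels)
    majority, n_majority = counts.most_common(1)[0]
    proportion = n_majority / len(labels)
    border = palette[majority]
    strength = max_strength * proportion
    interior = tuple(round(255 - strength * (255 - c)) for c in border)
    return majority, proportion, border, interior


palette = {"setosa": (255, 0, 0), "virginica": (0, 0, 255)}
mixed = cell_colors(["setosa"] * 3 + ["virginica"], palette)
pure = cell_colors(["setosa"] * 4, palette)
```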

the legend does not look right when displaying real-valued features

It worked until I optimized some code. :) Also the output data was wrong (normalized) due to this bug.

Can we have an option to remove the legend?

Ask @ajdapretnar how to (re)move the legend. :) I can add a checkbox... but no other widget can hide the legend, because we decided against it at some point and removed these checkboxes.

capitalize "organizing" as well, to have "Self-Organizing Maps".

OK.

use of a shift modifier would introduce classes with selected items

I thought SOM doesn't need this, but it makes sense even just for consistency. I added it.
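
For context, grouped selection can be modelled as a mapping from cells to group numbers. The sketch below is only an illustration: `update_selection` and its mode names are hypothetical, and which modifier key triggers which mode is left out on purpose.

```python
def update_selection(selection, cells, mode="replace"):
    """Return a new {cell: group} mapping after selecting `cells`.

    Modes (hypothetical names):
      "replace"   - forget the old selection, put cells into group 1
      "new_group" - keep the old selection, put cells into a new group
      "add"       - keep the old selection, add cells to the latest group
      "remove"    - keep the old selection, drop cells from it
    """
    if mode == "replace":
        return {cell: 1 for cell in cells}
    if mode not in ("new_group", "add", "remove"):
        raise ValueError(f"unknown mode: {mode}")

    new = dict(selection)
    if mode == "remove":
        for cell in cells:
            new.pop(cell, None)
        return new

    latest = max(new.values(), default=0)
    group = latest + 1 if mode == "new_group" else max(latest, 1)
    for cell in cells:
        new[cell] = group
    return new


sel = update_selection({}, [(0, 0), (0, 1)])
sel = update_selection(sel, [(3, 2)], mode="new_group")
sel = update_selection(sel, [(3, 3)], mode="add")
```

The group number per cell is what would then let downstream widgets compare the selected groups against each other.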

Could the widget implement an automatic start of computation?

It annoys me, too, but I'm not able to implement it (not for lack of trying). When the optimization is running, changing the control should terminate and restart it, but it doesn't, and sometimes it even crashes. In short: no, I can't do this.
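
For reference, a generic cooperative cancel-and-restart pattern looks roughly like the sketch below. It uses only the standard `threading` module, and `RestartableJob` is a hypothetical class; it is not the widget's code and says nothing about why the Qt-based version misbehaves. Note that `cancel` blocks until the worker has noticed the flag, which a GUI thread may not be able to afford.

```python
import threading


class RestartableJob:
    """Run an iterative job in a worker thread; restarting cancels first."""

    def __init__(self, n_iterations):
        self.n_iterations = n_iterations
        self._thread = None
        self._stop = threading.Event()
        self.completed = []  # parameters of runs that finished

    def _run(self, params, stop):
        for _ in range(self.n_iterations):
            if stop.is_set():  # checked once per iteration
                return
            # one optimization step would go here
        self.completed.append(params)

    def start(self, params):
        self.cancel()
        # Each run gets its own flag, so a stale run can never be revived.
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, args=(params, self._stop), daemon=True)
        self._thread.start()

    def cancel(self):
        if self._thread is not None:
            self._stop.set()
            self._thread.join()
            self._thread = None

    def wait(self):
        if self._thread is not None:
            self._thread.join()


job = RestartableJob(n_iterations=1000)
job.start("8x8 grid")
job.start("5x10 grid")  # a control changed: cancel and restart
job.wait()
```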

Throughout Orange, we need to decide what to do with categorical features. Some widgets automatically continuize such features.

Maybe write a separate issue, just so that we don't forget to discuss it?

@janezd
Contributor Author

janezd commented Jul 26, 2019

Fails on pylint; one problem is not mine, the other will stay and we'll discuss it later. So it's ready for rereview.

@ajdapretnar
Contributor

Documentation provided in #3956.

@ajdapretnar ajdapretnar mentioned this pull request Aug 1, 2019
3 tasks
@ajdapretnar
Contributor

Two problems remain, which I think should be addressed within this PR.

1.) Use brown-selected. SOM silently ignores instances with missing values. It should not. I prefer we use imputation as in other widgets and let the user know it happened.

2.) Use heart_disease. It is unclear why some instances are lighter than others. I suggest tooltips.
For discrete colors: absolute and relative numbers of each value of the coloring attribute.
For continuous colors: mean value of the instance group of the coloring attribute.

@janezd
Contributor Author

janezd commented Aug 17, 2019

1.) Use brown-selected. SOM silently ignores instances with missing values. It should not. I prefer we use imputation as in other widgets and let the user know it happened.

Information about skipped instances was actually shown in the tooltip at the "input status" icon. But you're right, this was obscure.

Visualization widgets (from projections to mosaic and sieve) do not impute. I added a proper warning instead.
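
The skip-and-warn behaviour can be sketched as follows. This is an illustration under assumptions: `drop_incomplete_rows` is a hypothetical helper working on plain lists with NaN marking missing values, and the warning text is made up rather than copied from the widget.

```python
import math


def drop_incomplete_rows(rows):
    """Return (complete_rows, warning) where warning is None if nothing
    was skipped. Rows containing any NaN are left out, not imputed."""
    complete = [row for row in rows
                if not any(math.isnan(value) for value in row)]
    n_skipped = len(rows) - len(complete)
    warning = None
    if n_skipped:
        warning = (f"{n_skipped} of {len(rows)} instances "
                   "skipped due to missing values")
    return complete, warning


rows = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]
complete, warning = drop_incomplete_rows(rows)
```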

2.) Use heart_disease. It is unclear why some instances are lighter than others. I suggest tooltips.
For discrete colors: absolute and relative numbers of each value of the coloring attribute.
For continuous colors: mean value of the instance group of the coloring attribute.

Done, except that numeric variables are binned and colors correspond to bins (as the legend shows). Instead of showing just the mean, the widget shows the whole distribution (by bins). This works better for pie charts, as well as for single-color circles, where the color corresponds to the majority bin (and not to the bin that contains the average value).
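
A small example shows why the majority bin and the bin containing the mean can differ. The sketch assumes bins defined by a sorted list of thresholds; `bin_distribution` and the numbers are hypothetical and only illustrate the idea, not the widget's binning.

```python
from bisect import bisect_right
from collections import Counter


def bin_distribution(values, thresholds):
    """Count how many values fall into each bin.

    With thresholds [t0, t1, ...], bin 0 holds values below t0, bin 1 holds
    values in [t0, t1), and so on; the last bin holds values >= the last
    threshold.
    """
    counts = Counter(bisect_right(thresholds, value) for value in values)
    return [counts.get(i, 0) for i in range(len(thresholds) + 1)]


thresholds = [2, 4, 6, 8]
values = [1, 1, 1, 9, 9]  # a bimodal cell

distribution = bin_distribution(values, thresholds)
majority_bin = distribution.index(max(distribution))
mean_bin = bisect_right(thresholds, sum(values) / len(values))
```

Here the mean falls into a bin that contains no instances at all, so coloring by the majority bin and listing the whole distribution in the tooltip describes the cell more faithfully.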

@lanzagar
Contributor

  1. There seem to be some problems with tooltips. E.g. check iris on the default 8x8 hex grid and hover over tiles in bottom row. I often get shown the tooltip for the tile to the right of the one I have my mouse over.
    som
  2. Choose a 5x10 grid. Clicking right of the grid correctly deselects, while clicking left of the grid makes a strange selection in that row (a couple of tiles, depending on how left/right you click). Not a big problem, but maybe an indicator of something fishy :)



def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
Contributor


Is there a reason this is imported inside the function instead of at the top? I see that setup.pys in other dirs do the same, but don't know if there is a reason behind it or are we just copy pasting and propagating this.

I ask because moving it up and changing import numpy to from numpy import get_include (and using that below) might be nice enough and make lint pass as well and avoid the ugly red cross on travis :)
(it is complaining that numpy is not used, which is a bit strange anyway)

Contributor Author


Your suspicion is correct, I copied this from other setups. :) If we're sure this import doesn't need to be local, we should fix all setups. Which implies it doesn't belong to this PR. :)

@janezd
Contributor Author

janezd commented Aug 23, 2019

  1. There seem to be some problems with tooltips.

Tooltips were not wrong but appeared at wrong positions: you could also get a tooltip to the left of the (ellipse) object and below it. I wasn't able to discover the reason but implemented the whole thing differently (and, perhaps, better).

  2. Choose a 5x10 grid. Clicking right of the grid correctly deselects, while clicking left ...

Fixed.

@janezd
Contributor Author

janezd commented Aug 23, 2019

I fixed lint, but I won't attempt to improve the coverage - too much user interaction code.

@lanzagar lanzagar merged commit e218f68 into biolab:master Aug 23, 2019