[ENH] Move Louvain clustering from prototypes to core #3111
_DEFAULT_K_NEIGHBORS = 30

METRICS = [('Euclidean', 'l2'), ('Manhattan', 'l1')]
Are these the only two metrics that are supported?
For the time being, only these two metrics are supported because of the way sklearn computes nearest neighbours. The only other metrics I had considered adding were the cosine, pearsonr and spearmanr distances, but unfortunately the fast tree methods for finding nearest neighbours in sklearn don't support them and fall back to the O(n^2) brute-force approach, which is prohibitively expensive for larger data sets, so we decided it is better to leave them out.
If you feel any of the other metrics supported by the tree methods should be added, they are listed in the source: KDTree and BallTree. Most of these don't appear anywhere else in Orange, so including them doesn't really make sense.
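To illustrate the constraint described above, here is a minimal sketch, assuming the k-nearest-neighbour graph is built with scikit-learn's NearestNeighbors (the function name knn_edges and the toy data are hypothetical, not the widget's code). Forcing algorithm="kd_tree" shows that the two identifiers in METRICS are accepted, while cosine is rejected; with the default algorithm="auto", sklearn would instead quietly switch to brute force.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_edges(data, k, metric):
    """Undirected edges joining each row of `data` to its k nearest neighbours.

    `metric` is one of the identifiers from METRICS ('l2' or 'l1').
    algorithm='kd_tree' is forced so that an unsupported metric raises a
    ValueError instead of silently falling back to brute force.
    """
    nn = NearestNeighbors(n_neighbors=k, metric=metric, algorithm="kd_tree")
    nn.fit(data)
    # Called without arguments, kneighbors() does not count a point as its own neighbour.
    _, indices = nn.kneighbors()
    edges = set()
    for i, row in enumerate(indices):
        for j in row:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return edges


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
for metric in ("l2", "l1"):
    print(metric, sorted(knn_edges(X, 1, metric)))

try:
    knn_edges(X, 1, "cosine")
except ValueError as err:
    print("cosine rejected by kd_tree:", err)
```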
self._invalidate_graph()
self.commit()

def _update_k_neighbors(self):
Unless you expect these methods to be extended in the future with code specific to changing one of the settings, I don't see a reason to have 3 methods with the same body but different names.
I think having just one _update_graph and using it as the callback for all 3 controls would be better.
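A minimal sketch of the suggested refactor, using a hypothetical plain-Python stand-in for the widget (no Qt or Orange imports): a single _update_graph method invalidates the cached graph and commits, and every control calls it. In the real widget, the same bound method would presumably be passed as the callback= argument of each control; set_control below only emulates that.

```python
class LouvainWidgetSketch:
    """Hypothetical plain-Python stand-in for the widget (no Qt/Orange)."""

    _DEFAULT_K_NEIGHBORS = 30

    def __init__(self):
        self.pca_components = 10  # hypothetical value
        self.metric_idx = 0
        self.k_neighbors = self._DEFAULT_K_NEIGHBORS
        self.graph = "cached graph"
        self.commit_count = 0

    def _invalidate_graph(self):
        self.graph = None

    def commit(self):
        self.commit_count += 1

    def _update_graph(self):
        """Single callback shared by the PCA, metric and k-neighbours controls."""
        self._invalidate_graph()
        self.commit()

    def set_control(self, name, value):
        # Emulates a GUI control: store the new value, then fire the shared callback.
        setattr(self, name, value)
        self._update_graph()


w = LouvainWidgetSketch()
w.set_control("k_neighbors", 10)
w.set_control("metric_idx", 1)
```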
Orange/clustering/louvain.py (outdated)

import Orange
from Orange.data import Table
from community import best_partition
I guess this import now belongs in the group of external libraries above.
The widget works well as far as I was able to test it. What I am missing is documentation: at least a stub with something completely basic that can be expanded later. Another thought: why not use a slider for Resolution, as for PCA components? If anything, I would rather expect a spinbox for the number of components (everybody understands a number; I know what I am doing when setting, e.g., 5 or 20). For Resolution, I have no idea what 0.5 means, how much bigger 1.5 or 2 is, or why 5 is the maximum. So a resolution slider ranging from smaller-scale to larger-scale communities could make more sense.
Oh, and the only issue I have with the functionality is that the PCA slider now commits on every change, so dragging it triggers a lot of computations and isn't smooth. It should commit only when I stop dragging.
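The requested behaviour, committing once when the drag ends rather than on every intermediate value, can be sketched with a hypothetical slider model in plain Python. In Qt itself, calling setTracking(False) on a QSlider has a similar effect, since valueChanged is then emitted only when the user releases the handle; which mechanism the widget actually uses is an assumption here.

```python
class DeferredSlider:
    """Hypothetical slider model: while dragging, values are only recorded;
    the expensive commit callback runs once, when the drag ends."""

    def __init__(self, on_commit):
        self.value = 0
        self._dragging = False
        self._on_commit = on_commit

    def press(self):
        self._dragging = True

    def move(self, value):
        self.value = value
        if not self._dragging:
            # Keyboard or click changes are single steps, so commit right away.
            self._on_commit(value)

    def release(self):
        self._dragging = False
        self._on_commit(self.value)


commits = []
slider = DeferredSlider(commits.append)
slider.press()
for v in (1, 2, 3, 4):  # intermediate drag positions: no commits yet
    slider.move(v)
slider.release()  # a single commit with the final value
```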
Codecov Report
@@ Coverage Diff @@
## master #3111 +/- ##
==========================================
+ Coverage 82.54% 82.59% +0.05%
==========================================
Files 337 340 +3
Lines 58431 58767 +336
==========================================
+ Hits 48229 48540 +311
- Misses 10202 10227 +25
Using sliders makes more sense, I agree. I've changed it now. As for the documentation, I can write some, but I can't for the life of me figure out how to incorporate it into Orange.
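Because Qt sliders only hold integer positions, a continuous parameter such as Resolution has to be mapped onto integer steps. A minimal sketch, assuming a hypothetical range of 0.1 to 5.0 in steps of 0.1 (the widget's actual bounds and step may differ):

```python
# Hypothetical bounds and step, for illustration only.
RESOLUTION_MIN = 0.1
RESOLUTION_MAX = 5.0
RESOLUTION_STEP = 0.1


def slider_to_resolution(position):
    """Convert an integer slider position (0, 1, 2, ...) to a resolution value."""
    # Rounding hides floating-point noise such as 0.30000000000000004.
    return round(RESOLUTION_MIN + position * RESOLUTION_STEP, 10)


def resolution_to_slider(resolution):
    """Convert a resolution value back to the nearest integer slider position."""
    return int(round((resolution - RESOLUTION_MIN) / RESOLUTION_STEP))


SLIDER_MAX = resolution_to_slider(RESOLUTION_MAX)  # upper bound for the slider
```

Keeping the conversion in two small functions means the slider can store a plain integer while the clustering code only ever sees the resolution value.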
Ajda to the rescue! 🏇
If you have any questions, just drop by. I would be happy if you just provided the key content; I can take care of the rest later.
Updated from cf0cdb3 to 15802fb.
Great, thanks! I had no idea this existed. I've added a fairly minimal stub explaining what the controls do and how the widget works. I've tried to make the stamping consistent with the other widgets. Let me know if I should add anything or if anything is poorly worded.
Added some small comments to the documentation. You can choose not to include an example since the widget is easy to use, but it would be nice if an expert wrote one. :) Either way, it is best to run make html in the visual-programming folder to see where the documentation needs fixing. The log will give you hints.
Another small note: it would be nice to include a report option in the widget, too. Just implement the send_report function and report on the parameters.
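A minimal sketch of what the suggested send_report could look like. Orange widgets typically implement send_report by calling self.report_items with (label, value) pairs; to keep the sketch self-contained, a hypothetical ReportStub class stands in for that machinery, and the setting attribute names are assumptions.

```python
class ReportStub:
    """Hypothetical stand-in for Orange's report support: it only collects items."""

    def __init__(self):
        self.reported = []

    def report_items(self, items):
        self.reported.extend(items)


class LouvainReportSketch(ReportStub):
    METRICS = [('Euclidean', 'l2'), ('Manhattan', 'l1')]

    def __init__(self, pca_components, metric_idx, k_neighbors, resolution):
        super().__init__()
        # Hypothetical setting names mirroring the widget's controls.
        self.pca_components = pca_components
        self.metric_idx = metric_idx
        self.k_neighbors = k_neighbors
        self.resolution = resolution

    def send_report(self):
        # Report the parameters that determine the clustering.
        self.report_items((
            ("PCA components", self.pca_components),
            ("Metric", self.METRICS[self.metric_idx][0]),
            ("k neighbors", self.k_neighbors),
            ("Resolution", self.resolution),
        ))


widget = LouvainReportSketch(25, 1, 30, 1.0)
widget.send_report()
```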
@@ -0,0 +1,38 @@
Louvain Clustering
=======
You need to extend the = signs so that they span at least the entire title.
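docutils emits a "Title underline too short" warning when the underline is shorter than the title text. A tiny helper (hypothetical, not part of Orange's tooling) that always produces a long-enough underline for an ASCII title:

```python
def rst_title(title, char="="):
    """Return a reStructuredText title with an underline as long as the title.

    Assumes an ASCII title; wide (e.g. East Asian) characters would need more.
    """
    return f"{title}\n{char * len(title)}"


print(rst_title("Louvain Clustering"))
```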
.. figure:: images/Louvain-stamped.png

1. PCA processing is typically be applied to the original data to remove noise.
Typo: "typically be applied" should read "is typically applied".
   values recover clusters containing more data points.
5. When *Apply Automatically* is ticked, the widget will automatically
   communicate all changes. Alternatively, click *Apply*.
At the bottom, it is good to provide a minimal workflow for the widget, or a neat example of how to use it, preferably on a simple data set that ships with Orange.
@ajdapretnar I've added the report and a simple example showing off integration with the Network add-on. I also don't know what indexing images means, or whether it's a problem in this case; if it is, I can look into it. Is this all right?
🤔 I think the tests won't pass if the images aren't indexed. This is what I had in mind. It is as simple as opening GIMP and choosing Image > Mode > Indexed (Convert), then File > Export As (and clicking all the Export buttons that appear). The exported file is what you then add to the repo.
@lanzagar I've fixed the images and all the tests are passing. The only remaining issues are some pointless pylint errors. Should I try to fix them, adding ignore statements where necessary, or is this fine?
Most of the pylint errors seem to be import-related; they are easy to fix, and fixing them would improve the code.
current_progress = idx / num_tasks
# How much progress can each task contribute to the total
# work to be done
task_percentage = len(self.__tasks) ** -1
What is wrong with 1 / len(self.__tasks)? :)
I think I wrote this code while I was doing a considerable amount of calculus, and I guess I found this notation clearer at the time.
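For reference, 1 / n and n ** -1 express the same quantity: the share of total progress that each of n equally weighted tasks contributes. A minimal sketch of how that share might combine with the index of the running task into overall progress (the function and its arguments are hypothetical, not the widget's actual code):

```python
def overall_progress(task_idx, num_tasks, task_fraction):
    """Overall progress in [0, 1].

    task_idx: 0-based index of the task currently running.
    num_tasks: number of equally weighted tasks.
    task_fraction: how far (0 to 1) the current task has got.
    """
    task_share = 1 / num_tasks  # the same quantity as num_tasks ** -1
    completed = task_idx * task_share  # progress from tasks already finished
    return completed + task_fraction * task_share
```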
Updated from 1beb225 to 1e705f9.
@lanzagar The only remaining issue is the coverage. I've written all the tests that I think are reasonable for this widget, and I think most of the code is hit at least once, so I'm not sure where the poor coverage comes from. Should I look into this, or can this be merged?
I would like to merge this tomorrow if possible. If you can, take a look at the diff from @astaric and add a test or two to cover the bigger red parts. I know that saying tests can be added later, after this is merged, is not good practice, but I also wouldn't like to leave this PR open too long. It would be nice if it got some use before the release, and it really should be in the next one.
Updated from 3d27c52 to c8e9c64.
Updated from c8e9c64 to 86ce71b.
Updated from fdf0556 to 6a0c36f.
Issue
Fixes #3110
Description of changes
Move Louvain clustering widget from prototypes
Includes