Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement binary uniform affinity #242

Merged
merged 5 commits into from
May 22, 2023

Conversation

dkobak
Copy link
Contributor

@dkobak dkobak commented May 12, 2023

This is a very minor suggestion, but I realized some time ago that in some papers we implement "uniform affinities" slightly differently: not as (P + P.T) / 2 but as (P + P.T > 0).astype(float) where P is the kNN adjacency matrix. The only difference is that the former has two possible non-zero values, whereas the latter has only one possible non-zero value, which is kind of neat.

This does not really make much of a different in practice, but I thought it may be nice to implement it in Uniform() using symmetrize="average" and symmetrize="or", in addition to symmetrize=True/False which this class has anyway.

tsne-uniform-two-kinds

@pavlin-policar
Copy link
Owner

I like this, but perhaps instead of "average" and "or", I would prefer "mean", and "max", which (to me) feel more standard. If we do go this route, I would also change the default parameter value from symmetrize=True to symmetrize="max". Do you think it would make more sense to default to the binary affinity kernel, or the current "mean" symmetrization?

@dkobak
Copy link
Contributor Author

dkobak commented May 16, 2023

Makes sense, I changed it to "mean" and "max". Regarding the default value, I talked to @sdamrich and he suggested we use "max" as default, because that results in a conceptually simpler affinity matrix which can be seen as uniform on the "skNN" ("symmetrized kNN") graph. I don't have a strong opinion here myself. So I set "max" to default (but kept symmetrize=True as a possible input option that also defaults to "max"; symmetrize=False performs no symmetrization).

Note that this is going to change the default behaviour of this class, but I guess not many people will be affected.

@pavlin-policar
Copy link
Owner

pavlin-policar commented May 16, 2023

I hate to be nit-picky, but could you please add a case if we get a string that isn't either "mean" or "max" and raise a ValueError with something like symmetrization {method} is not recognized. Just so we don't fail silently in case the user makes a typo or something like that.

Note that this is going to change the default behaviour of this class, but I guess not many people will be affected.

At first, I didn't really have any qualms about this, since probably not many people rely on this. But on second thought, it would probably be better to have a version with the same default as before, so "mean", but have a FutureWarning indicating that the default will change in the next version. So the roadmap might be 0.7.2 with the same behaviour as now, but with a warning, then I would change the default to "max" in the next version. Even though this change probably wouldn't affect many people, I don't want to get in this habit of assuming people don't use it, then we inadvertently break something.

@dkobak
Copy link
Contributor Author

dkobak commented May 17, 2023

Makes sense. I changed it so that the default is True (that's how it was before) and this behaves like "mean" but triggers a FutureWarning.

@pavlin-policar pavlin-policar merged commit c0a02b1 into pavlin-policar:master May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants