Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

info & background on merge_closest() method #34

Open
jGaboardi opened this issue Jun 11, 2024 · 8 comments
Open

info & background on merge_closest() method #34

jGaboardi opened this issue Jun 11, 2024 · 8 comments
Assignees
Labels

Comments

@jGaboardi
Copy link
Collaborator

I am curious about the history & commenting out of the merge_closest() method. A quick search didn't lead to any previous issues or PRs (though I likely overlooked something). Is there anything you can point me to for that, @carsonfarmer?

@carsonfarmer
Copy link
Owner

Yeh, to be honest, I'm not totally sure? I suspect it was commented out, because I didn't have a very robust test for it. I recall essentially wanting to create random clusters, and then testing that they were recoverable via the merge_closest() flow. But I didn't end up implementing that.

@jGaboardi
Copy link
Collaborator Author

jGaboardi commented Jun 11, 2024

So do you think we want to:

  • 1. ... keep it in as it is (commented out),
  • 2. ... remove it entirely, or
  • 3. ... try to create MWE confirmed test case

cc @gegen07

@carsonfarmer
Copy link
Owner

Well, obviously 3 is the ideal outcome. But given I no longer have a core use case for this, I would likely find it difficult to devote the time. If @gegen07 has a use case/data set/example, I'm sure we could use that!

Alternatively, we can just enable it, do the minimal test, and see if folks find it useful? For instance, the merge_closest method is really just a wrapper around the example/demo, which does indeed work.

@jGaboardi
Copy link
Collaborator Author

I found the commit where merge_closest() and interact() were removed -- 9236496

@carsonfarmer
Copy link
Owner

Ah yes, interesting. So based on the comment, it looks like there was some weirdness with merging and not getting consistent results in tests. Having said that, it looks like there's no good reason to not support this, and just allow someone to supply their own merge function as input to merge_closest().

@jGaboardi
Copy link
Collaborator Author

Putting this chunk from the README in here to keep around for a notebook for when these become part of fastpair again -- xref #45 .

To illustrate the mergeing methods, here is a simple example of hierarchical clustering, treating points as the 'centroids' of various clusters:

for i in range(len(fp)-1):
   # First method... do it manually:
   dist, (a, b) = fp.closest_pair()
   c = interact(a, b)  # Compute mean centroid
   fp -= b
   fp -= a
   fp += c
   # Alternatively... do it all in one step:
   # fp.merge_closest()
len(fp)  # 1

@jGaboardi
Copy link
Collaborator Author

I'm just now realizing that interact() is actually defined within test_fastpair.py and then used for a merge_closest() test there, as well. Not sure how I missed this before.

@jGaboardi
Copy link
Collaborator Author

@carsonfarmer Should we go ahead and get those moved back into base.py and flesh out some tests for them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants