Wrapper to convert arbitrary clusterer into a classifying one #768

Open
ablaom opened this issue Nov 18, 2021 · 0 comments
ablaom commented Nov 18, 2021

Opening this issue after a nice suggestion from @davnn.

Some clusterers (e.g., scikit-learn's DBSCAN) only deliver labels for the training data and cannot immediately label new, unseen data. In that case one can use any ordinary classifier (e.g., KNNClassifier from NearestNeighborModels.jl) to generate labels for new data.

If the classifier is a probabilistic predictor, we can even get "fuzzy" labels (as GMMClusterer from BetaML already provides), which could be useful even for clusterers that already generalise to new data.
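For concreteness (this snippet is illustrative only, and the cluster names are made up): in MLJ a probabilistic prediction is a `UnivariateFinite` distribution, so the "fuzzy" labels are just per-cluster probabilities, with `mode` recovering a hard label:

```julia
using MLJBase  # exports UnivariateFinite at time of writing

# A hypothetical fuzzy label over two clusters, as a probabilistic
# classifier would return for a single observation:
d = UnivariateFinite(["cluster_1", "cluster_2"], [0.2, 0.8], pool=missing)

pdf(d, "cluster_2")  # probability the point belongs to "cluster_2"
mode(d)              # the most probable cluster, i.e. a hard label
```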

Any design depends on firming up the API for clusterers: JuliaAI/MLJ.jl#852

One possible implementation (requiring MLJBase as a dependency) is to use a learning network (wrapped in a fit definition) to define the new model (see, e.g., TransformedTargetModel). One advantage would be that changes to the classifier hyper-parameters would not trigger re-training of the base clusterer. (A "hard-wired" implementation could arrange that too, but it would duplicate logic we already have, require extra testing, etc.)
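To make the "learning network wrapped in a fit definition" idea concrete, here is a rough sketch modelled on the TransformedTargetModel pattern. The struct and type names are placeholders, and the export API shown (`ProbabilisticComposite`, surrogate machine, `return!`) is the one current in MLJBase at the time of writing, so take this as a sketch rather than a final design:

```julia
using MLJBase

# Placeholder name. Wraps any clusterer that exposes training labels
# in its fitted_params, together with a probabilistic classifier:
mutable struct ClustererAsClassifier <: ProbabilisticComposite
    clusterer
    classifier
end

function MLJBase.fit(model::ClustererAsClassifier, verbosity, X)
    Xs = source(X)

    # the clusterer stores the training labels in its fitted_params:
    mach1 = machine(model.clusterer, Xs)
    Θ = node(fitted_params, mach1)
    y = node(θ -> θ.labels, Θ)

    # the classifier trains on the clusterer-generated labels:
    mach2 = machine(model.classifier, Xs, y)
    ŷ = predict(mach2, Xs)

    # export the network as a stand-alone probabilistic model:
    mach = machine(Probabilistic(), Xs; predict=ŷ)
    return!(mach, model, verbosity)
end
```

With this arrangement, mutating only `model.classifier` hyper-parameters and refitting should retrain `mach2` but leave `mach1` (the clusterer) untouched, which is the advantage mentioned above.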

See below for a proof-of-concept.

Thoughts anyone?

@juliohm @jbrea @OkonSamuel @alyst

using MLJBase
using MLJModels

pure_clusterer = (@load DBSCAN pkg=ScikitLearn)()
classifier = (@load KNNClassifier)()

Xraw, yraw  = make_blobs(1000, rng=123)
X, Xtest = partition(Xraw, 0.5)
_, ytest = partition(yraw, 0.5)

# the learning network (with training data at the source node):

Xs = source(X)

# this clusterer stores the training labels in its fitted_params:
mach1 = machine(pure_clusterer, Xs)
Θ = node(fitted_params, mach1)
y = node(θ -> θ.labels, Θ) # the training labels

# classifier will train using the training_labels `y`:
mach2 = machine(classifier, Xs, y)
ŷ = predict(mach2, Xs) # returns probability distributions

# train the network:
fit!(ŷ)

# getting "probabilistic" labels for new data:
ŷ(Xtest);

# getting labels for new data:
y = mode.(ŷ(Xtest));

# good agreement up to relabelling:
julia> zip(ytest, y) |> collect
 (1, 3)
 (2, 2)
 (2, 2)
 (1, 3)
 (3, 1)
 (1, 3)
 (2, 2)
 (1, 3)
 (1, 3)
 (2, 2)
 (2, 2)
 (3, 1)
 (1, 3)
 (3, 1)
 (3, -1)
...
ablaom transferred this issue from JuliaAI/MLJClusteringInterface.jl on May 16, 2022