Unsupervised learning interfaces - is transformer too narrow? #51

Open
fkiraly opened this issue Jan 23, 2019 · 23 comments
Labels
design discussion

Comments

@fkiraly
Collaborator

fkiraly commented Jan 23, 2019

Regarding unsupervised models such as PCA, k-means, etc., discussed in #44.

I know these are commonly encapsulated within the transformer formalism, but that would do the methodology behind them an injustice, as feature extraction is only one of the major use cases of unsupervised models. More precisely, there are, as far as I can see, three use cases:

(i) feature extraction. For clusterers, create a column with cluster assignment. For continuous dimension reducers, create multiple continuous columns.

(ii) model structure inference - essentially, inspection of the fitted parameters. E.g., PCA components and loadings, cluster separation metrics, etc. These may be of interest in isolation, or used as a (hyper-parameter) input to other atomic models in a learning pipeline.

(iii) full probabilistic modelling aka density estimation. This behaves as a probabilistic multivariate regressor/classifier on the input variables.

To start with, it makes sense to implement only "transformer" functionality, but it is maybe good to keep in mind for the implementation that one may eventually like to expose the other outputs via interfaces. E.g., the estimated multivariate density in a fully probabilistic implementation of k-means.
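
To make the three use cases concrete, here is a minimal, hypothetical Julia sketch (not the MLJ API; the types, function names, and the unit-variance Gaussian mixture are illustrative assumptions only):

struct ToyKMeansFit
    centres::Matrix{Float64}   # d x k matrix of cluster centres, learned by fit
end

# (i) feature extraction: map each row of X (n x d) to its nearest-centre index
transform(fr::ToyKMeansFit, X::Matrix{Float64}) =
    [argmin([sum(abs2, x .- c) for c in eachcol(fr.centres)]) for x in eachrow(X)]

# (ii) model structure inference: expose the fitted centres for inspection
fitted_params(fr::ToyKMeansFit) = (centres = fr.centres,)

# (iii) density estimation: likelihood of x under an equal-weight, unit-variance
# Gaussian mixture located at the centres
function density(fr::ToyKMeansFit, x::Vector{Float64})
    k = size(fr.centres, 2)
    sum(exp(-sum(abs2, x .- c) / 2) for c in eachcol(fr.centres)) /
        (k * (2π)^(length(x) / 2))
end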

@ablaom
Member

ablaom commented Jan 23, 2019

I think this is a good point. There are two choices for exposing extra functionality at present:

(i) fit may return additional information in its report dictionary (this could include functions/closures but was not the original intention)

(ii) one implements methods beyond transform dispatched on the fit-result. This presently requires adding ("registering") the method name to MLJBase.

ablaom added the design discussion label Jan 23, 2019
@fkiraly
Collaborator Author

fkiraly commented Jan 24, 2019

@ablaom, I think the report dictionary returned by fit should, at most, contain diagnostic reports of the fitting itself, and should not be abused for parameter inference or reporting.

I'd personally introduce a single method for all models, e.g., fitted_params, which could return a dictionary of model parameters and diagnostics. These would be different for each model - for example, for ordinary least squares regression, it might return coefficients, CIs, R-squared, and t/F test results.

What we may want to be careful about is the interaction with the parameter interface. I usually like to distinguish hyper-parameters = set externally, not changed by fit, and model parameters = no external access, set by fit.
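
Purely as an illustration of the kind of return value being proposed (field names and values are placeholders, not an agreed interface):

# Illustrative only: what such a fitted_params call might return for an
# ordinary least squares fit; every value below is a placeholder.
ols_fitted_params = Dict(
    :coefficients => [1.0, -2.0],                  # estimated coefficients
    :conf_int     => [(0.5, 1.5), (-2.6, -1.4)],   # 95% confidence intervals
    :r_squared    => 0.9,                          # in-sample R-squared
    :f_test       => (statistic = 10.0, dof = (2, 17), p_value = 0.001),
)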

@ablaom
Member

ablaom commented Jan 24, 2019

Two issues here:

Type of information to be accessed after a fit call. I suppose we can classify these into "parameter inference" and "other". It's not clear to me how "other" can be unambiguously divided further, but help me out here if you can.

Method of access. Dictionary or method. The original idea of the dictionary was that it would be a persistent kind of thing, or even some kind of log/history. A dictionary has the added convenience that one adds keys according to circumstance (e.g., if I set a hyperparameter requesting fit to rank features, then :feature_rankings is a key of the report dictionary; otherwise it is not). Actually, report isn't currently used to maintain a running log (by the corresponding machine), but it could be. A method has the advantage that extra computation required to produce the information wanted can be avoided until the user calls for it. Now that I think of it, method and dictionary could be combined - the method computes a dictionary that it returns.

I like the simplicity of returning a single object to report all information of possible interest, computed after every fit, whether it be fitted parameters or whatever. What is less clear to me is whether information that requires extra computation should be accessed:

(i) by requesting the computation through an "instruction" hyperparameter and returning the result in the same report object; or

(ii) having a dedicated method dispatched on the fit-result, like predict.

Your thoughts?

What we may want to be careful about is the interaction with the parameter interface. I usually like to distinguish hyper-parameters = set externally, not changed by fit, and model parameters = no external access, set by fit.
Agreed!

@fkiraly
Collaborator Author

fkiraly commented Feb 4, 2019

Some thoughts (after a longer time of thinking):

I think it would be a good idea to have a dedicated interface for fitted parameters, just as we have for hyperparameters, i.e., dictionary-style, and following exactly the same structure, nesting and accessor conventions for the fitting result as we have for the models.

What is automatically returned in this extension of fitresult should be "standard model parameters that are easy to compute", i.e., it can be more than what predict needs, but it shouldn't add a lot of computational overhead. It should consist of data-agnostic model structure parameters (e.g., model coefficients), or easy-to-obtain intermediate results for diagnostics (e.g., R-squared?).

Separate from this should be operations on the model that require significant computational overhead over fit/predict (e.g., variable importance), or that are data-dependent (e.g., F-test in-sample).

The standard stuff - i.e., standard methodology for diagnostics and parameter inference (e.g., for OLS: t-tests, CIs, F-test, R-squared, diagnostic plots) - I'd put in fixed dispatch methods: diagnose (returning a pretty-printable dict-like object of summaries) and diagnose_visualize (producing plots/visualizations).

Advanced and non-standard diagnostics (e.g., specialized diagnostics or non-canonical visualizations) should be external, but these will be facilitated through the standardized model parameter interface once it exists.

Thoughts?

@ablaom
Member

ablaom commented Mar 5, 2019

@fkiraly I have come around to accepting your suggestion for a dedicated method to retrieve fitted parameters, separate from the report field of a machine. I also agree that params and fitted_params (which will have "nested" values for composite models) should return the same kind of object. I think a Julia NamedTuple (like a dict but with ordered keys and type parameters for each value) is the way to go. This will also be the form of the (possibly nested) report field, and report will get an accessor function, so that params, fitted_params, report are all methods that can be called on a (fitted) machine to return a named tuple.

I am working on implementing these various things simultaneously.
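
For illustration (hypothetical names; nothing here is implemented yet), the kind of nested named tuple in view, for a composite model consisting of a standardizer followed by a ridge regressor, might look like:

fp = (standardizer = (means = [0.0, 1.5], stds = [1.0, 0.7]),
      ridge        = (coefs = [0.4, -1.2], bias = 0.05))

fp.ridge.coefs   # [0.4, -1.2]: ordered, typed keys accessed by dot syntax
keys(fp)         # (:standardizer, :ridge)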

@tlienart
Collaborator

tlienart commented Mar 6, 2019

I think a Julia NamedTuple (like a dict but with ordered keys and type parameters for each value) is the way to go

A noteworthy difference is that a NamedTuple is immutable - could that cause a problem here?
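
For reference, a minimal illustration of the immutability in question, and the usual workaround of constructing a replacement tuple:

nt = (coefs = [1.0, 2.0], bias = 0.5)
# nt.bias = 1.0                  # would throw an error: NamedTuples are immutable
nt2 = merge(nt, (bias = 1.0,))   # instead, build a new NamedTuple with the change
nt.coefs[1] = 9.9                # mutable values *inside* can still be mutated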

@fkiraly
Collaborator Author

fkiraly commented Mar 6, 2019

@ablaom, I'm on board with a NamedTuple or dictionary returned by a method. The method should be able to return abstract structs in its fields, and its return value should be able to change with each run of fit.

Regarding user interface: I'd make it a method (by dispatch), and call it "inspect" unless you have a better idea.

On a side note, I think this would also help greatly with the issue highlighted in the visualization issue #85, namely the "report" being possibly arcane and non-standardized.

Further to this, I think computationally expensive diagnostics, such as "interpretable machine learning" style meta-methods, should not be bundled with "inspect", but rather with external "interpretability meta-methods" (to be dealt with at a much later point).
The "inspect" interface point should be reserved for parameters or properties which do not add substantial computational overhead over "fit" - this could, for example, be defined as only constant (or log(#training data points)) added computational effort above "fit".

@fkiraly
Collaborator Author

fkiraly commented Mar 6, 2019

Hm, maybe two more default interface points - "print" and "plot" - would be great?
These are default interface points in R.

"print" gives back a written summary, for example:

Call:
lm(formula = weight ~ group - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0710 -0.4938  0.0685  0.2462  1.3690 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
groupCtl   5.0320     0.2202   22.85 9.55e-15 ***
groupTrt   4.6610     0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.9818,	Adjusted R-squared:  0.9798 
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16

"plot" produces a series of standard diagnostic plots, which may differ by model type and/or task. I would conjecture there's some that you always want for a task (e.g., cross-plot and residual plot for deterministic supervised regerssion; calibration curves for probabilistic classification), and some that you only want for a specific model class (e.g., learning curves for SGD based methods, heatmaps for tuning methods)

@fkiraly
Collaborator Author

fkiraly commented Mar 6, 2019

Interesting question: where would "cross-plots out-of-sample" sit? Probably only available in the evaluation/validation phase, i.e., with the benchmark orchestrator.

@fkiraly
Collaborator Author

fkiraly commented Mar 6, 2019

Actually, I notice you already made a suggestion for a name: fitted_params.
Also fine with me - though I wonder: should this include easy-to-compute stuff such as the F-statistic and in-sample R-squared as well? Or should that be left to (a separate interface point!) "inspect"? Thoughts?

@fkiraly
Collaborator Author

fkiraly commented Mar 6, 2019

Also, I realize I've already said some of these things, albeit slightly differently, on Feb 4.
So greetings, @fkiraly from the past, I reserve the right to not fully agree with you.

@ablaom
Member

ablaom commented Mar 7, 2019

To clarify the existing design, we have these methods (dispatched on machines, params also on models):

  • params to retrieve possibly nested hyperparameters
  • fitted_params to retrieve possibly nested learned parameters
  • report to retrieve most everything else (could be nested), including computationally expensive stuff

As laid out in the guide (see below): whether or not a computationally expensive item is actually computed is controlled by an "instruction" hyperparameter of the model. If a default value is not overridden, the item is empty (but the key is still there), a clue to the user that more is available. I prefer this to a separate method, to avoid method name proliferation.
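
A minimal sketch of this "instruction" hyperparameter pattern (the model, its field, and the ranking computation are made up for illustration; this is not an existing MLJ model, and the real fit signature also takes a verbosity argument, omitted here):

mutable struct ToyRegressor
    rank_features::Bool          # the "instruction" hyperparameter
end

function fit(model::ToyRegressor, X::Matrix{Float64}, y::Vector{Float64})
    fitresult = X \ y                               # least-squares coefficients
    rankings = model.rank_features ?
        sortperm(abs.(fitresult), rev = true) :     # crude stand-in for a ranking
        nothing                                     # key still present, but empty
    report = (feature_rankings = rankings,)
    return fitresult, nothing, report               # (fitresult, cache, report)
end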

I think the above covers mlr's "print" method. But we could overload Base.show for named tuples to make them more user-friendly. I don't like the name "print". Print what? Just about every command prints something. (Edit: but you could say the same about "report" - aarrgh! Maybe "extras"??)

Not so keen on changing the name of "report", as this is breaking.

@tlienart I think every item of report should be regenerated at every call to fit (or update), so that the information there is synchronised with the hyperparameter values attached to the machine's current model. So immutability is not an issue. So far, the params method is just a convenience method for the user; tuning is carried out using other methods.


From the guide:

  1. report is a (possibly empty) NamedTuple, for example,
    report=(deviance=..., dof_residual=..., stderror=..., vcov=...).
    Any training-related statistics, such as internal estimates of the
    generalization error, and feature rankings, should be returned in
    the report tuple. How, or if, these are generated should be
    controlled by hyperparameters (the fields of model). Fitted
    parameters, such as the coefficients of a linear model, do not go
    in the report as they will be extractable from fitresult (and
    accessible to MLJ through the fitted_params method, see below).

...

A fitted_params method may be optionally overloaded. Its purpose is
to provide MLJ access to a user-friendly representation of the
learned parameters of the model (as opposed to the
hyperparameters). They must be extractable from fitresult.

MLJBase.fitted_params(model::SomeSupervisedModelType, fitresult) -> friendly_fitresult::NamedTuple

For a linear model, for example, one might declare something like
friendly_fitresult=(coefs=[...], bias=...).

The fallback is to return (fitresult=fitresult,).
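
For example, a hypothetical implementation for a linear regressor whose fitresult happens to be the tuple (coefs, bias) might read (SomeLinearRegressor and that fitresult layout are assumptions, mirroring the guide's SomeSupervisedModelType placeholder):

MLJBase.fitted_params(model::SomeLinearRegressor, fitresult) =
    (coefs = fitresult[1], bias = fitresult[2])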

@fkiraly
Collaborator Author

fkiraly commented Mar 7, 2019

Very sensible. Maybe, do you want to make plot a specified/uniform interface point as well, along the lines of your suggestion in #85 (and/or mine above)?

A small detail regarding your reference to "mlr's print":
mlr doesn't have a particularly good interface for pretty-printing or plotting.

It is actually the R language itself (i.e., base R) which has "print" and "plot" as designated interface points.
Agreed with "print" being a strange choice of name though for pretty-printed reports - when I first saw this long long ago, I thought it might mean saving to a file, or calling an actual printer.

@fkiraly
Collaborator Author

fkiraly commented Mar 7, 2019

"report" could be "inspect" the next time we write an MLJ, but let's not change a working system.

@ablaom
Member

ablaom commented Mar 8, 2019

At the moment, the Plots.jl package's "plot" function is just about the "standard" Julia interface point for plotting, although the future is not clear to me and others may have a better crystal ball.

Plots.jl is a front end for plotting and, at present, most of the backends are still wrapped C/Python/Java code. It is a notorious nuisance to load and execute the first time. However, there is a "PlotsBase" (called PlotRecipes) which allows you to import the "plot" function you overload in your application, without loading Plots or a backend (until you need it).
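
For concreteness, a sketch of such a recipe using the @recipe macro from RecipesBase.jl (assuming that is the lightweight recipes package referred to; the ResidualPlot wrapper is hypothetical):

using RecipesBase   # lightweight; defines @recipe without pulling in Plots

struct ResidualPlot             # hypothetical wrapper around a machine's residuals
    residuals::Vector{Float64}
end

# Once the *user* loads Plots.jl, plot(ResidualPlot(r)) produces a scatter plot,
# but the package defining this recipe never depends on Plots or a backend.
@recipe function f(rp::ResidualPlot)
    seriestype := :scatter
    xlabel --> "observation"
    ylabel --> "residual"
    rp.residuals
end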

@fkiraly
Collaborator Author

fkiraly commented Mar 8, 2019

... we could factor this out into an MLJplots module, thus solving the dependency issue?
I'm starting to appreciate how Julia's dispatch philosophy makes this easy (though its package management functionality could be improved).

@ablaom
Member

ablaom commented Mar 8, 2019

No, no. This is not necessary. We only need PlotsBase (lightweight) as a dependency. The user does need to manually load Plots.jl if they want to plot, but I don't think that's a big deal. The backends get lazy-loaded (i.e., as needed).

@ablaom
Member

ablaom commented May 26, 2019

@fkiraly and others. Returning to your original comment opening this thread, where should one-class classification fit into our scheme? Unsupervised, yes?

@fkiraly
Collaborator Author

fkiraly commented May 27, 2019

In terms of taxonomy, I'd consider that something completely different, i.e., neither supervised nor unsupervised.

I'd consider one-class classifiers (including one-class kernel SVM) as an instance of outlier detectors, or anomaly detectors (if also on-line).

Even in the case where labelled outliers/artefacts/anomalies are provided in the training set, it's different from the (semi-)supervised task, since there is a designated "normal" class.

It's also different from unsupervised, since unsupervised methods have no interface point to feed back "this is an anomaly".

I.e., naturally, the one-class SVM would have a task-specific fit/detect interface (or similar, I'm not too insistent on naming here).

One could also consider it sitting in the wider class of "annotator" tasks.

@datnamer

Does this mean the type hierarchy is not granular enough? Maybe it should be traits.

@fkiraly
Collaborator Author

fkiraly commented May 27, 2019

@datnamer, that's an interesting question for @ablaom - where do we draw the distinction between type and trait?

If I recall an earlier discussion correctly, whenever we need to dispatch or inherit differently?

It's just a feeling, but I think anomaly detectors and (un)supervised learners should be different - you can use the latter to do the former, so it feels more like a wrapper/reduction rather than a trait variation.

@ablaom
Member

ablaom commented May 28, 2019

Some coarse distinctions are realised in a type hierarchy. From the docs:


The ultimate supertype of all models is MLJBase.Model, which
has two abstract subtypes:

abstract type Supervised <: Model end
abstract type Unsupervised <: Model end

Supervised models are further divided according to whether they are
able to furnish probabilistic predictions of the target (which they
will then do by default) or directly predict "point" estimates, for each
new input pattern:

abstract type Probabilistic <: Supervised end
abstract type Deterministic <: Supervised end

All further distinctions are realised with traits some of which take values in the scitype hierarchy or in types derived from them. An example of such a trait is target_scitype_union.

So, I suppose we create a new abstract subtype of MLJ.Model, called AnomalyDetection? With a predict method that only predicts Bool? Or only predicts objects of scitype Finite{2} (a CategoricalValue{Bool})? With the same traits delineating input scitypes that we have for Unsupervised models, yes?
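
A minimal sketch of that proposal (hypothetical; neither the abstract type nor the method exists in MLJ at the time of writing):

import MLJBase

abstract type AnomalyDetection <: MLJBase.Model end

# A concrete detector would implement fit as usual, plus a prediction operation
# returning Bool (or a two-class categorical) per input pattern, e.g.:
#     predict(detector, fitresult, Xnew) -> Vector{Bool}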

Obviously this is not a priority right now, but it did recently come up.

@fkiraly
Collaborator Author

fkiraly commented Jun 13, 2019

@ablaom, regarding AnomalyDetection: agreed, though I'd just call it detect rather than predict.

Regarding unsupervised learners: have we made progress on the distinction between at least (i) and (ii) from the first post? For #161 especially, a "transformer" type (or sub-type? aspect?) as in (i) would be necessary.

Update: actually, I think we will be fine with (i), i.e., transformer-style behaviour only, for ManifoldLearning.jl in #161.
