From 25993d2cde385b754c40c5773e12a7f048125836 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Sun, 31 Jul 2022 11:24:56 +1200 Subject: [PATCH 01/15] more docs --- docs/src/anatomy_of_an_implementation.md | 20 ++++++++++++-------- docs/src/index.md | 12 +++++++++++- 2 files changed, 23 insertions(+), 9 deletions(-) diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index d3c04e9..21c3682 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -144,8 +144,8 @@ contracts they imply is given in TODO. > package is registered there. As explained in the introduction, the ML Model Interface does not attempt to define strict -model "types", such as "regressor" or "clusterer". Nevertheless, we can specify suggestive -non-binding keywords: +model "types", such as "regressor" or "clusterer". Nevertheless, we can optionally specify +suggestive non-binding keywords: ```julia MLJInterface.keywords(::Type{<:MyRidge}) = [:regression,] @@ -172,7 +172,7 @@ declarations, which in this case look like: ```julia using ScientificTypesBase -fit_data_scitype(::Type{<:MyRidge}) = Tuple{Table(Continuous), AbstractVector{Continuous}} +MLInterface.fit_data_scitype(::Type{<:MyRidge}) = Tuple{Table(Continuous), AbstractVector{Continuous}} ``` This is a contract that `data` is acceptable in the call `fit(model, verbosity, data...)` @@ -194,20 +194,24 @@ Or, in other words: elements. -## Output data types +## Operation data types -An operation, such as `predict` returns an object whose scientific type is articulated in -this way: +A promise that an operation, such as `predict`, returns an object of given scientific type is articulated in this way: ```julia -operation_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{<:Continuous}) +MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{<:Continuous}) ``` If `predict` had instead returned `Distributions.pdf`-accessible probability distributions, the declaration would be ```julia -operation_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{Density{<:Continuous}}) +MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{Density{<:Continuous}}) ``` +There is also an `input_scitypes` trait for operations. However, this falls back to the +scitype for the first argument of `fit`, as inferred from `fit_data_scitype` (see above). So +we need not overload it here. + + ## Convenience macros diff --git a/docs/src/index.md b/docs/src/index.md index 46ed2b4..294a179 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -22,7 +22,17 @@ of compulsory traits and a larger number of optional ones. There is a single ab type `Model`, but model types can implement the interface without subtyping this. There is no abstract model type hierarchy. -The preceding observations notwithstanding, it is useful to have a guide to the interface +The ML Model Interface provides methods for training and applying machine learning models, +and that is all. It does distinguish between data that comes in a number of "observations" +(such as features and target variables for a classical supervised learning) and other +"metadata" that is non-observational, such as target class weights or group lasso feature +groupings. However, no assumptions are made about how observations are organized or +accessed, which is relevant to resampling, and so ultimately, model optimization. 
At time of +writing, two promising general data container interfaces for machine learning are provided +by [Tables.jl](https://github.com/JuliaData/Tables.jl) and +[MLUtils.jl](https://github.com/JuliaML/MLUtils.jl). + +Our earlier observations notwithstanding, it is useful to have a guide to the interface organized around common informally defined patterns; the definitive specification of the interface is provided in the [Reference](@ref) section: From 1fc1e0c4c204a6e4fc9a2bbe3f620972f7c99db5 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Mon, 1 Aug 2022 16:59:32 +1200 Subject: [PATCH 02/15] first complete draft of fit, update and ingest --- docs/src/anatomy_of_an_implementation.md | 3 +- docs/src/common_implementation_patterns.md | 7 +- docs/src/fit_update_and_ingest.md | 48 ++++++++- docs/src/index.md | 18 ++-- docs/src/reference.md | 39 +++++--- src/fit_update_ingest.jl | 109 +++++++++++++++++++-- src/models.jl | 2 +- 7 files changed, 190 insertions(+), 36 deletions(-) diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index 21c3682..12fc787 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -186,8 +186,7 @@ Or, in other words: - `X` in `fit(model, verbosity, X, y)` is acceptable, provided `scitype(X) <: Table(Continuous)` - meaning that `X` is a Tables.jl compatible table whose columns have - some `<:AbstractFloat` element type (and the same must be true `Xnew` in `predict(model, - fitresult, Xnew)`). + some `<:AbstractFloat` element type. - `y` in `fit(model, verbosity, X, y)` is acceptable if `scitype(y) <: AbstractVector{Continuous}` - meaning that it is an abstract vector with `<:AbstractFloat` diff --git a/docs/src/common_implementation_patterns.md b/docs/src/common_implementation_patterns.md index 4d907cc..fb4bcf6 100644 --- a/docs/src/common_implementation_patterns.md +++ b/docs/src/common_implementation_patterns.md @@ -9,12 +9,17 @@ This guide is intended to be consulted after reading [Anatomy of a Model Implementation](@ref), which introduces the main interface objects and terminology. Although an implementation is defined purely by the methods and traits it implements, most -implementations fall into one of the following informally understood algorithm "types": +implementations fall into one (or more) of the following informally understood algorithm +"types": - [Classifiers](@ref): Supervised learners for categorical targets - [Regressors](@ref): Supervised learners for continuous targets +- [Iterative Models](@ref) + +- [Incremental Models](@ref) + - [Static Transformers](@ref): Transformations that do not learn but which have hyper-parameters and/or deliver ancilliary information about the transformation diff --git a/docs/src/fit_update_and_ingest.md b/docs/src/fit_update_and_ingest.md index b89b6f5..7e2a038 100644 --- a/docs/src/fit_update_and_ingest.md +++ b/docs/src/fit_update_and_ingest.md @@ -1,5 +1,51 @@ -# Fit, update and ingest +# Fit, update! and ingest! + +> **Summary.** All models that learn, i.e., generalize to new data, must implement `fit`; +> the fallback, useful for so-called **static** models, performs no operation and returns +> all `nothing`. Implement `update!` if certain hyper-parameter changes do not necessitate +> retraining from scratch (e.g., iterative models). Implement `ingest!` to implement +> incremental learning. + +| method | fallback | compulsory? 
|
|:---------------------------|:---------------------------------------------------|-------------|
[`MLInterface.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no |
[`MLInterface.update!`](@ref) | calls `fit` | no |
[`MLInterface.ingest!`](@ref)| none | no |

Implement `fit` unless your model is **static**, meaning its [operations](@ref operations),
such as `predict` and `transform`, ignore their `fitresult` argument (which will be
`nothing`). This is the case for models that have hyper-parameters, but do not generalize to
new data, such as a basic DBSCAN clustering algorithm. Related:
[`MLInterface.reporting_operations`](@ref), [Static Models](@ref).

The `update!` method is intended for all subsequent calls to train a model *using the same
data*, but with possibly altered hyperparameters (`model` argument). As described below, a
fallback implementation simply calls `fit`. The main use cases are for warm-restarting
iterative model training, and for "smart" training of composite models, such as linear
pipelines. Here "smart" means that hyperparameter changes only trigger the retraining of
downstream components. A sketch of the intended calling pattern is given at the end of this
page.

The `ingest!` method supports incremental learning (same hyperparameters, but new
data). Like `update!`, it depends on the output of a preceding `fit` or `ingest!` call.


```@docs
MLInterface.fit
MLInterface.update!
MLInterface.ingest!
```

## Further guidance on what goes where

Recall that the `fitresult` returned as part of `fit` represents everything needed by an
[operation](@ref operations), such as [`MLInterface.predict`](@ref).

The properties of your model (typically struct fields) are *hyperparameters*, i.e., those
parameters declared by the user ahead of time that generally affect the outcome of training
and are not learned. It is okay to add "control" parameters (such as specifying whether or
not to use a GPU). Use `report` to return *everything else*. This includes: feature
rankings/importances, SVM support vectors, clustering centres, methods for visualizing
training outcomes, methods for saving learned parameters in a custom format, degrees of
freedom, deviances. If there is a performance cost to extra functionality you want to
expose, the functionality can be toggled on/off through a hyperparameter, but this should
otherwise be avoided.
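The sketch below illustrates the intended division of labour between `fit` and `update!`
when warm-restarting an iterative model. The model type `MyNetwork` and its `epochs`
hyper-parameter are hypothetical, and the data `X`, `y` is unchanged between calls:

```julia
model = MyNetwork(epochs=10)
fitresult, state, report = MLInterface.fit(model, 1, X, y)

# train for five further epochs, re-using the learned parameters obtained so far:
model = MyNetwork(epochs=15)
fitresult, state, report = MLInterface.update!(model, 1, fitresult, state, X, y)
```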
diff --git a/docs/src/index.md b/docs/src/index.md index 294a179..6b48a08 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -11,9 +11,9 @@ A Julia interface for training and applying models in machine learning and stati

Machine learning algorithms, also called *models*, have a complicated taxonomy. Grouping
models into a relatively small number of types, such as "classifier" and "clusterer", and
-attempting to impose uniform behaviour within each group, is a problematic approach. It
-either leads to limitations on the models that can be included in a general interface, or to
-undesirable complexity needed to cope with exceptional cases.
+attempting to impose uniform behaviour within each group, is problematic. It either leads to
+limitations on the models that can be included in a general interface, or to undesirable
+complexity needed to cope with exceptional cases.

For these and other reasons, the behaviour of a model implementing the **ML Model
Interface** documented here is articulated using traits - methods dispatched on
the model type, such as `is_supervised(model::SomeModel) = true`. There are a small number
of compulsory traits and a larger number of optional ones. There is a single abstract model
type `Model`, but model types can implement the interface without subtyping this. There is
no abstract model type hierarchy.

The ML Model Interface provides methods for training and applying machine learning models,
and that is all. It does distinguish between data that comes in a number of "observations"
-(such as features and target variables for a classical supervised learning) and other
+(such as features and target variables for a classical supervised learning model) and other
"metadata" that is non-observational, such as target class weights or group lasso feature
groupings. However, no assumptions are made about how observations are organized or
-accessed, which is relevant to resampling, and so ultimately, model optimization. At
+accessed, which is relevant to resampling, and so ultimately, for model optimization. At
time of writing, two promising general data container interfaces for machine learning are
provided by [Tables.jl](https://github.com/JuliaData/Tables.jl) and
[MLUtils.jl](https://github.com/JuliaML/MLUtils.jl).

Our earlier observations notwithstanding, it is useful to have a guide to the interface
-organized around common informally defined patterns; the definitive specification of the
-interface is provided in the [Reference](@ref) section:
+organized around common *informally defined* patterns; however, the definitive specification
+of the interface is provided in the [Reference](@ref) section:

- Overview: [Anatomy of an Implementation](@ref)

diff --git a/docs/src/reference.md b/docs/src/reference.md index 1a95f93..0191078 100644
--- a/docs/src/reference.md
+++ b/docs/src/reference.md
@@ -7,14 +7,16 @@ guide see [Common Implementation Patterns](@ref).

## Models

> **Summary** In the ML Model Interface a **model** is a Julia object whose properties are
-> the hyper-parameters of some learning algorithm. The behaviour of a model is determined
-> purely by the methods in MLInterface.jl that are overloaded for it.
+> the hyper-parameters of some learning algorithm. Functionality is created by overloading
+> methods defined by the interface and promises of certain behavior articulated by model
+> traits.

-In this document the word "model" has a very specific meaning that may conflict with the reader's
-common understanding of the word - in statistics, for example. In this document a **model** is
-any julia object `some_model` storing the hyper-parameters of some learning algorithm that
-are accessible as named properties of the model, as in `some_model.epochs`. Calling
-`Base.propertynames(some_model)` must return the names of those hyper-parameters.
+In this document the word "model" has a very specific meaning that may conflict with the
+reader's common understanding of the word - in statistics, for example. In this document a
+**model** is any Julia object, `some_model` say, storing the hyper-parameters of some
+learning algorithm that are accessible as named properties of the model, as in
+`some_model.epochs`. Calling `Base.propertynames(some_model)` must return the names of those
+hyper-parameters.

Two models with the same type should be `==` if and only if all their hyper-parameters are
`==`. Of course, a hyper-parameter could be another model.

Any instance of `SomeType` below is a model in the above sense:

```julia
struct SomeType{T<:Real} <: MLInterface.Model
-  epochs::Int
-  lambda::T
+    epochs::Int
+    lambda::T
end
```
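Note that a *mutable* model type must overload `Base.==` explicitly to honour the contract
above, since mutable structs are compared by identity by default. A minimal sketch, with a
hypothetical `MyMutableModel`:

```julia
import MLInterface

mutable struct MyMutableModel  # does not subtype MLInterface.Model
    epochs::Int
    lambda::Float64
end

MLInterface.ismodel(::MyMutableModel) = true

# two models are `==` if and only if all their hyper-parameters are `==`:
Base.:(==)(m1::MyMutableModel, m2::MyMutableModel) =
    m1.epochs == m2.epochs && m1.lambda == m2.lambda
```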
-The subtyping `<: MLInterface.Model` is optional. If it is included and the type is
-instead a `mutable struct`, then there is no need to explicitly overload `Base.==`. If it is
-omitted, then one must make the declaration `MLInterface.ismodel(::SomeType) = true`
-and overload `Base.==` if necessary.
+The subtyping `<: MLInterface.Model` is optional. If it is included and the type is instead
+a `mutable struct`, then there is no need to explicitly overload `Base.==`. If it is
+omitted, then one must make the declaration `MLInterface.ismodel(::SomeType) = true`
+and overload `Base.==` in the mutable case.

> **MLJ only.** The subtyping also ensures instances will be displayed according to a
> standard MLJ convention, assuming MLJ or MLJBase are loaded.

+```@docs
+MLInterface.ismodel
+MLInterface.Model
+```

## Methods

-Model functionality is created and dilineated by implementing `fit`, one or more
-*operations*, optional **accessor functions**, and some number of **model traits**. Examples
-of these methods are given in [Anatomy of an Interface](@ref)).
+Model functionality is created by implementing `fit` (and optionally `update!` and
+`ingest!`), one or more *operations*, like `predict`, and optional **accessor functions**;
+promises of certain behavior are articulated using **model traits**. Examples of these
+methods are given in [Anatomy of an Implementation](@ref).

-- [Fit, update and ingest](@ref): for models that "learn" (generalize to
+- [Fit, update! and ingest!](@ref): for models that "learn" (generalize to
new data)

- [Operations](@ref): `predict`, `transform` and their relatives

diff --git a/src/fit_update_ingest.jl b/src/fit_update_ingest.jl index f30c014..333cbad 100644
--- a/src/fit_update_ingest.jl
+++ b/src/fit_update_ingest.jl
@@ -1,6 +1,14 @@
-const DOC_OPERATIONS = "An *operation* is a method like [`MLInterface.predict`](@ref) or"*
+const DOC_OPERATIONS =
+    "An *operation* is a method like [`MLInterface.predict`](@ref) or "*
    "[`MLInterface.transform`](@ref); do `MLInterface.OPERATIONS` to list."

+DOC_IMPLEMENTED_METHODS(name) =
+    "If implemented, include `:$name` in the vector returned by the "*
+    "[`MLInterface.implemented_methods`](@ref) trait. "
+
+
+# # FIT
+
"""
    MLInterface.fit(model, verbosity, data...; metadata...)

@@ -29,18 +37,107 @@ Returns a tuple (`fitresult`, `state`, `report`) where:
  user-interest is not needed for operations, it should be part of `report` instead (see
  below).

-- The `state` is for passing to [`MLInterface.update`](@ref) or
-  [`MLInterface.ingest`](@ref). For models that implement neither, `state` should be
+- The `state` is for passing to [`MLInterface.update!`](@ref) or
+  [`MLInterface.ingest!`](@ref). For models that implement neither, `state` should be
  `nothing`.

- The `report` records byproducts of training not in the `fitresult`.

-# Fallback
+# New model implementations

-A fallback performs no computation, returning `(nothing, nothing, nothing)`.
+This method is an optional method of the ML Model Interface. A fallback performs no
+computation, returning `(nothing, nothing, nothing)`.

-See also [`update`](@ref), [`ingest`](@ref).
+$(DOC_IMPLEMENTED_METHODS(:fit))
+
+See also [`MLInterface.update!`](@ref), [`MLInterface.ingest!`](@ref).

"""
fit(::Any, ::Integer, data...; metadata...) = nothing, nothing, nothing


# # UPDATE

"""
    MLInterface.update!(model, verbosity, fitresult, state, data...; metadata...)

Based on the values of `state` and `fitresult` returned by a preceding call to
[`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref), or [`MLInterface.update!`](@ref),
update a model's learned parameters, returning new (or mutated) `state` and `fitresult`.

Intended for retraining a model when the training data has not changed, but `model`
properties (hyperparameters) may have changed. Specifically, the assumption is that `data`
and `metadata` have the same values seen in the most recent call to `fit/update!/ingest!`.

The most common use case is for continuing the training of an iterative model: `state` is
simply a copy of the model used in the last training call (`fit`, `update!` or `ingest!`) and
this will include the current number of iterations as a property. If `model` and `state`
differ only in the number of iterations (e.g., epochs in a neural network), which has
increased, then the learned parameters (weights) are updated, rather than computed ab
initio. Otherwise, `update!` simply calls `fit` to retrain from scratch.

**Important.** It is permitted to return mutated versions of `state` and `fitresult`, rather
than new objects, but no other argument may be mutated.

For incremental training (same model, new data) see instead [`MLInterface.ingest!`](@ref).


# Return value

Same as [`MLInterface.fit`](@ref), namely a tuple (`fitresult`, `state`, `report`). See
[`MLInterface.fit`](@ref) for details.


# New model implementations

This method is an optional method in the ML Model Interface. A fallback calls
`MLInterface.fit`:

```julia
MLInterface.update!(model, verbosity, fitresult, state, data...; metadata...) =
    fit(model, verbosity, data...; metadata...)
```

$(DOC_IMPLEMENTED_METHODS(:update!))

See also [`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref).

"""
update!(model, verbosity, fitresult, state, data...; metadata...) =
    fit(model, verbosity, data...; metadata...)


# # INGEST

"""
    MLInterface.ingest!(model, verbosity, fitresult, state, data...; metadata...)

For a model that supports incremental learning, update the learned parameters using `data`,
which has typically not been seen before. The arguments `state` and `fitresult` are the
output of a preceding call to [`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref), or
[`MLInterface.update!`](@ref), of which mutated or new versions are returned.

For updating learned parameters using the *same* data but new hyperparameters, see instead
[`MLInterface.update!`](@ref).

**Important.** It is permitted to return mutated versions of `state` and `fitresult`, rather
than new objects, but no other argument may be mutated.


# Return value

Same as [`MLInterface.fit`](@ref), namely a tuple (`fitresult`, `state`, `report`). See
[`MLInterface.fit`](@ref) for details.


# New model implementations

This method is an optional method in the ML Model Interface. It has no fallback.

$(DOC_IMPLEMENTED_METHODS(:ingest!))

See also [`MLInterface.fit`](@ref), [`MLInterface.update!`](@ref).

"""
function ingest!(model, verbosity, fitresult, state, data...; metadata...) end
diff --git a/src/models.jl b/src/models.jl index 4156972..3978d91 100644
--- a/src/models.jl
+++ b/src/models.jl
@@ -35,7 +35,7 @@ documentation. In particular, this means:

# New ML Model Implementations

-Either declare `NewModelType <: MLInterface.Model` or `MLInterface.ismodel(::SomeModelType) =
+Either declare `NewModelType <: MLInterface.Model` or `MLInterface.ismodel(::NewModelType) =
true`.

See also [`MLInterface.Model`].

From 23dd9b1f764e6303ba04831190d449e4a3676cfe Mon Sep 17 00:00:00 2001 From: "Anthony D. 
Blaom" Date: Mon, 8 Aug 2022 10:17:23 +1200 Subject: [PATCH 03/15] more --- Project.toml | 3 ++ docs/make.jl | 8 ++-- docs/src/anatomy_of_an_implementation.md | 44 ++++++++++-------- docs/src/fit_update_and_ingest.md | 14 +++--- docs/src/index.md | 59 ++++++++++++------------ docs/src/reference.md | 41 +++++++++++++--- src/MLInterface.jl | 5 +- src/fit_update_ingest.jl | 55 ++++++++++++---------- src/models.jl | 2 +- 9 files changed, 138 insertions(+), 93 deletions(-) diff --git a/Project.toml b/Project.toml index e77c895..ed07c52 100644 --- a/Project.toml +++ b/Project.toml @@ -2,3 +2,6 @@ name = "MLInterface" uuid = "92ad9a40-7767-427a-9ee6-6e577f1266cb" authors = ["Anthony D. Blaom "] version = "0.1.0" + +[deps] +Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" diff --git a/docs/make.jl b/docs/make.jl index 61bd61d..72b9942 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -12,14 +12,14 @@ makedocs(; "Common Implementation Patterns" => "common_implementation_patterns.md", "Reference" => "reference.md", "Fit, update and ingest" => "fit_update_and_ingest.md", + "Predict and other operations" => "operations.md", ], repo="https://$REPO/blob/{commit}{path}#L{line}", sitename="MLInterface.jl" ) -deploydocs(; - repo=REPO, +deploydocs( + ; repo=REPO, devbranch="dev", push_preview=false, - - ) +) diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index 12fc787..b0a2dfd 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -65,7 +65,7 @@ function MLInterface.fit(model::MyRidge, verbosity, X, y) coefficients = (x'x + model.lambda*I)\(x'y) # prepare output - learned parameters: - fitresult = (; coefficients) + fitted_params = (; coefficients) # prepare output - model state: state = nothing # not relevant here @@ -77,13 +77,13 @@ function MLInterface.fit(model::MyRidge, verbosity, X, y) verbosity > 1 && @info "Features in order of importance: $(first.(feature_importances))" report = (; feature_importances) - return fitresult, state, report + return fitted_params, state, report end ``` Regarding the return value of `fit`: -- The `fitresult` is for the model's learned parameters, in any form, for passing to +- The `fitted_params` is for the model's learned parameters, in any form, for passing to `predict` (see below). - The `state` variable is only relevant when additionally implementing an [`update`](@ref) @@ -101,7 +101,7 @@ implementations, the ML Model Interface puts no restrictions on the form of `X` Now we need a method for predicting the target on new input features: ```julia -MLInterface.predict(::MyRidge, fitresult, Xnew) = Tables.matrix(Xnew)*fitresult.coefficients +MLInterface.predict(::MyRidge, fitted_params, Xnew) = Tables.matrix(Xnew)*fitted_params.coefficients ``` The above `predict` method is an example of an **operation**. Other operations include @@ -112,13 +112,13 @@ K-means clustering model might implement a `transform` for dimension reduction, ## Accessor functions -The arguments of an operation are always `(model, fitresult, data...)`. The interface also -provides **accessor functions** for extracting information from the `fitresult` and/or +The arguments of an operation are always `(model, fitted_params, data...)`. The interface also +provides **accessor functions** for extracting information from the `fitted_params` and/or `report` that is shared by several model types. 
There is one for feature importances that we can implement for `MyRidge`: ```julia -MLInterface.feature_importances(::MyRidge, fitresult, report) = report.feature_importances +MLInterface.feature_importances(::MyRidge, fitted_params, report) = report.feature_importances ``` Another example of an accessor function is `training_losses`. @@ -126,32 +126,40 @@ Another example of an accessor function is `training_losses`. ## Model traits -Now the data argument `Xnew` of `predict` has the same type as the *first* argument `X` -encountered in `fit`, while `predict` returns an object with the type of the *second* data -argument `y` of `fit`. It therefore makes sense, for example, to apply a suitable metric -(e.g., a sum of squares) to the pair `(ŷ, y)`, where `ŷ = predict(model, fitresult, X)`. We -will flag this behavior by declaring +In this supervised learning example, `predict` returns an object with the same type of the +*second* data argument `y` of `fit` (the target). It therefore makes sense, for example, to +apply a suitable metric (e.g., a sum of squares) to the pair `(ŷ, y)`, where `ŷ = +predict(model, fitted_params, X)`. We will flag this behavior by declaring ```julia MLInterface.is_supervised(::Type{<:MyRidge}) = true ``` This is an example of a **model trait** declaration. A complete list of traits and the -contracts they imply is given in TODO. +contracts they imply is given in [`Model traits`](@ref). > **MLJ only.** The values of all traits constitute a model's **metadata**, which is > recorded in the searchable MLJ Model Registry, assuming the implementation-providing > package is registered there. +Since our model is supervised, we are required to implement an additional trait that +distinguishes our model from other regressors that make probabilistic or other kinds of +predictions of the target: + +```julia +MLInterface.prediction_type(::Type{<:MyRidge}) = :deterministic +``` + As explained in the introduction, the ML Model Interface does not attempt to define strict -model "types", such as "regressor" or "clusterer". Nevertheless, we can optionally specify -suggestive non-binding keywords: +model "types", such as "regressor" or "clusterer". We can optionally specify +suggestive keywords, as in ```julia MLJInterface.keywords(::Type{<:MyRidge}) = [:regression,] ``` -Do `MLInterface.keywords()` to get a list of available keywords. +but note that this declaration promises nothing. Do `MLInterface.keywords()` to get a list +of available keywords. Finally, we are required to declare what methods (excluding traits) we have explicitly overloaded for our type: @@ -201,8 +209,8 @@ A promise that an operation, such as `predict`, returns an object of given scien MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{<:Continuous}) ``` -If `predict` had instead returned `Distributions.pdf`-accessible probability distributions, -the declaration would be +If `predict` had instead returned probability distributions, and these implement the +`Distributions.pdf` interface, then the declaration would be ```julia MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{Density{<:Continuous}}) diff --git a/docs/src/fit_update_and_ingest.md b/docs/src/fit_update_and_ingest.md index 7e2a038..008ac7a 100644 --- a/docs/src/fit_update_and_ingest.md +++ b/docs/src/fit_update_and_ingest.md @@ -6,14 +6,14 @@ > retraining from scratch (e.g., iterative models). Implement `ingest!` to implement > incremental learning. -| method | fallback | compulsory? 
| -|:---------------------------|:---------------------------------------------------|-------------| -[`MLInterface.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | -[`MLInterface.update!`](@ref) | calls `fit` | no | -[`MLJInterface.ingest!`](@ref)| none | no | +| method | fallback | compulsory? | requires | +|:---------------------------|:---------------------------------------------------|-------------|-------------------| +[`MLInterface.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | | +[`MLInterface.update!`](@ref) | calls `fit` | no | `MLInterface.fit` | +[`MLJInterface.ingest!`](@ref)| none | no | `MLInterfac.fit` | Implement `fit` unless your model is **static**, meaning its [operations](@ref operations) -such as `predict` and `transform`, ignore their `fitresult` argument (which will be +such as `predict` and `transform`, ignore their `fitted_params` argument (which will be `nothing`). This is the case for models that have hyper-parameters, but do not generalize to new data, such as a basic DBSCAN clustering algorithm. Related: [`MLInterface.reporting_operations`](@ref), [Static Models](@ref). @@ -37,7 +37,7 @@ MLInterface.ingest! ## Further guidance on what goes where -Recall that the `fitresult` returned as part of `fit` represents everything needed by an +Recall that the `fitted_params` returned as part of `fit` represents everything needed by an [operation](@ref operations), such as [`MLInterface.predict`](@ref). The properties of your model (typically struct fields) are *hyperparameters*, i.e., those diff --git a/docs/src/index.md b/docs/src/index.md index 6b48a08..990a3e3 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -15,30 +15,29 @@ attempting to impose uniform behaviour within each group, is problematic. It eit limitations on the models that can be included in a general interface, or to undesirable complexity needed to cope with exceptional cases. -For these and other reasons, the behaviour of a model implementing the **ML Model -Interface** documented here is articulated using traits - methods dispatched on -the model type, such as `is_supervised(model::SomeModel) = true`. There are a small number -of compulsory traits and a larger number of optional ones. There is a single abstract model -type `Model`, but model types can implement the interface without subtyping this. There is -no abstract model type hierarchy. - -The ML Model Interface provides methods for training and applying machine learning models, -and that is all. It does distinguish between data that comes in a number of "observations" -(such as features and target variables for a classical supervised learning model) and other -"metadata" that is non-observational, such as target class weights or group lasso feature -groupings. However, no assumptions are made about how observations are organized or -accessed, which is relevant to resampling, and so ultimately, for model optimization. At -time of writing, two promising general data container interfaces for machine learning are -provided by [Tables.jl](https://github.com/JuliaData/Tables.jl) and -[MLUtils.jl](https://github.com/JuliaML/MLUtils.jl). 
- -Our earlier observations notwithstanding, it is useful to have a guide to the interface -organized around common *informally defined* patterns; however, the definitive specification -of the interface is provided in the [Reference](@ref) section: - -- Overview: [Anatomy of an Implementation](@ref) - -- User Guide: [Common Implementation Patterns](@ref) +For these and other reasons, the **ML Model Interface** documented here is purely functional +with no abstract model types (apart an optional supertype `Model`). In addition to `fit`, +`update!` and `ingest!` methods (all optional), one implements one or more operations, such +as `predict`, `transform` and `inverse_transform`. Method stubs for access functions, such +as `feature_importances`, are also provided. Finally, a number of optional trait +declarations, such as `is_supervised(model::SomeModel) = true`, make promises of specific +behaviour. + +The ML Model Interface provides methods for training, applying, and saving machine learning +models, and that is all. It does not provide an interface for data resampling, although it +informally distinguishes between training data consisting of "observations", and other +"metadata", such as target class weights or group lasso feature gropings. At present the +only restriction on data containers concerns the target predictions of supervised models +(whether deterministic, probabilistic or otherwise): These must be abstract arrays or tables +compatible with [Tables.jl](https://github.com/JuliaData/Tables.jl). + +Our opening observations notwithstanding, it is useful to have a guide to the interface, +linked below, organized around common *informally defined* patterns. However, the definitive +specification of the interface is the [Reference](@ref) section. + +- [Anatomy of an Implementation](@ref) (Overview) + +- [Common Implementation Patterns](@ref) (User Guide) - [Reference](@ref) @@ -50,9 +49,9 @@ of the interface is provided in the [Reference](@ref) section: consulting the guide or reference sections. -**Note.** The ML Model Interface provides a foundation for the -[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/)'s "machine" interface for user -interaction. However it is a general purpose, standalone, lightweight API for machine -learning algorithms (and has no reference to machines). - - +**Note.** The ML Model Interface provides a foundation for the higher level "machine" +interface for user interaction in the toolbox +[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) created by the same +developers. However, the ML Model Interface provided here is meant as a general purpose, +standalone, lightweight API for machine learning algorithms (and has no reference to +machines). diff --git a/docs/src/reference.md b/docs/src/reference.md index 0191078..69c487e 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -3,7 +3,6 @@ Here we give the definitive specification of the ML Model Interface. For a more informal guide see [Common Implementation Patterns](@ref). - ## Models > **Summary** In the ML Model Interface a **model** is a Julia object whose properties are @@ -11,13 +10,19 @@ guide see [Common Implementation Patterns](@ref). > methods defined by the interface and promises of certain behavior articulated by model > traits. 
-In this document the word "model" has a very specific meaning that may conflict with the
+In this document the word "model" has a very specific meaning that may differ from the
reader's common understanding of the word - in statistics, for example. In this document a
**model** is any Julia object, `some_model` say, storing the hyper-parameters of some
learning algorithm that are accessible as named properties of the model, as in
`some_model.epochs`. Calling `Base.propertynames(some_model)` must return the names of those
hyper-parameters.

+It is supposed that making copies of model objects is a cheap operation. Consequently,
+*learned* parameters, such as coefficients in a linear model, or weights in a neural network
+(the `fitted_params` appearing in [Fit, update! and ingest!](@ref)) are not expected to be
+part of a model. Storing learned parameters in a model is not explicitly ruled out, but
+doing so might lead to performance issues in packages adopting the ML Model Interface.
+
Two models with the same type should be `==` if and only if all their hyper-parameters are
`==`. Of course, a hyper-parameter could be another model.

## Data containers

In this document a **data container** is any object implementing some kind of iteration
interface, where the length of the iteration, called the **number of observations**, is
known in advance. At present "some kind of iteration interface" remains undefined, but a
working definition would include the `getrows` interface from
[Tables.jl](https://github.com/JuliaData/Tables.jl) and/or the `getobs` interface
from [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl) (the latter interface
[subsuming](https://github.com/JuliaML/MLUtils.jl/issues/61) the former at some point?). The
`getobs` interface includes a built-in implementation for any `AbstractArray`, where the
observation index is understood to be the *last* index. Unfortunately, according to this
convention, a matrix `X` in this interface corresponds to `Tables.table(X')` in the
`getrows` interface (where observations are rows).
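The following sketch contrasts the two conventions just described. It assumes current
MLUtils.jl and Tables.jl APIs (`numobs`, `getobs`, `Tables.table`, `Tables.rows`) and is
indicative only:

```julia
using MLUtils, Tables

A = rand(3, 100)       # 3 features; the observation index is the *last* index
MLUtils.numobs(A)      # 100
MLUtils.getobs(A, 7)   # the 7th observation, a 3-element vector

T = Tables.table(A')   # the same data, as a table whose 100 rows are the observations
first(Tables.rows(T))  # the first observation, as a row
```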
## Methods

-Model functionality is created by implementing `fit` (and optionally `update!` and
-`ingest!`), one or more *operations*, like `predict`, and optional **accessor functions**;
-promises of certain behavior are articulated using **model traits**. Examples of these
-methods are given in [Anatomy of an Implementation](@ref).
+Model functionality is created by implementing:
+
+- zero or more of the training methods, `fit`, `update!` and `ingest!` (the second and third
+  require the first)
+
+- zero or more **operations**, like `predict`
+
+- zero or more **accessor functions**
+
+Promises of certain behaviour are articulated using **model traits**. Examples of all these
+methods are given in [Anatomy of an Implementation](@ref).

- [Fit, update! and ingest!](@ref): for models that "learn" (generalize to
  new data)

- [Operations](@ref operations): `predict`, `transform` and their relatives

- [Accessor Functions](@ref): accessing byproducts of training shared by some models, such
  as feature importances and training losses

diff --git a/src/MLInterface.jl b/src/MLInterface.jl index e61167e..c422764 100644
--- a/src/MLInterface.jl
+++ b/src/MLInterface.jl
@@ -1,6 +1,9 @@
-module MLInterface
+module MLInterface
+
+using Statistics

include("models.jl")
include("fit_update_ingest.jl")
+include("operations.jl")

end

diff --git a/src/fit_update_ingest.jl b/src/fit_update_ingest.jl index 333cbad..cd9cac9 100644
--- a/src/fit_update_ingest.jl
+++ b/src/fit_update_ingest.jl
@@ -1,10 +1,15 @@
+# # DOC STRING HELPERS
+
const DOC_OPERATIONS =
-    "An *operation* is a method like [`MLInterface.predict`](@ref) or "*
-    "[`MLInterface.transform`](@ref); do `MLInterface.OPERATIONS` to list."
+    "An *operation* is a method, like [`MLInterface.predict`](@ref) or "*
+    "[`MLInterface.transform`](@ref), that has signature "*
+    "`(model, fitted_params, data...)`; do `MLInterface.OPERATIONS` to list."

-DOC_IMPLEMENTED_METHODS(name) =
-    "If implemented, include `:$name` in the vector returned by the "*
+function DOC_IMPLEMENTED_METHODS(name; overloaded=false)
+    word = overloaded ? "overloaded" : "implemented"
+    "If $word, include `:$name` in the vector returned by the "*
    "[`MLInterface.implemented_methods`](@ref) trait. "
+end


# # FIT

"""
    MLInterface.fit(model, verbosity, data...; metadata...)

@@ -18,21 +23,21 @@ should be silent if `verbosity == 0`. Lower values should suppress warnings. Her

- `model` is a property-accessible object whose properties are the hyper-parameters of some
  machine learning algorithm; see also [`MLInterface.ismodel`](@ref).

-- `data` is a tuple of data objects with a common number of observations, for example,
-  `data = (X, y, w)` where `X` is a table of features, `y` a target variable, and `w`
-  per-observation weights. The ML Model Interface does not specify how observations are
-  structured or accessed.
+- `data` is a tuple of data objects (as defined in the ML Interface documentation) with a
+  common number of observations, for example, `data = (X, y, w)` where `X` is a table of
+  features, `y` is a target vector with the same number of rows, and `w` a vector of
+  per-observation weights.

-- `metadata` is for extra information pertaining to the data that is not structured as a
-  number of observations, for example, weights for target classes. Another example
-  would be feature groupings in the group lasso algorithm.
+- `metadata` is for extra information pertaining to the data that is never iterated, for
+  example, weights for target classes. Another example would be feature groupings in the
+  group lasso algorithm.


# Return value

-Returns a tuple (`fitresult`, `state`, `report`) where:
+Returns a tuple (`fitted_params`, `state`, `report`) where:

-- The `fitresult` is the model's learned parameters (eg, the coefficients in a linear model)
+- The `fitted_params` is the model's learned parameters (eg, the coefficients in a linear model)
  in a form understood by model operations. $DOC_OPERATIONS If some training outcome of
  user-interest is not needed for operations, it should be part of `report` instead (see
  below).

- The `state` is for passing to [`MLInterface.update!`](@ref) or
  [`MLInterface.ingest!`](@ref). For models that implement neither, `state` should be
  `nothing`.
-- The `report` records byproducts of training not in the `fitresult`.
+- The `report` records byproducts of training not in the `fitted_params`.

# New model implementations

This method is an optional method of the ML Model Interface. A fallback performs no
computation, returning `(nothing, nothing, nothing)`.

$(DOC_IMPLEMENTED_METHODS(:fit))

See also [`MLInterface.update!`](@ref), [`MLInterface.ingest!`](@ref).

"""
fit(::Any, ::Integer, data...; metadata...) = nothing, nothing, nothing


# # UPDATE

"""
-    MLInterface.update!(model, verbosity, fitresult, state, data...; metadata...)
+    MLInterface.update!(model, verbosity, fitted_params, state, data...; metadata...)

-Based on the values of `state` and `fitresult` returned by a preceding call to
+Based on the values of `state` and `fitted_params` returned by a preceding call to
[`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref), or [`MLInterface.update!`](@ref),
-update a model's learned parameters, returning new (or mutated) `state` and `fitresult`.
+update a model's learned parameters, returning new (or mutated) `state` and `fitted_params`.

Intended for retraining a model when the training data has not changed, but `model`
properties (hyperparameters) may have changed. Specifically, the assumption is that `data`
and `metadata` have the same values seen in the most recent call to `fit/update!/ingest!`.

The most common use case is for continuing the training of an iterative model: `state` is
simply a copy of the model used in the last training call (`fit`, `update!` or `ingest!`) and
this will include the current number of iterations as a property. If `model` and `state`
differ only in the number of iterations (e.g., epochs in a neural network), which has
increased, then the learned parameters (weights) are updated, rather than computed ab
initio. Otherwise, `update!` simply calls `fit` to retrain from scratch.

-**Important.** It is permitted to return mutated versions of `state` and `fitresult`, rather
+**Important.** It is permitted to return mutated versions of `state` and `fitted_params`, rather
than new objects, but no other argument may be mutated.

For incremental training (same model, new data) see instead [`MLInterface.ingest!`](@ref).


# Return value

-Same as [`MLInterface.fit`](@ref), namely a tuple (`fitresult`, `state`, `report`). See
+Same as [`MLInterface.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See
[`MLInterface.fit`](@ref) for details.


# New model implementations

This method is an optional method in the ML Model Interface. A fallback calls
`MLInterface.fit`:

```julia
-MLInterface.update!(model, verbosity, fitresult, state, data...; metadata...) =
+MLInterface.update!(model, verbosity, fitted_params, state, data...; metadata...) =
    fit(model, verbosity, data...; metadata...)
```

$(DOC_IMPLEMENTED_METHODS(:update!))

See also [`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref).

"""
-update!(model, verbosity, fitresult, state, data...; metadata...) =
+update!(model, verbosity, fitted_params, state, data...; metadata...) =
    fit(model, verbosity, data...; metadata...)


# # INGEST

"""
-    MLInterface.ingest!(model, verbosity, fitresult, state, data...; metadata...)
+    MLInterface.ingest!(model, verbosity, fitted_params, state, data...; metadata...)

-For a model that supports incremental learning, update the learned parameters using `data`,
-which has typically not been seen before. The arguments `state` and `fitresult` are the
+For a model that supports incremental learning, update the learned parameters using `data`,
+which has typically not been seen before. The arguments `state` and `fitted_params` are the
output of a preceding call to [`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref), or
[`MLInterface.update!`](@ref), of which mutated or new versions are returned.

For updating learned parameters using the *same* data but new hyperparameters, see instead
[`MLInterface.update!`](@ref).
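For example (a sketch only; the variables `Xnew` and `ynew`, holding observations not
previously seen, are hypothetical):

```julia
fitted_params, state, report = MLInterface.fit(model, 1, X, y)

# continue training on the new observations:
fitted_params, state, report =
    MLInterface.ingest!(model, 1, fitted_params, state, Xnew, ynew)
```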
-**Important.** It is permitted to return mutated versions of `state` and `fitresult`, rather
+**Important.** It is permitted to return mutated versions of `state` and `fitted_params`, rather
than new objects, but no other argument may be mutated.


# Return value

-Same as [`MLInterface.fit`](@ref), namely a tuple (`fitresult`, `state`, `report`). See
+Same as [`MLInterface.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See
[`MLInterface.fit`](@ref) for details.


# New model implementations

This method is an optional method in the ML Model Interface. It has no fallback.

$(DOC_IMPLEMENTED_METHODS(:ingest!))

See also [`MLInterface.fit`](@ref), [`MLInterface.update!`](@ref).

"""
-function ingest!(model, verbosity, fitresult, state, data...; metadata...) end
+function ingest!(model, verbosity, fitted_params, state, data...; metadata...) end
diff --git a/src/models.jl b/src/models.jl index 3978d91..2cab939 100644
--- a/src/models.jl
+++ b/src/models.jl
@@ -38,7 +38,7 @@ documentation. In particular, this means:

# New ML Model Implementations

Either declare `NewModelType <: MLInterface.Model` or `MLInterface.ismodel(::NewModelType) =
true`.

-See also [`MLInterface.Model`].
+See also [`MLInterface.Model`](@ref).

"""
ismodel(::Any) = false

From d79bcb28c02c44bd36268eb58216f4e822a2d8b4 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Mon, 8 Aug 2022 15:49:44 +1200 Subject: [PATCH 04/15] rename MLInterface -> LearnAPI + other stuff --- Project.toml | 2 +- README.md | 6 +- docs/Project.toml | 2 +- docs/make.jl | 9 +- docs/src/anatomy_of_an_implementation.md | 55 ++++---- docs/src/common_implementation_patterns.md | 2 +- docs/src/fit_update_and_ingest.md | 14 +- docs/src/index.md | 10 +- docs/src/model_traits.md | 11 ++ docs/src/operations.md | 49 +++++++ docs/src/reference.md | 16 +-- src/{MLInterface.jl => LearnAPI.jl} | 2 +- src/fit_update_ingest.jl | 68 +++++----- src/models.jl | 12 +- src/operations.jl | 146 +++++++++++++++++++++ 15 files changed, 311 insertions(+), 93 deletions(-) create mode 100644 docs/src/model_traits.md create mode 100644 docs/src/operations.md rename src/{MLInterface.jl => LearnAPI.jl} (84%) create mode 100644 src/operations.jl

diff --git a/Project.toml b/Project.toml index ed07c52..5ef5910 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,4 +1,4 @@
-name = "MLInterface"
+name = "LearnAPI"
uuid = "92ad9a40-7767-427a-9ee6-6e577f1266cb"
authors = ["Anthony D. Blaom "]
version = "0.1.0"

diff --git a/README.md b/README.md index 1c06e72..26fd613 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# MLInterface.jl
+# LearnAPI.jl

A Julia interface for training and applying models in machine learning and statistics

Hyperlinks in this README.md do not work. 
| Linux | Coverage | | :------------ | :------- | -| [![Build Status](https://github.com/JuliaAI/MLInterface.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/MLInterface.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/MLInterface.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/MLInterface.jl?branch=master) | +| [![Build Status](https://github.com/JuliaAI/LearnAPI.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/LearnAPI.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/LearnAPI.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/LearnAPI.jl?branch=master) | -[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/MLInterface.jl/stable/) +[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/) **Status.** Proposal, unregistered. diff --git a/docs/Project.toml b/docs/Project.toml index f1507ff..46f77a9 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,6 +1,6 @@ [deps] Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" -MLInterface = "92ad9a40-7767-427a-9ee6-6e577f1266cb" +LearnAPI = "92ad9a40-7767-427a-9ee6-6e577f1266cb" [compat] Documenter = "^0.27" diff --git a/docs/make.jl b/docs/make.jl index 72b9942..ba3edac 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -1,10 +1,10 @@ using Documenter -using MLInterface +using LearnAPI -const REPO="github.com/JuliaAI/MLInterface.jl" +const REPO="github.com/JuliaAI/LearnAPI.jl" makedocs(; - modules=[MLInterface], + modules=[LearnAPI], format=Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true"), pages=[ "Introduction" => "index.md", @@ -13,9 +13,10 @@ makedocs(; "Reference" => "reference.md", "Fit, update and ingest" => "fit_update_and_ingest.md", "Predict and other operations" => "operations.md", + "Model Traits" => "model_traits.md", ], repo="https://$REPO/blob/{commit}{path}#L{line}", - sitename="MLInterface.jl" + sitename="LearnAPI.jl" ) deploydocs( diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index b0a2dfd..1bd2ad6 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -8,37 +8,36 @@ > model as supervised, and another to list the implemented methods. Optional traits > articulate the model's data type requirements and the output type of operations. -We begin by describing an implementation of the ML Model Interface for basic ridge +We begin by describing an implementation of the Learn API for basic ridge regression (no intercept) to introduce the main actors in any implementation. ## Defining a model type -The first line below imports the lightweight package MLInterface.jl whose methods we will be -extending, the second libraries needed for the core algorithm. +The first line below imports the lightweight package LearnAPI.jl whose methods we will be +extending, the second, libraries needed for the core algorithm. ```julia -import MLInterface +import LearnAPI using LinearAlgebra, Tables ``` Next, we define a struct to store the single hyper-parameter `lambda` of this model: ```julia -struct MyRidge <: MLInterface.Model +struct MyRidge <: LearnAPI.Model lambda::Float64 end ``` -The subtyping `MyRidge <: MLInterface.Model` is optional but recommended where it is not +The subtyping `MyRidge <: LearnAPI.Model` is optional but recommended where it is not otherwise disruptive. 
If you omit the subtyping then you must declare ```julia -MLInterface.ismodel(::MyRidge) = true +LearnAPI.ismodel(::MyRidge) = true ``` -as a promise that instances of `MyRidge` implement the compulsory elements of the ML Model -Interface. +as a promise that instances of `MyRidge` implement the Learn API. Instances of `MyRidge` are called **models** and `MyRidge` is a **model type**. @@ -55,7 +54,7 @@ A ridge regressor requires two types of data for training: **input features** `X (`0` should train silently, unless warnings are needed): ```julia -function MLInterface.fit(model::MyRidge, verbosity, X, y) +function LearnAPI.fit(model::MyRidge, verbosity, X, y) # process input: x = Tables.matrix(X) # convert table to matrix @@ -83,17 +82,17 @@ end Regarding the return value of `fit`: -- The `fitted_params` is for the model's learned parameters, in any form, for passing to +- The `fitted_params` is for the model's learned parameters, for passing to `predict` (see below). -- The `state` variable is only relevant when additionally implementing an [`update`](@ref) - or [`ingest`](@ref) method (see [Fit, update and ingest](@ref)). +- The `state` variable is only relevant when additionally implementing an [`update!`](@ref) + or [`ingest!`](@ref) method (see [Fit, update! and ingest!](@ref)). - The `report` is for other byproducts of training, excluding the learned parameters. Notice that we have chosen here to suppose that `X` is presented as a table (rows are the observations); and we suppose `y` is a `Real` vector. (While this is typical of MLJ model -implementations, the ML Model Interface puts no restrictions on the form of `X` and `y`.) +implementations, the Learn API puts no restrictions on the form of `X` and `y`.) ## Operations @@ -101,7 +100,7 @@ implementations, the ML Model Interface puts no restrictions on the form of `X` Now we need a method for predicting the target on new input features: ```julia -MLInterface.predict(::MyRidge, fitted_params, Xnew) = Tables.matrix(Xnew)*fitted_params.coefficients +LearnAPI.predict(::MyRidge, fitted_params, Xnew) = Tables.matrix(Xnew)*fitted_params.coefficients ``` The above `predict` method is an example of an **operation**. Other operations include @@ -118,10 +117,11 @@ provides **accessor functions** for extracting information from the `fitted_para we can implement for `MyRidge`: ```julia -MLInterface.feature_importances(::MyRidge, fitted_params, report) = report.feature_importances +LearnAPI.feature_importances(::MyRidge, fitted_params, report) = report.feature_importances ``` -Another example of an accessor function is `training_losses`. +Another example of an accessor function is `training_losses` (supervised models) and +`training_scores` (outlier detection models). ## Model traits @@ -132,7 +132,7 @@ apply a suitable metric (e.g., a sum of squares) to the pair `(ŷ, y)`, where ` predict(model, fitted_params, X)`. We will flag this behavior by declaring ```julia -MLInterface.is_supervised(::Type{<:MyRidge}) = true +LearnAPI.is_supervised(::Type{<:MyRidge}) = true ``` This is an example of a **model trait** declaration. 
A complete list of traits and the @@ -147,25 +147,30 @@ distinguishes our model from other regressors that make probabilistic or other k predictions of the target: ```julia -MLInterface.prediction_type(::Type{<:MyRidge}) = :deterministic +LearnAPI.paradigm(::Type{<:MyRidge}) = Dict(:predict => :point) ``` -As explained in the introduction, the ML Model Interface does not attempt to define strict -model "types", such as "regressor" or "clusterer". We can optionally specify -suggestive keywords, as in +If instead, our `predict` method would return probabilistic predictions, we would instead +return `Dict(:predict => :pdf)` or `Dict(:predict => :rand)`, depending on whether or not +`predict` returns objects implementing `Distributions.pdf` from Distributions.jl, or merely +`Base.rand`. Other options are `:interval` and `:survival_probability`. + +As explained in the introduction, the Learn API does not attempt to define strict +model "types", such as "regressor" or "clusterer". We can optionally specify suggestive +keywords, as in ```julia MLJInterface.keywords(::Type{<:MyRidge}) = [:regression,] ``` -but note that this declaration promises nothing. Do `MLInterface.keywords()` to get a list +but note that this declaration promises nothing. Do `LearnAPI.keywords()` to get a list of available keywords. Finally, we are required to declare what methods (excluding traits) we have explicitly overloaded for our type: ```julia -MLInterface.implemented_methods(::Type{<:MyRidge}) = [ +LearnAPI.implemented_methods(::Type{<:MyRidge}) = [ :fit, :predict, :feature_importances, @@ -180,7 +185,7 @@ declarations, which in this case look like: ```julia using ScientificTypesBase -MLInterface.fit_data_scitype(::Type{<:MyRidge}) = Tuple{Table(Continuous), AbstractVector{Continuous}} +LearnAPI.fit_data_scitype(::Type{<:MyRidge}) = Tuple{Table(Continuous), AbstractVector{Continuous}} ``` This is a contract that `data` is acceptable in the call `fit(model, verbosity, data...)` diff --git a/docs/src/common_implementation_patterns.md b/docs/src/common_implementation_patterns.md index fb4bcf6..ed9e676 100644 --- a/docs/src/common_implementation_patterns.md +++ b/docs/src/common_implementation_patterns.md @@ -3,7 +3,7 @@ !!! warning This section is only an implementation guide. The definitive specification of the - ML Model Interface is given in [Reference](@ref). + Learn API is given in [Reference](@ref). This guide is intended to be consulted after reading [Anatomy of a Model Implementation](@ref), which introduces the main interface objects and terminology. diff --git a/docs/src/fit_update_and_ingest.md b/docs/src/fit_update_and_ingest.md index 008ac7a..3759001 100644 --- a/docs/src/fit_update_and_ingest.md +++ b/docs/src/fit_update_and_ingest.md @@ -8,15 +8,15 @@ | method | fallback | compulsory? | requires | |:---------------------------|:---------------------------------------------------|-------------|-------------------| -[`MLInterface.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | | -[`MLInterface.update!`](@ref) | calls `fit` | no | `MLInterface.fit` | +[`LearnAPI.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | | +[`LearnAPI.update!`](@ref) | calls `fit` | no | `LearnAPI.fit` | [`MLJInterface.ingest!`](@ref)| none | no | `MLInterfac.fit` | Implement `fit` unless your model is **static**, meaning its [operations](@ref operations) such as `predict` and `transform`, ignore their `fitted_params` argument (which will be `nothing`). 
This is the case for models that have hyper-parameters, but do not generalize to new data, such as a basic DBSCAN clustering algorithm. Related: -[`MLInterface.reporting_operations`](@ref), [Static Models](@ref). +[`LearnAPI.reporting_operations`](@ref), [Static Models](@ref). The `update!` method is intended for all subsequent calls to train a model *using the same data*, but with possibly altered hyperparameters (`model` argument). As described below, a @@ -30,15 +30,15 @@ data). Like `update!`, it depends on the output a preceding `fit` or `ingest!` c ```@docs -MLInterface.fit -MLInterface.update! -MLInterface.ingest! +LearnAPI.fit +LearnAPI.update! +LearnAPI.ingest! ``` ## Further guidance on what goes where Recall that the `fitted_params` returned as part of `fit` represents everything needed by an -[operation](@ref operations), such as [`MLInterface.predict`](@ref). +[operation](@ref operations), such as [`LearnAPI.predict`](@ref). The properties of your model (typically struct fields) are *hyperparameters*, i.e., those parameters declared by the user ahead of time that generally affect the outcome of training diff --git a/docs/src/index.md b/docs/src/index.md index 990a3e3..0ad7b82 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -1,7 +1,7 @@ ```@raw html -MLInterface.jl +LearnAPI.jl
A Julia interface for training and applying models in machine learning and statistics @@ -15,7 +15,7 @@ attempting to impose uniform behaviour within each group, is problematic. It eit limitations on the models that can be included in a general interface, or to undesirable complexity needed to cope with exceptional cases. -For these and other reasons, the **ML Model Interface** documented here is purely functional +For these and other reasons, the **Learn API** documented here is purely functional with no abstract model types (apart an optional supertype `Model`). In addition to `fit`, `update!` and `ingest!` methods (all optional), one implements one or more operations, such as `predict`, `transform` and `inverse_transform`. Method stubs for access functions, such @@ -23,7 +23,7 @@ as `feature_importances`, are also provided. Finally, a number of optional trait declarations, such as `is_supervised(model::SomeModel) = true`, make promises of specific behaviour. -The ML Model Interface provides methods for training, applying, and saving machine learning +The Learn API provides methods for training, applying, and saving machine learning models, and that is all. It does not provide an interface for data resampling, although it informally distinguishes between training data consisting of "observations", and other "metadata", such as target class weights or group lasso feature gropings. At present the @@ -49,9 +49,9 @@ specification of the interface is the [Reference](@ref) section. consulting the guide or reference sections. -**Note.** The ML Model Interface provides a foundation for the higher level "machine" +**Note.** The Learn API provides a foundation for the higher level "machine" interface for user interaction in the toolbox [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) created by the same -developers. However, the ML Model Interface provided here is meant as a general purpose, +developers. However, the Learn API provided here is meant as a general purpose, standalone, lightweight API for machine learning algorithms (and has no reference to machines). diff --git a/docs/src/model_traits.md b/docs/src/model_traits.md new file mode 100644 index 0000000..fcc939e --- /dev/null +++ b/docs/src/model_traits.md @@ -0,0 +1,11 @@ +# Model Traits + +| trait | fallback value | requires | required by | +|:------|:---------|:---------|:---------------| +| [`LearnAPI.ismodel`](@ref) | `false` | one of: `predict`/`predict_joint`/`transform` | all models | +| [`LearnAPI.implemented_methods`](@ref) | `Symbol[]` | | all models | +| [`LearnAPI.is_supervised`](@ref) | `false` | | [`LearnAPI.predict`](@ref) or [`LearnAPI.predict_joint`](@ref) | +| [`LearnAPI.paradigm`](@ref) | `:unknown` | relevant operations | [`LearnAPI.predict`](@ref), [`MLJInterface.predict_joint`](@ref) †| +| [`MLInteface.joint_prediction_type`](@ref) | `:unknown` | [`LearnAPI.predict_joint`](@ref) | [`LearnAPI.predict_joint`](@ref) | + +† If additionally `is_supervised(model) == true`. diff --git a/docs/src/operations.md b/docs/src/operations.md new file mode 100644 index 0000000..6716777 --- /dev/null +++ b/docs/src/operations.md @@ -0,0 +1,49 @@ +# [Predict and other operations](@id operations) + +An *operation* is any method with signature `(model, fitted_params, data...)`, where `fitted_params` +is the learned parameters object, as returned by [`LearnAPI.fit`](@ref) (which will be +`nothing` if `fit` is not implemented). 
For example, `predict` in the following code snippet +is an operation: + +```julia +fitted_params, state, report = LearnAPI.fit(some_model, 1, X, y) +ŷ = predict(some_model, fitted_params, Xnew) +``` + +## General requirements + +- Each `model` must implement at least one of: `predict`, `transform`, `predict_joint`. + +- If `LearnAPI.is_supervised(model) == true` then `predict` or `predict_joint` must be + implemented. + +- Do not overload `predict_mode`, `predict_mean` or `predict_median` unless + `predict` has been implemented. + +- Do not overload `inverse_transform` unless `transform` has been implemented. + +- Each operation explicitly implemented or overloaded must be included in the return value + of [`LearnAPI.implemented_methods`](@ref). + + +| method | fallback | +|:--------------------------------------|:---------------------------- | +[`LearnAPI.predict`](@ref) | none | +[`LearnAPI.predict_mode`](@ref) | none † | +[`LearnAPI.predict_mean`](@ref) | broadcast `Statistics.mean` | +[`LearnAPI.predict_median`](@ref) | broadcast `Statistic.median` | +[`LearnAPI.predict_joint`](@ref) | none | +[`LearnAPI.transform`](@ref) | none | +[`MLJInterface.inverse_transform`](@ref)| none | + +> **† MLJ only.** MLJBase provides a fallback for `predict_mode`, which broadcasts +> `StatBase.mode` over observations returned by `LearnAPI.predict`. + +## Specifics + +```@docs +LearnAPI.predict +LearnAPI.predict_mean +LearnAPI.predict_median +LearnAPI.predict_joint +``` diff --git a/docs/src/reference.md b/docs/src/reference.md index 69c487e..877ff10 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -1,11 +1,11 @@ # Reference -Here we give the definitive specification of the ML Model Interface. For a more informal +Here we give the definitive specification of the Learn API. For a more informal guide see [Common Implementation Patterns](@ref). ## Models -> **Summary** In the ML Model Interface a **model** is a Julia object whose properties are +> **Summary** In the Learn API a **model** is a Julia object whose properties are > the hyper-parameters of some learning algorithm. Functionality is created by overloading > methods defined by the interface and promises of certain behavior articulated by model > traits. @@ -21,7 +21,7 @@ It is supposed that making copies of model objects is a cheap operation. Consequ *learned* parameters, such as coefficients in a linear model, or weights in a neural network (the `fitted_params` appearing in [Fit, update! and ingest!](@ref)) are not expected to be part of a model. Storing learned parameters in a model is not explicitly ruled out, but -doing so might lead to performance issues in packages adopting the ML Model Interface. +doing so might lead to performance issues in packages adopting the Learn API. Two models with the same type should be `==` if and only if all their hyper-parameters are `==`. Of course, a hyper-parameter could be another model. @@ -29,17 +29,17 @@ Two models with the same type should be `==` if and only if all their hyper-para Any instance of `SomeType` below is a model in the above sense: ```julia -struct SomeType{T<:Real} <: MLInterface.Model +struct SomeType{T<:Real} <: LearnAPI.Model epochs::Int lambda::T end ``` -The subtyping `<: MLInterface.Model` is optional. If it is included and the type is instead +The subtyping `<: LearnAPI.Model` is optional. If it is included and the type is instead a `mutable struct`, then there is no need to explicitly overload `Base.==`. 
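For example (an illustrative sketch; the type shown is hypothetical and not part of the
interface):

```julia
mutable struct SomeMutableType{T<:Real} <: LearnAPI.Model
    epochs::Int
    lambda::T
end
# no explicit `Base.==` needed: subtyping `LearnAPI.Model` provides it
```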
If it is omitted, then one must make the declaration -`MLInterface.ismodel(::SomeType) = true` +`LearnAPI.ismodel(::SomeType) = true` and overload `Base.==` in the mutable case. @@ -47,8 +47,8 @@ and overload `Base.==` in the mutable case. > standard MLJ convention, assuming MLJ or MLJBase are loaded. ```@docs -MLInterface.ismodel -MLInterface.Model +LearnAPI.ismodel +LearnAPI.Model ``` ## Data containers diff --git a/src/MLInterface.jl b/src/LearnAPI.jl similarity index 84% rename from src/MLInterface.jl rename to src/LearnAPI.jl index c422764..f393769 100644 --- a/src/MLInterface.jl +++ b/src/LearnAPI.jl @@ -1,4 +1,4 @@ -module MLInterface +module LearnAPI using Statistics diff --git a/src/fit_update_ingest.jl b/src/fit_update_ingest.jl index cd9cac9..1053765 100644 --- a/src/fit_update_ingest.jl +++ b/src/fit_update_ingest.jl @@ -1,27 +1,31 @@ # # DOC STRING HELPERS const DOC_OPERATIONS = - "An *operation* is a method, like [`MLInterface.predict`](@ref) or "* - "[`MLInterface.transform`](@ref), that has signature "* - "`(model, fitted_params, data....)`; do `MLInterface.OPERATIONS` to list." + "An *operation* is a method, like [`LearnAPI.predict`](@ref) or "* + "[`LearnAPI.transform`](@ref), that has signature "* + "`(model, fitted_params, data....)`; do `LearnAPI.OPERATIONS` to list." function DOC_IMPLEMENTED_METHODS(name; overloaded=false) word = overloaded ? "overloaded" : "implemented" "If $word, include `:$name` in the vector returned by the "* - "[`MLInterface.implemented_methods`](@ref) trait. " + "[`LearnAPI.implemented_methods`](@ref) trait. " end +const DOC_MUTATING_MODELS = "**Important.** It is not permitted to mutate `model`. "* + "In particular, if `model` has a random number generator as a hyperparameter "* + "(property) then it must be copied before use. " + # # FIT """ - MLInterface.fit(model, verbosity, data...; metadata...) + LearnAPI.fit(model, verbosity, data...; metadata...) Fit `model` to the provided `data` and `metadata`. With the exception of warnings, training should be silent if `verbosity == 0`. Lower values should suppress warnings. Here: - `model` is a property-accessible object whose properties are the hyper-parameters of some - machine learning algorithm; see also [`MLInterface.ismodel`](@ref). + machine learning algorithm; see also [`LearnAPI.ismodel`](@ref). - `data` is a tuple of data objects (as defined in the ML Interface documentation) with a common number of observations, for example, `data = (X, y, w)` where `X` is a table of @@ -42,8 +46,8 @@ Returns a tuple (`fitted_params`, `state`, `report`) where: user-interest is not needed for operations, it should be part of `report` instead (see below). -- The `state` is for passing to [`MLInterface.update!`](@ref) or - [`MLInterface.ingest!`](@ref). For models that implement neither, `state` should be +- The `state` is for passing to [`LearnAPI.update!`](@ref) or + [`LearnAPI.ingest!`](@ref). For models that implement neither, `state` should be `nothing`. - The `report` records byproducts of training not in the `fitted_params`. @@ -51,12 +55,14 @@ Returns a tuple (`fitted_params`, `state`, `report`) where: # New model implementations -This method is an optional method the ML Model Interface. A fallback performs no +This method is an optional method in the ML Model Interface. A fallback performs no computation, returning `(nothing, nothing, nothing)`. +$DOC_MUTATING_MODELS + $(DOC_IMPLEMENTED_METHODS(:fit)) -See also [`MLInterface.update!`](@ref), [`MLInterface.ingest!`](@ref). 
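# Example (sketch)

The following minimal implementation is illustrative only, for a hypothetical
`MeanRegressor` model that ignores the input features and memorizes the mean of the
training target:

```julia
function LearnAPI.fit(model::MeanRegressor, verbosity, X, y)
    fitted_params = (; mean=sum(y)/length(y))
    state = nothing
    report = nothing
    return fitted_params, state, report
end
```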
+See also [`LearnAPI.update!`](@ref), [`LearnAPI.ingest!`](@ref). """ fit(::Any, ::Any, ::Integer, data...; metadata...) = nothing, nothing, nothing @@ -65,8 +71,8 @@ fit(::Any, ::Any, ::Integer, data...; metadata...) = nothing, nothing, nothing # # UPDATE """ - MLInterface.update!(model, verbosity, fitted_params, state, data...; metadata...)d Based on the values of `state`, and `fitted_params` returned by a preceding call to -[`MLInterface.fit`](ref), [`MLInterface.ingest!`](@ref), or [`MLInterface.update!`](@ref), + LearnAPI.update!(model, verbosity, fitted_params, state, data...; metadata...)d Based on the values of `state`, and `fitted_params` returned by a preceding call to +[`LearnAPI.fit`](ref), [`LearnAPI.ingest!`](@ref), or [`LearnAPI.update!`](@ref), update a model's learned parameters, returning new (or mutated) `state` and `fitted_params`. Intended for retraining a model when the training data has not changed, but `model` @@ -80,31 +86,32 @@ differ only in the number of iterations (e.g., epochs in a neural networ), which increased, then the learned parameters (weights) are updated, rather computed ab initio. Otherwise, `update!` simply calls `fit` to retrain from scratch. -**Important.** It is permitted to return mutated versions of `state` and `fitted_params`, rather - than new objects, but no other argument may be mutated. +It is permitted to return mutated versions of `state` and `fitted_params`. -For incremental training (same model, new data) see instead [`MLInterface.ingest!`](@ref). +For incremental training (same model, new data) see instead [`LearnAPI.ingest!`](@ref). # Return value -Same as [`MLInterface.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See -[`MLInterface.fit`](@ref) for details. +Same as [`LearnAPI.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See +[`LearnAPI.fit`](@ref) for details. # New model implementations This method is an optional method in the ML Model Interface. A fallback calls -`MLInterfaceperforms.fit`: +`LearnAPIperforms.fit`: ```julia -MLInterface.update!(model, verbosity, fitted_params, state, data...; metadata...) = +LearnAPI.update!(model, verbosity, fitted_params, state, data...; metadata...) = fit(model, verbosity, data; metadata...) ``` +$DOC_MUTATING_MODELS + $(DOC_IMPLEMENTED_METHODS(:fit)) -See also [`MLInterface.fit`](@ref), [`MLInterface.ingest!`](@ref). +See also [`LearnAPI.fit`](@ref), [`LearnAPI.ingest!`](@ref). """ update!(model, verbosity, fitted_params, state, data...; metadata...) = @@ -114,35 +121,34 @@ update!(model, verbosity, fitted_params, state, data...; metadata...) = # # INGEST """ - MLInterface.ingest!(model, verbosity, fitted_params, state, data...; metadata...) + LearnAPI.ingest!(model, verbosity, fitted_params, state, data...; metadata...) For a model that supports incremental learning, update the learned parameters using `data`, which has typically not been seen before. The arguments `state` and `fitted_params` are the -output of a preceding call to [`MLInterface.fit`](ref), [`MLInterface.ingest!`](@ref), or -[`MLInterface.update!`](@ref), of which mutated or new versions are returned. +output of a preceding call to [`LearnAPI.fit`](ref), [`LearnAPI.ingest!`](@ref), or +[`LearnAPI.update!`](@ref), of which mutated or new versions are returned. For updating learned parameters using the *same* data but new hyperparameters, see instead -[`MLInterface.update!`](@ref). +[`LearnAPI.update!`](@ref). 
-**Important.** It is permitted to return mutated versions of `state` and `fitted_params`, rather -than new objects, but no other argument may be mutated. - -For incremental training, see instead [`MLInterface.ingest'](@ref). +For incremental training, see instead [`LearnAPI.ingest'](@ref). # Return value -Same as [`MLInterface.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See -[`MLInterface.fit`](@ref) for details. +Same as [`LearnAPI.fit`](@ref), namely a tuple (`fitted_params`, `state`, `report`). See +[`LearnAPI.fit`](@ref) for details. # New model implementations This method is an optional method in the ML Model Interface. It has no fallback. +$DOC_MUTATING_MODELS + $(DOC_IMPLEMENTED_METHODS(:fit)) -See also [`MLInterface.fit`](@ref), [`MLInterface.update!`](@ref). +See also [`LearnAPI.fit`](@ref), [`LearnAPI.update!`](@ref). """ function ingest!(model, verbosity, fitted_params, state, data...; metadata...) end diff --git a/src/models.jl b/src/models.jl index 2cab939..3bf3854 100644 --- a/src/models.jl +++ b/src/models.jl @@ -2,17 +2,17 @@ abstract type MLType end """ - MLInterface.Model + LearnAPI.Model An optional abstract type for models in the ML Model Interface. # New ML Model Implementations -Either declare `NewModelType <: MLInterface.Model` or `MLInterface.model(::SomeModelType) = +Either declare `NewModelType <: LearnAPI.Model` or `LearnAPI.model(::SomeModelType) = true`. The first implies the second and additionally guarantees `==` has correct behaviour for `NewModelType` instances. -See also [`MLInterface.ismodel`](@ref). +See also [`LearnAPI.ismodel`](@ref). """ abstract type Model <: MLType end @@ -30,15 +30,15 @@ documentation. In particular, this means: corresponding properties are `==`. - `m` correctly implements methods from the ML Model Interface. See the documentation for - MLInterface for details. + LearnAPI for details. # New ML Model Implementations -Either declare `NewModelType <: MLInterface.Model` or `MLInterface.model(::NewModelType) = +Either declare `NewModelType <: LearnAPI.Model` or `LearnAPI.model(::NewModelType) = true`. -See also [`MLInterface.Model`](@ref). +See also [`LearnAPI.Model`](@ref). """ ismodel(::Any) = false diff --git a/src/operations.jl b/src/operations.jl new file mode 100644 index 0000000..22d029f --- /dev/null +++ b/src/operations.jl @@ -0,0 +1,146 @@ +const PREDICT_OPERATIONS = (:predict, + :predict_mode, + :predict_mean, + :predict_median, + :predict_joint) +const OPERATIONS = (PREDICT_OPERATIONS..., :transform, :inverse_transform) + +const DOC_NEW_DATA = + "Here `data` is a tuple of data objects, "* + "generally a single object representing new observations "* + "not seen in training. " + +# # FALLBACK HELPERS + +function _predict(model, args...) + p = predict(model, args...) + if :predict in reporting_operations(model) + return p + else + return (p, nothing) + end +end + +_compress(yhat, report) = (yhat, report) +_compress(yhat, ::Nothing) = yhat + + +# # METHOD STUBS/FALLBACKS + +""" + LearnAPI.predict(model, fitted_params, data...) + +Return predictions or prediction-like output, `ŷ`, for a machine learning model, `model`, +with learned parameters `fitted_params`, as returned by +[`LearnAPI.fit`](@ref). $DOC_NEW_DATA + +However, in the special case that `:predict in LearnAPI.reporting_operations(model)` is +`true`, `(ŷ, report)` is returned instead. Here `report` contains ancilliary byproducts of +computing the prediction. 
+ + +# New model implementations + +$(DOC_IMPLEMENTED_METHODS(:predict)) + +If `is_supervised(model) = true`, then `ŷ` must be: + +- either an array or table with the same number of observations as each element of `data`; + it cannot be a lazy object, such as a `DataLoader` + +- **target-like** (point, probabilistic, or interval); see + [`LearnAPI.prediction_type`](@ref) for specifics. + +Otherwise there are no restrictions on what `predict` may return, apart from what the +implementation itself promises, by making an optional [`LearnAPI.output_scitypes`](@ref) +declaration. + +By default, it is expected that `data` has length one. Otherwise, +[`LearnAPI.input_scitypes`](@ref) must be overloaded. + +See also [`LearnAPI.fit`](@ref), [`MJInterface.predict_mean`](@ref), +[`LearnAPI.predict_mode`](@ref), [`LearnAPI.predict_median`](@ref). + +""" +function predict end + +function DOC_PREDICT(reducer) + operation = Symbol(string("predict_", reducer)) + extra = DOC_IMPLEMENTED_METHODS(operation, overloaded=true) + """ + LearnAPI.predict_$reducer(model, fitted_params, data...) + + If `LearnAPI.predict` returns a vector of probabilistic predictions, `distributions`, + return a corresponding data object `ŷ` of $reducer values. $DOC_NEW_DATA + + In the special case that `LearnAPI.predict` instead returns `(distributions, + report)`, `$operation` instead return `(ŷ, report)`. + + + # New model implementations + + A fallback broadcasts `$reducer` over `ŷ`. An algorithm that predicts probabilistic + predictions may already need to predict mean values, and so overloading this method + might provide a performance advantage. + + $extra + + See also [`LearnAPI.predict`](@ref), [`LearnAPI.fit`](@ref). + + """ +end + +for reducer in [:mean, :median] + operation = Symbol(string("predict_", reducer)) + docstring = DOC_PREDICT(reducer) + quote + "$($docstring)" + function $operation(args...) + distributions, report = _predict(args...) + yhat = $reducer.(distributions) + return _compress(yhat, report) + end + end |> eval +end + +""" + LearnAPI.predict_joint(model, fitted_params, data...) + +For a supervised learning model, return a single probability distribution for the sample +space ``Y^n``, whose elements are `n`-dimensional vectors with element type matching that of +the training target (the second data object in `LearnAPI.fit(model, verbosity, +data...)`). Here `n` is the number of observations in `data`. Here `fitted_params` are the +model's learned parameters, as returned by [`LearnAPI.fit`](@ref). $DOC_NEW_DATA. + +While the interpretation of this distribution depends on the model, marginalizing +component-wise will generally deliver *correlated* univariate distributions, and these will +generally not agree with those returned by `LearnAPI.predict`, if implemented. + +# New model implementations + +It is not necessary that `LearnAPI.predict` be implemented but +`LearnAPI.is_supervised(model)` must return `true`. + +$(DOC_IMPLEMENTED_METHODS(:predict_joint)). + +See also [`LearnAPI.fit`](@ref), [`LearnAPI.predict`](@ref). + +""" +function predict_joint end + +""" +`Unsupervised` models must implement the `transform` operation. +""" +function transform end + +""" + +`Unsupervised` models may implement the `inverse_transform` operation. + +""" +function inverse_transform end + +# models can optionally overload these for enable serialization in a +# custom format: +function save end +function restore end From a493f71bf09377710d2294bb26164a63eaab7427 Mon Sep 17 00:00:00 2001 From: "Anthony D. 
Blaom" Date: Tue, 23 Aug 2022 11:58:56 +1200 Subject: [PATCH 05/15] more changes; revert to inverse_transform --- docs/src/accessor_functions | 4 + docs/src/common_implementation_patterns.md | 4 +- docs/src/fit_update_and_ingest.md | 4 +- docs/src/index.md | 91 ++++++++++++++------ docs/src/model_traits.md | 2 +- docs/src/operations.md | 96 ++++++++++++++++------ docs/src/reference.md | 34 +++----- src/operations.jl | 86 +++++++++---------- 8 files changed, 200 insertions(+), 121 deletions(-) create mode 100644 docs/src/accessor_functions diff --git a/docs/src/accessor_functions b/docs/src/accessor_functions new file mode 100644 index 0000000..79af200 --- /dev/null +++ b/docs/src/accessor_functions @@ -0,0 +1,4 @@ +training_labels +training_losses +training_scores +feature_importances \ No newline at end of file diff --git a/docs/src/common_implementation_patterns.md b/docs/src/common_implementation_patterns.md index ed9e676..fb8c321 100644 --- a/docs/src/common_implementation_patterns.md +++ b/docs/src/common_implementation_patterns.md @@ -9,8 +9,8 @@ This guide is intended to be consulted after reading [Anatomy of a Model Implementation](@ref), which introduces the main interface objects and terminology. Although an implementation is defined purely by the methods and traits it implements, most -implementations fall into one (or more) of the following informally understood algorithm -"types": +implementations fall into one (or more) of the following informally understood patterns or +"tasks": - [Classifiers](@ref): Supervised learners for categorical targets diff --git a/docs/src/fit_update_and_ingest.md b/docs/src/fit_update_and_ingest.md index 3759001..e23c693 100644 --- a/docs/src/fit_update_and_ingest.md +++ b/docs/src/fit_update_and_ingest.md @@ -10,9 +10,9 @@ |:---------------------------|:---------------------------------------------------|-------------|-------------------| [`LearnAPI.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | | [`LearnAPI.update!`](@ref) | calls `fit` | no | `LearnAPI.fit` | -[`MLJInterface.ingest!`](@ref)| none | no | `MLInterfac.fit` | +[`LearnAPI.ingest!`](@ref)| none | no | `LearnAPI.fit` | -Implement `fit` unless your model is **static**, meaning its [operations](@ref operations) +Implement `fit` unless your model is **static**, meaning its [operations](@ref operations), such as `predict` and `transform`, ignore their `fitted_params` argument (which will be `nothing`). This is the case for models that have hyper-parameters, but do not generalize to new data, such as a basic DBSCAN clustering algorithm. Related: diff --git a/docs/src/index.md b/docs/src/index.md index 0ad7b82..d16450f 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -7,33 +7,78 @@ LearnAPI.jl A Julia interface for training and applying models in machine learning and statistics ``` -# Introduction - Machine learning algorithms, also called *models*, have a complicated taxonomy. Grouping models into a relatively small number of types, such as "classifier" and "clusterer", and -attempting to impose uniform behaviour within each group, is problematic. It either leads to -limitations on the models that can be included in a general interface, or to undesirable -complexity needed to cope with exceptional cases. - -For these and other reasons, the **Learn API** documented here is purely functional -with no abstract model types (apart an optional supertype `Model`). 
In addition to `fit`,
`update!` and `ingest!` methods (all optional), one implements one or more operations, such
as `predict`, `transform` and `inverse_transform`. Method stubs for accessor functions, such
as `feature_importances`, are also provided. Finally, a number of optional trait
declarations, such as `is_supervised(model::SomeModel) = true`, make promises of specific
behaviour.

The Learn API provides methods for training, applying, and saving machine learning
models, and that is all. It does not provide an interface for data resampling, although it
informally distinguishes between training data consisting of "observations", and other
"metadata", such as target class weights or group lasso feature groupings. At present the
only restriction on data containers concerns the target predictions of supervised models
(whether deterministic, probabilistic or otherwise): these must be abstract arrays or tables
compatible with [Tables.jl](https://github.com/JuliaData/Tables.jl).
attempting to impose uniform behaviour within each group, is challenging. In our
experience, it either leads to limitations on the models that can be included in a general
interface, or additional complexity needed to cope with exceptional cases. Even if a
complete user interface for machine learning might benefit from such groupings, a
basement-level API for ML should, in our view, avoid them.

The **Learn API** documented here is a base API for machine learning that is purely
functional, with no abstract model types (apart from an optional supertype `Model`). It
provides the following methods, dispatched on model type:

- `fit` for regular training

- `update!` for adding model iterations, or responding efficiently to other
  post-`fit` changes in hyperparameters

- `ingest!` for incremental learning

- **operations**, such as `predict`, `transform` and `inverse_transform`, for applying the
  model to data

- common **accessor functions**, such as `feature_importances` and `training_losses`, for
  extracting, from training outcomes, information common to particular classes of models

- **model traits**, such as `is_supervised(model)`, for promising specific behaviour

Since this is a functional interface, `fit` returns model "state", in addition to learned
parameters, for passing to the optional `update!` and `ingest!` methods. These three
methods all return a `report` component, for exposing byproducts of training different
from the learned parameters. Similarly, all operations also return a `report` component,
although this would typically be `nothing`, unless the model does not implement `fit`
(i.e., does not generalize to new data).


## Scope and undefined notions

The Learn API provides methods for training, applying, and saving machine learning models,
and that is all. To keep it lightweight, *it does not specify an interface for data access
or data resampling*. That said, the interface references a few basic undefined notions,
which some higher-level interface might decide to formalize:

- Each machine learning model's behaviour is governed by a number of user-specified
  **hyper-parameters**.

- An object which generates ordered sequences of individual **observations** is called
  **data**.

- Information needed for training that is not a model hyper-parameter and not data is
  called **metadata** (e.g., target class weights and group lasso feature groupings).
+ +- Some models, including but not limited to supervised models, involve **target** data, in + training or otherwise, and implement an operation, typically `predict`, that outputs + data that is target-like. To say that data is **target-like** is to say that it can be + paired with target data having the same number of observations to obtain useful + information about the model and the data that has been presented to it, typically a + measure of the model's expected performance on unseen data. Target-like data can take + various informally defined forms, such as `Deterministic`, `Distribution`, `Sampleable`, + `SurvivalFunction` and `Interval` detailed further under [Operations](@ref operations). + +Regarding the last point, consider outlier detection, where target observations are either +"outlier" or "inlier". If the detector predicts probabilities for outlierness (the +target-like data) these can be paired with "outlier"/"inlier" labels assigned by humans, +using, say, area under the ROC curve, to measure performance. Many such detectors are +trainined without supervision. + + +## Contents Our opening observations notwithstanding, it is useful to have a guide to the interface, -linked below, organized around common *informally defined* patterns. However, the definitive -specification of the interface is the [Reference](@ref) section. +linked below, organized around common *informally defined* patterns or "tasks". However, +the definitive specification of the interface is the [Reference](@ref) section. - [Anatomy of an Implementation](@ref) (Overview) diff --git a/docs/src/model_traits.md b/docs/src/model_traits.md index fcc939e..a20a966 100644 --- a/docs/src/model_traits.md +++ b/docs/src/model_traits.md @@ -4,7 +4,7 @@ |:------|:---------|:---------|:---------------| | [`LearnAPI.ismodel`](@ref) | `false` | one of: `predict`/`predict_joint`/`transform` | all models | | [`LearnAPI.implemented_methods`](@ref) | `Symbol[]` | | all models | -| [`LearnAPI.is_supervised`](@ref) | `false` | | [`LearnAPI.predict`](@ref) or [`LearnAPI.predict_joint`](@ref) | +| [`LearnAPI.is_supervised`](@ref) | `false` | [`LearnAPI.predict`](@ref) or [`LearnAPI.predict_joint`](@ref) | [`LearnAPI.predict_joint`](@ref) | | [`LearnAPI.paradigm`](@ref) | `:unknown` | relevant operations | [`LearnAPI.predict`](@ref), [`MLJInterface.predict_joint`](@ref) †| | [`MLInteface.joint_prediction_type`](@ref) | `:unknown` | [`LearnAPI.predict_joint`](@ref) | [`LearnAPI.predict_joint`](@ref) | diff --git a/docs/src/operations.md b/docs/src/operations.md index 6716777..10ea75d 100644 --- a/docs/src/operations.md +++ b/docs/src/operations.md @@ -1,49 +1,97 @@ # [Predict and other operations](@id operations) -An *operation* is any method with signature `(model, fitted_params, data...)`, where `fitted_params` -is the learned parameters object, as returned by [`LearnAPI.fit`](@ref) (which will be -`nothing` if `fit` is not implemented). For example, `predict` in the following code snippet -is an operation: +An *operation* is any method with signature `some_operation(model, fitted_params, +data...)`. Here `fitted_params` is the learned parameters object, as returned by +[`LearnAPI.fit`](@ref), which will be `nothing` if `fit` is not implemented (true for models +that do not generalize to new data). 
For example, `predict` in the following code snippet is
an operation:

```julia
fitted_params, state, fit_report = LearnAPI.fit(some_model, 1, X, y)
ŷ, predict_report = predict(some_model, fitted_params, Xnew)
```

| method | compulsory? | fallback | requires |
|:-----------------------------------|:-----------:|:--------:|:-----------:|
[`LearnAPI.predict`](@ref) | no | none | |
[`LearnAPI.predict_mode`](@ref) | no | none † | `predict` |
[`LearnAPI.predict_mean`](@ref) | no | none † | `predict` |
[`LearnAPI.predict_median`](@ref) | no | none † | `predict` |
[`LearnAPI.predict_joint`](@ref) | no | none | |
[`LearnAPI.transform`](@ref) | no | none | |
[`LearnAPI.inverse_transform`](@ref) | no | none | `transform` |

> **† MLJ only.** MLJBase provides fallbacks for `predict_mode`, `predict_mean` and
> `predict_median` by broadcasting methods from `Statistics` and `StatsBase` over the
> results of `predict`.

## General requirements

- Each `model` must implement at least one of: `predict`, `transform`,
  `predict_joint`.

- Only implement `predict_joint` for outputting a *single* multivariate probability
  distribution with a dimension for each input observation; see
  [`LearnAPI.predict_joint`](@ref) for details.

- Do not overload `predict_mode`, `predict_mean` or `predict_median` unless `predict` has
  been implemented.

- Do not overload `inverse_transform` unless `transform` has been implemented.

- Each operation explicitly implemented or overloaded must be included in the return value
  of [`LearnAPI.implemented_methods`](@ref).

## Predict or transform?

- If the model has a target, as defined under [Scope and undefined notions](@ref), then
  only `predict` or `predict_joint` can be used to generate corresponding target-like
  data.

- If an operation is to have an inverse operation, then it cannot be `predict`; use
  `transform` and `inverse_transform` instead.

Here an "inverse" of `transform` is very broadly understood as any operation that can be
applied to the output of `transform` to obtain an object of the same form as the input of
`transform`; for example, this includes one-sided inverses, and approximate one-sided
inverses. (In some APIs, such an operation is called `reconstruct`.)

In all other cases, the Learn API makes only informal stipulations on which operation to
use:

- Clustering algorithms should use `predict` *when returning cluster labels.* (For
  clustering algorithms that perform dimension reduction, `transform` can be used.)

- Outlier detection models should return raw scores using `transform` and use `predict` for
  returning either normalized scores or "outlier"/"inlier" classifications.


## Paradigms for target-like output

Target-like data, as defined under [Scope and undefined notions](@ref), is classified by a
**paradigm**, which is one of the abstract types appearing in the table below. 
+ +| paradigm type | form of observations | possible requirement in some external API | +|:---------------------:|:--------------------|:------------------------------------------| +| `LearnAPI.Deterministic` | the same form as target observations | Observations have same type as target observations. | +| `LearnAPI.Distribution` | explicit probability/mass density functions with sample space all possible target observations | Observations implements `Distributions.pdf`. | +| `LearnAPI.Sampleable` | objects that can be sampled to obtain objects of the same form as target observations) | Each observation implements `Base.rand`. | +| `LearnAPI.Interval` | ordered pairs of real numbers | Each observation `isa Tuple{Real,Real}`. +| `LearnAPI.SurvivalFunction` | survival functions | Observations are single-argument functions mapping `Real` to `Real`. + + +!!! warning + + The last column of the table is not part of the Learn API. -| method | fallback | -|:--------------------------------------|:---------------------------- | -[`LearnAPI.predict`](@ref) | none | -[`LearnAPI.predict_mode`](@ref) | none † | -[`LearnAPI.predict_mean`](@ref) | broadcast `Statistics.mean` | -[`LearnAPI.predict_median`](@ref) | broadcast `Statistic.median` | -[`LearnAPI.predict_joint`](@ref) | none | -[`LearnAPI.transform`](@ref) | none | -[`MLJInterface.inverse_transform`](@ref)| none | -> **† MLJ only.** MLJBase provides a fallback for `predict_mode`, which broadcasts -> `StatBase.mode` over observations returned by `LearnAPI.predict`. -## Specifics +## Operation specifics ```@docs LearnAPI.predict LearnAPI.predict_mean LearnAPI.predict_median LearnAPI.predict_joint +LearnAPI.transform ``` diff --git a/docs/src/reference.md b/docs/src/reference.md index 877ff10..221a205 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -5,10 +5,10 @@ guide see [Common Implementation Patterns](@ref). ## Models -> **Summary** In the Learn API a **model** is a Julia object whose properties are -> the hyper-parameters of some learning algorithm. Functionality is created by overloading -> methods defined by the interface and promises of certain behavior articulated by model -> traits. +> **Summary** In the Learn API a **model** is a Julia object whose properties are the +> hyper-parameters of some learning algorithm. Functionality is created by overloading +> methods defined by the interface and promises of certain behavior is articulated by +> model traits. In this document the word "model" has a very specific meaning that may differ from the reader's common understanding of the word - in statistics, for example. In this document a @@ -44,28 +44,13 @@ omitted, then one must make the declaration and overload `Base.==` in the mutable case. > **MLJ only.** The subtyping also ensures instances will be displayed according to a -> standard MLJ convention, assuming MLJ or MLJBase are loaded. +> standard MLJ convention, assuming MLJ or MLJBase is loaded. ```@docs LearnAPI.ismodel LearnAPI.Model ``` -## Data containers - -In this document a **data container** is any object implementing some kind of iteration -interface, where the length of the iteration, called the **number of observations**, is -known in advance. 
At present "some kind of iteration interface" remains undefined, but a -working definition would include the `getrows` interface from -[Tables.jl](https://github.com/JuliaData/Tables.jl) interface and/or the `getobs` interface -from [MLUtils.jl](https://github.com/JuliaML/MLUtils.jl) (the latter interface -[subsuming](https://github.com/JuliaML/MLUtils.jl/issues/61) the former at some point?). The -`getobs` interface includes a built-in implementation for any `AbstractArray`, where the -observation index is understood to be the *last* index. Unfortunately, according to this -convention, a matrix `X` in this interface, corresponds to `Tables.table(X')` in the -`getrows` interface (where observations are rows). - - ## Methods Model functionality is created by implementing: @@ -77,16 +62,17 @@ Model functionality is created by implementing: - zero or more **accessor functions** -While promises of certain behaviour are articulated using **model traits**. Examples of all -these methods given in [Anatomy of an Interface](@ref)). +Meanwhile, promises of certain behaviour are articulated using **model traits**. + +Examples of all these methods given in [Anatomy of an Interface](@ref). - [Fit, update! and ingest!](@ref): for models that "learn" (generalize to new data) - [Operations](@ref operations): `predict`, `transform` and their relatives -- [Accessor Functions](@ref): accessing byproducts of training shared by some models, such - as feature importances and training losses +- [Accessor Functions](@ref): accessing certain byproducts of training that many models + share, such as feature importances and training losses - [Model Traits](@ref): contracts for specific behaviour, such as "I am supervised" or "I predict probability distributions" diff --git a/src/operations.jl b/src/operations.jl index 22d029f..962465d 100644 --- a/src/operations.jl +++ b/src/operations.jl @@ -6,50 +6,32 @@ const PREDICT_OPERATIONS = (:predict, const OPERATIONS = (PREDICT_OPERATIONS..., :transform, :inverse_transform) const DOC_NEW_DATA = - "Here `data` is a tuple of data objects, "* + "Here `report` contains ancilliary byproducts of the computation, or "* + "is `nothing`; `data` is a tuple of data objects, "* "generally a single object representing new observations "* "not seen in training. " -# # FALLBACK HELPERS - -function _predict(model, args...) - p = predict(model, args...) - if :predict in reporting_operations(model) - return p - else - return (p, nothing) - end -end - -_compress(yhat, report) = (yhat, report) -_compress(yhat, ::Nothing) = yhat - # # METHOD STUBS/FALLBACKS """ LearnAPI.predict(model, fitted_params, data...) -Return predictions or prediction-like output, `ŷ`, for a machine learning model, `model`, -with learned parameters `fitted_params`, as returned by -[`LearnAPI.fit`](@ref). $DOC_NEW_DATA - -However, in the special case that `:predict in LearnAPI.reporting_operations(model)` is -`true`, `(ŷ, report)` is returned instead. Here `report` contains ancilliary byproducts of -computing the prediction. +Return `(ŷ, report)` where `ŷ` are the predictions, or prediction-like output, for +a machine learning model, `model`, with learned parameters `fitted_params`, as returned by +[`LearnAPI.fit`](@ref). 
$DOC_NEW_DATA


# New model implementations

$(DOC_IMPLEMENTED_METHODS(:predict))

If `LearnAPI.performance_measureable(model) == true`, then `ŷ` must be:

- either an array or table with the same number of observations as each element of `data`;
  it cannot be a lazy object, such as a `DataLoader`

- **target-like**; see [`LearnAPI.paradigm`](@ref) for specifics.

Otherwise there are no restrictions on what `predict` may return, apart from what the
implementation itself promises, by making an optional [`LearnAPI.output_scitypes`](@ref)
declaration.

By default, it is expected that `data` has length one. Otherwise,
[`LearnAPI.input_scitypes`](@ref) must be overloaded.

See also [`LearnAPI.fit`](@ref), [`LearnAPI.predict_mean`](@ref),
[`LearnAPI.predict_mode`](@ref), [`LearnAPI.predict_median`](@ref).

"""
function predict end

function DOC_PREDICT(reducer)
    operation = Symbol(string("predict_", reducer))
    extra = DOC_IMPLEMENTED_METHODS(operation, overloaded=true)
    """
        LearnAPI.predict_$reducer(model, fitted_params, data...)

    Same as [`LearnAPI.predict`](@ref), except that probabilistic predictions are
    replaced with $reducer values.

    # New model implementations

    A fallback broadcasts `$reducer` over the first return value `ŷ` of
    `LearnAPI.predict`. An algorithm that computes probabilistic predictions may already
    need to predict mean values, and so overloading this method might enable a performance
    boost.

    $extra

    See also [`LearnAPI.predict`](@ref), [`LearnAPI.fit`](@ref).

    """
end

for reducer in [:mean, :median]
    operation = Symbol(string("predict_", reducer))
    docstring = DOC_PREDICT(reducer)
    quote
        "$($docstring)"
        function $operation(args...)
            distributions, report = predict(args...)
            yhat = $reducer.(distributions)
            return (yhat, report)
        end
    end |> eval
end

"""
    LearnAPI.predict_joint(model, fitted_params, data...)

For a supervised learning model, return `(d, report)`, where `d` is a *single* probability
distribution for the sample space ``Y^n``, whose elements are `n`-dimensional vectors with
element type matching that of the training target (the second data object in
`LearnAPI.fit(model, verbosity, data...)`). Here `n` is the number of observations in
`data`, and `fitted_params` are the model's learned parameters, as returned by
[`LearnAPI.fit`](@ref). $DOC_NEW_DATA
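For example, a hypothetical Gaussian process regressor might return a single multivariate
normal distribution (an illustrative sketch only; the interface does not mandate any
particular distribution type):

```julia
d, report = LearnAPI.predict_joint(gp_model, fitted_params, Xnew)
# `d` could be, say, a `Distributions.MvNormal` object with one
# dimension per observation in `Xnew`
```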
While the interpretation of this distribution depends on the model, marginalizing
component-wise will generally deliver univariate distributions that are *correlated*, and
these will generally not agree with those returned by `LearnAPI.predict`, if also
implemented.

# New model implementations

It is not necessary that `LearnAPI.predict` be implemented, but
`LearnAPI.performance_measureable(model)` must return `true`.

$(DOC_IMPLEMENTED_METHODS(:predict_joint))

See also [`LearnAPI.fit`](@ref), [`LearnAPI.predict`](@ref).

"""
function predict_joint end

"""
    LearnAPI.transform(model, fitted_params, data...)

Return `(output, report)`, where `output` is some kind of transformation of `data`, provided
by `model`, based on the learned parameters `fitted_params`, as returned by
[`LearnAPI.fit`](@ref) (which could be `nothing` for models that do not generalize to new
data, such as "static transformers"). $DOC_NEW_DATA


# New model implementations

$(DOC_IMPLEMENTED_METHODS(:transform))

By default, it is expected that `data` has length one. Otherwise,
[`LearnAPI.input_scitypes`](@ref) must be overloaded.

See also [`LearnAPI.fit`](@ref), [`LearnAPI.predict`](@ref).

"""
function transform end

From 5f311fe7879143af083749afcba283bb7c7d33ab Mon Sep 17 00:00:00 2001
From: "Anthony D. Blaom"
Date: Mon, 5 Sep 2022 18:03:38 +1200
Subject: [PATCH 06/15] bunch of stuff

---
 docs/make.jl                             |   2 +-
 docs/src/anatomy_of_an_implementation.md | 201 +++++++++++++++--------
 docs/src/fit_update_and_ingest.md        |  65 ++++----
 docs/src/index.md                        | 110 +++++++------
 docs/src/model_traits.md                 |  21 ++-
 docs/src/operations.md                   | 113 ++++++++-----
 docs/src/reference.md                    |  35 ++--
 src/fit_update_ingest.jl                 |  54 +++---
 src/models.jl                            |  13 +-
 src/operations.jl                        |  87 +++++++---
 10 files changed, 428 insertions(+), 273 deletions(-)

diff --git a/docs/make.jl b/docs/make.jl
index ba3edac..87a44c2 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -4,7 +4,7 @@ using LearnAPI
 const REPO="github.com/JuliaAI/LearnAPI.jl"
 
 makedocs(;
-    modules=[LearnAPI],
+    modules=[LearnAPI,],
     format=Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true"),
     pages=[
         "Introduction" => "index.md",
diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md
index 1bd2ad6..2676bcb 100644
--- a/docs/src/anatomy_of_an_implementation.md
+++ b/docs/src/anatomy_of_an_implementation.md
@@ -1,14 +1,15 @@
 # Anatomy of an Implementation
 
-> **Summary.** A **model** is just a container for hyper-parameters. A basic implementation
-> for a ridge regressor requires implementing `fit` and `predict` methods dispatched on the
-> model type; `predict` is an example of an **operation**; another is `transform`. In this
-> example we also implement an **accessor function** called `feature_importance` (returning
-> the absolute values of the linear coefficients). We need trait declarations to flag the
-> model as supervised, and another to list the implemented methods. Optional traits
-> articulate the model's data type requirements and the output type of operations.
+> **Summary.** A **model** is just a container for hyper-parameters. 
A basic +> implementation of the ridge regressor requires implementing `fit` and `predict` methods +> dispatched on the model type; `predict` is an example of an **operation** (another is +> `transform`). In this example we also implement an **accessor function** called +> `feature_importance` (returning the absolute values of the linear coefficients). The +> ridge regressor has a target variable and one trait declaration flags the output of +> `predict` as being a [proxy](@ref scope) for the target. Other traits articulate the +> model's training data type requirements and the output type of `predict`. + +We begin by describing an implementation of LearnAPI.jl for basic ridge regression (no intercept) to introduce the main actors in any implementation. @@ -26,7 +27,7 @@ Next, we define a struct to store the single hyper-parameter `lambda` of this mo ```julia struct MyRidge <: LearnAPI.Model - lambda::Float64 + lambda::Float64 end ``` @@ -37,7 +38,7 @@ otherwise disruptive. If you omit the subtyping then you must declare LearnAPI.ismodel(::MyRidge) = true ``` -as a promise that instances of `MyRidge` implement the Learn API. +as a promise that instances of `MyRidge` implement LearnAPI.jl. Instances of `MyRidge` are called **models** and `MyRidge` is a **model type**. @@ -50,49 +51,49 @@ MyRidge(; lambda=0.1) = MyRidge(lambda) ## A method to fit the model A ridge regressor requires two types of data for training: **input features** `X` and a -**target** `y`. Training is implemented by overloading `fit`. Here `verbosity` is an integer +[**target**](@ref scope) `y`. Training is implemented by overloading `fit`. Here `verbosity` is an integer (`0` should train silently, unless warnings are needed): ```julia function LearnAPI.fit(model::MyRidge, verbosity, X, y) - # process input: - x = Tables.matrix(X) # convert table to matrix - features = Tables.columnnames(X) + # process input: + x = Tables.matrix(X) # convert table to matrix + features = Tables.columnnames(X) - # core solver: - coefficients = (x'x + model.lambda*I)\(x'y) + # core solver: + coefficients = (x'x + model.lambda*I)\(x'y) - # prepare output - learned parameters: - fitted_params = (; coefficients) + # prepare output - learned parameters: + fitted_params = (; coefficients) - # prepare output - model state: - state = nothing # not relevant here + # prepare output - model state: + state = nothing # not relevant here - # prepare output - byproducts of training: - feature_importances = - [features[j] => abs(coefficients[j]) for j in eachindex(features)] - sort!(feature_importances, by=last) |> reverse! - verbosity > 1 && @info "Features in order of importance: $(first.(feature_importances))" - report = (; feature_importances) + # prepare output - byproducts of training: + feature_importances = + [features[j] => abs(coefficients[j]) for j in eachindex(features)] + sort!(feature_importances, by=last) |> reverse! + verbosity > 1 && @info "Features in order of importance: $(first.(feature_importances))" + report = (; feature_importances) - return fitted_params, state, report + return fitted_params, state, report end ``` Regarding the return value of `fit`: -- The `fitted_params` is for the model's learned parameters, for passing to +- The `fitted_params` variable is for the model's learned parameters, for passing to `predict` (see below). -- The `state` variable is only relevant when additionally implementing an [`update!`](@ref) - or [`ingest!`](@ref) method (see [Fit, update! and ingest!](@ref)). 
+- The `state` variable is only relevant when additionally implementing a [`LearnAPI.update!`](@ref) + or [`LearnAPI.ingest!`](@ref) method (see [Fit, update! and ingest!](@ref)). - The `report` is for other byproducts of training, excluding the learned parameters. Notice that we have chosen here to suppose that `X` is presented as a table (rows are the observations); and we suppose `y` is a `Real` vector. (While this is typical of MLJ model -implementations, the Learn API puts no restrictions on the form of `X` and `y`.) +implementations, LearnAPI.jl puts no restrictions on the form of `X` and `y`.) ## Operations @@ -100,67 +101,72 @@ implementations, the Learn API puts no restrictions on the form of `X` and `y`.) Now we need a method for predicting the target on new input features: ```julia -LearnAPI.predict(::MyRidge, fitted_params, Xnew) = Tables.matrix(Xnew)*fitted_params.coefficients +function LearnAPI.predict(::MyRidge, fitted_params, Xnew) + Xmatrix = Tables.matrix(Xnew) + report = nothing + return Xmatrix*fitted_params.coefficients, report +end ``` -The above `predict` method is an example of an **operation**. Other operations include -`transform` and `inverse_transform` and a model can implement more than one. For example, a -K-means clustering model might implement a `transform` for dimension reduction, and a +In some models `predict` computes something of interest in addition to the target +prediction, and this `report` item is returned as the second component of the return +value. When there's nothing to report, we must return `nothing`, as here. + +Our `predict` method is an example of an **operation**. Other operations include +`transform` and `inverse_transform` and a model can implement more than one. For example, +a K-means clustering model might implement a `transform` for dimension reduction, and a `predict` to return cluster labels. ## Accessor functions The arguments of an operation are always `(model, fitted_params, data...)`. The interface also -provides **accessor functions** for extracting information from the `fitted_params` and/or -`report` that is shared by several model types. There is one for feature importances that +provides **accessor functions** for extracting information, from the `fitted_params` and/or +`report`, that is shared by several model types. There is one for feature importances that we can implement for `MyRidge`: ```julia -LearnAPI.feature_importances(::MyRidge, fitted_params, report) = report.feature_importances +LearnAPI.feature_importances(::MyRidge, fitted_params, report) = +report.feature_importances ``` -Another example of an accessor function is `training_losses` (supervised models) and -`training_scores` (outlier detection models). +Another example of an accessor function is `training_losses`. ## Model traits -In this supervised learning example, `predict` returns an object with the same type of the -*second* data argument `y` of `fit` (the target). It therefore makes sense, for example, to -apply a suitable metric (e.g., a sum of squares) to the pair `(ŷ, y)`, where `ŷ = -predict(model, fitted_params, X)`. We will flag this behavior by declaring +Our model has a target variable, in the sense outlined in [Scope and undefined +notions](@ref scope), and `predict` returns an object with exactly the same form as the +target. 
We indicate this behaviour by declaring ```julia -LearnAPI.is_supervised(::Type{<:MyRidge}) = true +LearnAPI.target_proxy_kind(::Type{<:MyRidge}) = (; predict=LearnAPI.Target()) ``` -This is an example of a **model trait** declaration. A complete list of traits and the -contracts they imply is given in [`Model traits`](@ref). +More generally, `predict` only returns a *proxy* for the target, such as probability +distributions, and we would make a different declaration here. See [Target proxies](@ref) +for details. + +`LearnAPI.target_proxy_kind` is an example of a **model trait**. A complete list of traits +and the contracts they imply is given in [Model traits](@ref). > **MLJ only.** The values of all traits constitute a model's **metadata**, which is > recorded in the searchable MLJ Model Registry, assuming the implementation-providing > package is registered there. -Since our model is supervised, we are required to implement an additional trait that -distinguishes our model from other regressors that make probabilistic or other kinds of -predictions of the target: +We also need to indicate that the target appears in training (this is a *supervised* +model) and the position of `target` within the `data` argument of `fit`: ```julia -LearnAPI.paradigm(::Type{<:MyRidge}) = Dict(:predict => :point) +LearnAPI.position_of_target(::Type{<:MyRidge}) = 2 ``` -If instead, our `predict` method would return probabilistic predictions, we would instead -return `Dict(:predict => :pdf)` or `Dict(:predict => :rand)`, depending on whether or not -`predict` returns objects implementing `Distributions.pdf` from Distributions.jl, or merely -`Base.rand`. Other options are `:interval` and `:survival_probability`. - -As explained in the introduction, the Learn API does not attempt to define strict -model "types", such as "regressor" or "clusterer". We can optionally specify suggestive +As explained in the introduction, LearnAPI.jl does not attempt to define strict model +"types", such as "regressor" or "clusterer". However, we can optionally specify suggestive keywords, as in ```julia -MLJInterface.keywords(::Type{<:MyRidge}) = [:regression,] +MLJInterface.keywords(::Type{<:MyRidge}) = (:regression,) ``` but note that this declaration promises nothing. Do `LearnAPI.keywords()` to get a list @@ -170,11 +176,11 @@ Finally, we are required to declare what methods (excluding traits) we have expl overloaded for our type: ```julia -LearnAPI.implemented_methods(::Type{<:MyRidge}) = [ - :fit, - :predict, - :feature_importances, -] +LearnAPI.implemented_methods(::Type{<:MyRidge}) = ( + :fit, + :predict, + :feature_importances, +) ``` ## Training data types @@ -206,19 +212,20 @@ Or, in other words: elements. 

-## Operation data types
+## Types for data returned by operations

-A promise that an operation, such as `predict`, returns an object of given scientific type is articulated in this way:
+A promise that an operation, such as `predict`, returns an object of given scientific type
+is articulated in this way:

```julia
-MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{<:Continuous})
+LearnAPI.return_scitypes(::Type{<:MyRidge}) = (:predict => AbstractVector{<:Continuous},)
```

-If `predict` had instead returned `Distributions.pdf`-accessible probability
-distributions, the declaration would be
+If `predict` had instead returned probability distributions, and these implement the
+`Distributions.pdf` interface, then the declaration would be

```julia
-MLJInterface.return_scitypes(::Type{<:MyRidge}) = Dict(:predict => AbstractVector{Density{<:Continuous}})
+LearnAPI.return_scitypes(::Type{<:MyRidge}) = (:predict => AbstractVector{Density{<:Continuous}},)
```

There is also an `input_scitypes` trait for operations. However, this falls back to the
scitype for the first argument of `fit`, as inferred from `fit_data_scitype` (see above). So
we need not overload it here.


## Convenience macros
+
+
+## [Illustrative fit/predict workflow](@id workflow)
+
+Here's some toy data for supervised learning:
+
+```julia
+using Tables
+
+n = 10 # number of training observations
+train = 1:6
+test = 7:10
+
+a, b, c = rand(n), rand(n), rand(n)
+X = (; a, b, c) |> Tables.rowtable
+y = 2a - b + 3c + 0.05*rand(n)
+```
+
+Instantiate a model with relevant hyperparameters:
+
+```julia
+model = MyRidge(lambda=0.5)
+```
+
+Train the model:
+
+```julia
+import LearnAPI: fit, predict, feature_importances
+
+fitted_params, state, fit_report = fit(model, 1, X[train], y[train])
+```
+
+Inspect the learned parameters and report:
+
+```julia
+@info "training outcomes" fitted_params fit_report
+```
+
+Inspect feature importances:
+
+```julia
+feature_importances(model, fitted_params, fit_report)
+```
+
+Make a prediction using new data:
+
+```julia
+yhat, predict_report = predict(model, fitted_params, X[test])
+```
+
+Compare predictions with ground truth:
+
+```julia
+deviations = yhat - y[test]
+loss = deviations .^ 2 |> sum
+@info "Sum of squares loss" loss
+```
diff --git a/docs/src/fit_update_and_ingest.md b/docs/src/fit_update_and_ingest.md
index e23c693..dd69d5d 100644
--- a/docs/src/fit_update_and_ingest.md
+++ b/docs/src/fit_update_and_ingest.md
@@ -1,51 +1,42 @@
 # Fit, update! and ingest!

-> **Summary.** All models that learn, i.e., generalize to new data, must implement `fit`;
-> the fallback, useful for so-called **static** models, performs no operation and returns
-> all `nothing`. Implement `update!` if certain hyper-parameter changes do not necessitate
-> retraining from scratch (e.g., iterative models). Implement `ingest!` to implement
-> incremental learning.
+> **Summary.** Models that learn, i.e., generalize to new data, must overload `fit`;
+> the fallback performs no operation and returns all `nothing`. Implement `update!` if
+> certain hyper-parameter changes do not necessitate retraining from scratch (e.g.,
+> increasing iteration parameters). Implement `ingest!` to support incremental learning.

| method | fallback | compulsory? | requires |
|:---------------------------|:---------------------------------------------------|-------------|-------------------|
-[`LearnAPI.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | |
-[`LearnAPI.update!`](@ref) | calls `fit` | no | `LearnAPI.fit` |
-[`LearnAPI.ingest!`](@ref)| none | no | `LearnAPI.fit` |
-
-Implement `fit` unless your model is **static**, meaning its [operations](@ref operations),
-such as `predict` and `transform`, ignore their `fitted_params` argument (which will be
-`nothing`). This is the case for models that have hyper-parameters, but do not generalize to
-new data, such as a basic DBSCAN clustering algorithm. Related:
-[`LearnAPI.reporting_operations`](@ref), [Static Models](@ref).
+[`LearnAPI.fit`](@ref) | does nothing, returns `(nothing, nothing, nothing)`| no | |
+[`LearnAPI.update!`](@ref) | calls `fit` | no | `LearnAPI.fit` |
+[`LearnAPI.ingest!`](@ref) | none | no | `LearnAPI.fit` |
+
+All three methods above return a triple `(fitted_params, state, report)` whose components
+are explained under [`LearnAPI.fit`](@ref) below. Items that might be returned in
+`report` include: feature rankings/importances, SVM support vectors, clustering centres,
+methods for visualizing training outcomes, methods for saving learned parameters in a
+custom format, degrees of freedom, deviances. Precisely what `report` includes might be
+controlled by model hyperparameters, especially if there is a performance cost to its
+inclusion.
+
+Implement `fit` unless all [operations](@ref operations), such as `predict` and
+`transform`, ignore their `fitted_params` argument (which will be `nothing`). This is the
+case for many models that have hyperparameters but do not generalize to new data, such
+as a basic DBSCAN clustering algorithm.

The `update!` method is intended for all subsequent calls to train a model *using the same
-data*, but with possibly altered hyperparameters (`model` argument). As described below, a
-fallback implementation simply calls `fit`. The main use cases are for warm-restarting
-iterative model training, and for "smart" training of composite models, such as linear
-pipelines. Here "smart" means that hyperparameter changes only trigger the retraining of
-downstream components.
-
-The `ingest!` method supports incremental learning (same hyperparameters, but new
-data). Like `update!`, it depends on the output a preceding `fit` or `ingest!` call.
+observations*, but with possibly altered hyperparameters (`model` argument). A fallback
+implementation simply calls `fit`. The main use cases for implementing `update!` are: (i)
+warm-restarting iterative models, and (ii) "smart" training of composite models, such as
+linear pipelines. Here "smart" means that hyperparameter changes only trigger the
+retraining of downstream components.

+The `ingest!` method supports incremental learning (same hyperparameters, but new training
+observations). Like `update!`, it depends on the output of a preceding `fit` or `ingest!`
+call.
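+
+For illustration only — the definitive signatures appear in the docstrings below — the
+`update!` fallback behaves like:
+
+```julia
+LearnAPI.update!(model, verbosity, fitted_params, state, data...) =
+    LearnAPI.fit(model, verbosity, data...)
+```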

```@docs
LearnAPI.fit
LearnAPI.update!
LearnAPI.ingest!
```
-
-## Further guidance on what goes where
-
-Recall that the `fitted_params` returned as part of `fit` represents everything needed by an
-[operation](@ref operations), such as [`LearnAPI.predict`](@ref).
-
-The properties of your model (typically struct fields) are *hyperparameters*, i.e., those
-parameters declared by the user ahead of time that generally affect the outcome of training
-and are not learned. 
It is okay to add "control" parameters (such a specifying whether or -not to use a GPU). Use `report` to return *everything else*. This includes: feature -rankings/importances, SVM support vectors, clustering centres, methods for visualizing -training outcomes, methods for saving learned parameters in a custom format, degrees of -freedom, deviances. If there is a performance cost to extra functionality you want to -expose, the functionality can be toggled on/off through a hyperparameter, but this should -otherwise be avoided. diff --git a/docs/src/index.md b/docs/src/index.md index d16450f..ac8951f 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -4,75 +4,91 @@ LearnAPI.jl
-A Julia interface for training and applying models in machine learning and statistics +A basic Julia interface for training and applying machine learning models +

```

+**Quick tour for developers of ML software.** For a rapid overview, by way of a sample
+implementation, see [Anatomy of an Implementation](@ref).
+
+**Quick tour for users of models implementing LearnAPI.jl.** Although primarily intended
+as a basement-level machine learning interface for developers, users can interact directly
+with LearnAPI.jl models, as illustrated [here](@ref workflow). For a more powerful
+interface built on top of LearnAPI.jl, see
+[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+
+## Summary
+
 Machine learning algorithms, also called *models*, have a complicated taxonomy. Grouping
-models into a relatively small number of types, such as "classifier" and "clusterer", and
-attempting to impose uniform behaviour within each group, is challenging. In our
-experience, it either leads to limitations on the models that can be included in a general
-interface, or additional complexity needed to cope with exceptional cases. Even if a
-complete user interface for machine learning might benefit from such groupings, a
-basement-level API for ML should, in our view, avoid them.
+models, or modelling tasks, into a relatively small number of types, such as "classifier"
+and "clusterer", and attempting to impose uniform behaviour within each group, is
+challenging. In our experience, it either leads to limitations on the models that can be
+included in a general interface, or additional complexity needed to cope with exceptional
+cases. Even if a complete user interface for machine learning might benefit from such
+groupings, a basement-level API for ML should, in our view, avoid them.

-The **Learn API** documented here is base API for machine learning that is purely
-functional with no abstract model types (apart an optional supertype `Model`). It provides
-the following methods, dispatched on model type:
+LearnAPI.jl is a base interface for machine learning algorithms in which behaviour is
+articulated using traits. It has no abstract model types, apart from an optional supertype
+`Model`. It provides the following methods, dispatched on model type:

-- `fit` for regular training
+- `fit` for regular training, overloaded if the model generalizes to new data, as in
+  classical supervised learning

- `update!` for adding model iterations, or responding efficiently to other
  post-`fit` changes in hyperparameters

- `ingest!` for incremental learning

-- **operations**, such as `predict`, `transform` and `inverse_transform` for applying the model
-  to data
+- **operations**, such as `predict`, `transform` and `inverse_transform` for applying the
+  model to data not used for training

-- common **access functions**, such as `feature_importances` and `training_losses`, for
-  extracting from training outcomes information common to particular classes of models.
+- common **access functions**, such as `feature_importances` and `training_losses`, for
+  extracting, from training outcomes, information common to particular classes of models

-- **model traits**, such as `is_supervised(model)`, for promising specific behaviour.
+- **model traits**, such as `target_proxy_kind(model)`, for promising specific behaviour

-Since this is a functional interface, `fit` returns model "state", in addition to learned
-parameters, for passing to the optional `update!` and `ingest!` methods. These three
-methods all return a `report` component, for exposing byproducts of training different
-from learned parameters. Similarly, all operations also return a `report` component,
-although this would typically be `nothing`, unless the model does not implement `fit`
-(does not generalize to new data).
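+
+For a model implementing `fit` and `predict`, the basic pattern is as follows (a sketch
+only; see [Anatomy of an Implementation](@ref) for a complete example):
+
+```julia
+fitted_params, state, report = LearnAPI.fit(model, verbosity, X, y)
+ŷ, report = LearnAPI.predict(model, fitted_params, Xnew)
+```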
+There is flexibility about how much of the interface is implemented by a given model
+object `model`. A special trait `implemented_methods(model)` declares what has been
+explicitly implemented or overloaded to work with `model`.
+
+Since this is a functional-style interface, `fit` returns model `state`, in addition to
+learned parameters, for passing to the optional `update!` and `ingest!` methods. These
+training methods also return a `report` component, for exposing byproducts of training
+different from learned parameters. Similarly, all operations also return a `report`
+component (important for models that do not generalize to new data).

-## Scope and undefined notions
+Models may or may not be supervised, and may or may not generalize to new
+observations. To ensure proper handling by client packages of probabilistic and other
+non-literal forms of target predictions (pdfs, confidence intervals, survival functions,
+etc) the kind of prediction can be flagged appropriately; see more at "target" below.

-The Learn API provides methods for training, applying, and saving machine learning models,
-and that is all. To keep it *It does not specify an interface for data access or data
-resampling*. That said, the interface references a few basic undefined notions, which some
-higher-level interface might decide to formalize:
+
+## [Scope and undefined notions](@id scope)

-- Each machine learning model's behaviour is governed by a number of user-specified
-  **hyper-parameters**.
+The Learn API provides methods for training, applying, and saving machine learning models,
+and that is all. *It does not specify an interface for data access or data
+resampling*. However, LearnAPI.jl is predicated on a few basic undefined notions (in
+boldface) which some higher-level interface might decide to formalize:

-- An object which generates ordered sequences of individual **observations** is called
-  **data**.
+- An object which generates ordered sequences of individual **observations** is
+  called **data**.

-- Information needed for training that is not a model hyper-parameter and not data is called
-  **metadata** (e.g., target class weights and group lasso feature groupings).
+- Each machine learning model's behaviour is governed by a number of user-specified
+  **hyperparameters**.

-- Some models, including but not limited to supervised models, involve **target** data, in
-  training or otherwise, and implement an operation, typically `predict`, that outputs
-  data that is target-like. To say that data is **target-like** is to say that it can be
-  paired with target data having the same number of observations to obtain useful
-  information about the model and the data that has been presented to it, typically a
-  measure of the model's expected performance on unseen data. Target-like data can take
-  various informally defined forms, such as `Deterministic`, `Distribution`, `Sampleable`,
-  `SurvivalFunction` and `Interval` detailed further under [Operations](@ref operations).
+- Information needed for training that is not a model hyperparameter and not data is called
+  **metadata** (e.g., target class weights and group lasso feature groupings).

-Regarding the last point, consider outlier detection, where target observations are either
-"outlier" or "inlier". If the detector predicts probabilities for outlierness (the
-target-like data) these can be paired with "outlier"/"inlier" labels assigned by humans,
-using, say, area under the ROC curve, to measure performance. 
Many such detectors are
-trainined without supervision.

+- Some models involve the notion of a **target** variable and generate output with the
+  same form as the target, or, more generally, some kind of target proxy, such as
+  probability distributions. A *target proxy* is something that can be *paired* with target
+  data to obtain useful information about the model and the data that has been presented
+  to it, typically a measure of the model's expected performance on unseen data. A target
+  variable is not necessarily encountered during training, i.e., target variables can make
+  sense for unsupervised models, and also for models that do not generalize to new
+  observations. For examples, and an informal classification of target proxy types, refer
+  to [Target proxies](@ref).
+

## Contents

@@ -86,7 +102,7 @@ the definitive specification of the interface is the [Reference](@ref) section.

- [Reference](@ref)

-- [Testing an implementation](@ref)
+- [Testing an Implementation](@ref)


!!! info

@@ -97,6 +113,6 @@ the definitive specification of the interface is the [Reference](@ref) section.

**Note.** The Learn API provides a foundation for the higher level "machine"
interface for user interaction in the toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) created by the same
-developers. However, the Learn API provided here is meant as a general purpose,
+developers. However, LearnAPI.jl provided here is meant as a general purpose,
standalone, lightweight API for machine learning algorithms (and has no reference to
machines).
diff --git a/docs/src/model_traits.md b/docs/src/model_traits.md
index a20a966..1e115d8 100644
--- a/docs/src/model_traits.md
+++ b/docs/src/model_traits.md
@@ -1,11 +1,16 @@
 # Model Traits

-| trait | fallback value | requires | required by |
-|:------|:---------|:---------|:---------------|
-| [`LearnAPI.ismodel`](@ref) | `false` | one of: `predict`/`predict_joint`/`transform` | all models |
-| [`LearnAPI.implemented_methods`](@ref) | `Symbol[]` | | all models |
-| [`LearnAPI.is_supervised`](@ref) | `false` | [`LearnAPI.predict`](@ref) or [`LearnAPI.predict_joint`](@ref) | [`LearnAPI.predict_joint`](@ref) |
-| [`LearnAPI.paradigm`](@ref) | `:unknown` | relevant operations | [`LearnAPI.predict`](@ref), [`MLJInterface.predict_joint`](@ref) †|
-| [`MLInteface.joint_prediction_type`](@ref) | `:unknown` | [`LearnAPI.predict_joint`](@ref) | [`LearnAPI.predict_joint`](@ref) |
+| trait | fallback value | return value | example |
+|:-------------------------------------------------|:----------------------|:--------------|:--------|
+| [`LearnAPI.ismodel`](@ref)`(model)` | `false` | is `true` for any model, as defined in [`Models`](@ref) | `true` |
+| [`LearnAPI.implemented_methods`](@ref)`(model)` | `()` | lists of all overloaded/implemented methods (traits excluded) | `(:fit, :predict)` |
+| [`LearnAPI.target_proxy_kind`](@ref)`(model)` | `()` | details form of target proxy output | `(:predict => LearnAPI.Distribution,)` |
+| [`LearnAPI.position_of_target`](@ref)`(model)` | `0` | † the positional index of the **target** in `data` in `fit(..., data...; metadata)` calls | 2 |
+| [`LearnAPI.position_of_weights`](@ref)`(model)` | `0` | † the positional index of **observation weights** in `data` in `fit(..., data...; metadata)` | 3 |
+| [`LearnAPI.keywords`](@ref)`(model)` | `()` | lists one or more suggestive model descriptors from `LearnAPI.keywords()` | `(:regressor,)` |

-† If additionally `is_supervised(model) == true`.
+† If the value is `0`, then the variable in boldface type is not supported and never
+appears in `data`. If `length(data)` exceeds the trait value, then `data` is understood to
+exclude the variable, but note that `fit` can have multiple signatures of varying lengths,
+as in `fit(model, verbosity, X, y)` and `fit(model, verbosity, X, y, w)`. A non-zero value
+is a promise that `fit` includes at least one signature in which the variable appears.
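+
+For example, a model supporting the signatures `fit(model, verbosity, X, y)` and
+`fit(model, verbosity, X, y, w)`, with `w` observation weights, would declare the
+following (a sketch, with `SomeModel` a hypothetical model type):
+
+```julia
+LearnAPI.position_of_target(::Type{<:SomeModel}) = 2
+LearnAPI.position_of_weights(::Type{<:SomeModel}) = 3
+```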
diff --git a/docs/src/operations.md b/docs/src/operations.md
index 10ea75d..9c9d546 100644
--- a/docs/src/operations.md
+++ b/docs/src/operations.md
@@ -1,14 +1,19 @@
 # [Predict and other operations](@id operations)

+> **Summary** Methods like `predict` and `transform`, which generally depend on learned
+> parameters, are called **operations**. All implemented operations must be included in
+> the output of the `implemented_methods` model trait. When an operation returns a [target
+> proxy](@ref scope), it must make a `target_proxy_kind` declaration.
+
 An *operation* is any method with signature `some_operation(model, fitted_params,
 data...)`. Here `fitted_params` is the learned parameters object, as returned by
-[`LearnAPI.fit`](@ref), which will be `nothing` if `fit` is not implemented (true for models
-that do not generalize to new data). For example, `predict` in the following code snippet is
-an operation:
+[`LearnAPI.fit`](@ref)`(model, ...)`, which will be `nothing` if `fit` is not implemented
+(true for models that do not generalize to new data). For example, `LearnAPI.predict` in
+the following code snippet is an operation:

```julia
fitted_params, state, fit_report = LearnAPI.fit(some_model, 1, X, y)
-ŷ, predict_report = predict(some_model, fitted_params, Xnew)
+ŷ, predict_report = LearnAPI.predict(some_model, fitted_params, Xnew)
```

| method | compulsory? | fallback | requires |
@@ -27,66 +32,87 @@ ŷ, predict_report = predict(some_model, fitted_params, Xnew)

## General requirements

-- Each `model` must implement at least one of: `predict`, `transform`,
-  `predict_joint`.
-
   Only implement `predict_joint` for outputting a *single* multivariate probability
-  distribution with a dimension for each input observation; see
-  [`LearnAPI.predict_joint`](@ref) for details.
-
-- Do not overload `predict_mode`, `predict_mean` or `predict_median` unless `predict` has
-  been implemented.
-
-- Do not overload `inverse_transform` unless `transform` has been implemented.
+  distribution for multiple target predictions, as described further at
+  [`LearnAPI.predict_joint`](@ref).

- Each operation explicitly implemented or overloaded must be included in the return value
  of [`LearnAPI.implemented_methods`](@ref).


-## Predict or transform?
+## Predict or transform?

-- If the model has a target, as defined under [Scope and undefined notions](@ref), then
-  only `predict` or `predict_joint` can be used to generate corresponding target-like
-  data.
+- If the model has a target, as defined under [Scope and undefined notions](@ref scope), then
+  only `predict` or `predict_joint` can be used to generate a corresponding target proxy.

- If an operation is to have an inverse operation, then it cannot be `predict` - use
  `transform` and `inverse_transform`.

+- If only a single operation is implemented, and there is no target variable, use `transform`. 
+ Here an "inverse" of `transform` is very broadly understood as any operation that can be applied to the output of `transform` to obtain an object of the same form as the input of `transform`; for example this includes one-sided inverses, and approximate one-sided -inverses. (In some API's, such an operation is called `reconstruct`.) - -In all other cases, the Learn API makes only informal stipulations on which operation to -use: - -- Clustering algorithms should use `predict` *when returning cluster labels.* (For - clusterering algorithms that perform dimension reduction, `transform` can be used.) - -- Outlier detection models should return raw scores using `transform` and use `predict` for - returning either normalized scores or "outlier"/"inlier" classifications. +inverses. + + +## Target proxies + +In the case that a model has the concept of a **target** variable, as described under +[Scope and undefined notions](@ref scope), the output of an operation may have the form of +a proxy for the target, such as a vector of truth-probabilities for binary targets. + +We assume the reader is already familiar with the notion of a target variable in +supervised learning, but target variables are not limited to supervised models. For +example, we may regard the "outlier"/"inlier" assignments in unsupervised anomaly +detection as a target. A target proxy in this example would be probabilities for +outlierness, as these can be paired with "outlier"/"inlier" labels assigned by humans, +using, say, area under the ROC curve, to quantify performance. + +Similarly, the integer labels assigned to some observations by a clustering algorithm can +be regarded as a target variable. The labels obtained can be paired with human labels +using, say, the Rand index. + +The kind of proxy one has for the target in some operation output is informally +classified by a subtype of `LearnAPI.TargetProxy`. These types are intended for dispatch +outside of LearnAPI.jl and have no fields. + +| type | form of observations | possible requirement in some external API | +|:-------------------------------:|:---------------------|:------------------------------------------| +| `LearnAPI.Target ` | same as target observations | Observations have same type as target observations. | +| `LearnAPI.Sampleable` | objects that can be sampled to obtain objects of the same form as target observations | Each observation implements `Base.rand`. | +| `LearnAPI.Distribution` | explicit probability density/mass functions with sample space all possible target observations | Observations implement `Distributions.pdf` and `Base.rand` | +| `LearnAPI.LogDistribution` | explicit log probability density/mass functions with sample space all possible target observations | Observations implement `Distributions.logpdf` and `Base.rand` | +| † `LearnAPI.Probability` | raw numerical probability or probability vector | | +| † `LearnAPI.LogProbability` | log probability or log probability vector | | +| † `LearnAPI.Parametric` | a list of parameters describing some distribution | +| `LearnAPI.Ambiguous` | same form as the (multi-class) target, but with new, unmatched labels of possibly unequal number (as in, e.g., clustering)| +| `LearnAPI.AmbiguousSampleable` | sampleable version of `Ambiguous`; see `Sampleable` above | +| `LearnAPI.AmbiguousDistribution`| pdf/pmf version of `Ambiguous`; see `Distribution` above | +| `LearnAPI.ConfidenceInterval` | confidence intervals | Each observation `isa Tuple{Real,Real}`. 
+| `LearnAPI.SurvivalFunction` | survival functions | Observations are single-argument functions mapping `Real` to `Real`.
+| `LearnAPI.SurvivalDistribution` | probability distribution for survival time | Observations have type `Distributions.ContinuousUnivariateDistribution`.
+
+† Not recommended because of ambiguities in interpretation

!!! warning

-## Paradigms for target-like output
-
-Target-like data, as defined under [Scope and undefined notions](@ref), is classified by a
-**paradigm**, which is one of the abstract types appearing in the table below.
-
-| paradigm type | form of observations | possible requirement in some external API |
-|:---------------------:|:--------------------|:------------------------------------------|
-| `LearnAPI.Deterministic` | the same form as target observations | Observations have same type as target observations. |
-| `LearnAPI.Distribution` | explicit probability/mass density functions with sample space all possible target observations | Observations implements `Distributions.pdf`. |
-| `LearnAPI.Sampleable` | objects that can be sampled to obtain objects of the same form as target observations) | Each observation implements `Base.rand`. |
-| `LearnAPI.Interval` | ordered pairs of real numbers | Each observation `isa Tuple{Real,Real}`.
-| `LearnAPI.SurvivalFunction` | survival functions | Observations are single-argument functions mapping `Real` to `Real`.

-!!! warning

-    The last column of the table is not part of the Learn API.
+    The last column of the table is not part of LearnAPI.jl.

+An operation with a target proxy as output must declare the `TargetProxy` subtype using
+the [`LearnAPI.target_proxy_kind`](@ref) trait, as in

+```julia
+LearnAPI.target_proxy_kind(::Type{<:SomeModel}) = (predict=LearnAPI.Distribution,)
+```

+### Special case of predict_joint

+If `predict_joint` is implemented, then a `target_proxy_kind` declaration is required, but
+the interpretation is slightly different. This is because the output of `predict_joint` is
+not a collection of observations but a single object. See more at [`LearnAPI.predict_joint`](@ref) below.

-## Operation specifics
+## Operation-specific details

```@docs
LearnAPI.predict
@@ -94,4 +120,5 @@ LearnAPI.predict_mean
LearnAPI.predict_median
LearnAPI.predict_joint
LearnAPI.transform
+LearnAPI.inverse_transform
```
diff --git a/docs/src/reference.md b/docs/src/reference.md
index 221a205..a412d26 100644
--- a/docs/src/reference.md
+++ b/docs/src/reference.md
@@ -1,13 +1,13 @@
 # Reference

-Here we give the definitive specification of the Learn API. For a more informal
-guide see [Common Implementation Patterns](@ref).
+Here we give the definitive specification of the interface provided by LearnAPI.jl. For a
+more informal guide see [Common Implementation Patterns](@ref).

## Models

-> **Summary** In the Learn API a **model** is a Julia object whose properties are the
+> **Summary** In LearnAPI.jl a **model** is a Julia object whose properties are the
> hyper-parameters of some learning algorithm. Functionality is created by overloading
-> methods defined by the interface and promises of certain behavior is articulated by
+> methods defined by the interface and promises of certain behavior are articulated by
> model traits.

In this document the word "model" has a very specific meaning that may differ from the
@@ -18,10 +18,10 @@ learning algorithm that are accessible as named properties of the model, as in
hyper-parameters.

It is supposed that making copies of model objects is a cheap operation. Consequently,
-*learned* parameters, such as coefficients in a linear model, or weights in a neural network
-(the `fitted_params` appearing in [Fit, update! and ingest!](@ref)) are not expected to be
-part of a model. Storing learned parameters in a model is not explicitly ruled out, but
-doing so might lead to performance issues in packages adopting the Learn API.
+*learned* parameters, such as weights in a neural network (the `fitted_params` described
+in [Fit, update! and ingest!](@ref)) are not expected to be part of a model. Storing
+learned parameters in a model is not explicitly ruled out, but doing so might lead to
+performance issues in packages adopting LearnAPI.jl.

Two models with the same type should be `==` if and only if all their hyper-parameters are
`==`. Of course, a hyper-parameter could be another model.

@@ -41,7 +41,7 @@ omitted, then one must make the declaration

`LearnAPI.ismodel(::SomeType) = true`

-and overload `Base.==` in the mutable case.
+and overload `Base.==` if the type is mutable.
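+
+For example, a hypothetical implementation choosing not to subtype `LearnAPI.Model`
+might include declarations like these (a sketch):
+
+```julia
+mutable struct MyTransformer
+    degree::Int
+end
+
+LearnAPI.ismodel(::MyTransformer) = true
+Base.:(==)(m1::MyTransformer, m2::MyTransformer) = m1.degree == m2.degree
+```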

> **MLJ only.** The subtyping also ensures instances will be displayed according to a
> standard MLJ convention, assuming MLJ or MLJBase is loaded.

```@docs
LearnAPI.Model
```

## Methods

-Model functionality is created by implementing:
+None of the methods described in the linked sections below are compulsory, but any
+implemented or overloaded method that is not a model trait must be added to the return
+value of [`LearnAPI.implemented_methods`](@ref), as in

-- zero or more of the training methods, `fit`, `update!` and `ingest!` (the second and third
-  require the first)
-
-- zero or more **operations**, like `predict`
-
-- zero or more **accessor functions**
-
-Meanwhile, promises of certain behaviour are articulated using **model traits**.
+```julia
+LearnAPI.implemented_methods(::Type{<:SomeModelType}) = (:fit, :predict)
+```
+
+Assuming `transform` is implemented, the map
+
+```julia
+data -> first(inverse_transform(model, fitted_params, data))
+```
+
+will be an inverse, approximate inverse, right inverse, or approximate right inverse, for
+the map
+
+```julia
+data -> first(transform(model, fitted_params, data))
+```
+
+For example, if `transform` corresponds to a projection, `inverse_transform` is the
+corresponding embedding.
+
+
+# New model implementations
+
+$(DOC_IMPLEMENTED_METHODS(:transform))

-`Unsupervised` models may implement the `inverse_transform` operation.
+See also [`LearnAPI.fit`](@ref), [`LearnAPI.predict`](@ref).

"""
function inverse_transform end

-# models can optionally overload these for enable serialization in a
-# custom format:
function save end
function restore end
From c72ee93476bc2518b4290c4e5a26a751a5337375 Mon Sep 17 00:00:00 2001
From: "Anthony D. 
Blaom" Date: Mon, 5 Sep 2022 18:03:57 +1200 Subject: [PATCH 07/15] add blank doc files for each common interface pattern --- docs/src/patterns/classifiers.md | 1 + docs/src/patterns/clusterering.md | 1 + docs/src/patterns/dimension_reduction.md | 1 + docs/src/patterns/incremental_models.md | 1 + docs/src/patterns/iterative_models.md | 1 + docs/src/patterns/learning_a_probability_distribution.md | 1 + docs/src/patterns/missing_value_imputation.md | 1 + docs/src/patterns/outlier_detection.md | 1 + docs/src/patterns/regressors.md | 1 + docs/src/patterns/static_transformers.md | 1 + docs/src/patterns/supervised_bayesian_models.md | 1 + docs/src/patterns/survival_analysis.md | 1 + docs/src/patterns/time_series_classifiction.md | 1 + docs/src/patterns/time_series_forecasting.md | 1 + 14 files changed, 14 insertions(+) create mode 100644 docs/src/patterns/classifiers.md create mode 100644 docs/src/patterns/clusterering.md create mode 100644 docs/src/patterns/dimension_reduction.md create mode 100644 docs/src/patterns/incremental_models.md create mode 100644 docs/src/patterns/iterative_models.md create mode 100644 docs/src/patterns/learning_a_probability_distribution.md create mode 100644 docs/src/patterns/missing_value_imputation.md create mode 100644 docs/src/patterns/outlier_detection.md create mode 100644 docs/src/patterns/regressors.md create mode 100644 docs/src/patterns/static_transformers.md create mode 100644 docs/src/patterns/supervised_bayesian_models.md create mode 100644 docs/src/patterns/survival_analysis.md create mode 100644 docs/src/patterns/time_series_classifiction.md create mode 100644 docs/src/patterns/time_series_forecasting.md diff --git a/docs/src/patterns/classifiers.md b/docs/src/patterns/classifiers.md new file mode 100644 index 0000000..3571bc7 --- /dev/null +++ b/docs/src/patterns/classifiers.md @@ -0,0 +1 @@ +# Classifiers diff --git a/docs/src/patterns/clusterering.md b/docs/src/patterns/clusterering.md new file mode 100644 index 0000000..1a27dd8 --- /dev/null +++ b/docs/src/patterns/clusterering.md @@ -0,0 +1 @@ +# Clusterering diff --git a/docs/src/patterns/dimension_reduction.md b/docs/src/patterns/dimension_reduction.md new file mode 100644 index 0000000..3174adb --- /dev/null +++ b/docs/src/patterns/dimension_reduction.md @@ -0,0 +1 @@ +# Dimension Reduction diff --git a/docs/src/patterns/incremental_models.md b/docs/src/patterns/incremental_models.md new file mode 100644 index 0000000..54095fa --- /dev/null +++ b/docs/src/patterns/incremental_models.md @@ -0,0 +1 @@ +# Incremental Models diff --git a/docs/src/patterns/iterative_models.md b/docs/src/patterns/iterative_models.md new file mode 100644 index 0000000..af52482 --- /dev/null +++ b/docs/src/patterns/iterative_models.md @@ -0,0 +1 @@ +# Iterative Models diff --git a/docs/src/patterns/learning_a_probability_distribution.md b/docs/src/patterns/learning_a_probability_distribution.md new file mode 100644 index 0000000..19a53b8 --- /dev/null +++ b/docs/src/patterns/learning_a_probability_distribution.md @@ -0,0 +1 @@ +# Learning a Probability Distribution diff --git a/docs/src/patterns/missing_value_imputation.md b/docs/src/patterns/missing_value_imputation.md new file mode 100644 index 0000000..cf93d83 --- /dev/null +++ b/docs/src/patterns/missing_value_imputation.md @@ -0,0 +1 @@ +# Missing Value Imputation diff --git a/docs/src/patterns/outlier_detection.md b/docs/src/patterns/outlier_detection.md new file mode 100644 index 0000000..36fd8ca --- /dev/null +++ b/docs/src/patterns/outlier_detection.md @@ -0,0 
+1 @@ +# Outlier Detection diff --git a/docs/src/patterns/regressors.md b/docs/src/patterns/regressors.md new file mode 100644 index 0000000..0b6163f --- /dev/null +++ b/docs/src/patterns/regressors.md @@ -0,0 +1 @@ +# Regressors diff --git a/docs/src/patterns/static_transformers.md b/docs/src/patterns/static_transformers.md new file mode 100644 index 0000000..f413050 --- /dev/null +++ b/docs/src/patterns/static_transformers.md @@ -0,0 +1 @@ +# Static Transformers diff --git a/docs/src/patterns/supervised_bayesian_models.md b/docs/src/patterns/supervised_bayesian_models.md new file mode 100644 index 0000000..7819b1d --- /dev/null +++ b/docs/src/patterns/supervised_bayesian_models.md @@ -0,0 +1 @@ +# Supervised Bayesian Models diff --git a/docs/src/patterns/survival_analysis.md b/docs/src/patterns/survival_analysis.md new file mode 100644 index 0000000..804292d --- /dev/null +++ b/docs/src/patterns/survival_analysis.md @@ -0,0 +1 @@ +# Survival Analysis diff --git a/docs/src/patterns/time_series_classifiction.md b/docs/src/patterns/time_series_classifiction.md new file mode 100644 index 0000000..d1a40d7 --- /dev/null +++ b/docs/src/patterns/time_series_classifiction.md @@ -0,0 +1 @@ +# Time Series Classifiction diff --git a/docs/src/patterns/time_series_forecasting.md b/docs/src/patterns/time_series_forecasting.md new file mode 100644 index 0000000..f2d6b32 --- /dev/null +++ b/docs/src/patterns/time_series_forecasting.md @@ -0,0 +1 @@ +# Time Series Forecasting From bd5ae54fcc02ead6fdc3113de285c2fcd58a3c76 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Tue, 6 Sep 2022 14:45:02 +1200 Subject: [PATCH 08/15] more stuff --- docs/src/anatomy_of_an_implementation.md | 5 ++-- docs/src/index.md | 30 ++++++++++++------------ docs/src/operations.md | 17 ++++++++------ 3 files changed, 28 insertions(+), 24 deletions(-) diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index 2676bcb..5c6d36c 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -92,8 +92,9 @@ Regarding the return value of `fit`: - The `report` is for other byproducts of training, excluding the learned parameters. Notice that we have chosen here to suppose that `X` is presented as a table (rows are the -observations); and we suppose `y` is a `Real` vector. (While this is typical of MLJ model -implementations, LearnAPI.jl puts no restrictions on the form of `X` and `y`.) +observations); and we suppose `y` is a `Real` vector. This is not a restriction on types +placed by LearnAPI.jl. However, we can articulate our model's particular type requirements +with the [`LearnAPI.fit_data_scitype`](@ref) trait; see [Training data types](@ref) below. ## Operations diff --git a/docs/src/index.md b/docs/src/index.md index ac8951f..88a0960 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -13,19 +13,21 @@ implementation, see [Anatomy of an Implementation](@ref). **Quick tour for users of models implementing LearnAPI.jl.** Although primarily intended as a basement-level machine learning interface for developers, users can interact directly -with LearnAPI.jl models, as illustrated [here](@ref workflow). For a more powerful -interface built on top of LearnAPI.jl, see -[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/). +with LearnAPI.jl models, as illustrated [here](@ref workflow). -## Summary +## Approach Machine learning algorithms, also called *models*, have a complicated taxonomy. 
Grouping models, or modelling tasks, into a relatively small number of types, such as "classifier" and "clusterer", and attempting to impose uniform behaviour within each group, is -challenging. In our experience, it either leads to limitations on the models that can be -included in a general interface, or additional complexity needed to cope with exceptional -cases. Even if a complete user interface for machine learning might benefit from such -groupings, a basement-level API for ML should, in our view, avoid them. +challenging. In our experience developing the [MLJ +ecosystem](https://github.com/alan-turing-institute/MLJ.jl), this either leads to +limitations on the models that can be included in a general interface, or additional +complexity needed to cope with exceptional cases. Even if a complete user interface for +machine learning might benefit from such groupings, a basement-level API for ML should, in +our view, avoid them. + +## Summary LearnAPI.jl is a base interface for machine learning algorithms in which behaviour is articulated using traits. It has no abstract model types, apart from an optional supertype @@ -68,7 +70,7 @@ etc) the kind of prediction can be flagged appropriately; see more at "target" b The Learn API provides methods for training, applying, and saving machine learning models, and that is all. *It does not specify an interface for data access or data resampling*. However, LearnAPI.jl is predicated on a few basic undefined notions (in -boldface) which some higher-level interface might decide to formalize: +**boldface**) which some higher-level interface might decide to formalize: - An object which generates ordered sequences of individual **observations** is called **data**. @@ -110,9 +112,7 @@ the definitive specification of the interface is the [Reference](@ref) section. consulting the guide or reference sections. -**Note.** The Learn API provides a foundation for the higher level "machine" -interface for user interaction in the toolbox -[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) created by the same -developers. However, LearnAPI.jl provided here is meant as a general purpose, -standalone, lightweight API for machine learning algorithms (and has no reference to -machines). +**Note.** In the future, LearnAPI.jl will become the new foundation for the +[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) toolbox created by the same +developers. However, LearnAPI.jl is meant as a general purpose, standalone, lightweight +API for machine learning algorithms (and has no reference to the "machines" used there). diff --git a/docs/src/operations.md b/docs/src/operations.md index 9c9d546..d7c34c5 100644 --- a/docs/src/operations.md +++ b/docs/src/operations.md @@ -58,8 +58,9 @@ inverses. ## Target proxies In the case that a model has the concept of a **target** variable, as described under -[Scope and undefined notions](@ref scope), the output of an operation may have the form of -a proxy for the target, such as a vector of truth-probabilities for binary targets. +[Scope and undefined notions](@ref scope), the output of `predict` or `predict_joint` may +have the form of a proxy for the target, such as a vector of truth-probabilities for +binary targets. We assume the reader is already familiar with the notion of a target variable in supervised learning, but target variables are not limited to supervised models. For @@ -72,9 +73,9 @@ Similarly, the integer labels assigned to some observations by a clustering algo be regarded as a target variable. 
The labels obtained can be paired with human labels using, say, the Rand index. -The kind of proxy one has for the target in some operation output is informally -classified by a subtype of `LearnAPI.TargetProxy`. These types are intended for dispatch -outside of LearnAPI.jl and have no fields. +The kind of proxy one has is informally classified by a subtype of +`LearnAPI.TargetProxy`. These types are intended for dispatch outside of LearnAPI.jl and +have no fields. | type | form of observations | possible requirement in some external API | |:-------------------------------:|:---------------------|:------------------------------------------| @@ -84,7 +85,7 @@ outside of LearnAPI.jl and have no fields. | `LearnAPI.LogDistribution` | explicit log probability density/mass functions with sample space all possible target observations | Observations implement `Distributions.logpdf` and `Base.rand` | | † `LearnAPI.Probability` | raw numerical probability or probability vector | | | † `LearnAPI.LogProbability` | log probability or log probability vector | | -| † `LearnAPI.Parametric` | a list of parameters describing some distribution | +| † `LearnAPI.Parametric` | a list of parameters (e.g., mean and variance) describing some distribution | | `LearnAPI.Ambiguous` | same form as the (multi-class) target, but with new, unmatched labels of possibly unequal number (as in, e.g., clustering)| | `LearnAPI.AmbiguousSampleable` | sampleable version of `Ambiguous`; see `Sampleable` above | | `LearnAPI.AmbiguousDistribution`| pdf/pmf version of `Ambiguous`; see `Distribution` above | @@ -92,7 +93,9 @@ outside of LearnAPI.jl and have no fields. | `LearnAPI.SurvivalFunction` | survival functions | Observations are single-argument functions mapping `Real` to `Real`. | `LearnAPI.SurvivalDistribution` | probability distribution for survival time | Observations have type `Distributions.ContinuousUnivariateDistribution`. -† Not recommended because of ambiguities in interpretation +> **† MLJ only.** To avoid [ambiguities in +> representation](https://github.com/alan-turing-institute/MLJ.jl/blob/dev/paper/paper.md#a-unified-approach-to-probabilistic-predictions-and-their-evaluation), +> these options are disallowed, in favour of the preceding alternatives. !!! warning From 3bb82450adeb3b8756762e3650dede06f81f1f69 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Wed, 7 Sep 2022 08:04:14 +1200 Subject: [PATCH 09/15] more stuff --- README.md | 11 +--------- docs/src/anatomy_of_an_implementation.md | 6 +++--- docs/src/model_traits.md | 2 +- docs/src/operations.md | 8 +++---- src/LearnAPI.jl | 1 + src/models.jl | 27 +----------------------- 6 files changed, 11 insertions(+), 44 deletions(-) diff --git a/README.md b/README.md index 26fd613..29aa3c5 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,10 @@ # LearnAPI.jl -A Julia interface for training and applying models in machine learning and statistics +A Julia interface for training and applying machine learning models. 🚧 -Hyperlinks in this README.md do not work. | Linux | Coverage | | :------------ | :------- | @@ -13,15 +12,7 @@ Hyperlinks in this README.md do not work. [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/) - **Status.** Proposal, unregistered. -This repository is to provide a general purpose machine learning interface. It is designed -based on experiences of developers of MLJ's [MLJModelInterface.jl]() which it will -eventually replace, but hopes to be useful more generally. 
The design is not yet fixed and -comments (posted as issues) are welcome. - -To get an idea of the kinds of models we are aiming to indclude, see [Common Implementation -Patterns](). diff --git a/docs/src/anatomy_of_an_implementation.md b/docs/src/anatomy_of_an_implementation.md index 5c6d36c..70c1513 100644 --- a/docs/src/anatomy_of_an_implementation.md +++ b/docs/src/anatomy_of_an_implementation.md @@ -134,14 +134,14 @@ report.feature_importances Another example of an accessor function is `training_losses`. -## Model traits +## [Model traits](@id traits) Our model has a target variable, in the sense outlined in [Scope and undefined notions](@ref scope), and `predict` returns an object with exactly the same form as the target. We indicate this behaviour by declaring ```julia -LearnAPI.target_proxy_kind(::Type{<:MyRidge}) = (; predict=LearnAPI.Target()) +LearnAPI.target_proxy_kind(::Type{<:MyRidge}) = (; predict=LearnAPI.TrueTarget()) ``` More generally, `predict` only returns a *proxy* for the target, such as probability @@ -149,7 +149,7 @@ distributions, and we would make a different declaration here. See [Target proxi for details. `LearnAPI.target_proxy_kind` is an example of a **model trait**. A complete list of traits -and the contracts they imply is given in [Model traits](@ref). +and the contracts they imply is given in [Model Traits](@ref). > **MLJ only.** The values of all traits constitute a model's **metadata**, which is > recorded in the searchable MLJ Model Registry, assuming the implementation-providing diff --git a/docs/src/model_traits.md b/docs/src/model_traits.md index 1e115d8..5d196d5 100644 --- a/docs/src/model_traits.md +++ b/docs/src/model_traits.md @@ -4,7 +4,7 @@ |:-------------------------------------------------|:----------------------|:--------------|:--------| | [`LearnAPI.ismodel`](@ref)`(model)` | `false` | is `true` for any model, as defined in [`Models`](@ref) | `true` | | [`LearnAPI.implemented_methods`](@ref)`(model)` | `()` | lists of all overloaded/implemented methods (traits excluded) | `(:fit, :predict)` | -| [`LearnAPI.target_proxy_kind`](@ref)`(model)` | `()` | details form of target proxy output | `(:predict => LearnAPI.Distribution,)` | +| [`LearnAPI.target_proxy_kind`](@ref)`(model)` | `()` | details form of target proxy output | `(predict= LearnAPI.Distribution,)` | | [`LearnAPI.position_of_target`](@ref)`(model)` | `0` | † the positional index of the **target** in `data` in `fit(..., data...; metadata)` calls | 2 | | [`LearnAPI.position_of_weights`](@ref)`(model)` | `0` | † the positional index of **observation weights** in `data` in `fit(..., data...; metadata)` | 3 | | [`LearnAPI.keywords`](@ref)`(model)` | `()` | lists one or more suggestive model descriptors from `LearnAPI.keywords()` | (:regressor, ) | diff --git a/docs/src/operations.md b/docs/src/operations.md index d7c34c5..b0ceb31 100644 --- a/docs/src/operations.md +++ b/docs/src/operations.md @@ -79,16 +79,16 @@ have no fields. | type | form of observations | possible requirement in some external API | |:-------------------------------:|:---------------------|:------------------------------------------| -| `LearnAPI.Target ` | same as target observations | Observations have same type as target observations. | +| `LearnAPI.TrueTarget` | same as target observations | Observations have same type as target observations. | | `LearnAPI.Sampleable` | objects that can be sampled to obtain objects of the same form as target observations | Each observation implements `Base.rand`. 
| | `LearnAPI.Distribution` | explicit probability density/mass functions with sample space all possible target observations | Observations implement `Distributions.pdf` and `Base.rand` | | `LearnAPI.LogDistribution` | explicit log probability density/mass functions with sample space all possible target observations | Observations implement `Distributions.logpdf` and `Base.rand` | | † `LearnAPI.Probability` | raw numerical probability or probability vector | | | † `LearnAPI.LogProbability` | log probability or log probability vector | | | † `LearnAPI.Parametric` | a list of parameters (e.g., mean and variance) describing some distribution | -| `LearnAPI.Ambiguous` | same form as the (multi-class) target, but with new, unmatched labels of possibly unequal number (as in, e.g., clustering)| -| `LearnAPI.AmbiguousSampleable` | sampleable version of `Ambiguous`; see `Sampleable` above | -| `LearnAPI.AmbiguousDistribution`| pdf/pmf version of `Ambiguous`; see `Distribution` above | +| `LearnAPI.LabelAmbiguous` | same form as the (multi-class) target, but with new, unmatched labels of possibly unequal number (as in, e.g., clustering)| +| `LearnAPI.LabelAmbiguousSampleable` | sampleable version of `LabelAmbiguous`; see `Sampleable` above | +| `LearnAPI.LabelAmbiguousDistribution`| pdf/pmf version of `LabelAmbiguous`; see `Distribution` above | | `LearnAPI.ConfidenceInterval` | confidence intervals | Each observation `isa Tuple{Real,Real}`. | `LearnAPI.SurvivalFunction` | survival functions | Observations are single-argument functions mapping `Real` to `Real`. | `LearnAPI.SurvivalDistribution` | probability distribution for survival time | Observations have type `Distributions.ContinuousUnivariateDistribution`. diff --git a/src/LearnAPI.jl b/src/LearnAPI.jl index f393769..7bfc10f 100644 --- a/src/LearnAPI.jl +++ b/src/LearnAPI.jl @@ -5,5 +5,6 @@ using Statistics include("models.jl") include("fit_update_ingest.jl") include("operations.jl") +include("model_traits.jl") end diff --git a/src/models.jl b/src/models.jl index 10ba360..5f355b2 100644 --- a/src/models.jl +++ b/src/models.jl @@ -18,29 +18,4 @@ See also [`LearnAPI.ismodel`](@ref). """ abstract type Model <: MLType end -""" - ismodel(m) - -Returns `true` exactly when `m` is a *model*, as defined in the LearnAPI.jl -documentation. In particular, this means: - -- `m` is an object whose properties, as returned by `getproperty(m, field)` for `field in - propertynames(m)`, represent the hyper-parameters of a machine learning algorithm. - -- If `n` is another model, then `m == n` if and only if `typeof(n) == typeof(m)` and - corresponding properties are `==`. - -- `m` correctly implements zero or more methods from LearnAPI.jl. See the LearnAPI.jl - documentation for details. - - -# New model implementations - -Either declare `NewModelType <: LearnAPI.Model` or `LearnAPI.model(::NewModelType) = -true`. - -See also [`LearnAPI.Model`](@ref). - -""" -ismodel(::Any) = false -ismodel(::Model) = true +# See src/model_traits.jl for the `ismodel` trait definition. From fae00808e47e7bfcb56b54c49b0c14d7489c8935 Mon Sep 17 00:00:00 2001 From: "Anthony D. 
Blaom" Date: Wed, 7 Sep 2022 08:10:43 +1200 Subject: [PATCH 10/15] fix doc generation in ci.yml --- .github/workflows/ci.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index dd09a67..ca229e2 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -99,9 +99,9 @@ jobs: julia --project=docs -e ' if ENV["BUILD_DOCS"] == "true" using Documenter: doctest - using MLInterface + using LearnAPI @info "attempting to run the doctests" - doctest(MLInterface) + doctest(LearnAPI) else @info "skipping the doctests" end' From 90fa328596c851cd6a7ac072db54f874ed0c979c Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Wed, 7 Sep 2022 08:11:53 +1200 Subject: [PATCH 11/15] update readme --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 29aa3c5..5ad104f 100644 --- a/README.md +++ b/README.md @@ -10,9 +10,10 @@ A Julia interface for training and applying machine learning models. | :------------ | :------- | | [![Build Status](https://github.com/JuliaAI/LearnAPI.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/LearnAPI.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/LearnAPI.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/LearnAPI.jl?branch=master) | -[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/) +**Status.** Proposal, unregistered. See the documentation below for details. + -**Status.** Proposal, unregistered. +[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/) From 549974bb25406996695387c7fa9da2bde2ef6722 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Wed, 7 Sep 2022 08:16:25 +1200 Subject: [PATCH 12/15] update readme --- README.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 5ad104f..b8d32eb 100644 --- a/README.md +++ b/README.md @@ -6,14 +6,11 @@ A Julia interface for training and applying machine learning models. 🚧 -| Linux | Coverage | -| :------------ | :------- | -| [![Build Status](https://github.com/JuliaAI/LearnAPI.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/LearnAPI.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/LearnAPI.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/LearnAPI.jl?branch=master) | - -**Status.** Proposal, unregistered. See the documentation below for details. - - +[![Build Status](https://github.com/JuliaAI/LearnAPI.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/LearnAPI.jl/actions) +[![Coverage](https://codecov.io/gh/JuliaAI/LearnAPI.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/LearnAPI.jl?branch=master) [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/LearnAPI.jl/stable/) +Please refer to the documentation for a detailed preview of what this package proposes to +offer. From 92a51ff9fdc014c8e106cd7b11440e62165051b5 Mon Sep 17 00:00:00 2001 From: "Anthony D. 
Blaom" Date: Wed, 7 Sep 2022 08:19:33 +1200 Subject: [PATCH 13/15] add forgotten file --- src/model_traits.jl | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 src/model_traits.jl diff --git a/src/model_traits.jl b/src/model_traits.jl new file mode 100644 index 0000000..6e77b2d --- /dev/null +++ b/src/model_traits.jl @@ -0,0 +1,26 @@ +""" + ismodel(m) + +Returns `true` exactly when `m` is a *model*, as defined in the LearnAPI.jl +documentation. In particular, this means: + +- `m` is an object whose properties, as returned by `getproperty(m, field)` for `field in + propertynames(m)`, represent the hyper-parameters of a machine learning algorithm. + +- If `n` is another model, then `m == n` if and only if `typeof(n) == typeof(m)` and + corresponding properties are `==`. + +- `m` correctly implements zero or more methods from LearnAPI.jl. See the LearnAPI.jl + documentation for details. + + +# New model implementations + +Either declare `NewModelType <: LearnAPI.Model` or `LearnAPI.model(::NewModelType) = +true`. + +See also [`LearnAPI.Model`](@ref). + +""" +ismodel(::Any) = false +ismodel(::Model) = true From a85f7db43ff7e1401cf6ba557c57a1cbd6698586 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Wed, 7 Sep 2022 08:37:59 +1200 Subject: [PATCH 14/15] trivial commit --- src/LearnAPI.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/LearnAPI.jl b/src/LearnAPI.jl index 7bfc10f..3394ee2 100644 --- a/src/LearnAPI.jl +++ b/src/LearnAPI.jl @@ -6,5 +6,5 @@ include("models.jl") include("fit_update_ingest.jl") include("operations.jl") include("model_traits.jl") - + end From 2013b15218365e5aeba29591afa510fc11351fc4 Mon Sep 17 00:00:00 2001 From: "Anthony D. Blaom" Date: Wed, 7 Sep 2022 09:21:31 +1200 Subject: [PATCH 15/15] try push_preview=true --- docs/make.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/make.jl b/docs/make.jl index 87a44c2..0b6918e 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -22,5 +22,5 @@ makedocs(; deploydocs( ; repo=REPO, devbranch="dev", - push_preview=false, + push_preview=true, )