Proposal to add support to allow non-generalizing models to contribute to a machine's report #806

ablaom · 2022-07-12T01:48:13Z

In response to design discussions at JuliaAI/MLJ.jl#950 and JuliaAI/MLJ.jl#852.

Requires:

Add reporting_operations to list of model traits MLJModelInterface.jl#158

Related: JuliaAI/StatisticalTraits.jl#25

To do:

Bump compat for MLJModelInterface
Create issues at MLJ to support the new trait and to update the MLJ manual

Context. Some clustering models (e.g., DBSCAN) and some imputing models do not generalize to new data. That is, there is no training step, only a transformation determined completely by the input data. From the point of view of model composition, such models fit most naturally into the current Static model type, for which fit is a no-op and all the heavy lifting occurs in the transform method. However, at present only the fit method can contribute a report, which, at the level of machines, is accessible via report(mach). So extra byproducts of the transformation computation (e.g., point types in DBSCAN) cannot easily be exposed to the user.
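To make the limitation concrete, here is a minimal plain-Julia sketch (no MLJ dependency) of a hypothetical non-generalizing transformer. The names BatchOutlierFlagger, toy_fit and toy_transform are illustrative stand-ins, not part of MLJ. Each point is judged against the batch it arrives in, so there is nothing to learn at fit time, and the batch statistics computed during the transformation have nowhere to go if only fit may report.

```julia
using Statistics

# Hypothetical non-generalizing transformer (illustrative names, not MLJ API):
# a point is flagged relative to the batch it arrives in.
struct BatchOutlierFlagger
    k::Float64   # threshold in standard deviations (hypothetical parameter)
end

# `fit` is a no-op: no fitted parameters and an empty report
toy_fit(::BatchOutlierFlagger) = (nothing, NamedTuple())

# All computation happens here. `mu` and `sigma` are useful byproducts,
# but when only `fit` may report they are simply discarded.
function toy_transform(model::BatchOutlierFlagger, X)
    mu, sigma = mean(X), std(X)
    return abs.(X .- mu) .> model.k * sigma
end

flagger = BatchOutlierFlagger(1.5)
toy_transform(flagger, [1.0, 1.0, 1.0, 1.0, 10.0])    # flags the 10.0
toy_transform(flagger, [10.0, 10.0, 10.0, 10.0, 1.0]) # flags the 1.0, not the 10.0s
```

The value 10.0 is an outlier in one batch but not in the other, which is the sense in which such a model does not generalize to new data.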

It is proposed that we add a new model trait, reporting_operations, listing those operations (such as :transform) that are understood to return two pieces of information when called on a model instance: the usual output and some report data (a named tuple). In those cases, calling the operation on a machine returns only the output, while the report is merged into the machine's report.
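The dispatch described above can be sketched in plain Julia as follows. This is a hypothetical mock, not MLJBase's implementation: ToyMachine, toy_reporting_operations, call_operation and the Doubler model are all illustrative names. An operation listed in the trait returns an (output, report) pair; the machine keeps the output for the caller and merges the report into its own.

```julia
# Hypothetical mock of the proposed mechanism; NOT MLJBase's actual code.

# Trait mirroring the proposed `reporting_operations`; default: no reporting operations
toy_reporting_operations(::Type) = ()
toy_reporting_operations(model) = toy_reporting_operations(typeof(model))

mutable struct ToyMachine{M}
    model::M
    report::NamedTuple
end
ToyMachine(model) = ToyMachine(model, NamedTuple())

# A toy model whose `transform` reports; its `predict` does not
struct Doubler end
toy_reporting_operations(::Type{Doubler}) = (:transform,)
toy_transform(::Doubler, X) = (2 .* X, (; n = length(X)))
toy_predict(::Doubler, X) = sum(X)

# Call operation `op` (registered under `opname`) on the machine
function call_operation(mach::ToyMachine, opname::Symbol, op, X)
    result = op(mach.model, X)
    if opname in toy_reporting_operations(mach.model)
        output, rep = result                     # split output from report
        mach.report = merge(mach.report, rep)    # report goes to the machine
        return output                            # caller sees output only
    end
    return result
end

mach = ToyMachine(Doubler())
call_operation(mach, :transform, toy_transform, [1, 2, 3])  # output only
mach.report                                                  # holds `n`
call_operation(mach, :predict, toy_predict, [1, 2, 3])      # passes through
```

Keeping the decision in a type-level trait means an operation's calling convention is fixed per model type, so a machine can tell how to unpack the result without inspecting the returned value.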

While the main use case is Static models, this enhancement could be applied to any model, and such models can be used in composite models (e.g., pipelines) with their reports accessible as usual.
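As an illustration of how operational reports could surface in a composite, here is a plain-Julia sketch of one plausible scheme: a hypothetical two-step pipeline in which each step returns (output, report) and each report is nested under the step's name. The step names and the nesting layout are assumptions for illustration only; MLJ's actual report layout for pipelines may differ.

```julia
# Hypothetical steps; each returns (output, report) like a reporting operation
scale_step(X) = (2 .* X, (; scale_factor = 2))
shift_step(X) = (X .+ 1, (; offset = 1))

# Run the steps in order, nesting each step's report under its (hypothetical) name
function run_pipeline(steps, X)
    reports = NamedTuple()
    for (name, f) in steps
        X, rep = f(X)
        reports = merge(reports, NamedTuple{(name,)}((rep,)))
    end
    return X, reports
end

out, rep = run_pipeline([:scale => scale_step, :shift => shift_step], [1, 2, 3])
```

Nesting by component name keeps reports from different components from clobbering one another when they use the same field names.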

The proposal in action

Implementation

using MLJBase

mutable struct StaticKefir <: Static
    alpha::Float64 # must be positive for `inverse_transform` below to invert `transform`
end
MLJBase.reporting_operations(::Type{<:StaticKefir}) = (:transform, :inverse_transform)

# piece-wise linear function that is linear only for `alpha=1`:
kefir(x, alpha) = x > 0 ? x * alpha : x / alpha

MLJBase.transform(model::StaticKefir, _, X) = (
    broadcast(kefir, X, model.alpha),
    (; first = first(X)),                          # <-----------------   report component 
)

MLJBase.inverse_transform(model::StaticKefir, _, W) = (
    broadcast(kefir, W, 1/(model.alpha)),
    (; last = last(W)),                          # <-----------------   report component 
)
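The inverse_transform above undoes transform by applying kefir with 1/alpha. The following standalone check, which copies the kefir definition from the implementation, confirms the round trip for a positive alpha. It also shows why alpha must be positive for this particular inverse: with a negative alpha, kefir is still a bijection, but kefir with 1/alpha is not its inverse.

```julia
# Same piece-wise linear map as in the implementation above
kefir(x, alpha) = x > 0 ? x * alpha : x / alpha

X = [-3.0, -1.0, 0.0, 1.0, 2.5]
alpha = 2.0
W = kefir.(X, alpha)          # what `transform` computes
Xback = kefir.(W, 1 / alpha)  # what `inverse_transform` computes; recovers X

# With a negative alpha the same recipe fails:
# kefir(1.0, -2.0) == -2.0, but kefir(-2.0, -0.5) == 4.0, not 1.0
bad = kefir(kefir(1.0, -2.0), 1 / -2.0)
```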

User workflow

julia> model = StaticKefir(2);

julia> mach = machine(model);  # remember there is no training data to attach to a machine for a `Static` model

julia> transform(mach, [1, 2, 3])  # no need to `fit!` a `Static` model
3-element Vector{Float64}:
 2.0
 4.0
 6.0

julia> report(mach)
(first = 1,)

julia> inverse_transform(mach, [2, 4, 6])
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> report(mach)
(first = 1, last = 6)
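The accumulation seen in the two report(mach) calls above matches the semantics of Base.merge on named tuples. The snippet below reproduces it, assuming (this is not confirmed by the PR text) that the machine combines reports with merge. Under that assumption, a later call to the same operation would overwrite its own fields while leaving fields from other operations intact.

```julia
# Assumption: reports are combined with Base.merge; MLJBase may differ in detail.
report_after_transform = (first = 1,)                # contributed by `transform`
merged = merge(report_after_transform, (last = 6,))  # `inverse_transform` adds `last`

# Under merge semantics, a second `transform` call on data starting with 10
# would replace `first` and keep `last`, preserving the original field order
again = merge(merged, (first = 10,))
```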

If you don't need the report, there's a one-liner:

julia> transform(machine(StaticKefir(2)), [1, 2, 3])
3-element Vector{Float64}:
 2.0
 4.0
 6.0

ablaom · 2022-07-12T01:52:28Z

cc @sylvaticus @davnn @juliohm @CameronBieganek

ablaom · 2022-07-12T02:38:18Z

cc @pazzo83

ablaom · 2022-07-14T06:04:34Z

I've now tested this using MLJTestIntegration.jl, which tests integration with the wider MLJ ecosystem, and all is good.

codecov-commenter · 2022-07-14T07:48:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.70%. Comparing base (ab8d12c) to head (8af858b).
Report is 420 commits behind head on dev.


@@            Coverage Diff             @@
##              dev     #806      +/-   ##
==========================================
+ Coverage   85.61%   85.70%   +0.08%     
==========================================
  Files          36       36              
  Lines        3477     3497      +20     
==========================================
+ Hits         2977     2997      +20     
  Misses        500      500


ablaom added 4 commits July 12, 2022 12:54

get Static models to amend reports where supported. Untested

9d30b0e

tweak

bc043b8

add further support for operations that return a "report" value

a3f9637

rm debugging @show line

54c7d7f

ablaom marked this pull request as draft July 12, 2022 01:52

change format of report field for Composites to fix hackiness

1aa3360

ablaom mentioned this pull request Jul 14, 2022

add Hierarchical Clustering & some docstring fixes JuliaAI/MLJClusteringInterface.jl#9

Merged

bump compat MLJModelInterface = "1.6", StatisticalTraits = "3.2"

8af858b

ablaom marked this pull request as ready for review July 14, 2022 07:28

ablaom mentioned this pull request Jul 14, 2022

Update manual re new reporting_operations trait JuliaAI/MLJ.jl#956

Closed

ablaom merged commit 9b6ef64 into dev Jul 14, 2022

ablaom deleted the operational-reports branch July 14, 2022 08:21

This was referenced Jul 14, 2022

For a 0.20.12 release #808

Merged

Issue to trigger releases #345

Closed

ablaom mentioned this pull request Aug 24, 2022

Add interface for DBSCAN JuliaAI/MLJClusteringInterface.jl#17

Merged

ablaom mentioned this pull request Sep 16, 2022

Add new option for exporting learning networks as stand-alone composite model types #841

Merged


This was referenced Nov 3, 2022

Re-implement TransformedTargetModel as NetworkComposite model #857

Merged

Re-implement pipelines as NetworkComposite models #858

Merged

Latest release breaks MCMCDiagnosticTools #863

Closed
