Implementing a logging interface for MLFlow #912

pebeto · 2023-07-01T12:58:46Z

In support of this proposal, this PR shows the possibility to adapt a logging platform to MLJBase functionality. This completely modular implementation allows the creation and addition of multiple logging interfaces in the future by the use of package extensions. This work includes a way to connect MLJ with MLFlow, and opens new opportunities with other platforms (i.e. Neptune, SageMaker, etc...). The changes will be listed below:

A general package extension named LoggersExt, that allows the possibility to extend main methods functionality and loading desired behaviors dynamically
A minimal constructor named MLFlowLogger inside MLJBase source that will be extended when MLFlowClient is loaded, bringing a MLFlowInstance struct with the configurations needed by the user.
A save function extension that logs the model hyperparameters and saves it into the experiment artifact location.
An evaluate! function extension that saves the model hyperparameters and measures. This implementation is a mockup that will be changed in the future.
A new way to flat parameters from a NamedTuple, preserving names as key trees.

julia> t = (X = (x = 1, y = 2), Y = 3)
julia> flat_params(t)
LittleDict{...} with 3 entries:
"X_x" => 1
"X_y" => 2
"Y"   => 3

Save resampling strategy and it's parameters (as run tags)
Adapt the functionality for composite models
Adapt the functionality for tuned models
Adapt the functionality for iterative models

Additional information:

ablaom · 2023-07-03T04:19:59Z

@pebeto Great work! Good to have this contribution and the ball rolling on this project.

A new way to flat parameters from a NamedTuple, preserving names as key trees.

We should adapt this to model structs, right? Or are you planning (nested) model structs -> nested named tuples -> flat named tuples? I suggest double underscores for the compostite names as model property names often have single underscores in them already.

codecov-commenter · 2023-07-04T00:59:40Z

Codecov Report

Merging #912 (c60140d) into dev (f808e98) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##              dev     #912      +/-   ##
==========================================
+ Coverage   87.77%   87.78%   +0.01%     
==========================================
  Files          40       40              
  Lines        3656     3659       +3     
==========================================
+ Hits         3209     3212       +3     
  Misses        447      447

Files Changed	Coverage Δ
src/composition/models/stacking.jl	`95.32% <ø> (ø)`
src/resampling.jl	`91.50% <100.00%> (+0.06%)`	⬆️

src/resampling.jl

src/loggers.jl

ablaom · 2023-07-19T22:13:57Z

Just starting a review of this great new feature. Let me relate my initial experience, which suggests to me some changes in the user API.

I first looked at the evaluate! doc-strings to see clues of how to incorporate logging, but because the Logger ref is dead, I could not discover the mlflow_logger constructor I needed to learn about. (Found it from the code.)
I knew (only because you have explained it to me) that I must install and launch the (python) mlflow application. I wasn't exactly sure how to launch it, but the mlflow docs seemed to suggest entering > mlflow ui would work, so I did that. A url appeared in the log and I successfully navigated my browser to that location, "http://127.0.0.1:5000". I know launching the application is a system specific operation; still, an concrete example of how to do this on a linux-like machine would have been helpful in the mlflow_logger docstring.
I used the default constructor logger=mlflow_logger(), which created baseuri = "localhost:5000". I then tried to use evaluate! with logger=logger and I got a long intimidating unhelpful error. Here's a part of it:

ERROR: HTTP.Exceptions.StatusError(403, "GET", "/api/2.0/mlflow/experiments/get-by-name?experiment_name=MLJ%20experiments", HTTP.Messages.Response:
"""
HTTP/1.1 403 Forbidden
Content-Length: 0
Server: AirTunes/620.8.2
""")
Stacktrace:
 [1] getexperiment(mlf::MLFlow, experiment_name::String)
@ MLFlowClient ~/.julia/packages/MLFlowClient/yjzJX/src/experiments.jl:71

I eventually figure out that I need to explicitly specify the baseuri given to me by the mlflow application and that worked (after I figured out that the kwarg name is base_uri, not baseuri). Then all was well.

Following this experience, I have some suggestions. Let me know what you think about them:

Make baseuri a compulsory positional argument of mlflow_logger. This is less convenient if localhost:5000 happens to work for a user, but it didn't in my case. Also, logically, since the name of server is really controlled by the mlflow app, it seems to me that logically the MLJ user needs to specify this information. Also, I think a common gotcha for users will be not realising they need to launch mlflow manually. Having to specify the uri forces them to confront that issue.
If it is possible, catch invalidly set baseuri values and throw a more helpful error message. If this is very difficult to do, then we can use throw/catch construct (always a method of last resort, in my view, but I can imagine it's hard to say what is a "valid" uri on the Julia side.)

Project.toml

ext/LoggersExt/LoggersExt.jl

ext/LoggersExt/mlflow.jl

ext/LoggersExt/utils.jl

src/resampling.jl

src/utilities.jl

ext/LoggersExt/mlflow.jl

test/extensions/loggers.jl

ablaom · 2023-07-20T00:53:46Z

test/extensions/loggers.jl

+
+        @testset "save" begin
+            run = MLJBase.save(logger, mach)
+            experiment = getexperiment(logger.client, run.info.experiment_id)


These tests are getting me thinking about how the user should do this.

I accept that we can assume the user is familiar with MLFlowClient.jl API, so run.info.experiment_id is okay.
But she does not know to get inside logger, as in logger.client? So, perhaps we need to add and document a method
client(logger::MLFlowLogger) = logger.client? And any other access methods the user might need.

Or, maybe better is to overload MLFlowClient.getexperiment in this way:

""" getexperiment(logger::MLFlowLogger, run) Calls `getexperiment(client, run)` where `client` is the client associated with `logger`. """ MLFlowClient.getexperiment(logger::MLFlowLogger, run) = MLFlowClient.getexperiment(logger.client, run)

Or we could do both - provide a client accessor function and do the overloading.

These comments obviously apply to other MLFlowClient methods.

Our MLFlowLogger object will be used just to retrieve the related experiments. Crossing that line, we are just going to create auxiliary functions (like client(logger::MLFlowLogger) = logger.client that simplifies the calls from MLFlowClient. I don't think this could be a good idea because, by the nature of package extensions, we will import MLFlowClient and the end-user will be able to call anything from that package.

test/extensions/loggers.jl

ablaom

Okay, was great to play around with this new feature. Very cool. Thanks a great start on this project! 🎉

In addition to addressing my inline comments, can we please have:

A unit test for flat_params
Some tests on the client side of:
- measures and measurements
- parameters
  You may want to upgrade your ConstantClassifier to something more complicated like, model = Standardizer() |> DecisionTreeClassifier() to get some nested parameters to check. I'm pretty sure the testing module Models provides these two components.
The refactor of Resampler that we discussed in our July 18th call.
serialisation and deserialization of a machine as artifact.

And I think we need to have a call to discuss that last one.

…I#916

…t inside representation

pebeto · 2023-08-06T03:37:49Z

@ablaom, I'm moving the extension into an external package that will be a dependency of MLJ.jl instead of MLJBase. MLJFlow.jl will contain all the required logic related to the connection between all the MLJ dependencies and MLFlow (via MLFlowClient.jl).
This PR now contains the generalization of the methods that will be used by different logging platforms.

…aluation report

src/MLJBase.jl

src/resampling.jl

ablaom

I guess you're still working on this. I don't see a call to log_evaluation in any of the evaluate! methods, as discussed on slack, and copied below:

# In MLJBase:
log_evaluation(logger::Nothing, e) = nothing # (publicly extendable method)
function evaluate!(..., logger)
    <create evaluation object `e`>
    log_evaluation(logger, e)
end

# In MLJFlow.jl:
struct MLFlowLogger
    ...
end

MLJBase.log_evaluation(logger::MLFlowLogger, e)
    <log to the client in `logger`; you can find everything you need in `e`>
end

ablaom · 2023-08-08T21:49:29Z

Also, I no longer see the logger kwarg additions to evaluate/evaluate! (with the nothing fallback values).

And I don't see any refactoring of Resampler, as previously discussed.

…g_evaluation

ablaom

Looking good. We still have the refactoring of Resampler to go, yes?

pebeto · 2023-08-16T15:38:41Z

This implementation has changed over time. Closing this PR in favor of #925

pebeto marked this pull request as draft July 3, 2023 01:23

ablaom mentioned this pull request Jul 3, 2023

Proposal for mlflow integration JuliaAI/MLJ.jl#1029

Open

pebeto force-pushed the dev branch 2 times, most recently from 18e60c7 to 7977fd2 Compare July 17, 2023 07:15

pebeto added the WIP label Jul 18, 2023

ablaom marked this pull request as ready for review July 19, 2023 21:05

ablaom marked this pull request as draft July 19, 2023 21:06