-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing a logging interface for MLFlow #912
Conversation
@pebeto Great work! Good to have this contribution and the ball rolling on this project.
We should adapt this to model structs, right? Or are you planning (nested) model structs -> nested named tuples -> flat named tuples? I suggest double underscores for the compostite names as model property names often have single underscores in them already. |
Codecov Report
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. @@ Coverage Diff @@
## dev #912 +/- ##
==========================================
+ Coverage 87.77% 87.78% +0.01%
==========================================
Files 40 40
Lines 3656 3659 +3
==========================================
+ Hits 3209 3212 +3
Misses 447 447
|
18e60c7
to
7977fd2
Compare
Just starting a review of this great new feature. Let me relate my initial experience, which suggests to me some changes in the user API.
Following this experience, I have some suggestions. Let me know what you think about them:
|
test/extensions/loggers.jl
Outdated
|
||
@testset "save" begin | ||
run = MLJBase.save(logger, mach) | ||
experiment = getexperiment(logger.client, run.info.experiment_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are getting me thinking about how the user should do this.
I accept that we can assume the user is familiar with MLFlowClient.jl API, so run.info.experiment_id
is okay.
But she does not know to get inside logger
, as in logger.client
? So, perhaps we need to add and document a method
client(logger::MLFlowLogger) = logger.client
? And any other access methods the user might need.
Or, maybe better is to overload MLFlowClient.getexperiment
in this way:
"""
getexperiment(logger::MLFlowLogger, run)
Calls `getexperiment(client, run)` where `client` is the client associated with `logger`.
"""
MLFlowClient.getexperiment(logger::MLFlowLogger, run) = MLFlowClient.getexperiment(logger.client, run)
Or we could do both - provide a client
accessor function and do the overloading.
These comments obviously apply to other MLFlowClient methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our MLFlowLogger object will be used just to retrieve the related experiments. Crossing that line, we are just going to create auxiliary functions (like client(logger::MLFlowLogger) = logger.client
that simplifies the calls from MLFlowClient. I don't think this could be a good idea because, by the nature of package extensions, we will import MLFlowClient and the end-user will be able to call anything from that package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, was great to play around with this new feature. Very cool. Thanks a great start on this project! 🎉
In addition to addressing my inline comments, can we please have:
- A unit test for
flat_params
- Some tests on the client side of:
- measures and measurements
- parameters
You may want to upgrade yourConstantClassifier
to something more complicated like,model = Standardizer() |> DecisionTreeClassifier()
to get some nested parameters to check. I'm pretty sure the testing moduleModels
provides these two components.
- The refactor of
Resampler
that we discussed in our July 18th call. - serialisation and deserialization of a machine as artifact.
And I think we need to have a call to discuss that last one.
…t inside representation
@ablaom, I'm moving the extension into an external package that will be a dependency of MLJ.jl instead of MLJBase. MLJFlow.jl will contain all the required logic related to the connection between all the MLJ dependencies and MLFlow (via MLFlowClient.jl). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're still working on this. I don't see a call to log_evaluation
in any of the evaluate!
methods, as discussed on slack, and copied below:
# In MLJBase:
log_evaluation(logger::Nothing, e) = nothing # (publicly extendable method)
function evaluate!(..., logger)
<create evaluation object `e`>
log_evaluation(logger, e)
end
# In MLJFlow.jl:
struct MLFlowLogger
...
end
MLJBase.log_evaluation(logger::MLFlowLogger, e)
<log to the client in `logger`; you can find everything you need in `e`>
end
Also, I no longer see the And I don't see any refactoring of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. We still have the refactoring of Resampler
to go, yes?
This implementation has changed over time. Closing this PR in favor of #925 |
In support of this proposal, this PR shows the possibility to adapt a logging platform to MLJBase functionality. This completely modular implementation allows the creation and addition of multiple logging interfaces in the future by the use of package extensions. This work includes a way to connect MLJ with MLFlow, and opens new opportunities with other platforms (i.e. Neptune, SageMaker, etc...). The changes will be listed below:
save
function extension that logs the model hyperparameters and saves it into the experiment artifact location.evaluate!
function extension that saves the model hyperparameters and measures. This implementation is a mockup that will be changed in the future.Additional information: