From a2c99991622a0660511d010279ea2cc872ea6dae Mon Sep 17 00:00:00 2001
From: "Anthony D. Blaom"
Date: Thu, 17 Aug 2023 14:36:09 +1200
Subject: [PATCH] polish readme and extend readme example

---
 README.md | 131 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 104 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index 1f2eb19..a16846a 100644
--- a/README.md
+++ b/README.md
@@ -6,13 +6,14 @@
 [MLJ](https://github.com/alan-turing-institute/MLJ.jl) is a Julia framework
 for combining and tuning machine learning models. MLJFlow is a package that extends
-the MLJ capabilities to use [mlflow](https://mlflow.org/) as a backend for
+the MLJ capabilities to use [MLflow](https://mlflow.org/) as a backend for
 model tracking and experiment management. To be specific, MLJFlow provides a
-close to zero-preparation to use mlflow with MLJ; by the usage of function
-extensions that automate the mlflow cycle (create experiment, create run, log
+close-to-zero-preparation way to use MLflow with MLJ, via function
+extensions that automate the MLflow cycle (create experiment, create run, log
 metrics, log parameters, log artifacts, etc.).
 
 ## Background
+
 This project is part of the GSoC 2023 program. The proposal description can be
 found [here](https://summerofcode.withgoogle.com/programs/2023/projects/iRxuzeGJ).
 The entire workload is divided into three different repositories:
@@ -20,30 +21,106 @@ The entire workload is divided into three different repositories:
 [MLFlowClient.jl](https://github.com/JuliaAI/MLFlowClient.jl) and this one.
 
 ## Features
-- [x] mlflow cycle automation (create experiment, create run, log metrics, log
-  parameters, log artifacts, etc.)
-- [x] Wrapper type used by MLJ to store mlflow metadata and client instance
-  from MLFlowClient.jl
-- [x] MLJ extended functions to allow mlflow logging
-- [x] Polished compatibility with composed models
-- [ ] Polished compatibility with tuned models
-- [ ] Polished compatibility with iterative models
-
-## Example
-```julia
-# We first define a logger instance, providing the mlflow server address.
-# The experiment name and artifact location are optional.
-logger = MLFlowLogger("http://localhost:5000";
-    experiment_name="MLJFlow tests",
-    artifact_location="./mlj-test")
-
-X, y = make_moons(100) # X is a 100x2 matrix, y is a 100-element vector
-
-# Writing a normal MLJ workflow
+
+- [x] MLflow cycle automation (create experiment, create run, log metrics, log parameters,
+  log artifacts, etc.)
+
+- [x] Provides a wrapper `MLFlowLogger` for MLFlowClient.jl clients and associated
+  metadata; instances of this type are valid "loggers", which can be passed to MLJ
+  functions supporting the `logger` keyword argument.
+
+- [x] Provides MLflow integration with MLJ's `evaluate!`/`evaluate` methods (model
+  **performance evaluation**)
+
+- [x] Extends MLJ's `MLJ.save` method, to save trained machines as retrievable MLflow
+  client artifacts
+
+- [ ] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
+  tuning** workflows)
+
+- [ ] Provides MLflow integration with MLJ's `IteratedModel` wrapper (to log **controlled
+  iteration** of tree gradient boosters, neural networks, and other iterative models)
+
+- [x] Plays well with **composite models** (pipelines, stacks, etc.)
+
+
+## Examples
+
+### Logging a model performance evaluation
+
+The example below assumes the user is familiar with basic MLflow concepts. We suppose an
+MLflow API service is running on a local server, with address "http://127.0.0.1:5000". (In a
+shell/console, run `mlflow server` to launch such an MLflow service locally.)
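+
+For instance, assuming the `mlflow` Python package is installed and its executable is on
+the system `PATH`, the service can be started from a shell with
+`mlflow server --host 127.0.0.1 --port 5000`, or, as a convenience sketch (not part of
+MLJFlow), directly from Julia:
+
+```julia
+# Launch a local MLflow tracking server; this blocks the current task until the
+# server process is killed, so run it in a separate shell or Julia session.
+run(`mlflow server --host 127.0.0.1 --port 5000`)
+```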
+
+Refer to the [MLflow documentation](https://www.mlflow.org/docs/latest/index.html) for
+necessary background.
+
+In addition to the packages listed on the first line below, we assume
+MLJDecisionTreeInterface is in the user's active Julia package environment.
+
+```julia
+using MLJBase, MLJFlow, MLJModels
+```
+
+We first define a logger, providing the address of our running MLflow service. The
+experiment name and artifact location are optional.
+
+```julia
+logger = MLFlowLogger(
+    "http://127.0.0.1:5000";
+    experiment_name="MLJFlow test",
+    artifact_location="./mlj-test"
+)
+```
+
+Next, grab some synthetic data and choose an MLJ model:
+
+```julia
+X, y = make_moons(100) # a table and a vector with 100 rows
 DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
-dtc_machine = machine(dtc, X, y)
+model = DecisionTreeClassifier(max_depth=4)
+```
+
+Now we call `evaluate` as usual but provide the `logger` as a keyword argument:
+
+```julia
+evaluate(model, X, y, resampling=CV(nfolds=5), measures=[LogLoss(), Accuracy()], logger=logger)
+```
+
+Navigate to "http://127.0.0.1:5000" in your browser and select the "Experiment" matching
+the name above ("MLJFlow test"). Select the single run displayed to see the logged results
+of the performance evaluation.
+
+
+### Saving and retrieving trained machines as MLflow artifacts
+
+Let's train the model on all data and save the trained machine as an MLflow artifact:
+
+```julia
+mach = machine(model, X, y) |> fit!
+run = MLJBase.save(logger, mach)
+```
 
-# Passing the logger to the machine is enough to enable mlflow logging
-e1 = evaluate!(dtc_machine, resampling=CV(),
-    measures=[LogLoss(), Accuracy()], verbosity=1, logger=logger)
+Notice that in this case `MLJBase.save` returns a run (an instance of `MLFlowRun` from
+MLFlowClient.jl).
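+
+As a rough sketch, the returned run carries the metadata MLflow assigned to it; the field
+names below are assumptions based on MLFlowClient.jl and may change:
+
+```julia
+# Inspect the run's MLflow metadata (assumed MLFlowClient.jl field names):
+run.info.run_id  # unique identifier assigned by the MLflow service
+run.info.status  # lifecycle status of the run
+```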
+
+To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to
+know the MLflow service that our `logger` wraps:
+
+```julia
+service = MLJFlow.service(logger) # DOESN'T WORK YET!
+```
+
+And we reconstruct our trained machine thus:
+
+```julia
+using MLFlowClient
+artifacts = MLFlowClient.listartifacts(service, run)
+mach2 = machine(artifacts[1].filepath)
+```
+
+We can predict using the deserialized machine:
+
+```julia
+predict(mach2, X)
+```
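+
+As a quick sanity check (a sketch, assuming the steps above succeeded), the deserialized
+machine should reproduce the predictions of the original, so the comparison below should
+return `true` for a faithful round trip:
+
+```julia
+# Compare point predictions of the original and restored machines:
+predict_mode(mach, X) == predict_mode(mach2, X)
+```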