Use repl language tag for sample #1107

Merged: 26 commits, May 19, 2024
Commits (26):
a89faf7  Use repl language tag for sample (abhro, Apr 22, 2024)
8e45385  Update language tags for code samples (abhro, Apr 22, 2024)
fddc289  Follow blue style in docs/src/working_with_categorical_data.md (abhro, Apr 24, 2024)
3d6d15f  Update mlj_cheatsheet.md (abhro, Apr 29, 2024)
ae28151  Consistenly use @example in common_mlj_workflows.md (abhro, Apr 30, 2024)
9f274ad  Fix @example namespace in common workflows (abhro, May 3, 2024)
367db46  Break up predicting transformers into separate @example blocks (abhro, May 3, 2024)
f86b01b  Use @example instead of pre-built repl sample in learning_networks.md (abhro, May 3, 2024)
dc71382  Merge branch 'dev' into patch-1 (abhro, May 3, 2024)
ce4bce2  Merge branch 'dev' into patch-1 (abhro, May 11, 2024)
211bcf9  Do mechanical fixes of spacing, semicolons, and punc (abhro, May 15, 2024)
c7b5d3a  Fix indentation of markdown line (abhro, May 15, 2024)
925ec42  Move hidden example block to setup (abhro, May 15, 2024)
2a1202f  Pull code sample into list (abhro, May 15, 2024)
f8518f4  Use proper markdown lists (abhro, May 15, 2024)
ad9129b  Use example block for workflows (abhro, May 15, 2024)
da2e45a  Remove lambdas (abhro, May 15, 2024)
72f2be2  Use repl blocks for user defined models (abhro, May 15, 2024)
c24a96b  Use bigger fences for cheatsheet code (abhro, May 15, 2024)
331bac8  Promote headers in cheatsheet (abhro, May 15, 2024)
0acc876  Use Clustering.jl instead of ParallelKMeans (abhro, May 15, 2024)
18e9c9f  Remove unsupported use of info() from cheatsheet (abhro, May 15, 2024)
739ca21  Remove comments to have not as wide code lines (abhro, May 15, 2024)
f211322  Add description of data coercion in cheatsheet (abhro, May 15, 2024)
d079644  Update docs/src/mlj_cheatsheet.md (abhro, May 15, 2024)
650ebbd  Remove other occurence of `info` on measure (abhro, May 16, 2024)
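For readers skimming the diffs below, a minimal sketch of the convention these commits apply: plain `julia` fences hold copy-pasteable source, while REPL transcripts (lines with `julia>` prompts and printed output) get the `julia-repl` tag, so syntax highlighters do not try to parse prompts and output as code. The model name here is reused from the diffs purely as an illustration.

````markdown
<!-- copy-pasteable source: plain `julia` fence -->
```julia
using MLJ
Tree = @load DecisionTreeClassifier
```

<!-- REPL transcript with prompts and printed output: `julia-repl` fence -->
```julia-repl
julia> using MLJ

julia> Tree = @load DecisionTreeClassifier
```
````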
24 changes: 11 additions & 13 deletions docs/src/about_mlj.md
100755 → 100644
@@ -1,6 +1,6 @@
# About MLJ

-MLJ (Machine Learning in Julia) is a toolbox written in Julia
+MLJ (Machine Learning in Julia) is a toolbox written in Julia
providing a common interface and meta-algorithms for selecting,
tuning, evaluating, composing and comparing [over 180 machine learning
models](@ref model_list) written in Julia and other languages. In
@@ -22,8 +22,7 @@ The first code snippet below creates a new Julia environment
[Installation](@ref) for more on creating a Julia environment for use
with MLJ.

-Julia installation instructions are
-[here](https://julialang.org/downloads/).
+Julia installation instructions are [here](https://julialang.org/downloads/).

```julia
using Pkg
@@ -44,7 +43,7 @@ Loading and instantiating a gradient tree-boosting model:
using MLJ
Booster = @load EvoTreeRegressor # loads code defining a model type
booster = Booster(max_depth=2) # specify hyper-parameter at construction
-booster.nrounds=50 # or mutate afterwards
+booster.nrounds = 50 # or mutate afterwards
```

This model is an example of an iterative model. As it stands, the
@@ -92,7 +91,7 @@ it "self-tuning":
```julia
self_tuning_pipe = TunedModel(model=pipe,
tuning=RandomSearch(),
-ranges = max_depth_range,
+ranges=max_depth_range,
resampling=CV(nfolds=3, rng=456),
measure=l1,
acceleration=CPUThreads(),
@@ -105,12 +104,12 @@ Loading a selection of features and labels from the Ames
House Price dataset:

```julia
-X, y = @load_reduced_ames;
+X, y = @load_reduced_ames
```
Evaluating the "self-tuning" pipeline model's performance using 5-fold
cross-validation (implies multiple layers of nested resampling):

-```julia
+```julia-repl
julia> evaluate(self_tuning_pipe, X, y,
measures=[l1, l2],
resampling=CV(nfolds=5, rng=123),
@@ -155,8 +154,7 @@ Extract:

* Consistent interface to handle probabilistic predictions.

-* Extensible [tuning
-  interface](https://github.com/JuliaAI/MLJTuning.jl),
+* Extensible [tuning interface](https://github.com/JuliaAI/MLJTuning.jl),
to support a growing number of optimization strategies, and designed
to play well with model composition.

@@ -229,19 +227,19 @@ installed in a new
[environment](https://julialang.github.io/Pkg.jl/v1/environments/) to
avoid package conflicts. You can do this with

-```julia
+```julia-repl
julia> using Pkg; Pkg.activate("my_MLJ_env", shared=true)
```

Installing MLJ is also done with the package manager:

-```julia
+```julia-repl
julia> Pkg.add("MLJ")
```

**Optional:** To test your installation, run

-```julia
+```julia-repl
julia> Pkg.test("MLJ")
```

@@ -252,7 +250,7 @@ environment to make model-specific code available. This
happens automatically when you use MLJ's interactive load command
`@iload`, as in

-```julia
+```julia-repl
julia> Tree = @iload DecisionTreeClassifier # load type
julia> tree = Tree() # instance
```
2 changes: 1 addition & 1 deletion docs/src/adding_models_for_general_use.md
100755 → 100644
@@ -5,4 +5,4 @@ suitable for addition to the MLJ Model Registry, consult the [MLJModelInterface.
documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).

For quick-and-dirty user-defined models see [Simple User Defined
-Models](simple_user_defined_models.md).
+Models](simple_user_defined_models.md).
Empty file modified docs/src/api.md
100755 → 100644
57 changes: 27 additions & 30 deletions docs/src/common_mlj_workflows.md
@@ -23,31 +23,27 @@ MLJ_VERSION
## Data ingestion

```@setup workflows
-# to avoid RDatasets as a doc dependency:
+# to avoid RDatasets as a doc dependency, generate synthetic data with
+# similar parameters, with the first four rows mimicking the original dataset
+# for display purposes
color_off()
import DataFrames
-channing = (Sex = rand(["Male","Female"], 462),
-            Entry = rand(Int, 462),
-            Exit = rand(Int, 462),
-            Time = rand(Int, 462),
-            Cens = rand(Int, 462)) |> DataFrames.DataFrame
+channing = (Sex = [repeat(["Male"], 4)..., rand(["Male","Female"], 458)...],
+            Entry = Int32[782, 1020, 856, 915, rand(733:1140, 458)...],
+            Exit = Int32[909, 1128, 969, 957, rand(777:1207, 458)...],
+            Time = Int32[127, 108, 113, 42, rand(0:137, 458)...],
+            Cens = Int32[1, 1, 1, 1, rand(0:1, 458)...]) |> DataFrames.DataFrame
coerce!(channing, :Sex => Multiclass)
```


```julia
import RDatasets
channing = RDatasets.dataset("boot", "channing")
```

-julia> first(channing, 4)
-4×5 DataFrame
- Row │ Sex   Entry  Exit   Time   Cens
-     │ Cat…  Int32  Int32  Int32  Int32
-─────┼──────────────────────────────────
-   1 │ Male    782    909    127      1
-   2 │ Male   1020   1128    108      1
-   3 │ Male    856    969    113      1
-   4 │ Male    915    957     42      1
+```@example workflows
+first(channing, 4) |> pretty
+```

Inspecting metadata, including column scientific types:
@@ -61,17 +57,17 @@ Horizontally splitting data and shuffling rows.
Here `y` is the `:Exit` column and `X` a table with everything else:

```@example workflows
-y, X = unpack(channing, ==(:Exit), rng=123);
+y, X = unpack(channing, ==(:Exit), rng=123)
nothing # hide
```

Here `y` is the `:Exit` column and `X` everything else except `:Time`:

```@example workflows
-y, X = unpack(channing,
-              ==(:Exit),
-              !=(:Time);
-              rng=123);
+y, X = unpack(channing,
+              ==(:Exit),
+              !=(:Time);
+              rng=123);
scitype(y)
```

@@ -115,7 +111,7 @@ nothing # hide
Or, if already horizontally split:

```@example workflows
-(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.6, multi=true, rng=123)
+(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.6, multi=true, rng=123)
```


@@ -171,7 +167,7 @@ nothing # hide

## Instantiating a model

-*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)
+*Reference:* [Getting Started](@ref), [Loading Model Code](@ref)

Assumes `MLJDecisionTreeClassifier` is in your environment. Otherwise, try interactive
loading with `@iload`:
@@ -183,7 +179,7 @@ tree = Tree(min_samples_split=5, max_depth=4)

or

-```@julia
+```julia
tree = (@load DecisionTreeClassifier)()
tree.min_samples_split = 5
tree.max_depth = 4
@@ -208,7 +204,7 @@ Do `measures()` to list all losses and scores and their aliases, or refer to the
StatisticalMeasures.jl [docs](https://juliaai.github.io/StatisticalMeasures.jl/dev/).


-## Basic fit/evaluate/predict by hand:
+## Basic fit/evaluate/predict by hand

*Reference:* [Getting Started](index.md), [Machines](machines.md),
[Evaluating Model Performance](evaluating_model_performance.md), [Performance Measures](performance_measures.md)
@@ -251,7 +247,7 @@ Note `LogLoss()` has aliases `log_loss` and `cross_entropy`.
Predict on the new data set:

```@example workflows
-Xnew = (FL = rand(3), RW = rand(3), CL = rand(3), CW = rand(3), BD =rand(3))
+Xnew = (FL = rand(3), RW = rand(3), CL = rand(3), CW = rand(3), BD = rand(3))
predict(mach, Xnew) # a vector of distributions
```

@@ -379,8 +375,8 @@ z = transform(mach, y);

*Reference:* [Tuning Models](tuning_models.md)

-```@example workflows
-X, y = @load_iris; nothing # hide
+```@setup workflows
+X, y = @load_iris
```

Define a model with nested hyperparameters:
@@ -502,7 +498,7 @@ Tree = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
tree_with_target = TransformedTargetModel(model=Tree(),
transformer=y -> log.(y),
inverse = z -> exp.(z))
-pipe2 = (X -> coerce(X, :age=>Continuous)) |> OneHotEncoder() |> tree_with_target;
+pipe2 = (X -> coerce(X, :age=>Continuous)) |> OneHotEncoder() |> tree_with_target
nothing # hide
```

@@ -538,7 +534,8 @@ curve = learning_curve(mach,

```julia
using Plots
-plot(curve.parameter_values, curve.measurements, xlab=curve.parameter_name, xscale=curve.parameter_scale)
+plot(curve.parameter_values, curve.measurements,
+    xlab=curve.parameter_name, xscale=curve.parameter_scale)
```

![](img/workflows_learning_curve.png)
@@ -558,7 +555,7 @@ curve = learning_curve(mach,

```julia
plot(curve.parameter_values, curve.measurements,
-xlab=curve.parameter_name, xscale=curve.parameter_scale)
+xlab=curve.parameter_name, xscale=curve.parameter_scale)
```

![](img/workflows_learning_curves.png)
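Several commits above ("Move hidden example block to setup", "Fix @example namespace") lean on Documenter.jl's named blocks: an `@setup name` block runs at docs build time without being rendered, and every `@example name` (or `@repl name`) block sharing that name runs in the same sandbox module. A minimal sketch of the pattern, reusing the `workflows` name from this file:

````markdown
```@setup workflows
# hidden: executed at build time, but neither code nor output is rendered
X, y = @load_iris
```

```@example workflows
# rendered with its output; shares the `workflows` namespace with the setup block
first(X, 3)
```
````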
17 changes: 8 additions & 9 deletions docs/src/controlling_iterative_models.md
@@ -98,7 +98,7 @@ control | description
[`TimeLimit`](@ref EarlyStopping.TimeLimit)`(t=0.5)` | Stop after `t` hours | yes
[`NumberLimit`](@ref EarlyStopping.NumberLimit)`(n=100)` | Stop after `n` applications of the control | yes
[`NumberSinceBest`](@ref EarlyStopping.NumberSinceBest)`(n=6)` | Stop when best loss occurred `n` control applications ago | yes
-[`InvalidValue`](@ref IterationControl.InvalidValue)() | Stop when `NaN`, `Inf` or `-Inf` loss/training loss encountered | yes
+[`InvalidValue`](@ref IterationControl.InvalidValue)() | Stop when `NaN`, `Inf` or `-Inf` loss/training loss encountered | yes
[`Threshold`](@ref EarlyStopping.Threshold)`(value=0.0)` | Stop when `loss < value` | yes
[`GL`](@ref EarlyStopping.GL)`(alpha=2.0)` | † Stop after the "generalization loss (GL)" exceeds `alpha` | yes
[`PQ`](@ref EarlyStopping.PQ)`(alpha=0.75, k=5)` | † Stop after "progress-modified GL" exceeds `alpha` | yes
@@ -109,15 +109,15 @@ control | description
[`Error`](@ref IterationControl.Error)`(predicate; f="")` | Log to `Error` the value of `f` or `f(mach)`, if `predicate(mach)` holds and then stop | yes
[`Callback`](@ref IterationControl.Callback)`(f=mach->nothing)`| Call `f(mach)` | yes
[`WithNumberDo`](@ref IterationControl.WithNumberDo)`(f=n->@info(n))` | Call `f(n + 1)` where `n` is the number of complete control cycles so far | yes
-[`WithIterationsDo`](@ref MLJIteration.WithIterationsDo)`(f=i->@info("iterations: $i"))`| Call `f(i)`, where `i` is total number of iterations | yes
+[`WithIterationsDo`](@ref MLJIteration.WithIterationsDo)`(f=i->@info("iterations: $i"))` | Call `f(i)`, where `i` is total number of iterations | yes
[`WithLossDo`](@ref IterationControl.WithLossDo)`(f=x->@info("loss: $x"))` | Call `f(loss)` where `loss` is the current loss | yes
-[`WithTrainingLossesDo`](@ref IterationControl.WithTrainingLossesDo)`(f=v->@info(v))` | Call `f(v)` where `v` is the current batch of training losses | yes
-[`WithEvaluationDo`](@ref MLJIteration.WithEvaluationDo)`(f->e->@info("evaluation: $e))`| Call `f(e)` where `e` is the current performance evaluation object | yes
+[`WithTrainingLossesDo`](@ref IterationControl.WithTrainingLossesDo)`(f=v->@info(v))` | Call `f(v)` where `v` is the current batch of training losses | yes
+[`WithEvaluationDo`](@ref MLJIteration.WithEvaluationDo)`(f->e->@info("evaluation: $e))` | Call `f(e)` where `e` is the current performance evaluation object | yes
[`WithFittedParamsDo`](@ref MLJIteration.WithFittedParamsDo)`(f->fp->@info("fitted_params: $fp))`| Call `f(fp)` where `fp` is fitted parameters of training machine | yes
-[`WithReportDo`](@ref MLJIteration.WithReportDo)`(f->e->@info("report: $e))`| Call `f(r)` where `r` is the training machine report | yes
-[`WithModelDo`](@ref MLJIteration.WithModelDo)`(f->m->@info("model: $m))`| Call `f(m)` where `m` is the model, which may be mutated by `f` | yes
-[`WithMachineDo`](@ref MLJIteration.WithMachineDo)`(f->mach->@info("report: $mach))`| Call `f(mach)` wher `mach` is the training machine in its current state | yes
-[`Save`](@ref MLJIteration.Save)`(filename="machine.jls")`|Save current training machine to `machine1.jls`, `machine2.jsl`, etc | yes
+[`WithReportDo`](@ref MLJIteration.WithReportDo)`(f->e->@info("report: $e))`| Call `f(r)` where `r` is the training machine report | yes
+[`WithModelDo`](@ref MLJIteration.WithModelDo)`(f->m->@info("model: $m))`| Call `f(m)` where `m` is the model, which may be mutated by `f` | yes
+[`WithMachineDo`](@ref MLJIteration.WithMachineDo)`(f->mach->@info("report: $mach))`| Call `f(mach)` wher `mach` is the training machine in its current state | yes
+[`Save`](@ref MLJIteration.Save)`(filename="machine.jls")` | Save current training machine to `machine1.jls`, `machine2.jsl`, etc | yes

> Table 1. Atomic controls. Some advanced options are omitted.

@@ -253,7 +253,6 @@ In the code, `wrapper` is an object that wraps the training machine
in this example).

```julia
-
import IterationControl # or MLJ.IterationControl

struct IterateFromList
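The table edits in this file only touch spacing around the `|` separators. Markdown renderers collapse that whitespace, so the columns render identically either way; the alignment is purely for source readability. A trimmed sketch (rows abridged from the table above, third column omitted):

````markdown
control              | description
-------------------- | ------------------------------------
`TimeLimit(t=0.5)`   | Stop after `t` hours
`NumberLimit(n=100)` | Stop after `n` control applications
````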
15 changes: 7 additions & 8 deletions docs/src/evaluating_model_performance.md
@@ -27,7 +27,7 @@ using MLJ
X = (a=rand(12), b=rand(12), c=rand(12));
y = X.a + 2X.b + 0.05*rand(12);
model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)()
-cv=CV(nfolds=3)
+cv = CV(nfolds=3)
evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0)
```

@@ -51,8 +51,8 @@ Multiple measures are specified as a vector:
evaluate!(
mach,
resampling=cv,
-measures=[l1, rms, rmslp1],
-verbosity=0,
+measures=[l1, rms, rmslp1],
+verbosity=0,
)
```

@@ -70,7 +70,7 @@ evaluate!(
mach,
resampling=CV(nfolds=3),
measure=[l2, rsquared],
-weights=weights,
+weights=weights,
)
```

@@ -91,12 +91,12 @@ fold1 = 1:6; fold2 = 7:12;
evaluate!(
mach,
resampling = [(fold1, fold2), (fold2, fold1)],
-measures=[l1, l2],
-verbosity=0,
+measures=[l1, l2],
+verbosity=0,
)
```

-Or the user can define their own re-usable `ResamplingStrategy` objects, - see [Custom
+Or the user can define their own re-usable `ResamplingStrategy` objects; see [Custom
resampling strategies](@ref) below.


@@ -170,4 +170,3 @@ function train_test_pairs(holdout::Holdout, rows)
return [(train, test),]
end
```
-
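Many of the spacing edits in this file follow the Blue style convention named in the commit list: spaces around `=` in ordinary assignments, but none around `=` in keyword arguments. A sketch of the distinction, echoing the `evaluate` call above:

````markdown
```julia
cv = CV(nfolds=3)          # assignment: spaces around `=`
evaluate(model, X, y,
    resampling=cv,         # keyword arguments: no spaces around `=`
    measure=l2)
```
````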
Empty file modified docs/src/frequently_asked_questions.md
100755 → 100644
12 changes: 6 additions & 6 deletions docs/src/getting_started.md
@@ -5,14 +5,14 @@ For an outline of MLJ's **goals** and **features**, see

This page introduces some MLJ basics, assuming some familiarity with
machine learning. For a complete list of other MLJ learning resources,
-see [Learning MLJ](@ref).
+see [Learning MLJ](@ref).

MLJ collects together the functionality provided by multiple packages. To learn how to
install components separately, run `using MLJ; @doc MLJ`.

This section introduces only the most basic MLJ operations and
concepts. It assumes MLJ has been successfully installed. See
-[Installation](@ref) if this is not the case.
+[Installation](@ref) if this is not the case.


```@setup doda
@@ -31,7 +31,7 @@ column vectors:
```@repl doda
using MLJ
iris = load_iris();
-selectrows(iris, 1:3) |> pretty
+selectrows(iris, 1:3) |> pretty
schema(iris)
```

@@ -114,8 +114,8 @@ computing the mode of each prediction):
```@repl doda
evaluate(tree, X, y,
resampling=CV(shuffle=true),
-measures=[log_loss, accuracy],
-verbosity=0)
+measures=[log_loss, accuracy],
+verbosity=0)
```

Under the hood, `evaluate` calls lower level functions `predict` or
@@ -260,7 +260,7 @@ evaluate!(mach, resampling=Holdout(fraction_train=0.7),
Changing a hyperparameter and re-evaluating:

```@repl doda
-tree.max_depth = 3
+tree.max_depth = 3;
evaluate!(mach, resampling=Holdout(fraction_train=0.7),
measures=[log_loss, accuracy],
verbosity=0)
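The semicolon added in the last hunk is not cosmetic: Documenter's `@repl` blocks render each expression like a REPL interaction, printing its result unless the line ends in `;`. A minimal sketch of the effect, reusing the `doda` block name from this file:

````markdown
```@repl doda
tree.max_depth = 3;    # trailing `;` suppresses printing the new value
tree.max_depth         # no semicolon: the value 3 appears in the built docs
```
````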