
Docstring patches #983

Merged · 12 commits · Jul 2, 2024
2 changes: 1 addition & 1 deletion src/composition/learning_networks/nodes.jl
@@ -277,7 +277,7 @@ function _formula(stream, X::Node, depth, indent)
if X.machine !== nothing
print(stream, crind(indent + length(operation_name) - anti))
printstyled(IOContext(stream, :color=>SHOW_COLOR[]),
# handle(X.machine),
#handle(X.machine),
X.machine,
bold=SHOW_COLOR[])
n_args == 0 || print(stream, ", ")
3 changes: 2 additions & 1 deletion src/composition/learning_networks/signatures.jl
@@ -307,7 +307,8 @@ See also [`MLJBase.Signature`](@ref).
"""
fitted_params_supplement(signature::Signature) = call_and_copy(fitted_params_nodes(signature))

""" report(signature; supplement=true)
"""
report(signature; supplement=true)

**Private method.**

2 changes: 1 addition & 1 deletion src/composition/models/pipelines.jl
@@ -182,7 +182,7 @@ or what `transform` returns if it is `Unsupervised`.
Names for the component fields are automatically generated unless
explicitly specified, as in

```
```julia
Pipeline(encoder=ContinuousEncoder(drop_last=false),
stand=Standardizer())
```
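
For comparison, a minimal sketch of the unnamed form (the auto-generated field
names shown in the comments are assumptions, not taken from this changeset):

```julia
using MLJ  # assumed environment

# Components passed without keywords get automatically generated names:
pipe = Pipeline(ContinuousEncoder(drop_last=false), Standardizer())

# Hypothetically, the generated field names mirror the component types:
# pipe.continuous_encoder.drop_last  # false
# pipe.standardizer                  # Standardizer()
```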
10 changes: 6 additions & 4 deletions src/data/data.jl
@@ -401,12 +401,18 @@ _isnan(x::Number) = isnan(x)

skipnan(x) = Iterators.filter(!_isnan, x)

isinvalid(x) = ismissing(x) || _isnan(x)

"""
skipinvalid(itr)

Return an iterator over the elements in `itr` skipping `missing` and
`NaN` values. Behaviour is similar to [`skipmissing`](@ref).

"""
skipinvalid(v) = v |> skipmissing |> skipnan

"""
skipinvalid(A, B)

For vectors `A` and `B` of the same length, return a tuple of vectors
@@ -417,10 +423,6 @@ always returns a vector. Does not remove `Missing` from the element
types if present in the original iterators.

"""
skipinvalid(v) = v |> skipmissing |> skipnan

isinvalid(x) = ismissing(x) || _isnan(x)

function skipinvalid(yhat, y)
mask = .!(isinvalid.(yhat) .| isinvalid.(y))
return yhat[mask], y[mask]
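A usage sketch combining the two methods above (assuming only what this
changeset shows, namely that `skipinvalid` lives in MLJBase):

```julia
using MLJBase

v = [1.0, missing, 2.0, NaN, 3.0]
collect(MLJBase.skipinvalid(v))     # [1.0, 2.0, 3.0]

yhat = [1.0, NaN, 3.0, 4.0]
y = [1.0, 2.0, missing, 4.0]

# Positions where either vector is invalid are dropped from both;
# `Missing` may remain in the element type, per the docstring:
MLJBase.skipinvalid(yhat, y)        # ([1.0, 4.0], [1.0, 4.0])
```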
9 changes: 5 additions & 4 deletions src/data/datasets.jl
@@ -199,7 +199,7 @@ function load_smarket()
end

"""Load a well-known sunspot time series (table with one column).
[https://www.sws.bom.gov.au/Educational/2/3/6]](https://www.sws.bom.gov.au/Educational/2/3/6)
<https://www.sws.bom.gov.au/Educational/2/3/6>
"""
load_sunspots() = load_dataset("sunspots.csv", COERCE_SUNSPOTS)

@@ -250,9 +250,10 @@ macro load_crabs()
end
end

""" Load S&P Stock Market dataset, as used in (An Introduction to
Statistical Learning with applications in
R)[https://rdrr.io/cran/ISLR/man/Smarket.html](https://rdrr.io/cran/ISLR/man/Smarket.html),
"""
Load S&P Stock Market dataset, as used in
[An Introduction to Statistical Learning with applications in
R](https://rdrr.io/cran/ISLR/man/Smarket.html),
by Witten et al (2013), Springer-Verlag, New York."""
macro load_smarket()
quote
22 changes: 13 additions & 9 deletions src/data/datasets_synthetic.jl
@@ -21,12 +21,12 @@ Internal function to finalize the `make_*` functions.
function finalize_Xy(X, y, shuffle, as_table, eltype, rng; clf::Bool=true)
# Shuffle the rows if required
if shuffle
X, y = shuffle_rows(X, y; rng=rng)
end
if eltype != Float64
X = convert.(eltype, X)
end
# return as matrix if as_table=false
X, y = shuffle_rows(X, y; rng=rng)
end
if eltype != Float64
X = convert.(eltype, X)
end
# return as matrix if as_table=false
as_table || return X, y
clf && return MLJBase.table(X), categorical(y)
if length(size(y)) > 1
@@ -172,7 +172,6 @@ membership to the smaller or larger circle, respectively.
* `noise=0`: standard deviation of the Gaussian noise added to the data,

* `factor=0.8`: ratio of the smaller radius over the larger one,

$(EXTRA_KW_MAKE*EXTRA_CLASSIFICATION)

### Example
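
A plausible call, as a sketch (the example actually shipped with the docstring
is elided by the fold here, and may differ):

```julia
using MLJBase  # assumed environment

# 100 points on two concentric circles, the inner radius 0.3 times the
# outer, with a little Gaussian noise added to the coordinates:
X, y = make_circles(100; noise=0.05, factor=0.3)
```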
@@ -318,7 +317,12 @@ Make portion `s` of vector `θ` exactly 0.
"""
sparsify!(rng, θ, s) = (θ .*= (rand(rng, length(θ)) .< s))

"""Add outliers to portion s of vector."""
"""
outlify!(rng, y, s)

Add outliers to portion `s` of vector.

"""
outlify!(rng, y, s) =
(n = length(y); y .+= 20 * randn(rng, n) .* (rand(rng, n) .< s))

@@ -329,7 +333,7 @@ const SIGMOID_32 = log(Float32(1)/eps(Float32) - Float32(1))
sigmoid(x)

Return the sigmoid computed in a numerically stable way:
``σ(x) = 1/(1+exp(-x))``
``σ(x) = 1/(1+\\exp(-x))``

"""
function sigmoid(x::Float64)
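A minimal sketch of one standard way to compute this stably (an assumption;
the actual body is not shown here). `SIGMOID_64` is a hypothetical `Float64`
analogue of `SIGMOID_32` above:

```julia
# Hypothetical Float64 saturation threshold, by analogy with SIGMOID_32:
const SIGMOID_64 = log(1.0/eps(Float64) - 1.0)

# Clamping the argument keeps exp(-x) finite, so the result saturates
# cleanly at 0.0 or 1.0 instead of risking overflow:
stable_sigmoid(x::Float64) = 1/(1 + exp(-clamp(x, -SIGMOID_64, SIGMOID_64)))
```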
50 changes: 25 additions & 25 deletions src/hyperparam/one_dimensional_range_methods.jl
@@ -66,31 +66,31 @@ In the first case iteration is over all `values` stored in the range
iteration is over approximately `n` ordered values, generated as
follows:

(i) First, exactly `n` values are generated between `U` and `L`, with a
spacing determined by `r.scale` (uniform if `scale=:linear`) where `U`
and `L` are given by the following table:

| `r.lower` | `r.upper` | `L` | `U` |
|-------------|------------|---------------------|---------------------|
| finite | finite | `r.lower` | `r.upper` |
| `-Inf` | finite | `r.upper - 2r.unit` | `r.upper` |
| finite | `Inf` | `r.lower` | `r.lower + 2r.unit` |
| `-Inf` | `Inf` | `r.origin - r.unit` | `r.origin + r.unit` |

(ii) If a callable `f` is provided as `scale`, then a uniform spacing
is always applied in (i) but `f` is broadcast over the results. (Unlike
ordinary scales, this alters the effective range of values generated,
instead of just altering the spacing.)

(iii) If `r` is a discrete numeric range (`r isa NumericRange{<:Integer}`)
then the values are additionally rounded, with any duplicate values
removed. Otherwise all the values are used (and there are exactly `n`
of them).

(iv) Finally, if a random number generator `rng` is specified, then the values are
returned in random order (sampling without replacement), and otherwise
they are returned in numeric order, or in the order provided to the
range constructor, in the case of a `NominalRange`.
1. First, exactly `n` values are generated between `U` and `L`, with a
spacing determined by `r.scale` (uniform if `scale=:linear`) where `U`
and `L` are given by the following table:

| `r.lower` | `r.upper` | `L` | `U` |
|-------------|------------|---------------------|---------------------|
| finite | finite | `r.lower` | `r.upper` |
| `-Inf` | finite | `r.upper - 2r.unit` | `r.upper` |
| finite | `Inf` | `r.lower` | `r.lower + 2r.unit` |
| `-Inf` | `Inf` | `r.origin - r.unit` | `r.origin + r.unit` |

2. If a callable `f` is provided as `scale`, then a uniform spacing
is always applied in (1) but `f` is broadcast over the results. (Unlike
ordinary scales, this alters the effective range of values generated,
instead of just altering the spacing.)

3. If `r` is a discrete numeric range (`r isa NumericRange{<:Integer}`)
then the values are additionally rounded, with any duplicate values
removed. Otherwise all the values are used (and there are exactly `n`
of them).

4. Finally, if a random number generator `rng` is specified, then the values are
returned in random order (sampling without replacement), and otherwise
they are returned in numeric order, or in the order provided to the
range constructor, in the case of a `NominalRange`.

"""
iterator(rng::AbstractRNG, r::ParamRange, args...) =
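A sketch of steps (1)-(4) in action (assumes the `range` and `iterator` API
as exported by MLJBase; the printed values are illustrative):

```julia
using MLJBase, Random

r = range(Float64, :lambda; lower=1, upper=1000, scale=:log10)

# Step (1): four values spaced uniformly on a log10 scale from L=1 to U=1000:
iterator(r, 4)                        # ≈ [1.0, 10.0, 100.0, 1000.0]

# Step (4): supplying an RNG returns the same values in random order:
iterator(MersenneTwister(123), r, 4)
```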
46 changes: 23 additions & 23 deletions src/machines.jl
@@ -529,7 +529,7 @@ err_missing_model(model) = ErrorException(
)

"""
last_model(mach::Machine)
last_model(mach::Machine)

Return the last model used to train the machine `mach`. This is a bona fide model, even if
`mach.model` is a symbol.
@@ -572,31 +572,31 @@ the true model given by `getproperty(composite, model)`. See also [`machine`](@ref).
For the action to be a no-operation, either `mach.frozen == true` or
or none of the following apply:

- (i) `mach` has never been trained (`mach.state == 0`).
1. `mach` has never been trained (`mach.state == 0`).

- (ii) `force == true`.
2. `force == true`.

- (iii) The `state` of some other machine on which `mach` depends has
changed since the last time `mach` was trained (ie, the last time
`mach.state` was incremented).
3. The `state` of some other machine on which `mach` depends has
changed since the last time `mach` was trained (ie, the last time
`mach.state` was incremented).

- (iv) The specified `rows` have changed since the last retraining and
`mach.model` does not have `Static` type.
4. The specified `rows` have changed since the last retraining and
`mach.model` does not have `Static` type.

- (v) `mach.model` is a model and different from the last model used for training, but has
the same type.
5. `mach.model` is a model and different from the last model used for training, but has
the same type.

- (vi) `mach.model` is a model but has a type different from the last model used for
training.
6. `mach.model` is a model but has a type different from the last model used for
training.

- (vii) `mach.model` is a symbol and `(composite, mach.model)` is different from the last
model used for training, but has the same type.
7. `mach.model` is a symbol and `(composite, mach.model)` is different from the last
model used for training, but has the same type.

- (viii) `mach.model` is a symbol and `(composite, mach.model)` has a different type from
the last model used for training.
8. `mach.model` is a symbol and `(composite, mach.model)` has a different type from
the last model used for training.

In any of the cases (i) - (iv), (vi), or (viii), `mach` is trained ab initio. If (v) or
(vii) is true, then a training update is applied.
In any of the cases (1) - (4), (6), or (8), `mach` is trained ab initio.
If (5) or (7) is true, then a training update is applied.

To freeze or unfreeze `mach`, use `freeze!(mach)` or `thaw!(mach)`.
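
A sketch of how these rules play out, using the public `fit!` (which defers to
`fit_only!`); the model and data are stand-ins, with `ConstantClassifier`
assumed available as in the logging example later in this changeset:

```julia
using MLJ  # assumed environment

X, y = make_moons(100)
model = ConstantClassifier()
mach = machine(model, X, y)

fit!(mach)                # case (1): never trained, so trains ab initio
fit!(mach)                # no-op: none of cases (1)-(8) apply
fit!(mach, force=true)    # case (2): retrains ab initio
```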

@@ -1044,9 +1044,10 @@ To serialise using a different format, see [`serializable`](@ref).
Machines are deserialized using the `machine` constructor as shown in
the example below.

> The implementation of `save` for machines changed in MLJ 0.18
> (MLJBase 0.20). You can only restore a machine saved using older
> versions of MLJ using an older version.
!!! note
The implementation of `save` for machines changed in MLJ 0.18
(MLJBase 0.20). You can only restore a machine saved using older
versions of MLJ using an older version.

### Example

@@ -1073,8 +1074,7 @@ predict(predict_only_mach, X)
general purpose serialization formats, can allow for arbitrary code
execution during loading. This means it is possible for someone
to use a JLS file that looks like a serialized MLJ machine as a
[Trojan
horse](https://en.wikipedia.org/wiki/Trojan_horse_(computing)).
[Trojan horse](https://en.wikipedia.org/wiki/Trojan_horse_(computing)).

See also [`serializable`](@ref), [`machine`](@ref).

27 changes: 14 additions & 13 deletions src/resampling.jl
@@ -536,8 +536,8 @@ and the corresponding estimates, aggregated over all train/test pairs, are recorded
When displayed, a `PerformanceEvaluation` object includes a value under the heading
`1.96*SE`, derived from the standard error of the `per_fold` entries. This value is
suitable for constructing a formal 95% confidence interval for the given
`measurement`. Such intervals should be interpreted with caution. See, for example, Bates
et al. [(2021)](https://arxiv.org/abs/2104.00673).
`measurement`. Such intervals should be interpreted with caution. See, for example, [Bates
et al. (2021)](https://arxiv.org/abs/2104.00673).
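
As a sketch of where that heading comes from (an assumption based on the
description above, with hypothetical `per_fold` values):

```julia
using Statistics

per_fold = [0.121, 0.135, 0.118, 0.142, 0.127, 0.131]  # hypothetical entries
se = std(per_fold) / sqrt(length(per_fold))            # standard error
half_width = 1.96 * se   # the value displayed under `1.96*SE`
# an approximate 95% CI is then: measurement ± half_width
```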

### Fields

@@ -752,15 +752,15 @@ Base.show(io::IO, e::CompactPerformanceEvaluation) =
## USER CONTROL OF DEFAULT LOGGING

const DOC_DEFAULT_LOGGER =
"""
"""

The default logger is used in calls to [`evaluate!`](@ref) and [`evaluate`](@ref), and
in the constructors `TunedModel` and `IteratedModel`, unless the `logger` keyword is
explicitly specified.
The default logger is used in calls to [`evaluate!`](@ref) and [`evaluate`](@ref), and
in the constructors `TunedModel` and `IteratedModel`, unless the `logger` keyword is
explicitly specified.

!!! note
!!! note

Prior to MLJ v0.20.7 (and MLJBase 1.5) the default logger was always `nothing`.
Prior to MLJ v0.20.7 (and MLJBase 1.5) the default logger was always `nothing`.

"""

@@ -772,8 +772,8 @@ tracking platforms, such as [MLflow](https://mlflow.org/docs/latest/index.html).

$DOC_DEFAULT_LOGGER

When MLJBase is first loaded, the default logger is `nothing`. To reset the logger, see
beow.
When MLJBase is first loaded, the default logger is `nothing`. To reset the logger, see
below.

"""
default_logger() = DEFAULT_LOGGER[]
@@ -790,14 +790,15 @@ on a local server at `http://127.0.0.1:5000`. Then in every `evaluate` call in which
`logger` is not specified, as in the example below, the performance evaluation is
automatically logged to the service.

```julia-repl
```julia
using MLJ
logger = MLJFlow.Logger("http://127.0.0.1:5000/api")
default_logger(logger)

X, y = make_moons()
model = ConstantClassifier()
evaluate(model, X, y, measures=[log_loss, accuracy])
```

"""
function default_logger(logger)
@@ -1073,8 +1074,8 @@ instance of one of these, then a vector of tuples of the form `(train_rows, test_rows)`
is expected. For example, setting

```julia
resampling = [((1:100), (101:200)),
((101:200), (1:100))]
resampling = [(1:100, 101:200),
(101:200, 1:100)]
```

gives two-fold cross-validation using the first 200 rows of data.
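
Such pairs can be passed directly to `evaluate`; a sketch, reusing the `model`,
`X`, `y` conventions from the logging example earlier in this changeset:

```julia
evaluate(model, X, y,
         resampling=[(1:100, 101:200), (101:200, 1:100)],
         measures=[log_loss])
```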