
Add tuning by stochastic search #37

Closed
ablaom opened this issue Jan 13, 2019 · 6 comments

ablaom commented Jan 13, 2019

We have a Grid tuning strategy but should add a stochastic tuning strategy Stochastic <: TuningStrategy with a corresponding fit method for TunedModel{Stochastic, <:Model}. The implementer should acquaint themselves with the nested parameter API (see [src/parameters.jl] and [test/parameters.jl]). To this end, I suggest first giving the iterator(::NumericRange, resolution) and iterator(::NominalRange) methods stochastic versions, perhaps by adding a keyword argument stochastic=true.
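A stochastic version of these iterator methods would simply replace the evenly spaced grid of values with random draws from the same interval. The following Python sketch is purely conceptual (the function name, signature, and keyword are hypothetical; the actual methods live in MLJ's Julia codebase):

```python
import random

def iterator_numeric(lower, upper, resolution, stochastic=False, rng=random):
    """Return `resolution` points from a bounded numeric range.

    Deterministic mode returns an evenly spaced grid (today's Grid
    behaviour); stochastic mode returns uniform random draws from the
    same interval, the building block for random search.
    """
    if stochastic:
        return [rng.uniform(lower, upper) for _ in range(resolution)]
    step = (upper - lower) / (resolution - 1)
    return [lower + i * step for i in range(resolution)]
```
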

@ablaom ablaom added the "help wanted" label Jan 13, 2019

fkiraly commented Jan 13, 2019

Would it be worth writing up the design for tuning strategies?
You mention a "best" method next to the obvious "fit/predict".

However, shouldn't the best strategy (as per tuning) be queryable via a "fitted model" interface which has these as parameters? E.g., in the same (interface) way as you would query the coefficients and CI of a linear model.


ablaom commented Jan 13, 2019

Good point.

The "best strategy" is indeed implemented via a "fitted model" interface, as shown in the README.md. The best method doesn't compute anything; it just retrieves the model (i.e., the hyperparameters) that the fitting (= tuning) process determined (and used for fitting the final model to all available data).

If a user wants details about a fit-result (e.g., the coefficients of a linear model), then they would seek these in the report field of the corresponding machine. I could do the same here and drop best, no problem.


ablaom commented Jan 14, 2019

As to writing up a design for genetic algorithms: good idea. Do we have any volunteers?

@ablaom ablaom added the "enhancement" label Feb 5, 2019
@ablaom ablaom assigned ablaom and unassigned tlienart Mar 12, 2020
@ablaom ablaom removed the "help wanted" label Mar 12, 2020

ablaom commented Mar 12, 2020

Okay, I'm planning to implement this soon. Below is the doc-string for the implementation I am proposing. (Yes, it could be more user-friendly.) Feedback on the proposal very welcome. I plan to start this early next week (16/17 March).

Context:

  • Tuning section of user manual including doc strings for TunedModel (the user's main interface point for scheduling tuning), range, sampler (for wrapping ranges as samplers) and fit(::Univariate, ::ParamRange) (for fitting distributions to a range).

  • The tuning strategy API

Proposed doc-string for random search tuning strategy

RandomSearch(bounded=Distributions.Uniform,
             positive_unbounded=Distributions.Gamma,
             other=Distributions.Normal,
             rng=Random.GLOBAL_RNG)

Instantiate a random search tuning strategy for searching over
Cartesian hyperparameter domains.

Supported ranges:

  • A single one-dimensional range (ParamRange object) r, or a pair
    of the form (r, d), where d is a probability vector of the same
    length as r.values, if r is a NominalRange, and is otherwise:
    (i) any Distributions.Univariate instance; or (ii) one of the
    subtypes of Distributions.Univariate listed in the table below,
    for automatic fitting using Distributions.fit(d, r).

  • Any vector of objects of the above form

| distribution types                                                          | for fitting to ranges of this type |
|-----------------------------------------------------------------------------|------------------------------------|
| Arcsine, Uniform, Biweight, Cosine, Epanechnikov, SymTriangularDist, Triweight | bounded                          |
| Gamma, InverseGaussian, Poisson                                             | positive                           |
| Normal, Logistic, LogNormal, Cauchy, Gumbel, Laplace                        | any                                |

ParamRange objects are constructed using the range method.

Example range 1:

range(model, :hyper1, lower=1, origin=2, unit=1)

Example range 2:

[(range(model, :hyper1, lower=1, upper=10), Arcsine),
  range(model, :hyper2, lower=2, upper=4),
  (range(model, :hyper2, lower=2, upper=4), Normal(0, 3)),
  (range(model, :hyper3, values=[:ball, :tree]), [0.3, 0.7])]

Note: All the field values of the ParamRange objects (:hyper1,
:hyper2, :hyper3 in the preceding example) must refer to field
names of a single model (the model specified during TunedModel
construction).

Algorithm

Models for evaluation are generated by sampling each range r using
rng(s), where s = sampler(r, d). See sampler for details. If d
is not specified, then sampling is uniform (with replacement) in the
case of a NominalRange, and is otherwise given by the defaults
specified by the tuning strategy parameters bounded,
positive_unbounded, and other, depending on the NumericRange
type.

See also TunedModel, range, sampler.
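The sampling scheme in the Algorithm section amounts to one independent draw per hyperparameter per candidate model. The sketch below illustrates the idea in Python for brevity; the names and structure are hypothetical, not the MLJ implementation:

```python
import random

def random_search(param_samplers, n_models, rng=None):
    """Generate `n_models` hyperparameter configurations by sampling
    each dimension of the Cartesian domain independently.

    `param_samplers` maps a hyperparameter name to a one-argument
    callable that draws a single value from the supplied RNG.
    """
    rng = rng or random.Random()
    return [{name: draw(rng) for name, draw in param_samplers.items()}
            for _ in range(n_models)]

# Samplers loosely mirroring the example ranges above (hypothetical):
samplers = {
    # bounded numeric range -> uniform draw over [1, 10]
    "hyper1": lambda rng: rng.uniform(1, 10),
    # nominal range with probability vector [0.3, 0.7]
    "hyper3": lambda rng: rng.choices(["ball", "tree"], weights=[0.3, 0.7])[0],
}
```
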


tlienart commented Mar 12, 2020

This sounds fantastic. I'm a bit confused by your example: shouldn't the third one be a bounded distribution? Or do you automatically truncate over the range?

I also wonder a few things:

  1. whether you could extend the current syntax to pass a distribution or a sampler, where a sampler is anything that can be queried and could be user-defined
  2. in light of (1), whether you could pass the history ("context") to the sampler
  3. at the moment, unless I misunderstand something, it seems you would sample equally in all dimensions (one configuration = one sample per hyperparameter). This makes sense; however, I could see interest in sampling more along specific dimensions, in which case you may want to pass a number of samples?

Apologies if these questions are poorly formulated, and great work as always.


ablaom commented Mar 12, 2020

@tlienart Thanks for that!

I'm a bit confused by your example: shouldn't the third one be a bounded distribution? Or do you automatically truncate over the range?

Yes, sampler(r, d) always creates a sampler truncated to the range, but this should be made explicit in the current docstring, thanks.
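Truncating an arbitrary sampler to its range can be pictured as simple rejection sampling. This is only a conceptual Python illustration of the behaviour described above, not how MLJ's sampler is actually implemented:

```python
import random

def truncated(draw, lower, upper, max_tries=10_000):
    """Wrap a sampler `draw(rng)` so that values falling outside
    [lower, upper] are rejected and redrawn."""
    def sample(rng):
        for _ in range(max_tries):
            x = draw(rng)
            if lower <= x <= upper:
                return x
        raise RuntimeError("truncated sampler failed to produce a value")
    return sample

# e.g. a Normal(0, 3) restricted to the bounded range [2, 4],
# as in the third entry of "Example range 2" above:
normal_on_range = truncated(lambda rng: rng.gauss(0, 3), 2, 4)
```
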

  1. whether you could extend the current syntax to pass a distribution or a sampler, where a sampler is anything that can be queried and could be user-defined

So, instead of passing r or (r, d) I pass (:lambda, s), where s is any sampler? Sounds like a good idea!

  2. in light of (1), whether you could pass the history ("context") to the sampler

Also sounds like a nice idea, but a non-trivial API complication. What would the interface for passing context to a sampler look like?

  3. at the moment, unless I misunderstand something, it seems you would sample equally in all dimensions (one configuration = one sample per hyperparameter). This makes sense; however, I could see interest in sampling more along specific dimensions, in which case you may want to pass a number of samples?

I'm not sure I understand the proposal. Are you suggesting that some hyperparameters be sampled less often (i.e., kept fixed while others change)? Can you explain a situation where this might be beneficial? (I'm assuming here that we are not leaving the realm of ordinary random sampling, which does not consider the history of previous evaluations.) What do you mean by "pass a number of samples"? Or do you mean samplers? Could you give me a little more detail?
