Version 2.0.0a1 #875

Merged
merged 753 commits into from
Oct 12, 2022

renesass
Collaborator

@renesass renesass commented Jul 19, 2022

Big pull-request to make SMAC more user-friendly.

Documentation

https://automl.github.io/SMAC3/development-2.0/

Todo

  • Adapt facades
    • Black-Box
    • Function: DROPPED
    • Random
    • Hyperband
    • Hyperparameter
    • Multi-Fidelity
    • Algorithm Configuration
  • Replacing ambiguous variables
  • Split-up long files
    • Acquisition functions
    • Acquisition optimizer
    • Runners
    • RunHistory
    • InitialDesign
    • RandomDesign
  • Remove "runtime" optimization completely
  • Remove PSMAC completely (we have dask for parallelization and sweeper plugin)
  • Incorporate pynisher 1.0
  • Write script to update copyrights
  • Clean-up runhistory (get rid of np.ndarray, int as costs, remove adaptive capping completely, etc.)
  • Optimize runhistory output format
  • Add option to add custom (evaluated) configs
  • Update minimal example in README and docs
  • Clean scripts folder
  • Return budget = None if not used
  • Signature fix: Check if specific signature is given (if intensifier SH/HB -> fidelity, if instances defined -> instances, seed always)
  • Fix ParEGO (implement an update method and write tests) [Katha]
  • Facade: Put logic into SMBO object and only pass instantiated objects
  • Expand SMBO object with more methods like update_model or update_acquisition_function, n_iteration (current iteration?), etc.
  • Rename target algorithm to target function
  • Force to run initial design anyways (although runhistory is not empty)
  • Add version to get_meta
  • Validation
  • Rename everything to AbstractClass
  • Rename RunKey -> TrialKey and RunValue -> TrialValue
  • Rename get_next_run -> get_next_trial in the intensifier, run_value -> trial_value, and run_info -> trial_info
  • Fix RandomFacade -> RandomDesign is confusing: I replaced unused components with dummy components and added a lot of comments to reduce the confusion
  • It is not clear which signature the target algorithm should use. For example, in multi-objective optimization, my target algorithm can only have the config in its signature although it depends on the budget. Warnings should be raised if an argument is missing from the signature in a specific use case.
  • Update CLI
  • Examples (clean-up, minimal docstrings, Grammarly, …)
    • Basics
      • Synthetic Function (just check again)
      • Support Vector Machine with Cross-Validation (just check again)
      • Ask-Tell Interface
      • Custom Callback
      • User Prior
    • Multi-Fidelity and Multi-Instances
      • Multi-Layer Perceptron using multiple Epochs (fix warnings)
      • Stochastic Gradient Descent for multiple Datasets (just check again)
    • Multi-Objective
      • 2D Schaffer Function (just check again)
      • ParEGO + Objective weights
    • BOinG and TurBO
  • Update files (remove unused imports, remove unnecessary comments, complete get_meta method, revise docstrings, fix mypy, add copyright, …)
    • acquisition
      • functions
      • optimizer
    • facade
    • initial_design
    • intensification
    • main
    • model
    • multi_objective
    • random_design
    • runhistory
      • encoders (remove runtime optimization, remove todos in docstrings, get rid of abstract method and put everything into the encoder.py directly?, rename variables so someone knows what happens, rename consider_for_higher_budgets_state, …)
      • runhistory
    • runner [Eddie]
    • utils
    • others (callback, constants, scenario)
  • Update tests (remove unittests completely and adapt to the new syntax)
    • acquisition [Caro]
      • functions
      • optimizer
    • facade
    • initial_design
    • intensification
    • model
    • multi_objective
    • random_design
    • runhistory
    • runner
    • utils
    • callback (run optimization, increase internal variable in each iteration and check if callback was called x-times)
    • scenario
    • continue run (raise exception, if filenames are renamed, etc.)
    • Add ask-tell tests
    • Terminate cost threshold (with and without fidelities)
  • Update documentation
    • Facades 
      • Table (how they differ and what defaults they use, basically expand the table)
      • How to customize facades to your needs
      • Update inheritance
    • Target Function
      • Show examples
      • Say which signatures are needed (maybe even add it to getting started)
    • Features
    • Continuing (basically describe how it works how I did in the changes)
    • RunHistory (RunInfo, RunValue, how you iterate over them, etc.)
    • Multi-Fidelity and Multi-Instances
    • Multi-Objective (how to define objectives and show output formats)
    • Parallelism (n_workers, what is supported?)
    • Callbacks
    • Logging (passing custom logging.yml to the facade, etc.)
    • Publications
  • Fix known issues
    • When using n_workers > 1, warnings and errors always appear at the end

Required for release candidate:

  • HB/SH seems to be stuck when using more than 1 worker and also ignores walltime limit (just add workers to 2/1 example) -> Seems like it's a mac problem
  • Multi-tasking for the intensifier?
  • Integrate BOinG + TurBO again
  • Integrate HydraFacade

Discussions / Postponed improvements

  • Continue runs.
    • Many components depend on state. Idea: Save/load states so that the optimization can pick up where it stopped.
    • Have the option to rerun crashed ones.
    • HyperparameterFacade starts a local search when continuing a run. However, the limit already was exceeded.
    • HB/SH does not work because an incumbent is given and it's stuck in the first stage.
  • Ask-and-Tell Interface
    • Call ask multiple times before calling tell
    • Find a way to incorporate the trials from the user when only using tell. This partially works for the intensifier already.
    • It does not make sense to tell SMAC trials in advance when using SH. Reason: SH depends heavily on a budget+instance combination, and even if the user provides it, SMAC has to wait until the other trials have finished too.
  • Constraints instead of imputation?
  • Facade: Build the facade automatically based on the scenario inputs (e.g., use successive halving if budgets are defined)
    • AutoFacade?
    • Log which components are used
  • Random_design.check should not be in the acquisition function but in the SMBO main loop, since that obviates the need to compute the EPM and the acquisition function in the first place?
  • Ask and intensification inversion. Currently, the ask method is passed to the intensifier. But ideally, the ask method defines exactly which configuration is evaluated on which problem instance. This implies the inverse relation: the intensifier should be called from within ask?
  • Intensification is one way of defining a fidelity (as number of problem instances to evaluate on) but it shouldn’t be at the heart of SMAC, since nowadays the multiple dataset optimization is no longer as prominent.
  • 10000 challengers: Never touch the surrogate model again? This usually happens due to the intensification percentage being at 0.5, the model fitting taking quite long, and the functions taking no time. If these three things don't apply, this is indeed an issue
  • _collect_data in smbo.py: Training only on the highest fidelity or mixed fidelity? -> Docs
  • Problem with _get_x_best and instances: Only the config with the lowest cost is used?!
  • Tools to visualize things.
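
To make the ask-and-tell discussion above more concrete, here is a minimal, SMAC-independent sketch of calling ask several times before telling any result. All names (`Trial`, `RandomSearchAskTell`, `target`) are hypothetical and only for illustration; the toy optimizer is plain random search, not SMAC's implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Trial:
    """Hypothetical stand-in for one config/seed combination to evaluate."""

    x: float
    seed: int


class RandomSearchAskTell:
    """Toy optimizer exposing ask() and tell(); an assumption, not SMAC's API."""

    def __init__(self, low: float, high: float, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._low, self._high = low, high
        self.history: List[Tuple[Trial, float]] = []

    def ask(self) -> Trial:
        # Random search ignores the history, so ask() can be called repeatedly.
        x = self._rng.uniform(self._low, self._high)
        return Trial(x=x, seed=self._rng.randrange(2**31))

    def tell(self, trial: Trial, cost: float) -> None:
        # Record the result reported by the user.
        self.history.append((trial, cost))

    def incumbent(self) -> Optional[Tuple[Trial, float]]:
        # Best (lowest-cost) trial told so far, or None if nothing was told.
        if not self.history:
            return None
        return min(self.history, key=lambda item: item[1])


def target(trial: Trial) -> float:
    """Hypothetical target function with its minimum at x = 1."""
    return (trial.x - 1.0) ** 2


opt = RandomSearchAskTell(-5.0, 5.0, seed=42)
batch = [opt.ask() for _ in range(4)]  # ask several times first ...
for trial in batch:
    opt.tell(trial, target(trial))  # ... then report all results
best_trial, best_cost = opt.incumbent()
```

With a model-based optimizer, the open question from the list remains: trials asked in one batch do not know about each other's results.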

Findings

  • I tried hard to find out why SMAC is not reproducible: The cause is the method _get_timebound_for_intensification in base_smbo.py (it influences the time_bound passed to the intensifier's process_results method), because the measured time is never exactly the same between runs. Setting it to a fixed number makes the results reproducible.

Changelog

Big Changes

  • We redesigned the scenario class completely. The scenario is implemented as a dataclass now and holds only environment variables (like limitations or save directory). Everything else was moved to the components directly.
  • We removed runtime optimization completely (no adaptive capping or imputing anymore).
  • We removed the command-line interface and restructured everything alongside. Since SMAC was built upon the command-line interface (especially in combination with the scenario), it was complicated to understand the behavior or find specific implementations. With the removal, we re-wrote everything in Python and re-implemented the feature of using scripts as target functions.
  • Introducing trials: Each config/seed/budget/instance calculation is a trial.
  • The configuration chooser is now integrated into the SMBO object. As a result, SMBO finally implements an ask-tell interface.
  • Facades are redesigned so that they accept instantiated components directly. If a component is not passed, a default component is used, which is specified for each facade individually in the form of static methods. You can use those static methods directly to adapt a component to your choice.
  • A lot of API changes and renamings (e.g., RandomConfigurationChooser -> RandomDesign, Runhistory2EPM -> RunHistoryEncoder).
  • Ambiguous variables are renamed and unified across files.
  • Dependencies of modules are reduced drastically.
  • We incorporated Pynisher 1.0, which ensures limitations cross-platform.
  • We incorporated ConfigSpace 0.6, which simplified our examples.
  • Examples and documentation are completely reworked. Examples use the new ConfigSpace, and the documentation is adapted to version 2.0.
  • Transparent target function signatures: SMAC now checks explicitly if an argument is available (the required arguments are now specified in the intensifier). If the signature contains additional arguments that are not passed by SMAC, a warning is raised.
  • Components implement a meta property now, all of which describe the initial state of SMAC. The facade collects all metadata and saves the initial state of the scenario.
  • Improved multi-objective in general: Both RunHistory and RunHistoryEncoder now incorporate the multi-objective algorithm. In other words, if the multi-objective algorithm changes the output, it directly affects the optimization process.
  • The configuration space is saved as JSON only.
  • StatusType is saved as an integer and no longer as a dict.
  • We changed the behavior of continuing a run:
    • SMAC automatically checks if a scenario was saved earlier. If there exists a scenario and the initial state is the same, SMAC automatically loads the previous data. However, continuing from that run is not possible yet.
    • If there was a scenario earlier, but the initial state is different, then the user is asked to overwrite the run or to still continue the run although the state is different (note that this can only happen if the name specified in the scenario is the same). Alternatively, the suffix -old is appended to the name of the old run (e.g., the name was test, it becomes test-old).
    • The initial state of the SMAC run also specifies the name (if no name in the scenario is specified). If the user changes something in the code base or in the scenario, the name and, therefore, the save location automatically changes.
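
The signature check mentioned in the list above could look roughly like the following sketch. It only uses the standard library; the function name `check_signature`, the always-passed `config` argument, and raising a `ValueError` for a missing argument are assumptions for illustration, not SMAC's actual code.

```python
import inspect
import warnings
from typing import Callable, Iterable


def check_signature(target_function: Callable, required: Iterable[str]) -> None:
    """Check a target function against the arguments the optimizer will pass.

    `required` is the set of arguments requested by the intensifier
    (e.g. "seed", "budget", "instance"). Hypothetical helper.
    """
    params = inspect.signature(target_function).parameters
    required = set(required)

    # Assumption: a missing required argument is treated as an error.
    missing = required - set(params)
    if missing:
        raise ValueError(f"Target function is missing arguments: {sorted(missing)}")

    # Arguments that are neither "config" nor required are never passed.
    extra = [name for name in params if name not in required | {"config"}]
    if extra:
        warnings.warn(f"Arguments not passed by the optimizer: {extra}")


def train(config, seed):
    """Hypothetical target function."""
    return 0.0


check_signature(train, required=["seed"])  # passes silently
```

For example, a multi-fidelity intensifier would request `budget`, so `check_signature(train, required=["seed", "budget"])` would fail in this sketch.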

New Features

  • Added a new termination feature: Use terminate_cost_threshold in the scenario to stop the optimization after a configuration was evaluated with a cost lower than the threshold.
  • Callbacks are completely redesigned. Added callbacks to the facade are called in different positions in the Bayesian optimization loop.
  • The multi-objective algorithm MeanAggregationStrategy supports objective weights now.
  • RunHistory got more methods like get_incumbent or get_pareto_front.
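
As a rough illustration of what objective weights mean for a mean aggregation, here is a small stand-alone sketch. The function `weighted_mean_cost` is hypothetical, and any normalization of the objectives that the real MeanAggregationStrategy may apply is left out.

```python
from typing import Optional, Sequence


def weighted_mean_cost(
    costs: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Aggregate several objective costs into one scalar (hypothetical helper)."""
    if weights is None:
        # Without weights, every objective counts equally.
        weights = [1.0] * len(costs)
    if len(weights) != len(costs):
        raise ValueError("Need exactly one weight per objective.")
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must sum to a positive number.")
    return sum(c * w for c, w in zip(costs, weights)) / total


# Two objectives, e.g. (1 - accuracy, normalized time); values are made up.
equal = weighted_mean_cost([0.2, 0.8])
accuracy_first = weighted_mean_cost([0.2, 0.8], weights=[3, 1])
```

With weights 3:1 the first objective dominates the aggregated cost, which is the kind of preference the objective weights are meant to express.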

Fixes

  • Have you ever noticed that the third configuration has no origin? It's fixed now.
  • We fixed ParEGO (it updates every time training is performed now).

Optimization Changes

  • Changed initial design behavior
    • You can add additional configurations now.
    • max_ratio will limit both n_configs and n_configs_per_hyperparameter but not additional configurations
    • Reduced default max_ratio to 0.1.
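
One way to read the max_ratio rule above is the following sketch. The function name, the `n_trials` and `n_hyperparameters` parameters, the default of 10 configs per hyperparameter, and the exact rounding are assumptions; only the cap itself and the default of 0.1 come from the list above.

```python
from typing import Optional


def initial_design_size(
    n_trials: int,
    n_hyperparameters: int,
    n_configs: Optional[int] = None,
    n_configs_per_hyperparameter: int = 10,  # assumed default
    max_ratio: float = 0.1,
    n_additional_configs: int = 0,
) -> int:
    """Number of initial configurations under a max_ratio cap (hypothetical)."""
    if n_configs is None:
        # Derive the requested size from the number of hyperparameters.
        n_configs = n_configs_per_hyperparameter * n_hyperparameters
    # The cap applies to the requested size, however it was specified ...
    capped = min(n_configs, int(max_ratio * n_trials))
    # ... but additional, user-provided configurations are never cut.
    return capped + n_additional_configs


size = initial_design_size(n_trials=100, n_hyperparameters=3)
size_with_extra = initial_design_size(
    n_trials=100, n_hyperparameters=3, n_additional_configs=5
)
```

In this reading, 3 hyperparameters would request 30 configurations, but with 100 trials and max_ratio = 0.1 only 10 are sampled; the 5 additional configurations are added on top.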

Code Related

  • Converted all unittests to pytests.
  • Instances, seeds, and budgets can be set to None now. However, mixing None and non-None values will throw an exception.
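
The all-or-nothing rule for None values can be sketched as follows. The helper name `validate_all_or_none` and the choice of `ValueError` are assumptions; the list above only says that an exception is thrown.

```python
from typing import Optional, Sequence


def validate_all_or_none(name: str, values: Sequence[Optional[object]]) -> None:
    """Reject a mix of None and non-None entries (hypothetical helper)."""
    n_none = sum(value is None for value in values)
    # Either every entry is None or no entry is None.
    if 0 < n_none < len(values):
        raise ValueError(f"Mixing None and non-None {name} is not supported.")


validate_all_or_none("budgets", [None, None])  # fine: budgets unused
validate_all_or_none("seeds", [0, 1, 2])  # fine: seeds always set
```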

@renesass renesass linked an issue Aug 16, 2022 that may be closed by this pull request
@renesass renesass linked an issue Aug 17, 2022 that may be closed by this pull request
2 tasks
@renesass renesass changed the title from "Development 2.0" to "Version 2.0.0a1" Oct 11, 2022
@renesass renesass merged commit ca4ffba into main Oct 12, 2022
@renesass renesass deleted the development-2.0 branch October 12, 2022 10:31
github-actions bot pushed a commit that referenced this pull request Oct 12, 2022
@renesass renesass restored the development-2.0 branch October 12, 2022 11:53
Labels
documentation (Documentation is needed/added), enhancement, example, feature, test (Tests are needed/added)
Projects
Status: Done
6 participants